  Oct 2025
    Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This report contains two parts. In the first part, several experiments were carried out to show that CsoR binds to CheA, inhibits CheA phosphorylation, and impairs P. putida chemotaxis. The second part provides some evidence that CsoR is a copper-binding protein, binds to CheA in a copper-dependent manner, and regulates P. putida response to copper, a chemorepellent. Based on these results, a working model is proposed to describe how CsoR coordinates chemotaxis and resistance to copper in P. putida. While the second part of the study is relatively solid, there are some major concerns about the first part.

      Critiques:

      (1) The rigor from prior research is not clear. In addition to talking about other bacterial chemotaxis, the Introduction should briefly summarize previous work on P. putida chemotaxis and copper resistance.

      We summarized previous results on P. putida copper resistance and added them to the Introduction of the revised manuscript. As for chemotaxis, most studies in P. putida have focused on how the bacterium senses and responds to different chemical compounds and on the methyl-accepting chemotaxis proteins (MCPs) involved in sensing, which are not directly relevant to the main content of this study. The components of the P. putida chemotaxis system are similar to those of E. coli, and the signaling mechanism is presumably similar.

      (2) The rationale for identifying those CheA-binding proteins is vague. CheA has been extensively studied and its functional domains (P1 to P5) have been well characterized. Compared to its counterparts from other bacteria, does P. putida CheA contain a unique motif or domain? Does CsoR bind to other bacterial CheAs or only to P. putida CheA?

      The original purpose of the pull-down assay was to detect interactions between CheA and c-di-GMP-metabolizing enzymes for a separate project. However, we overlooked the fact that most c-di-GMP-metabolizing enzymes are membrane proteins and mistakenly used whole-cell lysate in the pull-down experiment. As a result, no c-di-GMP-metabolizing enzymes were identified among the “target” proteins of the pull-down assay. We did, however, find several novel “target” proteins. We wondered about the functions of these proteins and the physiological roles of their interaction with CheA, which became the primary purpose of this study. Although the function of CheA has been well characterized, most previous work focused on its role in chemotaxis, and its roles in other bacterial processes are poorly understood. To extend our knowledge of CheA, we analyzed the results of the pull-down assay and decided to test the interactions between CheA and the identified proteins, as well as the physiological roles of these interactions.

      BLAST results showed that P. putida CheA shares 41.12% sequence similarity with E. coli CheA and has a domain organization similar to that of CheAs from other bacteria. To test whether CsoR<sub>P. putida</sub> interacts with CheA from other bacteria, we performed a BTH assay between CsoR<sub>P. putida</sub> and eight CheAs, from E. coli, A. caldus, B. diazoefficiens, B. subtilis, L. monocytogenes, P. fluorescens, P. syringae, and P. stutzeri. As shown in Fig. 1 below, CsoR<sub>P. putida</sub> interacted with CheA from A. caldus, B. subtilis, L. monocytogenes, P. fluorescens, P. syringae, and P. stutzeri. In addition, cheA and csoR coexist in A. caldus, B. diazoefficiens, B. subtilis, L. monocytogenes, P. fluorescens, P. syringae, and P. stutzeri. We had previously tested the interaction between the two native proteins within each of these species. The CheA-CsoR interaction was detected for the proteins from A. caldus, B. subtilis, P. syringae, and P. stutzeri (Fig. 7 in the manuscript), whereas CheA and CsoR from B. diazoefficiens, L. monocytogenes, and P. fluorescens showed no apparent interaction (Fig. 7 in the manuscript). These results suggest that specific amino acid sequences in the two proteins might be required for the interaction.

      (3) Line 133-136, "Collectively, using pull-down, BTH, and BiFC assays, we identified 16 new CheA-interacting proteins in P. putida." It is surprising that so many proteins were identified but none of them were chemotaxis proteins, in particular those known to interact with CheA, such as CheW, CheY and CheZ, which raises a concern about the specificity of these methods. BTH and BiFC often give false-positive results and thus should be substantiated by other approaches such as co-IP, surface plasmon resonance (SPR), or isothermal titration calorimetry (ITC) along with mutagenesis studies.

      The response regulator CheY and the phosphatase CheZ, two proteins known to associate with CheA, were identified in the pull-down assay (Table S1) with high Log<sub>2</sub>(fold change) values, indicating that they were recovered in high amounts in the experimental group and in low amounts in the control group. Because our study aimed to identify new CheA-interacting proteins, these two proteins were not included in subsequent investigations. Candidate CheA-interacting proteins were first obtained through an in vitro assay (pull-down), and their interaction with CheA was then tested in vivo (BTH and BiFC). Only proteins that gave positive results in all three assays were considered trustworthy CheA-interacting proteins and kept for further study.

      (4) Line 147-149, "Fig. 2a, five strains (WT+pcsoR, WT+pispG, WT+pnfuA, WT+pphaD, and WT+pPP_1644) displayed smaller colony than the control strain (WT+pVec), indicating a weaker chemotaxis ability in these five strains." If copper is a chemorepellent, these strains should swim away from high concentrations of copper; thus, the sizes of colonies couldn't be used to measure this response. In the cited reference (reference 29), bacterial response to phenol was measured using a response index (RI).

      Except for CsoR, the CheA-interacting proteins have no direct connection with copper and are involved in different processes (Table S1). A reasonable speculation is that these proteins integrate signals from their respective processes into chemotaxis by regulating CheA autophosphorylation, allowing chemotaxis to be better tuned to the intracellular physiological state. In a fixed attractant/repellent gradient, a chemoeffector such as copper can give rise to two subpopulations traveling at different speeds, with the slower one held back by chemokinetic drift. In semisolid plate migration, by contrast, chemotactic bacteria form large colonies by generating their own gradients (through nutrient consumption and the production of toxic metabolic waste) and following the attractant/repellent gradients that lead outward from the point of inoculation (Cremer et al., 2019. Nature 575:658–663). Successive sharp circular bands (rings) progressing outward from the inoculation point are taken to confirm the chemotaxis genotype, whereas mutants lacking chemotaxis spread out uniformly and form small colonies (Wolfe and Berg, PNAS. 1989, 86:6973-6977). In our experiment, we did not know which signals or chemoeffectors are relevant to each target protein, so we could not design a fixed attractant/repellent gradient. Moreover, because all target proteins interact with CheA, a crucial component of the chemotaxis pathway, we assumed that overexpressing them would affect chemotaxis. We therefore used semisolid nutrient agar plates to test and compare bacterial chemotaxis ability.

      (5) Figures 2 and 3 show both CsoR and PhaD bind to CheA and inhibit CheA autophosphorylation. Do these two proteins share any sequence or structural similarity? Does PhaD also bind to copper? Otherwise, it is difficult to understand these results.

      Thank you; this is an enlightening comment. CsoR is 10.8 kDa and PhaD is 23.1 kDa, and because of this size difference we had assumed that the two proteins were not similar. We recently compared their sequences using NCBI BLAST. Although both CsoR and PhaD are transcriptional regulators that interact with CheA, they share no significant sequence similarity. We also predicted their structures using AlphaFold: CsoR consists of three α-helices and PhaD of nine α-helices (new Fig. S5a and S5b in the manuscript). A structural comparison in PyMOL revealed no significant similarity between the two proteins (new Fig. S5c in the manuscript).

      PhaD is a TetR family transcriptional regulator encoded adjacent to the genes involved in PHA biosynthesis, and it acts as a carbon source-dependent activator of the pha cluster responsible for polyhydroxyalkanoate (PHA) biosynthesis (de Eugenio et al., Environ Microbiol. 2010, 12:1591-1603; Tarazona et al., Environ Microbiol. 2020, 22:3922-3936). Bacterial PHAs are isotactic polymers synthesized under unfavorable growth conditions in the presence of excess carbon. PHAs are critical in central metabolism, acting as dynamic reservoirs of carbon and reducing equivalents (Gregory et al., Trends Mol Med. 2022, 28:331-342). The interaction between PhaD and CheA leads us to speculate that PHA synthesis and bacterial chemotaxis may be connected. For example, chemotaxis may help bacteria move toward carbon sources that favor PHA synthesis, and the PhaD-CheA interaction may then weaken chemotaxis, causing bacteria to linger in areas rich in these carbon sources. This is an interesting hypothesis worth testing in the future.

      (6) Line 195-196, "CsoR/PhaD had no apparent influence on the phosphate transfer between CheA and CheY". CheA controls bacterial chemotaxis through CheY phosphorylation. If this is true, how do CsoR and PhaD affect chemotaxis?

      In the autophosphorylation assay, CheA was mixed with CsoR/PhaD and incubated for about 10 min before [γ-<sup>32</sup>P]ATP was added. CsoR/PhaD could therefore act on CheA autophosphorylation throughout the assay, and a significant inhibitory effect was observed. In the transphosphorylation assay, by contrast, CheA was first incubated with ATP for about 30 min, during which it autophosphorylated; CsoR/PhaD and CheY were then added to the phosphorylated CheA to measure phosphotransfer. CsoR and PhaD thus affect chemotaxis by inhibiting CheA autophosphorylation, a crucial step in chemotaxis signaling: less phosphorylated CheA is available to phosphorylate CheY, which decreases chemotaxis.

      (7) Figure 3 shows that CsoR/PhaD bind to CheA through P1, P3, and P4. This result is intriguing. All CheA proteins contain these three domains. If this is true, CsoR/PhaD should bind to other bacterial CheAs too. That said, this experiment is premature and needs to be confirmed by other approaches.

      As noted in our reply to comment (2) above, we performed a BTH assay to investigate whether CsoR<sub>P. putida</sub> interacts with CheA from other bacterial species. CsoR<sub>P. putida</sub> interacted with CheA from A. caldus, B. subtilis, L. monocytogenes, P. fluorescens, P. syringae, and P. stutzeri, but not with CheA from E. coli or B. diazoefficiens. This suggests that the CheA-CsoR interaction requires specific amino acid sequence features in the two proteins and that a similar domain composition alone is insufficient.

      (8) Figure 5, does PhaD contain these three residues (C40, H65, and C69)? If not, how does PhaD inhibit CheA autophosphorylation and chemotactic response to copper?

      No. PhaD and CsoR share no significant sequence similarity, and PhaD contains none of the three CsoR residues (C40, H65, and C69). The two proteins also differ considerably in size (CsoR 10.8 kDa, PhaD 23.1 kDa), and structural alignment revealed no apparent similarity between their predicted structures (new Fig. S5c in the manuscript). Nevertheless, both CsoR and PhaD interact with CheA through its P1, P3, and P4 domains. How the two proteins interact with CheA is an interesting question that we cannot yet answer.

      (9) Does deletion of csoR or cheA have any impact on P. putida resistance to high concentrations of copper?

      No, deletion of csoR or cheA had no noticeable impact on P. putida resistance to high copper concentrations. We performed growth assays in liquid and on solid media to test the effect of CsoR and CheA on copper resistance, using copper concentrations of 0, 200, 500, and 1000 μM. Bacterial growth was progressively inhibited as the copper concentration increased, but the csoR mutant, the cheA mutant, and the complemented strains showed growth trends similar to those of the wild-type strain (new Fig. S6b and S6c in the manuscript). We speculate that this is because CsoR is a repressor that inhibits gene expression in the absence of copper; when copper is present, repression by CsoR is relieved, which mimics the csoR mutant. In addition, although deletion of csoR led to a slight increase (about 1.3-fold) in the expression of copper resistance genes (Fig. 4b in the manuscript), this effect was much weaker than that of its homologs in other bacterial species: in M. tuberculosis, B. subtilis, C. glutamicum, L. monocytogenes, and S. aureus, deletion of csoR results in an approximately 10-fold increase in target gene expression in the absence of copper. This difference might be explained by several key regulators that activate copper resistance genes in response to copper in P. putida, such as CueR and CopR (Adaikkalam and Swarup, Microbiology. 2002, 148:2857-2867; Hofmann et al., Int J Mol Sci, 2021, 22:2050; Quintana et al., J Biol Chem, 2017, 292:15691-15704). CueR positively regulates cueA, which encodes a copper-transporting P1-type ATPase that plays a crucial role in copper resistance. CopR is essential for the expression of several genes implicated in cytoplasmic copper homeostasis, such as copA-II, copB-II, and cusA. The existence of these positive regulators may make CsoR a secondary or even dispensable safeguard for the expression of copper resistance genes. Consistent with this, P. aeruginosa has no CsoR homolog, and its copper homeostasis is mainly controlled by CueR and CopR.

      Reviewer #2 (Public Review):

      This manuscript focuses on the apparent involvement of a proposed copper-responsive regulator in the chemotactic response of Pseudomonas putida to Cu(II), a chemorepellent. Broadly, this area is of interest because it could provide insight into how soil microbes mitigate metal stress. Additionally, copper has some historical agricultural use as an antimicrobial, thus can accumulate in soil. The manuscript bases its conclusions on an in vitro screen to identify interacting partners of CheA, an essential kinase in the P. putida chemotaxis-signaling pathway. Much of the subsequent analysis focuses on a regulator of the CsoR/RcnR family (PP_2969).

      Weaknesses:

      The data presented in this work does not support the model (Figure 8). In particular, PP_2969 is linked to Ni/Co resistance, not Cu resistance. Further, it is not clear how the putative new interactions with CheA would be integrated into diverse responses to various chemoattractants/repellents. These two comments are justified below.

      Thank you for these comments. Before designing experiments to explore the function of PP_2969, we noted three clues: (i) its sequence shows 38% similarity to the copper-responsive regulator CsoR of M. tuberculosis, and the three amino acids essential for copper binding are conserved in PP_2969; (ii) it is located next to a Ni<sup>2+</sup>/Co<sup>2+</sup> transporter gene (PP_2968) in the genome; (iii) a previous report showed that PP_2969 (also named MreA) expression increased during metal stress and that overexpression of PP_2969 in P. putida and E. coli led to metal accumulation (Zn, Cd, and Cr) (Lunavat et al., Curr Microbiol. 2022, 79:142). These clues indicate that PP_2969 functions in metal binding, but which metal(s) it binds remained to be explored. We therefore performed an MST assay to test the interaction between PP_2969 and metal ions, including copper (Cu<sup>2+</sup>), zinc (Zn<sup>2+</sup>), nickel (Ni<sup>2+</sup>), cobalt (Co<sup>2+</sup>), cadmium (Cd<sup>2+</sup>), and magnesium (Mg<sup>2+</sup>). PP_2969 bound three metal ions (Cu<sup>2+</sup>, Zn<sup>2+</sup>, Ni<sup>2+</sup>), with the strongest binding to Cu<sup>2+</sup>. In addition, EMSA showed that Cu<sup>2+</sup>/Ni<sup>2+</sup>/Zn<sup>2+</sup> inhibited the interaction between PP_2969 and promoter DNA, with Cu<sup>2+</sup> showing the strongest inhibitory effect at the same concentration. These results suggested that PP_2969 binds mainly to Cu<sup>2+</sup>, followed by Zn<sup>2+</sup> and Ni<sup>2+</sup>. To further test whether PP_2969 functions as a metal-responsive repressor and which metal resistance its target genes are related to, we constructed a PP_2969 deletion mutant and a complemented strain and used qPCR to compare the expression of 14 metal resistance-related genes. Deletion of PP_2969 led to a weak but significant increase (about 1.3-fold) in the expression of 10 genes, including three copper resistance genes (copA-I, copA-II, and copB-II), one nickel resistance gene (nikB), two cadmium resistance genes (cadA-I and cadA-III), one cobalt resistance gene (cbtA), and three multiple-metal resistance genes (czcC-I, czcB-II, and PP_0026) (Fig. 4b, Fig. S5a in the manuscript). Conversely, complementation with a multicopy plasmid carrying the PP_2969 gene decreased the expression of these genes in ΔPP_2969. Although PP_2969 regulates the expression of multiple metal resistance genes, it binds Cu<sup>2+</sup> most strongly. We therefore consider its primary function to be that of a Cu<sup>2+</sup>-responsive regulator.

      As for the second comment, “How would the putative new interactions with CheA be integrated into diverse responses to various chemoattractants/repellents?”, we have some speculations based on our results and previous reports. For example, PP_2969 interacts with CheA and decreases its autophosphorylation activity, and copper inhibits the interaction between CheA and PP_2969. In the absence of copper, PP_2969 binds to promoters to repress copper resistance genes, and it also binds to CheA to inhibit its autophosphorylation, resulting in weaker chemotaxis. When the bacteria move into an area with a high copper concentration, PP_2969 binds copper and dissociates from promoter DNA, leading to higher expression of copper resistance genes. At the same time, copper binding weakens the PP_2969-CheA interaction, increasing CheA autophosphorylation and promoting chemotaxis, so that the bacteria swim away from the high copper concentration. Another attractive target protein is PhaD, a TetR family transcriptional regulator encoded adjacent to the genes involved in PHA biosynthesis, which acts as a carbon source-dependent activator of the pha cluster responsible for polyhydroxyalkanoate (PHA) biosynthesis (de Eugenio et al., Environ Microbiol. 2010, 12:1591-1603; Tarazona et al., Environ Microbiol. 2020, 22:3922-3936). Bacterial PHAs are isotactic polymers synthesized under unfavorable growth conditions in the presence of excess carbon. PHAs are critical in central metabolism, acting as dynamic reservoirs of carbon and reducing equivalents (Gregory et al., Trends Mol Med. 2022, 28:331-342). The interaction between PhaD and CheA leads us to speculate that PHA synthesis and bacterial chemotaxis may be connected. For example, chemotaxis may help bacteria move toward particular carbon sources that favor PHA synthesis, where PhaD activates the genes related to PHA synthesis. Meanwhile, the interaction between PhaD and CheA weakens chemotaxis, causing bacteria to linger in areas rich in these carbon sources. Collectively, we speculate that by interacting with CheA and modulating its autophosphorylation, target proteins such as CsoR and PhaD integrate signals from their own pathways into chemotaxis signaling.

      PP_2969

      (1) The authors present a sequence alignment (Figure S5) that is the sole basis for their initial assignment of this ORF as a CsoR protein. There is a conservation of the primary coordinating ligands (highlighted with asterisks) known to be involved in Cu(I) binding to CsoR (ref 31). There are some key differences, though, in residues immediately adjacent to the conserved Cys (the preceding Ala, which is Tyr in the other sequences). The effect of these changes may be significant in a physiological context.

      We constructed a PP_2969 point mutant in which the Ala residue preceding the conserved Cys was replaced with Tyr (CsoR<sub>A39Y</sub>) and analyzed the effect of this mutation. As shown in Author response image 1a, CsoR<sub>A39Y</sub> bound promoter DNA similarly to wild-type CsoR, and Cu<sup>2+</sup> abolished the interaction between CsoR<sub>A39Y</sub> and DNA, suggesting that A39 is not essential for DNA binding or Cu<sup>2+</sup> binding. In addition, CsoR<sub>A39Y</sub> interacted with CheA as wild-type CsoR did (Author response image 1b), indicating that A39 is not required for the interaction with CheA.

      Like the other sequences in the alignment, B. subtilis CsoR has a Tyr before the conserved Cys, and the BTH assay showed that CsoR and CheA from B. subtilis interact (Fig. 7 in the manuscript).

      Author response image 1.

      The effect of CsoR point mutation (CsoR<sub>A39Y</sub>) on the DNA-binding and Cu<sup>2+</sup>-binding abilities of CsoR. (a) Analysis for interactions between CsoR/CsoR<sub>A39Y</sub> and copA-I promoter DNA using EMSA. The concentrations of CsoR/CsoR<sub>A39Y</sub> and Cu<sup>2+</sup> added in each lane are shown above the gel. Free DNA and protein-DNA complexes are indicated. (b) The interaction between CsoR/CsoR<sub>A39Y</sub> and CheA was tested by BTH. Blue indicates protein-protein interaction in the colony after 60 h of incubation, while white indicates no protein-protein interaction. CK+ represents positive control, and CK- represents negative control.

      (2) The gene immediately downstream of PP_2969 is homologous to E. coli RcnA, a demonstrated Ni/Co efflux protein, suggesting that PP_2969 may be Ni or Co responsive. Indeed PP_2970 has previously been reported as Ni/Co responsive (J. Bact 2009 doi:10.1128/JB.00465-09). The host cytosol plays a critical role in determining metal response, in addition to the protein, which can explain the divergence from the metal response expected from the alignment.

      Correction: the gene immediately upstream (not downstream) of PP_2969 (PP_2968, not PP_2970) is homologous to E. coli RcnA, a demonstrated Ni/Co efflux protein. The previous JBact study (J. Bact 2009 doi:10.1128/JB.00465-09) named PP_2968 MrdH and showed that mrdH disruption led to sensitivity to cadmium, zinc, nickel, and cobalt, but not copper. Their results also revealed that MrdH is a broad-spectrum metal efflux transporter whose substrates include Cd<sup>2+</sup>, Zn<sup>2+</sup>, and Ni<sup>2+</sup>. However, the role of MrdH in Cu<sup>2+</sup> efflux was not tested. Metal efflux transporters commonly have broad substrate spectra, allowing them to influence bacterial resistance to a variety of metals (Munkelt et al., J Bacteriol. 2004, 186:8036-8043; Grass et al., J Bacteriol. 2005, 187:1604-1611; Nies et al., J Ind Microbiol. 1995, 14:186-199; Kelley et al., Metallomics. 2021, 13:mfaa002). Our results showed that PP_2969 bound Cu<sup>2+</sup>, Zn<sup>2+</sup>, and Ni<sup>2+</sup> under our experimental conditions and that CsoR regulated the expression of genes related to Cu<sup>2+</sup>, Zn<sup>2+</sup>, and Ni<sup>2+</sup> resistance, indicating that CsoR is involved in resistance to these metals. However, CsoR bound Cu<sup>2+</sup> most strongly, and Cu<sup>2+</sup> had the strongest inhibitory effect on the CsoR-DNA interaction. We therefore consider its primary function to be that of a Cu<sup>2+</sup>-responsive regulator.

      (3) The previous JBact study also explains the lack of an effect (Figure 5b) of deleting PP_2969 on copper-efflux gene expression (copA-I, copA-II, and copB-II), as these are regulated by CueR, not PP_2969, consistent with the previous report. Deletion of a CsoR/RcnR family regulator will result in constitutive expression of the relevant efflux/detoxification gene, at a level generally equivalent to the de-repression observed in the presence of the signal.

      We performed qPCR to test the effect of PP_2969 on gene expression, choosing 14 target genes that included copper, nickel, zinc, cadmium, and cobalt resistance genes. Deletion of PP_2969 led to a weak but significant increase (about 1.3-fold) in the expression of 10 genes (Fig. 4b, new Fig. S5a in the manuscript), and complementation with a multicopy plasmid carrying PP_2969 decreased their expression in ΔPP_2969. These results puzzled us. Why was the effect of PP_2969 on gene expression so weak? Had we picked the wrong target genes? In other bacteria, deletion of csoR leads to an approximately ten-fold increase in gene expression, generally equivalent to the de-repression observed in the presence of metal. To identify further target genes, we performed RNA-seq to compare gene expression in WT and ΔPP_2969 without copper. Surprisingly, no gene showed a change in expression of more than two-fold (data not shown). This result might be explained by several key regulators that activate metal resistance genes in response to metal in P. putida, such as CueR and CopR (Adaikkalam and Swarup, Microbiology. 2002, 148:2857-2867; Hofmann et al., Int J Mol Sci, 2021, 22:2050; Quintana et al., J Biol Chem, 2017, 292:15691-15704). CueR positively regulates cueA, which encodes a copper-transporting P1-type ATPase that plays a crucial role in copper resistance. CopR is essential for the expression of several genes implicated in cytoplasmic copper homeostasis, such as copA-II, copB-II, and cusA. The existence of these positive regulators might make CsoR a secondary or even dispensable safeguard for the expression of copper resistance genes. Consistent with this, P. aeruginosa has no CsoR homolog, and its copper homeostasis is mainly controlled by CueR and CopR.

      (4) Further, CsoR proteins are Cu(I) responsive, so measuring Cu(II) binding affinity is not physiologically relevant (Figures 5a and S5b). The affinities of demonstrated CsoR proteins are 10<sup>-18</sup> M, and these values are determined by competition assay. The MST assay and resulting affinities are not physiologically relevant.

      Thank you for this enlightening comment. This question also puzzled us during our experiments. The first study on CsoR, from Mycobacterium tuberculosis, showed that CsoR binds a single monomer-mole equivalent of Cu(I) to form a trigonally coordinated complex, a convincing result from protein structure analysis (Liu et al., Nat Chem Biol. 2007, 3:60-68). The authors further showed that Cu(I) abolished the DNA-binding ability of CsoR in EMSA, but the effect of Cu(II) was not tested. Their results also showed that adding CuCl<sub>2</sub> to the medium induced expression of the cso operon involved in copper resistance; perhaps Cu(II) was converted to Cu(I) in the cells before binding CsoR. Later studies in diverse bacterial species (including Listeria monocytogenes, Corynebacterium glutamicum, Deinococcus radiodurans, and Thermus thermophilus) showed that Cu(II) abolished the DNA-binding ability of CsoR in vitro, indicating that CsoR binds both Cu(I) and Cu(II) (Corbett et al., Mol Microbiol. 2011, 81:457-472; Teramoto et al., Biosci Biotechnol Biochem. 2012, 76:1952-1958; Zhao et al., Mol Biosyst. 2014, 10:2607-2616; Sakamoto et al., Microbiology. 2010, 156:1993-2005). Here, our in vitro assays (MST and EMSA) showed that CsoR binds Cu(II) and that Cu(II) affects the interaction between CsoR and promoter DNA. Cu(I) compounds are poorly soluble in water and are easily oxidized to Cu(II). DTT can reduce Cu(II) to Cu(I) (Krężel et al., J Inorg Biochem. 2001, 84:77-88). To test whether Cu(I) binds CsoR and affects its DNA-binding ability, we recently performed EMSA with CuCl<sub>2</sub>, DTT, or CuCl<sub>2</sub> plus DTT. As shown in Fig. 4d, adding DTT (0.1 and 1 mM) reduced the CsoR-DNA interaction in the presence of 0.2 mM CuCl<sub>2</sub>, whereas DTT alone had no apparent effect on the CsoR-DNA interaction. Thus, DTT enhanced the inhibitory effect of CuCl<sub>2</sub>, and the Cu(I) generated from Cu(II) by DTT inhibited the CsoR-DNA interaction more strongly than Cu(II) did. Together, these results suggest that CsoR binds Cu(I) more strongly than Cu(II). We have added these results to the revised manuscript.

      (5) The DNA-binding assays are carried out at protein concentrations well above physiological ranges (Figures 5c and d, and S5c, d). The weak binding will in part result from using DNA sequences upstream of the copA genes and not from PP_2970.

      We performed the in vitro DNA-binding assay several times; the lowest CsoR concentration that produced a shifted band was about 3 μM, and a higher concentration (15 μM) resulted in complete DNA binding. We therefore used 15 and 20 μM CsoR to test the effect of metals on the protein-DNA interaction. We realize that these concentrations are above the physiological range. The in vitro DNA-binding assay only approximates the in vivo process, and the extracellular (in vitro) conditions of the EMSA might restrict the activity of CsoR. In addition, we recently used EMSA to examine the interaction between CsoR and its own promoter (csoRpro). As shown in Author response image 2, CsoR bound csoRpro with an intensity similar to that with which it bound copA-Ipro. Thus, the weak binding was not caused by the promoter used in the assay.

      Author response image 2.

      The binding of CsoR to its own promoter (csoRpro) and copA-I promoter (copA-Ipro) in EMSA. The concentrations of CsoR added in each lane are shown above the gel. Free DNA and CsoR-DNA complex are indicated.

      CheA interactions

      (1) There is no consideration given to the likely physiological relevance of the new interacting partners for CheA.

      Thank you for this comment. The initial purpose of this research was to identify new CheA-interacting proteins to broaden our knowledge of CheA and bacterial chemotaxis. We are therefore currently focusing on the effect of these interactions on CheA and chemotaxis and trying to find links between different processes and bacterial chemotaxis. We infer that interactions between these new partners and CheA can integrate signals from different pathways into the chemotaxis signaling pathway, so that bacteria can better sense and adapt to different environments. The converse question, namely how CheA influences these new partners, is also an exciting one that remains to be answered. Among the 16 new CheA-interacting proteins, five had a significant influence on chemotaxis, whereas the remaining 11 had no obvious impact (Fig. 2a in the manuscript). CsoR and PhaD inhibited CheA autophosphorylation, and here we focused on the effect of CsoR on chemotaxis. We also investigated the impact of CheA on CsoR functions such as gene regulation and copper resistance, but CheA had no obvious influence on them. The interactions between CheA and these proteins may be physiologically asymmetric, with some interactions mainly affecting the function of CheA and others mainly affecting the function of the partner. Future studies on the functions of these new CheA-interacting proteins and on the role of CheA in regulating them would further expand our knowledge of CheA.

      (2) How much CheA is present in the cell (copies) and how many copies of other proteins are present? How would specific responses involving individual interacting partners be possible with such a heterogenous pool of putative CheA-complexes in a cell? For PP_2969, the affinity reported (Figure 5A) may lie at the upper end of the CsoR concentration range (for example, CueR in Salmonella is present at ~40 nM).

      Thank you for this insightful comment. We do not know the copy numbers of CheA and the other proteins in the cell. We were also initially surprised and skeptical about the reliability of CheA interacting with so many proteins. In the classical chemotaxis pathway, CheA interacts with CheY, CheW, and CheB. This study found 16 new CheA-interacting proteins using a pull-down assay and subsequent analysis. Moreover, in another unpublished study, we found that CheA interacted with eight c-di-GMP-metabolizing proteins and transferred its phosphoryl group to one of them. Together, CheA appears able to interact with at least 27 proteins. With such a heterogeneous pool of CheA complexes, achieving a specific response might seem difficult. However, several previous studies have reported single proteins interacting with dozens of partners. For example, the c-di-GMP effector LapD in Pseudomonas fluorescens and Pseudomonas putida can interact with a dozen different c-di-GMP-metabolizing proteins (Giacalone et al., mBio. 2018, 9:e01254-18; Nie et al., Mol Microbiol. 2024, 121:1-17). In Escherichia coli, a subset of DGCs and PDEs operates as central interaction hubs in a larger “supermodule” by interacting with dozens of proteins (Sarenko et al., mBio. 2017, 8:e01639-17). We infer that different CheA-interacting proteins might be expressed at different growth stages or under different conditions, and that their interaction with CheA at that stage or under that condition could alter bacterial chemotaxis or the process in which the partner protein is involved.

      (3) The two-hybrid system experiment uses a long growth time (60 h) before analysis. Even low LacZ activity levels will generate a blue color, depending upon growth medium (see doi: 10.1016/0076-6879(91)04011-c). It is also not clear how Miller units can be accurately or precisely determined from a solid plate assay (the reference cited describes a protocol for liquid culture).

      We did not observe blue colonies after 60 h of growth on plates under our experimental conditions. The BTH experiment was performed as follows. After the two plasmids were co-transformed into E. coli BTH101 cells, the plates containing transformants were incubated at 28 °C for 48 h, by which time the colonies were large enough to be picked and grown in liquid medium for 24 h at 28 °C. Then, 5 μL of each culture was spotted onto an LB agar plate supplemented with antibiotics, X-gal, and IPTG and incubated for 60 h at 28 °C before photographing. After the photos were taken, the bacteria were scraped off the plate and resuspended in buffer, and the LacZ activity of the suspension was measured. In our experience, incubation at 28 °C (below 30 °C) is critical, and we have not observed false positives in BTH assays under this condition.
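      Because the reviewer asked how Miller units can be obtained from a plate assay, a minimal sketch of the classical Miller-unit calculation applied to a resuspended plate scrape is given below. The response does not state which formula was used; this sketch assumes the standard form with the 1.75 × A550 light-scattering correction, and the function name and example readings are hypothetical.

```python
def miller_units(a420: float, a550: float, a600: float,
                 minutes: float, volume_ml: float) -> float:
    """Classical Miller units for beta-galactosidase (LacZ) activity.

    a420: absorbance of o-nitrophenol released from ONPG
    a550: light scattering by cell debris (corrected with a factor of 1.75)
    a600: density of the resuspended cells before the assay
    minutes: reaction time; volume_ml: volume of cell suspension assayed
    """
    if minutes <= 0 or volume_ml <= 0 or a600 <= 0:
        raise ValueError("reaction time, volume and A600 must be positive")
    return 1000.0 * (a420 - 1.75 * a550) / (minutes * volume_ml * a600)


# Hypothetical readings for cells scraped from one spot and resuspended in buffer
print(round(miller_units(a420=0.5, a550=0.0, a600=0.5, minutes=10, volume_ml=0.1), 1))  # 1000.0
```

      Normalizing by A600 of the resuspension is what makes plate-scraped cells comparable across spots of different density.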

      Reviewer #1 (Recommendations For The Authors):

      In addition to genetic and biochemical approaches, structural studies should be conducted to elucidate the molecular interaction between CheA and CsoR with/without copper.

      It would be more logical to first establish the role of CsoR in copper regulation and chemotaxis (the second part of this report) and then investigate its underpinning mechanism (the first part).

      Thank you for these recommendations. Structural analysis could reveal more details about the molecular mechanism of the CheA-CsoR interaction, but we currently lack the facilities required for such structural studies.

      As for the presentation logic of the results, we wrote the manuscript following the sequence of experiments. First, CheA-interacting proteins were screened (pull-down assay), and then the influence of the interacting proteins on strain chemotaxis and CheA autophosphorylation activity was assessed. Based on these results, we identified two proteins, CsoR and PhaD, and decided to examine the function of CsoR and its effect on chemotaxis in depth. We consider that this structure better reflects our research design and also lays a foundation for future exploration of the functions of other interacting proteins and the physiological significance of their interactions.

      Reviewer #2 (Recommendations For The Authors):

      A huge amount of effort has gone into this work.

      It would be good to see at least one of the newly identified interactions turn out to be physiologically relevant.

      The experimental tools appear to be available to do this, but it is critical to consider how these tools can lead to attempts to prove rather than test and possibly refute a model or hypothesis. In particular, please consider some of the comments about the physiological relevance of affinities when generating models.

      Thank you for these recommendations. Our study aimed to screen for new CheA-interacting proteins and explore how they affect CheA activity and bacterial chemotaxis, thereby broadening our understanding of chemotaxis. However, each protein-protein interaction has two sides: the influence of A on B and of B on A. In designing the experiments, we focused on the influence of the identified interacting proteins on CheA function and chemotaxis, and paid less attention to the functions of the interacting proteins and how the interaction affects them. Moreover, we found that the influence of the protein-protein interaction was asymmetric. In the interaction between CsoR and CheA, CsoR mainly affected the function of CheA and, in turn, chemotaxis, whereas CheA had no significant effect on the function of CsoR. This might be attributed to the weak effect of CsoR in regulating metal resistance in P. putida, and we speculate that this interaction mainly helps the bacteria sense and avoid metal stress. In addition, we plan to explore the interaction between CheA and another interacting protein, PhaD, in the future: to reveal the effect of the interaction on PhaD function (regulation of PHA synthesis), to explore its effect on CheA function and chemotaxis, and to determine whether PHA anabolism and bacterial chemotaxis are linked. Finally, for proteins that had no significant effect on CheA autophosphorylation or bacterial chemotaxis, we speculate that CheA might affect their function or activity through the interaction, meaning that the physiological effects of these interactions would be reflected mainly in the interacting protein rather than in CheA. These speculations need to be tested experimentally.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Rossi et al. asked whether gait adaptation is solely a matter of slow perceptual realignment or if it also involves fast/flexible stimulus-response mapping mechanisms. To test this, they conducted a series of split-belt treadmill experiments with ramped perturbations, revealing behavior indicative of a flexible, automatic stimulus-response mapping mechanism.

      Strengths:

      (1) The study includes a perceptual test of leg speed, which correlates with the perceptual realignment component of motor aftereffects. This indicates that there are motor performances that are not accounted for by perceptual re-alignment.

      (2) The study incorporates qualitatively distinct, hypothesis-driven models of adaptation and proposes a new framework that integrates these various mechanisms.

      Weaknesses:

      (1) The study could benefit from considering other alternative models. As the authors noted in their discussion, while the descriptive models explain some patterns of behaviour/aftereffects, they don't currently account for how these mechanisms influence the initial learning process itself.

      (1a) For example, the pattern of gait asymmetry might differ for perceptual realignment (a smooth, gradual process), structural learning (more erratic, involving hypothesis testing/reasoning to understand the perturbation, see (Tsay et al. 2024) for a recent review on Reasoning), and stimulus-response mapping (possibly through a reinforcement-based trial-and-error approach). If not formally doing a model comparison, the manuscript might benefit from clearly laying out the behavioural predictions for how these different processes shape initial learning.

      (1b) Related to the above, the authors noted that the absence of difference during initial learning suggests that the differences in Experiment 2 in the ramp-up phase are driven by two distinct processes: structural learning and memory-based processes. Without clear assumptions about initial learning, the logic of this conclusion is hard to follow.

      Thank you for this insightful comment. We agree that considering alternative models and clarifying their potential contributions to the initial learning process would enhance the manuscript. We performed additional analyses and revised the text to outline how the mechanisms of adaptation in our study align with the framework described by Tsay et al. (2024) regarding the initial learning process and other features of adaptation.

      First, we referenced the Tsay et al. framework in the Introduction and Discussion to highlight parallels between their description of implicit adaptation and our forward model recalibration mechanism (producing motor changes and perceptual realignment). Specifically, the features defining recalibration in our study – gradual, trial-by-trial adjustments, rigid learning that leads to aftereffects, and limited contribution to generalization – align with those described by Tsay et al.

      Second, we used the description provided by Tsay et al. to test for the presence of explicit strategies in our study. We specifically tested for the criteria of reportability and intentionality, corroborating the finding that our stimulus-response mapping mechanism differs from explicit strategies.

      “A recent framework for motor learning by Tsay et al. defines explicit strategies as motor plans that are both intentional and reportable (Tsay et al., 2024). Within this framework, Tsay et al. clarify that "intentional" means participants deliberately perform the motor plan, while "reportable" means they are able to clearly articulate it.” (Experiment 2 Results, lines 515-518).

      “…the motor adjustments reported by participants consistently fail to meet the criteria for explicit strategies as outlined by Tsay et al.: reportability and intentionality (Tsay et al., 2024).” (Discussion, lines 657-660).

      Third, we interpreted the operation of stimulus-response mapping within the Tsay theoretical framework for the three stages of motor learning: 1) “reasoning” to acquire new action–outcome relationships, 2) “refinement” of the motor action parameters, and 3) “retrieval” of learnt motor actions based on contextual cues. We note that the definition of these stages closely aligns with our definition for stimulus response mapping mechanisms. Moreover, according to Tsay’s definition, both implicit and explicit learning mechanisms can involve similar reasoning and retrieval processes. This shared operational basis may explain why our stimulus-response mapping mechanism exhibits some characteristics associated with explicit strategies, such as flexibility and generalizability.

      We performed a new analysis to test the prediction of Tsay’s framework that, if walking adaptation includes a stimulus-response mapping mechanism following these three stages of motor learning, the learning process should initially be erratic and then stabilize as learning progresses. We assessed within-participant residual variance in step length asymmetry around a double exponential model fit during adaptation, testing the prediction that this variability would decrease between the start and end of adaptation. Experiment 1 results confirmed this prediction, showing a significant reduction in variability as adaptation progressed.

      “We finally tested whether the pattern of motor variability during adaptation aligns with predictions for learning new stimulus response maps. In contrast to recalibration, mapping mechanisms are predicted to be highly variable and erratic during early learning, and stabilize as learning progresses (Tsay et al., 2024). Consistent with these predictions, the step length asymmetry residual variance (around a double exponential fit) decreased significantly between the start and end of adaptation (residual variance at start minus end of adaptation = 0.005 [0.004, 0.007], mean [CI]; SI Appendix, Fig. S3). These control analyses corroborate the hypothesis that the “no aftereffects” region of the Ramp Down reflects the operation of a mapping mechanism.”

      (Experiment 1 Results, lines 187-194; Methods, lines 1040-1050).
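      The residual-variance test described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure in the Methods: the double exponential is fitted with a numpy-only grid search over the two time constants (amplitudes and offset by linear least squares), and "start" and "end" of adaptation are taken as hypothetical fixed windows of 50 strides. The synthetic data are invented for illustration.

```python
import numpy as np


def fit_double_exponential(t, y, taus=np.geomspace(2.0, 500.0, 40)):
    """Fit y ~ c + a1*exp(-t/tau1) + a2*exp(-t/tau2) with tau1 < tau2.

    For each (tau1, tau2) pair on the grid the model is linear in (c, a1, a2),
    so those are solved by least squares; the pair with the lowest residual
    sum of squares is kept. Returns the fitted values.
    """
    best_rss, best_fit = np.inf, None
    for i, tau_fast in enumerate(taus):
        for tau_slow in taus[i + 1:]:
            design = np.column_stack(
                [np.ones_like(t), np.exp(-t / tau_fast), np.exp(-t / tau_slow)]
            )
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            fitted = design @ coef
            rss = float(np.sum((y - fitted) ** 2))
            if rss < best_rss:
                best_rss, best_fit = rss, fitted
    return best_fit


def residual_variance_change(step_length_asymmetry, window=50):
    """Residual variance in the first window minus that in the last window."""
    y = np.asarray(step_length_asymmetry, dtype=float)
    t = np.arange(y.size, dtype=float)
    residuals = y - fit_double_exponential(t, y)
    return float(np.var(residuals[:window]) - np.var(residuals[-window:]))


# Synthetic (hypothetical) adaptation curve whose stride-to-stride noise shrinks
rng = np.random.default_rng(0)
t = np.arange(600, dtype=float)
noise_sd = np.linspace(0.08, 0.02, t.size)
asym = -0.3 * np.exp(-t / 5) - 0.1 * np.exp(-t / 100) + rng.normal(0.0, noise_sd)
print(residual_variance_change(asym) > 0)  # True for this synthetic example
```

      A positive value corresponds to the reported decrease in variability from the start to the end of adaptation; fitting the trend first ensures that the learning curve itself is not counted as variability.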

      Moreover, Experiment 2 results demonstrated that the pattern of variability (its magnitude and decay during adaptation) did not differ between participants using memory-based versus structure-based stimulus-response mapping mechanisms. These findings suggest that both types of mapping operate according to Tsay’s stages of motor learning.

      “Furthermore, the pattern of step length asymmetry variability was similar between the subgroups (structure – memory difference in residual variance relative to double exponential during initial adaptation = -0.0052 [-0.0161, 0.0044], adaptation plateau = -0.0007 [-0.0021, 0.0003], difference in variance decay = -0.0045 [-0.0155, 0.0052], mean [CI]; SI Appendix, Fig. S16). This confirms that the distinct performance clusters in the Ramp Up & Down task are not driven by natural variations in learning ability, such as differences in learning speed or variability. Rather, these findings indicate that the subgroups employ different types of mapping mechanisms, which perform similarly during initial learning but differ fundamentally in how they encode, retrieve, and generalize relationships between perturbations and Δ motor outputs.” (Experiment 2 Results, lines 503-511).

      “Both memory- and structure-based operations of mapping align with Tsay et al.’s framework for motor learning: first, action–outcome relationships are learned through exploration; second, motor control policies are refined to optimize rewards or costs, such as reducing error; and finally, learned mappings or policies are retrieved based on contextual cues (Tsay et al., 2024). Consistent with the proposed stages of exploration followed by refinement, we found that motor behavior during adaptation was initially erratic but became less variable at later stages of learning. Similarly, consistent with the retrieval stage, the generalization observed in the ramp tasks indicates that learned motor outputs are flexibly retrieved based on belt speed cues.” (Discussion, lines 701-708).

      Finally, we addressed the prediction outlined by Tsay et al. that repeated exposure to perturbations attenuates the magnitude of forward model recalibration, with savings being driven by stimulus-response mapping mechanisms. While we could not directly test savings for the primary perturbation used during adaptation, we were able to indirectly evaluate savings for a different perturbation through analyses of our control experiments combined with previous results from Leech et al. (Leech et al., 2018). Specifically, we examined how motor aftereffects and perceptual realignment evolved across repeated iterations of the speed-matching task post-adaptation in Ascending groups. Each task began with the right leg stationary and the left leg moving at 0.5 m/s – a configuration corresponding to a perturbation of -0.5 m/s, which is opposite in direction to the adaptation perturbation. By analyzing repeated exposures to this -0.5 m/s perturbation across iterations, we gained insights into the learning dynamics associated with this perturbation and the effect of repeated exposures on motor aftereffects and perceptual realignment. Consistent with predictions from Tsay et al., our results combined with Leech et al. demonstrate that, with repeated exposures to the same perturbation, perceptual realignment decays while the contribution of stimulus-response mapping to aftereffect savings is enhanced. We present this analysis and interpretation in Control Experiments Results, lines 429-442; Figure 8B; Table S7; and Discussion lines 709-753.

      (1c) The authors could also test a variant of the dual-rate state-space model with two perceptual realignment processes where the constraints on retention and learning rate are relaxed. This model would be a stronger test for two perceptual re-alignment processes: one that is flexible and another that is rigid, without mandating that one be fast learning and fast forgetting, and the other be slow learning and slow forgetting.

      We tested multiple variants of the suggested models and confirmed that they cannot capture the motor behavior observed in our Ramp Down task. We include Author response image 1 with the model fits, Author response table 1 with the BIC statistics, and the model equations below. Only the recalibration + mapping model captures the matching-then-divergent behavior of the Δ motor output, corroborating our interpretation that state-space based models cannot capture the mapping mechanism (see Discussion, “Implications for models of adaptation”). Furthermore, all models fit the data significantly worse than the recalibration + mapping model according to the BIC statistic.

      Model fits:

      Author response image 1.

      Statistical results:

      Author response table 1.

      Model definitions:

      • DualStateRelaxed: same equations as the original Dual State, but no constraints dictating the relative relationship between the parameters

      • DualStateRelaxedV2: same equations as the original Dual State, but no constraints dictating the relative relationship between the parameters, and “loose” parameter bounds (parameters can take values between -10 and 10).

      • PremoOriginalRelaxed: PReMo with two states (see below), no constraints dictating the relative relationship between the parameters

      • PremoOriginalRelaxedV2: PReMo with two states (see below), no constraints dictating the relative relationship between the parameters, and “loose” parameter bounds (parameters can take values between -10 and 10).
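      To make the DualStateRelaxed variants concrete, the sketch below simulates a generic two-state (fast/slow) state-space model of the kind usually meant by a dual-state model. It is an assumption that the paper's Dual State equations take this standard form; the retention factors A, learning rates B, the perturbation schedule, and the helper names are hypothetical. "Relaxed" here simply means the usual ordering constraints (A_fast < A_slow, B_fast > B_slow) are not enforced, and the bounds check uses the -10 to 10 range stated above for the V2 variants.

```python
import numpy as np


def simulate_dual_state(perturbation, a_fast, b_fast, a_slow, b_slow):
    """Two-state model: each state keeps a fraction A of itself and learns B * error.

    error[n] = perturbation[n] - (x_fast[n] + x_slow[n]); returns the summed
    motor output x_fast + x_slow on every trial (before that trial's update).
    """
    x_fast = x_slow = 0.0
    output = np.empty(len(perturbation))
    for n, p in enumerate(perturbation):
        output[n] = x_fast + x_slow
        error = p - output[n]
        x_fast = a_fast * x_fast + b_fast * error
        x_slow = a_slow * x_slow + b_slow * error
    return output


def satisfies_original_constraints(a_fast, b_fast, a_slow, b_slow):
    """Classic fast/slow ordering constraints; the relaxed variants drop these."""
    return a_fast < a_slow and b_fast > b_slow


def within_loose_bounds(params, low=-10.0, high=10.0):
    """Parameter bounds for the 'V2' variants (-10 to 10)."""
    return all(low <= value <= high for value in params)


# Hypothetical schedule: 600 adaptation strides at 1.0, then a linear ramp down to 0
schedule = np.concatenate([np.full(600, 1.0), np.linspace(1.0, 0.0, 100)])
motor = simulate_dual_state(schedule, a_fast=0.6, b_fast=0.2, a_slow=0.99, b_slow=0.02)
```

      Whatever parameter values are chosen, both states respond to the same error signal with fixed gains, so the summed output changes gradually; this is one way to see why such models struggle to produce an immediate, matching-then-divergent Δ motor output.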

      PReMo with two states – the remaining equations are the same as the original PReMo (see Methods):

      (2) The authors claim that stimulus-response mapping operates outside of explicit/deliberate control. While this could be true, the survey questions may have limitations that could be more clearly acknowledged.

      (2a) Specifically, asking participants at the end of the experiments to recall their strategies may suffer from memory biases (e.g., participants may be biased by recent events, and forget about the explicit strategies early in the experiment), be susceptible to the framing of the questions (e.g., participants not being sure what the experimenter is asking and how to verbalize their own strategy), and moreover, not clear what is the category of explicit strategies one might enact here which dictates what might be considered "relevant" and "accurate".

      (2b) The concept of perceptual realignment also suggests that participants are somewhat aware of the treadmill's changing conditions; therefore, as a thought experiment, if the authors have asked participants throughout/during the experiment whether they are trying different strategies, would they predict that some behaviour is under deliberate control?

      We have expanded the discussion to explicitly acknowledge that our testing methodology for assessing explicit strategies may have limitations, recognizing the factors mentioned by the reviewer. Moreover, as mentioned in response to comment (1), we leveraged the framework from Tsay et al., 2024 and its definition of explicit strategies to ensure a robust and consistent approach in interpreting the survey responses.

      We revised the Experiment 2 Results section, lines 515-518, to specify that we are evaluating the presence of explicit strategies according to the criteria of intentionality and reportability:

      “A recent framework for motor learning by Tsay et al. defines explicit strategies as motor plans that are both intentional and reportable (Tsay et al., 2024). Within this framework, Tsay et al. clarify that "intentional" means participants deliberately perform the motor plan, while "reportable" means they are able to clearly articulate it.”

      We then reorganized the Discussion to include a separate section “Mapping operates independently of explicit control”, lines 646-661, where we discuss limitations of the survey methodology and interpretation of the results according to Tsay et al., 2024:

      “Here, we show that explicit strategies are not systematically used to adapt step length asymmetry and Δ motor output: the participants in our study either did not know what they did, reported changes that did not actually occur, or reported changes that would not lead to symmetry. Only one person reported “leaning” on the left (slow) leg for as much time as possible, which is a relevant but incomplete description for how to walk with symmetry. Four reports mentioned pressure or weight, which may indirectly influence symmetry (Hirata et al., 2019; Lauzière et al., 2014), but they were vague and conflicting (e.g., “making heavy steps on the right foot” or “put more weight on my left foot”). All other responses were null, explicitly wrong or irrelevant, or overly generic, like wanting to “stay upright” and “not fall down”. We acknowledge that our testing methodology has limitations. First, it may introduce biases related to memory recall or framing of the questionnaire. Second, while it focuses on participants' intentional use of explicit strategies to control walking, it does not rule out the possibility of passive awareness of motor adjustments or treadmill configurations. Despite these limitations, the motor adjustments reported by participants consistently fail to meet the criteria for explicit strategies as outlined by Tsay et al.: reportability and intentionality (Tsay et al., 2024). Together with existing literature, this supports the interpretation that stimulus response mapping operates automatically.”

      We also made the following addition to the “Limitations” section of the Discussion (lines 917-919):

      “While mapping differs from explicit strategies as they are currently defined, we still lack a comprehensive framework to capture the varying levels and nuanced characteristics of intentionality and awareness of different mechanisms (Tsay et al., 2024).”

      We finally note that “Unlike explicit strategies, which are rapidly acquired and diminish over time, this mapping mechanism exhibits prolonged learning beyond 15 minutes, with a rate comparable to recalibration” (Discussion, lines 632-634).

      (3) The distinction between structural and memory-based differences in the two subgroups was based on the notion that memory-based strategies increase asymmetry. However, an alternative explanation could be that unfamiliar perturbations, due to the ramping up, trigger a surprise signal that leads to greater asymmetry due to reactive corrections to prevent one's fall - not because participants are generalizing from previously learned representations (e.g., (Iturralde & Torres-Oviedo, 2019)).

      We agree that reactive corrections could contribute to the walking pattern in response to split-belt perturbations, as detailed by Iturralde & Torres-Oviedo, 2019. We also acknowledge that reactive corrections are rapid, flexible, feedback-driven, and automatic, characteristics that make them appear similar to stimulus-response mapping. However, a detailed evaluation of our results suggests that the behaviors observed in the ramp tasks cannot be fully explained by reactive corrections. Reactive corrections occur almost immediately, quickly adjusting the walking pattern to reduce error and improve stability. This excludes the possibility that what we identified as stimulus-response mapping could instead be reactive corrections, because the stimulus-response mapping observed in our study is acquired slowly, at a rate comparable to recalibration. It also excludes the possibility that the increased asymmetry in the Ramp Up & Down could be due to reactive corrections, because these would operate alongside mapping to help reduce asymmetry rather than exacerbate it.

      We made substantial revisions to the Discussion and included the section “Stimulus-response mapping is flexible but requires learning” to explain this interpretation (lines 595-622):

      “The mapping mechanism observed in our study aligns with the corrective responses described by Iturralde and Torres-Oviedo, which operate relative to a recalibrated "new normal" rather than relying solely on environmental cues (Iturralde and Torres-Oviedo, 2019). Accordingly, our findings suggest a tandem architecture: forward model recalibration adjusts the nervous system's "normal state," while stimulus-response mapping computes motor adjustments relative to this "new normal." This architecture explains the sharp transition from flexible to rigid motor adjustments observed in our Ramp Down task. The transition occurs at the configuration perceived as "equal speeds" (~0.5 m/s speed difference) because this corresponds to the recalibrated “new normal”.

      In the first half of the Ramp Down, participants adequately modulated their walking pattern to accommodate the gradually diminishing perturbation, achieving symmetric step lengths. Due to the recalibrated “new normal”, perturbations within this range are perceived as congruent with the direction of adaptation but reduced in magnitude. This allows the mapping mechanism to flexibly modulate the walking pattern by using motor adjustments previously learned during adaptation. Importantly, the rapid duration of the Ramp Down task rules out the possibility that the observed modulation may instead reflect washout, as confirmed by the fact that the aftereffects measured post-Ramp-Down were comparable to previous work (Kambic et al., 2023; Reisman et al., 2005).

      In the second half of the Ramp Down, aftereffects emerged as participants failed to accommodate perturbations smaller than the recalibrated “new normal”. These perturbations were perceived as opposite to the adaptation perturbation and, therefore, novel. Accordingly, the mapping mechanism responded as it would to a newly introduced perturbation, rather than leveraging previously learned adjustments (Iturralde and Torres-Oviedo, 2019). Due to the rapid nature of the Ramp Down, the mapping mechanism lacked sufficient time to learn the novel motor adjustments required for these perturbations – a process that typically takes several minutes, as shown by our baseline ramp tasks and control experiments. As mapping-related learning was negligible, the rigid recalibration adjustments dominated during this phase. Consequently, the walking pattern did not change to accommodate the gradually diminishing perturbation, leading to the emergence of aftereffects.”
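      The tandem architecture described in this passage can be illustrated with a deliberately simplified toy calculation. It is not the PM-ReMap model. It assumes that (i) recalibration shifts the "new normal" to a fixed 0.5 m/s speed difference, (ii) mapping reproduces previously learned adjustments for any perturbation between the new normal and the adaptation perturbation, and (iii) mapping contributes nothing for perturbations below the new normal, which it treats as novel and has no time to learn during the rapid ramp. The function names and the 1.0 m/s starting perturbation are hypothetical.

```python
import numpy as np

NEW_NORMAL = 0.5  # m/s speed difference perceived as "equal speeds" after recalibration


def delta_motor_output(perturbation, new_normal=NEW_NORMAL):
    """Toy Delta motor output = rigid recalibration + flexible mapping.

    Recalibration always supplies `new_normal`; mapping adds the learned
    adjustment (perturbation - new_normal) only for perturbations congruent
    with adaptation, i.e. at or above the new normal.
    """
    p = np.asarray(perturbation, dtype=float)
    mapping = np.where(p >= new_normal, p - new_normal, 0.0)
    return new_normal + mapping


def aftereffect(perturbation, new_normal=NEW_NORMAL):
    """Mismatch between Delta motor output and perturbation (0 = fully accommodated)."""
    p = np.asarray(perturbation, dtype=float)
    return delta_motor_output(p, new_normal) - p


ramp_down = np.linspace(1.0, 0.0, 11)  # hypothetical ramp from 1.0 m/s to tied belts
print(np.round(aftereffect(ramp_down), 2))
```

      In this toy, the aftereffect is zero throughout the first half of the ramp and then grows linearly, reaching the 0.5 m/s new normal at tied belts, which is the matching-then-divergent shape described above.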

      (4) Further contextualization: Recognizing the differences in dependent variables (reaching position vs. leg speed/symmetry in walking), could the Proprioceptive/Perceptual Re-alignment model also apply to gait adaptation (Tsay et al., 2022; Zhang et al., 2024)? Recent reaching studies show a similar link between perception and action during motor adaptation (Tsay et al., 2021) and have proposed a model aligning with the authors' correlations between perception and action. The core signal driving implicit adaptation is the discrepancy between perceived and desired limb position, integrating forward model predictions with proprioceptive/visual feedback.

      We appreciate the reviewer’s suggestion and agree that the Proprioceptive Re-alignment model (PReMo) and the Perceptual Error Adaptation model (PEA) offer valuable insights into the relationship between perception and motor adaptation. To explore whether these frameworks apply to gait adaptation, we conducted an extensive modeling analysis. This is shown in Figure 5 and Supplementary Figures S7-S8, and is detailed in the text of Experiment 1 Results section “Modelling analysis for perceptual realignment” (lines 327–375), Methods section “Proprioceptive re-alignment model (PReMo)” (lines 1181-1221), Methods section “Perceptual Error Adaptation model (PEA)” (lines 1222-1247), Methods section “Perceptuomotor recalibration + mapping (PM-ReMap)” (lines 1248-1286), and SI Appendix section “Evaluation and development of perceptual models.” (lines 99-237).

      First, we evaluated how PReMo and PEA models fitted our Ramp Down data. We translated the original variables to walking adaptation variables using a conceptual equivalence explained by one of the features explored by Tsay et al. (2022). Specifically, the manuscript provides guidance on extending the PReMo model from visuomotor adaptation in response to visual-proprioceptive discrepancies, to force-field adaptation in response to mechanical perturbations – which share conceptual similarities with split-belt treadmill perturbations. The manuscript also discusses that, if vision is removed, the proprioceptive shift decays back to zero according to a decay parameter. This description entails that proprioceptive shift cannot increase or develop in the absence of vision. We applied the models to split-belt adaptation in accordance with this information, as described in the SI Appendix: “PReMo variables equivalents for walking adaptation”. As reported in Experiment 1 Results “Modelling analysis for perceptual realignment” (lines 327–375) and Figure 5, neither PReMo nor PEA adequately captured the key features of our Ramp Down data: “The models could not capture the matching-then-divergent behavior of Δ motor output, performing significantly worse than the recalibration + mapping model (PReMo minus recalibration+mapping BIC difference = 24.591 [16.483, 32.037], PEA minus recalibration+mapping BIC difference = 6.834 [1.779, 12.130], mean [CI]). Furthermore, they could not capture the perceptual realignment and instead predicted that the right leg would feel faster than the left throughout the entire Ramp Down”.
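      For reference, BIC comparisons of this kind are commonly computed from each model's residual sum of squares under a Gaussian error assumption. The sketch below assumes that form and, as a further assumption, summarizes per-participant BIC differences as a mean with a percentile-bootstrap 95% CI; the exact likelihood and CI procedure used in the paper are described in its Methods, not here. All numbers in the example (stride count, parameter counts, RSS values) are hypothetical.

```python
import numpy as np


def bic_gaussian(rss, n, k):
    """BIC for a least-squares fit with n data points and k free parameters."""
    return n * np.log(rss / n) + k * np.log(n)


def mean_with_bootstrap_ci(values, n_boot=10_000, level=0.95, seed=0):
    """Mean of per-participant differences with a percentile bootstrap CI."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    resamples = rng.choice(values, size=(n_boot, values.size), replace=True)
    boot_means = resamples.mean(axis=1)
    alpha = (1.0 - level) / 2.0
    low, high = np.quantile(boot_means, [alpha, 1.0 - alpha])
    return values.mean(), (low, high)


# Hypothetical per-participant RSS values for two models fitted to the same strides
n_strides = 300
rss_alternative = np.array([4.1, 3.8, 5.0, 4.6, 3.9])
rss_recal_mapping = np.array([3.6, 3.5, 4.4, 4.0, 3.7])
delta_bic = (bic_gaussian(rss_alternative, n_strides, k=4)
             - bic_gaussian(rss_recal_mapping, n_strides, k=5))
print(mean_with_bootstrap_ci(delta_bic))
```

      A positive mean difference whose CI excludes zero corresponds to statements such as "PReMo minus recalibration+mapping BIC difference = 24.591 [16.483, 32.037]": the alternative model fits worse even after the penalty for the extra parameters is applied.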

      Second, we used simulations to confirm that PReMo and PEA cannot account for the perceptual realignment observed in our study, and to understand why. At adaptation plateau, PReMo predicts that perceived and actual step length asymmetry converge, as shown in Fig. S7A, top, and as detailed in the SI Appendix “Original PReMo simulations”. We found that this is because PReMo assumes that perceptual realignment arises specifically from mismatches between different sensory modalities. This assumption works for paradigms that introduce an actual mismatch between sensory modalities, such as visuomotor adaptation paradigms with a mismatch between vision and proprioception. This assumption also works for paradigms that indirectly introduce a mismatch between integrated sensory information from different sensory modalities. In force-field adaptation, both proprioceptive and visual inputs are present and realistic, but when these inputs are integrated with sensory predictions, the resulting integrated visual estimate is mismatched compared to the integrated proprioceptive estimate. In contrast, the assumption that perceptual realignment arises from sensory modalities mismatches does not work for paradigms that involve a single sensory modality. Split-belt adaptation only involves proprioception as no visual feedback is given, and perceptual realignment arises from discrepancies between predicted and actual motor outcomes, rather than between integrated sensory modalities.

      To overcome this limitation, we reinterpreted the variables of the PReMo model, while keeping the original equations, to account for realignment driven by mismatches of the same nature as the perturbation driving adaptation. As reported in the SI Appendix “Iterative simulations for the development of PM-ReMap”, the simulation (Fig. S7A, middle row) “showed perceptual realignment at adaptation plateau, addressing a limitation of the original model. However, it failed to account for the Ramp Down perceptual results, inaccurately predicting that belt speeds feel equal when they are actually equal (Fig. S7A, middle row, perceived perturbation decays alongside actual perturbation and converge to zero at the end of the Ramp Down). […] This occurs because, under the retained PReMo equations, β<sub>p</sub> and β<sub>v</sub> change immediately and are proportional to the difference between and on each trial, so that they ramp down to zero in parallel with the perturbation”.

      We also noted that the simulations of the original and reinterpreted PReMo models also could not support the operation of the mapping mechanism observed in the Ramp Down (Fig. S7B). We describe that “This occurs because the overall motor output x<sub>p</sub>, which includes both recalibration and mapping mechanisms, changes gradually according to the learning rate 𝐾. Consequently, changes in 𝐺 take many trials to be fully reflected in x<sub>p</sub>. Hence, we found complementary limitations where PReMo assumes perceptual realignment changes immediately while mapping adjustments develop gradually – but the opposite is true in our data”.

      We therefore modified the PReMo equations and developed a new model, called perceptuomotor recalibration + mapping (PM-ReMap) that addresses these limitations and is able to capture our Ramp Down motor and perceptual results. As described in the SI Appendix “Iterative simulations for the development of PM-ReMap”, “we introduced an update equation for β<sub>p</sub> so that it changes gradually trial-by-trial according to the learning rate 𝐾. We then removed the learning rate from the update equation for x<sub>p</sub> so that it integrates two distinct types of changes: 1) the gradual changes in driven by β<sub>p</sub> and representing the recalibration mechanism, and 2) the immediate changes in 𝐺 – representing the mapping mechanism”. The final equations of the PM-ReMap model are as follows:

      As reported in Experiment 1 Results, “Modelling analysis for perceptual realignment”, and as shown in Fig. 5C, “the PM-ReMap model captured the Δ motor output in the Ramp Down with performance comparable to that of the recalibration + mapping model (BIC difference = 2.381 [-0.739, 5.147], mean [CI]). It also captured perceptual realignment, predicting that some intermediate belt speed difference in the Ramp Down is perceived as “equal speeds” (, Fig. 5C)”. We also found that the estimated aligned with the empirical measurement of the PSE in the Ramp Down both at group and individual level: “At group level, was comparable to the upper bound of compensation<sub>perceptual</sub> (difference = -7 [-15, 1]%, mean [CI]), but significantly larger than the lower bound (difference = 19 [8, 31]%, mean [CI]). Furthermore, we found a significant correlation between individual participants’ and their upper bound of compensation<sub>perceptual</sub> (r=0.63, p=0.003), but not their lower bound (r=0.30, p=0.203). Both sets of results are consistent with those observed for the recalibration + mapping model”.

      Based on these findings, we summarize that PM-ReMap “extends the recalibration + mapping model by incorporating the ability to account for forgetting – typical of state space models – while still effectively capturing both recalibration and mapping mechanisms. However, performance of the PM-ReMap model does not exceed that of the simpler recalibration + mapping model, suggesting that forgetting and unlearning do not have a substantial impact on the Ramp Down”.
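
The structural change described above can be sketched as a trial-by-trial loop: a recalibration state updates gradually, while the mapping contribution enters the motor output immediately. This is a hypothetical illustration only. The final PM-ReMap equations are not reproduced here, and the names `drive`, `mapping`, `K`, and `A` (a retention factor standing in for forgetting) are assumptions.

```python
def simulate_pm_remap_sketch(drive, mapping, K=0.1, A=1.0, beta0=0.0):
    """Hypothetical sketch: gradual recalibration plus immediate mapping.

    drive[k]:   signal the recalibration state moves toward on stride k (assumed)
    mapping[k]: mapping contribution G on stride k, applied without delay
    K:          learning rate of the recalibration state (assumed value)
    A:          retention factor; A < 1 adds trial-by-trial forgetting
    """
    beta = beta0
    beta_hist, x_hist = [], []
    for d, g in zip(drive, mapping):
        beta_hist.append(beta)
        x_hist.append(beta + g)  # motor output: no learning rate applied to G
        beta = A * beta + K * (d - beta)  # gradual update of the recalibration state
    return beta_hist, x_hist


beta, x = simulate_pm_remap_sketch([1.0, 1.0, 1.0], [0.0, 0.0, 0.5], K=0.5)
```

Because no learning rate multiplies `mapping`, a step change in G appears in `x` on the same stride. A step change in `drive`, by contrast, moves `beta` only by a fraction `K` per stride.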

      Reviewer #2 (Public review):

      Recent findings in the field of motor learning have pointed to the combined action of multiple mechanisms that potentially contribute to changes in motor output during adaptation. A nearly ubiquitous motor learning process occurs via the trial-by-trial compensation of motor errors, often attributed to cerebellar-dependent updating. This error-based learning process is slow and largely unconscious. Additional learning processes that are rapid (e.g., explicit strategy-based compensation) have been described in discrete movements like goal-directed reaching adaptation. However, the role of rapid motor updating during continuous movements such as walking has been either under-explored or inconsistent with those found during the adaptation of discrete movements. Indeed, previous results have largely discounted the role of explicit strategy-based mechanisms for locomotor learning. In the current manuscript, Rossi et al. provide convincing evidence for a previously unknown rapid updating mechanism for locomotor adaptation. Unlike the now well-studied explicit strategies employed during reaching movements, the authors demonstrate that this stimulus-response mapping process is largely unconscious. The authors show that in approximately half of subjects, the mapping process appears to be memory-based while the remainder of subjects appear to perform structural learning of the task design. The participants that learned using a structural approach had the capability to rapidly generalize to previously unexplored regions of the perturbation space.

      One result that will likely be particularly important to the field of motor learning is the authors' quite convincing correlation between the magnitude of proprioceptive recalibration and the magnitude of error-based updating. This result beautifully parallels results in other motor learning tasks and appears to provide a robust marker for the magnitude of the mapping process (by subtracting the contribution of error-based motor learning). This is a fascinating result with implications for the motor learning field well beyond the current study.

      A major strength of this manuscript is the large sample size across experiments and the extent of replication performed by the authors in multiple control experiments.

      Finally, I commend the authors on extending their original observations via Experiment 2. While it seems that participants use a range of mapping mechanisms (or indeed a combination of multiple mapping mechanisms), future experiments may be able to tease apart why some subjects use memory versus structural mapping. A future ability to push subjects to learn structurally-based mapping rules has the potential to inform rehabilitation strategies.

      Overall, the manuscript is well written, the results are clear, and the data and analyses are convincing. The manuscript's weaknesses are minor, mostly related to the presentation of the results and modeling.

      Weaknesses:

      The overall weaknesses in the manuscript are minor and can likely be addressed with textual changes.

      (1) A key aspect of the experimental design is the speed of the "ramp down" following the adaptation period. If the ramp-down is too slow, then no after-effects would be expected even under the alternative recalibration-only/error-based-only hypothesis. How did the authors determine the appropriate rate of ramp-down? Do alternative choices of ramp-down rates result in step length asymmetry measures that are consistent with the mapping hypothesis?

      We thank the reviewer for their insightful comment regarding the rate of the Ramp Down following the adaptation period and its potential impact on aftereffects under different hypotheses. We added a detailed explanation for how we determined the Ramp Down design, including analyses of previous work, to the SI Appendix, “Ramp Down design”, lines 22-98. We also describe the primary points in the main Methods section, “Ramp Tasks”, lines 978-991:

      As described in SI Appendix, “Ramp Down design”, the Ramp Down task was specifically designed to measure the pattern of aftereffects in a way that ensured reliable and robust measurements with sufficient resolution across speeds, and that minimized washout to prevent confounding the results. To balance time constraints with a measurement resolution adequate for capturing perceptual realignment, we used 0.05 m/s speed decrements, matching the perceptual sensitivity estimated from our re-analysis of the baseline data from Leech et al. (Leech et al., 2018a). To obtain robust motor aftereffect measurements, we collected three strides at each speed condition, as averaging over three strides represents the minimum standard for consistent and reliable aftereffect estimates in split-belt adaptation (typically used in catch trials) (Leech et al., 2018a; Rossi et al., 2019; Vazquez et al., 2015). To minimize unwanted washout by forgetting and/or unlearning, we did not pause the treadmill between adaptation and the post-adaptation ramp tasks, and ensured the Ramp Down was relatively quick, lasting approximately 80 seconds on average. Of note, the Ramp Down design ensures that even in cases of partial forgetting, the emergence pattern of aftereffects remains consistent with the underlying hypotheses.
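
The Ramp Down parameters stated above (0.05 m/s decrements, three strides per speed condition, ending at equal belt speeds) determine the stride schedule. Here is a minimal sketch, assuming a hypothetical starting belt-speed difference passed as `start_diff`. The function name is illustrative.

```python
def ramp_down_schedule(start_diff, step=0.05, strides_per_speed=3):
    """Belt-speed difference (m/s) for each stride of a hypothetical Ramp Down.

    start_diff: speed difference at the end of adaptation (assumed input)
    step: decrement between speed conditions (0.05 m/s in the text)
    strides_per_speed: strides collected per condition (three in the text)
    """
    n_levels = round(start_diff / step)
    schedule = []
    for i in range(n_levels + 1):  # includes the final equal-speeds condition
        diff = round(start_diff - i * step, 10)  # rounding avoids float drift
        schedule.extend([diff] * strides_per_speed)
    return schedule


schedule = ramp_down_schedule(1.0)
print(len(schedule))  # 21 speed conditions x 3 strides
```

For a hypothetical 1 m/s starting difference this gives 21 speed conditions and 63 strides. The stride count alone does not fix the ~80 s duration, which also depends on cadence.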

      In the SI Appendix, we explain that, while we did not test longer ramp-down durations directly, previous data suggest that durations of up to at least 4.5 minutes would yield step length asymmetry measures consistent with our results and the mapping hypothesis. Additionally, our control experiments replicated the behavior observed in the Ramp Down using speed match tasks lasting only 30 seconds, further supporting the robustness of our findings across varying durations.

      (2) Overall, the modeling as presented in Figure 3 (Equations 1-3) is a bit convoluted. To my mind, it would be far more useful if the authors reworked Equations 1-3 and Figure 3 (with potential changes to Figure 2) so that the motor output (u) is related to the stride rather than the magnitude of the perturbation. There should be an equation relating the forward model recalibration (i.e., Equation 1) to the fraction of the motor error on a given stride, something akin to u(k+1) = r * (u(k) - p(k)). This formulation is easier to understand and commonplace in other motor learning tasks (and likely what the authors actually fit, given the Smith & Shadmehr citation and the derivations in the Supplemental Materials). Such a change would require that Figure 3's independent axes be changed to "stride," but this has the benefit of complementing the presentation that is already in Figure 5.

      We reworked these equations (now numbered 4-6, lines 207-209) so that the motor output u is related to stride k as suggested by the reviewer:

      We changed Figure 2 and Figure 3 accordingly, adding a “stride” x-axis to the Ramp Down data figure.
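
A common single-state, stride-indexed update corrects a fraction of the motor error on each stride while retaining a fraction of the previous output. The sketch below uses that generic form with an assumed retention `A` and error sensitivity `B`. It is not a reproduction of the reworked Eqs. 4-6, but it shows why an abrupt return to zero perturbation leaves an aftereffect.

```python
def single_state(perturbation, A=1.0, B=0.1, u0=0.0):
    """Stride-by-stride motor output u(k) under a generic single-state model.

    u(k+1) = A*u(k) + B*(p(k) - u(k))
    A: retention factor (assumed); B: fraction of the error corrected (assumed)
    """
    u = u0
    outputs = []
    for p in perturbation:
        outputs.append(u)
        u = A * u + B * (p - u)  # retain, then correct part of this stride's error
    return outputs


# 50 adaptation strides at p = 1, then one stride at p = 0
u = single_state([1.0] * 50 + [0.0], B=0.2)
aftereffect = u[-1]  # perturbation is now 0, so the adapted output persists
```

Because the state changes only by a fraction `B` of the error per stride, the output on the first zero-perturbation stride still reflects the adapted calibration.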

      Reviewer #2 (Recommendations for the authors):

      I think that some changes to the text/ordering could improve the manuscript's readability. In particular:

      (1) My feeling is that many of the equations presented in the Methods section should be moved to the Results section, particularly Equations 9-11. The introduction of these motor measures should likely precede Figure 1, as their definitions form the crux of Figure 1 and the subsequent analyses.

      (2) It is unclear to me why many of the analyses and discussion points have been relegated to Supplemental Material. I would significantly revise the manuscript to move much of the content from Supplemental Material to the Methods and Discussion (where appropriate). Even the Todorov and Herzfeld models can likely simply be referenced in the text without a need for their full description in the Supplemental material - as their implementations appear to this reviewer as consistent with those presented in the respective papers. Beyond the Supplementary Tables, my feeling is that nearly all of the content in Supplemental can either be simply cited (e.g. alternative model implementations) or directly incorporated into the main manuscript without compromising the readability of the manuscript.

      We reorganized the manuscript and SI Appendix substantially, moving content to the Results and other main text sections. The changes included those recommended by the reviewer:

      • We moved the equations describing step length asymmetry, perturbation, and Δ motor output (originally numbered Eq. 9-11) to the Results section (Experiment 1, “Motor paradigm and hypothesis”, lines 131-133, now numbered Eq. 1-3).

      • We moved Supplementary Methods to the main Methods section.

      • We moved the most relevant content of the Supplementary Discussion to the main Discussion, and removed the less relevant content altogether.

      • We moved the methods describing walking-adaptation specific implementation of the Todorov and Herzfeld models to the main Methods section and removed the portions that were identical to the original implementation.

      • We moved the control experiments to the main text (main Results and Methods sections).

      • We removed the SI Appendix section “Experiment 1 mechanisms characteristics”.

      Reviewer #3 (Public review):

      Summary:

      In this work, Rossi et al. use a novel split-belt treadmill learning task to reveal distinct sub-components of gait adaptation. The task involved following a standard adaptation phase with a "ramp-down" phase that helped them dissociate implicit recalibration and more deliberate SR map learning. Combined with modeling and re-analysis of previous studies, the authors show multiple lines of evidence that both processes run simultaneously, with implicit learning saturating based on intrinsic learning constraints and SR learning showing sensitivity to a "perceptual" error. These results offer a parallel with work in reaching adaptation showing both explicit and implicit processes contributing to behavior; however, in the case of gait adaptation the deliberate learning component does not appear to be strategic but is instead a more implicit SR learning process.

      Strengths:

      (1) The task design is very clever and the "ramp down" phase offers a novel way to attempt to dissociate competing models of multiple processes in gait adaptation.

      (2) The analyses are thorough, as is the re-analysis of multiple previous data sets.

      (3) The querying of perception of the different relative belt speeds is a very nice addition, allowing the authors to connect different learning components with error perception.

      (4) The conceptual framework is compelling, highlighting parallels with work in reaching but also emphasizing differences, especially w/r/t SR learning versus strategic behaviors. Thus the discovery of an SR learning process in gait adaptation would be both novel and also help conjoin different siloed subfields of motor learning research.

      Weaknesses:

      (1) The behavior in the ramp-down phase does indeed appear to support multiple learning processes. However, I may have missed something, but I have a fundamental worry about the specific modeling and framing of the "SR" learning process. If I correctly understand, the SR process learns by adjusting to perceived L/R belt speed differences (Figure 7). What is bugging me is why that process would not cause the SR system to still learn something in the later parts of the ramp-down phase when the perceived speed differences flip (Figure 4). I do believe this "blunted learning" is what the SR component is actually modeled with, given this quote in the caption to Figure 7: "When the perturbation is perceived to be opposite than adaptation, even if it is not, mapping is zero and the Δ motor output is constant, reflecting recalibration adjustments only." It seems a priori odd and perhaps a little arbitrary to me that an SR learning system would just stop working (go to zero) just because the perception flipped sign. Or for that matter "generalize" to a ramp-up (i.e., just learn a new SR mapping just like the system did at the beginning of the first perturbation). What am I missing that justifies this key assumption? Or is the model doing something else? (If so, that should be more clearly described.)

      We concur that this point was confusing, and we performed additional analyses and revised the text to improve clarity. Specifically, we clarify that the stimulus-response mapping does indeed still learn in the second portion of the Ramp Down, when the perceived speed differences flip. However, learning by the mapping mechanism proceeds slowly, at a rate comparable to that of forward model recalibration, taking several minutes. The task is relatively short, so learning by the mapping mechanism is limited, and we approximate this learning as zero. We have now included an additional modelling analysis (as part of our expanded perceptual modelling analyses), which shows no significant improvement in modelling performance when accounting for forgetting of recalibration or for learning in the opposite direction by mapping in the second half of the Ramp Down, supporting this approximation. We explain this and other revisions in detail below.

      We include a Discussion section “Stimulus-response mapping is flexible but requires learning” where we improve our explanation of the operation of the mapping mechanism in the Ramp Down by leveraging the framework proposed by Iturralde and Torres-Oviedo, 2019. The section first explains that mapping operates relative to a new equilibrium corresponding to the current forward model calibration (lines 595-603):

      “The mapping mechanism observed in our study aligns with the corrective responses described by Iturralde and Torres-Oviedo, which operate relative to a recalibrated "new normal" rather than relying solely on environmental cues (Iturralde and Torres-Oviedo, 2019). Accordingly, our findings suggest a tandem architecture: forward model recalibration adjusts the nervous system's "normal state," while stimulus-response mapping computes motor adjustments relative to this "new normal." This architecture explains the sharp transition from flexible to rigid motor adjustments observed in our Ramp Down task. The transition occurs at the configuration perceived as "equal speeds" (~0.5 m/s speed difference) because this corresponds to the recalibrated “new normal”.”

      The following paragraph (lines 604-611) explains how this concept applies to the first half of the Ramp Down:

      “In the first half of the Ramp Down, participants adequately modulated their walking pattern to accommodate the gradually diminishing perturbation, achieving symmetric step lengths. Due to the recalibrated “new normal”, perturbations within this range are perceived as congruent with the direction of adaptation but reduced in magnitude. This allows the mapping mechanism to flexibly modulate the walking pattern by using motor adjustments previously learned during adaptation. Importantly, the rapid duration of the Ramp Down task rules out the possibility that the observed modulation may instead reflect washout, as confirmed by the fact the aftereffects measured post-Ramp-Down were comparable to previous work (Kambic et al., 2023; Reisman et al., 2005).”

      The last paragraph (lines 612–622) explains the second half of the Ramp Down in light of the equilibrium concept and of the slow learning rate of mapping:

      “In the second half of the Ramp Down, aftereffects emerged as participants failed to accommodate perturbations smaller than the recalibrated “new normal”. These perturbations were perceived as opposite to the adaptation perturbation and, therefore, novel. Accordingly, the mapping mechanism responded as it would to a newly introduced perturbation, rather than leveraging previously learned adjustments (Iturralde and Torres-Oviedo, 2019). Due to the rapid nature of the Ramp Down, the mapping mechanism lacked sufficient time to learn the novel motor adjustments required for these perturbations – a process that typically takes several minutes, as shown by our baseline ramp tasks and control experiments. As mapping-related learning was negligible, the rigid recalibration adjustments dominated during this phase. Consequently, the walking pattern did not change to accommodate the gradually diminishing perturbation, leading to the emergence of aftereffects.”
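
As a simplified approximation, the account above reduces to a piecewise rule. While the perturbation exceeds the recalibrated "new normal", mapping makes up the difference. Below it, mapping contributes nothing, because its new learning is negligible over the short Ramp Down. Here is a sketch under those assumptions, with perturbation and output in the same (hypothetical) units and a hypothetical `new_normal` parameter.

```python
def ramp_down_output(perturbations, new_normal):
    """Δ motor output per speed condition under a simplified recalibration + mapping rule.

    new_normal: perturbation level matched by recalibration alone (assumed known)
    """
    outputs = []
    for p in perturbations:
        # mapping is zero once p is perceived as opposite to the adaptation direction
        mapping = max(p - new_normal, 0.0)
        outputs.append(new_normal + mapping)
    return outputs


ramp = [1.0, 0.75, 0.5, 0.25, 0.0]
out = ramp_down_output(ramp, new_normal=0.5)
aftereffects = [o - p for o, p in zip(out, ramp)]
```

The output tracks the perturbation (no aftereffect) until it reaches `new_normal`, then stays constant, reproducing the matching-then-divergent pattern.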

      We also revised the Discussion section “Mapping operates as memory-based in some people, structure-based in others” to clarify the processes of interpolation and extrapolation (lines 689-700). Considered together with the explanation that mapping operates relative to the new recalibrated equilibrium, this revision helps explain why mapping may generalize to a ramp-up faster than it learns a perturbation perceived in the opposite direction. In the former case (generalizing to a ramp-up), a structure-based mapping can use the extrapolation computation: it leverages previous knowledge of which gait parameters should be modified and how (e.g., positioning the right foot further forward on the treadmill), but must extrapolate the specific parameter values (e.g., how much further forward). In the latter case (learning a perturbation perceived in the opposite direction), even a structure-based mapping would need to work out which gait parameters to change completely anew: positioning the foot in the opposite way, less far forward, requires a different set of control policies.

      We mentioned above that this illustration of the mapping mechanism relies on the assumption that the additional learning of the mapping mechanism in the second half of the Ramp Down is negligible. As part of our revisions for the “Modelling analysis for perceptual realignment”, we developed a new model, the perceptuomotor recalibration + mapping model (PM-ReMap), that extends the recalibration + mapping model by accounting for the possibility that Δ motor output is not constant in the second half of the Ramp Down (main points are at lines 355-275, and Figure 5; see response to Reviewer #1 (Public review), Comment 4, for a detailed explanation). We find that performance of the PM-ReMap model does not exceed that of the simpler recalibration + mapping model, suggesting that the Δ motor output does not change substantially in the second half of the Ramp Down. Note that, if the Δ motor output decayed in this phase, it could be due to forgetting or unlearning of the recalibration mechanism, or to the mapping mechanism learning in the opposite direction from adaptation. In the Results section, we focused on describing recalibration forgetting/unlearning for simplicity. However, in the Discussion section “Mapping may underlie savings upon re-exposure to the same or different perturbation”, we explain in detail how the motor aftereffects also depend on the mapping mechanism learning in the opposite direction, as corroborated by our Control experiments and previous work. Therefore, the finding that the PM-ReMap model does not outperform the simpler recalibration + mapping model suggests that neither effect (recalibration forgetting/unlearning or opposite-direction learning by mapping) is significant, nor is their combined effect on the Δ motor output.

      (2) A more minor point, but given the sample size it is hard to be convinced about the individual difference analysis for structure learning (Figure 5). How clear is it that these two groups of subjects are fully separable and not on a continuum? The lack of clusters in another data set seems like a somewhat less than convincing control here.

      We performed an additional analysis – a silhouette analysis – to confirm the presence of these clusters in our data (Methods, lines 1070-1072). The results, reported in Experiment 2 Results, lines 487-490, confirmed that there is strong evidence for the presence of these clusters:

      “A silhouette analysis confirmed strong evidence for these clusters: the average silhouette score was 0.90, with 19 of 20 participants scoring above 0.7 – considered strong evidence – and one scoring between 0.5 and 0.7 – considered reasonable evidence (Dalmaijer et al., 2022; Kaufman and Rousseeuw, 1990; Rousseeuw, 1987).”
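
The silhouette score compares, for each participant, the mean distance to members of its own cluster (a) with the mean distance to the nearest other cluster (b), as s = (b - a) / max(a, b). Below is a minimal pure-Python sketch. The feature vectors in the example are hypothetical and are not the Experiment 2 data.

```python
import math
from statistics import mean


def silhouette_scores(points, labels):
    """Per-point silhouette s = (b - a) / max(a, b) with Euclidean distance.

    points: sequence of coordinate tuples; labels: cluster label per point.
    Points in singleton clusters get s = 0 by convention.
    """
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [math.dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = mean(same)  # cohesion: mean distance within own cluster
        b = min(  # separation: mean distance to the nearest other cluster
            mean(math.dist(p, q) for q, l in zip(points, labels) if l == other)
            for other in set(labels) if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


pts = [(0.0,), (0.1,), (10.0,), (10.1,)]
scores = silhouette_scores(pts, [0, 0, 1, 1])
strong = sum(s > 0.7 for s in scores)  # count above the "strong evidence" threshold
```

Values near 1 mean a point sits far closer to its own cluster than to any other. The 0.7 and 0.5 cut-offs match the thresholds cited in the quoted text.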

      Reviewer #3 (Recommendations for the authors):

      (1) I think there is far too much content pushed into the supplement. The other models and full model comparison should be in the main text, as should the re-analysis of previous data sets. Also, key discussion points should not be in the supplement either.

      We reorganized the manuscript and SI Appendix substantially, including the changes recommended by the reviewer. Please refer to our response to “Reviewer #2 - Recommendations for the authors” for a detailed explanation.

      (2) Line 649: in reaching the calibration system does respond to different error sizes; why not here?

      We apologize for the confusion. As in reaching adaptation, recalibration in walking adaptation also scales with the error size experienced during adaptation. What we meant to convey is that, once a calibration has been acquired in adaptation, the recalibration process is rigid in that it can only change gradually. If the perturbation jumps to a different value, the original calibration is used transiently until the system has had time to recalibrate. For example, if the perturbation jumps abruptly from the adaptation value to zero in post-adaptation, the adaptation calibration persists, resulting in aftereffects.

      We revised the manuscript to clarify these points. First, we explicitly report that forward model recalibration scales based on the error size experienced in adaptation:

      “We next compared Medium Descend and Small Abrupt (1m/s or 0.4m/s perturbation), and found that recalibration contributed significantly more for the smaller perturbation (larger compensation<sub>perceptual</sub> / compensation<sub>motor-total</sub> in Small Abrupt than Medium Descend, Fig. 8A middle and Table S6).” (Control experiments Results, lines 422-425)

      “the mapping described here shares some characteristics with explicit mechanisms, such as flexibility and modulation by error size” (Discussion, lines 630-631)

      Additionally, we leverage the framework proposed by Tsay et al., 2024, to improve our explanation of the characteristics of the different learning mechanisms. Please refer to our response to “Reviewer #1 (Public review)”, Comment (1).

      (3) It would be nice to see bar graphs showing model comparison results for each individual subject in the main text, and to see how many subjects are best fit by the SR+calibration model.

      We added the recommended bar graphs to Figures 3 and 5.

      (4) Why exactly does the "perturbation" in Figure 3 have error bars?

      In walking adaptation, the perturbation that participants experienced is closely dictated by the treadmill belt speeds, but not exactly, because participants are free to move their feet as they like, so that their ankle movement may not always match the treadmill belts exactly. Therefore, we record the perturbation that is actually experienced by each participant’s feet using markers. We then display the mean and standard error of this perturbation.
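
Each participant's experienced perturbation is measured from markers rather than taken from the belt settings, so the plotted value is a group mean with an error bar. Here is a sketch of that summary, assuming one perturbation value per participant and the standard error of the mean (sample SD divided by √n). The values are made up.

```python
import math
from statistics import mean, stdev


def mean_and_sem(values):
    """Group mean and standard error of the mean (sample SD / sqrt(n))."""
    m = mean(values)
    sem = stdev(values) / math.sqrt(len(values))
    return m, sem


# Hypothetical per-participant perturbation values (arbitrary units)
m, sem = mean_and_sem([1.0, 2.0, 3.0])
```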

      We moved the equation describing the perturbation measure from the Methods to the Experiment 1 Results (lines 131-133, Eq. 1-3). We believe this change will help the reader understand the measures depicted.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable work provides a near-complete description of the mechanosensory bristles on the Drosophila melanogaster head and the anatomy and projection patterns of the bristle mechanosensory neurons that innervate them. The data presented are solid. The study has generated numerous invaluable resources for the community that will be of interest to neuroscientists in the field of circuits and behaviour, particularly those interested in mechanosensation and behavioural sequence generation.

      We express our gratitude to the Reviewers for their valuable suggestions, which significantly enhanced the manuscript. We undertook the revisions not in the expectation of acceptance, but because we sincerely believe they will enhance the manuscript's impact for future readers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Sensory neurons of the mechanosensory bristles on the head of the fly project to the subesophageal zone (SEZ). In this manuscript, the authors have built on a large body of previous work to comprehensively classify and quantify the head bristles. They broadly identify the nerves that various bristles use to project to the SEZ and describe their region-specific innervation in the SEZ. They use dye-fills, clonal labelling, and electron microscopic reconstructions to describe in detail the phenomenon of somatotopy (conserved peripheral representations within the central brain) within the innervation of these neurons. In the process they develop novel tools to access subsets of these neurons. They use these to demonstrate that groups of bristles in different parts of the head control different aspects of the grooming sequence.

      Reviewer #2 (Public Review):

      The authors combine genetic tools, dye fills and connectome analysis techniques to generate a "first-of-its-kind", near complete, synaptic resolution map of the head bristle neurons of Drosophila. While some of the BMN anatomy was already known based on previous work by the authors and other researchers, this is the first time a near complete map has been created for the head BMNs at electron microscopy resolution.

      Strengths:

      (1) The authors cleverly use techniques that allow moving back and forth between periphery (head bristle location) and brain, as well as moving between light microscopy and electron microscopy data. This allows them to first characterize the pathways taken by different head BMNs to project to the brain and also characterize anatomical differences among individual neurons at the level of morphology and connectivity.

      (2) The work is very comprehensive and results in a near complete map of all head BMNs.

      (3) Authors also complement this anatomical characterization with a first-level functional analysis using optogenetic activation of BMNs that results in expected directed grooming behavior.

      Weaknesses:

      (1) The clustering analysis is compelling but cluster numbers seem to be arbitrarily chosen instead of by using some informed metrics.

      We made revisions to the manuscript that address this concern. Please see our response to “recommendations for authors” for a description of these revisions.

      (2) It could help provide context if authors revealed some of the important downstream pathways that could explain optogenetics behavioral phenotypes and previously shown hierarchical organization of grooming sequences.

      We made revisions to the manuscript that address this recommendation. Please see our response to “recommendations for authors” for a description of these revisions.

      (3) In contrast to the rigorous quantitative analysis of the anatomical data, the behavioral data is analyzed using much more subjective methods. While I do not think it is necessary to perform a rigorous analysis of behaviors in this anatomy focused manuscript, the conclusions based on behavioral analysis should be treated as speculative in the current form e.g. calling "nodding + backward walking" as an avoidance response is not justified as it currently stands. Strong optogenetic activation could lead to sudden postural changes that due to purely biomechanical constraints could lead to a couple of backward steps as seen in the example videos. Moreover since the quantification is manual, it is not clear what the analyst interprets as backward walking or nodding. Interpretation is also concerning because controls show backward walking (although in fewer instances based on subjective quantification).

      While unbiased machine vision-based methods would nicely complement the present work, this type of analysis cannot yet distinguish between different head grooming movements. Therefore, we are currently limited to manual annotation for our behavioral analysis. That said, we do not believe that our manual annotation is subjective. The grooming movements that we examine in this work are distinguishable from each other through frame-by-frame manual annotation of video at 30 fps. Our annotation of the grooming and backward motions performed by flies is based on previous publications that established a controlled vocabulary defining each movement (Hampel et al., 2020a, 2017, 2015; Seeds et al., 2014). In this work, we added head nodding to this controlled vocabulary, as described in the Materials and methods. We have added text to the third paragraph of the Materials and methods section entitled “Behavioral analysis procedures” that we hope better describes our behavioral analysis. This description now reads:

      Head nodding was annotated when the fly tilted its head downward by any amount until it returned its head back in its original position. This movement often occurred in repeated cycles. Therefore, the “start” was scored at the onset of the first forward movement and the “stop” when the head returned to its original position on the last nod.

      We do not make any firm conclusions about the head movements (nodding) and backwards motions. We refer to nodding as a descriptive term that would allow the reader to better understand what the behavior looks like. We make no firm conclusions about any behavioral functional role that either the nodding or the backward motions might have, with the exception of nodding in the context of grooming. We only suggest that the behaviors appear to be avoidance responses. Furthermore, backward walking was not mentioned. Instead we refer to backward motions. We are only reporting our annotations of these movements that do occur, and are significantly different from controls. We speculate that these could be avoidance responses based on support from the literature. Future studies will be required to understand whether these movements serve real behavioral roles.

      Summary:

      The authors end up generating a near-complete map of head BMNs that will serve as a long-standing resource to the Drosophila research community. This will directly shape future experiments aimed at modeling or functionally analyzing the head grooming circuit to understand how somatotopy guides behaviors.

      Reviewer #3 (Public Review):

      Eichler et al. set out to map the locations of the mechanosensory bristles on the fly head, examine the axonal morphology of the bristle mechanosensory neurons (BMNs) that innervate them, and match these to electron microscopy reconstructions of the same BMNs in a previously published EM volume of the female adult fly brain. They used BMN synaptic connectivity information to create clusters of BMNs that they show occupy different regions of the subesophageal zone brain region and use optogenetic activation of subsets of BMNs to support the claim that the morphological projections and connectivity of defined groups of BMNs are consistent with the parallel model for behavioral sequence generation.

      The authors have beautifully cataloged the mechanosensory bristles and the projection paths and patterns of the corresponding BMN axons in the brain using detailed and painstaking methods. The result is a neuroanatomy resource that will be an important community resource. To match BMNs reconstructed in an electron microscopy volume of the adult fly brain, the authors matched clustered reconstructed BMNs with light-level BMN classes using a variety of methods, but evidence for matching is only summarized and not demonstrated in a way that allows the reader to evaluate the strength of the evidence. The authors then switch from morphology-based categorization to non-BMN connectivity as a clustering method, which they claim demonstrates that BMNs form a somatotopic map in the brain. This map is not easily appreciated, and although contralateral projections in some populations are clear, the distinct projection zones that are mentioned by the authors are not readily apparent. Because of the extensive morphological overlap between connectivity-based clusters, it is not clear that small differences at the projection level are what determine the postsynaptic connectivity of a given BMN cluster or its functional role during behavior. The claim that the somatotopic organization of BMN projections is preserved among their postsynaptic partners to form parallel sensory pathways is not supported by the result that different connectivity clusters still have high cosine similarity in a number of cases (i.e. Clusters 1 and 3, or Clusters 1 and 2). Finally, the authors use tools that were generated during the light-level characterization of BMN projections to show that specifically activating BMNs that innervate different areas of the head triggers different grooming behaviors. In one case, activation of a single population of sensory bristles (InOm) triggers two different behaviors, both eye and dorsal head grooming. This result does not seem consistent with the parallel model, which suggests that these behaviors should be mutually exclusive and rely on parallel downstream circuitry.

      We made revisions to the manuscript that address this recommendation. Please see our response to “recommendations for authors” for a description of these revisions.

      This work will have a positive impact on the field by contributing a complete accounting of the mechanosensory bristles of the fruit fly head, describing the brain projection patterns of the BMNs that innervate them, and linking them to BMN sensory projections in an electron microscopy volume of the adult fly brain. It will also have a positive impact on the field by providing genetic tools to help functionally subdivide the contributions of different BMN populations to circuit computations and behavior. This contribution will pave the way for further mechanistic study of the central circuits that subserve grooming.

      Recommendations for the authors:

      All three reviewers appreciated the work presented in this manuscript. There were also a few overlapping concerns that were raised that are summarised below, should the authors wish to address them:

      Somatotopy: We recommend that the authors describe the extent of prior knowledge in more detail to highlight their contribution better.

      We made revisions that better highlight the extent of prior knowledge about somatotopy. We describe how previous studies showed bristle mechanosensory neurons in insects are somatotopically organized, but these studies were not comprehensive descriptions of complete somatotopic maps for the head or body. To our knowledge, our study provides the first comprehensive and synaptic resolution somatotopic map of a head for any animal. This sets the stage for the complete definition of the interface between somatotopically-organized mechanosensory neurons and postsynaptic circuits, which has broad implications for future studies on aimed grooming, and mechanosensation in general. Below we itemize revisions to the Introduction, Discussion, and Figures to provide a clearer statement of the significance of our study as it relates to somatotopy.

      (1) Newly added Figure 1 – figure supplement 1 more explicitly grounds the study in somatotopy, providing a working model of the organization of the circuit pathways that produce the grooming sequence. This model features somatotopy as shown in Figure 1 – figure supplement 1C.

      (2) Figure 1 – figure supplement 1 is incorporated into the Introduction in the second, third, and fourth paragraphs, the first paragraph of the Results section titled “Somatotopically-organized parallel BMN pathways”, and the second and third paragraphs of the last Discussion section titled “Parallel circuit architecture underlying the grooming sequence”.

      (3) We added text to the end of the fourth paragraph of the Introduction that now reads: “In this model, parallel-projecting mechanosensory neurons that respond to stimuli at specific locations on the head or body could connect with somatotopically-organized parallel circuits that elicit grooming of those locations (Figure 1 – figure supplement 1A-C). The previous discovery of a mechanosensory-connected circuit that elicits aimed grooming of the antennae provides evidence of this organization (Hampel 2015). However, the extent to which distinct circuits elicit grooming of other locations is unknown, in part, because the somatotopic projections of the mechanosensory neurons have not been comprehensively defined for the head or body.”

      (4) There is a Discussion section that further explains the extent of prior knowledge and our contributions on somatotopy that is titled “A synaptic resolution somatotopic map of the head BMNs”. Additionally, the previous version of this section had a paragraph on the broader implications of our work as it relates to somatotopy across species. In light of the reviewer comments, we decided to make this paragraph into its own Discussion section to better highlight the broader significance of our work. This section is titled “First synaptic resolution somatotopic map of the head”.

      The somatotopy isn't overtly obvious - perhaps they could try mapping presynaptic sites and provide landmarks to improve visualisation.

      We made the following revisions to better highlight the head BMN somatotopy. One point of confusion from the previous manuscript version stemmed from us not explicitly defining the somatotopic organization that we observed. There seemed to be confusion that we were defining the head somatotopy based only on the small projection differences among BMNs from neighboring head locations. While we believe that these small differences indeed correspond to somatotopy, we failed to highlight that there are overt differences in the brain projections of BMNs from distant locations on the head. For example, Figure 5B (right panel) shows the distinct projections between the LabNv (brown) and AntNv (blue) BMNs that innervate bristles on the ventral and dorsal head, respectively. Thus, BMN types innervating neighboring bristles show overlapping projections with small projection differences, whereas those innervating distant bristles show non-overlapping projections into distinct zones.

      Our analysis of postsynaptic connectivity similarity also shows somatotopic organization among the BMN postsynaptic partners, as BMN types innervating the same or neighboring bristle populations show high connectivity similarity (Figure 8, old Figure 7). Below we highlight major revisions to the text and Figures that hopefully better reveal the head somatotopy.

      (1) In the last paragraph of the Introduction we added text that explicitly frames the experiments in terms of somatotopic organization: “This reveals somatotopic organization, where BMNs innervating neighboring bristles project to the same zones in the CNS while those innervating distant bristles project to distinct zones. Analysis of the BMN postsynaptic connectome reveals that neighboring BMNs show higher connectivity similarity than distant BMNs, providing evidence of somatotopically organized postsynaptic circuit pathways.”

      (2) We mention an example of overt somatotopy from Figure 5 in the Results section titled “EM-based reconstruction of the head BMN projections in a full adult brain”. The text reads “For example, BMNs from the Eye- and LabNv have distinct ventral and anterior projections, respectively. This shows how the BMNs are somatotopically organized, as their distinct projections correspond to different bristle locations on the head (Figure 5B,C).”

      (3) In new Figure 8 (part of old Figure 7), we modified panels that correspond to the cosine similarity analysis of postsynaptic connectivity. The major revision was to plot the cosine similarity clusters onto the head bristles so that the bristles are now colored based on their clusters (C). This shows how neighboring BMNs cluster together, and therefore show similar postsynaptic connectivity. We believe that this provides a nice visualization of somatotopic organization in BMN postsynaptic connectivity. We also added the clustering dendrogram as recommended by Reviewer #2 (Figure 8A).

      (4) In new Figure 8, we added new panels (D-F) that summarize our anatomical and connectomic analysis showing different somatotopic features of the head BMNs. Different BMN types innervate bristles at neighboring and distant proximities (D). BMNs that innervate neighboring bristles project into overlapping zones (E, example of reconstructed BM-Fr and -Ant neurons with non-overlapping BM-MaPa neurons) and show postsynaptic connectivity similarity (F, example connectivity map of three BMN types based on cosine similarity data).

      (5) To accompany the new Figure 8D-F panels, we added a paragraph to summarize the different somatotopic features of the head BMNs that were identified based on our anatomical and connectomic analysis. This is the last paragraph in the Results section titled “Somatotopically-organized parallel BMN pathways”:

      Our results reveal head bristle proximity-based organization among the BMN projections and their postsynaptic partners to form parallel mechanosensory pathways. BMNs innervating neighboring bristles project into overlapping zones in the SEZ, whereas those innervating distant bristles project to distinct zones (example of BM-Fr, -Ant, and -MaPa neurons shown in Figure 8D,E). Cosine similarity analysis of BMN postsynaptic connectivity revealed that BMNs innervating the same bristle populations (same types) have the highest connectivity similarity. Figure 8F shows example parallel connections for BM-Fr, -Ant, and -MaPa neurons (vertical arrows), where the edge width indicates the number of synapses from each BMN type to their major postsynaptic partners. Additionally, BMNs innervating neighboring bristle populations showed postsynaptic connectivity similarity, while BMNs innervating distant bristles show little or none. For example, BM-Fr and -Ant neurons have connections to common postsynaptic partners, whereas BM-MaPa neurons show only weak connections with the main postsynaptic partners of BM-Fr or -Ant neurons (Figure 8F, connections under 5% of total BMN output omitted). These results suggest that BMN somatotopy could have different possible levels of head spatial resolution, from specific bristle populations (e.g. Ant bristles), to general head areas (e.g. dorsal head bristles).
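      The edge filtering used for the example connectivity map above (edge width reflecting synapse counts, with connections under 5% of a BMN type's total output omitted) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used for Figure 8F: the function name `filter_edges`, the partner labels, and all synapse counts are hypothetical.

```python
def filter_edges(synapses, min_fraction=0.05):
    """Keep partners that receive at least min_fraction of a BMN type's output.

    synapses maps a BMN type to {postsynaptic partner: synapse count}
    (hypothetical structure). Returns {BMN type: {partner: fraction of output}}.
    """
    kept = {}
    for bmn_type, partners in synapses.items():
        total = sum(partners.values())
        kept[bmn_type] = {
            partner: count / total
            for partner, count in partners.items()
            # Skip division when a type has no output at all.
            if total > 0 and count / total >= min_fraction
        }
    return kept


# Hypothetical counts: partner_A/B are shared by BM-Fr and BM-Ant,
# while BM-MaPa connects only weakly to partner_A.
example = {
    "BM-Fr": {"partner_A": 60, "partner_B": 36, "partner_D": 4},
    "BM-Ant": {"partner_A": 30, "partner_B": 67, "partner_D": 3},
    "BM-MaPa": {"partner_A": 3, "partner_D": 90, "partner_E": 7},
}
for bmn_type, edges in filter_edges(example).items():
    print(bmn_type, sorted(edges))
```

Raising or lowering `min_fraction` changes which weak edges are drawn; in this toy example, the weak BM-MaPa connection to a main BM-Fr/-Ant partner is the one that disappears at the 5% threshold.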

      We also refer to Figure 8D-F to illustrate the different somatotopic features in the Discussion. These references can be found in the following Discussion sections: “A synaptic resolution somatotopic map of the head BMNs” (fourth paragraph) and “Parallel circuit architecture underlying the grooming sequence” (second paragraph).

      (6) In addition to improving the Figures, we provide additional tools that enable readers to explore the BMN somatotopy in a more interactive way. That is, we provide 5 different FlyWire.ai links in the manuscript Results section that enable 3D visualization of the different reconstructed BMNs (e.g. FlyWire.ai link 1).

      Note: In working on old Figure 7 to address this Reviewer suggestion, we also reordered panels A-E. We believe that this was a more logical ordering than in the previous draft. These panels are now the only data shown in Figure 7, as the cosine similarity analysis is now in Figure 8. We hope that splitting these panels into two Figures will improve manuscript readability.

      Light EM Mapping: A better description of the methods by which this mapping was done would be helpful. Providing a few example parallel representations of the EM and light images in the main figure would help the reader better appreciate the strength of their approach.

      We have done as the Reviewers suggested and added panels to Figure 6 that show examples of the LM and EM image matching (Figure 6A,B). We added two examples that used different methods for labeling the LM imaged BMNs, including MCFO labeling of an individual BM-InOc neuron and driver line labeling of a major portion of BM-InOm neurons using InOmBMN-LexA. These panels are referred to in the first paragraph of the Results section titled “Matching the reconstructed head BMNs with their bristles”. Note that examples for all LM/EM matched BMN types are shown in Figure 6 – figure supplement 2.

      We had provided Figure 6 – figure supplement 2 in the reviewed manuscript that shows all the above requested “parallel representations of the EM and light images”. However, the Reviewer critiques made us realize that the purpose of this figure supplement was not clearly indicated. Therefore, we have revised Figure 6 – figure supplement 2 and its legend to make its purpose clearer. First, we changed the legend title to better highlight its purpose. The legend is now titled: “Matching EM reconstructed BMN projections with light microscopy (LM) imaged BMNs that innervate specific bristles”. Second, we added label designations to the figure panel rows that highlight the LM and EM comparisons. That is, the rows for light microscopy images of BMNs are indicated with LM and the rows for EM reconstructed BMN images are labeled with EM. Reviewer #3 had indicated that it was not clear what labeling methods were used to visualize the LM imaged BM-InOm neurons in Figure 6 – figure supplement 2N. Therefore, we added text to the figure and the legend to better highlight the different methods used. Panels A and B were also cropped to accommodate the above mentioned revisions.

      The manuscript also provides an extensive Materials and methods section that describes the different lines of evidence that were used to assign the reconstructed BMNs as specific types. We changed the title to better highlight the purpose of this methods section to “Matching EM reconstructed BMN projections with light microscopy imaged BMNs that innervate specific bristles”. The evidence used to support the assignment of the different BMN types is also summarized in Figure 6 – figure supplement 3.

      Parallel circuit model: The authors motivate their study with this. We're recommending that they define expectations of such circuitry, its alternatives (including implications for downstream pathways), and behavior before they present their results. We're also recommending that they interpret their behavioural results in the context of these circuits.

      Our primary motivation for doing the experiments described in this manuscript was to help define the neural circuit architecture underlying the parallel model that drives the Drosophila grooming sequence. This manuscript provides a comprehensive assessment of the first layer of this circuit architecture. A byproduct of this work is a contribution that offers immediate utility and significance to the Drosophila connectomics community: namely, the description of the majority of mechanosensory neurons on the head, with their annotation in the recently released whole brain connectome dataset (FlyWire.ai). In writing this manuscript, we tried to balance both of these goals, which made it difficult to write. We very much appreciate the Reviewers' comments that have highlighted points of confusion in our original draft. We hope that the revised draft is now clearer and more logically presented. We have made revisions to the text and provided a new figure supplement (Figure 1 - figure supplement 1) and new panels in Figure 8. Below we highlight the major revisions.

      (1) The Introduction was revised to more explicitly ground the study in the parallel model, while also removing details that were not pertinent to the experiments presented in the manuscript.

      The first paragraph introduces different features of the parallel model. To better focus the reader on the parts of the model that were being assessed in the manuscript, we removed the following sentences: “Performance order is established by an activity gradient among parallel circuits where earlier actions have the highest activity and later actions have the lowest. A winner-take-all network selects the action with the highest activity and suppresses the others. The selected action is performed and then terminated to allow a new round of competition and selection of the next action.” Note that these sentences are included in the third and fourth paragraphs of the last Discussion section titled “Parallel circuit architecture underlying the grooming sequence”.

      The first paragraph of the Introduction now introduces a bigger picture view of the model that emphasizes the two main features: 1) a parallel circuit architecture that ensures all mutually exclusive actions to be performed in sequence are simultaneously readied and competing for output, and 2) hierarchical suppression among the parallel circuits, where earlier actions suppress later actions.

      (2) Newly added Figure 1 – figure supplement 1 provides a working model of grooming (Reviewer #1 suggestion). We now more strongly emphasize that the study aimed to define the parallel neural circuit architecture underlying the grooming sequence, focusing on the mechanosensory layer of this architecture. In particular, we refer to the new Figure 1 – figure supplement 1 that has been added to better convey the hypothesized grooming neural circuit architecture. Figure 1 – figure supplement 1 is incorporated into the Introduction (paragraphs two, three, and four), the Results section titled “Somatotopically-organized parallel BMN pathways” (first paragraph), and the last Discussion section titled “Parallel circuit architecture underlying the grooming sequence” (second and third paragraphs).

      (3) New panels in Figure 8 update the model of parallel circuit organization as it relates to somatotopy (D-F). These panels show the parallel circuits hypothesized by the model, but also indicate convergence, with different possible levels of head resolution for these circuits. We describe above where these panels are referenced in the text.

      (4) We added a new paragraph in the last Discussion section titled “Parallel circuit architecture underlying the grooming sequence” that better incorporates the results from this manuscript into the working model of grooming. This paragraph is shown below.

      Here we define the parallel architecture of BMN types that elicit the head grooming sequence that starts with the eyes and proceeds to other locations, such as the antennae and ventral head. The different BMN types are hypothesized to connect with parallel circuits that elicit grooming of specific locations (described above and shown in Figure 1 – figure supplement 1A,C). Indeed, we identify distinct projections and connectivity among BMNs innervating distant bristles on the head, providing evidence supporting this parallel architecture (Figure 8D-F). However, we also find partially overlapping projections and connectivity among BMNs innervating neighboring bristles. Further, optogenetic activation of BMNs at specific head locations elicits grooming of both those locations and neighboring locations (Figure 9). These findings raise questions about the resolution of the parallel architecture underlying grooming. Are BMN types connected with distinct postsynaptic circuits that elicit aimed grooming of their corresponding bristle populations (e.g. Ant bristles)? Or are neighboring BMN types that innervate bristles in particular head areas connected with circuits that elicit grooming of those areas (e.g. dorsal or ventral head)? Future studies of the BMN postsynaptic circuits will be required to define the resolution of the parallel pathways that elicit aimed grooming.

      Aside from this summary of major concerns, the detailed recommendations are attached below.

      Reviewer #1 (Recommendations For The Authors):

      I appreciate the quality and exhaustive body of work presented in this manuscript. I have a few comments that the authors may want to consider:

      (1) The authors motivate this study by posing that it would allow them to uncover whether the complex grooming behaviour of flies followed a parallel model of circuit function. It would have been nice to have been introduced to what the alternative model might be and what each would mean for organisation of the circuit architecture. Some guiding schematics would go a long way in illustrating this point. Modifying the discussion along these lines would also be helpful.

      We made several revisions to the manuscript that address this recommendation. Among these revisions, we added Figure 1 – figure supplement 1 that includes a working model for grooming. Please see above for a description of these revisions.

      (2) The authors mention the body of work that has mapped head bristles and described somatotopy. It would be useful to discuss in more detail what these studies have shown and highlight where the gaps are that their study fills.

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions.

      (3) The dye-fills and reconstructions that are single colour could use a boundary to demarcate the SEZ. This would help in orienting the reader.

      We agree with Reviewer #1 that Figure 4 and its supplements could use some indicator that would orient the reader with respect to the dye-filled or stochastically labeled neurons. The images are of the entire SEZ in the ventral brain, and in the case of some panels, the background staining enables visualization of the brain (e.g. Figure 4H,M,N). To help orient the reader in this region, we added a dotted line to indicate the approximate SEZ midline. This also enables the reader to more clearly see which of the BMN types cross the midline.

      Midline visual guides were added for Figure 4, Figure 4 – figure supplement 2, Figure 4 – figure supplement 3, Figure 4 – figure supplement 4, Figure 4 – figure supplement 5, Figure 4 – figure supplement 6, Figure 4 – figure supplement 7, Figure 4 – figure supplement 8, Figure 6 – figure supplement 2.

      (4) The comparison between the EM and the fills/clones is not obvious. Particularly because the matches are not directly determined, it would be nice to have the EM reconstruction alongside the dye-fills. This would work very nicely in the supplementary figure with the multiple fills of the same bristles. I think this would really drive home the point.

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions.

      (5) Are there unnoticed black error-bars floating around in many of the gray-scale images?

      The black bars were masking white scale bars in the images. We have removed the black bars and remade the images without scale bars. This was done for the following Figures: Figure 4, Figure 4 – figure supplement 2, Figure 4 – figure supplement 3, Figure 4 – figure supplement 4, Figure 4 – figure supplement 5, Figure 4 – figure supplement 6, Figure 4 – figure supplement 7, Figure 4 – figure supplement 8, Figure 6 – figure supplement 2.

      Reviewer #2 (Recommendations For The Authors):

      (1) The only point in the paper I found myself going back and forth between methods/supp and text was when authors discuss about the clustering. I think it would help the reader if a few sentences about cosine clustering used for connectivity based clustering were included in the main text. Also, for NBLAST hierarchical clustering, it would help if some informed metrics could be used for defining cluster numbers (e.g. Braun et al, 2010 PLOS ONE shows how Ward linkage cost could be used for hierarchical clustering).

      Depending on where the cut height is placed on the dendrogram for cosine similarity of BMNs, different features of the BMN type postsynaptic connectivity are captured. As the number of clusters is increased (lower cut height), clustering is mainly among BMNs of the same type, showing that these BMNs have the highest connectivity similarity. As the number of clusters is reduced (higher cut height), BMNs innervating neighboring bristles on the head are clustered, revealing three general clusters corresponding to the dorsal, ventral, and posterior head. This reveals somatotopy based clustering among same and neighboring BMN types. The cut height shown in Figure 8 and Figure 8 – figure supplement 2 was chosen because it highlighted both of these features.

      The NBLAST clustering shows similar results to the connectivity-based clustering with respect to neighboring and distant BMN types. As the number of clusters increases, BMNs of the same type are clustered, and these types can be further subdivided into morphologically distinct subtypes. As the number of clusters is reduced, the clustering captures neighboring BMNs. Thus, neighboring BMN types showed high morphological similarity (and proximity) with each other, and low similarity with distant BMN types.
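      How the cut height changes what a connectivity dendrogram captures can be illustrated with a small sketch using SciPy's hierarchical clustering. This is a toy example under stated assumptions, not the analysis behind Figure 8: the six connectivity vectors and the BMN labels are invented, and average linkage on cosine distance (1 - cosine similarity) is assumed because the linkage method is not specified here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical synapse counts from six BMNs (rows) onto five postsynaptic
# partners (columns). Fr and Ant share partner 2; MaPa uses distinct partners.
labels = ["Fr-1", "Fr-2", "Ant-1", "Ant-2", "MaPa-1", "MaPa-2"]
connectivity = np.array([
    [10, 8, 0, 0, 0],
    [9, 9, 0, 0, 0],
    [0, 8, 10, 0, 0],
    [0, 9, 9, 0, 0],
    [0, 0, 0, 10, 7],
    [0, 0, 0, 9, 8],
], dtype=float)

# Condensed matrix of cosine distances (1 - cosine similarity).
distances = pdist(connectivity, metric="cosine")
# Average linkage is an assumption; SciPy's Ward method expects Euclidean distances.
tree = linkage(distances, method="average")


def clusters_at(cut_height):
    """Flat cluster labels from cutting the dendrogram at cut_height."""
    return fcluster(tree, t=cut_height, criterion="distance")


low_cut = clusters_at(0.3)   # lower cut: more clusters, same-type BMNs group together
high_cut = clusters_at(0.8)  # higher cut: fewer clusters, neighboring types merge
print(dict(zip(labels, low_cut.tolist())))
print(dict(zip(labels, high_cut.tolist())))
```

In this toy data, the lower cut separates the three types, while the higher cut merges the neighboring Fr and Ant BMNs (which share a partner) and leaves the distant MaPa BMNs on their own, mirroring the same-type versus neighboring-type levels described above.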

      Please see our responses to a Reviewer #3 critique below for further description of the clustering results.

      On the same lines it would help if the clustering dendrograms were included in the main figure.

      We thank Reviewer #2 for this comment. We have added the dendrogram to Figure 8A, a change that we feel makes this Figure much easier to understand.

      (2) It could help provide intuition if the authors revealed some of the downstream targets and their implication in explaining the behavioral phenotypes.

      While this will be the subject of at least two forthcoming manuscripts, we have added text to the present manuscript that provides insight into BMN postsynaptic targets. Our previous work (Hampel et al. 2015) described a mechanosensory connected neural circuit that elicits grooming of the antennae. While this previous study demonstrated that the Johnston’s organ mechanosensory neurons are synaptically and functionally connected with this circuit, our preliminary analysis indicates that it is also connected with BM-Ant neurons. We hypothesize that there are additional such circuits that are responsible for eliciting grooming of other head locations.

      To better highlight potential downstream targets in the manuscript, we now mention the antennal circuit in the Introduction. This text reads: In this model, parallel-projecting mechanosensory neurons that respond to stimuli at specific locations on the head or body could connect with somatotopically-organized parallel circuits that elicit grooming of those locations (Figure 1 – figure supplement 1A-C). The previous discovery of a mechanosensory-connected circuit that elicits aimed grooming of the antennae provides evidence of this organization (Hampel 2015). However, the extent to which distinct circuits elicit grooming of other locations is unknown, in part, because the somatotopic projections of the mechanosensory neurons have not been comprehensively defined for the head or body.

      There is also text in the Discussion that addresses this Reviewer comment. It describes the antennal circuit and mentions the possibility that other similar circuits may exist. This can be found in the third paragraph of the section titled “Circuits that elicit aimed grooming of specific head locations”.

      (3) Authors find that opto activation of BMNs leads to grooming of targeted as well as neighboring areas. Is there any sequence observed here? i.e. first clean targeted area and then clean neighboring area? I wonder if the answer to this is something as simple as common post-synaptic targets which is essentially reducing the resolution of the BMN sensory map. Some more speculation on this interesting result could be helpful.

      We appreciate and agree with this point from Reviewer #2, and have tried to better emphasize the possible implications for grooming that the overlapping projections and connectivity among BMNs innervating neighboring bristles may have. This is now better addressed in the Results and Discussion sections. Below we highlight where this is addressed:

      (1) In the second paragraph of the Results section titled “Activation of subsets of head BMNs elicits aimed grooming of specific locations” we added text that suggests the possibility that grooming of the stimulated and neighboring locations could be due to the overlapping projections and connectivity. This text reads: This suggested that head BMNs elicit aimed grooming of their corresponding bristle locations, but also neighboring locations. This result is consistent with our anatomical and connectomic data indicating that BMNs innervating neighboring bristles show overlapping projections and postsynaptic connectivity similarity (see Discussion).

      (2) In the Discussion section titled “A synaptic resolution somatotopic map of the head BMNs”, we added text to the end of the fourth paragraph that points to further discussion of this topic. This text reads: This overlap may have implications for aimed grooming behavior. For example, neighboring BMNs could connect with common neural circuits to elicit grooming of overlapping locations (discussed more below).

      (3) The fourth paragraph of the Discussion section titled “Circuits that elicit aimed grooming of specific head locations” raises the possibility of mechanosensory convergence onto common postsynaptic circuits that promote grooming of the stimulated area along with neighboring areas. This paragraph is below.

      We find that activation of specific BMN types elicits both aimed grooming of their corresponding bristle locations and neighboring locations. This suggests overlap in the locations that are groomed with the activation of different BMN types. Such overlap provides a means of cleaning the area surrounding the stimulus location. Interestingly, our NBLAST and cosine similarity analysis indicates that neighboring BMNs project into overlapping zones in the SEZ and show common postsynaptic connectivity. Thus, we hypothesize that neighboring BMNs connect with common neural circuits (e.g. antennal grooming circuit) to elicit overlapping aimed grooming of common head locations.

      (4) In the new second paragraph of the Discussion section titled “Parallel circuit architecture underlying the grooming sequence”, we further discuss the issue of the BMN “sensory map.” This paragraph is below.

      Here we define the parallel architecture of BMN types that elicit the head grooming sequence that starts with the eyes and proceeds to other locations, such as the antennae and ventral head. The different BMN types are hypothesized to connect with parallel circuits that elicit grooming of specific locations (described above and shown in Figure 1 – figure supplement 1A,C). Indeed, we identify distinct projections and connectivity among BMNs innervating distant bristles on the head, providing evidence supporting this parallel architecture (Figure 8D-F). However, we also find partially overlapping projections and connectivity among BMNs innervating neighboring bristles. Further, optogenetic activation of BMNs at specific head locations elicits grooming of both those locations and neighboring locations (Figure 9). These findings raise questions about the resolution of the parallel architecture underlying grooming. Are BMN types connected with distinct postsynaptic circuits that elicit aimed grooming of their corresponding bristle populations (e.g. Ant bristles)? Or are neighboring BMN types that innervate bristles in particular head areas connected with circuits that elicit grooming of those areas (e.g. dorsal or ventral head)? Future studies of the BMN postsynaptic circuits will be required to define the resolution of the parallel pathways that elicit aimed grooming.

      (4) If the authors were to include a summary table that shows all known attributes of each BMN type as columns, it could be very useful as a resource to the community. Table columns could include attributes like "bristle name", "nerve tract", "FlyWire IDs of all segments corresponding to the bristle class", "split-Gal4 line or known enhancer", etc.

      We provided a table that includes much of this information after the manuscript had already gone out for review, and we regret that it was not available to the reviewers. This table is now provided as Supplementary file 3. It lists the following information for each reconstructed BMN: BMN name, bristle type, nerve, FlyWire ID, FlyWire coordinates, NBLAST cluster (cut height 1), NBLAST cluster (cut height 5), and cosine cluster (cut height 4.5). Note that the driver line enhancers for targeting specific BMN types are shown in Figure 3I.

      Specific Points:

      Figure 4C-V:

      • I find it a bit difficult to distinguish ipsi- from contra-lateral projections. Maybe indicate the midline as a thin, stippled line?

      We thank Reviewer #2 for this suggestion. We have now added lines to the panels in Figure 4C-V to indicate the approximate location of the midline. We also added these lines to the Figure 4 – figure supplements as described above.

      I think this figure reference is wrong: "the red-light stimulus also elicited backward motions with control flies (Figure 6B,C, control, black trace, Video 5)." It should be Fig 8B,C.

      We have fixed this error.

      Reviewer #3 (Recommendations For The Authors):

      Introduction:

      Motivating this study in terms of understanding the neural mechanisms that execute the parallel model seems to overstate what you will achieve with the current study. If you want to motivate it this way, I suggest focusing on the grooming sequence of the head alone (eyes, antennae, proboscis).

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions. Please note that many of the revisions focus on the head grooming sequence. We also made minor revisions to the Introduction that further emphasize the focus on head grooming.

      Results:

      Figure 1. Please indicate that this is a male fly in either the figure title or in the figure itself.

      We added a male symbol to Figure 1A.

      Figure 3. Panel J is referenced in the main body text and in the figure caption, but there is no Fig 3J.

      Panel J is shown in the upper right corner of Figure 3. We realize that the placement of this panel is not ideal, but this was the only place that we could fit it. Additionally, the panel works nicely at that location to better enable comparison with panel C. We have revised the text in the Figure 3 legend to better highlight the location of this Figure panel: “Shown in the upper right corner of the figure are the aligned expression patterns of InOmBMN-LexA (red), dBMN-spGAL4 (green), and TasteBMN-spGAL4 (brown).”

      We also added text to a sentence in the results section entitled “Head BMNs project into discrete zones in the ventral brain” that indicates the panel location. This text reads: To further visualize the spatial relationships between these projections, we computationally aligned the expression patterns of the different driver lines into the same brain space (Figure 3J, upper right corner).

      Matching the BMNs to EM reconstructions: why cut the dendrogram at H=5? Would be better to determine cluster number using an unbiased method.

      To match the morphologically distinct EM reconstructed BMNs to their specific bristles, we relied on different lines of evidence, including NBLAST results (discussed more below), dye fill/stochastic labeling/driver line labeling matches, published morphology, nerve projection, bristle number, proximity to other BMNs, and postsynaptic connectivity (summarized in Figure 6 – figure supplement 3). The Materials and methods section titled “Matching EM reconstructed BMN projections with light microscopy imaged BMNs that innervate specific bristles” provides a detailed description of the evidence used to assign each BMN type. In many cases, BMN type could be assigned with confidence solely based on morphological comparisons with our light level data (e.g. dye fills), in conjunction with bristle counts that indicate the expected number of BMNs showing similar morphology. Thus, the LM/EM matches and NBLAST clustering were largely complementary.

      The EM reconstructed BMNs were matched as particular BMN types, in part based on examination of the NBLAST data at different cut heights. NBLAST clustering of the BMNs revealed general trends at higher and lower cut heights (Figure 6 – figure supplement 1A, Supplementary file 3). The lowest cut heights produced clusters consisting mostly of BMNs of the same type innervating the same bristle populations, as well as smaller clusters that subdivided into morphologically distinct subtypes (see Supplementary file 3 for clusters produced at cut height 1). This revealed that BMNs of the same type tended to show the highest morphological similarity with each other, but also showed intratype morphological diversity. Higher cut heights produced clusters of BMNs innervating neighboring bristle populations (e.g. ventral head BMNs), showing high morphological similarity among neighboring BMN types.

      We selected the cut height 5 shown in Figure 6 – figure supplement 1A,B because it captures examples of both same and neighboring type clustering. For example, it captures a cluster of mostly BM-Taste neurons (Cluster 16), and neighboring BMN types, including those from the dorsal head (Cluster 14) or ventral head (Cluster 15).

      Based on reviewer comments, we realized that the way we wrote the BMN matching section in the Results overstated our reliance on the NBLAST clustering and misrepresented how we actually matched the BMNs. Therefore, we softened the first couple of sentences to place less emphasis on the NBLAST results. We also indicated that readers can find the resulting clusters at different cut heights in Figure 6 – figure supplement 1A and Supplementary file 3. The first two sentences of the first paragraph in the Results section titled “Matching the reconstructed head BMNs with their bristles” now read:

      The reconstructed BMN projections were next matched with their specific bristle populations. The projections were clustered based on morphological similarity using the NBLAST algorithm (example clustering at cut height 5 shown in Figure 6 – figure supplement 1A,B, Supplementary file 3, FlyWire.ai link 2) (Costa et al., 2016). Clusters could be assigned as BMN types based on their similarity to light microscopy images of BMNs known to innervate specific bristles.

      The number of reconstructed BMNs is remarkably similar to what is expected based on bristle counts for each group except for InOm. Why do you think there is such a large discrepancy there?

      We believe that the discrepancy between the number of reconstructed BM-InOm neurons and the number expected from InOm bristle counts arises because these bristle counts were based on few flies, and the counts appear to be variable. We did not further investigate the number of InOm bristles in this manuscript because we only needed an estimate, given that there is over an order of magnitude difference between the number of eye bristles and that of any other head bristle population. Therefore, we could relatively easily conclude that these head BMNs were related to the InOm bristles, based on their sheer numbers and their morphology.

      Figure 6 - figure supplement 2N, please describe these panels better. Main text says the upper image is from InOmBMN-LexA, but the figure legend doesn't agree.

      We have added text to the figure legend that makes the contents of panel 2N clear to the reader. Further, the figure legend now indicates for each panel the method used to obtain the labeled neurons (i.e. fill, MCFO, driver), to avoid similar confusion for the other panels.

      Figure 6 - figure supplement 4D. How frequently is there a mismatch between the number of BMNs for a given type across hemispheres?

      Although the full reconstruction of the BMNs on both sides of the brain was beyond the scope of this work, the BMNs on both sides have since been reconstructed and annotated (Schlegel et al. 2023). We plan to provide more analysis of BMNs on both sides of the brain in a forthcoming manuscript. In brief, the BMN numbers tend to agree between the two sides of the brain. The table below shows a comparison between the two sides:

      Author response table 1.

      Figures 6 and 7. It would be helpful to include a reference brain in all panels that show cluster morphology. Without landmarks there is nothing to anchor the eye to allow the reader to see the described differences in BMN projection zones and patterns.

      We apologize for not making this specific change; however, we have made revisions to other parts of the manuscript to better highlight the somatotopic organization among the BMNs (revisions described above). Please note that we now provide publicly available FlyWire.ai links that enable readers to view the BMN projections in 3D. Readers can also toggle a brain mesh on and off to provide spatial reference.

      "BMN somatotopic map": It would be helpful to show or describe in more detail what the unique branch morphology for each zone is. It is quite difficult to appreciate, as the groups also have a lot of overlap. Would the unique regions that the BMN groups innervate be easier to see if you plotted presynaptic sites by group? I am left unsure about whether there is a somatotopic map here.

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions. Please note that we did not examine the fine branch morphological differences between BMN types having overlapping projections. Showing these differences would require more extensive anatomical analysis that is beyond the scope of this work. For showing definitive somatotopy, we focused on the overt differences between BMNs innervating bristles at distant locations on the head.

      Overall the strict adherence to the parallel model impacts the interpretation of the data. It would be helpful for the authors to discuss which aspects of the current study are consistent with the parallel model and which results are not consistent.

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions.

      Discussion:

      "Circuits that elicit aimed grooming of specific head locations": In the previous paragraph you mention "BMN types innervating neighboring bristle populations have overlapping projections into zones that correspond roughly to the dorsal, ventral, and posterior head. The overlap is likely functionally significant, as cosine similarity analysis revealed that neighboring head BMN types have common postsynaptic partners. However, overlap between neighboring BMN types is only partial, as they show differing projections and postsynaptic connectivity." Then in this paragraph, you say, "How do the parallel-projecting head BMNs interface with postsynaptic neural circuits to elicit aimed grooming of specific head locations? Different evidence supports the hypothesis that the BMNs connect with parallel circuits that each elicit a different aimed grooming movement (Seeds et al., 2014)." The overlapping postsynaptic BMN connectivity seems in conflict with the claim that the circuits are parallel.

      We apologize for this confusion. We now better describe this apparent discrepancy between our results and the parallel model of grooming behavior. We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions.

      We have made additional changes to the manuscript:

      (1) We added Supplementary file 2, which includes links for downloading the image stacks used to generate panels in Figure 1, Figure 2, Figure 3, Figure 4, and the figure supplements for these figures. These image stacks are stored in the Brain Image Library (BIL). Each row in the spreadsheet corresponds to one image stack. Columns provide information about each stack, including: the figure panels that the image stack contributed to, image stack title, DOI (the link provides metadata and a file download link for the stack), image stack file name, genotype of the imaged fly, and information about the image stack. References to this file have been added at different locations throughout the text and figure legends. We also added a section on the BIL data to the Materials and methods, entitled “Light microscopy image stack storage and availability”. The former Supplementary file 2 has been renamed Supplementary file 3.

      (2) We added a new reference for FlyWire.ai (Dorkenwald et al. 2023) that was posted as a preprint during the revision of this manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In the manuscript titled "Vangl2 suppresses NF-κB signaling and ameliorates sepsis by targeting p65 for NDP52-mediated autophagic degradation" by Lu et al, the authors show that Vangl2, a planar cell polarity component, plays a direct role in autophagic degradation of NFkB-p65 by facilitating its ubiquitination via PDLIM2 and subsequent recognition and autophagic targeting via the autophagy adaptor protein NDP52. Conceptually it is a wonderful study with excellent execution of experiments and controls. The concerns with the manuscript are mainly on two counts. The first issue is the kinetics of p65 regulation reported here, which do not fit the kinetics of the mechanism proposed here, i.e., Vangl2-mediated ubiquitination followed by autophagic degradation of p65. The second issue is more technical: an absolute lack of quantitative analyses. The authors rely mostly on visual qualitative interpretation to assess an increase or decrease in associations between partner molecules throughout the study. While the overall mechanism is interesting, the authors should address these concerns as highlighted below:

      Major points:

      (1) Kinetics of p65 regulation by Vangl2: As mentioned above, authors report that LPS stimulation leads to higher IKK and p65 activation in the absence of Vangl2. The mechanism of action authors subsequently work out is that Vangl2 helps recruit the E3 ligase PDLIM2 to p65, which causes K63 ubiquitination, which is recognised by NDP52 for autophagic targeting. Curiously, peak p65 activation is achieved within 30 minutes of LPS stimulation. The time scale of all other assays is much longer. It is not clear that in WT cells, p65 could be targeted to autophagic degradation in a Vangl2 dependent manner within 30 minutes. The HA-Myc-Flag-based overexpression and Co-IP studies do confirm the interactions as proposed. However, they do not prove that this mechanism was responsible for the Vangl2-mediated modulation of p65 activation upon LPS stimulation. Moreover, the Vangl2 KO line also shows increased IKK activation. The authors do not show the cause behind increased IKK activation, which in itself can trigger increased p65 phosphorylation.

      We thank the reviewer for this valuable suggestion.

      We agree with the reviewer that peak p65 activation is achieved within 30 minutes of LPS stimulation in vitro, and that p65 could not be targeted to autophagic degradation in a Vangl2 dependent manner within 30 minutes. Given that the protein and mRNA levels of Vangl2 were elevated at 3-6 h of LPS stimulation (Fig. S1 C-E), we extended the stimulation time scale in the revised manuscript. The data (Fig. 2A-D in the revised manuscript) demonstrated that IKK phosphorylation was enhanced in Vangl2 KO myeloid cells during the early phase (within 3 h) of LPS stimulation, but not during prolonged LPS stimulation. The underlying mechanism may be complex. Only p65 phosphorylation remained enhanced after long-term LPS stimulation in Vangl2 KO cells compared to WT cells. Furthermore, overexpression of Vangl2 in A549 cells reduced both phosphorylated and total endogenous p65 (Fig. 2 I, J in the revised manuscript). These findings were corroborated by overexpression and Co-IP experiments, which collectively indicated that Vangl2 regulates the stability of p65 by promoting its interaction with NDP52 and its autophagic degradation. (Page 7; Line 183-185).

      (2) The other major concern is regarding the lack of quantitative assessments. For Co-IP experiments, I can understand it is qualitative observation. However, when the authors infer that there is an increase or decrease in the association through co-IP immunoblots, it should also be quantified, especially since the differences are quite marginal and could be easily misinterpreted.

      We are grateful to the reviewer for this suggestion. Quantitative analyses have been added to the revised version.

      (3) Figure 4E and F: It is evident that inhibiting Autolysosome (CQ or BafA1) or autophagy (3MA) led to the recovery of p65 levels and inducing autophagy by Rapamycin led to faster decay in p65 levels. Did the authors also note/explore the possibility that Vangl2 itself may be degraded via the autophagy pathway? IB of WCL upon CQ/BAF/3MA or upon Rapa treatment does indicate the same. If true, how would that impact the dynamics of p65 activation?

      We thank the reviewer for this question. Previous studies have shown that Vangl2 is primarily degraded by the proteasome pathway, rather than by the autolysosomal pathway (doi: 10.1126/sciadv.abg2099; doi: 10.1038/s41598-019-39642-z). In our experiments, Vangl2 recruits the E3 ligase PDLIM2 to enhance K63-linked ubiquitination of p65, which serves as a recognition signal for cargo receptor NDP52-mediated selective autophagic degradation. Vangl2 facilitated the interaction between p65 and NDP52 but did not itself undergo significant autophagic degradation.

      (4) Autophagic targeting of p65 should also be shown through alternate evidence, like microscopy etc., in the LPS-stimulated WT cells.

      We thank the reviewer for this suggestion. We have added immunofluorescence data showing co-localization of p65 and LC3 in the revised version (Fig. S4 H in the revised manuscript). (Page 9, lines 267-268)

      Reviewer #2 (Public Review):

      Vangl2, a core planar cell polarity protein involved in Wnt/PCP signaling, mediates cell proliferation, differentiation, homeostasis, and cell migration. Vangl2 malfunctioning has been linked to various human ailments, including autoimmune and neoplastic disorders. Interestingly, Vangl2 was shown to interact with the autophagy regulator p62, and indeed, autophagic degradation limits the activity of inflammatory mediators such as p65/NF-κB. However, whether Vangl2, per se, contributes to restraining aberrant p65/NF-κB activity remains unclear.

      In this manuscript, Lu et al. describe that Vangl2 expression is upregulated in human sepsis-associated PBMCs and that Vangl2 mitigates experimental sepsis in mice by negatively regulating p65/NF-κB signaling in myeloid cells. Vangl2 recruits the E3 ubiquitin ligase PDLIM2 to promote K63-linked poly-ubiquitination of p65. Vangl2 also facilitates the recognition of ubiquitinated p65 by the cargo receptor NDP52. These molecular processes cause selective autophagic degradation of p65. Indeed, abrogation of PDLIM2 or NDP52 functions rescued p65 from autophagic degradation, leading to extended p65/NF-κB activity.

      As such, the manuscript presents a substantial body of interesting work and a novel mechanism of NF-κB control. If found true, the proposed mechanism may expand therapeutic opportunities for inflammatory diseases. However, the current draft has significant weaknesses that need to be addressed.

      We appreciate the reviewer’s comments on our manuscript, and we have further improved the manuscript as suggested.

      Specific comments

      (1) Vangl2 deficiency did not cause a discernible increase in the cellular level of total endogenous p65 (Fig 2A and Fig 2B), but did lead to accumulation of phosphorylated IKK.

      Even Fig 4D reveals that Vangl2 exerts a rather modest effect on the total p65 level and the figure does not provide any standard error for the quantified data. Therefore, these results do not fully support the proposed model (Figure 7) - this is a significant draw back. Instead, these data provoke an alternate hypothesis that Vangl2 could be specifically mediating autophagic removal of phosphorylated IKK and phosphorylated IKK, leading to exacerbated inflammatory NF-κB response in Vangl2-deficient cells. One may need to use phosphorylation-defective mutants of p65, at least in the over-expression experiments, to dissect between these possibilities.

      We appreciate the reviewer’s comments on our manuscript, and we have further improved the manuscript as suggested.

      (1) We agree with the reviewer that Vangl2 deficiency did not cause a discernible increase in the cellular level of total p65 after short LPS stimulation in vitro, and that p65 could not be targeted to autophagic degradation in a Vangl2 dependent manner within 30 minutes. Given that the protein and mRNA levels of Vangl2 were elevated at 3-6 h of LPS stimulation (Fig. S1 C-E), we extended the stimulation time scale in the revised manuscript. The data (Fig. 2A-D in the revised manuscript) demonstrated that IKK phosphorylation was enhanced in Vangl2 KO myeloid cells during the early phase (within 3 h) of LPS stimulation, but not during prolonged LPS stimulation. The underlying mechanism may be complex. Only phosphorylated p65 and total endogenous p65 remained elevated after long-term LPS stimulation in Vangl2 KO cells compared to WT cells. Furthermore, overexpression of Vangl2 in A549 cells reduced both phosphorylated and total endogenous p65 (Fig. 2 I, J in the revised manuscript). These findings were corroborated by overexpression and Co-IP experiments, which collectively indicated that Vangl2 regulates the stability of p65 by promoting its interaction with NDP52 and its autophagic degradation. (Page 7; Line 183-185).

      (2) Similarly, the stimulation time scale in Fig 4D was extended, and it was demonstrated that p65 was more stable in Vangl2-deficient cells.

      (3) Moreover, we constructed a phosphorylation-defective mutant of p65 (S536A) and found that Vangl2 also promoted the degradation of this mutant (Fig. S4 A, B in the revised manuscript). Thus, Vangl2 promotes the degradation of basal/unphosphorylated p65. (Page 8, lines 237-240)

      (2) Fig 1A: The data indicates the presence of two subgroups within the sepsis cohort - one with high Vangl2 expressions and the other with relatively normal Vangl2 expression. Was there any difference with respect to NF-κB target inflammatory gene expressions between these subgroups?

      As suggested, we compared NF-κB target inflammatory markers between sepsis patients with high and relatively low Vangl2 expression. The high Vangl2 expression group exhibited lower IL-6, WBC, and CRP levels than the low Vangl2 expression group, suggesting an inverse correlation between Vangl2 and the inflammatory response (Fig. S1 A in the revised manuscript) (Page 5, lines 126-128).

      (3) The effect of Vangl2 deficiency was rather modest in the neutrophil. Could it be that Vangl2 mediates its effect mostly in macrophages?

      As shown in Fig. S1C-E, the induction of Vangl2 by LPS stimulation is more rapid in macrophages than in neutrophils. This may contribute to its dominant effect in macrophages. Consequently, we primarily focused our investigation on the role of Vangl2 in macrophages.

      (4) Fig 1D and Figure 1E: Data for unstimulated Vangl2 cells should be provided. Also, the source of the IL-1β primary antibody has not been mentioned.

      Thank you for the suggestion. We have updated the data for unstimulated cells in the revised manuscript (Fig. 1 D, E in the revised manuscript). Also, IL-1β primary antibody was purchased from Cell Signaling Technology and the information has been included in the Materials and Methods section (Table S1).

      (5) The relevance and the requirement of RNA-seq analysis are not clear in the present draft. Figure 1E already reveals upregulation of the signature NF-κB target inflammatory genes upon Vangl2 deficiency.

      We agree with the reviewer that the data presented in Figure 1E demonstrate upregulation of the signature NF-κB target inflammatory genes upon Vangl2 deficiency in a murine model of LPS-induced sepsis. We then investigated, in Figure 2, the mechanism by which Vangl2 regulates NF-κB target inflammatory genes at the cellular level. To this end, we performed RNA-seq analysis to screen for signaling pathways involved in LPS-induced septic shock by comparing LPS-stimulated BMDMs from Vangl2ΔM and WT mice, and found that the TNF signaling pathway and cytokine-cytokine receptor interaction were significantly enriched in Vangl2ΔM BMDMs upon LPS stimulation. This analysis provides further evidence that Vangl2 plays a role in regulating NF-κB signaling pathways and the release of related inflammatory cytokines.

      (6) Fig 2A reveals an increased accumulation of phosphorylated p65 and IKK in Vangl2-deficient macrophages upon LPS stimulation within 30 minutes. However, Vangl2 accumulates at around 60 minutes post-stimulation in WT cells. Similar results were obtained for neutrophils (Fig 2B). There appears to be a temporal disconnect between Vangl2 and phosphorylated p65 accumulation - this must be clarified.

      This concern has been addressed above (see our response to question 1 from Reviewer #2).

      (7) Figure 2E and 2F do not have untreated controls. Presentations in Fig 2E may be improved to more clearly depict IL6 and TNF data, preferably with separate Y-axes.

      Thank you for the suggestion. We have added untreated controls and separate Y-axes for the IL-6 and TNF data in the revised manuscript (Fig. 2 E, F in the revised manuscript).

      (8) Line 219: "strongly with IKKα, p65 and MyD88, and weak" - should be revised.

      We have improved the manuscript as suggested in the revised manuscript (Page 7; Line 203).

      (9) It is not clear why IKKβ was excluded from interaction studies in Fig S3G.

      We added a Co-IP experiment showing that HA-tagged Vangl2 interacted with Flag-tagged p65, but not with Flag-tagged IKKβ, in 293T cells (Fig S3H). Furthermore, endogenous co-IP immunoblot analyses showed that Vangl2 did not associate with IKKβ (Fig. S3I).

      (10) Fig 3F- In the text, authors mentioned that Vangl2 strongly associates with p65 upon LPS stimulation in BMDM. However, no controls, including input or another p65-interacting protein, were used.

      As the reviewer suggested, we have added the input and a positive control (IκBα) to this experiment (Fig. 3F in the revised manuscript). The results demonstrated that the interaction between p65 and IκBα was attenuated, although total IκBα did not undergo significant degradation over the long-term course of LPS stimulation.

      (11) Figure 4D - Authors claim that Vangl2-deficient BMDMs stabilized the expression of endogenous p65 after LPS treatment. However, p65 levels were particularly constitutively elevated in knockout cells, and LPS signaling did not cause any further upregulation. This again indicates the role of Vangl2 in the basal state. The authors need to explain this and revise the text accordingly.

      Thank you for the reviewer's comments. We repeated the experiment to determine whether Vangl2 affects the stability of endogenous p65 before and after LPS treatment. Because Vangl2 expression is extremely low in unstimulated WT cells, there was no observable difference in the basal level of p65 between WT and Vangl2ΔM cells. However, upon prolonged LPS stimulation, Vangl2 expression was induced, resulting in p65 degradation in WT cells. In contrast, p65 protein was more stable in Vangl2-deficient cells after LPS stimulation (Fig. 4D in the revised manuscript).

      Reviewer #3 (Public Review):

      Lu et al. describe Vangl2 as a negative regulator of inflammation in myeloid cells. The primary mechanism appears to be through binding p65 and promoting its degradation, albeit in an unusual autolysosome/autophagy dependent manner. Overall, the findings are novel and the crosstalk of PCP pathway protein Vangl2 with NF-kappaB is of interest. …….Regardless, Vangl2 as a negative regulator of NF-kappaB is an important finding. There are, however, some concerns about methodology and statistics that need to be addressed.

      Thank you for your comments on our manuscript, and we have further improved the manuscript as suggested.

      (1) Whether PCP is in any way relevant, or whether this is a PCP-independent function of Vangl2, is not directly explored (the latter appears more likely from the manuscript/discussion). PCP pathways often intersect with developmentally important pathways such as WNT, HH/GLI, Fat-Dachsous and even mechanical tension. It might be of importance to investigate whether Vangl2-dependent NF-kappaB is influenced by developmental pathways.

      Thank you for the reviewer's insightful comments. Our study revealed that Vangl2 recruits the E3 ubiquitin ligase PDLIM2 to facilitate K63-linked ubiquitination of p65, which is subsequently recognized by the autophagy receptor NDP52, promoting the autophagic degradation of p65. Our experiments with autophagy inhibitors and autophagy-deficient cells indicate that Vangl2 regulates NF-κB signaling through a selective autophagic pathway, rather than through the PCP pathway, WNT, HH/GLI, Fat-Dachsous or mechanical tension. We have also added a discussion of this point to the revised manuscript (Page 12, lines 377-393).

      (2) Are Vangl2 phosphorylations (S5, S82 and S84) in any way necessary for the observed effects on NF-kappaB, or would a phospho-mutant (alanine substitution mutant) Vangl2 phenocopy WT Vangl2 for regulation of NF-kappaB?

      As suggested, we generated phospho-mutants of Vangl2 (S82/84A) and observed that Vangl2 (S82/84A) could still facilitate the degradation of p65 (Fig. S4 B in the revised manuscript), suggesting that Vangl2 regulates the NF-κB pathway independently of its phosphorylation.

      (3) Another area to strengthen might be with regards to specificity of cell types where this phenomenon may be observed. LPS treatment in mice resulted in Vangl2 upregulation in spleen and lymph nodes, but not in lung and liver. What explains the specificity of organ/cell-type Vangl2 upregulation and its consequences observed here? Why is NF-kappaB signaling not more broadly or even ubiquitously affected in all cell types in a Vangl2-dependent manner, rather than being restricted to macrophages, neutrophils and peritoneal macrophages, or, for that matter, in spleen and LN and not liver and lung? After all, one may think that the PCP proteins, as well as NF-kappaB, are ubiquitous.

      Thank you for the reviewer's comments.

      (1) LPS is an important mediator that triggers sepsis with excessive immune activation. The spleen and lymph nodes are important peripheral immune organs, where immune cells (e.g., macrophages) are abundant and respond sensitively to LPS stimulation. By contrast, immune cells represent only a minor fraction of the cells in the lung and liver. Consequently, Vangl2, a pivotal regulator of immune function, shows a more pronounced increase in immune organs and cells.

      (2) Induction of Vangl2 expression by LPS stimulation is cell-specific. Because different cells exhibit different protein abundances, the molecular events involved may also differ. Moreover, we observed high Vangl2 expression in the liver at the basal state (Author response image 1), but it was not induced after 12 h of LPS stimulation. Therefore, Vangl2 deficiency produces a significant phenotype in macrophages and neutrophils, and in the spleen and LN, rather than in liver or lung cells.

      Author response image 1.

      Vangl2 showed no significant changes in the liver after LPS treatment. Mice (n≥3) were treated with LPS (30 mg/kg, i.p.). Livers were collected at 12 h after LPS treatment. Immunoblot analysis of Vangl2.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      General points:

      Figure 4G- panels appear mislabeled. Please correct.

      We have corrected this mislabeling as you suggested.

      The dynamics of Vangl2 interaction with p65 and autophagy adaptors is not clear/apparent. For example, Vangl2 expression destabilises p65 levels (as in Fig. 4), but in Fig. 5, it seems there is no decline in the p65 protein level, and a large fraction of it coprecipitates with NDP52.

      We appreciate the reviewer’s comments. In the co-IP assay, we used the lysosomal inhibitor CQ to inhibit p65 degradation to observe the interaction between p65 and NDP52 or Vangl2.

      Fig 5E- I would expect p65 levels to be lower in WT cells than Vangl2 KO cells. But as such, there is no difference between the two.

      We appreciate the reviewer’s comments. We repeated the experiments and updated the data. Firstly, Vangl2 was not induced in WT cells in the absence of LPS stimulation, so there was no difference in p65 expression between the two groups at the basal level. Secondly, we used CQ/Baf-A1 to inhibit the degradation of Vangl2 in the co-IP assay to observe the interaction between p65 and other molecules.

      Reviewer #2 (Recommendations For The Authors):

      A few points that can be looked at and revised.

      (1) Quantification of the presented data is needed for Fig 4D and Fig 4E.

      We added the quantification analysis as suggested.  

      (2) The labeling of Fig 4G should be scrutinized.

      We have corrected this mislabeling as you suggested.

      (3) Fig 6B and Fig 6C should be explained in the result section more elaborately.

      We thank the reviewer for the suggestion, and we have rephrased this sentence to better describe the results. (Page 10, lines 306-313)

      (4) Line 85: "Vangl2 mediated downstream of Toll-like or interleukin (IL)-1" - unclear.

      We appreciate the reviewer’s comments on our manuscript, and we have further improved the manuscript as suggested in the revised manuscript (Page 3, line 68).

      (5) Line 181: "mice. Differentially expression analysis" - this should be revised.

      We appreciate the reviewer’s comments on our manuscript, and we have further improved the manuscript as suggested in the revised manuscript (Page 11, line 323).

      (6) Line 261-264- CHX-chase assay showed the degradation rate of p65 in Vangl2-deficient BMDM was slower compared with WT cells. However, Vangl2 is not induced in WT BMDMs upon CHX treatment (Fig. S4B).

      We appreciate the reviewer’s comments on our manuscript, and we have further improved the manuscript as suggested in the revised manuscript (Fig. S4D).

      (7) Finally, some editing to provide data only critical for the conclusions could improve the ease of reading.

      We have further improved the manuscript as suggested in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      Comments (general, please address at least in Discussion. Some experimental data, for example the role, if any, of Vangl2 phosphorylations will be very useful):

      (1) It might be interesting to explore whether there are any potential effects of developmental pathways on the observed effect mediated by Vangl2 or if the effects are entirely a PCP-independent function of Vangl2. Please see above public review.

      Thank you for the reviewer's insightful comments. Our study revealed that Vangl2 recruits the E3 ubiquitin ligase PDLIM2 to facilitate K63-linked ubiquitination of p65, which is subsequently recognized by the autophagy receptor NDP52, promoting the autophagic degradation of p65. Our experiments with autophagy inhibitors and autophagy-deficient cells indicate that Vangl2 regulates NF-κB signaling through a selective autophagic pathway, rather than through the PCP pathway, WNT, HH/GLI, Fat-Dachsous or mechanical tension. Furthermore, we generated phospho-mutants of Vangl2 (S82/84A) and observed that Vangl2 (S82/84A) could still facilitate the degradation of p65 (Fig. S4 B), suggesting that Vangl2 regulates the NF-κB pathway independently of its phosphorylation. We have also added a discussion of this point to the revised manuscript (Page 12, lines 377-393).

      (2) What explains the specificity of organ/cell-type Vangl2 upregulation and its consequences observed here? Why is NF-kappaB signaling not more broadly or even ubiquitously affected in all cell types in a Vangl2-dependent manner, rather than being restricted to macrophages, neutrophils and peritoneal macrophages, or, for that matter, in spleen and LN and not liver and lung? After all, one may think that the PCP proteins, as well as NF-kappaB, are ubiquitous.

      Thank you for the reviewer's comments. A similar question has been addressed above (refer to the response to question 3 of reviewer 3).

      (3) Another specificity-related question that comes to mind is whether the Vangl2 function in autolysosomal/autophagic degradation is restricted to p65 as the exclusive substrate. The cytosolic targeting of p65 as opposed to the more well-known nuclear-targeting is interesting.

      Our previous work demonstrated that Vangl2 inhibits antiviral IFN-I signaling by targeting TBK1 for autophagic degradation (doi: 10.1126/sciadv.adg2339), indicating that p65 is not the sole substrate of Vangl2. However, within the NF-κB pathway, p65 is a specific substrate of Vangl2. Moreover, our findings indicate that Vangl2 interacts with p65 predominantly in the cytoplasm rather than in the nucleus (Fig. S4 C).

      (4) A pharmacological approach is used to tease apart the autolysosome versus proteasome pathways. What is the physiological importance of autophagic degradation? It is interesting to note that Vangl2 was already previously implicated in degrading LAMP-2A and increasing chaperone-mediated autophagy (CMA)-lysosome numbers (PMID: 34214490).

      Previous literature has demonstrated that Vangl2 can inhibit CMA degradation (PMID: 34214490). In our study, by contrast, we found that Vangl2 promotes the selective autophagic degradation of p65. CMA and selective autophagy are two distinct degradation modes, so these findings are not contradictory.

      (5) Are these phenotypes discernable in heterozygotes or only when ablated in homozygosity? Any phenotypes recapitulated in the looptail heterozygote mice?

      We found that these phenotypes were discernible only in homozygous mice.

      (6) What is the conservation of the Vangl2 p65-interaction site between Vangl2 and Vangl1? PDLIM2 recruitment between Vangl2 and Vangl1?

      We appreciate the reviewer’s comments on our manuscript. Previous studies have shown that human Vangl1 and Vangl2 share only 72% identity and have distinct functional properties (doi: 10.1530/ERC-14-0141). Thus, the interaction of Vangl2 with p65 and its recruitment of PDLIM2 may not necessarily apply to Vangl1.

      Comments (specific to experiments and data analyses. Please address the following):

      (7) The patient population used in Fig 1 is not described in the Methods. This is a critical omission. Were age, sex etc. controlled for between healthy and disease? How was the diagnosis made? What times during sepsis were the samples collected? As presented, this data is impossible to evaluate and interpret.

      We appreciate the reviewer’s comments on our manuscript, and we have further improved the manuscript as suggested in the revised supplementary materials (Supplementary information, Page 12, lines 146-147).

      (8) In general, the statistical method should be described for each experiment presented in the figures. Comparisons should not be made only at the time point with maximal difference (such as in Fig 1F or Fig 2C), but at all time points using appropriate statistical methods. The sample size should also be included to allow determination of the appropriateness of parametric or non-parametric tests.

      We appreciate the reviewer’s comments on our manuscript, and we have further improved the manuscript as suggested in the revised manuscript (Figures 1F and 2C).

      (9) PCP pathways can activate p62/SQSTM1 or JNK via RhoA. JNK activation should be tested experimentally.

      According to the reviewer's comments, we further examined the effect of Vangl2 on the JNK pathway. The results showed that Vangl2 did not affect the JNK pathway (Author response image 2). This suggests that Vangl2 functions independently of the PCP pathway.

      Author response image 2.

      Vangl2 did not affect the JNK pathway. WT and Vangl2-deficient (n≥3) BMDMs were stimulated with LPS (100 ng/ml) for the indicated times. Immunoblot analysis of total and phosphorylated JNK.

      (10) Why are different cells such as A549, HEK293, CHO, 293T, THP-1 used during the studies for different experiments? Consistency would improve rigor. At least, logical explanation driving the cell type of choice for each experiment should be included in the manuscript. Nonetheless, one aspect of using a panel of cell lines indicate that the effect of Vangl2 on NF-kappa B is pleiotropic.

      We are grateful to the reviewer for their comments on our manuscript. A549, HEK293, CHO, and 293T cells are commonly used in protein-protein interaction studies. The choice of cell line for overexpression (exogenous) experiments depended on transfection efficiency and on the ability to express TLR4 (the receptor for LPS). Additionally, we conducted endogenous experiments using THP-1 cells (a human monocytic cell line) and BMDMs (murine primary macrophages). Moreover, we generated Vangl2f/f lyz-cre mice, in which Vangl2 is specifically deleted in myeloid cells, and investigated the effect of Vangl2 on NF-κB signaling in vivo.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript describes the crystal structures of Streptococcus pneumoniae NOXs. Crystals were obtained for the wild-type and mutant dehydrogenase domain, as well as for the full-length protein comprising the membrane domain. The manuscript further carefully studies the enzyme's kinetics and substrate-specificity properties. Streptococcus pneumoniae NOX is a non-regulated enzyme, and therefore, its structure should provide a view of the NOX active conformation. The structural and biochemical data are discussed on this ground.

      Strengths:

      This is very solid work. The protein chemistry and biochemical analysis are well executed and carefully described. Similarly, the crystallography must be appreciated given the difficulty of obtaining good enzyme preparations and the flexibility of the protein. Even if solved at medium resolution, the crystal structure of the full-length protein conveys relevant information. The manuscript nicely shows that the domain rotations are unlikely to be the main mechanistic element of NOX regulation. It rather appears that the NADPH-binding conformation is pivotal to enzyme activation. The paper extensively refers to the previous literature and analyses the structures comprehensively with a comparison to previously reported structures of eukaryotic and prokaryotic NOXs.

      We thank the referee for these very nice comments about our work.

      Weaknesses:

      The manuscript is not always very clear with regard to the analysis of NADPH binding. The last section describes a "crevice" featured by the NADPH-binding sites in NOXs. It remains unclear whether this element corresponds to the different conformations of the protein C-terminal residues or more extensive structural differences. This point must be clarified.

      We agree with the referee that our terminology was not very clear. Responding to your comment helped us to improve our explanation: we have changed the text to emphasize the differences we observe in the distances between the FAD binding groove and the entire NADPH binding groove, which includes conserved NADPH-contacting motifs as well as the critical aromatic residue.

      A second less convincing point concerns the nature of the electron acceptor. The manuscript states that this NOX might not physiologically act as a ROS producer. A question then immediately arises: Is this protein an iron reductase?

      Can the authors better discuss or provide more data about this point?

      The referee has a legitimate point, which was also our first idea. In the initial work on SpNOX, in which we discovered bacterial NOX enzymes (see Hajjar et al 2017 in mBio), we evaluated its possible role as an iron reductase. There we showed that SpNOX can reduce CytC directly; however, while some reduction of the Fe3+-NTA complex (classically used in ferric reductase activity assays) occurred, this reduction was inhibited by SOD and thus occurred indirectly, via the superoxide produced, so it did not reflect a true iron reductase activity. This combination of direct (CytC) and indirect (Fe3+-NTA) reduction of iron-containing acceptors appears to preclude a physiological iron reductase activity, since direct reduction seems to depend on the protein component of CytC, which allows it to interact with SpNOX. As these questions had already been addressed in a previous paper, we did not add anything here; we prefer to underline the possibility of another acceptor and to leave this question open for future work.

      Reviewer #2 (Public Review):

      The authors describe the structure of the S. pneumoniae Nox protein (SpNOX). This is a first. The relevance of it to the structure and function of eukaryotic Noxes is discussed in depth.

      Strengths and Weaknesses

      One of the strengths of this work is the effort put into preparing a pure and functionally active SpNOX preparation. The protein was expressed in E. coli and the purification and optimization of its thermostability and activity are described in detail, involving salt concentration, glycerol concentration, and pH.

      This reviewer was surprised by the fact that the purification protocol in the eLife paper differs from those in the mBio and Biophys. J. papers by the absence of the detergent lauryl maltose neopentyl glycol (LMNG). LMNG is only present in the activity assay at a low concentration (0.003%; molar data should be given; by my calculation, this corresponds to 30 μM).
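      For reference, the reviewer's conversion can be checked with a short calculation. This is a sketch under stated assumptions: it takes 0.003% to mean weight per volume (w/v) and uses a molecular weight of about 1005.2 g/mol for LMNG (C47H88O22); neither assumption is stated in the manuscript.

```latex
% Assumptions (not from the manuscript): 0.003% is w/v; M(LMNG) ~ 1005.2 g/mol
\[
c_{\mathrm{LMNG}}
  = \frac{0.003\ \mathrm{g}/100\ \mathrm{mL}}{1005.2\ \mathrm{g\,mol^{-1}}}
  = \frac{0.03\ \mathrm{g\,L^{-1}}}{1005.2\ \mathrm{g\,mol^{-1}}}
  \approx 2.98 \times 10^{-5}\ \mathrm{mol\,L^{-1}}
  \approx 30\ \mu\mathrm{M}
\]
```

      Under these assumptions the result agrees with the reviewer's estimate of about 30 μM.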

      We regret this misunderstanding: our description was not clear enough. As the referee points out, in previous papers we purified full-length SpNOX with the detergent LMNG. In the current paper, we described only the protocol for the SpNOX DH domain variant, a soluble cytoplasmic domain. We have now modified the text to clarify the difference between the purification of full-length SpNOX variants, which was performed with detergent as cited in Vermot et al 2020, and the purification of DH domains, which are soluble and thus did not require detergent.

      In light of the presence of lipids in cryo-EM-solved structures of DUOX and NOX2, it is surprising that the authors did not use reconstitution of the purified SpNOX in phospholipid (nanodisk?). The issue is made more complicated by the statement on p. 18 of "structures solved in detergent like ours" when no use of detergent in the solubilization and purification of SpNOX is mentioned in the Methods section (p. 21-22).

      As stated above, detergent was used to purify the full-length version of SpNOX. We did in fact perform some preliminary tests of reconstitution in nanodiscs. Different negative-staining trials showed heterogeneous sizes of SpNOX in nanodiscs, and the initial images were not promising. In parallel, we obtained positive results in crystallography relatively quickly with protein in detergent. We thus focused on refining the crystals, which was a long and demanding task; we decided to allocate time and resources to this promising avenue and did not further pursue nanodiscs.

      We did not pursue cryo-EM because the small size of the protein was initially believed to be a significant barrier to success. Perhaps we could have pursued this avenue: while our manuscript was under submission to eLife, another group deposited a preprint in bioRxiv using cryo-EM to solve the structure of SpNOX (see comment below). This structure was also solved in detergent, so even this cryo-EM structure provides no information on the potential roles of lipids raised by the referee.

      In this revised version, we have added a comment, in the last paragraph, in reference to the additional data available today thanks to the other structures generated by this other group (Murphy's group).

      Can the authors provide information on whether E. coli BL21 is sufficiently equipped for the heme synthesis required for the expression of the TM domain of SpNOX? Was supplementation with δ-aminolevulinic acid used?

      His-SpNOX was produced in E. coli C41(DE3) without any δ-aminolevulinic acid supplementation. Supplementation was tested, but no change in heme content was observed (UV/visible spectra), so we settled on the purification described by Vermot et al 2020. Initially, for the mBio paper (Hajjar et al 2017), we performed heme titrations, which gave stoichiometries of 1.35 to 1.5 heme per protein, indicating 2 hemes (these data were not shown). In this work, we observed two hemes in the crystal structure, confirming that E. coli, at least for this protein, did not need supplementation with δ-aminolevulinic acid.

      The 3 papers on SpNOX present more than convincing evidence that SpNOX is a legitimate Nox that can serve as a legitimate model for eukaryotic Noxes (cyanide resistance, inhibition by DPI, absolute FAD dependence, and NADPH/NADH as the donor of electrons to FAD). It is also understood that the physiological role of SpNOX in S. pneumoniae is unknown and that the fact that it can reduce molecular oxygen may be an experimental situation that does not occur in vivo.

      I am, however, linguistically confused by the statement that "SpNOX requires "supplemental" FAD". Noxes have FAD bound non-covalently and this is the reason that, starting from the key finding of Babior on NOX2 back in 1977 to the present, FAD has to be added to in vitro systems to compensate for the loss of FAD in the course of the purification of the enzyme from natural sources or expression in a bacterial host. I wonder whether this makes FAD more of a cosubstrate than a prosthetic group unless what the authors intend to state is that SpNOX is not a genuine flavoprotein.

      We believe there is some confusion between SpNOX (the full-length transmembrane protein) and SpNOXDH (the cytosolic domain only). The sentence pinpointed by the referee was in fact “The strict requirement of FAD addition for SpNOXDH activity suggests that the flavin behaves as a cosubstrate”. This statement was about the isolated cytosolic domain, which does not contain the TM part of the protein.

      We agree that in WT NOX enzymes (including SpNOX) FAD is held within the enzyme structure and thus can be considered, by definition, as a prosthetic group. This is supported by the nanomolar affinity for FAD of SpNOX. We did not intend to say that NOX and SpNOX are not genuine flavoproteins.

      On the other hand, when isolated, the affinity of the DH domain for flavins drops to the µM level. This µM affinity does not allow stable retention of the flavin in the active site, as illustrated by the spectra in Figure 3. It is instead the typical affinity of a substrate or co-substrate (similar to that for the substrate NADPH), which can be exchanged and diffuse in and out of the active site. The DH domain recognizes and reduces flavins but, as a consequence of its lower affinity, releases free reduced flavins into its environment. Thus, the isolated DH domain behaves as a flavin reductase that uses flavin as a substrate. Such enzymes have already been well described (some of them belong to the FNR family). These enzymes typically have µM affinity for flavins and, like SpNOX DH, bind them through interactions centered on the isoalloxazine ring only.
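      As a simplified illustration of why nM versus µM affinity matters, assume single-site binding at equilibrium with free FAD in excess over protein. The fractional occupancy θ of the flavin site then depends only on the free FAD concentration and the dissociation constant K_d. The values below (K_d = 1 nM or 1 µM, free [FAD] = 10 nM) are hypothetical round numbers chosen only to reflect the nM and µM ranges mentioned above; they are not measured values.

```latex
% Hypothetical values for illustration only (single-site binding, ligand in excess)
\[
\theta = \frac{[\mathrm{FAD}]}{K_d + [\mathrm{FAD}]}
\]
\[
[\mathrm{FAD}] = 10\ \mathrm{nM}:\qquad
\theta_{K_d = 1\,\mathrm{nM}} = \frac{10}{1 + 10} \approx 0.91,
\qquad
\theta_{K_d = 1\,\mu\mathrm{M}} = \frac{10}{1000 + 10} \approx 0.01
\]
```

      Under these assumptions, a site with nM affinity stays almost fully occupied, whereas a site with µM affinity is nearly empty unless FAD is supplied in the µM range, which is consistent with the need to add FAD for SpNOXDH activity.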

      We understand that switching in the text between SpNOX and SpNOX DH, and between FAD as a prosthetic group and as a diffusible co-substrate, can be confusing. To make this clearer, we modified the following sentences and added references to the characterization of some flavin reductases that may help the reader.

      “The strict requirement of FAD addition for SpNOXDH activity and its µM level of affinity suggests that the flavin behaves as a co-substrate rather than a prosthetic group. As an isolated domain, SpNOXDH may work as a flavin reductase enzyme (Gaudu et al, 1994; Fieschi et al 1995; Nivière et al 1996), ..”

      We hope that it will help.

      I am also puzzled by the statement that SpNOX "does not require the addition of Cyt c to sustain superoxide production". Researchers with a Cartesian background should differentiate between cause and effect. Cyt c serves merely as an electron acceptor from superoxide made by SpNOX but superoxide production and NADPH oxidation occur independently of the presence of added Cyt c.

      Thanks to the referee for pointing out this poor wording. We agree and have amended the text to clarify what we originally meant. It is now:

      “SpNOXDH requires supplemental FAD to sustain both superoxide production, which can be observed in the presence of Cyt c (Figure 2A), and NADPH oxidation, which can be observed in the absence of Cyt c (Figure 2B).”

      The ability of the DH domain of SpNOX (SpNOXDH) to produce superoxide is surprising to this reviewer. The result is based on the inhibition of Cyt c reduction by added superoxide dismutase (SOD) by 40%. In all eukaryotic Noxes superoxide is produced by the one-electron reduction of molecular oxygen by electrons originating from the distal heme, having passed from reduced FAD via two hemes. The proposal that superoxide is generated by direct transfer of electrons from FAD to oxygen deserves a more in-depth discussion and relies too heavily on the inhibitory effect of SOD. A control experiment with inactivated SOD should have been done (SOD is notoriously heat resistant and inactivation might require autoclaving).

      The initial reports of a NOX DH-domain-only construct (that of human Nox4) producing superoxide are cited in the text. Moreover, natural flavin reductases are known to produce superoxide due to the release of free reduced flavin in the medium.

      As explained above, FAD in full-length SpNOX is a relay for electrons from NADPH to the hemes; it is internal to the protein and thus devoted to this specific task.

      In the case of SpNOX DH, its flavin reductase behavior leads to the release in the medium of free reduced flavin as a nonspecific diffusible electron carrier. It has been already demonstrated that such free reduced flavin can efficiently reduce soluble O2 and be a source of superoxide.

      This has been particularly well documented in Gaudu et al (1994, J. Biol. Chem.). We have added this reference to the text (see the modified sentence in our reply two comments above).

      Furthermore, we would like to point out to the referee that the link between flavin and superoxide production here is not based only on inhibition by SOD. When we added the flavin inhibitor DPI, superoxide production by the DH domain was abolished (Figure 2C). This supports a role for free reduced flavin both in superoxide production and in part of the direct Cyt c reduction observed.

      An unasked and unanswered question is that, since under aerobic conditions, both direct Cyt c reduction (60%) and superoxide production (40%) occur, what are the electron paths responsible for the two phenomena occurring simultaneously?

      We thank the referee for their dedication to a clear understanding of the mechanism used by the SpNOXDH construct; it pushed us to develop a clear description of this mechanism for readers. Please find below a proposed mechanism describing electron transfer from NAD(P)H to free flavin, which, as a diffusible species, can then non-specifically reduce either O2 or Cyt c.

      Author response image 1.

      However, it is important to remember that this situation is not physiological; rather, it results from using a DH domain isolated from the TM domain of SpNOX. Nonetheless, it shows that the DH domain is fully functional for NAD(P)H binding and hydride transfer.

      This reviewer had difficulty in following the argument that the fact that the kcat of SpNOX and SpNOXDH are similar supports the thesis that the rate of enzyme activation is dependent on hydride transfer from nicotinamide to FAD.

      We have amended the text to clarify this point. If the reaction rate is not affected by the presence or absence of the hemes in the TM domain, this inevitably implies that the rate is NOT limited by the electron transfer to the heme, and ultimately to O2, from the FAD, and thus the hydride transfer step that oxidizes the FAD must be the rate limiting step.
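      The logic can be illustrated with a simplified kinetic sketch. It assumes, as an approximation not stated in the manuscript, that under saturating substrate the catalytic cycle behaves as a closed sequence of irreversible first-order steps, with k_hyd for hydride transfer from NADPH to FAD and k_j for the downstream steps (electron transfer via the hemes to O2 in full-length SpNOX; reoxidation or release of the flavin in SpNOXDH). These symbols are introduced here for illustration only.

```latex
% Simplified model (assumption): closed cycle of irreversible first-order steps
\[
\frac{1}{k_{\mathrm{cat}}} = \frac{1}{k_{\mathrm{hyd}}} + \sum_{j} \frac{1}{k_j}
\qquad\Longrightarrow\qquad
k_{\mathrm{cat}} \approx k_{\mathrm{hyd}}
\quad \text{when } k_{\mathrm{hyd}} \ll k_j \text{ for all } j
\]
```

      In this model, kcat is insensitive to the downstream steps only when they are much faster than hydride transfer, so similar kcat values for SpNOX and SpNOXDH, whose downstream steps differ, are consistent with hydride transfer being rate limiting.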

      The section dealing with mutating F397 is a key part of the paper. There is a proper reference to the work of the Karplus group on plant FNRs (Deng et al). However, later work, addressing comparison with NOX2, should be cited (Kean et al., FEBS J., 284, 3302-3319, 2017). Also, work from the Dinauer group on the minimal effect of mutating or deleting the C-terminal F570 in NOX2 on superoxide production should be cited (Zhen et al., J. Biol. Chem. 273, 6575-6581, 1998).

      We thank the reviewer for pointing out our unintended omission of these important works; we have amended the text and added the citations.

      It is not clear why mutating F397 to W (both residues having aromatic side chains) would stabilize FAD binding.

      In brief, the double ring of the Trp indole can establish larger and stronger van der Waals contacts with the isoalloxazine ring than the Phe side chain. We discuss this point extensively in the structural section, where we compare the structures with F and W at this position. We therefore do not think it is necessary to add anything to the text.

      Also, what is meant by "locking the two subdomains of the DH domain"? What subdomains are meant?

      The two subdomains are the NADPH-binding domain and the FAD-binding domain, which we define on p 11 (“SpNOXDH presents a typical fold of the FNR superfamily of reductase domain containing two sub-domains, the FAD-binding domain (FBD) and an NADPH-binding domain (NBD) “) and which are labeled in Fig. 4. By “locking” we meant to convey immobilizing them into a specific conformation; we have amended the text to clarify this point.

      Methodological details on crystallization (p. 11) should be delegated to the Methodology section. How many readers are aware that SAD means "Single Wavelength Anomalous Diffraction" or know what is the role of sodium bromide?

      We have amended the text to emphasize the intended point, which is the different origins of the two DH structures: the de novo structure was made possible by co-crystallization with bromide, and the molecular replacement structure used the de novo structure as a model.

      The data on the structure of SpNOX are supportive of a model of Nox activation that is "dissident" relative to the models offered for DUOX and NOX2 activation. These latter models suggested that the movement of the DH domain versus the TM domain was related to conversion from the resting to the activated state. The findings reported in this paper show that, unexpectedly, the domain orientation in SpNOX (constitutively active!) is much closer to that of resting NOX2. One of the criteria associated with the activated state in Noxes was the reduction of the distance between FAD and the proximal heme. The authors report that, paradoxically, this distance is larger in the constitutively active SpNOX (9.2 Å) than that in resting state NOX2 (7.6 Å) and the distance in Ca2+-activated DUOX is even larger (10.2 Å).

      A point made by the authors is the questioning of the paradigm that activation of Noxes requires DH domain motion.

      Instead, the authors introduce the term "tensing", within the DH domain, from a "relaxed" to a more rigid conformation. I believe that this proposal requires a somewhat clearer elaboration.

      It is clear that the distance between the FAD and NADPH shown in the DUOX and NOX2 structures is too large for the chemical reaction of hydride transfer. Wu et al. used the terms ‘tense’ and ‘relaxed’ to describe conformations of the DH domain corresponding to ‘short distance’ and ‘longer distance’, respectively, between the two ligand binding sites. We adopted this terminology and have amended the text to clarify that we envision a motion of the NBD relative to the FBD, as distinct from a larger motion of the whole DH domain relative to the TM domain.

      The statement on p. 18, in connection to the phospholipid environment of Noxes, that the structure of SpNOX was "solved in detergent" is puzzling since the method of SpNOX preparation and purification does not mention the use of a detergent. As mentioned before, this absence of detergent in the present report was surprising because LMNG was used in the methods described in the mBio and Biophys. J. papers. The only mention of LMNG in the present paper was as an addition at a concentration of 0.003% in the activity assay buffers.

      Please see our response to similar points above. Detergent was present for the solubilization of the full-length SpNOX.

      The Conclusions section contains a proposal for the mechanism of conversion of NOX2 from the resting to the activated state. The inclusion of this discussion is welcome but the structural information on the constitutively active SpNOX can, unfortunately, contribute little to solving this important problem. The work of the Lambeth group, back in 1999 (cited as Nisimoto et al.), on the role of p67-phox in regulating hydride transfer from NADPH to FAD in NOX2 may indeed turn out to have been prophetic. However, only solving the structure of the assembled NOX2 complex will provide the much-awaited answer. The heterodimerization of NOX2 with p22-phox, the regulation of NOX2 by four cytosolic components, and the still present uncertainty about whether p67-phox is indeed the final distal component that converts NOX2 to the activated state make this a formidable task.

      The work of the Fieschi group on SpNOX is important and relevant but the absence of external regulation, the absence of p22-phox, and the uncertainty about the target molecule make it a rather questionable model for eukaryotic Noxes. The information on the role of the C-terminal Phe is of special value although its extension to the mechanism of eukaryotic Nox activation proved, so far, to be elusive.

      We sincerely thank the referee for the positive comments on our work and for the deep interest shown in this careful evaluation.

      We understand the arguments of the referee regarding the relevance of our work here to eukaryotic NOX, but we do not share the reservations expressed. While human NOXes need interactions with other proteins or have EF-hand or other domains that control them, SpNOX corresponds exactly to the minimal core common to any NOX isoform. In fact, because SpNOX has only this conserved core, it is unique in that it can work as a constitutively active NOX without protein-protein interactions or regulatory domains. Thus the fundamentals of electron transfer mechanisms of NOX enzyme are present in SpNOX.

      There might be some differences in the internal organization from isoform to isoform (as regarding the relative DH domain vs TM domain orientation) but considering the similarity between NOX2 and SpNOX topology we are rather confident that the SpNOX structure will turn out to be a reasonable model of the activated NOX2 structure. History will tell.

      In any case, this work on SpNOX allowed us to highlight hydride transfer as the limiting step and also to highlight some structural differences that could be at the source of the regulation in eukaryotic NOX. In itself, we think this is a significant contribution to the field.

      We warmly thank both referees for their constructive remarks and their help in the improvement of this manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • The manuscript states that the flavin "behaves" like a co-substrate and thereby reports on the Km for the flavins. I feel that this terminology might be confusing. The flavin is unchanged after the reaction, and what matters is the enzyme's affinity for the flavin and the flavin concentration needed to saturate the enzyme (to have it in the fully holo form).

      See above: in answering many questions from Referee #2, we have commented extensively on this point (substrate, cofactor, affinity, etc.) and made some adjustments in the text to clarify it. We hope it is now satisfactory.

      • I could not find the methodological description of the experiments performed to measure the Km for the flavins, and the legend of Figure S4 does not help in this regard. I think that the data (left panels of S4) should be interpreted as binding curves with associated Kd values.

      We have changed the text to clarify the method used to measure Km for flavins.

      • A related point is that the manuscript refers to Km as an "affinity". This is inappropriate and should be avoided, as the Km is not the Kd.

      We agree with the referee that the Km is not the Kd. However, under appropriate conditions, to which our experiments conform, Km is accepted as a relevant approximation of affinity (Srinivasan, FEBS Journal, 289, 6086-6098, 2022). We have added a sentence to clarify this point and cite this reference in the text.
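      For reference, the textbook Briggs-Haldane steady-state treatment of a simple one-substrate Michaelis-Menten scheme shows exactly how $K_m$ and $K_d$ differ. This is a generic sketch. Treating the flavin as the varied ligand in such a scheme is an assumption, and this is not a derivation specific to SpNOX.

```latex
% Scheme: E + S <=> ES (binding k_1, dissociation k_{-1}); ES -> E + P (k_cat)
\begin{align}
  K_d &= \frac{k_{-1}}{k_1}, \\
  K_m &= \frac{k_{-1} + k_{\mathrm{cat}}}{k_1} = K_d + \frac{k_{\mathrm{cat}}}{k_1}, \\
  K_m &\approx K_d \quad \text{when } k_{\mathrm{cat}} \ll k_{-1}.
\end{align}
```

      In this scheme, $K_m$ exceeds $K_d$ by $k_{\mathrm{cat}}/k_1$. It approaches $K_d$ only when the chemical step is slow compared with ligand dissociation.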

      • The environment around the putative oxygen site should be shown. The text indicates that "the residues characteristic of the O2 reducing center in eukaryotic FRD domains of NOX and DUOX enzymes are not conserved in SpNOX." How does the site look? This point relates to the more general comment above on the oxidizing substrate used by this bacterial NOX.

      This is a really interesting point that opens many potential biological developments for future studies of this prokaryotic family of NOX enzymes. While we were submitting this work to eLife for evaluation, another group (Murphy's lab) posted a preprint on bioRxiv in which they also solved the structure of SpNOX, this time by cryo-EM, at an unexpectedly high resolution for such a small protein (their paper is not yet published but is probably under peer review). In their work, they made a special effort to identify the O2-reducing center (bacterial NOX sequences alignment, mutation studies, …) but were not able to localize such a site with accuracy. Their work and ours also contain other complementary data. We will therefore add a paragraph at the end of the discussion to comment on this parallel work and to emphasize the complementarity of the two studies and what it brings to the overall understanding of this enzyme.

      • The section "A Close-up View of NOX's NAD(P)H Binding Domains vs the FNR Gold Standard" should be clarified.

      I found it difficult to understand. Is the different conformation of Phe397 creating the crevice? Could NADPH be modeled in NOX2 and DUOX in the same conformation observed in FNR and modeled in the bacterial NOX? Or would there be clashes, implying the necessity of larger conformational changes to bring the nicotinamide closer to the FAD?

      Please see responses above on this point; we have amended the text to clarify. In brief, we propose that activation in the eukaryotic enzymes would entail moving the NBD subdomain (containing the NADPH site) towards the FBD subdomain (containing the FAD) through an internal rearrangement within the DH domain. In doing so, they would approach the DH domain topology of SpNOX, which models an active state.

      Reviewer #2 (Recommendations For The Authors):

      On p. 6, second line, it should be (Figure 1C and 1D). Space is missing between C and "and".

      On p. 9, in Figure 3, the labeling A and B are missing. Also, the legend of part B does not correspond to the actual graph colors. Thus, the tracing of F397W is red and not grey as indicated in the legend.

      Corrected. Thank you.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      In this work, the authors examine the activity and function of D1 and D2 MSNs in dorsomedial striatum (DMS) during an interval timing task. In this task, animals must first nose poke into a cued port on the left or right; if not rewarded after 6 seconds, they must switch to the other port. Critically, this task thus requires animals to estimate whether at least 6 seconds have passed after the first nose poke; this is the key aspect of the task focused on here. After verifying that animals reliably estimate the passage of 6 seconds by leaving on average after 9 seconds, the authors examine striatal activity during this interval. They report that D1-MSNs tend to decrease activity, while D2-MSNs increase activity, throughout this interval. They suggest that this activity follows a drift-diffusion model, in which activity increases (or decreases) to a threshold after which a decision (to leave) is made. The authors next report that optogenetically inhibiting D1 or D2 MSNs, or pharmacologically blocking D1 and D2 receptors, increased the average wait time of the animals to 10 seconds. This suggests that both D1 and D2 neurons contribute to the estimate of time, with a decrease in their activity corresponding to a decrease in the rate of 'drift' in their drift-diffusion model. Lastly, the authors examine MSN activity while pharmacologically inhibiting D1 or D2 receptors. The authors observe that most recorded MSNs decrease their activity over the interval, with the rate of this decrease reduced by D1/D2 receptor inhibition.

      Major strengths: 

      The study employs a wide range of techniques - including animal behavioral training, electrophysiology, optogenetic manipulation, pharmacological manipulations, and computational modeling. The behavioral task used by the authors is quite interesting and a nice way to probe interval timing in rodents. The question posed by the authors - how striatal activity contributes to interval timing - is of importance to the field and has been the focus of many studies and labs; thus, this paper can meaningfully contribute to that conversation. The data within the paper is presented very clearly, and the authors have done a nice job presenting the data in a transparent manner (e.g., showing individual cells and animals). Overall, the manuscript is relatively easy to read and clear, with sufficient detail given in most places regarding the experimental paradigm or analyses used. 

      We are glad our main points came through to the reviewer.  

      Major weaknesses: 

      I perceive two major weaknesses. The first is the impact or contextualization of their results in terms of the results of the field more broadly. More specifically, it was not clear to me how the authors are interpreting the striatal activity in the context of what others have observed during interval timing tasks. In other words - what was the hypothesis going into this experiment? Does observing increasing/decreasing activity in D2 versus D1 support one model of interval timing over another, or does it further support a more specific idea of how DMS contributes to interval timing? Or was the main question that we didn't know if D2 or D1 neurons had differential activity during interval timing? 

      This is a helpful comment. Our hypothesis was that D1 and D2 MSNs have similar patterns of activity. Our rationale was prior behavioral work from our group showing that blocking striatal D1 and D2 dopamine receptors had similar behavioral effects on interval timing (De Corte et al., 2019; Stutt et al., 2023). We rewrote our introduction with this idea in mind (Line 89):

      “We and others have found that striatal MSNs encode time across multiple intervals by time-dependent ramping activity or monotonic changes in firing rate across a temporal interval (Emmons et al., 2017; Gouvea et al., 2015; Mello et al., 2015; Wang et al., 2018). However, the respective roles of D2-MSNs and D1-MSNs are unknown. Past work has shown that disrupting either D2-dopamine receptors (D2) or D1-dopamine receptors (D1) powerfully impairs interval timing by increasing estimates of elapsed time (Drew et al., 2007; Meck, 2006). Similar behavioral effects were found with systemic (Stutt et al., 2024) or local dorsomedial striatal D2 or D1 disruption (De Corte et al., 2019a). These data lead to the hypothesis that D2 MSNs and D1 MSNs have similar patterns of ramping activity across a temporal interval. 

      We tested this hypothesis with a combination of optogenetics, neuronal ensemble recording, computational modeling, and behavioral pharmacology. We use a well-described mouse-optimized interval timing task (Balci et al., 2008; Bruce et al., 2021; Larson et al., 2022; Stutt et al., 2024; Tosun et al., 2016; Weber et al., 2023). Strikingly, optogenetic tagging of D2-MSNs and D1-MSNs revealed distinct neuronal dynamics, with D2-MSNs tending to increase firing over an interval and D1-MSNs tending to decrease firing over the same interval, similar to opposing movement dynamics (Cruz et al., 2022; Kravitz et al., 2010; Tecuapetla et al., 2016). MSN dynamics helped construct and constrain a four-parameter drift-diffusion computational model of interval timing, which predicted that disrupting either D2-MSNs or D1-MSNs would increase interval timing response times. Accordingly, we found that optogenetic inhibition of either D2-MSNs or D1-MSNs increased interval timing response times. Furthermore, pharmacological blockade of either D2- or D1-receptors also increased response times and degraded trial-by-trial temporal decoding from MSN ensembles. Thus, D2-MSNs and D1-MSNs have opposing temporal dynamics yet disrupting either MSN type produced similar effects on behavior. These data demonstrate how striatal pathways play complementary roles in elementary cognitive operations and are highly relevant for understanding the pathophysiology of human diseases and therapies targeting the striatum.”

      Second, I felt that some of the conclusions suggested by the authors are not entirely supported by the data they present, or that the data presented suggest a slightly more complicated story. Below I provide additional detail on some of these instances.

      Regarding the results presented in Figures 2 and 3: 

      I am not sure the PC analysis adds much to the interpretation, and potentially unnecessarily complicates things. In particular, running PCA on a matrix of noisy data that is smoothed with a Gaussian will often return PCs similar to what is observed by the authors, with the first PC being a line up/down, the 2nd PC being a parabola that is up/down, etc. Thus, I'm not sure that there is much to be interpreted by the specific shape of the PCs here. 

      We are glad the reviewer raised this point. First, regarding the components in noisy data, what the reviewer says is correct, but the variance explained by PC1 in such data is usually small. This is the reason we include scree plots in our PC analysis (Fig 3B and Fig 6G). When we compare our PC1s to the variance explained in random data, our PC1 variance is always higher. We have now included this in our manuscript:

      We generated random data and examined how much variance PC1 explained in such data.

      We added this to the methods (Line 634):

      “The variance of PC1 was empirically compared against data generated from 1000 iterations of data from random timestamps with identical bins and kernel density estimates. Average plots were shown with Gaussian smoothing for plotting purposes only.”

      These data suggested that our PC1 was stronger than that observed in random data (Line 183):

      “PCA identified time-dependent ramping activity as PC1 (Fig 3A), a key temporal signal that explained 54% of variance among tagged MSNs (Fig 3B; variance for PC1 p = 0.009 vs 46 (44-49)% variance for PC1 derived from random data; Narayanan, 2016).”
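      The random-data comparison described above can be sketched as follows. This is a minimal illustration, not the authors' code, and it rests on several assumptions. PCA is run on a neurons × time-bins matrix with z-scored rows, so that the PCs are temporal patterns. Each neuron in the null data keeps its spike count, but its timestamps are drawn uniformly over the 6-second window and smoothed with a Gaussian kernel of hypothetical 0.5 s bandwidth. The p-value is a one-sided empirical one. All function names and parameter values are hypothetical.

```python
import numpy as np

def pc1_variance(M):
    """Fraction of variance explained by PC1 of a (neurons x time-bins) matrix.

    Rows are z-scored (as for z-scored PETHs); columns are then centered across
    neurons, so the principal components are temporal patterns such as ramps.
    """
    Z = (M - M.mean(axis=1, keepdims=True)) / (M.std(axis=1, keepdims=True) + 1e-12)
    Z = Z - Z.mean(axis=0)
    s = np.linalg.svd(Z, compute_uv=False)
    return s[0] ** 2 / np.sum(s ** 2)

def gaussian_rate(spike_times, centers, bw=0.5):
    """Gaussian kernel density estimate of firing rate at the bin centers (bw in s)."""
    d = centers[:, None] - spike_times[None, :]
    return np.exp(-0.5 * (d / bw) ** 2).sum(axis=1) / (bw * np.sqrt(2 * np.pi))

def pc1_null(spike_counts, centers, t0=0.0, t1=6.0, n_iter=1000, bw=0.5, seed=0):
    """PC1 variance of random data: same spike count per neuron, uniform random
    timestamps over [t0, t1], same bins and same smoothing kernel."""
    rng = np.random.default_rng(seed)
    null = np.empty(n_iter)
    for i in range(n_iter):
        M = np.array([gaussian_rate(rng.uniform(t0, t1, n), centers, bw)
                      for n in spike_counts])
        null[i] = pc1_variance(M)
    return null

# Toy example (synthetic, not recorded data): 20 up-ramping and 20 down-ramping units.
centers = np.linspace(0.0, 6.0, 61)
observed = np.vstack([np.tile(centers, (20, 1)), np.tile(6.0 - centers, (20, 1))])
obs_var = pc1_variance(observed)
null = pc1_null(spike_counts=[30] * 40, centers=centers, n_iter=200)
p_value = (1 + np.sum(null >= obs_var)) / (1 + null.size)  # one-sided empirical p
```

      Keeping each neuron's spike count fixed keeps the null matched in overall firing. Only the temporal structure is randomized, and that structure is what PC1 is meant to capture.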

      And in the pharmacology data (Line 367):

      “The first component (PC1), which explained 54% of neuronal variance, exhibited “time-dependent ramping”, or monotonic changes over the 6 second interval immediately after trial start (Fig 6F-G; variance for PC1 p = 0.001 vs 46 (45-47)% variance in random data; Narayanan, 2016).”

      Second, we note that we have used this analysis extensively in the past, and PC1 has always been identified as a linear ramping in our work and in work by others (Line 179):

      “Work by our group and others has uniformly identified PC1 as a linear component among corticostriatal neuronal ensembles during interval timing (Bruce et al., 2021; Emmons et al., 2020, 2019, 2017; Kim et al., 2017a; Narayanan et al., 2013; Narayanan and Laubach, 2009; Parker et al., 2014; Wang et al., 2018).”

      Third, we find that PC1 is highly correlated to the GLM slope (Line 205):

      “Trial-by-trial GLM slope was correlated with PC1 scores in Fig 3A-C (PC1 scores vs. GLM slope r = -0.60, p = 10⁻⁸).”

      Fourth, our goal was not to heavily interpret PC1 – but to compare D1 vs. D2 MSNs, or compare population responses to D2/D1 pharmacology. We have now made this clear in introducing PCA analyses in the results (Line 177):

      “To quantify differences in D2-MSNs vs D1-MSNs, we turned to principal component analysis (PCA), a data-driven tool to capture the diversity of neuronal activity (Kim et al., 2017a).”

      Finally, despite these arguments, the reviewer’s point is well taken. Accordingly, we have removed from the manuscript all analyses of PC2, which may have been overly interpretative.

      We have now removed language that interpreted the components, and the discussion of PC1 is now much more data-driven. We have also removed much of the advanced PC analysis in Figure S9. Given our extensive past work using this exact analysis of PC1, and its comparison against random data as the reviewer suggested, we think PCA adds considerably to our manuscript.

      I think an alternative analysis that might be both easier and more informative is to compute the slope of the activity of each neuron across the 6 seconds. This would allow the authors to quantify how many neurons increase or decrease their activity much like what is shown in Figure 2.  

      We agree – we now do exactly this analysis in Figure 3D. We now clarify this in detail, using the reviewer’s language to the methods (Line 648):

      “To measure time-related ramping over the first 6 seconds of the interval, we used trial-by-trial generalized linear models (GLMs) at the individual neuron level in which the response variable was firing rate and the predictor variable was time in the interval or nosepoke rate (Shimazaki and Shinomoto, 2007). For each neuron, its time-related “ramping” slope was derived from the GLM fit of firing rate vs time in the interval, for all trials per neuron. All GLMs were run at a trial-by-trial level to avoid effects of trial averaging (Latimer et al., 2015) as in our past work (Bruce et al., 2021; Emmons et al., 2017; Kim et al., 2017b).”
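      A per-neuron ramping slope of this kind can be sketched as below. The sketch assumes a Gaussian GLM with identity link, which reduces to ordinary least squares. It fits every trial's bins directly, without trial averaging, and uses an optional nosepoke-rate covariate for movement. The function name, bin layout, and toy numbers are hypothetical.

```python
import numpy as np

def ramping_slope(rates, t, nosepokes=None):
    """Time slope of one neuron from a Gaussian GLM (identity link = least squares)
    fit to all trials' bins directly, i.e. without averaging across trials.

    rates:     (n_trials, n_bins) firing rate in each time bin of each trial
    t:         (n_bins,) time in the interval at each bin center (s)
    nosepokes: optional (n_trials, n_bins) nosepoke rate, a movement covariate
    """
    n_trials, n_bins = rates.shape
    columns = [np.ones(n_trials * n_bins), np.tile(t, n_trials)]  # intercept, time
    if nosepokes is not None:
        columns.append(np.asarray(nosepokes, dtype=float).ravel())
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, rates.ravel(), rcond=None)
    return coef[1]  # change in spikes/s per second of elapsed time

# Toy usage (synthetic): a unit whose rate falls by 0.2 spikes/s per second.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 6.0, 25)
pokes = rng.poisson(1.0, size=(40, t.size)).astype(float)
rates = 5.0 - 0.2 * t + 0.5 * pokes          # noise-free so the slope is exact
slope = ramping_slope(rates, t, nosepokes=pokes)
```

      Because each row of `rates` is a separate trial, the fit uses trial-to-trial variability directly. Averaging first would hide that variability.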

      And to the results (Line 194):

      “To interrogate these dynamics at a trial-by-trial level, we calculated the linear slope of D2-MSN and D1-MSN activity over the first 6 seconds of each trial using generalized linear modeling (GLM) of effects of time in the interval vs trial-by-trial firing rate (Latimer et al., 2015).”

      Relatedly, it seems that the data shown in Figure 2D *doesn't* support the authors' main claim regarding D2/D1 MSNs increasing/decreasing their activity, as the trial-by-trial slope is near 0 for both cell types. 

      This likely refers to Figure 3D. The reviewer is correct that the slopes are small and near 0. Our goal was to show that D2-MSN and D1-MSN slopes were distinct, rather than that one increased and the other decreased. We have added this to the abstract (Line 46):

      “We found that D2-MSNs and D1-MSNs exhibited distinct dynamics over temporal intervals as quantified by principal component analyses and trial-by-trial generalized linear models.”

      We have clarified this idea in our hypothesis (Line 96):

      “These data led to the hypothesis that D2 MSNs and D1 MSNs have similar patterns of ramping activity across a temporal interval.”

      We have added this idea to the results (Line 194):

      “To interrogate these dynamics at a trial-by-trial level, we calculated the linear slope of D2-MSN and D1-MSN activity over the first 6 seconds of each trial using generalized linear modeling (GLM) of effects of time in the interval vs trial-by-trial firing rate (Latimer et al., 2015). Nosepokes were included as a regressor for movement. GLM analysis also demonstrated that D2-MSNs had significantly different slopes (-0.01 spikes/second (-0.10 – 0.10)), which were distinct from D1-MSNs (-0.20 (-0.47 – -0.06); Fig 3D; F = 8.9, p = 0.004 accounting for variance between mice (Fig S3B); Cohen’s d = 0.8; power = 0.98; no reliable effect of sex (F = 0.02, p = 0.88) or switching direction (F = 1.72, p = 0.19)). We found that D2-MSNs and D1-MSNs had significantly different slopes even when excluding outliers (4 outliers excluded outside of 95% confidence intervals; F = 7.51, p = 0.008 accounting for variance between mice) and when the interval was defined as the time between trial start and the switch response on a trial-by-trial basis for each neuron (F = 4.3, p = 0.04 accounting for variance between mice). Trial-by-trial GLM slope was correlated with PC1 scores in Fig 3A-C (PC1 scores vs. GLM slope r = -0.60, p = 10⁻⁸). These data demonstrate that D2-MSNs and D1-MSNs had distinct slopes of firing rate across the interval and were consistent with analyses of average activity and PC1, which exhibited time-related ramping.”

      And Line 215:

      “In summary, we used optogenetic tagging to record from D2-MSNs and D1-MSNs during interval timing. Analyses of average activity, PC1, and trial-by-trial firing-rate slopes over the interval provide convergent evidence that D2-MSNs and D1-MSNs had distinct and opposing dynamics during interval timing. These data provide insight into temporal processing by striatal MSNs.”

      And in the discussion (Line 415):

      “We describe how striatal MSNs work together in complementary ways to encode an elementary cognitive process, interval timing. Strikingly, optogenetic tagging showed that D2-MSNs and D1-MSNs had distinct dynamics during interval timing.”

      We have now included new box plots to make the differences in Figure 3D clear.

      Other reviewers requested additional qualitative descriptions of our data, and we have referred to increases / decreases in this context. 

      Regarding the results in Figure 4: 

      The authors suggest that their data is consistent with a drift-diffusion model. However, it is unclear how well the output from the model fits the activity from neurons the authors recorded. Relatedly, it is unclear how the parameters were chosen for the D1/D2 versions of this model. I think that an alternate approach that would answer these questions is to fit the model to each cell, and then examine the best-fit parameters, as well as the ability of the model to predict activity on trials held out from the fitting process. This would provide a more rigorous method to identify the best parameters and would directly quantify how well the model captures the data. 

      We are glad the reviewer raised these points. Our goal was to use neuronal activity to fit behavioral activity, not the reverse. While we understand the reviewer’s point, we note that one behavioral output (switch time) can be encoded by many patterns of neuronal activity; thus, we are not sure we can use the model developed for behavior to fit diverse neuronal activity, or an ensemble of neurons. We have made this clear in the manuscript (Line 251):

      “Our model aimed to fit statistical properties of mouse behavioral responses while incorporating MSN network dynamics. However, the model does not attempt to fit individual neurons’ activity, because our model predicts a single behavioral parameter – switch time – that can be caused by the aggregation of diverse neuronal activity.”

      To approximate what the reviewer suggested, we attempted to predict behavior directly from neuronal ensembles. We have now made this clear in the methods (Line 682):

      “Analysis and modeling of mouse MSN-ensemble recordings. Our preliminary analysis found that, for a sufficiently large number of neurons ($N > 11$), each recorded ensemble of MSNs on a trial-by-trial basis could predict when mice would respond. We took the following approach: First, for each MSN, we convolved its trial-by-trial spike train $Spk_j(t)$ with a 1-second exponential kernel $K(t) = w\,e^{-t/w}$ if $t > 0$ and $K(t) = 0$ if $t \le 0$ (Zhou et al., 2018; here $w = 1$ s). Therefore, the smoothed, convolved spiking activity of neuron $j$ ($j = 1, 2, \ldots, N$),

      $$x_j(t) = \int_{-\infty}^{t} K(t - s)\, Spk_j(s)\, ds,$$

      tracks and accumulates the most recent (one second, on average) firing-rate history of the $j$-th MSN, up to moment $t$. We hypothesized that the ensemble activity $(x_1(t), x_2(t), \ldots, x_N(t))$, weighted with some weights $\beta_j$, could predict the trial switch time $t^*$ by considering the sum

      $$x(t) = \sum_{j=1}^{N} \beta_j\, x_j(t)$$

      and the sigmoid

      $$\frac{1}{1 + e^{-(x(t) - 0.5)/k}} = \frac{1}{1 + e^{-\left(\hat\beta_0 + \sum_{j} \hat\beta_j x_j(t)\right)}}$$

      that approximates the firing rate of an output unit. Here the parameter $k$ indicates how fast $x(t)$ crosses the threshold 0.5, coming from below (if $k > 0$) or from above (if $k < 0$), and relates the weights $\beta_j$ to the unknowns $\hat\beta_j = \beta_j/k$ and $\hat\beta_0 = -0.5/k$. Next, we ran a logistic fit for every trial for a given mouse over the spike-count predictor matrix $(x_1(t), x_2(t), \ldots, x_N(t))$ from the mouse's recorded MSN ensemble and the observed value $t^*$, estimating the coefficients $\hat\beta_0$ and $\hat\beta_j$ and so, implicitly, the weights $\beta_j$. From there, we computed the predicted switch time $t^*_{pred}$ from the condition $x(t) = 0.5$. Accuracy was quantified by comparing the predicted switch time with the observed switch time within a 1-second window on a trial-by-trial basis (Fig S4).”
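      The pipeline above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Spikes are binned, and the exponential kernel is applied causally with its prefactor dropped (equal to 1 when w = 1 s). Each time bin is labeled 0 before the observed t* and 1 from t* onward; this labeling scheme is an assumption. An L2-regularized logistic regression fit by gradient descent stands in for whatever solver the authors used. All function names are hypothetical.

```python
import numpy as np

def smooth_exp(spike_counts, dt, w=1.0, support=5.0):
    """Strictly causal convolution of binned spike counts with exp(-t/w) for t > 0.

    K(t) = 0 for t <= 0. The prefactor w is omitted (it equals 1 when w = 1 s).
    """
    lags = np.arange(int(round(support * w / dt))) * dt
    kernel = np.where(lags > 0, np.exp(-lags / w), 0.0)
    return np.convolve(spike_counts, kernel)[: len(spike_counts)]

def fit_logistic(X, y, lr=0.5, n_iter=5000, l2=1e-3):
    """L2-regularized logistic regression fit by gradient descent on standardized X.

    X: (n_bins, N) smoothed ensemble activity; y: 0 before t*, 1 from t* onward.
    """
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-12
    Z = (X - mu) / sd
    beta, b0 = np.zeros(Z.shape[1]), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(b0 + Z @ beta)))
        grad = p - y
        beta -= lr * (Z.T @ grad / len(y) + l2 * beta)
        b0 -= lr * grad.mean()
    return b0, beta, mu, sd

def predict_switch_time(X, t, params):
    """First time at which the fitted sigmoid reaches 0.5 (the x(t) = 0.5 condition)."""
    b0, beta, mu, sd = params
    p = 1.0 / (1.0 + np.exp(-(b0 + ((X - mu) / sd) @ beta)))
    hits = np.flatnonzero(p >= 0.5)
    return t[hits[0]] if hits.size else np.nan

# Toy usage (synthetic): one ramping unit on a 0.1 s grid, observed t* = 5.0 s.
t = np.arange(101) * 0.1
X = t[:, None]
y = (np.arange(101) >= 50).astype(float)
t_pred = predict_switch_time(X, t, fit_logistic(X, y))
```

      The fitted weights may be positive or negative. This lets the same sketch handle units that ramp up (crossing 0.5 from below) and units that ramp down, which mirrors the role of the sign of $k$ in the text.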

      And in the results (Line 254): 

      “We first analyzed trial-based aggregated activity of MSN recordings from each mouse ($x_j(t)$, where $j = 1, \ldots, N$ neurons). For D2-MSN or D1-MSN ensembles of $N > 11$, we found linear combinations of their neuronal activities, with some coefficients $\beta_j$,

      $$x(t) = \sum_{j=1}^{N} \beta_j\, x_j(t),$$

      that could predict the trial-by-trial switch response times (accuracy > 90%, Fig S4; compared with < 20% accuracy for Poisson-generated spikes of the same trial-average firing rate). The predicted switch time $t^*_{pred}$ was defined as the time when the weighted ensemble activity $x(t)$ first reached the value 0.5. Finally, we built DDMs to account for this opposing trend (increasing vs decreasing) of MSN dynamics and for the ensemble threshold behavior defining $t^*_{pred}$; see the resulting model (Equations 1-3) and its simulations (Figure 4A-B).”

      And we have added a new figure, Figure S4, that demonstrates these trial-by-trial predictions of switch response times.  

      Note that, similar to what the reviewer suggested, we have included predictions from shuffled data. Predictions are derived from the neuronal ensemble on that trial; thus, we could not apply a leave-one-out approach to trial-by-trial predictions.

      These models are highly predictive for larger ensembles and poorly predictive for smaller ensembles.  We think this model adds to the manuscript and we are glad the reviewer suggested it. 

      Relatedly, looking at the raw data in Figure 2, it seems that many neurons either fire at the beginning or end of the interval, with more neurons firing at the end, and more firing at the beginning, for D2/D1 neurons respectively. Thus, it's not clear to me whether the drift-diffusion model is a good model of activity. Or, perhaps the model is supposed to be related to the aggregate activity of all D1/D2 neurons? (If so, this should be made more explicit. The comment about fitting the model directly to the data also still stands).  

      Our model was inspired by the aggregate activity.  We have now made this clear in the results (Line 227): 

      “Our data demonstrate that D2-MSNs and D1-MSNs have opposite activity patterns. However, past computational models of interval timing have relied on drift-diffusion dynamics with a positive slope that accumulates evidence over time (Nguyen et al., 2020; Simen et al., 2011). To reconcile how these MSNs might complement to effect temporal control of action, we constructed a four-parameter drift-diffusion model (DDM). Our goal was to construct a DDM inspired by average differences in D2-MSNs and D1-MSNs that predicted switch-response time behavior.”

      Further, it's unclear to me how, or why, the authors changed the specific parameters they used to model the optogenetic manipulation. Were these parameters chosen because they fit the manipulation data? This I don't think is in itself an issue, but perhaps should be clearly stated, because otherwise it sounds a bit odd given the parameter changes are so specific. It is also not clear to me why the noise in the diffusion process would be expected to change with increased inhibition. 

      We have clarified that our parameters were chosen to best fit behavior (Line 266):

      “The model’s parameters were chosen to fit the distribution of switch-response times:

      $F = 1$, $b = 0.52$ (so $T = 0.87$), $D = 0.135$, $\sigma = 0.052$ for intact D2-MSNs (Fig 4A, in black); and $F = 0$, $b = 0.48$ (so $T = 0.13$), $D = 0.141$, $\sigma = 0.052$ for intact D1-MSNs (Fig 4B, in black).”

      Furthermore, we have clarified the approach to noise in the results (Line 247):  

      “The drift, together with noise $\xi(t)$ (of zero mean and strength $\sigma$), leads to fluctuating accumulation which eventually crosses a threshold $T$ (see Equation 3; Fig 4A-B).”

      And Line 279: 

      “The results were obtained by simultaneously decreasing the drift rate $D$ (equivalent to lengthening the neurons’ integration time constant) and lowering the level of network noise $\sigma$: $D = 0.129$, $\sigma = 0.043$ for D2-MSNs in Fig 4A (in red; changes in noise had to accompany changes in drift rate to preserve switch response time variance); and $D = 0.122$, $\sigma = 0.043$ for D1-MSNs in Fig 4B (in blue). The model predicted that disrupting either D2-MSNs or D1-MSNs would increase switch response times (Fig 4C and Fig 4D) and would shift MSN dynamics.”
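      The qualitative prediction (lower drift gives later switches) can be illustrated with a generic single-boundary drift-diffusion process, dx = D dt + σ dW, simulated by Euler-Maruyama. This is not the authors' four-parameter model. It omits F and b, and it assumes a start at x = 0 with the threshold T = 0.87. The D and σ values are borrowed from the D2-MSN parameter sets quoted above, and time is assumed to be in seconds. The function name and step size are hypothetical.

```python
import numpy as np

def simulate_switch_times(D, sigma, T, n_trials=2000, dt=0.01, t_max=30.0, seed=0):
    """Euler-Maruyama simulation of dx = D dt + sigma dW from x(0) = 0.

    Returns each trial's first-passage time to x >= T (NaN if never reached).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_trials)
    t_cross = np.full(n_trials, np.nan)
    for step in range(1, int(round(t_max / dt)) + 1):
        active = np.isnan(t_cross)           # trials that have not yet switched
        if not active.any():
            break
        x[active] += D * dt + sigma * np.sqrt(dt) * rng.standard_normal(active.sum())
        crossed = active & (x >= T)
        t_cross[crossed] = step * dt
    return t_cross

T_THRESH = 0.87  # borrowed from the intact D2-MSN set; start point 0 is assumed
intact = simulate_switch_times(D=0.135, sigma=0.052, T=T_THRESH)
inhibited = simulate_switch_times(D=0.129, sigma=0.043, T=T_THRESH)
# For this process the mean first-passage time is T/D, so lowering D lengthens it.
print(intact.mean(), inhibited.mean())
```

      For positive drift, the mean first-passage time of this process to a single barrier is T/D. Any decrease in D therefore shifts switch times later, consistent with the direction the model predicts for Fig 4C-D.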

      Regarding the results in Figure 6: 

      My comments regarding the interpretation of PCs in Figure 2 apply here as well. In addition, I am not sure that examining PC2 adds much here, given that the authors didn't examine such nonlinear changes earlier in the paper. 

      We agree and have removed PC2 for these reasons. We have also noted that the primary purpose of the PC1 analysis was to compare the effects of D2 and D1 blockade (Line 362):

      “We noticed differences in MSN activity across the interval with D2 blockade and D1 blockade at the individual MSN level (Fig 6B-D) as well as at the population level (Fig 6E). We used PCA to quantify effects of D2 blockade or D1 blockade (Bruce et al., 2021; Emmons et al., 2017; Kim et al., 2017a). We constructed principal components (PC) from z-scored peri-event time histograms of firing rate from saline, D2 blockade, and D1 blockade sessions for all mice together. The first component (PC1), which explained 54% of neuronal variance, exhibited “time-dependent ramping”, or monotonic changes over the 6 second interval immediately after trial start (Fig 6F-G; variance for PC1 p = 0.001 vs 46 (45-47)% variance in random data; Narayanan, 2016).”
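      A minimal sketch of this kind of computation, assuming a neurons × time-bins matrix of peri-event firing rates: each neuron's PETH is z-scored, PCA is computed by SVD, and PC1's variance explained is compared with a null built by shuffling each neuron's time bins independently. The shuffle null, array shapes, synthetic data, and helper names (`pca`, `shuffle_null`) are our assumptions; the random-data procedure actually used (Narayanan, 2016) may differ.

```python
import numpy as np


def zscore_rows(peth):
    """Z-score each neuron's PETH (one row per neuron) across time bins."""
    mu = peth.mean(axis=1, keepdims=True)
    sd = peth.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # avoid division by zero for flat rows
    return (peth - mu) / sd


def pca(peth):
    """Return (scores, components, variance_explained) for z-scored PETHs.

    Rows are neurons (observations) and columns are time bins, so each
    component is a time course and each neuron gets one score per component.
    """
    z = zscore_rows(peth)
    z = z - z.mean(axis=0, keepdims=True)  # center each time bin across neurons
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    var_explained = s**2 / np.sum(s**2)
    scores = u * s
    return scores, vt, var_explained


def shuffle_null(peth, n_shuffles=200, seed=0):
    """PC1 variance explained after independently shuffling each neuron's time bins."""
    rng = np.random.default_rng(seed)
    null = np.empty(n_shuffles)
    for i in range(n_shuffles):
        shuffled = np.array([rng.permutation(row) for row in peth])
        null[i] = pca(shuffled)[2][0]
    return null


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.linspace(0, 6, 60)  # illustrative 6-s interval in 60 bins
    signs = rng.choice([-1.0, 1.0], size=(99, 1))  # synthetic up- and down-ramping cells
    peth = signs * t + rng.normal(0, 2.0, size=(99, t.size))
    _, _, ve = pca(peth)
    null = shuffle_null(peth)
    p = (np.sum(null >= ve[0]) + 1) / (null.size + 1)
    print(f"PC1 explains {ve[0]:.0%}; shuffled median {np.median(null):.0%}; p = {p:.3f}")
```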

      As noted above, PC1 computed from random data does not explain this level of variance (46% vs 54%).

      We also reworked Figure 6 to make the effects of D2 and D1 blockade more apparent by moving the matched-sorting analysis to the main figure.

      A larger concern though that seems potentially at odds with the authors' interpretation is that there seems to be very little change in the firing pattern after D1 or D2 blockade. I see that in Figure 6F the authors suggest that many cells slope down (and thus, presumably, they are recording more D1 cells), and that this change in slope is decreased, but this effect is not apparent in Figure 6C, and Figure 6B shows an example of a cell that seems to fire in the opposite direction (increase activity). I think it would help to show some (more) individual examples that demonstrate the summary effect shown by the authors, and perhaps the authors can comment on the robustness (or the variability) of this result. 

      These are important suggestions. We changed our analysis to better capture the variability and main effects in the data. First, we now include three individual raster examples, exactly as the reviewer suggested.

      As the reviewer suggested, we compared variability across *all* MSNs. We sorted the same MSNs across saline, D2 blockade, and D1 blockade sessions, and we describe these sorting procedures in the methods (Line 618):

      “Single-unit recordings were made using a multi-electrode recording system (Open Ephys, Atlanta, GA). After the experiments, Plexon Offline Sorter (Plexon, Dallas, TX) was used to remove artifacts. Principal component analysis (PCA) and waveform shape were used for spike sorting. Single units were defined as those 1) having a consistent waveform shape, 2) being a separable cluster in PCA space, and 3) having a consistent refractory period of at least 2 milliseconds in interspike interval histograms. The same MSNs were sorted across saline, D2 blockade, and D1 blockade sessions by loading all sessions simultaneously in Offline Sorter and sorting them using the preceding criteria. MSNs had to have consistent firing in all sessions to be included. Sorting integrity across sessions was quantified by comparing waveform similarity via correlation coefficients between sessions.”

      To confirm that we were able to track neurons across sessions, we quantified waveform similarity (Line 353):

      “We analyzed 99 MSNs in sessions with saline, D2 blockade, and D1 blockade. We matched MSNs across sessions based on waveform and interspike intervals; waveforms were highly similar across sessions (correlation coefficient between matched MSN waveforms: saline vs D2 blockade r = 1.00 (0.99 – 1.00), rank sum vs correlations in unmatched waveforms p = 3 × 10⁻⁴⁴; saline vs D1 blockade r = 1.00 (1.00 – 1.00), rank sum vs correlations in unmatched waveforms p = 4 × 10⁻⁵⁰). There were no consistent changes in MSN average firing rate with D2 blockade or D1 blockade (F = 1.1, p = 0.30 accounting for variance between MSNs; saline: 5.2 (3.3 – 8.6) Hz; D2 blockade 5.1 (2.7 – 8.0) Hz; F = 2.2, p = 0.14; D1 blockade 4.9 (2.4 – 7.8) Hz).”
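      The matched-versus-unmatched waveform comparison can be sketched as follows, assuming each MSN is summarized by one mean waveform (a 1-D array of voltage samples) per session, with row i referring to the same sorted unit in both sessions. The helper name and synthetic data are hypothetical, and this is not the authors' MATLAB code; a rank-sum test (for example `scipy.stats.ranksums`) would then compare the two returned distributions.

```python
import numpy as np


def waveform_correlations(session_a, session_b):
    """Pearson correlations between mean waveforms from two sessions.

    session_a, session_b: arrays of shape (n_units, n_samples); row i in both
    arrays is assumed to be the same sorted unit.
    Returns (matched, unmatched): r for same-unit pairs and for all
    different-unit pairs.
    """
    n = session_a.shape[0]
    # np.corrcoef stacks the rows of both arrays; keep the A-by-B block
    r = np.corrcoef(session_a, session_b)[:n, n:]
    matched = np.diag(r)
    unmatched = r[~np.eye(n, dtype=bool)]
    return matched, unmatched


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    saline = rng.normal(size=(10, 40))  # synthetic stand-in for 10 units x 40 samples
    drug = saline + rng.normal(scale=0.1, size=saline.shape)  # same units, small drift
    matched, unmatched = waveform_correlations(saline, drug)
    print(f"matched median r = {np.median(matched):.3f}; unmatched median r = {np.median(unmatched):.3f}")
```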

      As noted above, this enabled us to compare activity for the same MSNs across sessions in a new Figure 6 (previously, this analysis had been in Figure S9) and to use PCA to quantify this variability.

      Because we tracked neurons across saline, D2 blockade, and D1 blockade sessions, readers can now see the full variability across MSNs. We added these data to the results (Line 362):  

      “We noticed differences in MSN activity across the interval with D2 blockade and D1 blockade at the individual MSN level (Fig 6B-D) as well as at the population level (Fig 6E). We used PCA to quantify effects of D2 blockade or D1 blockade (Bruce et al., 2021; Emmons et al., 2017; Kim et al., 2017a). We constructed principal components (PC) from z-scored peri-event time histograms of firing rate from saline, D2 blockade, and D1 blockade sessions for all mice together. The first component (PC1), which explained 54% of neuronal variance, exhibited “time-dependent ramping”, or monotonic changes over the 6 second interval immediately after trial start (Fig 6F-G; variance for PC1 p = 0.001 vs 46 (45-47)% variance in random data; Narayanan, 2016). Interestingly, PC1 scores shifted with D2 blockade (Fig 6F; PC1 scores for D2 blockade: -0.6 (-3.8 – 4.7) vs saline: -2.3 (-4.2 – 3.2), F = 5.1, p = 0.03 accounting for variance between MSNs; no reliable effect of sex (F = 0.2, p = 0.63) or switching direction (F = 2.8, p = 0.10)). PC1 scores also shifted with D1 blockade (Fig 6F; PC1 scores for D1 blockade: -0.0 (-3.9 – 4.5), F = 5.8, p = 0.02 accounting for variance between MSNs; no reliable effect of sex (F = 0.0, p = 0.93) or switching direction (F = 0.9, p = 0.34)). There were no reliable differences in PC1 scores between D2 and D1 blockade. Furthermore, PC1 was distinct even when sessions were sorted independently and assumed to be fully statistically independent (Figure S10; D2 blockade vs saline: F = 5.8, p = 0.02; D1 blockade vs saline: F = 4.9, p = 0.03; all analyses accounting for variance between mice). Higher components explained less variance and were not reliably different between saline and D2 blockade or D1 blockade. Taken together, this data-driven analysis shows that D2 and D1 blockade produced similar shifts in MSN population dynamics represented by PC1. 
When combined with the major contributions of D1/D2 MSNs to PC1 (Fig 3C) these findings indicate that pharmacological D2 blockade and D1 blockade disrupt ramping-related activity in the striatum.”

      Finally, we included the data in which sessions were sorted independently and assumed to be fully statistically independent in a new Figure S10.

      And in the results (Line 376): 

      “Furthermore, PC1 was distinct even when sessions were sorted independently and assumed to be fully statistically independent (Figure S10; D2 blockade vs saline: F = 5.8, p = 0.02; D1 blockade vs saline: F = 4.9, p = 0.03; all analyses accounting for variance between mice). Higher components explained less variance and were not reliably different between saline and D2 blockade or D1 blockade.”

      These changes strengthen the manuscript and better show the main effects and variability of the data. 

      Regarding the results in Figure 7: 

      I am overall a bit confused about what the authors are trying to claim here. In Figure 7, they present data suggesting that D1 or D2 blockade disrupts their ability to decode time in the interval of interest (0-6 seconds). However, in the final paragraph of the results, the authors seem to say that by using another technique, they didn't see any significant change in decoding accuracy after D1 or D2 blockade. What do the authors make of this? 

      We agree that this was unclear. The second classifier predicted response time; because this analysis was confusing, we removed it. 

      Impact: 

      The task and data presented by the authors are very intriguing, and there are many groups interested in how striatal activity contributes to the neural perception of time. The authors perform a wide variety of experiments and analysis to examine how DMS activity influences time perception during an interval-timing task, allowing for insight into this process. However, the significance of the key finding - that D2/D1 activity increases/decreases with time - remains somewhat ambiguous to me. This arises from a lack of clarity regarding the initial hypothesis and the implications of this finding for advancing our understanding of striatal functions. 

      As noted above, we clarified our hypothesis and implications, and strengthened several aspects of the data as suggested by this reviewer.  

      Reviewer #2 (Public Review): 

      Summary: 

      In the present study, the authors investigated the neural coding mechanisms for D1- and D2-expressing striatal direct and indirect pathway MSNs in interval timing by using multiple strategies. They concluded that D2-MSNs and D1-MSNs have opposing temporal dynamics yet disrupting either type produced similar effects on behavior, indicating the complementary roles of D1- and D2-MSNs in cognitive processing. However, the data are incomplete and do not fully support this major finding. One major reason is the heterogeneous responses within the D1- or D2-MSN populations. In addition, there are additional concerns about the statistical methods used. For example, the majority of the statistical tests are based on the number of neurons, but not the number of mice. It appears that the statistical difference was due to the large sample size they used (n=32 D2-MSNs and n=41 D1-MSNs), but different neurons recorded in the same mouse cannot be treated as independent samples; they should use independent mouse-based statistical analysis. 

      Strengths: 

      The authors used multiple approaches including behavioral training in awake mice, optogenetic-assisted cell-type-specific recording, optogenetic or pharmacological manipulation, neural computation, and modeling to study neuronal coding for interval timing. 

      We appreciate the reviewer’s careful read recognizing the breadth of our approach.  

      Weaknesses: 

      (1) More detailed behavior results should be shown, including the rate of successful switches and how long mice wait at the second nose poke to get a reward. For line 512 and the Figure 1 legend, the reviewer is not clear about the reward delivery. The methods appear to state that the mouse had to wait for 18s, then make nose pokes at the second port to get the reward. What happens if the mouse made the second nose poke before 18 seconds, but then exited? Would the mouse still get the reward at 18 seconds? Similarly, what happens if the mice made the third or more nosepokes within 18 seconds? It is important to clarify because, according to the method described, if the mice made a second nose poke before 18 seconds, this already counted as the mouse making the "switch." Lastly, what if the mice exited before 6s in the first nosepoke? 

      We completely agree. We have now revised Figure 1 to include many of these task details.

      We have clarified remaining details in the methods (Line 548):

      “Interval timing switch task. We used a mouse-optimized operant interval timing task described in detail previously (Balci et al., 2008; Bruce et al., 2021; Tosun et al., 2016; Weber et al., 2023). Briefly, mice were trained in sound-attenuating operant chambers, with two front nosepokes flanking either side of a food hopper on the front wall, and a third nosepoke located at the center of the back wall. The chamber was positioned below an 8-kHz, 72-dB speaker (Fig 1A; MedAssociates, St. Albans, VT). Mice were 85% food restricted and motivated with 20 mg sucrose pellets (BioServ, Flemington, NJ). Mice were initially trained to receive rewards during fixed ratio nosepoke response trials. Nosepoke entry and exit were captured by infrared beams. After shaping, mice were trained in the “switch” interval timing task. Mice self-initiated trials at the back nosepoke, after which tone and nosepoke lights were illuminated simultaneously. Cues were identical on all trial types and lasted the entire duration of the trial (6 or 18 seconds). On 50% of trials, mice were rewarded for a nosepoke after 6 seconds at the designated ‘first’ front nosepoke; these trials were not analyzed. On the remaining 50% of trials, mice were rewarded for nosepoking first at the ‘first’ nosepoke location and then switching to the ‘second’ nosepoke location; the reward was delivered for the initial nosepoke at the second nosepoke location after 18 seconds when preceded by a nosepoke at the first nosepoke location. Multiple nosepokes at each nosepoke were allowed. Early responses at the first or second nosepoke were not reinforced. Initial responses at the second nosepoke rather than the first nosepoke, alternating between nosepokes, and going back to the first nosepoke after the second nosepoke were rare after initial training. Error trials included trials where animals responded only at the first or second nosepoke and were also not reinforced. 
We did not analyze error trials as they were often too few to analyze; these were analyzed at length in our prior work (Bruce et al., 2021).
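      The reward rule for the 18-second (switch) trials, as described above, can be written as a small function. This is our reading of the prose rule, not the MedAssociates program used in the experiments; the event format (a list of `(time_s, port)` nosepoke entries with port labels `"first"` and `"second"`), the function name, and the use of "at or after" 18 seconds are hypothetical.

```python
from typing import List, Optional, Tuple

Event = Tuple[float, str]  # (seconds from trial start, port label), hypothetical format


def long_trial_reward_time(events: List[Event], criterion_s: float = 18.0) -> Optional[float]:
    """Return the time of the rewarded nosepoke on an 18-s trial, or None.

    Assumed rule: reward the first entry into the 'second' port at or after
    criterion_s, provided a 'first'-port entry occurred earlier in the trial.
    Earlier entries into either port are simply not reinforced.
    """
    visited_first = False
    for t, port in sorted(events):
        if port == "first":
            visited_first = True
        elif port == "second" and t >= criterion_s and visited_first:
            return t
    return None


if __name__ == "__main__":
    trial = [(1.2, "first"), (9.0, "second"), (18.4, "second")]
    print(long_trial_reward_time(trial))  # early second-port poke at 9.0 s is not reinforced
```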

      Switch response time was defined as the moment animals departed the first nosepoke before arriving at the second nosepoke. Critically, switch responses are a time-based decision guided by temporal control of action because mice switch nosepokes only if nosepokes at the first location did not receive a reward after 6 seconds. That is, mice estimate if more than 6 seconds have elapsed without receiving a reward to decide to switch responses. Mice learn this task quickly (3-4 weeks), and error trials in which an animal nosepokes in the wrong order or does not nosepoke are relatively rare and discarded. Consequently, we focused on these switch response times as the key metric for temporal control of action. Traversal time was defined as the duration between first nosepoke exit and second nosepoke entry and is distinct from switch response time when animals departed the first nosepoke. Nosepoke duration was defined as the time between first nosepoke entry and exit for the switch response times only. Trials were self-initiated, but there was an intertrial interval with a geometric mean of 30 seconds between trials.”
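      The three behavioral measures defined above can be computed from nosepoke entry and exit timestamps. The sketch assumes a hypothetical event format `(time_s, port, kind)` with `kind` in `{"in", "out"}`, and takes the switch to be the last exit from the first nosepoke before the first entry into the second nosepoke; how repeated pokes were handled in the original analysis may differ.

```python
from typing import List, Optional, Tuple

PokeEvent = Tuple[float, str, str]  # (seconds from trial start, port, "in" or "out")


def switch_measures(events: List[PokeEvent]) -> Optional[dict]:
    """Return switch response time, traversal time, and nosepoke duration.

    Returns None when the trial lacks a first-port exit followed by a
    second-port entry (for example, error trials).
    """
    events = sorted(events)
    # Time of the first entry into the second nosepoke
    second_in = next((t for t, port, kind in events if port == "second" and kind == "in"), None)
    if second_in is None:
        return None
    # Switch response: last exit from the first nosepoke before that entry
    exits = [t for t, port, kind in events if port == "first" and kind == "out" and t < second_in]
    if not exits:
        return None
    switch_t = exits[-1]
    # Entry that began the first-nosepoke poke ending at the switch
    entries = [t for t, port, kind in events if port == "first" and kind == "in" and t <= switch_t]
    if not entries:
        return None
    return {
        "switch_response_time": switch_t,
        "traversal_time": second_in - switch_t,
        "nosepoke_duration": switch_t - entries[-1],
    }


if __name__ == "__main__":
    trial = [(0.8, "first", "in"), (2.0, "first", "out"), (3.1, "first", "in"),
             (9.3, "first", "out"), (10.1, "second", "in"), (18.5, "second", "out")]
    print(switch_measures(trial))
```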

      And in the results on Line 131: 

      “We investigated cognitive processing in the striatum using a well-described mouse-optimized interval timing task which requires mice to respond by switching between two nosepokes after a 6-second interval (Fig 1A; see Methods; (Balci et al., 2008; Bruce et al., 2021; Larson et al., 2022; Tosun et al., 2016; Weber et al., 2023)). In this task, mice initiate trials by responding at a back nosepoke, which triggers auditory and visual cues for the duration of the trial. On 50% of trials, mice were rewarded for nosepoking after 6 seconds at the designated ‘first’ front nosepoke; these trials were not analyzed. On the remaining 50% of trials, mice were rewarded for nosepoking at the ‘first’ nosepoke and then switching to the ‘second’ nosepoke; initial nosepokes at the second nosepoke after 18 seconds triggered reward when preceded by a first nosepoke. The first nosepokes occurred before switching responses and the second nosepokes occurred much later in the interval in anticipation of reward delivery at 18 seconds (Fig 1B-D). During the task, movement velocity peaked before 6 seconds as mice traveled to the front nosepoke (Fig 1E).

      We focused on the switch response time, defined as the moment mice exited the first nosepoke before entering the second nosepoke. Switch responses are a time-based decision guided by temporal control of action because mice switch nosepokes only if nosepoking at the first nosepoke does not lead to a reward after 6 seconds (Fig 1B-E). Switch responses are guided by internal estimates of time because no external cue indicates when to switch from the first to the second nosepoke (Balci et al., 2008; Bruce et al., 2021; Tosun et al., 2016; Weber et al., 2023). We defined the first 6 seconds after trial start as the ‘interval’, because during this epoch mice are estimating whether 6 seconds have elapsed and if they need to switch responses. In 30 mice, switch response times were 9.3 seconds (median; IQR 8.4 – 9.7; see Table 1 for a summary of mice, experiments, trials, and sessions). We studied dorsomedial striatal D2-MSNs and D1-MSNs using a combination of optogenetics and neuronal ensemble recordings in 9 transgenic mice (4 D2-Cre mice switch response time 9.7 (7.0 – 10.3) seconds; 5 D1-Cre mice switch response time 8.2 (7.7 – 8.7) seconds; rank sum p = 0.73; Table 1).”

      (2) There are many time parameters in this behavioral task. They are described in several places (the figure legend, supplementary figure legend, and methods) but are not clearly defined in the main text, which is inconvenient and sometimes confusing for readers. The authors should make a schematic diagram to illustrate the major parameters and describe them clearly in the main text. 

      We agree. We have clarified this in a new schematic, in which the interval is shaded in gray.

      And in the results on line 131:

      “We focused on the switch response time, defined as the moment mice exited the first nosepoke before entering the second nosepoke. Switch responses are a time-based decision guided by temporal control of action because mice switch nosepokes only if nosepoking at the first nosepoke does not lead to a reward after 6 seconds (Fig 1B-E). Switch responses are guided by internal estimates of time because no external cue indicates when to switch from the first to the second nosepoke (Balci et al., 2008; Bruce et al., 2021; Tosun et al., 2016; Weber et al., 2023). We defined the first 6 seconds after trial start as the ‘interval’, because during this epoch mice are estimating whether 6 seconds have elapsed and if they need to switch responses. In 30 mice, switch response times were 9.3 seconds (median; IQR 8.4 – 9.7; see Table 1 for a summary of mice, experiments, trials, and sessions). We studied dorsomedial striatal D2-MSNs and D1-MSNs using a combination of optogenetics and neuronal ensemble recordings in 9 transgenic mice (4 D2-Cre mice switch response time 9.7 (7.0 – 10.3) seconds; 5 D1-Cre mice switch response time 8.2 (7.7 – 8.7) seconds; rank sum p = 0.73; Table 1).”

      (3) In Line 508, the reviewer suggests the authors pay attention to those trials without "switch". It would be valuable to compare the MSN activity between those trials with or without a "switch". 

      This is a great suggestion. We analyzed such error trials and MSN activity in Figure 6 of Bruce et al., 2021. However, this manuscript was not designed to analyze errors, as they are rare beyond initial training (Bruce et al., 2021 focused on early training), and too inconsistent to permit robust analysis. This was added to the methods on Line 567:

      “Early responses at the first or second nosepoke were not reinforced. Initial responses at the second nosepoke rather than the first nosepoke, alternating between nosepokes, and going back to the first nosepoke after the second nosepoke were rare after initial training. Error trials included trials where animals responded only at the first or second nosepoke and were also not reinforced. We did not analyze error trials as they were often too few to analyze; these were analyzed at length in our prior work (Bruce et al., 2021).”

      (4) The definition of interval is not very clear. It appears that the authors used a 6-second interval in analyzing the data in Figure 2 and Figure 3. But from my understanding, the interval should be the time from time "0" to the "switch", when the mice start to exit from the first nose poke. 

      We have now defined the interval explicitly in the schematic.

      Incidentally, this reviewer asked us to analyze a longer epoch – this analysis beautifully justifies our focus on the first 6 seconds (now in Figure S2).

      We focus on the first six seconds as there are few nosepokes and switch responses during this epoch; however, we consider the reviewer’s definition and analyze the epoch the reviewer suggests from 0 to the switch in analyses below. 

      (5) For Figure 2 C-F, the authors only recorded 32 D2-MSNs in 4 mice, and 41 D1-MSNs in 5 mice. The sample size is too small compared to the sample size usually used in the field. In addition to the small sample size, the single-cell activity exhibited heterogeneity, which created potential issues. 

      We are glad the reviewer raised these points. First, our tagging dataset is relatively standard for optogenetic tagging. Second, we now include Cohen’s d for both PC and slope results for all optogenetic tagging analysis, which demonstrate that we have adequate statistical power and medium-to-large effect sizes (Line 186): 

      “In line with population averages from Fig 2G&H, D2-MSNs and D1-MSNs had opposite patterns of activity with negative PC1 scores for D2-MSNs and positive PC1 scores for D1-MSNs (Fig 3C; PC1 for D2-MSNs: -3.4 (-4.6 – 2.5); PC1 for D1-MSNs: 2.8 (-2.8 – 4.9); F = 8.8, p = 0.004 accounting for variance between mice (Fig S3A); Cohen’s d = 0.7; power = 0.80; no reliable effect of sex (F = 0.44, p = 0.51) or switching direction (F = 1.73, p = 0.19)).”

      And Line 197:

      “GLM analysis also demonstrated that D2-MSNs had significantly different slopes (0.01 spikes/second (-0.10 – 0.10)), which were distinct from D1-MSNs (-0.20 (-0.47 – 0.06); Fig 3D; F = 8.9, p = 0.004 accounting for variance between mice (Fig S3B); Cohen’s d = 0.8; power = 0.98; no reliable effect of sex (F = 0.02, p = 0.88) or switching direction (F = 1.72, p = 0.19)).”
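      For readers who want to check effect sizes of this kind, a common definition of Cohen's d divides the difference in group means by the pooled standard deviation. The sketch uses that textbook definition; whether the reported values were computed on raw neuron-level scores or on model estimates is not stated here, so matching the manuscript's exact computation is an assumption.

```python
import numpy as np


def cohens_d(x, y):
    """Cohen's d for two independent samples, using the pooled (n - 1) standard deviation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nx, ny = x.size, y.size
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)


if __name__ == "__main__":
    # Toy numbers, not data from the manuscript
    print(round(cohens_d([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]), 3))
```

      By the usual conventions, |d| around 0.5 is read as a medium effect and around 0.8 as a large effect.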

      We added boxplots to Figure 3, which better highlight differences in these distributions.

      However, the reviewer’s point is well-taken, and we have added a caveat to the discussion exactly as the reviewer suggested (Line 496):

      “Second, although we had adequate statistical power and medium-to-large effect sizes, optogenetic tagging is low-yield, and it is possible that recording more of these neurons would afford greater opportunity to identify more robust results and alternative coding schemes, such as neuronal synchrony.”

      For both D1 and D2 MSNs, the authors tried to make conclusions on the "trend" of increasing in D2-MSNs and decreasing in D1-MSNs populations, respectively, during the interval. However, such a conclusion is not sufficiently supported by the data presented. It looks like the single-cell activity patterns can be separated into groups: one is a decreasing activity group, one is an increasing activity group and a small group for on and off response. Because of the small sample size, the author should pay attention to the variance across different mice (which needs to be clearly presented in the manuscript), instead of pooling data together and analyzing the mean activity. 

      We were not clear, and we now do exactly as the reviewer suggested. We do not pool data; instead, as we state on line 620, we use linear mixed-effects models to account for mouse-specific and neuron-specific variance. This approach was developed with our statistics core for exactly the reasons the reviewer suggested (see letter). We state this explicitly in the methods (Line 704):

      “Statistics. All data and statistical approaches were reviewed by the Biostatistics, Epidemiology, and Research Design Core (BERD) at the Institute for Clinical and Translational Sciences (ICTS) at the University of Iowa. All code and data are made available at http://narayanan.lab.uiowa.edu/article/datasets. We used the median to measure central tendency and the interquartile range to measure spread. We used Wilcoxon nonparametric tests to compare behavior between experimental conditions and Cohen’s d to calculate effect size. Analyses of putative single-unit activity and basic physiological properties were carried out using custom routines for MATLAB.

      For all neuronal analyses, variability between animals was accounted for using generalized linear mixed-effects models that incorporate a random effect for each mouse, which allows us to account for inherent between-mouse variability. We used fitglme in MATLAB and verified main effects using lmer in R. We accounted for variability between MSNs in pharmacological datasets in which we could match MSNs between saline, D2 blockade, and D1 blockade. P values < 0.05 were interpreted as significant.”
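      The random-intercept idea can be illustrated in Python with statsmodels' `mixedlm`. The authors used `fitglme` in MATLAB and `lmer` in R, so this is a translation under assumptions, not their code. The synthetic data mimic a design in which cell type is fixed within each mouse (4 D2-Cre and 5 D1-Cre mice): neurons from the same mouse share an offset, so the model estimates a per-mouse intercept instead of treating neurons as independent. Column names, neuron counts, and effect sizes are invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def simulate_neurons(seed=0, mouse_sd=0.3, noise_sd=0.5, effect=1.0, per_mouse=8):
    """Synthetic neuron-level scores: 4 'D2' mice and 5 'D1' mice (illustrative only)."""
    rng = np.random.default_rng(seed)
    rows = []
    for mouse in range(9):
        cell_type = "D2" if mouse < 4 else "D1"
        mouse_offset = rng.normal(0.0, mouse_sd)  # shared by every neuron in this mouse
        for _ in range(per_mouse):
            value = (effect if cell_type == "D2" else 0.0) + mouse_offset + rng.normal(0.0, noise_sd)
            rows.append({"mouse": f"m{mouse}", "cell_type": cell_type, "score": value})
    return pd.DataFrame(rows)


def fit_random_intercept(df):
    """Fixed effect of cell type with a random intercept for each mouse."""
    model = smf.mixedlm("score ~ cell_type", data=df, groups=df["mouse"])
    return model.fit()


if __name__ == "__main__":
    result = fit_random_intercept(simulate_neurons())
    # 'cell_type[T.D2]' is the D2-minus-D1 difference (D1 is the reference level)
    print(result.params["cell_type[T.D2]"], result.pvalues["cell_type[T.D2]"])
```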

      We have now stated in the results that we are explicitly accounting for variance between mice (Line 186): 

      “In line with population averages from Fig 2G&H, D2-MSNs and D1-MSNs had opposite patterns of activity with negative PC1 scores for D2-MSNs and positive PC1 scores for D1-MSNs (Fig 3C; PC1 for D2-MSNs: -3.4 (-4.6 – 2.5); PC1 for D1-MSNs: 2.8 (-2.8 – 4.9); F = 8.8, p = 0.004 accounting for variance between mice (Fig S3A); Cohen’s d = 0.7; power = 0.80; no reliable effect of sex (F = 0.44, p = 0.51) or switching direction (F = 1.73, p = 0.19)).”

      And on Line 197:

      “GLM analysis also demonstrated that D2-MSNs had significantly different slopes (0.01 spikes/second (-0.10 – 0.10)), which were distinct from D1-MSNs (-0.20 (-0.47 – 0.06); Fig 3D; F = 8.9, p = 0.004 accounting for variance between mice (Fig S3B); Cohen’s d = 0.8; power = 0.98; no reliable effect of sex (F = 0.02, p = 0.88) or switching direction (F = 1.72, p = 0.19)).”

      All statistics in the manuscript now explicitly account for variance between mice. 

      This is the approach recommended by the Biostatistics, Epidemiology, and Research Design Core (BERD) at the Institute for Clinical and Translational Sciences (ICTS) at the University of Iowa, which reviews all of our work.

      We note that these Cohen’s d values are conventionally interpreted as medium to large effects. 

      We performed statistical power calculations and include them to aid readers’ interpretation; all power values are ≥ 0.8. 

      Finally, the reviewer uses the word ‘trend’. We define p values <0.05 as significant in the methods, and do not interpret trends (on line 717): 

      “P values < 0.05 were interpreted as significant.”

      And, we have now plotted values for each mouse in a new Figure S3.

      As noted in the figure legend, mouse-specific effects were analyzed using linear models that account for between-mouse variability, as discussed with our statisticians. However, the reviewer’s point is well taken, and we have added this idea to the discussion as suggested (Line 496):

      “Second, although we had adequate statistical power and medium-to-large effect sizes, optogenetic tagging is low-yield, and it is possible that recording more of these neurons would afford greater opportunity to identify more robust results and alternative coding schemes, such as neuronal synchrony.”

      (6) For Figure 2, from the activity in E and F, it seems that the activity already rose before the trial started, the authors should add some longer baseline data before time zero for clarification and comparison and show the timing of the actual start of the activity with the corresponding behavior. What behavior states are the mice in when initiating the activity? 

      This is a key point. First, we do not know what state the animal is in until it initiates a trial at the back nosepoke (“Start”); therefore, we cannot analyze this epoch.  

      However, we can show neuronal activity during a longer epoch exactly as the reviewer suggested. Although there are modulations, the biggest difference between D2 and D1 MSNs is during the 0-6 second interval. This analysis supports our focus on the 0-6 second interval. We have included this as a new Figure S2.

      (7) The authors were focused on the "switch " behavior in the task, but they used an arbitrary 6s time window to analyze the activity, and tried to correlate the decreasing or increasing activities of MSNs to the neural coding for time. A better way to analyze is to sort the activity according to the "switch" time, from short to long intervals. This way, the authors could see and analyze whether the activity of D1 or D2 MSNs really codes for the different length of interval, instead of finding a correlation between average activity trends and the arbitrary 6s time window. 

      This is a great suggestion. We did exactly this and adjusted our linear models on a trial-by-trial basis to account for the time between the start of the interval and the switch. This is now added to the methods (line 656): 

      “We performed additional sensitivity analysis excluding outliers and measuring firing rate from the start of the interval to the time of the switch response on a trial-by-trial level for each neuron.”

      And to the results (Line 201):

      “We found that D2-MSNs and D1-MSNs had a significantly different slope even when excluding outliers (4 outliers excluded outside of 95% confidence intervals; F=7.51, p=0.008 accounting for variance between mice) and when the interval was defined as the time between trial start and the switch response on a trial-by-trial basis for each neuron (F=4.3, p=0.04 accounting for variance between mice).”
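      A sketch of the trial-by-trial window described here, assuming spike times and the switch response time are both in seconds relative to trial start. For each trial, firing rate is the spike count divided by the time from trial start to that trial's switch; a ramping slope can be estimated by a least-squares fit of binned rate against time within the same window. The bin width, the least-squares slope, and the function names are illustrative assumptions rather than the manuscript's GLM.

```python
import numpy as np


def rate_to_switch(spike_times, switch_time):
    """Mean firing rate (spikes/s) from trial start (t = 0) up to the switch response."""
    spikes = np.asarray(spike_times, dtype=float)
    n = np.count_nonzero((spikes >= 0.0) & (spikes < switch_time))
    return n / switch_time


def slope_to_switch(spike_times, switch_time, bin_s=0.5):
    """Least-squares slope of binned firing rate versus time, from 0 to the switch.

    Assumes the window holds at least two full bins; a partial last bin is dropped.
    """
    spikes = np.asarray(spike_times, dtype=float)
    edges = np.arange(0.0, switch_time + 1e-9, bin_s)
    counts, _ = np.histogram(spikes, bins=edges)
    rates = counts / bin_s
    centers = edges[:-1] + bin_s / 2
    slope, _intercept = np.polyfit(centers, rates, deg=1)
    return slope


if __name__ == "__main__":
    # Synthetic ramping neuron: k spikes in the k-th 0.5-s bin of an 8-s window
    spikes = np.concatenate([np.linspace(k * 0.5 + 0.05, k * 0.5 + 0.45, k) for k in range(16)])
    print(rate_to_switch(spikes, 8.0), slope_to_switch(spikes, 8.0))
```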

      We now state our justification for focusing on the first 6 seconds of the interval (Line 134):

      “Switch responses are guided by internal estimates of time and temporal control of action because no external cue indicates when to switch from the first to the second nosepoke (Balci et al., 2008; Bruce et al., 2021; Tosun et al., 2016; Weber et al., 2023). We defined the first 6 seconds after trial start as the ‘interval’, because during this epoch mice are estimating whether 6 seconds have elapsed and if they need to switch responses.”

      As noted previously, our focus on this epoch is now justified by Figure S2E.

      And we note that this focus minimizes motor confounds (Line 511):

      “Four lines of evidence argue that our findings cannot be directly explained by motor confounds: 1) D2-MSNs and D1-MSNs diverge between 0-6 seconds after trial start well before the first nosepoke (Fig S2), 2) our GLM accounted for nosepokes and nosepoke-related βs were similar between D2-MSNs and D1-MSNs, 3) optogenetic disruption of dorsomedial D2-MSNs and D1-MSNs did not change task-specific movements despite reliable changes in switch response time, and 4) ramping dynamics were quite distinct from movement dynamics. Furthermore, disrupting D2-MSNs and D1-MSNs did not change the number of rewards animals received, implying that these disruptions did not grossly affect motivation. Still, future work combining motion tracking with neuronal ensemble recording and optogenetics and including bisection tasks may further unravel timing vs. movement in MSN dynamics (Robbe, 2023).”

      We are glad the reviewer suggested this analysis as it strengthens our manuscript.  

      Reviewer #3 (Public Review): 

      Summary: 

      The cognitive striatum, also known as the dorsomedial striatum, receives input from brain regions involved in high-level cognition and plays a crucial role in processing cognitive information. However, despite its importance, the extent to which different projection pathways of the striatum contribute to this information processing remains unclear. In this paper, Bruce et al. conducted a study using a range of causal and correlational techniques to investigate how these pathways collectively contribute to interval timing in mice. Their results were consistent with previous research, showing that the direct and indirect striatal pathways perform opposing roles in processing elapsed time. Based on their findings, the authors proposed a revised computational model in which two separate accumulators track evidence for elapsed time in opposing directions. These results have significant implications for understanding the neural mechanisms underlying cognitive impairment in neurological and psychiatric disorders, as disruptions in the balance between direct and indirect pathway activity are commonly observed in such conditions. 

      Strengths: 

      The authors employed a well-established approach to study interval timing and employed optogenetic tagging to observe the behavior of specific cell types in the striatum. Additionally, the authors utilized two complementary techniques to assess the impact of manipulating the activity of these pathways on behavior. Finally, the authors utilized their experimental findings to enhance the theoretical comprehension of interval timing using a computational model. 

      We are grateful for the reviewer’s consideration of our work and for recognizing the strengths of our approach.  

      Weaknesses: 

      The behavioral task used in this study is best suited for investigating elapsed time perception, rather than interval timing. Timing bisection tasks are often employed to study interval timing in humans and animals.

      This is a key point, and the reviewer is correct. We use our task because of its translational validity; as far as we know, temporal bisection tasks have been used less often in human disease and in rodent models. We have included a new paragraph describing this in the discussion (Line 472):

      “Because interval timing is reliably disrupted in human diseases of the striatum such as Huntington’s disease, Parkinson’s disease, and schizophrenia (Hinton et al., 2007; Singh et al., 2021; Ward et al., 2011), these results have relevance to human disease. Our task version has been used extensively to study interval timing in mice and humans (Balci et al., 2008; Bruce et al., 2021; Stutt et al., 2024; Tosun et al., 2016; Weber et al., 2023). However, temporal bisection tasks, in which animals hold during a temporal cue and respond at different locations depending on cue length, have advantages in studying how animals time an interval because animals are not moving while estimating cue duration (Paton and Buonomano, 2018; Robbe, 2023; Soares et al., 2016). Our interval timing task version – in which mice switch between two response nosepokes to indicate their interval estimate has elapsed – has been used extensively in rodent models of neurodegenerative disease (Larson et al., 2022; Weber et al., 2024, 2023; Zhang et al., 2021), as well as in humans (Stutt et al., 2024). Furthermore, because many therapeutics targeting dopamine receptors are used clinically, these findings help describe how dopaminergic drugs might affect cognitive function and dysfunction. Future studies of D2-MSNs and D1-MSNs in temporal bisection and other timing tasks may further clarify the relative roles of D2- and D1-MSNs in interval timing and time estimation.”

      Furthermore, we have modified how we define interval timing in the abstract, introduction, and results to reflect the reviewer’s comment. For instance, in the abstract (Line 43):

      “We studied dorsomedial striatal cognitive processing during interval timing, an elementary cognitive task that requires mice to estimate intervals of several seconds and involves working memory for temporal rules as well as attention to the passage of time.”

      However, we think it is important to use the term ‘interval timing’ as it links to past work by our group and others.   

      The main results from unit recording (opposing slopes of D1/D2 cell firing rate, as shown in Figure 3D) appear to be very sensitive to a couple of outlier cells, and the predictive power of ensemble recording seems to be only slightly above chance levels. 

      This is a key point raised by other reviewers as well. We have now included measures of statistical power (which is how we interpret the reviewer’s comment about predictive power) and effect size, and we performed additional sensitivity analyses (Line 187): 

      “PC1 scores for D1-MSNs (Fig 3C; PC1 for D2-MSNs: -3.4 (-4.6 – 2.5); PC1 for D1-MSNs: 2.8 (-4.9 – -2.8); F=8.8, p = 0.004 accounting for variance between mice (Fig S3A); Cohen’s d = 0.7; power = 0.80; no reliable effect of sex (F=1.9, p=0.17) or switching direction (F=0.1, p=0.75)).”

      And on Line 197:

      “GLM analysis also demonstrated that D2-MSN slopes (0.01 spikes/second (-0.10 – 0.10)) were distinct from D1-MSN slopes (-0.20 (-0.45 – 0.06); Fig 3D; F=8.9, p = 0.004 accounting for variance between mice (Fig S3B); Cohen’s d = 0.8; power = 0.98). D2-MSNs and D1-MSNs had significantly different slopes even when we excluded outliers (4 outliers outside of 95% confidence intervals excluded; F=7.51, p=0.008 accounting for variance between mice) and when the interval was defined on a trial-by-trial basis for each neuron as the time between trial start and the switch response (F=4.3, p=0.04 accounting for variance between mice).”
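      To make the slope measure concrete, here is a minimal Python sketch of an ordinary least-squares slope of firing rate against time within the 0-6 second window. It is a simplification under stated assumptions, not the GLM used in the manuscript: it omits the nosepoke regressors and the per-mouse random effects, and the function name `firing_rate_slope` and the example rates are hypothetical.

```python
from statistics import mean


def firing_rate_slope(times_s, rates_hz):
    """Ordinary least-squares slope of firing rate (Hz) against time (s).

    Simplified stand-in for the time term of a GLM: no nosepoke regressors
    and no per-mouse random effect (both are in the manuscript's analysis).
    """
    if len(times_s) != len(rates_hz) or len(times_s) < 2:
        raise ValueError("need at least two matched time/rate samples")
    t_bar, r_bar = mean(times_s), mean(rates_hz)
    cov = sum((t - t_bar) * (r - r_bar) for t, r in zip(times_s, rates_hz))
    var = sum((t - t_bar) ** 2 for t in times_s)
    return cov / var


if __name__ == "__main__":
    # Hypothetical 1-second bin centers over the 0-6 s window and rates (Hz)
    # for a neuron that ramps down across the interval.
    centers = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
    ramp_down = [9.0, 8.0, 6.5, 5.0, 4.0, 3.0]
    print(firing_rate_slope(centers, ramp_down))
```

      A neuron whose rate ramps down over the interval, as many putative D1-MSNs did, yields a negative slope; one that ramps up yields a positive slope.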

      These are medium-to-large effect sizes (Cohen’s d), and the analyses have adequate statistical power; these results are not easily explained by chance. 

      We also added boxplots, which highlight the differences in distribution.

      Finally, we note that our conclusions are drawn from many convergent analyses (on Line 216): 

      “Analyses of average activity, PC1, and trial-by-trial firing-rate slopes over the interval provide convergent evidence that D2-MSNs and D1-MSNs had distinct and opposing dynamics during interval timing.”

      In the optogenetic experiment, the laser was kept on for too long (18 seconds) at high power (12 mW). This has been shown to cause adverse effects on population activity (for example, through heating the tissue) that are not necessarily related to their function during the task epochs. 

      This is an important point. We are well aware of heating effects and other potential confounds of optogenetics. For exactly the reasons noted by the reviewer, we included opsin-negative controls, in which the laser was on for the same amount of time (18 seconds) and at the same power (12 mW), in Figure S5. We have now better highlighted these controls in the methods (Line 598):

      “In animals injected with optogenetic viruses, optical inhibition was delivered via bilateral patch cables for the entire trial duration of 18 seconds via 589-nm laser light at 12 mW power on 50% of randomly assigned trials. We performed control experiments in mice without opsins using identical laser parameters in D2-cre or D1-cre mice (Fig S6).”

      And in results (Line 298):

      “Importantly, we found no reliable effects for D2-MSNs with opsin-negative controls (Fig S6).”

      And on Line 306: 

      “As with D2-MSNs, we found no reliable effects with opsin-negative controls in D1-MSNs (Fig S6).”

      We have highlighted these data in Figure S6.

      Furthermore, the effect of optogenetic inhibition is similar to the pharmacological effects in this manuscript and in our prior work (De Corte et al., 2019; Stutt et al., 2024), as we note on Line 459: 

      “Past pharmacological work from our group and others has shown that disrupting D2- or D1-MSNs slows timing (De Corte et al., 2019b; Drew et al., 2007, 2003; Stutt et al., 2024), in line with pharmacological and optogenetic results in this manuscript.”

      And in the discussion section on Line 488: 

      “Our approach has several limitations. First, systemic drug injections block D2- and D1-receptors in many different brain regions, including the frontal cortex, which is involved in interval timing (Kim et al., 2017a). D2 blockade or D1 blockade may have complex effects, including corticostriatal or network effects that contribute to changes in D2-MSN or D1-MSN ensemble activity. We note that optogenetic inhibition of D2-MSNs and D1-MSNs produces similar effects to pharmacology in Figure 5.”

      Given the systemic delivery of pharmacological interventions, it is difficult to conclude that the effects are specific to the dorsomedial striatum. Future studies should use the local infusion of drugs into the dorsomedial striatum. 

      This is a great point; we performed this experiment with local drug infusions in De Corte et al. (2019), and that earlier study was the departure point for the present work. We now point this out in the introduction (Line 92): 

      “Past work has shown that disrupting either D2-dopamine receptors (D2) or D1-dopamine receptors (D1) powerfully impairs interval timing by increasing estimates of elapsed time (Drew et al., 2007; Meck, 2006). Similar behavioral effects were found with systemic (Stutt et al., 2024) or local dorsomedial striatal D2 or D1 disruption (De Corte et al., 2019a). These data lead to the hypothesis that D2-MSNs and D1-MSNs have similar patterns of ramping activity across a temporal interval.”

      Nevertheless, the reviewer makes an important point, which we will develop in future work (Line 485): 

      “Future studies might extend our work combining local pharmacology with neuronal ensemble recording.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Just a few minor notes: 

      (1) Figures 2C and D should have error bars. 

      We agree.  We added error bars to these figures and other rasters as recommended.  

      (2) Figures 2G and H seem to be smoothed - how was this done? 

      We added these details.

      (3) It is unclear what the 'neural network machine learning classifier' mentioned in lines 193-199 adds if the data relevant to this analysis isn't presented. I would potentially include this. 

      We agree. This analysis was confusing and not relevant to our main points; consequently, we removed it.  

      Reviewer #2 (Recommendations For The Authors): 

      Major: 

      (1)  For Figure 2, the description of the main results in (C-F) in the main text is too brief and is not clear. 

      We have added to and clarified this text (Line 147):

      “Striatal neuronal populations are largely composed of MSNs expressing D2-dopamine or D1-dopamine receptors. We optogenetically tagged D2-MSNs and D1-MSNs by implanting optrodes in the dorsomedial striatum and conditionally expressing channelrhodopsin (ChR2; Fig S1) in 4 D2-Cre (2 female) and 5 D1-Cre transgenic mice (2 female). This approach expressed ChR2 in D2-MSNs or D1-MSNs, respectively (Fig 2A-B; Kim et al., 2017a). We identified D2-MSNs or D1-MSNs by their response to brief pulses of 473-nm light; neurons that fired within 5 milliseconds were considered optically tagged putative D2-MSNs or D1-MSNs (Fig S1B-C). We tagged 32 putative D2-MSNs and 41 putative D1-MSNs in a single recording session during interval timing. There were no consistent differences in overall firing rate between D2-MSNs and D1-MSNs (D2-MSNs: 3.4 (1.4 – 7.2) Hz; D1-MSNs: 5.2 (3.1 – 8.6) Hz; F = 2.7, p = 0.11 accounting for variance between mice). Peri-event rasters and histograms from a tagged putative D2-MSN (Fig 2C) and from a tagged putative D1-MSN (Fig 2D) demonstrate prominent modulations during the first 6 seconds of the interval after trial start. Z-scores of average peri-event time histograms (PETHs) from 0 to 6 seconds after trial start are shown for each putative D2-MSN in Fig 2E and for each putative D1-MSN in Fig 2F. These PETHs revealed that during the 6-second interval immediately after trial start, many putative D2-MSNs appeared to ramp up, while many putative D1-MSNs appeared to ramp down. Average PETH activity increased over this interval for the 32 putative D2-MSNs and decreased for the 41 putative D1-MSNs. These differences resulted in distinct activity between D2-MSNs and D1-MSNs early in the interval (0-1 seconds; F = 6.0, p = 0.02 accounting for variance between mice), but not late in the interval (5-6 seconds; F = 1.9, p = 0.17 accounting for variance between mice). 
Examination of a longer interval, from 10 seconds before to 18 seconds after trial start, revealed the greatest separation in D2-MSN and D1-MSN dynamics during the 6-second interval after trial start (Fig S2). Strikingly, these data suggest that D2-MSNs and D1-MSNs might display opposite dynamics during interval timing.”
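      The PETH and z-scoring steps described in this passage can be sketched in Python as follows. This is an illustration under assumptions rather than our analysis code: it assumes spike times are already aligned to trial start and that each neuron's trial-averaged PETH is z-scored across its own time bins; the 0.1-second bin width and the names `peth` and `zscore` are hypothetical choices.

```python
from statistics import mean, pstdev


def peth(trials_spike_times_s, start_s=0.0, stop_s=6.0, bin_s=0.1):
    """Trial-averaged firing rate (Hz) in bins from start_s to stop_s.

    trials_spike_times_s: one list of spike times per trial, in seconds
    relative to trial start.
    """
    if not trials_spike_times_s:
        raise ValueError("need at least one trial")
    n_bins = round((stop_s - start_s) / bin_s)
    counts = [0] * n_bins
    for spikes in trials_spike_times_s:
        for t in spikes:
            if start_s <= t < stop_s:
                # Guard against floating-point rounding at the last bin edge.
                idx = min(int((t - start_s) // bin_s), n_bins - 1)
                counts[idx] += 1
    n_trials = len(trials_spike_times_s)
    return [c / (n_trials * bin_s) for c in counts]


def zscore(values):
    """Z-score a PETH across its own time bins (population SD)."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)  # flat PETH: no modulation to scale
    return [(v - mu) / sd for v in values]
```

      Z-scoring each neuron against its own mean and spread lets neurons with very different baseline firing rates be displayed and compared on one scale, which is what panels such as Fig 2E and 2F require.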

      (2)  For Figure 3 

      (A)  Is the PC1 calculated from all MSNs of all mice (4 D2, 5 D1 mice)? 

      We clarified this (Line 182):

      “We performed PCA on the PETHs of all D2-MSNs and D1-MSNs over the 6-second interval immediately after trial start.”

      And for pharmacology (Line 362): 

      “We noticed differences in MSN activity across the interval with D2 blockade and D1 blockade at the individual MSN level (Fig 6B-D) as well as at the population level (Fig 6E). We used PCA to quantify effects of D2 blockade or D1 blockade (Bruce et al., 2021; Emmons et al., 2017; Kim et al., 2017a). We constructed principal components (PC) from z-scored peri-event time histograms of firing rate from saline, D2 blockade, and D1 blockade sessions for all mice together.”

      (B)  The authors should perform PCA on single mouse data, and add the plot and error bar. 

      This is a great idea. We have now included this as a new Figure S3.

      (C)  As mentioned before, both D2-or D1- MSNs can be divided into three groups, it is not appropriate to put them together as each MSN is not an independent variable, the authors should do the statistics based on the individual mouse, and do the parametric or non-parametric comparison, and plot N (number of mice) based error bars. 

      We have done exactly this using a linear mixed-effects model, as recommended by our statistics core, which explicitly suggested that this is the best approach for these data (see letter). We have also included measures of statistical power and effect size (Line 704):  

      “All data and statistical approaches were reviewed by the Biostatistics, Epidemiology, and Research Design Core (BERD) at the Institute for Clinical and Translational Sciences (ICTS) at the University of Iowa. All code and data are made available at http://narayanan.lab.uiowa.edu/article/datasets. We used the median to measure central tendency and the interquartile range to measure spread. We used Wilcoxon nonparametric tests to compare behavior between experimental conditions and Cohen’s d to calculate effect size. Analyses of putative single-unit activity and basic physiological properties were carried out using custom routines for MATLAB.

      For all neuronal analyses, variability between animals was accounted for using generalized linear mixed-effects models that incorporated a random effect for each mouse, which accounts for inherent between-mouse variability. We used fitglme in MATLAB and verified main effects using lmer in R. We accounted for variability between MSNs in pharmacological datasets in which we could match MSNs between saline, D2 blockade, and D1 blockade. P values < 0.05 were interpreted as significant.”
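      For readers less familiar with Cohen’s d, a minimal Python sketch of the conventional pooled-standard-deviation formula follows. We assume this standard two-sample form for illustration only; it compares two groups directly and does not include the per-mouse random effect handled by the mixed-effects models, and the name `cohens_d` is hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
```

      By Cohen’s conventional benchmarks (roughly 0.5 for a medium and 0.8 for a large effect), the values of d = 0.7 and d = 0.8 quoted in this response fall in the medium-to-large range.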

      We have now included measures of ‘power’ (which we interpret to mean statistical power) and effect size, and we performed additional sensitivity analyses (Line 187): 

      “PC1 scores for D1-MSNs (Fig 3C; PC1 for D2-MSNs: -3.4 (-4.6 – 2.5); PC1 for D1-MSNs: 2.8 (-4.9 – -2.8); F=8.8, p = 0.004 accounting for variance between mice (Fig S3A); Cohen’s d = 0.7; power = 0.80; no reliable effect of sex (F=1.9, p=0.17) or switching direction (F=0.1, p=0.75)).”

      And on Line 197:

      “GLM analysis also demonstrated that D2-MSN slopes (0.01 spikes/second (-0.10 – 0.10)) were distinct from D1-MSN slopes (-0.20 (-0.45 – 0.06); Fig 3D; F=8.9, p = 0.004 accounting for variance between mice (Fig S3B); Cohen’s d = 0.8; power = 0.98). D2-MSNs and D1-MSNs had significantly different slopes even when we excluded outliers (4 outliers outside of 95% confidence intervals excluded; F=7.51, p=0.008 accounting for variance between mice) and when the interval was defined on a trial-by-trial basis for each neuron as the time between trial start and the switch response (F=4.3, p=0.04 accounting for variance between mice).”

      These are medium-to-large effect sizes (Cohen’s d), and the analyses have adequate statistical power; these results are not easily explained by chance. 

      We also added boxplots, which highlight the differences in distributions.

      (3) For results in Figure 5 and Figure S7, according to the Figure 1 legend, lines 4 to 5, the response times were defined as the moment mice exit the first nose poke (on the left) to respond at the second nose poke; and according to the Methods section (line 522), "switch" traversal time was defined as the duration between first nose poke exit and second nose poke entry. It seems that response time is the switch traversal time, so they should be the same, but in Figure 5B and D, the response time showed a clear difference between the laser off and on groups, while in Figures S7C and S7G, there were no differences between laser off and on groups for switch traversal time. Please reconcile these inconsistencies. 

      Our original description was unclear. Switch response time is the moment when mice depart the first nosepoke, whereas traversal time is the time between departing the first nosepoke and arriving at the second nosepoke. We have reworked our figures to make this distinction clear.

      And in the methods (Line 570):

      “Switch response time was defined as the moment animals departed the first nosepoke before arriving at the second nosepoke. Critically, switch responses are a time-based decision guided by temporal control of action because mice switch nosepokes only if nosepokes at the first location did not receive a reward after 6 seconds. That is, mice estimate whether more than 6 seconds have elapsed without a reward to decide to switch responses. Mice learn this task quickly (3-4 weeks), and error trials, in which an animal nosepokes in the wrong order or does not nosepoke, are relatively rare and were discarded. Consequently, we focused on switch response times as the key metric for temporal control of action. Traversal time was defined as the duration between first nosepoke exit and second nosepoke entry; it is distinct from switch response time, which marks when animals departed the first nosepoke. Nosepoke duration was defined as the time between first nosepoke entry and exit, computed for switch responses only. Trials were self-initiated, and the intertrial interval had a geometric mean of 30 seconds.”
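      The three timing measures defined above can be illustrated with a short Python sketch that computes them from one trial's nosepoke events. The event representation (`PokeEvent` with "in"/"out" and "first"/"second" labels) and the function name `switch_metrics` are hypothetical; the sketch also assumes the switch is the last departure from the first nosepoke before the first arrival at the second nosepoke, and it returns `None` for trials without a switch rather than implementing every error-trial rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PokeEvent:
    time_s: float  # seconds relative to trial start
    kind: str      # "in" (nosepoke entry) or "out" (nosepoke exit)
    port: str      # "first" or "second" nosepoke location


def switch_metrics(events: list[PokeEvent]) -> Optional[dict]:
    """Switch response time, traversal time, and nosepoke duration for one trial."""
    events = sorted(events, key=lambda e: e.time_s)
    # The first arrival at the second nosepoke ends the switch.
    second_entry = next(
        (e.time_s for e in events if e.port == "second" and e.kind == "in"), None
    )
    if second_entry is None:
        return None  # no switch on this trial
    # Switch response: the last departure from the first nosepoke before that arrival.
    exits = [e.time_s for e in events
             if e.port == "first" and e.kind == "out" and e.time_s < second_entry]
    if not exits:
        return None
    switch_exit = exits[-1]
    entries = [e.time_s for e in events
               if e.port == "first" and e.kind == "in" and e.time_s <= switch_exit]
    if not entries:
        return None
    return {
        "switch_response_time_s": switch_exit,
        "traversal_time_s": second_entry - switch_exit,
        "nosepoke_duration_s": switch_exit - entries[-1],
    }
```

      Because the switch response time is measured at departure, it captures the timing decision, whereas traversal time captures only the movement between nosepokes, which is why the two can dissociate.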

      And in Figure S8, we have added graphics and clarified the legend.

      (4) The first nose poke and second nose poke are very close, why did it take so long to move from the first nose poke to the second nose poke, even though the mouse already made the decision to switch? Please see Figure S1A, it took less than 6s from the back nose poke to the first nose poke, but it took more than 6s (up to 12s) from the first nose poke to the second nose poke, what were the mice's behavior during this period? 

      This is a key detail. There is no temporal urgency, because only the first nosepoke at the second location after 18 seconds leads to reward. In other words, a second nosepoke made before 18 seconds is not rewarded and, in well-trained animals, is wasted effort. We have added these details to the methods (Line 124):

      “On the remaining 50% of trials, mice were rewarded for nosepoking at the ‘first’ nosepoke and then switching to the ‘second’ nosepoke; initial nosepokes at the second nosepoke after 18 seconds triggered reward when preceded by a first nosepoke. The first nosepokes occurred before switching responses and the second nosepokes occurred much later in the interval in anticipation of reward delivery at 18 seconds (Fig 1B-D). During the task, movement velocity peaked before 6 seconds as mice traveled to the front nosepoke (Fig 1E).”
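      The reward rule for these 18-second trials can be written as a small Python sketch. It is a hypothetical encoding of the rule as stated, not the MedAssociates chamber program: the name `long_trial_reward_time`, the `(time_s, port)` tuple format, and treating "after 18 seconds" as strictly greater than 18 s are all assumptions.

```python
from typing import Optional


def long_trial_reward_time(pokes, criterion_s=18.0) -> Optional[float]:
    """Time of the rewarded nosepoke on a long (18-second) trial, or None.

    pokes: iterable of (time_s, port) nosepoke entries, with time relative to
    trial start and port either "first" or "second".
    """
    seen_first = False
    for time_s, port in sorted(pokes):
        if port == "first":
            seen_first = True
        elif port == "second" and time_s > criterion_s and seen_first:
            # Initial second-location nosepoke after the criterion, preceded
            # by a first-location nosepoke: this one triggers reward.
            return time_s
        # Earlier second-location nosepokes fall through unreinforced.
    return None
```

      Under this rule, second-location nosepokes made well before 18 seconds earn nothing, which is consistent with the long and variable delay between switching and the second nosepoke.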

      And in Figure 1, as described in detail above. 

      (5) How many trials did mice perform in one day? How many recordings/day for how many days were performed? 

      These are key details that we have now added to Table 1.

      We have added the number of recording sessions to the methods (Line 603): 

      “For optogenetic tagging, putative D1- and D2-MSNs were optically identified via 473-nm photostimulation. Units with mean post-stimulation spike latencies of ≤5 milliseconds and a stimulated-to-unstimulated waveform correlation ratio of >0.9 were classified as putative D2-MSNs or D1-MSNs (Ryan et al., 2018; Shin et al., 2018). Only one recording session was performed for each animal per day, and one recording session was included from each animal.”
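      The two tagging criteria can be expressed as a short Python sketch. We assume that the “waveform correlation ratio” refers to the Pearson correlation between a unit's mean light-evoked and mean spontaneous waveforms; that interpretation, the names `pearson_r` and `is_tagged`, and the example waveform values are hypothetical.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_tagged(latencies_ms, evoked_waveform, spontaneous_waveform,
              max_latency_ms=5.0, min_r=0.9):
    """Apply both criteria: mean latency <= 5 ms and waveform r > 0.9."""
    if not latencies_ms:
        return False  # no light-evoked spikes, so the unit cannot be tagged
    fast_enough = mean(latencies_ms) <= max_latency_ms
    same_unit = pearson_r(evoked_waveform, spontaneous_waveform) > min_r
    return fast_enough and same_unit


if __name__ == "__main__":
    # Hypothetical mean waveforms (arbitrary units) and per-pulse latencies (ms).
    spontaneous = [0.0, -1.0, -3.0, -1.0, 0.5, 0.2]
    evoked = [0.1, -0.9, -2.8, -1.1, 0.4, 0.2]
    print(is_tagged([3.0, 4.0, 5.0], evoked, spontaneous))
```

      The latency criterion favors direct light activation over slower synaptic effects, and the waveform criterion checks that the light-evoked spikes come from the same unit as the spontaneous spikes.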

      And on Line 606: 

      “Only one recording session was performed for each animal per day, and one recording session was included from saline, D2 blockade, and D1 blockade sessions.”

      (6) For results in Figure 5, the authors should analyze the speed for the laser on and off group, since the dorsomedial striatum was reported to be related to control of speed (Yttri, Eric A., and Joshua T. Dudman. "Opponent and bidirectional control of movement velocity in the basal ganglia." Nature 533.7603 (2016): 402-406.). 

      We have some initial DeepLabCut data and have included it in a new Figure 1E.

      B) DeepLabCut tracking of position during the interval timing revealed that mice moved quickly after trial start and then velocity was relatively constant throughout the trial

      We also quantified movement using nosepoke duration and traversal time, which provide indirect measures of movement speed.

      In Yttri and Dudman (2016), mice are head-fixed and move a joystick, whereas our mice are freely moving. Nevertheless, we have now included the limited control for movement as a major limitation (Line 510): 

      “Finally, movement and motivation contribute to MSN dynamics (Robbe, 2023). Four lines of evidence argue that our findings cannot be directly explained by motor confounds: 1) D2-MSNs and D1-MSNs diverge between 0-6 seconds after trial start well before the first nosepoke (Fig S2), 2) our GLM accounted for nosepokes and nosepoke-related βs were similar between D2-MSNs and D1-MSNs, 3) optogenetic disruption of dorsomedial D2-MSNs and D1-MSNs did not change task-specific movements despite reliable changes in switch response time, and 4) ramping dynamics were quite distinct from movement dynamics. Furthermore, disrupting D2-MSNs and D1-MSNs did not change the number of rewards animals received, implying that these disruptions did not grossly affect motivation. Still, future work combining motion tracking with neuronal ensemble recording and optogenetics and including bisection tasks may further unravel timing vs. movement in MSN dynamics (Robbe, 2023).”

      (7)  Figure S3 (C, E, and F), statistics should be done based on N (number of mice), not on the number of recorded neurons.  

      We have removed this section, and all other statistics in the paper properly account for mouse-specific variance, as noted above.

      (8)  Figure S1 

      (A) Are these the results from all mice superposed together, or from one mouse on one given day? How many of the trials' data were superposed?

      We included these details in a new Figure 1.

      (B, C) How many trials were included? 

      (D) How many days did these data cover? 

      We have included a new Table 1 with these important details.

      We have noted that only one recording session per mouse was included in the analysis (Line 606):

      “Only one recording session was performed for each animal per day, and one recording session was included from each animal.”

      And on Line 614: 

      “Only one recording session was performed for each animal per day, and one recording session was included from saline, D2 blockade, and D1 blockade sessions.”

      (9) Figure S2 

      (A) Can the authors add coordinates of the brain according to the mouse brain atlas or, alternatively, show it using a coronal section? 

      Great idea; we have added the coordinates to the Figure S2 legend: 

      “Figure S1: A) Recording locations in the dorsomedial striatum (targeting AP +0.4, ML -1.4, DV -2.7). Electrode reconstructions for D2-Cre (red), D1-Cre (blue), and wild-type mice (green). Only the left striatum was implanted with electrodes in all animals.”

      We have also added it to Figure S5 legend: 

      “Figure S5: Fiber optic locations from A) an opsin-expressing mouse with mCherry-tagged halorhodopsin and bilateral fiber optics, and B) across 10 D2-Cre mice (red) and 6 D1-Cre mice (blue) with fiber optics (targeting AP +0.9, ML +/-1.3, DV –2.5).”

      (C) Why did the waveform of laser and no laser seem the same? 

      The waveforms are nearly identical by design: this comparison shows that optogenetically triggered spikes resemble the neuron's other spikes, which is the main point. Optogenetic stimulation does not change the waveform. We have added this detail to the Figure S1 legend: 

      “Inset on bottom right – waveforms from laser trials (red) and trials without laser (blue). Across 73 tagged neurons, waveform correlation coefficients between laser trials and trials without laser were r = 0.97 (0.92-0.99). These data demonstrate that optogenetically triggered spikes are similar to non-optogenetically triggered spikes.”

      (10)  Figure S7, what was the laser power used in this experiment? Have the authors tried different laser powers? 

      We have now clarified the laser power on line 598: 

      “In animals injected with optogenetic viruses, optical inhibition was delivered via bilateral patch cables for the entire trial duration of 18 seconds via 589-nm laser light at 12 mW power on 50% of randomly assigned trials.”

      And for Figure S6 (was S7 previously): 

      We did not try other laser powers; our parameters were chosen a priori based on our past work.  

      (11)  In Figure S9, what method was used to sort the neurons? 

      We now clarify in the methods (Line 617): 

      “Electrophysiology. Single-unit recordings were made using a multi-electrode recording system (Open Ephys, Atlanta, GA). After the experiments, Plexon Offline Sorter (Plexon, Dallas, TX) was used to remove artifacts. Principal component analysis (PCA) and waveform shape were used for spike sorting. Single units were defined as those 1) having a consistent waveform shape, 2) being a separable cluster in PCA space, and 3) having a consistent refractory period of at least 2 milliseconds in interspike interval histograms. The same MSNs were sorted across saline, D2 blockade, and D1 blockade sessions by loading all sessions simultaneously in Offline Sorter and sorting them using the preceding criteria. MSNs had to have consistent firing in all sessions to be included. Sorting integrity across sessions was quantified by comparing waveform similarity via R² between sessions.”
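      The refractory-period criterion can be illustrated with a small Python sketch that counts interspike intervals shorter than 2 milliseconds. The sorting itself was done in Plexon Offline Sorter; the functions `isi_violation_fraction` and `passes_refractory_check` and the 1% tolerance for violations are hypothetical choices for illustration only.

```python
def isi_violation_fraction(spike_times_s, refractory_s=0.002):
    """Fraction of interspike intervals shorter than the refractory period."""
    times = sorted(spike_times_s)
    isis = [b - a for a, b in zip(times, times[1:])]
    if not isis:
        return 0.0  # fewer than two spikes: no intervals to test
    return sum(1 for isi in isis if isi < refractory_s) / len(isis)


def passes_refractory_check(spike_times_s, refractory_s=0.002, max_fraction=0.01):
    """Hypothetical rule: at most 1% of intervals may fall below 2 ms."""
    return isi_violation_fraction(spike_times_s, refractory_s) <= max_fraction
```

      A well-isolated single unit should show almost no intervals below its refractory period; a cluster that mixes spikes from several neurons tends to produce such short intervals.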

      And in the results (Line 353):

      “We analyzed 99 MSNs in sessions with saline, D2 blockade, and D1 blockade. We matched MSNs across sessions based on waveform and interspike intervals; waveforms were highly similar across sessions (correlation coefficient between matched MSN waveforms: saline vs. D2 blockade r = 1.00 (0.99 – 1.00), rank-sum test vs. correlations among unmatched waveforms p = 3×10⁻⁴⁴; saline vs. D1 blockade r = 1.00 (1.00 – 1.00), rank-sum test vs. correlations among unmatched waveforms p = 4×10⁻⁵⁰). There were no consistent changes in MSN average firing rate with D2 blockade or D1 blockade (F = 1.1, p = 0.30 accounting for variance between MSNs; saline: 5.2 (3.3 – 8.6) Hz; D2 blockade: 5.1 (2.7 – 8.0) Hz; F = 2.2, p = 0.14; D1 blockade: 4.9 (2.4 – 7.8) Hz).”

      (C-F) statistics should be done based on the number of mice, not on the number of recorded neurons. 

      We agree; all experiments are now quantified using linear mixed-effects models, which formally account for variance contributed by individual animals, as discussed at length earlier in this response and with statistical experts at the University of Iowa.

      (12) For results in Figure 6, did the authors do cell-type specific recording on D1 or D2 MSNs using optogenetic tagging? As the D1- or D2- MSNs account for ~50% of all MSNs, the inhibition of a considerable amount of neurons was not observed. The authors should discuss the relation between the results from optogenetic inhibition of D1- or D2- MSNs and pharmacological disruption of D1 or D2 dopamine receptors. 

      This is a great point. We did not combine cell-type-specific recordings with pharmacology, because it was difficult to obtain enough trials for analysis in a single session in the tagging experiments, and pharmacological interventions can further decrease performance. However, we have made our results in Figure 6 much more focused.

      We have discussed the relationship between these data in the results (Line 380): 

      “This data-driven analysis shows that D2 and D1 blockade produced similar shifts in MSN population dynamics represented by PC1. When combined with the major contributions of D1/D2 MSNs to PC1 (Fig 3C), these findings show that pharmacologically disrupting D2 or D1 MSNs can disrupt ramping-related activity in the striatum.”

      And in the discussion (Line 417): 

      “Strikingly, optogenetic tagging showed that D2-MSNs and D1-MSNs had distinct dynamics during interval timing. MSN dynamics helped construct and constrain a four-parameter drift-diffusion model in which D2- and D1-MSN spiking accumulated temporal evidence. This model predicted that disrupting either D2-MSNs or D1-MSNs would increase response times. Accordingly, we found that optogenetically or pharmacologically disrupting striatal D2-MSNs or D1-MSNs increased response times without affecting task-specific movements. Disrupting D2-MSNs or D1-MSNs shifted MSN temporal dynamics and degraded MSN temporal encoding. These data, when combined with our model predictions, demonstrate that D2-MSNs and D1-MSNs contribute temporal evidence to controlling actions in time.”

      And: 

      “D2-MSNs and D1-MSNs play complementary roles in movement. For instance, stimulating D1-MSNs facilitates movement, whereas stimulating D2-MSNs impairs movement (Kravitz et al., 2010). Both populations have been shown to have complementary patterns of activity during movements (Tecuapetla et al., 2016), with MSNs firing at different phases of action initiation and selection. Further dissection of action selection programs reveals that opposing patterns of activation among D2-MSNs and D1-MSNs suppress and guide actions, respectively, in the dorsolateral striatum (Cruz et al., 2022). A particular advantage of interval timing is that it captures a cognitive behavior within a single dimension: time. When activity was projected along the temporal dimension, D2-MSNs and D1-MSNs surprisingly had opposing patterns of activity. Past pharmacological work from our group and others has shown that disrupting D2 or D1 MSNs slows timing (De Corte et al., 2019; Drew et al., 2007, 2003; Stutt et al., 2023), in line with pharmacological and optogenetic results in this manuscript. Computational modeling predicted that disrupting either D2-MSNs or D1-MSNs would increase self-reported estimates of time, which was supported by both optogenetic and pharmacological experiments. Notably, these disruptions are distinct from the increased timing variability reported with administration of amphetamine, ventral tegmental area dopamine neuron lesions, and rodent models of neurodegenerative disease (Balci et al., 2008; Gür et al., 2020, 2019; Larson et al., 2022; Weber et al., 2023). Furthermore, our current data demonstrate that disrupting either D2-MSN or D1-MSN activity shifted MSN dynamics and degraded temporal encoding, supporting prior work (De Corte et al., 2019; Drew et al., 2007, 2003; Stutt et al., 2023). 
Our recording experiments do not identify where a possible response threshold T is instantiated, but downstream basal ganglia structures may have a key role in setting response thresholds (Toda et al., 2017).”
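      The logic of the model prediction, that weaker accumulation of temporal evidence delays responses, can be illustrated with a generic drift-to-threshold accumulator in Python. This is a hedged sketch and not the four-parameter model from the manuscript: the single accumulator, the Euler-Maruyama update, the name `time_to_threshold`, and all parameter values (drift, threshold, noise, step size) are hypothetical choices for illustration.

```python
import math
import random
from typing import Optional


def time_to_threshold(drift: float, threshold: float, noise_sd: float,
                      dt: float = 0.01, max_time_s: float = 60.0,
                      seed: Optional[int] = None) -> Optional[float]:
    """First time (s) a noisy accumulator starting at 0 reaches `threshold`.

    Euler-Maruyama update per step:
        x <- x + drift*dt + noise_sd*sqrt(dt)*N(0, 1)
    Returns None if the threshold is not reached within max_time_s.
    """
    rng = random.Random(seed)
    x = 0.0
    n_steps = int(round(max_time_s / dt))
    for step in range(1, n_steps + 1):
        x += drift * dt + noise_sd * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if x >= threshold:
            return step * dt
    return None


if __name__ == "__main__":
    # Hypothetical parameters: intact vs. weakened evidence accumulation.
    intact = time_to_threshold(drift=1.0, threshold=6.0, noise_sd=0.5, seed=1)
    weakened = time_to_threshold(drift=0.8, threshold=6.0, noise_sd=0.5, seed=1)
    print(f"intact: {intact:.2f} s, weakened: {weakened:.2f} s")
```

      With the same random seed, lowering the drift rate lowers the accumulator at every time step, so the threshold crossing, and hence the simulated response time, can only occur later.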

      (13) For Figure 2, what is the error region for G and H? Is there a statistically significant difference between the start (e.g., 0-1 s) and the end (e.g., 5-6 s) time? 

      The error regions in G and H represent the standard error; we have now clarified this.

      We have also added the following (Line 166):

      “These differences resulted in distinct activity early in the interval (0-1 seconds; F = 6.0, p = 0.02 accounting for variance between mice), but not late in the interval (5-6 seconds; F = 1.9, p = 0.17 accounting for variance between mice) between D2-MSNs and D1-MSNs.”

      Minor: 

      (1) The Figure 2 legend has the wrong label: "Peri-event raster C) from a D2-MSN (red) and E) from a D1-MSN (blue)." It should be (D).

      Fixed, thank you.  

      (2)  Figure 2. Missing legend for (E) and (F).  

      Fixed, thank you.  

      (3)  Line 423: mistyped "\" 

      Fixed, thank you.  

      Reviewer #3 (Recommendations For The Authors): 

      -  To clarify that complementary means opposing in this context, I suggest changing the title. 

      This is a helpful suggestion. We have changed it exactly as the reviewer suggested: 

      “Complementary opposing D2-MSNs and D1-MSNs dynamics during interval timing”

      -  I recommend adding a supplementary figure to demonstrate all the nose pokes in all trials in a given session. The current figures make it hard to assess the specifics of the behavior. For example, what happens if, in a long-interval trial, the mouse pokes in the second nose poke before 6 seconds? Is that behavior punished? Do they keep alternating between the nose poke or do they stick to one nose poke? 

      We agree that this is an important point, and we have now redesigned Figure 1 to show these details.

      We have also added these details to the Methods (Line 548):

      “Interval timing switch task. We used a mouse-optimized operant interval timing task described in detail previously (Balci et al., 2008; Bruce et al., 2021; Tosun et al., 2016; Weber et al., 2023). Briefly, mice were trained in sound-attenuating operant chambers with two front nosepokes flanking either side of a food hopper on the front wall and a third nosepoke located at the center of the back wall. The chamber was positioned below an 8-kHz, 72-dB speaker (Fig 1A; MedAssociates, St. Albans, VT). Mice were 85% food restricted and motivated with 20 mg sucrose pellets (BioServ, Flemington, NJ). Mice were initially trained to receive rewards during fixed-ratio nosepoke response trials. Nosepoke entry and exit were captured by infrared beams. After shaping, mice were trained in the “switch” interval timing task. Mice self-initiated trials at the back nosepoke, after which the tone and nosepoke lights were illuminated simultaneously. Cues were identical on all trial types and lasted the entire duration of the trial (6 or 18 seconds). On 50% of trials, mice were rewarded for a nosepoke after 6 seconds at the designated first ‘front’ nosepoke; these trials were not analyzed. On the remaining 50% of trials, mice were rewarded for nosepoking first at the ‘first’ nosepoke location and then switching to the ‘second’ nosepoke location; the reward was delivered for the initial nosepoke at the second nosepoke location after 18 seconds when it was preceded by a nosepoke at the first nosepoke location. Multiple nosepokes at each nosepoke were allowed. Early responses at the first or second nosepoke were not reinforced. Initial responses at the second rather than the first nosepoke, alternation between nosepokes, and returns to the first nosepoke after the second nosepoke were rare after initial training. Error trials, in which animals responded only at the first or only at the second nosepoke, were also not reinforced.
We did not analyze error trials as they were often too few to analyze; these were analyzed at length in our prior work (Bruce et al., 2021).”
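
      The reinforcement rules in this Methods paragraph can be summarized as a small decision function. This is a hypothetical sketch for readers, not the MedAssociates program used in the study: the event format (`Poke`) and function name are invented, "after 6 seconds" and "after 18 seconds" are read as "at or after", and trial termination and error-trial bookkeeping are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional

SHORT_INTERVAL_S = 6.0   # short-trial interval from the Methods text
LONG_INTERVAL_S = 18.0   # long-trial interval from the Methods text


@dataclass
class Poke:
    time_s: float  # seconds since trial start (hypothetical event format)
    location: str  # "first" or "second" front nosepoke


def reward_time(trial_type: str, pokes: List[Poke]) -> Optional[float]:
    """Return the time of the reinforced nosepoke, or None if no poke is reinforced."""
    ordered = sorted(pokes, key=lambda p: p.time_s)
    if trial_type == "short":
        # The first 'first'-location poke at or after 6 s is reinforced; earlier pokes are not.
        for p in ordered:
            if p.location == "first" and p.time_s >= SHORT_INTERVAL_S:
                return p.time_s
        return None
    if trial_type == "long":
        # The first 'second'-location poke at or after 18 s is reinforced,
        # but only if some 'first'-location poke came before it.
        seen_first = False
        for p in ordered:
            if p.location == "first":
                seen_first = True
            elif p.location == "second" and seen_first and p.time_s >= LONG_INTERVAL_S:
                return p.time_s
        return None
    raise ValueError(f"unknown trial type: {trial_type!r}")
```

      Because multiple nosepokes are allowed, the function scans all pokes in time order rather than stopping at the first one; a long trial with only second-location pokes returns `None`, matching the unreinforced error trials described above.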

      -  Figures 2E and 2F suggest that some D1 cells ramp up during the first 6 seconds, while others ramp down. The same is more or less true for D2s. I wonder if the analysis will lose its significance if the two outlier D1s are excluded from Figure 3D. 

      This is a great idea suggested by multiple reviewers. We repeated this analysis with outliers removed. We used a data-driven approach to remove outliers (Line 656): 

      “We performed additional sensitivity analysis excluding outliers outside of 95% confidence intervals and measuring firing rate from the start of the interval to the time of the switch response on a trial-by-trial level for each neuron.”

      We also described these data in the Results (Line 201):

      “We found that D2-MSNs and D1-MSNs had a significantly different slope even when excluding outliers (4 outliers excluded outside of 95% confidence intervals; F=7.51, p=0.008 accounting for variance between mice) and when the interval was defined as the time between trial start and the switch response on a trial-by-trial basis for each neuron (F=4.3, p=0.04 accounting for variance between mice).”

      Finally, we removed the specific outliers the reviewers alluded to (two D1-MSNs) and found similar results (F=6.59, p=0.01 for the main effect of D2-MSNs vs. D1-MSNs, controlling for between-mouse variability). We elected to present the more data-driven approach based on 95% confidence intervals.
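
      The outlier step can be sketched as follows. This is one possible reading of "excluding outliers outside of 95% confidence intervals", assumed here to mean dropping neurons whose firing-rate slope lies beyond the mean ± 1.96 SD of all slopes; the function names are hypothetical, and the mixed-effects model that accounts for variance between mice is not reproduced.

```python
import statistics
from typing import Dict, Sequence


def ols_slope(t: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of firing rate y on time t (e.g., trial start to switch response)."""
    t_mean = statistics.fmean(t)
    y_mean = statistics.fmean(y)
    num = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    den = sum((ti - t_mean) ** 2 for ti in t)
    return num / den


def exclude_outliers(slopes: Dict[str, float], z: float = 1.96) -> Dict[str, float]:
    """Keep neurons whose slope lies within mean +/- z * SD of all neurons' slopes."""
    values = list(slopes.values())
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    return {neuron: s for neuron, s in slopes.items() if abs(s - mu) <= z * sd}
```

      The retained slopes would then be compared between D2-MSNs and D1-MSNs in a model with mouse as a grouping factor.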

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This useful study examined the associations of a healthy lifestyle with comprehensive and organ-specific biological ages defined using common blood biomarkers and body measures. Its large sample size, longitudinal design, and robust statistical analysis provide solid support for the findings, which will be of interest to epidemiologists and clinicians.

      Thank you very much for your thoughtful review of our manuscript. Your valuable comments have greatly helped us improve our manuscript. We have carefully considered all of the reviewers' comments and suggestions and have revised the manuscript to address each point. Below, we provide detailed responses to each of the reviewers' comments. Please note that the line numbers mentioned in the following responses correspond to the line numbers in the clean version of the manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study examined the associations of a healthy lifestyle with comprehensive and organ-specific biological ages. It emphasized the importance of lifestyle factors for biological ages, which were defined using common blood biomarkers and body measures.

      Strengths:

      The data were from a large cohort study, and the authors defined a comprehensive biological age and six organ-specific biological ages.

      Weaknesses:

      (1) Since only 8.5% of participants from the CMEC (China Multi-Ethnic Cohort Study) were included in the study, could selection bias have occurred?

      Thank you for your valuable question. We understand the concern regarding potential selection bias, given that only 8.5% of participants were included in the study. The baseline survey of the China Multi-Ethnic Cohort Study (CMEC) employed a rigorous multi-stage stratified cluster sampling method, and the repeat survey re-evaluated approximately 10% of baseline participants through community-based cluster random sampling. Therefore, the sample of the repeat survey is representative. The second reason for the reduced sample size was the availability of the biomarkers needed for BA calculation. We compared the characteristics of the overall population with those of the participants included in and excluded from this study. Most characteristics were similar, but participants included in this study had better values for some health-related variables; one potential reason is that healthier individuals were more likely to complete the follow-up survey. In conclusion, we believe that the impact of selection bias is limited.

      Author response table 1.

      Baseline characteristics of participants included and not included in the study

      BA, biological age; BMI, body mass index; CVD, cardiovascular disease; HLI, healthy lifestyle indicator.

      1 Data are presented as median (25th, 75th percentile) for continuous variables and count (percentage) for categorical variables.

      2 For HLI, "healthy" corresponds to a score of 4-5.

      3 Information on each validated BA has been reported. BA acceleration is the difference between each BA and CA in the same survey.

      (2) The authors should specify the efficiency of FFQ. How can FFQ genuinely reflect the actual intake? Moreover, how was the aMED calculated?

      Thank you for the comments and questions. We appreciate the opportunity to clarify these aspects of our study. For the first question, we evaluated the FFQ's reproducibility and validity by conducting repeated FFQs and 24-hour dietary recalls at the baseline survey. Intraclass correlation coefficients (ICC) for reproducibility ranged from 0.15 for fresh vegetables to 0.67 for alcohol, while deattenuated Spearman rank correlations for validity ranged from 0.10 for soybean products to 0.66 for rice. More details are provided in our previous study (Lancet Reg Health West Pac, 2021). We have added the corresponding content in both the main text and the supplementary materials.

      Methods, Page 8, lines 145-146: “The FFQ's reproducibility and validity were evaluated by conducting repeated FFQs and 24-hour dietary recalls.”

      Supplementary methods, Dietary assessment: “We evaluated the FFQ's reproducibility and validity by conducting repeated FFQs and 24-hour dietary recalls. Intraclass correlation coefficients for reproducibility ranged from 0.15 for fresh vegetables to 0.67 for alcohol, while deattenuated Spearman rank correlations for validity ranged from 0.10 for soybean products to 0.66 for rice.”
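
      For readers unfamiliar with "deattenuated" correlations, the correction commonly used in FFQ validation studies can be sketched as below. This is a minimal sketch under stated assumptions, not the CMEC analysis code: λ is the ratio of within-person to between-person variance in the reference method (here, the 24-hour recalls), n is the number of recalls per participant, and both values are placeholders because they are not reported in this response.

```python
import math


def deattenuate(r_observed: float, var_within: float, var_between: float,
                n_replicates: int) -> float:
    """Correct an observed FFQ-vs-recall correlation for within-person variation.

    lambda = within-person / between-person variance of the reference method;
    n_replicates = number of reference measurements (e.g., 24-hour recalls) per person.
    """
    if var_between <= 0 or n_replicates < 1:
        raise ValueError("need positive between-person variance and at least one replicate")
    lam = var_within / var_between
    return r_observed * math.sqrt(1.0 + lam / n_replicates)
```

      The correction grows with day-to-day variability in the recalls and shrinks as more recalls are collected; when the recalls have no within-person variation, the observed correlation is returned unchanged.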

      For the second question, we apologize for any confusion. To avoid taking up too much space in the main text, we decided not to include the detailed aMED calculation (as described in Circulation, 2009) there and instead placed it in the supplementary materials:

      “Our calculated aMED score incorporates eight components: vegetables, legumes, fruits, whole grains, fish, the ratio of monounsaturated fatty acids (MUFA) to saturated fatty acids (SFA), red and processed meats, and alcohol. Each component's consumption was divided into sex-specific quintiles. Scores ranging from 1 to 5 were assigned to each component based on quintile rankings, except for red and processed meats and alcohol, for which the scoring was inverted. The alcohol criterion for the aMED was defined as moderate consumption. Since the healthy lifestyle index (HLI) already contained a drinking component, we removed the drinking item from the aMED, giving a score range of 7-35, with a higher score reflecting better adherence to the overall Mediterranean dietary pattern. We defined individuals with aMED scores ≥ the population median as having a healthy diet.”
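
      The scoring rule in the quoted supplementary text can be sketched as follows. This is an illustrative sketch, not the authors' code: the dictionary layout and component keys are hypothetical, quintile cut points come from Python's `statistics.quantiles` (other quantile conventions would assign ties at boundaries differently), and alcohol is omitted as described above, leaving seven components scored 1 to 5.

```python
import bisect
import statistics
from typing import Dict, List

# Seven components retained after dropping alcohol (hypothetical key names).
COMPONENTS = ["vegetables", "legumes", "fruits", "whole_grains", "fish",
              "mufa_sfa_ratio", "red_processed_meat"]
INVERTED = {"red_processed_meat"}  # higher intake earns a lower score


def quintile_ranks(values: List[float]) -> List[int]:
    """Map each value to a 1-5 quintile rank within the supplied group."""
    cuts = statistics.quantiles(values, n=5)  # four cut points
    return [bisect.bisect_left(cuts, v) + 1 for v in values]


def amed_scores(people: List[Dict]) -> List[int]:
    """Sum sex-specific quintile scores over the seven components (range 7-35)."""
    totals = [0] * len(people)
    for sex in {p["sex"] for p in people}:
        idx = [i for i, p in enumerate(people) if p["sex"] == sex]
        for comp in COMPONENTS:
            ranks = quintile_ranks([people[i][comp] for i in idx])
            for i, q in zip(idx, ranks):
                totals[i] += (6 - q) if comp in INVERTED else q
    return totals


def healthy_diet(scores: List[int]) -> List[bool]:
    """Healthy diet = score at or above the population median."""
    med = statistics.median(scores)
    return [s >= med for s in scores]
```

      Quintiles are computed separately within each sex, so the same absolute intake can earn different scores for men and women.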

      Reference:

      (1) Xiao X, Qin Z, Lv X, Dai Y, Ciren Z, Yangla Y, et al. Dietary patterns and cardiometabolic risks in diverse less-developed ethnic minority regions: results from the China Multi-Ethnic Cohort (CMEC) Study. Lancet Reg Health West Pac. 2021;15:100252. doi: 10.1016/j.lanwpc.2021.100252.

      (2) Fung TT, Rexrode KM, Mantzoros CS, Manson JE, Willett WC, Hu FB. Mediterranean diet and incidence of and mortality from coronary heart disease and stroke in women. Circulation. 2009;119(8):1093-100. doi: 10.1161/circulationaha.108.816736.

      (3) HLI (range) and HLI (category) should be clearly defined.

      Thank you for the comment. We have added the definition of HLI (range) and HLI (category) in the methods section:

      Methods P9 lines 165-170: “The HLI was calculated by directly adding up the five lifestyle scores, ranging from 0-5, with a higher score representing an overall healthier lifestyle, denoted as HLI (range) in the following text. We then transformed HLI into a dichotomous variable in this study, denoted as HLI (category), where a score of 4-5 for HLI was considered a healthy lifestyle, and a score of 0-3 was considered an unfavorable lifestyle that could be improved.”
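
      The distinction between HLI (range) and HLI (category) can be sketched as follows, assuming each of the five lifestyle factors has already been scored 0 or 1 according to the study's criteria. The factor keys used in any example are placeholders, not the study's definitions.

```python
from typing import Mapping


def hli_range(scores: Mapping[str, int]) -> int:
    """HLI (range): sum of five binary lifestyle scores, from 0 to 5."""
    if len(scores) != 5:
        raise ValueError("expected exactly five lifestyle factors")
    if any(v not in (0, 1) for v in scores.values()):
        raise ValueError("each factor must be scored 0 or 1")
    return sum(scores.values())


def hli_category(total: int) -> str:
    """HLI (category): 4-5 is 'healthy', 0-3 is 'unfavorable'."""
    return "healthy" if total >= 4 else "unfavorable"
```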

      (4) The comprehensive rationale and each specific BA construction should be clearly defined and discussed. For example, can cardiopulmonary BA be reflected only by using cardiopulmonary status? I do not think so.

      Thank you for the opportunity to clarify. We constructed the comprehensive BA from all of the available biochemical data in the CMEC study, selecting aging-related markers (J Gerontol A Biol Sci Med Sci, 2021), and then constructed organ-specific BAs from these selected biomarkers. The KDM algorithm does not prescribe specific biomarker types but requires them to be correlated with chronological age (CA) (Mech Ageing Dev, 2006). Existing studies typically construct BA from whatever biomarkers are available; the 15 biomarkers included in this study are comprehensive compared with those used in previous research (J Transl Med, 2023; J Am Heart Assoc, 2024; Nat Cardiovasc Res, 2024). To select the biomarkers for each organ-specific BA, we categorized biomarkers primarily by their relevance to the structure and function of each organ system, following the classification used in previous studies (Nat Med, 2023; Cell Rep, 2022). Because our biomarkers came from clinical laboratory data sets, they were categorized based on the clinical interpretation of blood chemistry tests, following the methods outlined in these two papers (Nat Med, 2023; Cell Rep, 2022). We used only biomarkers directly related to each specific system, to minimize overlap between the indicators used for different BAs and thereby preserve the distinctiveness of the organ-specific BAs. We acknowledge the limitations of this approach: a few biomarkers may not fully capture the aging process of a system, and certain indicators may be missing due to data constraints. However, the multi-organ BAs we constructed are cost-effective, easy to implement, and have been validated, which makes them valuable despite these limitations.
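
      For readers less familiar with the Klemera-Doubal method (KDM), a minimal sketch of its core estimator may help. It follows the commonly used formulation: each biomarker is regressed on CA in a reference sample, and BA is the precision-weighted combination of the ages implied by each biomarker. This is an illustration under stated assumptions, not the authors' pipeline: residual SDs use n - 2 degrees of freedom, the variance term `s_ba2` for the CA-corrected estimate is passed in rather than estimated from the data, and the function names are hypothetical.

```python
import statistics
from typing import List, Optional, Sequence, Tuple

Params = Tuple[float, float, float]  # (intercept q, slope k, residual SD s) per biomarker


def fit_biomarker(ca: Sequence[float], x: Sequence[float]) -> Params:
    """Regress one biomarker on chronological age; return q, k and residual SD s."""
    ca_mean, x_mean = statistics.fmean(ca), statistics.fmean(x)
    k = (sum((a - ca_mean) * (b - x_mean) for a, b in zip(ca, x))
         / sum((a - ca_mean) ** 2 for a in ca))
    q = x_mean - k * ca_mean
    resid = [b - (q + k * a) for a, b in zip(ca, x)]
    s = (sum(r * r for r in resid) / (len(resid) - 2)) ** 0.5
    return q, k, s


def kdm_ba(x_person: Sequence[float], params: List[Params],
           ca_person: Optional[float] = None, s_ba2: Optional[float] = None) -> float:
    """KDM biological age; adds CA as an extra 'biomarker' when ca_person and s_ba2 are given."""
    num = sum((xj - q) * k / s ** 2 for xj, (q, k, s) in zip(x_person, params))
    den = sum((k / s) ** 2 for (_, k, s) in params)
    if ca_person is not None and s_ba2 is not None:
        num += ca_person / s_ba2
        den += 1.0 / s_ba2
    return num / den
```

      BA acceleration, defined in Author response table 1 as the difference between BA and CA, would then be `kdm_ba(...) - ca_person`. A person whose biomarkers sit exactly on the fitted age trends receives a BA equal to that age.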

      Reference:

      (1) Verschoor CP, Belsky DW, Ma J, Cohen AA, Griffith LE, Raina P. Comparing Biological Age Estimates Using Domain-Specific Measures From the Canadian Longitudinal Study on Aging. J Gerontol A Biol Sci Med Sci. 2021;76(2):187-94. doi: 10.1093/gerona/glaa151.

      (2) Klemera P, Doubal S. A new approach to the concept and computation of biological age. Mech Ageing Dev. 2006;127(3):240-8. doi: 10.1016/j.mad.2005.10.004

      (3) Zhang R, Wu M, Zhang W, Liu X, Pu J, Wei T, et al. Association between life's essential 8 and biological ageing among US adults. J Transl Med. 2023;21(1):622. doi: 10.1186/s12967-023-04495-8.

      (4) Forrester SN, Baek J, Hou L, Roger V, Kiefe CI. A Comparison of 5 Measures of Accelerated Biological Aging and Their Association With Incident Cardiovascular Disease: The CARDIA Study. J Am Heart Assoc. 2024;13(8):e032847. doi: 10.1161/jaha.123.032847.

      (5) Jiang M, Tian S, Liu S, Wang Y, Guo X, Huang T, Lin X, Belsky DW, Baccarelli AA, Gao X. Accelerated biological aging elevates the risk of cardiometabolic multimorbidity and mortality. Nat Cardiovasc Res. 2024;3(3):332-42. doi: 10.1038/s44161-024-00438-8.

      (6) Tian YE, Cropley V, Maier AB, Lautenschlager NT, Breakspear M, Zalesky A. Heterogeneous aging across multiple organ systems and prediction of chronic disease and mortality. Nat Med. 2023;29(5):1221-31. doi: 10.1038/s41591-023-02296-6.

      (7) Nie C, Li Y, Li R, Yan Y, Zhang D, Li T, et al. Distinct biological ages of organs and systems identified from a multi-omics study. Cell Rep. 2022;38(10):110459. doi: 10.1016/j.celrep.2022.110459.

      (5) The lifestyle index is defined based on an equal-weight approach, but this does not reflect reality and cannot fully answer the research questions it raises.

      Thank you very much for your valuable suggestion. We used an equal-weight healthy lifestyle index (HLI) partly to facilitate comparisons with other studies. The equal-weight approach to constructing the HLI is commonly used in current research (BMJ, 2021; Diabetes Care, 2022; Arch Gerontol Geriatr, 2022). The equal-weight HLI shows the average benefit of adopting each additional healthy lifestyle factor and avoids assumptions about the relative importance of different behaviors, which may vary across populations. To further clarify the importance of each lifestyle factor, we conducted a quantile G-computation analysis, which reflects the differences in weight between lifestyle factors (PLoS Med, 2020; Clin Epigenetics, 2022).

      Reference:

      (1) Zhang YB, Chen C, Pan XF, Guo J, Li Y, Franco OH, Liu G, Pan A. Associations of healthy lifestyle and socioeconomic status with mortality and incident cardiovascular disease: two prospective cohort studies. Bmj. 2021;373:n604. doi: 10.1136/bmj.n604.

      (2) Han H, Cao Y, Feng C, Zheng Y, Dhana K, Zhu S, Shang C, Yuan C, Zong G. Association of a Healthy Lifestyle With All-Cause and Cause-Specific Mortality Among Individuals With Type 2 Diabetes: A Prospective Study in UK Biobank. Diabetes Care. 2022;45(2):319-29. doi: 10.2337/dc21-1512.

      (3) Jin S, Li C, Cao X, Chen C, Ye Z, Liu Z. Association of lifestyle with mortality and the mediating role of aging among older adults in China. Arch Gerontol Geriatr. 2022;98:104559. doi: 10.1016/j.archger.2021.104559.

      (4) Chudasama YV, Khunti K, Gillies CL, Dhalwani NN, Davies MJ, Yates T, Zaccardi F. Healthy lifestyle and life expectancy in people with multimorbidity in the UK Biobank: A longitudinal cohort study. PLoS Med. 2020;17(9):e1003332. doi: 10.1371/journal.pmed.1003332.

      (5) Kim K, Zheng Y, Joyce BT, Jiang H, Greenland P, Jacobs DR, Jr., et al. Relative contributions of six lifestyle- and health-related exposures to epigenetic aging: the Coronary Artery Risk Development in Young Adults (CARDIA) Study. Clin Epigenetics. 2022;14(1):85. doi: 10.1186/s13148-022-01304-9.

      Reviewer #2 (Public Review):

      This interesting study focuses on the association between lifestyle factors and comprehensive and organ-specific biological aging in a multi-ethnic cohort from Southwest China. It stands out for its large sample size, longitudinal design, and robust statistical analysis.

      Some issues deserve clarification to enhance this paper:

      (1) How were the biochemical indicators for organ-specific biological ages chosen, and are these indicators appropriate? Additionally, a more detailed description of the multi-organ biological ages should be provided to help understand the distribution and characteristics of BAs.

      We thank you for raising this point. As explained in our response to the fourth question from the first reviewer, we constructed the comprehensive BA from all of the available biochemical data in the CMEC study, selecting aging-related markers (J Gerontol A Biol Sci Med Sci, 2021), and then constructed organ-specific BAs from these selected biomarkers. The KDM algorithm does not prescribe specific biomarker types but requires them to be correlated with chronological age (CA) (Mech Ageing Dev, 2006). Existing studies typically construct BA from whatever biomarkers are available; the 15 biomarkers included in this study are comprehensive compared with those used in previous research (J Transl Med, 2023; J Am Heart Assoc, 2024; Nat Cardiovasc Res, 2024). To select the biomarkers for each organ-specific BA, we categorized biomarkers primarily by their relevance to the structure and function of each organ system, following the classification used in previous studies (Nat Med, 2023; Cell Rep, 2022). Because our biomarkers came from clinical laboratory data sets, they were categorized based on the clinical interpretation of blood chemistry tests (Nat Med, 2023). We used only biomarkers directly related to each specific system, to minimize overlap between the indicators used for different BAs and thereby preserve the distinctiveness of the organ-specific BAs.

      We have added a descriptive table for the comprehensive and organ systems BAs in the supplementary materials to provide a more detailed understanding of the distribution and characteristics of BAs:

      Author response table 2.

      Description of BA and BA acceleration1

      BA, biological age

      1 Data are presented as mean (standard deviation).

      (2) The authors categorized the HLI score into a dichotomous variable, which may cause a loss of information. How did the authors address this potential issue?

      Thank you for raising this concern. We categorized each lifestyle factor into a binary variable based on relevant guidelines and studies, which recommend assigning a score of 1 if the guideline or study recommendations are met (Bmj, 2021; J Am Heart Assoc, 2023). While dichotomization may lead to some loss of information, it allows for a clearer interpretation and comparison of adherence to ideal healthy lifestyle behaviors. Another advantage of this treatment is that it allows for easy comparison with other studies. We categorized the HLI score into a dichotomous variable to enhance the practical relevance of the results (J Gerontol A Biol Sci Med Sci, 2021). Additionally, we conducted analyses using the continuous HLI score to ensure that our findings were robust, and the results were consistent with those obtained using the dichotomous HLI.

      Reference:

      (1) Verschoor CP, Belsky DW, Ma J, Cohen AA, Griffith LE, Raina P. Comparing Biological Age Estimates Using Domain-Specific Measures From the Canadian Longitudinal Study on Aging. J Gerontol A Biol Sci Med Sci. 2021;76(2):187-94. doi: 10.1093/gerona/glaa151.

      (2) Klemera P, Doubal S. A new approach to the concept and computation of biological age. Mech Ageing Dev. 2006;127(3):240-8. doi: 10.1016/j.mad.2005.10.004

      (3) Zhang R, Wu M, Zhang W, Liu X, Pu J, Wei T, et al. Association between life's essential 8 and biological ageing among US adults. J Transl Med. 2023;21(1):622. doi: 10.1186/s12967-023-04495-8.

      (4) Forrester SN, Baek J, Hou L, Roger V, Kiefe CI. A Comparison of 5 Measures of Accelerated Biological Aging and Their Association With Incident Cardiovascular Disease: The CARDIA Study. J Am Heart Assoc. 2024;13(8):e032847. doi: 10.1161/jaha.123.032847.

      (5) Jiang M, Tian S, Liu S, Wang Y, Guo X, Huang T, Lin X, Belsky DW, Baccarelli AA, Gao X. Accelerated biological aging elevates the risk of cardiometabolic multimorbidity and mortality. Nat Cardiovasc Res. 2024;3(3):332-42. doi: 10.1038/s44161-024-00438-8.

      (6) Tian YE, Cropley V, Maier AB, Lautenschlager NT, Breakspear M, Zalesky A. Heterogeneous aging across multiple organ systems and prediction of chronic disease and mortality. Nat Med. 2023;29(5):1221-31. doi: 10.1038/s41591-023-02296-6.

      (7) Nie C, Li Y, Li R, Yan Y, Zhang D, Li T, et al. Distinct biological ages of organs and systems identified from a multi-omics study. Cell Rep. 2022;38(10):110459. doi: 10.1016/j.celrep.2022.110459.

      (3) Because lifestyle data are self-reported, they may suffer from recall bias. This issue needs to be addressed in the limitations section.

      Thank you for your valuable suggestion. We acknowledge that the use of self-reported lifestyle data in our study may introduce recall bias, potentially affecting the accuracy of the information collected. We have added the following statement to the limitations section of our manuscript:

      Discussion, Page 22, lines 463-464: “Fifth, assessment of lifestyle factors was based on self-reported data collected through questionnaires, which may be subject to recall bias.”

      (4) It should be clarified whether the adjusted CA is the baseline value of CA. Additionally, why did the authors choose models with additional adjustments for time-invariant variables as their primary analysis? This approach does not align with standard FEM analysis (Lines 261-263).

      Thank you for the opportunity to clarify. We have changed the sentence to “baseline CA”. For the second question, in a standard fixed effects model (FEM), only time-varying variables are typically included. However, to enhance the flexibility of our models and account for potential variations in the association of time-invariant variables with CA, as has been commonly done in previous studies, we additionally adjusted for time-invariant variables and the baseline value of CA (BMC Med Res Methodol, 2024; Am J Clin Nutr, 2020). Moreover, sensitivity analyses using the standard FEM were conducted in this study, and robust results were obtained.
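
      To illustrate what the standard FEM used in the sensitivity analysis does, a minimal sketch of the within (individual-demeaned) estimator for a single exposure follows. Each participant's exposure and outcome are centered on that participant's own mean across surveys, so any time-invariant characteristic, or any fixed person-level offset, cancels out; this is why a standard FEM cannot estimate coefficients for time-invariant variables. The panel layout and function name are hypothetical, and all covariates other than the single exposure are omitted.

```python
import statistics
from typing import Dict, List, Tuple


def within_slope(panel: Dict[str, List[Tuple[float, float]]]) -> float:
    """Fixed-effects (within) slope; panel maps person id -> [(exposure, outcome), ...]."""
    x_dev: List[float] = []
    y_dev: List[float] = []
    for obs in panel.values():
        x_bar = statistics.fmean([x for x, _ in obs])
        y_bar = statistics.fmean([y for _, y in obs])
        for x, y in obs:
            # Demeaning removes each person's time-invariant level.
            x_dev.append(x - x_bar)
            y_dev.append(y - y_bar)
    # Raises ZeroDivisionError if no one's exposure changed between surveys.
    return sum(a * b for a, b in zip(x_dev, y_dev)) / sum(a * a for a in x_dev)
```

      In the example data used to check the sketch, each person has a different baseline level but the same within-person slope of -0.5, and the estimator recovers that slope regardless of the person-level offsets.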

      Reference:

      (1) Tang D, Hu Y, Zhang N, Xiao X, Zhao X. Change analysis for intermediate disease markers in nutritional epidemiology: a causal inference perspective. BMC Med Res Methodol. 2024;24(1):49. doi: 10.1186/s12874-024-02167-9.

      (2) Trichia E, Luben R, Khaw KT, Wareham NJ, Imamura F, Forouhi NG. The associations of longitudinal changes in consumption of total and types of dairy products and markers of metabolic risk and adiposity: findings from the European Investigation into Cancer and Nutrition (EPIC)-Norfolk study, United Kingdom. Am J Clin Nutr. 2020;111(5):1018-26. doi: 10.1093/ajcn/nqz335.

      (5) How is the relative contribution calculated in the QGC analysis? The relative contribution of some lifestyle factors is not shown in Figure 2 and the supplementary figures, such as Supplementary Figure 7. These omissions should be explained.

      Thank you for the questions. QGC estimates the joint effect of a mixture of exposures together with a weight for each component, and it has been widely used in epidemiological research. More details about QGC can be found in the supplementary methods. Some results are not displayed because we assumed that all healthy lifestyle changes would have a protective effect on BA acceleration. However, the effect sizes of some lifestyle factors did not align with this assumption and were not statistically significant. Because positive and negative weights are calculated separately in QGC, with all positive weights summing to 1 and all negative weights summing to 1, these factors would have received large positive weights. To avoid potential misunderstandings, we chose not to include these results in the figures. We have added explanations to the figure legends where applicable:

      “The blue bars represent results that are statistically significant in the FEM analysis, while the gray bars represent results that were not statistically significant in the FEM analysis; positive weights are not shown.”
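
      The weight convention described above can be made concrete. In the standard quantile g-computation formulation (as implemented, for example, in the qgcomp R package), the overall mixture effect ψ is the sum of the component coefficients, and positive and negative coefficients are each normalized into weights that sum to 1 within their own direction. The sketch below takes already-fitted coefficients as input; the regression itself is not shown, and the factor names in the example are hypothetical.

```python
from typing import Dict, Tuple


def qgc_weights(coefs: Dict[str, float]) -> Tuple[float, Dict[str, float], Dict[str, float]]:
    """Return (psi, positive weights, negative weights) from component coefficients."""
    psi = sum(coefs.values())  # overall effect of raising every component by one quantile
    pos_total = sum(b for b in coefs.values() if b > 0)
    neg_total = sum(b for b in coefs.values() if b < 0)
    # Each direction is normalized separately, so each set of weights sums to 1.
    pos = {name: b / pos_total for name, b in coefs.items() if b > 0}
    neg = {name: b / neg_total for name, b in coefs.items() if b < 0}
    return psi, pos, neg
```

      With coefficients of -0.3 and -0.1 for two protective factors and a small, non-significant 0.05 for a third, the third factor receives a positive weight of 1.0 simply because it is the only positive coefficient. This is the kind of potentially misleading weight the figures omit.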

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      To enhance this paper, some issues deserve clarification:

      (1) How were the biochemical indicators for organ-specific biological ages chosen, and are these indicators appropriate? Additionally, please provide a more detailed description of the multi-organ biological ages to help readers understand the distribution and characteristics of the BAs.

      (2) The authors categorized the HLI score into a dichotomous variable, which may cause a loss of information. How did the authors address this potential issue?

      (3) Because lifestyle data are self-reported, they may suffer from recall bias. This issue needs to be addressed in the limitations section.

      (4) Lines 261-263: Please clarify if the adjusted CA is the baseline value of CA. Additionally, why did you choose models with additional adjustments for time-invariant variables as your primary analysis? This approach does not align with standard FEM analysis.

      (5) How is the relative contribution calculated in the QGC analysis? The relative contribution of some lifestyle factors is not shown in Figure 2 and the supplementary figures, such as Supplementary Figure 7. Please explain these omissions.

      The above five issues overlap with those raised by Reviewer #2 (Public Review). Please refer to the responses provided earlier.

      Minor revision:

      Line 50: The expression "which factors" should be changed to "which lifestyle factor."

      Thank you for the suggestion. As suggested, we have used “which lifestyle factor” instead.

      Lines 91-92: "Aging exhibits variations across and with individuals" appears to be a clerical error. According to the context, it should be "Aging exhibits variations across and within individuals."

      We thank the reviewer for the correction. We have updated the text to read:

      “Aging exhibits variations across and within individuals.”

      Line 154: The authors mentioned "Considering previous studies" but lacked references. Please add the appropriate citations.

      Thank you for pointing this out. We apologize for the oversight. We have now added the appropriate citations to support the statement "Considering previous studies" in the revised manuscript.

      Lines 170-171: "regular exercise ("12 times/week", "3-5 times/week," or "daily or almost every day")"; the first item in parentheses should be "1-2 times/week"? Please verify and correct if necessary. Additionally, check the entire text carefully to avoid confusion caused by clerical errors.

      Thank you for your careful review. We have changed the sentence to "1-2 times/week." We have thoroughly checked the entire manuscript to ensure that no other clerical errors remain.

      Clarifications for Table 1:

      i. The expression "HLI=0" is difficult to understand. Please provide a more straightforward explanation or rephrase it.

      Thank you for your feedback. We have removed the confusing expression and provided a clearer explanation in the table legend for better understanding:

      “For HLI (category), "healthy" corresponds to a score of 4-5, while "unfavorable" corresponds to a score of 0-3.”

      ii. The baseline age is presented as an integer, but the follow-up age is not. Please clarify this discrepancy.

      Thank you for pointing out this discrepancy. We calculated the precise chronological age from participants' survey dates and birth dates for the biological age calculations. The table initially presented age as integers, but we have now updated it to show the precise ages.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Despite the strengths, multiple analytical decisions have to be explained, justified, or clarified. Also, there is scope to enhance the clarity and coherence of the writing - as it stands, readers will have to go back and forth to search for information. Last, it would be helpful to add line numbers in the manuscript during the revision, as this will help all reviewers to locate the parts we are talking about.

      We thank the reviewer for these suggestions and have added line numbers to the revised manuscript.

      (1) Introduction:

      The introduction is somewhat unmotivated, with key terms/concepts left unexplained until relatively late in the manuscript. One of the main focuses in this work is "hyperaltruistic", but how is this defined? It seems that the authors take the meaning of "willing to pay more to reduce other's pain than their own pain", but is this what the task is measuring? Did participants ever need to PAY something to reduce the other's pain? Note that some previous studies indeed allow participants to pay something to reduce other's pain. And what makes it "HYPER-altruistic" rather than simply "altruistic"?

      As the reviewer noted, we adopted a well-established experimental paradigm to study the context-dependent effect on hyperaltruism. Altruism refers to the fact that people take others’ welfare into account when making decisions that concern both parties. Research paradigms investigating altruistic behavior typically use a social decision task that requires participants to choose between options in which their own financial interests are pitted against the welfare of others (FeldmanHall et al., 2015; Hu et al., 2021; Hutcherson et al., 2015; Teoh et al., 2020; Xiong et al., 2020). Hyperaltruism, on the other hand, refers to subjects valuing others’ pain more highly than their own pain (Crockett et al., 2014, 2015, 2017; Volz et al., 2017). One example of hyperaltruism would be the following scenario: the subject is willing to forgo $2 to reduce another person’s pain by 1 unit (social decision task) but only willing to forgo $1 to reduce his/her own pain by the same amount (self decision task) (Crockett et al., 2014). Conversely, if subjects are willing to forgo less money to reduce others’ suffering in the social decision task than in the self decision task, no hyperaltruism is observed. Therefore, hyperaltruistic preference can only be measured by collecting subjects’ choices in both the self and social decision tasks and comparing the choices across the two tasks.

      In our task, as in previous studies (Crockett et al., 2014, 2015, 2017; Volz et al., 2017), subjects in each trial were faced with two options with different levels of pain for others and different monetary payoffs for themselves. Based on subjects’ choice data, we can infer through regression analysis how much money subjects were willing to give up in exchange for reducing others’ pain (see Figure 1 and Methods for the experimental details). We have rewritten the introduction and methods sections to make this point clearer to the audience.

      Plus, in the intro, the authors mentioned that the "boundary conditions" remain unexplored, but this idea is never touched again. What do boundary conditions mean here in this task? How do the results/data help with finding out the boundary conditions? Can this be discussed within wider literature in the Discussion section?

      Boundary conditions here specifically refer to the variables or decision contexts that determine whether hyperaltruistic behavior can be elicited. Individual personality traits, motivation and social relationships may all be boundary conditions affecting the emergence of hyperaltruistic behavior. In our task, we specifically focused on the valence of the decision context (gain vs. loss), since previous studies only tested hyperaltruistic preference in the gain context, and introducing a loss context might bias subjects’ hyperaltruistic behavior through implicit moral framing.

      We have explained the boundary conditions in the revised introduction (Lines 45 ~ 49).

      “However, moral norms are also context dependent: vandalism is clearly against social and moral norms, yet vandalism for self-defense is more likely to be ethically and legally justified (the doctrine of necessity). Therefore, a crucial step is to understand the boundary conditions for hyperaltruism.”

      Last, what motivated the authors to examine the decision context? It comes somewhat out of the blue that the opening paragraph states that "We set out to [...] decision context", but why? Are there other important factors? Why is the decision context more important to study than those others?

      We thank the reviewer for the comment. Hyperaltruistic preference was originally demonstrated by comparing conditions in which subjects’ personal monetary gain was pitted against others’ pain (social condition) or against their own suffering (self condition) (Crockett et al., 2014). Follow-up studies found that subjects exhibited strong egoistic tendencies if they instead needed to harm themselves for others’ benefit in the social condition (by flipping the recipients of the monetary gain and the electric shocks) (Volz et al., 2017). However, these studies focused primarily on gain contexts, neglecting the possibility that valence could also bias subjects’ behavior (given the differences between gain and loss processing in humans). It is plausible that replacing monetary gains with losses in the money-pain trade-off task might bias subjects’ hyperaltruistic preference owing to heightened vigilance or negative emotions in the face of potential loss (such as loss aversion) (Kahneman & Tversky, 1979; Liu et al., 2020; Pachur et al., 2018; Tom et al., 2007; Usher & McClelland, 2004; Yechiam & Hochman, 2013). Another possibility is that gain and loss contexts may elicit different subjective moral perceptions (or internal moral framings) in participants, affecting their hyperaltruistic preferences (Liu et al., 2017; Losecaat Vermeer et al., 2020; Markiewicz & Czupryna, 2018; Wu et al., 2018). In our manuscript, we did not attempt to determine which factors are most important in eliciting hyperaltruistic behavior, but rather to demonstrate the crucial role played by the decision context and to show that internal moral framing could be the mediating factor driving subjects’ hyperaltruistic behavior. In fact, we speculate that the egoistic tendencies found in Volz et al. (2017) were partly driven by subjects’ failure to engage the proper internal moral framing in the social condition (harm for self, see Volz et al., 2017 for details).

      (2) Experimental Design:

      (2a) The experiment per se is largely solid, as it followed a previously well-established protocol. But I am curious about how the participants were instructed. Did the experimenter ever mention the word "help" or "harm" to the participants? It would be helpful to include the exact instructions in the SI.

      In the instructions, we avoided words such as “harm”, “help”, or other terms reminding subjects about the moral judgement of the decisions they were about to make. Instead, we presented the options in a neutral and descriptive manner, focusing only on the relevant components (shocks and money). The instructions for all four conditions are shown in supplementary Fig. 9.

      (2b) Relatedly, the experimental details were not quite comprehensive in the main text. Indeed, the Methods come after the main text, but to be able to guide readers to understand what was going on, it would be very helpful if the authors could include some necessary experimental details at the beginning of the Results section.

      We thank the reviewer for the suggestion. We have now provided a brief description of the experimental details in the revised results section (Lines 125 ~132).

      “Prior to the money-pain trade-off task, we individually calibrated each subject’s pain threshold using a standard procedure[4–6]. This allowed us to tailor a moderate electric stimulus that corresponded to each subject’s subjective pain intensity. Subjects then engaged in 240 decision trials (60 trials per condition), acting as the “decider” and trading off between monetary gains or losses for themselves and the pain experienced by either themselves or an anonymous “pain receiver” (gain-self, gain-other, loss-self and loss-other, see Supplementary Fig. 8 for the instructions and also see methods for details).”

      (3) Statistical Analysis

      (3a) One of the main analyses uses the harm aversion model (Eq. 1) and the results section keeps referring to one of its key parameters (i.e., k). However, it is difficult to understand the text without going to the Methods section below. Hence it would be very helpful to repeat the equation also in the main text. A similar idea goes for the delta_m and delta_s terms: it will be very helpful to give a clear meaning of them, as nearly all analyses rely on knowing what they mean.

      We thank the reviewer for the suggestion. We have now added the equation of the harm aversion model and provided a more detailed description of the equations in the main text (Lines 150 ~155).

      “We also modeled subjects’ choices using an influential model where subjects’ behavior could be characterized by the harm (electric shock) aversion parameter κ, reflecting the relative weights subjects assigned to ∆m and ∆s, the objective difference in money and shocks between the more and less painful options, respectively (∆V=(1-κ)∆m - κ∆s Eq.1, See Methods for details)[4–6]. Higher κ indicates that higher sensitivity is assigned to ∆s than ∆m and vice versa.”
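      A minimal Python sketch of Eq. 1 may help readers unfamiliar with the model. It assumes a logistic choice rule, P = 1 / (1 + exp(-γ∆V)), giving the probability of choosing the more painful option (flipping the sign of ∆V gives the probability of the less painful option); all function names are hypothetical and not taken from the authors' analysis code. Setting ∆V = 0 and solving for ∆m/∆s gives κ/(1-κ), the amount of money per unit of shock at which a subject is indifferent, which links κ to the "forgo $2 vs. $1" example given earlier.

```python
import math


def value_difference(dm: float, ds: float, kappa: float) -> float:
    """Eq. 1: ΔV = (1 - κ)Δm - κΔs, where Δm and Δs are the money and shock
    differences between the more and the less painful option."""
    return (1.0 - kappa) * dm - kappa * ds


def p_more_painful(dm: float, ds: float, kappa: float, gamma: float) -> float:
    """Assumed logistic choice rule with choice consistency γ."""
    return 1.0 / (1.0 + math.exp(-gamma * value_difference(dm, ds, kappa)))


def indifference_price(kappa: float) -> float:
    """Money per unit of shock at which ΔV = 0, i.e. Δm / Δs = κ / (1 - κ)."""
    if not 0.0 <= kappa < 1.0:
        raise ValueError("kappa must lie in [0, 1)")
    return kappa / (1.0 - kappa)


def kappa_from_price(price: float) -> float:
    """Inverse of indifference_price: κ = price / (1 + price)."""
    return price / (1.0 + price)


# The $2 (other) vs. $1 (self) example: κ_other = 2/3, κ_self = 1/2
kappa_other = kappa_from_price(2.0)
kappa_self = kappa_from_price(1.0)
hyperaltruism = kappa_other - kappa_self  # κ_other - κ_self ≈ 0.167
```

      Under these assumptions, a subject who pays twice as much to spare another person a unit of shock as to spare themselves has κ<sub>other</sub> - κ<sub>self</sub> ≈ 0.167, a positive hyperaltruism score.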

      (3b) There is one additional parameter gamma (choice consistency) in the model. Did the authors also examine the task-related difference of gamma? This might be important as some studies have shown that the other-oriented choice consistency may differ in different prosocial contexts.

      To examine the task-related difference of choice consistency (γ), we compared the performance of 4 candidate models:

      Model 1 (M1): The choice consistency parameter γ remains constant across shock recipients (self vs. other) and decision contexts (gain vs. loss).

      Model 2 (M2): γ differs between the self- and other-recipient conditions, with γ<sub>self</sub> and γ<sub>other</sub> representing the choice consistency when pain is inflicted on him/her-self or the other-recipient.

      Model 3 (M3): γ differs between the gain and loss conditions, with γ<sub>gain</sub> and γ<sub>loss</sub> representing the choice consistencies in the gain and loss contexts, respectively.

      Model 4 (M4): γ varies across four conditions, with γ<sub>self-gain</sub>, γ<sub>other-gain</sub>, γ<sub>self-loss</sub> and γ<sub>other-loss</sub> capturing the choice consistency in each condition.

      As Supplementary Fig. 10 shows, after fitting all the models to subjects’ choice data, model 1 (M1) performed best among the four candidate models in both studies (1 & 2), with the lowest Bayesian Information Criterion (BIC). We therefore conclude that the shock recipient (self vs. other) and the decision context (gain vs. loss) did not meaningfully influence subjects’ choice consistency, and we report model results using a single choice consistency parameter.
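      For readers who want to reproduce this kind of comparison, the sketch below computes BIC = k·ln(n) - 2·ln(L̂) and selects the model with the lowest score. The parameter counts (four κ values for self/other × gain/loss, plus one, two, two or four γ values for M1 to M4) are an assumption about how the candidate models might be parameterized, not a description of the authors' fitting code.

```python
import math
from typing import Dict, Tuple

# Assumed free-parameter counts per subject: four κ values (self/other × gain/loss)
# plus the γ values specific to each candidate model.
N_PARAMS = {"M1": 4 + 1, "M2": 4 + 2, "M3": 4 + 2, "M4": 4 + 4}


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion; lower values indicate a better trade-off
    between goodness of fit and model complexity."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def rank_models(log_likelihoods: Dict[str, float], n_obs: int) -> Tuple[str, Dict[str, float]]:
    """Return the name of the lowest-BIC model and all BIC scores.

    log_likelihoods maps a model name in N_PARAMS to its maximized
    log-likelihood (e.g. summed over one subject's 240 trials)."""
    scores = {name: bic(ll, N_PARAMS[name], n_obs) for name, ll in log_likelihoods.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

      Because the penalty grows with ln(n), an extra γ parameter must raise the log-likelihood by more than ln(240)/2 ≈ 2.7 per subject to be preferred.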

      (3c) I am not fully convinced that the authors included two types of models: the harm aversion model and the logistic regression models. Indeed, the models look similar, and the authors have acknowledged that. But I wonder if there is a way to combine them? For example:

      Choice ~ delta_V * context * recipient (*Oxt_v._placebo)

      The calculation of delta_V follows Equation 1.

      Or the conceptual question is: if the authors were interested in the specific and independent contributions of delta_m and delta_s to behavior, as their logistic model did, why did the authors examine harm aversion first, where a parameter k is controlling for the trade-off? One way to find out is to properly run different models and run model comparisons. In the end, it would be beneficial to focus only on the "winning" model to draw inferences.

      The reviewer raised an excellent point here. According to the logistic regression model, we have:

      P = 1 / (1 + exp(-(β<sub>0</sub> + β<sub>∆m</sub>∆m + β<sub>∆s</sub>∆s)))

      Here, P is the probability of selecting the less harmful option. Similarly, if we combine Eq. 1 (∆V = (1-κ)∆m - κ∆s) and Eq. 2 of the harm aversion model, we have:

      If we ignore the constant term β<sub>0</sub> of the logistic regression model, the harm aversion model is simply a reparameterization of the logistic regression model. The harm aversion model was implemented first to derive the harm aversion parameter κ, a parameter in the range [0, 1] that quantifies how subjects weigh the relative contributions of Δm and Δs between options in their decision processes. Since previous studies used κ<sub>other</sub> - κ<sub>self</sub> to define the magnitude of hyperaltruistic preference, we adopted a similar approach to compare our results with previous research under the same theoretical framework. However, to investigate the independent contributions of Δm and Δs, we have to take γ into account (the β<sub>∆m</sub> and β<sub>∆s</sub> in the logistic regression model are not necessarily correlated by nature, whereas in the harm aversion model the coefficients (1-κ) and κ are always perfectly negatively correlated (see Eq. 1); only after multiplying by γ does the correlation between γ(1-κ) and γκ vary, depending on the specific distribution of γ and κ). In summary, we followed the approach of previous research in estimating the harm aversion parameter κ, both to compare our results with previous studies and to capture the relative influence of Δm and Δs. When we studied the contextual effects (gain vs. loss or OXT vs. placebo) on subjects’ behavior, we further investigated how context affected subjects’ evaluation of Δm and Δs separately. The two models (logistic regression model and harm aversion model) in our study are mathematically equivalent (up to the constant term β<sub>0</sub>) and are not competing candidate models. Instead, they represent different perspectives from which our data can be examined.
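      The reparameterization can be made concrete with a short sketch. Assuming both models are written for the probability of choosing the more painful option, the harm aversion linear predictor γ[(1-κ)∆m - κ∆s] equals β<sub>∆m</sub>∆m + β<sub>∆s</sub>∆s with β<sub>∆m</sub> = γ(1-κ) and β<sub>∆s</sub> = -γκ, so γ = β<sub>∆m</sub> - β<sub>∆s</sub> and κ = -β<sub>∆s</sub>/γ. If choices are instead coded as selecting the less painful option, every slope flips sign. A shared intercept β<sub>0</sub> can be added to both models without changing this mapping. The function names are hypothetical.

```python
from typing import Tuple


def harm_aversion_to_logistic(kappa: float, gamma: float) -> Tuple[float, float]:
    """Map (κ, γ) to logistic slopes (β_Δm, β_Δs), assuming the model predicts
    the probability of choosing the more painful option."""
    return gamma * (1.0 - kappa), -gamma * kappa


def logistic_to_harm_aversion(beta_dm: float, beta_ds: float) -> Tuple[float, float]:
    """Inverse map: γ = β_Δm - β_Δs and κ = -β_Δs / γ.

    Only slopes with β_Δm > 0 and β_Δs < 0 correspond to 0 < κ < 1 and γ > 0."""
    if beta_dm <= 0.0 or beta_ds >= 0.0:
        raise ValueError("slopes do not correspond to 0 < kappa < 1 with gamma > 0")
    gamma = beta_dm - beta_ds
    kappa = -beta_ds / gamma
    return kappa, gamma


# Round trip: two slopes <-> two parameters, so neither model is more flexible
b_dm, b_ds = harm_aversion_to_logistic(kappa=0.3, gamma=2.0)  # ≈ (1.4, -0.6)
k, g = logistic_to_harm_aversion(b_dm, b_ds)                   # ≈ (0.3, 2.0)
```

      The round trip illustrates why the two models fit the same choice data equally well: κ and γ are simply a different coordinate system for the two slopes.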

      We also compared the harm aversion model with and without the constant term β<sub>0</sub> in the choice function. Adding a constant term β<sub>0</sub>, Equation 2 above becomes:

      As the following figure shows, the hyperaltruistic parameters (κ<sub>other</sub>-κ<sub>self</sub>) calculated from the harm aversion model with the constant term (panels A & B) have almost identical patterns as the model without the constant term (panels C & D, i.e. Figs. 2B & 4B in the original manuscript) in both studies.

      Author response image 1.

      (3d) The interpretation of the main OXT results needs to be more cautious. According to the operationalization, "hyperaltruistic" is the reduction of pain of others (higher % of choosing the less painful option) relative to the self. But relative to the placebo (as baseline), OXT did not increase the % of choosing the less painful option for others; rather, it decreased the % of choosing the less painful option for themselves. In other words, the degree of reducing others' pain is the same under OXT and placebo, but the degree of benefiting self-interest is reduced under OXT. I think this needs to be unpacked, and some of the wording needs to be changed. I am not very familiar with the OXT literature, but I believe it is very important to differentiate whether OXT is doing something on self-oriented actions vs other-oriented actions. Relatedly, for results such as that in Figure 5A, it would be helpful to not only look at the difference but also the actual magnitude of the sensitivity to the shocks, for self and others, under OXT and placebo.

      We thank the reviewer for this thoughtful comment. As the reviewer correctly pointed out, “hyperaltruism” can be defined as a “higher % of choosing the less painful option for others relative to the self”. Closer examination of the results showed that both the degree of reducing others’ pain and the degree of reducing their own pain decreased under OXT (Figure 4A). More specifically, our results do not support the claim that “In other words, the degree of reducing others’ pain is the same under OXT and placebo, but the degree of benefiting self-interest is reduced under OXT.” Instead, the results show a significant reduction in the choice of the less painful option under OXT treatment in both the self and other conditions (interaction effect of OXT vs. placebo and self vs. other: F<sub>1,45</sub> = 16.812, P < 0.001, η<sup>2</sup> = 0.272; simple effect of OXT vs. placebo in the self-condition: F<sub>1,45</sub> = 59.332, P < 0.001, η<sup>2</sup> = 0.569; OXT vs. placebo in the other-condition: F<sub>1,45</sub> = 14.626, P < 0.001, η<sup>2</sup> = 0.245; repeated-measures ANOVA, see Figure 4A).

      We also performed mixed-effect logistic regression analyses in which subjects’ choices were regressed against ∆m and ∆s in the different valence (gain vs. loss) and recipient (self vs. other) conditions in both studies 1 & 2 (Supplementary Figs. 1 & 6). As we replot supplementary Fig. 6 and panel B (included as Supplementary Fig. 8 in the supplementary materials) in the above figure, we found a significant treatment × ∆<sub>s</sub> (difference in shock magnitude between the more and less painful options) interaction effect (β=-0.136±0.029, P < 0.001, 95% CI=[-0.192, -0.079]), indicating that subjects’ sensitivities towards pain were indeed different between the placebo and OXT treatments for both the self and other conditions. Furthermore, the significant four-way ∆<sub>s</sub> × treatment (OXT vs. Placebo) × context (gain vs. loss) × recipient (self vs. other) interaction effect (β=0.125±0.053, P=0.018, 95% CI=[0.022, 0.228]) in the regression analysis, followed by significant simple effects (in the OXT treatment: ∆<sub>s</sub> × recipient effect in the gain context: F<sub>1,45</sub> = 7.622, P = 0.008, η<sup>2</sup> = 0.145; ∆<sub>s</sub> × recipient effect in the loss context: F<sub>1,45</sub> = 7.966, P = 0.007, η<sup>2</sup> = 0.150), suggested that under OXT treatment, participants showed greater sensitivity toward ∆<sub>s</sub> (see asterisks in the OXT condition in panel B) in the other condition than in the self-condition, thus restoring hyperaltruistic behavior in the loss context.
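      As a quick consistency check, the η<sup>2</sup> values reported above match partial eta squared recovered from each F statistic and its degrees of freedom, η<sup>2</sup><sub>p</sub> = F·df<sub>effect</sub> / (F·df<sub>effect</sub> + df<sub>error</sub>). The helper below is an illustrative sketch, not part of the authors' analysis pipeline.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic: F·df1 / (F·df1 + df2)."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# F values reported above (all with df = 1, 45) and the η² reported alongside them
reported = {16.812: 0.272, 59.332: 0.569, 14.626: 0.245, 7.622: 0.145, 7.966: 0.150}
recovered = {f: round(partial_eta_squared(f, 1, 45), 3) for f in reported}
```

      Each recovered value agrees with the reported one to three decimals, so the η<sup>2</sup> column can be read as partial eta squared.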

      As the reviewer suggested, OXT’s effect on hyperaltruism does manifest separately in subjects’ harm sensitivities for self- and other-oriented actions. Following the reviewer’s suggestion, we examined the actual magnitude of the sensitivities to shocks in both the self and other conditions (panel B in the figure above). The administration of OXT (compared to the placebo treatment, panel B in the figure above) significantly reduced participants’ pain sensitivity (treatment × ∆<sub>s</sub>: β=-0.136±0.029, P < 0.001, 95% CI=[-0.192, -0.079]), yet also restored the harm sensitivity patterns in both the gain and loss conditions. These results are included in Supplementary Figs. 6 & 8 as well as in the main text.

      Recommendations:

      (1) For Figures 2A-B, it would be great to calculate the correlation separately for gain and loss, as in other figures.

      We assume the reviewer is referring to Figures 3A & B. We did not present the correlations separately for the gain and loss contexts because the correlations between an individual’s IH (instrumental harm) and IB (impartial beneficence) scores and hyperaltruistic preference were not significantly modulated by the contextual factors. The interaction effects in Figs. 3A & B and Supplementary Fig. 5 (also see Tables S1 & S2) are as follows. Study 1: valence × IH effect: β=0.016±0.022, t<sub>152</sub>=0.726, P=0.469; valence × IB effect: β=0.004±0.031, t<sub>152</sub>=0.115, P=0.908. Study 2, placebo condition: valence × IH effect: β=0.018±0.024, t<sub>84</sub>=0.030, P=0.463; valence × IB effect: β=0.051±0.030, t<sub>84</sub>=1.711, P=0.702. We have added these statistics to the main text following the reviewer’s suggestion.

      (2) "by randomly drawing a shock increment integer ∆s (from 1 to 19) such that [...] did not exceed 20 (𝑆+ ≤ 20)." I am not sure if a random drawing following a uniform distribution can guarantee S is smaller than 20. More details are needed. Same for the monetary magnitude.

      We apologize for the lack of clarity in the methods description. For the task design, we adopted the original design from previous literature (Crockett et al., 2014, 2017). More specifically:

      “Specifically, each trial was determined by a combination of the differences in shocks (Δs, ranging from 1 to 19 in increments of 1) and money (Δm, ranging from ¥0.2 to ¥19.8 in increments of ¥0.2) between the two options, resulting in a pool of 19×99=1881 [Δs, Δm] pairs. To ensure the trials were suitable for most subjects, we evenly distributed the desired ratio Δm / (Δs + Δm) between 0.01 and 0.99 across the 60 trials of each condition. For each trial, we selected the [Δs, Δm] pair from the pool closest to the specified Δm / (Δs + Δm) ratio, which was then used to determine the actual money and shock amounts of the two options. The shock amount (S<sub>less</sub>) for the less painful option was an integer drawn from the discrete uniform distribution [1, 19], constrained by S<sub>less</sub> + ∆s ≤ 20. Similarly, the money amount (M<sub>less</sub>) for the less painful option was drawn from a discrete uniform distribution [¥0.2, ¥19.8], with the constraint M<sub>less</sub> + ∆m ≤ 20. Once S<sub>less</sub> and M<sub>less</sub> were selected, the shock (S<sub>more</sub>) and money (M<sub>more</sub>) magnitudes for the more painful option were calculated as: S<sub>more</sub> = S<sub>less</sub> + ∆s, M<sub>more</sub> = M<sub>less</sub> + ∆m.”

      We have added these details to the methods section (Lines 520-533).
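      The trial-generation procedure can be sketched in Python as follows. This is a hypothetical re-implementation, not the authors' task code. It enforces S<sub>more</sub> ≤ 20 and M<sub>more</sub> ≤ ¥20 (with a strict inequality, the largest differences, Δs = 19 or Δm = ¥19.8, would admit no valid draw), handles money in integer units of ¥0.2 to avoid floating-point drift, and draws S<sub>less</sub> and M<sub>less</sub> uniformly from only the values that satisfy the constraints, which is equivalent to drawing from the full range and rejecting violating draws.

```python
import random
from typing import Dict, List, Optional, Tuple

SHOCK_CAP = 20          # S_more may not exceed 20
MONEY_CAP_UNITS = 100   # ¥20 expressed in ¥0.2 units
UNIT = 0.2              # ¥ per money unit


def candidate_pairs() -> List[Tuple[int, int]]:
    """All 19 x 99 = 1881 [Δs, Δm] pairs; Δm is stored in ¥0.2 units."""
    return [(ds, dm) for ds in range(1, 20) for dm in range(1, 100)]


def target_ratios(n_trials: int = 60, lo: float = 0.01, hi: float = 0.99) -> List[float]:
    """Desired Δm / (Δs + Δm) ratios, evenly spaced from lo to hi."""
    step = (hi - lo) / (n_trials - 1)
    return [lo + i * step for i in range(n_trials)]


def closest_pair(ratio: float, pairs: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Pair whose Δm / (Δs + Δm), with Δm in yuan, is nearest to the target."""
    return min(pairs, key=lambda p: abs(p[1] * UNIT / (p[0] + p[1] * UNIT) - ratio))


def make_trials(n_trials: int = 60, seed: Optional[int] = None) -> List[Dict[str, float]]:
    """Generate one condition's trials (hypothetical helper)."""
    rng = random.Random(seed)
    pairs = candidate_pairs()
    trials = []
    for ratio in target_ratios(n_trials):
        ds, dm = closest_pair(ratio, pairs)
        # Uniform over the values that keep the more painful option within the caps
        s_less = rng.randint(1, SHOCK_CAP - ds)
        m_less = rng.randint(1, MONEY_CAP_UNITS - dm)
        trials.append({
            "s_less": s_less,
            "s_more": s_less + ds,
            "m_less": round(m_less * UNIT, 1),
            "m_more": round((m_less + dm) * UNIT, 1),
        })
    return trials
```

      For example, `make_trials(seed=1)` returns 60 trial dictionaries for one condition; fixing the seed makes a trial list reproducible across subjects if desired.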

      Reviewer #2:

      (1) The theoretical hypothesis needs to be better justified. There are studies addressing the neurobiological mechanism of hyperaltruistic tendency, which the authors unfortunately skipped entirely.

      Also in recommendation #1:

      (1) In the Introduction, the authors claim that "the mechanistic account of the hyperaltruistic phenomenon remains unknown". I think this is too broad of a criticism and does not do justice to prior work that does provide some mechanistic account of this phenomenon. In particular, I was surprised that the authors did not mention at all a relevant fMRI study that investigates the neural mechanism underlying hyperaltruistic tendency (Crockett et al., 2017, Nature Neuroscience). There, the researchers found that individual differences in hyperaltruistic tendency in the same type of moral decision-making task is better explained by reduced neural responses to ill-gotten money (Δm in the Other condition) in the brain reward system, rather than heightened neural responses to others' harm. Moreover, such neural response pattern is related to how an immoral choice would be judged (i.e., blamed) by the community. Since the brain reward system is consistently involved in Oxytocin's role in social cognition and decision-making (e.g., Dolen & Malenka, 2014, Biological Psychiatry), it is important to discuss the hypothesis and results of the present research in the context of this literature.

      We agree with the reviewer that the expression “mechanistic account of the hyperaltruistic phenomenon remains unknown” in our original manuscript can be misleading to the audience. We were aware of the major findings in the field and cited the seminal work on hyperaltruism and its neural mechanisms (Crockett et al., 2014, 2015, 2017). We have revised the introduction to better reflect this point and added further discussion of how oxytocin might play a role:

      “For example, it was shown that the hyperaltruistic preference modulated neural representations of the profit gained from harming others via the functional connectivity between the lateral prefrontal cortex, a brain area involved in moral norm violation, and profit-sensitive brain regions such as the dorsal striatum[6].” (Lines 41~45)

      “Oxytocin has been shown to play a critical role in social interactions such as maternal attachment, pair bonding, consociate attachment and aggression in a variety of animal models[42,43]. Humans are endowed with higher cognitive and affective capacities and exhibit far more complex social cognitive patterns[44]. ” (Lines 86~90)

      (2) There are some important inconsistencies between the preregistration and the actual data collection/analysis, which the authors did not justify.

      Also in recommendations:

      (4) It is laudable that the authors pre-registered the procedure and key analysis of the Oxytocin study and determined the sample size beforehand. However, in the preregistration, the authors claimed that they would recruit 30 participants for Experiment 1 and 60 for Experiment 2, without justification. In the paper, they described a "prior power analysis", which deviated from their preregistration. It is OK to deviate from preregistration, but this needs to be explicitly mentioned and addressed (why the deviation occurred, why the reported approach was justifiable, etc.).

      We sincerely appreciate the reviewer’s thorough assessment of our manuscript. In the more exploratory study 1, we found that the loss decision context effectively diminished subjects’ hyperaltruistic preference. Based on this finding, we pre-registered study 2 and hypothesized that: 1) the administration of OXT may salvage subjects’ hyperaltruistic preference in the loss context; 2) the administration of OXT may reduce subjects’ sensitivities towards electric shocks (but not necessarily their moral preference), given the well-established results relating OXT to enhanced empathy for others (Barchi-Ferreira & Osório, 2021; Radke et al., 2013) and to the processing of negative stimuli (Evans et al., 2010; Kirsch et al., 2005; Wu et al., 2020); and 3) the OXT effect might be context specific, depending on the particular combination of valence (gain vs. loss) and shock recipient (self vs. other) (Abu-Akel et al., 2015; Kapetaniou et al., 2021; Ma et al., 2015).

      As our results suggested, the administration of OXT indeed restored subjects’ hyperaltruistic preference (confirming hypothesis 1, Figure 4A). Also, OXT decreased subjects’ sensitivities towards electric shocks in both the gain and loss conditions (supplementary Fig. 6 and supplementary Fig. 8), consistent with our second hypothesis. We must admit that our hypothesis 3 was rather vague, since a seminal study clearly demonstrated the context-dependent effect of OXT in human cooperation and conflict depending on the group membership of the subjects (De Dreu et al., 2010, 2020). Although our results partially validated our hypothesis 3 (supplementary Fig. 6), we did not make specific predictions as to the direction and the magnitude of the OXT effect.

      The main inconsistency is related to the sample size. When we carried out study 1, we recruited both male and female subjects. After we identified the context effect on hyperaltruistic preference, we decided to pre-register and perform study 2 (the OXT study). We originally made a rough estimate of 60 male subjects for study 2. While conducting study 2, we also reviewed the literature on OXT effects on social behavior and realized that a sample of around 45 subjects might be sufficient to detect the main effect of OXT. Therefore, we settled on the sample of 46 (study 2) reported in the manuscript. Correspondingly, we increased the sample size of study 1 to a final total of 80 (40 males) to make sure it was sufficient to detect a small-to-medium effect and to allow a fair comparison between studies 1 and 2 (roughly equal numbers of male subjects). It should be noted that although we reported results from all subjects (male & female) of study 1 in the manuscript, the main results remain very similar if we focus only on the male subjects in study 1 (see the figure below). We believe that these results, together with the placebo-group results in study 2 (male only), confirm the validity of our original finding.

      Author response image 2.

      Author response image 3.

      We have included additional texts (Lines 447 ~ 452) in the Methods section for the discrepancy between the preregistered and actual sample sizes in the revised manuscript:

      “It should be noted that in the preregistration we originally planned to recruit 60 male subjects for Study 2 but ended up recruiting 46 male subjects (mean age =  years) based on the sample sizes reported in previous oxytocin studies[57,69]. Additionally, a power analysis suggested that a sample size > 44 should be sufficient to detect a small-to-medium effect size of oxytocin (Cohen’s d=0.24, α=0.05, 1-β=0.8) using a 2 × 2 × 2 within-subject design[76].”

      (3) Some of the exploratory analysis seems underpowered (e.g., large multiple regression models with only about 40 participants).

      We thank the reviewer for the comments and appreciate the concern that sample size could affect the reliability of the multiple regression results.

      In Fig. 2, the multiple regression analyses were conducted after we observed a valence-dependent effect on hyperaltruism (Fig. 2A) and the regression was constructed accordingly:

      Choice ~ ∆s*context*recipient + ∆m*context*recipient + (1 + ∆s*context*recipient + ∆m*context*recipient | subject)

      where ∆s and ∆m indicate the differences in shock level and monetary amount between the more and less painful options, context denotes the monetary valence (gain vs. loss), and recipient denotes the identity of the shock recipient (self vs. other).
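      To make the fixed-effect part of this formula explicit, the sketch below expands `∆s*context*recipient + ∆m*context*recipient` into its columns for a single trial. The dictionary keys and the numeric coding of context and recipient (±0.5 here) are assumptions for illustration. Terms shared by both products (context, recipient and context:recipient) enter only once, giving 12 fixed-effect columns including the intercept; the random-effects part of the formula is not shown.

```python
from itertools import combinations
from typing import Dict, Sequence


def expand_fixed_effects(row: Dict[str, float],
                         predictors: Sequence[str] = ("ds", "dm"),
                         factors: Sequence[str] = ("context", "recipient")) -> Dict[str, float]:
    """Columns of ds*context*recipient + dm*context*recipient for one trial.

    Each product a*b*c contributes all main effects and interactions of a, b, c;
    terms shared between the two products are stored once under the same name."""
    columns = {"Intercept": 1.0}
    for predictor in predictors:
        terms = (predictor, *factors)
        for size in range(1, len(terms) + 1):
            for combo in combinations(terms, size):
                value = 1.0
                for name in combo:
                    value *= row[name]
                columns[":".join(combo)] = value
    return columns


# Hypothetical trial: Δs = 4, Δm = ¥3.0, loss context (+0.5), other recipient (+0.5)
trial = {"ds": 4.0, "dm": 3.0, "context": 0.5, "recipient": 0.5}
cols = expand_fixed_effects(trial)
```

      Note that the formula contains no ∆s × ∆m interaction, so the two sensitivities are estimated separately within each context × recipient cell.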

      Since we have 240 trials for each subject and a total of 80 subjects in Study 1, we believe that this is a reasonable regression analysis to perform.

      In Fig. 3, the multiple regression analyses were indeed exploratory. More specifically, we ran 3 multiple linear regressions:

      hyperaltruism~EC*context+IH*context+IB*context

      Relative harm sensitivity~ EC*context+IH*context+IB*context

      Relative money sensitivity~ EC*context+IH*context+IB*context

      where hyperaltruism is defined as κ<sub>other</sub> - κ<sub>self</sub>, relative harm sensitivity as other β<sub>∆s</sub> - self β<sub>∆s</sub>, and relative money sensitivity as other β<sub>∆m</sub> - self β<sub>∆m</sub>. EC (empathic concern), IH (instrumental harm) and IB (impartial beneficence) were subjects’ scores from the corresponding questionnaires.
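      A minimal sketch of these exploratory regressions, assuming numpy, one row per subject and context (so each subject contributes a gain row and a loss row) and context coded 0/1. It builds the design matrix for `y ~ EC*context + IH*context + IB*context` and solves it by ordinary least squares. The function names are hypothetical, and unlike a full analysis this sketch does not account for the correlation between the two rows from each subject.

```python
import numpy as np


def relative_indices(kappa, beta_ds, beta_dm):
    """Per-subject outcomes; each argument maps 'self' and 'other' to an estimate."""
    return {
        "hyperaltruism": kappa["other"] - kappa["self"],
        "relative_harm_sensitivity": beta_ds["other"] - beta_ds["self"],
        "relative_money_sensitivity": beta_dm["other"] - beta_dm["self"],
    }


COLUMNS = ["Intercept", "EC", "IH", "IB", "context",
           "EC:context", "IH:context", "IB:context"]


def design_matrix(ec, ih, ib, context):
    """Columns for y ~ EC*context + IH*context + IB*context."""
    ec, ih, ib, context = (np.asarray(a, dtype=float) for a in (ec, ih, ib, context))
    return np.column_stack([np.ones_like(ec), ec, ih, ib, context,
                            ec * context, ih * context, ib * context])


def fit_ols(y, X):
    """Least-squares coefficients, keyed in the order given by COLUMNS."""
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return dict(zip(COLUMNS, coef))
```

      Swapping the outcome for the relative harm or relative money sensitivity gives the other two regressions; the IH:context coefficient is the term that captures a context-dependent IH effect.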

      For the first regression, we tested whether EC, IH and IB scores were related to hyperaltruism and it should be noted that this was tested on 80 subjects (Study 1). After we identified the effect of IH on hyperaltruism, we ran the following two regressions. The reason we still included IB and EC as predictors in these two regression analyses was to remove potential confounds caused by EC and IB since previous research indicated that IB, IH and EC could be correlated (Kahane et al., 2018).

      In study 2, we performed the following regression analyses again to validate our results (Placebo treatment in study 2 should have similar results as found in study 1).

      Relative harm sensitivity~ EC*context+IH*context+IB*context

      Relative money sensitivity~ EC*context+IH*context+IB*context

      Again, we added IB and EC only to control for nuisance effects of these covariates. As indicated in Fig. 5 C-D, the placebo condition in study 2 replicated our previous findings in study 1, and OXT administration effectively removed the interaction effect between IH and valence (gain vs. loss) on subjects’ relative harm sensitivity.

      To more objectively present our data and results, we have changed the texts in the results section and pointed out that the regression analysis:

      hyperaltruism~EC*context+IH*context+IB*context

      was exploratory (Lines 186-192).

      “We tested how hyperaltruism was related to both IH and IB across decision contexts using an exploratory multiple regression analysis. Moral preference, defined as κ<sub>other</sub> - κ<sub>self</sub>, was negatively associated with IH (β=-0.031±0.011, t<sub>156</sub>=-2.784, P =0.006) but not with IB (β=0.008±0.016, t<sub>156</sub>=0.475, P=0.636) across gain and loss contexts, reflecting a general connection between moral preference and IH (Fig. 3A & B).”

      (4) Inaccurate conceptualization of utilitarian psychology and the questionnaire used to measure it.

      Also in recommendations:

      (2) Throughout the paper, the authors placed lots of weight on individual differences in utilitarian psychology and the Oxford Utilitarianism Scale (OUS). I am not sure this is the best individual difference measure in this context. I don't see a conceptual fit between the psychological construct that OUS reflects, and the key psychological processes underlying the behaviors in the present study. As far as I understand it, the conceptual core of utilitarian psychology that OUS captures is the maximization of greater goods. Neither the Instrumental Harm (IH) component nor the Impartial Beneficence (IB) component reflects a tradeoff between the personal interests of the decision-making agent and a moral principle. The IH component is about the endorsement of harming a smaller number of individuals for the benefit of a larger number of individuals. The IB component is about treating self, close others, and distant others equally. However, the behavioral task used in this study is neither about distributing harm between a smaller number of others and a larger number of others nor about benefiting close or distant others. The fact that IH showed some statistical association with the behavioral tendency in the present data set could be due to the conceptual overlap between IH and an individual's tendency to inflict harm (e.g., psychopathy; Table 7 in Kahane et al., 2018, which the authors cited). I urge the authors to justify more why they believe that conceptually OUS is an appropriate individual difference measure in the present study, and if so, interpret their results in a clearer and justifiable manner (taking into account the potential confound of harm tendency/psychopathy).

      We thank the reviewer for the thoughtful comment and agree that the “IH component is about the endorsement of harming a smaller number of individuals for the benefit of a larger number of individuals. The IB component is about treating self, close others, and distant others equally.” As mentioned in our previous response, we first ran an exploratory multiple linear regression of hyperaltruistic preference (κ<sub>other</sub> - κ<sub>self</sub>) on IB and IH in study 1. This analysis was based on the hypothesis that the reduction of hyperaltruistic preference in the loss condition might be due to 1) a change in the relationship between subjects’ IB and their hyperaltruistic preference across the gain and loss conditions, and/or 2) a change, induced by the loss condition, in how the moral norm was perceived, which would in turn affect the correlation between IH and hyperaltruistic preference. As Fig. 3 shows, we found no significant effect of IB on hyperaltruistic preference (κ<sub>other</sub> - κ<sub>self</sub>), nor on relative harm or money sensitivity (supplementary Fig. 3). These results rule out the possibility that subjects with higher IB treat self and others more equally and therefore show less hyperaltruistic preference. In contrast, we found a significant negative association between hyperaltruistic preference and IH (Fig. 3A): subjects with higher IH scores showed less hyperaltruistic preference. Because hyperaltruistic preference (κ<sub>other</sub> - κ<sub>self</sub>) is a compound variable, we further decomposed it into subjects’ relative sensitivities to harm and money (other β<sub>∆s</sub> - self β<sub>∆s</sub> and other β<sub>∆m</sub> - self β<sub>∆m</sub>, respectively). Follow-up regression analyses revealed that the correlation between subjects’ relative harm sensitivity and IH was altered by the decision context (gain vs. loss, Fig. 3C-D).
      These results are consistent with our hypothesis that, for subjects to engage in utilitarian calculation, they must first recognize that there is a moral dilemma (harming others for monetary gain in the gain condition). When there is less perceived moral conflict (because the decision context is framed as avoiding a loss in the loss condition), the correlation between subjects’ relative harm sensitivity and IH becomes non-significant (Fig. 3C). Notably, these results were replicated in the placebo condition of study 2, further supporting the view that OXT acts by affecting how the decision context is morally framed.
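      To make the decomposition above concrete, the following minimal sketch shows how κ, the hyperaltruism index κ<sub>other</sub> - κ<sub>self</sub>, and the relative sensitivities relate to one another. It is an illustration under stated assumptions, not the analysis code used in the manuscript: it assumes a Crockett et al. (2014)-style model in which the value difference favoring the more-money/more-shock option is ΔV = (1 - κ)Δm - κΔs and choices follow a logistic function with inverse temperature γ. Under that assumption the logistic weights are β<sub>∆m</sub> = γ(1 - κ) and β<sub>∆s</sub> = γκ, so κ = β<sub>∆s</sub> / (β<sub>∆s</sub> + β<sub>∆m</sub>). All class and function names, and the numbers in the usage example, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ChoiceWeights:
    """Hypothetical per-recipient logistic weights (assumed non-negative)."""
    beta_shock: float  # weight on the shock difference (harm sensitivity)
    beta_money: float  # weight on the money difference (money sensitivity)


def kappa(w: ChoiceWeights) -> float:
    # Assumed model: beta_shock = gamma * kappa, beta_money = gamma * (1 - kappa),
    # so kappa is the shock weight's share of the total weight.
    return w.beta_shock / (w.beta_shock + w.beta_money)


def p_choose_harmful(dm: float, ds: float, k: float, gamma: float) -> float:
    # Probability of choosing the option with more money and more shocks,
    # given value difference dV = (1 - k) * dm - k * ds (assumed form).
    dv = (1.0 - k) * dm - k * ds
    return 1.0 / (1.0 + math.exp(-gamma * dv))


def hyperaltruism(other: ChoiceWeights, self_w: ChoiceWeights) -> float:
    # Hyperaltruistic preference: kappa_other - kappa_self.
    return kappa(other) - kappa(self_w)


def relative_sensitivities(other: ChoiceWeights, self_w: ChoiceWeights):
    # (other beta_ds - self beta_ds, other beta_dm - self beta_dm)
    return (other.beta_shock - self_w.beta_shock,
            other.beta_money - self_w.beta_money)


if __name__ == "__main__":
    other = ChoiceWeights(beta_shock=1.2, beta_money=0.8)  # illustrative values
    self_w = ChoiceWeights(beta_shock=0.8, beta_money=1.2)
    print(round(kappa(other), 2), round(kappa(self_w), 2))  # 0.6 0.4
    print(round(hyperaltruism(other, self_w), 2))           # 0.2
```

      Because κ is a ratio, the same hyperaltruism index can arise from a change in harm sensitivity, in money sensitivity, or both, which is why the relative sensitivities are examined separately.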

      The reviewer also raised an interesting possibility: the correlation between subjects’ behavioral tendency and IH may be confounded by the fact that IH is also correlated with other traits such as psychopathy. Indeed, Kahane et al. (2018) showed that IH was associated with subclinical psychopathy in a lay population. Although we only collected IB and empathic concern (EC) scores as control variables and therefore cannot in principle rule out an influence of psychopathy, we argue that such an influence is unlikely. First, psychopaths by definition “only care about their own good” (Kahane et al., 2018). However, subjects in our studies, as in previous research, showed greater aversion to harming others than to harming themselves in the gain condition, which is the opposite of what psychopathy would predict. Even in the loss condition, subjects showed similar levels of aversion to harming others and to harming themselves, indicating that they valued their own and others’ well-being similarly. Second, although there appears to be an association between utilitarian judgement and psychopathy (Glenn et al., 2010; Kahane et al., 2015), the fact that people also display a form of universal or impartial beneficence in their utilitarian judgements suggests that psychopathy alone cannot sufficiently explain subjects’ hyperaltruistic behavior.

      We have therefore rewritten part of the Results to clarify our rationale for using the Oxford Utilitarianism Scale (especially its IH and IB subscales) to relate moral traits to subjects’ decision preferences (Lines 212-215):

      “Furthermore, our results are consistent with the claim that profiting from inflicting pain on another person (IH) is inherently deemed immoral[1]. Hyperaltruistic preference, therefore, is likely to be associated with subjects’ IH dispositions.”

      (3) Relatedly, in the Discussion, the authors mentioned "the money-pain trade-off task, similar to the well-known trolley dilemma". I am not sure if this statement is factually accurate because the "well-known trolley dilemma" is about a disinterested third-party weighing between two moral requirements - "greatest good for the greatest number" (utilitarianism) and "do no harm" (Kantian/deontology), not between a moral requirement and one's own monetary interest (which is the focus of the present study). The analogy would be more appropriate if the task required the participants to trade off between, for example, harming one person in exchange for a charitable donation, as a recent study employed (Siegel et al., 2022, A computational account of how individuals resolve the dilemma of dirty money. Scientific reports). I urge the authors to go through their use of "utilitarian/utilitarianism” in the paper and make sure their usage aligns with the definition of the concept and the philosophical implications.

      We thank the reviewer for prompting us to consider the difference between our task and the trolley dilemma. Indeed, the trolley dilemma concerns a disinterested third party’s choice between two moral requirements, namely utilitarianism and deontology. In our study, when the shock recipient was “other”, our task could be interpreted either as a decision between the “moral norm of no harm (deontology) and one’s self-interest maximization (utilitarian)”, or as a decision between the “greatest good for both parties (utilitarian) vs. do no harm (deontology)”, although the latter interpretation typically requires differential weighting of one’s own benefits versus the benefits of others (Fehr & Schmidt, 1999; Saez et al., 2015). In fact, it could be argued that the utilitarian account applies not only to a third party’s well-being, but also to our own well-being, or to “that of those near or dear to us” (Kahane et al., 2018).

      We acknowledge that there may be no direct analogy between our task and the trolley dilemma and have therefore deleted the trolley example from the Discussion.

      (5) Related to the above point, the sample size of Study 2 was calculated based on the main effect of oxytocin. However, the authors also reported several regression models that seem to me more like exploratory analyses. Their sample size may not be sufficient for these analyses. The authors should: a) explicitly distinguish between their hypothesis-driven analysis and exploratory analysis; b) report achieved power of their analysis.

      We appreciate the reviewer’s thorough reading of our manuscript. Following the reviewer’s suggestions, we have explicitly stated in the revised manuscript which analyses were exploratory and which were hypothesis-driven. As requested, we have also added the achieved power to the main text (Lines 274-279):

      “The effect size (Cohen’s f<sup>2</sup>) for this exploratory analysis was calculated to be 0.491 and 0.379 for the placebo and oxytocin conditions, respectively. The post hoc power analysis with a significance level of α = 0.05, 7 regressors (IH, IB, EC, decision context, IH×context, IB×context, and EC×context), and sample size of N = 46 yielded achieved power of 0.910 (placebo treatment) and 0.808 (oxytocin treatment).”
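      For readers who wish to check figures of this kind, the sketch below approximates post hoc power for the omnibus F-test of a multiple regression by simulation. It is not the procedure used for the values quoted above. It assumes the G*Power convention for a fixed-model regression (noncentrality λ = f<sup>2</sup> × N, numerator df equal to the number of regressors, denominator df = N - k - 1), and it replaces the exact noncentral F distribution with Monte Carlo draws, so results depend slightly on the seed and number of draws. Function names are hypothetical.

```python
import random


def cohens_f2(r_squared: float) -> float:
    """Cohen's f^2 for a multiple regression: R^2 / (1 - R^2)."""
    return r_squared / (1.0 - r_squared)


def _chi2(df: float, rng: random.Random) -> float:
    # A central chi-square(df) variate is Gamma(shape=df/2, scale=2).
    return rng.gammavariate(df / 2.0, 2.0)


def _noncentral_chi2(df: int, nc: float, rng: random.Random) -> float:
    # (Z + sqrt(nc))^2 + chi-square(df - 1) ~ noncentral chi-square(df, nc).
    z = rng.gauss(0.0, 1.0) + nc ** 0.5
    return z * z + (_chi2(df - 1, rng) if df > 1 else 0.0)


def post_hoc_power(f2, n, k, alpha=0.05, draws=40_000, seed=1):
    """Monte Carlo power of the omnibus regression F-test.

    Assumes noncentrality lambda = f2 * n (G*Power convention),
    df1 = k regressors and df2 = n - k - 1.
    """
    rng = random.Random(seed)
    df1, df2 = k, n - k - 1
    nc = f2 * n
    # Critical value: empirical (1 - alpha) quantile of the central F(df1, df2).
    null = sorted((_chi2(df1, rng) / df1) / (_chi2(df2, rng) / df2)
                  for _ in range(draws))
    f_crit = null[int((1.0 - alpha) * draws)]
    # Power: share of noncentral F draws exceeding the critical value.
    hits = sum(
        (_noncentral_chi2(df1, nc, rng) / df1) / (_chi2(df2, rng) / df2) > f_crit
        for _ in range(draws)
    )
    return hits / draws


if __name__ == "__main__":
    # Effect sizes, regressor count and sample size taken from the text above.
    for label, f2 in (("placebo", 0.491), ("oxytocin", 0.379)):
        print(label, round(post_hoc_power(f2, n=46, k=7), 3))
```

      An exact calculation would evaluate the noncentral F distribution directly (for example with G*Power or SciPy); the simulation is only meant to make the ingredients of the calculation explicit.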

      (6) Did the authors collect reaction time (RT) information? Did the decision context and oxytocin modulate RT? Based on their procedure, it seems that the authors adopted a speeded response task; therefore, RT may reflect some psychological processes independent of choice. It is also possible (and recommended) that the authors use the drift-diffusion model to quantify latent psychological processes underlying moral decision-making. It would be interesting to see if their manipulations have any impact on those latent psychological processes, in addition to explicit choice, which is the endpoint product of the latent psychological processes. There are some examples of applying DDM to this task, which the authors could refer to if they decide to go down this route (Yu et al., 2021, How peer influence shapes value computation in moral decision-making. Cognition.)

      We did collect RT information in this experiment. As shown in Author response image 4, participants responded significantly more slowly in the loss context than in the gain context (Study 1: main effect of decision context: F<sub>1,79</sub>=20.043, P < 0.001, η<sup>2</sup>=0.202; Study 2, placebo: F<sub>1,45</sub>=17.177, P < 0.001, η<sup>2</sup>=0.276). In addition to this effect of context, decisions were significantly slower in the other-condition than in the self-condition (Study 1: main effect of recipient: F<sub>1,79</sub>=4.352, P = 0.040, η<sup>2</sup>=0.052; Study 2, placebo: F<sub>1,45</sub>=5.601, P = 0.022, η<sup>2</sup>=0.111), which replicates previous findings (Crockett et al., 2014). However, the difference in response time between recipients was not modulated by decision context (Study 1: context × recipient interaction: F<sub>1,79</sub>=1.538, P = 0.219, η<sup>2</sup>=0.019; Study 2, placebo: F<sub>1,45</sub>=2.631, P = 0.112, η<sup>2</sup>=0.055). Additionally, the oxytocin study (study 2) revealed no evidence of any effect of oxytocin on reaction time: neither the main effect of treatment (placebo vs. oxytocin) nor any interaction involving treatment was statistically significant (main effect of OXT treatment: F<sub>1,45</sub>=2.380, P < 0.230, η<sup>2</sup>=0.050; treatment × context: F<sub>1,45</sub>=2.075, P = 0.157, η<sup>2</sup>=0.044; treatment × recipient: F<sub>1,45</sub>=0.266, P = 0.609, η<sup>2</sup>=0.006; treatment × context × recipient: F<sub>1,45</sub>=2.909, P = 0.095, η<sup>2</sup>=0.061).

      Author response image 4.
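      The η<sup>2</sup> values reported for these single-degree-of-freedom effects appear to be partial η<sup>2</sup>, which can be recovered from each F statistic and its degrees of freedom as η<sup>2</sup><sub>p</sub> = F·df<sub>1</sub> / (F·df<sub>1</sub> + df<sub>2</sub>). The short check below is our own illustration rather than part of the original analysis; the function name is hypothetical, and the inputs are copied from the statistics reported above.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic: F*df1 / (F*df1 + df2)."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# (label, F, df1, df2, reported eta^2) taken from the RT results above
REPORTED = [
    ("Study 1, context", 20.043, 1, 79, 0.202),
    ("Study 2 placebo, context", 17.177, 1, 45, 0.276),
    ("Study 1, recipient", 4.352, 1, 79, 0.052),
    ("Study 2 placebo, recipient", 5.601, 1, 45, 0.111),
    ("OXT treatment x context", 2.075, 1, 45, 0.044),
]

if __name__ == "__main__":
    for label, f, d1, d2, eta in REPORTED:
        # Recomputed value next to the reported value.
        print(label, round(partial_eta_squared(f, d1, d2), 3), eta)
```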

      We agree that it would be interesting to investigate how OXT might affect the dynamics of the decision process using a drift-diffusion model (DDM). However, we have already shown in the original manuscript that OXT increased subjects’ relative harm sensitivity. In a canonical DDM, such an OXT effect would most likely correspond to an increased contribution of relative harm to the drift rate, which we feel is broadly consistent with the current framework. In future studies, additional manipulations such as time pressure might provide a more comprehensive way to investigate the effect of OXT on DDM-related decision variables such as attribute drift rate, initial bias, decision threshold and attribute synchrony.
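      To illustrate what “an increased contribution of relative harm to the drift rate” would mean, here is a minimal simulation of a canonical two-boundary drift-diffusion process. It is a sketch under our own assumptions, not a model fitted in the manuscript: the drift rate is taken to be a weighted attribute difference, v = w<sub>m</sub>Δm - w<sub>s</sub>Δs, the upper boundary stands for choosing the more-money/more-shock option, and all parameter values and function names are hypothetical. Raising the harm weight w<sub>s</sub>, analogous to the OXT effect on relative harm sensitivity, shifts choices away from the harmful option.

```python
import random


def simulate_ddm(drift, threshold=1.0, noise=1.0, dt=0.01, max_t=10.0, rng=None):
    """One trial of a symmetric two-boundary DDM starting at 0 (Euler scheme).

    Returns (choice, rt): choice is 1 if the upper boundary (harmful option)
    is reached first, 0 for the lower boundary, None if neither by max_t.
    """
    if rng is None:
        rng = random.Random()
    x, t = 0.0, 0.0
    sd = noise * dt ** 0.5  # standard deviation of each diffusion increment
    while t < max_t:
        x += drift * dt + rng.gauss(0.0, sd)
        t += dt
        if x >= threshold:
            return 1, t
        if x <= -threshold:
            return 0, t
    return None, t


def p_harmful(w_money, w_shock, d_money=1.0, d_shock=1.0, trials=2000, seed=0):
    # Assumed drift: weighted money difference minus weighted shock difference.
    drift = w_money * d_money - w_shock * d_shock
    rng = random.Random(seed)
    choices = [simulate_ddm(drift, rng=rng)[0] for _ in range(trials)]
    decided = [c for c in choices if c is not None]
    return sum(decided) / len(decided)


if __name__ == "__main__":
    # Low vs. high (hypothetical) harm weight with the same money weight.
    print(round(p_harmful(0.6, 0.2), 2), round(p_harmful(0.6, 1.0), 2))
```

      For symmetric boundaries at ±a with noise σ, the analytic probability of reaching the upper boundary is 1 / (1 + exp(-2va/σ<sup>2</sup>)), about 0.69 for v = 0.4 and 0.31 for v = -0.4, which the simulation should approximate up to sampling and discretization error.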

      (7) This is just a personal preference, but I would avoid metaphoric language in a scientific paper (e.g., rescue, salvage, obliterate). Plain, neutral English terms can express the same meaning clearly (e.g., restore, vanish, eliminate).

      Again, we thank the reviewer for the suggestion and have since modified the terms.

      Reviewer #3:

      The primary weakness of the paper concerns its framing. Although it purports to be measuring "hyper-altruism" it does not provide evidence to support why any of the behavior being measured is extreme enough to warrant the modifier "hyper" (and indeed throughout I believe the writing tends toward hyperbole, using, e.g., verbs like "obliterate" rather than "reduce"). More seriously, I do not believe that the task constitutes altruism, but rather the decision to engage, or not engage, in instrumental aggression.

      We agree with the reviewer (and reviewer # 2) that plain and clear English should be used to describe our results and have since modified those terms.

      However, the term “hyperaltruism”, which is the main theme of our study, was originally proposed in a seminal paper (Crockett et al., 2014) and has since been widely adopted in related studies (Crockett et al., 2014, 2015, 2017; Volz et al., 2017; Zhan et al., 2020). The term was introduced to emphasize its difference from altruism (Chen et al., 2024; FeldmanHall et al., 2015; Hu et al., 2021; Hutcherson et al., 2015; Lockwood et al., 2017; Xiong et al., 2020). Hyperaltruism does not denote extreme altruism. Instead, it simply reflects the fact that “we are more willing to sacrifice gains to spare others from harm than to spare ourselves from harm” (Volz et al., 2017). In other words, altruism refers to people’s unselfish regard for or devotion to the welfare of others, whereas hyperaltruism takes subjects’ own cost-benefit preference as the reference point and highlights the “additional” altruistic preference when others’ welfare is at stake. For example, in a typical altruism task design (left panel of Author response image 5), altruism is characterized by the degree to which subjects take other people’s welfare into account. In contrast, in a typical hyperaltruism task design (right panel), hyperaltruistic preference is operationally defined as the difference (κ<sub>other</sub> - κ<sub>self</sub>) between the degrees to which subjects value others’ harm (κ<sub>other</sub>) and their own harm (κ<sub>self</sub>).

      Author response image 5.

      I found it surprising that a paradigm that entails deciding to hurt or not hurt someone else for personal benefit (whether acquiring a financial gain or avoiding a loss) would be described as measuring "altruism." Deciding to hurt someone for personal benefit is the definition of instrumental aggression. I did not see, in any of the studies, a possibility of acting to benefit the other participant in any condition. Altruism is not equivalent to refraining from engaging in instrumental aggression. True altruism would be to accept shocks to the self for the other's benefit (e.g., money). The interpretation of this task as assessing instrumental aggression is supported by the fact that only the Instrumental Harm subscale of the OUS was associated with outcomes in the task, but not the Impartial Beneficence subscale. By contrast, the IB subscale is the one more consistently associated with altruism (e.g., Kahane et al., 2018; Amormino et al., 2022). I believe it is important, for scientific accuracy, that the paper, including the title, be rewritten to reflect what it is testing.

      Again, as mentioned in our previous response, hyperaltruism is a term coined almost a decade ago that has since been widely adopted in this research field. We are concerned that replacing the term would cause confusion rather than clarity among readers.

      Moreover, from the utilitarian perspective, gains and losses (or harm) to someone else lie on the same dimension, with no discontinuity between gains and losses. Therefore, acting to avoid someone else’s loss can also be viewed as altruistic behavior, similar to choices that increase others’ welfare (Liu et al., 2020).

      Relatedly: in the introduction I believe it would be important to discuss the non-symmetry of moral obligations related to help/harm--we have obligations not to harm strangers but no obligation to help strangers. This is another reason I do not think the term "hyper altruism" is a good description for this task--given it is typically viewed as morally obligatory not to harm strangers, choosing not to harm them is not "hyper" altruistic (and again, I do not view it as obviously altruism at all).

      We agree with the reviewer that we have a moral obligation not to harm others but no obligation to help strangers (Liu et al., 2020). In fact, this is exactly what we argued in our manuscript: when the decision context was switched from gains to losses, subjects were less likely to perceive their decisions as “harming others”. Furthermore, after OXT administration, subjects were more likely to perceive decisions in both the gain and loss contexts as harming others (Fig. 6A).

      The framing of the role of OT also felt incomplete. In introducing the potential relevance of OT to behavior in this task, it is important to pull in evidence from non-human animals on origins of OT as a hormone selected for its role in maternal care and defense (including defensive aggression). The non-human animal literature regarding the effects of OT is on the whole much more robust and definitive than the human literature. The evidence is abundant that OT motivates the defensive care of offspring of all kinds. My read of the present OT findings is that they increase participants' willingness to refrain from shocking strangers even when incurring a loss (that is, in a context where the participant is weighing harm to themselves versus harm to the other). It will be important to explain why OT would be relevant to refraining from instrumental aggression, again, drawing on the non-human animal literature.

      We thank the reviewer for these comments and agree that the link between our OT results and the animal literature can at best be described as vague, though intriguing. The current animal literature suggests that oxytocin in the nucleus accumbens (NAc) might play a critical role in social cognition and in reinforcing social interactions (Dölen et al., 2013; Dölen & Malenka, 2014; Insel, 2010). Although much insight has been gained from animal studies, social interactions in humans can take a variety of forms, and the recognition of consociates can be rather dynamic. For example, male participants who self-administered OT showed more trust and cooperation towards in-group members but more defensive aggression towards out-group members (De Dreu et al., 2010). In another human study, participants administered OT showed more coordinated out-group attack behavior, suggesting that OT might increase in-group efficiency at the cost of harming out-group members (Zhang et al., 2019). It is worth pointing out that in both experiments, participants’ group membership was artificially assigned, highlighting the context-dependent nature of OT effects in humans.

      Our experiment involves more complex, higher-level social cognitive processes such as moral framing and moral perception, and OT seems to play an important role in these processes. Unfortunately, like the studies mentioned above, this study therefore has no obvious non-human animal counterpart. Rather than relating OT to instrumental aggression, we aimed to provide a parsimonious framework that explains why “hyperaltruism” disappeared in the loss condition and, after OT administration, reappeared in both the gain and loss conditions, while also accounting for the effects of other relevant variables.

      We agree with the reviewer about the importance of animal research and have added the following paragraph to the revised manuscript (Lines 86-90), as well as to the Discussion:

      “Oxytocin has been shown to play a critical role in social interactions such as maternal attachment, pair bonding, consociate attachment and aggression in a variety of animal models[42,43]. Humans are endowed with higher cognitive and affective capacities and exhibit far more complex social cognitive patterns[44].”

      Another important limitation is the use of only male participants in Study 2. This was not an essential exclusion. It should be clear throughout sections of the manuscript that this study's effects can be generalized only to male participants.

      We thank the reviewer for this comment. Prior research has shown sex differences in oxytocin’s effects (Fischer-Shofty et al., 2013; Hoge et al., 2014; Lynn et al., 2014; Ma et al., 2016; MacDonald, 2013). Furthermore, because the menstrual cycle and potential pregnancy may confound OT effects in female subjects, most human OT studies have recruited only male subjects (Berends et al., 2019; De Dreu et al., 2010; Fischer-Shofty et al., 2010; Ma et al., 2016; Zhang et al., 2019). We have revised our manuscript to emphasize that study 2 recruited only male subjects.

      Recommendations:

      I believe the authors have provided an interesting and valuable dataset related to the willingness to engage in instrumental aggression - this is not the authors' aim, although also an important aim. Future researchers aiming to build on this paper would benefit from it being framed more accurately.

      Thus, I believe the paper must be reframed to accurately describe the nature of the task as assessing instrumental aggression. This is also an important goal, as well-designed laboratory models of instrumental aggression are somewhat lacking.

      As explained in our response above, to maintain continuity with previous research, we believe that the term hyperaltruism better reflects the main theme of this study.

      The research literature on other aggression tasks should also be brought in, as I believe these are more relevant to the present study than research studies on altruism that are primarily donation-type tasks. It should be added to the limitations of how different aggression in a laboratory task such as this one is from real-world immoral forms of aggression. Arguably, aggression in a laboratory task in which all participants are taking part voluntarily under a defined set of rules, and in which aggression constrained by rules is mutual, is similar to aggression in sports, which is not considered immoral. Whether responses in this task would generalize to immoral forms of aggression cannot be determined without linking responses in the task to some real-world outcome.

      We agree with the reviewer that “aggression in a lab task … is similar to aggression in sports”. Our starting point was to investigate the boundary conditions of hyperaltruism (although we do not deny that, given our experimental design, hyperaltruism has an aggression component). In other words, the dependent variable of interest was the difference between “other” and “self” aggression, not the aggression itself. Our results showed that when the decision context was switched from monetary gain to monetary loss, participants were willing to bear similar amounts of monetary loss to spare others and themselves from harm. That is, hyperaltruism disappeared in the loss condition. We interpreted this result as indicating that the loss condition prompted subjects to adopt a different moral framework (help vs. harm, Fig. 6A), so that subjects were less influenced by their instrumental harm trait (Fig. 3C). In study 2, we further tested this hypothesis and verified that OT administration indeed increased subjects’ perception of the task as harming others in both the gain and loss conditions (Fig. 6A), and that this moral perception mediated the relationship between subjects’ personality traits (instrumental harm) and their relative harm sensitivities (the difference in aggression between the other- and self-conditions). We believe that a moral perception framework, in which OT directly modulates moral perception, accounts for subjects’ context-dependent choices better than a hypothesized context-dependent modulation of aggression by OT.

      The language should also be toned down--the use of phrases like "hyper altruism" (without independent evidence to support that designation) and "obliterate" rather than "reduce" or "eliminate" are overly hyperbolic.

      We have changed terms such as “obliterate” and “eliminate” to plain English, as the reviewer suggested.

      Reference

      Abu-Akel, A., Palgi, S., Klein, E., Decety, J., & Shamay-Tsoory, S. (2015). Oxytocin increases empathy to pain when adopting the other- but not the self-perspective. Social Neuroscience, 10(1), 7–15.

      Barchi-Ferreira, A., & Osório, F. (2021). Associations between oxytocin and empathy in humans: A systematic literature review. Psychoneuroendocrinology, 129, 105268.

      Berends, Y. R., Tulen, J. H. M., Wierdsma, A. I., van Pelt, J., Feldman, R., Zagoory-Sharon, O., de Rijke, Y. B., Kushner, S. A., & van Marle, H. J. C. (2019). Intranasal administration of oxytocin decreases task-related aggressive responses in healthy young males. Psychoneuroendocrinology, 106, 147–154.

      Chen, J., Putkinen, V., Seppälä, K., Hirvonen, J., Ioumpa, K., Gazzola, V., Keysers, C., & Nummenmaa, L. (2024). Endogenous opioid receptor system mediates costly altruism in the human brain. Communications Biology, 7(1), 1–11.

      Crockett, M. J., Kurth-Nelson, Z., Siegel, J. Z., Dayan, P., & Dolan, R. J. (2014). Harm to others outweighs harm to self in moral decision making. Proceedings of the National Academy of Sciences of the United States of America, 111(48), 17320–17325.

      Crockett, M. J., Siegel, J. Z., Kurth-Nelson, Z., Dayan, P., & Dolan, R. J. (2017). Moral transgressions corrupt neural representations of value. Nature Neuroscience, 20(6), 879–885.

      Crockett, M. J., Siegel, J. Z., Kurth-Nelson, Z., Ousdal, O. T., Story, G., Frieband, C., Grosse-Rueskamp, J. M., Dayan, P., & Dolan, R. J. (2015). Dissociable Effects of Serotonin and Dopamine on the Valuation of Harm in Moral Decision Making. Current Biology, 25(14), 1852–1859.

      De Dreu, C. K. W., Greer, L. L., Handgraaf, M. J. J., Shalvi, S., Van Kleef, G. A., Baas, M., Ten Velden, F. S., Van Dijk, E., & Feith, S. W. W. (2010). The Neuropeptide Oxytocin Regulates Parochial Altruism in Intergroup Conflict Among Humans. Science, 328(5984), 1408–1411.

      De Dreu, C. K. W., Gross, J., Fariña, A., & Ma, Y. (2020). Group Cooperation, Carrying-Capacity Stress, and Intergroup Conflict. Trends in Cognitive Sciences, 24(9), 760–776.

      Dölen, G., Darvishzadeh, A., Huang, K. W., & Malenka, R. C. (2013). Social reward requires coordinated activity of nucleus accumbens oxytocin and serotonin. Nature, 501(7466), 179–184.

      Dölen, G., & Malenka, R. C. (2014). The Emerging Role of Nucleus Accumbens Oxytocin in Social Cognition. Biological Psychiatry, 76(5), 354–355.

      Evans, S., Shergill, S. S., & Averbeck, B. B. (2010). Oxytocin Decreases Aversion to Angry Faces in an Associative Learning Task. Neuropsychopharmacology, 35(13), 2502–2509.

      Fehr, E., & Schmidt, K. M. (1999). A Theory of Fairness, Competition, and Cooperation. The Quarterly Journal of Economics, 114(3), 817–868.

      FeldmanHall, O., Dalgleish, T., Evans, D., & Mobbs, D. (2015). Empathic concern drives costly altruism. Neuroimage, 105, 347–356.

      Fischer-Shofty, M., Levkovitz, Y., & Shamay-Tsoory, S. G. (2013). Oxytocin facilitates accurate perception of competition in men and kinship in women. Social Cognitive and Affective Neuroscience, 8(3), 313–317.

      Fischer-Shofty, M., Shamay-Tsoory, S. G., Harari, H., & Levkovitz, Y. (2010). The effect of intranasal administration of oxytocin on fear recognition. Neuropsychologia, 48(1), 179–184.

      Glenn, A. L., Koleva, S., Iyer, R., Graham, J., & Ditto, P. H. (2010). Moral identity in psychopathy. Judgment and Decision Making, 5(7), 497–505.

      Hoge, E. A., Anderson, E., Lawson, E. A., Bui, E., Fischer, L. E., Khadge, S. D., Barrett, L. F., & Simon, N. M. (2014). Gender moderates the effect of oxytocin on social judgments. Human Psychopharmacology: Clinical and Experimental, 29(3), 299–304.

      Hu, J., Hu, Y., Li, Y., & Zhou, X. (2021). Computational and Neurobiological Substrates of Cost-Benefit Integration in Altruistic Helping Decision. Journal of Neuroscience, 41(15), 3545–3561.

      Hutcherson, C. A., Bushong, B., & Rangel, A. (2015). A Neurocomputational Model of Altruistic Choice and Its Implications. Neuron, 87(2), 451–462.

      Insel, T. R. (2010). The Challenge of Translation in Social Neuroscience: A Review of Oxytocin, Vasopressin, and Affiliative Behavior. Neuron, 65(6), 768–779.

      Kahane, G., Everett, J. A. C., Earp, B. D., Caviola, L., Faber, N. S., Crockett, M. J., & Savulescu, J. (2018). Beyond sacrificial harm: A two-dimensional model of utilitarian psychology. Psychological Review, 125(2), 131–164.

      Kahane, G., Everett, J. A. C., Earp, B. D., Farias, M., & Savulescu, J. (2015). ‘Utilitarian’ judgments in sacrificial moral dilemmas do not reflect impartial concern for the greater good. Cognition, 134, 193–209.

      Kahneman, D., & Tversky, A. (1979). Prospect Theory: An Analysis of Decision under Risk. Econometrica, 47(2), 263.

      Kapetaniou, G. E., Reinhard, M. A., Christian, P., Jobst, A., Tobler, P. N., Padberg, F., & Soutschek, A. (2021). The role of oxytocin in delay of gratification and flexibility in non-social decision making. eLife, 10, e61844.

      Kirsch, P., Esslinger, C., Chen, Q., Mier, D., Lis, S., Siddhanti, S., Gruppe, H., Mattay, V. S., Gallhofer, B., & Meyer-Lindenberg, A. (2005). Oxytocin Modulates Neural Circuitry for Social Cognition and Fear in Humans. The Journal of Neuroscience, 25(49), 11489–11493.

      Liu, J., Gu, R., Liao, C., Lu, J., Fang, Y., Xu, P., Luo, Y., & Cui, F. (2020). The Neural Mechanism of the Social Framing Effect: Evidence from fMRI and tDCS Studies. The Journal of Neuroscience, 40(18), 3646–3656.

      Liu, Y., Li, L., Zheng, L., & Guo, X. (2017). Punish the Perpetrator or Compensate the Victim? Gain vs. Loss Context Modulate Third-Party Altruistic Behaviors. Frontiers in Psychology, 8, 2066.

      Lockwood, P. L., Hamonet, M., Zhang, S. H., Ratnavel, A., Salmony, F. U., Husain, M., & Maj, A. (2017). Prosocial apathy for helping others when effort is required. Nature Human Behaviour, 1(7), 131–131.

      Losecaat Vermeer, A. B., Boksem, M. A. S., & Sanfey, A. G. (2020). Third-party decision-making under risk as a function of prior gains and losses. Journal of Economic Psychology, 77, 102206.

      Lynn, S. K., Hoge, E. A., Fischer, L. E., Barrett, L. F., & Simon, N. M. (2014). Gender differences in oxytocin-associated disruption of decision bias during emotion perception. Psychiatry Research, 219(1), 198–203.

      Ma, Y., Liu, Y., Rand, D. G., Heatherton, T. F., & Han, S. (2015). Opposing Oxytocin Effects on Intergroup Cooperative Behavior in Intuitive and Reflective Minds. Neuropsychopharmacology, 40(10), 2379–2387.

      Ma, Y., Shamay-Tsoory, S., Han, S., & Zink, C. F. (2016). Oxytocin and Social Adaptation: Insights from Neuroimaging Studies of Healthy and Clinical Populations. Trends in Cognitive Sciences, 20(2), 133–145.

      MacDonald, K. S. (2013). Sex, Receptors, and Attachment: A Review of Individual Factors Influencing Response to Oxytocin. Frontiers in Neuroscience, 6, 194.

      Markiewicz, Ł., & Czupryna, M. (2018). Cheating: One Common Morality for Gain and Losses, but Two Components of Morality Itself. Journal of Behavioral Decision Making, 33(2), 166–179.

      Pachur, T., Schulte-Mecklenbeck, M., Murphy, R. O., & Hertwig, R. (2018). Prospect theory reflects selective allocation of attention. Journal of Experimental Psychology: General, 147(2), 147–169.

      Radke, S., Roelofs, K., & De Bruijn, E. R. A. (2013). Acting on Anger: Social Anxiety Modulates Approach-Avoidance Tendencies After Oxytocin Administration. Psychological Science, 24(8), 1573–1578.

      Saez, I., Zhu, L., Set, E., Kayser, A., & Hsu, M. (2015). Dopamine modulates egalitarian behavior in humans. Current Biology, 25(7), 912–919.

      Teoh, Y. Y., Yao, Z., Cunningham, W. A., & Hutcherson, C. A. (2020). Attentional priorities drive effects of time pressure on altruistic choice. Nature Communications, 11(1), 3534.

      Tom, S. M., Fox, C. R., Trepel, C., & Poldrack, R. A. (2007). The neural basis of loss aversion in decision-making under risk. Science, 315(5811), 515–518.

      Usher, M., & McClelland, J. L. (2004). Loss Aversion and Inhibition in Dynamical Models of Multialternative Choice. Psychological Review, 111(3), 757–769.

      Volz, L. J., Welborn, B. L., Gobel, M. S., Gazzaniga, M. S., & Grafton, S. T. (2017). Harm to self outweighs benefit to others in moral decision making. Proceedings of the National Academy of Sciences of the United States of America, 114(30), 7963–7968.

      Wu, Q., Mao, J., & Li, J. (2020). Oxytocin alters the effect of payoff but not base rate in emotion perception. Psychoneuroendocrinology, 114, 104608.

      Wu, S., Cai, W., & Jin, S. (2018). Gain or non-loss: The message matching effect of regulatory focus on moral judgements of other-orientation lies. International Journal of Psychology, 53(3), 223–227.

      Xiong, W., Gao, X., He, Z., Yu, H., Liu, H., & Zhou, X. (2020). Affective evaluation of others’ altruistic decisions under risk and ambiguity. NeuroImage, 218, 116996.

      Yechiam, E., & Hochman, G. (2013). Losses as modulators of attention: Review and analysis of the unique effects of losses over gains. Psychological Bulletin, 139(2), 497–518.

      Zhan, Y., Xiao, X., Tan, Q., Li, J., Fan, W., Chen, J., & Zhong, Y. (2020). Neural correlations of the influence of self-relevance on moral decision-making involving a trade-off between harm and reward. Psychophysiology, 57(9), e13590.

      Zhang, H., Gross, J., De Dreu, C., & Ma, Y. (2019). Oxytocin promotes coordinated out-group attack during intergroup conflict in humans. eLife, 8, e40698.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important work identifies a previously uncharacterized capacity for songbirds to recover vocal targets even without sensory experience. While the evidence supporting this claim is solid, with innovative experiments exploring vocal plasticity in deafened birds, additional behavioral controls and analyses are necessary to shore up the main claims. If improved, this work has the potential for broad relevance to the fields of vocal and motor learning.

      We were able to address the requests for additional behavioral controls about the balancing of the groups (reviewer 1) and the few individual birds that showed a different behavior (reviewer 2) without collecting any further data. See our detailed replies below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zai et al test if songbirds can recover the capacity to sing auditory targets without singing experience or sensory feedback. Past work showed that after the pitch of targeted song syllables is driven outside of birds' preferred target range with external reinforcement, birds revert to baseline (i.e. restore their song to their target). Here the authors tested the extent to which this restoration occurs in muted or deafened birds. If these birds can restore, this would suggest an internal model that allows for sensory-to-motor mapping. If they cannot, this would suggest that learning relies entirely on feedback-dependent mechanisms, e.g. reinforcement learning (RL). The authors find that deafened birds exhibit moderate but significant restoration, consistent with the existence of a previously under-appreciated internal model in songbirds.

      Strengths:

      The experimental approach of studying vocal plasticity in deafened or muted birds is innovative, technically difficult, and perfectly suited for the question of feedback-independent learning. The finding in Figure 4 that deafened birds exhibit subtle but significant plasticity toward restoration of their pre-deafening target is surprising and important for the songbird and vocal learning fields, in general.

      Weaknesses:

      The evidence and analyses related to the directed plasticity in deafened birds are confusing, and the magnitude of the plasticity is far less than the plasticity observed in control birds with intact feedback. The authors acknowledge this difference in a two-system model of vocal plasticity, but one wonders why the feedback-independent model, which could powerfully enhance learning speed, is weak in this songbird system.

      We fully agree with the reviewer. This surprising weakness reflects a limitation of the birds rather than of our approach for characterizing it.

      There remains some confusion about the precise pitch-change methods used to study the deafened birds, including the possibility that a critical cohort of birds was not suitably balanced in a way where deafened birds were tested on their ability to implement both pitch increases and decreases toward target restoration.

      Both deaf groups (WNd and dLO) were balanced: half of the birds (5/10 WNd and 4/8 dLO) shifted their pitch up (so target restoration corresponded to decreasing pitch) and the other half (5/10 WNd and 4/8 dLO) shifted their pitch down (so target restoration corresponded to increasing pitch); see Methods.

      To clarify the precise pitch-change method used, we added to the Methods an explanation of why we used the sensitivity index d′ in Fig. 4:

      We used the sensitivity d′ relative to the last 2 h of WN/LO instead of NRP because we wanted to detect a pitch change, which falls within the realm of signal detection theory, i.e., d′. Furthermore, because we measure local changes in pitch relative to the last 2 h of WN/LO reinforcement, our measurements are only minimally affected by any reinforcement learning that might have occurred during this 2 h time window; choosing an earlier or longer window would have blended reinforced pitch changes into our estimates. Last but not least, changing the way in which we normalized d′ values (dividing by S_B) or using the NRP relative to the last 2 h of WN/LO did not qualitatively change the results shown in Fig. 4D.
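
      To make the two measures concrete, here is a minimal Python sketch, not the authors' analysis code. It assumes a textbook d′ (difference in mean pitch divided by the pooled standard deviation of the two windows), a sign convention in which negative d′ means a change towards baseline (matching the negative d′ values reported for reverting WNd birds), and an NRP that equals 0 at baseline pitch and 1 at the pitch reached at the end of reinforcement. The function names and the `shift_direction` parameter are hypothetical.

```python
import statistics


def sensitivity_index(test, reference, shift_direction=1):
    """d' between pitch samples of a test window and a reference window.

    Assumptions (not taken from the manuscript):
    - d' = (mean difference) / (pooled SD of the two windows);
    - shift_direction is +1 for birds shifted up and -1 for birds shifted
      down, so that a negative d' always means a change towards baseline.
    """
    mean_diff = statistics.mean(test) - statistics.mean(reference)
    pooled_sd = ((statistics.variance(test) + statistics.variance(reference)) / 2) ** 0.5
    return shift_direction * mean_diff / pooled_sd


def nrp(pitch, baseline_pitch, end_of_reinforcement_pitch):
    """Hypothetical NRP: 0 at baseline pitch, 1 at the end-of-reinforcement pitch."""
    return (pitch - baseline_pitch) / (end_of_reinforcement_pitch - baseline_pitch)


# Made-up pitch values (Hz) for an up-shifted bird drifting back down.
reference = [652.0, 648.0, 650.0]  # last 2 h of WN/LO reinforcement
test = [640.0, 644.0, 642.0]       # later recovery window
print(sensitivity_index(test, reference, shift_direction=1))  # -4.0
print(nrp(625.0, baseline_pitch=600.0, end_of_reinforcement_pitch=650.0))  # 0.5
```

      Because d′ is computed against the immediately preceding reinforcement window, any drift accumulated before that window drops out of the difference of means, which is the property the paragraph above relies on.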

      Reviewer #2 (Public Review):

      Summary:

      This paper investigates the role of motor practice and sensory feedback when a motor action returns to a learned or established baseline. Adult male zebra finches perform a stereotyped, learned vocalization (song). It is possible to shift the pitch of particular syllables away from the learned baseline pitch using contingent white noise reinforcement. When the reinforcement is stopped, birds will return to their baseline over time. During the return, they often sing hundreds of renditions of the song. However, whether motor action, sensory feedback, or both during singing is necessary to return to baseline is unknown.

      Previous work has shown that there is covert learning of the pitch shift. If the output of a song plasticity pathway is blocked during learning, there is no change in pitch during the training. However, as soon as the pathway is unblocked, the pitch immediately shifts to the target location, implying that there is learning of the shift even without performance. Here, they ask whether the return to baseline from such a pitch shift also involves covert or overt learning processes. They perform a series of studies to address these questions, using muting and deafening of birds at different time points.

      Strengths:

      The overall premise is interesting and the use of muting and deafening to manipulate different aspects of motor practice vs. sensory feedback is a solid approach.

      Weaknesses:

      One of the main conclusions, which stems primarily from birds deafened after being pitch-shifted using white noise (WNd) in comparison to birds deafened before being pitch-shifted with light as a reinforcer (LOd), is that recent auditory experience can drive motor plasticity even when an individual is deprived of such experience. While the lack of shift back to baseline pitch in the LOd birds is convincing, the main conclusion hinges on the responses of just a few WNd individuals who are closer to baseline in the early period. Moreover, only 2 WNd individuals reached baseline in the late period, though neither of these were individuals who were closer to baseline in the early phase. Most individuals remain or return toward the reinforced pitch. These data highlight that while it may be possible for previous auditory experience during reinforcement to drive motor plasticity, the effect is very limited. Importantly, it's not clear if there are other explanations for the changes in these birds, for example, whether there are differences in the number of renditions performed or changes to other aspects of syllable structure that could influence measurements of pitch.

      We thank the reviewer for these detailed observations. We looked into the reviewer’s claim that our main conclusion of revertive pitch changes in deaf birds with target mismatch experience hinges on only a few WNd birds in the early period.

      When we remove the three birds that were close to baseline (NRP = 0) in the early period, we still see the same trend of WNd birds changing pitch back towards baseline: early d′ = −0.13, p = 0.24, tstat = −0.74, df = 6, N = 7 birds; late d′ = −1.26, p = 0.08, tstat = −1.63, df = 6, N = 7 birds (one-sided t-tests of H0: d′ = 0). Furthermore, even without these three birds, bootstrapping the difference between WNd and dC birds shows the same trend in the early period (p = 0.22) and a significant reversion in the late period (p < 0.001). Thus, reversion towards baseline in the late period is robustly observed at the population level, even when discounting the three individual birds that the reviewer suspected were responsible for the effect.
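
      A one-sided, one-sample t-test of H0: d′ = 0 of the kind reported above can be sketched in Python with SciPy as follows. This is an illustration under the assumption that each bird contributes one d′ value, not the authors' script; the helper names are hypothetical. The second helper recovers the one-sided p-value from a reported t statistic and its degrees of freedom, which offers a quick consistency check on the numbers quoted above.

```python
from scipy import stats


def one_sided_ttest_reversion(d_primes):
    """One-sample t-test of H0: mean d' = 0 against H1: mean d' < 0.

    Returns (t statistic, one-sided p-value, degrees of freedom).
    Requires SciPy >= 1.6 for the `alternative` argument.
    """
    result = stats.ttest_1samp(d_primes, popmean=0.0, alternative="less")
    return float(result.statistic), float(result.pvalue), len(d_primes) - 1


def one_sided_p_from_t(tstat, df):
    """Lower-tail p-value implied by a reported t statistic and df."""
    return float(stats.t.cdf(tstat, df))


# Consistency check against the reported values (N = 7 birds, df = 6).
print(round(one_sided_p_from_t(-0.74, 6), 2))  # 0.24 (early window)
print(round(one_sided_p_from_t(-1.63, 6), 2))  # 0.08 (late window)
```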

      Moreover, note that there are not two but three WNd individuals that reached baseline in the late period (see Figure 2C, D). One of them was already close to baseline in the early period and another one was already relatively close, too.

      Also, the considerable variability among birds is not surprising: variability across deaf birds is expected to be large, because their ongoing song degradation might cause pitch to drift over time since deafening.

      Last but not least, see also our multivariate model (below).

      With regard to the “differences in the number of renditions” that could explain pitch changes: deaf birds sing less after deafening than hearing birds. During the first 2 hours (early), they sing 87±59 renditions (WNd) and 410±330 renditions (dLO), compared to 616±272 renditions in control birds. Also, WNd birds sing only 4300±2300 motif renditions between the early and late periods, compared to the average of 11000±3400 renditions that hearing control birds produce in the same time period. However, despite these differences, when we give WNd birds more time to recover, namely 9 days after the early period, during which they sang on average 12000±6000 renditions, their NRP is still significantly different from zero (NRP = 0.37, p = 0.007, tstat = 3.47, df = 9). Thus, even after producing more practice songs, deaf birds do not recover baseline pitch, so the number of songs alone cannot explain why deaf birds do not fully recover pitch. We conclude that auditory experience seems to be necessary to recover song.

      We added this information to the Results.

      In this context, note that the interesting part of our work is not that deaf birds do not fully recover, but that they recover anything at all (“main conclusion”, Fig. 4). The number of songs does not explain why deaf birds with mismatch experience (WNd, which sang the least and significantly less than control birds, p = 2.3 × 10⁻⁶, two-tailed t-test) partially revert song towards baseline, unlike deaf birds without mismatch experience (dLO, which sang significantly more than WNd birds, p = 0.008, and indistinguishably from control birds, p = 0.1). We added this information to the Results section.

      With regard to ‘other aspects of syllable structure’: we did not look into this. Regardless of the outcome of such a hypothetical analysis, whether other syllable features change is irrelevant to our finding that deaf birds do not recover their target song. Nevertheless, note that in Zai et al. 2020 (Supplementary Figure 1), we analyzed features other than pitch in deaf birds. Absolute change in entropy variance was larger in deaf birds than in hearing birds, consistent with the literature on song degradation after deafening (Lombardino and Nottebohm, 2000, Nordeen and Nordeen 2010 and many others). In that paper, we found that only pitch changes consistently along the LO direction. All other features that we looked at (duration, AM, FM and entropy) did not change consistently with the LO contingency. We expect that a similar result would apply to the changes across the recovery period in WNd and dLO birds, i.e., that song degradation can be seen in many features and that pitch is the sole feature that changes consistently with the reinforcement (LO/WN) direction.

      While there are examples where the authors perform direct comparisons between particular manipulations and the controls, many of the statistical analyses test whether each group is above or below a threshold (e.g. baseline) separately and then make qualitative comparisons between those groups. Given the variation within the manipulated groups, it seems especially important to determine not just whether these are different from the threshold, but how they compare to the controls. In particular, a full model with time (early, late), treatment (deafened, muted, etc), and individual ID (random variable) would substantially strengthen the analysis.

      We fitted a full model of the NRP, as the reviewer suggested, and it supports our conclusions: neither muting, deafening, nor time without practice between the R and E windows has a significant effect on pitch in the E window, but the interaction between deafening and time (late, L) results in a significant pitch change (fixed effect 0.67, p = 2 × 10⁻⁶), demonstrating that deaf birds are significantly further away from baseline (NRP = 0) than hearing birds in late windows, thereby confirming that birds require auditory feedback to recover a distant pitch target. Importantly, we find a significant fixed effect of mismatch experience on pitch in the direction of the target (fixed effect −0.37, p = 0.006), supporting our finding that limited vocal plasticity towards a target is possible even without auditory feedback.

      We included this model as additional analysis to our manuscript.
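
      For readers who want the structure of such a model spelled out, one plausible way to write it is as a linear mixed-effects model with a random intercept per bird. This is a sketch under stated assumptions, not the exact specification in the manuscript: the indicators M (muted), D (deafened), L (late window) and X (mismatch experience), the covariate T (time without practice between the R and E windows), and the indices i (measurement window) and b (bird) are our own notation.

```latex
\mathrm{NRP}_{ib} = \beta_0 + \beta_M M_i + \beta_D D_i + \beta_T T_i + \beta_L L_i
  + \beta_{DL}\, D_i L_i + \beta_X X_i + u_b + \varepsilon_{ib},
\qquad u_b \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ib} \sim \mathcal{N}(0, \sigma^2)
```

      Under this reading, the reported deafening-by-late interaction corresponds to β_DL ≈ 0.67 (deaf birds further from NRP = 0 in late windows) and the mismatch effect to β_X ≈ −0.37 (a shift towards the target); the random intercept u_b absorbs stable bird-to-bird differences in pitch.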

      The muted birds seem to take longer to return to baseline than controls even after they are unmuted. Presumably, there is some time required to recover from surgery, however, it's unclear whether muting has longer-term effects on syrinx function or the ability to pass air. In particular, it's possible that the birds still haven't recovered by 4 days after unmuting as a consequence of the muting and unmuting procedure or that the lack of recovery is indicative of an additional effect that muting has on pitch recovery. For example, the methods state that muted birds perform some quiet vocalizations. However, if birds also attempt to sing, but just do so silently, perhaps the aberrant somatosensory or other input from singing while muted has additional effects on the ability to regain pitch. It would also be useful to know if there is a relationship between how long they are muted and how quickly they return to baseline.

      We agree that muting might have some longer-term effects that could explain why WNm birds did not recover pitch 4 days after unmuting. However, if such an effect exists, it is only weak. Arguing against the idea that longer muting requires longer recovery, we found no correlation between the difference in NRP between early and late and (1) the duration for which the birds were muted (correlation coefficient = −0.50, p = 0.20), (2) the number of renditions the birds sang between early and late (correlation coefficient = 0.03, p = 0.95), or (3) the time since they last sang the target song (last rendition of baseline, correlation coefficient = −0.43, p = 0.29). Nor did we find a correlation between the early NRP and the time since the muting surgery (correlation coefficient = 0.26, p = 0.53), suggesting that the lack of pitch recovery while muted was not due to a lingering burden of the muting surgery. We added these results to the Results section.
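
      The correlation tests above can be illustrated with a minimal Python sketch, assuming they are Pearson correlations with two-sided p-values (the manuscript may have used a different coefficient). The per-bird values below are purely hypothetical; the sketch derives the p-value from the usual t transform and checks it against `scipy.stats.pearsonr`.

```python
import math

import numpy as np
from scipy import stats


def pearson_test(x, y):
    """Pearson correlation coefficient with a two-sided p-value.

    Uses the standard t transform t = r * sqrt((n - 2) / (1 - r**2))
    with n - 2 degrees of freedom.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    r = float(np.corrcoef(x, y)[0, 1])
    if abs(r) >= 1.0:
        return r, 0.0  # perfect correlation: t is infinite
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    p = 2.0 * float(stats.t.sf(abs(t), df=n - 2))
    return r, p


# Hypothetical per-bird values: muting duration (days) vs. change in NRP.
muting_days = [1, 2, 3, 4, 5]
nrp_change = [1, 3, 2, 5, 4]
r, p = pearson_test(muting_days, nrp_change)
print(round(r, 3))  # 0.8
print(p, stats.pearsonr(muting_days, nrp_change)[1])  # the two p-values agree
```

      With only a handful of birds, even a moderate coefficient such as −0.50 can remain non-significant, which is why these null correlations argue against a strong effect rather than ruling out any effect.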

      In summary, we used the WNm group to assess whether birds can recover their target pitch in the absence of practice, i.e., whether they recovered pitch in the early time period. Whether or not some long-term effect of the muting/unmuting procedure affects later recovery, it does not undermine the main finding we obtained from WNm birds in Figure 1 (that birds do not recover without practice).

      Reviewer #3 (Public Review):

      Summary:

      Zai et al. test whether birds can modify their vocal behavior in a manner consistent with planning. They point out that while some animals are known to be capable of volitional control of vocalizations, it has been unclear if animals are capable of planning vocalizations, that is, modifying vocalizations towards a desired target without the need to learn this modification by practicing and comparing sensory feedback of practiced behavior to the behavioral target. They study zebra finches that have been trained to shift the pitch of song syllables away from their baseline values. It is known that once this training ends, zebra finches have a drive to modify pitch so that it is restored back to its baseline value. They take advantage of this drive to ask whether birds can implement this targeted pitch modification in a manner that looks like planning, by comparing the time course and magnitude of pitch modification in separate groups of birds who have undergone different manipulations of sensory and motor capabilities. A key finding is that birds who are deafened immediately before the onset of this pitch restoration paradigm, but after they have been shifted away from baseline, are able to shift pitch partially back towards their baseline target. In other words, this targeted pitch shift occurs even when birds don't have access to auditory feedback, which argues that this shift is not due to reinforcement-learning-guided practice, but is instead planned based on the difference between an internal representation of the target (baseline pitch) and current behavior (pitch the bird was singing immediately before deafening).

      The authors present additional behavioral studies arguing that this pitch shift requires auditory experience of the song in its state after it has been shifted away from baseline (birds deafened early on, before the initial pitch shift away from baseline, do not exhibit any shift back towards baseline), and that a full shift back to baseline requires auditory feedback. The authors synthesize these results to argue that different mechanisms operate for small shifts (planning, does not need auditory feedback) and large shifts (reinforcement learning, requires auditory feedback).

      We thank the reviewer for this concise summary of our paper. To clarify, we want to point out that we do not make any statement about the learning mechanism birds use to make large shifts to recover their target pitch, i.e. we do not say that large shifts are learned by reinforcement learning requiring auditory feedback. We only show that large shifts require auditory feedback.

      The authors also make a distinction between two kinds of planning: covert (not requiring any motor practice) and overt (requiring motor practice but without access to auditory experience from which target mismatch could be computed). They argue that birds plan overtly, based on these deafening experiments as well as an analogous experiment involving temporary muting, which suggests that motor practice is indeed required for pitch shifts.

      Strengths:

      The primary finding (that partially restorative pitch shift occurs even after deafening) rests on strong behavioral evidence. It is less clear to what extent this shift requires practice, since their analysis of pitch after deafening takes the average over within the first two hours of singing. If this shift is already evident in the first few renditions then this would be evidence for covert planning. This analysis might not be feasible without a larger dataset. Similarly, the authors could test whether the first few renditions after recovery from muting already exhibit a shift back toward baseline.

      This work will be a valuable addition to others studying birdsong learning and its neural mechanisms. It documents features of birdsong plasticity that are unexpected in standard models of birdsong learning based on reinforcement and are consistent with an additional, perhaps more cognitive, mechanism involving planning. As the authors point out, perhaps this framework offers a reinterpretation of the neural mechanisms underlying a prior finding of covert pitch learning in songbirds (Charlesworth et al., 2012).

      A strength of this work is the variety and detail in its behavioral studies, combined with sensory and motor manipulations, which on their own form a rich set of observations that are useful behavioral constraints on future studies.

      Weaknesses:

      The argument that pitch modification in deafened birds requires some experience hearing their song in its shifted state prior to deafening (Fig. 4) is solid but has an important caveat. Their argument rests on comparing two experimental conditions: one with and one without auditory experience of shifted pitch. However, these conditions also differ in the pitch training paradigm: the "with experience" condition was performed using white noise training, while the "without experience" condition used "lights off" training (Fig. 4A). It is possible that the differences in the ability for these two groups to restore pitch to baseline reflect the training paradigm, not whether subjects had auditory experience of the pitch shift. Ideally, a control study would use one of the training paradigms for both conditions, which would be "lights off" or electrical stimulation (McGregor et al. 2022), since WN training cannot be performed in deafened birds. This is difficult, in part because the authors previously showed that "lights off" training has different valences for deafened vs. hearing birds (Zai et al. 2020). Realistically, this would be a point to add to in discussion rather than a new experiment.

      We added the following statement to our manuscript:

      It is unlikely that dLO birds’ inability to recover baseline pitch is somehow due to our use of a reinforcer of a non-auditory (visual) modality, since somatosensory stimuli do not prevent reliable target pitch recovery in hearing birds (McGregor et al 2022).

      A minor caveat, perhaps worth noting in the discussion, is that this partial pitch shift after deafening could potentially be attributed to the birds "gaining access to some pitch information via somatosensory stretch and vibration receptors and/or air pressure sensing", as the authors acknowledge earlier in the paper. This does not strongly detract from their findings as it does not explain why they found a difference between the "mismatch experience" and "no mismatch experience groups" (Fig. 4).

      We added the following statement: Our insights were gained in deaf birds, and we cannot rule out that deaf birds could gain access to pitch information via somatosensory or proprioceptive modalities. However, such information, even if available, cannot explain the difference between the “mismatch experience” (WNd) and the “no mismatch experience” (dLO) groups, which strengthens our claim that the pitch reversion we observe is a planned change and not merely a rigid motor response (as in simple use-dependent forgetting).

      More broadly, it is not clear to me what kind of planning these birds are doing, or even whether the "overt planning" here is consistent with "planning" as usually implied in the literature, which in many cases really means covert planning. The idea of using internal models to compute motor output indeed is planning, but why would this not occur immediately (or in a few renditions), instead of taking tens to hundreds of renditions?

      Indeed, what we call ‘covert planning’ refers to what is usually called ‘planning’ in the literature. Also, there currently seems to be no evidence for spontaneous overt planning in songbirds (which we elicited with deafening). Replay of song-like syringeal muscle activity can be induced by auditory stimuli during sleep (Bush, Doppler, Goller, & Mindlin, 2018), but to our knowledge there are no reports of similar replay in awake, non-singing birds, which would constitute evidence for overt planning.

      We cannot ascertain how fast birds can plan their song changes, but our findings do not contradict fast planning. The smallest analysis window we chose is 2 h, which sets a lower bound on the time frame within which we can measure pitch changes. Our approach is probably not ideally suited for determining the minimal planning time, because the deafening and muting procedures increase song variability, which calls for larger pitch sample sizes for statistical testing, and the surgeries themselves cause a prolonged period without singing during which we have no access to the birds’ planned motor output. Note that fast planning is demonstrated by the recent finding of instant imitation in nightingales (Costalunga et al., 2023) and is evidenced by fast re-pitching upon context changes in Bengalese finches (Veit, Tian, Monroy Hernandez, & Brainard, 2021).

      To resolve confusion, it would be useful to discuss and add references relating "overt" planning to the broader literature on planning, including in the introduction when the concept is introduced.

      Overt and covert planning are terms used in the literature on child development and on adult learning; see Zajic et al. (2020), “Overt planning behaviors during writing in school-age children with autism spectrum disorder and attention-deficit/hyperactivity disorder”, and Zare-ee (2014), “Researching Aptitude in a Process-Based Approach to Foreign Language Writing Instruction”, Advances in Language and Literary Studies, and references therein.

      Indeed, muddying the interpretation of this behavior as planning is that there are other explanations for the findings, such as use-dependent forgetting, which the authors acknowledge in the introduction, but don't clearly revisit as a possible explanation of their results. Perhaps this is because the authors equate use-dependent forgetting and overt planning, in which case this could be stated more clearly in the introduction or discussion.

      We do not mean to strictly equate use-dependent forgetting and overt planning, although they can be related, namely when ‘use’ refers to ‘altered use’ as is the case when something about the behavior is missing (e.g. auditory feedback in our study), and the dependence is not just on ‘use’ but also on ‘experience’.

      We added the following sentence to the discussion: We cannot distinguish the overt planning we find from more complex use-and-experience dependent forgetting, since we only probed for recovery of pitch and did not attempt to push birds into planning pitch shifts further away from baseline.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The single main issue with this paper is in the section related to Figure 4, and the Figure itself - this is the most important part of the paper essential to buttress the claim of covert learning. However, there are several sources of confusion in the text, analyses, and figures. The key result is in Figure 4B, C - and, in the context of Figs 1-3, the data are significant but subtle. That is, as the authors state, the birds are mostly dependent on slow sensory feedback-dependent (possibly RL) mechanisms but there is a small component of target matching that evidences an internal model. One wonders why this capacity is so small - if they had a good internal model they'd be much faster and better at recovering target pitches after distortion-driven deviations even without sensory feedback.

      (1a) The analysis of the WNd and DLO reversions of pitch (related to Fig. 4) uses a d' analysis which is a pivot from the NRP analysis used in the rest of the paper. It is not clear why different analyses are being used here to compute essentially the same measure, i.e. how much did the pitch revert. It's also odd that different results are now obtained - Fig. 4 has a small but significant reversion of pitch in WNd birds but Fig. 2 shows no significant return to baseline.

      We did not test for reversion towards baseline in Fig. 2 and made no statement about whether there is a significant reversion or not. But when we do such a test, we find a significant reversion for WNd birds in the ‘late’ window (NRP=0.5, p=0.02, N=10, tstat=-1.77, two-tailed t-test), which agrees with Figure 4. In the ‘early’ window in Fig. 2, we find only a trend but no reversion (NRP = 0.76, p=0.11, n=10, tstat=-1.76), which contrasts with our findings in Figure 4. However, the discrepancy can be simply explained by the difference in time alignment that we detail in the Materials and Methods. Namely, in Figure 2, we measure pitch relative to the pitch in the morning on the day before, which is not a good measure of ‘reversion’ (since pitch had been reinforced further away during the day), which is why we do not present this analysis in the paper and dedicate a separate analysis in Figure 4 to reversion.

      (1b) Also in Fig. 4 is it the case that, as in the schematic of 4a, ALL birds in these experiments had their pitch pushed up - so that the return to baseline was all down? If this is the case the analysis may be contaminated by a pitch-down bias in deafened birds. This would ideally be tested with a balance of pitch-up and pitch-down birds in the pre-deafening period, and/or analysis of non-targeted harmonic stacks to examine their pitch changes. If non-targeted stacks exhibit pitch-down changes after deafening, then the reversion that forms the key discovery of this paper will be undermined. Please address.

      Both groups in Figure 4 were balanced (the same number of birds shifted their pitch up and down); see our response to the public review and the Methods.

      (1c) After multiple re-reads and consultations with the Methods section I still do not understand the motivation or result for Figure 4E. Please provide clarification of the hypothesis/control being assessed and the outcome.

      Figure 4E does not add a new result but strengthens our previous findings, because we obtain the same result with a different method. The pitch of deaf birds tends to drift after deafening. To account for this drift and the effect of time elapsed since deafening, we bootstrapped the magnitude of the pitch change in WNd and dLO birds by comparing them to dC birds in matched time windows. We modified the sentence in the Results section to clarify this point:

      To account for the effect of time elapsed since deafening and to quantify the change in pitch specifically due to reinforcement, we bootstrapped the difference in d′ between dLO/WNd birds and a new group of dC birds that were deafened but experienced no prior reinforcement (see Methods).
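
      The bootstrap comparison can be sketched as follows in Python with NumPy. This is a simplified illustration, not the procedure in the Methods: it resamples birds with replacement within each group, ignores the matching of time windows, and uses the fraction of bootstrap differences at or above zero as a one-sided p-value for the hypothesis that the reinforced group reverted more (more negative d′) than deafened controls. The function name and the d′ values are hypothetical.

```python
import numpy as np


def bootstrap_difference(group, control, n_boot=10_000, seed=0):
    """Bootstrap the difference in mean d' between two groups of birds.

    Returns the observed difference (group - control) and a one-sided
    bootstrap p-value: the fraction of resampled differences >= 0.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(group, dtype=float)
    b = np.asarray(control, dtype=float)
    observed = a.mean() - b.mean()
    # Resample birds with replacement within each group, n_boot times.
    boot_a = rng.choice(a, size=(n_boot, a.size), replace=True).mean(axis=1)
    boot_b = rng.choice(b, size=(n_boot, b.size), replace=True).mean(axis=1)
    p = float(np.mean(boot_a - boot_b >= 0.0))
    return observed, p


# Hypothetical d' values for reinforced deaf birds and deafened controls.
observed, p = bootstrap_difference([-2.0, -2.1, -1.9], [0.0, 0.1, -0.1])
print(observed, p)
```

      Comparing against deafened controls rather than against zero removes any pitch drift that all deaf birds share, which is the motivation given for the dC group above.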

      (1d) Line 215. It's not clear in the text here how the WNd birds experience a pitch mismatch. Please clarify in the text that this mismatch was experienced before deafening. This is a critical paragraph to set up the main claims of the paper. Also, it's not clear what is meant by 'fuel their plan'? I can imagine this would simply be a DA-dependent plasticity process in Area X that does not fuel a plan but rather re-wires an HVC timestep to medium spiny neurons whose outputs drive pitch changes - i.e. not a fueled plan but simply an RL-dependent re-mapping in the motor system. Alternatively, a change could result in plasticity in pallial circuits (e.g. auditory to HVC mappings) that are RL independent and invoke an inverse model along the lines of the author's past work (e.g. Ganguli and Hahnloser). This issue is taken up in the discussion but the setup here in the results is very confusing about the possible outcomes. This paragraph is vague with respect to the key hypotheses. It's possible that the WNd and DLO groups enable dissection of the two hypotheses above - because the DLO groups would presumably have RL signals but without recovery - but there remains a real lack of clarity over exactly how the authors are interpreting Fig 4 at the mechanistic level.

      WNd birds experience a pitch mismatch because, while singing, they hear that their pitch differs from baseline pitch; the same is not true for dLO birds. We simply tested whether this experience makes a difference for reversion, and it does. We added ‘before deafening’ to the paragraph and changed the wording of our hypothesis to make it clearer (we reworded ‘fuel their plan’). We left mechanistic interpretations to the Discussion. Without going into details, all we are saying is that birds can only plan to revert motor changes they are aware of in the first place.

      Minor issues

      The songs of deafened birds degrade at a rate that depends on the bird's age. Songs of younger crystallized birds degrade much faster, presumably because of lower testosterone levels, which are associated with increased plasticity and LMAN function. Some background on deafened birds is needed to set up the WNd experiments.

      Despite deafening leading to the degradation of song (Lombardino and Nottebohm, 2000), syllable detection and pitch calculation were still possible in all deaf birds (up to 13-50 days after deafening surgery, age range 90-300 dph, n=44 birds).

      Since pitch shifting was balanced in both deaf bird groups (the same number of birds were up- and down-shifted), systematic changes in pitch post deafening (Lombardino and Nottebohm, 2000) will average out and so would not affect our findings.

      Lines 97-103. The paragraph is unclear, and perhaps a reference to a supplementary figure showing the lack of recovery would help. If I understand correctly, the first two birds did not exhibit the normal recovery to baseline if they did not have an opportunity to hear themselves sing without the WN. I am failing to understand this.

      In the early window (first 2 hours after unmuting), birds have not changed their pitch compared to their pitch in the corresponding window at the end of reinforcement (with matching time of day). We added ‘immediately after unmuting (early)’ to clarify this statement.

      Lines 68-69. What is the difference between (2) and (3)? Both require sensory representation/target to be mapped to vocal motor output. Please clarify or fuse these concepts.

      We fused the concepts and changed the figure and explanation accordingly.

      Line 100. Please name the figure to support the claim.

      We marked the two birds in Fig. 1H and added a reference in the text.

      Line 109. Is there a way to confirm / test if muted birds attempted to sing?

      Unfortunately, we do not have video recordings to check if there are any signs of singing attempts in muted birds.

      Line 296: Why 'hierarchically 'lower'?

      It is 'lower' because without it there is nothing to consolidate; i.e., the higher process can only be effective after the lower one, not before. We clarified this point in the text.

      Past work on temporal CAF (tCAF) by the Olveczky group showed that syllable durations and gaps could be reinforced in a way that does not depend on Area X. In relation to the authors' discussion of possible mechanisms of sensory-feedback-independent recovery, such reinforcement may therefore rely on the same neural substrates that the Fig. 4 WNd group uses to recover. Yet the authors find in this paper that tCAF birds did not recover. There seems to be an oddity here: if covert recovery relies on circuits outside the basal ganglia and RL mechanisms, wouldn't tCAF birds be more likely to recover? This is not a major issue but is a source of confusion related to the authors' interpretations that could be fleshed out.

      This is a good point. We reinvestigated the tCAF birds in the context of Fig. 4, where we looked for pitch reversions towards baseline. tCAF birds do also revert towards baseline. We added this information to the supplement. We cannot say anything about the mechanistic reasons for the lack of recovery, especially given that we did not look at brain-level mechanisms.

      Reviewer #2 (Recommendations For The Authors):

      The data presentation could be improved. It is difficult to distinguish between the early and late symbols and to distinguish between the colors for the individual lines on the plots or to match them with the points on the group data plots. In addition, because presumably, the points in plots like 2D are for the same individuals, lines connecting those points would be useful rather than trying to figure out which points are the same color.

      We added lines in Fig. 2D connecting the birds in early and late.

      The model illustrations (Fig 1A, Fig 5) are not intuitive and do not help to clarify the different hypotheses or ideas. I think these need to be reworked.

      We revised the model illustrations and hope that they now better clarify the different hypotheses.

      Some of the phrasing is confusing. Especially lines 157-158 and 256-257.

      Lines 157-158: we removed an instance of ‘WNd’, which was out of place.

      Lines 256-257: we rephrased to ‘showing that prior experience of a target mismatch is necessary for pitch reversion independently of auditory feedback’

      Reviewer #3 (Recommendations For The Authors):

      For Fig. 1, the conclusion in the text "Overall, these findings suggest that either motor practice, sensory feedback, or both, are necessary for the recovery of baseline song" is not aligned with the figure header "Recovery of pitch target requires practice".

      We rephrased the conclusion to: Overall, these findings rule out covert planning in muted birds and suggest that motor practice is necessary for recovery of baseline song.

      The use of the term "song experience" can be confusing as to whether it means motor or auditory experience. Perhaps replace it with "singing experience" or "auditory experience" where appropriate.

      We made the requested changes.

      Fig. 1A, and related text, reads as three hypotheses that the authors will test in the paper, but I don't think this turns out to be the main goal (and if it is, it is not clear that their results differentiate between hypotheses 1, 2, and 3). Perhaps reframe them as discussion points and make this panel less prominent at the start, to avoid this confusion.

      We modified the illustration in Fig 1A and simplified it. We now only show the 2 hypotheses that we test in the paper.

      Line 275-276, "preceding few hours necessitates auditory feedback, which sets a limit to zebra finches' covert planning ability". Did the authors mean "overt", not covert? Since their study focuses on overt planning.

      Our study focuses on covert planning in figure 1 and overt planning in subsequent figures.

      The purpose of the paragraph starting on line 278 could be more clear. Is the goal to say that overt planning and what has previously been described as use-dependent forgetting are actually the same thing? If not, what is the relationship between overt planning and forgetting? In other words, why should I care about prior work on use-dependent forgetting?

      We moved the paragraph further down where it does not interrupt the narrative. See also our reply to reviewer 3 on use-dependent forgetting.

      Line 294, "...a dependent process enabled by experience of the former...", was not clear what "former" is referring to. In general, this paragraph was difficult to understand. Line 296: Which is the "lower" process?

      We added explanatory parentheses in the text to clarify. We rephrased the sentence to ‘the hierarchically lower process of acquisition or planning as we find is independent of immediate sensory experience.’

      Line 295, the reference to "acquisition" vs. "retention". It is not clear how these two concepts relate to the behavior in this study, and/or the hierarchical processes referenced in the previous sentence. Overall, it is not clear how consolidation is related to the paper's findings.

      We added explanatory parentheses in the text and changed figure 5 to better explain the links.

      Line 305, add a reference to Warren et al. 2011, which I believe was the first study (or one of them) that showed that AFP bias is required for restoring pitch to baseline.

      We are citing Warren et al. 2011 in the sentence:

      Such separation also applies to songbirds. Both reinforcement learning of pitch and recovery of the original pitch baseline depend on the anterior forebrain pathway and its output, the lateral magnocellular nucleus of the anterior nidopallium (LMAN)(1).

      Line 310, "Because LMAN seems capable of executing a motor plan without sensory feedback", is this inferred from this paper (in which case this is an overreach) or is this referencing prior work (if so, which one, and please cite)?

      We changed the wording to ‘It remains to be seen whether LMAN is capable of executing a motor plan without sensory feedback’.

      Line 326, "which makes them well suited for planning song in a manner congruent with experience." I don't fully understand the logic. Can this sentence be clarified?

      We rephrased the sentence and added an explanation as follows: …which makes them well suited for executing song plans within the range of recent experience (i.e., if the song is outside recent experience, it elicits no LMAN response and so does not gain access to planning circuits).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, the authors report a molecular mechanism for recruiting syntaxin 17 (Syn17) to closed autophagosomes through the charge interaction between enriched PI4P and the C-terminal region of Syn17. Precise control of the location and conformation of proteins is critical for maintaining autophagic flux. In particular, how Syn17 is recruited to autophagosomes has remained unclear. In this paper, the authors describe a simple lipid-protein interaction model that goes beyond previous studies focusing on protein-protein interactions. This represents a conceptual advance.

      We would like to thank Reviewer #1 for the positive evaluation of our study.

      Reviewer #2 (Public Review):

      Summary:

      Syntaxin17 (STX17) is a SNARE protein that is recruited to mature (i.e., closed) autophagosomes, but not to immature (i.e., unclosed) ones, and mediates autophagosome-lysosome fusion. How STX17 recognizes the mature autophagosome is an unresolved and interesting question in the autophagy field. Shinoda and colleagues set out to answer this question by focusing on the C-terminal domain of STX17 and found that PI4P is a strong candidate for the factor that causes STX17 recruitment to the autophagosome.

      Strengths:

      The main findings are: 1) Rich positive charges in the C-terminal domain of STX17 are sufficient for recruitment to the mature autophagosome; 2) Fluorescence charge sensors of different strengths suggest that autophagic membranes carry negative charges and that the charge increases as they mature; 3) Among a battery of fluorescence biosensors, only PI4P-binding biosensors distribute to the mature autophagosome; 4) STX17 bound to isolated autophagosomes is released by treatment with Sac1 phosphatase; 5) Molecular dynamics simulations show that STX17 TM is inserted into a membrane containing PI4P but not into a membrane without it. These results indicate that PI4P is a strong candidate for the lipid that STX17 binds to in the autophagosome.

      We would like to thank Reviewer #2 for pointing out these strengths.

      Weaknesses:

      • It was not answered whether PI4P is crucial for the STX17 recruitment in cells because manipulation of the PI4P content in autophagic membranes was not successful for unknown reasons.

      As we explained in the initial submission, we tried to deplete PI4P in autophagosomes by multiple methods but did not succeed. In this revised manuscript, we added the result of an experiment using the PI 4-kinase inhibitor NC03 (Figure 4―figure supplement 1), which shows no significant effect on the autophagosomal PI4P level and STX17 recruitment.

      Author response image 1.

      The PI 4-kinase inhibitor NC03 failed to suppress autophagosomal PI4P accumulation and STX17 recruitment. HEK293T cells stably expressing mRuby3–STX17TM (A) or mRuby3–CERT(PHD) (B) and Halotag-LC3 were cultured in starvation medium for 1 h and then treated with or without 10 μM NC03 for 10 min. Representative confocal images are shown. STX17TM- or CERT(PHD)-positive rates of LC3 structures per cell (n > 30 cells) are shown in the graphs. Solid horizontal lines indicate medians, boxes indicate the interquartile ranges (25th to 75th percentiles), and whiskers indicate the 5th to 95th percentiles. Differences were statistically analyzed by Welch’s t-test. Scale bars, 10 μm (main), 1 μm (inset).

      • The molecular simulation study did not show whether PI4P is necessary for the STX17 TM insertion or whether other negatively charged lipids can play a similar role.

      As the reviewer suggested, we performed molecular dynamics simulations using membranes containing phosphatidylinositol (PI), a negatively charged lipid. STX17 TM approached the PI-containing membrane but was not inserted into the membrane within a time scale of 100 ns in simulations of all five structures. These data suggest that PI4P, which is more negatively charged than PI, is required for STX17 insertion. Thus, we have included these data in Figure 5E and F and added the following text to Lines 242–244. “Moreover, if the membrane contained phosphatidylinositol (PI) instead of PI4P, STX17 approached the PI-containing membrane but was not inserted into the membrane (Figure 5E, F, Video 3).”

      Author response image 2.

      (E) An example of a time series of simulated results of STX17TM insertion into a membrane consisting of 70% phosphatidylcholine (PC), 20% phosphatidylethanolamine (PE), and 10% phosphatidylinositol (PI). STX17TM is shown in blue. Phosphorus in PC, PE and PI are indicated by yellow, cyan, and orange, respectively. Short-tailed lipids are represented as green sticks. The time evolution series are shown in Video 3. (F) Time evolution of the z-coordinate of the center of mass (z_cm) of the transmembrane helices of STX17TM in the case of membranes with PI. Five independent simulation results are represented by solid lines of different colors. The gray dashed lines indicate the locations of the lipid heads. A scale bar indicates 5 nm.

      • The question that the authors posed in the beginning, i.e., why is STX17 recruited to the mature (closed) autophagosome but not to immature autophagic membranes, was not answered. The authors speculate that the seemingly gradual increase of negative charges in autophagic membranes is caused by an increase in PI4P. However, this was not supported by the PI4P fluorescence biosensor experiment, which showed the biosensors distributing to the mature autophagosome only. Here, there are at least two possibilities: 1) The increase of negative charges in immature autophagic membranes is derived from PI4P. However, the fluorescence biosensors do not bind there for some reason; for example, they are not sensitive enough to recognize PI4P until it reaches a certain level, or simply, their binding does not occur in a quantitative manner. 2) The negative charge in immature membranes is not derived from PI4P, and PI4P is generated abundantly only after autophagosomes are closed. In either case, it is not easy to explain why STX17 is recruited to the mature autophagosome only. For the first scenario, it is not clear how PI4P synthesis is regulated so that it reaches a sufficient level only after membrane closure. In the second case, the mechanism that produces PI4P only after autophagosome closure needs to be elucidated (so, in this case, the question of temporal regulation remains the same).

      We thank the reviewers for pointing this out. While the probe for weakly negative charges (1K8Q) labeled both immature and mature autophagosomes, the probes for intermediate charges (5K4Q and 3K6Q) and PI4P labeled only mature autophagosomes (Figure 2F, Figure 2–figure supplement 1B). Thus, we think that the autophagosomal membrane rapidly and drastically becomes negatively charged, and at the same time, PI4P is enriched. Although immature membranes may have weak negative charges, we did not examine which lipids contribute to the negative charges. Thus, we have added the following sentences to the Discussion part.

      “Our data of the 1K8Q probe suggest that immature autophagosomal membranes may also have slight negative charges (Figure 2E). Although the source of the negative charge of immature autophagosomes is currently unknown, it may be derived from low levels of PI4P, which is undetectable by the PI4P probes and/or other negatively charged lipids such as PI and PS (Schmitt et al., EMBO Rep, 2022).” (Lines 279–283) “In any case, it would be important to elucidate how PI 4-kinase activity or PI4P synthesis is upregulated during autophagosome maturation.” (Lines 302–303)

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors set out to address the question of how the SNARE protein Syntaxin 17 senses autophagosome maturation by being recruited to autophagosomal membranes only once autophagosome formation and sealing are complete. The authors discover that the C-terminal region of Syntaxin 17, which contains two transmembrane domains and a positively charged region, is essential for this sensing mechanism. The authors discover that the lipid PI4P is highly enriched in mature autophagosomes and that electrostatic interaction of Syntaxin 17's positively charged region with PI4P drives recruitment specifically to mature autophagosomes. The temporal basis for PI4P enrichment and Syntaxin 17 recruitment, which ensures that unsealed autophagosomes do not fuse with lysosomes, is a very interesting and important discovery. Overall, the data are clear and convincing, with the study providing important mechanistic insights that will be of broad interest to the autophagy field, and also to cell biologists interested in phosphoinositide lipid biology. The authors' discovery also provides an opportunity for future research in which Syntaxin 17's C-terminal region could be used to target factors of interest to mature autophagosomes.

      Strengths:

      The study combines clear and convincing cell biology data with in vitro approaches to show how Syntaxin 17 is recruited to mature autophagosomes. The authors take a methodical approach to narrow down the critical regions within Syntaxin 17 required for recruitment and use a variety of biosensors to show that PI4P is enriched on mature autophagosomes.

      We would like to thank Reviewer #3 for the positive comments.

      Weaknesses:

      There are no major weaknesses, overall the work is highly convincing. It would have been beneficial if the authors could have shown whether altering PI4P levels would affect Syntaxin 17 recruitment. However, this is understandably a challenging experiment to undertake and the authors outlined their various attempts to tackle this question.

      We thank Reviewer #3 for pointing this out. Please see our above response to Reviewer #2 (Public Review).

      In addition, clear statements within the figure legends on the number of independent experimental repeats that were conducted for experiments that were quantitated are not currently present in the manuscript.

      As pointed out by Reviewer #3, we have added the number of independent experimental repeats in the figure legends.

      Reviewer #1 (Recommendations For The Authors):

      This paper is well written and all experiments were conducted with a high standard. Several minor issues should be addressed before final publication.

      (1) To further confirm the charge interaction, a charge screening experiment should be performed for Fig. 2A.

      We asked Reviewer #1, through the editor, what this experiment meant and understood that it was to see the effects of high salt concentrations. We monitored the association of GFP-STX17TM with liposomes in the presence or absence of 1 M NaCl and found that it was blocked in a high-ionic-strength buffer. These data support the electrostatic interaction of STX17 with membranes. We have included these data in Figure 2B and added the following sentences to Lines 124–126.

      “The association of STX17TM with PI4P-containing membranes was abolished in the presence of 1 M NaCl (Figure 2B). These data suggest that STX17 can be recruited to negatively charged membranes via electrostatic interaction independent of the specific lipid species.”

      Author response image 3.

      GFP–STX17TM translated in vitro was incubated with rhodamine-labeled liposomes containing 70% PC, 20% PE and 10% PI4P in the presence of 1 M NaCl or 1.2 M sucrose. GFP intensities of liposomes were quantified and shown as in Figure 1C (n > 30).

      (2) The authors claim that "Autophagosomes become negatively charged during maturation", based on experiments using membrane charge probes. Since it's mainly about the membrane, it's better to refine the claim to "The membrane of autophagosomes becomes...", which would be more precise and closer to the topic of this paper.

      We would like to thank the reviewer for pointing this out. This point is valid. As recommended, we have corrected the phrase “Autophagosomes become negatively charged during maturation” to “The membrane of autophagosomes becomes negatively charged during maturation” (Lines 72, 118, 262, 969 (title of Figure 2), and 1068 (title of Figure 2–figure supplement 1)).

      (3) The authors should add more discussion regarding the "specificity" for recruiting Syn17 through the charge interaction. Particularly, how Syn17 could be maintained before the closure of autophagosomes? For the MD simulations in Fig. 5, the current results don't add much to the manuscript. The cell biology experiments have demonstrated the conclusion. The authors could try to find more details about the insertion by analyzing the simulation movies. Do membrane packing defects play a role during the insertion process? A similar analysis was conducted for alpha-synuclein (https://pubmed.ncbi.nlm.nih.gov/33437978/).

      Regarding the mechanism of STX17 maintenance in the cytosol, we do not think that other molecules, such as chaperones, are essential, because the purified recombinant mGFP-STX17TM used in this study is soluble. However, this does not rule out such a mechanism, which could be examined in a future study.

      In the paper by Liu et al. (PMID: 33437978), small liposomes with diameters of 25–50 nm are used. Therefore, there are packing defects in the highly curved membranes, into which alpha-synuclein helices are inserted in a curvature-dependent manner. On the other hand, autophagosomes are much larger (~1 μm in diameter) and almost flat at the scale of STX17 molecules, so we think it is unlikely that STX17 recognizes packing defects.

      Reviewer #2 (Recommendations For The Authors):

      • The two (and other) possible interpretations of the negative charge/PI4P results in autophagic membranes should be discussed.

      As mentioned above, we have added the following sentences to the Discussion section. “Our data of the 1K8Q probe suggest that immature autophagosomal membranes may also have slight negative charges (Figure 2E). Although the source of the negative charge of immature autophagosomes is currently unknown, it may be derived from low levels of PI4P, which is undetectable by the PI4P probes and/or other negatively charged lipids such as PI and PS (Schmitt et al., EMBO Rep, 2022).” (Lines 279–283)

      “In any case, it would be important to elucidate how PI 4-kinase activity or PI4P synthesis is upregulated during autophagosome maturation.” (Lines 302–303)

      • Fluorescence biosensors are convenient to give an overview of the intracellular distribution of various lipids, but some of them show false-negative results. For example, evectin-2-PH for PS binds to endosomes but not to the plasma membrane, even though the latter contains abundant PS. With regards to PI4P, some biosensors illuminate both the Golgi and autophagosome, while others do not appear to bind the Golgi. Moreover, fluorescence biosensors for PI(3,5)P2 and PI(3,4)P2, which are also candidates for the STX17 insertion issue, are less reliable than others (e.g., those for PI3P and PI(4,5)P2). These problems need to be considered.

      We agree with Reviewer #2 that fluorescence biosensors are not perfect for detecting specific lipids. Based on the Reviewer’s suggestion, we have included a comment on this in the Discussion section as follows (Lines 265–268).

      “Given the possibility that fluorescence lipid probes may give false-negative results, a more comprehensive biochemical analysis, such as lipidomics analysis of mature autophagosomes, would be imperative to elucidate the potential involvement of other negatively charged lipids.”

      • A negative control for the PI4P biosensor, i.e., a mutant lacking the PI4P binding ability, is better to be tested to confirm the presence of PI4P in autophagosomes.

      We would like to thank the Reviewer for this comment. We conducted the suggested experiment and confirmed that the CERT(PHD)(W33A) mutant, which is deficient for PI4P binding (Sugiki et al., JBC. 2012), was diffusely present in the cytosol and did not localize to STX17-positive autophagosomes. This data supports our conclusion that PI4P is indeed present in autophagosomes. We have included this data in Figure 3–figure supplement 2A and explained it in the text (Lines 164–166).

      Author response image 4.

      Mouse embryonic fibroblasts (MEFs) stably expressing GFP–CERT(PHD)(W33A) and mRuby3–STX17TM were cultured in starvation medium for 1 h. Bars indicate 10 μm (main images) and 1 μm (insets).

      • As a control to the molecular dynamic simulation study, STX17 TM insertion into a membrane containing other negative charge lipids, especially PI, needs to be tested. PI is a negative charge lipid that is likely to exist in autophagic membranes (as suggested by the authors' past study).

      We thank the reviewers for this suggestion. As mentioned above (Reviewer #2, Public Review), we performed the molecular dynamics simulation using membranes containing PI and added the results in Figure 5E and F and Video 3.

      • If the putative role of PI4P could be shown in the cellular context, the authors' conclusion would be much strengthened. I wonder if overexpression of PI4P fluorescence biosensors, especially those that appear to bind to the autophagosome almost exclusively, may suppress the recruitment of STX17 there.

      We would like to thank the Reviewer for asking this question. In MEFs stably overexpressing PI4P probes driven by the CMV promoter, STX17 recruitment was not affected. Thus, simple overexpression of PI4P probes does not appear to be effective in masking PI4P in autophagosomes.

      Another idea is to use an appropriate molecule (e.g., WIPI2, ATG5) and to recruit Sac1 to autophagic membranes by using the FRB-FKBP system or the like. I hope these and other possibilities will be tested to confirm the importance of PI4P in the temporal regulation of STX17 recruitment.

      We tried the FRB-FKBP system using the phosphatase domain of yeast Sac1 fused to FKBP and LC3 fused to FRB, but unfortunately, this system failed to deplete PI4P from the autophagosomal membrane.

      Reviewer #3 (Recommendations For The Authors):

      A few areas for suggested improvement are:

      (1) It would be helpful if the authors could clarify for all figures how many independent experiments were conducted for all experiments, particularly those that have quantitation and statistical analyses.

      As pointed out by Reviewer #3, we have added the number of independent experimental repeats in the figure legends.

      The authors made several attempts to modulate PI4P levels on autophagosomes although understandably this proved to be challenging. A couple of suggestions are provided to address this area:

      (2) Given the reported role of GABARAPs in PI4K2a recruitment and PI4P production on autophagosomes, as well as in autophagosome-lysosome fusion (Nguyen et al (2016) J Cell Biol), it would be worthwhile to assess whether GABARAP TKO cells have reduced PI4P and reduced Stx17 recruitment.

      According to the Reviewer’s suggestion, we examined the localization of STX17 TM and the PI4P probe CERT(PHD) in ATG8 family (LC3/GABARAP) hexa KO HeLa cells that were established by the Lazarou lab (Nguyen et al., JCB 2016). As in WT cells, STX17 TM and CERT(PHD) were still colocalized with each other in hexa KO cells, suggesting that neither STX17 recruitment nor PI4P enrichment depends on ATG8 family proteins (note: the size of autophagosomes in HeLa cells is smaller than in MEFs, making it difficult to observe autophagosomes as ring-shaped structures). We have included this result in Figure 3–figure supplement 2(F) and explained it in the text (Lines 194–196, 198).

      Author response image 5.

      (F) WT and ATG8 hexa KO HeLa cells stably expressing GFP–STX17TM and transiently expressing mRuby3–CERT(PHD) were cultured in starvation medium. Bars indicate 10 μm (main images) and 1 μm (insets).

      (3) Can the authors try fusing Sac1 to one of the PI4P probes that were used (CERT(PHD)), or alternatively to the C-terminus of Syntaxin 17? This approach would help to recruit Sac1 only to mature autophagosomes and could therefore prevent the autophagosome formation defect observed when Sac1 was fused to LC3B, which targeted Sac1 to autophagosomes as they were forming. Understandably, this approach might seem a bit counterintuitive, since the phosphatase would remove the PI4P that recruits it, but it could be a viable approach to keep PI4P levels on mature autophagosomes low enough that Syntaxin 17 is no longer recruited. A Sac1 phosphatase mutant might be needed as a control.

      We would like to thank the Reviewer for these suggestions. We tried the phosphatase domain of yeast Sac1 or human SAC1 fused with STX17TM, but unfortunately, these fusion proteins did not deplete PI4P from autophagosomes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Review:

      Reviewer #1:

      (1) To support the finding that texture is not represented in a modular fashion, additional possibilities must be considered. These include (a) the effectiveness and specificity of the texture stimulus and control stimuli, (b) further analysis of possible structure in images that may have been missed, and (c) limitations of imaging resolution.

      Thank you for your comments. To address your concerns, we have conducted a new 3T fMRI experiment to demonstrate the effectiveness and specificity of our stimuli, performed further analyses to investigate possible structure of texture-selective activation, and discussed the limitations of imaging resolution.

      (a) To demonstrate the effectiveness and specificity of our stimuli, we conducted a new 3T fMRI experiment in five participants using an experimental design and texture families similar to those in Freeman (2013). Six texture stimuli in the 7T experiment were also included. To assess the effectiveness of each stimulus type, different texture families and their corresponding noise patterns were presented in separate blocks for 24 seconds, at a high presentation rate of 5 frames per second. In Figure S7, all texture families showed significantly stronger activation in V2 compared to their corresponding noise patterns, even for those that ‘appeared’ to have residual texture (e.g., the third texture family). These results demonstrate that our texture vs. noise stimuli were effective in producing texture-selective activations in area V2. Compared to the 7T results, the 3T data showed a notable increase in texture-selective activations in V2, likely due to increased stimulus presentation speed (1.25 vs. 5 frames/second). Future studies should use stimuli with faster presentation speed to validate our results in the 7T experiment.

      (b) Thank you for pointing out the possible structure of texture-selective activations in the peripheral visual field (Figure S1). In further analyses, we also found stronger texture selectivity at more peripheral eccentricities (Figure 2D), and weak but significant correlations between the texture-noise activation patterns in a split-half analysis (Author response image 2). Although this is not strong evidence for columnar organization of naturalistic textures, it suggests the possibility of modular organization in the peripheral visual field.

      (c) Although our fMRI result at 1-mm isotropic resolution did not show strong evidence for modular processing of naturalistic texture in V2 stripe columns, this does not exclude the possibility that smaller modules exist beyond the current fMRI resolution. We have discussed this possibility in the revised manuscript.

      We hope this response clarifies our findings, and we have revised the conclusions in the manuscript accordingly.

      (2) More in-depth analysis of subject data is needed. The apparent structure in the texture images in peripheral fields of some subjects calls for more detailed analysis, e.g., of the relationship to eccentricity and the need for a 'modularity index' to quantify the degree of modularity. A possible relationship to eccentricity should also be considered.

      Based on your recommendations, we have performed further analyses and found interesting results regarding the modularity of texture representation in relation to eccentricity. As shown in Figure 2D, the texture-selectivity index increased with eccentricity. This may suggest that modular organization for texture representation is more likely in the peripheral than in the central visual field. We have updated our results in Figure 2C and discussed this possibility in the revised manuscript.
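
      A minimal sketch of how texture selectivity could be summarized as a function of eccentricity: average a per-vertex selectivity index within eccentricity bins. The helper name, bin edges, and toy values below are hypothetical and do not reproduce the analysis behind Figure 2D.

```python
from statistics import mean


def selectivity_by_eccentricity(eccentricity, selectivity, edges):
    """Average a per-vertex selectivity index within eccentricity bins.

    Hypothetical helper for illustration only.
    eccentricity: per-vertex eccentricity in degrees of visual angle.
    selectivity: per-vertex texture-selectivity index (same vertex order).
    edges: increasing bin edges, e.g. [0, 2, 4, 8].
    Returns a list of (low, high, mean_index, n_vertices); empty bins give None.
    """
    if len(eccentricity) != len(selectivity):
        raise ValueError("eccentricity and selectivity must have equal length")
    summary = []
    for low, high in zip(edges[:-1], edges[1:]):
        # Half-open bins [low, high) so each vertex is counted at most once.
        in_bin = [s for e, s in zip(eccentricity, selectivity) if low <= e < high]
        summary.append((low, high, mean(in_bin) if in_bin else None, len(in_bin)))
    return summary


# Toy values (not data from this study).
for row in selectivity_by_eccentricity([1.0, 1.5, 3.0, 5.0, 6.0],
                                       [0.1, 0.3, 0.2, 0.5, 0.7],
                                       [0, 2, 4, 8]):
    print(row)
```

      A monotonic increase in the bin means would correspond to the eccentricity trend described above; for group inference, the same summary would be computed separately for each participant.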

      (3) Given what is known as a modular organization in V4 and V3 (e.g. for color, orientation, curvature), did images reveal these organizations? If so, connectivity analysis would be improved based on such ROIs. This would further strengthen the hierarchical scheme.

      Following your recommendations, we have conducted further analysis to investigate potential modular organizations in V4 and V3ab. Figure S9 shows the vertices most responsive to color, disparity, and texture in a representative subject. Indeed, texture-selective patches can be found in both V4 and V3ab, along with color- and disparity-selective patches. We agree that there should be pathway-specific connectivity among functional modules of the same type. In the informational connectivity analyses, we already used highly informative voxels selected by feature selection, which should mainly represent information from the modular organizations in these higher visual areas.

      Reviewer #2:

      (1) In lines 162-163, it is stated that no clear columnar organization exists for naturalistic texture processing in V2. In my opinion, this should be rephrased. As far as I understand, Figure 2B refers to the analysis used to support the conclusion. The left and middle bar plots only show a circular analysis since ROIs were based on the color and disparity contrast used to define thin and thick stripes. The interesting graph is the right plot, which shows no statistically significant overlap of texture processing with thin, thick, and pale stripe ROIs. It should be pointed out that this analysis does not dismiss a columnar organization per se but instead only supports the conclusion of no coincidence with the CO-stripe architecture.

      Thank you for your suggestions. Reviewer #1 raised a similar concern. We agree that texture modules may exist in area V2 at a spatial scale finer than our fMRI resolution. We have rephrased our conclusions to be more precise.

      (2) In Figure 3, cortical depth-dependent analyses are presented for color, disparity, and texture processing. I acknowledge that the authors took care of venous effects by excluding outlier voxels. However, the GE-BOLD signal at high magnetic fields is still biased to extravascular contributions from around larger veins. Therefore, the highest color selectivity in superficial layers might also result from the bias to draining veins and might not be of neuronal origin. Furthermore, it is interesting that cortical profiles with the highest selectivity in superficial layers show overall higher selectivity across cortical depth. Could the missing increase toward the pial surface in other profiles result from the ROI definition or overall smaller signal changes (effect size) of selected voxels? At least, a more careful interpretation and discussion would be helpful for the reader.

      We agree that residual venous effects may remain even after removing voxels containing large veins. However, computing the selectivity index largely removed the superficial bias (Figure 3). In the revised manuscript, we discuss the limitations of cortical depth-dependent analysis using GE-BOLD fMRI.

      In Line 397-403: “Due to the limitations of the T2*w GE-BOLD signal in its sensitivity to large draining veins (Fracasso et al., 2021; Parkes et al., 2005; Uludag & Havlicek, 2021), the original BOLD responses were strongly biased towards the superficial depth in our data (Figure S8). Compared to GE-BOLD, VASO-CBV and SE-BOLD fMRI techniques have higher spatial specificity but much lower sensitivity (Huber et al., 2019). As shown in a recent study (Qian et al., 2024), using differential BOLD responses in a continuous stimulus design can significantly enhance the laminar specificity of the feature selectivity measures in our results (Figure 3).”

      It is unlikely that the strongest color selectivity index in the superficial depth results from a stronger signal change or larger effect size in this condition. As shown by the original BOLD responses in Figure S8, all stimulus conditions produced robust activations that were strongly biased toward the superficial depth. High texture selectivity was also found in V4 and V3ab across cortical depth, with a flat laminar profile.
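
      To illustrate how a normalized index can reduce superficial bias, the sketch below assumes (as a simplification, not as the model used in the manuscript) that venous contributions scale both conditions by the same depth-dependent gain, and uses a hypothetical index (A - B)/(A + B). The raw difference grows toward the pial surface with the gain, while the index stays constant. Real venous effects are not purely multiplicative, and the manuscript's exact index may differ.

```python
def selectivity_index(resp_a, resp_b):
    """Hypothetical normalized contrast between two conditions (undefined if A + B == 0)."""
    return (resp_a - resp_b) / (resp_a + resp_b)


def depth_profiles(neural_a, neural_b, vascular_gain):
    """Apply an assumed multiplicative vascular gain per depth to two neural responses.

    Returns (raw_difference, index) dicts keyed by depth label.
    """
    raw, index = {}, {}
    for depth, gain in vascular_gain.items():
        bold_a, bold_b = gain * neural_a, gain * neural_b
        raw[depth] = bold_a - bold_b                      # scales with the gain
        index[depth] = selectivity_index(bold_a, bold_b)  # gain cancels in the ratio
    return raw, index


# Toy numbers: the assumed gain increases toward the pial surface.
raw, index = depth_profiles(1.2, 0.8, {"deep": 1.0, "middle": 1.5, "superficial": 3.0})
print(raw)
print(index)
```

      Under this assumption the gain cancels in the ratio; an additive venous offset, by contrast, would cancel in the raw difference A - B but not in the ratio.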

      (3) I was slightly surprised that no retinotopy data was acquired. The ROI definition in the manuscript was based on a retinotopy atlas plus manual stripe segmentation of single columns. Both steps have disadvantages because they neglect individual differences and are based on subjective assessment. A few points might be worth discussing: (1) In lines 467-468, the authors state that V2 was defined based on the extent of stripes. This classical definition of area V2 was questioned by a recent publication (Nasr et al., 2016, J Neurosci, 36, 1841-1857), which showed that stripes might extend into V3. Could this have been a problem in the present analysis, e.g., in the connectivity analysis? (2) The manual segmentation depends on the chosen threshold value, which is inevitably arbitrary. Which value was used?

      A previous study showed that the retinotopic atlas of early visual areas (V1-V3) aligned very well across participants on the standard surface after surface-based registration using anatomical landmarks (Benson et al., 2018). Thus, the group-averaged atlas should accurately define the boundaries of early visual areas. To directly test the accuracy of this method, we acquired retinotopic data from five participants in a 3T fMRI experiment. A phase-encoded method was used to define the boundaries of early visual areas (black lines in Author response image 1), which were highly consistent with the Benson atlas.

      Although a few feature-selective stripes may extend into V3, these stripe patterns were mainly represented in V2. Thus, the signal contribution from V3 is likely to be small and should not affect the pattern of results. The activation map threshold for manual segmentation was abs(T)>2. We have clarified this in the revised methods.

      Author response image 1.

      Retinotopic ROIs defined by the Benson atlas (left) and the polar angle map (right) of the representative subject. Black lines denote the boundaries of early visual areas based on the retinotopic map from the subject.

      Benson, N. C., Jamison, K. W., Arcaro, M. J., Vu, A. T., Glasser, M. F., Coalson, T. S., Van Essen, D. C., Yacoub, E., Ugurbil, K., Winawer, J., & Kay, K. (2018). The Human Connectome Project 7 Tesla retinotopy dataset: Description and population receptive field analysis. J Vis, 18(13), 23. https://doi.org/10.1167/18.13.23

      (4) The use of 1-mm isotropic voxels is relatively coarse for cortical depth-dependent analyses, especially in the early visual cortex, which is highly convoluted and has a small cortical thickness. For example, most layer-fMRI studies use a voxel size of around isotropic 0.8 mm, which has half the voxel volume of 1 mm isotropic voxels. With increasing voxel volume, partial volume effects become more pronounced. For example, partial volume with CSF might confound the analysis by introducing pulsatility effects.

      We agree that a 1-mm isotropic voxel is much larger in volume than a 0.8-mm isotropic voxel, but the difference in resolution along the cortical depth is small (1 mm vs. 0.8 mm). Consistent with our study, previous studies have shown that fMRI at 1-mm isotropic resolution can resolve cortical depth-dependent signals (Roefs et al., 2024; Shao et al., 2021). We have discussed these issues of fMRI resolution in the revised manuscript.

      In Line 403-408: “Compared to the submillimeter voxels, as used in most laminar fMRI studies, our fMRI resolution at 1-mm isotropic voxel may have a stronger partial volume effect in the cortical depth-dependent analysis. However, consistent with our results, previous studies have also shown that 7T fMRI at 1-mm isotropic resolution can resolve cortical depth-dependent signals in human visual cortex (Roefs et al., 2024; Shao et al., 2021).”

      Shao, X., Guo, F., Shou, Q., Wang, K., Jann, K., Yan, L., Toga, A. W., Zhang, P., & Wang, D. J. J. (2021). Laminar perfusion imaging with zoomed arterial spin labeling at 7 Tesla. NeuroImage, 245, 118724. https://doi.org/10.1016/j.neuroimage.2021.118724

      Roefs, E. C., Schellekens, W., Báez-Yáñez, M. G., Bhogal, A. A., Groen, I. I., van Osch, M. J., ... & Petridou, N. (2024). The Contribution of the Vascular Architecture and Cerebrovascular Reactivity to the BOLD signal Formation across Cortical Depth. Imaging Neuroscience, 2, 1–19.

      (5) The SVM analysis included a feature selection step stated in lines 531-533. Although this step is reasonable for the training of a machine learning classifier, it would be interesting to know if the authors think this step could have reintroduced some bias to draining vein contributions.

      We excluded vertices with extremely large signal changes, and their corresponding voxels in the gray matter, when defining ROIs. The same number of voxels was selected from each cortical depth for the SVM analysis, so voxels from the superficial layers, which are susceptible to large draining veins, were not overrepresented.
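
      The depth-balanced selection described above can be sketched as follows: within each cortical-depth bin, keep the same number k of voxels with the highest selection score. The score (e.g., an absolute t value), the value of k, and the function name are hypothetical; the sketch does not reproduce the authors' pipeline.

```python
from collections import defaultdict


def select_per_depth(depth_labels, scores, k):
    """Return, for each depth bin, the indices of its k highest-scoring voxels.

    depth_labels: per-voxel depth bin (e.g., "deep", "middle", "superficial").
    scores: per-voxel selection score; to avoid double dipping it would
            normally be computed on the training data of each fold only.
    k: number of voxels kept per depth (hypothetical parameter).
    """
    by_depth = defaultdict(list)
    for idx, (depth, score) in enumerate(zip(depth_labels, scores)):
        by_depth[depth].append((score, idx))
    selected = {}
    for depth, items in by_depth.items():
        if len(items) < k:
            raise ValueError(f"depth {depth!r} has only {len(items)} voxels; need {k}")
        # Highest scores first; keep k, then restore the original voxel order.
        items.sort(key=lambda pair: pair[0], reverse=True)
        selected[depth] = sorted(idx for _, idx in items[:k])
    return selected


# Toy example with 7 voxels in three depth bins.
print(select_per_depth(["deep", "deep", "mid", "mid", "sup", "sup", "sup"],
                       [1, 3, 2, 5, 9, 8, 0.1], k=2))
```

      Fixing k per depth keeps the superficial bin from contributing more features than the middle or deep bins, regardless of how many voxels each bin contains.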

      Reviewer #3:

      The authors tend to overclaim their results.

      Thank you for your comments. We added more control analyses to strengthen our findings and provided a more careful discussion of the results.

      Recommendations for the authors:

      Reviewer #1:

      (1) Controls: There is a bit more complexity than is expressed in the introduction. The authors hypothesize that the emergence of computational features such as texture may be reflected in specialized columns. That is, if texture is generated in V2, there may be texture columns (perhaps in the pale stripes of V2); but if generated at a higher level, then no texture columns would be needed. This is a very interesting and fundamental hypothesis. While there may be merit to this hypothesis, the demonstration that color and disparity are modular but not texture falls short of making a compelling argument. At a minimum, the finding that texture is not organized in V2 requires additional controls. (a) To boost the texture signal, additional texture stimuli or a sequence of multiple texture stimuli per trial could be considered. (b) Unfortunately, the comparison noise pattern also seems to contain texture; perhaps a less textured control could be designed. (c) It also appears that some of the texture images in Supplementary Figure S1 contain possible structure, e.g. in more peripheral visual fields. (d) Is it possible that the current imaging resolution is not sufficient for revealing texture domains? (e) Note that 'texture' may be a property that defines surfaces and not contours. Thus, while texture may have orientation content, its function may be associated with the surface processing pathways. A control stimulus might contain oriented elements of a texture stimulus that do not elicit texture percept; such a control might activate pale and/or thick stripes (both of which contain orientation domains), while the texture percept stimulus may activate surface-related bands in V4.

      Thank you for your suggestions. They are extremely helpful in improving our manuscript. We addressed the controls you mentioned in (a-d) in our response to the public review, which is reproduced below.

      (a) and (b): To demonstrate the effectiveness and specificity of our stimuli, we conducted a new 3T fMRI experiment in five participants using an experimental design and texture families similar to those in Freeman (2013). All texture stimuli in the 7T experiment were also included. To assess the effectiveness of each stimulus type, different texture families and their corresponding noise patterns were presented in separate blocks for 24 seconds, at a high presentation rate of 5 frames per second. In Figure S7, all texture families showed significantly stronger activation in V2 compared to their corresponding noise patterns, even for those that ‘appeared’ to have residual texture (e.g., the third texture family). These results suggest that our texture stimuli were effective in producing texture-selective activations in area V2 compared to the noise control. Compared to the 7T results, the 3T data showed a notable increase in texture-selective activations in V2, likely due to the increased stimulus presentation speed (1.25 vs. 5 frames/second). Weak texture activations might preclude the detection of columnar representations in the 7T experiment.

      (c) Thank you for pointing out the possible structure of texture-selective activations in the peripheral visual field (Figure S1). In further analyses, we also found stronger texture selectivity at more peripheral eccentricities (Figure 2D), and weak but significant correlations between the texture-noise activation patterns in a split-half analysis (Author response image 2). Although these results are not strong evidence for columnar organization of naturalistic textures, they suggest the possibility of such organization in the peripheral visual field.

      (d) Although our fMRI result at 1-mm isotropic resolution did not show strong evidence for modular processing of naturalistic texture in V2 stripe columns, this does not exclude the possibility that smaller modules exist beyond the current fMRI resolution. We have discussed these limitations in the revised manuscript.

      We fully agree with your explanation in (e). It fits our data very well. Both texture and control stimuli strongly activated the CO-stripes (Figure 2 and Figure 2D), while modular organizations for texture were found in V4 and V3ab (Figure S9). We have discussed this explanation in the revised manuscript.

      In Line 371-374: “Consistently, our pilot results also revealed modular organizations for textures in V4 and V3ab (Figure S9). These texture-selective organizations may be related to surface representations in these higher order visual areas (Wang et al., 2024).”

      (2) Overly simple description of FF, FB circuitry. The classic anatomical definition of feedforward is output from a 'lower' area, in most cases predominantly arising from superficial layers and projecting to middle layers of a 'higher area' (Felleman and Van Essen 1991). This description holds for V1-to-V2, V2-to-V3, and V2-to-V4. [Note there are also feedforward projections from central 5 degrees of V1-to-V4 (cf. Ungerleider) as well as V3-to-V4.] The definition of feedback can be more varied but is generally considered from cells in superficial and deep layers of 'higher' areas projecting to superficial and deep layers of 'lower' areas. Feedback inputs to V1 heavily innervate Layer 1 and superficial Layer 2, as well as the deep layers. Note that feedback connections from V2 to V1, similar to that from V1 to V2, are functionally specific, i.e. thin-to-blob and pale/thick-to interblob (Federer...Angelucci 2021, Hu...Roe 2022). Thus, current views are moving away from the dogma that feedback is diffuse. Recognition that feedback may be modular introduces new ideas about analysis.

      Thank you for your detailed recommendations. We have expanded the discussion of circuit models of functional connectivity in the introduction. Our model and experiments primarily aim to investigate how higher-level areas provide feedback to area V2. While we acknowledge that feedback may indeed be functionally specific, our methodology has certain advantages: it ensures signal stability and avoids the double-dipping issue. In the functional connectivity analysis, we also performed feature selection to use the most informative voxels; these voxels with high feature selectivity should already be included in the modular organizations of early visual areas. Identifying functionally specific feedback connections between modular areas will be an important direction for future research. We have added a discussion of this topic in the revised manuscript.

      In Line 136-138: “Only major connections were shown here. There are also other connections, such as V1 interblobs projecting to thick stripes (Federer et al., 2021; Hu & Roe, 2022; Sincich and Horton, 2005).”

      (3) Imaging superficial layers: Although removal of the top layer of cortical voxels (top 5% of voxels) is a common method for dealing with surface vascular artifact contribution to BOLD signal, it likely removes a portion of the Layer 1&2 feedback signals. Is this why the authors define feedback and deep layer to deep layer? If so, both superficial and deep-layer data in Figure 4 should be explicitly explained and discussed.

      Thank you for pointing this out. We would like to clarify the surface-based method used to remove vascular artifacts. The vertices influenced by large pial veins were first identified on the cortical surface, and voxels were then removed from the entire columns corresponding to these vertices to avoid sampling bias along the cortical depth. Thus, the remaining columns retained complete data from all cortical depths. We defined feedback connectivity as deep-to-deep-layer connectivity because deep layers carry strong feedback connections according to the literature (Markov et al., 2014; Ullman, 1995), and this definition avoids confounding feedback with the feedforward signals from superficial layers.
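
      The column-wise exclusion described above can be sketched as: rank surface vertices by signal change, flag the most extreme ones, and drop every voxel mapped to a flagged vertex at any depth, so that each remaining column keeps all of its depths. The flagged fraction (5% by default, echoing the reviewer's example), the voxel-to-vertex mapping, and the function name are hypothetical; the authors' exact criterion for an extremely large signal change is not specified here.

```python
import math


def exclude_vein_columns(vertex_signal, voxel_vertex, top_fraction=0.05):
    """Drop every voxel whose surface column has an extreme signal change.

    vertex_signal: per-vertex signal change (e.g., percent BOLD), indexed by vertex id.
    voxel_vertex: for each voxel (at any depth), the id of the vertex whose column it belongs to.
    top_fraction: fraction of vertices to flag; 0.05 is a hypothetical default.
    Returns (kept_voxel_indices, flagged_vertex_ids).
    """
    if not 0.0 <= top_fraction <= 1.0:
        raise ValueError("top_fraction must lie in [0, 1]")
    n_flag = math.ceil(top_fraction * len(vertex_signal))
    # Rank vertices from largest to smallest signal change.
    ranked = sorted(range(len(vertex_signal)), key=lambda v: vertex_signal[v], reverse=True)
    flagged = set(ranked[:n_flag])
    # Removing by vertex (not by voxel) drops the whole column across all depths.
    kept = [i for i, v in enumerate(voxel_vertex) if v not in flagged]
    return kept, flagged


# Toy example: 4 vertices, 7 voxels; vertex 2 has an outlying signal change.
kept, flagged = exclude_vein_columns([1.0, 2.0, 10.0, 1.5], [0, 0, 2, 2, 2, 1, 3], top_fraction=0.25)
print(kept, flagged)  # → [0, 1, 5, 6] {2}
```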

      Markov, N. T., Vezoli, J., Chameau, P., Falchier, A., Quilodran, R., Huissoud, C., Lamy, C., Misery, P., Giroud, P., Ullman, S., Barone, P., Dehay, C., Knoblauch, K., & Kennedy, H. (2014). Anatomy of hierarchy: feedforward and feedback pathways in macaque visual cortex. The Journal of comparative neurology, 522(1), 225–259. https://doi.org/10.1002/cne.23458

      Ullman S. (1995). Sequence seeking and counter streams: a computational model for bidirectional information flow in the visual cortex. Cerebral cortex, 5(1), 1–11. https://doi.org/10.1093/cercor/5.1.1

      (4) More detail on other subjects in Figure S1. Ten subjects conducted visual fixation and used a bite bar. Imaging data are illustrated in detail from one subject and the remaining subjects are depicted in graphs and in Supplemental Figure S1. Please provide arrowheads in each image to help guide the reader. Some kind of summary or index of modularity would also be helpful.

      Thanks for your suggestions. There are arrowheads in each image in our original manuscript and we have revised Figure S1 for better illustration. Additionally, we have added a table summarizing the number of stripes to provide a clearer overview.

      (5) How are ROIs in V3ab and V4 defined? V2 ROIs were defined (thin, thick, and pale stripe), but V3ab and V4 averaged across the whole area. Why not use the most activated "domains" from V3ab and V4? How does this influence connectivity analysis?

      Thank you for your question. We defined V4 and V3ab on the cortical surface using a retinotopic atlas (Benson et al., 2018), which has been shown to define the early visual areas accurately. Since all ‘domains’ showed robust BOLD activation to our stimuli, we used voxels from the entire ROI in the depth-dependent analysis. In the functional connectivity analysis, we used the most informative voxels selected by feature selection, which should already be included in the feature domains.

      Minor:

      English language editing is needed.

      Thank you for your feedback. We have carefully revised the manuscript for clarity and readability.

      Line 31 "its" should be "their".

      Thank you. We have corrected "its" to "their".

      Replace 'representative subject' with 'subject'.

      We have replaced "representative subject" with "subject" in the manuscript.

      Replace 'naturalistic texture' with 'texture'.

      Thank you for your suggestion. The textures used in our experiment were generated based on the algorithm by Portilla and Simoncelli (2000), and the term "naturalistic texture" was used to be consistent with literature. The textures used in our study are different from traditional artificial textures, as they contain higher-order statistical dependencies. Following your recommendations, we have replaced ‘naturalistic texture’ with ‘texture’ in some places in the main text to improve readability.

      Typo: Line 126, Fig 2B should be 1B.

      Thank you. We have corrected "Fig 2B" to "Fig 1B" in Line 128.

      Fig. 2A: point out where are texture domains in anterior V2.

      The texture-selective activations in anterior V2 (corresponding to the peripheral visual field) have been highlighted with arrowheads.

      Fig 2B, 3 legend: Round symbols are for each subject?

      Yes, the round symbols in Figures 2B represent data for individual participants. We have revised the legend for clarity.

      Fig. 3: Disparity and texture values do not look different across depth (except may the V2 texture values).

      While the differences in feature selectivity across cortical depths are small, they are highly consistent across participants. We have provided a figure showing the original BOLD responses in the revised manuscript (Figure S8). Data from individual subjects are also available at Open Science Framework (OSF, https://doi.org/10.17605/OSF.IO/KSXT8; ‘rawBetaValues.mat’ in the data directory).

      Line 57-59 The statement is not strictly accurate. V1 also has color, orientation, and motion representations.

      Thank you for your feedback. Our statement was intended to convey that M and P information from the geniculate input are transformed into representations of color, orientation, disparity, and motion in the primary visual cortex. We have clarified this point in the revised manuscript.

      In Line 58-60: “In the primary visual cortex (V1), the M and P information from the geniculate input are transformed into higher-level visual representations, such as motion, disparity, color, orientation, etc. (Tootell & Nasr, 2017).”

      Fig. 1B V1 interblobs also project to thick stripes (Sincich and Horton).

      Thank you for the additional information. We appreciate your input. Our figure is intended as a simplified schematic and does not fully represent all the connections. We have discussed this reference in the revised manuscript.

      In Line 136-138: “Only major connections were shown here. There are also other connections, such as V1 interblobs projecting to thick stripes (Federer et al., 2021; Hu & Roe, 2022; Sincich and Horton, 2005).”

      Line 207 "suggesting that both local and feedforward connections are involved in processing color information in area V2." Logic? English?

      Thank you for pointing this out. The superficial layers are involved in local intracortical processing by lateral connections and also send output to higher order visual areas along the feedforward pathway. Thus, the strongest color selectivity in the superficial depth of V2 supports that color information was processed in local neural circuits in area V2 and transmitted to higher order areas along the feedforward pathway. We have revised the manuscript for clarity.

      In Line 241-245: “According to the hierarchical model, the strongest color selectivity in the superficial cortical depth is consistent with the fact that color blobs locate in the superficial layers of V1 (Figure 1B, Felleman & Van Essen, 1991; Hubel & Livingstone, 1987; Nassi & Callaway, 2009). The strongest color selectivity in superficial V2 suggests that both local and feedforward connections are involved in processing color information (Figure 1C).”

      Line 254 "Laminar". Please use "cortical depth" or explicitly state that 'laminar' refers to superficial, middle, and deep as defined by cortical depth.

      Thank you for your suggestion. We have clarified the term "laminar" in the manuscript as referring to superficial, middle, and deep layers as defined by cortical depth.

      In Line 96-99: “To better understand the mesoscale functional organizations and neural circuits of information processing in area V2, the present study investigated laminar (or cortical depth-dependent) and columnar response profiles for color, disparity, and naturalistic texture in human V2 using 7T fMRI at 1-mm isotropic resolution.”

      Fig. S5 Please add a unit of isoluminance.

      Thank you for your suggestion. Supplementary Figure S10A and S10B illustrate the blue-matched luminance levels in RGB index. In our isoluminance experiment, blue was set as the reference color (RGB [0 0 255]) to measure the red and gray isoluminance.

      Line 448-449 To make this rationale clearer, refer to:

      Wang J, Nasr S, Roe AW, Polimeni JR. 2022. Critical factors in achieving fine‐scale functional MRI: Removing sources of inadvertent spatial smoothing. Human Brain Mapping. 43:3311-3331.

      Thank you for your suggestion. We have added this reference to better support the rationale of data analysis.

      Reviewer #2:

      (1) Line 126 should refer to Figure 1B.

      Thank you. We have corrected the reference in the revised manuscript as Figure 1B.

      (2) Even if only one naturalistic texture session was acquired per participant, it might be interesting to see the within-session repeatability by, e.g., splitting the texture runs into two halves.

      Thank you for your suggestion. We performed a split-half correlation analysis for participants who completed 10 runs in the naturalistic texture session. The result from one representative subject is shown in the figure below (for the other participants, r = 0.38, 0.38, 0.24, and 0.23, respectively).

      Author response image 2.

      Split-half correlations for the texture-selective activation maps in a representative subject (S01) in V2.
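
      A split-half correlation of this kind can be sketched as: average the per-run texture-minus-noise maps within each half, then compute the Pearson correlation across vertices. Splitting by odd and even runs is our assumption for illustration; the actual split used for Author response image 2 may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant map")
    return sxy / math.sqrt(sxx * syy)


def split_half_correlation(run_maps):
    """Correlate the mean map of odd runs with the mean map of even runs.

    run_maps: list of per-run texture-minus-noise maps, each a list of
              values in the same vertex order (hypothetical input format).
    """
    if len(run_maps) < 2:
        raise ValueError("need at least two runs")
    halves = (run_maps[0::2], run_maps[1::2])
    # Average across runs within each half, vertex by vertex.
    means = [[sum(values) / len(half) for values in zip(*half)] for half in halves]
    return pearson(means[0], means[1])
```

      With 10 runs, each half averages five runs; participants with fewer runs yield noisier half-maps and, typically, lower split-half correlations.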

      (3) Unfortunately, Figure S2 only shows the stripe ROIs but not V3ab or V4 ROIs. Including another figure that shows all ROIs in more detail would be interesting.

      Thank you for your suggestion. We have included a figure showing the ROIs for V4 and V3ab (the black dotted lines in Figure S9).

      (4) It would be helpful for the reader to have a more detailed discussion about methodological limitations, including the unspecificity of the GE-BOLD signal (Engel et al., 1997, Cereb Cortex, 7, 181-192; Parkes et al., 2005, MRM, 54, 1465-1472; Fracasso et al., 2021, Prog Neurobiol, 202, 102187) and the used voxel sizes.

      Thank you for your suggestion. We have added a more detailed discussion about the methodological limitations, including the unspecificity of the GE-BOLD signal and the voxel sizes used.

      In Line 397-408: “Due to the limitations of the T2*w GE-BOLD signal in its sensitivity to large draining veins (Fracasso et al., 2021; Parkes et al., 2005; Uludag & Havlicek, 2021), the original BOLD responses were strongly biased towards the superficial depth in our data (Figure S8). Compared to GE-BOLD, VASO-CBV and SE-BOLD fMRI techniques have higher spatial specificity but much lower sensitivity (Huber et al., 2019). As shown in a recent study (Qian et al., 2024), using differential BOLD responses in a continuous stimulus design can significantly enhance the laminar specificity of the feature selectivity measures in our results (Figure 3). Compared to the submillimeter voxels, as used in most laminar fMRI studies, our fMRI resolution at 1-mm isotropic voxel may have a stronger partial volume effect in the cortical depth-dependent analysis. However, consistent with our results, previous studies have also shown that 7T fMRI at 1-mm isotropic resolution can resolve cortical depth-dependent signals in human visual cortex (Roefs et al., 2024; Shao et al., 2021).”

      (5) If I understand correctly, different numbers of runs/sessions were acquired for different subjects. It would be good to discuss if this could have impacted the results, e.g., different effect sizes could have biased the manual ROI definition.

      Thank you for your suggestion. Although there were differences in the number of runs/sessions acquired for different subjects, there were at least four runs of data for each experiment, which should be enough to examine the within-subject effect. We have discussed this point in the revised manuscript.

      In Line 481-484: “Although the number of runs was not equal across participants, there were at least four runs (twenty blocks for each stimulus condition) of data in each experiment, which should be sufficient to investigate within-subject effects.”

      (6) It would be good to add the software used for layer definition. Was it Laynii?

      We have provided more details in the revised methods.

      In Line 523-526: “An equi-volume method was used to calculate the relative cortical depth of each voxel to the white matter and pial surface (0: white matter surface, 1: pial surface, Supplementary Figure S11A), using mripy (https://github.com/herrlich10/mripy).”
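
      For readers unfamiliar with the equi-volume approach, the sketch below uses the commonly used closed form that assumes the local cortical cross-sectional area changes linearly between the white-matter and pial surfaces. It maps a relative distance from the white-matter surface (0) toward the pial surface (1) to the fraction of local cortical volume lying below it, and back. The function names and area inputs are hypothetical, and this is not necessarily how mripy implements the computation.

```python
import math


def volume_fraction(rho, area_wm, area_pial):
    """Fraction of local cortical volume between the white-matter surface and
    relative distance rho (0 = white matter, 1 = pial), assuming the
    cross-sectional area changes linearly from area_wm to area_pial."""
    return (2 * area_wm * rho + (area_pial - area_wm) * rho ** 2) / (area_wm + area_pial)


def distance_for_volume_fraction(alpha, area_wm, area_pial):
    """Inverse of volume_fraction: the relative distance at which a fraction
    alpha of the local volume lies below."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if math.isclose(area_wm, area_pial):
        return alpha  # equal areas: equi-volume reduces to equi-distance
    root = math.sqrt(alpha * area_pial ** 2 + (1 - alpha) * area_wm ** 2)
    return (root - area_wm) / (area_pial - area_wm)


# Gyral-crown-like case: the pial area is three times the white-matter area.
print(distance_for_volume_fraction(0.5, 1.0, 3.0))
```

      When the pial area exceeds the white-matter area, half of the volume lies above the geometric midpoint: distance_for_volume_fraction(0.5, 1.0, 3.0) equals (√5 − 1)/2, about 0.618, so the deep half of the volume occupies a thicker slab.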

      (7) It would be interesting to see (at least for one subject) the contrasts of color-selective thin stripes and disparity-selective thick stripes from single sessions to demonstrate the repeatability of measurements.

      Thank you for your suggestion. We have shown the test-retest reliability of the response pattern of color-selective thin stripes and disparity-selective thick stripes in a representative subject in Figure S5.

      (8) By any chance, do the authors also have resting-state data from the same subjects? It would be interesting to see the connectivity analysis between stripes and V3ab, V4 with resting-state data.

      Thank you for your suggestion. Unfortunately, we do not have resting-state data from the same subjects at this time. We agree with you that layer-specific connectivity analysis with resting-state data is very interesting and worth investigating in future studies.

      Reviewer #3:

      (1) For investigating information flow across areas, the authors rely on layer-specific informational connectivity analyses, which is an exciting approach. Covariation in decoding accuracy for a specific dependent variable between the superficial layers of a lower area and the middle layer of a higher area is taken as evidence for feedforward connectivity, whereas FB was defined as the connection between the two deep layers. Yet this method is not assumption-free. For example, the canonical idea (Figure 1C) of FF terminals exclusively arriving in layer 4 and FB terminals exclusively terminating in supra-or infragranular layers is not entirely correct. This is not even the case for area V1 - see for example Kathy Rockland's exquisite tractography studies, showing that even single axons with branches terminating in different layers. Also, feedback signals not only arrive in the deep layers of a lower area. Although these informational connectivity analyses can be suggestive of information flow, this reviewer doubts it can be considered as conclusive evidence. Therefore, the authors should drastically tone down their language in this respect, throughout the text. They present suggestive, not conclusive evidence. To obtain truly conclusive evidence, one likely has to perform laminar electrophysiological recordings simultaneously across multiple areas and infer the directionality of information flow using, for example, granger causality.

      Thank you for pointing out this important issue. In our response to a previous question (Reviewer #1, the 2nd comment), we have discussed other possible connections in addition to the canonical feedforward and feedback pathways. In the revised manuscript, the conclusion has been toned down to properly reflect our findings. However, we would also like to emphasize that our conclusion about laminar circuits was supported by converging lines of evidence. For example, in addition to the depth-dependent connectivity results, the role of the feedback circuit in processing texture information was also supported by greater selectivity in V4 than in V2, and by the strongest deep-layer selectivity in V2 (Figure 3C).

      (2) In the same realm, how reproducible are the information connectivity results? In the first part of the study, the authors performed a split-half analyses. This should be also done for Figure 4.

      Thank you for your suggestion. We have performed a split-half analysis for the informational connectivity results. As shown in Author response image 3, the results for the color experiment were robust and reproducible, while the disparity and texture connectivity results were less consistent between the two halves. The results from the second half (Author response image 3, below) are more consistent with the original findings (Figure 4). Overall, the pattern of results was qualitatively similar between the two halves. The inconsistency may be due to the fact that some participants had only four runs of data, which could make the split-half analysis less reliable.

      Author response image 3.

      Split-half analysis of informational connectivity.

      (3) Most of the other layer-specific claims (not the ones about the flow of information) are based on indices. It is unclear which ROIs contributed to these indices. Was it the entire extent of V1, V2, ...? Or only the visually driven voxels within these areas? How exactly were the voxels selected? For V2, it would make sense to calculate the selectivity indices independently for the disparity- and color-selective (putative) thick and (putative) thin stripe compartments, respectively. Adding voxels of non-selective compartments (e.g., putative thick-stripe voxels for calculating the color index, or putative thin-stripe voxels for calculating the disparity index) will only add noise.

      In the revised manuscript, we have clarified that we selected the entire ROI in the depth-dependent analysis. Since our study does not have an independent functional localizer, using the entire ROI avoids the problem of double dipping. The processing of visual features is not confined solely to specific stripes. We have also provided a more comprehensive explanation of this issue in the discussion section.

      In Lines 541-544: “For the cortical depth-dependent analyses in Figure 3, we used all voxels in the retinotopic ROI. Pooling all voxels in the ROI avoids the problem of double-dipping and also increases the signal-to-noise ratio of ROI-averaged BOLD responses.”

      (4) It is apparent from Figure 3 that the indices are largely (though not exclusively) driven by two subjects. Therefore, this reviewer wishes to see the raw data in addition to a table for calculating the color, disparity, and texture selectivity indices, along with the number of voxels that contributed to each.

      Thank you for your suggestion. We have provided a figure showing the original BOLD responses (Figure S8). Data from individual subjects are also available at the Open Science Framework (OSF, https://doi.org/10.17605/OSF.IO/KSXT8; see ‘rawBetaValues.mat’ in the data directory).

      Minor:

      (1) I typically find inferences about 'layer fMRI' vastly overstated. We all know that fMRI does not (yet) provide laminar-specific resolution, i.e., resolution at which meaningful differences in fMRI signals can be extracted from all six individual layers of neocortex without partial volume effects, without taking into account pre- and postsynaptic contributions of neurons to the fMRI signal (the cell bodies may very well lie in different layers than the dendritic trees, etc.), or without taking into account the vascular anatomy, etc. The authors should use the term cortical depth-dependent fMRI throughout the text, as they do in the abstract and intro.

      Thank you for pointing out this important issue. We have now defined the meaning of layer or laminar as “cortical depth-dependent” in the introduction, to be consistent with the terminology in most published papers on this topic.

      (2) First sentence of the abstract: I disagree with this statement. The parallel streams in intermediate-level areas are probably as well studied as the geniculostriate pathway, starting with the seminal work of Hubel and Livingstone and, more recently, of Angelucci and co-workers, who looked in detail at the anatomical and functional interactions across sub-compartments of V1 and V2.

      Thank you for your feedback. In the revised manuscript, we have removed the term "much" from the first sentence of the abstract. Although there have been seminal studies of V2 sub-compartments in monkeys, only a few fMRI studies investigated this issue in humans.

      (3) The authors show inter-session correlations for color and disparity. This reviewer would like to see test-retest images since the explained variance is not terribly good. Also, show the correlation values for the inter-session texture beta values.

      Thank you for your suggestion. We have performed the test-retest reliability analysis of texture-selective patterns in our response to a previous question (Reviewer #2, the 2nd comment, Author response image 2).

      (4) The stripe definitions are threshold dependent. Please clarify whether the reported results are threshold-independent.

      Thank you for your question. To address your concern, we defined the stripe ROIs using different thresholds, and the results remained consistent. Specifically, we ranked the voxels in manually defined stripe ROIs by the color-disparity response. We then defined the lowest 10% as thick stripe voxels, the highest 10% as thin stripe voxels, and the middle 10% as pale stripe voxels. Additionally, we adjusted the thresholds to 20% and 30% to define the three stripes (with 30% being the least strict threshold). Feature selectivities at different thresholds are shown in Figure S6 (from left to right: 10%, 20%, 30%). Notably, in all threshold conditions, there was no significant difference in texture selectivity across different stripes.
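      As an illustration of the percentile-based stripe definition described above, the Python sketch below ranks voxels by a color-minus-disparity contrast and takes the lowest fraction as putative thick stripes, the highest fraction as putative thin stripes, and an equal-sized band centered on the median rank as putative pale stripes. This is a minimal sketch under stated assumptions, not the authors' analysis code: the function name `classify_stripes`, the input format (one contrast value per voxel), and the choice to center the pale band on the median rank are all hypothetical.

```python
from typing import Dict, List, Sequence


def classify_stripes(color_minus_disparity: Sequence[float], frac: float) -> Dict[str, List[int]]:
    """Split voxel indices into putative thick, pale, and thin stripe sets.

    Hypothetical helper: voxels are ranked by their color-minus-disparity
    response. The lowest `frac` (most disparity-preferring) are labeled thick,
    the highest `frac` (most color-preferring) thin, and an equal-sized band
    centered on the median rank pale (an assumed reading of "middle 10%").
    """
    if not 0 < frac <= 1 / 3:
        raise ValueError("frac must be in (0, 1/3] so the three groups cannot overlap")
    n = len(color_minus_disparity)
    # Voxels per group; the tiny epsilon guards against float round-down (e.g. 0.29 * 100).
    k = int(n * frac + 1e-9)
    # Voxel indices ordered from most disparity-preferring to most color-preferring.
    order = sorted(range(n), key=lambda i: color_minus_disparity[i])
    mid_start = (n - k) // 2  # start of the band centered on the median rank
    return {
        "thick": order[:k],
        "pale": order[mid_start:mid_start + k],
        "thin": order[n - k:],
    }


# Toy example with 10 voxels (illustrative values only, not study data).
groups = classify_stripes([0.5, -1.2, 0.1, 2.0, -0.3, 0.9, -2.1, 1.4, 0.0, -0.7], frac=0.2)
print(groups)  # {'thick': [6, 1], 'pale': [8, 2], 'thin': [7, 3]}
```

      Re-running the same function with `frac` set to 0.1, 0.2, and 0.3 mirrors the threshold comparison above; capping `frac` at one third is what guarantees that the three voxel sets never share a voxel.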

      (5) How were the visual areas defined?

      In the revised manuscript, we have provided a detailed description of the methods.

      In Lines 531-535: “ROIs were defined on the inflated cortical surface. Surface ROIs for V1, V2, V3ab, and V4 were defined based on the polar angle atlas from the 7T retinotopic dataset of Human Connectome Project (Benson et al., 2014, 2018). Moreover, the boundary of V2 was edited manually based on columnar patterns. All ROIs were constrained to regions where mean activation across all stimulus conditions exceeded 0.”

      (6) "According to the hierarchical model in Figure 1B and 1C, the strongest color selectivity in the superficial cortical depth is consistent with the fact that color blobs mainly locate in the superficial layers of V1, suggesting that both local and feedforward connections are involved in processing color information in area V2." But could color-selective activation within V2 not also be consistent with feedback from other areas (some of which were not covered in the present experiments), especially since most of the brain was not covered (i.e., only a 4 cm slab was covered)?

      Thank you for reminding us about this issue. We have discussed the possibility of feedback influence in explanation of the superficial bias of color selectivity in area V2.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review):

      Summary: 

      The authors benchmarked five IBD detection methods (hmmIBD, isoRelate, hap-IBD, phasedIBD, and Refined IBD) in Plasmodium falciparum using simulated and empirical data. Plasmodium falciparum has a mutation rate similar to that of humans but a much higher recombination rate and lower SNP density. Thus, the authors evaluated how recombination rate and marker density affect IBD segment detection. Next, they performed parameter optimization for Plasmodium falciparum and benchmarked the robustness of downstream analyses (selection detection and Ne inference) using IBD detected by each of the methods. They also tracked the computational efficiency of these methods. The authors' work is valuable for the tested species, and the analyses presented appear to support their claim that users should be cautious when calling IBD if SNP density is low and the recombination rate is high. 

      Strengths: 

      The study design was solid. The authors set up their reasoning for using P. falciparum very well. The high recombination rate and similar mutation rate to humans is indeed an interesting case. Further, they chose methods that were developed explicitly for each species. This was a strength of the work, as was incorporating both simulated and empirical data to support their goal that IBD detection should be benchmarked in P. falciparum.

      Weaknesses: 

      The scope of the optimization and application of results from the work is narrow, in that everything is fine-tuned for Plasmodium. Some of the results were not entirely unexpected for users of any of the tested software that was developed for humans. For example, it is known that Refined IBD is not going to do well with the combination of short IBD segments and low SNP density. Lastly, it appears the authors only did one large-scale simulation (there are no reported SDs). 

      We thank the reviewer for highlighting the strengths and weaknesses of the study. 

      First, we would like to highlight that: (1) while we use Plasmodium as a model to investigate the impact of high recombination and low marker density on IBD detection and downstream analyses, our IBD benchmarking framework and strategies are widely applicable to IBD method development for many sexually recombining species, including both Plasmodium and non-Plasmodium species. (2) Although some results are not completely unexpected, such as the impact of low marker density on IBD detection, IBD-based methods have been increasingly used in malaria genomic surveillance research without comprehensive benchmarking for malaria parasites despite the high recombination rate. Due to the lack of benchmarking, researchers use a variety of different IBD callers for malaria research, including those that are only benchmarked in human genomes, such as refined-ibd. Our work not only confirmed that low marker density (related to high recombination rate) can affect the accuracy of IBD detection, but also demonstrated the importance of proper parameter optimization and tool prioritization for specific downstream analyses in malaria research. We believe our work significantly contributes to the robustness of IBD segment detection and the enhancement of IBD-based malaria genomic surveillance.

      Second, we agree that there is a lack of clarity regarding simulation replicates and the uncertainty of reported estimates. We have made the following improvements, including (1) running n = 3 full sets of simulations for each analysis purpose, which is in addition to the large sample sizes and chromosomal-level replications already presented in our initial submission, and (2) updating data and figures to reflect the uncertainty at relevant levels (segment level, genome-pair level or simulation set level).   

      Reviewer #2 (Public review):

      Summary: 

      Guo et al. benchmarked and optimized methods for detecting Identity-By-Descent (IBD) segments in Plasmodium falciparum (Pf) genomes, which are characterized by high recombination rates and low marker density. Their goal was to address the limitations of existing IBD detection tools, which were primarily developed for human genomes and do not perform well in highly recombining genomes. They first analysed various existing IBD callers, such as hmmIBD, isoRelate, hap-IBD, phasedIBD, and Refined IBD. They focused on the impact of recombination on accuracy, which was calculated based on two metrics, the false negative rate and the false positive rate. The results suggest that high recombination rates significantly reduce marker density, leading to higher false negative rates for short IBD segments. This effect compromises the reliability of IBD-based downstream analyses, such as effective population size (Ne) estimation. They showed that the best tool for IBD detection in Pf is hmmIBD, because it has relatively low FN/FP error rates and is less biased for relatedness estimates. However, this method is less computationally efficient. Their suggestion is to optimize human-oriented IBD methods and use hmmIBD only for the estimation of Ne. 

      Strengths: 

      Although I am not an expert on Plasmodium falciparum genetics, I believe the authors have developed a valuable benchmarking framework tailored to the unique genomic characteristics of this species. Their framework enables a thorough evaluation of various IBD detection tools on non-human data with features such as high recombination rates and low marker density, addressing a key gap in the field. This study provides a comparison of multiple IBD detection methods, including probabilistic approaches (hmmIBD, isoRelate) and IBS-based methods (hap-IBD, Refined IBD, phased IBD). This comprehensive analysis offers researchers valuable guidance on the strengths and limitations of each tool, allowing them to make informed choices based on specific use cases. I think this is important beyond the study of Pf. The authors highlight how optimized IBD detection can help identify signals of positive selection, infer effective population size (Ne), and uncover population structure. They demonstrate the critical importance of tailoring analytical tools to suit the unique characteristics of a species. Moreover, the authors provide practical recommendations, such as employing hmmIBD for quality-sensitive analyses and fine-tuning parameters for tools originally designed for non-P. falciparum datasets before applying them to malaria research. 

      Overall, this study represents a meaningful contribution to both computational biology and malaria genomics, with its findings and recommendations likely to have an impact on the field. 

      Weaknesses: 

      One weakness of the study is the lack of emphasis on the broader importance of studying Plasmodium falciparum as a critical malaria-causing organism. Malaria remains a significant global health challenge, causing hundreds of thousands of deaths annually. The authors could have introduced the topic better, even though I understand this is a methodological paper. While the study provides a thorough technical evaluation of IBD detection methods and their application to Pf, it does not adequately connect these findings to the broader implications for malaria research and control efforts. Additionally, a discussion of malaria and its global impact could have framed the study in a more accessible and compelling way, making the importance of these technical advances clearer to a broader audience, including researchers and policymakers in the fight against malaria. 

      We thank the reviewer for highlighting the need to better contextualize the work and emphasize its relevance to malaria control and elimination efforts. We have edited the introduction and discussion sections to highlight the importance of studying Plasmodium as malaria-causing organisms and why IBD-based analysis is important to malaria researchers and policymakers. We believe the changes will better emphasize the public health relevance of the work and improve clarity for a general audience.  

      We would like to clarify that we are not recommending that researchers “optimize human-oriented IBD methods and use hmmIBD only for the estimation of Ne.” We recommended hmmIBD for Ne analysis; however, hmmIBD can be utilized for other applications, including population structure and selection detection. Thus, we generally recommend using hmmIBD for Plasmodium when phased genotypes are available. To avoid potential misunderstandings, we have revised relevant sentences in the abstract, introduction, and discussion. One reason to consider human-oriented IBD detection methods in Plasmodium research is that hmmIBD currently has limitations in handling large genomic datasets. Our ongoing research focuses on improving hmmIBD to reduce its computational runtime, making it scalable for large Plasmodium whole-genome sequence datasets.

      Recommendations for the authors

      Reviewer #1:

      (1) Additional experiments 

      (i) More simulation replicates would be valuable here. The way the results are presented, it appears as though there are no replicates. Apologies if I am incorrect, but when looking through the authors' code, --num_reps defaults to one simulation and there are no SDs reported for any figure. Perhaps the authors are bypassing replicates by taking a random sample of lineages? Some clarification here would be great. 

      We agree with the reviewer’s constructive suggestions. We have increased the number of simulation sets to n = 3, in addition to the existing replicates at the chromosomal level. We did not use a larger n for full sets of simulation replicates for two reasons: (1) full replication is quite computationally intensive (n = 3 simulation sets already require a week to run on our computer cluster with hundreds of CPU cores); (2) the results from different simulation sets are highly consistent with each other, likely due to our large sample size (n = 1000 haploid genomes for each parameter combination). The consistency across simulation sets is exemplified by the following figures (Author response image 1 and 2), which are based on simulation sets different from those used for the Figures and Supplementary Figures in the manuscript. 

      Author response image 1.

      Additional simulation sets repeating experiments shown in Fig 2.

      Author response image 2.

      Post-optimization Ne estimates based on three independent simulation sets (Fig 5 shows data from simulation set 1).

      In our updated figures, we address the uncertainty of measurements as follows:

      (1) For IBD accuracy based on overlapping IBD segments, we present the mean ± standard deviation (SD) at the segment level (IBD segment false positives and false negatives for each length bin) or genome-pair level (IBD error rates at the genome-wide level). Figures in the revised manuscript show results from one of the three simulation set replicates. The SD of IBD segment accuracy is included in all relevant figures. In the S2 Data file, we chose not to show SDs to avoid text overcrowding in the heatmaps; however, a detailed version, including SD plotting on the heatmap and across three simulation set replicates, is available on our GitHub repository at https://github.com/bguo068/bmibdcaller_simulations/tree/main/simulations/ext_data

      (2) For IBD-based genetic relatedness, the uncertainty is depicted in scatterplots.

      (3) For IBD-based selection signal scans, we provide the mean ± SD of the number of true selection signals and false selection signals. The SD is calculated at the simulation set level (n=3). 

      (4) For IBD network community detection, the mean ± SD of the adjusted Rand index is reported at the simulation set level (n=3). A representative simulation set is randomly chosen for visualization purposes.

      (5) For IBD-based Ne estimates, each simulation set provides confidence intervals via bootstrapping. We found Ne estimates across n=3 simulation sets to be highly consistent and decided to display Ne from one of the simulation sets.

      (6) For the measurement of computational efficiency and memory usage, the mean ± SD was calculated across chromosomes from the same simulation sets.

      We have included a paragraph titled "Replications and Uncertainty of Measures" in the methods section to clarify simulation replications. Additionally, a table of simulation replicates is provided in the new S1 Data file under the sheet named “02_simulation_replicates.”
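      To make item (4) above concrete, the sketch below computes an adjusted Rand index (ARI) between true and inferred community labels and summarizes replicate values as mean ± SD. It is a minimal illustration, not the code used in the study: the function names `adjusted_rand_index` and `mean_sd` are hypothetical, the ARI follows the standard Hubert and Arabie definition (the same quantity returned by scikit-learn's `adjusted_rand_score`), and we assume the sample (n − 1) standard deviation, which the response does not specify.

```python
import math
from collections import Counter
from statistics import mean, stdev
from typing import Hashable, Sequence, Tuple


def adjusted_rand_index(true_labels: Sequence[Hashable], pred_labels: Sequence[Hashable]) -> float:
    """Adjusted Rand index (Hubert and Arabie) between two labelings of the same items."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("both labelings must cover the same items")
    n = len(true_labels)
    total_pairs = math.comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: the labelings trivially agree
    cells = Counter(zip(true_labels, pred_labels))  # contingency-table cells
    index = sum(math.comb(c, 2) for c in cells.values())
    sum_rows = sum(math.comb(a, 2) for a in Counter(true_labels).values())
    sum_cols = sum(math.comb(b, 2) for b in Counter(pred_labels).values())
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Only when both labelings are the same trivial partition
        # (one community, or every item alone): treat as perfect agreement.
        return 1.0
    return (index - expected) / (max_index - expected)


def mean_sd(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across replicate simulation sets."""
    return mean(values), stdev(values)


# Toy labels for three hypothetical simulation sets (illustrative only, not study data);
# the same true partition is reused in each set for brevity.
truth = [0, 0, 0, 1, 1, 1, 2, 2]
inferred_by_set = [
    [0, 0, 0, 1, 1, 1, 2, 2],
    [0, 0, 1, 1, 1, 1, 2, 2],
    [0, 0, 0, 1, 1, 2, 2, 2],
]
aris = [adjusted_rand_index(truth, inferred) for inferred in inferred_by_set]
ari_mean, ari_sd = mean_sd(aris)
print(f"ARI = {ari_mean:.3f} ± {ari_sd:.3f}")
```

      Because the ARI is corrected for chance agreement, a random community assignment scores near 0 and a perfect one scores 1, which is why it is a convenient per-set statistic to average across the n = 3 simulation sets.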

      (ii) I might also recommend a table or illustrative figure with all the simulation parameters for the readers rather than them having to go to and through a previous paper to get a sense of the tested parameters. 

      We have now generated tables containing full lists of simulation/IBD calling parameters. We have organized the tables into two sections: simulation parameters and IBD calling parameters. For the simulations, we are using three demographic models: the single-population (SP) model, the multiple-population (MP) model, and the human population demography in the UK (UK) model, each with different sets of parameters. Parameters and their values are listed separately for each demographic model (SP, MP and UK). For the IBD calling, we have five different IBD callers, each with different parameters. We have provided lists of the parameters and their values separately for each caller. In total, there are 15 different combinations of 3 demographic models in simulation and five callers in IBD detection (Author response image 3). We provide a table for each of the 15 combinations. We also provide a single large table by concatenating all 15 tables. In the combined table, demographic model-specific or IBD caller-specific parameters are displayed in their own columns, with NA values (empty cells) appearing in rows where these parameters are not applied (see S2 Data file).

      Author response image 3.

      Schematic of combined parameters from simulations and IBD detection (also included in the S2 Data file)

      (2) Recommendations for improving the writing and presentation 

      Overall, the writing was great, especially the introduction. 

      Three thoughts: 

      (i) It would be great if the authors included a few sentences with guidance on the approach one would take if their organism were not human or P. falciparum.

      We have updated our discussion with the following statement: “Beyond Plasmodium parasites, there are many other high-recombining organisms such as Apicomplexan species like Theileria, insects like Apis mellifera (honeybee), and fungi like Saccharomyces cerevisiae (Baker's yeast). For these species, our optimized parameters may not be directly applicable, but the benchmarking framework established in this study can be utilized to prioritize and optimize IBD detection methods in a context-specific manner.”

      (ii) I think there was a lot of confusion between my co-reviewer and me about the simulations as they were presented. Clarification on whether there were replicates and how lineages were sampled would be helpful for a reader. 

      We have added a paragraph with heading “Replications and uncertainty of measures” under the method section to clarify simulation replicates.  Please also refer to our response above for more details (Reviewer #1 (1) Additional experiments).

      (iii) Maybe we missed it, but could the authors add a sentence or two about why isoRelate performed so poorly (e.g. lines 206-207) considering it was developed for Plasmodium? This result seems important. 

      IsoRelate assumes non-phased genotypes as input; therefore, even if phased genotypes are provided, the HMM model used in isoRelate (distinct from the hmmIBD model) may not utilize them. Below, we present examples of IBD segments between true sets and inferred sets from both isoRelate and hmmIBD, where many small IBD segments identified by tskibd (ground truth) and hmmIBD (inferred) are not detected by isoRelate (inferred), although isoRelate still captures very long IBD segments. These patterns are also illustrated in Fig. 3 and S3 Fig. We acknowledge that isoRelate may outperform other methods in the context of unphased genotypes. However, we chose not to benchmark IBD calling methods using unphased genotypes in simulations, as the results may be significantly influenced by the quality of genotype phasing for all other IBD detection methods. The characterization of deconvolution methods is beyond the scope of this paper. We have added a paragraph in the discussion to reflect the above explanation.

      Author response image 4.

      Example IBD segments inferred by isoRelate and hmmIBD compared to true IBD segments calculated by tskibd.

      (3) Minor corrections to the text and figures 

      Lines 105-110 feel like introduction because the authors are defining IBD and goals of work 

      We have shortened these sentences and retained only relevant information for transition purposes. 

      Line 121-122 The definition of false positive is incorrect, it appears to be the exact text from false negative 

      We apologize for the typo and have corrected the definition so that it is consistent with that in the methods section. 

      Lines 177-180 feels more like discussion than results 

      We have removed this sentence for brevity. 

      Figure 1: 

      Remove plot titles from the figure 

      Write out number in a 

      The legend in b overlaps the data so moving that inset to the right would be helpful 

      We have removed the titles from Figure 1. In Figure 1a, we have changed the format of the y-axis tick labels from scientific notation to integers. In Figure 1b, we have adjusted the size and location of the legend so that it does not overlap with the data points.

      Figure 2-3 & S4-5: 

      It was hard to tell the difference between [3-4) and [10-18) because the colors and shapes are similar. It might be worth using a different color or shape for one of them? 

      We have changed the color for the [10-18) group so that the two groups are easier to distinguish.

      Figure 3 & S3-5: 

      Biggest suggestion is that when an axis is logged it should not only be mentioned in the caption but also should be shown in the figure as well. 

      We have updated all relevant figures so that the log scale is noted in the figure captions (legends) as well as in the figures (in the x and/or y axis labels).

      Supplementary Figure S2 

      (i) It would be nice to either combine it with the main text Figure 1 (I don't believe it would be overwhelming) or add in the other two methods for comparison 

      We have now plotted data for all five IBD callers in S1 Fig for better comparison. 

      (ii) the legend overlaps the data so relocating it to the top or bottom would be helpful 

      We have moved the legend to the bottom of the figure to avoid overlap with the data.

      Reviewer #2:

      I don't have any major comments on the paper. It is well-written, although perhaps a bit long and repetitive in some sections. Make sure not to repeat the same concepts too many times. 

      We have consolidated and removed several paragraphs to reduce repetition of the same concepts.

      I am not a methodological developer, but it seems you have addressed several challenges regarding IBD detection in P. falciparum. You have also acknowledged the study's caveats, which I agree with. 

      Thank you for the positive comments.

      Minor comments: 

      -In my opinion, the paper would benefit from including the workflow figure in the main text rather than keeping it in the supplementary materials. This would make it more accessible and useful for readers. 

      We have moved the original S1 Fig to be Fig 1 in the main text.

      -Some of the figures (e.g. Fig. 2, 4) should be larger for better clarity and interpretation. 

      We have updated Fig 2 and Fig 4 (now labeled as Figure 3 and 5) to make them larger for improved clarity and interpretation.

      -While the focus on P. falciparum is understandable, it would have been valuable to include examples of other species and discuss the broader implications of the findings for a broader field. 

      We have updated the third-to-last paragraph to discuss implications for other species, such as Apicomplexan species like Theileria, insects like Apis mellifera (honeybee), and fungi like Saccharomyces cerevisiae (Baker's Yeast). We acknowledge that optimal parameters and tool choices may vary among species due to differences in demographic history and evolutionary parameters. However, we emphasize that the methods outlined are adaptable for prioritizing and optimizing IBD detection methods in a context-specific manner across different species.

      -Figure 6 is somewhat confusing and could use clearer labeling or additional explanation to improve comprehension. 

      We have updated the labels and titles in the figure to improve clarity. We also edited the figure caption for better clarity.

      -Although hmmIBD outperformed other tools in accuracy, its computational inefficiency due to single-threaded execution poses a significant challenge for scaling to large datasets. The trade-off between accuracy and computational cost could be discussed in more detail. 

      We have added a paragraph in the discussion section to highlight the trade-off between accuracy and computation cost. We noted that we are developing an adapted tool to enhance the hmmIBD model and significantly reduce the runtime via parallelizing the IBD inference process.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors of this study use electron microscopy and 3D reconstruction techniques to study the morphology of distinct classes of Drosophila sensory neurons *across many neurons of the same class.* This is a comprehensive study attempting to look at nearly all the sensory neurons across multiple sensilla to 1) determine how much morphological variability exists between and within neurons of different and similar sensory classes, and 2) identify dendritic features that may have evolved to support particular sensory functions. This study builds upon the authors' previous work, which allowed them to identify and distinguish sensory neuron subtypes in the EM volumes without additional staining so that reconstructed neurons could reliably be placed in the appropriate class. This work is unique in looking at a large number of individual neurons of the same class to determine what is consistent and what is variable about their class-specific morphologies.

      This means that in addition to providing specific structural information about these particular cells, the authors explore broader questions of how much morphological diversity exists between sensory neurons of the same class and how different dendritic morphologies might affect sensory and physiological properties of neurons.

      The authors found that CO2-sensing neurons have an unusual, sheet-like morphology in contrast to the thin branches of odor-sensing neurons. They show that this morphology greatly increases the surface area to volume ratio above what could be achieved by modest branching of thin dendrites, and posit that this might be important for their sensory function, though this was not directly tested in their study. The study is mainly descriptive in nature, but thorough, and provides a nice jumping-off point for future functional studies. One interesting future analysis could be to examine all four cell types within a single sensillum together to see if there are any general correlations that could reveal insights about how morphology is determined and the relative contributions of intrinsic mechanisms vs interactions with neighboring cells. For example, higher than average branching in one cell type might correlate with higher than average branching in another type within the same sensillum. This might suggest higher extracellular growth or branching cues within a sensillum. Conversely, if higher branching in one cell type consistently leads to reduced length or branching in another, this might point to dendrite-dendrite interactions between cells undergoing competitive or repulsive interactions to define territories within each sensillum as a major determinant of the variability.

      We thank the reviewer for the insightful comments and appreciation for our study.

      Reviewer #2 (Public Review):

      The manuscript employs serial block‐face electron microscopy (SBEM) and cryofixation to obtain high‐resolution, three‐dimensional reconstructions of Drosophila antennal sensilla containing olfactory receptor neurons (ORNs) that detect CO2. This method has been used previously by the same lab in Gonzales et al., 2021 (https://elifesciences.org/articles/69896), which provided an exemplary model by integrating high-resolution EM with electrophysiology and cell-type-specific labeling.

      We thank the reviewer for expressing appreciation for our published study.

      The previous study ended up correlating morphology with activity for multiple olfactory sensillar types. Compared to the 2021 study, this current manuscript appears somewhat incomplete and lacks integration with activity.

      We thank the reviewer for their feedback. However, we would like to clarify that our previous study did not correlate morphology with activity to a greater extent than the current study. Both employed the same cryofixation, SBEM-based approach without recording odor-induced activity, but the focus of the current work is fundamentally different. While the previous study examined multiple sensillum types, the current study concentrates on a single sensillum type to address a distinct biological question regarding morphological heterogeneity. We appreciate the opportunity to clarify this distinction, and we hope that the revised manuscript more clearly conveys the unique scope and contributions of this study.

      In fact, older studies have also reported two-dimensional TEM images of the putative CO2 neuron in Drosophila (Shanbhag et al., 1999) and in mosquitoes (McIver and Siemicki, 1975; Lu et al., 2007), and in these instances reported that the dendritic architecture of the CO2 neuron was somewhat different (circular and flattened, lamellated) from that of other olfactory neurons.

      We thank the reviewer for pointing this out. As noted in both the Introduction and Discussion sections, previous studies—including those cited by the reviewer—suggested that CO2-sensing neurons may have a distinct dendritic morphology. However, those earlier studies lacked the means to definitively link the observed morphology to CO2 neuron identity.

      In contrast, our study assigns neuronal identity based on quantitative morphometric measurements, allowing us to confidently associate the unique dendritic architecture with CO2 neurons. Furthermore, we extend previous observations by providing full 3D reconstructions and nanoscale morphometric analyses, offering a much more comprehensive and definitive characterization of these neurons. We believe this represents a significant advancement over earlier work.

      The authors claim that this approach offers an artifact‐minimized ultrastructural dataset compared to earlier approaches. In this study, they not only confirm this different morphology but also classify it into distinct subtypes (loosely curled, fully curled, split, and mixed). This detailed morphological categorization was not provided in prior studies (e.g., Shanbhag et al., 1999).

      We thank the reviewer for acknowledging the significance of our study.

      The authors would benefit from providing quantitative thresholds or objective metrics to improve reproducibility and to clarify whether these structural distinctions correlate with distinct functional roles.

      We thank the reviewer for raising this point. However, we would like to clarify that assigning neurons to strict morphological subtypes was not the primary aim of our study. In practice, dendritic architectures can be highly complex, with individual neurons often displaying features characteristic of multiple subtypes. This is precisely why we included a “mixed” subtype category—to acknowledge and capture this morphological heterogeneity rather than impose rigid classification boundaries.

      Our intent in defining subtypes was not to imply discrete functional classes, but rather to highlight the range of morphological variation observed across ab1C neurons. While we agree that exploring potential correlations between structure and function is an important future direction, the current study focuses on characterizing this diversity using 3D reconstruction and morphometric analysis. We hope this clarifies the purpose and scope of our morphological categorization.

      Strengths:

      The study makes a convincing case that ab1C neurons exhibit a unique, flattened dendritic morphology unlike the cylindrical dendrites found in ab1D neurons. This observation extends previous qualitative TEM findings by not only confirming the presence of flattened lamellae in CO₂ neurons but also quantifying key morphometrics such as dendritic length, surface area, and volume, and calculating surface area-to-volume ratios. The enhanced ratios observed in the flattened segments are speculated to be linked to potential advantages in receptor distribution (e.g., Gr21a/Gr63a) and efficient signal propagation.
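      Why flattening raises the surface area to volume ratio can be illustrated with idealized shapes: the ratio is set mainly by the thinnest dimension, roughly 2/r for a cylinder of radius r and roughly 2/t for a sheet of thickness t. The sketch below compares a cylinder with a square sheet of equal volume. The shapes, function names, and dimensions are hypothetical and chosen only for illustration; they are not measurements from the study.

```python
import math


def cylinder_sa_to_v(radius: float, length: float) -> float:
    """Surface-area-to-volume ratio of a closed cylinder (lateral surface plus both end caps)."""
    area = 2 * math.pi * radius * length + 2 * math.pi * radius ** 2
    volume = math.pi * radius ** 2 * length
    return area / volume  # equals 2/radius + 2/length


def sheet_sa_to_v(width: float, height: float, thickness: float) -> float:
    """Surface-area-to-volume ratio of a rectangular slab, a crude stand-in for a flattened lamella."""
    area = 2 * (width * height + width * thickness + height * thickness)
    volume = width * height * thickness
    return area / volume  # equals 2/thickness + 2/width + 2/height


# Hypothetical dimensions in micrometres, for illustration only.
r, L = 0.1, 10.0                  # thin cylindrical branch
volume = math.pi * r ** 2 * L     # volume shared by both shapes
t = 0.05                          # hypothetical sheet thickness
side = math.sqrt(volume / t)      # square sheet with the same volume

print(f"cylinder: {cylinder_sa_to_v(r, L):.1f} per um")
print(f"sheet:    {sheet_sa_to_v(side, side, t):.1f} per um")
```

      With these made-up numbers the sheet's ratio is about twice the cylinder's; a sheet only wins when its thickness is smaller than the cylinder's radius, so real comparisons depend on the measured dimensions.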

      We thank the reviewer for appreciating the significance of our current study.

      Weaknesses:

      While the manuscript offers valuable ultrastructural insights and reveals previously unappreciated heterogeneity among CO₂-sensing neurons, several issues warrant further investigation in addition to the points made above.

      (1) Although this quantitative approach is robust compared to earlier descriptive reports, its impact is somewhat limited by the absence of direct electrophysiological data to confirm that ultrastructural differences translate into altered neuronal function. A direct comparison or discussion of how the present findings align with the functional data obtained from electrophysiology would strengthen the overall argument.

      We thank the reviewer for this comment. We would like to clarify, however, that our study does not claim that the observed morphological heterogeneity necessarily leads to functional diversity. Rather, we consider this as a possible implication and discuss it as a potential question for future research. This idea is raised only in the Discussion section, and we are careful not to present functional diversity as a conclusion of our study. Nonetheless, we have reviewed the relevant paragraph to ensure the language remains cautious and does not overstate our interpretation.

      We also acknowledge the significance of directly linking ultrastructural features to neuronal function through electrophysiological recordings. However, at present, it is technically challenging to correlate the nanoscale morphology of individual ORNs with their functional activity, as this would require volume EM imaging of the very same neurons that were recorded via electrophysiology. Currently, there is no dye-labeling method compatible with single-sensillum recording and SBEM sample preparation that allows for unambiguous identification and segmentation of recorded ORNs at the necessary ultrastructural resolution.

      To acknowledge this important limitation, we have added a paragraph in the Discussion section, as suggested, to clarify the current technical barriers and to highlight this as a promising direction for future methodological advances.

      (2) Clarifying the criteria for dendritic subtype classification with quantitative parameters would enhance reproducibility and interpretability. Moreover, incorporating electrophysiological recordings from ab1C neurons would provide compelling evidence linking structure and function, and mapping key receptor proteins through immunolabeling could directly correlate receptor distribution with the observed morphological diversity.

      Please see our response to the comment regarding the technical limitations of directly correlating ultrastructure with electrophysiological data.

      In addition, we would like to address the suggestion of using immunolabeling to map receptor distribution in relation to the 3D EM models. Currently, antibodies against Gr21a or Gr63a (the receptors expressed in ab1C neurons) are not available. Even if such antibodies were available, immunogold labeling for electron microscopy requires harsh detergent treatment to increase antibody permeability, damaging morphological integrity. These treatments would compromise the very morphological detail that our study aims to capture and quantify.

      (3) Even though cryofixation is claimed to be superior to chemical fixation in generating fewer artifacts, the authors need to independently confirm the variation observed in CO2 neuron morphologies across populations. All types of fixation for TEM cause some artifacts, as does serial sectioning. Without understanding the error rates or without independent validation with another method, it is hard to have confidence in the conclusions drawn by the authors of the paper.

      We thank the reviewer for raising concerns regarding potential artifacts in morphological analyses. However, we would like to clarify that cryofixation is widely regarded as a gold standard for ultrastructural preservation and minimizing fixation-induced artifacts, as supported by extensive literature. This is why we adopted high-pressure freezing and freeze substitution in our study.

      We have also published a separate methods paper (Tsang et al., eLife, 2018) directly comparing our cryofixation-based protocol with conventional chemical fixation, demonstrating substantial improvements in morphological preservation. This provides strong empirical support for the reliability of our approach.

      Regarding the suggestion to validate observed morphological variation across populations: we note that determining the presence of artifacts requires a known ground truth, which is inherently unavailable as we could not measure the morphometrics of fly olfactory receptor neurons in their native state. In the absence of such a benchmark, we have instead prioritized using the best-available preparation methods and high-resolution imaging to ensure structural integrity.

      Addressing these concerns and integrating additional experiments would significantly bolster the manuscript's completeness and advancement.

      We appreciate the reviewer’s feedback. As discussed in our responses to the specific comments above, certain suggested experiments are currently limited by technical constraints, particularly in the context of high-resolution volume EM for insect tissues enclosed in cuticles.

      Nevertheless, we have carefully addressed the reviewer’s concerns to the fullest extent possible within the scope of this study. We have revised the manuscript to clarify methodological limitations, added new explanatory content where appropriate, and ensured that our interpretations remain well grounded in the data. We hope these revisions strengthen the clarity and completeness of the manuscript.

      Reviewer #3 (Public Review):

      In the current manuscript entitled "Population-level morphological analysis of paired CO2- and odor-sensing olfactory neurons in D. melanogaster via volume electron microscopy", Choy, Charara et al. use volume electron microscopy and sensillum. They aim to investigate the degree of dendritic heterogeneity within a functional class of neurons using ab1C and ab1D, which they can identify because ab1 sensilla uniquely house four neurons and have a stereotypic location on the third antennal segment. This is a great use of volumetric electron imaging and neuron reconstruction to sample a population of neurons of the same type. Their data convincingly show that there is dendritic heterogeneity in both investigated populations, and their sample size is sufficient to strongly support this observation. These data suggest that dendritic heterogeneity is common in the Drosophila olfactory system and will stimulate future investigations into the developmental origin, functional implications, and potential adaptive advantage of this feature.

      Moreover, the authors discovered a difference between CO2- and odour-sensing neurons, with the former showing a characteristic flattened, sheet-like structure not observed in other sensory neurons sampled in this and previous studies. They hypothesize that this unique dendritic organization, which increases the surface area to volume ratio, might allow more efficient CO2 sensing by housing higher numbers of CO2 receptors. This is supported by previous attempts to express CO2 sensors in olfactory sensory neurons, which lack this dendritic morphology, resulting in lower CO2 sensitivity compared to endogenous neurons.

      Overall, this detailed morphological description of olfactory sensory neurons' dendrites convincingly shows heterogeneity in two neuron classes with potential functional impacts for odour sensing.

      Strength:

      The volumetric EM imaging and reconstruction approach offers unprecedented detail in single-cell morphology and compares dendrite heterogeneity across a large fraction of ab1 sensilla. The authors identify specific shapes for ab1C neurons potentially linked to their unique function in CO2 sensing.

      We thank the reviewer for the insightful comments and appreciation for our study.

      Weaknesses:

      While the morphological description is highly detailed, no attempts are made to link this to odour sensitivity or other properties of the neurons. It would have been exciting to see how altered morphology impacts physiology in these olfactory sensory cells.

      We agree that linking morphological variation to physiological properties, such as odor sensitivity, would be a highly valuable direction for future research. However, the aim of the current study is to provide an in-depth nanoscale characterization based on a substantial proportion of ab1 sensilla, highlighting morphological heterogeneity among homotypic ORNs.

      At present, it is technically challenging to correlate the nanoscale morphology of individual ORNs with their physiological responses, as this would require volume EM imaging of the exact neurons recorded via single-sensillum electrophysiology. Currently, no dye-labeling method exists that is compatible with both single-sensillum recording and the stringent requirements of SBEM sample preparation to allow for unambiguous identification and segmentation of recorded ORNs.

      To acknowledge this important limitation, we have added a paragraph in the Discussion section clarifying the current technical barriers and highlighting this as a promising area for future methodological development. Please also see our responses to the reviewer’s 4th comment below, where we present preliminary experiments examining whether odor sensitivity varies among homotypic ORNs.

      (Please see the following pages for additional responses to the reviewers’ specific comments. These responses are not intended for publication.)

      Reviewer #1 (Recommendations for the authors):

      As this is mainly a descriptive paper I have no suggestions for additional experiments. Minor Text Suggestions:

      (1) The authors might want to include a better description/definition of the fly antenna, olfactory sensilla and their basic structure/makeup, the position of the sensory neurons and dendrites within them, etc., in the introduction, perhaps in cartoon form, so that readers who are not familiar with the terminology and basic organization (i.e., non-Drosophila readers) can follow the paper more easily from the start.

      We thank the reviewer for the helpful suggestion to broaden the appeal of our study to a wider readership. In response, we added a new introductory paragraph at the beginning of the Results section, along with illustrations in a new supplementary figure (Figure 1—figure supplement 1). The new paragraph reads as follows.

      “The primary olfactory organ in Drosophila is the antenna, which contains hundreds of olfactory sensilla on the surface of its third segment (Figure 1—figure supplement 1A). Each sensillum typically encapsulates the outer dendrites of two to four ORNs. The outer dendrites are the sites where odorant receptors are expressed, enabling the detection of volatile chemicals. A small portion of the outer dendrites lies beneath the base of the sensillum cuticle. At the ciliary constriction, the outer dendrites connect to the inner dendritic segment, which then links to the soma of each ORN (Figure 1—figure supplement 1B).”

      (2) In Figure 4D, the letter annotations above the graphs are not clearly defined anywhere that I could easily find. Please clarify with different symbols and/or in the figure legend so readers can easily comprehend the stats that are presented.

      We thank the reviewer for raising this point. As suggested, in the revised Figure 4D legend, following the original sentence “Statistical significance is determined by Kruskal-Wallis one-way ANOVA on ranks and denoted by different letters”, we added “For example, labels “a” and “b” indicate a significant difference between groups (P < 0.05), whereas labels with identical or shared letters (e.g., “a” and “a”, “a,b” and “a”, or “a,b” and “b”) indicate no significant difference.”
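      The reading rule stated in the revised legend (groups differ significantly only when their letter labels share no letter) can be written as a small helper. This is a minimal sketch: the function names are hypothetical, and the comma-separated label format simply mirrors the examples in the legend. It only interprets existing labels; it does not perform the Kruskal-Wallis test or the pairwise comparisons that produce them.

```python
def letters(label: str) -> set[str]:
    """Split a letter annotation such as "a,b" into its set of letters."""
    return {part.strip() for part in label.split(",") if part.strip()}


def significantly_different(label_1: str, label_2: str) -> bool:
    """Two groups are significantly different only if their annotations share no letter."""
    return letters(label_1).isdisjoint(letters(label_2))


# The legend's own examples:
for pair in [("a", "b"), ("a", "a"), ("a,b", "a"), ("a,b", "b")]:
    print(pair, significantly_different(*pair))
```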

      Reviewer #3 (Recommendations for the authors):

      There are several aspects that I would like the authors to consider to improve the current manuscript:

      (1) Line 331: "Our analysis highlights how structural scaling in ab1D neurons achieves enhanced sensory capacity while maintaining the biophysical properties of dendrites". This is a strong statement, and not shown by the authors. They speculate about this in the discussion, but I would like them to soften the language here.

      We thank the reviewer for raising this point. As suggested, we have softened the language in the sentence in question. The revised version is as follows.

      “Our analysis suggests that structural scaling in ab1D neurons may enhance sensory capacity while preserving the biophysical properties of dendrites.”

      (2) The Supplementary material is not well presented and is not cited in the manuscript. It is not clear what the individual data files show, where they refer to, etc. Please provide clear labels of all data, cite them at the appropriate location in the manuscript, and make them more accessible to the reader. Also, there are two Videos mentioned in the manuscript that are not included in the submission.

      We thank the reviewer for bringing this to our attention and apologize for the oversight. We appreciate the reviewer’s careful attention to the supplementary materials. We have addressed these issues accordingly: 1) all source data have been consolidated into a single, clearly labeled Excel file to improve accessibility for readers; this file is now cited at the appropriate locations in the manuscript. 2) The supplementary videos mentioned in the manuscript have also been included in the re-submission.

      (3) In Figure 1B, it is hard to recapitulate the increase in dendritic density in the presented pictures. Could the authors please highlight dendrites in the raw imaging files (e.g. by colour coding as done later in the manuscript). Also, it might be helpful to indicate the measured parameters visually in this Figure (e.g. volume, length, etc.).

      We thank the reviewer for the helpful suggestion. As suggested, we have pseudocolored the dendrites in Figure 1B to enhance visual clarity.

      As noted, the original legend stated that “the sensilla were arranged from left to right in order of increasing dendritic branch counts”. To improve clarity, we have now added the number of dendritic branches above each sensillum to make this information more explicit.

      We hope these changes make the figure more accessible and informative for readers.

      (4) Given the strength of the authors in in vivo physiology and single-sensillum recordings, I would be very curious about how the described morphological heterogeneity is reflected in the response properties of ab1Cs and ab1Ds. Can the authors provide data (already existing from their lab) on response heterogeneity in these two neurons? I acknowledge that spike sorting can be very challenging in ab1s, but maybe it is possible to show the range of response sensitivities upon CO2 stimulation in ab1Cs? The authors speculate in the Discussion, and the presented data would only be correlative; however, I think it would strengthen the manuscript to include some link to physiology.

      We thank the reviewer for this insightful comment. We share the same curiosity about response variability among homotypic ORNs, including ab1C and ab1D. Ideally, this question could be addressed by recording from a large proportion of neurons of a given ORN type to assess the response variability within a single antenna. However, due to technical limitations, we are only able to reliably record from 3–4 ab1 sensilla per antennal preparation, representing approximately 8% of the total ab1 population.

      Moreover, our recordings are typically limited to ab1 sensilla located on the posterior-medial side of the antenna, as this region provides the best accessibility for our recording electrode. This spatial constraint may limit our ability to sample the full morphological diversity of ab1C and ab1D neurons.

      Given these limitations, it is technically challenging to rigorously assess physiological variability in ab1C and ab1D responses across the entire ab1 population. Nonetheless, we attempted to address this question using a different sensillum type where a larger proportion of the population is accessible to single-sensillum recording per antennal preparation. Specifically, we focused on ab2 sensilla in the following analysis because we can reliably record from 6 sensilla per antenna, representing approximately 25% of the total ab2 population.

      In the preliminary data presented below, we recorded from 6 ab2A ORNs per antenna across a total of 6 flies. Spike analysis revealed that odor-evoked responses were consistent across individual ab2A neurons (Author response image 1A). When analyzing the dose-response curve for each ORN, we found no statistically significant differences in odor sensitivity, either among ORNs within the same antenna or across different flies (Author response image 1B; two-way ANOVA: P > 0.99 within antennae, P > 0.99 across flies). This is further supported by the closely clustered EC50 values (Author response image 1C). This result suggests that odor sensitivity is largely uniform among homotypic ab2A ORNs.

      Author response image 1.

      Homotypic ab2A ORNs display similar odorant sensitivity. (A) Single-sensillum recording. Raster plots of ab2A/Or59b ORN spike responses. Six ab2A ORNs from the same antenna were recorded per fly. Odor stimulus: methyl acetate (10⁻⁶). (B) Dose-response relationships of peak spike responses, normalized to the maximum response of the ORN to facilitate comparison of odor sensitivity. Each curve represents responses from a single ab2A ORN fitted with the Hill equation (n=36 ab2 sensilla from 6 flies). Responses recorded from the same antenna are indicated by the same color. Statistical comparisons between different ab2A ORNs from the same antenna (P > 0.99) or across flies (P > 0.99) were performed by two-way ANOVA. (C) Quantification of individual pEC50 values from (B), defined as -log EC50.
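      The normalization, Hill fit, and pEC50 definition in this legend can be sketched as follows. This is an illustrative sketch, not our analysis code: the two-parameter Hill form (bottom 0, top 1, motivated here only by the normalization to each ORN's maximum), the function names, and the example values are assumptions. An actual fit would pass `hill` to a nonlinear least-squares routine, and EC50 is expressed here in the same dilution units as the stimulus.

```python
import math


def normalize(responses: list[float]) -> list[float]:
    """Scale one ORN's peak responses by that ORN's own maximum response."""
    peak = max(responses)
    return [r / peak for r in responses]


def hill(concentration: float, ec50: float, n_hill: float) -> float:
    """Normalized Hill equation (bottom 0, top 1); equals 0.5 at concentration == ec50."""
    return concentration ** n_hill / (ec50 ** n_hill + concentration ** n_hill)


def pec50(ec50: float) -> float:
    """pEC50 = -log10(EC50); a larger pEC50 means higher sensitivity."""
    return -math.log10(ec50)


# Hypothetical example: an ORN with EC50 at a 10^-7 dilution.
print(pec50(1e-7))
```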

      However, we are hesitant to include this result in the main manuscript for several reasons. First, it does not directly relate to the morphometric analysis of ab1C and ab1D neurons, which is the primary focus of our study. Second, while we were able to record from approximately 25% of the ab2 population, this level of coverage is still limited and potentially subject to sampling bias due to the spatial constraints of the antennal region accessible to the recording electrode.

      At best, our data suggest limited variability in odor sensitivity among the recorded ab2A ORNs. However, we are cautious about generalizing this finding to the entire ab2 population. In light of these considerations, we hope the reviewer can appreciate the technical challenges inherent in addressing what may appear to be a straightforward question.

      For these reasons, we have chosen to include this preliminary result in the response only, rather than in the main manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1 (Public Review):

      Summary:

      The authors present a mean-field model that describes the interplay between (protein) aggregation and phase separation. Different classes of interaction complexity and aggregate dimensionality are considered, both in calculations concerning (equilibrium) phase behavior and kinetics of assembly formation.

      Strengths:

      The present work is, although purely theoretical, of high interest to understanding biological processes that occur as a result of a coupling between protein aggregation and phase separation. Of course, such processes are abundant, in the living cell as well as in in-vitro experiments. I appreciate the consideration of aggregates with various dimensionality, as well as the categorization into different “interaction classes”, together with the mentioning of experimental observations from biology. The model is convincing and underlines the complexity associated with the distribution of proteins across phases and aggregates in the living cell.

      Weaknesses:

      There are a few minor weaknesses.

      Reviewer 2 (Public Review):

      This work deals with a very difficult physical problem: relating the assembly of building blocks on a molecular scale to the appearance of large, macroscopic assemblies. This problem is particularly difficult to treat because of the large number of units involved and the complex way in which these units (monomers) interact with each other and with the solvent. To make the problem tractable, the authors resort to a number of approximations. Among these is the assumption that the system is spatially homogeneous, i.e., its features are the same in all regions of space. The homogeneity assumption may not hold in biologically relevant systems such as cells, where the behavior close to the cell membrane may differ strongly from that in the bulk. As a result, this hypothesis calls for cautious consideration and interpretation of the results of this work. Another notable simplification introduced by the authors is the assumption that the system can follow only two possible behaviors. In the first, each monomer interacts equally with the solvent, no matter the size of the cluster of which it is part. In the second, monomers in the bulk of a cluster and monomers at the assembly boundary interact with the solvent differently. These two cases are considered not only because they simplify the problem, but also because they are inspired by biologically relevant proteins.

      With these simplifications, the authors trace the phase diagram of the system, characterizing its phases for different fractions of the volume occupied by the monomers and solvent, and for different values of the temperature. The results qualitatively reproduce some features observed in recent experiments, such as an anomalous distribution of cluster sizes below the system saturation threshold, and the gelation of condensed phases above such threshold.

      Reviewer 3 (Public Review):

      Summary:

      The authors combine classical theories of phase separation and self-assembly to establish a framework for explaining the coupling between the two phenomena in the context of protein assemblies and condensates. By starting from a mean-field free energy for monomers and assemblies immersed in solvent and imposing conditions of equilibrium, the authors derive phase diagrams indicating how assemblies partition into different condensed phases as temperature and the total volume fraction of proteins are varied. They find that phase separation can promote assembly within the protein-rich phase, providing a potential mechanism for spatial control of assembly. They extend their theory to account for the possibility of gelation. They also create a theory for the kinetics of self-assembly within phase separated systems, predicting how assembly size distributions change with time within the different phases as well as how the volumes of the different phases change with time.

      Strengths:

      The theoretical framework that the authors present is an interesting marriage of classic theories of phase separation and self-assembly. Its simplicity should make it a powerful general tool for understanding the thermodynamics of assembly coupled to phase separation, and it should provide a useful framework for analyzing experiments on assembly within biomolecular condensates.

      The key advance over previous work is that the authors now account for how self-assembly can change the boundaries of the phase diagram.

      A second interesting point is the explicit theoretical consideration for the possibility that gelation (i.e. self-assembly into a macroscopic aggregate) could account for widely observed solidification of condensates. While this concept has been broadly discussed, to date I have yet to see a rigorous theoretical analysis of the possibility.

      The kinetic theory in sections 5 and 6 is also interesting as it extends on previous work by considering the kinetics of phase separation as well as those of self-assembly.

      Weaknesses:

      A key point the authors make about their theory is that, unlike previous research, it allows one to study non-dilute limits. It is true that they consider gelation when the 3D assemblies become macroscopic. However, dilute-solution assumptions seem to be embedded in many aspects of their theory, and it is not always clear where else the non-dilute limits are considered. Is it in the inter-species interaction χij? Why, then, do they never explore cases for which χij is nonzero in their analysis?

      We explicitly consider that monomers and aggregates are non-dilute with respect to the solvent. This is evident in accounting for the mixing entropy of all components, including the solvent. Moreover, we account for interactions of the monomers and the different aggregates with the solvent. We consider the case where each monomeric unit, independent of the aggregate it is part of, interacts the same way with the solvent. Please note that this case corresponds to a non-dilute scenario where interactions indeed drive phase separation.

      The connection between this theory and biological systems is described in the introduction but lost along the main text. It would be very helpful to point out, for instance, that the presence of phase separation might induce aggregation of proteins. This point is described formally at the end of Section 3, but a more qualitative connection to biological systems would be very useful here.

      We thank the referee for the useful comment; we now mention this in the introduction (line 80) and point out the biological relevance of assembly formation and localization via the presence of phase separation (lines 268 and 283).

      Building on the previous point, it would be helpful to give an intuitive sense of where the equations derived in the Appendices and presented in the main text come from and to spell out clear physical interpretations of the results. For example, it would be helpful to point out that Eq. 4 is a form of the law of mass action, familiar from introductory chemistry. It would be useful to better explain how the current work extends previous work from these authors as well as others. Along these lines, closely related work by W. Jacobs and B. Rogers [O. Hegde et al. 2023, https://arxiv.org/abs/2301.06134; T. Li et al. 2023, https://arxiv.org/abs/2306.13198] should be cited in the introduction. The results discussed in the first paragraph of Section 3 on assembly size distributions in a homogeneous system are well known from classic theories of self-assembly. This should be acknowledged and appropriate references should be added; see, for instance, Rev. Mod. Phys. 93, 025008 and Statistical Thermodynamics of Surfaces, Interfaces, and Membranes by Sam Safran. Equation 14 for the kinetics of volume fractions is given with reference to Bauermann et al. 2022, but it should be accompanied by a better intuitive interpretation of its terms in the main text. In particular, how should one understand the third term in this equation? Why does the change in volume impact the change of volume fraction in this way?

      We thank the referee for the suggestions. We have included the missing references and, in the definition of class II, placed particular emphasis on DNA nanostars that inhibit phase separation in DNA liquids. We added intuitive explanations of the main equations, such as Eqs. (4), (8), (14), (17), and (18). Note that, following Mysels, Karol J., J. Chem. Educ., 33, 178 (1956) (https://pubs-acs-org.sire.ub.edu/doi/epdf/10.1021/ed033p178), we refer to Eq. (18) as the law of mass action.

      The discussion in the last paragraph of Section 6 should be clarified. How can the total amount of protein in both phases decrease? This would necessarily violate either mass or volume conservation. Also, the discussion of why the volume is non-monotonic in time is not clear.

      A decrease in the total amount of protein in both phases does not violate mass conservation if the volumes of the phases vary accordingly; in particular, the volume of the denser phase should grow. That said, in the case presented, the total amount of protein in the dense phase decreases while that in the dilute phase increases. For this reason, we revised the paragraph and now explain the results in more detail (see lines 407 onward). The non-monotonic volume change is indeed a puzzling finding that, as we now state in the manuscript, requires further investigation. Given the lack of analytical approaches for the complex kinetics of coexisting phases, we believe that this analysis is beyond the scope of the present paper.
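      The role of the phase volumes can be made concrete with the lever rule. The following is a minimal sketch, assuming a closed system with fixed total volume and illustrative volume fractions that are not taken from the manuscript. When the protein volume fraction decreases in both the dilute and the dense phase, conservation of ϕ_tot V forces the dense-phase volume to grow, while the total amount of protein stays constant.

```python
def dense_phase_volume(phi_tot, phi_dilute, phi_dense, volume=1.0):
    """Lever rule: volume of the dense phase in a closed two-phase system."""
    return volume * (phi_tot - phi_dilute) / (phi_dense - phi_dilute)


def protein_amounts(phi_tot, phi_dilute, phi_dense, volume=1.0):
    """Return (protein in dilute phase, protein in dense phase, dense volume)."""
    v_dense = dense_phase_volume(phi_tot, phi_dilute, phi_dense, volume)
    v_dilute = volume - v_dense
    return phi_dilute * v_dilute, phi_dense * v_dense, v_dense


# Illustrative numbers only: both phases become more dilute between the two states.
phi_tot = 0.2
before = protein_amounts(phi_tot, phi_dilute=0.05, phi_dense=0.6)
after = protein_amounts(phi_tot, phi_dilute=0.04, phi_dense=0.5)
for label, (n_dilute, n_dense, v_dense) in (("before", before), ("after", after)):
    print(f"{label}: V_dense = {v_dense:.3f}, total protein = {n_dilute + n_dense:.3f}")
```

      At fixed total volume, the dense-phase volume (ϕ_tot - ϕ_dilute)/(ϕ_dense - ϕ_dilute) decreases monotonically in both ϕ_dilute and ϕ_dense. Any simultaneous drop in both volume fractions therefore has to be compensated by a larger dense phase. With the numbers above, the dense-phase volume grows from about 0.27 to about 0.35.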

      Recommendations for the authors

      Reviewer 1 (Recommendations For The Authors):

      Line 96: I feel that a definition, explanation, and perhaps some discussion of the parameter M (limiting aggregate size) would have been appropriate where Equation (1) is introduced. Furthermore, in the usual interpretation, Flory interaction parameters (symbolized χ) are dimensionless, as, classically, they represent an exchange energy (normalized by kT), defined on a monomeric basis. Here they seem to carry the dimension of energy.

      We thank the reviewer for the observation. We have included a brief comment on M and mentioned that we use χ parameters that carry the dimension of energy, such that varying kBT simultaneously scales the term containing the interaction propensities (χ) and the one containing the internal energies (e_int); see line 127.

      Line 150: The choice ρ_i = i physically implies that a single protein is assumed to have the same volume as a solvent molecule. This may be a bit of a stretch. This assumption leads to an overestimation of the translational entropy of the aggregates (first term in Equation (1)). Acknowledging that ρ_1 ≫ ρ_s would, I suspect, give a pronounced desymmetrization of the phase diagram.

      Indeed, in the case of monomers only, the assumption leads to a symmetric phase diagram, which may be unrealistic. Once assemblies form, however, the phase diagram becomes asymmetric, and for this reason we decided to assume ρ_i = i, simplifying the theoretical analysis. We have added a clarifying sentence in the manuscript (line 163).
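      The desymmetrization discussed here can be seen in the simplest textbook setting. Consider a binary Flory-Huggins mixture of one solute that occupies n solvent-sized sites, with a dimensionless χ (unlike the energy-valued χ of the manuscript). This is an illustration only, not the manuscript's multicomponent free energy. For n = 1 the critical point sits at ϕ = 1/2. For larger n it moves towards small ϕ, which is consistent with the remark that the phase diagram becomes asymmetric once assemblies with ρ_i = i > 1 form.

```python
import numpy as np


def spinodal_chi(phi, n):
    """Dimensionless chi on the spinodal (d^2f/dphi^2 = 0) for
    f/kBT = (phi/n) ln(phi) + (1 - phi) ln(1 - phi) + chi phi (1 - phi)."""
    return 0.5 * (1.0 / (n * phi) + 1.0 / (1.0 - phi))


def critical_point(n):
    """Closed-form critical point (phi_c, chi_c) for a solute of relative volume n."""
    phi_c = 1.0 / (1.0 + np.sqrt(n))
    chi_c = 0.5 * (1.0 + 1.0 / np.sqrt(n)) ** 2
    return phi_c, chi_c


phi = np.linspace(1e-4, 1.0 - 1e-4, 200_001)
for n in (1, 5, 15):
    chi = spinodal_chi(phi, n)
    k = np.argmin(chi)  # the critical point is the minimum of the spinodal curve
    phi_c, chi_c = critical_point(n)
    print(f"n={n:2d}: phi_c={phi_c:.3f} (numerical {phi[k]:.3f}), chi_c={chi_c:.3f}")
```

      The numerical minimum of the spinodal reproduces the closed form ϕ_c = 1/(1 + √n), which equals 1/2 only for n = 1.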

      Furthermore, the pictures in Figure 1a-c suggest the presence of a disordered residue, the degree of swelling of which might affect binding strength (see for instance: https://doi.org/10.3389/fnmol.2022.962526).

      We added a comment on the possible coupling between internal free energies and interaction propensities, such as the swelling mechanism that affects binding sites, and included the reference above (line 215).

      Lines 154-156: It is unclear what is meant by “an internal bond that keeps each assembly together”. How should this be interpreted on an intuitive physical level?

      We apologise for being unclear. We meant the internal bonds that lead to the formation of assemblies. We have now rephrased this sentence in the main text (lines 169 onward).

      Line 254: The fact that ϕ_sg is defined below does not prevent it from appearing out of thin air here. The same holds for the consideration of the limit M → ∞. Ideally, the main text should stand on its own, in particular with respect to physical intuition, as well as the necessity of and interest in the topics discussed. Technical details, derivations, and additional information can be in an appendix.

      We agree with the referee and have added some physical insight into this limit. We now also state clearly in the main text (line 298) that ϕ_sg is affected by temperature and by the free energy of internal bonds.

      Line 257: “Since we do not explicitly include the solvent in assembly formation we will consider the gel as a phase without solvent and thus ϕ_tot = 1”. I am not sure I can agree with this. I would say that a gel, certainly in a biological context, almost by definition contains a large fraction of solvent, here water. The situation ϕ_tot = 1 would rather be a solid precipitate. Is gelation properly captured by this model?

      We thank the referee for this very relevant observation. We now state in the main text that the model predicts a macroscopic assembly, which we call ’the gel phase’, in agreement with previous literature. To clarify, we added the sentence “Please note that, since we do not explicitly include the solvent in assembly formation (see reaction scheme in Fig.1a), in our model the gel corresponds to a phase without solvent, ϕ_tot = 1. To account for biological gels that can be rich in water, our theory can be straightforwardly extended by incorporating the solvent into the reaction scheme.”; see main text line 300.

      Line 268: Shouldn’t “solvent” be “solution”? If f_sol is given by Equation (1), surely not only the solvent is considered.

      Indeed, this is a typo, and we now use the term ’solution’ instead of ’solvent.’

      Line 273: At this stage, the only information provided in the main text is that ω∞ is “a constant that does not affect chemical nor phase equilibrium, except in the limit M → ∞” (see lines 153-154). This is a bit too abstract for me. Again, the main text should stand on its own, meaning the reader should not have to rely on an appendix to gain at least an intuitive physical understanding of any modeling or input parameter discussed in the main text.

      We thank the reviewer for pointing this out. We now comment on the physical interpretation of ω∞ in the main text (lines 320 onward).

      The rate appearing in Equation (39) is not defined.

      We thank the reviewer for pointing this out. We have reshaped Appendix 6A, making use of chemical activities, and clarified the origin of this rate.

      Line 317. I do not fully understand the intention of the remark that the model can be adapted to “primary and secondary nucleation”. In what way is this different from association and dissociation? For instance, classical nucleation theory is based on the association and dissociation of monomeric units to and from clusters.

      We agree that the kinetic rate coefficients k_ij (appearing in the association and dissociation rates Δr_ij, Eq. 17) in our manuscript already depend on assembly length; see Appendix 6 B, where we now clarify their definition. Note, however, that secondary nucleation is a special kind of association, for which the kinetic rate coefficients of associations between small assemblies, i.e. k_ij with i, j ≪ M, explicitly depend on the presence of large assemblies with sizes l ≫ 1. In our manuscript, we have not accounted for such a dependence. We now make this aspect clear in the manuscript (see Appendix 6 B).

      Line 321. Why is Δr_ij called the “monomer exchange rate”? In line 318 the same parameter is defined as the “reaction rate for the formation of a (i+j)-mer”. Why should these be the same?

      We thank the reviewer for spotting this typo.

      Line 323. Why do these calculations use M = 15?

      The exploration of a 15-dimensional phase space is already numerically challenging. We are currently working on a generalization of the numerical scheme to larger values of M but, to discuss the fundamental physical principles, we kept M = 15.
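      The association-dissociation kinetics behind the M = 15 coupled equations can be sketched with mass-action rates. This is a minimal illustration with several assumptions. It uses number concentrations rather than volume fractions and forward-Euler time stepping. It also uses hypothetical, size-independent constants k_on and k_off in place of the manuscript's size-dependent k_ij. The sketch shows why the cost grows with M (one equation per assembly size and O(M²) reaction channels i + j ⇌ i + j). It also checks that the total number of monomers is conserved.

```python
import numpy as np

M = 15  # maximal assembly size used in the manuscript's numerics


def rates(c, k_on=1.0, k_off=0.1):
    """Right-hand side dc_i/dt for the reactions i + j <-> (i + j), i <= j.

    c[i - 1] is the concentration of i-mers. k_on and k_off are hypothetical
    size-independent constants; the manuscript's k_ij depend on i and j.
    """
    dc = np.zeros_like(c)
    for i in range(1, M + 1):
        for j in range(i, M + 1 - i):          # ensures i + j <= M
            r = k_on * c[i - 1] * c[j - 1] - k_off * c[i + j - 1]
            dc[i - 1] -= r
            dc[j - 1] -= r                     # for i == j, two i-mers are consumed
            dc[i + j - 1] += r
    return dc


def integrate(c0, dt=2e-3, steps=5000):
    """Forward-Euler integration; every step conserves sum_i i * c_i
    up to floating-point round-off."""
    c = c0.copy()
    for _ in range(steps):
        c += dt * rates(c)
    return c


sizes = np.arange(1, M + 1)
c0 = np.zeros(M)
c0[0] = 1.0                                    # start from monomers only
c = integrate(c0)
print("total monomers:", sizes @ c)
```

      Conventions for the i = j symmetry factor differ between formulations. Here each i = j event consumes two i-mers at rate k_on c_i², which keeps the monomer count exact.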

      Reviewer 2 (Recommendations For The Authors):

      The manuscript presents several issues, at both the scientific and the presentational level, which need to be carefully addressed. Below is a list of these points, divided into major and minor issues. Major issues:

      • A general, major concern about the results in the paper is the homogeneity assumption. I do understand that repeating the whole analysis presented in the manuscript by allowing for spatial inhomogeneities partially goes beyond the scope of this paper. However, the authors should at least discuss how such inhomogeneities may alter the results in a qualitative way, and treat explicitly the presence of inhomogeneity in one prototypical case treated in the manuscript. Namely, what happens if the volume fractions and relative molecular volumes in the free energy (1) depend on space, e.g., ϕ_i → ϕ_i(x)?

      We would like to stress that, in the present paper, we do account for spatial inhomogeneities. Indeed, in the case of phase separation, we consider systems that are divided into two phases, characterized by different values of the assemblies’ volume fractions ϕ_i. We do, however, consider the system to be homogeneous inside each phase, implying a jump in the volume fractions at the interface between the two phases. In this sense, our analysis is valid in the thermodynamic limit, where gradients of the volume fractions ϕ_i(x) within the phases can be neglected. On the other hand, considering the full spatial problem, i.e. solving the equations for M = 15 spatially varying fields, would be numerically extremely challenging.

      • The authors’ results relate molecular assembly (a phenomenon at the molecular scale) to phase separation (a mesoscopic or macroscopic phenomenon). The authors should stress the conceptual importance of this connection between scales, and present their results from the perspective of a multi-scale model.

      We thank the reviewer for pointing this out. We now emphasize the multi-scale feature of our model in the introduction (line 80).

      • Starting from Section 1, the reader is not well guided through the sections that follow. The authors should provide an outline of the line of thought they are going to follow in the following sections, and logically connect each section to the next with a short paragraph at the end of each section. This paragraph should summarize what has been addressed in the current section and its connection with the topic addressed in the next one.

      We agree with the reviewer and have added a transitioning sentence at the end of each paragraph.

      • ’We focus on linear assemblies (d = 1)’: Given the striking differences of the results between d = 1 and d > 1 shown above, the authors should discuss what happens for d > 1 as well.

      • ’In figure Fig. 5a, we show the initial and final equilibrium binodals (black and coloured curve, respectively), for the case of linear assemblies (d = 1) belonging to class 1’: Again, show what happens for d > 1.

      We agree with the reviewer; the kinetics in d > 1 would definitely be interesting. However, in this case, one assembly can become macroscopic (i.e. M must be set to ∞). This requires substantial modifications of the kinetic scheme, such as introducing an absorbing boundary condition for monomers ’sucked into’ the gel. We prefer to leave this for future work, and now state this explicitly in the manuscript (line 383).

      • ’This difference arises because, within class 2, monomers in the bulk of an assembly have reduced interaction propensity with respect to the boundary ones. As a consequence, the formation of large clusters shifts the onset of phase separation to higher ϕtot values.’: To prove this argument, the authors should show Fig. 2g and h for d > 1. In fact, by varying d, the effect of the boundary vs. bulk also varies.

      We prefer to discuss the thermodynamics of d > 1 in Section 4 on gelation. There we present only a single phase diagram, so as not to overextend the discussion of equilibrium.

      • ’referring for simplicity to systems belonging to Class 1’: The authors should do the same analysis for Class 2.

      We agree with the reviewer. However, again so as not to overextend the discussion of equilibrium, we leave this for future work.

      • ’other, implying that the corresponding Flory-Huggins parameter χij vanishes’: Why?

      The explanation based on a lattice model is reported in Appendix 2, and is now more clearly referenced (line 185).
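      The bulk-versus-boundary argument in the exchange above depends on dimension. As a purely geometric illustration, assuming compact hypercubic assemblies on a lattice (an assumption not taken from the manuscript), the number of boundary monomers in an assembly of i monomers grows as i^((d-1)/d). It stays constant (the two ends) for linear assemblies, grows as i^(1/2) in d = 2, and grows as i^(2/3) in d = 3. Surface terms of this kind are one common way in which i and d can enter internal free energies such as Eqs. (8) and (10), although the manuscript's exact expressions are not reproduced here.

```python
import numpy as np


def boundary_monomers(side, d):
    """Monomers on the surface of a compact hypercubic assembly with edge `side`."""
    if side <= 2:
        return side ** d  # every monomer touches the boundary
    return side ** d - (side - 2) ** d


def boundary_exponent(d, sides=(20, 40, 80, 160)):
    """Log-log slope of boundary monomers versus assembly size i = side**d."""
    sizes = np.array([s ** d for s in sides], dtype=float)
    boundary = np.array([boundary_monomers(s, d) for s in sides], dtype=float)
    slope, _ = np.polyfit(np.log(sizes), np.log(boundary), 1)
    return slope


for d in (1, 2, 3):
    print(f"d={d}: fitted exponent {boundary_exponent(d):.3f}, (d-1)/d = {(d - 1) / d:.3f}")
```

      The fitted exponents approach (d - 1)/d as the assemblies grow. For finite sizes they lie slightly above this value because of edge and corner corrections.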

      Minor issues:

      • Eq. (10): Here the authors should explain in the main text, possibly in a simple and intuitive way, why the number of monomers i and the space dimension d enter the right-hand side of this equation in this particular way.

      We thank the reviewer for pointing this out. We added the physical origin of the scaling with dimension in Eq. (10) and in Eq. (8), as pointed out by reviewer 3.

      • ’The second and fifth terms of fsol characterize the internal free energies’: What do you mean by ’characterize the internal free energies’? Please clarify.

      As we now state more clearly (lines 114-120), these two contributions include the internal free energies ω_s and ω_i, stemming from the free energy of the internal bonds that lead to assembly formation.

      • ’depend on the scaling form of the’: Scaling with respect to what ? Please clarify.

      We have now clarified that the scaling is with respect to the assembly size i.

      • Figure 2 is way too dense: it should be split into two figures, and the legend of each of the two figures should be expanded to properly guide the reader to understand the figures.

      We understand the reviewer’s point of view. To avoid altering the present flow, we decided not to split the figure, but we have included shaded boxes to better guide the reader.

      • ’this is a consequence of the gelation transition’: Please clarify

      • ’and this limitation can be dealt with by introducing explicitly the infinite-sized gel in the free energy’: Why? Please clarify.

      We have now rephrased these sentences, hopefully in a clearer way. We now state: ’We know that this divergence is physical, and is caused by the gelation transition. This limitation can be dealt with by introducing explicitly a term in the free energy that accounts for an infinite-sized assembly (the gel)’, see lines 320-322.

      • Figure 4: Add plots of panels d, e, h and i with a log scale on the y axis to make any exponential behavior explicit, and revise the text accordingly.

      To avoid further complicating Figure 4, we display the logarithmic plots of the equilibrium distribution in the appendix (see Figure A3-1).

      • ’... an equilibrium distribution which monotonously decreases with assembly size’: It is not the distributions that decreases but the cluster volume fraction, please rephrase.

      We thank the reviewer for pointing this out and have now rephrased this sentence (line 394).
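      Whether a size distribution is exponential is easiest to see on a logarithmic axis, where ϕ_i = A q^i becomes a straight line with slope ln q. The following is a minimal sketch of the corresponding fit. It uses a synthetic distribution with hypothetical values of A and q rather than the manuscript's data. In matplotlib, the same visual check is a plot with `ax.set_yscale("log")`.

```python
import numpy as np


def exponential_fit(sizes, phi):
    """Fit log(phi_i) = log(A) + i log(q); return A, q and the largest residual."""
    log_phi = np.log(phi)
    slope, intercept = np.polyfit(sizes, log_phi, 1)
    residual = log_phi - (intercept + slope * sizes)
    return np.exp(intercept), np.exp(slope), np.max(np.abs(residual))


sizes = np.arange(1, 16)        # assembly sizes i = 1..M with M = 15
phi = 0.3 * 0.6 ** sizes        # hypothetical, purely exponential distribution
A, q, max_residual = exponential_fit(sizes, phi)
print(f"A = {A:.3f}, q = {q:.3f}, max log-residual = {max_residual:.1e}")
```

      For real data, systematic curvature in the residuals (rather than their overall size) indicates departures from a purely exponential form.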

      Reviewer 3 (Recommendations For The Authors):

      I could not obtain the exact form of Eq. 29 in App. 3; can the authors elaborate on this calculation? App. 3: What does it mean that the binodal agrees well with ϕ_sg? And does ϕ_sg not depend on temperature through ϕ̃? What temperature is this result for?

      We apologise for the unclear explanation. We now state in detail that Eq. (29) is obtained by inserting the expression for ϕ_i given in Eq. (24) into Eq. (1) of the main text. The dependence of ϕ_1 on ϕ_tot is given in Eq. (26), and we have omitted terms linear in ϕ_tot, since they do not affect phase equilibrium (see lines 802-809). Moreover, ϕ_sg indeed depends on kBT. We refer to the comparison between the full curve ϕ_sg in the kBT-ϕ_tot plane and the branch of the binodal between the triple point (now indicated with a cross) and ϕ_tot = 1. The two curves are close, as expected, since both correspond to the boundary between homogeneous mixtures and the gel state, obtained with different methods.

      The references to Figures in the appendices are confusing. Please make it clear whether Figures in the main text or the appendices are being referenced. On a related note, the Appendix figures seem to be placed in appendices whose text describes something else - Appendix 2, Figure 1 should be moved to Appendix 3; Appendix 3, Figure 1 should be moved to Appendix 4; etc.

      We revised the appendix, corrected the figure positions and clarified their references.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #2 (Public Review):

      Making state-of-the-art (super-resolution) microscopy widely available has been the subject of many publications in recent years, as correctly referenced in the manuscript. By advocating the ideas of open microscopy and trying to replace expensive, scientific-grade components such as lasers, cameras, objectives, and stages with cost-effective alternatives, interested researchers nowadays have a number of different frameworks to choose from. In the iteration of the theme presented here, the authors used the existing modular UC2 framework, which consists of 3D printable building blocks, and combined a low-cost laser, detector and x,y,(z) stage with expensive filters/dichroics and a very expensive high-end objective (>15k Euros). This particular choice raises a first technical question: to what extent would a standard NA 1.3 oil immersion objective, available for <1k Euros, compare to the chosen NA 1.49 one?

      Measurements of the illumination quality (e.g. the spectral purity) of low-budget lasers convinced us of the necessity of spectral filtering. The filters cannot be replaced with lower-budget alternatives while still retaining the sensitivity needed to image single molecules. As expected, high-quality objectives produce high-quality data. We have tried lower-budget alternatives (<500 €) to replace the objective: image quality is reduced, but key features in fluorescence images can still be identified (see figure S1). Using a low-budget objective for SMLM imaging is possible, but quality benchmarks such as resolving the railroad-track profile of microtubules are not achievable. Such objectives are not optimal for applications aiming to visualize single molecules and might be better suited to teaching projects.
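      The effect of the numerical aperture raised by the reviewer can be estimated with a back-of-the-envelope calculation. The following is a minimal sketch under several assumptions. It assumes isotropic emitters, an immersion index of 1.518 and an emission wavelength of 670 nm (roughly that of AF647). It also ignores transmission losses and supercritical-angle emission. The sketch compares the fraction of emitted photons collected and the Rayleigh limit for NA 1.3 and NA 1.49. It does not reproduce the comparison in figure S1.

```python
import math


def collection_fraction(na, n_immersion=1.518):
    """Fraction of isotropically emitted photons inside the objective's acceptance cone."""
    theta = math.asin(na / n_immersion)     # half-angle of the acceptance cone
    return (1.0 - math.cos(theta)) / 2.0    # solid-angle fraction of the full sphere


def rayleigh_limit_nm(na, wavelength_nm=670.0):
    """Rayleigh resolution limit 0.61 * lambda / NA."""
    return 0.61 * wavelength_nm / na


for na in (1.3, 1.49):
    print(f"NA {na}: {collection_fraction(na):.1%} of photons, "
          f"Rayleigh limit {rayleigh_limit_nm(na):.0f} nm")

ratio = collection_fraction(1.49) / collection_fraction(1.3)
print(f"photon ratio NA 1.49 / NA 1.3: {ratio:.2f}")
```

      Under these assumptions, the NA 1.49 objective collects about 1.7 times more photons. Because the precision of a photon-limited localization scales roughly as 1/√N, this corresponds to roughly 1.3-fold better localization precision from photon number alone.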

      The choice of using the UC2 framework has the advantage, that the individual building blocks can be 3D printed, although it should be mentioned that the authors used injection-molded blocks that will have a limited availability if not offered commercially by a third party. The strength of the manuscript is the tight integration of the hardware and the software (namely the implementations of imSwitch as a GUI to control data acquisition, OS SMLM algorithms for fast sub-pixel localisation and access to Napari).

      The injection-molded cubes can be acquired through the OpenUC2 platform. Alternatively, the 3D printable version of the cubes is freely available and just requires the user to have a 3D printer. https://github.com/openUC2/UC2-GIT/tree/master/CAD/CUBE_EmptyTemplate

      The presented experimental data are convincing, demonstrating (1) extended live-cell imaging using both bright-field and fluorescence in the incubator, (2) single-particle tracking of quantum dots, and (3) STORM measurements in cells stained against tubulin. In the following, I raise two aspects that currently limit the clarity and the potential impact of the manuscript.

      First, the manuscript would benefit from further refinement. Elements in Figure 1d/e are not described properly. Figure 2c is not described in the caption. GPI-GFP is not introduced. MMS (moment scaling spectrum) could benefit from a one sentence description of what it actually is. In Figure 6, the size of the STORM and wide-field field of views are vastly different, the distances between the peaks on the tubuli are given in micrometers rather than nanometers. (more in the section on recommendations for the author)

      Second, and this is the main criticism at this point, is that although all the information and data is openly available, it seems very difficult to actually build the setup due to a lack of proper documentation (as of early July 2023).

      1) The bill of materials (https://github.com/openUC2/UC2-STORM-and-Fluorescence#bill-of-material) should provide a link to the commercially available items. Some items are named in German. Maybe split the BoM in commercially available and 3D printable parts (I first missed the option to scroll horizontally).

      2) The links to the XY and Z stage refer to the general overview site of the UC2 project (https://github.com/openUC2/) requiring a deep dive to find the actual information.

      3) Detailed building instructions are unfortunately missing. How should the cubes be assembled (pCad files showing exploded views, for example)? Troubleshooting?

      4) Some of the hardware details (e.g. which laser was being used, lenses, etc) should be mentioned in the manuscript (or SI)

      I fully understand that providing such level of detail is very time consuming, but I hope that the authors will be able to address these shortcomings.

      1) The bill of materials has been improved and will continue to be improved. The items have been sorted into UC2 printed parts and externally acquired parts. The combination of part name and provider enables users to find and acquire the same parts. Additionally, depending on the user's country, different providers of a given part might be advantageous, as delivery options and costs vary.

      2) The Z-stage now has a dedicated repository offering several solutions with different levels of movement precision. Depending on the user and their budget, different solutions can be optimal.

      https://github.com/openUC2/UC2-Zstage

      The XY stage now also has a detailed repository, as the motorizing of the stage requires a fair amount of tinkering. The video tutorials and the detailed instructions on stage motorizing should help any user to reproduce the stage shown within this manuscript. https://github.com/openUC2/UC2-Motorized-XY-Table

      3) The updated repository has a short video showing the general assembly of the cubes and the layers. Additionally, figure S2 shows all the pieces that are included in every layer (as a photograph as well as CAD). An exploded view of the complete setup would certainly be a helpful visualization of the complete setup. We however hope that the presented assembly tutorials and documents are sufficient to successfully reproduce the U.C.STORM setup.

      First, we want to thank the reviewers for their effort to help us improve our work. We apologize for any trivial mistakes we had overlooked. Please find below our answers to the very constructive and helpful comments of the editors.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      To complement the current data set:

      Figure 2(a & b): Panels i & ii, were chosen on the area where the distribution of the laser appears to be flatter. Can the authors select microtubules from a different section? Otherwise, it is reasonable to also crop the field-of-view along the flatter area (as done in Fig 6).

      Figure 2 was changed according to the reviewer’s suggestions. Microtubules from other sections have similar profiles, but the region with the best illumination, and thus the best SNR, was used for the figure.

      Figure 2(c): The current plot shows the gaussian distribution which does not appear to be centered. Instead of a horizontal line, can the authors provide a diagonal profile across the field of view and update the panel below?

      A diagonal cross-section of the illuminated FOV is now provided in figure 2, replacing the previous horizontal profile. The pattern does not appear to be perfectly radially symmetric, and more light seems to be blocked at the bottom of the illumination pattern than at the top. A fiber-coupled laser could improve this, as it could provide more homogeneous illumination while being easier to handle during assembly.

      Author response image 1.

      Diagonal cross-section of the illuminated FOV. Pixel size (104 nm) is the same as in figure 2. Intensity has been normalized to the maximal value.
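      A diagonal cross-section like the one above can be extracted with a few lines of NumPy. The following is a sketch that assumes nearest-pixel sampling (no interpolation) and the 104 nm pixel size quoted in the caption. The function name and the synthetic, off-centre Gaussian test image are illustrative and not part of the authors' analysis code.

```python
import numpy as np


def diagonal_profile(image, pixel_size_nm=104.0, n_samples=None):
    """Intensity along the top-left to bottom-right diagonal, normalized to its maximum.

    Uses nearest-pixel sampling; returns (distance in nm, normalized intensity).
    """
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    n = n_samples or max(rows, cols)
    r = np.linspace(0.0, rows - 1, n)
    c = np.linspace(0.0, cols - 1, n)
    values = img[np.rint(r).astype(int), np.rint(c).astype(int)]
    distance_nm = np.hypot(r - r[0], c - c[0]) * pixel_size_nm
    return distance_nm, values / values.max()


# Synthetic, slightly off-centre Gaussian illumination spot (hypothetical).
yy, xx = np.mgrid[0:200, 0:300]
spot = np.exp(-((xx - 160) ** 2 + (yy - 95) ** 2) / (2 * 60.0 ** 2))
dist, profile = diagonal_profile(spot)
print(f"profile peak {profile.max():.2f} at {dist[np.argmax(profile)] / 1000:.1f} um")
```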

      Figure 2(d): The system presents an XY drift of ~500 nm over the course of a couple of hours. However, it is not clear how the focus is being maintained. Can the authors clarify this point and add the axial drift to the plot?

      The axial position of the sample could be maintained over a prolonged period of time without drift correction. Measurements in which an axial shift was induced by voltage pulses in the electronics were discarded, but the stability of the stage seems sufficient to allow imaging without lateral and axial drift correction. The XY drift measurement displayed in Figure 2(d) can be extended by measuring the σ of the PSF over time: an increase in σ would indicate an axial displacement relative to the focal plane. In these measurements, a slight axial drift can be seen; the fluorescent beads, however, can still be localized over the whole course of the measurement.

      A separate experiment was performed using the same objective on the UC2 setup and on a high-quality setup equipped with a piezo actuator able to move in 10 nm steps. The precise Z steps of the piezo allow us to reproducibly sweep through the PSF and to estimate the axial displacement of the sample from the changes in PSF FWHM (Full Width at Half Maximum). By superimposing this curve on the UC2 measurement of fluorescent beads acquired with the smallest possible Z step, the relative axial position of the sample can be estimated. The accuracy of the stage, however, remains limited.

      Author response image 2.

      Drift Figure: a. Drift of fluorescent TS beads on the UC2 setup placed on an optical table over a duration of two hours. Beads are localized, and the resulting displacements in i. and ii. are plotted in the graphs below. The procedure is repeated in b. with the microscope placed on a laboratory bench instead. c. (for the optical table, i.) and d. (for the laboratory bench, i.) show the variation in the sigma value of the localized beads over the measurement duration. As the sigma value changes when the beads are out of focus, the stability of the setup can be confirmed: it remains practically unchanged over the measurement duration.
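      The drift and σ analysis in this figure can be summarized in a short routine. The following is a minimal sketch that assumes per-frame bead localizations (x, y and fitted PSF σ, all in nm) are already available from the localization software. The baseline window and the 10% σ threshold are hypothetical choices. The synthetic trace only mimics the order of magnitude of the drift discussed above.

```python
import numpy as np


def drift_report(x_nm, y_nm, sigma_nm, baseline_frames=10, rel_tol=0.1):
    """Lateral drift and a simple defocus flag for one tracked bead.

    Drift is measured relative to the mean position over the first
    `baseline_frames` frames. A frame is flagged as defocused when the fitted
    PSF sigma deviates from its baseline by more than `rel_tol` (a hypothetical
    threshold, not a value from the manuscript).
    """
    x, y, s = (np.asarray(v, dtype=float) for v in (x_nm, y_nm, sigma_nm))
    x0, y0, s0 = (v[:baseline_frames].mean() for v in (x, y, s))
    lateral = np.hypot(x - x0, y - y0)
    defocused = np.abs(s - s0) / s0 > rel_tol
    return lateral, defocused


# Synthetic 2 h recording, one frame every 10 s: linear drift of 400 nm in x
# and 300 nm in y, plus a PSF broadening from 130 to 150 nm after frame 600.
frames = np.arange(720)
x = 400.0 * frames / frames[-1]
y = 300.0 * frames / frames[-1]
sigma = np.where(frames < 600, 130.0, 150.0)
lateral, defocused = drift_report(x, y, sigma)
print(f"final lateral drift: {lateral[-1]:.0f} nm, defocused frames: {defocused.sum()}")
```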

      Author response image 3.

      Z-focus Figure: Estimation of the axial position of TS beads on the UC2 setup. a. The change in PSF FWHM was quantified by acquiring a Z stack of a beads sample. The homebuilt high-quality setup (HQ) was used as a reference, by using the same objective and TS sample. The PSF FWHM on the UC2 setup was measured using the lowest possible axial stage displacement. A Z-position can thus be estimated for single molecules, as displayed in b.
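      The FWHM-based axial estimate can be sketched as a calibration lookup. The following is a minimal sketch that assumes a reference Z stack acquired with fine piezo steps, as on the high-quality setup. The FWHM model is hypothetical and is used only to generate test data. Because the FWHM is symmetric about focus, this approach yields only the magnitude of the defocus. Its sensitivity is also poor close to focus, where the FWHM barely changes.

```python
import numpy as np


def build_lookup(z_nm, fwhm_nm):
    """Keep the calibration branch above focus, where FWHM grows with z."""
    z = np.asarray(z_nm, dtype=float)
    w = np.asarray(fwhm_nm, dtype=float)
    focus = np.argmin(w)
    w_branch, dz_branch = w[focus:], z[focus:] - z[focus]
    if np.any(np.diff(w_branch) <= 0):
        raise ValueError("calibration branch is not strictly increasing")
    return w_branch, dz_branch


def estimate_defocus(fwhm_measured, lookup):
    """Defocus magnitude (nm) by linear interpolation in the calibration curve."""
    w_branch, dz_branch = lookup
    return np.interp(fwhm_measured, w_branch, dz_branch)


def fwhm_model(z):
    """Hypothetical PSF broadening used only to build a synthetic calibration."""
    return 300.0 * np.sqrt(1.0 + (z / 400.0) ** 2)


z_ref = np.arange(-1000.0, 1001.0, 10.0)    # 10 nm piezo steps
lookup = build_lookup(z_ref, fwhm_model(z_ref))
print(f"estimated defocus: {estimate_defocus(fwhm_model(255.0), lookup):.1f} nm")
```

      `np.interp` requires increasing sample points, which is why only one branch of the calibration is kept. Measured FWHM values outside the calibrated range are clipped to its end points.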

      Addressing the seemingly correlated behavior of the X and Y drift:

      Further measurements show less correlation between drift in X and in Y. Simultaneous motion in X and Y seems to indicate that the stage or the sample is tilted. The collective movement in X and Y seems accentuated by larger jumps, probably originating from vibrations (more prominent in the measurements on the laboratory bench than on the optical table). Voltage fluctuations inducing motion of the stage are possible but highly unlikely to have caused the drift in the displayed measurements.
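      Whether X and Y drift are coupled can be quantified with a correlation coefficient. The following is a sketch that assumes per-frame positions of a reference bead. Correlating the frame-to-frame increments rather than the positions themselves avoids the spurious correlation that two slowly drifting, independent traces often show. The synthetic traces are illustrative only.

```python
import numpy as np


def drift_correlation(x_nm, y_nm):
    """Pearson correlation between X and Y drift, on positions and on increments."""
    x = np.asarray(x_nm, dtype=float)
    y = np.asarray(y_nm, dtype=float)
    r_positions = np.corrcoef(x, y)[0, 1]
    r_increments = np.corrcoef(np.diff(x), np.diff(y))[0, 1]
    return r_positions, r_increments


rng = np.random.default_rng(2)
x = np.cumsum(rng.normal(size=5000))
y_independent = np.cumsum(rng.normal(size=5000))            # uncoupled axes
y_tilted = 0.8 * x + np.cumsum(0.2 * rng.normal(size=5000))  # e.g. a tilted stage
for label, y in (("independent axes", y_independent), ("tilted stage", y_tilted)):
    r_pos, r_inc = drift_correlation(x, y)
    print(f"{label}: r(positions) = {r_pos:.2f}, r(increments) = {r_inc:.2f}")
```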

      Figure 3: Can the authors comment on the actual or potential effects that the incubator (humidity, condensation, etc.) may have on the system (e.g., camera, electronics)?

      When moving the microscope into the incubator, the first precaution is to check whether the electronics are able to operate at 37 °C. Placing the microscope inside the incubator can then induce condensation of water droplets on cold surfaces, potentially damaging the electronics or reducing image quality. This can be prevented by preheating the microscope for a few hours, e.g. in an incubator without humidity, before placing it in the working incubator. The incubator should also be checked for air streams (used to distribute the CO2), and direct exposure of the setup to the air stream should be avoided. A layer of foam material (e.g. polyurethane) under the microscope helps to reduce the effect of incubator vibrations on the microscope. PLA is challenging to use in the incubator because of its hydrophilic character and reduced thermal stability, and the elevated temperature inherently reduces the mechanical stability of 3D printed parts. Using a less hydrophilic and more thermally stable plastic, such as ABS, combined with a higher infill percentage is the empirical solution to this challenge. Further options and designs to improve the use of the microscope within the incubator are still in development.

      Figure 5: Can the authors perform single molecule experiments with an alternative tag such as Alexa647?

      The SPT experiments were performed with QDs to make use of their photostability and brightness. The dSTORM experiment suggests that imaging single AF647 molecules with sufficient SNR is possible. Using AF647 for SPT is possible but would reduce the localization accuracy and shorten the acquired tracks, owing to the blinking of AF647 under illumination. The tracking experiment with QDs was thus a proof of concept showing that SPT experiments are possible and reproduce the diffusion coefficients reported in the literature. Alternative tags could be an interesting extension for users' own applications.
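      The diffusion coefficients mentioned above are usually extracted from the mean squared displacement (MSD). The moment scaling spectrum (MSS), which Reviewer #2 asks to describe, generalizes this to higher moments of the displacement. If the ν-th moment scales as τ^γν, the slope S_MSS of γν versus ν is 0.5 for free diffusion, smaller for confined motion and larger for directed motion. The following is a minimal sketch for gap-free 2D tracks, without correction for localization error. The fit windows (4 lags for D; 10 lags and ν = 0 to 6 for MSS) are assumptions, not the authors' settings.

```python
import numpy as np


def msd(track, max_lag):
    """Time-averaged mean squared displacement of an (N, 2) track for lags 1..max_lag."""
    lags = np.arange(1, max_lag + 1)
    values = np.array([np.mean(np.sum((track[lag:] - track[:-lag]) ** 2, axis=1))
                       for lag in lags])
    return lags, values


def diffusion_coefficient(track, dt, max_lag=4):
    """Free 2D diffusion: MSD(tau) = 4 D tau, fitted over the first lags."""
    lags, values = msd(track, max_lag)
    slope, _ = np.polyfit(lags * dt, values, 1)
    return slope / 4.0


def mss_slope(track, max_lag=10, orders=tuple(range(7))):
    """Moment scaling spectrum: slope of gamma_nu versus nu, where the nu-th
    displacement moment scales as tau**gamma_nu."""
    lags = np.arange(1, max_lag + 1)
    gammas = []
    for nu in orders:
        moments = [np.mean(np.linalg.norm(track[lag:] - track[:-lag], axis=1) ** nu)
                   for lag in lags]
        gamma, _ = np.polyfit(np.log(lags), np.log(moments), 1)
        gammas.append(gamma)
    slope, _ = np.polyfit(np.array(orders, dtype=float), gammas, 1)
    return slope


# Synthetic free diffusion, D = 0.1 um^2/s sampled every 10 ms (hypothetical values).
rng = np.random.default_rng(1)
D_true, dt = 0.1, 0.01
track = np.cumsum(rng.normal(0.0, np.sqrt(2 * D_true * dt), size=(20000, 2)), axis=0)
print(f"D = {diffusion_coefficient(track, dt):.3f} um^2/s, S_MSS = {mss_slope(track):.2f}")
```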

      Figure 6: The authors demonstrate dSTORM of microtubules. It would enhance the paper to also demonstrate 3D imaging (e.g., via cylindrical lens).

      3D imaging with a cylindrical lens has not been performed yet. The implementation would not be difficult, given the high modularity of the setup. The calibration of the astigmatic PSF shape might, however, be challenging, as the vertical scanning of the Z-stage lacks reliability in its current build. Methods such as biplane imaging might also be difficult to implement, as halving the number of photons in each channel reduces the localization accuracy. Providing 3D information with single-molecule accuracy is definitely a desirable future improvement of the setup and will be tried out. In the following figure, two concepts for introducing 3D imaging capabilities into the detection layer of the microscope are presented.

      Author response image 4.

      3D concept Figure: Two possible setup modifications to provide axial information when imaging single molecules. a. A cylindrical lens can be placed to induce an asymmetry between the PSF FWHM in x and in y. Every Z position can be identified by two distinct PSF FWHM values in X and Y. b. By splitting the beam in two and defocusing one path, every PSF will have a specific set of values for its FWHM on the two detectors.
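      Concept a. can be turned into a simple lookup. The following is a minimal sketch that assumes a calibration table of PSF widths in x and y measured at known z. The defocus model is hypothetical and is used only to generate that table. In this model σx² - σy² varies monotonically with z, so each (σx, σy) pair corresponds to a single z. This removes the sign ambiguity of a FWHM-only estimate. A finer calibration grid or interpolation would reduce the 10 nm discretization used here.

```python
import numpy as np


def astig_widths(z_nm, sigma0=130.0, c=250.0, depth=400.0):
    """Hypothetical defocus model: x and y widths reach their minimum at z = +c and z = -c."""
    z = np.asarray(z_nm, dtype=float)
    sx = sigma0 * np.sqrt(1.0 + ((z - c) / depth) ** 2)
    sy = sigma0 * np.sqrt(1.0 + ((z + c) / depth) ** 2)
    return sx, sy


# Calibration table, e.g. from a bead Z stack in 10 nm steps.
z_cal = np.arange(-600.0, 601.0, 10.0)
sx_cal, sy_cal = astig_widths(z_cal)


def estimate_z(sx, sy):
    """Nearest calibration point in (sigma_x, sigma_y) space."""
    distance2 = (sx_cal - sx) ** 2 + (sy_cal - sy) ** 2
    return z_cal[np.argmin(distance2)]


print(estimate_z(*astig_widths(123.0)))
```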

      Imaging modalities section: Regarding the use of cling film as a diffuser, can the authors comment on the continued use of this approach, including its degradation over time?

      The cling foil was only used as a diffuser to broaden the laser profile. We did not analyze the composition of the foil in detail, as no visible changes were observed in either the illumination pattern or the foil itself. The piece of cling foil is attached to a rotor, so detachment of the foil and vibrations originating from the rotor need to be minimized. Keeping the rotation speed to the necessary minimum and attaching the foil correctly to the rotor provides a usable solution. Because cling foil is inexpensive, it can be replaced on a regular basis to keep it in optimal condition.

      Author response image 5.

      Profile figure: By scanning a pinhole and photometer through the laser profile with a translation mount, the shape of the laser beam can be estimated. The cling foil plays the same role as a diffuser in other setups.
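For a beam that is approximately Gaussian, the pinhole scan can be reduced to a centroid and a 1/e² radius using intensity-weighted moments. A minimal sketch for one scan axis, assuming background-subtracted photometer readings and a scan that extends well into the wings of the profile; it is an illustration rather than the procedure used for the figure.

```python
import numpy as np


def beam_width_from_scan(positions, readings, background=0.0):
    """Estimate centroid and 1/e^2 radius of a beam from a 1D pinhole scan.

    Uses intensity-weighted moments, which assume a roughly Gaussian profile
    sampled well beyond its wings.
    """
    x = np.asarray(positions, dtype=float)
    i = np.clip(np.asarray(readings, dtype=float) - background, 0.0, None)
    centroid = np.sum(x * i) / np.sum(i)
    sigma = np.sqrt(np.sum((x - centroid) ** 2 * i) / np.sum(i))
    # For I(x) proportional to exp(-2 x^2 / w^2), sigma = w / 2, so w = 2 * sigma.
    return centroid, 2.0 * sigma
```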

      Reviewer #2 (Recommendations for The Authors):

      lines

      20, add "," after parts

      110, rotating cling foil?

      112/116, "custom 3D printed" I thought they were injection molded, please finalize

      113, "puzzle pieces" rephrase and they are also barely visible

      119, not clear that the stage is a manual stage that was turned into a motorised one by adding belts

      123-126, detail for SI,

      132, replace Arduino-coded with Arduino-based

      143, add reference to Napari

      146, (black) cardboard seems to be a cheaper and quicker alternative

      153, dichroic

      151-155, reads more like a blog post than a paper (maybe add a section on trouble shooting)

      156, antibody?

      167/189, moderate, please be specific

      194, layer of foam material, specify

      221, add description/reference to GPI. What is that? why is it relevant?

      226: add one sentence description of MMS

      318, add "," after students

      332-334, as mentioned earlier, not clear, you bought a manual stage and connected belts, correct?

      376-377, might be difficult to understand for the layman

      391, what laser was used?

      Figure 1, poor contrast between components, components visible should be named as much as possible, maybe provide the base layer in a different shade. To me, the red and blue labels look like fluorophores.

      Figure 1. looks like d is the excitation layer and not e, please fix.

      Figure 2, caption a-c, figure 1-d!, btw, why is the drift so anti-correlated?

      Figure 6 (line 259) nanometer I guess, not micrometer

      We have now incorporated all the above-mentioned changes in the manuscript. Furthermore, we added the supplementary figures below.

      Author response image 6.

      Basic concept of the UC2 setup: Left: Cubes (green) are connected to one another via puzzle pieces (white). Middle: 3D-printed mounts have been designed to adapt various optics (right) to the cube framework. Combining cubes with purpose-designed mounts allows various optics to be integrated into the assembly.

      Author response image 7.

      Building the UC2 widefield microscope: a. Photograph of the complete setup. b. All pieces necessary to build the setup. A list of the components can be found in the bill of materials. c. Bottom emission layer of the microscope before assembly. d. Emission layer after assembly. The cube connections are doubled by adding a layer of puzzle pieces on both the top and the bottom of the emission layer. e. CAD schematic of the emission layer and the positioning of the optics. f. Middle excitation layer of the microscope before assembly. The beam magnifier and homogenizer have been left out for clarity. g. After assembly, the excitation layer is also covered by a layer of puzzle pieces. h. CAD schematic of the excitation layer and the positioning of the optics. i. Z-stage photograph and corresponding CAD file. The stage motor is embedded within the bottom cube. j. A layer of empty cubes supports the microscope stage. k. At this stage of the assembly, the objective is screwed into the objective holder. l. Finally, the stage is wired to the electronics and can then be mounted on top of the microscope (see a.).

      Author response image 8.

      Measurements performed on the UC2 setup with lower-budget objectives. The imaged sample is HeLa cells stably transfected to express CLC-GFP and labelled with AF647 through immunostaining. The setup was kept identical except for the objectives. Scale bars: 30 µm.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      This study investigates associations between retrotransposon element expression and methylation with age and inflammation, using multiple public datasets. The study is valuable because a systematic analysis of retrotransposon element expression during human aging has been lacking. However, the data provided are incomplete due to the sole reliance on microarray expression data for the core analysis of the paper. 

      Both reviewers found this study to be important. We selected the microarray datasets of human blood used in a comprehensive study of ageing published in Nature Communications (DOI: 10.1038/ncomms9570). We only included datasets specifically collected for ageing studies. Therefore, the large RNA-seq cohorts for cancer, cardiovascular, and neurological diseases were not relevant to this study and could not be included.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Tsai and Seymen et al. investigate associations between RTE expression and methylation and age and inflammation, using multiple public datasets. The concept of the study is in principle interesting, as a systematic analysis of RTE expression during human aging is lacking. 

      We thank the reviewer for the positive comment. 

      Unfortunately, the reliance on expression microarray data, used to perform the core analysis of the paper places much of the study on shaky ground. The findings of the study would not be sufficiently supported until the authors validate them with more suitable methods. 

      In the Discussion section of the manuscript, we have clarified that “we are aware of the limitations imposed by using microarray in this study, particularly the low number of intergenic probes in the expression microarray data. Our study can be enriched with the advent of large RNA-seq cohorts for aging studies in the future.” However, the use of microarrays for RTE expression analysis was introduced previously (DOI: 10.1371/journal.pcbi.1002486) and has been applied in highly cited publications (DOI: 10.1038/ncomms1180, DOI: 10.1093/jnci/djr540). In particular, Reichmann et al. (DOI: 10.1371/journal.pcbi.1002486), a study cited 76 times, showed and experimentally verified that cryptic repetitive element probes present on Illumina and Affymetrix gene expression microarray platforms can accurately and sensitively monitor repetitive element expression. Given the acceptance of this methodological study by other researchers, we trusted that RTE microarray probes could accurately quantify RTE expression at the class and family levels.

      Strengths: 

      This is a very important biological problem. 

      Weaknesses: 

      RNA microarray probes are obviously biased to genes, and thus quantifying transposon analysis based on them seems dubious. Based on how arrays are designed there should at least be partial (perhaps outdated evidence) that the probe sites overlap a protein-coding or non-coding RNA. 

      We disagree with the reviewer that quantifying transposon expression based on microarray data is dubious. As previously shown by Reichmann et al., the quantification is reliable as long as the probes do not overlap with annotated genes and are in the correct orientation to detect sense repetitive element transcripts. Reichmann et al. identified 1,400 repetitive element probes in versions 1.0, 1.1, and 2.0 of the Illumina Mouse WG-6 BeadChips by comparing the genomic locations of the probes with RepeatMasker-annotated regions of the mouse genome. We applied the same criteria to the Illumina Human HT-12 V3 (29,431 probes) and V4 (33,963 probes) arrays to identify RTE-specific probes.
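The selection criteria described above (overlap with an RTE, no overlap with annotated exons, sense orientation) can be sketched as a simple interval filter. Assumptions: coordinates are 0-based half-open, the annotation records are hypothetical in-memory tuples rather than a particular file format, and each probe's strand is reported for the transcript it detects, so a sense match means equal strands. For genome-wide use, an indexed tool such as bedtools intersect would replace the nested loops.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"
    name: str = ""


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def select_rte_probes(probes, rte_annotations, exons):
    """Return {probe name: RTE name} for probes that hit an RTE in sense
    orientation and do not touch any annotated exon."""
    selected = {}
    for probe in probes:
        if any(overlaps(probe, exon) for exon in exons):
            continue  # exonic probes would measure gene transcripts
        for rte in rte_annotations:
            # Assumed strand convention: equal strands = sense orientation.
            if overlaps(probe, rte) and probe.strand == rte.strand:
                selected[probe.name] = rte.name
                break
    return selected
```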

      The authors state they only used intergenic probes, but based on supplementary files, almost half of RTE probes are not intergenic but intronic (n=106 out of 264). 

      All our identified RTE probes overlap with intergenic regions. However, due to their repetitive nature, some probes also overlap with intronic regions. We have replaced "intergenic" with "non-coding" in our resubmission to indicate that they do not overlap with the exons of protein-coding genes. We do not, however, rule out the possibility that some of our detected RTE probes overlap non-coding RNAs. In fact, the border between the coding and non-coding genome has recently become very fuzzy with new genome annotations. RTE RNAs can easily be considered non-coding RNAs if we challenge the traditional junk DNA view.

      This is further complicated by the fact that not all this small subset of probes is available in all analyzed datasets. For example, 232 probes were used for the MESA dataset but only 80 for the GTP dataset. Thus, RTE expression is quantified with a set of probes which is extremely likely to be highly affected by non-RTE transcripts and that is also different across the studied datasets. Differences in the subsets of probes could very well explain the large differences between datasets in multiple of the analyses performed by the authors, such as in Figure 2a, or 3a. It is nonetheless possible that the quantification of RTE expression performed by the authors is truly interpretable as RTE expression, but this must be validated with more data from RNA-seq. Above all, microarray data should not be the main type of data used in the type of analysis performed by the authors. 

      In this study, we did not compare MESA with GTP or the other datasets. We analysed each dataset separately based on the data available for it. Therefore, restricting one analysis because of missing information in another dataset would not make sense; we would do so only if we were comparing datasets. Moreover, the datasets are not directly comparable because they were collected from different types of blood samples.

      Reviewer #2 (Public Review): 

      Summary: 

      Yi-Ting Tsai and colleagues conducted a systematic analysis of the correlation between the expression of retrotransposable elements (RTEs) and aging, using publicly available transcriptional and methylome microarray datasets of blood cells from large human cohorts, as well as single-cell transcriptomics. Although DNA hypomethylation was associated with chronological age across all RTE biotypes, the authors did not find a correlation between the levels of RTE expression and chronological age. However, expression levels of LINEs and LTRs positively correlated with DNA demethylation, and inflammatory and senescence gene signatures, indicative of "biological age". Gene set variation analysis showed that the inflammatory response is enriched in the samples expressing high levels of LINEs and LTRs. In summary, the study demonstrates that RTE expression correlates with "biological" rather than "chronological" aging. 

      Strengths: 

      The question the authors address is both relevant and important to the fields of aging and transposon biology. 

      We thank the reviewer for finding this study relevant and important.

      Weaknesses: 

      The choice of methodology does not fully support the primary claims. Although microarrays can detect certain intergenic transposon sequences, the authors themselves acknowledge in the Discussion section that this method's resolution is limited. More critical considerations, however, should be addressed when interpreting the results. The coverage of transposon sequences by microarrays is not only very limited (232 unique probes) but also predetermined. This implies that any potential age-related overexpression of RTEs located outside of the microarray-associated regions, or of polymorphic intact transposons, may go undetected. Therefore, the authors should be more careful while generalising their conclusions. 

      This is a bioinformatics study, and we have already acknowledged and discussed its limitations in the Discussion section. All technologies have limitations, and this should not stop us from shedding light on scientific facts because of incomplete information. In the manuscript, we note that all large, well-designed ageing studies were performed using microarray technology. Peters et al. (DOI: 10.1038/ncomms9570) included all these datasets in their study of the transcriptional landscape of ageing, and they have been used in previous ageing studies as well. Our study essentially applies the Reichmann et al. method to the peripheral blood data from Peters et al. Since hypomethylation with ageing is a well-established, broad epigenetic reprogramming, it is unlikely that only a fraction of RTEs is affected. Therefore, subsampling RTEs should not substantially affect the results. Indeed, this is supported in our study by the inverse correlation between DNA methylation and RTE expression for the LINE and SINE classes despite the limited number of probes for LINE and SINE expression.

      Additionally, for some analyses, the authors pool signals from RTEs by class or family, despite the fact that these groups include subfamilies and members with very different properties and harmful potentials. For example, while sequences of older subfamilies might be passively expressed through readthrough transcription, intact members of younger groups could be autonomously reactivated and cause inflammation. The aggregation of signals by the largest group may obscure the potential reactivation of smaller subgroups. I recommend grouping by subfamily or, if not possible due to the low expression scores, by subgroup. For example, all HERV subfamilies are from the ERVL family. 

      We agree with the reviewer that different subfamilies of RTEs play different roles when activated. However, we would lose statistical power if we studied RTE subfamilies covered by only a few probes. Global epigenetic alteration and derepression of RTEs with ageing have been observed genome-wide. While our systematic analysis across RTE classes and families cannot capture alterations in subfamilies due to limited statistical power, it is still relevant to the research question we are addressing.
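As an illustration of why pooling helps, the sketch below summarizes probe-level, log-scale intensities into one score per family and sample by averaging. The averaging rule and input layout are assumptions for illustration, not necessarily the authors' procedure; with only one or two probes per subfamily, the corresponding score would simply inherit the noise of those probes.

```python
import numpy as np


def family_scores(expr, probe_family):
    """Average probe-level log2 intensities per RTE family.

    expr: probes x samples array; probe_family: one family label per probe row.
    Returns {family: array of per-sample scores}.
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(probe_family)
    return {str(fam): expr[labels == fam].mean(axis=0) for fam in np.unique(labels)}
```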

      Next, Illumina arrays might not accurately represent the true abundance of TEs due to nonspecific hybridization of genomic transposons. Standard RNA preparations always contain traces of abundant genomic SINEs unless DNA elimination is specifically thorough. The problem of such noise should be addressed. 

      We have checked the RNA isolation step in the MESA, GTP, and GARP manuscripts. Total RNA was isolated using the Qiagen mini kit following the manufacturer’s recommendations. The authors of these manuscripts did not mention whether they eliminated genomic DNA, but we assume they were aware of DNA contamination and removed it according to the manufacturer’s recommendations. We searched the literature for nonspecific hybridization of genomic RTEs but could not find evidence supporting this concern. We would appreciate it if the reviewer could provide further evidence of such RTE contamination.

      Lastly, scRNAseq was conducted using 10x Genomics technology. However, quantifying transposons in 10x sequencing datasets presents major challenges due to sparse signals. 

      Applying the scTE pipeline (https://www.nature.com/articles/s41467-021-21808-x), we found that the statistical power for quantifying RTE classes (LINE, SINE, and LTR) or RTE families (L1, L2, Alu, ERVK, etc.) is comparable to that for individual genes. However, our approach cannot analyse RTE subfamilies, and we did not attempt to do so.

      Smart-seq single-cell technology is better suited to this particular purpose. 

      We agree with the reviewer that Smart-seq provides higher yield than 10x, but no Smart-seq data are available for ageing studies.

      Anyway, it would be more convincing if the authors demonstrated TE expression across different clusters of immune cells using standard scRNAseq UMAP plots instead of boxplots. 

      Since the number of RTE reads per cell is low, showing RTE expression per cell on a UMAP may not be the best way to show the difference between the aged and young groups. We therefore used a pseudobulk analysis and displayed differential expression for each immune cell type as boxplots rather than UMAPs.
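One common way to form pseudobulk profiles is to sum counts over all cells of the same donor and cell type, which turns sparse per-cell RTE counts into one profile per donor and cell type that can then be compared between age groups. A minimal sketch with hypothetical inputs (a cells x features count matrix and per-cell labels); normalization and differential testing are omitted.

```python
from collections import defaultdict

import numpy as np


def pseudobulk(counts, donors, cell_types):
    """Sum per-cell counts into one profile per (donor, cell type).

    counts: cells x features array; donors, cell_types: one label per cell.
    """
    counts = np.asarray(counts)
    sums = defaultdict(lambda: np.zeros(counts.shape[1], dtype=counts.dtype))
    for row, donor, ctype in zip(counts, donors, cell_types):
        sums[(donor, ctype)] += row
    return dict(sums)
```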

      I recommend validating the data by RNAseq, even on small cohorts. Given that the connection between RTE overexpression and inflammation has been previously established, the authors should consider better integrating their observations into the existing knowledge. 

      Please see below. We have analysed RNA-seq data suggested by Reviewer 1 in the Recommendations for the Authors section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      I can recommend two sizeable human PMBC RNA-seq datasets that the authors could use:

      Marquez et al. 2020 (phs001934.v1.p1, controlled access) and Morandini et al. 2023 (GSE193141, public access). There are likely other suitable datasets that I am not aware of. I would also recommend using identical sets of probes to quantify RTE expression across studies. If certain datasets have too few probes and would thus limit the number of probes available across all studies it might be a good idea to exclude the dataset, especially if the analysis has been supplemented by the additional RNA-seq datasets. 

      Until recently, there was no publicly available, non-cancerous, large RNA-seq cohort for ageing studies. We tried to gain access to the two RNA-seq datasets suggested by Reviewer 1: Marquez et al. 2020 (phs001934.v1.p1, controlled access) and Morandini et al. 2023 (GSE193141, public access).

      Unfortunately, the Marquez et al. 2020 data are not accessible because access is only granted for projects related to cardiovascular diseases. However, we analysed the Morandini et al. 2023 data and can confirm that no association was observed between any RTE class or family and chronological ageing (Author response image 1), which is a second strong piece of evidence supporting the statement in the manuscript. As expected, we found a positive correlation between RTE expression and the IFN-I signature score (Author response image 2).

      Author response image 1.

      Linear analysis of RTE expression and chronological age.

      Author response image 2.

      Linear analysis of RTE expression and IFN gene signature expression.

      The authors use "biological age" and inflammation as interchangeable concepts, including in the title. Please correct this wording. 

      We have now added a new term to the manuscript, “biological age-related (BAR)”, which clearly addresses this distinction. We do not think it is necessary to change the title.

      The authors find correlations between RTE expression and age-associated gene signatures but not chronological age itself. This is puzzling because, as the wording suggests, the expression of these inflammatory pathways is age-associated. If RTE expression correlates with inflammation which itself correlates with age, one might expect RTE expression to also correlate with age. Do the authors see a correlation between various inflammatory gene signatures and chronological age, in the analyzed datasets? If yes, then how would you explain that discrepancy? Moreover, in this case, I would recommend using a linear model, rather than correlation, to separate the effects of chronological age and RTE expression on inflammation (Inflammation et al ~ Age + RTE expression), or equivalent designs.

      As described above, we have now introduced the BAR terminology, which resolves this confusion. We did not find a correlation between RTE expression and chronological age. However, we did identify the correlation between BAR gene signatures and RTE expression.

      To separate the effects of chronological age and RTE expression on BAR gene signature scores, we performed a generalized linear model (GLM) analysis using BAR gene signature scores as response variables and RTE expression and chronological age as predictors (BAR gene signature scores ~ RTE expression + chronological age). A significant association was observed between BAR gene signature scores and RTE expression in the GARP cohort (Author response image 3). However, with chronological age as a predictor, we did not identify an association between chronological age and BAR gene signatures, indicating that BAR events are not correlated with chronological age (Author response image 3).

      Author response image 3.

      Generalized linear models (GLM) analysis (BAR gene signature scores ~ RTE expression + chronological age). For each RTE family, we separately performed GLM. Age (RTE family) indicates the chronological age when used in the design formula for that specific RTE family. 
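With a Gaussian family and identity link, a GLM of this form reduces to ordinary least squares. A minimal sketch of the point estimates for BAR score ~ RTE expression + chronological age, assuming one score, one RTE family value, and one age per sample; it returns coefficients only, so standard errors and p-values would come from a statistics package such as statsmodels. The Gaussian family is our assumption, not a statement of the model the authors fitted.

```python
import numpy as np


def fit_bar_model(bar_score, rte_expr, age):
    """OLS fit of bar_score ~ 1 + rte_expr + age; returns the coefficients."""
    rte_expr = np.asarray(rte_expr, dtype=float)
    age = np.asarray(age, dtype=float)
    X = np.column_stack([np.ones(age.size), rte_expr, age])  # design matrix
    coef, *_ = np.linalg.lstsq(X, np.asarray(bar_score, dtype=float), rcond=None)
    return dict(zip(["intercept", "rte", "age"], coef))
```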

      Some of the gene sets used by the authors have considerable overlap with others and are also not particularly comprehensive. I can recommend this very comprehensive gene set: https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/SAUL_SEN_MAYO.  

      We chose not to use large gene lists such as the suggested SEN_MAYO list, as we found that Singscore struggles to generate reliable scores with sufficient variance when the number of genes increases beyond twenty. Although there is some overlap between inflammation-related genes and cellular senescence genes (e.g., IL6, IL1A, IL1B), it is important to note that each gene list focuses on different aspects of biological aging and should not be dismissed as redundant.
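To make the scoring concrete, the sketch below computes a simplified single-sample rank score in the spirit of singscore: genes are ranked within the sample, and the mean rank of the gene-set members is rescaled by its theoretical minimum and maximum. It is not the singscore package's implementation (which, among other things, also reports a dispersion estimate and supports up/down gene sets); ties are broken arbitrarily here, and the function name is hypothetical.

```python
import numpy as np


def rank_score(sample_expr, gene_names, gene_set):
    """Normalised mean rank of gene-set genes within one sample, in [0, 1].

    0 means the set genes are the lowest expressed in the sample, 1 the highest.
    """
    expr = np.asarray(sample_expr, dtype=float)
    ranks = np.argsort(np.argsort(expr)) + 1  # 1 = lowest; ties broken arbitrarily
    members = set(gene_set)
    idx = [i for i, gene in enumerate(gene_names) if gene in members]
    n, total = len(idx), len(expr)
    mean_rank = ranks[idx].mean()
    lowest = (n + 1) / 2               # set occupies ranks 1..n
    highest = (2 * total - n + 1) / 2  # set occupies ranks total-n+1..total
    return float((mean_rank - lowest) / (highest - lowest))
```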

      Minor comments: 

      Overall, several sentences in the manuscript feel somewhat unnatural. I would recommend further proofreading. I will mention some examples:  

      Thank you for your feedback. We have fixed all these issues in the new submission.  

      • On line 34, "like the retroviruses" should be "like retroviruses". There are several other places in the text where "the" is not required.

      Fixed.

      • On line 86, "to generate the RTE expression". "the" is again not necessary and I would replace "generate" with "quantify". 

      Fixed.

      • On line 86, "we mapped the probe locations to RepeatMasker". RepeatMasker is not a genome. Do you mean you mapped the probe location to a genome annotated by RepeatMasker? The same applies to line 99.  

      Fixed. We changed the sentence to: “To quantify RTE expression, we mapped the microarray probe locations to RTE locations in RepeatMasker to extract the list of noncoding (intergenic or intronic) probes that cover the RTE regions.”

      • Figure 1 contains a typo in the aims section: "evetns" instead of "events".  

      Fixed.

      • On line 495 "filtered out" seems to imply your removed intergenic probes. I assume you mean that you specifically selected intergenic probes. 

      Fixed.

      • Figure 1 nicely summarizes your datasets. Could you add a Figure 1b panel showing how you used RNA arrays to quantify RTE expression? This should include the number of probes for each RTE family, so I suggest merging this with Figure S1.  

      We disagree with the reviewer's suggestion to merge Figure 1 and Figure S1 because they address two different concepts.

      Reviewer #2 (Recommendations For The Authors): 

      In Figure 2c, it is unclear what colour scale has been used for age. 

      Thank you for the comment. We have added a legend for age in this figure.

      There are no figure legends for Supplementary Figures 1 to 5 and all figures after Supplementary Figure 8. 

      A new version with legends has been submitted.

      For different datasets used, the choice of "healthy" patients should be more clear and explicit.

      Are asymptomatic patients with autoimmune inflammatory disorders considered as "healthy"? If not only healthy patients' blood is analysed (such as PBMS from primary osteoarthrosis), how inflammatory signatures enrichment discovered in this study may be associated not just with "biological age" but with the disease itself? 

      In our analysis, we did not exclusively study "healthy" individuals, as none of our datasets were initially collected from strictly healthy populations. While the microarray datasets were not specifically collected from people with particular diseases, they were also not screened for asymptomatic conditions. To demonstrate the same pattern in healthier cohorts, we added scRNA-seq analysis of confirmed healthy individuals to our study. However, the focus of this study is not on healthy aging. Instead, it is on biological ageing that includes both healthy and non-healthy ageing.

      We included the GARP (primary osteoarthritis) dataset as it is a cohort of age-related diseases (ARD). While we cannot definitively attribute inflammatory signatures enrichment to biological aging or disease, the observation of such enrichment in a cohort of ARD is worth considering. To make this clearer, we have replaced the term “healthy” with “non-cancerous” for microarray analysis throughout the paper.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to reviewers

      We would like to thank the reviewers for their feedback. Below we address their comments and have indicated the associated changes in our point-by-point response (blue: answers, red: changes in manuscript).

      Reviewer #1:

      Overall, the hypotheses and results are clearly presented and supported by high quality figures. The study is presented in a didactic way, making it easy for a broad audience to understand the significance of the results. The study does present some weaknesses that could easily be addressed by the authors.

      We thank the reviewer for appreciating our work and providing useful suggestions for improvement.

      1) First, there are some anatomical inaccuracies: line 129 and fig1C, the authors omit medial septum projections to area CA1 (in addition to the entorhinal cortex). Moreover, in addition to CA1, CA3 also provides monosynaptic feedback projections to the medial septum. Finally, an indirect projection from CA1/3 excitatory neurons to the lateral septum, which in turn sends inhibitory projections to the medial septum, could be included or mentioned by the authors. This could be of particular relevance to support claims related to the effects of neurostimulation, where meticulous implementation of anatomical data could be key.

      If not updating their model, the authors could add this point to their limitation section, where they already do a good job of mentioning some limitations of using the EC as a sole oscillatory input to CA1.

      We acknowledge that our current model strongly simplifies the interconnections between the medial septum and the hippocampal formation, but including more anatomical details is beyond the scope of this manuscript and would be a topic for future work. Nevertheless, we followed the reviewer’s advice to stress this point in our manuscript. First, we moved a paragraph that was initially in the “methods” section to the “results” section (L.141-150 of the revised manuscript):

      “Biologically, GABAergic neurons from the medial septum project to the EC, CA3, and CA1 fields of the hippocampus (Toth et al., 1993; Hajós et al., 2004; Manseau et al., 2008; Hangya et al., 2009; Unal et al., 2015; Müller and Remy, 2018). Although the respective roles of these different projections are not fully understood, previous computational studies have suggested that the direct projection from the medial septum to CA1 is not essential for the production of theta in CA1 microcircuits (Mysin et al., 2019). Since our modeling of the medial septum is only used to generate a dynamic theta rhythm, we opted for a simplified representation where the medial septum projects only to the EC, which in turn drives the different fields of the hippocampus. In our model, Kuramoto oscillators are therefore connected to the EC neurons and they receive projections from CA1 neurons (see methods for more details).”

      Second, we expanded the corresponding paragraph in the limitation section to discuss this point further (L.398-415 of the revised manuscript):

      “We decided to model septal pacemaker neurons projecting to the EC as the main source of hippocampal theta as reported in multiple experimental studies (Buzsáki, 2002; Buzsáki et al., 2003; Hangya et al., 2009). However, experimental findings and previous models have also proposed that direct septal inputs are not essential for theta generation (Wang, 2002; Colgin et al., 2013; Mysin et al., 2019), but play an important role in phase synchronization of hippocampal neurons. Furthermore, the model does not account for the connections between the lateral and medial septum and the hippocampus (Takeuchi et al., 2021). These connections include the inhibitory projections from the lateral to the medial septum and the monosynaptic projections from the hippocampal CA3 field to the lateral septum. An experimental study has highlighted the importance of the lateral septum in regulating the hippocampal theta rhythm (Bender et al., 2015), an area that has not been included in the model. Specifically, theta-rhythmic optogenetic stimulation of the axonal projections from the lateral septum to the hippocampus was shown to entrain theta oscillations and lead to behavioral changes during exploration in transgenic mice. To account for these discrepancies, our model could be extended by considering more realistic connectivity patterns between the medial / lateral septum and the hippocampal formation, including glutamatergic, cholinergic, and GABAergic reciprocal connections (Müller and Remy, 2018), or by considering multiple sets of oscillators each representing one theta generator.”
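For readers less familiar with the Kuramoto oscillators mentioned in the passages above, the standard all-to-all model is dθ_i/dt = ω_i + (K/N) Σ_j sin(θ_j - θ_i), and synchrony is measured by the order parameter r = |(1/N) Σ_j exp(iθ_j)|. The sketch below integrates this textbook model with an Euler scheme; the frequencies, coupling K, and step size are hypothetical, and the coupling to EC and CA1 neurons in the authors' model is omitted.

```python
import numpy as np


def kuramoto(omega, K, dt=1e-3, steps=5000, seed=0):
    """Euler integration of the all-to-all Kuramoto model.

    Returns the final phases and the order parameter r at each step.
    """
    rng = np.random.default_rng(seed)
    omega = np.asarray(omega, dtype=float)
    theta = rng.uniform(0.0, 2 * np.pi, size=omega.size)
    r_trace = np.empty(steps)
    for t in range(steps):
        z = np.exp(1j * theta).mean()  # complex order parameter r * exp(i * psi)
        r_trace[t] = np.abs(z)
        # Mean-field identity: (K/N) sum_j sin(theta_j - theta_i) = K r sin(psi - theta_i)
        theta = theta + dt * (omega + K * np.abs(z) * np.sin(np.angle(z) - theta))
    return theta, r_trace
```

Usage: with natural frequencies spread around 8 Hz, a strong coupling (for example K = 20) drives r close to 1, whereas K = 0 leaves r near the 1/sqrt(N) level expected for random phases.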

      2) The authors test conditions of low theta inputs, which they liken to pathological states (line 112). It is not clear what pathology the authors are referring to, especially since a large amount of 'oscillopathies' in the septohippocampal system are associated with decreased gamma/PAC, but not theta oscillations (e.g. Alzheimer's disease conditions).

      In the manuscript, we used “oscillopathies” in a broad sense, as we did not want to overstate the biological implications of the model or of the way we modeled pathological states. To our knowledge, several studies have yielded inconsistent results regarding the specific changes in theta or gamma power in Alzheimer’s disease, and the most convincing alteration seems to be the theta-gamma phase-amplitude coupling (PAC) (for review see e.g., Kitchigina, V. F. Alterations of Coherent Theta and Gamma Network Oscillations as an Early Biomarker of Temporal Lobe Epilepsy and Alzheimer’s Disease. Front Integr Neurosci 12, 36 (2018)), as also mentioned by the reviewer.

      In this study, the most straightforward way to reduce theta-gamma PAC was to reduce the amplitude of the oscillators’ gain, which affected theta power, gamma power, and theta-gamma PAC (Figure 5 of the revised manuscript). Affecting their synchronization level (i.e., the order parameter) did not affect any of these variables (Figure 5 – Figure Supplement 4).

      In order to alter theta-gamma PAC without affecting theta or gamma power, we believe that more complex changes should be performed in the model, likely at the level of individual neurons in the hippocampal formation. For example, cholinergic deprivation has been previously used in a multi-compartment model of the hippocampal CA3 to mimic Alzheimer’s disease and to draw functional implications on the slowing of theta oscillations and the storage of new information (Menschik, E. D. & Finkel, L. H. Neuromodulatory control of hippocampal function: towards a model of Alzheimer’s disease. Artif Intell Med 13, 99–121 (1998)).

      This has now been added to the limitations section (L.458-465 of the revised manuscript):

      “Finally, we likened conditions of low theta input to pathological states characteristic of oscillopathies such as Alzheimer’s disease, as these conditions disrupted all aspects of theta-gamma oscillations in our model: theta power, gamma power, and theta-gamma PAC (Figure 5). However, it should be noted that changes in theta or gamma power in these pathologies are often unclear, and that the most consistent alteration that has been reported in Alzheimer’s disease is a reduction of theta-gamma PAC (for review, see Kitchigina, 2018). Future work should explore the effects of cellular alterations intrinsic to the hippocampal formation and their impact on theta-gamma oscillations.”

      1. While relevant for the clinical field, the study overall misses an opportunity to explain many experimental findings with this novel model. Although, to this day, clinical use of DBS is mostly restricted to electrical (and thus cell-type-agnostic) stimulation, recent studies focusing on the mechanisms of neurostimulation have manipulated specific cell subtypes in the medial septum and observed effects on hippocampal oscillations (e.g., see Muller & Remy, 2017 for review). Focusing stimulation on CA1 is of course relevant for clinical studies, but testing mechanistic hypotheses by targeting specific cell types could be highly informative. For instance, could the authors reproduce recent optogenetic studies (e.g., Bender et al., 2015 for stimulation of fornix fibers; Etter et al., 2019 and Zutshi et al., 2018 for stimulation of septal inhibitory neurons)? Cell-specific manipulations should at least be discussed by the authors.

      We acknowledge the importance of cell-type-specific manipulation in the septo-hippocampal circuitry. However, our model was designed to study neurostimulation protocols that affect the hippocampal formation, not the medial septum, which is why only the hippocampal formation is composed of biophysically realistic (i.e., conductance-based) neuronal models. To replicate the various studies mentioned by the reviewer (which are all very relevant), we would need to implement a biophysical model of the medial septum, which would be an entirely new project.

      Nevertheless, we can use the existing model to replicate optogenetic studies that induced gamma oscillations in excitatory-inhibitory circuits, using either ramped photostimulation targeting excitatory neurons (Adesnik et al., 2010; Akam et al., 2012; Lu et al., 2015), or pulsed stimulation driving inhibitory cells in the gamma range (Cardin et al., 2009; Iaccarino et al., 2016). In fact, such approaches have been demonstrated not just in the hippocampus but also in the neocortex, and represent a hallmark of local excitatory-inhibitory circuits. To account for these experimental results and replicate them, we have added 4 new figures (Figure 2 and its 3 figure supplements) and an extensive section in the results part (L.151-217 of the revised manuscript):

      “From a conceptual point of view, our model is thus composed of excitatory-inhibitory (E-I) circuits connected in series, with a feedback loop going through a population of coupled phase oscillators. In the next sections, we first describe the generation of gamma oscillations by individual E-I circuits (Figure 2), and illustrate their behavior when driven by an oscillatory input such as theta oscillations (Figure 3). We then present a thorough characterization of the effects of theta input and stimulation amplitude on theta-nested gamma oscillations (Figure 4 and Figure 5). Finally, we present some results on the effects of neurostimulation protocols for restoring theta-nested gamma oscillations in pathological states (Figure 6 and Figure 7).

      Generation of gamma oscillations by E-I circuits

      It is well established that a network of interconnected pyramidal neurons and interneurons can give rise to oscillations in the gamma range, a mechanism termed pyramidal-interneuronal network gamma (PING) (Traub et al., 2004; Onslow et al., 2014; Segneri et al., 2020). This mechanism has been observed in several optogenetic studies with gradually increasing light intensity (i.e., under a ramp input) in several different circuits, such as layer 2-3 pyramidal neurons of the mouse somatosensory cortex (Adesnik et al., 2010), the CA3 field of the hippocampus in rat in vitro slices (Akam et al., 2012), and the non-human primate motor cortex (Lu et al., 2015). In all cases, gamma oscillations emerged above a certain photostimulation intensity threshold, and the frequency of these oscillations was either stable or slightly increased when the intensity was increased further. We sought to replicate these findings with our elementary E-I circuits composed of single-compartment conductance-based neurons driven by a ramping input current (Figure 2 and Figure S2). As an example, all the results in this section will be shown for an E-I circuit with connectivity parameters similar to those of the CA1 field of the hippocampus in our complete model (see section “Hippocampal formation: inputs and connectivity” in the methods).

      For low input currents provided to both neuronal populations, only the highly excitable interneurons were activated (Figure 2A). For a sufficiently high input current (i.e., a strong input that could overcome the inhibition from the fast-spiking interneurons), the pyramidal neurons started spiking as well. As the amplitude of the input increased, the activity of both neuronal populations became synchronized in the gamma range, asymptotically reaching a frequency of about 60 Hz (Figure 2A, bottom panel). Decoupling the populations led to the abolition of gamma oscillations (Figure 2B), as neuronal activity was determined solely by the intrinsic properties of each cell. Interestingly, when the ramp input was provided solely to the excitatory population, we observed that the activity of the pyramidal neurons preceded the activity of the inhibitory neurons, while still preserving the emergence of gamma oscillations (Figure S2A). As expected, decoupling the populations also abolished gamma oscillations, with the excitatory neurons spiking at a frequency determined by their intrinsic properties and the inhibitory population remaining silent (Figure S2B).

      To further characterize the intrinsic properties of individual inhibitory and excitatory neurons, we derived their input-frequency (I-F) curves, which represent the firing rate of individual neurons in response to a tonic input (Figure S3A). We observed that for certain input amplitudes, the firing rates of both types of neurons were within the gamma range. Interestingly, in the absence of noise, each population could by itself generate gamma oscillations that were purely driven by the input and determined by the intrinsic properties of the neurons (Figure S3B). Adding stochastic Gaussian noise to the membrane potential disrupted these artificial oscillations in decoupled populations (Figure S3C). All subsequent simulations were run with similar noise levels to prevent the emergence of artificial gamma oscillations.

      Another potent way to induce gamma oscillations is to drive fast-spiking inhibitory neurons using pulsed optogenetic stimulation at gamma frequencies, a strategy that has been used both in the neocortex (Cardin et al., 2009) and in hippocampal CA1 (Iaccarino et al., 2016). In particular, Cardin and colleagues systematically investigated the effect of driving either excitatory or fast-spiking inhibitory neocortical neurons at frequencies between 10 and 200 Hz (Cardin et al., 2009). They showed that fast-spiking interneurons are preferentially entrained around 40-50 Hz, while excitatory neurons respond better to lower frequencies. To verify the behavior of our model against these experimental data, we simulated pulsed optogenetic stimulation as an intracellular current provided to our reduced model of a single E-I circuit. Stimulation was applied at frequencies between 10 and 200 Hz to excitatory cells only, to inhibitory cells only, or to both at the same time (Figure S4). The population firing rates were used as a proxy for the local field potentials (LFP), and we computed the relative power in a 10-Hz band centered around the stimulation frequency, similar to the method proposed by Cardin et al. (2009). When presented with continuous stimulation across a range of frequencies in the gamma range, interneurons showed the greatest degree of gamma power modulation (Figure S4). Furthermore, when the stimulation was delivered to the excitatory population, the relative power around the stimulation frequency dropped significantly for frequencies above 10 Hz, similar to the reported experimental data (Cardin et al., 2009). The main difference between our simulation results and these experimental data is the specific frequency at which fast-spiking interneurons showed resonance, which was slow gamma around 40 Hz in the mouse barrel cortex and fast gamma around 90 Hz in our model. This could be attributed to several factors, such as differences in the cellular properties of cortical and hippocampal fast-spiking interneurons, or differences in population size and connectivity between the cortex and the hippocampus.”
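
      The relative-power measure described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes a Welch power spectral density of the mean-subtracted population firing rate, a 10-Hz band (±5 Hz) centered on the stimulation frequency, and normalization by the total power between 0 Hz (excluded) and the Nyquist frequency. The function name `relative_band_power` and the synthetic test signal are hypothetical.

```python
import numpy as np
from scipy.signal import welch


def relative_band_power(rate, fs, f_stim, half_width=5.0):
    """Fraction of spectral power within f_stim +/- half_width Hz.

    rate: population firing rate used as an LFP proxy, sampled at fs (Hz).
    Assumption: normalization by total power over (0, Nyquist].
    """
    rate = np.asarray(rate, dtype=float)
    # Segments of up to 1 s give a frequency resolution of about 1 Hz
    nperseg = min(len(rate), int(fs))
    freqs, psd = welch(rate - rate.mean(), fs=fs, nperseg=nperseg)
    total = freqs > 0  # exclude the DC bin
    band = total & (np.abs(freqs - f_stim) <= half_width)
    return psd[band].sum() / psd[total].sum()


# Usage on a synthetic signal: a 40 Hz rhythm embedded in white noise
rng = np.random.default_rng(0)
fs = 1000.0
t = np.arange(0, 2.0, 1 / fs)
x = np.sin(2 * np.pi * 40 * t) + 0.5 * rng.standard_normal(t.size)
p40 = relative_band_power(x, fs, 40.0)
p100 = relative_band_power(x, fs, 100.0)
```

      Sweeping `f_stim` over the stimulation frequencies (10 to 200 Hz) with the corresponding simulated rates would give a curve analogous to Figure S4; the exact normalization used by Cardin et al. (2009) may differ from the one assumed here.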

      Author response image 1.

      Figure 2. Emergence of gamma oscillations in coupled excitatory-inhibitory populations under ramping input to both populations. A. Two coupled populations of excitatory pyramidal neurons (NE = 1000) and inhibitory interneurons (NI = 100) are driven by a ramping current input (0 nA to 1 nA) for 5 s. As the input becomes stronger, oscillations start to emerge (shaded green area), driven by the interactions between excitatory and inhibitory populations. The green inset shows the raster plot (neuronal spikes across time) of the two populations during the green shaded period (red for inhibitory; blue for excitatory). When the input becomes sufficiently strong (shaded magenta area), the populations become highly synchronized and produce oscillations in the gamma range (at approximately 50 Hz). The spectrogram (bottom panel) shows the power of the instantaneous firing rate of the pyramidal population as a function of time and frequency. It reveals the presence of gamma oscillations that emerge around 2 s and increase in frequency until 4 s, when they settle at approximately 60 Hz. B. Similar depiction as in panel A, with the pyramidal-interneuronal populations decoupled. The absence of coupling leads to the abolition of gamma oscillations, with each cell's spiking activity driven by its own inputs and intrinsic properties.

      Author response image 2.

      Figure S2 (Figure 2 – Figure Supplement 1). Emergence of gamma oscillations in coupled excitatory-inhibitory populations under ramping input to the excitatory population. Similar representation as in Figure 2, but with the input provided only to the excitatory population. All conclusions remain the same. In addition, the inhibitory population does not show any spiking activity in the decoupled case.

      Author response image 3.

      Figure S3 (Figure 2 – Figure Supplement 2). Cell-intrinsic spiking activity in decoupled excitatory and inhibitory populations under ramping input. A. Input-Frequency (I-F) curves for excitatory cells (left panel; pyramidal neurons with ICAN) and inhibitory cells (right panel; interneurons, fast-spiking) used in the model. Above a certain tonic input (around 0.35 nA for excitatory and 0.1 nA for inhibitory neurons), neurons can spike in the gamma range. B. Raster plot showing the spiking activity of excitatory (blue, NE = 1000) and inhibitory (red, NI = 100) neurons in decoupled populations under ramping input (top trace) and in the absence of noise in the membrane potential. Despite random initial conditions across neurons, oscillations emerge in both populations due to the intrinsic properties of the cells, with a frequency that is predicted by the respective I-F curves (panel A.). C. Similar representation as panel B. but with the addition of stochastic noise in the membrane potential of each neuron. The presence of noise disrupts the emergence of oscillations in these decoupled populations.
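
      To illustrate how an input-frequency (I-F) curve is obtained, the sketch below uses a leaky integrate-and-fire (LIF) neuron as a deliberately simplified stand-in for the conductance-based pyramidal and fast-spiking models used in the manuscript. The LIF model, its analytic steady-state firing rate, the function name `lif_rate`, and all parameter values (membrane time constant, resistance, threshold, reset, refractory period) are assumptions chosen for illustration; they are not the authors' parameters and do not reproduce the thresholds shown in Figure S3A.

```python
import numpy as np


def lif_rate(I_nA, tau_m=20e-3, R_m=100e6, V_rest=-70e-3, V_th=-50e-3,
             V_reset=-65e-3, t_ref=2e-3):
    """Steady-state firing rate (Hz) of a LIF neuron under tonic current I_nA (nA).

    Below rheobase (V_rest + R_m * I <= V_th) the neuron never fires.
    Above it, the interspike interval is the refractory period plus the time
    for V to relax from V_reset to V_th toward V_inf = V_rest + R_m * I.
    """
    I = np.atleast_1d(np.asarray(I_nA, dtype=float)) * 1e-9  # nA -> A
    v_inf = V_rest + R_m * I                                  # asymptotic voltage (V)
    rate = np.zeros_like(v_inf)
    above = v_inf > V_th
    isi = t_ref + tau_m * np.log((v_inf[above] - V_reset) / (v_inf[above] - V_th))
    rate[above] = 1.0 / isi
    return rate


# I-F curve over a range of tonic inputs (hypothetical parameter values)
currents = np.linspace(0.0, 1.0, 21)   # nA
rates = lif_rate(currents)              # Hz
```

      As in Figure S3A, the curve is zero below a threshold current and then rises monotonically, so a range of tonic inputs maps onto gamma-range firing rates; the conductance-based models in the manuscript would have to be simulated numerically to obtain their actual curves.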

      Author response image 4.

      Beyond these weaknesses, this study has strong utility for researchers wanting to explore hypotheses in the field of neurostimulation. In particular, I see value in such models for exploring more intricate, phase-specific effects of continuous as well as closed-loop stimulation, which is on the rise in systems neuroscience.

      We thank the reviewer for this appreciation of our work and its future perspectives.

      Recommendations For The Authors:

      Line 144, the authors mention that their MI values are erroneous in absence of additive noise - could this be due to the non-sinusoidal nature of the phase signal recorded, and be fixed by upscaling model size?

      We thank the reviewer for this question and suggestion. The main reason behind the errors in the computation of the MI lies in the complete absence of oscillations at specific frequencies. Filtered signals within specific bands produced a power of 0 (or extremely low values), as seen in the power spectral densities. In such cases, the phase signal was not mathematically defined, but the toolbox we used to compute it still returned a numerical result that was inaccurate (for more details on the computation of the MI see Tort et al., 2010). To mitigate this numerical artefact, we decided to add uniform noise to the computed firing rates. This strategy is illustrated in Figure S6 (Figure 3 – Figure Supplement 2), which we have copied below for reference. Alternative approaches could probably have been used, such as increasing the noise in the membrane potential so that neurons would start spiking with firing rates that show more realistic power spectra, even in the absence of external inputs.

      Author response image 5.

      Figure S6 (Figure 3 – Figure Supplement 2). Quantification of PAC with and without noise. A. Quantifying PAC in the absence of noise produced inaccurate identification of the coupled frequency bands, due to the complete absence of oscillations at some frequencies. All analyses are based on the CA1 firing rates (top traces) during a representative simulation. Power spectral densities of these firing rates (left) indicate that some frequencies have a power of 0. PAC of the excitatory population was assessed using two graphical representations, the polar plot (middle) and comodulogram (right), and quantified using the MI. The comodulogram was calculated by computing the MI across 80% overlapping 1-Hz frequency bands in the theta range and across 90% overlapping 10-Hz frequency bands in the gamma range and subsequently plotted as a heat map. In the absence of noise, a slow theta frequency centered around 5 Hz is found to modulate a broad range of gamma frequencies between 40 and 100 Hz. The value indicated on the comodulogram indicates the average MI in the 3-9 Hz theta range and 40-80 Hz gamma range. As in Figure 2, the polar plot represents the amplitude of gamma oscillations (averaged across all theta cycles) at each phase of theta (theta range: 3-9 Hz, phase indicated as angular coordinate) and for different gamma frequencies (radial coordinate, binned in 1-Hz ranges). B. Adding uniform noise to the firing rate (with an amplitude ranging between 15 and 25% of the maximum firing rate) improved the identification of the coupled frequency bands. In this case, the slower theta frequency centered around 5 Hz modulates a gamma band located between 45 and 75 Hz.
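
      A compact sketch of the modulation index (MI) of Tort et al. (2010), together with the additive uniform noise step, is given below. It is an illustration under stated assumptions rather than the toolbox the authors used: zero-phase Butterworth filters, Hilbert-transform phase and amplitude, 18 phase bins, and noise drawn uniformly between 0 and a fraction of the peak firing rate are all choices made here, and the function names are hypothetical. When the gamma-band amplitude is identically zero, the normalized amplitude distribution becomes 0/0 and the MI is undefined, which is the failure mode described above; in practice, adding noise keeps every term finite.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt


def bandpass(x, fs, lo, hi, order=3):
    # Zero-phase Butterworth band-pass (second-order sections for stability)
    sos = butter(order, [lo, hi], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, x)


def modulation_index(x, fs, phase_band=(3, 9), amp_band=(40, 80), n_bins=18):
    """Tort et al. (2010)-style MI: normalized KL divergence between the
    phase-binned mean-amplitude distribution and a uniform distribution."""
    phase = np.angle(hilbert(bandpass(x, fs, *phase_band)))
    amp = np.abs(hilbert(bandpass(x, fs, *amp_band)))
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    idx = np.clip(np.digitize(phase, edges) - 1, 0, n_bins - 1)
    mean_amp = np.array([amp[idx == k].mean() for k in range(n_bins)])
    p = mean_amp / mean_amp.sum()
    entropy = -np.sum(p * np.log(p))
    return (np.log(n_bins) - entropy) / np.log(n_bins)


def add_uniform_noise(rate, frac, rng):
    # Assumption: non-negative noise in [0, frac * max(rate)), e.g. frac = 0.15-0.25
    return rate + rng.uniform(0.0, frac * np.max(rate), size=np.shape(rate))


# Usage: synthetic firing rate with 6 Hz theta modulating 60 Hz gamma amplitude
rng = np.random.default_rng(1)
fs = 1000.0
t = np.arange(0, 20.0, 1 / fs)
theta = np.sin(2 * np.pi * 6 * t)
gamma = np.sin(2 * np.pi * 60 * t)
coupled = 1 + theta + 0.3 * (1 + 0.8 * theta) * gamma
uncoupled = 1 + theta + 0.3 * gamma
mi_coupled = modulation_index(add_uniform_noise(coupled, 0.2, rng), fs)
mi_uncoupled = modulation_index(add_uniform_noise(uncoupled, 0.2, rng), fs)
```

      Repeating `modulation_index` over a grid of overlapping phase and amplitude bands would give a comodulogram like the one described in the caption above; the band widths and overlaps used there are the authors', not those of this sketch.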

      Reviewer #2:

      The main strength of this model is its use of a fairly physiologically detailed model of the hippocampus. The cells are single-compartment models but do include multiple ion channels and are spatially arranged in accordance with the hippocampal structure. This allows the understanding of how ion channels (possibly modifiable by pharmacological agents) interact with system-level oscillations and neurostimulation. The model also includes all the main hippocampal subfields. The other strength is its attention to an important topic, which may be relevant for dementia treatment or prevention, which few modeling studies have addressed. The work has several weaknesses.

      We thank the reviewer for appreciating our detailed description of the hippocampal formation and the focus on neurostimulation applications that aim at treating oscillopathies, especially dementia.

      1. First, while investigations of hippocampal neurostimulation are important there are few experimental studies from which one could judge the validity of the model findings. All its findings are therefore predictions. It would be much more convincing to first show the model is able to reproduce some measured empirical neurostimulation effect before proceeding to make predictions.

      We acknowledge that the results presented in Figures 4-7 of the revised manuscript cannot be compared to existing experimental data, and are therefore purely predictive. Future experimental work is needed to verify these predictions.

      Yet, we would also like to stress that the motivation behind this project was the inadequacy of previous models of theta-nested gamma oscillations (Onslow et al., 2014; Aussel et al., 2018; Segneri et al., 2020) to account for the mechanism of theta phase reset that occurs during electrical stimulation of the fornix or perforant path (Williams and Givens, 2003). Since we could not use these previous models to study the effects of neurostimulation on theta-nested gamma oscillations, we had to modify them to account for a dynamical theta input, which is the main methodological novelty that is reported in our manuscript (Figures 1 and 3 of the revised manuscript).

      Despite the scarcity of experimental studies that could confirm the full model, we sought to replicate a few experimental findings that employed optogenetic stimulation to induce gamma oscillations in individual excitatory-inhibitory circuits. Although not specific to the hippocampus, these studies have shown that gamma oscillations can be induced using either ramped photostimulation targeting excitatory neurons (Adesnik et al., 2010; Akam et al., 2012; Lu et al., 2015), or pulsed stimulation driving inhibitory cells in the gamma range (Cardin et al., 2009; Iaccarino et al., 2016). To account for these experimental results and replicate them, we have added 4 new figures (Figure 2 and its 3 figure supplements) and an extensive section in the results part (L.141-217 of the revised manuscript). The added section and related figures are indicated in our response to reviewer 1, comment 3 (p 2-7).

      2.1. Second, the model is very specific. Or if its behavior is to be considered general it has not been explained why.

      Although the spatial organization and cellular details of the model are indeed very specific, its general behavior (i.e., the production of theta-nested gamma oscillations and theta phase reset) is common to any excitatory-inhibitory circuit interconnected with Kuramoto oscillators. To illustrate this point, we have generalized our approach to the neural mass model developed by Onslow and colleagues (Onslow ACE, Jones MW, Bogacz R. A Canonical Circuit for Generating Phase-Amplitude Coupling. PLoS ONE. 2014 Aug; 9(8):e102591). These results are represented in a new supplementary figure (Figure 3 – Figure Supplement 4), and briefly described in a new paragraph of the results section (L.262-268 of the revised manuscript):

      “Importantly, our approach is generalizable and can be applied to other models producing theta-nested gamma oscillations. For instance, we adapted the neural mass model by Onslow and colleagues (Onslow et al., 2014), replaced the fixed theta input with a set of Kuramoto oscillators, and demonstrated that it could also generate theta phase reset in response to single-pulse stimulation (Figure S8). These results illustrate that the general behavior of our model is not specific to the tuning of individual parameters in the conductance-based neurons, but follows general rules that are captured by the level of abstraction of the Kuramoto formalism.”

      Author response image 6.

      Figure S8 (Figure 3 – Figure Supplement 4). A neural mass model of coupled excitatory and inhibitory neurons driven by Kuramoto oscillators generates theta-nested gamma oscillations and theta phase reset. A. Two coupled neural masses (one excitatory and one inhibitory) driven by Kuramoto oscillators, which represent a dynamical oscillatory drive in the theta range, were used to implement a neural mass equivalent to our conductance-based model represented in Figure 1. Neural masses were modeled using the Wilson-Cowan formalism, with parameters adapted from Onslow et al. (2014) (W_EE = 4.8, W_EI = W_IE = 4, W_II = 0). B. The normalized population firing rates exhibit theta-nested gamma oscillations (middle and bottom panels) in response to the dynamic theta rhythm (top panel). A stimulation pulse delivered at the descending phase of the rhythm to both populations (marked by the inverted red triangle) produces a robust theta phase reset, similar to Figure 3A.
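
      A minimal sketch of this kind of hybrid neural mass model is given below, assuming standard Wilson-Cowan rate equations with a logistic activation and a mean-field Kuramoto population whose collective phase provides the theta drive. Only the coupling weights W_EE = 4.8, W_EI = W_IE = 4 and W_II = 0 come from the caption above; here W_EI is taken as the E-to-I weight and W_IE as the I-to-E weight, which is an assumption about naming. The time constants, activation function, background inputs, drive waveform, coupling strength K and the function names are hypothetical placeholders, and the sketch is not guaranteed to reproduce the gamma oscillations or phase reset of Figure S8 without tuning. It also omits the stimulation pulse and any feedback from the neural masses to the oscillators.

```python
import numpy as np


def logistic(x, gain=4.0, threshold=1.0):
    # Hypothetical activation function (gain and threshold are placeholders)
    return 1.0 / (1.0 + np.exp(-gain * (x - threshold)))


def simulate(T=1.0, dt=1e-4, n_osc=50, f_theta=6.0, K=10.0, A=1.0,
             W_EE=4.8, W_EI=4.0, W_IE=4.0, W_II=0.0,
             tau_E=3e-3, tau_I=3e-3, P_E=0.5, P_I=0.0, seed=0):
    """Euler integration of Wilson-Cowan E/I masses driven by Kuramoto oscillators.

    Returns the E rate, I rate and Kuramoto order parameter R over time.
    With dt / tau < 1, each Euler step is a convex combination of the old
    rate and a logistic value, so E and I stay within [0, 1].
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    omega = 2 * np.pi * rng.normal(f_theta, 0.5, n_osc)  # natural frequencies (rad/s)
    phi = rng.uniform(-np.pi, np.pi, n_osc)             # initial phases
    E, I = 0.0, 0.0
    E_tr, I_tr, R_tr = np.empty(n_steps), np.empty(n_steps), np.empty(n_steps)
    for step in range(n_steps):
        z = np.exp(1j * phi).mean()            # order parameter R * exp(i * psi)
        R, psi = np.abs(z), np.angle(z)
        drive = A * R * (1 + np.cos(psi)) / 2  # hypothetical theta drive in [0, A]
        # Mean-field Kuramoto: dphi_k/dt = omega_k + K * R * sin(psi - phi_k)
        phi = phi + dt * (omega + K * R * np.sin(psi - phi))
        dE = (-E + logistic(W_EE * E - W_IE * I + P_E + drive)) / tau_E
        dI = (-I + logistic(W_EI * E - W_II * I + P_I + drive)) / tau_I
        E, I = E + dt * dE, I + dt * dI
        E_tr[step], I_tr[step], R_tr[step] = E, I, R
    return E_tr, I_tr, R_tr
```

      The mean-field form K·R·sin(ψ − φ_k) is algebraically identical to the pairwise Kuramoto coupling (K/N)·Σ_j sin(φ_j − φ_k), and the order parameter R is the same synchronization measure referred to elsewhere in this response.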

      This simplified model is described in more detail in the methods (L.694-710 of the revised manuscript). Additionally, the generation of gamma oscillations by individual excitatory-inhibitory circuits is now described in detail in the added section “Generation of gamma oscillations by E-I circuits” (L.159-217 of the revised manuscript), which has already been discussed in our response to reviewer 1, comment 3 (p 2-7).

      2.2. For example, the model shows bistability between quiescence and TNGO, however what aspect of the model underlies this, be it some particular network structure or particular ion channel, for example, is not addressed.

      We thank the reviewer for mentioning this point, which we have now addressed. The “bistable” behavior that we reported occurs for values of the theta input that are just below the threshold to induce self-sustained theta-gamma oscillations (Figure 5 of the revised manuscript, point B). Moreover, the presence of the calcium-activated nonspecific (CAN) cationic channel, which is expressed by pyramidal neurons in the entorhinal cortex and in the CA3 and CA1 fields of the hippocampus, is necessary for this behavior to occur. Indeed, abolishing CAN channels in all areas of the model suppresses this behavior. This is now shown in a new supplementary figure (Figure 5 – Figure Supplement 4) and briefly described in the text (L.287-303 of the revised manuscript).

      “In the presence of dynamic theta input, the effects of single-pulse stimulation depended both on theta input amplitude and stimulation amplitude, highlighting different regimes of network activity (Figure 5 and Figure S9, Figure S10, Figure S11). For low theta input, theta-nested gamma oscillations were initially absent and could not be induced by stimulation (Figure 5A). At most, the stimulation could only elicit a few bursts of spiking activity that faded away after approximately 250 ms, similar to the rebound of activity seen in the absence of theta drive. For increasing theta input, the network switched to an intermediate regime: upon initialization at a state with no spiking activity, it could be kicked to a state with self-sustained theta-nested gamma oscillations by a single stimulation pulse of sufficiently high amplitude (Figure 5B). This regime existed for a range of septal theta inputs located just below the threshold to induce self-sustained theta-gamma oscillations without additional stimulation, as characterized by the post-stimulation theta power, gamma power, and theta-gamma PAC (Figure 5D). Removing CAN currents from all areas of the model abolished this behavior (Figure S12), which is interesting given the role of this current in the multistability of EC neurons (Egorov et al., 2002; Fransen et al., 2006) and in the intrinsic ability of the hippocampus to generate theta-nested gamma oscillations (Giovannini et al., 2017). For the highest theta input, the network became able to spontaneously generate theta-nested gamma oscillations, even when initialized at a state with no spiking activity and without additional neurostimulation (Figure 5C).”

      Author response image 7.

      Figure S12 (Figure 5 – Figure Supplement 4). CAN currents are necessary for the production of self-sustained theta-gamma oscillations in response to single-pulse stimulation. A. Same as Figure 5B. B. Similar simulation as panel A, but without CAN currents in the EC, CA3 and CA1 fields of the hippocampus. Removing CAN currents from the model abolishes self-sustained theta-nested gamma oscillations in response to a single stimulation pulse (for the parameters represented in Figure 5, point B).

      Furthermore, we realized that the terminology “bistable” may not be justified as we could not perform a systematic bifurcation analysis, which is typically carried out in simpler neural mass models (e.g., Onslow et al., 2014; Segneri et al., 2020). Therefore, we decided to rephrase the sentences about “bistability” to keep a more general terminology. The following sentences were revised:

      L.20-23: “We showed that, for theta inputs just below the threshold to induce self-sustained theta-nested gamma oscillations, a single stimulation pulse could switch the network behavior from non-oscillatory to a state producing sustained oscillations.”

      L.305-309: “Based on the above analyses, we considered two pathological states: one with a moderate theta input (i.e., moderately weak projections from the medial septum to the EC) that allowed the initiation of self-sustained oscillations by single stimulation pulses (Figure 5, point B), and one with a weaker theta input characterized by the complete absence of self-sustained oscillations even following transient stimulation (Figure 5, point A).”

      L.316-317: “In the case of a moderate theta input and in the presence of phase reset, delivering a pulse at either the peak or trough of theta could induce theta-nested gamma oscillations (Figure 6A and 6C).”

      L.353-357: “A very interesting finding concerns the behavior of the model in response to single-pulse stimulation for certain values of the theta amplitude (Figure 5). For low theta amplitudes, a single stimulation pulse was capable of switching the network behavior from a state with no spiking activity to one with prominent theta-nested gamma oscillations. Whether such an effect can be induced in vivo in the context of memory processes remains an open question.”

      2.3. Similarly for the various phase reset behaviors that are found.

      We would like to clarify that the observed phase reset curves (reported in Figure 3D) are a direct consequence of the choice of an appropriate phase response function for the Kuramoto oscillators representing the medial septum. This choice is inspired by experimentally measured phase response curves from CA3 neurons. These aspects are described briefly in the introduction and in more detail in the methods, as indicated below:

      L.101: “This new hybrid dynamical model could generate both theta-nested gamma oscillations and theta phase reset, following a particular phase response curve (PRC) inspired by experimental literature (Lengyel et al., 2005; Akam et al., 2012; Torben-Nielsen et al., 2010).”

      L.528-537: “Hereafter, we call the term Z(θ) the phase response function, to distinguish it from the PRC obtained from experimental data or simulations (see section below "Data Analysis", "Phase Response Curve"). Briefly, the PRC of an oscillatory system indicates the phase delay or advancement that follows a single pulse, as a function of the phase at which this input is delivered. The phase response function Z(θ) was chosen to mimic as well as possible experimental PRCs reported in the literature (Lengyel et al., 2005; Kwag and Paulsen, 2009; Akam et al., 2012). These PRCs appear biphasic and show a phase advancement (respectively delay) for stimuli delivered in the ascending (respectively descending) slope of theta. To accurately model this behavior, we used the following equation for the phase response function, where θ_peak represents the phase at which the theta rhythm reaches its maximum and the parameter φ_offset controls the desired phase offset from the peak:

      Author response image 8.

      In the figure below, we illustrate the phase response curves of CA3 neurons measured by Lengyel et al. (2005) (panel A), and compare them with our simulated phase response curves (panel B). Note that the conventions for phase advance and phase delay are reversed between the two panels.

      Finally, we would like to acknowledge that the model “is not derived from experimental phase response curves of septal neurons of which there is no direct measurement”, as mentioned by the reviewer in their comment 4 below. Despite the lack of experimental data specific to medial septum neurons, we argue that this phase response function is the only one that mathematically supports the generation of self-sustained theta-nested gamma oscillations in our current model. This statement is illustrated by Figure S7 (Figure 3 – Figure Supplement 3) and is mentioned in the results (L.249-261 of the revised manuscript):

      “We modeled this behavior by a specific term (which we called the phase response function) in the general equation of the Kuramoto oscillators (see methods, Equation 1). Importantly, introducing a phase offset in the phase response function disrupted theta-nested gamma oscillations (Figure S7), which suggests that the septohippocampal circuitry must be critically tuned to be able to generate such oscillations. The strength of phase reset could also be adjusted by a gain that was manually tuned. In the presence of the physiological phase response function and of a sufficiently high reset gain, a single stimulation pulse delivered to all excitatory and inhibitory CA1 neurons could reset the phase of theta to a value close to its peak (Figure 3A). We computed the PRC of our simulated data for different stimulation amplitudes and validated that our neuronal network behaved according to the phase response function set in our Kuramoto oscillators (Figure 3D). It should be noted that including this phase reset mechanism affected the generated theta rhythm even in the absence of stimulation, extending the duration of the theta peak and thereby lowering the frequency of the generated theta rhythm.”

      Author response image 9.

      Figure S7 (Figure 3 – Figure Supplement 3). Network behavior generated by Kuramoto oscillators with nonphysiological phase response functions. Each panel is similar to Figure 3A, but with a different offset added to the phase response function of the Kuramoto oscillators (see methods, Equation 4). The center frequency was set to 6 Hz in all of these simulations. Overall, theta oscillations in these cases are less sinusoidal and show more abrupt phase changes than in the physiological case. A. A phase offset of −π/2 leads to an overall theta oscillation of 4 Hz, with a second peak following the main theta peak. B. A phase offset of +π/2 reduces the peak of theta, resetting the rhythm to the middle of the ascending phase. C. A phase offset of π or −π leads to the CA1 output resetting the theta rhythm to the trough of theta.
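      To make the role of a phase response function in a Kuramoto model more concrete, a minimal sketch is given below. It is not the model used in the manuscript: the exact form of the methods equation is not reproduced here, so the phase response function Z(θ) = −sin(θ + offset), the coupling strength, the reset gain, the rectangular pulse and the function name `simulate_kuramoto_reset` are all illustrative assumptions. Only the 6 Hz center frequency is taken from the caption above. The collective theta phase is read out from the complex order parameter, and "peak" is assumed to mean the maximum of cos θ.

```python
import numpy as np


def simulate_kuramoto_reset(n=50, f0=6.0, f_spread=0.5, coupling=10.0,
                            reset_gain=0.0, offset=0.0, pulse_window=None,
                            dt=1e-4, t_max=1.0, seed=0):
    """Euler integration of Kuramoto oscillators with a phase-reset term (toy sketch).

    dtheta_i/dt = omega_i + (K/N) sum_j sin(theta_j - theta_i) + G * I(t) * Z(theta_i)
    with the hypothetical phase response function Z(theta) = -sin(theta + offset)
    and I(t) = 1 inside `pulse_window` = (start, stop) in seconds, 0 elsewhere.
    Returns the order parameter |R| and the collective phase arg(R) at every step.
    """
    rng = np.random.default_rng(seed)
    omega = 2 * np.pi * rng.normal(f0, f_spread, n)  # natural frequencies (rad/s)
    theta = rng.uniform(0.0, 2 * np.pi, n)           # random initial phases (rad)
    steps = int(round(t_max / dt))
    order = np.empty(steps)
    mean_phase = np.empty(steps)
    for k in range(steps):
        t = k * dt
        z = np.mean(np.exp(1j * theta))              # complex order parameter R
        order[k], mean_phase[k] = np.abs(z), np.angle(z)
        # Mean-field form of the coupling: (K/N) sum_j sin(theta_j - theta_i) = K|R| sin(arg R - theta_i)
        drift = omega + coupling * np.abs(z) * np.sin(np.angle(z) - theta)
        in_pulse = pulse_window is not None and pulse_window[0] <= t < pulse_window[1]
        if in_pulse:
            # Hypothetical reset term: pulls every phase toward the stable point theta = -offset
            drift = drift - reset_gain * np.sin(theta + offset)
        theta = theta + dt * drift
    return order, mean_phase
```

      In this toy version, a strong pulse pulls all phases toward θ = −offset, so an offset of 0 resets the collective phase close to the peak of cos θ, while an offset of π moves the reset target to its trough. The sketch only illustrates how an offset shifts the reset target; it does not reproduce the network-level effects reported in Figure S7.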

      2.4. We may wonder whether a different hippocampal model of TNGO, of which there are many published (for example [1-6]) would show the same effect under neurostimulation. This seems very unlikely […]

      [1] Hyafil A, Giraud AL, Fontolan L, Gutkin B. Neural cross-frequency coupling: connecting architectures, mechanisms, and functions. Trends in neurosciences. 2015 Nov 1;38(11):725-40.

      [2] Tort AB, Rotstein HG, Dugladze T, Gloveli T, Kopell NJ. On the formation of gamma-coherent cell assemblies by oriens lacunosum-moleculare interneurons in the hippocampus. Proceedings of the National Academy of Sciences. 2007 Aug 14;104(33):13490-5.

      [3] Neymotin SA, Lazarewicz MT, Sherif M, Contreras D, Finkel LH, Lytton WW. Ketamine disrupts theta modulation of gamma in a computer model of hippocampus. Journal of Neuroscience. 2011 Aug 10;31(32):11733-43.

      [4] Ponzi A, Dura-Bernal S, Migliore M. Theta-gamma phase-amplitude coupling in a hippocampal CA1 microcircuit. PLOS Computational Biology. 2023 Mar 23;19(3):e1010942.

      [5] Bezaire MJ, Raikov I, Burk K, Vyas D, Soltesz I. Interneuronal mechanisms of hippocampal theta oscillations in a full-scale model of the rodent CA1 circuit. Elife. 2016 Dec 23;5:e18566.

      [6] Chatzikalymniou AP, Gumus M, Skinner FK. Linking minimal and detailed models of CA1 microcircuits reveals how theta rhythms emerge and their frequencies controlled. Hippocampus. 2021 Sep;31(9):982-1002.

      The highlighted publications, while very important in their findings regarding theta-gamma phase-amplitude coupling, focused on specific subfields of the hippocampus. In our work, we aimed to develop a model that includes the different anatomical divisions of the hippocampal formation, while still exhibiting theta-nested gamma oscillations, which is why we decided to expand the model by Aussel et al. (2018). Exploring the behavior of all these different hippocampal models under neurostimulation is beyond the scope of the current manuscript.

      Nevertheless, we have added a new figure (Figure 3 – Figure Supplement 4) showing an adaptation of our modeling approach to a generic neural mass model of theta-nested gamma oscillations (Onslow et al., 2014), which illustrates the generalizability of our findings and is described in detail in our response to comment 2.1. Moreover, we have further addressed the comments of the reviewers regarding bistability and phase response curves in our responses to comments 2.2 and 2.3.

      Furthermore, we have added references to all 6 of these publications in the revised version of the manuscript:

      L.43-50: “Moreover, the modulation of gamma oscillations by the phase of theta oscillations in hippocampal circuits, a phenomenon termed theta-gamma phase-amplitude coupling (PAC), correlates with the efficacy of memory encoding and retrieval (Jensen and Colgin, 2007; Tort et al., 2009; Canolty and Knight, 2010; Axmacher et al., 2010; Fell and Axmacher, 2011; Lisman and Jensen, 2013; Lega et al., 2016). Experimental and computational work on the coupling between oscillatory rhythms has indicated that it originates from different neural architectures and correlates with a range of behavioral and cognitive functions, enabling the long-range synchronization of cortical areas and facilitating multi-item encoding in the context of memory (Hyafil et al., 2015).”

      L.415-426: “In terms of neuronal cell types, we also made an important simplification by considering only basket cells as the main class of inhibitory interneuron in the whole hippocampal formation. However, it should be noted that many other types of interneurons exist in the hippocampus and have been modeled in various works with higher computational complexity (e.g., Bezaire et al., 2016; Chatzikalymniou et al., 2021). Among these various interneurons, oriens-lacunosum moleculare (OLM) neurons in the CA1 field have been shown to play a crucial role in synchronizing the activity of pyramidal neurons at gamma frequencies (Tort et al., 2007), and in generating theta-gamma PAC (e.g., Neymotin et al., 2011; Ponzi et al., 2023). Additionally, these cells may contribute to the formation of specific phase relationships within CA1 neuronal populations, through the integration between inputs from the medial septum, the EC, and CA3 (Mysin et al., 2019). Future work is needed to include more diverse cell types and detailed morphologies modeled through multiple compartments.”

      2.5. […] and indeed the quiescent state itself shown by this model seems quite artificial.

      We would like to clarify that the “quiescent state” mentioned by the reviewer is simply a state where the theta input is too low to induce theta-nested gamma oscillations. In this regime, neurons are active only due to the noise term in the membrane potential, which was adjusted based on Figure S3 (Figure 2 – Figure Supplement 2, shown below) to the minimal level needed to disrupt artificial synchronization in decoupled populations. For an input of 0 nA, we acknowledge that this network is indeed fully quiescent (i.e., does not show any spiking activity). However, as soon as the input increases, spontaneous spiking activity starts to appear, with an average firing rate that depends on the input amplitude and is characterized by the input-frequency curves (panel A.). Please note that adding more noise could eliminate the observed quiescence in the absence of any input, but it would not qualitatively affect the reported results.

      Author response image 10.

      Figure S3 (Figure 2 – Figure Supplement 2). Cell-intrinsic spiking activity in decoupled excitatory and inhibitory populations under ramping input. A. Input-Frequency (I-F) curves for excitatory cells (left panel; pyramidal neurons with ICAN) and inhibitory cells (right panel; fast-spiking interneurons) used in the model. Above a certain tonic input (around 0.35 nA for excitatory and 0.1 nA for inhibitory neurons), neurons can spike in the gamma range. B. Raster plot showing the spiking activity of excitatory (blue, NE = 1000) and inhibitory (red, NI = 100) neurons in decoupled populations under ramping input (top trace) and in the absence of noise in the membrane potential. Despite random initial conditions across neurons, oscillations emerge in both populations due to the intrinsic properties of the cells, with a frequency that is predicted by the respective I-F curves (panel A.). C. Similar representation as panel B., but with the addition of stochastic noise in the membrane potential of each neuron. The presence of noise disrupts the emergence of oscillations in these decoupled populations.

      2.6. Some indication that particular ion channels, CAN and M are relevant is briefly provided and the work would be much improved by examining this aspect in more detail.

      We thank the reviewer for acknowledging the importance of these ion channels. We have now added a new supplementary figure (Figure 5 – Figure Supplement 4), which is described in more detail in our response to comment 2.2 and illustrates the role of the CAN current in the generation of theta-nested gamma oscillations following a single stimulation pulse. Moreover, we would like to stress that the impact of CAN currents on the ability of the hippocampus to generate theta-nested gamma oscillations intrinsically, i.e., in the absence of persistent external input, has already been investigated in detail by a previous computational study cited in our manuscript (Giovannini F, Knauer B, Yoshida M, Buhry L. The CAN-In network: A biologically inspired model for self-sustained theta oscillations and memory maintenance in the hippocampus. Hippocampus. 2017 Apr;27(4):450–463).

      2.7. In summary, the work would benefit from an intuitive analysis of the basic model ingredients underlying its neurostimulation response properties.

      We thank the reviewer for this suggestion. By addressing the reviewer’s previous comments (reviewer 2, comments 2.1 and 2.2), which partly overlap with those of the first reviewer (reviewer 1, comment 3), we believe we have improved the manuscript and provided key information on how the model responds to neurostimulation.

      3.1. Third, while the model is fairly realistic, a considerable number of important factors are not included, and in fact there are much more detailed hippocampal models available (for example [5,6]). In particular, it includes only excitatory cells and a single type of inhibitory cell. This is particularly important since there are many models and experimental studies in which specific cell types, for example OLM and VIP cells, are strongly implicated in TNGO.

      [5] Bezaire MJ, Raikov I, Burk K, Vyas D, Soltesz I. Interneuronal mechanisms of hippocampal theta oscillations in a full-scale model of the rodent CA1 circuit. Elife. 2016 Dec 23;5:e18566.

      [6] Chatzikalymniou AP, Gumus M, Skinner FK. Linking minimal and detailed models of CA1 microcircuits reveals how theta rhythms emerge and their frequencies controlled. Hippocampus. 2021 Sep;31(9):982-1002.

      We thank the reviewer for pointing out these interesting avenues for future studies. As indicated in previous responses (reviewer 1, comment 1; reviewer 2, comment 2.4), we have added several paragraphs to discuss these limitations, the rationale behind our simplifications, and potential improvements. In particular, we have added the following paragraphs to discuss our simplifications in terms of connectivity and cell types:

      Anatomical connectivity:

      L.141-150: “Biologically, GABAergic neurons from the medial septum project to the EC, CA3, and CA1 fields of the hippocampus (Toth et al., 1993; Hajós et al., 2004; Manseau et al., 2008; Hangya et al., 2009; Unal et al., 2015; Müller and Remy, 2018). Although the respective roles of these different projections are not fully understood, previous computational studies have suggested that the direct projection from the medial septum to CA1 is not essential for the production of theta in CA1 microcircuits (Mysin et al., 2019). Since our modeling of the medial septum is only used to generate a dynamic theta rhythm, we opted for a simplified representation where the medial septum projects only to the EC, which in turn drives the different subfields of the hippocampus. In our model, Kuramoto oscillators are therefore connected to the EC neurons and they receive projections from CA1 neurons (see methods for more details).”

      Cell types:

      L.415-426: “In terms of neuronal cell types, we also made an important simplification by considering only basket cells as the main class of inhibitory interneuron in the whole hippocampal formation. However, it should be noted that many other types of interneurons exist in the hippocampus and have been modeled in various works with higher computational complexity (e.g., Bezaire et al., 2016; Chatzikalymniou et al., 2021). Among these various interneurons, oriens-lacunosum moleculare (OLM) neurons in the CA1 field have been shown to play a crucial role in synchronizing the activity of pyramidal neurons at gamma frequencies (Tort et al., 2007), and in generating theta-gamma PAC (e.g., Neymotin et al., 2011; Ponzi et al., 2023). Additionally, these cells may contribute to the formation of specific phase relationships within CA1 neuronal populations, through the integration between inputs from the medial septum, the EC, and CA3 (Mysin et al., 2019). Future work is needed to include more diverse cell types and detailed morphologies modeled through multiple compartments.”

      3.2. Other missing ingredients one may think might have a strong impact on model response to neurostimulation (in particular stimulation trains) include the well-known short-term plasticity between different hippocampal cell types and active dendritic properties.

      We agree with the reviewer that plasticity mechanisms are important to include in future work, which we had already mentioned in the limitations section of the manuscript:

      L.436-443: “Importantly, we did not consider learning through synaptic plasticity, even though such mechanisms could drastically modify synaptic conduction for the whole network (Borges et al., 2017). Even more interestingly, the inclusion of spike-timing-dependent plasticity would enable the investigation of stimulation protocols aimed at promoting LTP, such as theta-burst stimulation (Larson et al., 2015). This aspect would be of utmost importance to make a link with memory encoding and retrieval processes (Axmacher et al., 2006; Tsanov et al., 2009; Jutras et al., 2013) and with neurostimulation studies for memory improvement (Titiz et al., 2017; Solomon et al., 2021).”

      4. Fourth, the MS model seems somewhat unsupported. It is modeled as a set of coupled oscillators that synchronize. However, there is also a phase reset mechanism included. This mechanism is important because it underlies several of the phase reset behaviors shown by the full model. However, it is not derived from experimental phase response curves of septal neurons, of which there is no direct measurement. The work would benefit from the use of a more biologically validated MS model.

      We would like to confirm that the phase reset mechanism is indeed at the core of using Kuramoto oscillators to model a particular system. For more details about our choice of a phase response function and the obtained results in terms of phase response curves, we refer the reader to our response to comment 2.3.

      Generally speaking, we chose to use Kuramoto oscillators as it is the simplest model that can provide an oscillatory input to another system while including a phase reset mechanism. This set of oscillators was used to replace the fixed sinusoidal wave that represented theta inputs in previous models (Onslow et al., 2014; Aussel et al., 2018; Segneri et al., 2020). Kuramoto oscillators are a well-established model of synchronization in various fields of physics. They have also been used in neuroscience to model the phase reset of collective rhythms (Levnajić et al. 2010), and the effects of DBS on the basal ganglia network in Parkinson’s disease (Tass et al. 2003, Ebert et al. 2014, Weerasinghe et al. 2019).
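      The synchronization property of Kuramoto oscillators referred to above can be illustrated with the classic model in its mean-field form. The sketch below is a textbook illustration, not the septal model used in the manuscript; the Gaussian spread of natural frequencies (1 rad/s), the population size, the integration step and the function name `mean_order_parameter` are assumptions chosen for the example. It compares the time-averaged order parameter r for an uncoupled and a strongly coupled population.

```python
import numpy as np


def mean_order_parameter(coupling, n=200, sigma=1.0, dt=0.01, t_max=20.0, seed=1):
    """Time-averaged Kuramoto order parameter r over the second half of a run."""
    rng = np.random.default_rng(seed)
    omega = rng.normal(0.0, sigma, n)        # natural frequencies (rad/s), co-rotating frame
    theta = rng.uniform(0.0, 2 * np.pi, n)   # random initial phases
    steps = int(round(t_max / dt))
    r_hist = []
    for k in range(steps):
        z = np.mean(np.exp(1j * theta))      # r * exp(i * psi)
        if k >= steps // 2:                  # discard the first half as transient
            r_hist.append(np.abs(z))
        # Mean-field identity: (K/N) sum_j sin(theta_j - theta_i) = K r sin(psi - theta_i)
        theta = theta + dt * (omega + coupling * np.abs(z) * np.sin(np.angle(z) - theta))
    return float(np.mean(r_hist))
```

      For a Gaussian frequency distribution with σ = 1 rad/s, the critical coupling is K_c = 2/(π g(0)) ≈ 1.6 rad/s, so with K = 0 the population should stay incoherent (r of the order of 1/√N), whereas with K = 8 almost all oscillators should lock (r close to 1).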

      More detailed models of the medial septum exist in the literature (e.g., Wang et al. 2002, Hajós et al. 2004) and model the GABAergic effects of the septal projections onto the hippocampal formation. However, it is not trivial to infer the connectivity parameters and the degree of innervation between the hippocampus and the medial septum. Furthermore, the claims made in our study do not necessarily depend on the nature of the projections between the two areas. Therefore, we decided to represent the medial septum in a conceptual way and focus mostly on the effects of these projections rather than replicating them in detail.

      Aussel, Amélie, Laure Buhry, Louise Tyvaert, and Radu Ranta. “A Detailed Anatomical and Mathematical Model of the Hippocampal Formation for the Generation of Sharp-Wave Ripples and Theta-Nested Gamma Oscillations.” Journal of Computational Neuroscience 45, no. 3 (December 2018): 207–21. https://doi.org/10.1007/s10827-018-0704-x.

      Ebert, Martin, Christian Hauptmann, and Peter A. Tass. “Coordinated Reset Stimulation in a Large-Scale Model of the STN-GPe Circuit.” Frontiers in Computational Neuroscience 8 (2014): 154. https://doi.org/10.3389/fncom.2014.00154.

      Hajós, M., W.E. Hoffmann, G. Orbán, T. Kiss, and P. Érdi. “Modulation of Septo-Hippocampal θ Activity by GABAA Receptors: An Experimental and Computational Approach.” Neuroscience 126, no. 3 (January 2004): 599–610. https://doi.org/10.1016/j.neuroscience.2004.03.043.

      Levnajić, Zoran, and Arkady Pikovsky. “Phase Resetting of Collective Rhythm in Ensembles of Oscillators.” Physical Review E 82, no. 5 (November 3, 2010): 056202. https://doi.org/10.1103/PhysRevE.82.056202.

      Onslow, Angela C. E., Matthew W. Jones, and Rafal Bogacz. “A Canonical Circuit for Generating Phase-Amplitude Coupling.” Edited by Adriano B. L. Tort. PLoS ONE 9, no. 8 (August 19, 2014): e102591. https://doi.org/10.1371/journal.pone.0102591.

      Segneri, Marco, Hongjie Bi, Simona Olmi, and Alessandro Torcini. “Theta-Nested Gamma Oscillations in Next Generation Neural Mass Models.” Frontiers in Computational Neuroscience 14 (2020). https://doi.org/10.3389/fncom.2020.00047.

      Tass, Peter A. “A Model of Desynchronizing Deep Brain Stimulation with a Demand-Controlled Coordinated Reset of Neural Subpopulations.” Biological Cybernetics 89, no. 2 (August 1, 2003): 81–88. https://doi.org/10.1007/s00422-003-0425-7.

      Wang, Xiao-Jing. “Pacemaker Neurons for the Theta Rhythm and Their Synchronization in the Septohippocampal Reciprocal Loop.” Journal of Neurophysiology 87, no. 2 (February 1, 2002): 889–900. https://doi.org/10.1152/jn.00135.2001.

      Weerasinghe, Gihan, Benoit Duchet, Hayriye Cagnan, Peter Brown, Christian Bick, and Rafal Bogacz. “Predicting the Effects of Deep Brain Stimulation Using a Reduced Coupled Oscillator Model.” PLoS Computational Biology 15, no. 8 (August 8, 2019): e1006575. https://doi.org/10.1371/journal.pcbi.1006575.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Weaknesses:

      (1) The authors themselves propose in their Introduction that the "ECM-associated changes are increasingly perceived as causative, rather than consequential"; however, they have not conducted mechanistic (gain of function/loss of function) studies either in vitro or in vivo from any of their identified targets to truly prove causality. This remains one of the limitations of this study. Thus, future studies should investigate this point in detail. For instance, it would have been intriguing to dissect if knocking out specific genes involved in one specific model or genes common to both would yield distinct phenotypic outcomes.

      We agree with the reviewer that our study does not provide mechanistic verification of the function of the identified targets with a suggested role in the development and/or resolution of fibrosis. The current study was primarily conducted to identify these possible targets, with a focus on the differences in the extracellular matrix deposited in two selected models of liver fibrosis with different modes of action. Conducting further studies with knock-out/knock-in models to verify the causal roles of the proposed targets was beyond the scope of this study. However, we are fully aware of the potential of the identified molecules, and further studies to dissect their roles in liver diseases are part of our future plans.

      (2) The majority of the conclusions are derived primarily from the proteomic analyses. Although well conducted, it would strengthen the study to corroborate some of the major findings by other means such as IHC/IF with the corresponding quantifications and not only representative images.

      We have now provided additional IF images and their quantifications for our major MS findings, in accordance with the Reviewer’s suggestions, to strengthen the significance of the MS data (see detailed answer below).

      Reviewer #2:

      Weaknesses:

      (1) As it currently stands, the data, whilst extensive, is primarily focussed on the proteomic data which is fairly descriptive and I am not clear on the additional insight gained in their approach that is not already detailed from the extensive transcriptomic studies. The manuscript overall would benefit from some mechanistic functional insight to provide new additional modes of action relevant to fibrosis progression.  

      We agree with the reviewer that our study could initially appear descriptive. However, this characteristic is inherent to most omics studies, which tend to provide hypothesis-free testing of a large number of analytes in order to find a multitude of candidate biomarkers(1). Importantly, we believe our study provides insights that go beyond the scope of previously published transcriptomic analyses.

      Specifically, our work focuses on compartment-specific changes in the liver proteome, with an emphasis on the extracellular matrix (ECM) composition and alterations in protein solubility—features that cannot be captured by transcriptomic studies. The matrisome is more than a structural scaffold; it functions as a reservoir for secreted factors, including growth factors and cytokines, which modulate the local cellular microenvironment. Transition dynamics between the insoluble matrisome and soluble protein pools influence the signaling capabilities and bioavailability of these factors. Moreover, fibrous ECM assemblies directly impact tissue mechanics, providing cells embedded within the matrix with spatially distinct biochemical and biomechanical contexts. The current understanding of matrisome composition in the context of specific liver disease etiologies is limited. Dr. Friedman, in his 2022 review on hepatic fibrosis, highlights the unmet need to elucidate etiology-specific protein signatures of the cirrhotic liver matrisome, which could serve as disease staging or prognostic biomarkers(2). Our study addresses this gap by characterizing the distinct matrisome profiles associated with hepatotoxic- versus cholestasis-driven liver injury. We believe our findings lay the groundwork for identifying etiology-specific biomarkers and potential therapeutic targets for antifibrotic interventions, offering a novel layer of insight beyond what transcriptomic data alone can provide.

      (2) Whilst there is some human data presented, it is a minimal analysis without quantification that would imply relevance to the disease state. Although studying disease progression in animals is a fundamental aspect of understanding the full physiological response of fibrotic disease, the lack of more human insight makes it difficult to support their suggestion that the targets identified will be of use to treat human disease.

      We thank the reviewer for this comment. Our study primarily focuses on utilizing animal models to explore the fundamental physiological processes underlying the development and resolution of fibrotic liver disease. To address the translational relevance of our findings, we concentrated on clusterin, one of the key target proteins identified during our analysis of the insoluble proteome. Specifically, we investigated its localization in human liver samples, focusing on its association with collagen deposits (Figure 6F). To this end, we analyzed human liver samples of diverse etiologies and varying degrees of fibrotic damage, including samples representing four distinct stages of HCV-induced fibrosis (Figure 6F, lower panel). While this analysis highlights the presence and localization of clusterin in fibrotic deposits, we acknowledge that our study does not include extensive quantification or mechanistic insight into clusterin's role in human liver fibrosis. We believe that the data presented in this manuscript provide a valuable foundation for future investigations into clusterin’s involvement in liver fibrosis across different etiologies. Recognizing the translational importance of this work, we have already initiated a prospective study involving human patients, which aims to conduct a more comprehensive analysis of clusterin's function and its potential as a therapeutic target.

      To further support our findings on clusterin's role in fibrosis development and resolution and to address the reviewer's concern, we quantified clusterin deposits in the available human samples representing four distinct stages of HCV-induced fibrotic disease. Using immunofluorescence (IF) images at a 20x field of view, we measured both clusterin and collagen deposits to illustrate changes in clusterin abundance during fibrosis progression (stages F1–F4) in relation to collagen deposition dynamics. The quantified data have been included for the reviewer's consideration (Author response image 1). However, it is important to emphasize that this quantification was conducted on a single human sample per fibrotic stage, which limits the statistical robustness of the analysis. A more comprehensive evaluation involving additional patient samples would be necessary for a more definitive conclusion. For this reason, we propose to include these results solely in our rebuttal letter and to incorporate a more extensive analysis in our intended follow-up study, where larger cohorts will allow for a thorough investigation of clusterin's role in human liver fibrosis.

      Author response image 1.

      Dynamics of clusterin abundance with the development of HCV-induced fibrotic disease in comparison to the changes in collagen deposits. IF images of human liver sections from different stages of chronic HCV infection were immunolabeled for clusterin and collagen 1. Clusterin- and collagen-positive (+) areas (as %) from three to eight fields of view (20x objective) were evaluated for each fibrosis stage (F1-F4).
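      The percentage-positive-area measure described in the legend above can be sketched as follows. This is a hypothetical illustration, not the authors' image-analysis pipeline: the fixed intensity threshold, the use of one intensity channel per marker and the function names are assumptions. It also makes explicit that, with a single sample per stage, the spread across three to eight fields of view reflects within-sample variability only, in line with the caveat stated above.

```python
import numpy as np


def percent_positive_area(channel, threshold):
    """Percentage of pixels in one field of view whose intensity exceeds `threshold`."""
    channel = np.asarray(channel, dtype=float)
    return 100.0 * np.count_nonzero(channel > threshold) / channel.size


def stage_summary(fields, threshold):
    """Mean and sample SD of % positive area across the fields of view of one stage.

    With a single sample per stage, this SD captures field-to-field variability only,
    not patient-to-patient variability.
    """
    values = np.array([percent_positive_area(f, threshold) for f in fields])
    sd = values.std(ddof=1) if values.size > 1 else 0.0
    return float(values.mean()), float(sd)
```

      In practice the threshold would be chosen per marker (for example, one value for the clusterin channel and another for collagen 1) and kept constant across stages, so that differences between F1 and F4 are not driven by the thresholding itself.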

      (3) Some of the terminology is incorrect while discussing these models of injury used and care should be taken. For example - both models are toxin-induced and I do not think these data have any support that the DDC model has a higher carcinogenic risk. An investigation into the tumour-induced risk would require significant additional models. These types of statements are incorrect and not supported by this study.

      We are grateful to the reviewer for drawing our attention to the incorrect use of the term "toxin-induced". In two instances, where the wording was incorrect, we have corrected the term to hepatotoxin-induced as it was originally intended. While we believe that our proteomic signature data and identified signaling pathways suggest a potential carcinogenic risk associated with the cholestatic, but not the hepatotoxic model, we have toned down the statements on this issue in the article to respect the reviewer's perspective. These changes, which are highlighted in the track changes mode of the article, aim to make the conclusions of the study more precise and thus improve the clarity of our conclusions.

      Reviewer #1 (Recommendations for the authors): 

      (1) In the Discussion, the authors could consider pointing out that one limitation of the study is a lack of mechanistic (gain of function/loss of function) studies either in vitro or in vivo from any of their identified targets to truly prove causality. 

      As noted earlier, we fully agree with both reviewers that a limitation of this study is its descriptive nature, which is an inherent characteristic of omics-based research. In our manuscript, we aimed to "determine compartment-specific proteomic landscapes of liver fibrosis and delineate etiology-specific ECM components," with the overarching goal of providing a foundation for future antifibrotic therapies.

      The insights gained from our study will indeed serve as a critical basis for subsequent research, where we will prioritize mechanistic investigations to elucidate the roles of the identified targets. While we acknowledge the importance of gain- or loss-of-function studies to establish causality, we believe this falls outside the primary scope of the current manuscript. Instead, we envision these mechanistic approaches as key elements of our future research efforts. For this reason, we feel it is not necessary to further expand on this limitation in the current discussion.

      (2) The majority of the conclusions are derived primarily from the proteomic analyses. Although well conducted, it would strengthen the study to corroborate some of the major findings by other means such as IHC/IF with the corresponding quantifications and not only representative images. For example, the IF stainings for ECM1 should also be quantified - ECM1. 

      To strengthen our MS findings on ECM1 expression and to address the reviewer's concern, we have now included quantification of ECM1 using IF staining at selected time points in Figure S7E and we refer to these data in the Results section (p. 12 of the current manuscript). The IF quantification data correspond well to the MS data showing increase in ECM1 expression with fibrosis development and decline with partial fibrosis resolution.

      (3) S1 - it would be important to show Sirius Red images over the time course, especially for CCl4 T4 where fibrosis resolution is occurring. Proteomics data also show this group clusters more closely with control mice and seeing a representative image would add further credibility to this point. 

      The requested Sirius Red images are now part of Figure S1B, documenting partial fibrosis resolution and overall parenchymal healing at T4 in both models.

      (4) How comparable are the periods of the two models? 2 weeks in one model may not be the same as 2 weeks in the other depending on the severity of the pathogenesis. 

      We appreciate the reviewer’s comment regarding the comparability of time points between the two models. Indeed, the temporal dynamics of fibrosis development differ between the models employed in our study, and we have carefully considered this aspect to ensure the validity of our comparative analysis. To address this, we started our comparisons at a stage corresponding to the onset of fibrosis in each model. Specifically, quantification of Sirius Red-positive areas, indicative of collagen deposition (Figure S1B), revealed that 2 weeks of DDC treatment produced a comparable extent of fibrosis to that observed after 3 weeks of CCl₄ treatment. This point was designated as the initial fibrosis time point (T1, Figure S1B), from which further treatment was applied to induce more advanced fibrosis. This approach allowed us to standardize the comparison of fibrosis progression between the two models.

      (5) Figure 4A-D - cell-type-specific signatures should be corroborated by actual IHC or IF stainings if possible. HNF4a (hepatocytes), CK19 (cholangiocytes), aSMA (activated fibrogenic HSCs), immune cells (B220, F4/80, Cd11b, CD11c etc).

      We thank the reviewer for this valuable suggestion. To strengthen our analysis, we have now complemented the box plots of cell type-specific signatures derived from the MS data (Figure 4A-D) with immunofluorescence (IF) staining, which has been included in the Supplemental Data (Figure S6). Specifically, we provide representative IF images from control and T1-T4 time points for each model, documenting the changes in abundance with treatment in:

      A) Hepatocytes (HNF4α), activated hepatic stellate cells (αSMA), and cholangiocytes (CK19).

      B) Immune cell populations, including B cells (B220) and macrophages/monocytes/Kupffer cells (F4/80), as these immune cell groups were not only identified in our MS analysis but also have established roles in the selected models(3, 4, 5). 

      The representative images shown in Figure S6 show the dynamics of the cellular populations in each of the models, which correspond well with the MS data (compare Figures 4A-D and S5). These additional data further validate our findings and enhance the robustness of our conclusions.

      References:

      (1) Thiele M, Villesen IF, Niu L, et al. Opportunities and barriers in omics-based biomarker discovery for steatotic liver diseases. J Hepatol 2024;81:345-359.

      (2) Friedman SL, Pinzani M. Hepatic fibrosis 2022: Unmet needs and a blueprint for the future. Hepatology 2022;75:473-488.

      (3) Best J, Verhulst S, Syn WK, et al. Macrophage Depletion Attenuates Extracellular Matrix Deposition and Ductular Reaction in a Mouse Model of Chronic Cholangiopathies. PLoS One 2016;11:e0162286.

      (4) Aoyama T, Inokuchi S, Brenner DA, et al. CX3CL1-CX3CR1 interaction prevents carbon tetrachloride-induced liver inflammation and fibrosis in mice. Hepatology 2010;52:1390-400.

      (5) Yang W, Chen L, Zhang J, et al. In-Depth Proteomic Analysis Reveals Phenotypic Diversity of Macrophages in Liver Fibrosis. J Proteome Res 2024;23:5166-5176.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The current manuscript focuses on adenine phosphoribosyltransferase (Aprt) and how loss of its function affects the nervous system. It places this in the context of Lesch-Nyhan disease, a rare hereditary disease linked to hypoxanthine-guanine phosphoribosyltransferase (HGPRT). Since HGPRT appears to be absent in Drosophila, the study initially focuses on Aprt and shows that aprt mutants have a decreased lifespan and altered uric acid levels (the latter can be attenuated by allopurinol treatment). Moreover, aprt mutants show defects in locomotor reactivity behaviors. A comparable phenotype can be observed when aprt is knocked down specifically in dopaminergic cells. Interestingly, glia-specific knock-down also caused a similar behavioral defect, which could not be rescued by re-expressing UAS-aprt, whereas neuronal re-expression did rescue the mutant phenotype. Moreover, mutants, as well as pan-neuronal and pan-neuronal plus glial RNAi for aprt, caused sleep defects. Based on immunostainings, dopamine levels are increased; UPLC shows that adenosine levels are reduced, and PCR showed that Ent2 levels are increased (but not AdoR levels). Moreover, aprt mutants display seizure-like behaviors, which can be partly rescued by purine feeding (adenosine and N6-methyladenosine). Finally, expression of the human HGPRT also causes locomotor defects.

      The authors provide a wide range of genetic experimental data to assess behavior and some molecular assessment on how the defects may emerge. It is clearly written, and the arguments follow the experimental evidence that is provided. The findings provide a new example of how manipulating specific genes in the fruit fly allows the study of fundamental molecular processes that are linked to a human disease.

      We thank the reviewer for his clear understanding and positive assessment of our work.

      Reviewer #2 (Public Review):

      The manuscript by Petitgas et al demonstrates that loss of function for the only enzyme responsible for the purine salvage pathway in fruit-flies reproduces the metabolic and neurologic phenotypes of human patients with Lesch-Nyhan disease (LND). LND is caused by mutations in the enzyme HGPRT, but this enzyme does not exist in fruit-flies, which instead only have Aprt for purine recycling. They demonstrate that mutants lacking the Aprt enzyme accumulate uric acid, which like in humans can be rescued by feeding flies allopurinol, and have decreased longevity, locomotion and sleep impairments and seizures, with striking resemblance to HGPRT loss of function in humans. They demonstrate that loss of function, either throughout development or specifically in the adult, ubiquitously or in all neurons, dopaminergic neurons, mushroom body neurons or glia, can reproduce the phenotypes (although knock-down in glia does not affect sleep). They show that the phenotypes can be rescued by over-expressing a wild-type form of the Aprt gene in neurons. They identify a decrease in adenosine levels as the cause underlying these phenotypes, as adenosine is a neurotransmitter functioning via the purinergic adenosine receptor in neurons. In fact, feeding flies throughout development and in the adult with either adenosine or m6A could prevent seizures. They also demonstrate that loss of adenosine caused a secondary up-regulation of ENT nucleoside transporters and of dopamine levels, which could explain the phenotypes of decreased sleep and hyperactivity at night. Finally, they provide the remarkable finding that over-expression of the human mutant HGPRT gene but not its wild-type form in neurons impaired locomotion and induced seizures. This means that the human mutant enzyme does not simply lack enzymatic activity, but it is toxic to neurons in some gain-of-function form.
Altogether, these are very important and fundamental findings that convincingly demonstrate the establishment of a Drosophila model for the scientific community to investigate LND, to carry out drug testing screens and find cures.

      We thank the reviewer for his clear understanding and positive assessment of our work.

      The experiments are conducted with great rigour, using appropriate and exhaustive controls, and on the whole the evidence does convincingly or compellingly support the claims. The exception is an instance when authors mention 'data not shown' and here data should either be provided, or claims removed: "feeding flies with adenosine or m6A did not rescue the SING phenotype of Aprt mutants (data not shown)". It is important to show these data (see below).

      As recommended by the reviewer, these results are now shown in the new Figure S15.

      Sleep is used to refer to the failure of flies to cross a beam for more than 5 minutes. However, lack of movement does not necessarily mean the flies are asleep, as they could be unmotivated to move (which could reflect abnormal dopamine levels) or engaged in incessant grooming instead. These differences are important for future investigation into the neural circuits affected by LND.

      We agree that the method we used could overestimate sleep duration because flies that don't move do not necessarily sleep either, as it is the case with brain-dopamine deficient flies (Riemensperger et al., PNAS 2011). To address this issue, we have recorded video data showing that after 5 min of inactivity, wild-type and Aprt5 mutant flies are less sensitive to stimulation, indicating that they were indeed asleep. This is now shown in the new Figure S10 and mentioned on page 17, lines 338-339 in the main text. In addition, in this work we report that Aprt mutant flies have a nocturnal insomnia phenotype. Sleep overestimation is not, therefore, an issue that could challenge these results.

      The authors claim that, based on BLAST genome searches, there are no HPRT1 (encoding HGPRT) homologues in Drosophila. However, such a claim would require instead structure-based searches that take into account structural conservation despite high sequence divergence, as this may not be detected by regular BLAST.

      To reinforce our conclusions about the lack of homologue of the human HPRT1 gene in Drosophila, we have now added a Results section about the evolution of HGPRT proteins on pages 6-7, lines 122-150, and two phylogenetic analyses as new Figures S2 and S3 with more details in legends. We have also carried out structural similarity searches against the RCSB PDB repository. The structural analysis did not identify any relevant similarity with HGPRT 3D structures in Insecta (mentioned in lines 146-150). We hope these new analyses address the Reviewer's concerns. Furthermore, as shown in Table S2, no enzymatic HGPRT activity could be detected in extracts of wild-type Drosophila. A protein that was structurally similar to human HGPRT but had a divergent sequence could not be involved in purine recycling without displaying HGPRT-like activity. In contrast, enzymatic Aprt activity could be easily detected in this organism (Figure S4 and Table S1).

      This work raises important questions that still need resolving. For example, the link between uric acid accumulation, reduced adenosine levels, increased dopamine and the behavioural and neurologic consequences remains unresolved. It is important that they show that restoring uric acid levels does not rescue locomotion or seizure phenotypes, as this means that uric acid is not the cause of the neurologic phenotypes.

      We agree with the reviewer about the potential importance of our results and the need to resolve the exact origin of the neurological phenotypes. This would need to be addressed in further studies in our opinion. The fact that allopurinol treatment did not improve the locomotor ability of Aprt5 mutant flies is now shown in Figure 1D, E to emphasize this result. Results showing that allopurinol does not rescue the bang-sensitivity phenotype of Aprt-deficient mutants are shown in Figure S14.

      Instead, their data indicate adenosine deficiency is the cause. However, one weakness is that for the manipulations they test some behaviours but not all. The authors could attempt to improve the link between mechanism and behaviour by testing whether over-expression of Aprt in neurons or glia, throughout development or in the adult, and feeding with adenosine and m6A can rescue each of the behavioural phenotypes handled: lifespan, SING, sleep and seizures. The authors could also attempt to knock-down dopamine levels concomitantly with feeding with adenosine or m6A to see if this rescues the phenotypes of SING and sleep.

      The reviewer is right. However, carrying out all these experiments properly with enough repeats will require about two more years of work. Because of that, they could not be included in the revision of the present article. Here we show that Aprt overexpression in neurons, but not in glia, rescues the SING phenotype of Aprt5 mutants (Figure 2B and 2E). We have also added in the revised article the new result that Aprt overexpression reduces transcript levels of DTH1, which codes for the neural form of the dopamine-synthesizing enzyme tyrosine hydroxylase (new Figure 5F).

      Visualising the neural circuits that express the adenosine receptor could reveal why the deficit in adenosine can affect distinct behaviours differentially, and which neurologic phenotypes are primary and which secondary consequences of the mutations. This would allow them to carry out epistasis analysis by knocking-down AdoR in specific circuits, whilst at the same time feeding Aprt mutants with Adenosine.

      Deciphering the specific circuits involved in the various effects of adenosine would indeed be extremely interesting. Unfortunately, very little is currently known about the neural circuits that express AdoR in flies. No antibody is available to detect this receptor in situ, and to our knowledge no AdoR allele encoding a tagged form of the receptor has been engineered yet.

      The revelation that the mutant form of human HGPRT has toxic effects is very intriguing and important and it invites the community to investigate this further into the future.

      To conclude, this is a fundamental piece of work that opens the opportunity for the broader scientific community to use Drosophila to investigate LND.

      We sincerely thank the reviewer for his thoughtful and positive comments on our work.

      Reviewer #3 (Public Review):

      The study attempts to develop a Drosophila model for the human disease of LND. The issue here, and the main weakness of this study, is that Drosophila does not express the enzyme, HGPRT, which when mutated causes LND. The authors, instead, mutate the functionally-related Drosophila Aprt enzyme. However, it is unknown whether Aprt is also a structural homologue. Because of this, it will likely not be possible to identify pharmacological compounds that rescue HGPRT activity via a direct interaction (unless modelling predicts high conservation of substrate binding pocket between the two enzymes, etc).

      As stated in our Provisional Responses prior to revision of the Reviewed Preprint, the enzymes APRT and HGPRT are actually known to be functionally and structurally related. We apologize for not providing this information in the original submission. This point is now made clearer in the revised article on page 39, lines 785-792. Indeed, both human APRT and HGPRT belong to the type I PRTases family identified by a conserved phosphoribosyl pyrophosphate (PRPP) binding motif, which is used as a substrate to transfer phosphoribosyl to purines. This binding motif is only found in PRTases from the nucleotide synthesis and salvage pathways (see: Sinha and Smith (2001) Curr Opin Struct Biol 11(6):733-9, doi: 10.1016/s0959-440x(01)00274-3). The purine substrates adenine, hypoxanthine and guanine share the same chemical skeleton and APRT can bind hypoxanthine, indicating that APRT and HGPRT also share similarities in their substrate binding sites (Ozeir et al. (2019) J Biol Chem. 294(32):11980-11991, doi: 10.1074/jbc.RA119.009087). Moreover, Drosophila Aprt and Human APRT are closely related as the amino acid sequences of APRT proteins have been highly conserved throughout evolution (see Figure S5B in our paper).

      An additional weakness is that the study does not identify a molecule that may act as a lead compound for further development for treating LND. Rather, the various rescues reported are selective for only a subset of the disease-associated phenotypes. Thus, whilst informative, this first section of the study does not meet the study ambitions.

      In this study, we identify adenosine and N6-methyladenosine as rescuers of the epileptic behavior in Aprt mutant flies (shown in Figure 7E, F). Interestingly, the same molecules have been found to rescue the viability of fibroblasts and neural stem cells derived from iPSCs of LND patients, in which de novo purine synthesis was prevented (discussed on page 38, lines 747-753). This suggests that the Drosophila model reported here could help to identify new genetic targets and pharmacological compounds capable of rescuing the effects of HGPRT mutations in humans.

      The second approach adopted is to express a 'humanised mutated' form of HGPRT in Drosophila, which holds more promise for the development of a pharmacological screen. In particular, the locomotor defect is recapitulated but the seizure-like activity, whilst reported as being recapitulated, is debatable. A recovery time of 2.3 seconds is very much less than timings for typical seizure mutants. Nevertheless, the SING behaviour could be sufficient to screen against. However, this is not explored.

      We agree with the reviewer that it would be very interesting to do a pharmacological screen in this second LND model. However, we did not have the possibility to carry out such a screen yet.

      In summary, this is a largely descriptive study reporting the behavioural effects of an Aprt loss-of-function mutation. RNAi KD and rescue expression studies suggest that a mix of neuronal (particularly dopaminergic and possibly adenosinergic signalling pathways) and glial cells are involved in the behavioural phenotypes affecting locomotion, sleep and seizure. There is insufficient evidence to have confidence that the Aprt fly model will prove valuable for understanding / treating LND.

      Here we report many common phenotypes between the Aprt fly model and the symptoms of LND patients (reduced longevity, locomotor problems, sleep defects, overproduction of uric acid that is rescued by allopurinol treatment…). Moreover, APRT and HGPRT enzymes are both functional and structural homologues, as explained in our answers. We also found that the same drugs can rescue the seizure-like phenotype in Aprt-deficient flies and the viability of LND fibroblasts and neural stem cells, derived from iPSCs of LND patients, in which de novo purine synthesis is prevented (Figure 7E, F). In many respects, our results therefore suggest that Aprt mutant flies could be useful to better understand LND, and potentially to screen for new therapeutic compounds.

      From the Reviewing Editor:

      (1) How are the pathways of purine catabolism different between flies and mammals? How does the absence of HGPRT and presence of only AGPRT affect purine catabolism? When did HGPRT appear in evolution?

      Purine catabolism is quite similar in flies and mammals, except for the lack of urate oxidase in primates, as described in Figure S1. We added text to the revised article about purine anabolism/catabolism pathways in lines 123-126 (see our detailed response to Reviewer 1's Recommendations below). HGPRT is present in Bacteria, Archaea and Eukaryota, and in nearly all animal phyla. However, BLAST searches indicate that HGPRT homologues cannot be found in most insect species, such as Drosophila. To reinforce our conclusions about the lack of homologue of the human HPRT1 gene in Drosophila melanogaster, we have now added a Results section about the evolution of HGPRT proteins on pages 6-7, lines 122-150, and two phylogenetic analyses as new Figures S2 and S3 with details in legends.

      In addition to BLAST a structural based modelling method should be used to establish the loss of HGPRT in Drosophila.

      In agreement with the phylogenetic analyses, we have confirmed that no HGPRT enzymatic activity can be detected in wild-type Drosophila extracts (Table S2). To complete these observations, as recommended by reviewer #2, we have carried out 3D structure-based searches in the RCSB Protein Data Bank. This enabled us to compare human HGPRT with all currently available protein structures. We found no Drosophila protein with a divergent sequence showing relevant structural similarity to human HGPRT. In contrast, this search identified proteins similar to human HGPRT in many other species of Eukaryota, Archaea and Bacteria. This is now mentioned on page 7, lines 146-150 in the revised article.

      (2) Of the three biochemical changes reported the change in dopamine levels should be validated by other methods given the unreliable nature of IHC.

      As recommended by Reviewer #1, we have added the results of new experiments carried out by RT-qPCR and Western blotting, which confirm the effect of Aprt mutation on brain dopamine levels. In addition, we added the consistent result that Aprt overexpression reduces transcript levels of DTH1. The results are shown in the new panels E to H of Figure 5 and mentioned in the text on page 20, lines 385-389.

      (3) As suggested by reviewer 2 it would be helpful to clearly identify which of the three biochemical changes (DA, uric acid, adenosine) are responsible for the numerous behaviours tested. This is important because it is relevant for developing any therapeutic strategy arising from this study.

      We agree that it would be very interesting to decipher the relationship between the different behaviors observed in mutant flies and the biochemical changes (dopamine, uric acid or adenosine). However, this would require a large amount of new experiments and it would probably double the size of our paper, which already includes many original data. In our opinion, such a detailed study should logically be the purpose of another article.

      (4) There is concern regarding the robustness of the seizure data. Reviewer 3 has suggestions on how to address this.

      See our answers to Reviewer 3’s recommendations below.

      (5) Editorial corrections and changes suggested by reviewers 2 and 3 need to be addressed.

      As indicated in our answers, we have taken into account and when possible addressed the corrections and changes suggested by the reviewers.

      (6) It is recommended that the authors tone down the relevance of this model for LND, particularly in the abstract. The focus should be on stating what is actually delivered.

      As recommended by the reviewing editor, and to take into account the reserved comments of reviewer #3, we have toned down our affirmation that our new fly models are relevant for LND in the last sentences of the Abstract and Discussion, and also added a question mark in the subtitle of the Discussion on line 777. As mentioned in our provisional responses to the Public Reviews, we would like to emphasize, however, that reviewers #1 and #2 expressed more confidence than reviewer #3 in the potential usefulness of our work. Reviewer #1 indeed stated that: "The findings provide a new example of how manipulating specific genes in the fruit fly allows the study of fundamental molecular processes that are linked to a human disease", and reviewer #2 further wrote: "Altogether, these are very important and fundamental findings that convincingly demonstrate the establishment of a Drosophila model for the scientific community to investigate LND, to carry out drug testing screens and find cures", and added: "To conclude, this is a fundamental piece of work that opens the opportunity for the broader scientific community to use Drosophila to investigate LND".

      Reviewer #1 (Recommendations For The Authors):

      • An important prerequisite for the current study is that there appears to be no HGPRT "activity" in Drosophila. It is initially stated that no "HGPRT activity" was previously "observed" in two papers from the 1970s. It would be important to corroborate this notion and provide some background on the purine anabolism/catabolism pathways. How shared or divergent are these pathways between Drosophila and mammals?

      In agreement with the pioneering studies of Becker (1974a, b), we have confirmed in this work that no HGPRT enzymatic activity can be detected in wild-type Drosophila extracts, as mentioned in Results on page 6, lines 127-130 and reported in Table S2. Purine catabolism is quite similar in flies and mammals, except for the lack of urate oxidase in primates, as shown in Figure S1. All the enzymes involved in purine anabolism/catabolism or recycling in humans are conserved between Drosophila and humans, with the notable exception of HPRT1.

      If there is no HGPRT gene, but only the APRT ortholog, what would this mean for the metabolites? Our enzymatic assays on Drosophila extracts indicated that hypoxanthine and guanine cannot be recycled into IMP and GMP, respectively, contrary to adenine which can be converted into AMP in flies. In the absence of HGPRT activity, GMP and IMP could be produced by de novo purine synthesis, or, alternatively, synthesized from AMP, which can be converted into IMP by the enzyme AMPD, and then IMP can be converted into GMP by the enzymes IMPDH and GMPS. These metabolic pathways are depicted in Figure S1A.

      Is the lack of HGPRT specific for Drosophila, insects (generally in invertebrates)? I feel clarifying this would provide more insight into the motivation of the experimental approach.

      As suggested by the Reviewer and the Reviewing Editor, we have addressed the evolution of HGPRT proteins more precisely in the revision. We have added a section on this subject in Results on pages 6-7, lines 122-150, and two phylogenetic analyses as Figures S2 and S3 with details in legends. A phylogenetic analysis was carried out a few years ago by Giorgio Matassi, who is now a co-author of this paper. The most striking result was the great impact of horizontal gene transfer on the evolution of HGPRT in Insects (Figures S2 and S3). Our analysis of the phyletic distribution of HGPRT proteins revealed their striking rareness in Insecta and, in particular, their absence in Drosophilidae. The PSI-BLAST search did, however, detect a significant hit in Drosophila immigrans (accession KAH8256851.1). Yet this sequence is 100% identical to the HGPRT of the gammaproteobacterium Serratia marcescens. Indeed, a phylogenetic analysis showed that D. immigrans HGPRT clusters with the Serratia genus (see Figure S3). This can be interpreted either as contamination of the sequenced sample or as a very recent horizontal gene transfer event. The second scenario is more likely because the corresponding nucleotide sequences differ by 5 synonymous substitutions (out of 534 positions). A powerful approach to try to understand the "origin" of the D. immigrans protein would be to analyze whether horizontal gene transfer has affected its chromosomal neighbours. This approach, proposed previously by G. Matassi (BMC Evol Biol, 2017, 17:2, doi: 10.1186/s12862-016-0850-6), is highly demanding in terms of computing time and would require an ad hoc study. We hope that these new analyses address the Reviewer's concerns.

      • On the mechanistic side on how the behavioral defects may arise, the authors show that dopaminergic neurons (and glia cells) are involved. One interesting finding is that dopamine immunostainings suggest increased dopamine levels. However, immunostainings are notorious for artifacts and do not provide a strong quantitative assessment. I feel it would be helpful to have an alternative technique to corroborate this finding.

      We agree with the reviewer, and we have added the results of further confirmatory experiments in the four new panels E-H of Figure 5, showing that: 1) the transcript levels of DTH1 (encoding the neuronal isoform of the dopamine-synthesizing enzyme tyrosine hydroxylase in Drosophila) are increased in Aprt5 mutants compared to wild-type flies (new Figure 5E); 2) consistent with this, DTH1 transcript levels are conversely decreased when Aprt is overexpressed ubiquitously in flies (new Figure 5F); 3) Western blot experiments showed that DTH1 protein levels are also increased in Aprt5 mutant flies compared to controls (new Figure 5G-H).

      Reviewer #2 (Recommendations For The Authors):

      As mentioned in the public review, the behavioural phenotypes of decreased lifespan, SING, sleep and seizures could be tested for all manipulations: feeding with allopurinol, adenosine and m6A, and combining this with knock-down dopamine levels in PAMs or MBs. This could help dissect the relationship between mutations in Aprt and behaviour.

      We thank the reviewer for these suggestions, and, indeed, we would have liked to do all these experiments. However, as mentioned in our responses to the Public Reviews, carrying out these experiments properly with sufficient repeats would require about two more years of work. We have already accumulated a large amount of data, so we have decided to publish our results at this stage in order to make our new fly models available to the scientific community. We are giving careful and due consideration to these experimental proposals and we hope to continue our investigation on this topic in the future.

      It would also be helpful to find out which neurons and glia express AdoR. Perhaps there are already tools available the authors could test or at least check with the scRNAseq Fly Atlas (public Scope database).

      Following the reviewer’s recommendation, we have checked the scRNAseq Fly Atlas for AdoR expression in the brain, compared to that of ple (encoding tyrosine hydroxylase) and Eaat1 (encoding the astrocytic glutamate transporter). As shown in the image below, the results are not very informative. AdoR appears to be expressed in rather widespread subsets of neurons and glial cells, that partly overlap with ple and Eaat1 expression. Further work would be required to identify more precisely the neurons and glial cells expressing AdoR in the brain.

      Author response image 1.

      Page 7, line 161: use of the word 'normalize'. "We tried to normalise uric acid content in flies..." It would be best to use 'rescue' instead, as normalisation has a different meaning in science.

      We modified this word as suggested.

      Page 9, line 203: 'genomic deficiencies that cover': the genetic term is 'uncover', as a deficiency for a locus reveals a phenotype; thus one says 'a gene uncovered by xx deficiency'.

      Thank you for this helpful remark. We corrected this in line 221.

      Page 10, lines 206-208: 'allopurinol treatment did not improve the locomotor activity...". These are important observations that should be best presented within the main manuscript Figure 1.

      As recommended, we have transferred the graphs of Figure S5 to new panels D and E of Figure 1.

      Figure 4: please indicate genotypes in the figure, where no information is given that these are UAS-Aprt-RNAi experiments.

      We added the complete genotype in Figure 4G, and also in Figure S12C and D. Thank you for noting that.

      Page 25 line 491: "None of these drugs was able to rescue the SING defects (data not shown)". Either provide the data or remove this claim.

      We have added these data in the new Figure S15.

      Statistical analyses: details are provided in the methods, but the name of test and multiple comparisons corrections should be also provided in the legends.

      Thank you very much for the careful proofreading. This was an oversight and we have added the information in all legends of the revised article.

      Reviewer #3 (Recommendations For The Authors):

      This is a difficult manuscript to appreciate. The abstract and introduction suggest that the study aims to identify novel treatments for a human disease (LND) by developing a Drosophila model. Much of the results, however, are focused on describing the consequences of the Aprt mutation for purine metabolism. To my mind, a rewrite to focus on the latter would be beneficial. The potential applicability to LND would be best restricted to the discussion.

      We apologize for not making our goals clearer. Our purpose was to find out whether purine recycling deficiency could lead to metabolic and neurobehavioral disturbances in Drosophila, as is the case in human LND patients when HGPRT is mutated. Interestingly, we observed that mutation of the only purine recycling enzyme in flies, Aprt, did induce defects partly comparable to those of LND in humans, including overproduction of uric acid that is rescued by allopurinol treatment, reduced longevity, and various neurobehavioral phenotypes including bang-sensitive seizures, sleep defects and locomotor impairments. We also identified adenosine and N6-methyladenosine as rescuers of the epileptic behavior in these mutants. These drugs were also identified as therapeutic candidates in screens based on iPSCs from LND patients. This suggests that Aprt deficiency in Drosophila could be used as a model to better understand this disease and find new therapeutic targets.

      Regardless of the above comment, the concluding sentence of the abstract is inappropriate. This study does not show that Drosophila can be used to identify a cure for LND.

      We agree with the Reviewer that the last sentence of the abstract was too affirmative. As also recommended by the reviewing editor, we have modified this sentence in the abstract and other sentences in the text in order to tone down the affirmation that our new fly models are relevant for LND. See our answers to the Reviewing Editor above for details.

      Indeed, I would challenge the premise that screening against a functional, but unknown if structural, homologue (Aprt) will ever provide an exploitable opportunity. To meet this statement, this study needs to identify a treatment that rescues all of the behavioural phenotypes associated with the Aprt mutation, in addition to rescuing the influences of the mis-expression of mutated HGPRT.

      APRT and HGPRT are both functionally and structurally related. Both human APRT and HGPRT belong to the type I PRTases family identified by a conserved phosphoribosyl pyrophosphate (PRPP) binding motif, which is used as a substrate to transfer phosphoribosyl to purines. This binding motif is only found in PRTases from the nucleotide synthesis and salvage pathways (see: Sinha and Smith (2001) Curr Opin Struct Biol 11(6):733-9, doi: 10.1016/s0959-440x(01)00274-3). The purine substrates adenine, hypoxanthine and guanine share the same chemical skeleton and APRT can bind hypoxanthine, indicating that APRT and HGPRT also share similarities in their substrate binding sites (Ozeir et al. (2019) J Biol Chem. 294(32):11980-11991, doi: 10.1074/jbc.RA119.009087). This point has been made clearer in the Discussion on page 39, lines 785-792. Finally, Drosophila Aprt and human APRT are closely related, as the amino acid sequences of APRTs have been highly conserved throughout evolution (shown in Figure S5B).

      With respect to expression of the mutated HGPRT: the short seizure recovery time of 2.3 seconds is not very convincing evidence of a seizure phenotype. This is far below the timings reported for typical BS mutations. Because of this, the authors should run a positive control (e.g. one of the well-established BS mutations: parabss, eas or jus) to validate their assay. Moreover, was the seizure induced by the Aprt mutation (17.3 secs, again a low value) rescued by prior exposure to an antiepileptic? Could this behaviour be, instead, related to the SING locomotor phenotype?

      The assay we used to test for bang-sensitivity has been validated in previous articles from different laboratories. We agree that the recovery times we observed were shorter than those of the BS mutations mentioned by the reviewer. However, another Drosophila BS mutant, porin, shows similarly short recovery times (2.5 and 6 sec, depending on the porin allele tested; Graham et al. J Biol Chem. 2010, doi: 10.1074/jbc.M109.080317). This is now mentioned on page 36 (lines 717-720). In addition, the BS phenotype we observed with Aprt mutants was robust and highly significant compared to control flies (Figure 7). We did not try to rescue this phenotype by exposing the flies to an antiepileptic, but we do not think that it is related to the SING phenotype. Indeed, providing adenosine or N6-methyladenosine to Aprt5 mutant flies rescued the BS phenotype (Figure 7E, F) but did not rescue the locomotor defects (new Figure S15). Moreover, SING performance of Aprt5 mutant flies is decreased in a nearly identical way at 8 and 30 d a.E. (Figure 1C), whereas we observed an effect on BS behavior only at 30 d a.E. This implies that the SING and BS behaviors are most likely unrelated.

      Line 731 states that 'Aprt mutants show a typical BS phenotype'. Whilst accurate to some extent (e.g. the behaviour depicted in the supp videos), it should be made clear that the recovery time is uncharacteristically short and thus differs from typical BS mutations.

      We have corrected the sentence in the revised article to mention that (page 36, lines 717-718).

      Line 732 stating that BS phenotype is often linked to neuronal activity - what other links would there be? Even if via glia or other tissues the final effect is via neurons.

      We have modified this sentence (page 36, line 720).

      The introduction and, particularly, the discussion are overly long and, in the case of the latter, repetitive of the results text. Pruning to make the paper more concise would be very beneficial. Removal of the extensive speculation about how DA and adenosine may interact would help in this regard (line 688 onwards). Indeed, in many places the discussion morphs into a review.

      We agree with the reviewer on this point, and have therefore done our best to shorten the Introduction and Discussion, which are now 24% and 21% shorter, respectively, in the revised article compared to the original submission.

      The applicability of using Drosophila Aprt mutations to screen for compounds that may treat LND is predicated on some degree of similarity in either enzyme structure or metabolic pathways. A discussion of how relevant studying Aprt is therefore needs to be included. Given the authors' insights, where should potential new drugs be targeted?

      As stated above, we now mention in the article that APRT and HGPRT share similarities in their structure. In addition, the metabolic pathways between humans and Drosophila have been largely conserved (shown in Figure S1B).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to review.

      We thank the editors and reviewers for their time in assessing our manuscript. We changed the title to remove the word “all” because we realized that was hyperbolic. Corrections in response to review are in blue text throughout the manuscript document (other minor corrections are not highlighted).

      eLife assessment

      This study presents valuable insights into the evolution of the gasdermin family, making a strong case that a GSDMA-like gasdermin was already present in early land vertebrates and was activated by caspase-1 cleavage. Convincing biochemical evidence is provided that extant avian, reptile, and amphibian GSDMA proteins can still be activated by caspase-1 and upon cleavage induce pyroptosis-like cell death - at least in human cell lines. The caspase-1 cleavage site is only lost in mammals, which use the more recently evolved GSDMD as a caspase-1 cleavable pyroptosis inducer. The presented work will be of considerable interest to scientists working on the evolution of cell death pathways, or on cell death regulation in non-mammalian vertebrates.

      We thank the editor for their time in evaluating our manuscript. We agree with the eLife assessment and with the comments of the reviewers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors start out by doing a time-calibrated gene/species tree analysis of the animal gasdermin family, resulting in a dendrogram showing the relationship of the individual gasdermin subfamilies and suggesting a series of gene duplication events (and gene losses) that lead to the gasdermin distribution in extant species. They observe that the GSDMA proteins from birds, reptiles, and amphibians do not form a clade with the mammalian GSDMAs and notice that the non-mammalian GSDMA proteins share a conserved caspase-1 cleavage motif at the predicted activation site. The authors provide several series of experiments showing that the non-mammalian GSDMA proteins can indeed be activated by caspase-1 and that this activation leads to cell death (in human cells). They also investigate the role of the caspase-1 recognition tetrapeptide for cleavage by caspase-1 and for the pathogen-derived protease SpeB.

      We thank the reviewer for their time in evaluating our manuscript.

      Strengths:

      The evolutionary analysis performed in this manuscript appears to use a broader data basis than what has been used in other published work. An interesting result of this analysis is the suggestion that GSDMA is evolutionarily older than the main mammalian pyroptotic GSDMD, and that birds, reptiles, and amphibians lack GSDMD but use GSDMA for the same purpose. The consequence that bird GSDMA should be activated by an inflammatory caspase (i.e., caspase-1) is convincingly supported by the experiments provided in the manuscript.

      We thank the reviewer for their assessment of the manuscript.

      Weaknesses:

      1. As a non-expert in phylogenetic tree reconstruction, I find the tree resulting from the authors' analysis surprising (in particular the polyphyly of GSDMA) and at odds with several other published trees of this family. The differences might be due to differences in the data being used or due to the tree construction method, but no explanation for this discrepancy is provided.

      We agree, and we have modified the text to add more context to explain why our analysis generated a different topology: “In comparison to previously published studies, we used different methods to construct our gasdermin phylogenetic tree, with the result that our tree has a different topology. The topology of our tree is likely to be affected by our increased sampling of gasdermin sequences; we included 1,256 gasdermin sequences in comparison to 300 or 97 sequences used in prior studies. Prior studies used maximum likelihood tree building techniques, whereas we used a more computationally intensive Bayesian method using BEAST with strict molecular clocks that allows us to provide divergence time estimates, which we calibrated using mammal fossil estimated ages. We think that this substantially increased sampling paired with time calibration allow us to produce a more accurate phylogeny of the gasdermin protein family.”

      To explain and further support our method in a more technical manner: in our phylogenetic tree, non-mammal GSDMAs are paralogous to mammal GSDMAs, whereas others have found that non-mammal GSDMAs are orthologous to mammal GSDMAs. We obtained moderate support for the non-mammal GSDMA placement, with a Bayesian posterior of 0.42 and a maximum likelihood bootstrap support of 0.96. Angosto-Bazarra et al. report a Bayesian posterior of 0.66 and a maximum likelihood bootstrap support of 0.98 for their placement. These are good results, but they arise from significantly fewer sequences than are included in our tree. Moreover, in Fig S2 of Angosto-Bazarra et al. the support drops to 0.08. That the posteriors in both studies are below 1 indicates the presence of phylogenetic conflicts (i.e., a significant fraction of alternative trees), which means that either our tree or that of Angosto-Bazarra et al. could be incorrect. That said, our tree is consistent with biological evidence, and our dataset is substantially larger. To better characterize this node, further sampling with even more species would be required. We exhausted the sequences available at the time our tree was generated.

      Differences between our study and previous studies:

      Author response table 1.

      2. While the cleavability of bird/reptile GSDMA by caspase-1 is well-supported by several experiments, the role of this cleavage for pyroptotic cell killing is addressed more superficially. One cell viability assay upon overexpression of GSDMA-NTD in human HEK293 cells is shown and one micrograph shows pyroptotic morphology upon expression in HeLa cells. It is not clear why these experiments were limited to human cells…

      We did include one more experiment in human cells, Figure 4B, in which we express full-length chicken GSDMA with dimerizable caspase-1 and show that LDH release requires the cleavage site aspartate, D244. That said, we agree that our use of only human cell lines is a weakness of the paper. We thought that the best way to definitively show the interaction of caspase-1 and GSDMA was to perform experiments in chicken macrophages. Therefore, we generated a custom-raised anti-chicken-GSDMA antibody. Unfortunately, the quality of the antibody was insufficient to detect endogenous GSDMA in chicken bone marrow-derived macrophages; off-target binding prevented the observation of chicken GSDMA bands. We added a section to the discussion to acknowledge the need for further studies: “In future studies, the association of bird/amphibian/reptile GSDMA and caspase-1 should be confirmed in native cells from each of these animals.”

      …and why two different cell types were used for the two complementary results.

      In the paper we used 293T cells and HeLa cells as generic cell types that have distinct benefits. In general, we used 293T/17 cells for experiments where high transfection efficiency was most critical, as it is simple to achieve 90% or higher transfection efficiency in this line. However, 293T/17s have poor spreading in culture and thus are not as useful for morphologic studies. 293T/17 cells do display pyroptotic ballooning upon gasdermin activation, however, the images are less pronounced in comparison to other cell types that have more distinct morphology. Therefore, we used HeLa cells for the microscopy experiments because they are more adherent and larger than 293T/17s which make for easier visualization of pyroptotic ballooning. We have added the following statement to the text to make our rationale for the use of different cell line more apparent: “In these experiments, 293T/17s were used for their high transfection efficiency, and HeLas were used for microscopy studies for their larger size and improved adherence.”

      3. The introduction mentions as a motivation for this work our lack of knowledge of how human GSDMA is activated. This is indeed an interesting and pressing question, but it is not really addressed in the manuscript. This is particularly true when believing the authors' dendrogram results that the bird and mammalian GSDMA families do not form a clade.

      As a consequence, the significance of this finding is mostly limited to birds and reptiles.

      Our aspiration was to discover hidden facets of mammal GSDMA by using a molecular evolutionary analysis of bird/amphibian/reptile GSDMA. Although we did not learn the identity of a host protease that activates mammalian GSDMA, we serendipitously discovered the evolutionary history of the association of caspase-1 with the gasdermin family. We think this manuscript provides an important and interesting advance in the field by revealing the process of evolution at work in the gasdermin family, and by showing that the association of caspase-1 with a gasdermin to cause pyroptosis is an unbroken pairing throughout evolution. It is surprising to us that the specific gasdermin partner has changed over time.

      Reviewer #2 (Public Review):

      Summary:

      The authors investigated the molecular evolution of members of the gasdermin (GSDM) family. By adding the evolutionary time axis of animals, they created a new molecular phylogenetic tree that differs from previous ones. The analysis showed that non-mammalian GSDMAs and mammalian GSDMAs have diverged into completely different and separate clades. Furthermore, by biochemical analyses, the authors demonstrated that non-mammalian GSDMA proteins are cleaved by the host-encoded caspase-1. They also showed that mammalian GSDMAs have lost the cleavage site recognized by caspase-1. Instead, the authors proposed that the newly appeared GSDMD is now cleaved by caspase-1.

      We thank the reviewer for their time in evaluating our manuscript.

      Through this study, we have been able to understand the changes in the molecular evolution of GSDMs, and by presenting the cleavage of GSDMAs through biochemical experiments, we have become able to grasp the comprehensive picture of this family of molecules. However, there are some parts where explanations are insufficient, so supplementary explanations and experiments seem to be necessary.

      Strengths:

      It has a strong impact in advancing ideas into the study of pyroptotic cell death and even inflammatory responses involving caspase-1.

      We thank the reviewer for the critical consideration of the phylogeny presented.

      Weaknesses:

      Based on the position of mammalian GSDMA shown in the molecular phylogenetic tree (Figure 1), it may be difficult to completely agree with the authors' explanation of the evolution of GSDMA.

      1. Focusing on mammalian GSDMA in Figure 1: this group and mammalian GSDMD diverged into two clades; before that, the GSDMA/D group and mammalian GSDMC separated; earlier still, GSDMB branched off; and earlier than that, non-mammalian GSDMA. In a molecular phylogenetic tree, GSDMA cannot reappear later in evolution. Mammalian GSDMAs are clearly paralogous to non-mammalian GSDMAs in the figure. If they were bona fide orthologs, the mammalian GSDMA group should form a sub-clade within the non-mammalian GSDMA clade. It would be better to describe the plausibility of this divergence in the molecular evolution of mammalian GSDMA in the Discussion section.

      We appreciate the reviewer’s careful consideration of our phylogeny. We agree that we did not make this clear enough in the discussion. Indeed, this is a confusing point, and is a critical concept in the paper. This is among our most important findings, so we have added a line addressing this finding to the abstract. We think about these concepts starting from the oldest common ancestor of a group, and then think about how genes duplicate over time. To the discussion we now begin with the following:

      We discovered that GSDMA in amphibians, birds, and reptiles is paralogous to mammal GSDMA. Surprisingly, the GSDMA genes in both the amphibian/reptile/bird and mammal groups appear in the exact same locus. Therefore, this GSDMA gene was present in the common ancestor of all these animals. In mammals, this GSDMA duplicated to form GSDMB and GSDMC. Finally, a new gene duplicate, GSDMD, arose in a different chromosomal location. This GSDMD gene then became a superior target for caspase-1 after developing the exosite. Once GSDMD had evolved, we speculate that mammalian GSDMA became a pseudogene that was available to evolve a new function. This new function included a new promoter that expresses mammalian GSDMA primarily in the skin, and perhaps the acquisition of a new host protease that has yet to be discovered.

      In further support of the topology of our Bayesian tree in Figure 1, we also performed a maximum likelihood analysis, which also placed the GSDMA genes into similarly distinct clades (Figure 1-S3). Finally, we have biological evidence to support this reasoning, where caspase-1 cleaves non-mammal GSDMAs and also mammal GSDMD (and no longer can cleave mammal GSDMA).

      2. Regarding (1), it is recommended that the authors reconsider the validity of the divergence-date estimates by focusing on mammalian species divergence, because the validity of this estimation requires a recheck of the molecular phylogenetic tree, including the alignment.

      Our reconstructed evolution of gasdermins is consistent with the mammal tree of life. We constrained Bayesian estimation of divergences using soft calibrations from mammal fossil estimated ages. We have included the fossil calibration of mammalian gasdermins to the results section and to our methods.

      3. If GSDMB and/or GSDMC, which lie between non-mammalian GSDMA and mammalian GSDMD in the molecular phylogenetic tree, were cleaved by caspase-1, the story of this study would become clearer. The authors should test that possibility.

      It is known that mammal GSDMB and GSDMC cannot be activated by caspase-1. We propose that GSDMA was cleaved by caspase-1 only in extinct mammals that had not yet associated GSDMD with caspase-1. Such an extinct mammal could have encoded a GSDMA cleaved by caspase-1, a GSDMB cleaved by granzyme A, and a GSDMC cleaved by caspase-8. Later, the GSDMA gene was again duplicated to form GSDMD. After GSDMD was targeted by caspase-1, GSDMA was free to gain its current function in barrier tissues.

      Reviewer #1 (Recommendations For The Authors):

      As a non-expert on phylogenetic tree construction, I found the "time-calibrated maximum clade credibility coalescent tree" hard to digest. I would have liked to see an explanation of how this method is different from what has been used before and why the authors consider it to be better. This is particularly important when considering that the resulting tree shown in Figure 1 is quite different from other published trees of the same family (e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8742441 where the GSDMA family appears monophyletic).

      Please see response to Reviewer 1 weaknesses above. Also, we have moved the text “time-calibrated maximum clade credibility coalescent tree” to the figure legend.

      In the bioinformatical analysis of the conserved caspase-1 cleavage motif in bird GSDMA sequences, I would recommend also addressing the residue behind the cleavage site Asp, as this position has an unusually high conservation (mostly Gly) in bird GSDMA.

      This is a great observation. We suspect that this may reflect a need for flexibility in the secondary structure to allow the cleavage site to enter the enzymatic pocket of the caspase. This residue is also similarly enriched in mammal GSDMD, which is also cleaved by caspase-1. We also note high conservation of a P2' proline residue in birds with the FASD tetrapeptide, which could also be important for displaying the tetrapeptide to the caspase.

      This comment prompted us to search the literature for evidence of these residues in caspase-1 substrate preference studies. Remarkably, a P1' glycine and a P2' proline are among the most enriched residues in human caspase-1 targets. This supports our hypothesis that caspase-1 cleaves GSDMA in non-mammals. We added the following to the results section: “Additionally, the P1' residue in amphibian, bird and reptile GSDMA was often a glycine, and the P2' residue was often a proline, especially in birds with FASD/FVSD tetrapeptides (Fig. 2B). A small P1' residue is preferred by all caspases. By using a peptide library, glycine has been determined to be the optimal P1' residue for caspase-1 and caspase-4. Further, in a review of the natural substrates of caspase-1, glycine was the second most common P1' residue, and proline was the most common P2' residue. These preferences were not observed for caspase-9.”
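      To make the P4–P2' positions concrete, the sketch below pulls the tetrapeptide (P4–P1) and the two residues after the scissile bond (P1', P2') from sequences, given the 0-based index of the P1 aspartate, and counts the prime-side residues. It assumes the P1 position is already known, for example from an alignment to the predicted cleavage site. The function names and the three short sequences are hypothetical toy fragments, not real GSDMA sequences or the authors' pipeline.

```python
from collections import Counter


def cleavage_window(seq: str, p1_index: int) -> dict[str, str]:
    """Return P4..P1 and P1', P2' around a caspase site (Schechter-Berger naming).

    The scissile bond lies C-terminal to P1 (seq[p1_index]), which must be Asp.
    """
    if seq[p1_index] != "D":
        raise ValueError("caspases cleave after aspartate; P1 must be D")
    if p1_index < 3 or p1_index + 2 >= len(seq):
        raise ValueError("the P4..P2' window falls outside the sequence")
    return {
        "tetrapeptide": seq[p1_index - 3 : p1_index + 1],  # P4-P3-P2-P1
        "P1'": seq[p1_index + 1],  # first residue after the cut
        "P2'": seq[p1_index + 2],  # second residue after the cut
    }


def prime_side_counts(windows: list[dict[str, str]]) -> tuple[Counter, Counter]:
    """Tally how often each amino acid occupies P1' and P2' across sites."""
    p1_prime = Counter(w["P1'"] for w in windows)
    p2_prime = Counter(w["P2'"] for w in windows)
    return p1_prime, p2_prime


# Hypothetical toy fragments with the P1 Asp at index 5.
toy = ["KLFASDGPEV", "QEFVSDGPLN", "TRWEHDASKL"]
windows = [cleavage_window(s, 5) for s in toy]
p1c, p2c = prime_side_counts(windows)
print(windows[0])  # {'tetrapeptide': 'FASD', "P1'": 'G', "P2'": 'P'}
print(p1c.most_common(1), p2c.most_common(1))  # [('G', 2)] [('P', 2)]
```

      With real data, the same tally over an aligned set of GSDMA sequences would show whether Gly at P1' and Pro at P2' are over-represented relative to background frequencies.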

      Finally, I would like the authors to at least explain why the cell viability assays were done in 293T cells while the micrographs were done in HeLa cells. Why not show both experiments for both cell types?

      In the paper we used 293T cells and HeLa cells as generic cell types that have distinct benefits. In general, we used 293T/17 cells for experiments where high transfection efficiency was most critical, as it is simple to achieve 90% or higher transfection efficiency in this line. However, 293T cells have poor spreading in culture and thus are not as useful for morphologic studies. 293T/17 cells do display pyroptotic ballooning upon gasdermin activation, however, the images are less pronounced in comparison to other cell types that have more distinct morphology. Therefore, we used HeLa cells for the microscopy experiments because they are more adherent and larger than 293T/17s which make for easier visualization of pyroptotic ballooning. We have added the following statement to the text to make our rationale for the use of different cell line more apparent: “In these experiments, 293T/17s were used for their high transfection efficiency, and HeLas were used for microscopy studies for their larger size and improved adherence.”

      There are a number of minor points related to language and presentation:

      • the expressions "pathogens contaminate the cytosol", "mammals can encode..", "an outsized effect" are unusual and might be rephrased.

      We changed these to:

      “manipulate the host cell, sometimes contaminating the cytosol with pathogen associated molecular patterns, or disrupting aspects of normal cell physiology”,

      “Only mammals encode GSDMC and GSDMD alongside the other four gasdermins.”,

      and

      “greater effect”

      • in line 87 the abbreviation "GSDMEc" is first used without explanation (of the "c").

      This is an important distinction, as GSDMEc proteins were only recently uncovered. To remedy this, we have added the following text after line 87: “This gasdermin was recently identified as an ortholog of GSDMA. It was called GSDMEc, following the nomenclature of other duplications of GSDME in bony fish that have been named GSDMEa and GSDMEb.”

      • line 89 grammar problem.

      Corrected

      • line 186ff the sentence "We believe..." does not appear to make sense.

      We revised the text to make this clear, changing the text to now read “We hypothesized that activating pyroptosis using separate gasdermins for caspase-1 and caspase-3 is a useful adaptation and allows for fine-tuning of these separate pathways. In mammals, this separation depends on the activation of GSDMD by caspase-1 and the activation of GSDME by caspase-3.”

      • many figures use pictures rather than text to represent species groups. These pictures are not always intuitive. As an example, in Figure 6 the 'snake' represents amphibians. After reading the text, I understand that these should probably be the caecilian amphibians, but not every reader might know what these critters look like. In Figure 7, I have no idea what the black blob (2nd image from top) is supposed to be.

      In crafting the manuscript, we found the use of text to denote the various species to be cumbersome. The species silhouettes are a standard graphical depiction used in evolutionary biology, and we think they aid the readability of the figures. For example, in a paper cited in our manuscript, these same silhouettes were used to depict the evolution of GSDMs (https://doi.org/10.3389/fcell.2022.952015 Figure 1A, Figure 3D, Figure 4G). However, we agree that many readers will not know that caecilians are legless amphibians that resemble snakes in their body morphology but are not phylogenetically close to snakes. We think it is important to use an image of a caecilian amphibian because the more iconic amphibians (frogs, salamanders) do not encode GSDMA. To increase clarity, we have mentioned the morphology of caecilians in the legends of Figure 2, Figure 6, and Figure 7, where caecilian amphibians are first introduced.

      In Figure 2: “Note that caecilians are morphologically similar to snakes in their lack of legs and elongated body; however, this is an example of convergent evolution, as caecilians are amphibians and are thus more closely related to frogs and salamanders than to snakes.”

      In Figure 6: “M. unicolor is an amphibian despite sharing morphological similarity to a snake.”

      In Figure 7: “In birds, reptiles, and caecilian amphibians (which are morphologically similar to snakes), GSDMA is cleaved by caspase-1.”

      The black blob is the mollusk Lingula anatina, which unfortunately has an indistinct silhouette. To clarify this, we have added text to label the images in Figure 7.

      Reviewer #2 (Recommendations For The Authors):

      1. Line 214, in "(Fig. 3-S2) Human and mouse ..", it is necessary to type a period.

      2. Line 238, in the subtitle, GSMA should be amended to GSDMA.

      These have both been corrected.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the three reviewers for their positive comments and helpful suggestions. We have addressed the issues raised which have helped to improve the manuscript. Below, we address the specific points with detailed responses.

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      1) Figure 2 - figure supplement 1. The figure states minimal medium while the legend states rich medium.

      We have corrected the legend as the experiment was done in minimal medium.

      2) Figure 3B - the statements in the text do not seem to match what is in the figure. "Cluster 1 (293 genes, 12 priority unstudied) is enriched for genes showing high expression variability across different conditions (71) and for genes induced during meiotic differentiation (72) and in response to TORC1 inhibitors (29). Cluster 2 (570 genes, 20 priority unstudied) is enriched for phenotypes related to cell mating and sporulation, e.g. 'incomplete cell-wall disassembly at cell fusion site' or 'abnormal shmoo morphology'". These terms (high expression variability, meiotic differentiation, TORC1 inhibitors, cell mating and sporulation/abnormal shmoo morphology" are not seen in the figure.

      As stated in the Results, we have carried out analyses with both Metascape and AnGeLi for functional enrichments in different GO and KEGG pathway terms (Figure 3B; Metascape) and/or among genes from published expression or phenotyping studies (AnGeLi). The enrichments for expression variability, meiotic differentiation, TORC1 inhibitors, and cell mating/sporulation/abnormal shmoo morphology are not based on GO terms but on lists from published expression and phenotyping experiments. We have slightly edited the sentence in the Results to make this clearer.

      3) The authors could consider citing a systematic screen for sporulation in the introduction (PMID: 292590).

      We have cited 17 papers for growth screens under different conditions using similar approaches as used by us. Given that we already cite 100 papers, we did not choose to cite numerous other papers reporting screens for more complex phenotypes (cell morphology, mating, meiosis, recombination, etc), which are not directly relevant to our study here.

      Reference PMID: 292590 refers to a 1979 paper in the German Dentist Journal.

      Reviewer #2 (Recommendations For The Authors):

      General comments

      1) The authors use their NET-FF approach to predict GO Biological Process and Molecular Function terms (Figure 4). Why was the Cellular Component ontology not included? In general, gene and protein functional characterization is best described by the Biological Process and Cellular Component ontologies, whereas Molecular Function describes the biochemical activity of a protein. In other words, proteins which share Biological Process and/or Cellular Component annotations often function in the same module, which may not be the case for shared Molecular Function annotations.

      We did not include Cellular Component because in previous benchmarking of our method using CAFA datasets our approach did not perform well at predicting Cellular Component. This aspect is harder to pick up from homology data and protein network data and is generally the toughest challenge in CAFA. In contrast, our predictions of Biological Process and Molecular Function are competitive with other methods. We have now made the reason for omitting Cellular Component clearer in the Methods.

      2) The authors use protein embeddings produced by integrating 6 STRING networks using the deepNF method. One of these networks is the "database" network. According to STRING (https://academic.oup.com/nar/article/47/D1/D607/5198476): "The database channel is based on manually curated interaction records assembled by expert curators, at KEGG, Reactome, BioCyc and Gene Ontology, as well as legacy datasets from PID and BioCarta". If one of the input networks contains information from GO, and then embeddings containing this information are used to predict GO annotations, are the authors not then leaking annotations which could improve downstream GO annotation predictions? It would be valuable to demonstrate to what extent the "database" network is contributing by repeating the GO prediction analyses with this network removed.

      We agree and also pointed out this circularity in the manuscript. We used an independent dataset – phenotype data – to benchmark our method, which showed good performance. Note that this study did not aim to develop a completely new method or improve on deepNF and CATH-FunFams but to integrate and exploit their combined power. For that reason, we wanted to keep as many high-quality curated edges in the STRING network as possible. Combining these independent methods brings synergies from their complementary approaches to facilitate interpretation of gene function.

      Minor comments

      1) Ternary encoding was used as a preprocessing step on the phenotype data before clustering was performed. An explanation of why this encoding was necessary (as opposed to a normalization/standardization approach) would be helpful.

      Ternary encoding was not strictly necessary, but it provided more nuanced and coherent clusters. Some conditions and mutants were associated with much larger phenotypic responses, which disproportionately influenced the clustering. After trying different approaches, we followed the recommendations of the R package microbialPhenotypes (https://github.com/peterwu19881230/microbialPhenotypes), which is now specified in the legend of Fig. 3A. Discretizing the data also helped to compare phenotypes across different types of mutants, and we have applied this approach previously in our phenomics study of non-coding RNA mutants (Rodriguez-Lopez et al. eLife 2022). Moreover, this approach allowed us to generate phenotype vectors for calculating phenotypic distances between mutants (including Hamming distance or Pearson correlations), which supported the subsequent cluster analysis using Cytoscape.
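      As an illustration of why discretizing damps the influence of very large responses, the sketch below maps colony sizes relative to wild type (1.0 = wild type) onto -1/0/+1 and compares two mutants by Hamming distance. The symmetric effect-size cutoff (`threshold=0.05`) and the function names are hypothetical. The actual criteria follow the microbialPhenotypes recommendations cited above, which this sketch does not reproduce.

```python
from typing import Sequence


def ternary_encode(relative_sizes: Sequence[float], threshold: float = 0.05) -> list[int]:
    """Discretize colony sizes relative to wild type into -1 / 0 / +1.

    `threshold` is a hypothetical effect-size cutoff, not the published criterion.
    """
    codes = []
    for r in relative_sizes:
        if r < 1.0 - threshold:
            codes.append(-1)  # smaller than wild type: sensitive
        elif r > 1.0 + threshold:
            codes.append(1)   # larger than wild type: resistant
        else:
            codes.append(0)   # within the cutoff: no phenotype
    return codes


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of conditions in which two mutants' ternary phenotypes differ."""
    if len(a) != len(b):
        raise ValueError("phenotype vectors must cover the same conditions")
    return sum(x != y for x, y in zip(a, b))


# Two hypothetical mutants scored in the same three conditions.
mutant_a = ternary_encode([0.80, 1.02, 1.30])  # [-1, 0, 1]
mutant_b = ternary_encode([0.50, 0.97, 1.01])  # [-1, 0, 0]
print(hamming(mutant_a, mutant_b))  # 1
```

      Note that the strong 0.50 response and the milder 0.80 response both become -1. This is the intended trade-off: precision is lost, but a few extreme conditions no longer dominate the distances used for clustering.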

      2) The authors use a validation set to perform early-stopping on the deepNF model. However, it appears that the validation set proteins are then used in downstream analyses anyway: "After training, weights from the epoch with the lowest validation loss were used to generate embeddings for all proteins" (my emphasis). In the case where the model was being used to generalize to new proteins (such as classification), this analysis would not be a valid way to perform hyperparameter tuning (e.g. early-stopping) since the validation set is then used in downstream analyses. However, deepNF is performing an unsupervised, multi-network encoding on all the available datapoints (proteins). In the case where only deepNF loss is being used to tune the hyperparameters, it's not necessary to use a held-out validation set - it is appropriate to use the full set of proteins to do this.

      Our Random Forest consisted of 500 trees with default values for the number of sub-features as √n and partial sampling of 0.7. GO terms were predicted using 5-fold cross-validation. Changing parameters showed that our model was robust to the values of the hyperparameters, so we settled on our initial model.
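      The settings above could be expressed as follows. This is a hedged sketch: the response does not name the Random Forest implementation, so the `RF_SETTINGS` dictionary and the `kfold_indices` helper are hypothetical. In scikit-learn, for example, a comparable model would be `RandomForestClassifier(n_estimators=500, max_features="sqrt", max_samples=0.7)`, assuming that "partial sampling of 0.7" refers to the fraction of samples drawn for each tree.

```python
import math
import random

# Settings quoted in the response; mapping them to a specific library is an assumption.
RF_SETTINGS = {"n_trees": 500, "sample_fraction": 0.7, "n_folds": 5}


def features_per_split(n_features):
    """Number of candidate features tried at each split: floor(sqrt(n)), at least 1."""
    return max(1, math.isqrt(n_features))


def kfold_indices(n_items, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Every item lands in exactly one test fold; fold sizes differ by at most one.
    """
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


# Each protein is predicted exactly once, by a model that never saw it in training.
for train, test in kfold_indices(12, k=RF_SETTINGS["n_folds"]):
    print(len(train), len(test))
```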

      3) The NET-FF hyperparameter tuning results should be made available in the supplement.

      We do not think this would be useful for the reason described in the reply above.

      Reviewer #3 (Recommendations For The Authors):

      Major points

      1) Why were the quantitative colony size data converted to -1, 0, and 1?

      It is unclear to me why the authors decided to convert the colony size data to ternary encoding of -1, 0, and 1. The original colony size data seem to be of fairly high precision so that the authors can detect a 5% difference from the wild type. I guess the authors must have tried using the quantitative colony size data for clustering analysis and found the results unsatisfactory. If that is the case, can the authors provide some possible explanations?

      A similar query has been raised by Reviewer 2. Ternary encoding provided more nuanced and coherent clusters. Some conditions and mutants were associated with much larger phenotypic responses which disproportionately influenced the clustering. After trying different approaches, we followed the recommendations from the R package microbialPhenotypes, as now specified in the legend of Fig. 3A. Discretizing the data also helped to compare phenotypes across different types of mutants, and we have applied this approach previously in our phenomics study of non-coding RNA mutants (Rodriguez-Lopez et al. eLife 2022). Moreover, this approach allowed us to generate vectors of phenotypes for calculating phenotypic distances between mutants (including Hamming distance or Pearson correlations), which supported the subsequent cluster analysis using Cytoscape.
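      The point that a few very large responses can dominate clustering can be illustrated with a toy example. In this hypothetical sketch (values invented for illustration, encoding simplified to effect size only), mutants `x` and `y` shift colony size in the same direction in all four conditions, whereas `z` shifts it in the opposite direction in three of them. On the raw ratios, Euclidean distance places `x` closer to `z` because of a single extreme value; after ternary encoding, Hamming distance counts each condition once.

```python
import math


def to_ternary(ratios, threshold=0.05):
    """Simplified encoding by effect size only (no significance test)."""
    return [-1 if r < 1 - threshold else 1 if r > 1 + threshold else 0 for r in ratios]


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def hamming(a, b):
    return sum(p != q for p, q in zip(a, b))


# Hypothetical colony-size ratios (mutant / wild type) in four conditions.
x = [0.90, 0.90, 0.90, 0.10]  # extreme sensitivity in condition 4
y = [0.90, 0.90, 0.90, 0.60]  # same direction as x everywhere
z = [1.10, 1.10, 1.10, 0.10]  # opposite direction to x in conditions 1-3

# Raw values: the single extreme condition makes x look closer to z than to y.
print(euclidean(x, y) > euclidean(x, z))  # True
# Ternary codes: each condition counts once, so x and y are identical.
print(hamming(to_ternary(x), to_ternary(y)), hamming(to_ternary(x), to_ternary(z)))  # 0 3
```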

      2) What do 5% difference and 10% difference look like?

      The authors used 5% difference and 10% difference as cutoffs. I am curious whether a 5% difference in colony size is obvious to human eyes. Can the authors show some plate images and label colonies that differ from the wild type by about 5% and 10%? It will help readers understand the thresholds used for determining whether a mutant has a phenotype.

      Showing the original ‘raw’ colonies would not be meaningful because all colony sizes have been grid-corrected as described (Kamrad et al. eLife 2020). The grid correction takes care of three issues: (1) it converts colony size into an easily interpretable value by reporting a ratio relative to wild type; (2) it makes results comparable across different plates/batches; and (3) it corrects for within-plate positional effects, which become apparent because the same wild-type grid strain shows different fitness in different plate positions. But in principle, detecting a 5% difference in colony size by eye would be hard, and multiple measurements are required (>10 repeats) to obtain statistically reliable results. Author response image 1 shows the grid colonies in red frames; the numbers at the bottom right of the colonies indicate the corrected effect sizes. Colony 17-8 (top right) is an example of a colony differing by 5% compared to neighbouring colonies 16-8 and 17-9.
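      A simplified illustration of the idea behind grid correction follows. It is not the procedure of Kamrad et al. (eLife 2020); it merely assumes that each mutant colony is divided by the mean size of its k nearest wild-type grid colonies, which turns raw sizes into ratios relative to wild type while absorbing positional effects. The plate layout, `k` and the colony sizes are hypothetical.

```python
import math


def grid_correct(sizes, grid_positions, k=2):
    """Divide each non-grid colony by the mean of its k nearest wild-type grid colonies.

    sizes: {(row, col): colony size}; grid_positions: set of wild-type (row, col).
    Returns {(row, col): size relative to the local wild-type reference}.
    """
    corrected = {}
    for pos, size in sizes.items():
        if pos in grid_positions:
            continue
        # Tie-break on position so the result does not depend on set ordering.
        nearest = sorted(grid_positions, key=lambda g: (math.dist(pos, g), g))[:k]
        reference = sum(sizes[g] for g in nearest) / len(nearest)
        corrected[pos] = size / reference
    return corrected


# Hypothetical 3 x 3 plate: wild-type grid colonies grow larger on the left edge.
grid = {(0, 0), (2, 0), (0, 2), (2, 2)}
sizes = {(0, 0): 120, (2, 0): 120, (0, 2): 80, (2, 2): 80,
         (1, 0): 114, (1, 2): 76}
# Raw sizes 114 and 76 both become 0.95, i.e. 5% smaller than the local wild type.
print(grid_correct(sizes, grid))
```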

      Author response image 1.

      3) How were the phenotyping conditions chosen?

      I am sure that the authors have put a lot of thought into designing the 131 phenotyping conditions. It will benefit the readers if the authors can explain how these conditions were chosen. For example, what literature precedents were considered and which conditions have never been examined before in S. pombe research? For drug treatment conditions, were pilot tests done to choose drug doses based on the growth inhibition effects on the wild type?

      We have used a wide range of different types of conditions that affect diverse processes (see colour legend on top of Fig. 3A). This was based on our previous experience and selection of conditions in large-scale phenotyping of wild strains (Jeffares et al. Nature Genetics 2015) and non-coding RNA mutants (Rodriguez-Lopez et al. eLife 2022). For previously applied conditions (e.g. oxidants), we used literature precedents for the doses, while for other conditions, we used trial and error to adjust the dose such that wild-type cell growth is barely inhibited. For some drugs and stresses, we assayed both low and high doses, in which wild-type cell growth is normal or inhibited, respectively, to uncover both sensitive and resistant mutants.

      Minor points

      1) One of the growth conditions is "YES_ethanol_1percent_no_glucose". I am curious how this is possible, as S. pombe cannot use ethanol as a carbon source.

      We assume that the cells contain sufficient internal glucose to fuel growth and division for a few cycles before running out of glucose. Thus, cells showed some residual growth on this medium, but growth is indeed very limited. Nevertheless, we could identify both sensitive and resistant mutants in this condition.

      2) Abstract "over 900 new proteins affected the resistance to oxidative stress". This sentence should be rephrased. Perhaps it is better to say "over 900 proteins were newly implicated in the resistance to oxidative stress".

      Yes, we have edited the sentence as suggested.

      3) Page 4 "S. pombe encodes 641 'unknown' genes (PomBase, status March 2023). " "Among these 643 unknown proteins, many are apparently found only in the fission yeast clade, but 380 are more widely conserved. " Which number is correct, 641 or 643?

      These numbers keep changing slightly. We now consistently use 641, the number from March 2023.

      4) Page 4 "These priority unstudied proteins have not been directly studied in any organism but can be assumed to have pertinent biological roles conserved over 500 million years of evolution. " According to http://timetree.org/, S. pombe and H. sapiens diverged about 1275 million years ago.

      We have now changed ‘over 500 million’ to ‘over 1000 million’, although there are of course different estimates for these times.

      5) "Using these potent wet and dry methods, we obtained 103,520 quantitative phenotype datapoints for 3,492 non-essential genes across 131 diverse conditions."

      I think "quantitative phenotype datapoints" are generated using wet methods, not dry methods.

      Yes, we have now deleted ‘Using these potent wet and dry methods,’ and now start the sentence with ‘We obtained…’

      6) Abstract "We assayed colony-growth phenotypes to measure the fitness of deletion mutants for all 3509 non-essential genes"

      Page 6 "We performed colony-based phenotyping of the deletion mutants for all non-essential S. pombe genes"

      It is not clear to me how the authors can claim that the 3509 non-essential genes correspond to "all non-essential S. pombe genes". The authors should explain how they classify S. pombe genes into essential genes and non-essential genes. The deletion project papers (Kim et al. 2010 and Hayles et al. 2013) provided binary classification for most but not all genes, as there are genes whose deletion mutants were not generated by the deletion project. PomBase does not use a binary classification and there are a number of genes deemed "Gene Deletion Viability: Depends on conditions" by PomBase.

      We used the latest deletion library (Bioneer Version 5) as well as additional deletion mutants published by Kathy Gould and colleagues, which together should capture all non-essential genes. But we agree that non-essentiality is not that clear-cut and context-dependent. So we have deleted ‘all’ in the two sentences highlighted above.

      7) Page 20 "Other clusters contained mostly genes involved in vacuolar/endosomal transport and peroxisome function, along with poorly characterized genes (Figure 6B)."

      This sentence needs rephrasing. Perhaps it is better to say "Cluster 31 and cluster 22 contained respectively mostly genes involved in vacuolar/endosomal transport and peroxisome function, along with poorly characterized genes (Figure 6B)."

      We have edited this sentence to ‘Cluster 31 and Cluster 22 contained mostly genes involved in vacuolar/endosomal transport and peroxisome function, respectively, along with poorly characterized genes (Figure 6B).’

      8) Legend of Figure 2-figure supplement 1A

      "Left: Volcano plot of mutant colony sizes for priority unstudied genes (green) and all other genes (grey) growing in rich medium. " I think "rich medium" should be "minimal medium".

      Yes, we have now corrected this.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the role of orexin receptors in dopamine neurons is studied. Considering the importance of both orexin and dopamine signalling in the brain, with critical roles in arousal and drug seeking, this study is important to understand the anatomical and functional interaction between these two neuromodulators. This work suggests that such interaction is direct and occurs at the level of SN and VTA, via the expression of OX1R-type orexin receptors by dopaminergic neurons.

      Strengths:

      The use of a transgenic line that lacks OX1R in dopamine-transporter-expressing neurons is a strong approach to dissecting the direct role of orexin in modulating dopamine signalling in the brain. The battery of behavioural assays to study this line provides a valuable source of information for researchers interested in the role of orexin-A in animal physiology.

      We thank the reviewer for summarizing the importance and significance of our study. 

      Weaknesses:

      The choice of methods to demonstrate the role of orexin in the activation of dopamine neurons is not justified and the quantification methods are not described with enough detail. The representation of results can be dramatically improved and the data can be statistically analysed with more appropriate methods.

      We have further improved our description of the methods in the revised reviewed preprint, and here in the response letter, we respond point-by-point to ‘Reviewer #1 (Recommendations For The Authors)’ below. 

      Reviewer #2 (Public Review):

      Summary:

      This manuscript examines the expression of orexin receptors in the midbrain - with a focus on dopamine neurons - and uses several fairly sophisticated manipulation techniques to explore the role of this peptide neurotransmitter in reward-related behaviors. Specifically, in situ hybridization is used to show that dopamine neurons predominantly express the orexin receptor 1 subtype and then go on to delete this receptor in dopamine neurons using a transgenic strategy. Ex vivo calcium imaging of midbrain neurons is used to show that in the absence of this receptor orexin is no longer able to excite dopamine neurons of the substantia nigra.

      The authors proceed to use this same model to study the effect of orexin receptor 1 deletion on a series of behavioral tests, namely, novelty-induced locomotion and exploration, anxiety-related behavior, preference for sweet solutions, cocaine-induced conditioned place preference, and energy metabolism. Of these, the most consistent effects are seen in the tests of novelty-induced locomotion and exploration in which the mice with orexin 1 receptor deletion are observed to show greater levels of exploration, relative to wild-type, when placed in a novel environment, an effect that is augmented after icv administration of orexin.

      In the final part of the paper, the authors use PET imaging to compare brain-wide activity patterns in the mutant mice compared to wildtype. They find differences in several areas both under control conditions (i.e., after injection of saline) as well as after injection of orexin. They focus on changes in the dorsal bed nucleus of stria terminalis (dBNST) and the lateral paragigantocellular nucleus (LPGi) and perform analysis of the dopaminergic projections to these areas. They provide anatomical evidence that these regions are innervated by dopamine fibers from the midbrain, are activated by orexin in control, but not mutant mice, and that dopamine receptors are present. Thus, they argue these anatomical data support the hypothesis that behavioral effects of orexin receptor 1 deletion in dopamine neurons are due to changes in dopamine signaling in these areas.

      Strengths:

      Understanding how orexin interacts with the dopamine system is an important question and this paper contains several novel findings along these lines. Specifically:

      (1) The distribution of orexin receptor subtypes in VTA and SN is explored thoroughly.

      (2) Use of the genetic model that knocks out a specific orexin receptor subtype from only dopamine neurons is a useful model and helps to narrow down the behavioral significance of this interaction.

      (3) PET studies showing how central administration of orexin evokes dopamine release across the brain is intriguing, especially since two key areas are pursued - BNST and LPGi - where the dopamine projection is not as well described/understood.

      We thank the reviewer for the careful summary and highlighting the novelty of our study.

      Weaknesses:

      The role of the orexin-dopamine interaction is not explored in enough detail. The manuscript presents several related findings, but the combination of anatomy and manipulation studies does not quite tell a cogent story. Ideally, one would like to see the authors focus on a specific behavioral parameter and show that one of their final target areas (dBNST or LPGi) was responsible or at least correlated with this behavioral readout. In addition, some more discussion on what the results tell us about orexin signaling to dopamine neurons under normal physiological conditions would be very useful. For example, what is the relevance of the orexin-dopamine interaction blunting novelty-induced locomotion under wildtype conditions?

      We agree that focusing on some orexin-dopamine targeting areas, such as dBNST or LPGi, is important to further reveal the anatomy-behavior links and underlying mechanisms. While we are very interested in further investigations, in the present manuscript we mainly aim to give an overview of the behavioral roles of orexin-dopamine interaction and to propose some promising downstream pathways in a relatively broad and systematic way. 

      We have explained the physiological significance of our results in more detail in the Discussion of the revised reviewed preprint (lines 282-293 and 318-332). Novelty-induced behavioral responses should be kept at appropriate levels under normal physiological conditions. The orexin-dopamine interaction blunting novelty-induced locomotion could help to keep attention on the main task without excessive distraction by other random stimuli in the environment. When this balance is disrupted, behavioral deficits may arise, as in attention deficit hyperactivity disorder (ADHD).  

      In some places in the Results, insufficient explanation and reporting is provided. For example, when reporting the behavioral effects of the Ox1 deletion in two bottle preference, it is stated that "[mutant] mice showed significant changes..." without stating the direction in which preference was affected.

      For the reward-related behaviors described in this study, we did not find significant changes between [mutant] and control mice. We agree that describing the behavioral tests in more detail will be helpful for readers. In the revised reviewed preprint, we have described in more detail in the Results and Materials and Methods sections how the control and [mutant] mice respond to the reward (lines 162-165, 171-181, 526-528).  

      The cocaine CPP results are difficult to interpret because it is unclear whether any of the control mice developed a CPP preference. Therefore, it is difficult to conclude that the knockout animals were unaffected by drug reward learning. Similarly, the sucrose/sucralose preference scores are also difficult to interpret because no test of preference vs. water is performed (although the data appear to show that there is a preference at least at higher concentrations, it has not been tested).

      We described the CPP analysis in the Materials and Methods section (lines 523-528) as below: ‘The percentage of time spent in the reward-paired compartment was calculated: 100 x time spent in the compartment / (total time - time spent in the middle area). The CPP score was then analyzed using the calculated percentage of time: 100 x (time on the test day – time on pre-test days)/ time on pre-test days. The pre-test and test days were before and after the conditioning, respectively. Thus, the CPP score above zero indicates that the CPP preference has developed.’ In Figure 2—figure supplement 4 C and F, it was shown that most control and knockout mice had a CPP score above zero. The control and knockout groups both developed a preference and there was no significant difference between the groups. 
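      The two formulas quoted above can be written as a short sketch. Averaging over the pre-test days is an assumption (the text refers to "pre-test days" without specifying how they are combined), and the session times below are invented for illustration.

```python
def percent_in_paired(time_paired, total_time, time_middle):
    """% of time in the reward-paired compartment, excluding the middle area."""
    return 100.0 * time_paired / (total_time - time_middle)


def cpp_score(test_percent, pretest_percents):
    """CPP score; a value above zero means a preference for the paired side.

    Assumption: the pre-test value is the mean over the pre-test days.
    """
    baseline = sum(pretest_percents) / len(pretest_percents)
    return 100.0 * (test_percent - baseline) / baseline


# Hypothetical 900 s sessions with 100 s spent in the middle area.
pre = [percent_in_paired(320, 900, 100), percent_in_paired(280, 900, 100)]  # 40.0, 35.0
test_day = percent_in_paired(480, 900, 100)                                 # 60.0
print(cpp_score(test_day, pre))  # 60.0
```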

      For the sucrose/sucralose preference tests, in Figure 2—figure supplement 4 A and D, we present values as the percentages of sucrose/sucralose consumption in total daily drinking amount (sucrose/sucralose solution + water). Thus, percentages above 50% indicate that mice prefer sucrose/sucralose to water. As shown in the figure, male mice only showed a weak preference for 0.5% sucrose compared to water, and under all other tested conditions, the mice showed a strong preference for the sweet solution. There was no significant difference between control and knockout mice. 
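      The preference measure, together with one way to address the reviewer's question of whether preference differs from chance, might look like this. The one-sample t statistic against 50% is offered as an illustration only; it is not an analysis reported in the response, and the intake volumes are hypothetical. A p-value could then be obtained with, e.g., `scipy.stats.ttest_1samp`.

```python
import math
import statistics


def preference_percent(sweet_ml, water_ml):
    """Sweet-solution intake as % of total daily fluid intake; 50% = no preference."""
    return 100.0 * sweet_ml / (sweet_ml + water_ml)


def t_vs_chance(preferences, chance=50.0):
    """One-sample t statistic testing mean preference against 50% (illustrative)."""
    se = statistics.stdev(preferences) / math.sqrt(len(preferences))
    return (statistics.mean(preferences) - chance) / se


# Hypothetical daily intakes (ml) of sweet solution and water for three mice.
prefs = [preference_percent(s, w) for s, w in [(3.0, 2.0), (3.5, 1.5), (4.0, 1.0)]]
print(prefs)                          # [60.0, 70.0, 80.0]
print(round(t_vs_chance(prefs), 2))   # 3.46
```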

      We have described this in more detail in the Results and Materials and Methods sections in the revised reviewed preprint. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 1, A-I. It is difficult to depict the anatomical subdivision of VTA in Figure 1, panels A and B. It is recommended to add a panel showing a schematic illustration of the SNc and subregions of VTA: PN, PIF, PBP, IF (providing more detail than in Figure 1, panel J). It is also recommended to show lower magnification images (as in Figure 1 - supplement 1), including both hemispheres, and to delineate the outline of the different subregions using curved lines, based on reference atlases (similar to Figure 1, panel I, please include distance from bregma). It would be helpful to indicate in Figure 1 that panel A is a control mouse and panel B is a Ox1RΔDAT mouse and include C-F letters to show corresponding insets. Anatomically, the parainterfascicular nucleus (PIF) is positioned between the paranigral nucleus (PN) and the parabrachial pigmented nucleus (PBP). The authors have depicted the PIF ventral to the PN in Figure 1 panels A, B, and I. These panels and the quantification of Ox1R/2R positive cells within the different subdivisions need to be corrected accordingly. The image analysis method used to quantify RNAscope fluorescent images is not described in sufficient detail. Please expand this section.

      According to the reviewer’s suggestions, we have refined Figure 1 in the revised reviewed preprint. We are now showing the schematic illustration of the SN and subregions of VTA in panel I, with blue squares to label the regions shown in panels A and B, and the distance from bregma is included. The outlines to delineate SN and the subregions of VTA are adjusted from straight to curved lines based on reference atlases. As suggested, we have also indicated that panel A is a control and panel B is a Ox1RΔDAT mouse and included C-F letters to show corresponding insets. We apologize for the mistake in labeling the PIF and PN positions in Figure 1A. We have corrected the labeling of their positions and double-checked the quantification accordingly. This does not change our discussion or conclusion since both PIF and PN are the medial part of VTA, where both Ox1R and Ox2R are observed. The description of the image analysis in the Materials and Methods section has been improved (lines 378-385). We decided not to show lower magnification images than in Figure 1—supplement 1 to include both hemispheres, in the interests of clarity and reader-friendliness.  

      (2) Figure 1, J-L. The claim that orexin activates dopaminergic SN and VTA neurons is weakly supported by the data provided. Calcium imaging of SN dopaminergic neurons in control mice suggests a discrete effect of 100 nM orexin-A application compared to baseline. Application of 300 nM shows a slightly bigger effect, but none of these results are statistically analysed. 

      We are surprised by this comment and thank the reviewer for pointing out our apparent lack of clarity in the previous version (lines 96-106 and legend of Figure 1K, L). We now explain the data analysis in more detail in the new version (lines 119-133 and 451-465, and the legends of Figure 1K, L and Figure 1—figure supplement 3).

      The main goal of this part of the project was to functionally validate the Ox1R knockout in dopaminergic (DAT-expressing) neurons. This was a prerequisite for the behavioral and PET imaging experiments. We used GCaMP-mediated Ca2+ imaging in acute brain slices to reach this goal. This analysis was performed on the dopaminergic SN neurons, which we used as an "indicator population" because a large number of these neurons express Ox1R, but only a few express Ox2R. 

      The analysis consisted of two parts:

      a) For each neuron, we tested whether it responded to orexin A. At the single-cell level, a neuron was considered orexin A-responsive if the change in fluorescence induced by orexin A was three times larger than the standard deviation of the baseline fluorescence (3σ criterion), corresponding to a Z-score of 3. We found that 56% of the neurons tested responded to orexin A, while 44% of the neurons did not respond to orexin A (Figure 1L, top). These data agree with the number of Ox1R-expressing neurons (Figure 1J). 

      b) We also determined the orexin A-induced GCaMP fluorescence for each neuron, expressed as a percentage of GCaMP fluorescence induced upon application of high K+ saline. Accordingly, the "population response" of all analyzed neurons was expressed as the mean ± SEM of these responses. The significance of this mean response was tested for each group (control and Ox1R KO) using a one-sample t-test. We found a marked and highly significant (p < 0.0001, n = 71) response of control neurons to 100 nM orexin A, while the Ox1R KO neurons did not respond (p = 0.5, n = 86). Note that, as described in a), 44% of the neurons contributing to the mean do not respond to orexin. Thus, the orexin responses of most responders are significantly higher than the mean. This is also evident in the example recordings in Figure 1K and Figure 1—figure supplement 3. The orexin A-induced change in fluorescence was increased by increasing the orexin A concentration to 300 nM.
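      The two analysis steps described in a) and b) can be sketched as follows. The sketch assumes that each neuron is summarized by a baseline trace, a peak value during orexin A application and a peak value during high-K+ saline; the use of the sample standard deviation and all function names are assumptions. For a p-value, `scipy.stats.ttest_1samp(values, 0)` would be a standard alternative to the hand-computed t statistic.

```python
import math
import statistics


def is_responder(baseline, orexin_peak, n_sigma=3.0):
    """3-sigma criterion: the orexin A-induced change must exceed n_sigma x SD of baseline.

    Assumption: sample SD of the baseline trace; change measured from the baseline mean.
    """
    change = orexin_peak - statistics.mean(baseline)
    return change > n_sigma * statistics.stdev(baseline)


def percent_of_high_k(baseline_mean, orexin_peak, high_k_peak):
    """Orexin A response as % of the same neuron's high-K+ saline response."""
    return 100.0 * (orexin_peak - baseline_mean) / (high_k_peak - baseline_mean)


def one_sample_t(values, mu0=0.0):
    """t statistic for H0: mean(values) == mu0 (population response test)."""
    se = statistics.stdev(values) / math.sqrt(len(values))
    return (statistics.mean(values) - mu0) / se


baseline = [1.0, 1.1, 0.9, 1.0, 1.0]        # hypothetical GCaMP F values
print(is_responder(baseline, 1.5))          # True  (change 0.5 > 3 x ~0.071)
print(is_responder(baseline, 1.1))          # False (change ~0.1)
print(percent_of_high_k(1.0, 1.5, 3.0))     # 25.0
```

      Because non-responders enter the population mean with values near zero, the mean normalized response is necessarily smaller than that of the responders alone, which is the point made in b).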

      Note: As mentioned above, the orexin A response was expressed for each neuron individually as a percentage of its high-K+ saline-induced GCaMP fluorescence. This value is a solid reference point, reflecting the GCaMP fluorescence at maximal voltage-activated Ca2+ influx. Obviously, the Ca2+ concentration at this point is extremely high and not typically reached under physiological conditions. Therefore, as shown in Figure 1—figure supplement 3 for completeness, the physiologically relevant responses may appear relatively minor at first glance when presented together in one figure (compare Figure 1—figure supplement 3 A and B).

      The authors should provide more evidence of the orexin-induced activation of dopaminergic neurons in the SN to support this claim and investigate whether a similar activation is observed in VTA neurons. 

      Following the reviewer's suggestion, we confirmed orexin A-induced activation of dopaminergic neurons in the mouse SN by using perforated patch clamp recordings (Figure 1—figure supplement 2).

      This finding is consistent with previous extracellular in vivo recordings in rats (Liu et al., 2018).

      The activation of dopaminergic neurons in the mouse VTA by orexin A has been shown repeatedly in earlier studies (e.g., Baimel et al., 2017; Korotkova et al., 2003; Tung et al., 2016).

      In addition, Figure 3-Figure Supplement 2 shows that injection of orexin does not induce c-Fos expression in SN and VTA dopaminergic neurons of control and Ox1RΔDAT mice, which further weakens the claim made by the authors.

      Figure 3—Figure Supplement 2 in the original submission is now Figure 3—Figure Supplement 3 in the revised reviewed preprint. It shows low c-Fos expression in SN and VTA dopaminergic neurons, and orexin-induced c-Fos expression was observed in Th-negative cells in SN and VTA. 

      c-Fos analysis is technically relatively straightforward and is widely (and successfully) used to reveal neuronal activation. However, this approach has limitations, e.g., regarding sensitivity and temporal resolution. Electrophysiological or optical imaging techniques can circumvent these shortcomings. The electrophysiological and Ca2+ imaging studies presented here, along with previous electrophysiological studies by others, clearly show that orexin A acutely and directly stimulates SN and VTA dopaminergic neurons.

      In vivo, the injection of orexin A induced pronounced c-Fos expression in non-dopaminergic cells of the VTA and SN but not in dopaminergic neurons. This result shows that c-Fos detection worked in principle. Whether the absence of c-Fos staining in dopaminergic neurons is due to a lack of sensitivity, whether other IEGs would have worked better here, or whether there are other (e.g., cell type-specific) reasons for the absence of staining cannot be determined from the current data.

      (3) Figure 2, I-L. The fact that ICV injection of both saline and orexin causes a sustained increase of locomotion (around 20 minutes in males, and over 30 minutes in females) is problematic and could mask the effects of orexin, particularly in females. It is unclear what panels J and L are showing. To be appropriately analysed, the authors should plot the pre- and post-injection AUC data for all groups and analyse it as a two-way mixed ANOVA, with the within-subjects factor "pre/post injection activity" and between-subjects factor "group". The authors can only warrant a statistically meaningful hyperlocomotor effect in Ox1RΔDAT mice if a significant interaction is found.

      Although mice were habituated to the injection, some injection-induced increase in locomotion is still expected. As described in the figure legend, the AUC was calculated for the period after orexin injection, i.e., 5 – 90 min in Figure 2 I, K. We clearly observed significant differences between genotypes and between saline and orexin application, which means that the effects of genotype and orexin are strong enough to emerge despite the injection effect. 

      As the reviewer suggests, we have now plotted the pre- and post-injection AUC data for all groups and analyzed them with a two-way mixed ANOVA, with the within-subjects factor "pre/post injection activity" and between-subjects factor "group". To match the pre- and post-injection durations, we now compare the AUC for around 60 min before and after the injection. A significant interaction was found. Panels I-L have been updated, and the differences induced by Ox1R knockout and orexin confirm the results shown in the initially submitted manuscript.  
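      When the between-subjects factor has only two levels, the interaction of a 2 (group) × 2 (pre/post) mixed ANOVA is equivalent to comparing post-minus-pre change scores between the groups: the interaction F(1, n₁ + n₂ - 2) equals the square of a pooled-variance two-sample t. The sketch below uses that equivalence for illustration only; the trapezoidal AUC, the time bins and the example numbers are assumptions, and a design with more than two groups needs a full mixed ANOVA.

```python
import math
import statistics


def auc(times, values):
    """Trapezoidal area under a locomotion time course."""
    total = 0.0
    for i in range(1, len(times)):
        total += (times[i] - times[i - 1]) * (values[i] + values[i - 1]) / 2.0
    return total


def interaction_f(change_a, change_b):
    """Interaction F for a 2 (group) x 2 (pre/post) mixed design.

    change_a / change_b: post-minus-pre AUC per mouse in each group.
    F(1, n_a + n_b - 2) equals the squared pooled-variance two-sample t.
    """
    n_a, n_b = len(change_a), len(change_b)
    pooled = ((n_a - 1) * statistics.variance(change_a)
              + (n_b - 1) * statistics.variance(change_b)) / (n_a + n_b - 2)
    t = (statistics.mean(change_a) - statistics.mean(change_b)) / math.sqrt(
        pooled * (1 / n_a + 1 / n_b))
    return t * t


# Hypothetical per-mouse change scores (post AUC - pre AUC) in two groups.
print(auc([0, 1, 2], [0, 2, 2]))                      # 3.0
print(round(interaction_f([2, 4, 6], [0, 1, 2]), 6))  # 5.4
```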

      (4) Figure 3. The literature has robustly shown that one of the main projection areas of VTA and SN dopaminergic neurons is the striatum, in particular its ventral part. It is surprising to see that this region is not affected by the lack of OX1R or by the injection of orexin. How can the authors explain that identified regions with significantly different activity include neighbouring brain structures with heterogenous composition? See for example, in panel A, section bregma 0.62mm, a significant region is seen expanding across the cortex, corpus callosum, and striatum. While the data from PET studies is potentially interesting, it may not be adequate to provide enough resolution to allow examination of the anatomical distribution of orexin-mediated neuronal activation.

      While the striatum is a major projection area of dopaminergic neurons in the VTA and SN, the projections and function of Ox1R-positive dopaminergic neurons are not clear. We have improved the description of the functional diversity of dopaminergic neurons in the revised reviewed preprint (lines 46-58), and it was previously reported that projection-defined dopaminergic populations in the VTA exhibit different responses to orexin A (Baimel et al., 2017). Moreover, striatal activity is also modulated indirectly via other brain regions affected by Ox1R-positive dopaminergic neurons. It is unknown how striatal activity should change after Ox1R deletion in dopaminergic neurons. We cannot rule out the possibility that the striatum is indeed modulated by Ox1R-positive dopaminergic neurons, though there was only a trend toward a genotype difference (Ox1RΔDAT vs. ctrl) in the ventral striatum in the section at bregma 1.42 mm in Figure 3A. ICV injection of orexin potentially acts on Ox1R and Ox2R throughout the brain, so projections from other brain regions to the striatum also affect striatal activity and could have masked the effect of Ox1R-positive dopaminergic neurons. 

      The spatial resolution of the PET data is on the order of 1 mm³. As we also explained in the Materials and Methods section, the voxel size of the original PET data is 0.4 mm × 0.4 mm × 0.8 mm, and all calculations were performed on this grid. The higher-resolution images shown in Figure 3 are for presentation purposes only, following a request from a reviewer of the Jais et al. 2016 manuscript. To make this clearer, we have now added the p-map images with the original voxel size to the supplement (Figure 3—figure supplement 1). For specific brain areas, more precise identification of anatomical sub-regions requires methods with higher spatial resolution, such as staining of brain slices for c-Fos-positive cells as we do in Figure 4.
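      The voxel dimensions quoted above translate into the following volume; the calculation uses only the numbers given in the text and shows that a resolution element of about 1 mm³ spans roughly eight original voxels:

```latex
V_{\mathrm{voxel}} = 0.4\,\mathrm{mm} \times 0.4\,\mathrm{mm} \times 0.8\,\mathrm{mm} = 0.128\,\mathrm{mm}^3,
\qquad
\frac{1\,\mathrm{mm}^3}{V_{\mathrm{voxel}}} = \frac{1}{0.128} \approx 7.8
```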

      PET is a powerful tool to identify global regions of activation/inhibition. In the manuscript, we have described in the Results and Discussion sections that the activity in brain regions with related functions was changed. In panel A, Ox1RΔDAT mice showed increased activity in MPA, Pir and endopiriform claustrum, which are important for olfactory sensation; spinal trigeminal nucleus, sp5, and IRt, which regulate mastication and sensation of the oral cavity and the surface of the face; and SubCV and Gi, which regulate sleep and motion-related arousal and motivation. In panel B, changes in HDB, MCPO, Pir, DEn, S1, V2L and V1 are related to sensation, and changes in BNST, LPGi and M2 are important for emotion, exploration, and action selection. 

      (5) Figure 4. As in Figure 1, the authors should consider including a schematic illustration of the brain areas that are being analysed using a reference atlas. It is also recommended to provide more details describing the quantification of the images. Without such information, the data is not convincing, in particular, the claim that Ox1R depletion causes a decrease in DRD1 in BNST is unclear. Additional unbiased quantitative approaches could be used to strengthen this point.

      We have added Figure 4—figure supplement 1 as a schematic illustration, based on a reference atlas, of the brain areas analyzed. More details describing the unbiased quantification of the images have been added to the Materials and Methods. We have also added Figure 4—figure supplement 3 to show the DRD1, DRD2 and merged signals separately.

      (6) The discussion starts by stating that the main findings of this study are based on RNAscope and optophysiological experiments, however, the latter are not presented anywhere in the manuscript. This sentence (line 192) should be revised. The authors state in line 193 that OX1R is the only orexin receptor in the SN, but they show in Figure 1 that in the SN, 3% of neurons express OX2R and 2% co-express both receptors. 

      We thank the reviewer for the input. We have rephrased the beginning of the Discussion to clarify the objectives (lines 238-246). In doing so, we changed "optophysiological experiments" and "single orexin receptor" (lines 192 and 193 in the original manuscript) to "Ca2+ imaging experiments" and "main subtype of orexin receptors", respectively. Note that Ca2+ imaging is considered an optophysiological method: optophysiology generally refers to techniques that combine optical methods with physiological measurements.

      The results of LPGi and BNST dopamine receptors in control and Ox1RΔDAT mice are poorly discussed. The authors should justify why these two regions were selected for further validation and how these may be related to the behavioural effects found in Ox1RΔDAT regarding exposure to a novel context.

      Ox1RΔDAT mice exhibited increased novelty- and orexin-induced locomotion compared with control mice. After orexin injection, PET imaging showed that neural activity in the BNST was lower, and in the LPGi higher, than in control mice. We selected the BNST and LPGi for further validation because we think their key roles in regulating emotion, exploratory behavior and locomotor speed are related to novelty-induced locomotion. We confirmed the changes in neural activity by c-Fos staining and investigated the expression patterns of dopamine receptors in the BNST and LPGi. Our findings suggest that Ox1R deletion in dopaminergic neurons results in the disinhibition of neural activity in the LPGi via dopaminergic pathways and a decrease in dopamine-mediated neural activity in the BNST. Emotion perception affects the decision of how to respond to novelty. It is possible that novelty activates the orexin system and that Ox1R signaling in dopaminergic neurons promotes emotion perception and inhibits exploration. Further careful investigation is necessary to test this hypothesis in future experiments. We have improved the description of the rationale and the discussion in the ‘Results’ and ‘Discussion’ sections of the revised reviewed preprint (lines 210-213, 259-270, 293-308).

      Reviewer #2 (Recommendations For The Authors):

      A major recommendation - if possible - would be to directly show that one or both of the two target areas - dBNST and LPGi - are associated with the behavioral effects caused by the deletion of the orexin receptor 1 in dopamine neurons.

      We completely agree that it would be very valuable to directly show dBNST and LPGi are associated with the behavioral effects caused by the deletion of Ox1R in dopaminergic neurons. While we are very interested in carefully investigating specific orexin-dopamine targeting areas and related neural circuits in the future, in the present manuscript, we mainly aim to give an overview of the behavioral roles of orexin-dopamine interaction and propose some promising downstream pathways. 

      The authors should state if data are corrected for multiple comparisons, e.g., in the PET study of different regions.

      We have included information about the post hoc tests for all 2-way ANOVA analyses in the submitted manuscript. For the PET study, the p-values in the p-maps were not corrected for multiple comparisons; Figure 3—figure supplement 2 shows the raw data of each mouse and the analysis method (t-test). In the revised reviewed preprint, we include the information on the analysis method in the legend of Figure 3.

      We consider that saline and orexin injections mimic the resting and active states of mice, respectively, and we wanted to study the genotype effect under each condition. A 2-way ANOVA takes into account the difference between orexin and saline injection, which could mask a genotype effect under a given condition. Therefore, we performed t-tests for each condition in Figure 3. While Figure 3—figure supplement 2 gives readers full information with the raw data of each individual mouse, below we present the p-maps after correction for multiple comparisons (Sidak’s post hoc test). After correction, changes appear in brain regions similar to those in Figure 3, although significance is reduced, and under the orexin-injection condition we no longer see significantly higher activity around the lateral paragigantocellular nucleus (LPGi), the nucleus of the horizontal limb of the diagonal band (HDB) or the magnocellular preoptic nucleus (MCPO) in Ox1RΔDAT mice. To identify the anatomical locations more precisely, we performed additional experiments to confirm the changes revealed by PET. For example, the LPGi is a relatively small region, and its change in activity was confirmed and localized more precisely by c-Fos immunostaining (Figure 4A, C).
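      As a generic illustration of what a Šidák correction does (not the authors' actual analysis pipeline, which the response describes only as Sidak’s post hoc test), the sketch below adjusts a family of m raw p-values with p_adj = 1 - (1 - p)^m. The function names, the family size m = 2 (genotype comparison after saline and after orexin A) and the example p-values are all hypothetical. A full post hoc test after a 2-way ANOVA typically also pools variance across groups, which this sketch does not model.

```python
def sidak_adjust(p_values, m=None):
    """Sidak-adjust a family of p-values (hypothetical helper).

    Each raw p is mapped to 1 - (1 - p)**m, where m is the number of
    comparisons in the family (defaults to len(p_values)).
    """
    if m is None:
        m = len(p_values)
    if m < 1:
        raise ValueError("family size m must be >= 1")
    adjusted = []
    for p in p_values:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
        adjusted.append(min(1.0, 1.0 - (1.0 - p) ** m))
    return adjusted


def sidak_alpha(alpha, m):
    """Per-comparison threshold that keeps the family-wise error rate at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


if __name__ == "__main__":
    # Hypothetical raw p-values for one voxel: genotype effect after saline
    # and after orexin A (values invented for illustration only).
    raw = [0.01, 0.04]
    print([round(p, 4) for p in sidak_adjust(raw)])  # [0.0199, 0.0784]
    print(round(sidak_alpha(0.05, 2), 4))            # 0.0253
```

      Because 1 - (1 - p)^m is never smaller than p, the correction can only raise p-values, which is consistent with fewer regions remaining significant after correction.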

      Author response image 1.

      PET imaging studies comparing Ox1RΔDAT and control mice, with Sidak’s post hoc test to correct for multiple comparisons. 3D maps of p-values in PET imaging studies comparing Ox1RΔDAT and control mice, after intracerebroventricular (ICV) injection of (A) saline (NS) and (B) orexin A. Control-NS, n = 8; control-orexin, n = 6; Ox1RΔDAT, n = 8. M2, secondary motor cortex; MPA, medial preoptic area; Pir, piriform cortex; IEn, intermediate endopiriform claustrum; DEn, dorsal endopiriform claustrum; VEn, ventral endopiriform claustrum; LSS, lateral stripe of the striatum; BNST, the dorsal bed nucleus of the stria terminalis; S1Sh, primary somatosensory cortex, shoulder region; S1HL, primary somatosensory cortex, hindlimb region; S1BF, primary somatosensory cortex, barrel field; S1Tr, primary somatosensory cortex, trunk region; V1, primary visual cortex; V2L, secondary visual cortex, lateral area; SubCV, subcoeruleus nucleus, ventral part; Gi, gigantocellular reticular nucleus; IRt, intermediate reticular nucleus; sp5, spinal trigeminal tract.

      Provide a rationale for following up on BNST and LPGi and not any of the regions identified in the PET study.

      We thank the reviewer for the careful reading and important input. Ox1RΔDAT mice exhibited increased novelty- and orexin-induced locomotion compared with control mice. After orexin injection, PET imaging showed that neural activity in the BNST was lower, and in the LPGi higher, than in control mice.

      We selected the BNST and LPGi for further validation because we think their key roles in regulating emotion, exploratory behavior and locomotor speed are related to novelty-induced locomotion. We confirmed the changes in neural activity by c-Fos staining and investigated the expression patterns of dopamine receptors in the BNST and LPGi. Our findings suggest that Ox1R deletion in dopaminergic neurons results in the disinhibition of neural activity in the LPGi via dopaminergic pathways and a decrease in dopamine-mediated neural activity in the BNST. Emotion perception affects the decision of how to respond to novelty. It is possible that novelty activates the orexin system and that Ox1R signaling in dopaminergic neurons promotes emotion perception and inhibits exploration. Further investigation is necessary to test this hypothesis in the future. We have improved the description of the rationale and the discussion in the ‘Results’ and ‘Discussion’ sections of the revised reviewed preprint (lines 210-213, 259-270, 293-308).

      Heatmap in Fig. 1K should not have smoothing across the y-axis, individual cells should be discrete.

      We thank the reviewer for bringing this issue to our attention. The data had not been intentionally smoothed across either the x- or the y-axis; the apparent smoothing was probably a formatting issue. We have corrected this and separated individual cell traces with lines (Figure 1K, Figure 1—figure supplement 3).

      Dopamine cells are well known to lack Fos expression in most cases. Did the authors consider using another IEG to show neural activation, e.g., pERK?

      We did not use another IEG. The electrophysiological and Ca2+ imaging studies presented here, along with previous electrophysiological studies by others, clearly show that orexin A acutely and directly stimulates SN and VTA dopaminergic neurons. Please see also the response to a related comment of Reviewer 1.

      Consider adding a lower magnification section to anatomical figures to aid the reader in orienting and identifying the location.

      We have added schematic illustrations of the SN, VTA, BNST and LPGi in Figure 1I and Figure 4—figure supplement 1. We hope these help the reader to orient and identify the locations.

      Data availability should be stated.

      There are no restrictions on data availability. We have added this section to the revised reviewed preprint.

      Line 50. Some more references both historical and recent could be given to support this statement about the function of dopamine.

      We have improved the description and references supporting the statement about dopamine function, citing recent studies and several reviews in the revised reviewed preprint (lines 46-58).

      The PET data (Fig. 3) might be easier to visualize and interpret if a white background was used. In addition, is there a more refined way of presenting the data in Fig 3, S1?

      It is common to present imaging data such as PET and MRI on a black background. We have also applied this color scheme in multiple previous publications and would therefore prefer to keep it.

      While Figure 3 is a concise way to present PET data, we aim to show the original results of each individual mouse in Figure 3—figure supplement 2 and to demonstrate how we performed the statistical analysis. Therefore, we take an example voxel from the respective brain area, perform the t-test, and present the data as bars with individual dots.

      Line 97. State what type of Ca imaging here, e.g., "we performed Ca imaging in ex vivo slices of VTA and SN".

      As the reviewer suggested, we have specified the type of Ca2+ imaging (line 112).

      Line 165. State which groups this post-mortem analysis was performed on and if any differences were to be found (not expected to find differences in this anatomical tracing experiment but good to report this as both groups were used).

      Postmortem analysis of c-Fos staining revealed low c-Fos expression in dopaminergic neurons in the VTA and SN of Ox1RΔDAT and control mice after ICV injection of saline or orexin A (1 nmol). No obvious changes were observed among the groups. We have improved the description in the revised reviewed preprint (lines 202-208).

      Line 192. What do you mean by optophysiological here? The Ca imaging (which is a fairly small, confirmatory element of the manuscript).

      We have changed ‘optophysiological experiments’ (line 192 in the initially submitted manuscript) to ‘calcium imaging experiments’ and rephrased the beginning of the Discussion to clarify the objectives (lines 238-246).

      The protein level in the diet is substantially higher than in most rodent diets (34% here vs 14-20% in most commercial rodent chows). Please comment on this.

      This diet is for rat and mouse maintenance, purchased from ssniff Spezialdiäten GmbH (product V1554).

      The percentage of calories supplied by protein depends on the calculation method. The manufacturer previously calculated it with an equation for pigs, which gave 34% in the old data sheet. In the new data sheet, the value has been updated to 23%, calculated with Atwater factors. We thank the reviewer for the reminder and have updated the values in the revised reviewed preprint (lines 314-316).
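      To illustrate why the reported protein share depends on the energy equation, the sketch below computes each macronutrient's share of metabolizable energy with the general Atwater factors (4 kcal/g for protein and carbohydrate, 9 kcal/g for fat). The composition values are invented round numbers, not the V1554 specification, and the manufacturer may use refined or specific factors; any equation that assigns different energy values per gram will yield a different percentage from the same composition.

```python
# General Atwater factors (kcal per gram of macronutrient).
ATWATER_KCAL_PER_G = {"protein": 4.0, "carbohydrate": 4.0, "fat": 9.0}


def energy_shares(grams_per_100g):
    """Return each macronutrient's share of metabolizable energy, in percent."""
    kcal = {
        nutrient: grams * ATWATER_KCAL_PER_G[nutrient]
        for nutrient, grams in grams_per_100g.items()
    }
    total = sum(kcal.values())
    return {nutrient: 100.0 * value / total for nutrient, value in kcal.items()}


if __name__ == "__main__":
    # Hypothetical composition in g per 100 g of diet (NOT the V1554 values).
    diet = {"protein": 20.0, "fat": 4.0, "carbohydrate": 50.0}
    shares = energy_shares(diet)
    print({k: round(v, 1) for k, v in shares.items()})
    # {'protein': 25.3, 'fat': 11.4, 'carbohydrate': 63.3}
```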

      Editor's note:

      Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.

      We have provided the source data and the statistical reporting for each figure with the revision.

      References

      Baimel, C., Lau, B. K., Qiao, M., & Borgland, S. L. (2017). Projection-target-defined effects of orexin and dynorphin on VTA dopamine neurons. Cell Rep, 18(6), 1346-1355.  https://doi.org/10.1016/j.celrep.2017.01.030

      Korotkova, T. M., Eriksson, K. S., Haas, H. L., & Brown, R. E. (2002). Selective excitation of GABAergic neurons in the substantia nigra of the rat by orexin/hypocretin in vitro. Regul Pept, 104(1-3), 83-89. https://doi.org/10.1016/s0167-0115(01)00323-8 

      Korotkova, T. M., Sergeeva, O. A., Eriksson, K. S., Haas, H. L., & Brown, R. E. (2003). Excitation of ventral tegmental area dopaminergic and nondopaminergic neurons by orexins/hypocretins. J Neurosci, 23(1), 7-11. https://www.ncbi.nlm.nih.gov/pubmed/12514194

      Liu, C., Xue, Y., Liu, M. F., Wang, Y., Liu, Z. R., Diao, H. L., & Chen, L. (2018). Orexins increase the firing activity of nigral dopaminergic neurons and participate in motor control in rats. J Neurochem, 147(3), 380-394. https://doi.org/10.1111/jnc.14568 

      Tung, L. W., Lu, G. L., Lee, Y. H., Yu, L., Lee, H. J., Leishman, E., Bradshaw, H., Hwang, L. L., Hung, M. S., Mackie, K., Zimmer, A., & Chiou, L. C. (2016). Orexins contribute to restraint stress-induced cocaine relapse by endocannabinoid-mediated disinhibition of dopaminergic neurons. Nat Commun, 7, 12199. https://doi.org/10.1038/ncomms12199

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank all of the reviewers for their helpful comments and the effort they made in reading and evaluating our manuscript. In response, we have made major changes to the text and figures and performed substantial new experiments. These new data and changes have substantially strengthened the manuscript. We believe that the manuscript is now strong in both its impact and scope, and we hope that the reviewers will find it suitable for publication in eLife.

      A point-by-point response to the reviewers' specific comments is provided below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this report, Yu et al ascribe potential tumor suppressive functions to the non-core regions of RAG1/2 recombinases. Using a well-established BCR-ABL oncogene-driven system, the authors model the development of B cell acute lymphoblastic leukemia in mice and found that RAG mutants lacking non-core regions show accelerated leukemogenesis. They further report that the loss of non-core regions of RAG1/2 increases genomic instability, possibly caused by increased off-target recombination of aberrant RAG-induced breaks. The authors conclude that the non-core regions of RAG1 in particular not only increase the fidelity of VDJ recombination, but may also influence the recombination "range" of off-target joints, and that in the absence of the non-core regions, mutant RAG1/2 (termed cRAGs) catalyze high levels of off-target recombination leading to the development of aggressive leukemia.

      Strengths:

      The authors used a genetically defined oncogene-driven model to study the effect of RAG non-core regions on leukemogenesis. The animal studies were well performed and generally included a good number of mice. Therefore, the finding that cRAG expression led to the development of more aggressive BCR-ABL+ leukemia compared to fRAG is solid.

      Weaknesses:

      In general, I find the mechanistic explanation offered by the authors to explain how the non-core regions of RAG1/2 suppress leukemogenesis to be less convincing. My main concern is that cRAG1 and cRAG2 are overexpressed relative to fRAG1/2. This raises the possibility that the observed increased aggressiveness of cRAG tumors compared to fRAG tumors could be solely due to cRAG1/2 overexpression, rather than any intrinsic differences in the activity of cRAG1/2 vs fRAG1/2; and indeed, the authors allude to this possibility in Fig S8, where it was shown that elevated expression of RAG (i.e. fRAG) correlated with decreased survival in pediatric ALL. Although it doesn't mean the authors' assertions are incorrect, this potential caveat should nevertheless be discussed.

      We appreciate the valuable suggestions from the reviewer. BCR-ABL1+ B-ALL is characterized by halted early B-lineage differentiation. In BCR-ABL1+ B cells, RAG recombinases are highly expressed, leading to the inactivation of genes that encode essential transcription factors for B-lineage differentiation. This results in cells being trapped within the precursor compartment, thereby elevating RAG gene expression. We interpret our data to suggest that, in BCR-ABL1+ B-ALL mouse models, the high expression of both cRAG and fRAG and the deletion of the non-core regions influence the precision of RAG targeting within the genome. This causes more genomic damage in cRAG tumors than in fRAG tumors, consequently leading to the observed increased aggressiveness of cRAG tumors compared with fRAG tumors. We discuss this issue on page 12, lines 295-307, of the revised manuscript.

      Some of the conclusions drawn were not supported by the data.

      (1) I'm not sure that the authors can conclude based on μHC expression that there is a loss of pre-BCR checkpoint in cRAG tumors. In fact, Fig. 2B showed that the differences are not statistically significant overall, and more importantly, μHC expression should be detectable in small pre-B cells (CD43-). This is also corroborated by the authors' analysis of VDJ rearrangements, showing that it has occurred at the H chain locus in cRAG cells.

      We appreciate the insightful comment from the reviewer. Upon reevaluation of the data presented in Fig. 2B, we identified and rectified certain errors. The revised analysis now shows that the differences in μHC expression are statistically significant. This significant expression of μHC in fRAG leukemic cells implies that these cells may progress further in differentiation, potentially acquiring an immune phenotype. These modifications have been incorporated into the manuscript on page 7, lines 153-156 in the revised manuscript.

      (2) The authors found a high degree of polyclonal VDJ rearrangements in fRAG tumor cells but a much more limited oligoclonal VDJ repertoire in cRAG tumors. They concluded that this explains why cRAG tumors are more aggressive because BCR-ABL induced leukemia requires secondary oncogenic hits, resulting in the outgrowth of a few dominant clones (Page 19, lines 381-398). I'm not sure this is necessarily a causal relationship since we don't know if the oligoclonality of cRAG tumors is due to selection based on oncogenic potential or if it may actually reflect a more restricted usage of different VDJ gene segments during rearrangement.

      Thank you for your insightful comments and questions regarding the relationship between the oligoclonality of V(D)J rearrangements and the aggressiveness of cRAG tumors. You raise an important point regarding whether the observed oligoclonality is a result of selective pressure favoring clones with specific oncogenic potential, or if it reflects inherent limitations in V(D)J segment usage during rearrangement in cRAG models. In our study, we observed a marked difference in the V(D)J rearrangement patterns between fRAG and cRAG tumor cells, with cRAG tumors exhibiting a more limited, oligoclonal repertoire. This observation led us to speculate that the aggressive nature of cRAG tumors might be linked to a selective advantage conferred by specific V(D)J rearrangements that cooperate with the BCR-ABL1 oncogene to drive leukemogenesis. However, we acknowledge that our current data do not definitively establish a causal relationship between oligoclonality and tumor aggressiveness. The restricted V(D)J repertoire in cRAG tumors could indeed be due to a more constrained rearrangement process, possibly influenced by the altered expression or function of RAG1/2 in the absence of non-core regions. This could limit the diversity of V(D)J rearrangements, leading to the emergence of a few dominant clones not necessarily because they have greater oncogenic potential, but because of a narrowed field of rearrangement possibilities.

      To address this question more thoroughly, future studies could examine the functional consequences of specific V(D)J rearrangements found in dominant cRAG tumor clones. This could include assessing the oncogenic potential of these rearrangements in isolation and in cooperation with BCR-ABL1, as well as exploring the mechanistic basis for the restricted V(D)J repertoire. Such studies would provide deeper insight into the interplay between RAG-mediated recombination, clonal selection, and leukemogenesis in BCR-ABL1+ B-ALL.

      We appreciate your feedback on this matter and agree that further investigation is required to unravel the precise relationship between V(D)J rearrangement diversity and leukemic progression in cRAG models. We have revised our discussion to reflect these considerations and to clarify the speculative nature of our conclusions regarding the link between oligoclonality and tumor aggressiveness (page 7, lines 166-170, of the revised manuscript).

      (3) What constitutes a cancer gene can be highly context- and tissue-dependent. Given that there is no additional information on how any putative cancer gene was disrupted (e.g., truncation of regulatory or coding regions), it is not possible to infer whether increased off-target cRAG activity really directly contributed to the increased aggressiveness of leukemia.

      We fully agree with the issues the reviewer raised. In Supplementary Table 3, we present data on off-target gene disruptions, specifically in introns, exons, downstream regions, promoters, 3' UTRs and 5' UTRs. However, this dataset alone is not sufficient to determine conclusively whether the increased off-target activity of cRAG directly contributes to the heightened aggressiveness of the leukemia. To bridge this gap, our future research will include both knockout and overexpression experiments targeting these off-target genes.

      (4) Fig. 6A, it seems that it is really the first four nucleotide (CACA) that determines fRAG binding and the first three (CAC) that determine cRAG binding, as opposed to five for fRAG and four for cRAG, as the author wrote (page 24, lines 493-497).

      We thank the reviewer for the insightful comment. In response, we have revised the text to accurately reflect the nucleotide sequences responsible for RAG binding and cleavage. Specifically, we now clarify that the first four nucleotides (CACA) are crucial for fRAG binding and cleavage, while the initial three nucleotides (CAC) are essential for cRAG binding and cleavage. These updates have been made on page 10, lines 242-245 of the revised manuscript.

      (5) Fig S3B, I don't really see why "significant variations in NHEJ" would necessarily equate "aberrant expression of DNA repair pathways in cRAG leukemic cells". This is purely speculative. Since it has been reported previously that alt-EJ/MMEJ can join off target RAG breaks, do the authors detect high levels of microhomology usage at break points in cRAG tumors?

      We appreciate the reviewer's comment. Currently, we have not observed microhomology usage at breakpoints in cRAG tumors; we plan to address this aspect in a future, more detailed study. Regarding the 'aberrant expression of DNA repair pathways in cRAG leukemic cells', we acknowledge that this is speculative. We have therefore rephrased it as 'suggesting a potential aberrant expression of DNA repair pathways in cRAG leukemic cells.' This modification is reflected on page 12, lines 290-291, of the revised manuscript.

      (6) Fig. S7, CDKN2B inhibits CDK4/6 activation by cyclin D, but I don't think it has been shown to regulate CDK6 mRNA expression. The increase in CDK6 mRNA likely just reflects a more proliferative tumor but may have nothing to do with CDKN2B deletion in cRAG1 tumors.

      We fully concur with the reviewer's comment. We have deleted this inappropriate part from the text.

      Insufficient details in some figures. For instance, Fig. 1A, please include statistics in the plot showing a comparison of fRAG vs cRAG1, fRAG vs cRAG2, cRAG1 vs cRAG2. As of now, there's a single p-value (0.0425) stated in the main text and the legend but why is there only one p-value when fRAG is compared to cRAG1 or cRAG2? Similarly, the authors wrote "median survival days 11-26, 10-16, 11-21 days, P < 0.0023-0.0299, Fig. S2B." However, it is difficult for me to figure out what are the numbers referring to. For instance, is 11-26 referring to median survival of fRAG inoculated with three different concentrations of GFP+ leukemic cells or is 11-26 referring to median survival of fRAG, cRAG1, cRAG2 inoculated with 10^5 cells? It would be much clearer if the authors can provide the numbers for each pair-wise comparison, if not in the main text, then at least in the figure legend. In Fig. 5A-B, do the plots depict SVs in cRAG tumors or both cRAG and fRAG cells? Also in Fig. 5, why did 24 SVs give rise to 42 breakpoints, and not 48? Doesn't it take 2 breaks to accomplish rearrangement? In Fig. 6B-C, it is not clear how the recombination sizes were calculated. In the examples shown in Fig. 4, only cRAG1 tumors show intra-chromosomal joins (chr 12), while fRAG and cRAG2 tumors show exclusively inter-chromosomal joins.

      We appreciate the reviewer's feedback and have made the following revisions:

      (1) The text has been adjusted to rectify the previously mentioned error in the figure legends (page 1, lines 5-6).

      (2) We have clarified the intended message in the revised text (page 6, lines 129-130) and the figure legend (page 4-5, lines 107-113) for greater precision.

      (3) Figure 5A-B now presents an overview of all structural variants (SVs) identified in both cRAG and fRAG cells, offering a comprehensive comparison.

      (4) The 24 analyzed SVs generated a total of 48 breakpoints, 41 within gene bodies and the remaining 7 in adjacent flanking sequences. This informs our exon-intron distribution profile analysis.

      (5) We have defined recombination sizes as ‘the DNA fragment size spanning the two breakpoints’ for clarity (page 10, lines 251-252).

      (6) All off-target recombinations identified in the genome-wide analyses of fRAG, cRAG1, and cRAG2 leukemic cells were determined to be intra-chromosomal joins, highlighting their specific nature within the genomic context.
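      The breakpoint count in item (4) and the size definition in item (5) can be made concrete with a small sketch. The record layout (StructuralVariant, count_breakpoints) and the coordinates are hypothetical, and we assume the size is the absolute coordinate difference between the two breakpoints of an intra-chromosomal join; an inclusive length would add 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuralVariant:
    """One rearrangement joining two breakpoints (hypothetical record layout)."""
    chrom_a: str
    pos_a: int  # coordinate of the first breakpoint
    chrom_b: str
    pos_b: int  # coordinate of the second breakpoint

    def breakpoints(self):
        """Each SV contributes exactly two breakpoints."""
        return [(self.chrom_a, self.pos_a), (self.chrom_b, self.pos_b)]

    def recombination_size(self):
        """Size of the DNA fragment spanning the two breakpoints.

        Only defined for intra-chromosomal joins; returns None otherwise.
        """
        if self.chrom_a != self.chrom_b:
            return None
        return abs(self.pos_b - self.pos_a)


def count_breakpoints(svs):
    """Total breakpoints across a list of SVs (two per SV)."""
    return sum(len(sv.breakpoints()) for sv in svs)


if __name__ == "__main__":
    # Invented coordinates, for illustration only.
    svs = [
        StructuralVariant("chr12", 113_200_000, "chr12", 113_450_000),
        StructuralVariant("chr4", 89_000_000, "chr4", 89_010_000),
    ]
    print(count_breakpoints(svs))                   # 4
    print([sv.recombination_size() for sv in svs])  # [250000, 10000]
```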

      Insufficient details on certain reagents/methods. For instance, are the cRAG1/2 mice of the same genetic background as fRAG mice (C57BL/6 WT)? On Page 23, line 481, what is a cancer gene? How are they defined? In Fig. 3C, are the FACS plots gated on intact cells? Since apoptotic cells show high levels of gH2AX, I'm surprised that the fraction of gH2AX+ cells is so much lower in fRAG tumors compared to cRAG tumors. The in vitro VDJ assay shown in Fig 3B is not described in the Method section (although it is described in Fig S5b). Fig. 5A-B, do the plots depict SVs in cRAG tumors or both cRAG and fRAG cells?

      We are grateful for the reviewer's feedback and have incorporated their insights as follows:

      (1) We clarify that both cRAG1/2 and fRAG mice share the same genetic background, specifically the C57BL/6 WT strain, ensuring consistency across experimental models.

      (2) We define a 'cancer gene' as one harboring somatic mutations implicated in cancer. To support our analysis, we refer to the Catalogue Of Somatic Mutations In Cancer (COSMIC) at http://cancer.sanger.ac.uk/cosmic. COSMIC serves as the most extensive repository for understanding the role of somatic mutations in human cancers.

      (3) Upon thorough review of the raw data for γ-H2AX and the fluorescence-activated cell sorting (FACS) plots gated on intact cells, we propose that the observed discrepancies might stem from the limited sensitivity of the γ-H2AX flow cytometry detection method. This insight prompts our commitment to employing more efficient detection methodologies in forthcoming studies.

      (4) Detailed procedures for the in vitro V(D)J recombination assay have been included in the Methods section (page 15, lines 384-388) to enhance the manuscript's comprehensiveness and reproducibility.

      (5) The presented plots offer a comprehensive overview of structural variants (SVs) identified in both cRAG and fRAG cells, providing a holistic view of the genomic landscape across different models.

      Reviewer #3 (Public Review):

      Summary:

      In the manuscript, the authors summarized and introduced the correlation between the non-core regions of RAG1 and RAG2 in BCR-ABL1+ acute B lymphoblastic leukemia and off-target recombination, which has certain innovative and clinical significance.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      I would suggest that the authors tone down some of their conclusions, which are not necessarily supported by their own data. In addition, there are some minor mistakes in figure assembly/presentation. For instance, I believe that the axis labels in Fig. 1E were flipped: BrdU should be on the y-axis and 7-AAD on the x-axis. In Fig. 3B, the y-axis contains a typo; it should be "CD90.1..." and not "D90.1...". In Fig. 5C, the numbers seem to be flipped, with 93% corresponding to cRAG1 and 100% to cRAG2 (compare with the description on page 23, lines 474-475). Fig. 5C, y-axis, "hybrid" is a typo. Page 3, line 59: the abbreviation RSS has already been defined earlier (p4, line 53).

      We thank the reviewer for these suggestions. We carefully checked the raw data and corrected these mistakes in the revised manuscript.

      Page 3, line 63: "signal" segment (commonly referred to as signal ends), not "signaling" segment.

      We have changed “signaling segment” to “signal ends” in the revised manuscript (page 3, lines 54-55).

      Page 3, lines 64-65: VDJ recombination promotes the development of both B and T cells, and aberrant recombination can cause both B and T cell lymphomas.

      The statement about the role of V(D)J recombination in B and T cell development and its link to lymphomagenesis is grounded in a substantial body of research. Theoretical frameworks and empirical studies delineate how aberrations in the recombination process can lead to genomic instability, potentially triggering oncogenic events. This connection is extensively documented in immunology and oncology literature, illustrating the critical balance between necessary genetic rearrangements for immune diversity and the risk of malignancy when these processes are dysregulated (Thomson, et al.,2020; Mendes, et al.,2014; Onozawa and Aplan,2012).

      Page 4, line 72: "recombinant dispensability" is not a commonly used phrase. Do the authors mean to say that the non-core regions of RAG1/2 are not strictly required for V(D)J recombination?

      We thank the reviewer for this insightful suggestion. We have revised the sentence to read, 'Although the non-core regions of RAG1/2 are not essential for V(D)J recombination, the evolutionary conservation of these regions suggests their potential significance in vivo, possibly affecting RAG activity and expression in both quantitative and qualitative manners.' This revision appears on page 3, lines 61-62, in the revised manuscript.

      Fig. 4. It would have been nice to show at least one more cRAG1 tumor Circos plot.

      We appreciate the reviewer's comment and concur with the suggestion. In future sequencing experiments, we will consider including additional replicates. However, due to time and financial constraints, the current sequencing effort was limited to a maximum of three replicates.

      Reviewer #3 (Recommendations For The Authors):

      In the manuscript, the authors summarized and introduced the correlation between the non-core regions of RAG1 and RAG2 in BCR-ABL1+ acute B lymphoblastic leukemia and off-target recombination, which has certain innovative and clinical significance. The following issues need to be addressed by the authors.

      (1) Authors should check and review extensively for improvements to the use of English.

      We thank the reviewer for their comment. With assistance from a native English speaker, we have carefully revised the manuscript to enhance its readability.

      (2) Authors should revise the conclusion so that the above can be clearly reviewed and summarized.

      The conclusion has been partially revised in the revised manuscript.

      (3) The article should state that the experiment was independently repeated three times.

      The experiment was repeated three times under the same conditions, and this information has been described in the Statistics section on page 19, lines 473-475, of the revised manuscript.

      (4) The article will be more convincing if it uses references in the last 5 years.

      We are grateful to the reviewer for their guidance in enhancing our manuscript. We have incorporated additional references from the past five years in the revised version.

      (5) Additional experiments are suggested to elucidate the molecular mechanisms related to off-target recombination.

      We thank the reviewer for this suggestion. In future experiments, we plan to perform ChIP-seq analysis to investigate the relationship between chromatin accessibility and off-target effects, as well as to examine the impact of knocking out and overexpressing off-target genes on cancer development and progression.

      (6) It is suggested to further analyze the effect of the absence of non-core RAG region on the differentiation and development of peripheral B cells in mice by flow analysis and expression of B1 and B2.

      Thank you very much for highlighting this crucial issue. FACS analysis revealed that the leukemic cells among peripheral B cells in the mice did not express CD5. The data are presented as follows:

      Author response image 1.

      (7) Fig3A should have three biological replicates and the molecular weight should be labeled on the right side of the strip.

      Thank you for this suggestion. The experiment was independently repeated three times, and the molecular weights have been labeled on the right side of the bands in the revised version.

      References:

      Mendes RD, Sarmento LM, Canté-Barrett K, Zuurbier L, Buijs-Gladdines JG, Póvoa V, Smits WK, Abecasis M, Yunes JA, Sonneveld E, Horstmann MA, Pieters R, Barata JT, Meijerink JP. 2014. PTEN microdeletions in T-cell acute lymphoblastic leukemia are caused by illegitimate RAG-mediated recombination events. Blood 124:567-578. doi:10.1182/blood-2014-03-562751

      Onozawa M, Aplan PD. 2012. Illegitimate V(D)J recombination involving nonantigen receptor loci in lymphoid malignancy. Genes Chromosomes Cancer 51:525-535. doi:10.1002/gcc.21942

      Thomson DW, Shahrin NH, Wang P, Wadham C, Shanmuganathan N, Scott HS, Dinger ME, Hughes TP, Schreiber AW, Branford S. 2020. Aberrant RAG-mediated recombination contributes to multiple structural rearrangements in lymphoid blast crisis of chronic myeloid leukemia. Leukemia 34:2051-2063. doi:10.1038/s41375-020-0751-y

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zhang et al. demonstrate that CD4<sup>+</sup> single positive (SP) thymocytes, CD4<sup>+</sup> recent thymic emigrants (RTE), and CD4<sup>+</sup> T naive (Tn) cells from Cd11c-p28-flox mice, which lack IL-27p28 selectively in Cd11c+ cells, exhibit a hyper-Th1 phenotype instead of the expected hyper-Th2 phenotype. Using IL-27R-deficient mice, the authors confirm that this hyper-Th1 phenotype is due to IL-27 signaling via IL-27R, rather than the effects of monomeric IL-27p28. They also crossed Cd11c-p28-flox mice with autoimmune-prone Aire-deficient mice and showed that both T cell responses and tissue pathology are enhanced, suggesting that SP, RTE, and Tn cells from Cd11c-p28-flox mice are poised to become Th1 cells in response to self-antigens. Regarding mechanism, the authors demonstrate that SP, RTE, and Tn cells from Cd11c-p28-flox mice have reduced DNA methylation at the IFN-g and Tbx21 loci, indicating 'de-repression', along with enhanced histone tri-methylation at H3K4, indicating a 'permissive' transcriptional state. They also find evidence for enhanced STAT1 activity, which is relevant given the well-established role of STAT1 in promoting Th1 responses, and surprising given that IL-27 is a potent STAT1 activator. This latter finding suggests that the Th1-inhibiting property of thymic IL-27 may not be due to direct effects on the T cells themselves.

      Strengths:

      Overall the data presented are high quality and the manuscript is well-reasoned and composed. The basic finding - that thymic IL-27 production limits the Th1 potential of SP, RTE, and Tn cells - is both unexpected and well described.

      Weaknesses:

      A credible mechanistic explanation, cellular or molecular, is lacking. The authors convincingly affirm the hyper-Th1 phenotype at epigenetic level but it remains unclear whether the observed changes reflect the capacity of IL-27 to directly elicit epigenetic remodeling in developing thymocytes or knock-on effects from other cell types which, in turn, elicit the epigenetic changes (presumably via cytokines). The authors propose that increased STAT1 activity is a driving force for the epigenetic changes and resultant hyper-Th1 phenotype. That conclusion is logical given the data at hand but the alternative hypothesis - that the hyper-STAT1 response is just a downstream consequence of the hyper-Th1 phenotype - remains equally likely. Thus, while the discovery of a new anti-inflammatory function for IL-27 within the thymus is compelling, further mechanistic studies are needed to advance the finding beyond phenomenology.

      Thank you for your insightful comments and suggestions. We appreciate your feedback and have carefully considered the concerns raised regarding the mechanistic explanation of our findings. To address the issue of whether developing thymocytes are the direct targets of IL-27, we plan to conduct further studies using Cd4-IL-27ra knockout mice or mixed bone marrow chimeras consisting of wildtype and IL-27ra knockout cells. This approach will help us determine if IL-27 directly induces epigenetic remodeling in thymocytes or if the observed effects are secondary to influences from other cell types.

      Regarding the potential autocrine loop contributing to STAT1 hyperactivation, we have performed preliminary experiments by adding IFN-γ antibody to CD4<sup>+</sup> T cell cultures and observed no significant impact on STAT1 phosphorylation. If necessary, we will further investigate this possibility in vivo using Cd4-Ifng and CD11c-p28 double knockout mice.

      The detailed mechanisms underlying STAT1 hyperactivation remain to be elucidated. Recent studies have shown that IL-27p28 can act as an antagonist of gp130-mediated signaling. Structural analyses have also demonstrated that IL-27p28 interacts with EBI3 and the two receptor subunits IL-27Rα and gp130. Given these findings and the similar phenotypes observed in p28 and IL-27ra deficient mice, we speculate that the deficiency of either p28 or IL-27ra may increase the availability of gp130 for signaling by other cytokines. We will focus our future research on gp130-related cytokines to identify potential candidates that could lead to enhanced STAT1 activation in the absence of p28. Alternatively, the release of EBI3 in p28-deficient conditions may promote its interaction with other cytokine subunits. IL-35, which is composed of EBI3 and p35, is of particular interest given the involvement of IL-27Rα in its signaling pathway.

      To narrow down the candidate cytokines, we reanalyzed single-cell RNA sequencing data from CD11c-cre p28<sup>f/f</sup> and wild-type thymocytes (Signal Transduct Target Ther. 2022, DOI: 10.1038/s41392-022-01147-z). Our analysis revealed that thymic dendritic cells (DCs) were categorized into two distinct clusters, with both Il12a (p35, which forms IL-35 with EBI3) and Clcf1 (CLCF1) being upregulated in CD11c-cre p28<sup>f/f</sup> mice. In CD4 single-positive (SP) thymocytes, the expression levels of gp130 and IL-12Rβ2 (the receptor for IL-35) were comparable between knockout and wild-type mice. However, the mRNA levels of Lifr and Cntfr were low in CD4 SP thymocytes.

      Author response image 1.

      Single-cell RNA sequencing data from CD11c-cre p28<sup>f/f</sup> (KO) and wild-type thymocytes (Signal Transduct Target Ther. 2022, DOI: 10.1038/s41392-022-01147-z).

      We have planned to assess the protein levels of IL-35 and CLCF1 in dendritic cells, as well as their respective receptors, to evaluate their effects on STAT1 phosphorylation in CD4<sup>+</sup> thymocytes from both wild-type and p28-deficient mice. Unfortunately, we have encountered challenges with the mouse breeding and anticipate that it will take approximately six months to obtain the appropriate genotype necessary to complete these experiments.

      Reviewer #2 (Public Review):

      Summary:

      Naïve CD4 T cells in CD11c-Cre p28-floxed mice express highly elevated levels of proinflammatory IFNg and the transcription factor T-bet. This phenotype turned out to be imposed by thymic dendritic cells (DCs) during CD4SP T cell development in the thymus [PMID: 23175475]. The current study affirms these observations, first, by developmentally mapping the IFNg dysregulation to newly generated thymic CD4SP cells [PMID: 23175475], second, by demonstrating increased STAT1 activation being associated with increased T-bet expression in CD11c-Cre p28-floxed CD4 T cells [PMID: 36109504], and lastly, by confirming IL-27 as the key cytokine in this process [PMID: 27469302]. The authors further demonstrate that such dysregulated cytokine expression is specific to the Th1 cytokine IFNg, without affecting the expression of the Th2 cytokine IL-4, thus proposing a role for thymic DC-derived p28 in shaping the cytokine response of newly generated CD4 helper T cells. Mechanistically, CD4SP cells of CD11c-Cre p28-floxed mice were found to display epigenetic changes in the Ifng and Tbx21 gene loci that were consistent with increased transcriptional activities of IFNg and T-bet mRNA expression. Moreover, in autoimmune Aire-deficiency settings, CD11c-Cre p28-floxed CD4 T cells still expressed significantly increased amounts of IFNg, exacerbating the autoimmune response and disease severity. Based on these results, the investigators propose a model where thymic DC-derived IL-27 is necessary to suppress IFNg expression by CD4SP cells and thus would impose a Th2-skewed predisposition of newly generated CD4 T cells in the thymus, potentially relevant in autoimmunity.

      Strengths:

      Experiments are well-designed and executed. The conclusions are convincing and supported by the experimental results.

      Weaknesses:

      The premise of the current study is confusing as it tries to use the CD11c-p28 floxed mouse model to explain the Th2-prone immune profile of newly generated CD4SP thymocytes. Instead, it would be more helpful to (1) give full credit to the original study which already described the proinflammatory IFNg+ phenotype of CD4 T cells in CD11c-p28 floxed mice to be mediated by thymic dendritic cells [PMID: 23175475], and then, (2) build on that to explain that this study is aimed to understand the molecular basis of the original finding.

      In its essence, this study mostly rediscovers and reaffirms previously reported findings, but with different tools. While the mapping of epigenetic changes in the IFNg and T-bet gene loci and the STAT1 gene signature in CD4SP cells are interesting, these are expected results, and they only reaffirm what would be assumed from the literature. Thus, there is only incremental gain in new insights and information on the role of DC-derived IL-27 in driving the Th1 phenotype of CD4SP cells in CD11c-p28 floxed mice.

      Thank you for your valuable comments and suggestions. We appreciate your input and have carefully reviewed the concerns raised regarding the premise and novelty of our study.

      Indeed, the current study is built upon the foundational work of Zhang et al. (PMID: 23175475), which first described the proinflammatory IFN-γ<sup>+</sup> phenotype of CD4 T cells in CD11c-p28 floxed mice mediated by thymic dendritic cells. We have cited this study multiple times in our manuscript to acknowledge its significance. Our goal was to expand on this original finding by exploring the functional bias of newly generated CD4<sup>+</sup> T cells, elucidating the mechanisms underlying the hyper-Th1 phenotype in the absence of thymic DC-derived IL-27, and assessing its relevance to the pathogenesis of autoimmunity.

      Our study revisits this phenomenon with a focus on the molecular and epigenetic changes that drive the Th1 bias in CD4SP cells. We demonstrated that the deletion of p28 in thymic dendritic cells leads to an unexpected hyperactivation of STAT1, which is associated with epigenetic modifications that favor Th1 differentiation. These findings provide a deeper understanding of the molecular basis behind the original observation of the Th1-skewed phenotype in CD11c-p28 floxed mice.

      However, as you pointed out, there is still a gap in understanding the precise link between p28 deficiency and STAT1 activation. We acknowledge that our study primarily reaffirms previously reported findings with different tools and approaches. While the mapping of epigenetic changes in the IFN-γ and T-bet gene loci and the STAT1 gene signature in CD4SP cells are interesting, they are indeed expected results based on the existing literature. This limits the novelty and incremental gain in new insights provided by our study.

      To address this gap and enhance the novelty of our findings, we plan to conduct further investigations to elucidate the detailed mechanisms connecting p28 deficiency to STAT1 hyperactivation. We will explore potential compensatory pathways or alternative signaling mechanisms that may contribute to the observed epigenetic changes and Th1 bias. Additionally, we will consider the broader impact of IL-27 deficiency on the thymic environment and its downstream effects on CD4<sup>+</sup> T cell differentiation.

      We appreciate your feedback and will work to strengthen the mechanistic underpinnings of our study. We believe that these additional efforts will provide a more comprehensive understanding of the role of DC-derived IL-27 in shaping the Th1 phenotype of CD4SP cells and contribute meaningful insights to the field.

      Altogether, the major issues of this study remain unresolved:

      (1) It is still unclear why the p28-deficiency in thymic dendritic cells would result in increased STAT1 activation in CD4SP cells. Based on their in vitro experiments with blocking anti-IFNg antibodies, the authors conclude that it is unlikely that the constitutive activation of STAT1 would be a secondary effect due to autocrine IFNg production by CD4SP cells. However, this possibility should be further tested with in vivo models, such as Ifng-deficient CD11c-p28 floxed mice. Alternatively, is this an indirect effect by other IFNg producers in the thymus, such as iNKT cells? It is necessary to explain what drives the STAT1 activation in CD11c-p28 floxed CD4SP cells in the first place.

      Thank you for your insightful suggestions. We appreciate your feedback and are committed to addressing the critical questions raised regarding the mechanisms underlying STAT1 activation in CD4SP cells in the context of p28 deficiency in thymic dendritic cells.

      To further investigate the potential autocrine loop for IFN-γ production, we will conduct in vivo studies using Cd4-Ifng and CD11c-p28 double knockout mice. This model will allow us to directly test whether IFN-γ produced by CD4SP cells themselves contributes to the observed STAT1 activation. Additionally, this approach will help exclude the possibility of indirect effects from other IFN-γ-producing cells in the thymus, such as invariant natural killer T (iNKT) cells, as suggested by the reviewer.

      As you correctly pointed out, a key unanswered question is what drives the initial STAT1 activation in CD4SP cells of CD11c-p28 floxed mice. Our current hypothesis is that p28 deficiency enhances the responsiveness of developing thymocytes to STAT1-activating cytokines. This hypothesis is supported by several lines of evidence:

      (1) Functional Antagonism: Recent studies have shown that IL-27p28 can act as an antagonist of gp130-mediated signaling. This suggests that in the absence of p28, the inhibitory effect of IL-27p28 on downstream signaling may be lost, leading to increased sensitivity to other cytokines that activate STAT1.

      (2) Structural Insights: Structural studies have demonstrated that IL-27p28 is centrally positioned within the complex formed with EBI3 and the two receptor subunits IL-27Rα and gp130. This positioning implies that p28 deficiency could disrupt the balance of cytokine signaling pathways involving these components.

      (3) Phenotypic Similarity: We have observed a similar hyper-Th1 phenotype in mice lacking either p28 or IL-27ra. This similarity suggests that the absence of p28 may lead to increased availability of gp130 for signaling by other cytokines, thereby enhancing STAT1 activation.

      Based on these considerations, we hypothesize that the deficiency of p28 results in a greater availability of gp130 to transduce signals from other cytokines, ultimately leading to enhanced STAT1 activation in CD4SP cells. To identify the specific cytokine(s) responsible for this effect, we will focus on gp130-related cytokines, as outlined in our response to Reviewer 1. This will involve reanalysis of single-cell RNA sequencing data and further experimental validation to pinpoint the candidate cytokines driving the observed STAT1 hyperactivation.

      We are confident that these additional studies will provide a clearer understanding of the mechanisms linking p28 deficiency in thymic dendritic cells to increased STAT1 activation in CD4SP cells. We appreciate your guidance and look forward to sharing our findings.

      (2) It is also unclear whether CD4SP cells are the direct targets of IL-27 p28. The cell-intrinsic effects of IL-27 p28 signaling in CD4SP cells should be assessed and demonstrated, ideally by CD4SP-specific deletion of IL-27Ra, or by establishing bone marrow chimeras of IL-27Ra germline KO mice.

      Thanks for the suggestions. Further studies will be performed to test whether developing thymocytes are the direct targets of IL-27 using Cd4-IL-27ra knockout mice or mixed bone marrow chimeras of wildtype and IL-27ra knockout cells. Unfortunately, we have encountered challenges with the mouse breeding and anticipate that it will take approximately six months to obtain the appropriate genotype necessary to complete these experiments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Is the hyper-STAT1 response seen in T cells from Cd11c-p28-flox mice due to increased availability of, and/or increased responsiveness to, STAT1-activating cytokines? Studies in which SP, RTE, and Tn cells are pulsed ex vivo with IL-27 and/or other STAT1-activating cytokines would address the latter (with STAT1 phosphorylation as the major readout). Given the ability of IL-27 to activate STAT3, this pathway should also be addressed. It would be of interest if STAT1 signaling is selectively impaired, as suggested by the work of Twohig et al. (doi: 10.1038/s41590-019-0350-0), which should be cited and discussed.

      Thank you for your insightful suggestions. We appreciate your input and are committed to addressing the critical questions raised regarding the mechanisms underlying the hyper-activation of STAT1 in T cells from Cd11c-p28-flox mice.

      The detailed mechanisms driving the hyper-activation of STAT1 remain to be fully elucidated. Recent studies have shown that IL-27p28 can act as an antagonist of gp130-mediated signaling. Structural analyses have also demonstrated that IL-27p28 interacts with EBI3 and the two receptor subunits IL-27Rα and gp130. Considering these findings and the similar phenotypes observed in p28 and IL-27ra deficient mice, we speculate that the deficiency of either p28 or IL-27ra may increase the availability of gp130 for signaling by other cytokines. This could potentially enhance the responsiveness of developing thymocytes to STAT1-activating cytokines. We will focus our future research on gp130-related cytokines to identify the candidate(s) responsible for the enhanced STAT1 activation in the absence of p28. Alternatively, the release of EBI3 in the absence of p28 may facilitate its coupling with other cytokine subunits. IL-35, which is composed of EBI3 and p35, is of particular interest given the involvement of IL-27Rα in its signaling pathway.

      To narrow down the candidate cytokines, we reanalyzed single-cell RNA sequencing data from CD11c-cre p28<sup>f/f</sup> and wild-type thymocytes (Signal Transduct Target Ther. 2022, DOI: 10.1038/s41392-022-01147-z). Our analysis revealed that thymic dendritic cells (DCs) were categorized into two distinct clusters, with both Il12a (p35, which forms IL-35 with EBI3) and Clcf1 (CLCF1) being upregulated in CD11c-cre p28<sup>f/f</sup> mice. In CD4 single-positive (SP) thymocytes, the expression levels of gp130 and IL-12Rβ2 (the receptor for IL-35) were comparable between knockout and wild-type mice. However, the mRNA levels of Lifr and Cntfr were low in CD4 SP thymocytes.

      Single-cell RNA sequencing data from CD11c-cre p28<sup>f/f</sup> (KO) and wild-type thymocytes (Signal Transduct Target Ther. 2022, DOI: 10.1038/s41392-022-01147-z).

      We have planned to assess the protein levels of IL-35 and CLCF1 in dendritic cells, as well as their respective receptors, to evaluate their effects on STAT1 phosphorylation in CD4<sup>+</sup> thymocytes from both wild-type and p28-deficient mice. Unfortunately, we have encountered challenges with the mouse crosses and anticipate that it will take approximately six months to obtain the appropriate genotype necessary to complete these experiments.

      As you correctly noted, the ability of IL-27 to activate STAT3 signaling is an important consideration. We have carefully examined this pathway in our current study, and our results indicate that neither total nor phosphorylated STAT3 or STAT4 was altered by IL-27p28 ablation (Figure 5B). This suggests that the impact is indeed specific to the STAT1 axis. We will also consider the possibility of selective impairment of STAT1 signaling, as suggested by the work of Twohig et al. (doi: 10.1038/s41590-019-0350-0), which we will cite and discuss in our revised manuscript.

      We appreciate your guidance and will work diligently to address these questions in our future studies. We look forward to sharing our findings and contributing to a deeper understanding of the role of IL-27 in the regulation of STAT1 activation in T cells.

      (2) It may be that the hyper-Th1 phenotype is not due to cell-intrinsic differences in STAT1 signaling (see Major Point 1) but rather to hyper-responsiveness to TCR + co-stimulation (as provided in the re-stimulation assays used throughout). This issue is particularly relevant for the ChIP studies where the author notes that, "...we chose to treat the cells with anti-CD3 and anti-CD28 for 3 days prior to the assay". Why not treat these cells ex vivo with STAT1-activating cytokines instead of anti-CD3/CD28? The current methodology makes it impossible to distinguish between enhanced TCR/CD28 and cytokine signaling, and ultimately does not address SP, RTE, and Tn cells (since they are now activated blasts).

      Thank you for raising this important point. We appreciate your feedback and fully recognize the limitations of our current methodology, which uses anti-CD3/CD28 stimulation for ChIP experiments. This approach indeed complicates the distinction between enhanced TCR/CD28 signaling and cytokine-mediated STAT1 activation, particularly in the context of SP, RTE, and Tn cells, which become activated blasts under these conditions.

      To address these concerns and provide more precise insights into the mechanisms underlying the hyper-Th1 phenotype, we are revising our experimental strategy. Specifically, we are shifting our focus to directly investigate the role of STAT1-activating cytokines in the absence of p28. Based on our previous analysis and re-evaluation of single-cell RNA sequencing data, we have identified IL-35 and CLCF1 as the most promising candidate cytokines.

      We are now planning to perform ChIP experiments using these cytokines directly, rather than relying on TCR + co-stimulation. This approach will allow us to more accurately evaluate the impact of these cytokines on STAT1 signaling in CD4<sup>+</sup> T cells. By treating cells ex vivo with IL-35 and CLCF1, we aim to elucidate whether the observed hyper-Th1 phenotype is driven by enhanced responsiveness to these cytokines, independent of TCR/CD28 signaling.

      We regret to inform you that we have encountered unforeseen challenges with mouse crosses, which have delayed our progress. As a result, we anticipate a delay of approximately six months to obtain the appropriate genotypes necessary to complete these experiments. We understand the importance of these revisions and are committed to overcoming these challenges to provide a more robust and accurate analysis.

      (3) Studies involving STAT1-deficient mice are necessary (ideally with STAT1 deficiency restricted to the T cell compartment). At a minimum, it must be confirmed that these phenocopy Cd11c-p28-flox mice in terms of SP, RTE, and Tn cells (and their Th1-like character). If a similar hyper-Th1 phenotype is not seen, then the attendant hyper STAT1 response can only be viewed as a red herring.

      Thank you for raising this important consideration. We acknowledge the significance of addressing the role of STAT1 specifically within the T cell compartment to validate the mechanisms underlying the hyper-Th1 phenotype observed in Cd11c-p28-flox mice.

      We agree that studies involving STAT1-deficient mice, particularly with STAT1 deficiency restricted to the T cell compartment, are essential to confirm whether the hyper-Th1 phenotype is directly driven by STAT1 hyperactivation in T cells. Ideally, such studies would help determine if STAT1 deficiency in T cells phenocopies the Cd11c-p28-flox mice, particularly in terms of the SP, RTE, and Tn cells and their Th1-like characteristics.

      Unfortunately, we currently face challenges in obtaining and breeding the appropriate STAT1 conditional knockout mice with T cell-specific deletion. This has limited our ability to conduct these experiments in a timely manner. However, we recognize the importance of these studies and are actively working to secure the necessary resources and models to address this critical question.

      We understand that without these experiments, any conclusions drawn about the role of STAT1 hyperactivation in driving the hyper-Th1 phenotype must be considered with caution. If a similar hyper-Th1 phenotype is not observed in STAT1-deficient T cells, then the hyper-STAT1 response may indeed be a secondary or compensatory effect rather than a primary driver.

      We are committed to pursuing these studies and will prioritize them in our future work. We will keep you informed of our progress and will update the manuscript with the results of these experiments once completed. We appreciate your patience and understanding as we work to address this important aspect of our research.

      (4) The authors mine their RNA-seq data using a STAT1 geneset sourced from studies involving IL-21 as the upstream stimulus. Why was this geneset chosen? It is true that IL-21 can activate STAT1, but STAT3 is typically viewed as its principal signaling pathway. There are many more appropriate genesets, especially from studies where T cells are cultured with traditional STAT1 stimuli (e.g. IL-27 in Hirahara et al., Immunity 2015, doi: 10.1016/j.immuni.2015.04.014; or interferons in Iwata et al., Immunity 2017, doi: 10.1016/j.immuni.2017.05.005).

      Thank you for your insightful comments. We appreciate your attention to the choice of the STAT1 gene set in our RNA-seq analysis.

      Initially, we selected the STAT1 gene set from a study involving IL-21 stimulation (GSE63204) because IL-21 is known to activate STAT1, despite STAT3 being its principal signaling pathway. However, we acknowledge that this choice may not have been optimal given the context of our study, which focuses on the role of IL-27 and its impact on STAT1 signaling in T cells.

      We agree that gene sets derived from studies using more canonical STAT1 stimuli, such as IL-27 or interferons, would be more relevant for our analysis. In response to your suggestion, we have revised our approach and adopted a gene set from GSE65621, which compares STAT1-/- and wild-type CD4 T cells following IL-27 stimulation. This gene set is more aligned with the focus of our study and provides a more appropriate reference for identifying STAT1-activated genes.

      Our re-analysis revealed that 270 genes (FPKM > 1, log2FC > 2) were downregulated in STAT1-/- cells compared to wild-type cells; we defined these as STAT1-activated genes. Notably, approximately 40% of the upregulated differentially expressed genes (55 out of 137) in our dataset fell into the category of STAT1-activated genes, while none were classified as STAT1-suppressed genes (Figure 4B). Furthermore, Gene Set Enrichment Analysis (GSEA) demonstrated significant enrichment of STAT1-activated genes in the transcriptome of CD4 SP thymocytes from the knockout mice (NES = 1.67, nominal p-value = 10<sup>-16</sup>, Figure 4D).
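      To make the gene-set definition concrete, the selection and overlap steps can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' pipeline: applying the FPKM > 1 cutoff to wild-type expression, adding a pseudocount of 1 before the log2 ratio, the `GeneExpr` record layout, and all gene names and values are hypothetical.

```python
# Hypothetical sketch, not the authors' pipeline: defines "STAT1-activated" genes from a
# wild-type vs STAT1-/- comparison and measures their overlap with upregulated DEGs.
# Assumptions: the FPKM > 1 cutoff is applied to wild-type expression, a pseudocount of 1
# is added before taking log2, and all gene names/values below are toy data.
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneExpr:
    gene: str
    fpkm_wt: float  # mean FPKM in IL-27-stimulated wild-type CD4 T cells (toy value)
    fpkm_ko: float  # mean FPKM in IL-27-stimulated STAT1-/- CD4 T cells (toy value)


PSEUDOCOUNT = 1.0  # assumption: avoids log2 of zero for genes silent in one condition


def log2fc_wt_over_ko(g: GeneExpr) -> float:
    """Positive values mean lower expression in STAT1-/- than in wild type."""
    return math.log2((g.fpkm_wt + PSEUDOCOUNT) / (g.fpkm_ko + PSEUDOCOUNT))


def stat1_activated(genes, min_fpkm=1.0, min_log2fc=2.0):
    """Genes expressed in wild type (FPKM > min_fpkm) and down in STAT1-/- (log2FC > min_log2fc)."""
    return {
        g.gene
        for g in genes
        if g.fpkm_wt > min_fpkm and log2fc_wt_over_ko(g) > min_log2fc
    }


def overlap(upregulated_degs, gene_set):
    """Return (number of DEGs in gene_set, fraction of DEGs in gene_set)."""
    hits = set(upregulated_degs) & gene_set
    return len(hits), len(hits) / len(upregulated_degs)


if __name__ == "__main__":
    toy = [
        GeneExpr("geneA", 40.0, 4.0),     # log2(41/5) ~ 3.0 -> STAT1-activated
        GeneExpr("geneB", 20.0, 1.0),     # log2(21/2) ~ 3.4 -> STAT1-activated
        GeneExpr("geneC", 500.0, 480.0),  # essentially unchanged -> excluded
        GeneExpr("geneD", 0.5, 0.0),      # below the FPKM cutoff -> excluded
    ]
    activated = stat1_activated(toy)
    print(sorted(activated))                                 # ['geneA', 'geneB']
    print(overlap(["geneA", "geneC", "geneE"], activated))   # (1, 0.3333333333333333)
```

      Keeping the FPKM and log2FC cutoffs as parameters makes it straightforward to check how sensitive the size of the STAT1-activated set, and its overlap with the upregulated DEGs, is to those thresholds.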

      These findings support our conclusion that IL-27p28 deficiency leads to enhanced STAT1 activity in CD4 SP thymocytes. We believe that using a more relevant gene set has strengthened our analysis and provided clearer insights into the molecular mechanisms underlying the observed phenotype.

      We have cited the relevant studies (Hirahara et al., Immunity 2015; Iwata et al., Immunity 2017) to provide context for our revised analysis and to acknowledge the importance of canonical STAT1 stimuli in T cell signaling. We appreciate your guidance and are confident that these revisions have improved the robustness and relevance of our findings.

      (5) Given the ability of IL-27 to activate STAT1 in T cells, it is surprising that SP, RTE, and Tn cells from Cd11c-p28-flox mice exhibit more STAT1 signaling than WT controls. If not IL-27, then what is the stimulus for this STAT1 activity? The authors rule out autocrine IFN-g production in vitro (not in vivo) but provide no further insight.

      Thank you for raising this important question. We appreciate your interest in understanding the source of enhanced STAT1 signaling in SP, RTE, and Tn cells from Cd11c-p28-flox mice, especially given the role of IL-27 in activating STAT1 in T cells. As previously discussed, we have identified IL-35 and CLCF1 as the most likely candidate cytokines driving the observed STAT1 activity in the absence of p28. These cytokines are of particular interest due to their potential to activate STAT1 and their relevance in the context of our study.

      To address the question of what drives the enhanced STAT1 signaling, we are planning to perform ChIP experiments using these cytokines directly. This approach will allow us to evaluate their impact on STAT1 signaling more precisely, without relying on TCR + co-stimulation. By treating cells ex vivo with IL-35 and CLCF1, we aim to determine whether these cytokines are responsible for the increased STAT1 activity observed in Cd11c-p28-flox mice.

      We acknowledge that ruling out autocrine IFN-γ production in vitro, as we have done, does not fully address the potential role of IFN-γ in vivo. Therefore, we are also considering additional in vivo experiments to further investigate this possibility. These studies will help us determine whether other sources of IFN-γ or other cytokines contribute to the observed STAT1 hyperactivation. Unfortunately, due to unforeseen challenges with mouse crosses, we anticipate a delay of approximately six months to obtain the appropriate genotypes necessary for these experiments. We are actively working to resolve these challenges and will update the manuscript with the results of these experiments upon completion.

      (6) The RNAseq data affirms that SP, RTE, and Tn cells from Cd11c-p28-flox mice exhibit more STAT1 signaling than WT controls. However, this does little to explain the attendant hyper-Th1 phenotype. Is there evidence that epigenetic machinery is deregulated (to account for changes in DNA and histone methylation)? Were IFN-g and Tbet among the few observed DEGs? If so, then this should be highlighted. If not, then the authors must address why not. Are there clues as to why STAT1 signaling is exaggerated? Also, the hyper-STAT1 effect should be better described using more rigorous STAT1- and interferon-signature genesets (see the work of Virginia Pascual, Anne O'Garra).

      Thank you for your valuable feedback and suggestions. We appreciate your interest in understanding the mechanisms underlying the hyper-Th1 phenotype observed in Cd11c-p28-flox mice. Below, we address each of your points in detail:

      (1) Epigenetic Regulation:

      We have conducted a thorough analysis of the global levels of key histone modifications, including H3K4me3, H3K9me3, and H3K27me3, as well as the mRNA expression of the enzymes responsible for catalyzing these marks. Our results indicate that there are no significant differences in these histone modifications or the expression of the associated enzymes between Cd11c-p28<sup>f/f</sup> and wildtype mice (Figure 3-figure supplement 1A-C). This suggests that the enhanced STAT1 signaling is not a consequence of broad epigenetic deregulation. Instead, we hypothesize that the observed changes may be driven by more specific molecular mechanisms, such as cytokine signaling pathways.

      (2) IFN-γ and Tbx21 Expression:

      Regarding the expression of Th1-associated genes, our analysis revealed a modest induction of ifng and tbx21 (encoding T-bet) in the CD4SP population following TCR stimulation. However, the baseline expression levels of these genes were quite low in freshly isolated CD4SP cells. Specifically, ifng was undetectable, and tbx21 had an FPKM of 0.29 in wildtype mice compared to 1.05 in Cd11c-p28<sup>f/f</sup> mice. While these findings indicate some upregulation of Th1-associated genes, the overall expression levels remain relatively low, suggesting that additional factors may contribute to the hyper-Th1 phenotype.

      (3) STAT1 Signature Genesets:

      We have revised our analysis to incorporate more rigorous STAT1 and interferon-signature genesets, as suggested. We have adopted gene sets from well-established studies, including those by Virginia Pascual and Anne O'Garra, to provide a more comprehensive and accurate assessment of STAT1 signaling. This approach has enhanced our ability to identify and characterize the genes involved in the STAT1 pathway, providing clearer insights into the exaggerated STAT1 signaling observed in our model.

      We appreciate your guidance and are committed to refining our analysis to provide a more detailed understanding of the mechanisms driving the hyper-Th1 phenotype in Cd11c-p28-flox mice. We will continue to explore the potential roles of cytokines such as IL-35 and CLCF1, as well as other factors that may contribute to the observed changes in STAT1 signaling and Th1 differentiation. We look forward to sharing our updated findings and further discussing these mechanisms in our revised manuscript.

      (7) Is the hyper-Th1 phenotype of SP, RTE, and Tn cells from Cd11c-p28-flox mice unique to the CD4 compartment? Are developing CD8<sup>+</sup> cells similarly prone to increased STAT1 signaling and IFN-g production?

      Thank you for raising this important point. Our data indeed suggest that the hyper-Th1 phenotype observed in SP, RTE, and Tn cells from Cd11c-p28<sup>f/f</sup> mice is unique to the CD4<sup>+</sup> T cell compartment. Specifically, while CD4<sup>+</sup> SP cells from Cd11c-p28<sup>f/f</sup> mice exhibited a significant upregulation of IL-27 receptor expression (both IL-27Rα and gp130) compared to wild-type (WT) mice, CD8<sup>+</sup> SP cells from the same genotype showed markedly lower expression of these receptor subunits (Figure 1C in Sci Rep. 2016 Jul 29:6:30448. DOI: 10.1038/srep30448). This finding is further supported by our observation that the phosphorylation levels of STAT1, STAT3, and STAT4, downstream targets of IL-27 signaling, were comparable between CD8 SP cells from Cd11c-p28<sup>f/f</sup> and WT mice (Author response image 2A). Additionally, we observed no significant difference in IFN-γ and granzyme B production between naïve CD8 T cells isolated from the lymph nodes of the two genotypes (Author response image 2B). Taken together, these results suggest that the enhanced Th1 differentiation and IFN-γ production seen in the CD4<sup>+</sup> T cell population from Cd11c-p28<sup>f/f</sup> mice are not recapitulated in the CD8<sup>+</sup> T cell lineage.

      Author response image 2.

      (A) Intracellular staining was performed on freshly isolated thymocytes from Cd11c-p28<sup>f/f</sup> mice and WT littermates using antibodies against phosphorylated STAT1 (Y701), STAT3 (Y705), and STAT4 (Y693). The mean fluorescence intensity (MFI) for CD8 SP cells is shown from three independent experiments (mean ± SD, n=3). (B) CD8<sup>+</sup> naive T cells were cultured under Th0 conditions for 3 days. The frequencies of IFN-γ- and granzyme B-producing CD8<sup>+</sup> T cells were determined by intracellular staining. Representative dot plots (left) and quantification (right, mean ± SD, n=6).

      Minor points and questions

      (1) Line 84 - Villarino et al. and Pflanz et al. are mis-referenced. Neither involves Trypanosome studies. The former is on Toxoplasma infection and, thus, should be properly referenced in the following sentence.

      Thank you for pointing out this error. You are correct that the references to Villarino et al. and Pflanz et al. were misapplied in the context of Trypanosome studies. Villarino et al. focuses on Toxoplasma infection, and we appreciate your guidance to ensure accurate citation. We will correct this in the manuscript and properly cite the studies in their appropriate contexts. Thank you for your vigilance in maintaining the accuracy of our references.

      (2) T-bet protein should also be measured by cytometry

      We sincerely thank the reviewer for the valuable suggestion regarding the measurement of T-bet protein levels. In response to this comment, we have performed additional experiments to quantify T-bet protein expression using flow cytometry. The results of these analyses have been incorporated into the revised manuscript as Figure 1F.

      Reviewer #2 (Recommendations For The Authors):

      (1) When new mouse strains are generated in this study, there is no comment on whether there are any changes in the frequency or cell number of CD4 T cells. For instance, in Aire-deficient CD11c-p28 floxed mice, it should be noted whether CD4SP, naïve CD4, and CD4 RTE are all the same in frequency and number compared to their littermate controls. Also, is there any effect on the generation of these thymocytes?

      We sincerely thank the reviewer for raising this important point regarding the potential changes in the frequency and cell numbers of CD4<sup>+</sup> T cells in the newly generated mouse strains. In response to the reviewer’s question, we would like to clarify the following:

      (1) Impact of Aire deficiency on CD4<sup>+</sup> T Cells:

      As previously reported by us and others (Aging Dis. 2019, doi: 10.14336/AD.2018.0608; Science. 2002, doi: 10.1126/science.1075958), Aire deficiency does not significantly alter the overall number or frequency of CD4 single-positive (CD4SP) thymocytes, recent thymic emigrants (RTEs), or naïve CD4<sup>+</sup> T cells. However, it profoundly affects their composition and functional properties, leading to the escape of autoreactive T cells and subsequent autoimmune manifestations.

      (2) Observations in Cd11c-p28<sup>f/f</sup>Aire<sup>-/-</sup> mice:

      In our study, we observed that the number and frequency of CD4<sup>+</sup> T cells in the spleen and lymph nodes were comparable among Cd11c-p28<sup>f/f</sup>, Aire<sup>-/-</sup>, and Cd11c-p28<sup>f/f</sup>Aire<sup>-/-</sup> mice, and WT controls. This suggests that the genetic modifications did not significantly impact the overall development or peripheral maintenance of CD4<sup>+</sup> T cells.

      Author response image 3.

      (3) Challenges in assessing RTEs in double knockout mice:

      To accurately assess RTEs in the double knockout mice, it would be necessary to cross these mice with Rag-GFP reporter mice, which specifically label RTEs. However, breeding the appropriate mouse strain for this analysis would require additional time and resources, which were beyond the scope of the current study.

      (2) There are a couple of typos throughout the manuscript. For example, line 91: IL-27Rα or line 313: phenotype.

      We apologize for the typographical errors. We have carefully reviewed the entire manuscript and corrected all identified mistakes, including those on line 91 (IL-27Rα) and line 305 (phenotype).

      (3) The authors should show each data point on their bar graphs.

      Thank you for the suggestion. We now show each data point on the bar graphs in the revised manuscript.

      (4) It should be noted from which organs the RTE and the naïve T cells were harvested.

      Thank you for the constructive suggestion. We isolated CD4<sup>+</sup> RTEs and mature naive CD4<sup>+</sup> T cells by sorting GFP<sup>+</sup>CD4<sup>+</sup>CD8<sup>-</sup>CD<sup>-</sup>NK1.1<sup>-</sup> cells (RTEs) and GFP<sup>-</sup>CD4<sup>+</sup>CD8<sup>-</sup>CD<sup>-</sup>CD44<sup>lo</sup> cells (naive T cells) from lymph nodes. This detail has been added to the manuscript on line 475.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the expert reviewers for their careful consideration of our manuscript and the feedback to help us strengthen our work. Please find a response to each reviewer’s comments below. We have included the original text from the reviewer in unbolded text and our response, immediately below, in bold text for clarity. 

      Reviewer #1:

      (1) Appetite is controlled, not regulated; please reword throughout.

      The reviewer raises a valid point that we misused the word “regulate” in certain instances where “control” would be the more accurate term. We have made adjustments throughout the manuscript.

      (2) One minor point that would further strengthen the data is a more distinct analysis of receptors that are characteristic of the different populations of neuronal and non-neuronal cells; this part could be improved. 

      We thank the reviewer for this suggestion, as we had not directly compared metabolically relevant peptides/receptors between the mouse and rat DVC. We have added a new supplementary figure (Figure S13) showing the expression of selected receptors and neuropeptides in neuronal cells of the mouse and rat. These data offer some interesting insights, including the relatively broad expression of Lepr in the rat compared with the mouse and the absence of proglucagon-expressing neurons within the rat DVC.

      Reviewer #2:

      (1) In some of the graphs, the label AP/NTS is used, but DVC would be more appropriate.

      We have reviewed the figures and legends to ensure appropriate use of DVC. We thank the reviewer for bringing this oversight to our attention.  

      (2) Line 124, p7 - Sprague Dawley RATS

      We have changed the text to “Sprague Dawley rats”.

      (3) Line 132, p7 - The phrase "were provided with given access to food" needs grammatical correction.

      We agree the text was poorly written. The sentence has been corrected to: “Wild-type Sprague Dawley rats (Charles River) were provided with ad libitum access to food (Purina Lab Diet 5001) and water in temperature-controlled (22°C) rooms on a 12-hour light-dark cycle with daily health checks.” We have also reviewed the entire manuscript and made additional amendments where necessary.

      (4) Page 15 - Mention that GFAP is a marker for astrocytes. Additionally, correct the typo "gfrap".

      We have corrected the misspelling of “Gfap” in the text. We appreciate the reviewer’s point that it is valuable to tell the non-expert reader that GFAP is a marker for astrocytes; however, because our data and those from other snRNA-Seq studies show that Gfap mRNA labels only a subset of astrocytes, we prefer not to state this. Our data suggest that using Gfap as the sole astrocyte marker will not reflect the true astrocyte population.

      (5) Line 432, p15 - What was the rationale for selecting clusters 23, 26, and 27?

      We chose to perform sub-clustering on these clusters because they displayed multiple cell identities when surveyed for the 473 marker genes described in Methods 2.6. To separate these identities, we increased the clustering granularity within these clusters by sub-clustering.

      (6) Line 533, p18 - only 5 out of 34 neurons express GFRAL, which makes the language used a little bit misleading. As per the comment above, I would specify that only a subset (X%) of neurons express GFRAL, and apply the same approach for other markers.

      We thank the reviewer for raising this point. We agree the text, as written, was an oversimplification. We adjusted the text as recommended to state that a subset (~15%) of neurons express detectable Gfral mRNA, which is likely an underestimate given the difficulty of detecting lowly expressed transcripts such as Gfral.

      (7) Line 547, p18 - This statement appears to refer to rat data specifically, rather than rodent data in general.

      The text has been corrected. 

      (8) Section 3.6 - The discussion on meal-related transcriptional programs in the murine DVC does not mention Figure S10A and B.

      We thank the reviewer for the observation. It is true that we do not discuss this figure. Figure S10 shows the integration of samples in treeArches, a necessary step for building the hierarchy in Python so that the learning algorithm uses only genes related to cell identity rather than treatment; we obtained the same overlap of samples when we used R to assign identities. This figure demonstrates that our integration was successful, because it establishes identities using only genes that are not treatment-related, i.e., those expressed by cells regardless of their response to any treatment. For the meal-related analysis, we were interested in the genes that change with treatment, which is why the analysis differed. We have added a sentence to the Methods to clarify this point: "This sample integration was done to ensure that inter-sample variations were removed for the cell identity steps."

      (9) Page 5, citation 10 - the author cited a clinical trial for glucagon and GLP-1 receptor dual agonist survodutide for "DVC neurons' role in appetite and energy balance stems from their role as therapeutic targets for obesity". A more appropriate citation (such as a review) would be preferable.

      We appreciate the reviewer's suggestion. We have updated our references to include a recent manuscript from the Alhadeff group demonstrating that the DVC acts as a target of GLP1-based therapies. We have also included a review, as suggested (DOI: 10.1038/s42255-022-00606-9).

      (10) Line 52, p5 - a citation of obesity is needed, as the current ref only pertains to cancer cachexia.

      We have included a reference for obesity.  

      (11) In the discussion, it would be valuable to elaborate on the potential significance of DVC-specific glial cells (perhaps at the end of the second paragraph?).

      We thank the reviewer for this suggestion. Our discovery of a DVC-specific astrocyte transcriptional profile was underrepresented in the discussion, and we have expanded the discussion of the suspected roles of these DVC-specific astrocytes. Much of this discussion is based on the distinct localization pattern of Gfap mRNA in the DVC (see the Allen Brain Atlas ISH image), which shows dense signal at the boundary of the AP and NTS. As astrocytes have well-established roles in maintaining blood-brain barrier (BBB) integrity, we speculate that this is a major role of these cells. However, functional studies will be critical to assess the roles of these astrocytes in DVC biology.

      (12) Line 683, p22 - Consider adding PMID: 38987598 which describes the dissociable GLP-1R circuits.

      We appreciate this recommendation – we have included this reference.  

      (13) The authors suggest that a possible explanation for the discrepancy between snRNA-Seq and in situ hybridization data is that Agrp and Hcrt mRNA reads in snRNA-Seq overwhelmingly mapped to non-coding regions. To what extent could this limitation affect other genes included in the current analyzed 10x datasets?

      As shown by Pool et al. (https://doi.org/10.1038/s41592-023-02003-w), including intronic reads improves sensitivity and more accurately reflects endogenous gene expression. Including intronic reads is therefore considered more a strength than a limitation and is now the default in platforms such as CellRanger. Even when intronic reads are included in mapping snRNA-Seq data, we would advise corroborating snRNA-Seq findings with published literature or with detection of coding mRNA or protein. In our case, the detection of hypothalamic neuropeptides in the snRNA-Seq data could not be verified by in situ hybridization using probes that detect exons. Therefore, the fact that Hcrt and Agrp reads were exclusively intronic suggests a regulatory (reviewed in https://doi.org/10.3389/fgene.2018.00672) rather than a coding role in the DVC.

      (14) Given the manuscript's focus on feeding and metabolism, I believe a more detailed description and comparison of the transcription profile of known receptors, neurotransmitters, and neuropeptides involved in food intake and energy homeostasis between mice and rats would add value. Adding a curated list of key genes related to feeding regulation would be particularly informative.

      A similar request was made by reviewer #1. Please see the full response above. Briefly, we have performed additional analysis of the mouse and rat DVC data and included this data as an additional supplemental figure (Figure S13).  

      (15) Line 479-482, p17 - It would be helpful if the authors could quantify (e.g., number and/or percentage) the extent of TH and CCK co-expression.

      We have amended the text of the manuscript to include quantification of Cck and Th colocalization.  According to our snRNA-seq data, out of the 764 Th-expressing neurons, 80 coexpress Cck in the mouse (~10%). The Cck-expressing cells are more numerous, 3,821 in total.  
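      A co-expression count of this kind can be sketched as below. This is an illustrative sketch, not the authors' code: it assumes each nucleus is represented as a dictionary mapping gene to UMI count and that a gene is "detected" at one or more UMIs; the function name `coexpression_summary` and the toy nuclei are hypothetical.

```python
def coexpression_summary(cells, gene_a, gene_b, min_count=1):
    """Count nuclei in which gene_a, gene_b, or both are detected.

    cells: iterable of dicts mapping gene -> UMI count for one nucleus
    (hypothetical format). A gene counts as detected when its UMI count
    is >= min_count; the default of 1 is an assumption.
    """
    n_a = n_b = n_both = 0
    for counts in cells:
        has_a = counts.get(gene_a, 0) >= min_count
        has_b = counts.get(gene_b, 0) >= min_count
        n_a += has_a  # booleans add as 0 or 1
        n_b += has_b
        n_both += has_a and has_b
    frac = n_both / n_a if n_a else 0.0
    return {"n_a": n_a, "n_b": n_b, "n_both": n_both, "frac_a_with_b": frac}


# Toy nuclei, invented for illustration only.
cells = [
    {"Th": 2, "Cck": 1},
    {"Th": 1},
    {"Cck": 3},
    {"Th": 0, "Cck": 0},
]
print(coexpression_summary(cells, "Th", "Cck"))
# {'n_a': 2, 'n_b': 2, 'n_both': 1, 'frac_a_with_b': 0.5}

# Applied to the reported counts: 80 of 764 Th-expressing neurons co-express Cck.
print(round(80 / 764, 3))  # 0.105
```

      Because snRNA-Seq detection of lowly expressed transcripts is incomplete, a count like this is best read as a lower bound on true co-expression, and the result depends on the detection threshold chosen.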

      (16) The number of animals used differs significantly between species, which the authors acknowledge as a limitation in the discussion. Since the authors took advantage of previously published mouse data sets (Ludwig and Dowsett data sets), I wonder if the authors could compare/integrate any rat data set currently available in rats as well to partially address the sample size disparity.

      We agree with the reviewer that our rat dataset is considerably smaller than our mouse dataset, making comparisons between the rat and mouse DVC challenging. We attempted to increase the size of our rat DVC atlas by incorporating publicly available rat DVC snRNA-Seq data (Reiner et al., 2022). However, we found several issues with the quality of these data, including low UMI counts and gene counts per cell. For these reasons, we decided against merging the two datasets. Thus, while relatively small, our rat DVC atlas is built from high-quality data and serves as a valuable starting point. By introducing treeArches as a method to incorporate new snRNA-Seq data into our own relatively easily, we hope that future studies will do so and thus expand the rat DVC atlas we have built.

      (17) In the Materials and Methods section, LiCl is mentioned as one of the treatment conditions; however, very little corresponding data are presented or discussed. Please include these results and elaborate on the rationale for selecting LiCl over other anorectic compounds.

      The reviewer is correct; some of the tissues used in this study were from animals treated with LiCl prior to euthanasia. Our intent was to contrast the transcriptional effects induced by LiCl (an anorectic agent with aversive properties) with those of refeeding (a naturally rewarding and satiating stimulus). However, upon analyzing the data, we found very few transcriptional changes induced by LiCl. Because it is unclear to us whether this reflects a technical failure of the experiment, we did not elaborate on these results.

      Reviewer #3 (Recommendations for the authors):

      (1) The use of both sexes is indicated in the discussion, but methods and results do not address sex distribution in the investigated groups. Also, the groups could be more clearly described, e.g., the size of the 2 hour refeeding mouse group varies from n=10 to n=5.

      We have clarified the text, in line with the reviewer’s suggestion. There were two cohorts of fasted/refed mice (n=5 each), which is why the Methods state n=10. The fasted-only group, which was not refed before euthanasia, is a separate group (n=5).

      (2) Page 20, the last sentence needs to be reworded.

      We thank the reviewer for this recommendation. The text has been amended to improve clarity of the sentence. 

      (3) Page 22, lines 691-692 - this sentence needs to be reworded.

      We thank the reviewer for this comment. The offending sentences have been amended.  

      (4) While the authors find transcriptional changes in all neuronal and non-neuronal cell types, which is interesting, the verification of known transcriptional changes (e.g., cFos) is unaddressed. cFos is a common gene upregulated with refeeding that was surprisingly not investigated, even though this should be a strong maker of proper meal-induced neuronal activation in the DMV. This is a missed opportunity either to verify the data set or to highlight important limitations if that had been attempted without success.

      This is a highly salient point made by the reviewer. Fos expression serves as an internal validation of our refeeding condition, and the absence of Fos mRNA levels from the original manuscript was an oversight on our part. As shown in our volcano plot comparing ad libitum fed and refed mice, two Fos-associated genes are significantly upregulated in the refed group. We are therefore confident that the snRNA-Seq analysis accurately captured rapid changes in the DVC in response to refeeding.

      Author response image 1.

      Only differentially expressed genes (log2 fold-change > 0.5 per group) were considered in the analysis. NS = non-significant.

      (5) The focus on transmitter classification is highlighted, but surprisingly, the well-accepted distinction of GABAergic neurons by Slc32a1 was not used; instead, Gad1 and Gad2 were used as GABAergic markers. While this may be proper for the DMV, given numerous findings that Gad1/2 are not proper markers for GABAergic neurons and are often co-expressed in glutamatergic populations, this confound should have been addressed to make a case for if and why they would be proper markers in the DMV.

      The reviewer raises an important point. Indeed, there are discrepancies in expression between the Gad1/2 genes and the Slc32a1 gene in other datasets. To analyze this within our dataset, we examined the mainly GABAergic magnaclass 1 (see Slc32a1 UMAP plot below). In magnaclass 1, only 5% and 3% of all neurons express Slc32a1 without Gad1 or without Gad2, respectively. In line with the reviewer’s comment, we found that 54% of neurons express either Gad1 or Gad2 but have no detectable Slc32a1. While our failure to detect more cells co-expressing Slc32a1 and Gad genes may be partially due to the low expression of Slc32a1, it is also very likely that the DVC, like other brain regions, contains neurons that express the Gad enzymes without co-expressing Slc32a1.

      This was very much the case with the GLP1 cell cluster, which we identified as the population which had the highest co-expression of excitatory and inhibitory markers. When we refined this analysis to look at expression of excitatory markers with Slc32a1 (and not other inhibitory genes), there was a marked reduction in the proportion of GLP1 neurons meeting this criterion. We find this is mainly due to the GLP1 cells expressing Gad2 (see plots below). We still find that there are some GLP1-expressing neurons that express excitatory markers and Slc32a1 and that the GLP1 neurons have a higher proportion of these co-expressing cells than other cell types.  

      We have extended our results section to reflect this and thank the reviewer for recommending this analysis.  

      Author response image 2.

      Slc32a1 expression across all neurons.  

      Author response image 3.

      Proportion of neurons in all cell identities expressing glutamatergic markers alone (dark green), Slc32a1 alone (light green), both glutamatergic markers and Slc32a1 (purple), or neither Slc32a1 nor glutamatergic markers (grey).

      Author response image 4.

      Balloon plot of Slc32a1, Gad1 and Gad2 across cell types. The GLP1-expressing neurons express Gad2 but minimal Slc32a1.  
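      The four-way classification used in the analysis above (one marker group alone, the other alone, both, or neither) can be sketched as follows. This is a hypothetical illustration, not the authors' code: the input format (one dictionary of gene to UMI count per nucleus), the detection threshold of one UMI, and the function names are assumptions, and the toy nuclei are invented. The same helper could be reused with a set of glutamatergic markers (for example, Slc17a6, assumed here for illustration) in place of the Gad genes.

```python
from collections import Counter

CATEGORIES = ("x_only", "y_only", "both", "neither")


def marker_category(counts, group_x, group_y, min_count=1):
    """Classify one nucleus by whether any gene of each marker group is detected."""
    has_x = any(counts.get(g, 0) >= min_count for g in group_x)
    has_y = any(counts.get(g, 0) >= min_count for g in group_y)
    if has_x and has_y:
        return "both"
    if has_x:
        return "x_only"
    if has_y:
        return "y_only"
    return "neither"


def category_proportions(cells, group_x, group_y, min_count=1):
    """Proportion of nuclei falling into each of the four categories."""
    tally = Counter(marker_category(c, group_x, group_y, min_count) for c in cells)
    total = sum(tally.values())
    if total == 0:
        return {k: 0.0 for k in CATEGORIES}
    return {k: tally[k] / total for k in CATEGORIES}


GAD = ("Gad1", "Gad2")   # x group: Gad enzymes
VGAT = ("Slc32a1",)      # y group: vesicular GABA transporter gene

# Toy nuclei, invented for illustration only.
cells = [{"Gad2": 3}, {"Gad1": 1}, {"Slc32a1": 1, "Gad1": 2}, {}]
print(category_proportions(cells, GAD, VGAT))
# {'x_only': 0.5, 'y_only': 0.0, 'both': 0.25, 'neither': 0.25}
```

      Computing these proportions separately for each cell identity would give per-identity bars of the kind summarized in the proportion plot described above.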

      (6) The Pdgfra IHC as verification is great, but images are not very convincing in distinguishing the 2 (mouse) or 3 (rat) classes of cells. Why not compare Pdgfra and HuC/D co-localization by IHC and snRNAseq data (using the genes for HuC/D) in the mouse and in the rat? That would also clarify how specific HuC/D is for DMV neurons, or if it may also be expressed in non-neuronal populations.

      In agreement with the reviewer's suggestion, we reanalyzed the snRNA-Seq data to determine the extent of co-expression of HuC/HuD (i.e., the Elavl3 and Elavl4 genes, respectively) in Pdgfra-expressing neurons. The expression of these genes in the 34 rat neurons belonging to this group is shown in the following heatmap, in which each column represents one neuron. As shown, most neurons co-express Pdgfra and either the HuC or the HuD gene. In addition, we show UMAP plots of the rat neurons displaying expression of the same genes regardless of the assigned neuronal identity; the Pdgfra neurons are visible in darker blue in the last UMAP plot. It is important to note that HuD is the more specific neuronal marker, as shown in the table of average Elavl3/4 expression, since HuC is also expressed by glial cells, especially OPCs and oligodendrocytes. Because the HuC/D antibody detects both proteins, this complicates the interpretation of the immunofluorescent staining. While the snRNA-Seq data suggest these Pdgfra-expressing cells are indeed neurons (albeit a rare population), we aim to confirm this in separate studies.

      Author response image 5.

      Author response image 6.

      Average expression (log-normalized counts) of HuC/D by layer 1 cell identity in the rat cells:

      Author response table 1.

      (7) The importance of sub-clustering for clusters 23, 26, and 27 is not immediately clear. Does this have any relevance to the mouse vs. rat data? Or fed, fast, refeeding data sets? Or is it just to show the depth that can be achieved?

      We appreciate that our justification was not clear in the manuscript. We clarify our rationale below; briefly, in each case distinct transcriptional profiles were observed within the cluster, and we resolved them by sub-clustering.

      Cluster 23 was sub-clustered because it contained both pre-myelinating oligodendrocytes and a subset of myelinating oligodendrocytes. To label these cells efficiently in R rather than cell by cell, the subclusters showing pre-myelinating oligodendrocyte markers were labeled as such in the dataset, and the remaining cells were labeled as mature oligodendrocytes.

      A similar approach was taken for cluster 27 which contained pericytes, endothelial and smooth muscle cells (Figure S5).

      In the case of cluster 26, marker mapping revealed two subclusters of fibroblasts, so the cluster was sub-clustered in order to label each group with its respective identity in R. The sub-clustering therefore served as an aid to label the different identities found through marker mapping (Table S5) in the first clustering round.

      All labels, including those resulting from the sub-clustering of these clusters, were transferred from the mouse to the rat data using treeArches. Because this was done to establish identity, it should not be relevant for treatment analyses (e.g., fasted, refed), since the labels are built from markers that do not change across conditions but remain identity markers. Indeed, these subclusters are evenly distributed among the samples in our dataset.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This work investigated the role of CXXC-finger protein 1 (CXXC1) in regulatory T cells. CXXC1-bound genomic regions largely overlap with Foxp3-bound regions and regions with H3K4me3 histone modifications in Treg cells. CXXC1 and Foxp3 interact with each other, as shown by co-immunoprecipitation. Mice with Treg-specific CXXC1 knockout (KO) succumb to lymphoproliferative diseases between 3 and 4 weeks of age, similar to Foxp3 KO mice. Although the immune suppression function of CXXC1 KO Treg is comparable to WT Treg in an in vitro assay, these KO Tregs failed to suppress autoimmune diseases such as EAE and colitis in Treg transfer models in vivo. This is partly due to the diminished survival of the KO Tregs after transfer. CXXC1 KO Tregs do not have an altered DNA methylation pattern; instead, they display weakened H3K4me3 modifications within the broad H3K4me3 domains, which contain a set of Treg signature genes. These results suggest that CXXC1 and Foxp3 collaborate to regulate Treg homeostasis and function by promoting Treg signature gene expression through maintaining H3K4me3 modification.

      Strengths:

      Epigenetic regulation of Treg cells has been a constantly evolving area of research. The current study revealed CXXC1 as a previously unidentified epigenetic regulator of Tregs. The strong phenotype of the knockout mouse supports the critical role CXXC1 plays in Treg cells. Mechanistically, the link between CXXC1 and the maintenance of broad H3K4me3 domains is also a novel finding.

      Weaknesses:

      (1) It is not clear why the authors chose to compare H3K4me3 and H3K27me3 enriched genomic regions. There are other histone modifications associated with transcription activation or repression. Please provide justification.

      Thank you for highlighting this important point. We chose to focus on H3K4me3 and H3K27me3 enriched genomic regions because these histone modifications are well-characterized markers of transcriptional activation and repression, respectively. H3K4me3 is predominantly associated with active promoters, while H3K27me3 marks repressed chromatin states, particularly in the context of gene regulation at promoters. This duality provides a robust framework for investigating the balance between transcriptional activation and repression in Treg cells. While histone acetylation, such as H3K27ac, is linked to enhancer activity and transcriptional elongation, our focus was on promoter-level regulation, where H3K4me3 and H3K27me3 are most relevant. Although other histone modifications could provide additional insights, we chose to focus on these two to maintain clarity and feasibility in our analysis. We have revised the text accordingly; please refer to Page 18, lines 353-356.

      (2) It is not clear what separates Clusters 1 and 3 in Figure 1C. It seems they share the same features.

      We apologize for not describing these clusters clearly. Clusters 1 and 3 both belong to the H3K4me3-only group, with Cluster 1 showing higher H3K4me3 enrichment and higher gene expression. We initially divided the promoters into four clusters because we aimed to classify them into four categories: H3K4me3 only, H3K27me3 only, H3K4me3-H3K27me3 co-occupied, and none. However, the actual classification did not resolve a distinct H3K4me3-H3K27me3 co-occupied group. Instead, it yielded two H3K4me3-only clusters, with Cluster 1 having higher H3K4me3 enrichment and gene expression.

      (3) The claim, "These observations support the hypothesis that FOXP3 primarily functions as an activator by promoting H3K4me3 deposition in Treg cells." (line 344), seems to be a bit of an overstatement. Foxp3 certainly can promote transcription in ways other than promoting H3K4me3 deposition, and it also can repress gene transcription without affecting H3K27me3 deposition. Therefore, it is not justified to claim that promoting H3K4me3 deposition is Foxp3's primary function.

      Thank you for your insightful feedback. We agree that the statement in line 344 may have overstated the role of FOXP3 in promoting H3K4me3 deposition as its primary function. As you pointed out, FOXP3 is indeed a multifaceted transcription factor that regulates gene expression through various mechanisms. It can promote transcription independent of H3K4me3 deposition, as well as repress transcription without directly influencing H3K27me3 levels.

      To more accurately reflect the broader regulatory functions of FOXP3, we have revised the manuscript. The updated text (Page 19, lines 385-388) now reads:

      "These findings collectively support the conclusion that FOXP3 contributes to transcriptional activation in Treg cells by promoting H3K4me3 deposition at target loci, while also regulating gene expression directly or indirectly through other epigenetic modifications."

      (4) For the in vitro suppression assay in Figure S4C, and the Treg transfer EAE and colitis experiments in Figure 4, the Tregs should be isolated from Cxxc1 fl/fl x Foxp3 cre/wt female heterozygous mice instead of Cxxc1 fl/fl x Foxp3 cre/cre (or cre/Y) mice. Tregs from the homozygous KO mice are already activated by the lymphoproliferative environment and could have vastly different gene expression patterns and homeostatic features compared to resting Tregs. Therefore, it's not a fair comparison between these activated KO Tregs and resting WT Tregs.

      Thank you for raising this insightful point regarding the potential activation status of Treg cells in homozygous knockout mice. To address this concern, we performed additional experiments using Treg cells isolated from Foxp3<sup>Cre/+</sup>Cxxc1<sup>fl/fl</sup> (hereafter referred to as “het-KO”) female mice and their littermate controls, Foxp3<sup>Cre/+</sup>Cxxc1<sup>fl/+</sup> (referred to as “het-WT”) mice.

      The results of these new experiments are now included in the manuscript (Page 25, lines 507–509, Figure 6E and Figure S6A-E):

      (1) In the in vitro suppression assay, Treg cells from het-KO mice exhibited reduced suppressive function compared with het-WT Treg cells. This finding underscores an intrinsic defect in the suppressive capacity of Cxxc1-deficient Treg cells.

      (2) In the experimental autoimmune encephalomyelitis (EAE) model, Treg cells isolated from het-KO mice also demonstrated impaired suppressive function.

      (5) The manuscript didn't provide a potential mechanism for how CXXC1 strengthens broad H3K4me3-modified genomic regions. The authors should perform Foxp3 ChIP-seq or CUT&Tag with WT and Cxxc1 cKO Tregs to determine whether CXXC1 deletion changes Foxp3's binding pattern in Treg cells.

      Thank you for raising this important point. To address your suggestion, we performed CUT&Tag experiments and found that Cxxc1 deletion does not alter FOXP3 binding patterns in Treg cells. Most FOXP3-bound regions in WT Treg cells were similarly enriched in KO Treg cells, indicating that Cxxc1 deficiency does not impair FOXP3’s DNA-binding ability. These results have been added to the revised manuscript (Page 28, lines 567-575, Figure S8A-B) and are further discussed in the Discussion (Pages 28-29, lines 581-587).

      Reviewer #2 (Public review):

      FOXP3 has been known to form diverse complexes with different transcription factors and enzymes responsible for epigenetic modifications, but how extracellular signals timely regulate FOXP3 complex dynamics remains to be fully understood. Histone H3K4 tri-methylation (H3K4me3) and CXXC finger protein 1 (CXXC1), which is required to regulate H3K4me3, also remain to be fully investigated in Treg cells. Here, Meng et al. performed a comprehensive analysis of H3K4me3 CUT&Tag assay on Treg cells and a comparison of the dataset with the FOXP3 ChIP-seq dataset revealed that FOXP3 could facilitate the regulation of target genes by promoting H3K4me3 deposition.

      Moreover, CXXC1-FOXP3 interaction is required for this regulation. They found that Treg-specific deletion of Cxxc1 leads to spontaneous severe multi-organ inflammation in mice and that Cxxc1-deficient Treg cells exhibit enhanced activation and impaired suppressive activity. In addition, they found that CXXC1 shares several binding sites with FOXP3, especially at Treg signature gene loci, which are necessary for maintaining the homeostasis and identity of Treg cells.

      The findings of the current study are pretty intriguing, and it would be great if the authors could fully address the following comments to support these interesting findings.

      Major points:

      (1) There is insufficient evidence in the first part of the Results to support the conclusion that "FOXP3 functions as an activator by promoting H3K4Me3 deposition in Treg cells". The authors should compare the results for H3K4Me3 in FOXP3-negative conventional T cells to demonstrate that at these promoter loci, FOXP3 promotes H3K4Me3 deposition.

      Thank you for this insightful comment. We have performed additional experiments comparing H3K4Me3 levels between FOXP3-positive Treg cells and FOXP3-negative conventional T cells (Tconv). Please refer to Page 18, lines 361-368, and Figure 1C and Figure S1C for the results. Our results show that H3K4Me3 abundance is higher at many Treg-specific gene loci in Treg cells than in Tconv cells. This supports our conclusion that FOXP3 promotes H3K4Me3 deposition at these loci.

      (2) In Figure 3 F&G, the activation status and IFNγ production should be analyzed in Treg cells and Tconv cells separately rather than in total CD4+ T cells. Moreover, are there changes in autoantibodies and IgG and IgE levels in the serum of cKO mice?

      Thank you for your valuable suggestions. In response to your comment, we reanalyzed the data in Figures 3F and 3G to assess the activation status and IFN-γ production in Tconv cells. The updated analysis revealed that Cxxc1 deletion in Treg cells leads to increased activation and IFN-γ production in Tconv cells. Additionally, we corrected the analysis of IL-17A and IL-4 expression, which were upregulated in Tconv cells. These updated results are now included in the revised manuscript (Page 21, lines 429-431, Figure 3I and Figure S3E-F).

      Additionally, we examined autoantibodies and immunoglobulin levels in the serum of Cxxc1 cKO mice. Our data show a significant increase in serum IgG levels, accompanied by elevated IgG autoantibodies, indicating heightened autoimmune responses. In contrast, serum IgE levels remained largely unchanged. The results are detailed in the revised manuscript (Page 21, lines 421-423, Figure 3E and Figure S3B).

      (3) Why did Cxxc1-deficient Treg cells not show impaired suppression compared with WT Treg cells in the in vitro suppression assay, despite the reduced expression of suppression-associated Treg markers at the transcriptional level demonstrated in both scRNA-seq and bulk RNA-seq?

      Thank you for your thoughtful comment. The absence of impaired suppression in Cxxc1-deficient Treg cells from homozygous knockout (KO) mice during the in vitro suppression assay, despite the reduced expression of Treg-associated markers at the transcriptional level (as demonstrated by scRNA-seq), can likely be explained by the activated state of these Treg cells. In homozygous KO mice, Treg cells are already activated due to the lymphoproliferative environment, resulting in gene expression patterns that differ from those of resting Treg cells. This pre-activation may obscure the effect of Cxxc1 deletion on their suppressive function in vitro.

      To address this limitation, we used heterozygous Foxp3<sup>Cre/+</sup>Cxxc1<sup>fl/fl</sup> (het-KO) female mice, along with their littermate controls, Foxp3<sup>Cre/+</sup>Cxxc1<sup>fl/+</sup> (het-WT) mice. In these heterozygous mice, we observed an impairment in Treg cell suppressive function in vitro, which was accompanied by the downregulation of several key Treg-associated genes, as confirmed by RNA-Seq analysis.

      These updated findings, based on the use of het-KO mice, are now incorporated into the revised manuscript (Page 25, lines 507–509, Figure 6E).

      (4) Is there a disease in which Cxxc1 is expressed at low levels or absent in Treg cells? Is the same immunodeficiency phenotype present in patients as in mice?

      This is indeed a very meaningful and intriguing question, and we are equally interested in understanding whether low or absent Cxxc1 expression in Treg cells is associated with any human diseases. However, despite an extensive review of the literature and available data, we found no reports linking Cxxc1 deficiency in Treg cells to immunodeficiency phenotypes in patients comparable to those observed in mice.

      Reviewer #3 (Public review):

      In the report entitled "CXXC-finger protein 1 associates with FOXP3 to stabilize homeostasis and suppressive functions of regulatory T cells", the authors demonstrated that Cxxc1-deletion in Treg cells leads to the development of severe inflammatory disease with impaired suppressive function. Mechanistically, CXXC1 interacts with Foxp3 and regulates the expression of key Treg signature genes by modulating H3K4me3 deposition. Their findings are interesting and significant. However, there are several concerns regarding their analysis and conclusions.

      Major concerns:

      (1) Despite cKO mice showing an increase in Treg cells in the lymph nodes and Cxxc1-deficient Treg cells having normal suppressive function, the majority of cKO mice died within a month. What causes cKO mice to die from severe inflammation?

      Considering the results of Figures 4 and 5, a decrease in the Treg cell population due to their reduced proliferative capacity may be one of the causes. It would be informative to analyze the population of tissue Treg cells.

      Thank you for your insightful observation regarding the mortality of cKO mice despite increased Treg cells in lymph nodes and the normal suppressive function of Cxxc1-deficient Treg cells.

      As suggested, we hypothesized that the reduction of tissue-resident Treg cells could be a key factor. Additional experiments revealed a significant decrease in Treg cell populations in the small intestine lamina propria (LPL), liver, and lung of cKO mice. These findings highlight the critical role of tissue-resident Treg cells in preventing systemic inflammation.

      This reduction aligns with Figures 4 and 5, which demonstrate impaired proliferation and survival of Cxxc1-deficient Treg cells. Together, these defects lead to insufficient Treg populations in peripheral tissues, escalating localized inflammation into systemic immune dysregulation and early mortality.

      These additional results have been incorporated into the revised manuscript (Page 21, lines 424-427, Figure 3G and Figure S3C).

      (2) In Figure 5B, scRNA-seq analysis indicated that the Mki67+ Treg subset is comparable between WT and Cxxc1-deficient Treg cells. On the other hand, FACS analysis demonstrated that Cxxc1-deficient Treg shows less Ki-67 expression compared to WT in Figure 5I. The authors should explain this discrepancy.

      Thank you for pointing out the apparent discrepancy between the scRNA-seq and FACS analyses regarding Ki-67 expression in Cxxc1-deficient Treg cells.

      In Figure 5B, the scRNA-seq analysis identified the Mki67+ Treg subset as comparable between WT and Cxxc1-deficient Treg cells. This finding reflects the overall proportion of cells expressing Mki67 transcripts within the Treg population. In contrast, the FACS analysis in Figure 5I specifically measures Ki-67 protein levels, revealing reduced expression in Cxxc1-deficient Treg cells compared to WT.

      To resolve this discrepancy, we performed additional analyses of the scRNA-seq data to directly compare the expression levels of Mki67 mRNA between WT and Cxxc1-deficient Treg cells. The results revealed a consistent reduction in Mki67 transcript levels in Cxxc1-deficient Treg cells, aligning with the reduced Ki-67 protein levels observed by FACS.

      These new analyses have been included in the revised manuscript (Author response image 1) to clarify this point and demonstrate consistency between the scRNA-seq and FACS data.

      Author response image 1.

      Violin plots displaying the expression levels of Mki67 in T<sub>reg</sub> cells from Foxp3<sup>cre</sup> and Foxp3<sup>cre</sup>Cxxc1<sup>fl/fl</sup> mice.

      In addition, the authors concluded on line 441 that CXXC1 plays a crucial role in maintaining Treg cell stability. However, there appears to be no data on Treg stability. Which data represent the Treg stability?

      Thank you for your valuable comment. We agree that our wording in line 441 may have been too conclusive. Our data focus on the impact of Cxxc1 deficiency on Treg cell homeostasis and transcriptional regulation, rather than directly measuring Treg cell stability. Specifically, the downregulation of Treg-specific suppressive genes and upregulation of pro-inflammatory markers suggest a shift in Treg cell function, which points to disrupted homeostasis rather than stability.

      We have revised the manuscript to clarify that CXXC1 plays a crucial role in maintaining Treg cell function and homeostasis, rather than stability (Page 24, lines 489-491).

      (3) The authors found that Cxxc1-deficient Treg cells exhibit weaker H3K4me3 signals compared to WT in Figure 7. This result suggests that Cxxc1 regulates H3K4me3 modification via H3K4 methyltransferases in Treg cells. The authors should clarify which H3K4 methyltransferases contribute to the modulation of H3K4me3 deposition by Cxxc1 in Treg cells.

      We appreciate the reviewer’s insightful comment regarding the role of H3K4 methyltransferases in regulating H3K4me3 deposition by CXXC1 in Treg cells.

      CXXC1 has been reported to function as a non-catalytic component of the Set1/COMPASS complex, which includes the H3K4 methyltransferases SETD1A and SETD1B, key enzymes responsible for H3K4 trimethylation (1-4). Based on these findings, we propose that CXXC1 modulates H3K4me3 levels in Treg cells by interacting with and stabilizing the activity of the Set1/COMPASS complex.

      This point is further discussed in the Discussion (Pages 30-31, lines 624-632).

      Furthermore, it would be important to investigate whether Cxxc1-deletion alters Foxp3 binding to target genes.

      Thank you for raising this important point. To address your suggestion, we performed CUT&Tag experiments and found that Cxxc1 deletion does not alter FOXP3 binding patterns in Treg cells. Most FOXP3-bound regions in WT Treg cells were similarly enriched in KO Treg cells, indicating that Cxxc1 deficiency does not impair FOXP3’s DNA-binding ability. These results have been added to the revised manuscript (Page 28, lines 567-575, Figure S8A-B) and are further discussed in the Discussion (Pages 28-29, lines 581-587).

      (4) In Figure 7, the authors concluded that CXXC1 promotes Treg cell homeostasis and function by preserving the H3K4me3 modification since Cxxc1-deficient Treg cells show lower H3K4me3 densities at the key Treg signature genes. Are these Cxxc1-deficient Treg cells derived from mosaic mice? If Cxxc1-deficient Treg cells are derived from cKO mice, the gene expression and H3K4me3 modification status are inconsistent because scRNA-seq analysis indicated that expression of these Treg signature genes was increased in Cxxc1-deficient Treg cells compared to WT (Figure 5F and G).

      Thank you for your insightful comment. To clarify, the Cxxc1-deficient Treg cells analyzed for H3K4me3 modifications in Figure 7 were derived from Cxxc1 conditional knockout (cKO) mice, not mosaic mice.

      Regarding the apparent inconsistency between reduced H3K4me3 levels and the increased expression of Treg signature genes observed in scRNA-seq analysis (Figure 5F and G), we believe this discrepancy can be attributed to distinct mechanisms regulating gene expression. H3K4me3 is an epigenetic mark that facilitates chromatin accessibility and transcriptional regulation, reflecting upstream chromatin dynamics. However, gene expression levels are influenced by a combination of factors, including transcriptional activators, downstream compensatory mechanisms, and the inflammatory environment in cKO mice.

      The upregulation of Treg signature genes in scRNA-seq data likely reflects an activated or pro-inflammatory state of Cxxc1-deficient Treg cells in response to systemic inflammation, as previously described in the manuscript. This contrasts with the intrinsic reduction in H3K4me3 levels at these loci, indicating a loss of epigenetic regulation by CXXC1.

      To further support this interpretation, RNA-seq analysis of Treg cells from Foxp3<sup>Cre/+</sup> Cxxc1<sup>fl/fl</sup> (“het-KO”) and their littermate Foxp3<sup>Cre/+</sup> Cxxc1<sup>fl/+</sup> (“het-WT”) female mice (Figure S6C) revealed a significant reduction in key Treg signature genes such as Icos, Ctla4, Tnfrsf18, and Nt5e in het-KO Treg cells. These results align with the diminished H3K4me3 modifications observed in cKO Treg cells, further underscoring the role of CXXC1 as an epigenetic regulator.

      In summary, while the gene expression changes observed in scRNA-seq may reflect adaptive responses to inflammation, the reduced H3K4me3 modifications directly highlight the critical role of CXXC1 in maintaining the epigenetic landscape essential for Treg cell homeostasis and function.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In Figure 7E, the y-axis scale for H3K4me3 peaks at the Ctla4 locus should be consistent between WT and cKO samples.

      We thank the reviewer for pointing out the inconsistency in the y-axis scale for the H3K4me3 peaks at the Ctla4 locus in Figure 7E. We have carefully revised the figure to ensure that the y-axis scale is now consistent between the WT and cKO samples.

      We appreciate the reviewer’s attention to this detail, as it enhances the rigor of the data presentation. Please find the updated Figure 7E in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      In lines 455 and 466, the name of Treg signature markers validated by flow cytometry should be written as protein name and capitalized.

      Thank you for pointing this out. We have carefully reviewed lines 455 and 466 and have revised the text to ensure that the Treg signature markers validated by flow cytometry are referred to using their protein names, with proper capitalization.

      Reviewer #3 (Recommendations for the authors):

      (1) On line 431, "Cxxc1-deficient cells" should be "Cxxc1-deficient Treg cells".

      We thank the reviewer for highlighting this oversight. On line 431, we have revised "Cxxc1-deficient cells" to "Cxxc1-deficient Treg cells" to provide a more accurate and specific description. We appreciate the reviewer's attention to detail, as this correction improves the precision of our manuscript.

      (2) In Figure 4H, negative values should be removed from the y-axis.

      Thank you for your observation. We have revised Figure 4H to remove the negative values from the y-axis, as requested. This adjustment ensures a more accurate and meaningful representation of the data.

      (3) It is better to provide the lists of overlapping genes in Figure 7C.

      Thank you for your suggestion. We agree that providing the lists of overlapping genes in Figure 7C would enhance the clarity and reproducibility of the results. We have now included the gene lists as supplementary information (Supplementary Table 3) accompanying Figure 7C.

      (1) Lee, J. H. & Skalnik, D. G. CpG-binding protein (CXXC finger protein 1) is a component of the mammalian set1 histone H3-Lys4 methyltransferase complex, the analogue of the yeast Set1/COMPASS complex. Journal of Biological Chemistry 280, 41725-41731, doi:10.1074/jbc.M508312200 (2005).

      (2) Thomson, J. P., Skene, P. J., Selfridge, J., Clouaire, T., Guy, J., Webb, S., Kerr, A. R. W., Deaton, A., Andrews, R., James, K. D., Turner, D. J., Illingworth, R. & Bird, A. CpG islands influence chromatin structure via the CpG-binding protein Cfp1. Nature 464, 1082-1086, doi:10.1038/nature08924 (2010).

      (3) Shilatifard, A. The COMPASS family of histone H3K4 methylases: mechanisms of regulation in development and disease pathogenesis. Annual Review of Biochemistry 81, 65-95 (2012).

      (4) Brown, D. A., Di Cerbo, V., Feldmann, A., Ahn, J., Ito, S., Blackledge, N. P., Nakayama, M., McClellan, M., Dimitrova, E., Turberfield, A. H., Long, H. K., King, H. W., Kriaucionis, S., Schermelleh, L., Kutateladze, T. G., Koseki, H. & Klose, R. J. The SET1 Complex Selects Actively Transcribed Target Genes via Multivalent Interaction with CpG Island Chromatin. Cell Reports 20, 2313-2327, doi:10.1016/j.celrep.2017.08.030 (2017).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Zhang et al. describe a delicate relationship between Tet2 and FBP1 in the regulation of hepatic gluconeogenesis.

      Strengths:

      The studies are very mechanistic, indicating that this interaction occurs via demethylation of HNF4a. Phosphorylation of HNF4a at ser 313 induced by metformin also controls the interaction between Tet2 and FBP1.

      We are grateful for the reviewer's praise of the manuscript.

      Weaknesses:

      The results are briefly described, and oftentimes, the necessary information is not provided to interpret the data. Similarly, the methods section is not well developed to inform the reader about how these experiments were performed. While the findings are interesting, the results section needs to be better developed to increase confidence in the interpretation of the results.

      Thank you very much for pointing out the shortcomings of the manuscript. We apologize for not providing detailed descriptions of some experimental methods and results. Following the reviewer’s suggestion, we added these details to the Methods section, including the generation of whole-body Tet2 KO mice and liver-specific Tet2 knockdown mice (AAV8-shTet2); the missing information on reagents, antibodies, primer sequences, and mutant generation; and the chromatin immunoprecipitation (ChIP) and immunofluorescence methods. We also expanded the interpretation of the results in response to the reviewer’s comments.

      Reviewer #2 (Public review):

      Summary:

      This study reveals a novel role of TET2 in regulating gluconeogenesis. It shows that fasting and a high-fat diet increase TET2 expression in mice, and TET2 knockout reduces glucose production. The findings highlight that TET2 positively regulates FBP1, a key enzyme in gluconeogenesis, by interacting with HNF4α to demethylate the FBP1 promoter in response to glucagon. Additionally, metformin reduces FBP1 expression by preventing TET2-HNF4α interaction. This identifies an HNF4α-TET2-FBP1 axis as a potential target for T2D treatment.

      Strengths:

      The authors use several methods in vivo (PTT, GTT, and ITT in fasted and HFD mice; and KO mice) and in vitro (in HepG2 and primary hepatocytes) to support the existence of the HNF4alpha-TET-2-FBP-1 axis in the control of gluconeogenesis. These findings uncovered a previously unknown function of TET2 in gluconeogenesis.

      We are grateful for the reviewer's praise of the manuscript.

      Weaknesses:

      Although the authors provide evidence of an HNF4α-TET2-FBP1 axis in the control of gluconeogenesis, which contributes to the therapeutic effect of metformin on T2D, its role in the pathogenesis of T2D is less clear. The mechanisms by which TET2 is up-regulated by glucagon should be explored further.

      Thank you very much for pointing out the shortcomings of the manuscript. We agree with the reviewer that the manuscript focuses on the function of the HNF4α-TET2-FBP1 axis in the control of gluconeogenesis rather than on its role in the pathogenesis of T2D. Following the reviewer’s suggestion, we changed the title of the manuscript to “HNF4α-TET2-FBP1 axis contributes to gluconeogenesis and type 2 diabetes”. Regarding the mechanisms by which TET2 is up-regulated by glucagon, we examined TET2 mRNA levels at different time points after a single dose of glucagon in HepG2 cells. Interestingly, TET2 mRNA levels increased significantly, by sixfold, at 30 min, and this effect of glucagon on TET2 mRNA levels persisted for more than 48 hours (Fig. 3E).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors indicate that they have overexpressed TET2 in HepG2 cells and primary mouse hepatocytes. The degree of overexpression should be shown. Is this similar to the increase in TET2 with fasting or HFD treatment?

      We thank the reviewer for this helpful comment. Following the reviewer’s suggestion, we examined the protein levels of overexpressed TET2 in HepG2 cells and primary mouse hepatocytes. The degree of TET2 overexpression (Fig. 3J) is similar to the increase in TET2 under fasting or HFD treatment (Fig. 1C, D).

      In Figures 2E-2G, the authors report results in Tet2-KO mice. Information on how these mice were generated is lacking. There is limited information about how Tet2-KO cells were generated, but again, I could not find anything about these mice in the methods section or figure legend. Is this whole-body or liver-specific Tet2-KO? How old were the mice at the time of PTT, GTT, or ITT?

      Were these mice on chow or HFD? Are there any differences in body weight between WT and Tet2-KO mice?

      We thank the reviewer for this helpful comment. Following the reviewer’s suggestion, we have provided detailed information about the Tet2-KO mice, including how they were generated, in the Methods section. The Tet2-KO mice used in each figure are now also clearly described in the figure legends. Two mouse models were employed in this study: whole-body Tet2-KO mice and liver-specific Tet2 knockdown mice (AAV8-shTet2). The mice used for PTT, GTT, and ITT were 8 weeks old and fed an HFD. To address the reviewer’s concern, we compared the body weights of WT and Tet2-KO mice and found no significant differences between them at 8 and 10 weeks of age on a normal chow diet (Figure 2I).

      Figures 3A-C show that 48 hours after glucagon treatment, Tet2 and FBP1 mRNA increased. It's surprising that a single dose of glucagon would have effects that last that long. The peak rise in glucose following glucagon treatment occurs in 30 minutes. How do authors explain such a long effect of glucagon on Tet2 mRNA and protein?

      We thank the reviewer for this constructive comment. To address the reviewer’s concern, we examined the mRNA levels of TET2 and FBP1 at different time points following a single dose of glucagon in HepG2 cells. Interestingly, TET2 mRNA levels increased significantly, by sixfold, at 30 min, and this effect of glucagon on TET2 mRNA levels persisted for more than 48 hours (Fig. 3E). The mechanism underlying the long-lasting effect of glucagon on Tet2 mRNA and protein requires further exploration.

      It's interesting that in Figure 3F, Fbp1 and Tet2 mRNA expression correlated positively in both ad libitum and fasting conditions. I would expect that during fed conditions, gluconeogenesis would not be activated and thus would expect no correlation.

      We thank the reviewer for this constructive comment. According to the results in the new Fig. 3H, the mRNA levels of Fbp1 and Tet2 indeed correlated positively under both ad libitum and fasting conditions, but the correlation was stronger under fasting (higher r value and lower p value) than under ad libitum feeding. Notably, the expression levels of both Fbp1 and Tet2 increased under fasting, consistent with Fig. 1C and Fig. 4K.

      The authors state that "Our results demonstrated that HNF4α recruits TET2 to the FBP1 promoter and activates FBP1 expression through demethylation" What data points out that this is mediated through demethylation?

      We thank the reviewer for this constructive comment. Following the reviewer’s suggestion, we conducted new ChIP experiments. These data demonstrate that HNF4α recruits TET2 to the FBP1 promoter and activates FBP1 expression through demethylation, as shown in Fig. 4F-H.

      For Figures 5B, 4D, and 3L-N y-axes are labeled as fold enrichment. The authors should clearly indicate what was being measured on y-axes.

      We thank the reviewer for this helpful comment. Following the reviewer’s suggestion, we have clearly labeled the y-axes in all figures.

      The authors indicate that metformin increases phosphorylation of Hnf4a at ser 313 Figure 5C. How do we know that ser 313 is involved? Only one antibody is listed for Hnf4a (SAB, 32591).

      Thank you very much for pointing this out. We determined the phosphorylation levels of HNF4α at S313 using Anti-HNF4α (phospho S313) (ab78356); we apologize for not labeling this clearly. We have now labeled it in Fig. 5C and added the details of this antibody to the “Western Blot and Immunoprecipitation” section of the Methods.

      How did the authors make phosphomimetic mutation (S313D) and phosphoresistant mutation (S313A) of HNF4α? This is not described.

      Thank you very much for pointing this out. Following the reviewer’s suggestion, we have added the detailed method for generating the phosphomimetic (S313D) and phosphoresistant (S313A) HNF4α mutants to the “Gene Knockout Cells and Mutagenesis” section of the Methods.

      Reviewer #2 (Recommendations for the authors):

      Major points:

      (1) Other key gluconeogenesis genes (e.g. PEPCK and G6Pase) should be investigated to determine whether or not the regulation by TET-2 is specific to FBP-1.

      Thanks for the reviewer’s helpful comment. Following the reviewer’s suggestion, we used qPCR to assay other key gluconeogenesis genes, including PEPCK and G6Pase. Glucagon treatment had no effect on PEPCK or G6Pase expression (Fig. 3D), suggesting that the regulation by TET2 is specific to FBP1.

      (2) The methods are not well defined, and more details should be given, for example, to explain how the Tet2 KO mice were generated. Since these animals are not liver-specific KOs, and TET2 is expressed in a variety of tissues and organs and is predominantly found in hematopoietic cells, including bone marrow and blood cells, the phenotype of these mice should be better characterized.

      Thanks for the reviewer’s helpful comment. The Tet2 knockout (Tet2 KO) mice were originally purchased from the Jackson Laboratory (strain No. 023359), and we have added this information to the “Animal” section of the Methods. The previously reported phenotypes of Tet2 KO mice mainly involve the bone marrow, spleen, islets and heart. Specifically, Tet2 KO led to an increase in total cell numbers in the bone marrow and spleen (PMID: 21873190), as well as an elevated white blood cell (WBC) count (PMID: 37541212). Additionally, Tet2 KO mice exhibited splenomegaly (PMID: 37541212, PMID: 21723200, PMID: 38773071). In contrast, islet morphology (PMID: 34417463) and anatomical chamber volumes and ventricular function (PMID: 38357791) were indistinguishable between Tet2 KO and wild type (WT) mice.

      (3) An experiment showing the co-localization of TET2 and HNF4α in the mouse liver in fasted mice and/or in HFD-mice would strengthen the data shown in Figure 3.

      Thanks for the reviewer’s helpful comment. Following the reviewer’s suggestion, we performed experiments showing the co-localization of TET2 and HNF4α in the livers of fasted mice and FD mice, as shown in new Fig. 4B and C.

      Minor points:

      (1) Given that the manuscript does not focus on the role of TET2 in the pathogenesis of T2D, its title should be changed.

      Thanks for the reviewer’s helpful comment. Following the reviewer’s suggestion, we changed the title of the manuscript to “HNF4α-TET2-FBP1 axis contributes to gluconeogenesis and type 2 diabetes”.

      (2) Please indicate the molecular weight of bands in all figures.

      Thanks for the reviewer’s helpful comment. Following the reviewer’s suggestion, the molecular weights of the bands are now indicated in all figures.

      (3) Why are the control values on the y-axis in Figure 1A and B so different? Please maintain the same scale in both figures.

      Thanks for the reviewer’s helpful comment. Following the reviewer’s suggestion, we recalculated and normalized the control value in Fig. 1A so that both figures use the same scale.

      (4) In Figure 2F, are the plasma insulin levels altered in response to GTT in Tet2-KO mice? If so, please show the data and discuss.

      Thanks for the reviewer’s helpful comment. Following the reviewer’s suggestion, we examined plasma insulin levels during the GTT assay. Tet2-KO mice showed lower insulin levels after glucose administration, which reflects higher insulin sensitivity, as shown in new Fig. 2H.

      (5) Does the increase in hepatic TET2 protein levels in response to fasting also occur in other tissues and hematopoietic cells?

      Thanks for the reviewer’s helpful comment. Following the reviewer’s suggestion, we examined Tet2 protein levels under fasting conditions in other tissues and hematopoietic cells, and found that fasting also increased Tet2 protein levels in kidney, brain, and hematopoietic cells, but not in heart.

      Author response image 1.

      (6) Please indicate the glucagon concentration and metformin dose in all figures in which they are mentioned.

      Thanks for the reviewer’s helpful comment. Following the reviewer’s suggestion, the glucagon concentration (20 nM) and metformin dose (10 mM for HepG2 cell treatment and 300 mg/kg per day for mouse treatment) have been added to the figure legends.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In the current manuscript, the authors use theoretical and analytical tools to examine the possibility of neural projections to engage ensembles of synaptic clusters in active dendrites. The analysis is divided into multiple models that differ in the connectivity parameters, speed of interactions, and identity of the signal (electric vs. second messenger). They first show that random connectivity almost ensures the representation of presynaptic ensembles. As expected, this convergence is much more likely for small group sizes and slow processes, such as calcium dynamics. Conversely, fast signals (spikes and postsynaptic potentials) and large groups are much less likely to recruit spatially clustered inputs. Dendritic nonlinearity in the postsynaptic cells was found to play a highly important role in distinguishing these clustered activation patterns, both when activated simultaneously and in sequence. The authors tackled the difficult issue of noise, showing a beneficial effect when noise 'happens' to fill in gaps in a sequential pattern but degraded performance at higher background activity levels. Last, the authors simulated selectivity to chemical and electrical signals. While they find that longer sequences are less perturbed by noise, in more realistic activation conditions, the signals are not well resolved in the soma.

      While I think the premise of the manuscript is worth exploring, I have a number of reservations regarding the results.

      (1) In the analysis, the authors made a simplifying assumption that the chemical and electrical processes are independent. However, this is not the case; excitatory inputs to spines often trigger depolarization combined with pronounced calcium influx. This mixed signaling could have dramatic implications for the analysis, particularly if the dendrites are nonlinear (see below).

      We thank the reviewer for pointing out that we were not entirely clear about the strong basis upon which we had built our analyses of nonlinearity. In the previous version we had relied on published work, notably (Bhalla 2017), which does include these nonlinearities. However, we agree it is preferable to unambiguously demonstrate all the reported selectivity properties in a single model with all the nonlinearities discussed. We have now done so. This is now reported in the paper:

      “A single model exhibits multiple forms of nonlinear dendritic selectivity

      We implemented all three forms of selectivity described above in a single model, which included six voltage- and calcium-gated ion channels, NMDA, AMPA and GABA receptors, and chemical signaling processes in spines and dendrites. The goal of this was threefold: to show how these nonlinear operations emerge in a mechanistically detailed model, to show that they can coexist, and to show that they are separated in time-scales. We implemented a Y-branched neuron model with additional electrical compartments for the dendritic spines (Methods). This model was closely based on a published detailed chemical-electrical model (Bhalla 2017). We stimulated this model with synaptic input corresponding to the three kinds of spatiotemporal patterns described in Figure 8 - Supplement 1 (sequential synaptic activity triggering electrical sequence selectivity), Figure 8 - Supplement 2 (spatially grouped synaptic stimuli leading to local Ca4_CaM activation), and Figure 8 - Supplement 3 (sequential bursts of synaptic activity triggering chemical sequence selectivity). We found that each of these mechanisms shows nonlinear selectivity with respect to both synaptic spacing and synaptic weights. Further, these forms of selectivity coexist in the composite model (Figure 8 - Supplements 1, 2, 3), separated by the time-scales of the stimulus patterns (~100 ms, ~1 s and ~10 s respectively). Thus mixed signaling in active nonlinear dendrites yields selectivity of the same form as we explored in simpler individual models. A more complete analysis of the effect of morphology, branching and channel distributions deserves a separate in-depth analysis, and is outside the scope of the current study.”

      (2) Sequence detection in active dendrites is often simplified to investigating activation in a part of or the entirety of individual branches. However, the authors did not do that for most of their analysis. Instead, they treat the entire dendritic tree as one long branch and count how many inputs form clusters. I fail to see why simplification is required and suspect it can lead to wrong results. For example, two inputs that are mapped to different dendrites in the 'original' morphology but then happen to fall next to each other when the branches are staggered to form the long dendrites would be counted as neighbors.

      We have added the below section within the main text in the section titled “Grouped Convergence of Inputs” to address the effect of branching.

      “End-effects limit convergence zones for highly branched neurons

      Neurons exhibit considerable diversity with respect to their morphologies. How synapses extending across dendritic branch points interact in the context of a synaptic cluster/group is a topic that needs detailed examination via experimental and modeling approaches. However, for the sake of analysis, we present calculations under the assumption that selectivity for grouped inputs might be degraded across branch points.

      Zones beginning close to a branch point might get interrupted. Consider a neuron with B branches. The length of the typical branch would be L/B. As a conservative estimate, if we exclude a region of length Z for every branch, the expected number of zones that begin too close to a branch point is

      $$E_{end} = \frac{Z}{L/B} = \frac{BZ}{L} \qquad \text{[Equation 3]}$$

      For typical pyramidal neurons B ~ 50, so E_end ~ 0.05 for values of Z of ~10 µm. Thus pyramidal neurons will not be much affected by branching effects. Profusely branching neurons like Purkinje cells have B ~ 900 for a total L of ~7800 µm (McConnell and Berry, 1978), hence E_end ~ 1 for values of Z of ~10 µm. Thus almost all groups in Purkinje neurons would run into a branch point or terminal. For the case of electrical groups, this estimate would be scaled by a factor of 5 if we consider a zone length of 50 µm. However, it is important to note that these are very conservative estimates, as for clusters of 4-5 inputs, the number of synapses available within a zone is far greater (~100 synapses within 50 µm).”
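      The branch-point arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes the end-effect estimate takes the form E_end = Z / (L/B) = B·Z/L, which is the form implied by the worked numbers in the response; it is not the authors' code. The pyramidal dendritic length of 10,000 µm is an assumption chosen because it reproduces the quoted E_end ~ 0.05; only B and Z are given in the text for that case.

```python
def end_effect_fraction(n_branches: int, zone_um: float, total_length_um: float) -> float:
    """Expected fraction of zones that start within one zone length of a branch point.

    Assumes E_end = Z / (L / B) = B * Z / L. Values near or above 1 mean that
    almost every zone would run into a branch point or terminal.
    """
    branch_length_um = total_length_um / n_branches  # typical branch length, L/B
    return zone_um / branch_length_um


# Purkinje cell values quoted in the text: B ~ 900, L ~ 7800 um.
purkinje_chemical = end_effect_fraction(900, 10.0, 7800.0)    # Z ~ 10 um (chemical groups)
purkinje_electrical = end_effect_fraction(900, 50.0, 7800.0)  # Z ~ 50 um (electrical groups)

# Pyramidal neuron: B ~ 50 from the text; L = 10,000 um is an assumption.
pyramidal_chemical = end_effect_fraction(50, 10.0, 10_000.0)

for name, value in [
    ("pyramidal, Z = 10 um", pyramidal_chemical),
    ("Purkinje, Z = 10 um", purkinje_chemical),
    ("Purkinje, Z = 50 um", purkinje_electrical),
]:
    print(f"{name}: E_end = {value:.2f}")
```

      Because E_end is linear in Z, moving from 10 µm chemical zones to 50 µm electrical zones scales the estimate by exactly 5, as stated in the text.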

      (3) The simulations were poorly executed. Figures 5 and 6 show examples but no summary statistics.

      We have included the summary statistics in Figure 5F and Figure 6E. The statistics for both these panels were generated by simulating multiple spatiotemporal combinations of ectopic input in the presence of different stimulus patterns for each sequence length.

      The authors emphasize the importance of nonlinear dendritic interactions, but they do not include them in their analysis of the ectopic signals! I find it to be wholly expected that the effects of dendritic ensembles are not pronounced when the dendrites are linear.

      We would like to clarify that both Figures 5 and 6 already included nonlinearities. In Figure 5, the chemical mechanism involving the bistable switch motif is strongly selective for ordered inputs in a nonlinear manner. A separate panel highlighting this (Panel C) has now been included in Figure 5. This result was previously shown in Figure 3I of (Bhalla 2017), and we have reproduced it in Figure 5C.

      The published electrical model used in Figure 6 also has a nonlinearity, which predominantly stems from the interaction of the impedance gradient along the dendrite with the voltage dependence of NMDARs. See Figure 4C,D of (Branco, Clark, and Häusser 2010).

      To provide a comprehensive analysis of dendritic integration, the authors could simulate more realistic synaptic conductances and voltage-gated channels. They would find much more complicated interactions between inputs on a single site, a sliding temporal and spatial window of nonlinear integration that depends on dendritic morphology, active and passive parameters, and synaptic properties. At different activation levels, the rules of synaptic integration shift to cooperativity between different dendrites and cellular compartments, further complicated by nonlinear interactions between somatic spikes and dendritic events.

      We would like to clarify two points. First, the key goal of our study was to understand the role played by random connectivity in giving rise to clustered computation. In this revision we provide simulations to show the mechanistic basis for the nonlinearities, and then abstract these out in order to scale the analysis to networks. These nonlinearities were taken as a given, though we elaborated previous work slightly in order to address the question of ectopic inputs. Second, in our original submission we relied on published work for the estimates of dendritic nonlinearities. Previous work (Poirazi, Brannon, and Mel 2003; Branco, Clark, and Häusser 2010; Bhalla 2017) has already carried out highly detailed realistic simulations, in some cases including chemical and electrical nonlinearities as the reviewer mentions (Bhalla 2017). Hence we did not feel that this needed to be redone.

      In this resubmission we have addressed the above and two additional concerns, namely whether the different forms of selectivity can coexist in a single model including all these nonlinearities, and whether there is separation of time-scales. The answer is yes to both. The outcome of this is presented in Figure 8 and the associated supplementary figures, and all simulation details are provided on the github repository associated with this paper. A more complete analysis of interaction of multiple nonlinearities in a detailed model is material for further study.

      While it is tempting to extend back-of-the-napkin calculations of how many inputs can recruit nonlinear integration in active dendrites, the biological implementation is very different from this hypothetical. It is important to consider these questions, but I am not convinced that this manuscript adequately addressed the questions it set out to probe, nor does it provide information that was unknown beforehand.

      We developed our analysis systematically, and perhaps the reviewer refers to the first few calculations as back-of-the-napkin. However, the derivation rapidly becomes more complex when we factor in combinatorics and the effect of noise. This derivation is in the supplementary material. Furthermore, the exact form of the combinatorial and noise equations was non-trivial to derive, and we worked closely with the connectivity simulations (Figures 2 and 4) to obtain equations that scale across a large parameter space, by sampling connectivity for over 100,000 neurons and activity over 100 trials for each of these neurons for each network configuration we tested.

      the biological implementation is very different from this hypothetical.

      We do not quite understand in what respect the reviewer feels that this calculation is very different from the biological implementation. The calculation is about projection patterns. In the discussion we consider at length how our findings of selectivity from random projections may be an effective starting point for more elaborate biological connection rules. We have added the following sentence:

      “We present a first-order analysis of the simplest kind of connectivity rule (random), upon which more elaborate rules such as spatial gradients and activity-dependent wiring may be developed.”

      In case the reviewer was referring to the biological implementation of nonlinear integration, we treat the nonlinear integration in the dendrites as a separate set of simulations, most of which are closely based on published work (Bhalla 2017). We use these in the later sections of the paper to estimate selectivity terms, which inform our final analysis.

      In the revision we have worked to clarify this progression of the analysis. As indicated above, we have also made a composite model of all of the nonlinear dendritic mechanisms, chemical and electrical, which underlie our analysis.

      nor does it provide information that was unknown beforehand.

      We conducted a broad literature survey and, to the best of our knowledge, these calculations and findings have not been obtained previously. If the reviewer has specific examples in mind, we would be pleased to refer to them.

      Reviewer #2 (Public Review):

      Summary:

      If synaptic input is functionally clustered on dendrites, nonlinear integration could increase the computational power of neural networks. But this requires the right synapses to be located in the right places. This paper aims to address the question of whether such synaptic arrangements could arise by chance (i.e. without special rules for axon guidance or structural plasticity), and could therefore be exploited even in randomly connected networks. This is important, particularly for the dendrites and biological computation communities, where there is a pressing need to integrate decades of work at the single-neuron level with contemporary ideas about network function.

      Using an abstract model where ensembles of neurons project randomly to a postsynaptic population, back-of-envelope calculations are presented that predict the probability of finding clustered synapses and spatiotemporal sequences. Using data-constrained parameters, the authors conclude that clustering and sequences are indeed likely to occur by chance (for large enough ensembles), but require strong dendritic nonlinearities and low background noise to be useful.

      Strengths:

      (1) The back-of-envelope reasoning presented can provide fast and valuable intuition. The authors have also made the effort to connect the model parameters with measured values. Even an approximate understanding of cluster probability can direct theory and experiments towards promising directions, or away from lost causes.

      (2) I found the general approach to be refreshingly transparent and objective. Assumptions are stated clearly about the model and statistics of different circuits. Along with some positive results, many of the computed cluster probabilities are vanishingly small, and noise is found to be quite detrimental in several cases. This is important to know, and I was happy to see the authors take a balanced look at conditions that help/hinder clustering, rather than to just focus on a particular regime that works.

      (3) This paper is also a timely reminder that synaptic clusters and sequences can exist on multiple spatial and temporal scales. The authors present results pertaining to the standard `electrical' regime (~50-100 µm, <50 ms), as well as two modes of chemical signaling (~10 µm, 100-1000 ms). The senior author is indeed an authority on the latter, and the simulations in Figure 5, extending those from Bhalla (2017), are unique in this area. In my view, the role of chemical signaling in neural computation is understudied theoretically, but research will be increasingly important as experimental technologies continue to develop.

      Weaknesses:

      (1) The paper is mostly let down by the presentation. In the current form, some patience is needed to grasp the main questions and results, and it is hard to keep track of the many abbreviations and definitions. A paper like this can be impactful, but the writing needs to be crisp, and the logic of the derivation accessible to non-experts. See, for instance, Stepanyants, Hof & Chklovskii (2002) for a relevant example.

      It would be good to see a restructure that communicates the main points clearly and concisely, perhaps leaving other observations to an optional appendix. For the interested but time-pressed reader, I recommend starting with the last paragraph of the introduction, working through the main derivation on page 7, and writing out the full expression with key parameters exposed. Next, look at Table 1 and Figure 2J to see where different circuits and mechanisms fit in this scheme. Beyond this, the sequence derivation on page 15 and biophysical simulations in Figures 5 and 6 are also highlights.

      We appreciate the reviewer's suggestions. We have tightened the flow of the introduction. We understand that the abbreviations and definitions are challenging, and have therefore provided intuitions and summaries of the equations discussed in the main text.

      Clusters calculations

      “Our approach is to ask how likely it is that a given set of inputs lands on a short segment of dendrite, and then scale it up to all segments on the entire dendritic length of the cell.

      Thus, the probability of occurrence of groups that receive connections from each of the M ensembles (PcFMG) is a function of the connection probability (p) between the two layers, the number of neurons in an ensemble (N), the relative zone-length with respect to the total dendritic arbor (Z/L) and the number of ensembles (M).”
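      The cluster calculation can be sanity-checked by brute-force sampling, in the spirit of the connectivity simulations mentioned elsewhere in this response. The sketch below is a Monte Carlo estimate under simplifying assumptions that are ours, not the authors': each of the N neurons in each of the M ensembles connects to the postsynaptic neuron with probability p, each connection is a single synapse at a uniformly random position on a dendrite of total length L, branching and background activity are ignored, and a "group" means some window of length Z holding at least one synapse from every ensemble. The function names and the parameter values in the usage line are hypothetical illustrations.

```python
import random


def has_group(positions_by_ensemble, zone_um):
    """True if some window of length zone_um holds >= 1 synapse from every ensemble."""
    m = len(positions_by_ensemble)
    events = sorted((x, k) for k, xs in enumerate(positions_by_ensemble) for x in xs)
    counts = [0] * m
    covered = 0  # number of ensembles represented in the current window
    left = 0
    for x_right, k_right in events:
        if counts[k_right] == 0:
            covered += 1
        counts[k_right] += 1
        # Shrink the window from the left until it spans at most zone_um.
        while x_right - events[left][0] > zone_um:
            k_left = events[left][1]
            counts[k_left] -= 1
            if counts[k_left] == 0:
                covered -= 1
            left += 1
        if m > 0 and covered == m:
            return True
    return False


def estimate_group_probability(p, n, m, zone_um, length_um, trials, seed=0):
    """Fraction of sampled postsynaptic neurons that contain at least one group."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        positions = []
        for _ in range(m):
            # Each presynaptic neuron in the ensemble connects with probability p.
            n_syn = sum(1 for _ in range(n) if rng.random() < p)
            positions.append([rng.uniform(0.0, length_um) for _ in range(n_syn)])
        if has_group(positions, zone_um):
            hits += 1
    return hits / trials


# Hypothetical example: N = 100 neurons per ensemble, M = 3 ensembles, 10 um zones.
print(estimate_group_probability(p=0.05, n=100, m=3, zone_um=10.0,
                                 length_um=10_000.0, trials=2000))
```

      The sliding-window check only needs to examine, for each synapse, the widest window ending at that synapse, so a dendrite with S synapses is scanned in O(S log S) time after sorting.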

      Sequence calculations

      “Here we estimate the likelihood of the first ensemble input arriving anywhere on the dendrite, and ask how likely it is that succeeding inputs of the sequence would arrive within a set spacing.

      Thus, the probability of occurrence of sequences that receive sequential connections (PcPOSS) from each of the M ensembles is a function of the connection probability (p) between the two layers, the number of neurons in an ensemble (N), the relative window size with respect to the total dendritic arbor (Δ/L) and the number of ensembles (M).”
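      The sequence calculation can be sampled in the same way. In this minimal sketch, which is an illustration and not the authors' derivation, a sequence is assumed to mean one input from ensemble 1, then one from ensemble 2 located further along the dendrite by at most Δ, and so on up to ensemble M. The choice of "increasing position" as the direction, the single-synapse-per-connection rule, the function names, and the parameter values in the usage line are all hypothetical.

```python
import random


def has_sequence(positions_by_ensemble, spacing_um):
    """True if ensembles 0..M-1 have inputs in order along the dendrite,
    each one 0 < gap <= spacing_um beyond the previous input."""
    if not positions_by_ensemble:
        return False
    # Positions at which a valid partial sequence currently ends.
    reachable = list(positions_by_ensemble[0])
    for xs in positions_by_ensemble[1:]:
        reachable = [y for y in xs if any(0.0 < y - x <= spacing_um for x in reachable)]
        if not reachable:
            return False
    return bool(reachable)


def estimate_sequence_probability(p, n, m, spacing_um, length_um, trials, seed=0):
    """Fraction of sampled postsynaptic neurons containing at least one ordered sequence."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        positions = []
        for _ in range(m):
            n_syn = sum(1 for _ in range(n) if rng.random() < p)
            positions.append([rng.uniform(0.0, length_um) for _ in range(n_syn)])
        if has_sequence(positions, spacing_um):
            hits += 1
    return hits / trials


# Hypothetical example: N = 1000 neurons per ensemble, M = 3 ensembles, 5 um spacing.
print(estimate_sequence_probability(p=0.05, n=1000, m=3, spacing_um=5.0,
                                    length_um=10_000.0, trials=200))
```

      Propagating the set of reachable end positions from one ensemble to the next avoids enumerating every combination of inputs, which would grow as the product of the synapse counts.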

      (2) I wonder if the authors are being overly conservative at times. The result highlighted in the abstract is that 10/100000 postsynaptic neurons are expected to exhibit synaptic clustering. This seems like a very small number, especially if circuits are to rely on such a mechanism. However, this figure assumes the convergence of 3-5 distinct ensembles. Convergence of inputs from just 2 ensembles would be much more prevalent, but still advantageous computationally. There has been excitement in the field about experiments showing the clustering of synapses encoding even a single feature.

      We agree that short clusters of two inputs would be far more likely. We focused our analysis on clusters with three or more ensembles for the following reasons:

      (1) The signal-to-noise ratio in these clusters was very poor, as the likelihood of noise clusters is high.

      (2) It is difficult to trigger nonlinearities with very few synaptic inputs.

      (3) At the ensemble sizes we considered (100 for clusters, 1000 for sequences), clusters arising from just two ensembles would result in high probability of occurrence on all neurons in a network (~50% in cortex, see p_CMFG in figures below.). These dense neural representations make it difficult for downstream networks to decode (Foldiak 2003).

      However, in the presence of ensembles containing fewer neurons or when the connection probability between the layers is low, short clusters can result in sparse representations (Figure 2 - Supplement 2). Arguments 1 and 2 hold for short sequences as well.

      (3) The analysis supporting the claim that strong nonlinearities are needed for cluster/sequence detection is unconvincing. In the analysis, different synapse distributions on a single long dendrite are convolved with a sigmoid function and then the sum is taken to reflect the somatic response. In reality, dendritic nonlinearities influence the soma in a complex and dynamic manner. It may be that the abstract approach the authors use captures some of this, but it needs to be validated with simulations to be trusted (in line with previous work, e.g. Poirazi, Brannon & Mel, (2003)).

      We agree that multiple factors might affect the influence of nonlinearities on the soma. The key goal of our study was to understand the role played by random connectivity in giving rise to clustered computation. Since simulating a wide range of connectivity and activity patterns in a detailed biophysical model was computationally expensive, we analyzed the exemplar detailed models for nonlinearity separately (Figures 5, 6, and new Figure 8), and then used our abstract models as a proxy for understanding population dynamics. A complete analysis of the role played by morphology, channel kinetics and the effect of branching requires an in-depth study of its own, and some of these questions have already been tackled by (Poirazi, Brannon, and Mel 2003; Branco, Clark, and Häusser 2010; Bhalla 2017). However, in the revision, we have implemented a single model that incorporates the range of ion-channel, synaptic and biochemical signaling nonlinearities we discuss in the paper (Figure 8, and Figure 8 - Supplements 1, 2, 3). We use this to demonstrate all three forms of sequence and grouped computation used in the study, where the only differences are the stimulus pattern and the separation of time-scales inherent in the stimuli.
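      To make the abstract readout discussed in point (3) concrete, here is one possible reading of it as a minimal NumPy sketch; it is not the authors' implementation. Synapse positions are binned at 1 µm, the local input is the number of synapses within a sliding zone (a convolution with a box kernel of length Z), each local value is passed through a sigmoid, and the outputs are summed as a proxy for the somatic response. The box kernel, threshold, gain, and dendrite length are hypothetical parameters chosen only for illustration.

```python
import numpy as np


def somatic_response(positions_um, length_um=1000, zone_um=10, threshold=3.0, gain=4.0):
    """Sum of sigmoid(local synapse count) along a single unbranched dendrite.

    positions_um: synapse positions in um on a dendrite of integer length length_um.
    The local count at each 1 um bin is the number of synapses in a zone_um window.
    """
    counts, _ = np.histogram(positions_um, bins=length_um, range=(0, length_um))
    local = np.convolve(counts, np.ones(zone_um), mode="same")  # synapses per zone
    local_out = 1.0 / (1.0 + np.exp(-gain * (local - threshold)))  # dendritic nonlinearity
    return float(local_out.sum())


# Same four synapses, either clustered within 7 um or spread 200 um apart.
clustered = somatic_response([500, 502, 504, 506])
dispersed = somatic_response([100, 300, 500, 700])
print(f"clustered: {clustered:.3f}, dispersed: {dispersed:.3f}")
```

      With a steep sigmoid, only zones that reach the threshold contribute appreciably, so the clustered arrangement produces a much larger summed output than the dispersed one despite having the same number of synapses. As the reviewer notes, whether such a proxy tracks real somatic responses has to be validated against detailed simulations.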

      (4) It is unclear whether some of the conclusions would hold in the presence of learning. In the signal-to-noise analysis, all synaptic strengths are assumed equal. But if synapses involved in salient clusters or sequences were potentiated, presumably detection would become easier? Similarly, if presynaptic tuning and/or timing were reorganized through learning, the conditions for synaptic arrangements to be useful could be relaxed. Answering these questions is beyond the scope of the study, but there is a caveat there nonetheless.

      We agree with the reviewer. If synapses receiving connectivity from ensembles had stronger weights, this would make detection easier. Dendritic spikes arising from clustered inputs have been implicated in local cooperative plasticity (Golding, Staff, and Spruston 2002; Losonczy, Makara, and Magee 2008). Further, plasticity related proteins synthesized at a synapse undergoing L-LTP can diffuse to neighboring weakly co-active synapses, and thereby mediate cooperative plasticity (Harvey et al. 2008; Govindarajan, Kelleher, and Tonegawa 2006; Govindarajan et al. 2011). Thus if clusters of synapses were likely to be co-active, they could further engage these local plasticity mechanisms which could potentiate them while not potentiating synapses that are activated by background activity. This would depend on the activity correlation between synapses receiving ensemble inputs within a cluster vs those activated by background activity. We have mentioned some of these ideas in a published opinion paper (Pulikkottil, Somashekar, and Bhalla 2021). In the current study, we wanted to understand whether even in the absence of specialized connection rules, interesting computations could still emerge. Thus, we focused on asking whether clustered or sequential convergence could arise even in a purely randomly connected network, with the most basic set of assumptions. We agree that an analysis of how selectivity evolves with learning would be an interesting topic for further work.

      References

      Bhalla, Upinder S. 2017. “Synaptic Input Sequence Discrimination on Behavioral Timescales Mediated by Reaction-Diffusion Chemistry in Dendrites.” Edited by Frances K Skinner. eLife 6 (April):e25827. https://doi.org/10.7554/eLife.25827.

      Branco, Tiago, Beverley A. Clark, and Michael Häusser. 2010. “Dendritic Discrimination of Temporal Input Sequences in Cortical Neurons.” Science (New York, N.Y.) 329 (5999): 1671–75. https://doi.org/10.1126/science.1189664.

      Foldiak, Peter. 2003. “Sparse Coding in the Primate Cortex.” The Handbook of Brain Theory and Neural Networks. https://research-repository.st-andrews.ac.uk/bitstream/handle/10023/2994/FoldiakSparse HBTNN2e02.pdf?sequence=1.

      Golding, Nace L., Nathan P. Staff, and Nelson Spruston. 2002. “Dendritic Spikes as a Mechanism for Cooperative Long-Term Potentiation.” Nature 418 (6895): 326–31. https://doi.org/10.1038/nature00854.

      Govindarajan, Arvind, Inbal Israely, Shu-Ying Huang, and Susumu Tonegawa. 2011. “The Dendritic Branch Is the Preferred Integrative Unit for Protein Synthesis-Dependent LTP.” Neuron 69 (1): 132–46. https://doi.org/10.1016/j.neuron.2010.12.008.

      Govindarajan, Arvind, Raymond J. Kelleher, and Susumu Tonegawa. 2006. “A Clustered Plasticity Model of Long-Term Memory Engrams.” Nature Reviews Neuroscience 7 (7): 575–83. https://doi.org/10.1038/nrn1937.

      Harvey, Christopher D., Ryohei Yasuda, Haining Zhong, and Karel Svoboda. 2008. “The Spread of Ras Activity Triggered by Activation of a Single Dendritic Spine.” Science (New York, N.Y.) 321 (5885): 136–40. https://doi.org/10.1126/science.1159675.

      Losonczy, Attila, Judit K. Makara, and Jeffrey C. Magee. 2008. “Compartmentalized Dendritic Plasticity and Input Feature Storage in Neurons.” Nature 452 (7186): 436–41. https://doi.org/10.1038/nature06725.

      Poirazi, Panayiota, Terrence Brannon, and Bartlett W. Mel. 2003. “Pyramidal Neuron as Two-Layer Neural Network.” Neuron 37 (6): 989–99. https://doi.org/10.1016/S0896-6273(03)00149-1.

      Pulikkottil, Vinu Varghese, Bhanu Priya Somashekar, and Upinder S. Bhalla. 2021. “Computation, Wiring, and Plasticity in Synaptic Clusters.” Current Opinion in Neurobiology, Computational Neuroscience, 70 (October):101–12. https://doi.org/10.1016/j.conb.2021.08.001.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This useful manuscript reports mechanisms behind the increase in fecundity in response to sub-lethal doses of pesticides in the crop pest, the brown plant hopper. The authors hypothesize that the pesticide works by inducing the JH titer, which through the JH signaling pathway induces egg development. Evidence for this is, however, inadequate.

      We greatly appreciate your valuable comments and constructive suggestions on our work. The manuscript has been carefully edited and improved following your suggestions. We also provide more evidence to support our statements from new experiments. First, we found that EB treatment of adult females also stimulates egg-laying. Second, EB treatment of female adults increases the number of mature eggs in the ovary and ovarioles. Third, EB treatment of females enhances the expression of the kr-h1 gene in the whole body of BPH. Finally, EB treatment of female adults increases the JH III titer but has no impact on the 20E titer.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Gao et al. have demonstrated that the pesticide emamectin benzoate (EB) treatment of brown planthopper (BPH) leads to increased egg-laying in the insect, which is a common agricultural pest. The authors hypothesize that EB upregulates JH titer resulting in increased fecundity.

      Strengths:

      The finding that a class of pesticide increases the fecundity of brown planthopper is interesting.

      We greatly appreciate your positive comments on our work.

      Weaknesses:

      (1) EB is an allosteric modulator of GluCl. That means EB physically interacts with GluCl, initiating a structural change in the channel protein. Yet the authors' central hypothesis here is about how EB can upregulate the mRNA of GluCl. I do not know whether there is any evidence that an allosteric modulator can function as a transcriptional activator for the same receptor protein. The basic premise of the paper sounds counterintuitive. This is a structural problem and should be addressed by the authors by giving sufficient evidence that such mechanisms have been demonstrated before.

      Thank you for your question. As the reviewer points out, EB physically interacts with its target protein GluCl and thereby affects its downstream signaling pathway. In the manuscript, we report that EB-treated brown planthoppers show increased GluCl expression in the adult stage (Fig. 5A). Several studies have shown that insecticide treatment can increase the expression of the insecticide's target genes. For example, expression of the ryanodine receptor gene in the rice stem borer, Chilo suppressalis, increased 10-fold after treatment with chlorantraniliprole, an insecticide that targets the ryanodine receptor (Peng et al., 2017). In addition, in Drosophila, starvation (and low insulin) elevates transcription of the sNPF and tachykinin receptors (Ko et al., 2015; Root et al., 2011). In brown planthoppers, reduced mRNA and protein expression of a nicotinic acetylcholine receptor α8 subunit is associated with resistance to imidacloprid, and RNA interference knockdown of the α8 gene decreased the sensitivity of N. lugens to imidacloprid (Zhang et al., 2015). Hence, expression of receptor genes can be regulated by diverse factors, including insecticide treatment. In our case, we found that EB upregulates its target gene GluCl. However, we do not claim that EB functions as a transcriptional activator of GluCl, and we still do not know why EB treatment changes GluCl expression in the brown planthopper. Because our experiments lasted several days, the change might be an indirect (secondary) effect of other factors that alter GluCl expression following EB action on the channel. One possibility is that the allosteric interaction of EB with GluCl renders the channel dysfunctional, and cells respond by upregulating the channel/receptor to compensate. We have inserted text on lines 738–757 to explain these possibilities.

      (2) I am surprised to see a 4th instar larval application or treatment with EB results in the upregulation of JH in the adult stages. Complicating the results further is the observation that a 4th instar EB application results in an immediate decrease in JH titer. There is a high possibility that this late JH titer increase is an indirect effect.

      Thank you for your question. Treatment with low or sublethal doses of insecticides can have strong and complex effects on insects (Gandara et al., 2024; Gong et al., 2022; Li et al., 2023; Martelli et al., 2022). We fed 4th-instar brown planthoppers on EB for four days, after which they developed into 5th instars, the final nymphal stage of BPH. Since the brown planthopper is a hemimetabolous insect, we cannot rule out the possibility that the upregulation of JH in the adult stage is an indirect effect of EB treatment. In this revised manuscript, we therefore also investigated the impact of EB treatment in the adult stage. We found that female adults treated with EB also laid more eggs than controls (Figure 1-figure supplement 1A). The following experiments were performed in adults to address how EB treatment stimulates egg-laying in adult brown planthoppers.

      (1) We found that EB treatment in adults increases the number of mature eggs in the ovary (new Figure 2-figure supplement 1). We have added these results on lines 234–238 and 281–285.

      (2) We measured the JH titer after female adults had been treated with EB and found that EB also increases the JH titer but has no impact on the 20E titer in female adults (Figure 3-S3A and B). We have added these results on lines 351–356 and 281–285.

      (3) EB treatment in adults increases the expression of JHAMT and Kr-h1 (Figure 3-S3C and D). We have added these results on lines 378–379, 387–390, and 457–462.

      (3) The writing quality of the paper needs improvement. Particularly with respect to describing processes and abbreviations. In several instances the authors have not adequately described the processes they have introduced, thus confusing readers.

      Thank you for your suggestion. We have thoroughly revised the paper to improve clarity.

      (4) In the section 'EB promotes ovarian development' the authors have shown that EB treatment results in increased retention of eggs, which contradicts their own results showing that EB promotes egg laying. Again, this is a serious contradiction that nullifies their hypothesis.

      Thank you for pointing this out. We revised Figure 2B to show the number of mature eggs in the ovary. Females fed on EB had more mature eggs in their ovaries than control females. We also show that BPH fed with EB laid more eggs than controls (Figure 1 and Table S1). Thus, our results suggest that EB treatment increases both ovary maturation (egg production) and egg laying. We have added these results on lines 234–238.

      (5) Furthermore, the results suggest that oogenesis is not affected by EB application. The authors should devote a section to discussing how they observe increased egg numbers in EB-treated insects without any impact on oogenesis.

      Thank you for your suggestions, and apologies for the lack of clarity in our initial explanation. First, we found that EB treatment led to an increase in the number of eggs laid by female brown planthoppers (Figure 1). Through dissection experiments, we observed that EB-treated females had more mature eggs in their ovaries (Figure 2A and B), indicating that the increased egg-laying was due to a larger production of mature eggs in the ovaries after EB treatment. This is now explained on lines 229-238.

      Additionally, since there is no systematic description of oogenesis in the brown planthopper, we were the first to observe the oogenesis process in this species using immunohistochemistry and laser confocal microscopy. Based on the developmental characteristics, we defined the different stages of oogenesis (Figure 2C, Figure 2-figure supplement 2). We did not observe any significant effect of EB treatment on the various stages of oogenesis, indicating that EB treatment does not impair normal egg development (Figure 2D). Instead, the increase in vitellogenin accelerates the production of mature eggs. This is now explained on lines 243-262.

      During the maturation process, eggs require uptake of vitellogenin, and an increase in vitellogenin (Vg) content can accelerate egg maturation, producing more mature eggs. Our molecular data suggest that EB treatment leads to an upregulation of vg expression. Based on these findings, we conclude that the increase in egg-laying caused by EB treatment is due to the upregulation of vg (Figure 3I), which raises vitellogenin content, promoting the uptake of vitellogenin by maturing eggs and resulting in the production of more mature eggs. We have revised the text on lines 389-395 to clarify this point.

      (6) Met is the receptor of JH and to my understanding, remains mostly constant in terms of its mRNA or protein levels throughout various developmental periods in many different insects. Therefore, the presence of JH becomes the major driving factor for physiological events and not the presence of the receptor Met. Here the authors have demonstrated an increase in Met mRNA as a result of EB treatment. Their central hypothesis is that EB increases JH titer to result in enhanced fecundity. JH action will not result in the activation of Met. Although not contradictory to the hypothesis, the increase in mRNA content of Met is contrary to the findings of the JH field thus far.

      Thank you for your comment. Our results showed that EB treatment mildly increases (about 2-fold) expression of the Met gene in brown planthoppers (Figure 3G), and that Met and FAMeT expression levels were less affected by EB than those of kr-h1 and vg (Figure 3H and I). We agree that JH action does not increase Met expression. However, we cannot rule out the possibility that other factors induced by EB treatment (indirect effects) increase Met mRNA levels. A recent paper reported that downregulation of the transcription factor CncC increases Met expression in beetles (see Figure 6A in Jiang et al., 2023). Many studies have reported that insecticide treatment activates the CncC signaling pathway, which regulates detoxification gene expression (Amezian et al., 2023; Fu et al., 2024; Hu et al., 2021). Hence, it is possible that EB influences the CncC pathway, which then induces Met expression. This effect of EB on Met upregulation may be a secondary effect, similar to the upregulation of GluCl. We have discussed this on lines 725–738.

      (7) As pointed out before, it is hard to rationalize how a 4th instar exposure to EB can result in the upregulation of key genes involved in JH synthesis at the adult stage. The authors must consider providing a plausible explanation and discussion in this regard.

      Thank you for your comments. Although we exposed BPH to EB at the 4th instar, the insects fed on EB-treated rice plants for four days, after which they developed into the 5th instar, the final nymphal stage of the brown planthopper. Because brown planthoppers do not have a pupal stage, EB taken up by the nymphs might persist into the adult stage. In addition, we found that EB treatment increases the weight of adult females (Figure 1-figure supplement 3E and F), suggesting that EB might increase food intake in BPH, which in turn might increase insulin peptide production. Insulin might then increase JH synthesis in the adult stage. In this revised study, we also investigated the effects of EB treatment in adult BPH. As with treatment at the nymphal stage, EB treatment of adults increased egg laying and the JH titer, and it also upregulated the GluCl and kr-h1 genes. We have discussed this on lines 739–746.

      (8) I have strong reservations against such an irrational hypothesis that Met (the receptor for JH) and JH-Met target gene Kr-h1 regulate JH titer (Line 311, Fig 3 supplemental 2D). This would be the first report of such an event on the JH field and therefore must be analysed in depth. I strongly suggest the authors remove such claims from the manuscript without substantiating it.

      Thank you for your suggestions and comments. We have revised these claims in the manuscript. We found that EB treatment enhances Kr-h1 expression, but we have no evidence that JH induces Met expression. We have rewritten the relevant text to avoid confusion (see text on lines 725–735).

      (9) Kr-h1 is JH/Met target gene. The authors demonstrate that silencing of Kr-h1 results in inhibition of FAMeT, which is a gene involved in JH synthesis. A feedback loop in JH synthesis is unreported. It is the view of this reviewer that the authors must go ahead with a mechanistic detail of Kr-h1 mediated JH upregulation before this can be concluded. Mere qPCR experiments are not sufficient to substantiate a claim that is completely contrary to the current understanding of the JH signalling pathway.

      Thank you for your suggestions and comments. We agree that qPCR experiments alone are not sufficient to support this kind of claim and that more evidence would be needed. We have revised the manuscript to avoid confusion (see text on lines 725–735).

      (10) The authors have performed knockdowns of JHAMT, Met, and Kr-h1 to demonstrate the effect of these factors on fecundity in BPH. Additionally, they have performed rescue experiments with EB application on these knockdown insects (Figure 3K-M). This, I believe, is a very flawed experiment. The authors demonstrate EB works through JHAMT in upregulating JH titer. In the absence of JHAMT, EB application is not expected to rescue the phenotype. But the authors have reported a complete rescue here. In the absence of Met, the receptor of JH, either EB or JH is not expected to rescue the phenotype. But a complete rescue has been reported. These two experimental results contradict their own hypothesis.

      Thank you for your comments. We think this rescue is possible because dsRNA injection produces an incomplete knockdown rather than a knockout: these genes are still expressed at low levels in dsRNA-injected insects, and this residual expression allows EB to act. Since EB upregulates the expression of JHAMT, Met, and Kr-h1, EB treatment can plausibly counteract the knockdown of these three genes and fully restore fecundity. We have clarified this on lines 411–413.

      (11) A significant section of the paper deals with how EB upregulates JH titer. JH is a hormone synthesized in the Corpora Allata. Yet the authors have chosen to use the whole body for all of their experiment. Changes in the whole body for mRNA of those enzymes involved in JH synthesis may not reflect the situation in Corpora Allata. Although working with Corpora Allata is challenging, discarding the abdomen and thorax region and working with the head and neck region of the insect is easily doable. Results from such sampling are always more convincing when it comes to JH synthesis studies.

      Thank you for your suggestions. Unfortunately, the head is very difficult to separate from the thorax in brown planthoppers, as shown in Author response image 1. We are now trying to determine how EB regulates JH synthesis using Drosophila as a model.

      Author response image 1.

      The brown planthopper

      (12) The phenomenon reported was specific to BPH and not found in other insects. This limits the implications of the study.

      Thank you for your comments. The brown planthopper is a serious insect pest of rice in Asia, and our findings can guide the use of this insecticide in the field. In addition, our findings indicate that EB, which targets GluCl, can affect the JH titer. These findings provide new insight into how the nervous system may influence the JH signaling pathway. We will further investigate how EB influences JH, using Drosophila as a model to study the molecular mechanisms.

      (13) Overall, the molecular experiments are very poorly designed and can at best be termed superficial. There are several contradictions within the paper and no discussion or explanation has been provided for that.

      Thank you for your comments. We have revised the paper according to your suggestions and added further explanation of our results in the discussion parts and hope the conclusions are better supported in the new version. We have discussed this on lines 725-746 and 778-799.

      Reviewer #2 (Public Review):

      The brown plant hopper (BPH) is a notorious crop pest and pesticides are the most widespread means of controlling its population. This manuscript shows that in response to sublethal doses of the pesticide (EB), BPH females show enhanced fecundity. This is in keeping with field reports of population resurgence post-pesticide treatment. The authors work out the mechanism behind this increase in fecundity. They show that in response to EB exposure, the expression of its target receptor, GluCl, increases. This, they show, results in an increase in the expression of genes that regulate the synthesis of juvenile hormone (JH) and JH itself, which, in turn, results in enhanced egg-production and egg-laying. Interestingly, these effects of EB exposure are species-specific, as the authors report that other species of plant hoppers either don't show enhanced fecundity or show reduced fecundity. As the authors point out, it is unclear how an increase in GluCl levels could result in increased JH regulatory genes.

      We greatly appreciate your valuable comments and constructive suggestions on our work. In future work, we will use Drosophila as a model to determine how EB interacts with its molecular target GluCl and how this interaction increases the expression of JH regulatory genes.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Overall, the molecular experiments are very poorly designed and can at best be termed superficial. There are several contradictions within the paper and no discussion or explanation has been provided for that.

      The authors should consider a thorough revision.

      Thank you for your comments. We have thoroughly revised the paper according to your suggestions and added further experiments and explanations of our results in the discussion parts.

      Reviewer #2 (Recommendations For The Authors):

      It would help the reader to have more schematics along with the figures. The final figure is helpful, but knowing the JH pathway, and where it acts would help with the interpretations as one reads the manuscript and the figures. The pathways represented in 4N or 5J are helpful but could be improved upon for better presentation.

      It would be nice to have some discussion on how the authors think EB exposure results in an increase in GluCl expression, and how that in turn affects the expression of so many genes.

      Thank you for your comments. We have thoroughly revised the paper according to your suggestions and added further experiments, as well as explanations in the Discussion of how we think EB exposure increases the JH titer and the expression of other genes. We have added this text on lines 753–761.

      References

      Amezian, D., Fricaux, T., de Sousa, G., Maiwald, F., Huditz, H.-I., Nauen, R., Le Goff, G., 2023. Investigating the role of the ROS/CncC signaling pathway in the response to xenobiotics in Spodoptera frugiperda using Sf9 cells. Pesticide Biochemistry and Physiology 195, 105563.

      Fu, B., Liang, J., Hu, J., Du, T., Tan, Q., He, C., Wei, X., Gong, P., Yang, J., Liu, S., Huang, M., Gui, L., Liu, K., Zhou, X., Nauen, R., Bass, C., Yang, X., Zhang, Y., 2024. GPCR–MAPK signaling pathways underpin fitness trade-offs in whitefly. Proceedings of the National Academy of Sciences 121, e2402407121.

      Gandara, L., Jacoby, R., Laurent, F., Spatuzzi, M., Vlachopoulos, N., Borst, N.O., Ekmen, G., Potel, C.M., Garrido-Rodriguez, M., Böhmert, A.L., Misunou, N., Bartmanski, B.J., Li, X.C., Kutra, D., Hériché, J.-K., Tischer, C., Zimmermann-Kogadeeva, M., Ingham, V.A., Savitski, M.M., Masson, J.-B., Zimmermann, M., Crocker, J., 2024. Pervasive sublethal effects of agrochemicals on insects at environmentally relevant concentrations. Science 386, 446-453.

      Gong, Y., Cheng, S., Desneux, N., Gao, X., Xiu, X., Wang, F., Hou, M., 2022. Transgenerational hormesis effects of nitenpyram on fitness and insecticide tolerance/resistance of Nilaparvata lugens. Journal of Pest Science.

      Hu, B., Huang, H., Hu, S., Ren, M., Wei, Q., Tian, X., Esmail Abdalla Elzaki, M., Bass, C., Su, J., Reddy Palli, S., 2021. Changes in both trans- and cis-regulatory elements mediate insecticide resistance in a lepidopteron pest, Spodoptera exigua. PLOS Genetics 17, e1009403.

      Jiang, H., Meng, X., Zhang, N., Ge, H., Wei, J., Qian, K., Zheng, Y., Park, Y., Reddy Palli, S., Wang, J., 2023. The pleiotropic AMPK–CncC signaling pathway regulates the trade-off between detoxification and reproduction. Proceedings of the National Academy of Sciences 120, e2214038120.

      Ko, K.I., Root, C.M., Lindsay, S.A., Zaninovich, O.A., Shepherd, A.K., Wasserman, S.A., Kim, S.M., Wang, J.W., 2015. Starvation promotes concerted modulation of appetitive olfactory behavior via parallel neuromodulatory circuits. eLife 4, e08298.

      Li, Z., Wang, Y., Qin, Q., Chen, L., Dang, X., Ma, Z., Zhou, Z., 2023. Imidacloprid disrupts larval molting regulation and nutrient energy metabolism, causing developmental delay in honey bee Apis mellifera. eLife

      Martelli, F., Hernandes, N.H., Zuo, Z., Wang, J., Wong, C.-O., Karagas, N.E., Roessner, U., Rupasinghe, T., Robin, C., Venkatachalam, K., Perry, T., Batterham, P., Bellen, H.J., 2022. Low doses of the organic insecticide spinosad trigger lysosomal defects, elevated ROS, lipid dysregulation, and neurodegeneration in flies. eLife 11, e73812.

      Peng, Y.C., Sheng, C.W., Casida, J.E., Zhao, C.Q., Han, Z.J., 2017. Ryanodine receptor genes of the rice stem borer, Chilo suppressalis: Molecular cloning, alternative splicing and expression profiling. Pesticide Biochemistry and Physiology 135, 69-77.

      Root, C.M., Ko, K.I., Jafari, A., Wang, J.W., 2011. Presynaptic facilitation by neuropeptide signaling mediates odor-driven food search. Cell 145, 133-144.

      Zhang, Y., Wang, X., Yang, B., Hu, Y., Huang, L., Bass, C., Liu, Z., 2015. Reduction in mRNA and protein expression of a nicotinic acetylcholine receptor α8 subunit is associated with resistance to imidacloprid in the brown planthopper, Nilaparvata lugens. Journal of Neurochemistry 135, 686-694.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is an important study that leverages a human-chimpanzee tetraploid iPSC model to test whether cis-regulatory divergence between species tends to be cell type-specific. The evidence supporting the study's primary conclusion--that species differences in gene regulation are enriched in cell type-specific genes and regulatory elements--is compelling, although attention to biases introduced by sequence conservation is merited, and the case that is made for cell type-specific changes reflecting adaptive evolution is incomplete. This work will be of broad interest in evolutionary and functional genomics.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study aims to identify gene expression differences exclusively caused by cis-regulatory genetic changes by utilizing hybrid cell lines derived from human and chimpanzee. While previous attempts have focused on specific tissues, this study expands the comparison to six different tissues to investigate tissue specificity and derive insights into the evolution of gene expression.

      One notable strength of this work lies in the use of composite cell lines, enabling a comparison of gene expression between human and chimpanzee within the same nucleus and shared trans factors environment. However, a potential weakness of the methodology is the use of bulk RNA-seq in diverse tissues, which limits the ability to determine cell-type-specific gene expression and chromatin accessibility regions.

      We agree that profiling single cells could lead to additional exciting discoveries. Although heterogeneity in cell types within samples will indeed reduce our power to detect cell-type-specific divergence, thankfully any heterogeneity will not introduce false positives, since our use of interspecies hybrids controls for differences in cell-type abundance. As a result, we think that the molecular differences we identified in this study represent a subset of the true cell-type specific cis-regulatory differences that would be identified with deep single-cell profiling. We have included a new paragraph in the discussion on future directions, highlighting the utility of single-cell profiling as an exciting future direction (lines 482-490): “In addition to following up on our findings on GAD1 and FABP7, there are other exciting future directions for this work. First, additional bulk assays such as those that measure methylation, chromatin conformation, and translation rate could lead to a better understanding of what molecular features ultimately lead to cell type-specific changes in gene expression. Furthermore, the use of deep single cell profiling of hybrid lines derived from iPSCs from multiple individuals of each species during differentiation could enable the identification of many more highly context-specific changes in gene expression and chromatin accessibility such as the differences in GAD1 we highlighted here. Finally, integration with data from massively parallel reporter assays and deep learning models will help us link specific variants to the molecular differences we identified in this study.”

      Another concern is the use of two replicates derived from the same pair of individuals. While the authors produced cell lines from two pairs of individuals in a previous study (Agoglia et al., 2021), I wonder why only one pair was used in this study. Incorporating interindividual variation would enhance the robustness of the species differences identified here.

      We agree that additional replicates, especially from lines from other individuals, would have improved the robustness of the species differences we identified. In our experience with these hybrid cells (as well as related work from many other labs), inter-species differences typically have much larger magnitudes than intra-species differences, so we expect that the vast majority of differences we identified would be validated with data from additional individuals. Unfortunately, differentiating additional cells and generating these data for this study would be cost-prohibitive. We now mention the use of additional replicates in lines 485-488 of the discussion: “Furthermore, the use of deep single cell profiling of hybrid lines derived from iPSCs from multiple individuals of each species during differentiation could enable the identification of many more highly context-specific changes in gene expression and chromatin accessibility such as the differences in GAD1 we highlighted here.”

      Furthermore, the study offers the opportunity to relate inter-species differences to trends in molecular evolution. The authors discovered that expression variance and haploinsufficiency score do not fully account for the enrichment of divergence in cell-type-specific genes. The reviewer suggests exploring this further by incorporating external datasets that bin genes based on interindividual transcriptomics variation as a measure of extant transcriptomics constraint (e.g., GTEx reanalysis by Garcia-Perez et al., 2023 - PMID: 36777183). Additionally, stratifying sequence conservation on ASCA regions, which exhibit similar enrichment of cell-type-specific features, using the Zoonomia data mentioned also in the text (Andrews et al., 2023 -- PMID: 37104580) could provide valuable insights.

      To address this, we used PhastCons scores computed from a 470-way alignment of mammals, as we could not find publicly available PhastCons data from Zoonomia. When stratifying by the median PhastCons score of all sites in a peak, we observe results very similar to those obtained when stratifying by the constraint metric from the gnomAD consortium (see below). The one potential difference is that peaks in the top two bins show slightly weaker enrichment relative to the other bins when using PhastCons, which is not the case when using gnomAD’s metric. We have elected to include this in the response to the public review but not in the manuscript, as we are reluctant to add to the complexity of what is already a complex analysis.

      Author response image 1.
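      The stratification described above (summarizing each peak by the median PhastCons score of its bases, splitting peaks into conservation bins, and comparing enrichment across bins) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the pipeline used for the analysis: the in-memory score layout, the equal-sized rank bins, and the enrichment statistic (the proportion of cell type-specific peaks among divergent, e.g. ASCA, peaks divided by the same proportion among non-divergent peaks) are all hypothetical choices.

```python
import statistics
from typing import Dict, List, Sequence, Tuple

# Hypothetical peak representation: (chromosome, start, end), 0-based, end-exclusive.
Peak = Tuple[str, int, int]


def median_phastcons(peaks: Sequence[Peak], scores: Dict[str, List[float]]) -> List[float]:
    """Median per-base PhastCons score within each peak.

    Assumes (hypothetically) that scores[chrom] is a list of per-base scores
    indexed by 0-based genomic position.
    """
    return [statistics.median(scores[chrom][start:end]) for chrom, start, end in peaks]


def rank_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Assign each value to one of n_bins roughly equal-sized bins by rank (0 = least conserved)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def enrichment_by_bin(bins: Sequence[int], is_divergent: Sequence[bool],
                      is_specific: Sequence[bool], n_bins: int) -> List[float]:
    """Per bin: P(cell type-specific | divergent) / P(cell type-specific | not divergent).

    Returns NaN for any bin where the ratio is undefined.
    """
    result = []
    for b in range(n_bins):
        div = [s for bb, d, s in zip(bins, is_divergent, is_specific) if bb == b and d]
        non = [s for bb, d, s in zip(bins, is_divergent, is_specific) if bb == b and not d]
        if not div or not non or sum(non) == 0:
            result.append(float("nan"))
        else:
            result.append((sum(div) / len(div)) / (sum(non) / len(non)))
    return result


if __name__ == "__main__":
    # Toy data only: two weakly and two strongly conserved peaks.
    scores = {"chr1": [0.1] * 10 + [0.9] * 10}
    peaks = [("chr1", 0, 5), ("chr1", 5, 10), ("chr1", 10, 15), ("chr1", 15, 20)]
    med = median_phastcons(peaks, scores)
    print(med, rank_bins(med, 2))
```

      Rank-based bins keep bin sizes equal even when many peaks share the same median score. A real analysis would also need a per-bin significance test, which this sketch omits.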

      Finally, we think that comparing the properties of gene expression variance computed from ASE (as done by Starr et al.) with those computed from total expression (as done by Garcia-Perez et al.) is a very interesting and potentially complex question that is beyond the scope of this paper but an exciting direction for future work.

      Another potential strength of this study is the identification of specific cases of paired allele-specific expression (ASE) and allele-specific chromatin accessibility (ASCA) with biological significance.

      Prioritizing specific variants remains a challenge, and the authors apply a machine-learning approach to identify potential causative variants that disrupt binding sites in two examples (FABP7 and GAD1 in motor neurons). However, additional work is needed to convincingly demonstrate the functionality of these selected variants. Strengthening this section with additional validation of ASE, ASCA, and the specific putative causal variants identified would enhance the overall robustness of the paper.

      We strongly agree with the reviewer that additional work validating our results would be of considerable interest. We hope to perform follow-up experiments in the future. For now, we have been careful to present these variants only as candidate causal variants.

      Additionally, the authors support the selected ASE-ASCA pairs by examining external datasets of adult brain comparative genomics (Ma et al., 2022) and organoids (Kanton et al., 2019). While these resources are valuable for comparing observed species biases, the analysis is not systematic, even for the two selected genes. For example, it would be beneficial to investigate if FABP7 exhibits species bias in any cell type in Kanton et al.'s organoids or if GAD1 is species-biased in adult primate brains from Ma et al. Comparing these datasets with the present study, along with the Agoglia et al. reference, would provide a more comprehensive perspective.

      We agree with the reviewer’s suggestion that investigating GAD1 and FABP7 expression in other datasets is worthwhile. Unfortunately, the differences in human vs. chimpanzee organoid maturation rates and the effects of culture conditions in Kanton et al. make that dataset unsuitable for plotting FABP7 expression, as FABP7 expression is highly dependent on neuronal maturation. We therefore plotted bulk RNA-seq data from multiple cortical regions from Sousa et al. 2017 (see below). These data corroborate our claim that FABP7 has human-biased expression in adult humans compared to chimpanzees and rhesus macaques. We also investigated GAD1 expression in the Ma et al. data, as the reviewer suggested.

      Author response image 2.

      While there are differences in GAD1 expression between adult humans and chimpanzees, they are unlikely to be linked to the HAR we highlight as it is likely a transiently active cis-regulatory element (see below). In addition, some cell types seem to have chimpanzee-derived changes in GAD1 expression (e.g. SST positive neurons) whereas others seem to have human-derived changes in GAD1 expression (e.g. LAMP5 positive neurons).

      Author response image 3.

      While these are potentially interesting observations, we think that including them in the manuscript might distract from our emphasis on the cell type-specific and developmental stage-specific nature of the changes in FABP7 and GAD1 expression we observe, so we have not included them.

      The use of the term "human-derived" in ASE and ASCA should be avoided since there is no outgroup in the analysis to provide a reference for the observed changes.

      We agree with the reviewer that the term “human-derived” should be used with care and have changed the phrasing of line 230 to “human-chimpanzee differences in expression”. With regard to FABP7, we think that our analysis of the Ma et al. data, which includes rhesus macaques as an outgroup, justifies our use of “human-derived” in lines 360 and 457. Because chimpanzee and macaque expression of FABP7 are similar but human expression is quite different, the most parsimonious explanation for our observations is that FABP7 upregulation occurred in the human lineage.
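      To make the parsimony argument concrete, here is a minimal Python sketch of outgroup polarization. It is an illustration under stated assumptions, not the analysis in the manuscript: inputs are assumed to be log2 expression values for human, chimpanzee, and an outgroup such as rhesus macaque, and the `tol` threshold for calling two species "different" is a hypothetical parameter.

```python
def polarize_expression_change(human, chimp, outgroup, tol=0.5):
    """Assign an expression difference to a lineage by parsimony with an outgroup.

    Inputs are log2 expression values (assumed); tol is a hypothetical
    threshold above which two species are treated as different.
    """
    human_vs_chimp = abs(human - chimp) > tol
    human_vs_out = abs(human - outgroup) > tol
    chimp_vs_out = abs(chimp - outgroup) > tol
    if human_vs_chimp and human_vs_out and not chimp_vs_out:
        # Chimpanzee and outgroup agree: one change on the human branch suffices.
        return "human-derived"
    if human_vs_chimp and chimp_vs_out and not human_vs_out:
        # Human and outgroup agree: one change on the chimpanzee branch suffices.
        return "chimpanzee-derived"
    return "unresolved"


print(polarize_expression_change(5.0, 3.0, 3.1))  # human-derived
```

      When chimpanzee and outgroup agree, a single change on the human branch explains the data, which is the reasoning applied to FABP7 above; any other pattern is left unresolved rather than forced into a lineage.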

      Finally, throughout the paper, the authors refer to "hybrid cell lines." It has been suggested to use the term "composite cell lines" instead to address potential societal concerns associated with the term "hybrid," which some may associate with reproductive relationships (Pavlovic et al., 2022 -- PMID: 35082442). It would be interesting to know the authors' perspective on these concerns and recommendations presented in Pavlovic et al., given their position as pioneers in this field.

      We appreciate this question. Whether to refer to our fused cells as “hybrids” was indeed a question we considered at great length, starting at the very beginning of this project in 2015. Through consultations with multiple bioethicists, both formal and informal, we have long been aware of the possibility of misunderstanding based on the word “hybrid”. However, we felt this possibility was outweighed by the long and well-established history of other scientists referring to interspecies fused cells as hybrids. This convention, which is based on hundreds of papers about heterokaryons, somatic cell hybrids, and radiation hybrids, goes back over 50 years (e.g. Bolund et al., Exp Cell Res 1969). Soon after this nomenclature was established, cell fusion became widespread, and it has since become commonplace to generate interspecies hybrid cells from animals, plants, and fungi.

      It is also important to note that in the more than two years since we published the first two papers on human-chimpanzee fused cells, we have been unable to find any misunderstanding of our use of the term “hybrid”. We have searched blogs, media articles, and social media and found no evidence of misunderstanding. Therefore, in the current manuscript, rather than create confusion by renaming a well-established approach, we have opted to define hybrid cells clearly and prominently: in the abstract of our paper we introduce the hybrid cells as “the product of fusing induced pluripotent stem (iPS) cells of each species in vitro.”

      Reviewer #2 (Public Review):

      In this paper, Wang and colleagues build on previous technical and analytical achievements in establishing tetraploid human-chimpanzee hybrid iPSCs to investigate the cell type-specificity of allele-specific expression and allele-specific chromatin accessibility across six differentiated cell types (here, "allele-specific" indicates species differences with a cis-regulatory basis). The combined body of work is remarkable in its creativity and ambition and has real potential for overcoming major challenges in understanding the evolutionary genetics of between-species differences. The present paper contributes to these efforts by showing how differentiated cells can be used to test a long-standing hypothesis in evolutionary genetics: that cis-regulatory changes may be particularly important in divergence because of their potential for modularity.

      In my view, the paper succeeds in making this case: allele (species)-specific expression (ASE) and allele-specific chromatin accessibility (ASCA) are enriched in genes asymmetrically expressed in one cell type, and many cases of ASE/ASCA are cell type-specific. The authors do an excellent job showing that these results are robust across a set of possible analysis decisions. It is somewhat less clear whether these enrichments are primarily a product of relaxed constraint on cell type-specific genes or primarily result from positive selection in the human or chimp lineage. While the authors attempt to control for constraint using several variables (variance in ASE in humans and the sequence-based probability of haploinsufficiency score, pHI), these are imperfect proxies for constraint. For the pHI scores, enrichments for ASE also appear to be strongest in the least constrained genes. Overall, the relative role of relaxation of constraint versus positive selection is unresolved, although the manuscript's language leans in favor of an important role for selection.

      We agree with the reviewer and apologize for the wording that indeed focused more on positive selection than relaxed constraint. We have added language clarifying that our stance is that our analyses suggest some role for positive selection, but that we do not claim that positive selection plays a larger role than reduced constraint (lines 432-437): “Overall, this suggests that broad changes in expression in cell type-specifically expressed genes may be an important substrate for evolution but it remains unclear whether positive selection or lower constraint plays a larger role in driving the faster evolution of more cell type-specifically expressed genes. Future work will be required to more precisely quantify the relative roles of positive selection and evolutionary constraint in driving changes in gene expression.”

      The remainder of the manuscript draws on the cell type-specific ASE/ASCA data to nominate candidate genes and pathways that may have been important in differentiating humans and chimpanzees. Several approaches are used here, including comparing human-chimp ASE to the distribution of ASE observed in humans and investigating biases in the direction of ASE for genes in the same pathway. The authors also identify interesting candidate genes based on their role in development or their proximity to human accelerated regions (where many changes have arisen on the human lineage in otherwise deeply conserved sequence) and use a deep neural network to identify sequence changes that might be causally responsible for ASE/ASCA. These analyses have value and highlight potential strategies for using ASE/ASCA and hybrid cell line data as a hypothesis-generating tool. Of course, functional follow-up experiments that tested these hypotheses or linked sequence/expression changes in the candidate pathways to organismal phenotype would have strengthened the paper further, but this is a lot to ask in an already technically and analytically challenging piece of work.

      We thank the reviewer for the kind words and strongly agree that follow-up experiments and orthogonal analyses will be key in validating our results and establishing links to human-specific phenotypes.

      As a minor critique, the present paper is very closely integrated with other manuscripts that have used the hybrid human-chimp cell lines for biological insight or methods development. Although its contributions make it a strong stand-alone contribution, some aspects of the methods are not described in sufficient detail for readers to understand (even on a general conceptual level) without referencing that work, which may somewhat limit reader understanding.

      We agree with the points the reviewer raises regarding the clarity of our methods. We have amended several sections to provide more conceptual information while pointing the reader to other publications for the technical details. For convenience, we include the text here as well as in the new draft.

      Lines 207-214 now provide more intuition for the method used to detect lineage-specific selection: “Next, we sought to use our RNA-seq data to identify instances of lineage-specific selection. In the absence of positive selection, one would expect that an approximately equal number of genes in a pathway would have human-biased vs. chimpanzee-biased ASE. Significant deviation from this expectation (as determined by the binomial test) rejects the null hypothesis of neutral evolution, instead providing evidence of lineage-specific selection on this pathway. Using our previously published modification of this test that incorporates a tissue-specific measure of constraint on gene expression, we detected several signals of lineage-specific selection, some of which were cell type-specific (Starr et al., 2023, Additional file 2).” This is also reflected in the Methods in lines 729-731: “Positive selection on a gene set is only inferred if there is statistically significant human- or chimpanzee-biased ASE in that gene set (using an FDR-corrected p-value from the binomial test).”
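      The logic of this test can be sketched in Python as follows. This is a simplified, hypothetical illustration: it uses a plain two-sided binomial (sign) test with p = 0.5 and Benjamini-Hochberg FDR correction, and it omits the tissue-specific constraint modification described in Starr et al. (2023). The function names and the pathway counts in the usage example are invented for illustration.

```python
from math import comb


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k successes out of n trials.

    Sums the probabilities of all outcomes no more likely than the observed
    one; a small relative tolerance absorbs floating-point ties.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def lineage_bias_test(pathways, alpha=0.05):
    """pathways maps a gene-set name to (n_human_biased, n_chimp_biased) ASE counts.

    Under neutrality each gene with ASE is equally likely to be human- or
    chimpanzee-biased, so a significant skew suggests lineage-specific selection.
    Returns name -> (FDR, biased lineage or None if not significant).
    """
    names = list(pathways)
    pvals = [binom_two_sided_p(h, h + c) for h, c in (pathways[n] for n in names)]
    results = {}
    for name, q in zip(names, benjamini_hochberg(pvals)):
        h, c = pathways[name]
        direction = "human" if h > c else "chimpanzee" if c > h else None
        results[name] = (q, direction if q < alpha else None)
    return results


counts = {"pathway_A": (18, 2), "pathway_B": (6, 5)}  # hypothetical counts
print(lineage_bias_test(counts))
```

      In this toy example, the 18 vs. 2 split in pathway_A is flagged as human-biased after FDR correction, while the near-even 6 vs. 5 split in pathway_B is consistent with the neutral expectation.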

      Reviewer #3 (Public Review):

      The authors utilize chimpanzee-human hybrid cell lines to assess cis-regulatory evolution. These hybrid cell lines offer a well-controlled environment, enabling clear differentiation between cis-regulatory effects and environmental or other trans effects.

      In their research, Wang et al. expand the range of chimpanzee-human hybrid cell lines to encompass six new developmental cell types derived from all three germ layers. This expansion allows them to discern cell type-specific cis-regulatory changes between species from more pleiotropic ones. Although the study investigates only two iPSC clones, the RNA- and ATAC-seq data produced for this paper is a valuable resource.

      The authors begin their analysis by examining the relationship between allele-specific expression (ASE) as a measure of species divergence and cell type specificity. They find that cell-type-specific genes exhibit more divergent expression. By integrating this data with measures of constraint within human populations, the authors conclude that the increased divergence of tissue-specific genes is, at least in part, attributable to positive selection. A similar pattern emerges when assessing allele-specific chromatin accessibility (ASCA) as a measure of divergence of cis-regulatory elements (CREs) in the same cell lines.

      By correlating these two measures, the authors identify 95 CRE-gene pairs where tissue-specific ASE aligns with tissue-specific ASCA. Among these pairs, the authors select two genes of interest for further investigation. Notably, the authors employ an intriguing machine-learning approach in which they compare the inferred chromatin state of the human sequence with that of the chimpanzee sequence to pinpoint putatively causal variants.

      Overall, this study delves into the examination of gene expression and chromatin accessibility within hybrid cell lines, showcasing how this data can be leveraged to identify potential causal sequence differences underlying between-species expression changes.

      We appreciate this assessment.

      I have three major concerns regarding this study:

      1. The only evidence that the cells are indeed differentiated in the right direction is the expression of one prominent marker gene per cell type. Especially for the comparison of conservation between the differentiated cell types, it would be beneficial to describe the cell type diversity and the differentiation success in more detail.

      We appreciate this assessment. We agree that evidence beyond a single marker gene is necessary to demonstrate that the differentiations were successful and that a discussion of the limitations of these differentiations in the manuscript is worthwhile. We included figures showing additional marker genes and a thorough discussion of the differentiations in the supplement. For convenience, we have copied the supplemental figure and text here:

      “Before continuing with the analysis, we tested whether the differentiations were successful and contained primarily our target cell types. The very low expression of NANOG, a marker for pluripotency, across all differentiations indicates that the samples contain very few iPSCs (Agoglia et al., 2021). For cardiomyocytes (CM), NKX2-5, MYBPC3, and TNNT2 definitively distinguish CM from other heart cell types and their high expression indicates successful differentiations (Burridge et al., 2014). For motor neurons, the high expression of ELAVL2, a pan-neuronal marker, indicates a high abundance of neurons in the sample (Mickelsen et al., 2019). The expression of ISL1 and OLIG2 further demonstrates that these are motor neurons and not other types of neurons (Maury et al., 2015). For retinal pigment epithelium (RPE), the combined expression of MITF, PAX6, and TYRP1 provides strong evidence that the differentiations were successful in producing RPE cells (Sharma et al., 2019). For skeletal muscle, the very high expression of MYL1, MYLPF, and MYOG indicates that these samples contain a high proportion of skeletal muscle cells (Chal et al., 2016). In general, all these populations of cells contain some proportion of progenitors as there is detectable expression of MKI67 in all samples.

      The low expression of ALB (a marker for mature hepatocytes) and the high expression of TTR and GPC3 (markers for hepatocyte progenitors) combined with the high expression of HNF1B indicate that the bulk of the cells in the HP samples are hepatocyte progenitors rather than mature hepatocytes or endoderm cells, although there are likely some endoderm cells and immature hepatocytes in the sample (Hay et al., 2008; Mallanna & Duncan, 2013). Similarly, the combined expression of PDX1 and NKX6-1 and the low expression of NEUROG3 (a marker of endocrine progenitors which differentiate from pancreatic progenitors) in the PP samples indicates that these primarily contain pancreatic progenitors but likely contain some endocrine progenitors and endoderm cells (Cogger et al., 2017; Korytnikov & Nostro, 2016).

      Notably, HP and PP are closely related cell types that are derived from the same lineage. Indeed, heterogeneous multipotent progenitors can contribute to both the adult liver and adult pancreas in mice (Willnow et al., 2021). Progenitors that express PDX1 (often used as a marker for the pancreatic lineage) can differentiate into hepatocytes (Willnow et al., 2021). As a result, some overlap in the transcriptomic signature of both cell types is expected and we cannot rule out that the HP samples contain cells that could differentiate into pancreatic cells or that the PP samples contain cells that could differentiate into hepatocytes. However, the expression of NKX6-1 and GP2, markers for pancreatic progenitors, in the PP samples but not the HP samples indicates that these two populations of cells are distinct. Overall, the similarity of PP and HP likely explains the lower number of cell type-specific genes and genes showing cell type-specific ASE for these cell types. This similarity does not alter the conclusions presented in the main text.”

      Author response image 4.

      Author response image 5.

      Marker gene expression in different cell types. In order, the panels show: a marker for pluripotency, a marker gene for dividing cells, marker genes for cardiomyocytes, marker genes for hepatocytes and hepatocyte progenitors, marker genes for motor neurons, marker genes for pancreatic progenitors and more mature pancreatic cell types, marker genes for retinal pigment epithelial cells, and marker genes for skeletal myocytes. Hepatocyte progenitors and pancreatic progenitors generally show similar gene expression profiles. TPM: transcript per million.

      2. Check for a potential confounding effect of sequence similarity on the power to detect ASE or ASCA.

      We agree that checking for confounding by power to detect ASE or ASCA would increase confidence in our results. We have added supplementary figures 29-33 to show the results as well as a discussion of these figures in the text (lines 318-326):

      “Finally, it is possible that CREs and genes that are less conserved will have more SNPs, and therefore more power to call ASCA and ASE, leading to systematically biased estimates. There is a weak positive correlation between the number of SNPs and the -log10(FDR) for ASE and a weak negative or no correlation for ASCA (Supp Fig. 29). Similarly, we observe a weak relationship between the number of SNPs in CREs or genes and absolute log fold-change estimates (Supp Fig. 30). Although the relationship between the number of SNPs and ASE/ASCA is weak, we confirmed that cell type-specific genes and peaks are still strongly enriched for ASE and ASCA when stratifying by number of SNPs (Supp Fig. 31-32). Overall, our analysis suggests that the result that more cell type-specific genes and CREs are more evolutionarily diverged is robust to a variety of possible confounders.”

      Author response image 6.

      Relationship between the number of SNPs and a) -log10(FDR) for ASE and b) -log10(p-value) for ASCA. These scatter plots show the relationship between the number of SNPs in a gene or peak and the -log10(FDR) for ASE or the -log10(p-value) for ASCA. Genes with significant ASE (FDR < 0.05) and peaks with significant ASCA (binomial p-value < 0.05) were annotated as blue dots, and all other genes and peaks were annotated as gray dots. All genes in each cell type in RNA-seq are shown. For clarity, the few outlier peaks with more than 200 SNPs are excluded from these plots.

      Author response image 7.

      Relationship between number of SNPs and absolute log2 fold-change in a) ASE and b) ASCA. These scatter plots show the relationship between the number of SNPs in a gene or peak and the estimated absolute log2 fold-change for ASE or ASCA. Genes with significant ASE (FDR < 0.05) and peaks with significant ASCA (binomial p-value < 0.05) were annotated as blue dots, and all other genes and peaks were annotated as gray dots. All genes in each cell type in RNA-seq are shown. For clarity, the few outlier peaks with more than 200 SNPs are excluded from these plots.

      Author response image 8.

      Cell type-specifically expressed genes are enriched for genes with ASE when stratifying by the number of SNPs per gene. a) Results when SKM is included. Genes were put into five bins with an equal number of genes in each bin. Genes with the fewest SNPs are in the 0-20% bin and genes with the most SNPs are in the 80-100% bin. Significance (using the Wald test) is indicated by asterisks where *** indicates p < 0.005, ** indicates p < 0.01, and * indicates p < 0.05. b) The same as in (a) but excluding SKM.

      Author response image 9.

      Cell type-specific peaks are enriched for ASCA when stratifying by the number of SNPs per peak. a) Peaks with an absolute log2 fold-change greater than or equal to 0.5 were called as having ASCA. Peaks were put into five bins with an equal number of peaks in each bin. Peaks with the fewest SNPs are in the 0-20% bin and peaks with the most SNPs are in the 80-100% bin. Significance (using the Wald test) is indicated by asterisks where *** indicates p < 0.005, ** indicates p < 0.01, and * indicates p < 0.05. b) The same as in (a) but peaks with a binomial p-value less than or equal to 0.05 were called as having ASCA.
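      A minimal sketch of the stratification used in these figures, assuming genes (or peaks) are ranked by SNP count and split into five equal-size bins. The captions report a Wald test but do not specify the model here, so this sketch substitutes a Wald test on the log odds ratio of a per-bin 2x2 table (cell type-specific vs. not, ASE vs. not) with a 0.5 pseudocount; all function names are hypothetical.

```python
import math


def equal_size_bins(values, n_bins=5):
    """Assign each item to one of n_bins bins of (near-)equal size by rank.

    Ties are broken by input order, so items with identical SNP counts may
    land in adjacent bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def wald_log_odds(a, b, c, d, pseudo=0.5):
    """Wald test on the log odds ratio of a 2x2 table.

    a: specific with ASE, b: specific without ASE,
    c: non-specific with ASE, d: non-specific without ASE.
    The pseudocount (Haldane-Anscombe correction) avoids division by zero.
    Returns (odds_ratio, two-sided p-value).
    """
    a, b, c, d = (x + pseudo for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    p = math.erfc(abs(log_or / se) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return math.exp(log_or), p


def stratified_enrichment(n_snps, is_specific, has_ase, n_bins=5):
    """Per-bin (odds_ratio, p) for ASE enrichment among cell type-specific items."""
    bins = equal_size_bins(n_snps, n_bins)
    results = []
    for k in range(n_bins):
        idx = [i for i, bk in enumerate(bins) if bk == k]
        a = sum(is_specific[i] and has_ase[i] for i in idx)
        b = sum(is_specific[i] and not has_ase[i] for i in idx)
        c = sum((not is_specific[i]) and has_ase[i] for i in idx)
        d = sum((not is_specific[i]) and not has_ase[i] for i in idx)
        results.append(wald_log_odds(a, b, c, d))
    return results
```

      Testing within each SNP bin means that a gene-level difference in power (more SNPs, more reads to call ASE) cannot by itself produce the enrichment, because specific and non-specific genes are compared at similar SNP counts.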

      3. In the last part the authors showcase two examples in which log2 fold changes in chromatin state scores, as inferred by the machine learning model Sei, are used. This is an interesting and creative approach; however, more sanity checks on this application are necessary.

      We agree with the reviewer about the importance of sanity checks and apologize for omitting these from the manuscript. Below we highlight several such checks from previous publications:

      In the original Sei paper (Chen et al. 2022), the authors included several tests of their model’s ability to predict the effects of individual genetic variants. Using eQTL data from GTEx, they found that variants predicted to increase enhancer activity were more likely to be up-regulating eQTLs, and those predicted to increase polycomb repression had the expected repressive effect. These relationships became stronger when restricting the analysis only to fine-mapped eQTLs with >95% posterior probabilities of causality. Chen et al. also found that previously known disease-causing noncoding variants from the Human Gene Mutation Database were far more likely to reduce predicted enhancer/promoter activity than matched variants not linked to any disease.

      In addition, we note that a similar approach to ours was recently used to analyze all HARs and included considerable efforts to validate the utility of the Sei predictions in identifying causal variants (Whalen et al. 2023 in Neuron). For example, Whalen et al. found that the Sei output correlated with the effects of genetic variants on expression in a massively parallel reporter assay. They also found that the effect sizes predicted by Sei were much higher for variants in HARs than polymorphic variants in the human population, which is consistent with the idea that variants in HARs lie in highly conserved bases that are more likely to disrupt cis-regulatory elements. Finally, Whalen et al. found that effects on chromatin state predicted by Sei were generally highly correlated across tissues, supporting our approach that leverages all Sei outputs regardless of which cell type or tissue they correspond to. Overall, we think that Sei is a potentially powerful way to prioritize causal variants and that improved machine learning models trained on more extensive and context-specific data will be even more powerful.
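      For readers unfamiliar with this type of analysis, the human-versus-chimpanzee comparison can be sketched as below. The sketch assumes that Sei (or a similar model) has already been run on the human and chimpanzee versions of each sequence and that its predicted scores are available as dictionaries keyed by output name. It does not call Sei, and the pseudocount `eps`, the output names, and the ranking by maximum absolute log2 fold change across all outputs are illustrative assumptions rather than the exact procedure in the manuscript.

```python
import math


def log2_fold_changes(human_scores, chimp_scores, eps=1e-6):
    """Per-output log2(human / chimp) of predicted scores.

    Both arguments map a (hypothetical) output name to a non-negative score;
    eps is an assumed pseudocount that avoids log(0) and division by zero.
    """
    return {
        name: math.log2((human_scores[name] + eps) / (chimp_scores[name] + eps))
        for name in human_scores
    }


def prioritize_variants(predictions, eps=1e-6):
    """Rank variants by their largest absolute log2 fold change across all outputs.

    predictions maps a variant id to (human_scores, chimp_scores).
    """
    def strength(variant_id):
        human, chimp = predictions[variant_id]
        return max(abs(v) for v in log2_fold_changes(human, chimp, eps).values())

    return sorted(predictions, key=strength, reverse=True)


predictions = {  # hypothetical scores for two variants
    "variant_1": ({"out_a": 1.0, "out_b": 1.0}, {"out_a": 1.0, "out_b": 1.1}),
    "variant_2": ({"out_a": 4.0, "out_b": 1.0}, {"out_a": 1.0, "out_b": 1.0}),
}
print(prioritize_variants(predictions))  # ['variant_2', 'variant_1']
```

      Taking the maximum over all outputs, rather than a single cell type-matched output, mirrors the observation above that predicted chromatin effects tend to be correlated across tissues.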

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their comments and provide answers, clarifications, and new data. Three important points recurred across the reviews, and we address them here first:

      (a) The reviewers questioned whether the observed motor defects (measured by startle-induced negative geotaxis, “SING”) were a reasonable behavioral measure of DAN function.

      Previously, Riemensperger et al., 2013 (PMID: 24239353) already linked synaptic loss of the dopaminergic PAM neurons to SING impairments. Furthermore, in a separate paper that we recently posted on BioRxiv, we show that the SING defects in PD mutants are rescued when the flies are fed L-DOPA (Kaempf et al 2024; BioRxiv). In this same paper we also show a very strong correlation between SING defects and defects in dopaminergic synaptic innervation of PAM DAN onto Mushroom body neurons. Both experiments suggest that the motor defects are the result of defects in dopamine release. Altogether, these data suggest that the combination of the SING assay and a quantification of the synaptic region of PAM DAN onto Mushroom body neurons is a suitable measure for DAN function.

      (b) The reviewers asked whether the OPN dysfunction in young animals is connected to dopaminergic neuron (DAN) dysfunction later in life.

      We have conducted additional experiments and included the results (new Figure 6). Our young PD mutants (we included Aux<sup>R927G</sup>, Synj<sup>R258Q</sup> and LRRK2<sup>G2019S</sup>) show olfactory defects but normal DAN function (measured by assessing the TH-labeled synaptic area onto the Mushroom body neurons and by SING). Aged PD mutants show both olfactory defects and DAN dysfunction. When we express the wildtype PD gene in OPN (among other cells) of PD mutants using GH146-Gal4 (which does not drive expression in DAN), we are able to rescue the DAN defects (synaptic area and SING) that occur later in life. This suggests a cell non-autonomous protective effect against the DAN dysfunction that occurs at later stages in the life of our PD mutants (new Figure 6a).

      In a set of independent experiments, we also fed one of our mutants (LRRK2<sup>G2019S</sup>) nicotine, which activates nicotinic acetylcholine receptors (also activated by acetylcholine released from cholinergic neurons such as OPN). While nicotine does not rescue the olfactory preference defect, the OPN synapse morphology defect, or the OPN-associated defects in Ca<sup>2+</sup>-imaging in LRRK2<sup>G2019S</sup> mutants (Figure 6b), it does rescue the DAN-associated defects, including SING, synapse loss and defects in Ca<sup>2+</sup>-imaging (Figure 6c).

      Finally, we generated human induced dopaminergic neurons from iPSCs carrying a LRRK2<sup>G2019S</sup> mutation and incubated these neurons with nicotine. Again, this rescued a LRRK2-mutant-induced defect in neuronal activity measured by Ca<sup>2+</sup>-imaging. The rescue is specific to nicotine: it was absent when cells were also incubated with mecamylamine, a non-competitive antagonist of nicotinic acetylcholine receptors that blocks the effects of nicotine (Figure 6d-e").

      (c) The reviewers noted that the GH146-Gal4 driver is expressed in cells other than OPN and thus that the defects we observe may not result solely from OPN dysfunction.

      It is correct that GH146-dependent Gal4 expression includes OPNs (which are cholinergic) and one pair of inhibitory APL neurons (which are GABAergic) (Li et al., 2017 (PMID: 29149607), Lui et al., 2009 (PMID: 19043409)). We have adapted the text to state this explicitly. There are only two APL neurons per fly brain, and our single-cell sequencing experiment does not have the resolution to test whether these neurons had a significant number of DEGs. However, as indicated above (in (b)), we are able to rescue DAN dysfunction by mimicking cholinergic output (application of nicotine). These data do not exclude that APL-neuron problems contribute to the defects we observe in our PD mutants, but they do suggest that cholinergic output is critical to maintain normal DAN function.

      Public Reviews:  

      Reviewer #1 (Public Review):  

      This is a fantastic, comprehensive, timely, and landmark pan-species work that demonstrates the convergence of multiple familial PD mutations onto a synaptic program. It is extremely well written and I have only a few comments that do not require additional data collection. 

      Thank you for this enthusiastic endorsement.

      Major Comments:  

      (1) In the functional experiments performing calcium imaging on projection neurons, I could not find a count of cell bodies across conditions. Since the loss of OPNs could explain the reduced calcium signal, this is a critical control to perform. A differential abundance test on the single-cell data would also suffice here and be easy for the authors to perform with their existing data.

      This is indeed an important number, and we had included it in Supplemental Figure 2a.

      Also, the numbers of DAN and visual projection neurons were not significantly different between the genotypes (Supplemental Figure 2a in the manuscript).

      (2) One of the authors' conclusions is that cholinergic neurons and the olfactory system are acutely impacted by these PD mutations. However, I wonder if this is the case:

      a. Most Drosophila excitatory neurons are cholinergic, and only a subpopulation appear to be dysregulated by these mutations. The authors point out that visual neurons also have many DEGs; couldn't the visual system also be dysregulated in these flies? Is there something special about these cholinergic neurons versus other cholinergic neurons in the fly brain? I wonder if they can leverage their nice dataset to say something about vulnerability.

      Yes, the reviewer is right, and we have changed our wording to be more specific. The reviewer also noted correctly that neurons in the visual system rank high in terms of number of DEGs, but we did not conduct elaborate experiments to assess if these visual system neurons are functional. Of note, several of our mutants show (subtle) electroretinogram defects, that are a measure of visual system integrity, but further work is needed to determine the origin of these defects. 

      The question about the nature of the underlying vulnerability pathways is interesting. In preliminary work we have selected a number of DEGs common to vulnerable cells in several PD mutants, and conducted a screen where we manipulated the expression of these DEGs and looked for rescue of the olfactory preference defects in our PD mutants. The strongest genetic interaction was with genes encoding proteins involved in proteostasis (Atg8/LC3, Lamp1 and Hsc70-4) (Reviewer Figure 3). While interesting, these results require further work to understand the underlying molecular mechanisms. We present these preliminary data here but have not included them in the main manuscript. 

      b. As far as I can tell, the cross-species analysis of DEGs (Figure 3) is agnostic to neuronal cell type, although the conclusion seems to suggest only cholinergic neurons were contrasted. Is this correct? Could you please clarify this in the text as it's an important detail. If not, have the authors tried comparing only cholinergic neuron DEGs across species? That would lend strength to their specificity argument. The results for the NBM are impressive. Could the authors add more detail about the other regions to the main text?

      The reviewer is correct that we compiled the DEG of all affected cells, the majority of which are cholinergic neurons. 

      For the human data we focused on the NBM samples, because it contained the highest fraction of cholinergic neurons (as compared to the other 2 regions), but even so, it was not possible to analyze the cholinergic neurons alone because the fraction of cholinergic neurons in the human material was too low to be statistically analyzed independently. Note that both wildtype and PD samples contained a low number of cholinergic neurons (i.e. the DEG differences we detected were not the result of sequencing different types of cells - see also Supplemental Figure 3b and d). We have indicated this more clearly in the text.

      c. Uniquely within the human data, are cholinergic neurons more dysregulated than others? I understand this is not an early timepoint but would still be useful to discuss. 

      As indicated in the previous point, unfortunately the fraction of cholinergic neurons in the human material was low and we were not able to analyze these cells on their own. 

      Author response image 1.

      Upregulation of protein homeostasis rescues hyposmia across familial models of PD. Results of a behavioral screen for cell-specific rescue of olfactory preference defects of young PD fly models using up- and downregulation of deregulated genes in affected cell types. Genes implicated in the indicated pathways are overexpressed or knocked down using GH146-Gal4 (OPN>) and UAS-constructs (overexpression or RNAi). UAS-only (-) and OPN>UAS (+) were scored in parallel and are compared to each other. n.d. not determined; Bars represent mean ± s.e.m.; grey zone indicates the variance of controls; n≥5 independent experiments per genotype, with ~50 flies each; red bars: p<0.05 in ANOVA and Bonferroni-corrected comparison to UAS-only control.

      d. In the discussion, the authors say that olfactory neurons are uniquely poised to be dysregulated as they are large and have high activity. Is this really true compared to other circuits? I didn't find the references convincing and I am not sure this has been borne out in electron microscopy reconstructions for anatomy.  

      We agree and have toned down this statement.

      Reviewer #2 (Public Review):  

      Summary:  

      Pech et al. selected 5 Parkinson's disease-causing genes and generated multiple Drosophila lines by replacing the Drosophila lrrk, rab39, auxilin (aux), synaptojanin (synj), and Pink1 genes with wild-type and pathogenic mutant human or Drosophila cDNA sequences. First, the authors performed a panel of assays to characterize the phenotypes of the models mentioned above. Next, by using single-cell RNA-seq and comparing fly data with human postmortem tissue data, the authors identified multiple cell clusters being commonly dysregulated in these models, highlighting the olfactory projection neurons. Next, by using selective expression of Ca<sup>2+</sup>-sensor GCaMP3 in the OPN, the authors confirmed the synaptic impairment in these models, which was further strengthened by olfactory performance defects.

      Strengths:  

      Overall, the authors investigated the function of PD-related mutations expressed at endogenous levels and found a very interesting shared pathway through single-cell analysis; more importantly, they performed nice follow-up work using multiple assays.  

      Weaknesses:  

      While the authors state this is a new collection of five familial PD knock-in models, the Aux<sup>R927G</sup> model has been published and carefully characterized in Jacquemyn et al., 2023. ERG has been performed for Aux R927G in Jacquemyn et al., 2023, but the findings are different from what's shown in Figure 1b and Supplementary Figure 1d, which the authors should try to explain. 

      We should have explained this better: the ERG assay in Jacquemyn et al., and here, in Pech et al., are different. While the ERGs in our previous publication were recorded under normal endogenous conditions, the flies in our current study were exposed to constant light for 7 days. This is often done to accelerate the degeneration phenotype. We have now indicated this in the text (and also refer to the different experimental set up compared to Jacquemyn et al).

      Moreover, according to the authors, the hPINK1 control was human PINK1 expressed with UAS-hPINK1 and nSyb-Gal4 because of technical obstacles. Because the PINK1 WT control is an overexpression model, the PINK1 mutant phenotypes are difficult to interpret. The work would be strengthened if the authors used UAS-hPINK1 and nSyb-Gal4 (or perhaps a ubiquitous Gal4) to rescue the hPink1L347P and hPink1P399L phenotypes.

      The UAS-hPink1 line was originally created by the Lu lab (Yang et al., 2003, PMID: 12670421) and has been used extensively in Pink1 loss-of-function backgrounds (e.g. in Yang et al., 2006, PMID: 16818890). In our work, the control we refer to was UAS-hPink1 expression (driven by nSyb-Gal4) in a Pink1 knock-out background. For unknown reasons, we were unable to replace the fly Pink1 with a human Pink1 cDNA; we explained this in the methods section and added a remark in the revised manuscript.

      In addition, although the authors picked these models to target different biology and pathways, Aux and Synj both act in related steps of clathrin-mediated endocytosis, with LRRK2 acting as an accessory regulator. Is the dataset therefore biased toward identifying synaptic defects? 

      We picked these particular mutants, as they were the first we created in the context of a much larger collection of “PD flies” (see also Kaempf et al 2024, BioRxiv). We have made adaptations to the text to tone down the statement on the broad selection of mutants. 

      GH146-GAL4+ PNs are derived from three neuroblast lineages, producing both cholinergic and GABAergic inhibitory PNs (Li et al., 2017). Therefore, OPNs include more than "cholinergic projection neurons". How do the single-cell data show that cholinergic neurons were more vulnerable across the five models? 

      The reviewer is correct that GH146 drives expression in cells other than cholinergic OPNs, and we now state this clearly in the text. We present additional arguments that support our conclusion that cholinergic neurons are affected: (1) our single-cell sequencing identifies the most DEGs in cholinergic neurons; (2) nicotine (a compound activating nicotinic acetylcholine receptors) rescues dopamine-related problems in old PD-mutant flies; (3) likewise, nicotine alleviates problems we observed in LRRK2 mutant human induced dopaminergic neurons, and this effect is blocked by mecamylamine, a non-competitive antagonist of nicotinic acetylcholine receptors.

      In Figure 1b, the authors assumed that locomotion defects were caused by dopaminergic neuron dysfunction. To better support this, the authors should perform rescue experiments using dopaminergic neuron-specific Gal4 drivers. Alternatively, the authors may consider staining DA neurons and performing cell counts. Furthermore, the authors state in the discussion that "We now place cholinergic failure firmly ahead of dopaminergic system failure in flies", which seems premature and insufficiently supported, especially because no experimental evidence related to DA neuron dysfunction is provided in this manuscript. 

      Previously, Riemensperger et al., 2013 (PMID: 24239353) already linked synaptic loss of the dopaminergic PAM neurons to locomotion impairments (measured by SING). Furthermore, in a separate paper we show that the motor defects (SING) observed in PD mutants are rescued when the flies are fed L-DOPA, but not D-DOPA (Kaempf et al 2024; BioRxiv). In this same paper, we also show a significant correlation between SING defects and defects in dopaminergic synaptic innervation of PAM DAN onto Mushroom body neurons. We have referred to both articles in the revised manuscript.

      The statement on cholinergic failure ahead of dopaminergic failure was made in the context of the sequence of events: young flies did not show DAN defects, but they did display olfactory defects. The statement was indeed not meant to imply causality. However, we have now conducted new experiments in which we express wild-type PD genes using GH146-Gal4 (which is not expressed in DANs) in the PD mutants and assess dopaminergic-relevant phenotypes later in life (see new Figure 6 in the manuscript). This shows that GH146-Gal4-specific rescue is sufficient to alleviate the DAN-dependent SING defects in old flies. Likewise, as indicated above, application of nicotine is also sufficient to rescue the DAN-associated defects (in PD mutant flies and in human induced mutant dopaminergic neurons).  

      It is interesting to see that different familial PD mutations converge onto synapses. The authors suggest that different mechanisms may be involved, either directly through regulation of synaptic function or indirectly through mitochondria or transport. The manuscript would be improved if the authors extended their analysis in Figure 3 and made better use of their single-cell data to dissect the mechanisms. For example, are all the candidates listed in Figure 3C altered in the same direction across the five models?  

      This is indeed the case: the criteria for "commonly deregulated" included that the DEGs change in the same direction across several mutants. We ranked genes by their mean expression change across the mutants relative to the wild-type control, so only genes that are consistently up- or downregulated end up at the top or bottom of our list. We added a remark in the revised manuscript. In preliminary work, we also selected a number of the DEGs and conducted a screen in which we manipulated the expression of these genes, looking for rescue of the olfactory preference defects in our PD mutants. The strongest genetic interactions were with genes encoding proteins involved in proteostasis (Atg8/LC3, Lamp1 and Hsc70-4); we also show a genetic interaction between EndoA and Lrrk in this work and in Matta et al., 2012 (Author response image 1 above). While interesting, these results require further work to understand the underlying molecular mechanisms. We present these preliminary data here but have not included them in the main manuscript. 
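      As a minimal sketch of the ranking logic described above (not the authors' pipeline), the Python below averages each gene's log2 fold change across the five models and keeps the top and bottom 5% of the ranking. Using log2 fold change as the per-model statistic, the function name `common_degs`, and all gene names and values are illustrative assumptions.

```python
from statistics import mean


def common_degs(log2fc_by_gene, fraction=0.05):
    """Rank genes by their mean log2 fold change across PD models (mutant vs.
    wild-type control) and return the top and bottom `fraction` of the ranking.

    Genes that go up in some models and down in others average towards zero,
    so they land in the middle of the ranking rather than in either tail.
    """
    ranked = sorted(
        log2fc_by_gene,
        key=lambda gene: mean(log2fc_by_gene[gene]),
        reverse=True,  # most upregulated first
    )
    k = max(1, int(len(ranked) * fraction))  # at least one gene per tail
    return ranked[:k], ranked[-k:]


# Hypothetical log2 fold changes in five PD models (values are made up).
example = {
    "geneA": [1.2, 0.9, 1.1, 1.0, 0.8],       # up in every model
    "geneB": [-1.0, -0.7, -1.2, -0.9, -1.1],  # down in every model
    "geneC": [2.0, -2.0, 1.5, -1.5, 0.0],     # discordant: averages to zero
}
up, down = common_degs(example)
print(up, down)  # ['geneA'] ['geneB']
```

      Because only the mean is used, a gene that is strongly changed in most but not all models can still reach a tail, whereas a gene with opposite changes in different models, like the hypothetical `geneC`, cannot.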

      While this approach is carefully performed, the authors should state in the discussions the strengths and the caveats of the current strategy. For example, what kind of knowledge have we gained by introducing these mutations at an endogenous locus? Are there any caveats of having scRNAseq at day 5 only but being compared with postmortem human disease tissue?  

      We have included a “strengths and caveats section” in the discussion addressing these points.

      Reviewer #3 (Public Review):  

      Summary:  

      This study investigates the cellular and molecular events leading to hyposmia, an early dysfunction in Parkinson's disease (PD), which develops up to 10 years prior to motor symptoms. The authors use five Drosophila knock-in models of familial PD genes (LRRK2, RAB39B, PINK1, DNAJC6 (Aux), and SYNJ1 (Synj)), three expressing human genes and two Drosophila genes with equivalent mutations.  

      The authors carry out single-cell RNA sequencing of young fly brains and single-nucleus RNA sequencing of human brain samples. The authors found that cholinergic olfactory projection neurons (OPN) were consistently affected across the fly models, showing synaptic dysfunction before the onset of motor deficits, which are known to be associated with dopaminergic neuron (DAN) dysfunction.  

      Single-cell RNA sequencing revealed significant transcriptional deregulation of synaptic genes in OPNs across all five fly PD models. This synaptic dysfunction was confirmed by impaired calcium signalling and morphological changes in synaptic OPN terminals. Furthermore, these young PD flies exhibited olfactory behavioural deficits that were rescued by selective expression of wild-type genes in OPNs.  

      Single-nucleus RNA sequencing of post-mortem brain samples from PD patients with LRRK2 risk mutations revealed similar synaptic gene deregulation in cholinergic neurons, particularly in the nucleus basalis of Meynert (NBM). Gene ontology analysis highlighted enrichment for processes related to presynaptic function, protein homeostasis, RNA regulation, and mitochondrial function.  

      This study provides compelling evidence for the early and primary involvement of cholinergic dysfunction in PD pathogenesis, preceding the canonical DAN degeneration. The convergence of familial PD mutations on synaptic dysfunction in cholinergic projection neurons suggests a common mechanism contributing to early non-motor symptoms like hyposmia. The authors also emphasise the potential of targeting cholinergic neurons for early diagnosis and intervention in PD.  

      Strengths:  

      This study presents a novel approach, combining multiple mutants to identify salient disease mechanisms. The quality of the data and analysis is of a high standard, providing compelling evidence for the role of OPN neurons in olfactory dysfunction in PD. The comprehensive single-cell RNA sequencing data from both flies and humans is a valuable resource for the research community. The identification of consistent impairments in cholinergic olfactory neurons, at early disease stages, is a powerful finding that highlights the convergent nature of PD progression. The comparison between fly models and human patients' brains provides strong evidence of the conservation of molecular mechanisms of disease, which can be built upon in further studies using flies to prove causal relationships between the defects described here and neurodegeneration.  

      The identification of specific neurons involved in olfactory dysfunction opens up potential avenues for diagnostic and therapeutic interventions.  

      Weaknesses:  

      The causal relationship between early olfactory dysfunction and later motor symptoms in PD remains unclear. It is also uncertain whether this early defect contributes to neurodegeneration or is simply a reflection of the sensitivity of olfactory neurons to cellular impairments. The study does not investigate whether the observed early olfactory impairment in flies leads to later DAN deficits. Additionally, the single-cell RNA sequencing analysis reveals several affected neuronal populations that are not further explored. The main weakness of the paper is the lack of conclusive evidence linking early olfactory dysfunction to later disease progression.

      We agree that this is an interesting avenue to pursue. As indicated above (Figure 6) and in the revised manuscript, we have now included data that strengthen the connection between early OPN defects and later DAN-dependent problems. Further work will be needed to elucidate the mechanisms of this cell-non-autonomous effect. 

      The rationale behind the selection of specific mutants and neuronal populations for further analysis could be better qualified. 

      We have added further explanation in the reworked text.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):  

      Minor Comments:  

      (1) Questions about the sequencing methods and analysis approaches. From reading the methods and main text, I was confused about aspects of the Drosophila single-cell profiling. Firstly, did the authors multiplex their fly samples? 

      No, we did not. Genotypes were separately prepared and sequenced, but they were all processed in parallel to avoid batch effects. 

      Secondly, it seems like there are two rounds of dataset integration performed, Harmony and Seurat's CCA-based method. This seems unorthodox. Could the authors comment on why they perform two integrations? 

      Thanks for pointing this out, this was a mistake in the methods section (copied from a much older version of the manuscript). In this manuscript, we only used harmony for dataset integration and removed the methods on Seurat-CCA. 

      Finally, for all dataset integrations please state in the main text how datasets were integrated (by age, genotype, etc). 

      Datasets were integrated by sample id, corresponding to individual libraries.

      (2) The authors focus on OPNs with a really nice set of experiments. I noticed however that Kenyon cells were also dysregulated. What about Olfactory sensory neurons? Could the authors provide comments on this? 

      Olfactory sensory neurons are located in the antennae, outside the brain, and were therefore not captured by our analysis. However, the GH146-Gal4-specific rescue experiments indicate that these sensory neurons are likely not severely functionally impaired. Kenyon cells are an interesting affected cell type to examine in future experiments, as they are directly connected to DANs.

      (3) There are several citations of Jenett et al 2012 that seem wrong (related to single-cell datasets).

      We are sorry for this and have corrected this in the text.  

      Reviewer #2 (Recommendations For The Authors):  

      (1) In the key resources table, a line called CG5010k.o. (chchd2k.o.) was mentioned, but was not used in the paper. The authors should remove it. 

      Sorry, this was from a previous older version of the manuscript. We fixed this.

      (2) Why did the authors use human CDS for LRRK2, Rab39B, and PINK1, but fly CDS for Aux and Synj1? Is it based on the conservation of amino acid residues? Although the authors cited a review (Kalia & Lang, 2015) to justify the selection of the mutations, for the interest of a broad audience, it is recommended that the authors expand their introduction for the rationale of their selection, including the pathogenicity of each selected mutation, original human genetics evidence, conservation between fly and human. 

      (a) We used Drosophila cDNA for the aux and synj rescue experiments because knock-in of the human homologues at these loci did not rescue their loss-of-function lethality. 

      (b) We expanded the introduction to further explain the selection of the mutants analyzed in this work. We picked these particular mutants because they were the first we created in the context of a much larger collection of “PD flies” (see also Kaempf et al 2024, BioRxiv). We have adapted the text to tone down the statement on the broad selection of mutants. 

      (3) Supplemental Figure 1a, is mRNA level normalized to an internal control? If not, it is not appropriate to compare the results directly from two primer sets, since each primer set may have different amplification efficiency. 

      We are sorry for the lack of information. mRNA levels were determined using the ΔΔCt method, in which Ct values were first normalized to the housekeeping gene Rp49 and then expressed as a percentage of endogenous Drosophila gene expression. We expanded the methods section and now also list the primers for Rp49 along with the other qPCR primers in Supplemental File 1.
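      A minimal sketch of the ΔΔCt calculation described above, assuming the standard Livak formulation (relative expression = 2^-ΔΔCt) with Rp49 as the reference gene and the endogenous Drosophila gene in wild-type flies as the calibrator. The function name and Ct values are hypothetical. This formulation assumes approximately equal amplification efficiencies for the primer sets being compared.

```python
def percent_of_endogenous(ct_target_ki, ct_ref_ki, ct_target_wt, ct_ref_wt):
    """Livak 2^-ΔΔCt: knock-in transcript level as a percentage of the
    endogenous fly transcript in wild type, both normalized to Rp49."""
    delta_ki = ct_target_ki - ct_ref_ki   # ΔCt of the knock-in sample (target - Rp49)
    delta_wt = ct_target_wt - ct_ref_wt   # ΔCt of the wild-type calibrator
    delta_delta = delta_ki - delta_wt     # ΔΔCt
    return 100.0 * 2.0 ** (-delta_delta)


# Hypothetical Ct values: after Rp49 normalization the knock-in transcript
# crosses threshold one cycle later than the endogenous gene.
print(percent_of_endogenous(26.0, 18.0, 25.0, 18.0))  # 50.0
```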

      (4) For Figure 2, it may be helpful to have a supplemental table or figure showing the clusters with significant changes (based on cell number-adjusted DEGs) for each model, i.e., what are the black cell clusters in Figure 2? "Thus, cellular identity and cellular composition are preserved in young PD fly models." In Figure S2A, the authors show cell composition percentages for only 3 cell clusters; are the bars 95% standard error? 

      The error bars in Supplemental Figure 2a represent the 95% CI. We have included a new supplemental table with the number of cells per cell cluster for each mutant (Supplemental File 3).

      What about the remaining 183 cell clusters? Are there any KI-model cell clusters that are statistically different than controls? What about the annotated cell types (e.g., the 81 with cell identities)? Please consider at least providing or pointing to a table to state how many have significant differences, or if there are truly none. 

      As mentioned above, we have included a new supplemental table with the number of cells per cell cluster for each mutant (Supplemental File 3).

      (5) What are the rows in the sunburst plot in Figure 3a? Please be more descriptive in the figure legend or label the figure. 

      We have expanded on this in the figure legend and now also include a summary of the SynGO analysis in Supplemental File 7. In Figure 3a, a summary sunburst plot is presented, reflecting the GO terms (inner rings, indicated in a) with their subdivided levels (the complete list is provided in Supplemental File 7). In Figure 3a’ and a” the DEG data acquired from the different datasets (human vs fly) are applied to the sunburst plot where rings are color-coded according to enrichment Q-value.

      (6) In Table S4, which clusters (in the table) have normalized residuals that are outside of the 95% confidence interval of the regression model displayed in Figure S2e? They use this analysis to adjust for cell number bias and point out the "most significant cell clusters" affected in each model. This may be helpful for readers who want to grab a full list of responsive clusters. 
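      As a rough illustration of this kind of cell-number adjustment, the sketch below regresses the number of DEGs per cluster on log10(cell number) by ordinary least squares and flags clusters whose standardized residual falls outside ±1.96 (about a 95% normal interval). The log transform, the standardization, the threshold, and the cluster names and counts are assumptions. The model behind Figure S2e may differ.

```python
import math


def flag_outlier_clusters(cells, n_degs, z_crit=1.96):
    """Regress DEG counts on log10(cell number) by ordinary least squares and
    return {cluster: standardized residual} for clusters with |z| > z_crit."""
    names = list(cells)
    x = [math.log10(cells[c]) for c in names]
    y = [n_degs[c] for c in names]
    n = len(names)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom (two fitted parameters).
    s = math.sqrt(sum(r * r for r in resid) / (n - 2))
    z = {c: r / s for c, r in zip(names, resid)}
    return {c: zc for c, zc in z.items() if abs(zc) > z_crit}


# Hypothetical clusters: DEG counts track log10(cell number) except "c10".
cells = {f"c{i}": 10 ** (1 + 0.1 * i) for i in range(21)}
degs = {c: 5 + 20 * math.log10(n) + (1 if i % 2 == 0 else -1)
        for i, (c, n) in enumerate(cells.items())}
degs["c10"] += 49  # inject an excess of DEGs into one cluster
print(flag_outlier_clusters(cells, degs))  # only 'c10' is flagged
```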

      We have included this information in Supplemental File 5 (Tab “Cell types outside of CIs”) in the supplemental data of the manuscript.

      (7) The human samples used all have different LRRK2 variants: for the crossspecies comparisons, do Lrrk flies have greater similarity to the human PD cases compared to the other fly models?

      No, comparing the vulnerable gene signatures from each of the fly mutants to the DEGs from the human samples does not show any greater similarity between the LRRK mutants compared to the other mutants.

      Reviewer #3 (Recommendations For The Authors):  

      Clarifications required:  

      Some of the mutations used are in genes not commonly associated with PD; the authors should explain the rationale for using these particular mutants rather than well-established fly models of PD (for example, GBA flies or SNCA overexpression).

      We opted to use knock-ins of mutations that cause Parkinsonism. Because flies do not have an alpha-synuclein homologue, we could not add it ‘as such’ to our collection. Future work could indeed also include expression models or risk-factor models (such as GBA). As also requested by another reviewer, we added further rationale for the genes we chose to analyze in this work.

      Why use starvation rather than lifespan for the PD models? The lifespan data shown have no error bars; the statistical test should be stated (for example, a log-rank test or Cox proportional hazards model, as usually used in survival analysis). It would also be good to show survival plots during starvation for all genotypes, not just PINK1. 

      While starvation assays can provide valuable insights into acute metabolic and physiological stress responses, we acknowledge that lifespan is a critical parameter and would provide a more comprehensive understanding of the PD models in our study. Based on this consideration and the reviewer’s feedback, we have removed the starvation data from the manuscript. Unfortunately, we did not perform lifespan experiments, which is why these data were not included in the manuscript. However, based on our observations (though not a detailed analysis), all genotypes tested, except for the PINK1 mutants, appeared to have a normal lifespan. Most PINK1 mutants died by 25 days of age; therefore, we conducted our assays using 15-day-old PINK1 mutant flies.

      Do the fly models used have different lifespans, and how close to death was the SING assay performed? Different mutations show different effects: most phenotypes are mild (hRab39BG192R has no phenotype) and PINK1 has the strongest. Do these differences simply reflect the strength of each model?  

      The ages of the flies analyzed are indicated in the legends. As mentioned before, all mutants except PINK1 had a normal lifespan: we did not detect abnormally low numbers of flies or premature death at 50 days of age, except for the PINK1 mutants tested in this manuscript, most of which died by 25 days of age. Therefore, we conducted our assays using 15-day-old PINK1 mutant flies.

      Rab39<sup>G192R</sup> has no phenotype in the tests presented, suggesting no degeneration; why use Rab39<sup>G192R</sup> for scRNA-seq? This seems an odd choice that the authors should explain. 

      Single-cell sequencing was initiated before the full phenotypic characterization of all mutants was completed. Although basic characterization of the Rab39<sup>G192R</sup> mutant PD flies revealed either no significant phenotypes or only mild effects in the assays performed (Figure 1), the sequencing data provided additional insights into potential cellular and molecular alterations. Moreover, all PD-mutant knock-ins, including the Rab39<sup>G192R</sup> mutant PD flies, show dysfunctional OPN synaptic terminals, with significantly weaker Ca<sup>2+</sup> responses despite an increased synaptic area (Figure 4g-h). All mutants also had olfactory behavior defects (Figure 5a). 

      When the authors state that “For example, in the NBM, an area associated with PD (Arendt et al., 1983), 20% of the DEG that has an orthologous gene in the fly are also found among the most deregulated genes across PD fly models" a test should be performed to confirm this is a significant overlap (such as a hypergeometric test). 

      We have performed this test. Of the 2486 significantly differentially expressed human genes, 1149 have a fly orthologue, and of these, 28.46% overlap with the deregulated fly genes (the top and bottom 5% of genes, as shown in Supplemental Table 7). A hypergeometric test confirms that this overlap is significant (p = 9.06 × 10<sup>-76</sup>). We have included this in the text.
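      The test can be sketched with an exact hypergeometric upper tail. The response gives n = 1149 human DEGs with fly orthologues and an overlap of about 327 genes (28.46% of 1149), but not the size of the fly gene universe N or of the deregulated fly gene set K. The values used below for those two are placeholders, so the resulting p-value is not the one reported.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Exact P(X >= k) for X ~ Hypergeometric: n genes drawn from a universe of
    N genes, of which K are 'successes' (here, deregulated fly genes)."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)  # exact big-integer ratio converted to float


# n and the overlap follow the response; N and K are hypothetical placeholders.
n_orthologues = 1149
overlap = 327            # ~28.46% of 1149
N_universe = 12000       # hypothetical number of fly genes tested
K_deregulated = 1200     # hypothetical size of the top + bottom 5% lists
p = hypergeom_sf(overlap, N_universe, K_deregulated, n_orthologues)
print(f"{overlap / n_orthologues:.2%} overlap, p = {p:.3g}")
```

      With these placeholder numbers the expected overlap under random sampling is about 115 genes, so an overlap of 327 gives a tail probability far below any conventional threshold.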

      The authors speak of deregulation when speaking of the overlap between human and fly DE genes, but do the over-expressed genes in flies overlap with overexpressed genes in humans, or is the direction of transcription deregulation not concordant? If it is mostly not concordant, can the authors please comment as to why they might think that is the case? 

      In our fly experiments, we identified DEGs in affected cell types and then defined common DEGs by looking at the average change across the fly mutants. Genes that change consistently (all or mostly up, or all or mostly down) across the mutants end up at the top or bottom of our list, whereas genes that are upregulated in some mutants and downregulated in others average out and do not appear in our list of commonly deregulated genes. For the comparison with the human data, we only looked for the presence of the human homologue and did not assess whether the change occurred in the same direction. More work will be needed to define the most relevant changes, but in a mini-screen we selected a number of DEGs present in both the fly and human datasets from different functional categories and tested whether they genetically interact with our PD mutants. As shown in Reviewer Figure 3, we find that modulating genes encoding proteostasis pathway components rescues the olfactory preference defect across many PD mutants. 

      Can the authors explain why only the NBM region was used for comparison with the fly data?  

      We used the NBM because this region has the highest number of cholinergic neurons, allowing us to compare deregulation in those neurons with deregulation in the cholinergic OPNs of mutant PD flies.

      In Figure 4, can the genotypes please be stated in full, and why does the hPINK1 fly give no detectable signal? 

      Despite several attempts, we failed to knock in wild-type hPink1 at the fly pink1 locus. Therefore, the hPink1 control used throughout the manuscript was nSyb-Gal4>UAS-hPink1 in a Pink1 knock-out background, except in Figure 4. For the experiments in this figure, we could not use UAS-hPink1 with nSyb-Gal4, because we needed OPN-specific Gal4 expression to drive UAS-GCaMP.

      Therefore, this was labeled as “not determined” (“n.d.”), as indicated in the figure and the legend. We explained this better in the methods section, added a remark in the new manuscript and expanded the legend of Figure 4.

      The paper states that " These findings imply that factors affecting the function of cholinergic neurons might, by the absence of insufficient innervation, lead to DAN problems and degeneration, warranting further exploration of the underlying molecular mechanisms"; this should be toned down, as the paper never looks at DANs, only at OPNs. Fly neurons are mostly cholinergic, whereas human neurons are mostly glutamatergic, so extrapolating from one system to the other may not be straightforward; the authors should comment on this. 

      We have now included a new experiment in which we assessed DAN function in aged PD mutants expressing the wild-type gene in OPNs using GH146-Gal4. This manipulation rescued DAN defects (measured by SING) in older flies. We further corroborated this observation by “replacing” cholinergic innervation with nicotine feeding in PD mutants, which also rescues the SING defect as well as the defects in neuronal activity of PAM DANs (based on live synaptic calcium imaging). Finally, we show that incubating LRRK2<sup>G2019S</sup> mutant human induced dopaminergic neurons with nicotine is sufficient to rescue functional defects in these neurons (measured using calcium imaging). We included these data in the revised manuscript and also show them in Figure 6 above (new Figure 6 in the revised manuscript). 

      Experiments that would improve the manuscript:  

      Does rescue of OPN function also rescue later progressive symptoms (geotaxis response)?  

      It does, as indicated in the previous point and shown in Figure 6.

      Do the fly PD models used show DAN degeneration? This could be assessed by anti-TH staining. 

      We quantified DAN cell bodies using anti-TH staining but see little or no loss. There is, however, loss of synaptic innervation of the PAM neurons onto the mushroom bodies; we included these data in the new Figure 6. Furthermore, we have quantified this across the genetic space of familial Parkinsonism in Kaempf et al., 2024, BioRxiv. Note that this phenotype is also rescued by expressing wild-type CDS in the OPNs using GH146-Gal4.

      Minor issues: 

      The final sentence on page 5 is repetitive with the introduction. 

      Indeed, we removed the redundant sentence.

      First line of the new section on page 6, the authors probably mean cholinergic olfactory projection neurons, not just cholinergic neurons. 

      Yes, and corrected.

      At the top of page 7 the authors state: "Additionally, we also found enrichment of genes involved in RNA regulation and mitochondrial function that are also important for the functioning of synaptic terminals", where is the data showing this? The authors should point to the supplemental file showing this.  

      We now included a reference to Supplemental File 7 that includes a summary of those data. Additionally, we also included references to back this claim.

      Just before the discussion, Rab39BG193R should be Rab39BG192R.  

      Sorry for this, it is now corrected.

      Stating "fifth row" in Fig 5c and d is confusing, can the figure be labelled more clearly?  

      We modified the figure (including extra marks and colors) and expanded the legend and the main text to differentiate better between expression of the rescues in OPN versus T1 neurons revealing that only expression in OPN neurons rescues the olfactory defects while expression in T1 neurons does not.

      In the methods, the authors describe clustering done in both Scanpy and Seurat; why were both run? Which clustering was used for further analysis?

      We only used Scanpy with Harmony and removed the methods on Seurat-CCA. Thanks for pointing this out, this was a mistake in the methods section (copied from a previous version of the manuscript).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public Comments):

      (1) The central concern for this manuscript is the apparent lack of reproducibility. The way the authors discuss the issue (lines 523-554) it sounds as though they are unable to reproduce their initial results (which are reported in the main text), even when previous versions of AlphaFold2 are used. If this is the case, it does not seem that AlphaFold can be a reliable tool for predicting antibody-peptide interactions.

      The driving point behind the multiple sequence alignment (MSA) discussion was indeed to show that AlphaFold2 (AF2) performance when predicting scFv:peptide complexes is highly dependent on the MSA. However, this dependence is a function of the MSA generation algorithm (MMseqs2, HHblits, jackhmmer, hhsearch, kalign, etc.) and the sequence databases used, rather than an intrinsic property of AF2. It is important to report MSA-dependent performance precisely because the MSA changes what AF2 can achieve in peptide prediction.

      Performance also varies significantly with the target peptide and with changes to the scFv framework. By reporting the varying success rates (as a function of MSA, peptide target, and framework changes), we aim to help future researchers design modified algorithms that achieve more reliable protein-peptide binding predictions. Ultimately, tracking down how the details of MSA generation affect results (especially when the MSAs contain hundreds of sequences) is well outside the scope of this paper. Our goal was to show a general method for identifying linear antibody epitopes using only sequence information; future work by us or others should focus on optimizing the process. 
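      To make "using only sequence information" concrete, here is a hypothetical sketch of how candidate epitopes could be enumerated from an antigen: overlapping peptide windows are paired with a fixed scFv sequence and written as ColabFold-style CSV input (`id,sequence`, with `:` separating the chains of a complex). The window length, step, identifiers, and sequences are assumptions, and this is not necessarily how PAbFold builds its inputs.

```python
import csv
import io


def tile_peptides(antigen, window=10, step=1):
    """Yield (1-based start, peptide) for every full-length overlapping window."""
    for start in range(0, len(antigen) - window + 1, step):
        yield start + 1, antigen[start:start + window]


def complex_csv(scfv, antigen, window=10, step=1):
    """Return CSV text pairing the fixed scFv with each peptide window."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "sequence"])
    for start, peptide in tile_peptides(antigen, window, step):
        # ':' separates the two chains of the scFv:peptide complex
        writer.writerow([f"pep_{start:04d}", f"{scfv}:{peptide}"])
    return buf.getvalue()


# Placeholder sequences, not a real scFv or antigen.
print(complex_csv("QVQLVESGGG", "ACDEFGHIKLMNPQRSTVWY", window=10, step=3))
```

      A step of 1 scans every possible start position, at the cost of one structure prediction per window, which is why the compute budget grows linearly with antigen length.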

      (2) Aside from the fundamental issue of reproducibility, the number of validating tests is insufficient to assess the ability of AlphaFold to predict antibody-peptide interactions. Given the authors' use of AlphaFold to identify antibody binding to a linear epitope within a whole protein (in the mBG17:SARS-CoV-2 nucleocapsid protein interaction), they should expand their test set well beyond Myc- and HA-tags using antibody-antigen interactions from existing large structural databases.

      Performing the calculations at the scale the reviewer requests is not feasible at this time. In this manuscript, we were able to predict 3 of 3 epitopes, including one for an antigen-antibody pair that has not been deposited in the PDB and has no homologs there. While we feel that N=3 is acceptable for introducing this method to the scientific community, we will consider adding more examples of success and failure in the future to optimize and refine the method as computational resources become available. Notably, future efforts that attempt high-throughput predictions of this kind using existing databases should take particular care to avoid contamination.

      (3) As discussed in lines 358-361, the authors are unsure if their primary control tests (antibody binding to Myc-tag and HA-tag) are included in the training data. Lines 324-330 suggest that even if the peptides are not included in the AlphaFold training data because they contain fewer than 10 amino acids, the antibody structures may very well be included, with an obvious "void" that would be best filled by a peptide. The authors must confirm that their tests are not included in the AlphaFold training data, or re-run the analysis with these templates removed.

      First, we address the simpler question of templates.

      The reruns of AF2 with the local 2022 rebuild (the most reproducible method we used, and the one whose results were most on par with the MMseqs2 server in fall 2022) were run without templates. Because the MSA was generated locally, no templates were matched or generated. The only inputs were therefore the locally generated MSA and the FASTA sequences of the unchanging scFv and the varying epitope. Given how well this configuration performed without templates, we can confidently say that the template flag does not account for PAbFold's ability to consistently identify the correct epitope.

      Second, we can partially address the question of whether the AlphaFold models had access to models suitable, in theory, for “memorization” of pertinent structural details. 

      With respect to tracking the exact role and inclusion of specific PDB entries, the AF2 paper provides the following:

      “Structures from the PDB were used for training and as templates (https://www.wwpdb.org/ftp/pdb-ftp-sites; for the associated sequence data and 40% sequence clustering see also https://ftp.wwpdb.org/pub/pdb/derived_data/ and https://cdn.rcsb.org/resources/sequence/clusters/bc-40.out). Training used a version of the PDB downloaded 28 August 2019, while the CASP14 template search used a version downloaded 14 May 2020. The template search also used the PDB70 database, downloaded 13 May 2020 (https://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/).”

      Three of these links are dead. As such, it is difficult to definitively assess the role of any particular PDB entry with respect to AF2 training/testing, or the impact of homologous training structures, given the very large number of immunoglobulin structures in the training set. That said, we can summarize information for the potentially relevant PDB entries (e.g., 2or9, which is shown in Fig. 1, and 1frg), and we believe it is most conservative to assume that each such entry was within the training set.

      PDB entry 2or9 (released 2008): the anti-c-myc antibody 9E10 Fab fragment in complex with an 11-amino-acid synthetic epitope: EQKLISEEDLN. This crystal structure is also noteworthy for featuring a binding mode where the peptide is pinned between two Fab fragments. The apo structure (2orb) is also in the database but lacks the peptide and a resolved structure for CDR H3.

      PDB entry 1a93 (released 1998): a c-Myc-Max leucine zipper structure, where the c-Myc epitope (in a 34-amino acid protein) adopts an alpha helical conformation completely different from the epitope captured in entry 2or9.

      PDB entries 5xcs and 5xcu (released 2017): engineered Fv-clasps (scFv alternatives) in complex with the 9-amino acid synthetic HA epitope: YPYDVPDYA.

      PDB entry 1frg (released 1994): anti-HA peptide Fab in complex with HA epitope subset Ace-DVPDYASL-NH2.

      Since the 2or9 entry has our target epitope (10 aa) embedded within an 11 aa sequence, we have revised this line in the manuscript:

      “The AlphaFold2 training set was reported to exclude chains of less than 10, which would eliminate the myc and HA epitope peptides.” => “The AlphaFold2 training set was reported to exclude chains of less than 10, which would eliminate the HA epitope peptide from potential training PDB entries such as 5xcs or 5xcu.”

      It is important to note that we obtained the best prediction performance for the scFv:peptide pair that had no pertinent PDB entries (mBG17). Specifically, a protein BLAST search of the PDB using the mBG17 scFv revealed diverse homologs, but a maximum sequence identity of 89.8% for the heavy chain (to an unrelated antibody) and 93.8% for the light chain (to an unrelated antibody). Additionally, while it is possible that the AF2 models might have learned from the complex in PDB entry 2or9, Supplemental Figure 3 shows how often the peptide is “misplaced,” and the performance for that pair does not exceed the performance for mBG17.

      (4) The ability of AlphaFold to refine the linear epitope of antibody mBG17 is quite impressive and robust to the reproducibility issues the authors have run into. However, Figure 4 seems to suggest that the target epitope adopts an alpha-helical structure. This may be why the score is so high and the prediction is so robust. It would be very useful to see along with the pLDDT by residue plots a structure prediction by residue plot. This would help to see if the high confidence pLDDT is coming more from confidence in the docking of the peptide or confidence in the structure of the peptide.

      The reviewer is correct that the target mBG17 epitope adopts an alpha-helical conformation, and we concur that this likely contributes to the more reliable structure prediction performance. When we predict the structure of the epitope alone, without the mBG17 scFv, AF2 confidently predicts an alpha helix with an average pLDDT of 88.2 (range 74.6 to 94.4).

      Author response image 1.

      The AF2 prediction for the mBG17 epitope by itself.
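
      Per-residue summaries like the average pLDDT above can be read directly from a predicted structure, because AlphaFold2 and ColabFold write per-residue pLDDT into the B-factor column of their output PDB files. The sketch below is illustrative only and is not part of PAbFold: taking the value from each residue's CA atom, treating the peptide as chain "B", and the function names are all assumptions.

```python
def per_residue_plddt(pdb_text: str, chain: str) -> dict:
    """Map residue number -> pLDDT for one chain, read from CA-atom B-factors.

    Assumes fixed-column PDB format (chain ID in column 22, residue number in
    columns 23-26, B-factor in columns 61-66) and that pLDDT is stored in the
    B-factor field, as in AlphaFold2/ColabFold output.
    """
    plddt = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        # Every atom of a residue carries the same pLDDT; keep one per residue.
        if line[12:16].strip() != "CA" or line[21] != chain:
            continue
        plddt[int(line[22:26])] = float(line[60:66])
    return plddt


def mean_plddt(pdb_text: str, chain: str) -> float:
    """Average pLDDT over all residues of `chain`."""
    values = list(per_residue_plddt(pdb_text, chain).values())
    if not values:
        raise ValueError(f"no CA atoms found for chain {chain!r}")
    return sum(values) / len(values)
```

      Taking `min` and `max` over the values returned by `per_residue_plddt` gives the per-residue range as well as the mean.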

      However, as one interesting point of comparison, a 10 a.a. poly-alanine peptide is also consistently folded into an alpha-helical coil by AF2. The A<sub>10</sub> peptide is also predicted to bind among the traditional scFv CDR loops, but the pLDDT scores are very poor (Supplemental Figure 5J). We also observed the opposite case: a peptide with a very unstructured region in the binding domain that is nonetheless placed confidently, as seen in Supplemental Figure 3 C&D. Therefore, while we suspect that peptides with strong alpha-helical propensity are more likely to be accurately predicted, the data suggest that adopting an alpha helix is neither necessary nor sufficient for a confident prediction.

      (5) Related to the above comment, pLDDT is insufficient as a metric for assessing antibody-antigen interactions. There is a chance (as is nicely shown in Figure S3C) that AlphaFold can be confident and wrong. Here we see two orange-yellow dots (fairly high confidence) that place the peptide COM far from the true binding region. While running the recommended larger validation above, the authors should also include a peptide RMSD or COM distance metric, to show that the peptide identity is confident, and the peptide placement is roughly correct. These predictions are not nearly as valuable if AlphaFold is getting the right answer for the wrong reasons (i.e. high pLDDT but peptide binding to a non-CDR loop region). Eventual users of the software will likely want to make point mutations or perturb the binding regions identified by the structural predictions (as the authors do in Figure 4).

      We agree with the reviewer that pLDDT is not a perfect metric, and we are following with great interest the evolving community discussion as to which metrics are most predictive of binding affinity (e.g., PAE, or ipTM as a decent predictor of binding but not of affinity ranking). To our knowledge, there is not yet a consensus on the most predictive metrics for either protein:protein or protein:peptide binding. Intriguingly, since the antigen peptides are so small in our case, the pLDDT of the peptide residues should mostly report on the confidence of the distances to neighboring protein residues.

      As to the suggestion of an RMSD or COM distance metric, we agree that these are useful, with the caveat that they require a reference structure. The goal of our method is to quickly narrow down candidate linear epitopes and thereby guide experimentalists to determine more efficiently the actual binding sequence of an antibody-antigen pair. Presumably this would not be necessary if a reference structure were known.

      It may also be possible to invent a method to filter unlikely binding modes that is specific to antibodies and peptide epitopes that does not require a known reference structure, but this would be an interesting problem for subsequent study.
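
      When a reference structure does exist (for example, when benchmarking on the Myc or HA complexes), the placement check the reviewer describes could look like the minimal sketch below. It assumes the predicted and reference complexes have already been superposed on the scFv, so that peptide coordinates share one frame, and that both lists hold matched atoms (e.g., peptide CA atoms) in the same order. It uses an unweighted centroid rather than a mass-weighted center of mass, and the function names are hypothetical.

```python
import math


def centroid(coords):
    """Unweighted centroid of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[axis] for c in coords) / n for axis in range(3))


def centroid_distance(pred, ref):
    """Distance between predicted and reference peptide centroids (input units)."""
    return math.dist(centroid(pred), centroid(ref))


def placement_rmsd(pred, ref):
    """RMSD of matched atoms with no further fitting on the peptide.

    Because the complexes were superposed on the antibody beforehand, this
    measures how far the peptide sits from its reference pose, not only
    whether its internal shape is right.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and the same length")
    total = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(total / len(pred))
```

      Skipping a second fit on the peptide itself is deliberate in this sketch: re-fitting would hide a peptide that is well folded but docked against the wrong loop.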

      Reviewer 1 (Recommendations for the Authors):

      (1) "Linear epitope" should be more precisely defined in the text. It isn't clear whether the authors hope that they can use AlphaFold to predict where on a given protein antigen an antibody will bind, or which antigenic peptide the antibody will bind to. The authors discuss both problems, and there is an important distinction between the two. If the authors are only concerned with isolated antigenic peptides, rather than linear epitopes in their full length structural contexts, they should be more precise in the introduction and discussion.

      We thank the reviewer for the prompt towards higher precision. We are using the short contiguous antigen definition of “linear epitope” that depends on secondary rather than tertiary structure. The linear epitopes this paper considers are short “peptides” that form secondary structure independent of their structure in the complete folded antigen protein. We have clarified our definition of “linear epitope” in the text (lines 64-66). 

      (2) Line 101: "Not all portions of the antibody are critical". First, this is not consistent with the literature, particularly where computational biology is concerned. See https://pubs.acs.org/doi/10.1021/acs.jctc.7b00080 . Second, while I largely agree with what I think the authors are trying to say (that we can largely reduce the problem to the CDR loops), this is inconsistent with what the authors later find, which is that inexplicably the VH/VL scaffold used alters results strongly.

      We have adopted verbiage that should be less provocative: “Fortunately, with respect to epitope specificity, antibody constant domains are less critical than the CDR loops and the remainder of the variable domain framework regions.”

      (3) Related to the above comment, do the authors have any idea why epitope prediction performance improved for the chimeric scFvs? Is this due to some stochasticity in AlphaFold? Or is there something systematic? Expanding the test dataset would again help answer this question.

      We agree that future study with a larger test set could help address this intriguing result, for which we currently lack a conclusive explanation. Part of our motivation for this publication was to bring this unexpected result to light. Notably, these framework differences are implicated not only as a factor in driving AF2 performance, but also in changing experimental intracellular performance, as reported by our group (DOI: 10.1038/s41467-019-10846-1). We can generate a variety of hypotheses for this phenomenon. Just as MSA sub-sampling has been a popular approach to drive AF2 to sample alternative conformations, sequence recombination may be a generically effective way to generate usefully different binding predictions. However, it is difficult to distinguish recombination inducing subtle structural tweaks that increase protein intracellular fitness and binding from recombination causing changes to the MSA that affect the likelihood of sampling a good epitope binding conformation. It is also possible that the chimeras are more deftly predicted by AF2 due to differences in sequence representation during the training of the AF2 models (e.g. more exposure to models containing 15F11 or 2E2 structures). We attempted to deconvolute MSA differences by using single-sequence mode (Supplementary Figure 13), but this ablated performance.

      (4) Figure 2: The reported consensus pLDDT scores are actually quite low here, suggesting low confidence in the result. This is in strong contrast to the reported consensus scores for mBG17. Again, a larger test dataset would help set a quantitative cutoff for where to draw the line for "trustworthy" AlphaFold predictions in antibody-peptide binding applications.

      We agree that a larger dataset will be useful to begin to establish metrics and thresholds and will contribute to the aforementioned community discussion about reliable predictors of binding. Our current focus is not structure prediction per se. In the current work we are more focused on relative binding likelihood and increasing the efficiency of experimental epitope verification by flagging the most likely linear epitopes. Thus, while the pLDDT scores are low for Myc in Figure 2, it is remarkable (and worth reporting) that there is still useful signal in the relative variation in pLDDT. The utility of the signal variation is evident in the ability to short-list correct lead peptides via the two methods we demonstrate (consensus and per-residue max).

      (5) Figure 4: if the authors are going to draw conclusions from the actual structure predictions of AlphaFold (not just the pLDDT scores), the side-chain accuracy placement should be assessed in the test dataset (RMSD or COM distance).

      We agree with the reviewer that side-chain placement accuracy is important when evaluating the accuracy of AF2 structure predictions. However, here our focus was relative binding likelihood rather than structure prediction. The one case where we attempted to draw conclusions from the structure prediction was in the context of mBG17, where there is not yet an experimental reference structure. Absolutely, if we were to obtain a crystal structure for that complex, we would assess side-chain placement accuracy. 

      (6) Lines 493-508: I am not sure that this assessment for why AlphaFold has difficulty with antibody-antigen interactions is correct. If the authors' interpretation is correct (larger complicated structures are more challenging to move) then AlphaFold-Multimer (https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2.full) wouldn't perform as well as it does. Instead, the issue is likely due to the incredibly high diversity in antibody CDR loops, which reduces the ability of the AlphaFold MSA step (which the authors show is quite critical to predictions: Figure S13) to inform structure prediction. This, coupled with the importance of side chain placement in antibody and TCR interactions, which is notoriously difficult (https://elifesciences.org/articles/90681), are likely the largest source of uncertainty in antibody-antigen interaction prediction.

      We agree with the reviewer that CDR loop diversity (and the associated side-chain placement challenges) is a major barrier to successfully predicting antibody-antigen complexes. Presumably this is true for both peptide antigens and protein antigens. Indeed, the authors of AlphaFold-Multimer acknowledge that the updated model struggles with antibody-antigen complexes, saying “As a limitation, we observe anecdotally that AlphaFold-Multimer is generally not able to predict binding of antibodies and this remains an area for future work.” The point about how loop diversity could reduce MSA quality is well taken. Thanks to the reviewer's guidance, we have added the following to the later discussion of MSA sensitivity (lines 570-572):

      “These challenges are presumably compounded by the incredible diversity of the CDR loops in antibodies which could decrease the useful signal from the MSA as well as drive inconsistent MSA-dependent performance”.

      With respect to lines 493-508, we have also rephrased a key sentence to better explain that we are comparing the often-good recognition performance for short epitopes to the never-good performance when those epitopes are embedded within larger sequences. Instead of saying “In contrast, a larger and complicated structure may be more challenging to move during the AlphaFold2 structure prediction or recycle steps,” we now say (lines 520-522) “In contrast, embedding the epitope within a larger and more complicated structure appears to degrade the ability of AlphaFold2 to sample a comparable bound structure within the allotted recycle steps.”

      (7) Related to major comment 1: Are AlphaFold predictions deterministic? That is, if you run the same peptide through the PAbFold pipeline 20 times, will you get the same pLDDT score 20 times? The lack of reproducibility may be in part due to stochasticity in AlphaFold, which the authors could actually leverage to provide more consistent results.

      This is a good question that we addressed while dissecting the variable performance. When the random seed is fixed, AF2 returns the same prediction every time. After running this 10 times with a fixed seed, the mBG17 epitope was predicted with an average pLDDT of 88.94, with a standard deviation of 1.4 x 10<sup>-14</sup>. In contrast, when no seed is specified, AF2 did not return an *identical* result. However, the results were still remarkably consistent: running the mBG17 epitope prediction 10 times, each with a different seed, gave an average pLDDT of 89.24, with a standard deviation of 0.49.
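
      The seed comparison above reduces to summary statistics over repeated runs. A minimal sketch, assuming the per-run average peptide pLDDT values have already been collected into a list; the text does not state whether the reported spread is a sample or population standard deviation (this sketch uses the sample formula), and the tolerance used to call runs "identical" is an arbitrary, hypothetical choice.

```python
import statistics


def summarize_runs(plddts, tol=1e-9):
    """Return (mean, sample standard deviation, effectively_identical).

    `tol` is an arbitrary threshold: a spread below it (such as
    floating-point noise on the order of 1e-14) is treated as identical output.
    """
    if len(plddts) < 2:
        raise ValueError("need at least two runs to estimate spread")
    mean = statistics.mean(plddts)
    spread = statistics.stdev(plddts)
    return mean, spread, spread < tol
```

      Fixed-seed runs would be expected to fall below the tolerance, while runs with different seeds would report a small but nonzero spread.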

      (8) Related to major comment 2: The authors could use, for example, this previous survey of 1833 antibody-antigen interactions (https://www.sciencedirect.com/science/article/pii/S2001037023004725) the authors could likely pull out multiple linear epitopes to test AlphaFold's performance on antibody peptide interactions. A large number of tests are necessary for validation.

      We thank the reviewer for this report of antibody-antigen interactions and will use it as a source of complexes in a future expanded study. Given the quantity and complexity of the data that we are already providing, as well as logistical challenges for compute and personnel the reviewer is asking for, we must defer this expansion to future work.

      (9) Related to major comment 3: Apologies if this is too informal for a review, but this Issue on the AlphaFold GitHub may be useful: https://github.com/googledeepmind/alphafold/issues/416 .

      We thank the reviewer for the suggestion; per our response above, we have indeed run predictions with no templates. Since we are using local AlphaFold2 calculations with localcolabfold, using or not using templates is simple: the --templates flag is either included or omitted.

      (10) Related to major comment 4: I am not sure if AlphaFold outputs by-residue secondary structure prediction by default, but I know that Phyre2 does http://www.sbg.bio.ic.ac.uk/~phyre2/html/page.cgi?id=index .

      To our knowledge, AF2 does not predict secondary structure independently of the predicted tertiary structure. When we need to analyze secondary structure, we typically run DSSP on the predicted tertiary structure.
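
      DSSP assigns one secondary-structure code per residue; in its standard alphabet, H, G, and I denote alpha, 3-10, and pi helices. Given the DSSP string for the peptide chain (obtained separately by running DSSP on the predicted model), the per-residue track the reviewer asks about could be derived as in the sketch below. Counting all three helix types as "helical" is an assumption, and the helper names are hypothetical.

```python
HELIX_CODES = frozenset("HGI")  # DSSP: alpha helix (H), 3-10 helix (G), pi helix (I)


def helix_track(dssp: str) -> list:
    """Per-residue booleans: True where DSSP assigned a helical state."""
    return [code in HELIX_CODES for code in dssp]


def helix_fraction(dssp: str) -> float:
    """Fraction of residues in a helical state ('-' or ' ' mean no assignment)."""
    if not dssp:
        raise ValueError("empty DSSP string")
    return sum(helix_track(dssp)) / len(dssp)
```

      Plotting `helix_track` alongside per-residue pLDDT would separate confidence in the peptide's own fold from confidence in its docking.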

      (11) The documentation for this software is incomplete. The GitHub ReadMe should include complete guidelines for users with details of expected outputs, along with a thorough step-by-step walkthrough for use.

      We thank the reviewer for pointing this out, but we feel that the level of detail we provide in the GitHub is sufficient for users to utilize the method described.

      Stylistic comments:

      (1) I do not think that the heatmaps (as in 1C, top) add much information for the reader. They are largely uniform across the y-axis (to my eyes), and the information is better conveyed by the bar and line graphs (as in 1C, middle and bottom panels).

      We thank the reviewer for this feedback but elect to leave it in on the premise of more data presented is (usually) better. Including the y-axis reveals common patterns such as the lower confidence of the peptide termini, as well as the lack of some patterns that might have occurred. For example, if a subset of five contiguous residues was necessary and sufficient for local high confidence this could be visually apparent as a “staircase” in the heat map.

      (2) A discussion of some of the shortcomings of other prediction-based software (lines 71-77) might be useful. Why are these tools less well-equipped than AlphaFold for this problem? And if they have tried to predict antibody-antigen interactions, why have they failed?

      We agree with the reviewer that a broader review of multiple methods would be interesting and useful. One challenge is that the suite of available methods is evolving rapidly, though only a subset work for multimeric systems. Some detail on deficiencies of other approaches was provided in lines 71-77 originally, although we did not go into exhaustive detail since we wanted to focus on AF2. We view using AF2 in this manner as novel, and we believe that providing additional options to predict antibody epitopes will be of interest to the scientific community. We also chose AF2 because we have ample experience with it, and it is software that many in the scientific community already use and are comfortable with. Additionally, AF2 provided us with a quantification parameter (pLDDT) to assess the peptides’ binding abilities. We think a future study that compares the ability of multiple emerging tools for scFv:peptide prediction will be quite interesting.

      (3) Similar to the above comment, more discussion focused on why AlphaFold2 fails for antibodies (lines 126-128) might be useful for readers.  

      We thank the reviewer for the suggestion. The following line has been added shortly after lines 135-137:

      “Another reason for selecting AF2 is to attempt to quantify its abilities with simple linear epitopes, since the team behind AF-multimer reported that conformational antibody complexes were difficult to predict accurately (14).”

      Per earlier responses, we also added text that flags one particular possible reason for the general difficulty of predicting antibody-antigen complexes (the diversity of the CDR loops and associated MSA challenges).

      (4) The first two paragraphs of the results section (lines 226-254) could likely be moved to the Methods. Additionally, details of how the scores are calculated, not just how the commands are run in Python, would be useful.

      Per the reviewer suggestion, we moved this section to the end of the Methods section. Also, to aid in the reader’s digestion of the analysis, the following text has been added to the Results section (lines 256-264):

      “Both the ‘Simple Max’ and ‘Consensus’ methods were calculated first by parsing every pLDDT score received by every residue in the antigen sequence sliding window output structures. From the resulting data structure, the Simple Max method simply finds the maximum pLDDT value ever seen for a single residue (across all sliding windows and AF2 models). For the Consensus method, per-residue pLDDT was first averaged across the 5 AF2 models. These averages are reported in the heatmap view, and further averaged per sliding window for the bar chart below.

      In principle, the strategy behind the Consensus method is to take into account agreement across the 5 AF2 models and provide insight into the confidence of entire epitopes (whole sliding windows of n=10 default) instead of disconnected, per-residue pLDDT maxima.” 
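
      The two scores described in this passage can be sketched as follows. This is an illustration under stated assumptions, not the PAbFold source: the parsed values are assumed to sit in a dictionary `window_plddt[start][model]`, the per-residue pLDDT list for the peptide beginning at antigen position `start` (0-based) as predicted by one AF2 model; `sliding_windows`, `simple_max`, and `consensus` are hypothetical names.

```python
from statistics import mean


def sliding_windows(antigen, n=10):
    """All contiguous peptides of length n, one per start position (0-based)."""
    return [antigen[i:i + n] for i in range(len(antigen) - n + 1)]


def simple_max(window_plddt):
    """Simple Max: the highest pLDDT ever seen for each antigen residue.

    Returns {antigen position: max pLDDT across all windows and models}.
    """
    best = {}
    for start, models in window_plddt.items():
        for per_residue in models:
            for offset, value in enumerate(per_residue):
                pos = start + offset
                best[pos] = max(best.get(pos, float("-inf")), value)
    return best


def consensus(window_plddt):
    """Consensus: average each residue over models, then average per window.

    Returns (heatmap, scores): heatmap[start] is the model-averaged
    per-residue row for that window; scores[start] is that row's mean.
    """
    heatmap, scores = {}, {}
    for start, models in window_plddt.items():
        row = [mean(column) for column in zip(*models)]  # one value per residue
        heatmap[start] = row
        scores[start] = mean(row)
    return heatmap, scores
```

      Nothing here fixes the number of AF2 models at five; each window simply averages over however many model outputs it holds.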

      (5) Figure 1 would be more useful if you could differentiate specifically how the Consensus and Simple Max scoring is different. Providing examples for how and why the top 5 peptide hits can change (quite significantly) using both methods would greatly help readers understand what is going on.

      Per the reviewer suggestion, we have added text to discuss the variable hit selection that results from the two scoring metrics. The new text (lines 264-271) adds onto the added text block immediately above:

      “Having two scoring metrics is useful because the selection of predicted hits can differ. As shown in Figure 2, part of the Myc epitope makes it into the top 5 peptides when selection is based on summing per-residue maximum pLDDT (despite there being no requirement that these values originate in the same physical prediction). In contrast, a Consensus method score more directly reports on a specific sliding window, and the strength of the highest confidence peptides is more directly revealed with superior signal to noise as shown in Figure 3. Variability in the ranking of top hits between the two methods arises from the fundamental difference in strategy (peptide-centric or residue-centric scoring) as well as close competition between the raw AF2 confidence in the known peptide and competing decoy sequences.”
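
      The divergence between the two rankings can be reproduced with a toy example. The sketch assumes the parsed pLDDT values sit in a dictionary `window_plddt[start][model]`, the per-residue pLDDT list for the window starting at antigen position `start` from one AF2 model. Ranking windows by the sum of per-residue maxima follows the description above; the function name and the `k=5` default are illustrative.

```python
from statistics import mean


def rank_windows(window_plddt, n, k=5):
    """Return (top-k window starts by summed per-residue max, top-k by Consensus)."""
    # Best pLDDT ever seen for each antigen residue, across all windows and models.
    best = {}
    for start, models in window_plddt.items():
        for per_residue in models:
            for offset, value in enumerate(per_residue):
                pos = start + offset
                best[pos] = max(best.get(pos, float("-inf")), value)

    # Residue-centric: sum the maxima over the residues a window covers, even
    # when those maxima came from different windows or models.
    simple = {start: sum(best[start + i] for i in range(n)) for start in window_plddt}

    # Peptide-centric: average each residue over models, then over the window.
    consensus = {
        start: mean(mean(column) for column in zip(*models))
        for start, models in window_plddt.items()
    }

    def top(scores):
        return sorted(scores, key=scores.get, reverse=True)[:k]

    return top(simple), top(consensus)
```

      With `{0: [[95, 40], [40, 40]], 1: [[70, 70], [70, 70]]}` and `n=2`, the residue-centric score favors window 0 (one model's high value at its first residue plus a maximum borrowed from window 1), whereas Consensus favors window 1, whose models agree.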

      (6) Hopefully the reproducibility issue is alleviated, but if not the discussion of it (lines 523-554) should be moved to the supplement or an appendix.

      The ability of the original AF2 model to predict protein-protein complexes was an emergent behavior, and then an explicit training goal for AF2-Multimer. In this vein, the ability to predict scFv:peptide complexes is also an emergent capability of these models. It is our hope that by highlighting this capacity, as well as its high sensitivity, this capability will be enhanced and not degraded in future models/algorithms (both general and specialized). In this regard, with an eye towards progress, we think it is actually important to put this issue in the scientific foreground rather than the background. When it comes to improving machine learning methods, negative results are also exceedingly important.

      Reviewer 2 (Recommendations for the Author):

      - Line 113, page 3: change “the structures of the novel scFv chimeras can be rapidly and confidently be predicted by AlphaFold2” to “the structures of the novel scFv chimeras can be rapidly and confidently predicted by AlphaFold2.”

      The superfluous “be” was removed from the text.

      - Line 276 and 278 page 9 - peptide sequences QKLSEEDLL and EQKLSEEDL in the text are different from the sequences reported in Figures 1 and 2 (QKLISEEDLL and EQKLISEEDL). Please check throughout the manuscript and also in the Figure caption (as in Figure 2).

      These changes were made throughout the text. 

      - I would include how you calculate the pLDDT score for both Simple Max approach and Consensus analysis.

      Good suggestion, this should be covered via the additions noted above.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors bring together implanted radiofrequency coils, high-field MRI imaging, awake animal imaging, and sensory stimulation methods in a technological demonstration. The results are very detailed descriptions of the sensory systems under investigation.

      Strengths:

      - The maps are qualitatively excellent for rodent whole-brain imaging.

      - The design of the holder and the coil is pretty clever.

      Weaknesses:

      - Some unexpected regions appear on the whole brain maps, and the discussion of these regions is succinct.

      - The authors do not make the work and effort to train the animals and average the data from several hundred trials apparent enough. This is important for any reader who would like to consider implementing this technology.

      - The data is not available. This does not let the readers make their own assessment of the results.

      Thank you for the comments on this manuscript. We have provided a more detailed discussion of the unexpected regions (page 18, lines 491-494) and of the training procedures (pages 7-9, lines 172-236). We also uploaded the datasets to OpenNeuro (Whisker: https://doi.org/10.18112/openneuro.ds005496.v1.0.1; Visual: https://doi.org/10.18112/openneuro.ds005497.v1.0.0) and to Zenodo (SNR line profile data and data processing scripts: https://zenodo.org/doi/10.5281/zenodo.13821455).

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Hike et al. entitled 'High-resolution awake mouse fMRI at 14 Tesla' describes the implementation of awake mouse BOLD-fMRI at high field. This work is timely as the field of mouse fMRI is working toward collecting high-quality data from awake animals. Imaging awake subjects offers opportunities to study brain function that are otherwise not possible under the more common anesthetized conditions, not to mention the confounding effects that anesthesia has on neurovascular coupling. What has made progress in this area slow (relative to other imaging approaches like optical imaging) is the environment within the MRI scanner (high acoustic noise), as well as the intolerance of head and body motion. This work adds to a relatively small, but quickly growing literature on awake mouse fMRI. The findings in the study include testing of an implanted head-coil (for MRI data reception). Two designs are described and the SNR of these units at 9.4T and 14T is reported. Further, responses to visual as well as whisker stimulation recorded in acclimated awake mice are shown. The most interesting finding, and most novel, is the observation that mice seem to learn to anticipate the presentation of the stimulus, as demonstrated by activations evident ~6 seconds prior to the presentation of the stimulus when stimuli are delivered at regular intervals (but not when stimuli are presented at random intervals). These kinds of studies are very challenging to do. The surgical preparation and length of time invested into training animals are grueling. I also see this work as a step in the right direction and evidence of the foundations for lots of interesting future work. However, I also found a few shortcomings listed below.

      Weaknesses:

      (1) The surface coil, although offering a great SNR boost at the surface, ultimately comes at a cost of lower SNR in deeper, more removed brain regions in comparison to commercially available Bruker coils (at room temperature). This should be quantified. A rough comparison in SNR is drawn between the implanted coils and the Bruker Cryoprobe; this should be a quantitative comparison (if possible), including any differences in SNR in deeper brain structures. There are drawbacks to the Cryoprobe, which can be discussed, but a more thorough comparison between the implanted coils and other existing options should be provided (the Cryoprobe has been used previously in awake mouse experiments: Sensory evoked fMRI paradigms in awake mice, Chen; Physiological effects of a habituation procedure for functional MRI in awake mice using a cryogenic radiofrequency probe, Yoshida; PREVIOUS REFERENCE). Further, the details of how to build the implanted coils should be provided (shared); this should include a parts list as well as detailed instructions on how to build the units. Also, how expensive are they? And can they be reused?

      Thank you for the comment. We did not use a Bruker Cryoprobe for this work but rather a Bruker 4-array surface coil. We are unable to compare to a cryoprobe since we do not have access to one for our system. A comparison to previously published data acquired on different scanners could be possible, but it would require sequences with identical parameters to avoid introducing an uncontrollable variable. We are planning to recruit other laboratories to test the implanted RF coils against their existing cryoprobes in a future study.

      We have included an updated figure comparing SNR at different depths across the Bruker 4-array coil and the implanted RF coils. As shown in Supplementary Figure 7B, there is significant SNR enhancement up to 4 mm cortical depth for both the single-loop and figure-8 implanted RF coils compared with the Bruker 4-array coil.

      Author response image 1.

      Comparison between implanted and commercial coils. A shows representative coils in the single-loop (left) and figure 8 (right) styles. Supplementary Table 1 provides a parts list and cost for making these coils, and Supplementary Figure 1 provides a circuit diagram for assembly. B presents the SNR line-profile values as a function of distance from the pia mater for each coil tested at 9.4T: the commercial phased-array surface coil (4 Array), the implanted single loop, and the implanted figure 8. SNR values were calculated by dividing the signal by the standard deviation of the noise. C-E show representative FLASH images with the SNR line profiles from each coil used to create the graph in B; the improvement in SNR is clearly visible. C: commercial phased array. D: single loop at 9.4T. E: figure 8 at 9.4T. (n = 6 for the 4 array, n = 5 for the single loop, n = 5 for the figure 8.)
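      To make the SNR definition in the caption concrete, the following is a minimal NumPy sketch; it is not the authors' processing script (those are shared on Zenodo). It assumes a 2D magnitude image, a hypothetical list of voxel rows in one column ordered from the pial surface downward, a hypothetical signal-free background ROI, and an illustrative voxel size. The helper name `snr_line_profile` is our own, and no Rician (magnitude-image) noise correction is applied, following the caption's definition literally.

```python
import numpy as np

def snr_line_profile(image, rows, col, noise_roi, voxel_mm):
    """Return (depth_mm, snr) along one image column.

    Hypothetical helper: SNR = voxel signal / standard deviation of a
    signal-free background region. No Rician noise correction is applied.
    """
    image = np.asarray(image, dtype=float)
    noise_sd = np.std(image[noise_roi])          # SD of background noise
    signal = image[np.asarray(rows), col]        # voxel values along the line
    depth_mm = np.arange(len(rows)) * voxel_mm   # distance from the first (pial) voxel
    return depth_mm, signal / noise_sd

# Synthetic example: uniform "tissue" signal of 10 and a 2x2 background
# patch whose values (0, 2, 2, 0) have a standard deviation of exactly 1.
img = np.full((10, 10), 10.0)
img[0:2, 0:2] = [[0.0, 2.0], [2.0, 0.0]]
depth, snr = snr_line_profile(img, rows=[5, 6, 7], col=5,
                              noise_roi=(slice(0, 2), slice(0, 2)),
                              voxel_mm=0.05)
print(depth, snr)  # snr -> [10. 10. 10.]
```

      In practice the line would be drawn perpendicular to the cortical surface and the noise ROI placed outside the head; both choices here are placeholders.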

      Additionally, we have added a supplementary figure (Supp Fig 1) with a circuit diagram, in an effort to disseminate the prototype coil design to other laboratories. We have also included a detailed parts list with the cost of constructing the coils configured for our scanner (Supp Table 1). These specifications would, however, need to be adjusted to the field strength, bore size, and animal for which the coil is built. As for reusability, the copper wire is cemented to the animal's skull, so the implanted coil should be considered a consumable for awake mouse experiments, although the PCB components can be recovered.

      (2) In the introduction, the authors state that "Awake mouse fMRI has been well investigated". I disagree with this statement and others in the manuscript that give the reader the impression that awake experiments are not a challenging and unresolved approach to fMRI experiments in mice (or rodents). Although there are multiple labs (maybe 15 worldwide) that have conducted awake mouse experiments (with varying degrees of success/thoroughness), we are far from a standardized approach. This is a strength of the current work and should be highlighted as such. I encourage the authors to read the recent systematic review that was published on this topic in Cerebral Cortex by Mandino et al. There are several elements in there that should influence the tone of this piece, including awake mouse implementations with the Bruker Cryoprobe, prevalence of surgical preparations, and evaluations of stress.

      Thank you for the comment. We agree with the reviewer that awake mouse fMRI still needs to be improved, and we have revised the Introduction to highlight the current state of the art of awake mouse fMRI (Page 4, lines 81-88).

      (3) The authors also comment on implanted coils reducing animal stress - I don't know where this comment is coming from, as this has not been reported in the literature (to my knowledge) and the authors don't appear to have evaluated stress in their mice. 

      Since questions 3 and 4 are closely related to the acclimation procedure, we answer them together.

      (4) Following on the above point, measures of motion, stress, and more details on the acclimation procedure that was implemented in this study should be included.

      We thank the reviewer for raising the issue of animal training.

      During animal training, we measured both pupil dynamics and eye motion features in the training sessions; the detailed procedure is described in the Methods (pages 7-9, lines 172-236).

      The training procedure was carried out over a total of 5 weeks in four phases: i. holding the animal in hands; ii. head fixation and pupillometry; iii. head fixation and pupillometry with mock MRI acoustic exposure; iv. head fixation and pupillometry with echo-planar imaging (EPI) in the MR scanner.

      Author response table 1.

      As shown in Supp Fig 2B, the spectral power of pupil dynamics (<0.02Hz) and eye movements gradually increased as a function of the training time for head-fixed mice exposed to the mock MRI acoustic environment during phase 3.  In phase 4, when head-fixed mice were put into the scanner for the first time, both eye movements and pupil dynamics were initially reduced during scanning but recovered to an acclimated state on Day 2, similar to the level on Day 8 of phase 3.  These behavioral outputs would provide an alternative way to monitor the stress levels of the mice. 

      Author response image 2.

      The eye movements (A) and power spectra of pupil dynamics (<0.02Hz) (B) change during different training phases.
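      The "spectral power of pupil dynamics (<0.02Hz)" can be illustrated with a simple periodogram. The NumPy sketch below is an assumption about how such a band power could be computed, not the pipeline used in this study: it takes a hypothetical, evenly sampled pupil-diameter trace, removes the mean, and integrates a one-sided periodogram over frequencies above 0 Hz and below the 0.02 Hz cutoff. The sampling rate, trace length, and function name are illustrative; a windowed Welch estimate would be a common, more robust alternative.

```python
import numpy as np

def low_freq_band_power(trace, fs_hz, cutoff_hz=0.02):
    """Integrated one-sided periodogram power in the band (0, cutoff_hz) Hz.

    Sketch only: plain FFT periodogram, no windowing or segment averaging.
    Assumes an evenly sampled trace and cutoff_hz < fs_hz / 2.
    """
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()                                 # remove the DC offset
    n = x.size
    freqs = np.fft.rfftfreq(n, d=1.0 / fs_hz)
    # One-sided PSD; the factor 2 is exact for every bin except DC and
    # Nyquist, both of which fall outside the band selected below.
    psd = 2.0 * np.abs(np.fft.rfft(x)) ** 2 / (fs_hz * n)
    band = (freqs > 0) & (freqs < cutoff_hz)
    return psd[band].sum() * (freqs[1] - freqs[0])   # sum times frequency step

# Synthetic example: 1000 s sampled at 10 Hz (hypothetical values).
t = np.arange(10000) / 10.0
slow = np.sin(2 * np.pi * 0.01 * t)  # ultra-slow fluctuation, inside the band
fast = np.sin(2 * np.pi * 0.5 * t)   # faster fluctuation, outside the band
print(low_freq_band_power(slow, 10.0))  # about 0.5, the variance of a unit sine
print(low_freq_band_power(fast, 10.0))  # about 0
```

      Comparing this band power across training days would give the kind of trend described for phase 3, under the assumptions stated above.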

      It should be noted that stress may be associated with an increased frequency of eye blinking or twitching movements in human subjects(1–3). In contrast, eyeblinks in head-fixed mice have been used for behavioral conditioning to investigate motor learning in normally behaving mice(4–6). Importantly, head-fixed mouse studies have shown that eye movements are significantly reduced compared with freely moving mice(7). The increase in eye movements during the acclimation process may therefore indicate a reduced stress level in our head-fixed mice. Meanwhile, stress-related pupil dilation could dominate pupil dynamics in the early phase of training(8). We observed a gradual increase in the ultra-slow-frequency power of pupil dynamics during phase 3, suggesting that stress-related pupil dilation was alleviated and that pupil dynamics again reflected other factors, including arousal, locomotion, and startle, as in normally behaving mice. Although our training procedure is more extensive than those of existing awake mouse fMRI studies (training strategies and overall training durations of existing studies have been reviewed by Mandino et al.(9)), stress remains a confounding factor for brain functional mapping in head-fixed mice. In particular, a recent study(10) showed that the blood corticosterone concentration of head-fixed mice is significantly reduced by Day 25 of training but remains higher than that of control mice. In the Discussion section, we have addressed potential stress-related confounding factors for awake mouse fMRI studies (Page 16, lines 436-458).

      (1) A. Marcos-Ramiro, D. Pizarro-Perez, M. Marron-Romera, D. Gatica-Perez, Automatic blinking detection towards stress discovery. ICMI 2014 - Proceedings of the 2014 International Conference on Multimodal Interaction 307–310 (2014). https://doi.org/10.1145/2663204.2663239.

      (2) M. Haak, S. Bos, S. Panic, L. Rothkrantz, Detecting stress using eye blinks and brain activity from EEG signals. Lance 21, 76 (2009).

      (3) E. Del Carretto Di Ponti E Sessam, Exploring the impact of Stress and Cognitive Workload on Eye Movements: A Preliminary Study. (2023).

      (4) S. A. Heiney, M. P. Wohl, S. N. Chettih, L. I. Ruffolo, J. F. Medina, Cerebellar-dependent expression of motor learning during eyeblink conditioning in head-fixed mice. J Neurosci 34, 14845–14853 (2014).

      (5) S. N. Chettih, S. D. McDougle, L. I. Ruffolo, J. F. Medina, Adaptive timing of motor output in the mouse: The role of movement oscillations in eyelid conditioning. Front Integr Neurosci 5, 12996 (2011).

      (6) J. J. Siegel, et al., Trace Eyeblink Conditioning in Mice Is Dependent upon the Dorsal Medial Prefrontal Cortex, Cerebellum, and Amygdala: Behavioral Characterization and Functional Circuitry. eNeuro 2, 51–65 (2015).

      (7) A. F. Meyer, J. O’Keefe, J. Poort, Two Distinct Types of Eye-Head Coupling in Freely Moving Mice. Current Biology 30, 2116-2130.e6 (2020).

      (8) H. Zeng, Y. Jiang, S. Beer-Hammer, X. Yu, Awake Mouse fMRI and Pupillary Recordings in the Ultra-High Magnetic Field. Front Neurosci 16, 886709 (2022).

      (9) F. Mandino, S. Vujic, J. Grandjean, E. M. R. Lake, Where do we stand on fMRI in awake mice? Cereb Cortex 34 (2024).

      (10) K. Juczewski, J. A. Koussa, A. J. Kesner, J. O. Lee, D. M. Lovinger, Stress and behavioral correlates in the head-fixed method: stress measurements, habituation dynamics, locomotion, and motor-skill learning in mice. Scientific Reports 10, 1–19 (2020).

      (5) It wasn't clear to me at what times the loop versus "Figure 8" coil was being used, nor how many mice (or how much data) were included in each experiment/plot. There is also no mention of biological sex.

      Thank you for the comment. We have now specified the sex and number of animals. The figure 8 coil was used only during development, to show the improvement of the coil design for cortical measurements. The details are described in the Methods (Page 6, lines 127-129 and Page 10, lines 269-270). Additionally, animal numbers have been included in the figure captions.

      (6) Building on the points above, the manuscript overall lacks experimental detail (especially since the format has the results prior to the methods).

      Thank you for the comment. We have modified the manuscript to increase the experimental detail and moved the methods section before the results.

      (7) An observation is made in the manuscript that there is an appreciable amount of negative BOLD signal. The authors speculate that this may come from astrocyte-mediated BOLD during brain state changes (and cite anesthetized rat and non-human primate experiments). This is very strange to me. First, the negative BOLD signal is not plotted (please do this), further, there are studies in awake mice that measure astrocyte activation eliciting positive BOLD responses (see Takata et al. in Glia, 2017).

      We thank the reviewer for raising the issue of negative BOLD fMRI signals. We have added a subplot of the negative BOLD signal changes to the revised Figure 4. These negative BOLD signals across cortical areas could be coupled with brain state changes upon air-puff-induced startle responses, and our future studies will focus on elucidating the brain-wide activity changes of awake mice with fMRI. We also provide a detailed discussion of the potential mechanisms underlying the negative BOLD fMRI signals. First, as reported in the paper suggested by the reviewer, astrocytic Ca2+ transients coincide with positive BOLD responses in activated cortical areas, in line with the neurovascular coupling (NVC) mechanism. However, emerging evidence shows that astrocytic Ca2+ transients are coupled with both positive and negative BOLD responses in anesthetized rats(11) and awake mice(12). An intriguing observation is that cortex-wide negative BOLD signals coupled with spontaneous astrocytic Ca2+ transients can co-exist with positive BOLD signals detected in the activated cortex. Studies have shown that astrocytes are involved in regulating brain state changes(13), in particular during locomotion(14) and startle responses(15). These brain state-dependent global negative BOLD responses are also related to arousal changes in both non-human primates(16) and human subjects(17). The established awake mouse fMRI platform, with its ultra-high spatial resolution, will enable brain-wide activity mapping of the functional nuclei contributing to brain state changes in head-fixed awake mice in future studies. (Page 17-18, lines 478-490)

      (11) M. Wang, Y. He, T. J. Sejnowski, X. Yu, Brain-state dependent astrocytic Ca2+ signals are coupled to both positive and negative BOLD-fMRI signals. Proc Natl Acad Sci U S A 115, E1647–E1656 (2018).

      (12) C. Tong, Y. Zou, Y. Xia, W. Li, Z. Liang, Astrocytic calcium signal bidirectionally regulated BOLD-fMRI signals in awake mice in Proc. Intl. Soc. Mag. Reson. Med. 32, (2024).

      (13) K. E. Poskanzer, R. Yuste, Astrocytes regulate cortical state switching in vivo. Proc Natl Acad Sci U S A 113, E2675–E2684 (2016).

      (14) M. Paukert, et al., Norepinephrine controls astroglial responsiveness to local circuit activity. Neuron 82, 1263–1270 (2014).

      (15) R. Srinivasan, et al., Ca2+ signaling in astrocytes from IP3R2−/− mice in brain slices and during startle responses in vivo. Nat Neurosci 18, 708 (2015).

      (16) C. Chang, et al., Tracking brain arousal fluctuations with fMRI. Proc Natl Acad Sci U S A 113, 4518–4523 (2016).

      (17) B. Setzer, et al., A temporal sequence of thalamic activity unfolds at transitions in behavioral arousal state. Nat Commun 13 (2022).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I really enjoyed this work. The maps shown are among the best-quality maps out there. Here are suggestions to the authors.

      (1) Both the ACA and VRA are rather unexpected. The authors explain these briefly as being part of the associative cortical areas. Both the ACA and VRA are not canonical associative areas (or at least not to us). This warrants a stronger discussion.

      To support the classification of ACA and VRA as associative areas, we provide connectivity map projections from the Allen Brain Atlas (see below). These projections are derived from Cre-dependent AAV tracing of axonal projections. We have included an explanation of this in the Introduction.

      Author response image 3.

      Representative images are shown indicating connections between the barrel cortex and retrosplenial area from an injection in the barrel cortex (left panel), as well as the visual cortex and cingulate connection from an injection in the visual cortex (right panel). Images are of connectivity map projections from the Allen Brain Atlas derived from Cre-dependent AAV tracing of axonal projections.

      (2) This is a lot of work. But looking at the figures, this is not obvious. We read in the caption that several hundred trials were used. It would be good to also specify how many mice. It would be clearer to represent this info in the figure as well to support the fact that this is not a trivial acquisition.

      We thank the reviewer for raising this point. We have edited the figures to include the number of mice and have also reported these numbers in the text.

      (3) The training protocol is seemingly extensive, but this is only visible by following another reference. Including a description in this work would help the reader make sense of the effort that went into this work.

      We thank the reviewer for raising this point. We now describe the training method used in this study in more detail (pages 7-9, lines 172-236).

      (4) I really would love to see that dataset made freely available - this should be the norm.

      The datasets have been uploaded to OpenNeuro (whisker: https://doi.org/10.18112/openneuro.ds005496.v1.0.1; visual: https://doi.org/10.18112/openneuro.ds005497.v1.0.0), and the SNR line-profile data and data-processing scripts have been uploaded to Zenodo (https://zenodo.org/doi/10.5281/zenodo.13821455) (page 21, lines 573-579).

      Reviewer #2 (Recommendations For The Authors):

      (1) I'm a little confused about the stimulation paradigm and its effect of causing an effective 2-second TR (which is on the long side); please elaborate (a figure might be helpful). The paradigm for visual stimulation also seems elaborate; can you please explain the logic and how it was developed?

      Thank you for raising these questions about the stimulation paradigm. The stimulation paradigm is independent of, and does not interfere with, the effective 2-second TR. The 2-second TR results from using a 2-segment EPI, with each segment acquired with a TR of 1 second. The 2-segment acquisition allows an echo spacing of 0.52 ms with an effective image bandwidth of 3858 Hz, which reduces image distortion. The stimulation paradigm was defined by an "8 s on, 32 s off" epoch to elicit a strong BOLD response and could be used with any reasonable TR.
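      The relationship between the segmented acquisition and the stimulation timing can be sketched as follows. This Python example is illustrative only: the helper names are hypothetical, and it assumes that volume acquisition starts together with the first "on" period and that every 40-second epoch begins with 8 seconds of stimulation followed by 32 seconds of rest.

```python
import numpy as np

def effective_tr(n_segments, tr_per_segment_s):
    """Effective volume TR of a segmented EPI acquisition (hypothetical helper)."""
    return n_segments * tr_per_segment_s

def block_design(n_epochs, on_s=8.0, off_s=32.0, tr_s=2.0):
    """Per-volume stimulus indicator (1 = stimulus on) for repeated on/off epochs."""
    epoch_s = on_s + off_s
    t = np.arange(0.0, n_epochs * epoch_s, tr_s)  # volume onset times in seconds
    return ((t % epoch_s) < on_s).astype(int)

tr = effective_tr(2, 1.0)          # 2 segments x 1 s each -> 2.0 s per volume
design = block_design(3, tr_s=tr)  # 3 epochs x 40 s = 120 s -> 60 volumes
print(tr, design.size, design[:20].tolist())
```

      Changing `tr_s` only changes how finely the same 8 s on / 32 s off timing is sampled, which is consistent with the point that the paradigm does not depend on the TR.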

      We have included a figure outlining the stimulation paradigm (Supp Fig. 3).

      (2) I had difficulties viewing the movies (on my MAC).

      Thank you for this note. We have re-uploaded the videos in .mov format.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is a valuable study that describes the effects of T. pallidum on neural development by applying single-cell RNA sequencing to an iPSC-derived brain organoid model. The evidence supporting the claims of the authors is solid, although further evidence to understand the differences in infection rates would strengthen the conclusions of the study. In particular, the conclusions would be strengthened by validating infection efficiency, as this can affect the interpretation of the single-cell sequencing results, by examining how these metrics affect organoid size, and by comparison with additional infectious agents. Furthermore, the validation of downstream effectors is inadequate and could be improved.

      Thank you very much for your valuable comments. Because this is the first time we have used the organoid model to investigate the effects of T. pallidum on brain development, the study design is not perfect. As you accurately point out, the paper lacks in-depth detail, especially verification of the T. pallidum infection rate. Your comments will be very useful for our future research. In addition, because validation of downstream effectors was inadequate, we performed further analysis of the single-cell sequencing data to strengthen our conclusions in the revised manuscript (see Figure 5F).

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is an interesting study by Xu et al. showing the effects of infection with Treponema pallidum (the bacterium that causes syphilis) on neuronal development, using iPSC-derived human brain organoids as a model together with single-cell RNA sequencing. This work provides an important insight into the impact of the pathogen on human development, bridging the gap between the phenomena observed in studies using animal models and non-invasive human studies showing developmental abnormalities in fetuses infected in utero through maternal vertical transmission.

      Using single-cell RNAseq in combination with qPCR and immunofluorescence techniques, the authors show that T. pallidum-infected organoids are smaller in size, in particular during later growth stages, contain a larger number of undifferentiated neuronal lineage cells, and exhibit decreased numbers of a specific neuronal subcluster, which the authors have identified as undifferentiated hindbrain neurons.

      The study is an important first step in understanding how T. pallidum affects human neuronal development and provides important insight into the potential mechanisms that underlie the neurodevelopmental abnormalities observed in infected human fetuses. Several important weaknesses have also been noted, which need to be addressed to strengthen the study's conclusions.

      Strengths:

      (1) The study is well written, and the data quality is good for the most part.

      (2) The study provides an important first step in utilizing human brain organoids to study the impact of T. pallidum infection on neuronal development.

      (3) The study's conclusions may provide important insight to other researchers focused on studying how viral infections impact neuronal development. 

      Thank you very much for your positive feedback. Below, we address your concerns point by point. We once again sincerely appreciate your time and effort in reviewing our manuscript.

      Weaknesses:

      (1) It is unclear how T. pallidum infection was validated in the organoids. If not all cells are infected, this could have important implications for the study's conclusions, in particular the single-cell RNAseq experiments. Were only cells showing the presence of the bacterium selected for sequencing? A detailed description of how infection was validated and of the process of selecting cells for RNAseq would strongly support the study's conclusions.

      Thank you for your valuable comment. We completely agree with your point: the infection rate of T. pallidum in brain organoids is a key factor that must be considered. We selected pluripotent stem cell-derived brain organoids to simulate foetal brain neurodevelopment and co-cultured them with T. pallidum to mimic T. pallidum invasion of brain tissue. Because brain organoids are three-dimensional structures formed by aggregated nerve cells, T. pallidum gradually invades the organoids from the periphery toward the center. Sufficiently long exposure increases the infection rate; however, the pathogen invades human cells selectively. If we selected only cells containing T. pallidum for sequencing, the simulation of "real-world" infection would be weakened. Selecting cells from intact organoids for sequencing, without excluding cells lacking T. pallidum, therefore better simulates the effect of T. pallidum infection on the nervous system. Of course, a blank control group must also be included.

      (2) The authors show that T. pallidum infection results in impaired development of hindbrain neurons. How does this finding compare to what has already been shown in animal studies? Is a similar deficit in this brain region observed with this specific pathogen? It would strengthen the study's conclusions if the authors added a discussion of the observed deficits in hindbrain neuronal development and of prior literature on similar studies conducted in animal models or human patients. Does T. pallidum preferentially target these neurons, or is this a limitation of the current organoid model system?

      Thank you for your valuable comments. The finding that T. pallidum infection impairs the development of hindbrain neurons has not been verified in animal experiments. Ideally, the organoid findings would be further validated through animal experiments. Unfortunately, owing to technical challenges, mature animal models for the study of congenital syphilis have not yet been developed. Although our team has been working on developing animal models of congenital neurosyphilis, progress is still unsatisfactory. After many years of effort in this field, we decided to use human brain organoids instead of animal models to study the impact of T. pallidum infection on neuronal development.

      We also reviewed prior literature on related findings in human patients. Dan Doherty et al. reported that patients with pontocerebellar hypoplasia develop microcephaly at birth or over time after birth (PMID: 23518331). Based on your constructive suggestions, we have added content related to the hindbrain to the “Discussion” section.

      Our study found that T. pallidum could inhibit the differentiation of subNPC1B in brain organoids, thereby reducing the differentiation of subNPC1B into hindbrain neurons and ultimately affecting the development and maturation of hindbrain neurons during pregnancy. Based on our results, T. pallidum does not preferentially target hindbrain neurons. Of course, the current organoid model system has limitations; see the “Limitations” section.

      PMID: 23518331- Dan Doherty et al, Midbrain and hindbrain malformations: advances in clinical diagnosis, imaging, and genetics.

      Revision in the “Discussion” section, line 343-352:

      “The vertebrate hindbrain contains a complex network of dedicated neural circuits that play an essential role in controlling many physiological processes and behaviors, including those related to the cerebellum, pons, and medulla oblongata (Shoja et al., 2018). Patients with pontocerebellar hypoplasia represent the less severe end of the spectrum with early hyperreflexia, developmental delay, and feeding problems, eventually developing spasticity and involuntary movements in childhood, while some patients represent the severe end of the spectrum characterised by polyhydramnios, severe hyperreflexia, contracture, and early death from central respiratory failure. Patients with pontocerebellar hypoplasia develop microcephaly at birth or over time after birth (Doherty et al., 2013).”

      (3) The authors show that T. pallidum-infected organoids are smaller in size by measuring organoid diameter during later stages of organoid growth, with no change during early stages. Does that represent insufficient infection at the early stages? Is this due to increased cell death or lack of cell division in the infected organoids? Experiments using IHC to quantify levels of cleaved caspase and/or protein markers for cell proliferation would be able to address these questions. 

      Thank you for your valuable suggestion. The concentration of T. pallidum in patients with syphilis is generally very low (PMID: 21752804, 35315702, 33099614). In this study, a low concentration of T. pallidum was applied to brain organoids to simulate early foetal transmission of syphilis. Because nerve cells form brain organoids mainly through adhesion-based intercellular connections, treatment with a high concentration of T. pallidum can easily cause organoids to dissociate and die. Furthermore, based on your suggestions, we performed additional immunostaining analyses to assess apoptosis in brain organoids infected with T. pallidum. Cleaved caspase 3 (clCASP3) staining showed that the number of apoptotic cells increased following T. pallidum infection; however, the proportion of apoptotic cells in both groups of brain organoids was very low (Figure supplement 2) (N=12 organoids, each group from three independent bioreactors), which would not be enough to affect the results of the experiment. This suggests that T. pallidum infection mainly inhibited the neural differentiation and development of brain organoids rather than promoting organoid apoptosis.

      PMID: 21752804-- Craig Tipple et al, Getting the measure of syphilis: qPCR to better understand early infection.

      PMID: 35315702-- Cuini Wang et al, Quantified Detection of Treponema pallidum DNA by PCR Assays in Urine and Plasma of Syphilis Patients.

      PMID: 33099614—Cuini Wang et al, A New Specimen for Syphilis Diagnosis: Evidence by High Loads of Treponema pallidum DNA in Saliva.

      Revision in the “Results” section, line 105-108:

      “… cleaved caspase 3 (clCASP3) staining showed that the number of apoptotic cells increased significantly following T. pallidum infection, but the proportion of apoptotic cells in both groups of brain organoids was very low (Figure supplement 2) (N=12 organoids, each group from three independent bioreactors) …”

      Revision in the “Materials and methods” section, line 446-447:

      “…anti-cleaved caspase 3 (rabbit, 1:100, Cell Signaling Technology, 9661S),”

      Revision in the “Supplementary File” section, line 78-81:

      Author response image 1.

      The number of clCASP3+ cells in the microscopic field of brain organoids. A nonparametric t-test was used to evaluate the statistical differences between the two groups. (**: P < 0.01).

      (4) In Figure 1D authors show differences in rosette-like structure in the infected organoids. The representative images do not appear to be different in any of the discussed components (e.g., the sox2 signal looks fairly similar between the two conditions). No quantification of these structures was presented. Authors should provide quantification or a more representative image to support their statement. 

      Thank you for your valuable suggestion. We have quantified the neural rosette structures and compared the number of intact rosette-like structures between the two groups (see Figure 1D in the revised manuscript).

      (5) The IHC images shown in Figures 3E, G, and Figure 4E look very similar between the two conditions despite the discussed decrease in the text. A more suitable representative image should be presented, or the analysis should be amended to reflect the observed results. 

      Thank you for your valuable suggestion. We have replaced the images in Figures 3E, 3G, and 4E with more representative ones.

      Reviewer #2 (Public Review):

      Summary:

      This study provides an important overview of infectious etiology for neurodevelopmental delay.

      Strengths:

      Strong RNA evaluation.

      Weaknesses:

      The study lacks an overview of other infectious agents. The study should address the epigenetic contributors (PMID: 36507115) and the role of supplements in improving outcomes (PMID: 27705610). 

      Addressing the above - with references included - is recommended. 

      Thank you for your valuable comment. Our research was mainly inspired by studies of other infectious agents, such as Zika virus, and the “Discussion” section of the manuscript describes Zika virus in detail to support our point of view (see pages 12–13). We were unable to retrieve the article with PMID 36507115; could you kindly confirm the PMID? We would be very grateful if you could provide the full text. We have also carefully read the article with PMID 27705610, a rich and comprehensive review, and have summarised and cited it in appropriate places in our manuscript.

      Revision in the “Discussion- limitation” section, line 375-379:

      “First, although several recent protocols have made use of growth factors to promote further neuronal maturation and survival (Lucke-Wold et al., 2018), the organoid culture scheme needs to be further improved owing to the lower percentage of mature neurons and the challenge of cell necrosis within the organoids at this stage in day 55 organoids.”

      Reviewer #3 (Public Review): 

      This article is the first report to study the effects of T. pallidum on the neural development of an iPSC-derived brain organoid model. The study indicates that T. pallidum inhibits the differentiation of subNPC1B neurons into hindbrain neurons, hence affecting brain organoid neurodevelopment. Additionally, the TCF3 and notch signaling pathways may be involved in the inhibition of the subNPC1B-hindbrain neuron differentiation axis. While the majority of the data in this study support the conclusions, there are still some questions that need to be addressed and data quality needs to be improved. The study provides valuable insights for future investigations into the mechanisms underlying congenital neurodevelopment disability. 

      I sincerely appreciate your comments on our paper. The comments have helped us greatly improve the quality of our paper. Thank you for your time and constructive critique.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Paired t-test analysis is not appropriate if two distinct groups are compared. 

      We sincerely apologize for the unclear presentation. We used a nonparametric t-test to compare the two groups, and we have checked and corrected the description of the statistical methods throughout the manuscript (see the revised “Materials and methods” section, lines 553-555, and the “Figures-legend” section, lines 789-790, 817-818, and 829-830).

      Reviewer #3 (Recommendations For The Authors): 

      (1) Can the authors explain why the mean size of organoids infected with T. pallidum is smaller?

      Thank you for your valuable comment. In our study, T. pallidum infection caused organisational changes in the neural rosette-like structures that resemble the proliferative regions of the human ventricular zone, resulting in fewer and incomplete rosette-like structures. Moreover, the ventricular zone is the main region where neural progenitor cells (NPCs) reside (PMID: 33838105), and our results showed that the proportion of NPC1 cells was reduced after T. pallidum infection. NPC depletion alters the size of the rosette-like structures. Therefore, the mean size of organoids infected with T. pallidum is smaller.

      Revision in the “Results” section, lines 101-104:

      “T. pallidum infection resulted in brain organisational changes in neural rosette-like structures resembling the proliferative regions of the human ventricular zone where NPC reside (Krenn et al., 2021), and caused fewer and incomplete rosette-like structures (P < 0.01) (Figure 1D)”

      (2) Why were HOXA5, HOXC5, and HOXA4 selected as the target genes for qRT-PCR validation?

      Thank you for your valuable comment. We performed qRT-PCR to verify the results of the scRNA-seq analysis. HOX family genes are key regulators of early hindbrain development: they are expressed in the hindbrain region from the gastrulation stage of early embryonic development, persist into the neural cell stage, and are essential for the correct induction and segmentation of the hindbrain (PMID: 2571936, 1983472, 1673098, 15930115). We therefore selected HOX family genes for verification.

      PMID 2571936: Wilkinson DG, et al. Segmental expression of Hox-2 homoeobox-containing genes in the developing mouse hindbrain.

      PMID 1983472: Frohman MA, et al. Isolation of the mouse Hox-2.9 gene; analysis of embryonic expression suggests that positional information along the anterior-posterior axis is specified by mesoderm.

      PMID 1673098: Murphy P, et al. Expression of the mouse labial-like homeobox-containing genes, Hox 2.9 and Hox 1.6, during segmentation of the hindbrain.

      PMID 15930115: McNulty CL, et al. Knockdown of the complete Hox paralogous group 1 leads to dramatic hindbrain and neural crest defects.

      (3) Why was qRT-PCR not employed in other experimental validations, but solely to validate early neural-specific transcription factor changes?

      Thank you for your valuable comment. We used qRT-PCR to validate changes in early neural-specific transcription factors, which confirmed the reliability of the scRNA-seq data. The validated scRNA-seq data were then used to analyze differences in other neural-specific genes, as shown in the violin plots and heatmap of differentially expressed genes (Figure 4D and Figure 5B, C). We also tested these findings with other methods, such as immunohistochemistry and flow cytometric screening.

      (4) The authors found that T. pallidum might reduce the differentiation from subNPC1B to hindbrain neurons by inhibiting subNPC1B differentiation in brain organoids. Why were the subNPC1B-specific markers declining?

      Thank you for your valuable comment. scRNA-seq was performed on whole brain organoids, and cell types were clustered according to the specific marker genes of each population. A decrease in the expression of the marker genes of a given cluster therefore indicates that the proportion of that cell population within the whole organoid is reduced. After T. pallidum infection, uniform manifold approximation and projection (UMAP) and clustering of the NPC1 population showed that T. pallidum reduced the size of the subNPC1B population, which accounts for the decrease in subNPC1B-specific markers.

      (5) In comparison to the other figures, Figure 5E letter size is excessively small and ambiguous.

      Thank you for your valuable comment. We have increased the font size in Figure 5E.

      (6) Figure 5E shows that more than one gene, not only TCF3, is specifically enriched in subNPC1B of the T. pallidum group. It would be best to confirm the impact of the other genes. 

      Thank you for raising this key issue, which we had not addressed properly in the previous version of the manuscript; we have added further analyses. The SCENIC analysis showed that the transcriptional activity of 52 genes changed significantly after T. pallidum infection. Furthermore, GO analyses demonstrated that 27 transcription factors were significantly enriched in four key pathways of neural differentiation and development. TCF3 is the only transcription factor present in all four terms, which led us to hypothesize that TCF3 is the key transcription factor for the inhibition of subNPC1B-hindbrain neuron differentiation caused by T. pallidum.

      Revision in the “Results” section, lines 261-273:

      “Next, the single-cell regulatory network inference and clustering (SCENIC) analysis for the subNPC1B subcluster was performed to assess the differences in the transcriptional activity of the transcription factors between the two groups and found that the transcriptional activity of 52 genes significantly changed after T. pallidum infection (Figure 5E). Furthermore, GO analyses demonstrated that 27 transcription factors were significantly enriched in key pathways of neural differentiation and development in response to nervous system development, positive regulation of sequence-specific DNA-binding transcription factor activity, positive regulation of neuronal differentiation, and DNA templated transcription regulation. Remarkably, transcription factor 3 (TCF3) is the sole transcription factor present in all four terms simultaneously (Figure 5F), speculating that TCF3 is the key transcription factor for the inhibition of subNPC1B-hindbrain neuron differentiation caused by T. pallidum.”

      Revision in the “Materials and methods” section, lines 540-543:

      “The Sankey diagram was created using SankeyMATIC (https://sankeymatic.com/) (Zhang et al., 2023), which was used to characterize the interactions between differential transcription factors and neural differentiation and development.”

      Revision in the “Figure and Figure Legend” section, lines 832 and 842-844:

      Author response image 2.

      Sankey diagram showing the correspondence between differential transcription factors and neural differentiation and development.

      (7) Are there other experiments demonstrating that TCF3 is a key transcription factor for the inhibition of subNPC1B-hindbrain neuron differentiation caused by T. pallidum?

      Thank you for your valuable comment. In earlier experiments, we attempted to isolate the subNPC1B subcluster by flow cytometric sorting to verify the relevant molecular mechanism. Because the subNPC1B subcluster makes up only a small proportion of the whole organoid, the sorted cells were in poor condition and we could not obtain the number of cells required for the experiment. Instead, we used the scRNA-seq data to further identify TCF3 as a key transcription factor in the T. pallidum-induced inhibition of subNPC1B-hindbrain neuron differentiation. The relevant results and analyses are detailed in the revised manuscript; please see our response to point (6) above.

    1. Author response:

      The following is the authors’ response to the original reviews.

      The reviewers found this manuscript to present convincing evidence for associative and non-associative behaviors elicited in male and female mice during a serial compound stimulus Pavlovian fear conditioning task. The work adds to ongoing efforts to identify multifaceted behaviors that reflect learning in classic paradigms and will be valuable to others in the field. The reviewers do note areas that would benefit from additional discussion and some minor gaps in data reporting that could be filled by additional analyses or experiments.

      We thank the reviewers and the editors for their thoughtful and constructive critiques of our manuscript. We have updated our manuscript with data from additional experiments as suggested by the reviewers, and we have significantly edited the text and figures to reflect these additions. Our detailed, point-by-point responses are below.

      Reviewer #1 (Public Review):

      The main goal of the study was to tease apart the associative and non-associative elements of cued fear conditioning that could influence which defensive behaviors are expressed. To do this, the authors compared groups conditioned with paired, unpaired, or shock only procedures followed by extinction of the cue. The cue used in the study was not typical; serial presentation of a tone followed by a white noise was used in order to assess switches in behavior across the transition from tone to white noise. Many defensive behaviors beyond the typical freezing assessments were measured, and both male and female mice were included throughout. The authors found changes in behavioral transitions from freezing to flight during conditioning as the tone transitioned into white noise, and a switch in freezing during extinction such that it became high during the white noise as flight behavior decreased. Overall, this was an interesting analysis of transitions in defensive behaviors to a serially presented cue consisting of two auditory stimuli during conditioning and then extinction.

      We thank the Reviewer for their supportive insight.

      There are some concerns regarding the possibility that the white noise is more innately aversive than the tone, inducing more escape-like behaviors compared to a tone, especially since the shock only group also showed increased escape-like behaviors during the white noise versus tone. This issue would have been resolved by adding a control group where the order of the auditory stimuli was reversed (white noise->tone).

      We appreciate this concern, and we have added two additional groups to address this possibility. We have run the same experimental paradigm with two reverse-SCS groups (WN—tone): one receiving paired (new PA-R group) and one receiving unpaired (new UN-R group) presentations of the SCS and shock during conditioning. These experiments revealed that during conditioning day 2 in both reverse order groups, WN causes reductions in freezing and increases in locomotor activity (see revised Figure 2D), an effect that is stronger in the UN-R compared to the PA-R group. This locomotor effect is neither darting nor escape jumping in the PA-R group (revised Figure 3G, I; Figure 4G). In the UN-R group, WN induces more activity than the PA-R group (Figure 2D), including some jumping at WN onset (Figure 3H), but no darting (Figure 4G). It is worth noting that WN does not elicit defensive behavior before conditioning at the sound intensity we use (75dB; see Fadok et al. 2017, Borkar et al. 2020, Borkar et al. 2024). Together, these results suggest that WN is an inherently more salient stimulus than tone, and it can elicit defensive behaviors in shock-sensitized mice through non-associative mechanisms. Indeed, stimulus salience is a key factor in this paradigm for inducing activity (see Hersman et al. 2020).

      While the more complete assessment of defensive behaviors beyond freezing is welcomed, the main conclusions in the discussion are overly focused on the paired group and the associative elements of conditioning, which would likely not be surprising to the field. If the goal, as indicated in the title, was to tease apart the associative and non-associative elements of conditioning and defensive behaviors, there needs to be a more emphasized discussion and explicit identification of the non-associative findings of their study, as this would be more impactful to the field.

      We have rewritten the Discussion to provide a greater emphasis on the findings of the study that are more related to non-associative mechanisms. For example, we argue that cue-salience and changes in stimulus intensity can induce non-associative increases in locomotor behavior and tail rattling in shock-sensitized mice.

      Reviewer #2 (Public Review):

      Summary:

      The authors examined several defensive responses elicited during Pavlovian conditioning using a serial compound stimulus (SCS) as the conditioned stimulus (CS) and a shock unconditioned stimulus (US) in male and female mice. The SCS consisted of tone pips followed by white noise. Their design included 3 treatment groups that were either exposed to the CS and US in a paired fashion, in an unpaired fashion, or only exposed to the shock US. They compared freezing, jumping, darting, and tail rattling across all groups during conditioning and extinction. During conditioning, strong freezing responses to the tone pips followed by strong jumping and darting responses to the white noise were present in the paired group but less robust or not present in the unpaired or shock only groups. During extinction, tone-induced freezing diminished while the jumping was replaced by freezing and darting in the paired group. Together, these findings support the idea that associative pairings are necessary for conditioned defensive responses.

      Strengths:

      The study has strong control groups including a group that receives the same stimuli in an unpaired fashion and another control group that only receives the shock US and no CS to test the associative value of the SCS to the US. The authors examine a wide variety of defensive behaviors that emerge during conditioning and shift throughout extinction: in addition to the standard freezing response, jumping, darting, and tail rattling were also measured.

      We thank the Reviewer for their supportive appraisal of this study’s strengths.

      Weaknesses:

      This study could have greater impact and significance if additional conditions were added (e.g., using other stimuli of differing salience during the SCS), and determining the neural correlates or brain regions that are differentially recruited during different phases of the task across the different groups.

      In the revised manuscript, we report experiments with two reverse-SCS groups (WN—tone): one receiving paired (new PA-R group) and one receiving unpaired (new UN-R group) presentations of the SCS and shock during conditioning. These experiments revealed that during conditioning day 2 in both reverse order groups, WN causes reductions in freezing and increases in locomotor activity (see revised Figure 2D), an effect that is stronger in the UN-R compared to the PA-R group. This locomotor effect is neither darting nor escape jumping in the PA-R group (revised Figure 3G, I; Figure 4G). In the UN-R group, WN induces more activity than the PA-R group (Figure 2D), including some jumping at WN onset (Figure 3H), but no darting (Figure 4G). Indeed, stimulus salience is a key factor in this paradigm for inducing activity (see Hersman et al. 2020). Together, these results suggest that WN is an inherently more salient stimulus than tone, and it can elicit defensive behaviors in shock-sensitized mice through non-associative mechanisms. It is worth noting that WN does not elicit defensive behavior before conditioning at the sound intensity we use (75dB; see Fadok et al. 2017, Borkar et al. 2020, Borkar et al. 2024).

      We agree that determining the neuronal correlates and brain regions that are involved in defensive ethograms at various stages within this paradigm is of great importance, but we feel that those experiments are beyond the scope of the current study, which is focused on identifying behavioral differences based on associative and non-associative factors.

      Reviewer #1 (Recommendations For The Authors):

      In LINES 72-73, authors say they used a "truly random procedure" as one of their control groups. Then in LINES 113-116, they describe this group as "unpaired" where the "SCS could not reliably predict footshock". Combined, it is unclear if this group is random or unpaired. The "truly random procedure" is defined, by the cited Rescorla paper, as "the two events are programmed entirely randomly and independently in such a way that some "pairings" of CS and US may occur by chance alone". So, truly random would indicate that the shock may occur during the cue, while unpaired indicates the shock was explicitly unpaired from the cue. If the authors used a random procedure, the groups need to be labeled as random, not unpaired, and the number of cues that happened to coincide with footshock per animal needs to be reported somewhere. If the authors used an unpaired procedure (which appears to be the case based on 40-60s ITI between SCS and footshock being reported), it needs to be clearer and consistent throughout that it was explicitly unpaired, as well as removing the claim in LINE 72-73 that they used a "truly random procedure".

      We did indeed use an explicitly unpaired procedure. We have adjusted the text and figures to better reflect this, and we removed any mentions of randomness with regards to the presentations of SCS and footshock.

      Despite the lack of significant sex differences, it would still be helpful if data panels with individual data points (e.g. Fig 2E-J), were presented as identifiable by sex (e.g. closed vs open circles for males vs females).

      The revised manuscript now compares four or five groups per figure, which makes data presentation complicated. Because showing individual data points in each panel would reduce figure clarity, we feel it is best to present the data as box-and-whisker plots without them. However, the source data files for each figure are available to the reader, and the data are clearly labeled to be identifiable by sex.

      Is it not odd that all groups showed similar levels of contextual freezing during the 3min baseline? If shocks are unsignaled in the UN and SO groups, one would expect higher levels of contextual freezing compared to a paired group.

      We are not certain why one would expect higher levels of contextual freezing in the UN and SO groups compared to the PA group at the beginning of conditioning day 2. Another study also looked at baseline freezing in a contextual fear group (which is the same as shock only in our study) and in an auditory cued fear conditioning group within the conditioning context, and their data show that freezing during the baseline period is equivalent between groups (Sachella et al., 2022).

      During baseline on Extinction Day 1, the unpaired and SO groups do tend to have higher freezing levels than the paired groups. Author response image 1 shows baseline freezing during the first 3 minutes of extinction day 1. After two days of conditioning in the conditioned flight paradigm, contextual freezing is significantly higher, or trends toward being higher, in the UN, UN-R, and SO groups than in the PA and PA-R groups.

      Author response image 1.

      Baseline Freezing levels for all groups during the first extinction session. Baseline period is defined as the first 180 seconds of the session, before any auditory stimulus was presented. PA, Paired; UN, Unpaired; SO, Shock Only; PA-R, Paired Reverse; UN-R, Unpaired Reverse. *p<0.05, **p<0.01, ****p<0.0001.

      Do the tone and WN elicit similar levels of defensive behaviors in a naïve mouse? Or have the authors tested WN followed by tone? Is there a potential issue that the WN may be innately aversive which is then amplified with training? i.e. does a tone preferentially induce freezing while WN induces active behaviors, regardless of which sensory stimulus is temporally closer to the shock? If the change in behavior is really due to the pairing and temporal proximity to shock, then there should be increased jumps, etc to the tone if trained with WN->tone.

      WN can indeed be used as an aversive stimulus under certain conditions and at sufficiently high decibel levels. In the conditioned flight paradigm, WN is presented at 75dB, which is below the threshold for eliciting an acoustic startle response in a C57BL/6J mouse (Fadok et al. 2009). Also, during pre-exposure, when animals are naïve to the SCS, tone and WN stimuli do not elicit defensive behaviors (see Fadok et al. 2017, Borkar et al. 2020, 2024).

      As suggested by the Reviewer, during revision we have included reverse-SCS paired (PA-R) and unpaired (UN-R) groups to test for the role of stimulus salience and stimulus order on defensive ethograms. During conditioning day 2, the PA-R group exhibited little freezing to the WN, with a slightly elevated activity index, and they exhibited robust freezing during tone (revised Figure 2A-H). The activity during the WN in the PA-R group was significantly lower than that of the PA group (Figure 2L). The PA-R group also did not respond to WN with escape jumps or darting (Figure 3I, 4G). The UN-R group displayed greater activity during the WN than the UN and PA-R groups, but less activity than the PA group (Figure 2D, H). The UN-R group did not dart but this group displayed some jumping at WN onset (Figure 3H), like what was observed in the UN group.

      These data suggest that WN has inherent, salient properties that can induce some non-associative activity after the mouse has been sensitized by shock (see also Hersman et al. 2020 for more detailed analysis of stimulus salience in the conditioned flight paradigm). However, only in the PA group is robust flight behavior (comprised of high numbers of escape jumps and darting) observed. Therefore, both stimulus salience and temporal order are important for eliciting transitions from freezing to flight.

      Fig 3G/4G are hard for me to understand. The figure legends say they're survival graphs but the y-axis labels "Latency to initial jump/dart (% of cohort)" confuses me. What is the purpose of these graphs? Perhaps they are not needed. Or consider presenting them similar to Fig 7C, D as those were more intuitive and faster for me to grasp.

      We had intended these plots to show that a greater proportion of the paired group jumps and darts during WN compared to the unpaired group, and that the percentage of the cohort that jumps and darts increases across conditioning trials. Because these graphs were not clear, we have removed them, and we have replaced them with graphs comparing total cohort percentages that jumped (Figure 3I) or darted (Figure 4G) over the whole CD2 session.

      For the extinction data, I did not see within group analyses for within or between session fear extinction to the tone. So, for the paired group, were the last 4 trials of Ext 1 significantly lower than the first 4 trials? If not, then they did not show within-session extinction. Also, for the paired group, were the last 4 trials of Ext 1 significantly different than the first 4 trials of Ext 2? This would test for long-term retention and spontaneous recovery.

      In the original submission and in the revised manuscript, we calculated a delta change score for freezing during tone in the early versus late blocks of 4 trials, and then we statistically compared these differences across groups (Figure 5C, D). This allowed us to assess between-group differences in changes to tone-evoked freezing during extinction. Freezing to tone did decrease significantly over the first extinction session for the paired group (Early Ext1 vs Late Ext1, paired t-test, t(31) = 6.23, p<0.0001), and when comparing late Ext1 and early Ext2, we found that tone-evoked freezing did significantly increase (Late Ext1 vs Early Ext2, paired t-test, t(31) = 5.26, p<0.0001). This increase in cue-induced freezing between days of extinction is characteristic of C57BL/6J mice (Hefner et al., 2008). Our study did not test for more distal timepoints, so we cannot comment on the efficacy of long-term retention or spontaneous recovery.
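      The within-group analysis described above can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the authors' analysis code: freezing is assumed to be stored as one list of per-trial percentages per mouse, the early and late blocks are taken as the first and last four trials of a session, and the helper names and data are hypothetical. The between-group comparison of delta scores is not sketched. The paired t statistic is the mean within-mouse difference divided by its standard error, with n - 1 degrees of freedom, so t(31) corresponds to 32 mice.

```python
import math
import statistics


def early_late_blocks(per_mouse_trials, block_size=4):
    """Mean freezing (%) in the first and last block of trials for each mouse."""
    early, late = [], []
    for trials in per_mouse_trials:
        if len(trials) < 2 * block_size:
            raise ValueError("session too short for two non-overlapping blocks")
        early.append(statistics.mean(trials[:block_size]))
        late.append(statistics.mean(trials[-block_size:]))
    return early, late


def delta_scores(early, late):
    """Delta change score per mouse: late-block minus early-block freezing."""
    return [b - a for a, b in zip(early, late)]


def paired_t(x, y):
    """Paired t statistic for matched values x[i], y[i], plus degrees of freedom."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [a - b for a, b in zip(x, y)]  # one difference per mouse
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical data: 8 tone trials per mouse, freezing in percent.
session = [
    [80, 80, 80, 80, 40, 40, 40, 40],
    [70, 70, 70, 70, 50, 50, 50, 50],
    [90, 90, 90, 90, 30, 30, 30, 30],
]
early, late = early_late_blocks(session)
print(delta_scores(early, late))
t, df = paired_t(early, late)  # early minus late, so a decrease gives t > 0
print(f"t({df}) = {t:.2f}")    # prints t(2) = 3.46
```

      The sign of t depends only on the order of the arguments; comparing late Ext1 with early Ext2 uses the same function with the two sessions' block means as its inputs.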

      For the conditioning and extinction data across Figs 2, 5 and 6, what I gather from them is that freezing is high to the tone and low to the WN during conditioning, and then low to the tone, and high to the WN across extinction. Then for activity levels I see they are low to the tone and high to the WN during conditioning, and then low to the WN during extinction. The piece that is missing is what are activity levels like to the tone during extinction. Are they low like in conditioning and remain low in extinction? Or do they increase across extinction as freezing decreases? As I was going through these graphs I drew myself out step function summaries of the freezing and activity levels between tone/WN for conditioning vs extinction; maybe the authors could consider a summary figure.

      We thank the Reviewer for their interest. We found that within the paired group, activity to tone remained low throughout both days of extinction (though increased within each session) and did not return to normal activity levels. We present this data in Author response image 2. We thank the Reviewer for the suggestion of a summary figure, but we feel there are too many axes of classification (between-group, within-group, multiple behaviors, tone/WN, conditioning/extinction) to coherently present our findings in a single figure.

      Author response image 2.

      Trial-by-trial plot of activity index during the tone period of SCS across both extinction sessions for the PA group. SCS, Serial compound stimulus; Ext, extinction; PA, Paired.

      In the discussion (LINE 592-3), they discuss that shock sensitization in the SO group may prime a stressed animal to dart more readily to WN upon stimulus transition. Should this not also happen during the transition of silence to tone? What is special about a transition between two auditory stimuli that would result in panic-like behavior in an animal that only received shock presentations? This also gets back to an earlier concern above regarding the potentially innately aversiveness of the WN.

      After 2 days of shock sensitization, we observe that mice exhibit freezing to the tone during the first three trials of extinction day 1 (Figure 5A). This non-associative freezing response is like that observed in other studies of non-associative fear processing (please see Kamprath and Wotjak, 2004). As trials progress during extinction day 1, mice do become mildly activated during the tone (Author response image 3). The transition to WN in the shock-only group during extinction induces non-associative darting responses, but it does not induce escape jumping behavior (Figure 7). We hypothesize that the innate salience of the WN is a vital factor contributing to these escalated responses. The importance of stimulus salience in conditioned flight was also demonstrated by Hersman et al., 2020 for SCS conditioning, and by Furuyama et al., 2023 for single-tone conditioning. Just as with conditional freezing responses (Kamprath and Wotjak, 2004), we believe that conditional flight is controlled by summative components, one being associative and the other non-associative.

      Author response image 3.

      Trial-by-trial plot of activity index during the tone period of SCS across both extinction sessions for the SO group. SCS, Serial compound stimulus; Ext, extinction; SO, Shock Only.

      In the discussion (LINE 583), they say that the development of explosive defensive behaviors are "not achievable with traditional single-cue Pavlovian conditioning paradigms". The authors should include a caveat here that the current study did not compare their results to a group of mice that received just WN-shock pairings.

      We thank the reviewer for this comment. This statement was meant to highlight that traditional paradigms do not offer an element of signaling the temporal imminence of threat, only its inevitability. It was not our intention to state that defensive escape behaviors were unachievable in single-cue conditioning paradigms, and we regret not making this clear. Indeed, the supplement of Fadok et al. 2017 shows that WN-shock conditioning is capable of inducing flight, Furuyama et al. 2023 shows that tone-shock conditioning is capable of inducing flight under specific parameters, and Gruene et al. 2015 demonstrates that single CS-US pairings induce conditional darting behaviors in female rats. We have adjusted the text to better reflect our intent.  

      Minor comment to LINE 613-5: Speaking as someone who has done fear conditioning in both mice and rats, tail rattling may be specific to mice (I have seen this often) and likely not observable in rats (never seen it).

      We thank the Reviewer for this information. We have adjusted our text to mainly discuss mouse-specific tail rattling.

      Reviewer #2 (Recommendations For The Authors):

      The research questions in this study are novel and bring new insight to the field. However, there are some issues that can be addressed to improve the overall quality of the study, namely, the reader is left wanting to know more, especially about how neural circuits contribute to these different defensive behaviors during this task. Below are some recommendations for the authors that would greatly improve the impact and significance of this study.

      (1) What are the neural correlates or circuits recruited during these different defensive behaviors across the course of conditioning and extinction? How might they differ between the PA and UN groups? What differences might emerge when an animal is shifting their defensive behavior from freezing to darting, for example? Answering these questions would require intensive additional experiments, therefore more discussion of possible neural mechanisms that might be recruited during this task would be appreciated, given the scope of the subject area.

      We agree that understanding the neural circuits recruited during these behaviors and across conditioning and extinction is of vital importance. We are actively working on these questions, and we have published on the role of central amygdala circuits (Fadok et al. 2017) as well as on top-down control of flight by the medial prefrontal cortex (Borkar et al. 2024). Because the current manuscript is focused on learning mechanisms influencing defensive behavior, we would prefer to focus our discussion on that, rather than speculating on possible neural mechanisms. However, we have added a statement in the Discussion (LINES 706-707) emphasizing that future studies should investigate the neuronal mechanisms contributing to threat associations and different defensive behaviors.

      (2) Were any vocalizations observed during conditioning or extinction phases? If not, could you speculate how type and occurrence of vocalizations might correlate with the different defensive responses observed?

      Audible vocalizations were only observed during footshock presentations (squeaks). Unfortunately, we do not have the proper specialized recording equipment to monitor the full spectrum of mouse vocalizations, especially those in the ultrasonic range. Thus, we cannot speculate on the nuances of vocalizations in mice with respect to this behavioral paradigm. To the best of our knowledge, mice have not been reported to emit specific ultrasonic calls during conditioned threat like those of rats. That said, it would be of interest to determine if mice emit different vocalizations during different defensive behaviors.

      (3) The transition from freezing to flight during the SCS is thought to be due to the close proximity of threat imminence between the WN CS and shock US. What if you switched the order of the SCS stimuli to WN followed by tone stimuli? If the salience of the WN stimulus is truly driving the jumping behavior, then it would be observed even if the WN stimulus preceded the pure tone stimulus and that would bring additional evidence that it is the associative value of the stimuli rather than its salience that's driving the defensive behaviors. What do you predict you would observe in rodents that were given a WN-tone SCS paired and unpaired in the same design of this study?

      As suggested by the reviewer, we collected data from reverse-SCS paired and unpaired groups and reported our findings within the manuscript. Our detailed findings are also discussed above. Overall, we find that a combination of stimulus salience and temporal proximity, and a summation of non-associative and associative mechanisms, are necessary to elicit explosive flight behavior (escape jumping and darting).

      References

      Borkar CD, Dorofeikova M, Le QE, Vutukuri R, Vo C, Hereford D, Resendez A, Basavanhalli S, Sifnugel N, Fadok JP (2020) Sex differences in behavioral responses during a conditioned flight paradigm. Behavioural Brain Research 389:112623.

      Borkar CD, Stelly CE, Fu X, Dorofeikova M, Le QE, Vutukuri R, Vo C, Walker A, Basavanhalli S, Duong A, Bean E, Resendez A, Parker JG, Tasker JG, Fadok JP (2024) Top-down control of flight by a non-canonical cortico-amygdala pathway. Nature 625:743-749.

      Fadok JP, Krabbe S, Markovic M, Courtin J, Xu C, Massi L, Botta P, Bylund K, Müller C, Kovacevic A, Tovote P, Lüthi A (2017) A competitive inhibitory circuit for selection of active and passive fear response. Nature 542:96-100.

      Furuyama T, Imayoshi A, Iyobe T, Ono M, Ishikawa T, Ozaki N, Kato N, Yamamoto R (2023) Multiple factors contribute to flight behaviors during fear conditioning. Scientific Reports 13:10402.

      Gruene TM, Flick K, Stefano A, Shea SD, Shansky RM (2015) Sexually divergent expression of active and passive conditioned fear responses in rats. eLife 4:e11352.

      Hefner K, Whittle N, Juhasz J, Norcross M, Karlsson RM, Saksida LM, Bussey TJ, Singewald N, Holmes A (2008) Impaired Fear Extinction Learning and Cortico-Amygdala Circuit Abnormalities in a Common Genetic Mouse Strain. Journal of Neuroscience 28:8074-8085.

      Hersman S, Allen D, Hashimoto M, Brito SI, Anthony T (2020) Stimulus salience determines defensive behaviors elicited by aversively conditioned serial compound auditory stimuli. eLife 9:e53803.

      Kamprath K, Wotjak CT (2004) Nonassociative learning processes determine expression and extinction of conditioned fear in mice. Learning & Memory 11:770-786.

      Sachella TE, Ihidoype MR, Proulx CD, Pafundo DE, Medina JH, Mendez P, Piriz J (2022) A novel role for the lateral habenula in fear learning. Neuropsychopharmacology 47:1210-1219.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript titled "Evolutionary and Functional Analyses Reveal a Role for the RHIM in Tuning RIPK3 Activity Across Vertebrates" by Fay et al. explores the function of RIPK gene family members across a wide range of vertebrate and invertebrate species through a combination of phylogenomics and functional studies. By overexpressing these genes in human cell lines, the authors examine their capacity to activate NF-κB and induce cell death. The methods employed are appropriate, with a thorough analysis of gene loss, positive selection, and functionality. While the study is well-executed and comprehensive, its broader relevance remains limited, appealing mainly to specialists in this specific field of research. It misses the opportunity to extract broader insights that could extend the understanding of these genes beyond evolutionary conservation, particularly by employing evolutionary approaches to explore more generalizable functions.

      Major comments:

      The main issue I encounter is distinguishing between what is novel in this study and what has been previously demonstrated. What new insights have been gained here that are of broader relevance? The discussion, which would be a good place to do so, is very speculative and has little to do with the actual results. Throughout the manuscript, there is little explanation of the study's importance beyond the fact that it was possible to conduct it. Is the evolutionary analysis being used to advance our understanding of gene function, or is the focus merely on how these genes behave across different species? The former would be exciting, while the latter feels less impactful.

      We thank the reviewer for the positive feedback. With regard to the major comment, we have now made changes throughout the revised manuscript to highlight the novel insights that emerge from our work, as well as the importance of using evolutionary and functional analyses to understand gene function. 

      Reviewer #2 (Public review):

      Summary:

      By combining bioinformatical and experimental approaches, the authors address the question of why several vertebrate lineages lack specific genes of the necroptosis pathway or those that regulate the interplay between apoptosis and necroptosis. The lack of such genes was already known from previous publications, but the current manuscript provides a more in-depth analysis and also uses experiments in human cells to address the question of the functionality of the remaining genes and pathways. A particular focus is placed on RIPK3/RIPK1 and their dual roles in inducing NFkB and/or necroptosis.

      Strengths:

      The well-documented bioinformatical analyses provide a comprehensive data basis of the presence/absence of RIP-kinases, other RHIM proteins, apoptosis signaling proteins (FADD, CASP8, CASP10), and some other genes involved in these pathways. Several of these genes are known to be missing in certain animal lineages, which raises the question of why their canonical binding partners are present in these species. By expressing several such proteins (both wildtype and mutants destroying particular interaction regions) in human cells, the authors succeed in establishing a general role of RIPK3 and RIPK1 in NFkB activation. This function appears to be better conserved and more universal than the necroptotic function of the RHIM proteins. The authors also scrutinize the importance of the kinase function and RHIM integrity for these separate functionalities.

      Weaknesses:

      A major weakness of the presented study is the experimental restriction to human HEK293 cells. There are several situations where the functionality of proteins from distant organisms (like lampreys or even mussels) in human cells is not necessarily indicative of their function in the native context. In some cases, these problems are addressed by co-expressing potential interaction partners, but not all of these experiments are really informative.

      A second weakness is that the manuscript addresses some interesting effects only superficially. By using host cells that are deleted for certain signaling components, a more focussed hypothesis could have been tested.

      Thus, while the aim of the study is mostly met, it could have been a bit more ambitious. The limited conclusions drawn by the authors are supported by convincing evidence. I have no doubts that this study will be very useful for future studies addressing the evolution of necroptosis and its regulation by NFkB and apoptosis.

      We thank the reviewer for the positive feedback. We agree that our study is limited by using HEK293 cells. However, we do not have appropriate cell lines for all species analyzed and therefore wished to use a single system to test all effects. As the reviewer points out, we do co-express when possible, and are careful in the manuscript not to overextend our conclusions. We, like the reviewer, believe that many of the intriguing findings in this study, which was intended to cover a broad range of species, will be useful for more in-depth studies in a given species.

      Reviewer #3 (Public review):

      This important study provides insights into the functional diversification of RIP family kinase proteins in vertebrate animals. The provided results, which combine bioinformatic and experimental analyses, will be of interest to specialists in both immunology and evolutionary biology. However, the computational part of the methodology is insufficiently covered in the paper and the experimental results would benefit from including data for additional species.

      We thank the reviewer for the positive feedback. As described below, we have now addressed the concerns about the description of the computational methods.

      (1) In the Methods section concerning gene loss analysis, the authors refer to the 'Phylogenetic analysis' section for details of RIPK sequence acquisition and alignment procedure. This section is missing from the manuscript as provided. In its absence, it is hard for the reviewer to provide relevant comments on gene presence/absence analysis.

      We have expanded the gene loss analysis methods to be more comprehensive. 

      (2) In the same section, the authors state that gene sequences were filtered and grouped based on the initial gene tree pattern (lines 448-449). How exactly did the authors filter the non-RIP kinases and other irrelevant homologs from the gene trees? Did they consider the reciprocal best (BLAST) hit approach or similar approaches for orthology inference? Did they also encounter potential pseudogenes of genes marked as missing in Figure 1C? Will the gene trees mentioned be available as supplementary files?

      We have expanded the gene loss analysis methods to be more comprehensive. 

      (3) The authors state the presence of an additional RIPK2 paralog in non-therian vertebrates. The ramifications of this paralog loss in therians are not discussed in the text, although RIPK2 is also involved in NF-kB activation. In addition, the RIPK2B gene loss pattern is relegated from Figure 1C to Supplementary Figure 4, despite being of comparable interest to the reader.

      We are also intrigued by the RIPK2/RIPK2B data and felt it important to include our findings here, however we do not have functional data for RIPK2B at this point and feel it is better suited for a separate study. We therefore focused both the title and the main figures on RIPK3, for which we have functional data.

      (4) The authors present evidence for (repeated) positive selection in both RIPK1 and RIPK3 in bats; however, neither bat RIPK1/3 orthologs nor bat-specific RHIM tetrad variants (IQFG, IQLG) are considered in the experimental part of the work.

      We included a tetrad variant (VQFG) that is found in bats and multiple other species. We wanted to test a wide range of variant amino acids, so testing both IQFG (found only in bats) and VQFG (found in bats and multiple other diverse species) was not of high importance.

      (5) The authors present gene presence/absence patterns for zebra mussels as an outgroup of vertebrate species analyzed. From the evolutionary perspective, adding results for a closer invertebrate group, such as lancelets, tunicates, or echinoderms, would be beneficial for reconstructing the evolutionary progression of RIPK-mediated immune functions in animals.

      In our initial analyses, we searched for RIPK-like proteins in cnidarians, arthropods, nematodes, amoeba, and spiralia, with only spiralia species containing proteins with substantial homology to vertebrate RIPK1 proteins, as defined by a homologous N-terminal kinase domain and C-terminal RHIM and death domain. We have expanded this analysis to include lancelets, tunicates, and echinoderms and found several lancelet species with RIPK1-like proteins. These data have been added to the manuscript.

      (6) In the broader sense, the list of non-mammalian species included in the study is not explained or substantiated in the text. What was the rationale behind selecting lizards, turtles, and lampreys for experimental assays? Why was turtle RIPK3 but not turtle RIPK1CT protein used for functional tests? Which results do the authors expect to observe if amphibian or teleost RIPK1/3 are included in the analysis, especially those with divergent tetrad variants?

      We have added additional text to define our rationale for selecting which species were tested. 

      (7) For lamprey RIPK3, the observed NF-kB activity levels still remain lower than those of mammalian and reptilian orthologs even after catalytic tetrad modification. In the same way, switching human RIPK3 catalytic tetrad to that of lamprey does not result in NF-kB activation. What are the potential reasons for the observed difference? Does it mean that lamprey's RIPK3 functions in NF-kB activation are, at least partially, delegated to RIPK1?

      The function of lamprey RIPK3 is intriguing, albeit unknown. The reduced activation in human cells may be due to an incompatibility between lamprey RIPK3 and the human NF-kB machinery, or lamprey RIPK3 may not function in NF-kB signaling at all. Considering that lampreys do not have other components of the known mammalian necroptosis pathway, it is unclear what function RIPK3 would serve in these species. It is possible that lampreys have a necroptosis pathway that is RIPK3-dependent but distinct from the mammalian pathway. This is an interesting question for future study.

      (8) In lines 386-388, the authors state that 'only non-mammalian RIPK1CT proteins required the RHIM for maximal NF-kB activation', which is corroborated by results in Figure 4B. The authors further associate this finding with a lack of ZBP1 in the respective species (lines 388-389). However, non-squamate reptiles seem to retain ZBP1, as suggested by Supplementary Table 1. Given that, do the authors expect to observe RHIM-independent (maximal) NF-kB activation in turtles and crocodilians or respective RIPK1CT-transfected cells?

      While turtles and crocodiles do retain ZBP1, it is still unclear if they are able to activate ZBP1/RIPK3/MLKL-dependent necroptosis similar to mammals, especially given the divergence in the turtle ZBP1 RHIMs seen in Figure 4C. Future studies will be needed to further test our hypotheses and to continue to characterize innate immune function and evolution across a range of vertebrate species. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor comments:

      (1) The title is somewhat restrictive, as it only mentions RIPK3, despite the manuscript covering a broader range of RIPKs and associated proteins.

      We agree that a title that encompasses both the breadth of our study and the depth with which we analyzed RIPK3 would be ideal. However, we were unable to come up with a succinct title that conveyed both points appropriately, so opted for one that focused on our RIPK3 insights.

      (2) Several supplementary figures contain valuable information that could be incorporated into the main figures for greater clarity and emphasis.

      We agree that many interesting pieces of data are in the supplement. We felt it was important to include those data in the manuscript, but also wanted to keep the main manuscript figures as focused as possible.  

      Reviewer #2 (Recommendations for the authors):

      (1) I do not fully agree with the claim that caspase-8 is absent from fish. I briefly repeated this part of the analysis and found several fish proteins that cluster with caspase-8 rather than caspase-10 or cFLIP. From the methods section, it does not really become clear how the Casp8/Casp10/cFLIP decision was made, and particularly, how cases were addressed where genes predate the caspase-8/caspase-10 split. To name just a few examples, the authors might check uniprot:A0A444UA91, W5MXS4, or A0A8X8BKJ8 for being fish caspase-8 candidates.

      We thank the reviewer for their critical analysis. CASP8 and CASP10 are very similar proteins in humans. We are distinguishing between the two based on vertebrate phylogeny with outgroup proteins (CASP2, CASP9, and CFLAR, see tree in Author response image 1 below) to help define the CASP8/CASP10 clade. Once we isolate CASP8/10, we build an additional tree to distinguish CASP8 and CASP10. Using this method, all fish CASP8/10-like proteins cluster with the mammalian CASP10 clade rather than the CASP8 clade, despite many fish proteins being annotated as CASP8 or CASP8-like. We do acknowledge that, because of the similarities between CASP8 and CASP10, there are likely proteins that can fall in either clade depending on which outgroups are included. To this end, we have updated our gene loss figure to only denote whether a species has no CASP8/10, a single CASP8/10 protein, or both CASP8 and CASP10. We have also updated our methods to better define how we completed our analyses. 

      Author response image 1.

      (2) While analyzing which RIPK3 protein causes cell death (lines 188ff), the underlying assumption is that the heterologous RIPK3 proteins can interact with human MLKL and activate it by phosphorylation. No attempts are being made to check if MLKL actually gets phosphorylated, and this issue is also not discussed. In Figure 2C, cell death is either measured by RIPK3 overexpression alone or by the additional overexpression of ZBP1 and MLKL. However, it is not shown if in all cases all the transfected proteins are expressed at a comparable level, or if the observed cell death might be caused by MLKL/ZBP1 overexpression alone.

      Cell death is dependent on expression of ZBP1, MLKL, and RIPK3, as shown in Supplementary Figure 6. We have attempted to detect phospho-MLKL via western blot. However, in these overexpression assays, we are able to detect phospho-MLKL in the presence of RIPK3 and MLKL alone, independent of activation of cell death. In fact, we see reduced phospho-MLKL and reduced expression of MLKL overall when ZBP1, MLKL, and RIPK3 are added, presumably due to cell death induced in these conditions (see blot in Author response image 2 below). We therefore felt these data were of limited use here.

      Author response image 2.

      (3) The manuscript describes a well-documented bioinformatical analysis and acknowledges the body of earlier published work on necroptosis evolution and associated gene losses. However, when discussing the RHIM-related aspects, the authors do not mention previous publications on RHIM conservation in invertebrates and even fungal proteins such as Het-S. They also fail to mention/discuss the amyloid-forming properties of RHIMs, which I consider crucial for understanding the function of RHIM-containing proteins.

      We thank the reviewer for their insight. We have added additional points on both RHIM conservation and amyloid formation.

      (4) Related to the above issue: In lines 226ff, the induction of NFkB by RIPK3 overexpression is described. While RIPK3 from other mammals requires endogenous (human) RIPK1 to be present, lizard and turtle RIPK3 do not require human RIPK1 but *do* require functional RHIMs. It is not checked (or at least discussed) if RHIM amyloid formation is required, nor if the RHIM of the heterologous RIPK3 might act through interaction with endogenous (human) RIPK3.

      We and others (PMID: 29073079) did not detect RIPK3 protein in HEK293T cells. This, combined with the requirement for exogenous RIPK3 to activate cell death, indicates that endogenous RIPK3 is not contributing to these assays.

      (5) In lines 275ff, the authors observe that RIPK1s from other mammalian species do not require the RHIM for NFkB activation, while RIPK1 from non-mammalian species do require the RHIM. I wonder why the (in my opinion) most obvious explanation is not addressed: Maybe the mammalian RIPK1 proteins are similar enough to the human one so that they can signal on their own, while the more distant RIPK1 cannot and thus require human RIPK1 (associated via RHIMs) for NFkB activation? Since the authors used RIPK1-deficient cells in previous experiments, wouldn't it make sense to test them here, too?

      It is intriguing that RIPK1 proteins from more divergent species require the RHIM for NF-kB signaling. In Supplementary Figure 12, we test the mammalian and non-mammalian proteins in RIPK1 KO cells, and all proteins are able to activate NF-kB. So while non-mammalian RIPK1 signaling is dependent on the RHIM, it is independent of endogenous RIPK1.

      Minor comments:

      (1) In the legend of Figure 1, there is a typo "heat amp".

      This typo has now been corrected.

      (2) In Figure 3A, the term "FUBAR" is not explained at all.

      FUBAR has now been defined in the methods section.

      Reviewer #3 (Recommendations for the authors):

      A few typos and graph inconsistencies have been encountered in the course of the manuscript, e.g.:

      (1) Line 168: 'heat amp' -> 'heat map'.

      (2) Lines 290-291: 'known mediate' -> 'known to mediate' (?)

      We thank the reviewer for catching these mistakes. They have been corrected. 

      (3) Supplementary Figure 12: Are human RIPK1 results presented in both 'mammalian' and 'non-mammalian' parts of the figure? If so, why do human data differ between the graphs?

      Mammalian and non-mammalian data were collected in separate experiments with human RIPK1 used as a control for both. The human data shown in the two graphs represent two separate experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for the constructive comments, which have improved the manuscript. In response to these comments, we have made the following major changes to the main text and reviewer response:

      (1) Added experimental and computational evidence to support the use of Cut&Tag to determine speckle location.

      (2) Performed new Transmission Electron Microscopy (TEM) experiments to visualize interchromatin granule clusters +/- speckle degradation.

      (3) Altered the text of the manuscript to remove qualitative statements and clarify effect sizes.

      (4) Performed new analyses of published whole genome bisulfite data from LIMe-Hi-C following DNMT1 inhibition to demonstrate that CpG methylation is lost at DNMT1i-specific gained CTCF sites.

      (5) Included citations for relevant literature throughout the text.

      These revisions in addition to others are described in the point-by-point response below.

      Reviewer #1 (Public review):

      Summary

      Roseman et al. use a new inhibitor of the maintenance DNA methyltransferase DNMT1 to probe the role of methylation on binding of the CTCF protein, which is known to be involved in chromatin loop formation. As previously reported, and as expected based on our knowledge that CTCF binding is methylation-sensitive, the authors find that loss of methylation leads to additional CTCF binding sites and increased loop formation. By comparing novel loops with the binding of the pre-mRNA splicing factor SON, which localizes to the nuclear speckle compartment, they propose that these reactivated loops localize near speckles. This behavior is dependent on CTCF, whereas degradation of two speckle proteins does not affect CTCF binding or loop formation. The authors propose a model in which DNA methylation controls the association of genome regions with speckles via CTCF-mediated insulation.

      Strengths

      The strengths of the study are 1) the use of a new, specific DNMT1 inhibitor and 2) the observation that genes whose expression is sensitive to DNMT1 inhibition and dependent on CTCF (cluster 2) show higher association with SON than genes that are sensitive to DNMT1 inhibition but CTCF-insensitive, which is in line with the authors' general model.

      Weaknesses

      There are a number of significant weaknesses that as a whole undermine many of the key conclusions, including the overall mechanistic model of a direct regulatory role of DNA methylation on CTCF-mediated speckle association of chromatin loops.

      We appreciate the reviewer’s constructive comments and address them point-by-point below.

      (1) The authors frequently make quasi-quantitative statements but do not actually provide the quantitative data, which they actually all have in hand. To give a few examples: "reactivated CTCF sites were largely methylated (p. 4/5), "many CTCF binding motifs enriched..." (p.5), "a large subset of reactivated peaks..."(p.5), "increase in strength upon DNMT1 inhibition" (p.5); "a greater total number....." (p.7). These statements are all made based on actual numbers and the authors should mention the numbers in the text to give an impression of the extent of these changes (see below) and to clarify what the qualitative terms like "largely", "many", "large", and "increase" mean. This is an issue throughout the manuscript and not limited to the above examples.

      Related to this issue, many of the comparisons that the authors interpret as differences in behavior seem quite minor. For example, visual inspection suggests that the difference in loop strength shown in Figure 1E is something like 0 to 0.1 for K562 cells and a little less for HCT116 cells. What is the positive control here that would give a sense of whether these minor changes are relevant? Another example is on p. 7, where the authors claim that CTCF partners of reactivated peaks tend to engage in a "greater number" of looping partners, but inspection of Figure 2A shows a very minor difference, from maybe 7 to 7.5 partners. While a Mann-Whitney test may call this difference significant and give a significant P value, likely due to the high sample number, it is questionable that this is a biologically relevant difference.

      We have amended the text to include actual values, instead of just qualitative statements. We have also moderated our claims in the text to note where effect sizes are more modest.

      The following literature examples can serve as positive controls for the effect sizes that we might expect when perturbing CTCF. Our observed effect sizes are largely in line with these expected magnitudes.

      https://pmc.ncbi.nlm.nih.gov/articles/PMC8386078/ Fig. 2E

      https://www.cell.com/cell-reports/pdf/S2211-1247(23)01674-1.pdf Fig. 3J,K

      https://academic.oup.com/nar/article/52/18/10934/7740592 Fig. S5D (CTCF binding only).
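      The reviewer's point in (1) is that, with thousands of loops, a Mann-Whitney test can return a small P value for a shift as small as 7 versus 7.5 partners, so an effect size is a useful companion to the P value. As a minimal illustration only (not the analysis used in the manuscript), the Python sketch below computes the U statistic by direct pair counting and two standard effect sizes derived from it: the common-language effect size (probability that a random value from one group exceeds one from the other, ties split) and the rank-biserial correlation. The function names and the example counts are hypothetical.

```python
from typing import Sequence


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> float:
    """Mann-Whitney U for group `a`: pairs (x, y) with x > y, ties counted as 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def common_language_effect_size(a: Sequence[float], b: Sequence[float]) -> float:
    """Probability that a random value from `a` exceeds one from `b` (ties split).

    0.5 means no shift; values near 0 or 1 mean a large shift.
    Unlike the P value, this does not change just because n grows.
    """
    return mann_whitney_u(a, b) / (len(a) * len(b))


def rank_biserial(a: Sequence[float], b: Sequence[float]) -> float:
    """Rank-biserial correlation in [-1, 1], equal to 2 * CLES - 1."""
    return 2.0 * common_language_effect_size(a, b) - 1.0


if __name__ == "__main__":
    # Hypothetical looping-partner counts, NOT data from the manuscript.
    reactivated = [7, 8, 7, 9, 6, 8]
    other = [7, 7, 6, 8, 7, 7]
    print(round(common_language_effect_size(reactivated, other), 3))
    print(round(rank_biserial(reactivated, other), 3))
    # Replicating both groups 50x would shrink the P value,
    # but the effect size stays exactly the same.
    print(common_language_effect_size(reactivated * 50, other * 50)
          == common_language_effect_size(reactivated, other))
```

      Pair counting is O(n·m) and is used here only for transparency; for large samples a rank-based implementation such as SciPy's is the practical choice.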

      (2) The data to support the central claim of localization of reactivated loops to speckles is not overly convincing. The overlap with SON Cut&Tag (figure 2F) is partial at best and although it is better with the publicly available TSA-seq data, the latter is less sensitive than Cut&Tag and more difficult to interpret. It would be helpful to validate these data with FISH experiments to directly demonstrate and measure the association of loops with speckles (see below).

      A recent publication we co-authored validated the use of speckle (SON) Cut&Run using FISH (Yu et al., NSMB 2025, doi: 10.1038/s41594-024-01465-6). This paper also supports a role of CTCF in positioning DNA near speckles. Unfortunately, the resolution of these FISH probes is in the realm of hundreds of kilobases. This was not an issue for Yu et al., as they were looking at large-scale effects of CTCF degradation on positioning near speckles. However, FISH does not provide the resolution we need to look at more localized changes over methylation-specific peak sites.

      Instead, we use Cut&Tag to look at these high-resolution changes. In Figure 3C, we show that SON localizes to DNMT1i-specific peaks only upon DNMT1 inhibition. We further demonstrate that this interaction is dependent on CTCF. In response to reviewer comments, we have now also performed spike-in normalized Cut&Tag upon acute (6 hr) SON degradation to validate that our signal is also directly dependent on SON and not merely due to a bias toward open chromatin.

      Author response image 1.

      TSA-seq has been validated with FISH (Chen et al., doi: 10.1083/jcb.201807108; Alexander et al., doi: 10.1016/j.molcel.2021.03.006, Fig. 6). We include TSA-seq data where possible in our manuscript to support our claims.

      We also note that Fig 2F shows all CTCF peaks and loops, not just methylation-sensitive peaks and loops, to give a sense of the data. We apologize for any confusion and have clarified this in the figure legend.

      (3) It is not clear that the authors have indeed disrupted speckles by degrading SON and SRRM2. Speckles contain a large number of proteins and, considering their phase-separated nature, stronger evidence for their complete removal is needed. Note that the data published in ref 58 suffer from the same caveat.

      Based upon the reviewers’ feedback, we generated transmission electron microscopy (TEM) data to visualize nuclear speckles +/- degradation of SON and SRRM2 (DMSO and dTAG). We were able to detect interchromatin granule clusters (IGCs), which are representative of nuclear speckles, in the DMSO condition. However, even at baseline, we observed a large degree of cell-to-cell variability in these structures. In addition, we observed potential structural changes in the distribution of heterochromatin upon speckle degradation. Consequently, we hesitate to make quantitative conclusions regarding loss of these nuclear bodies. In the interest of transparency, we have included representative raw images from both conditions for the reviewers’ consideration.

      We also note that in Ref 58 (Ilik et al., https://doi.org/10.7554/eLife.60579), the authors show diffusion of the speckle client proteins RBM25, SRRM1, and PNN upon SON and SRRM2 depletion, further supporting speckle dissociation in these conditions.

      Author response image 2.

      Author response image 3.

      (4) The authors ascribe a direct regulatory role to DNA methylation in controlling the association of some CTCF-mediated loops to speckles (p. 20). However, an active regulatory role of speckle association has not been demonstrated and the observed data are equally explainable by a more parsimonious model in which DNA methylation regulates gene expression via looping and that the association with speckles is merely an indirect bystander effect of the activated genes because we know that active genes are generally associated with speckles. The proposed mechanism of a regulatory role of DNA methylation in controlling speckle association is not convincingly demonstrated by the data. As a consequence, the title of the paper is also misleading.

      While it is difficult to completely rule out indirect effects, we do not believe that the relationship between methylation-sensitive CTCF sites and speckles relies only on gene activity.

      We can partially decouple SON Cut&Tag signal from gene activation if we break down Figure 4D to look only at methylation-sensitive CTCF peaks on genes whose expression is unchanged upon DNMT1 inhibition (using thresholds from manuscript, P-adj > 0.05 and/or |log2(fold-change)| < 0.5). This analysis shows that many methylation-sensitive CTCF peaks on genes with unchanged expression still change speckle association upon DNMT1 inhibition. This result refutes the necessity of transcriptional activation to recruit speckles to CTCF.

      Author response image 4.

      We note the comparator upregulated gene set here is small (~20 genes with our stringent threshold for methylation-sensitive CTCF after 1 day DNMT1i treatment).

      However, we acknowledge that these effects cannot be completely disentangled. We previously included the statement “other features enriched near speckles, such as open chromatin, high GC content, and active gene expression, could instead contribute to increased CTCF binding and looping near speckles” in the discussion. In response to the reviewer’s comment, we have further tempered our statements on page 20/21 and also added a statement noting that DNA demethylation and gene activation cannot be fully disentangled. While we are also open to a title change, we are unsure which part of the title is problematic. 
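      The response to point (4) defines genes with unchanged expression by the cutoffs P-adj > 0.05 and/or |log2(fold-change)| < 0.5. One reading of that rule is that a gene counts as unchanged whenever it fails either cutoff for differential expression. The Python sketch below encodes that reading; the `GeneResult` record, the function names, the treatment of values exactly at the cutoffs, the definition of "upregulated" as passing both cutoffs with a positive fold change, and the example genes are all hypothetical assumptions, not the authors' pipeline.

```python
from dataclasses import dataclass

# Cutoffs quoted in the response; boundary handling below is an assumption.
PADJ_CUTOFF = 0.05
LFC_CUTOFF = 0.5


@dataclass(frozen=True)
class GeneResult:
    """Hypothetical per-gene differential-expression result."""
    name: str
    padj: float
    log2_fold_change: float


def is_unchanged(g: GeneResult) -> bool:
    # Unchanged if it fails EITHER criterion: not significant, or effect too small.
    return g.padj > PADJ_CUTOFF or abs(g.log2_fold_change) < LFC_CUTOFF


def is_upregulated(g: GeneResult) -> bool:
    # Assumed definition: passes both cutoffs with a positive fold change.
    return g.padj <= PADJ_CUTOFF and g.log2_fold_change >= LFC_CUTOFF


if __name__ == "__main__":
    # Hypothetical genes hosting methylation-sensitive CTCF peaks.
    genes = [
        GeneResult("GENE_A", 0.30, 1.2),    # large shift, not significant
        GeneResult("GENE_B", 0.01, 0.2),    # significant, small shift
        GeneResult("GENE_C", 0.001, 1.5),   # upregulated
        GeneResult("GENE_D", 0.02, -0.9),   # downregulated
    ]
    print([g.name for g in genes if is_unchanged(g)])
    print([g.name for g in genes if is_upregulated(g)])
```

      Under this reading, a gene that is significantly down-regulated is neither "unchanged" nor "upregulated", so it drops out of both comparison sets.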

      (5) As a minor point, the authors imply on p. 15 that ablation of speckles leads to misregulation of genes by altering transcription. This is not shown as the authors only measure RNA abundance, which may be affected by depletion of constitutive splicing factors, but not transcription. The authors would need to show direct effects on transcription.

      We agree, and we have changed this wording to say RNA abundance.

      Reviewer #2 (Public review):

      Summary:

      CTCF is one of the most well-characterized regulators of chromatin architecture in mammals. Given that CTCF is an essential protein, understanding how its binding is regulated is a very active area of research. It has been known for decades that CTCF is sensitive to 5-methylcytosine DNA methylation (5meC) in certain contexts. Moreover, at genomic imprints and in certain oncogenes, 5meC-mediated CTCF antagonism has very important gene regulatory implications. A number of labs (eg, Schubeler and Stamatoyannopoulos) have assessed the impact of DNA methylation on CTCF binding, but it is important to also interrogate the effect on chromatin organization (ie, looping). Here, Roseman and colleagues used a DNMT1 inhibitor in two established human cancer lines (HCT116 [colon] and K562 [leukemia]), and performed CTCF ChIP-seq and HiChIP. They showed that "reactivated" CTCF sites (that is, sites bound in the absence of 5meC) are enriched in gene bodies, participate in many looping events, and, intriguingly, appear associated with nuclear speckles. This last aspect suggests that these reactivated loops might play an important role in increased gene transcription. They showed that a number of genes that are upregulated in the DNA hypomethylated state actually require CTCF binding, which is an important result.

      Strengths:

      Overall, I found the paper to be succinctly written and the data presented clearly. The relationship between CTCF binding in gene bodies and association with nuclear speckles is an interesting result. Another strong point of the paper was combining DNMT1 inhibition with CTCF degradation.

      Weaknesses:

      The most problematic aspect of this paper, in my view, is that the association of "reactivated" CTCF binding sites with nuclear speckles needs to be more diligently demonstrated (see Major Comment). One unfortunate aspect was that this paper neglected to discuss findings from our recent paper, wherein we also performed CTCF HiChIP in a DNA methylation mutant (Monteagudo-Sanchez et al., 2024 PMID: 39180406). It is true that this is a relatively recent publication, although the BioRxiv version has been available since fall 2023. I do not wish to accuse the authors of actively disregarding our study, but I do insist that they refer to it in a revised version. Moreover, there are a number of differences between the studies such that I find them more complementary than overlapping. To wit, the species (mouse vs human), the cell type (pluripotent vs human cancer), the use of a CTCF degron, and the conclusions of the paper (we did not make a link with nuclear speckles). Furthermore, we used a constitutive DNMT knockout, which is not viable in most cell types (HCT116 cells being an exception), and in the discussion mentioned the advantage of using degron technology:

      "With high-resolution techniques, such as HiChIP or Micro-C (119-121), a degron system can be coupled with an assessment of the cis-regulatory interactome (118). Such techniques could be adapted for DNA methylation degrons (eg, DNMT1) in differentiated cell types in order to gauge the impact of 5meC on the 3D genome."

      The authors here used a DNMT1 inhibitor, which, for all intents and purposes, is akin to a DNMT1 degron, thus I was happy to see a study employ such a technique. A comparison between the findings from the two studies would strengthen the current manuscript, in addition to being more ethically responsible.

      We thank the reviewer for the helpful comments, which we address in the point-by-point response below. We sincerely apologize for this oversight in our references. We have included references to your paper in our revised manuscript. It is exciting to see these complementary results! We now include discussion of this work to contextualize the importance of methylation-sensitive CTCF sites and motivate our study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      To address the above points, the authors should:

      (1) Provide quantitative information in the text on all comparisons and justify that the small differences observed, albeit statistically significant, are biologically relevant. Inclusion of positive controls to give an indication of what types of changes can be expected would be helpful.

      We have added quantitative information to the text, as discussed in the response to public comments above.  We also provide literature evidence of expected effect sizes in that response.

      (2) Provide FISH data to a) validate the analysis of comparing looping patterns with SON Cut&Tag data as an indicator of physical association of loops with speckles and b) demonstrate by FISH increased association of some of the CTCF-dependent loops/genes (cluster 2) with speckles upon DNMT1 inhibition.

      Please see response to Reviewer 1 comment #2 above. Unfortunately, FISH will not provide the resolution we need for point a). We have confidence in our use of TSA-seq and Cut&Tag to study SON association with CTCF sites on a genome-wide scale, which would not be possible with individual FISH probes. Specifically, since the submission of our manuscript several other researchers (Yu et al, Nat. Struct. and Mol. Biol. 2025, Gholamalamdari et al eLife 2025) have leveraged CUT&RUN/CUT&TAG and TSA-seq to map speckle associated chromatin and have validated these methods with orthogonal imaging based approaches.

      (3) Demonstrate loss of speckles upon SON or SRRM2 degradation by probing for other speckle components and, ideally, by electron microscopy, which should show loss of interchromatin granules.

      We have performed TEM in K562 cells +/- SON/SRRM2 degradation. Please see response to Reviewer 1 comment #3. Specifically, interchromatin granule clusters are visible in the TEM images of the DMSO sample (see highlighted example above), however, given the heterogeneity of these structures and potential global alterations in heterochromatin that may be occurring following speckle loss, we refrained from making quantitative conclusions from this data. We instead include the raw images above.

      (4) The authors should either perform experiments to clearly show whether loop association is transcription dependent or whether association is merely a consequence of gene activation. Alternatively, they should tone down their model ascribing a direct regulatory role of methylation in control of loop association with speckles and also discuss other models. Unless the model is more clearly demonstrated, the title of the paper should be changed to reflect the uncertainty of the central conclusion.

      Please see response to Reviewer 1 comment #4 above.

      (5) The authors should either probe directly for the effect of speckle ablation on transcription or change their wording.

      We have changed our wording to RNA abundance.

      Reviewer #2 (Recommendations for the authors):

      Major:

      ⁃ There was no DNA methylation analysis after inhibitor treatment. Ideally, whole-genome bisulfite sequencing should be performed to show that the DNMT1i-specific CTCF binding sites are indeed unmethylated. But at the very least, a quantitative method should be employed to show the extent to which 5meC levels decrease in the presence of the DNMT1 inhibitor.

      Response: We have now included analysis of genome-wide bisulfite information from LIMe-Hi-C (bisulfite Hi-C) in K562 following DNMT1 inhibition. Specifically, we leverage the CpG methylation readout and find that DNMT1i-specific CTCF sites are more methylated than non-responsive CTCF peaks at baseline. In addition, these sites show the greatest decrease in CpG methylation upon 3 days of DNMT1 inhibition. We include a figure detailing these analyses in the supplement (Fig S1E). In addition, we have added CpG methylation genome browser tracks to Fig S1D. In terms of global change, we have found that 3 days of DNMT1 inhibitor treatment reduces methylation to about 1/4 of the baseline level.

      I am not convinced that CUT&Tag is the proper technique to assess SON binding. CUT&Tag only works under stringent conditions (high salt), and can be a problematic assay for non-histone proteins, which bind less well to chromatin. In our experience, even strong binders such as CTCF exhibit a depleted binding profile when compared to ChIP-seq data. I would need to be strongly convinced that the analyses presented in figures 2F-J and S2 D-I do not simply represent ATAC signal (ie, default Tn5 activity). For example, SON ChIP-seq, CUT&Tag in the SON degron, and/or ATAC-seq could be performed. What worries me is that increased chromatin accessibility would also be associated with increased looping, so the authors may have generated artifactual results that are consistent with their model.

      As the reviewer suggested, we have now performed spike-in normalized SON Cut&Tag with DNMT1 inhibition and 6 hours of SON/SRRM2 degradation in our speckle dTAG knockin cell line. These experiments confirm that the SON Cut&Tag signal we see is SON-dependent. If the signal were due to artifactual binding, signal at gained peaks would persist irrespective of speckle binding; however, we see a clear speckle dependence, as this signal is much lower when SON is degraded.

      Author response image 5.

      Moreover, in our original Cut&Tag experiments, we did not enrich detectable DNA without the SON antibody (see the last four samples, IgG controls). This further suggests that our signal is SON-dependent.

      Author response image 6.

      Finally, we see good agreement between Cut&Tag and TSA-seq (Spearman R=0.82).  The agreement is particularly strong in the top quadrant, which is most relevant since this is where the non-zero signal is.

      Author response image 7.
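
      For readers unfamiliar with how a rank correlation like the one reported above is obtained, the following is a minimal Python sketch of a tie-aware Spearman correlation between two per-bin signal vectors (for example, SON Cut&Tag and TSA-seq values over the same genomic bins). The response does not state which software or binning was used, so the function names and inputs here are illustrative assumptions, not the authors' pipeline. Because many bins carry zero signal, tied values receive the mean of their ranks before the Pearson correlation of the ranks is taken.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values (e.g., many zero-signal bins).
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based mean rank of sorted positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sqrt(sum((x - ma) ** 2 for x in a))
    sb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

      Averaging tied ranks matters for sparse tracks: a large block of zero-signal bins all share one rank, so the coefficient is driven mostly by how the non-zero bins are ordered relative to each other.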

      Minor points

      ⁃ Why are HCT116 cells more responsive to treatment than K562 cells? This is something that could be addressed with DNA methylation analysis, for example.

      K562 is a broadly hypomethylated cell line (Siegenfeld et al., 2022 https://doi.org/10.1038/s41467-022-31857-5 Fig S2A-C). Thus, there may be less dynamic range to lose methylation compared to HCT116.

      Our results are also consistent with previous results comparing DKO HCT116 and aza-treated K562 cells (Maurano 2015, http://dx.doi.org/10.1016/j.celrep.2015.07.024). They state “In K562 cells, 5-aza-CdR treatment resulted in weaker reactivation than in DKO cells…” In addition, cell-type-specific responsiveness to DNA methyltransferase KO that depends on global CpG methylation levels has also been observed in ES and EpiLC cells (Monteagudo-Sanchez et al., 2024), which we now comment on in the manuscript.

      ⁃ How many significant CTCF loops are called in DNMT1i compared to DMSO? It was unclear what the difference in raw totals is.

      We now include a supplemental table with the HiChIP loop information. We call similar numbers of raw loops comparing DNMT1i and DMSO, as only a small subset of loops is changing.

      ⁃ For the architectural stripes, it would be nice to see a representative example in the form of a contact plot. Is that possible to do with the hiChIP data?

      As described in our methods, we called architectural stripes using Stripenn (Yoon et al 2022) from LIMe-Hi-C data under DNMT1i conditions (Siegenfeld et al, 2022). Shown below is a representative example of a stripe in the form of a Hi-C contact map.

      Author response image 8.

      ⁃ Here 4-10x more DNMT1i-specific CTCF binding sites were observed than we saw in our study. What are the thresholds? Could the thresholds for DNMT1i-specific peaks be defined more clearly? For what it's worth, we defined our DNMT KO-specific peaks as fold-change ≥ 2, adjusted P < 0.05. The scatterplots (1B) indicate a lot of "small" peaks being called "reactivated."

      We called DNMT1i-specific peaks using the HOMER getDifferentialPeaksReplicates function, with thresholds of fold change > 2 and padj < 0.05. We further restricted these peaks to those that were not called in the DMSO condition.
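
      The peak-selection criteria above amount to a simple filter. This is a minimal Python sketch, assuming the differential-peak output has already been parsed into records carrying a fold change, an adjusted p-value, and a flag for whether the peak was also called in DMSO; the `DiffPeak` record and the `dnmt1i_specific` function are hypothetical and do not reproduce HOMER's output format.

```python
from dataclasses import dataclass


@dataclass
class DiffPeak:
    # Hypothetical record; field names are illustrative, not HOMER's columns.
    peak_id: str
    fold_change: float    # DNMT1i signal relative to DMSO
    padj: float           # adjusted p-value from the differential test
    called_in_dmso: bool  # True if a peak was also called in the DMSO condition


def dnmt1i_specific(peaks, min_fold_change=2.0, max_padj=0.05):
    """Keep peaks that pass both differential thresholds and are absent in DMSO."""
    return [
        p for p in peaks
        if p.fold_change > min_fold_change
        and p.padj < max_padj
        and not p.called_in_dmso
    ]
```

      The strict inequalities mirror the "> 2" and "< 0.05" thresholds stated above; the reviewer's criterion of fold change ≥ 2 would correspond to `>=` in the first comparison.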

      ⁃ On this note, is "reactivated" the proper term? Reactivated with regards to what? A prior cell state? I think DNMT1i-specific is a safer descriptor.

      We chose this term based on prior literature (Maurano 2015 http://dx.doi.org/10.1016/j.celrep.2015.07.024, Spracklin 2023 https://doi.org/10.1038/s41594-022-00892-7) . However, we agree it is not very clear, so we’ve altered the text to say “DNMT1i-specific”. We thank the reviewer for suggesting this improved terminology.

      ⁃ It appears there is a relatively small enrichment for CTCF peaks (of any class) in intergenic regions. How were intergenic regions defined? For us, it is virtually half of the genome. We did see some enrichment of DNMT KO-specific peaks in gene bodies (our Supplemental Figure 1C), but a substantial proportion were still intergenic.

      We defined intergenic peaks using HOMER’s annotatepeaks function, with the -gtf option using Ensembl gene annotations (v104). We used the standard annotatepeaks priority order, which is TSS > TTS > CDS Exons > 5’ UTR exons > 3’ UTR exons > Introns > Intergenic.

      Maurano et al. 2015 (http://dx.doi.org/10.1016/j.celrep.2015.07.024) also found reduced representation of intergenic sites among demethylation-reactivated CTCF sites in their Fig S5A. We note this is not a perfect comparison because their data are displayed as a fraction of all intergenic peaks.
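
      The priority order above determines which single category a peak receives when it overlaps several features. The Python sketch below illustrates only that resolution step, assuming the set of overlapping categories for each peak has already been computed; it is not HOMER's implementation, which also defines TSS and TTS windows from the GTF annotations, and the category labels are illustrative. It shows that a peak is labeled intergenic only when it overlaps no genic category at all.

```python
# Priority order described above, highest first; labels are illustrative.
PRIORITY = ["TSS", "TTS", "CDS exon", "5' UTR exon", "3' UTR exon", "intron"]


def assign_annotation(overlapping):
    """Return the highest-priority category among those a peak overlaps.

    `overlapping` is assumed to be a precomputed set of category labels for
    one peak. A peak overlapping none of the genic categories is intergenic.
    """
    for category in PRIORITY:
        if category in overlapping:
            return category
    return "intergenic"
```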

      ⁃ We also recently published a review on this subject: The impact of DNA methylation on CTCF-mediated 3D genome organization NSMB 2024 (PMID: 38499830) which could be cited if the authors choose.

      We have cited this relevant review.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Prior research indicates that NaV1.2 and NaV1.6 have different compartmental distributions, expression timelines in development, and roles in neuron function. The lack of subtype-specific tools to control Nav1.2 and Nav1.6 activity however has hampered efforts to define the role of each channel in neuronal behavior. The authors attempt to address the problem of subtype specificity here by using aryl sulfonamides (ASCs) to stabilize channels in the inactivated state in combination with mice carrying a mutation that renders NaV1.2 and/or NaV1.6 genetically resistant to the drug. Using this innovative approach, the authors find that action potential initiation is controlled by NaV1.6 while both NaV1.2 and NaV1.6 are involved in backpropagation of the action potential to the soma, corroborating previous findings. Additionally, NaV1.2 inhibition paradoxically increases the firing rate, as has also been observed in genetic knockout models. Finally, the potential anticonvulsant properties of ASCs were tested. NaV1.6 inhibition but not NaV1.2 inhibition was found to decrease action potential firing in prefrontal cortex layer 5b pyramidal neurons in response to current injections designed to mimic inputs during seizure. This result is consistent with studies of loss-of-function Nav1.6 models and knockdown studies showing that these animals are resistant to certain seizure types. These results lend further support for the therapeutic promise of activity-dependent, NaV1.6-selective, inhibitors for epilepsy.

      Strengths:

      (1) The chemogenetic approaches used to achieve selective inhibition of NaV1.2 and NaV1.6 are innovative and help resolve long-standing questions regarding the role of Nav1.2 and Nav1.6 in neuronal electrogenesis.

      (2) The experimental design is overall rigorous, with appropriate controls included.

      (3) The assays to elucidate the effects of channel inactivation on typical and seizure-like activity were well selected.

      Weaknesses:

      (1) The potential impact of the YW->SR mutation in the voltage sensor does not appear to have been sufficiently assessed. The activation/inactivation curves in Figure 1E show differences in both activation and inactivation at physiologically relevant membrane voltages, which may be significant even though the V1/2 and slope factors are roughly similar.

      We have performed new experiments testing how YW->SR mutations affect spiking on their own. The reviewer’s intuition was correct; the small changes in voltage-dependence in NaV1.6 identified in heterologous expression systems translated into a ~2 mV hyperpolarization in threshold in neurons.

      (2) Additional discussion of the fact that channels are only partially blocked by the ASC and that ASCs act in a use-dependent manner would improve the manuscript and help readers interpret these results.

      We have updated text extensively to address this concern. Details are found in the author suggestions below.

      (3) NaV1.6 was described as being exclusively responsible for the change in action potential threshold, but when NaV1.6 alone was inactivated, the effect was significantly reduced from the condition in which both channels were inactivated (Figure 4E). Similarly, Figure 6C shows that blockade of both channels causes threshold depolarization prior to the seizure-like event, but selective inactivation of NaV1.6 does not. As NaV1.2 does not appear to be involved in action potential initiation and threshold change, what is the mechanism of this dissimilarity between the NaV1.6 inactivation and combined NaV1.6/NaV1.2 inactivation?

      We believe the dissimilarity is due to interactions between NaV1.2 and other channel classes (e.g., potassium channels) throughout the cell, including the somatodendritic domain. The NaV1.6 channels that initiate APs, localized to the AIS, do not act in isolation, and AP threshold can be affected by recent membrane potential history. Loss of NaV1.2-mediated depolarization in the dendrites begets less potassium channel-mediated repolarization, as described in Figure 4.

      (4) The idea that use-dependent VGSC-acting drugs may be effective antiseizure medications is well established. Additional discussion or at least acknowledgement of the existing, widely used, use-dependent VGSC drugs should be included (e.g., carbamazepine, lamotrigine, phenytoin). Also, the idea that targeting NaV1.6 may be effective for seizures is established by studies using genetic models, knockdown, and partially selective pharmacology (e.g., NBI-921352). Additional discussion of how the results reported here are consistent with or differ from studies using these alternative approaches would improve the discussion.

      We agree; the concept of use-dependent block as a means to treat seizures is not new, and we have updated the discussion to include commentary on other medications currently in use. What is new here is our ability to explore the role of NaV1.2 and NaV1.6 in electrogenesis with a level of drug selectivity that could not be achieved without the addition of the YW->SR mutations. This approach in itself will not be useful in the clinic, but it may help guide drug design in the future. One major interpretation of this work is that NaV1.6 block is more effective than NaV1.2 block in general, and may even be effective for non-SCN8A genetic conditions. This is indeed one of the reasons we believe that drugs like NBI-921352, itself an aryl sulfonamide, are being tested in seizure models.

      Reviewer #2 (Public review):

      The authors used a clever and powerful approach to explore how Nav1.2 and Nav1.6 channels, which are both present in neocortical pyramidal neurons, differentially control firing properties of the neurons. Overall, the approach worked very well, and the results show very interesting differences when one or the other channel is partially inhibited. The experimental data are solid and are very nicely complemented by a computational model incorporating the different localization of the two types of sodium channels.

      In my opinion the presentation and interpretation of the results could be improved by a more thorough discussion of the fact that only incomplete inhibition of the channels can be achieved by the inhibitor under physiological recording conditions and I thought the paper could be easier to digest if the figures were re-organized. However, the key results are well-documented.

      This is a concern raised by multiple reviewers, and we thank you all for your help in improving the way in which we discuss the results. We have revised the manuscript extensively, moving figures around per your advice and the advice of R1 in their comments to authors.

      Reviewer #3 (Public review):

      Summary:

      The authors used powerful and novel reagents to carefully assess the roles of the voltage gated sodium channel (NaV) isoforms in regulating the neural excitability of principal neurons of the cerebral cortex. Using this approach, they were able to confirm that two different isoforms, NaV1.2 and NaV1.6 have distinct roles in electrogenesis of neocortical pyramidal neurons.

      Strengths:

      Development of very powerful transgenic mice in which NaV1.2 and/or NaV1.6 were modified to be insensitive to ASCs, a particular class of NaV blocker. This allowed them to test for roles of the two isoforms in an acute setting, without concerns of genetic or functional compensation that might result from a NaV channel knockout.

      Careful biophysical analysis of ASC effects on different NaV isoforms.

      Extensive and rigorous analysis of electrogenesis (action potential production) under conditions of blockade of either NaV1.2 or NaV1.6 or both.

      Weaknesses:

      Some results are overstated in that the representative example records provided do not directly support the conclusions.

      We have swapped out example records to better capture the median effect observed and to better capture our discussion of these results. Please see below, in recommendations for authors, for details.

      Results from a computational model are provided to make predictions of outcomes, but the computational approach is highly underdeveloped.

      Modeling has been elaborated upon extensively, with more detail in methods, a new sensitivity analysis supplemental figure, and a deposition into ModelDB.  Please see below, in recommendations for authors, for details.

      Reviewer #1 (Recommendations for the authors):

      Regarding the concern about the potential impact of the YW->SR mutation: All results in Figures 2-6 report only within-subject changes before and after drug-activating protocols. These results show that the drug has no effect on the mutant channel, but whether the mutant channel itself has any effect on neuronal properties is not clear. This deficiency could be rectified by reporting raw values for AP threshold, spike rate, etc. in the pre-drug condition and statistically analyzing the apparent differences in the activation/inactivation curves.

      Our original submission only included data in the presence of GNE-4076. We now present new data showing how the YW->SR mutation affects baseline activity of neurons. These data are in Supplemental Figure 1. Compared to wildtype (no drug control) neurons, we observe no change in peak dV/dt. However, threshold is hyperpolarized by approximately 2 mV in dual knockin neurons (median values: -57.4 mV for dual knockin and -55 mV for wildtype). This is consistent with measures from heterologously expressed channels, where we observed somewhat subtle shifts in voltage-dependence of inactivation and activation in NaV1.6 as a result of YW->SR incorporation.

      In addition to these data, we also include the baseline dataset from Figure 3, where GNE-4076 is present throughout recording, and report that neither threshold nor peak dV/dt are influenced by the presence of GNE at baseline. This suggests that any drug binding at baseline (i.e., before firing APs via somatic current injection) is negligible, consistent with the concept that GNE-4076 has low affinity for the closed channel state.

      Minor Comments:

      While the single-cell response to "seizure-like" input aptly demonstrates the change in action potential threshold and firing rate induced by NaV1.6 inhibition, this component of the paper could be enhanced by a network-level assay that assesses the impact of this drug on an actual seizure-like event in acute slices or on seizure susceptibility in vivo.

      This is an excellent thought, and the work near the end of this manuscript is an effort to mimic network-like activity in a controlled way in single cells. To expand this to bona fide seizure-like activity in acute slices or in vivo is something that we are considering for future studies. To do this properly requires extensive validation of dosing and seizure induction that will require several years’ effort.

      Fig 1e caption says "circles" but the markers are squares

      This has been corrected, thank you for catching it.

      Color scheme in S2B is not intuitive to me

      We’ve now updated the caption to better describe the color scheme used within.

      Fig S2: graph or show change in threshold

      Empirical threshold data are in main figure 3D. Changes in threshold related to modeling are now included in a new sensitivity analysis that is in a new Supplemental Figure 2.

      Fig 3A example of NaV1.6 inhibition does not show change in AP threshold apparent in the aggregate data

      We have updated the representative example to better illustrate the change in AP threshold for NaV1.6 inhibition.

      "AP initiation is mediated exclusively by NaV1.6" not corroborated by data; APs still occur when NaV1.6 is inhibited

      This was an over-interpretation of our data, indeed. We have updated the language to be more accurate to the following: “AP threshold and AP initiation appears to be initiated in an NaV1.6-rich region in control conditions; when NaV1.6 is inhibited, APs can occur at more depolarized potentials, likely mediated predominately by NaV1.2.”

      Fig S3C missing WT/Scn8aSR/SR significance marking. Chosen example makes it look like there is a small decrease.

      Please note that there is no difference between these two conditions in delta dV/dt at the AIS inflection point (p = 0.4344).

      Reviewer #2 (Recommendations for the authors):

      This manuscript presents a clever and powerful approach to examining differential roles of Nav1.2 and Nav1.6 channels in pyramidal cell excitability, by engineering mice in which a sulfonamide inhibitor of both channels has reduced affinity for one or the other. Overall, the results in the manuscript are interesting and give important information about differential roles of Nav1.6 and Nav1.2 channels.

      The paper makes an important contribution to better understanding distinct roles of Nav1.2 and Nav1.6 channels. This improved understanding could help guide design of anti-seizure drugs targeted to sodium channels.

      Having made it clear that I think this is an important and impressive piece of work for which the authors should be congratulated, I found reading and interpreting the manuscript a frustrating experience. I will be blunt about the ways in which I found the presentation and discussion to be frustrating and even annoying, in the spirit of frank feedback by one interested and appreciative reader that the authors can consider or reject as they wish.

      From the start, I had the feeling that the authors were presenting and discussing the results in a sanitized "never-mind-about the details" fashion such as might be appropriate for a seminar to a general audience not interested in details, but not appropriate for a research paper.

      Our intent certainly was not to frustrate or annoy readers. We are very grateful that you have provided these comments, which have certainly improved the manuscript, hopefully mitigating some of the frustration for future readers. We appreciate that there are complex drug and voltage effects occurring within these studies, and in an effort to distill these effects into digestible prose, we appear to have been too earnest. We have expanded on the requested topics below, and please note that, for the aficionados, every figure displays individual data points. Further, we have made a special effort to ensure that features of excitability are presented throughout the drug and manipulation timecourse, including time-points before and after periods subject to statistical comparison, so that the reader may draw their own conclusions.

      General:

      There were two major ways in which I found the presentation and discussion frustrating and even annoying: First, not clearly discussing early in the presentation the fact that it is impossible to achieve complete inhibition with this agent during measurements of physiological firing and second, presenting so much of the effects as deltas of various parameters rather than showing effects on absolute values of the parameters.

      Our response to the first issue will follow the next comment, as it relates to this statement. Regarding the use of deltas versus absolute values for changes in threshold and dV/dt across figures: every cell has a unique AP threshold and peak dV/dt, and we found that displaying data zeroed to baseline values best illustrated the effects of GNE-4076. Without this, GNE-4076 effects could be buried within the cell-to-cell variability. This helped most when trying to make the case that threshold was unaffected in 2a/8a YW->SR knockin animals. We continue to believe that this is the best way to display the data in the primary figures, but to provide a more complete account, we now present absolute values in supplemental tables and supplemental figures.
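
      A minimal Python sketch of the baseline-zeroing described above, assuming each cell's measurements (e.g., AP threshold in mV) are ordered in time and that the first `n_baseline` points precede drug onboarding; the function name, the number of baseline points, and the example values are hypothetical. Two cells with different absolute thresholds but the same drug-induced shift produce identical deltas, which is what lets a small effect stand out against cell-to-cell variability.

```python
from statistics import mean


def baseline_zeroed(values, n_baseline):
    """Subtract the mean of the first n_baseline (pre-drug) points from every point."""
    baseline = mean(values[:n_baseline])
    return [v - baseline for v in values]


# Hypothetical thresholds (mV) for two cells with different absolute values
# but the same +2 mV shift after drug onboarding.
cell_a = baseline_zeroed([-57, -57, -55], n_baseline=2)
cell_b = baseline_zeroed([-52, -52, -50], n_baseline=2)
```

      Both cells reduce to the same delta trace, whereas their absolute values differ by 5 mV throughout.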

      The first issue, the incomplete inhibition by the agent, was the most annoying because the authors obviously thought a lot about this and even closed the paper by proposing this as a positive feature of this class of inhibitors, yet discussed it only piecemeal, and with most of the key experimental data in the Supplement. There are two fundamental characteristics of this (and other) sulfonamide inhibitors that complicate interpretation of experiments, especially when applied in a slice experiment: they only bind to the channel when the channel is depolarized, and even when the channel is depolarized for many seconds, they bind very slowly to the channel.

      That makes it almost impossible to know exactly what fraction of channels is being inhibited during measurements of firing. Obviously, the authors are well-aware of this issue and they allude to it and even make use of it in some of the protocols, but they never really discuss it in a very clear manner.

      We agree that it is impossible to know the precise fraction of channels inhibited in acute slice preparations. But the reason for this is likely different than what has been interpreted by this reviewer. To state that ASMs “only bind to the channel when the channel is depolarized, and even when the channel is depolarized for many seconds, bind very slowly to the channel.” is not consistent with prior data on ASM–channel interactions. Clarification on these points may help the reviewer and a broader audience better understand the effects occurring here, and we appreciate being able to both address this concept here and by revising the manuscript.

      First, ASMs bind activated channels and stabilize the inactivated state. It is correct that channels are more likely to enter these states when subject to voltage depolarization, but channel gating is stochastic, and channels can enter activated states near resting membrane potentials. The on-rate is fast enough that channels are blocked immediately in recordings in heterologous systems (Figure 1C). It is more likely that the stochasticity of channel biophysical states, along with the drug concentration used herein, dictates the rate at which channels accumulate block during repetitive spiking.

      To address this in text, we have revised the 3rd paragraph of the introduction to better incorporate these ideas. This also helps with comments in the reviewer paragraph below.

      The key experimental data on this is relegated to the Supplemental Figures. When the reader is first shown results of the effects of the inhibitor on firing in Fig 2, the presentation has been set up as if everything is perfect, and the inhibitor will be completely inhibiting either both or only one channel according to the mouse. With this presentation, it is then exceptionally striking that the cell in the middle panel of Fig 2A, labeled "Nav1.2/1.6 Inhibited" is firing action potentials very nicely even with both channels "inhibited". For a reader not already aware that there is likely only partial inhibition of each channel, the reaction will be "Huh? Shouldn't blocking both channels simply completely block excitability?". The authors do preface Fig 2 by a very brief allusion to the incomplete inhibition: "In spiking neurons, ASCs would therefore be predicted to exhibit use-dependence, progressively blocking channels in proportion to a neuron's activity rate" but this comes out of nowhere after the over-simplified picture of complete inhibition up to that point, and without any estimation of how much inhibition there is likely to be before activity, or how much induction of inhibition there is likely to be during the activity. Without this, interpreting the data in Fig 2 is basically impossible.

      The key experimental data on this issue is really in Supplemental Figures 1-2 and Fig 4, and I found myself immediately ping-ponging back and forth between the Supplemental figures and the main text trying to understand what is going on with the partial inhibition. This was frustrating.

      Thank you for these suggestions; they improve readability appreciably. We have reorganized the figures in the manuscript and emphasized details about ASCs so that readers can distinguish between near-complete block of channels (Figures 1-4) and activity-dependent ASC onboarding (Figures 5-7). We now present the near-complete block experiments first, detailing the current clamp → voltage clamp (-12 mV) → current clamp experiments. We incorporated Supp. Fig. 1 into main Figure 1 and moved Supp. Fig. 2 into main Fig. 2.

      As the reviewer notes, there are clear time-dependent effects on channel function when stepping to -12 mV, independent of GNE-4076 block. As stated previously, “We therefore focused on the 12-20 sec after voltage-clamp offset for subsequent analysis, as it is a period in which most channel-intrinsic recovery has occurred, but also a period in which we would still expect significant block from GNE-4076.” We hope that reordering the manuscript as suggested and placing these results near the beginning will help readers distinguish between near-complete block and activity-dependent onboarding. By beginning with these experiments, which underscore that 100% block cannot be studied without “contamination” from native slow inactivation, we hope that readers can better understand why the data were collected and analyzed as presented.

      In my opinion, the paper would be greatly improved by a detailed discussion of the voltage- and time-dependence of the inhibitor at the very beginning of the paper. For me, reading and digesting the paper would have been far easier if Fig 1 included a discussion of the voltage- and time-dependence of inhibition, and next Figs were then Supplemental Figs 1-2, and main Fig 4. The key questions are: how much inhibition is there before a 10-s current injection from the resting potential, and how much additional inhibition is there produced during either the 10-s bout of firing or the "on-boarding" depolarization protocol, and how long does that additional inhibition last? The most direct information on that is in the plots in Fig. 4D and Fig 4F in combination with Supplemental Fig 1, which shows that the on-boarding depolarization reduces current to about 30% of current before on-boarding. This is so central to the interpretation of all the results that I think Supp Fig 1 should be in the main paper as the first piece of data in neurons.

      We originally had the nucleated patch data in supplement due to space constraints in an already large figure 1. Based on your recommendation we have moved it to the main figure. We have also changed the ordering of the paper and related figures to present data as suggested. Hopefully this better guides readers through the questions you are raising above, which are addressed in the (now reordered) figures mentioned above.

      Specific:

      (1) Fig.1 I can find no information on the voltage protocol used to generate the dose-response curves. In the literature characterizing sulfonamide blockers, most protocols use very unphysiological strong, long depolarization to induce inhibition, usually with equally unphysiological short hyperpolarizations to produce recovery from inactivation. One assumes something like that was used here. Obviously, the protocol needs to be explained.

      We updated the methods section to better describe the voltage protocol used to generate the dose response curves. In contrast to the literature characterizing sulfonamide blockers, we used pulses that closely mimic physiological activation from -80 mV (rest) to 0 mV (depolarized) for 20 msec. GNE-4076 was perfused onto cells at increasing concentrations throughout the experiment. At each successive dose, cells were held at 0 mV to allow adequate GNE-4076 onboarding.

      (2) Supp Fig1. This shows the effect of depolarization to enhance inhibition, but not how much inhibition there was before the depolarization. Presumably, there were measurements during the application of drug? How much inhibition is there before the depolarization? Why does the time only go to 20-s, when the times in Figs 4 go to 10 minutes?

      Nucleated patch recordings are notoriously difficult to maintain for long durations, especially when subjecting the patch to large voltage deflections. These recordings extend to 20s recovery periods because that is the duration for which we maintained all recordings, though some exhibited rather impressive longevity and allowed for several minutes of recording thereafter. Regardless, the goal here was to assess block within the 12-20 sec recovery window we utilized in current clamp recordings from intact neurons. This was achieved.

      Please note that GNE-4076 was present throughout all recordings. This was in part due to time constraints, as we could not maintain patches long enough to also perform wash-in. The degree of inhibition can be inferred by comparing peak dV/dt and threshold of cells in the absence and presence of GNE-4076. These data are presented in a new Supplemental figure 1, showing no difference in threshold or peak dV/dt.

      (3) Fig. 4. Similar question here - this is a very nice and informative figure, but we see only the delta in threshold and dv/dt, but how were the initial absolute values different in the drug compared to control?

      These data are presented in a new Supplemental Figure 1, showing no difference in threshold or peak dV/dt.

      (4) Fig 2. As far as I can tell, we have no idea how much inhibition there is at rest, before the current injection -what is the dv/dt in the drug compared to in the control? Were there experiments in which the current injections were delivered before and after applying drug? If not, at least it would be useful to see population data on dv/dt of the first spike in control and with drug.

      These data are presented in a new Supplemental Figure 1, showing no difference in threshold or peak dV/dt.

      (5). Fig. 2. Do the authors have any quantitative information on how much extra inhibition would be produced at 200 nM drug using physiological waveforms of firing?

      These types of analyses are part of later figures using EPSC-like waveforms to evoke spiking.

      I was unconvinced that the changes in threshold and dv/dt during the firing in the drug necessarily represent time-dependent use-dependent effects of drug. Partial inhibition by TTX would probably produce greater progressive changes in spike shape and reduced ability to fire robustly.

      TTX is not use-dependent, so it provides a good contrast to GNE-4076. We tested a few cells with 2 and 10 nM TTX and found that the concentrations required to mimic the block of spiking produced by 200 nM GNE-4076 in WT cells were associated with a marked use-independent elevation in AP threshold and an inability to maintain ~10 Hz spiking with the baseline EPSC-like stimulation pattern. These effects are very different from those produced by GNE-4076, but were expected given the use-independence of TTX. We did not pursue this line of inquiry fully, so we present these data only as individual examples in the reviewer figure below:

      Author response image 1.

      Data from Figure 6B, D, E are replicated here with individual lines of 2 nM and 10 nM TTX shown in dashed lines. Note marked changes in threshold not observed with GNE-4076. TTX sourced from Alomone Labs.

      Minor:

      p. 5 and elsewhere: it seems unnecessary to give values of threshold and dv/dt to three decimal places, especially when the precision is not better than a single decimal place.

      We have reduced unnecessary precision throughout.

      Reviewer #3 (Recommendations for the authors):

      The computational model is highly underdeveloped. Without more rigorous development, the results of the computational model appear to provide little additional insight beyond that expected from the known axodendritic localizations of NaV 1.2 and 1.6. If the authors wish to use the computational results to make rigorous predictions, then this section needs to either be expanded to be more complete and promoted to a regular figure, with full details of the model and how it was evaluated for accuracy. Alternatively, this point regarding computational insight could be de-emphasized and/or removed from the paper.

      Modeling:

      (1) I don't see any methods describing the precise model parameters that were used.

      Apologies; this is a model that we have built and tested extensively over the years (PMID: 38290518, 35417922, 34348157, 31995133, 31230762, 28256214), although it has been updated slightly across these works. We have deposited this model in ModelDB and provide details on model construction there (accession #2019342).

      (2) There appears to be no robustness test to assess whether the particular results/conclusions were unduly dependent on particular model construction decisions.

      We have now generated a new supplemental figure 2 that explores the robustness of these observations to changes in NaV1.2 and NaV1.6 position within the AIS and changes in relative density of NaV1.2 and NaV1.6. As shown there, the model is tolerant to all but extreme, non-physiological manipulations to these parameters.

      (3) Figure S2 does not really provide convincing evidence of a biologically relevant model. Probably the model itself needs to be redesigned to better replicate the biological response and be validated by testing parameter sensitivity.

      a) All of the results in S2C show that there is a huge reduction in the first action potential (black?) followed by relatively little change in subsequent spikes. This is not seen in any of the models. The progressive changes in threshold as predicted by the model for dual and NaV1.6 block are not at all evident in the results of C, except perhaps for the very first and the very last spikes.

      b) The baseline action potential in B is different than the recorded action potentials. In particular, the somatic depolarization occurs much later and over a more extended time frame than the real neuron, and the phase plot shows an actual dip in depolarization at the transition to the somatic spike, which is not representative of naturally occurring action potentials.

      To address both (a) and (b), please note that in empirical experiments two processes occur in parallel: block by GNE-4076 and channel recovery from inactivation. In the model, we can isolate the effects of block and test that parameter fully and in isolation, which we could never achieve biologically. The important take-home message in both cases is that NaV1.6 block changes threshold, whereas NaV1.2 block does not.

      (4) The one finding that seems to be robust is that the changes in NaV1.2 have little effect on threshold.

      Yes! This is a major take-home message from both the model and the use of these knock-in mice in combination with GNE-4076. In mature pyramidal cells, NaV1.6 is the major determinant of AP threshold. To editorialize on this observation, changes in threshold are a useful metric for testing whether other pharmacological agents are truly selective for NaV1.2 over NaV1.6. We note that phrixotoxin-3, which is described as NaV1.2-specific in multiple papers, was never tested for specificity over NaV1.6 in its original description, and we find that it fails this test in our hands.

      Data presentation:

      (1) The phase plots in Figure 3B (left and right) appear to be visually identical, and as such don't strongly support any particular conclusion.

      We changed the representative example record (specifically for Fig. 3A-B) to more directly support the conclusions.

      (2) It is unclear to me what is meant by AP speed (title of Figure 3 legend). Do the authors mean propagation speed along the axon, or perhaps the rate of action potential firing?

      Apologies, we are referencing dV/dt when we mention AP speed. We updated AP speed to AP velocity throughout the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editor for their positive view and their constructive, valuable comments on the manuscript. Below, we address the suggestions of the reviewers.

      Reviewer #1 (Public Review):

      (1) It will be interesting to monitor the levels of another MIM insertase namely, OXA1. This will help to understand whether some of the observed changes in levels of OXPHOS subunits are related to alterations in the amounts of this insertase.

      OXA1 was not detected in the untargeted mass spectrometry analysis, most likely because it is a polytopic membrane protein, spanning the membrane five times (1,2). We therefore measured OXA1 levels by immunoblotting, comparing patient fibroblasts to the healthy control (HC). No significant change in OXA1 steady state levels was observed.

      These results are now displayed (Fig. S3B and C) and discussed in the revised manuscript.

      Figure 3: How do the authors explain that although TIMM17 and TIMM23 were found to be significantly reduced by Western analysis they were not detected as such by the Mass Spec. method?

      The untargeted mass spectrometry in the current study failed to detect TIMM17 in both patient fibroblasts and mouse neurons, while TIMM23 was detected only in mouse neurons, where it showed a decrease that was not significant. This is most likely because TIMM17 and TIMM23 are both polytopic membrane proteins, spanning the membrane four times, which makes it difficult to extract them in quantities suitable for MS detection (2,3).

      (2) How do the authors explain the higher levels of some proteins in the TIMM50 mutated cells?

      The levels of fully functional TIM23 complex are decreased in patients' fibroblasts. Therefore, the increase in the steady state levels of some TIM23 substrate proteins can only be explained by events that occur outside the mitochondria. These could include increased transcription, translation, or post-translational modifications, all of which may increase their steady state levels despite the decrease in the steady state level of the import complex.

      (3) Can the authors elaborate on why mutated cells are impaired in their ability to switch their energetic emphasis to glycolysis when needed?

      Cellular regulation of the metabolic switch to glycolysis occurs via two known pathways: 1) Activation of AMP-activated protein kinase (AMPK) by increased levels of AMP/ADP (4). 2) Inhibition of pyruvate dehydrogenase (PDH) complexes by pyruvate dehydrogenase kinases (PDK) (5). Therefore, changes in the steady state levels of any of these regulators could push the cells towards anaerobic energy production, when needed. In our model systems, we did not observe changes in any of the AMPK, PDH or PDK subunits that were detected in our untargeted mass spectrometry analysis (see volcano plots below, no PDK subunits were detected in patient fibroblasts). Although this doesn’t directly explain why the cells have an impaired ability to switch their energetic emphasis, it does possibly explain why the switch did not occur de facto.

      Author response image 1.

      Reviewer #2 (Public Review):

      (1) The authors claim in the abstract, the introduction, and the discussion that TIMM50 and the TIM23 translocase might not be relevant for mitochondrial protein import in mammals. This is misleading and certainly wrong!!!

      It was not our intention to claim that the TIM23 complex might not be relevant. We have now rewritten the relevant parts to convey the correct message:

      Abstract –

      Line 25 - “Strikingly, TIMM50 deficiency had no impact on the steady state levels of most of its putative substrates, suggesting that even low levels of a functional TIM23 complex are sufficient to maintain the majority of complex-dependent mitochondrial proteome.”

      Introduction –

      Line 87 – “Surprisingly, functional and physiological analysis points to the possibility that low levels of TIM23 complex core subunits (TIMM50, TIMM17 and TIMM23) are sufficient for maintaining steady-state levels of most presequence-containing proteins. However, the reduced TIM23CORE component levels do affect some critical mitochondrial properties and neuronal activity.”

      Discussion –

      Line 339 – “…surprising, as normal TIM23 complex levels are suggested to be indispensable for the translocation of presequence-containing mitochondrial proteins…”

      Line 344 – “…it is possible that unlike what occurs in yeast, normal levels of mammalian TIMM50 and TIM23 complex are mainly essential for maintaining the steady state levels of intricate complexes/assemblies.”

      Line 396 – “In summary, our results suggest that even low levels of TIMM50 and TIM23CORE components suffice in maintaining the majority of mitochondrial matrix and inner membrane proteome. Nevertheless, reductions in TIMM50 levels led to a decrease of many OXPHOS and MRP complex subunits, which indicates that normal TIMM50 levels might be mainly essential for maintaining the steady state levels and assembly of intricate complex proteins.”

      Reviewer #1 (Recommendations For The Authors):

      (1) Lines 25-26: The authors write "Strikingly, TIMM50 deficiency had no impact on the steady state levels of most of its substrates". Since the current data challenges the definition of some proteins as substrates of TIMM50, I suggest using the term "putative substrates".

      Changed as suggested.

      (2) Line 27: It is not clear whether the wording "general import role of TIM23" it refers to the TIM23 protein or the TIM23 complex. This should be clarified.

      Clarified. It now states "TIM23 complex".

      (3) Line 72: should be "and plays".

      Changed as suggested.

      (4) It will be helpful to include in Figure 1 a small scheme of TIMM50 and to indicate in which domain the T252M mutation is located.

      We predicted the human TIMM50 structure using AlphaFold and indicated the mutation site and the different TIMM50 domains. The structure is included in Fig. 1A.

      (5) I suggest labelling the "Y" axis in Fig. 1B as "Protein level (% of control)".

      Changed as suggested in Fig. 1C (previously Fig. 1B) and in Fig. 2C.

      (6) Line 179: since the authors tested here only about 10 mitochondrial proteins (out of 1500), I think that the word "many" should be replaced by "several representative" resulting in "steady state levels of several representative mitochondrial proteins".

      Changed as requested.

      (7) Line 208: correct typo.

      Typo was corrected.

      (8) Figure 4 is partially redundant as its data is part of Figure 3. The authors can consider combining these two figures. Accordingly, large parts of the legend of Figure 4 are repeating information in the legend to Figure 3 and can refer to it.

      We revamped Figures 3 and 4. Figure 3 now shows the analysis of fibroblast proteomics, while Figure 4 focuses on neuronal proteomics. We also modified the legend of Figure 4.

      Reviewer #2 (Recommendations For The Authors):

      (1) Abstract: 'Strikingly, TIMM50 deficiency had no impact on the steady state levels of most of its substrates, challenging the currently accepted import dogma of the essential general import role of TIM23 and suggesting that fully functioning TIM23 complex is not essential for maintaining the steady state level of the majority of mitochondrial proteins'. This sentence needs to be rephrased. The data do not challenge any dogma! The authors only show that lower levels of functional TIM23 are sufficient.

      We have rewritten all the relevant sentences as suggested (details are also given in our response to Reviewer 2, public review point 1).

      (2) Introduction: 'Surprisingly, functional and physiological analysis points to the possibility that TIMM50 and a fully functional TIM23 complex are not essential for maintaining steady-state levels of most presequence-containing proteins'. This again needs to be rephrased.

      Rewritten as suggested (details are given in our response to Reviewer 2, public review point 1).

      (3) Discussion: 'In summary, our results challenge the main dogma that TIMM50 is essential for maintaining the mitochondrial matrix and inner membrane proteome, as steady state level of most mitochondrial matrix and inner membrane proteins did not change in either patient fibroblasts or mouse neurons following a significant decrease in TIMM50 levels.' This again needs to be rephrased.

      Rewritten as suggested (details are given in our response to Reviewer 2, public review point 1).

      (4) The analysis of the proteomics experiment should be improved. The authors show in Figures 3 and 4 several times the same volcano plots in which different groups of proteins are indicated. It would be good to add (a) a principal component analysis to show that the replicates from the mutant samples are consistently different from the controls, (b) a correlation plot that compares the log-fold-change of P1 to that of P2 to show which of the proteins are consistently changed in P1 and P2 and (c) a GO term analysis to show in an unbiased way whether mitochondrial proteins are particularly affected upon TIMM50 depletion.

      Figures 3 and 4 have been changed to avoid redundancy. Figure 3 now focuses on fibroblast proteomics (with additional analysis), while Figure 4 focuses on neuronal proteomics. A principal component analysis (PCA) was added in Fig. S1, showing that the proteomics replicates of both patients (P1 and P2) are consistently different from the healthy control (HC) replicates. Correlation plots were added in Figure 3C and D, showing a high correlation of the downregulated and upregulated mitochondrial proteins between P1 and P2. These plots further highlight that MIM proteins are more affected than matrix proteins and that the OXPHOS and MRP systems comprise the majority of significantly downregulated proteins in both patients. GO term analysis was performed on all detected proteins that were significantly downregulated in both patients. The GO term analysis is displayed in Figure S3A and shows that mitochondrial proteins, mainly of the OXPHOS and MRP machineries, are particularly affected.
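      To make the P1-versus-P2 comparison concrete, the sketch below computes a per-protein log2 fold change of each patient against the healthy control and then the Pearson correlation between the two fold-change vectors. It is a minimal illustration, assuming mean intensities per protein are already available for P1, P2, and HC; the function names and toy values are hypothetical and do not represent the pipeline used for Figure 3C and D.

```python
import math
from statistics import mean


def log2_fold_change(patient, control):
    """Per-protein log2(patient / control) for proteins quantified in both samples."""
    return {
        protein: math.log2(patient[protein] / control[protein])
        for protein in patient
        if protein in control
    }


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def patient_correlation(p1, p2, hc):
    """Correlate P1-vs-HC and P2-vs-HC log2 fold changes over shared proteins."""
    lfc1 = log2_fold_change(p1, hc)
    lfc2 = log2_fold_change(p2, hc)
    shared = sorted(set(lfc1) & set(lfc2))
    return pearson_r([lfc1[p] for p in shared], [lfc2[p] for p in shared])


# Hypothetical mean intensities (arbitrary units) for four proteins.
hc = {"A": 100.0, "B": 100.0, "C": 100.0, "D": 100.0}
p1 = {"A": 50.0, "B": 100.0, "C": 200.0, "D": 25.0}
p2 = {"A": 60.0, "B": 110.0, "C": 190.0, "D": 30.0}
r = patient_correlation(p1, p2, hc)
```

      Only proteins quantified in all three samples enter the correlation; a protein that moves in the same direction in both patients pushes the coefficient towards 1.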

      (5) Figure 1. The figure shows the levels of TIM and TOM subunits in two mutant samples. The quantifications suggest that the levels of TIMM21, TOMM40, and mtHsp60 are not affected. However, from the figure, it seems that there are increased levels of TIMM21 and reduced levels of TOMM40 and mtHsp60. Unfortunately, in the figure most of the signals are overexposed. Since this is a central element of the study, it would be good to load dilutions of the samples to make sure that the signals are indeed in the linear range and do scale with the amounts of samples loaded.

      The representative western blot panels display the actin loading control for the representative TIMM50 replicate (the top panel). However, each protein was tested separately, at least three times, and was normalized to its own actin loading control.
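      The normalization described above, with each protein quantified against the actin band from its own blot and expressed as a percentage of the control (as on the "Protein level (% of control)" axis), can be sketched as follows. This is a minimal illustration, assuming densitometry values are paired per independent repeat (one patient lane and one HC lane on the same membrane); the function names and numbers are hypothetical.

```python
from statistics import mean


def actin_normalized(target, actin):
    """Target band intensity divided by the actin band from the same blot."""
    return target / actin


def percent_of_control(patient_blots, control_blots):
    """Mean patient protein level as a percentage of the healthy control (HC).

    Each entry is a (target, actin) pair; patient_blots[i] and control_blots[i]
    are assumed to come from the same membrane (one independent repeat).
    """
    if len(patient_blots) != len(control_blots):
        raise ValueError("each repeat needs one patient and one control lane")
    percents = [
        100.0 * actin_normalized(*pt) / actin_normalized(*hc)
        for pt, hc in zip(patient_blots, control_blots)
    ]
    return mean(percents)


# Hypothetical densitometry values from three independent repeats.
patient = [(50.0, 100.0), (30.0, 60.0), (40.0, 80.0)]
control = [(100.0, 100.0), (120.0, 120.0), (90.0, 90.0)]
level = percent_of_control(patient, control)
```

      Because every ratio uses the actin band from its own blot, the actin lanes shown in a single representative panel do not need to match those of the other proteins.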

      (6) Figure 2B. All panels are shown in color except the panel for TIMM17B which is grayscale. This should be changed to make them look equal.

      All the western blot panels were changed to grayscale.

      (7) Discussion: 'Despite being involved in the import of the majority of the mitochondrial proteome, no study thus far characterized the effects of TIMM50 deficiency on the entire mitochondrial proteome.' This sentence is not correct as proteomic data were published previously, for example for Trypanosomes (PMID: 34517757) and human cells (PMID: 38828998).

      We have corrected the statement to “Despite being involved in the import of the majority of the mitochondrial proteome, little is known about the effects of TIMM50 deficiency on the entire mitochondrial proteome.”

      (8) A recent study on a very similar topic was published by Diana Stojanovski's group that needs to be cited: PMID: 38828998. The results of this comprehensive study also need to be discussed!!!

      We have added the following in the discussion:

      Line 362 – “These observations are similar to the recent analysis of patient-derived fibroblasts, which demonstrated that TIMM50 mutations lead to a severe deficiency in TIMM50 protein levels (6,7). Notably, this decrease in TIMM50 was accompanied by a decrease in the levels of the other two core subunits, TIMM23 and TIMM17. However, unexpectedly, the proteomics analyses in our study and in that conducted by Crameri et al., 2024 indicate that steady state levels of most TIM23-dependent proteins are not affected despite a drastic decrease in the levels of the TIM23CORE complex (7). The most affected proteins are components of intricate complexes, such as the OXPHOS and MRP machineries. Thus, both studies point to the surprising possibility that even reduced levels of the TIM23CORE components are sufficient for maintaining the steady state levels of most presequence-containing substrates.”

      (1) Homberg B, Rehling P, Cruz-Zaragoza LD. The multifaceted mitochondrial OXA insertase. Trends Cell Biol. 2023;33(9):765–72.

      (2) Carroll J, Altman MC, Fearnley IM, Walker JE. Identification of membrane proteins by tandem mass spectrometry of protein ions. Proc Natl Acad Sci U S A. 2007;104(36):14330–5.

      (3) Ting SY, Schilke BA, Hayashi M, Craig EA. Architecture of the TIM23 inner mitochondrial translocon and interactions with the matrix import motor. J Biol Chem. 2014;289(41):28689–96. http://dx.doi.org/10.1074/jbc.M114.588152

      (4) Trefts E, Shaw RJ. AMPK: restoring metabolic homeostasis over space and time. Mol Cell. 2021;81(18):3677–90. https://doi.org/10.1016/j.molcel.2021.08.015

      (5) Zhang S, Hulver MW, McMillan RP, Cline MA, Gilbert ER. The pivotal role of pyruvate dehydrogenase kinases in metabolic flexibility. Nutr Metab. 2014;11(1):1–9.

      (6) Reyes A, Melchionda L, Burlina A, Robinson AJ, Ghezzi D, Zeviani M. Mutations in TIMM50 compromise cell survival in OxPhos‐dependent metabolic conditions. EMBO Mol Med. 2018;

      (7) Crameri JJ, Palmer CS, Stait T, Jackson TD, Lynch M, Sinclair A, et al. Reduced Protein Import via TIM23 SORT Drives Disease Pathology in TIMM50-Associated Mitochondrial Disease. Mol Cell Biol. 2024;0(0):1–19. https://doi.org/10.1080/10985549.2024.2353652

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):  

      Summary:

      In this study, Setogawa et al. employ an auditory discrimination task in freely moving rats, coupled with small animal imaging, electrophysiological recordings, and pharmacological inhibition/lesioning experiments to better understand the role of two striatal subregions: the anterior Dorsal Lateral Striatum (aDLS) and the posterior Ventrolateral Striatum (pVLS), during auditory discrimination learning. Attempting to better understand the contribution of different striatal subregions to sensory discrimination learning strikes me as a highly relevant and timely question, and the data presented in this study are certainly of major interest to the field. The authors have set up a robust behavioral task and systematically tackled the question about a striatal role in learning with multiple observational and manipulative techniques. Additionally, the structured approach the authors take by using neuroimaging to inform their pharmacological manipulation experiments and electrophysiological recordings is a strength.

      However, the results as they are currently presented are not easy to follow and could use some restructuring, especially the electrophysiology. Also, the main conclusion that the authors draw from the data, that aDLS and pVLS contribute to different phases of discrimination learning and influence the animal's response strategy in different ways, is not strongly supported by the data and deserves some additional caveats and limitations of the study in the discussion. 

      We appreciate the reviewer’s valuable feedback, which has helped us improve our manuscript. In response to the reviewer’s comments, we have revised multiple parts of the manuscript, including the explanations of the electrophysiological data. We have also provided additional data to support our main conclusion and addressed caveats and limitations related to the data in the Discussion section. For more details, please refer to the responses to each comment.

      Comment 1: The authors have rigorously used PET neuroimaging, which is an interesting noninvasive method to track brain activity during behavioral states. However, in the case of a freely moving behavior where the scans are performed ~30 minutes after the behavioral task, it is unclear what conclusions can be drawn about task-specific brain activity. The study hinges on the neuroimaging findings that both areas of the lateral striatum (aDLS and pVLS) show increased activity during acquisition, but the DMS shows a reduction in activity during the late stages of behavior, and some of these findings are later validated with complementary experiments. However, the limitations of this technique can be further elaborated on in the discussion and the conclusions.

      As described in our responses to the following two comments (a, b), in the PET imaging study we first analyzed task-related activity by comparing ¹⁸F-FDG uptake on different days of the auditory discrimination task with that on Day 4 of the single lever press task as a control. Next, we analyzed learning-dependent activity by comparing the uptake on different days of the discrimination task with that on Day 2 of the same task. Based on the results of both analyses, we concluded that the activity in the striatal subregions changes as discrimination learning progresses. The behavioral significance of striatal subregions was tested by excitotoxic lesion and pharmacological blockade experiments. Our explanation of the imaging data analysis may have been insufficient to fully convey the dynamic changes in the activity of striatal subregions. Therefore, we have clarified our voxel-based statistical parametric analysis method to better explain these dynamic activity changes. Please refer to the following responses to Comments 1 (a, b).

      Comment 1 (a): In commenting on the unilateral shifts in brain striatal activity during behavior, the authors use the single lever task as a control, where many variables affecting neuronal activity might be different than in the discriminatory task. The study might be better served using Day 2 measurements as a control against which to compare activity of all other sessions since the task structures are similar.

      We initially analyzed task-related activity by comparing ¹⁸F-FDG uptake on Day 2, 6, 10, or 24 of the auditory discrimination task with that on Day 4 of the single lever press task. This task was used as a control because it does not require a decision process based on the auditory stimulus. We observed significant increases in the activity of the unilateral aDLS on Day 6 and of the bilateral pVLS on Day 10 of the discrimination task. We also observed a significant decrease in the unilateral DMS on Day 24 (see Figures 2F and 2G). Next, as suggested, we compared the uptake on Day 6, 10, or 24 with that on Day 2 as a control to evaluate learning-dependent activity. Activity increased significantly in the bilateral aDLS on Day 6 and in the unilateral pVLS on Day 10, and decreased significantly in the bilateral DMS on Day 24 (see Figure 2H).

      The reviewer noted an apparent discrepancy between the image data (Figures 2F–H) and the plot data (Figures 2J–L) regarding whether the activity of the striatal subregions was unilateral or bilateral under certain conditions; the same point is raised in Comment 1 (b). For example, in the image data, activity increased in the unilateral (left) aDLS on Day 6 of the discrimination task compared with Day 4 of the single lever press task (Figure 2F). In the plot data, <sup>18</sup>F-FDG uptake peaked on Day 6 in both the left and right aDLS (Figure 2J); however, relative to the single lever press, the increase on Day 6 was significant in the left aDLS, whereas the right aDLS showed only a non-significant trend toward an increase. Thus, the plot data show the same unilateral aDLS activation relative to the single lever press as the image data. By contrast, relative to Day 2, <sup>18</sup>F-FDG uptake in the aDLS on Day 6 was significantly increased on both sides. Similar observations were made for pVLS activity on Day 10 relative to Day 2, and for DMS activity on Day 24 relative to the single lever press.

      Our analyses of both task-related and learning-dependent activity revealed dynamic changes in the striatal subregions during discrimination learning. Using voxel-based statistical parametric analysis, we identified brain regions in which <sup>18</sup>F-FDG uptake significantly increased or decreased during learning, applying both a voxel-level significance threshold (p < 0.001, uncorrected) and an extent threshold. In the image data, voxels showing significant differences between two conditions are overlaid on the brain template. The plot data show <sup>18</sup>F-FDG uptake in the voxels detected by the voxel-based analysis. Because our explanation of the PET data analysis in the initial manuscript was insufficient, it may have caused a misunderstanding regarding unilateral versus bilateral activity in the striatal subregions. We have therefore revised the explanation of the voxel-based statistical parametric analysis, adding a more detailed description of the thresholds to the text (page 7, lines 143–145) and Methods (page 27, lines 672–675).
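      The two-step voxel selection described above (a voxel-level height threshold followed by a cluster-extent threshold) can be illustrated with a minimal sketch. This is not the SPM pipeline used in the study: the input format (a 3-D nested list of voxel-wise p-values), the 6-neighbor connectivity rule, and the placeholder extent value `min_voxels=10` are assumptions made for illustration, since the extent threshold actually applied is not stated in this response.

```python
from collections import deque

# 6-connectivity (face neighbors only); this connectivity rule is an assumption.
NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def significant_clusters(pmap, alpha=0.001, min_voxels=10):
    """Return clusters of voxels that pass both a height and an extent threshold.

    pmap: 3-D nested list, pmap[x][y][z] = voxel-wise p-value (hypothetical format).
    alpha: voxel-level height threshold (p < alpha, uncorrected).
    min_voxels: extent threshold k; 10 is a placeholder, not the study's value.
    Each returned cluster is a sorted list of (x, y, z) tuples.
    """
    nx, ny, nz = len(pmap), len(pmap[0]), len(pmap[0][0])
    # Step 1: height threshold, keep voxels with p < alpha.
    supra = {
        (x, y, z)
        for x in range(nx)
        for y in range(ny)
        for z in range(nz)
        if pmap[x][y][z] < alpha
    }
    # Step 2: group supra-threshold voxels into connected clusters (BFS).
    seen = set()
    clusters = []
    for start in sorted(supra):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        cluster = []
        while queue:
            x, y, z = queue.popleft()
            cluster.append((x, y, z))
            for dx, dy, dz in NEIGHBORS:
                nb = (x + dx, y + dy, z + dz)
                # Out-of-bounds neighbors are never in `supra`, so no bounds check is needed.
                if nb in supra and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        # Step 3: extent threshold, discard clusters smaller than min_voxels.
        if len(cluster) >= min_voxels:
            clusters.append(sorted(cluster))
    return clusters
```

      In this sketch, an isolated voxel with p < 0.001 is discarded whenever `min_voxels` is greater than 1; only spatially contiguous groups of supra-threshold voxels survive.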

      Comment 1 (b): From the plots in J, K, and L, it seems that shifts in activity in the different substructures are not unilateral but consistently bilateral, in contrast to what is mentioned in the text. Possibly the text reflects comparisons to the single lever task, and here again, I would emphasize comparing within the same task.

      Please see our response to the first comment (a) regarding our explanation of the consistency in the activity of the unilateral or bilateral striatal subregions between the image and plot data. We have also revised the explanation in the corresponding sections of the manuscript, as described above.

      Comment 2: In Figure 2, the authors present compelling data that chronic excitotoxic lesions with ibotenic acid in the aDLS, pVLS, and DMS produce differential effects on discrimination learning. However, the significant reduction in success rate of performance happens as early as Day 6 in both IBO groups in both aDLS and pVLS mice. This would seem to agree with conclusions drawn about the role of aDLS in the middle stages of learning in Figure 2, but not the pVLS, which only shows an increased activity during the late stages of the behavior.  

      Figure 3 shows the behavioral effects of ibotenic acid injections into the striatal subregions in rats. For the aDLS injection, two-way repeated-measures ANOVA revealed significant main effects of group and day and a significant group × day interaction; we have added the simple main effects between treatments to the figure (Figure 3G). Significant differences in success rate were observed mainly at the middle stage of learning. In contrast, for the pVLS injection, two-way repeated-measures ANOVA showed significant main effects of group and day but no significant group × day interaction (Figure 3H). Consequently, we could not determine exactly when the significant reduction occurred. These results indicate that the aDLS and pVLS are necessary for the acquisition of auditory discrimination, and that the aDLS is required mainly at the middle stage. Similar results were observed for the win-shift-win strategy after aDLS and pVLS lesions (Figures 3J and 3K).

      Next, we transiently inhibited neuronal activity in the striatal subregions with muscimol to examine whether the activity of each subregion is linked to learning processes at different stages. In this experiment, muscimol was injected into the aDLS or pVLS at the early, middle, or late stage, and the resulting effects on the success rate were examined. The success rate of rats injected with muscimol into the aDLS decreased significantly at the middle stage, but not at the early or late stage (Figure 4C). In contrast, the success rate of rats injected with muscimol into the pVLS decreased significantly at the late stage, but not at the early or middle stage (Figure 4D). These results indicate that the aDLS and pVLS are mainly involved in processes at the middle and late stages, respectively, and are consistent with the PET imaging data showing activation of these two striatal subregions at the corresponding stages.

      We have now provided the results of the simple main effects analysis for the aDLS lesion (Figures 3G and 3J) and revised the Results section (page 8, lines 174–178; page 8, lines 186–188; and page 9, lines 205–206) and the figure legends (page 44, lines 1000–1003 and lines 1010–1013).

      Comment 3: In Figure 4, the authors show interesting data with transient inactivation of subregions of the striatum with muscimol, validating their findings that the aDLS mediates the middle and the pVLS the late stages of learning, and the function of each area serves different strategies. However, the inference that aDLS inactivation suppresses the WSW strategy "moderately" is not reflected in the formal statistical value p=0.06. While there still may be a subtle effect, the authors would need to revise their conclusions appropriately to reflect the data. In addition, the authors could try a direct comparison between the success rate during muscimol inhibition in the mid-learning session between the aDLS and pVLS-treated groups in Figure 4C (middle) and 4D (middle). If this comparison is not significant, the authors should be careful to claim that inhibition of these two areas differentially affects behavior.

      In Figure 4E, aDLS inhibition tended to slightly reduce the use of the win-shift-win strategy at the middle stage (t[14] = 2.038, p = 0.061, unpaired Student’s t-test). In accordance with the reviewer’s comment, we have changed the word “moderate” to “subtle” (page 12, line 272).

      In the transient inhibition experiments, the aDLS and pVLS injections (panels C and D, respectively) were conducted as separate experiments. Because data obtained from different experiments are difficult to compare directly, we did not directly compare the success rates between the aDLS and pVLS injections.

      Comment 4: The authors have used in vivo electrophysiological techniques to systematically investigate the roles of the aDLS and the pVLS in discriminatory learning, and have done a thorough analysis of responses with each phase of behavior over the course of learning. This is a commendable and extremely informative dataset and is a strength of the study. However, the result could be better organized following the sequence of events of the behavioral task to give the reader an easier structure to follow. Ideally, this would involve an individual figure to compare the responses in both areas to Cue, Lever Press, Reward Sound, and First Lick (in this order).

      We first showed changes in the proportion of event-related neurons during the acquisition phase (Figure S5). Next, we conducted a detailed analysis of the characteristics of aDLS and pVLS neuronal activity. Specifically, we found several types of event-related neurons, including: (1) reward sound-related neurons representing behavioral outcomes in the aDLS; (2) first licking-related neurons showing sustained activity after the reward in the aDLS and pVLS; and (3) cue-onset and cue-response neurons associated with the beginning and ending of a behavior in the pVLS.

      Describing the characteristics of event-related neurons in the order of events within a trial, as the reviewer suggested, would be another way to make the electrophysiological data easier to follow. However, because we focused on the characteristics of aDLS neurons at the middle stage and pVLS neurons at the late stage of discrimination learning, we presented the electrophysiological data in the order of the learning stages rather than the sequence of events within a trial, as described above.

      Comment 5: An important conceptual point presented in the study is that the aDLS neurons, with learning, show a reduction in firing rates and responsiveness to the first lick as well as the behavioral outcome, and don't play a role in other task-related events such as cue onset. However, the neuroimaging data in Figure 2 seems to suggest a transient enhancement of aDLS activity in the mid-stage of discriminatory learning, that is not reflected in the electrophysiology data. Is there an explanation for this difference?

      In the <sup>18</sup>F-FDG PET imaging study, brain activity in the aDLS peaked at the middle stage of the acquisition phase of auditory discrimination (Figure 2J). In the multi-unit electrophysiological recording experiment, the firing activity of aDLS neuron subpopulations related to the behavioral outcome did not differ significantly among the three stages (Figure 5E), while the proportion of these subpopulations gradually decreased as learning progressed (Figure 5F). The magnitude and duration of firing of other subpopulations showing sustained activation after the reward appeared to decrease with learning (Figures 6B and 6C), although the proportion of these subpopulations did not correlate with learning progress (Figure 6D). Thus, the temporal pattern of brain activity in the striatal subregions across learning stages did not fully match the changes over time in the properties or proportions of specific event-related neurons. In our electrophysiological analysis, we identified well-isolated neurons in the striatal subregions during the auditory discrimination task, focusing on putative medium spiny neurons (Figures S4E–S4G). Based on the combination of the tone instruction cue (high tone, H, or low tone, L) and the lever press (right, R, or left, L), we categorized the electrophysiological data into four trial types: HR, LL, LR, and HL. We identified HR or LL type neurons whose firing rate changed significantly, relative to the baseline firing rate, in relation to specific events such as cue onset, choice response, reward sound, and first licking. These neurons were further divided into two groups with increased or decreased activity relative to baseline firing (Figures S5A and S5B). In the present study, we focused on event-related neurons with increased activity.
      Because our analysis was limited to neuronal subpopulations showing increased activity related to specific events, the changes over time in the firing of individual event-related neurons cannot fully explain the learning-dependent shifts in brain activity of the striatal subregions. The activity of other striatal subpopulations may contribute to these shifts during learning. In addition, recent studies have reported that glial activity influences <sup>18</sup>F-FDG uptake (Zimmer et al., Nat Neurosci., 2017) and that glial cells regulate spike timing-dependent plasticity (Valtcheva and Venance, Nat Commun, 2016). Changes in glial activity, through the control of synaptic plasticity, may therefore partly contribute to the learning-dependent shifts in brain activity.

      To explain the difference in the time course between the brain activity and the firing activity of specific event-related neurons, we have added the aforementioned information to the Limitations section (pages 21 to 22, lines 512–539). 

      Comment 6: A significant finding of the study is that CO-HR and CO-LL responses are strikingly obvious in the pVLS, but not in the aDLS, in line with the literature that the posterior (sensory) striatum processes sound. This study also shows that responses to the high-frequency tone indicating a correct right-lever choice increase with learning in contrast to the low-frequency tone responses. To further address whether this difference arises from the task contingency, and not from the frequency representation of the pVLS, an important control would be to switch the cue-response association in a separate group of mice, such that high-frequency tones require a left lever press and vice versa. This would also help tease apart task-evoked responses in the aDLS, as I am given to understand all the recording sites were in the left striatum.

      We did not conduct an experiment in which the cue-response association was switched in the auditory discrimination task. However, the transient activity of cue onset-related neurons in the pVLS, which the reviewer highlighted, was absent at the early stage of learning and emerged in a learning-dependent manner (Figures 7A and S8E). In addition, cue onset-HR activity differed slightly but notably between HR and LL trials at the middle and late stages (Figure 7B), whereas activity did not differ between the incorrect HL and LR trials at the corresponding stages (Wilcoxon signed rank test; early, p = 0.375; middle, p = 0.931; late, p = 0.668). These results suggest that the activity of cue onset-related neurons in the pVLS reflects the stimulus-response association (task contingency) rather than the tone frequency.

      Reviewer #1 (Recommendations For The Authors):

      Minor comment 1: The readability and appeal of this study would be improved by explaining the various neuronal response types, and task-related events in slightly more detail in the results section, and minimizing the use of non-standard abbreviations wherever possible.

      As suggested, we have replaced the abbreviations related to electrophysiological events (CO, CR, RS, and FL) with the original terms, and improved the explanation for neuronal response types and event-related neurons. 

      Minor comment 2: It would be helpful to label DLS and VLS recordings more clearly on the figures instead of only in the figure caption.

      Thank you for pointing this out. The terms “aDLS” and “pVLS” have now been added to the panels showing the firing patterns of neurons: “aDLS” in Figures 5D, 6A, S6A, S7A, S8A, S8B, S8C, and S8D; and “pVLS” in Figures 6F, 7A, 7D, S6D, S6E, S7F, S8E, and S8F.

      Minor comment 3: The authors suggest that aDLS HR- and LL- neurons are more sensitive to the behavioral outcome than those in pVLS (Fig 5 and S5). However, their conclusions are based on sample sizes as low as n=3 for each response type.

      We identified event-related neurons among the single neurons detected in both the aDLS and pVLS using the same criteria. In the pVLS, we found only a small number of neurons whose activity increased during presentation of the reward sound (Figures S6D and S6E) (6, 4, and 17 HR type neurons at the early, middle, and late stages, respectively; 3, 5, and 15 LL type neurons at the early, middle, and late stages, respectively). As the reviewer noted, the number of LL type neurons at the early stage was particularly low. However, when we plotted the firing rates of these neurons around the event, their activity did not reflect the behavioral outcome. In the aDLS, we detected a large number of reward sound-related neurons representing the behavioral outcome (Figures 5 and S6A) (43, 37, and 44 HR type neurons at the early, middle, and late stages, respectively; 49, 62, and 59 LL type neurons at the early, middle, and late stages, respectively). These observations suggest that aDLS neurons are more sensitive to behavioral outcomes than pVLS neurons.

      Minor comment 4: Typo in Figure 4C and D, right plots, y-axis label: "subtracted".

      The typographic errors in Figures 4C–4H have now been corrected to “subtracted”.

      Reviewer #2 (Public Reviews):

      The study by Setogawa et al. aims to understand the role that different striatal subregions belonging to parallel brain circuits have in associative learning and discrimination learning (S-O-R and S-R tasks). Strengths of the study are the use of multiple methodologies to measure and manipulate brain activity in rats, from microPET imaging to excitotoxic lesions and multielectrode recordings across anterior dorsolateral (aDLS), posterior ventral lateral (pVLS), and dorsomedial (DMS) striatum. The main conclusions are that the aDLS promotes stimulus-response association and suppresses response-outcome associations. The pVLS is engaged in the formation and maintenance of the stimulus-response association. There is a lot of work done and some interesting findings; however, the manuscript can be improved by clarifying the presentation and reasoning. The inclusion of important controls will enhance the rigor of the data interpretation and conclusions.

      We appreciate the reviewer’s valuable feedback, which has been beneficial in our endeavor to improve our manuscript. In response to the comments, we have revised the description of the experimental methods and underlying rationale, as well as the Results section. We have also provided additional data for some of the experiments that support the conclusions. For more details, please refer to the responses to each comment, included below.

      Reviewer #2 (Recommendations For The Authors):

      Comment 1: Generally, the manuscript is hard to read because of the cumbersome sentence structure, overuse of poorly defined acronyms, and lack of clarity on the methods used.

      In response to the following comments (a)–(d), we have revised the corresponding text in the manuscript to clarify the sentence structure, definitions of terms, and methodology.

      Comment 1 (a): For example, the single lever task used as a control for the auditory discrimination task could be introduced better, explaining the reasoning and the strategy for subtracting it from the images obtained during the discrimination phase at the start of the section.

      We analyzed task-related activity by comparing <sup>18</sup>F-FDG uptake on Day 2, 6, 10, or 24 of the auditory discrimination task with that on Day 4 of the single lever press task, which served as a control because it does not require a decision based on the auditory stimulus. For clarity, we have provided a more detailed explanation of the procedure of the single lever press task used in the PET experiment, including the rationale for using it as a control (page 6, lines 129–135). We have also revised the explanation of the voxel-based statistical parametric analysis, adding a more detailed description of the thresholds (page 7, lines 143–145).

      Comment 1 (b): Another example is that important methodological information is buried deep in the text and complicates the interpretation of the results.

      We have revised the following sentences in the manuscript in order to provide clearer methodological information.

      (1) As described above, explanations for the single lever task (page 6, lines 129–135) and voxel-based statistical parametric analysis were added (page 7, lines 143–145). 

      (2) Definitions of the early, middle, and late stages are now given in the description of the initial behavioral experiment (page 6, lines 113–119).

      (3) Abbreviations related to behavioral strategies (WSW and LSL) and electrophysiological events (CO, CR, RS, and FL) were replaced with the original terms. 

      Comment 1 (c): The specie being studied is not stated in the abstract, nor the introduction, and only in the middle of the result section. Please include the specie in the abstract and the first part of the result also for clarity.

      We included the name of the species (rats) in the Abstract (page 3, line 47), at the end of the Introduction (page 5, lines 87–88) and at the beginning of the Results (page 5, line 109).

      Comment 1 (d): The last part of the intro is copied/pasted from the abstract. Please revise.

      The last part of the Introduction was revised accordingly (page 5, lines 97–104).

      Comment 2: The glucose microPET imaging is carried out 30 mins after the rats performed the task and it is expected to capture activation during the task. Is this correct? This assumption has to be validated with an experiment, which is a control showing a validation of the microPET approach used, and this way can report activation of brain areas during the task completed 20-30 minutes before. For example, V1 or A1 would be a control that we would expect to be activated during the task.

      Our PET experiment followed previously established methods (Cui et al., Neuroimage, 2015): rats received an intravenous injection of <sup>18</sup>F-FDG solution just before the start of the 30-min behavioral session. <sup>18</sup>F-FDG uptake in the brain begins immediately, reaches its maximum within 30 min after administration, and remains at that level for at least 1 h (Mizuma et al., J Nucl Med, 2010). The rats were then returned to their home cages, and a 30-min PET scan started 25 min after the end of the session. This start time was chosen to allow <sup>18</sup>F radioactivity in arterial blood to decline sufficiently, thereby increasing the signal-to-noise ratio (Mizuma et al., J Nucl Med, 2010). As shown in Table S1, activity in the medial geniculate body (auditory thalamus) increased on Days 6 and 10 of the acquisition phase, whereas activity in the auditory cortex did not change; this is consistent with a previous study reporting that the auditory cortex is not causally required for pure-tone discrimination (Gimenez et al., J Neurophysiol., 2015).

      Comment 3: Why are Days 2, 6, 10, and 13 chosen and compared for the behavior? Why aren't these the same days chosen in the other part of the study? It is unclear why authors focused on these days and why the focus changed later.

      We trained the rats on the discrimination task daily. The success rate reached a plateau on Day 13 and was maintained until Day 24 (Figure 1B). Based on these results, we divided the learning process into acquisition and learned phases, and further divided the acquisition phase into early (< 60%), middle (60–80%), and late (> 80%) stages. In the PET experiment, we selected Days 2, 6, and 10 to represent the three stages of the acquisition phase, and Day 24 to represent the learned phase. No scan was performed on Day 13 because it marked the transition between the two phases.
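      The stage boundaries above can be written as a small classification function. This is a hypothetical sketch, not the authors' analysis code: it assumes the daily success rate is given as a fraction between 0 and 1 and, because the text does not specify how exact boundary values are handled, it assigns 60% and 80% to the middle stage.

```python
def learning_stage(success_rate: float) -> str:
    """Map a daily success rate (0-1) to an acquisition-phase stage.

    Thresholds follow the text: early < 60%, middle 60-80%, late > 80%.
    Assigning exactly 0.60 and 0.80 to 'middle' is an assumption.
    """
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    if success_rate < 0.60:
        return "early"
    if success_rate <= 0.80:
        return "middle"
    return "late"
```

      For example, `learning_stage(0.72)` returns `"middle"`. Note that this function only labels a single day's rate; the division into acquisition and learned phases described above was based on the plateau of the learning curve, which this sketch does not attempt to detect.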

      Comment 4: (A) Is the learning and acquisition of the single lever press and discrimination task completed by day 4? Or are rats still learning? The authors claimed no changes in DMS activity between single lever press & discrimination, and therefore DMS isn't involved in learning. But to make this claim we should have measures that the learning has already happened, which I am not sure have been provided. (B) On this same point, the DMS activity is elevated on Day 4 of a single lever press compared to the aDLS and pVLS. So is it possible that the activity in DMS was already elevated on Day 4 of single lever press training? Especially given that DMS is supposedly involved in goal-directed behavior?

      (A) In the single lever press task, the number of lever presses plateaued on Day 2 (Figure 1C). In addition, we analyzed the response time and its variability, which plateaued from Day 3 and Day 2, respectively (see Author response image 1). These results indicate that learning of the single lever press task was complete by Day 4. In the auditory discrimination task, Day 4 corresponded to the transition from the early to the middle stage of the acquisition phase, suggesting that learning was still in progress.

      In the imaging analysis, we examined task-related activity by comparing <sup>18</sup>F-FDG uptake on each day of the discrimination task with that on Day 4 of the single lever press task, and found no change in DMS activity during the acquisition phase. Likewise, in the analysis of learning-related activity, DMS activity did not change during the acquisition phase. These results suggest that the DMS is not involved in the acquisition phase of learning. Furthermore, the comparison between Day 10 and Day 24 showed a decrease in DMS activity, suggesting that DMS activity is downregulated during the learned phase. In addition, after chronic DMS lesions, the success rate in the discrimination task was comparable between the control and lesioned groups (Figure 3I), whereas the response time was longer in the lesioned group than in the controls throughout learning (Figure S1C). These results support our notion that the DMS contributes to the execution, but not the learning, of discriminative behavior.

      Author response image 1.

      Performance in the single lever press task conducted before the auditory discrimination task. (A) Number of lever presses. (B) Response time (Kruskal-Wallis test, χ<sup>2</sup> = 38.063, p = 2.7 × 10<sup>-8</sup>, post hoc Tukey–Kramer test, p = 0.047 for Day 1 vs. Day 2; p = 2.3 × 10<sup>-7</sup> for Day 1 vs. Day 3; and p = 4.0 × 10<sup>-6</sup> for Day 1 vs. Day 4; p = 0.019 for Day 2 vs. Day 3; p = 0.082 for Day 2 vs. Day 4; p = 0.951 for Day 3 vs. Day 4). (C) Response time variability (Kruskal-Wallis test, χ<sup>2</sup> = 28.929, p = 2.3 × 10<sup>-6</sup>, post hoc Tukey–Kramer test, p = 0.077 for Day 1 vs. Day 2; p = 5.7 × 10<sup>-6</sup> for Day 1 vs. Day 3; and p = 1.3 × 10<sup>-4</sup> for Day 1 vs. Day 4; p = 0.060 for Day 2 vs. Day 3; p = 0.253 for Day 2 vs. Day 4; p = 0.912 for Day 3 vs. Day 4). Data obtained from the task shown in Figure 2C are plotted as the median and quartiles with the maximal and minimal values. *p < 0.05, **p < 0.01, and ***p < 0.001.

      (B) We compared <sup>18</sup>F-FDG uptake among the striatal subregions on Day 4 of the single lever press task (334.8 ± 2.86, 299.0 ± 1.71, and 336.8 ± 2.18 for the aDLS, pVLS, and DMS, respectively; one-way ANOVA, F[2,41] = 104.767, p = 2.1 × 10<sup>-16</sup>). Uptake was comparable between the aDLS and DMS (post hoc Tukey-Kramer test, p = 0.058), but was significantly lower in the pVLS than in either of the other two subregions (post hoc Tukey-Kramer test; aDLS vs. pVLS, p = 5.1 × 10<sup>-9</sup>; pVLS vs. DMS, p = 5.1 × 10<sup>-9</sup>). However, because we did not measure brain activity in the single lever press task on days other than Day 4, it is unclear whether DMS activity increased during acquisition of that task. Similarly, because we did not assess the behavioral mode (goal-directed or habitual), it is difficult to conclude that lever pressing in this task was controlled by the goal-directed mode. Nevertheless, our chronic lesion experiment suggests that the DMS is involved in the execution of discrimination behavior (Figure S1C). A clearer understanding of DMS function in discrimination learning remains an important challenge for future studies.

      Comment 5: It seems like the procedure of microPET imaging affects performance on the task. The anesthesia used maybe? Figures 2C and D show evidence that the behavior was negatively affected on the days on which microPET imaging was performed after the training. Can the author clarify/comment?

      Isoflurane anesthesia may slightly reduce behavioral performance. To minimize its impact, rats were anesthetized only briefly (median [interquartile range]: 6 [5–8] min) for insertion of the catheter for FDG injection, and a recovery period of at least 2 h was allowed before the behavioral session began. Performance in Figure 2E was similar to that of intact rats (compare Figures 1C–1F), suggesting that the PET scan procedure did not affect the acquisition of discrimination.

      We have added detailed information on the isoflurane anesthesia to the Methods section (page 26, lines 649–653).

      Comment 6: More on clarity. Section 3 of the results (muscimol inactivation) refers a lot to "the behavioral strategies" without really clarifying what these are - are they referring to WSW / LSL (which also could use a better introduction) or goal-directed/habitual or stimulus-response/stimulus-outcome?

      The dorsal striatum is involved in behavioral strategies based on both the stimulus-response association and the response-outcome association during instrumental learning. To assess the impact of striatal lesions on these behavioral strategies, we calculated the proportion of responses attributable to each of two strategies among all responses in each session. The first is the “win-shift-win” strategy, which is considered to reflect a behavioral strategy based on the stimulus-response association. In this strategy, after a correct response in the previous trial, the rat presses the opposite lever in the current trial in response to a shift in the instruction cue, resulting in another correct response. The second is the “lose-shift-lose” strategy, which is considered to arise from a behavioral strategy based on the response-outcome association. In this strategy, after an error in the previous trial, the rat presses the opposite lever in the current trial despite the shift in the instruction cue, resulting in another error.
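      The two strategy definitions can be made concrete with a short sketch that scores consecutive trial pairs. The following assumptions are illustrative and not taken from the authors' code: trials are encoded with hypothetical `cue` values "H"/"L" and `lever` values "R"/"L"; the contingency is high tone → right lever and low tone → left lever; and each proportion is divided by the total number of trials in the session.

```python
from dataclasses import dataclass

# Assumed task contingency: high tone -> right lever, low tone -> left lever.
CORRECT_LEVER = {"H": "R", "L": "L"}


@dataclass
class Trial:
    cue: str    # "H" (high tone) or "L" (low tone); hypothetical encoding
    lever: str  # "R" (right) or "L" (left); hypothetical encoding

    @property
    def correct(self) -> bool:
        return CORRECT_LEVER[self.cue] == self.lever


def strategy_proportions(trials):
    """Return (win_shift_win, lose_shift_lose) as proportions of all trials.

    A pair counts only when both the cue and the pressed lever switch
    between the previous and the current trial.
    """
    wsw = lsl = 0
    for prev, cur in zip(trials, trials[1:]):
        if cur.cue == prev.cue or cur.lever == prev.lever:
            continue  # no cue shift or no lever shift
        if prev.correct and cur.correct:
            wsw += 1  # win -> shift -> win
        elif not prev.correct and not cur.correct:
            lsl += 1  # lose -> shift -> lose
    n = len(trials)
    return (wsw / n, lsl / n) if n else (0.0, 0.0)
```

      Usage: `strategy_proportions([Trial("H", "R"), Trial("L", "L"), Trial("H", "L"), Trial("L", "R")])` returns `(0.25, 0.25)`. Because the cue-lever mapping is one-to-one, when both the cue and the lever switch, a correct previous trial always produces a correct current trial and an error always produces another error; the explicit checks are kept for readability.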

      We have revised the explanations of the behavioral strategies in the Results section (page 9, lines 192–201).

      Comment 7: Related to WSW / LSL needing a better introduction, on lines 192/193 authors describe a result where they saw the WSW and LSL strategies increase and decrease, respectively, in saline-injected mice. Is the change in performance expected or an undesired effect of the saline injection? This is not clear now and it should be clarified.

      The explanations of the win-shift-win and lose-shift-lose strategies have been revised in the Results section on the excitotoxic lesion experiment (page 9, lines 192–201), as described in our response to Comment 6. Win-shift-win is an indicator of correct responses, whereas lose-shift-lose indicates errors. Therefore, win-shift-win is expected to increase, and lose-shift-lose to decrease, as discrimination learning progresses. Indeed, in the behavioral experiments shown in Figure 1, both indicators changed in a pattern similar to that observed in the lesion experiments (Figure 3), indicating that the changes in the saline-injected rats reflect normal learning rather than an undesired effect of the injection.

      We have added the explanation of the proportions of both strategies in intact rats (page 9, lines 203–204) with a supplementary figure (Figure S2) and accompanying legend (page 56, lines 1173–1177).

      Comment 8: Muscimol experiments - two questions/comments. How often do rats receive muscimol?

      In this section, muscimol is given on day 2 and on days after the animals hit a 60% or 80% success rate. Can the authors provide a mean and SEM for when are those injections?

      The first injection was conducted on Day 2 to target the early stage. The second and third injections were conducted on the days after the success rate had first reached 60% and 80% during training, respectively, to target the middle and late stages. These conditions are described in the Results (page 10, lines 234–237) and Methods (page 26, lines 633–636). The mean and s.e.m. of the injection day at the middle and late stages were not significantly different between the saline- and muscimol-injected groups in the aDLS (see Author response image 2A) or the pVLS (see Author response image 2B).

      Author response image 2.

      Injection days during auditory discrimination learning. Injections with saline (SAL) and muscimol (MUS) into the aDLS (A) or pVLS (B) were performed after the success rate had reached 60% (middle stage) and 80% (late stage) for the first time through the training, respectively (A, Wilcoxon signed rank test, middle, Z = 65, p = 0.772, late, Z = 56.5, p = 0.242 for the aDLS; B, Wilcoxon signed rank test, middle, Z = 39, p = 1.000, late, Z = 43, p = 0.587). Data are indicated as the median and quartiles with the maximal and minimal values. 

      Comment 9: Muscimol experiments. Can the authors comment on the effects on performance vs learning? What happens on the days after Muscimol? Does performance bounce back or is it still impaired?

      We conducted a transient inhibition experiment with muscimol to examine whether the neuronal activity in the striatal subregions is linked with the processes at different stages. In this experiment, to lower the possibility that compensation of learning may occur during a session after the muscimol injection (Day N), we limited the session time to 15 min (45 trials) and evaluated the impact of the injection on the success rate at specific stages. The success rate in the groups injected with muscimol into the aDLS significantly decreased at the middle stage compared to the corresponding saline-injected groups, but not at the early and late stages (Figure 4C), and the rate in the groups injected with muscimol into the pVLS significantly decreased at the late stage compared with the respective saline groups, but not at the early and middle stages (Figure 4D). These results demonstrate that the aDLS and pVLS mainly function at the middle and late stages of the auditory discrimination task, respectively.

      Here we also address Comment 10 regarding the comparison of success rates before (Day N-1) and after (Day N+1) the injections (see Author response image 3). We focused on two injections, into the aDLS at the middle stage and into the pVLS at the late stage, for which the rate was reduced immediately after the muscimol injection on Day N. For these two injections, the success rate showed no significant main effect of group (saline/muscimol) or day (Days N-1/N+1) and no significant group × day interaction. Moreover, the success rate was not significantly higher on Day N+1 than on Day N-1, even in the saline-injected control group, probably because of the limited session time immediately after the injection. Therefore, we consider that it was difficult to define the effects of drug injection on the learning of auditory discrimination in our behavioral protocol for the transient inhibition experiment, and that the reduced rates observed in the muscimol-injected group on Day N likely reflect, at least in part, the effects of muscimol on the performance of discriminative behavior.

      Author response image 3.

      Comparison of success rate between the days before (Day N-1) and after (Day N+1) the injections into striatal subregions. Success rate in the saline (SAL)- and muscimol (MUS)-injected groups into the aDLS (A) or pVLS (B) at the early, middle, and late stages of auditory discrimination learning (two-way repeated-measures ANOVA; early, day, F[1,14] = 5.266, p = 0.038, group, F[1,14] = 0.276, p = 0.608, day × group, F[1,14] = 0.118, p = 0.736; middle, day, F[1,14] = 4.110, p = 0.062, group, F[1,14] = 0.056, p = 0.816, day × group, F[1,14] = 1.150, p = 0.302; late, day, F[1,14] = 6.408, p = 0.024, group, F[1,14] = 0.229, p = 0.640, day × group, F[1,14] = 1.277, p = 0.278 for the aDLS; and early, day, F[1,10] = 0.115, p = 0.746, group, F[1,10] = 2.414, p = 0.151, day × group, F[1,10] = 0.157, p = 0.700; middle, day, F[1,10] = 0.278, p = 0.610, group, F[1,10] = 0.511, p = 0.491, day × group, F[1,10] = 4.144, p = 0.069; late, day, F[1,10] = 0.151, p = 0.705, group, F[1,10] = 0.719, p = 0.416, day × group, F[1,10] = 0.717, p = 0.417 for the pVLS). Data are indicated as the mean ± s.e.m.

      Comment 10: Muscimol data has a pair before and after, can the authors show this comparison at early, middle, and late training? Not just the subtraction.

      The comparison of success rates before and after drug injection is shown in Author response image 3.

      Comment 11: Ephys recordings. These are complex figures and include a large number of acronyms. It would help to define them again and help the reader through these figures so the reader can focus on understanding the finding more than the figure presentation.

      We replaced the abbreviations related to electrophysiological events (CO, CR, RS, and FL) with the original terms, and improved the explanation in the text and figures. 

      Comment 12: Figure 7B/E - on correct trials, they see a difference in the cue response to high tone / low tone but no difference in the choice. This is the one that seemed like a topography issue.

      The transient activity of cue onset-related neurons in the pVLS did not appear at the early stage of learning, but was observed in a learning-dependent manner (Figures 7A and S8E). In addition, the cue onset-HR activity showed a slight but notable difference between the HR and LL trials at the middle and late stages (Figure 7B), whereas there was no difference between activities in the HL and LR incorrect trials at the corresponding stages (Wilcoxon signed rank test; early, p = 0.375, middle, p = 0.931, and late, p = 0.668). These results suggest that the cue onset-related neurons in the pVLS represent the stimulus-response association (task contingency) rather than the topography of tone frequency.

      Comment 13: Animals were normally trained for 60 minutes but on muscimol days only trained for 15 mins. On PET days only trained for 30 minutes. Ephys sessions were 60 mins. Is this correct? Why?

      We determined the session time for each experiment by considering both technical and behavioral aspects. In the initial behavioral experiment, the session time was set to 60 min per day. Under this condition, the rats acquired the discrimination learning within 13 days. In the imaging experiment, sessions without a PET scan were conducted for 60 min, while sessions with a PET scan were carried out for 30 min as described previously (Cui et al, Neuroimage, 2015). This schedule produced a learning curve similar to that of the initial behavioral experiment. In the transient inhibition experiment, sessions without drug injections lasted 60 min. As described in our response to Comment 2, the session immediately after the injection was limited to 15 min to lower the possibility of compensation of learning during the session. In the chronic lesion and electrophysiological experiments, all sessions were conducted for 60 min, as in the initial experiment.

      References

      Mizuma, H., Shukuri, M., Hayashi, T., Watanabe, Y. & Onoe, H. Establishment of in vivo brain imaging method in conscious mice. Journal of Nuclear Medicine 51, 1068-1075 (2010).

      Cui, Y., et al. A voxel-based analysis of brain activity in high-order trigeminal pathway in the rat induced by cortical spreading depression. Neuroimage 108, 17-22 (2015).

      Zimmer, E.R., et al. [18F]FDG PET signal is driven by astroglial glutamate transport. Nature Neuroscience 20, 393-395 (2017).

      Valtcheva, S. & Venance, L. Astrocytes gate Hebbian synaptic plasticity in the striatum. Nature Communications 7, 13845 (2016).

      Gimenez, T.L., Lorenc, M. & Jaramillo, S. Adaptive categorization of sound frequency does not require the auditory cortex in rats. Journal of Neurophysiology 114, 1137-1145 (2015).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      Below I summarize points that should be addressed in a revised version of the manuscript.

      • Page 6, first paragraph: I don't understand why the signals average out to a single state. If the distribution is indeed random, a broad signal with low intensity should be present.

      We agree that this statement may cause confusion. We changed the text (marked in bold) to clarify the statement: The mobility of the undocked SBDs will be higher than the diffusion of the whole complex, allowing the sampling of varying interdomain distances within a single burst. However, these dynamic variations are subsequently averaged to a singular FRET value during FRET calculations for each burst, and may appear as a single low FRET state in the histograms.

      • Page 6, third paragraph: how can the donor only be detected in the acceptor channel? Is this tailing out?

      Donor-only signal is not detected in the acceptor channel. As described on page 5 and in the Materials & Methods section, the dye stoichiometry (S) is defined for each burst/dwell using three types of photon counts: donor-based donor emission (FDD), donor-based acceptor emission (FDA) and acceptor-based acceptor emission (FAA).

      When no acceptor fluorophore is present, FAA = 0 and S = 1.

      Some donor photons bleed through into the acceptor channel, but we correct for this by calculating the leakage and crosstalk factors as described in the Materials and Methods (page 20).
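      As an illustration of how the three photon counts yield the stoichiometry and FRET efficiency of a burst, a minimal sketch follows. It uses the standard apparent (uncorrected) definitions with an optional detection-correction factor gamma, assumed here to be 1; the leakage and direct-excitation corrections described in the Materials and Methods are deliberately omitted, and the function name is hypothetical.

```python
def efficiency_and_stoichiometry(f_dd: float, f_da: float, f_aa: float,
                                 gamma: float = 1.0) -> tuple[float, float]:
    """Apparent FRET efficiency E and stoichiometry S for one burst.

    f_dd: donor emission after donor excitation (FDD)
    f_da: acceptor emission after donor excitation (FDA)
    f_aa: acceptor emission after acceptor excitation (FAA)
    gamma: detection-correction factor (assumed 1; no leakage correction applied)
    """
    donor_excited = gamma * f_dd + f_da
    total = donor_excited + f_aa
    if total <= 0:
        raise ValueError("burst contains no photons")
    stoichiometry = donor_excited / total
    # E is undefined (NaN) when the donor-excitation channels are empty.
    efficiency = f_da / donor_excited if donor_excited > 0 else float("nan")
    return efficiency, stoichiometry


# Donor-only burst: FAA = 0, so S = 1; acceptor-only burst: FDD = FDA = 0, so S = 0.
print(efficiency_and_stoichiometry(100, 0, 0))
print(efficiency_and_stoichiometry(0, 0, 50))
```

      Doubly labeled molecules fall at intermediate S values, which is why donor-only (S≈1) and acceptor-only (S≈0) bursts and dwells separate cleanly from genuine FRET events.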

      We changed the text (marked in bold) in the manuscript to address the question: The FRET data of both OpuA variants are best explained by a four-state model (Figure 2A,B; fourth and fifth panel) (Supplementary File 3). Two of the four states represent donor-only (S≈1) or acceptor-only (S≈0) dwells. The full bursts belonging to donor-only and acceptor-only molecules were excluded prior to mpH2MM. This means that some molecules transit to a donor-only or acceptor-only state within the burst period, which most likely reflects blinking or bleaching of one of the fluorophores. These donor-only and acceptor-only states were also excluded during further analysis. The other two states reflect genuine FRET dwells that were analyzed by mpH2MM. They represent different conformations of the SBDs.

      • Page 7, "SBD dynamics ..": why was the V149Q mutant only analyzed in the K521C background and not also in the N414C background?

      The two FRET states were best distinguished in OpuA-K521C. Therefore, we decided to focus on OpuA-K521C and not OpuA-N414C. OpuA-V149Q was used to show that reduced docking efficiency does not affect the transition rate constants and relative abundances of the two FRET states, and we regarded it sufficient to test the SBD dynamics in OpuA-K521C only.

      • Page 8, second paragraph: why was the N414C mutant analyzed only from 0 - 600 mM and not also up to 1000 mM?

      In line with the previous answer, our main focus was on OpuA-K521C, since the two FRET states were best distinguished in OpuA-K521C. OpuA-N414C was used to prove that similar states are observed when measuring with fluorophores on the opposite site of the SBD. We studied how the FRET states change in response to different conditions that correspond to different stages of the transport cycle and how it changes in response to different ionic strengths. Initially, 600 mM KCl was used to study the dynamics of the SBD at high ionic strength. Later in this study, we tested a very wide range of different salt concentrations for OpuA-K521C to get detailed insights into the dynamics of the SBDs over a wide ionic strength range. Note that 1 M KCl is a very high, non-physiological ionic strength for the typical habitat of L. lactis and was only used to show that the high FRET state occurs even under very extreme conditions.

      • Page 8, third paragraph: why was the dimer (if it is the source of the FRET signal) only partially disrupted?

      We acknowledge that this is a very good point. However, we purposely did not speculate on this point in the manuscript, because we have limited information on the molecular details of the interaction. As we highlight on page 8, the SBDs experience each other at a very high apparent concentration (millimolar range). This means that the interactions are most likely very weak (low affinity) and not very specific. Such interactions are referred to in the literature as the quinary structure of proteins; they occur at the high macromolecular crowding in the cell and in proteins with tethered domains, and thus at high local concentrations. Such interactions can be screened by high ionic strength. In the revised manuscript, we now present the partially disrupted dimer structure in the context of the quinary structure of a protein (page 11):

      In other words, the high FRET state may comprise an ensemble of weakly interacting states rather than a singular stable conformation, resembling the quinary structure of proteins. The quinary structure of proteins is typically revealed in highly crowded cellular environments and describes the weak interactions between protein surfaces that contribute to their stability, function, and spatial organization (Guin & Gruebele, 2019). Although the current study was conducted under dilute conditions, the local concentration of SBDs (~4 mM) mimics a densely populated environment and reveals quinary structure.
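      The order of magnitude of such a local concentration can be checked with a simple back-of-the-envelope model: one partner SBD confined to a sphere whose radius is set by the tether. This is a hypothetical sketch, not the derivation used in the manuscript; the spherical-volume model and the chosen radii (guided by the ~4 nm linkers and ~6 nm SBD dimensions mentioned in this response) are assumptions.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mol


def effective_concentration_mM(radius_nm: float) -> float:
    """Concentration (mM) of a single molecule confined to a sphere of the given radius."""
    if radius_nm <= 0:
        raise ValueError("radius must be positive")
    volume_m3 = (4.0 / 3.0) * math.pi * (radius_nm * 1e-9) ** 3
    volume_l = volume_m3 * 1e3            # 1 m^3 = 1000 L
    molar = 1.0 / (AVOGADRO * volume_l)   # one molecule per volume, in mol/L
    return molar * 1e3                    # mol/L -> mmol/L


for r in (4.0, 5.0, 6.0):
    print(f"r = {r:.0f} nm -> {effective_concentration_mM(r):.1f} mM")
```

      Under these assumptions, radii of 4 to 6 nm give roughly 2 to 6 mM, the same low-millimolar range as the local concentration quoted above, compared with the picomolar concentrations of the smFRET samples themselves.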

      • Page 9, second paragraph: according to the EM data processing, only 20% of the particles were used for 3D reconstruction. Why? Does it mean that the remaining 80% were physiologically not relevant? If so, why were the 20% used relevant?

      We note that it is a fundamental part of image processing of single-particle cryo-EM data to remove false positives or low-resolution particles throughout the processing workflow. In particular when using a very low and therefore generous threshold during automated particle picking, as we did (t=0.01 and t=0.05 for the 50 mM KCl and 100 mM KCl datasets, respectively), the initial set of particles includes a significant amount of false positives, a tradeoff to avoid excluding particles belonging to low-populated classes/orientations. It is thus common that more than 50% of ‘particles’ are excluded in the first rounds of 2D classification. In our case, only 30% and 52% of particles were retained after such first clean-up steps. Subsequently, the particle set is further refined, and additional false positives and low-resolution particles are excluded during extensive rounds of 3D classification. We also note that during the final steps, most of the data excluded represents particles of lower quality that do not contribute to a high-resolution reconstruction, or that belong to low-population protein conformations. This does not mean that such a population is not physiologically relevant. In conclusion, having only 5-20% of the initial automatically picked particles contributing to the reconstruction of the final cryo-EM map is common, with the vast majority of excluded particles being false positives.

      • Page 11, third paragraph: the way the proposed model is selected is also my main criticism. All alternative models do not fit the data. Therefore, the proposed model is suggested. However, I do not grasp any direct support for this model. Either I missed it or it is not presented.

      Concerning the specific model in Figure 5, the reviewer is correct. We do not provide direct evidence for a side-ways interaction. However, we have evidence of transient interactions and our data rule out several scenarios of interaction, leaving 5C as the most likely model. This is also the main conclusion of this paper: In conclusion, the SBDs of OpuA transiently interact in a docking competent conformation, explaining the cooperativity between the SBDs during transport. The conformation of this interaction is not fixed but differs substantially between different conditions.

      Because the interaction is very short-lived, it was not possible to visualize molecular details of this interaction. We present Figure 5 to hypothesize the most likely type of interaction, since many possibilities can be excluded with the vast amount of presented data. To make clear that we discuss models and rule out several possibilities but do not demonstrate a specific interaction between the SBDs, we now write on page 10 (changes marked in bold): We have shown that the SBDs of OpuA come close together in a short-lived state, which is responsive to the addition of glycine betaine (Figure 4A). Although the occurrence of the state varies between different conditions, it was not possible to negate the high-FRET state completely, not even under very high or low KCl concentrations, or in the presence of 50 mM arginine plus 50 mM glutamate (Figure 4A,B). To evaluate possible interdomain interaction scenarios we consider the following: (1) The SBDs of OpuA are connected to the TMDs with very short linkers of approximately 4 nm, which limit their movement and allow the receptor to sample a relatively small volume near its docking site. (2) In low ionic strength conditions, OpuA-K521C displays a high FRET state with mean FRET values of 0.7-0.8, which correspond to inter-dye distances of approximately 4 nm. (3) The high FRET state is responsive to glycine betaine, which points toward direct communication between the two SBDs. (4) The distance between the density centers of the SBDs in the cryo-EM reconstructions (based on particles with a low and high FRET state) is 6 nm, which aligns with the dimensions of an SBD (length: ~6 nm, maximal width: ~4 nm). These findings collectively indicate that two SBDs interact, not necessarily in a singular conformation but possibly as an ensemble of weakly interacting states. Hence, we discuss three possible SBD-SBD interaction models to explain the high-FRET state:

      Reviewer #2 (Recommendations For The Authors):

      In the abstract and elsewhere the authors suggest that the SBDs physically interact with one another, and that this interaction is important for the transport mechanism, specifically for its cooperativity.

      I feel that this main claim is not well established. The authors convincingly demonstrate that the SBDs largely occupy two states relative to one another and that in one of these states, they are closer than in the other. Unless I have missed (or failed to understand) some major details of the results, I did not find any evidence of a physical interaction. Have the authors established that the high FRET state indeed corresponds to the physical engagement of the SBDs? I feel that a direct demonstration of an interaction is much missing.

      Along the same lines, in the low-salt cryo-EM structure, where the SBDs are relatively closer together, the SBDs are still separated and do not interact.

      See also our response to the final comment of reviewer 1. Furthermore, please carefully consider the following: (1) FRET values of 0.7-0.8 correspond to inter-dye distances of approximately 4 nm. (2) The high FRET state is responsive to glycine betaine, which points toward direct communication between the two SBDs. (3) The cryo-EM reconstruction is the average of all the particles in the final dataset, including both the particles with a low and high FRET state. Further, the local resolution of the SBDs in the cryo-EM map is low, indicative of a high degree of flexibility. Thus, a potential interaction is possible within the observed range of flexibility. (4) The distance between the density centers is 6 nm, aligning with the dimensions of an SBD (length: 6 nm, maximal width: 4 nm). These factors collectively indicate SBD interactions, and we now present these points more explicitly in Figure 4 and the last part of the results section (page 9).
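      The conversion from FRET efficiency to inter-dye distance in point (1) follows from inverting the Förster relation E = 1 / (1 + (r/R0)^6). The sketch below illustrates that arithmetic. The Förster radius R0 is dye-pair specific and is not stated in this response; the value of 5.1 nm used here is an assumption for illustration only, and the sketch ignores dye linker flexibility and orientation-factor uncertainty.

```python
def inter_dye_distance(efficiency: float, r0_nm: float) -> float:
    """Invert E = 1 / (1 + (r / R0)**6) to obtain the inter-dye distance r in nm."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("FRET efficiency must lie strictly between 0 and 1")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


R0_NM = 5.1  # assumed Förster radius (nm); not taken from the manuscript

for e in (0.70, 0.78, 0.80):
    print(f"E = {e:.2f} -> r = {inter_dye_distance(e, R0_NM):.1f} nm")
```

      With this assumed R0, efficiencies of 0.7 to 0.8 map to roughly 4.0 to 4.4 nm, consistent with the approximately 4 nm stated above; a different dye pair would shift these values in proportion to its R0.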

      Once the authors successfully demonstrate that direct physical interaction indeed occurs, they will need to provide data that places it in the context of the transport cycle. Do the SBDs swap ligand molecules between them? Do they bind the ligand and/or the transporter cooperatively? What is the role of this interaction?

      We acknowledge the intriguing nature of the posed questions, but they extend beyond the scope of this study. It is extremely challenging to obtain high-resolution structures of highly dynamic multidomain proteins, like OpuA, and to probe transient interactions as we do here for the SBDs of OpuA. We therefore combined cryo-TEM with smFRET studies and performed the most advanced, state-of-the-art analysis, as acknowledged by reviewer 1. We link our observations on the structural dynamics and interactions of the SBDs to a previous study, where we showed that the two SBDs of OpuA interact cooperatively. We do not have further evidence that connects the physical interactions to the transport cycle. In our view, the collective datasets indicate that the physical interactions between the SBDs reported here increase the transport efficiency.

      As far as I understand, the smFRET data have been interpreted on the basis of a negative observation, i.e., that it is "likely" that none of the FRET states corresponds to a docked SBD. To convincingly show this, a positive observation is required, i.e., observation of a docked state.

      The aim of this study was to study interdomain dynamics and not specifically docking. We have previously shown that docking can be visualized via cryo-EM (Sikkema et al., 2020), however the SBDs of OpuA appear to only dock in specific turnover conditions. We now show that the high FRET state of OpuA cannot represent a docked state, but that the SBDs transiently interact (see our response to the first comment). Importantly, a docked state was also not found in the cryo-EM reconstructions at low ionic strength, representing the smFRET conditions where we observe the interactions between the SBDs. The high FRET state occupies 30% of the dwells in this condition, and such a high percentage of molecules would have become apparent during cryo-EM 3D classification in case they would form a docked state. Therefore, we conclude that docking does not occur in low ionic strength apo condition. We discuss this point and our reasoning on page 11 of the revised manuscript.

      In this respect, I find it troubling that in none of the tested conditions, the authors observed a FRET state which corresponds to the docked state. Such a state, which must exist for transport to occur (as mentioned in the authors' previous publications), needs to be demonstrated. This brings me to my next question: why have the authors not measured FRET between the SBDs and the transporter? Isn't this a very important piece that is missing from their puzzle?

      We agree that investigating docking behavior under varied turnover conditions requires focused experiments on FRET dynamics between the SBDs and the transporter. As noted on page 5, OpuA exists as a homodimer, implying that a single cysteine mutation introduces two cysteines in a single functional transporter. To specifically implement a cysteine mutation in only one SBD and one transmembrane domain, it is necessary to artificially construct a heterodimer. We recently published initial attempts in this direction, and this will be a subject for future research but still requires years of work.

      Additionally, I feel that important controls are missing. For example, how will the data presented in Fig1 look if the transporter is labeled with acceptor or donor only? How do soluble SBDs behave?

      In the employed labeling method, donor and acceptor dyes are mixed in a 1:1 ratio and randomly attached to the two cysteines in the transporter. This automatically yields significant fractions of donor-only and acceptor-only transporters, which are always present during the smFRET recordings. We can visualize those molecules on the basis of the dye stoichiometry, which we calculate by using three types of photon counts: donor-based donor emission (FDD), donor-based acceptor emission (FDA) and acceptor-based acceptor emission (FAA).

      Unfiltered plots look as follows (a dataset of OpuA-K521C at 600 mM KCl):

      Author response image 1.

      Donor only and acceptor only molecules have a very well discernible stoichiometry of 1 and 0, respectively. The filtering procedure is described in the materials and methods section, and these plots can be found in the supplementary database. We did not add them to the main text or supplementary materials of the original manuscript, as this is a very common procedure in the field of smFRET. We now include such a dataset in the revised manuscript.

      Soluble SBDs of OpuA have been studied previously (e.g. Wolters et al., 2010 & De Boer et al. 2019). For example, we have shown by SEC-MALLLS that soluble SBDs do not form dimers, which is consistent with our notion that the SBDs interact with low affinity. It is not possible to study interdomain dynamics between soluble SBDs by smFRET, because the measurements are carried out at picomolar concentrations (monomeric conditions). We emphasize that smFRET measurements with native complexes, with SBDs near each other at apparent millimolar concentrations, are physiologically more relevant.

      Additional comments:

      (1) "It could well be that cooperativity and transient interactions between SBDs is more common than previously anticipated" and a similar statement in the abstract. What evidence is there to suggest that the transient interactions between SBDs are a common phenomenon?

      On page 11, we write: Dimer formation of SBPs has been described for a variety of proteins from different structural clusters of substrate-binding proteins [33–38,51–53]. We cite 9 papers that report SBD/SBP dimers. This suggests to us that the phenomenon of interacting substrate-binding proteins could be more common. Moreover, the concentration of maltose-binding protein and other SBPs in the periplasm of Gram-negative bacteria can reach (sub)millimolar concentrations, and low-affinity interactions may play a role not only in membrane protein-tethered SBDs (like in OpuA) but may also be important in soluble substrate receptors. Such low-affinity interactions are rarely studied in biochemical experiments.

      (2) I think that the data presented in 1B-C better suits the supplementary information.

      Figure 1B-D is already a summary of the supplementary information that describes the optimization of OpuA purification. We think it is valuable to show this part of the figure in the main text. A very clean and highly pure OpuA sample is essential for smFRET experiments. Quality of protein preparations and data analysis are key for the type of measurements we report in this paper.

      (3) "the first peak in the SEC profile corresponds...." The peaks should be numbered in the figure to facilitate their identification.

      We have changed the figure as suggested.

      (4) "smFRET is a powerful tool for studying protein dynamics, but it has only been used for a handful of membrane proteins". With the growing list of membrane proteins studied by smFRET I find this an overstatement.

      We removed this sentence in the new version of the manuscript.

      (5) "We rationalized that docking of one SBD could induce a distance shift between the two SBDs in the FRET range of 3-10 nm (Figure 1E)" How and why was this assumed?

      We realize that this is one of the sentences that caused confusion about the aim of this study. In this part of the manuscript, we should not have used docking as an example and we apologize for that. We replaced the sentence by: These variants are used to study inter-SBD dynamics in the FRET range of 3-10 nm (Figure 1E).

      Also Figure 1E was adjusted to prevent confusion:

      Author response image 2.

      In addition, to avoid any confusion we changed the following sentence on page 4 (changes marked in bold): We designed cysteine mutations in the SBD of OpuA to study interdomain dynamics in the full-length transporter.

      (6) "However, the FRET distributions are broader than would be expected from a single FRET state, especially for OpuA-K521C" Have the authors established how a single state FRET of OpuA looks? Is there a control that supports this claim?

      Below we compare two datasets from OpuA-K521C in 600 mM KCl with a typical smFRET dataset from the well-studied substrate-binding protein MBP from E. coli, which resides in a single state. Left: OpuA-K521C; Right: MBP

      Author response image 3.

      We agree that this cannot be assumed from the presented data. Therefore, we rewrote this sentence: However, the FRET distributions tail towards higher FRET values, especially for OpuA-K521C.

      (7) "V149Q was designed as a mild mutation that would reduce docking efficiency and thereby substrate loading, but leave the intrinsic transport and ATP hydrolysis efficiency intact." I find this statement confusing: How can a mutation reduce docking efficiency yet leave the transport activity unchanged?

      We rewrote the sentences (changes marked in bold): V149Q was designed as a mild mutation that would reduce docking efficiency and thereby substrate loading, but leave the ionic strength sensing in the NBD and the binding of glycine betaine and ATP intact. Accordingly, a reduced docking efficiency should result in a lower absolute glycine betaine-dependent ATPase activity. At the same time the responsiveness of the system to varying KCl, glycine betaine, or Mg-ATP concentrations should not change.

      (8) Along the same lines: "whereas the glycine betaine-, Mg-ATP-, or KCl-dependent activity profiles remain unchanged" vs. "OpuA-V149Q-K521C exhibited a 2- to 3-fold reduction in glycine betaine-dependent ATPase activity".

      See comment at point 7.

      (9) In general, I find the writing wanting at places, not on par with the high standards set by previous publications of this group.

      We recognize the potential ambiguity in our phrasing. We hope that after incorporating the feedback provided by the reviewers our manuscript will convey our findings in a clearer manner.

      Extra changes to the text:

      (1) Title changed: The substrate-binding domains of the osmoregulatory ABC importer OpuA physically transiently interact

      (2) Second part of the abstract changed: We now show, by means of solution-based single-molecule FRET and analysis with multi-parameter photon-by-photon hidden Markov modeling, that the SBDs transiently interact in an ionic strength-dependent manner. The smFRET data are in accordance with the apparent cooperativity in transport and supported by new cryo-EM data of OpuA. We propose that the physical interactions between SBDs and cooperativity in substrate delivery are part of the transport mechanism.

      (3) Page 6, third paragraph and Figure 2B: the wrong rate constant was extracted from Table 1. We corrected this in the text and figure (112 s-1 to 173 s-1). This did not affect any of the interpretations or conclusions.

      (4) Page 8, last paragraph, changed: smFRET was also performed in the absence of KCl and with a saturating concentration of glycine betaine (100 µM). The mean FRET efficiency of the high-FRET state of OpuA-K521C increased to 0.78, which corresponds to an inter-dye distance of about 4 nm. This indicates that the dyes at the two SBDs move very close towards each other (Figure 4A) (Table 1) (Supplementary File 34).

      (5) Page 9, second paragraph changed: Due to the inherent flexibility of the SBDs, with respect to both the MSP protein of the nanodisc and the TMDs of OpuA, their resolution is limited. Furthermore, the cryo-EM reconstructions average all the particles in the final dataset, including those with a low and high FRET state. Nevertheless, in both conditions, the densities that correspond to the SBDs can be observed in close proximity (Figure 4D). The distance between the density centers is 6 nm, which aligns with the dimensions of an SBD, providing further evidence for physical interactions between the SBDs.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The association of vitamin D supplementation in reducing Asthma risk is well studied, although the mechanistic basis for this remains unanswered. In the presented study, Kilic and co-authors aim to dissect the pathway of Vitamin D mediated amelioration of allergic airway inflammation. They use initial leads from bioinformatic approaches, which they then associate with results from a clinical trial (VDAART) and then validate them using experimental approaches in murine models. The authors identify a role of VDR in inducing the expression of the key regulator Ikzf3, which possibly suppresses the IL-2/STAT5 axis, consequently blunting the Th2 response and mitigating allergic airway inflammation.

      Strengths:

      The major strength of the paper lies in its interdisciplinary approach, right from hypothesis generation, and linkage with clinical data, as well as in the use of extensive ex vivo experiments and in vivo approaches using knock-out mice.

      The study presents some interesting findings including an inducible baseline absence/minimal expression of VDR in lymphocytes, which could have physiological implications and needs to be explored in future studies.

      Weaknesses:

      The core message of the study relies on the role of vitamin D and its receptor in suppressing the Th2 response. However, there is scope for further dissection of relevant pathophysiological parameters in the in vivo experiments, which would enable stronger translation to allergic airway diseases like Asthma.

      To a large extent, the authors have been successful in validating their results, although a few inferences could be reinforced with additional techniques, or emphasised in the discussion section (possibly utilising the ideas and speculative section offered by the journal).

      The study inferences also need to be read in the context of the different sub-phenotypes and endotypes of Asthma, where the Th2 response may not be predominant. Moreover, the authors have referenced vitamin D doses for the murine models from the VDAART trials and performed the experiments in the second generation of animals. While this is appreciated, the risk of hypervitaminosis-D cannot be ignored, in view of its lipid solubility. Possibly comparison and justification of the doses used in murine experiments from previous literature, as well as the incorporation of an emphasised discussion about the side effects and toxicity of Vitamin D, is an important aspect to consider.

      In no way do the above considerations undermine the importance of this elegant study which justifies trials for vitamin D supplementation and its effects on Asthma. The work possesses tremendous potential.

      We thank the reviewer for their careful assessment of our paper and helpful suggestions. Please find the point-by-point responses to the reviewer recommendations below.

      Reviewer #2 (Public Review):

      Summary:

      This study seeks to advance our knowledge of how vitamin D may be protective in allergic airway disease in both adult and neonatal mouse models. The rationale and starting point are important human clinical, genetic/bioinformatic data, with a proposed role for vitamin D regulation of 2 human chromosomal loci (Chr17q12-21.1 and Chr17q21.2) linked to the risk of immune-mediated/inflammatory disease. The authors have made significant contributions to this work specifically in airway disease/asthma. They link these data to propose a role for vitamin D in regulating IL-2 in Th2 cells implicating genes associated with these loci in this process.

      Strengths:

      Here the authors draw together evidence from multiple lines of investigation to propose that amongst murine CD4+ T cell populations, Th2 cells express high levels of VDR, and that vitamin D regulates many of the genes on the chromosomal loci identified to be of interest, in these cells. The bottom line is the proposal that vitamin D, via Ikzf3/Aiolos, suppresses IL-2 signalling in Th2 cells. This is a novel concept and whilst the availability of IL-2 and the control of IL-2 signalling is generally thought to play a role in the capacity of vitamin D to modulate both effector and especially regulatory T cell populations, this study provides new data.

      Weaknesses:

      Overall, this is a highly complicated paper with numerous strands of investigation, methodologies etc. It is not "easy" reading to follow the logic between each series of experiments and also frequently fine detail of many of the experimental systems used (too numerous to list), which will likely frustrate immunologists interested in this. There is already extensive scientific literature on many aspects of the work presented, much of which is not acknowledged and largely ignored. For example, reports on the effects of vitamin D on Th2 cells are highly contradictory, especially in vitro, even though most studies agree that in vivo effects are largely protective. Similarly, other reports on adult and neonatal models of vitamin D and modulation of allergic airway disease are not referenced. In summary, the data presentation is unwieldy, with numerous supplementary additions, which makes the data difficult to evaluate and the central message lost. Whilst there are novel data of interest to the vitamin D and wider community, this manuscript would benefit from editing to make it much more readily accessible to the reader.

      Wider impact: Strategies to target the IL-2 pathway have long been considered and there is a wealth of knowledge here in autoimmune disease, transplantation, GvHD etc - with some great messages pertinent to the current study. This includes the use of IL-2, including low dose IL-2 to boost Treg but not effector T cell populations, to engineered molecules to target IL-2/IL-2R.

      We thank the reviewer for their careful assessment of our paper and helpful suggestions. Please find the point-by-point responses to the reviewer recommendations below. In addition, we have revisited the Introduction and Discussion, added additional subsection headings, and provided additional schematics to make the general flow of the paper more accessible to a wider audience.

      Reviewer #1 (Recommendations For The Authors):

      There are certain aspects of the manuscript which could be revisited in order to provide more clarity to the reader. Some of these are:

      1. In vivo experiments: The major inference and its impact is derived from the effect of VDR on Ikzf3 expression, and consequently on the Th2 response. While the study employs both in vivo and ex vivo approaches to validate this claim, pathophysiological aspects could have been explored in more detail, by using cytokine panels, possibly techniques to measure airway resistance, as well as by reducing the variations in the sample sizes used in different groups. Similarly, certain inferences from ex vivo studies may be important to demonstrate in the in vivo setting as well. A justification for the incorporation of both Balb/c and C57 Bl6 mice for the experiments could also be incorporated in the manuscript.

      2. Certain sections, especially those connecting VDR, Ikzf1/3 and IL2/STAT axis seem associative. This is indicated by Figure 5 H as well, where the effects of calcitriol administration in KO cells indicate additional pathways at play, possibly through indirect effects. The use of additional techniques like ChIP, co-IP and establishing STAT induction/activation would probably strengthen the findings, alternatively, a clear distinction between the speculative and the definitive results could be made in the discussion section, as the journal encourages. Similar considerations could be made for VDR and Ikzf3.

      3. Role of other cells:

      a. While the investigators have explored the phenotype on other cell types like Th1 and Treg, at places there remains a lacuna. For instance, the absence of neutrophil fractions from the DLC-BAL, as well as inconsistencies in the groups selected for comparison. For example, in Figure 3 Supplementary Figure 2, the figure suggests IL13 expression in CD4+ cells, yet the text reads incubated Th2 cells. This could be made more lucid.

      b. In Figure 3 Supplementary Figure 1 there is a trend towards an increase in IL-10 levels, whereas in Supplementary Figure 2 there is a drop in the IL13 level in the VDR KO group, which has not been explained.

      c. While 17q loci form the predominant loci associated with Asthma, other loci important in Asthma on chromosomes 2,6,9, 22 could be discussed in the manuscript as well, even if they can't be explored in depth.

      4. Quantification of histology and confocal images could provide an objective assessment to the readers. Possibly incorporation of co-localisation panels for the IF images showing membrane/cytoplasmic/nuclear localisation of the VDR under various conditions.

      5. Structure of the manuscript: At places the manuscript has a disrupted flow, as well as mislabelled figures (Figure 2SF1B is 1C, Fig 2c is 2b in the results). Flow gates can be arranged sequentially and consistent labelling of the gates and axis would ease interpretation. In some places sample sizes mentioned do not match the dot graphs in the figures (figure 3K-L). In the same figure and others (Figure 5 Supplementary Figure 2), a comparison of all groups would be beneficial. A restructuring of the results and corrections could assist the reader. Also, a visualization of the VDAART analysis in the main figures, corroborating with the results sections would do justice to the interesting approach and findings. The clearances and approvals for the study also need to be incorporated into the manuscript. If possible, the incorporation of a schematic showing the proposed pathway for VDR-induced Ikzf3 and subsequent suppression of the genes present on Chr 17 loci to mitigate allergic airway inflammation would help.

      Reviewer #2 (Recommendations For The Authors):

      A few specific points: A number of immune concepts are studied without reference to the broader literature and the data presented on occasion counter these earlier findings. Examples of this include:

      • Vitamin D can both enhance and inhibit IL-13 synthesis, demonstrated both in vitro and ex vivo, and these effects are clearly context-specific. I am not questioning the validity of the present experimental findings in this specific experimental model; the problem is that the experimental context is not discussed.

      • Short-term bulk Th2 cultures are used with no indication of their enrichment for lineage-specific markers or cytokine - their conclusions might be enhanced by this. Data on genes/markers of interest could be further enhanced by showing FACS plots of co-expression e.g. Th2 genes e.g. IL-13/GATA3 with these other markers.

      • Are human Th2 enriched for VDR, since the backdrop to this study is human clinical and genetic data? For a study that has based its rationale on human clinical/genetic studies it would be great to confirm these findings in human Th2 cells.

      • The Discussion might comment on some of these wider issues.

      • Minor typos throughout, including in figure legends

      Reviewer #1

      1. The study inferences also need to be read in the context of the different sub-phenotypes and endotypes of Asthma, where the Th2 response may not be predominant.

      We agree that asthma has many sub-phenotypes and endotypes and that the Th2 response may not be predominant in all of them, but we focus here on the origins of the disease in the first few years of life and the genetic and molecular mechanisms associated with disease onset, where the Th2 response is important.

      1. Moreover, the authors have referenced vitamin D doses for the murine models from the VDAART trials and performed the experiments in the second generation of animals. While this is appreciated, the risk of hypervitaminosis-D cannot be ignored, in view of its lipid solubility. Possibly comparison and justification of the doses used in murine experiments from previous literature, as well as the incorporation of an emphasized discussion about the side effects and toxicity of Vitamin D, is an important aspect to consider.

      We appreciate this comment from the reviewers allowing us to review vitamin D toxicity in more detail. Given the length of this review, we did not include this in the manuscript discussion but provide it here.

      Vitamin D supplementation in humans is debated due to the possibility of intoxication from overdose. Vitamin D intoxication is a rare medical condition associated with hypercalcemia, hyperphosphatemia, and suppressed parathyroid hormone levels and is typically seen in patients receiving very high doses of vitamin D, ranging from 50,000 to 1 million IU/d for several months to years 1,2. Intoxication observed at lower doses might be attributable to rare genetic disorders 1. By far the bigger problem in humans is vitamin D deficiency; this is especially true in pregnant women, whose dosage requirements are high due to the needs of the fetus. It is estimated that virtually all pregnant women are vitamin D insufficient or deficient 3. VDAART has shown that vitamin D at a dose of 4400 IU/d given to pregnant women can prevent asthma in their offspring. There were no adverse side effects in the mother or the infant from this dose 4.

      In rodents, a few studies have reported vitamin D intoxication at very high doses 5 (PMID: 23405058; 50,000 IU/kg for 120 d, toxic in females). In contrast, several studies using 2–2.5 times higher doses of vitamin D than we use here do not report adverse events in mouse models of disease 6,7. Our doses of vitamin D are identical to those used in VDAART and are lower than those used in any of these other rodent studies. In addition, while we did not specifically assess signs of vitamin D intoxication, we can exclude any impact on animal well-being, health, reproduction, and behavior throughout the study.

      1. The major inference and its impact are derived from the effect of VDR on Ikzf3 expression, and consequently on the Th2 response. While the study employs both in vivo and ex vivo approaches to validate this claim, pathophysiological aspects could have been explored in more detail, by using cytokine panels, possibly techniques to measure airway resistance, as well as by reducing the variations in the sample sizes used in different groups.

      We have added the following sentence to the discussion: “Additional cytokine measurements in the mice as well as measurement of airway resistance would have added to the pathophysiological data linking IKZF3 expression to the Th2 response.”

      1. Similarly, certain inferences from ex vivo studies may be important to demonstrate in the in vivo setting as well. A justification for the incorporation of both Balb/c and C57 Bl6 mice for the experiments could also be incorporated in the manuscript.

      We agree with the reviewers that ex vivo results may require in vivo confirmation. We have added a sentence explaining the rationale for use of both Balb/c and C57BL/6 mice in the results section “Vitamin D suppresses the activation of the IL-2/Stat5 pathway and cytokine production in Th2 cells”: “To ensure that the above findings were not restricted to the C57BL/6 mouse strain, the inverse experiment was performed in Balb/c mice. This mouse strain is commonly used for type 2 driven inflammation.”

      1. Certain sections, especially those connecting VDR, Ikzf1/3 and IL2/STAT axis seem associative. This is indicated by Figure 5 H as well, where the effects of calcitriol administration in KO cells indicate additional pathways at play, possibly through indirect effects.

      We appreciate this comment. The RNA-Seq results showed an overrepresentation of the IL-2/STAT5 pathway in vitamin D-deficient Th2 cells compared to those under vitamin D supplementation. We further show the induction of IKZF3 expression with calcitriol stimulation. High IKZF3 expression is known to suppress IL-2 expression. Lack of IKZF3 diminishes the suppressive activity of calcitriol on IL-2 expression. However, as pointed out by the reviewer, Figure 5 H implicates additional pathways regulated by calcitriol for the suppression of IL-2, and we note this in the text.

      1. The use of additional techniques like ChIP, co-IP and establishing STAT induction/activation would probably strengthen the findings, alternatively, a clear distinction between the speculative and the definitive results could be made in the discussion section, as the journal encourages. Similar considerations could be made for VDR and Ikzf3.

      We have added the following sentences to the discussion: “We have focused here on establishing the relationship between VDR binding and IKZF3 activation or repression and subsequent ORMDL3 and Il2 activation. Additional use of ChIP or co-IP to establish STAT induction and activation would have been of potential value.”

      1. Role of other cells: a. While the investigators have explored the phenotype on other cell types like Th1 and Treg, at places there remains a lacuna. For instance, the absence of neutrophil fractions from the DLC BAL, as well as inconsistencies in the groups selected for comparison. For example, in Figure 3 Supplementary Figure 2, the figure suggests IL13 expression in CD4+ cells, yet the text reads incubated Th2 cells. This could be made more lucid.

      We appreciate this comment and would like to clarify. Neutrophil numbers were assessed in the presented in vivo models and showed no differences in neutrophil number due to genotype or vitamin D diet. We added the graphs to the supplement in Figure 3 - figure supplement 1A and Figure 5 - figure supplement 1B and refer to the figures in the main text. All in vivo data were analyzed by Mixed-effect ANOVA analysis or Two-way ANOVA test with Holm-Šidák’s post-hoc analysis (factors: genotype & exposure). To keep the plots clear, we incorporated only the statistic for the groups of interest.
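      The Holm-Šidák post-hoc step mentioned above is a step-down correction for multiple comparisons. The sketch below shows only that adjustment applied to a list of raw p-values; it does not reproduce the ANOVA itself, and it is not the statistical software the authors used, which is not named here.

```python
from typing import List


def holm_sidak(p_values: List[float]) -> List[float]:
    """Holm-Šidák step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Visit hypotheses from the smallest to the largest raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The rank-th smallest p-value (0-based) is corrected for the m - rank hypotheses still in play.
        p_sidak = 1.0 - (1.0 - p_values[idx]) ** (m - rank)
        # Step-down monotonicity: an adjusted p-value may not be smaller than any earlier one.
        running_max = max(running_max, p_sidak)
        adjusted[idx] = min(running_max, 1.0)
    return adjusted


if __name__ == "__main__":
    print(holm_sidak([0.01, 0.04, 0.03]))
```

      Compared with a plain Šidák correction, the step-down procedure relaxes the threshold for each successive comparison, so it rejects at least as many hypotheses while still controlling the family-wise error rate under independence.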

      1. b) In Figure 3 Supplementary Figure 1 there is a trend towards an increase in IL-10 levels, whereas in Supplementary Figure 2 there is a drop in the IL13 level in the VDR KO group, which has not been explained.

      We apologize for any confusion. Figure 3 supplementary Figure 1 shows cytokine positive CD4+ T cells isolated from saline and HDM exposed mouse lungs. These data were analyzed with a Mixed-effect ANOVA analysis or Two-way ANOVA test with Holm-Šidák’s post-hoc analysis (factors: genotype & exposure) and were not found significant. Figure 3 supplementary Figure 2 shows IL-13 levels in the system of in vitro polarization of naïve CD4+ T cells into Th2 cells. The difference between this result and the findings in Figure 3H is the in vivo setting in which additional factors such as IL-4 can aggravate the immune response.

      1. c) While 17q loci form the predominant loci associated with Asthma, other loci important in Asthma on chromosomes 2,6,9, 22 could be discussed in the manuscript as well, even if they can't be explored in depth.

      This is an excellent comment. Our preliminary results confirm that three asthma susceptibility loci: 2q12.1 (IL1RL1), 6p21.32 (HLA-DQA1/B1/A2/B2) and 22q12.3 (IL2RB) each have VDR and IKZF3 binding sites either in enhancers predicted by GeneHancer to target these genes or within these genes themselves. In particular, we found (i) VDR binding sites within IL18RAP and in the enhancer region GH02J102301 targeting IL1RL1, and IKZF3 binding sites within IL1RL1; (ii) VDR binding sites in the enhancer regions GH06J032940 and GH06J031813 targeting HLA-DQA2, and IKZF3 binding sites within HLA-DQA1; (iii) VDR and IKZF3 binding sites within IL2RB. In contrast, the region 9p24.1 (IL33) has no documented VDR or IKZF3 binding sites within IL33 or in the promoter regions targeting IL33. Investigating these additional genetic loci further, using the integrative approach taken here with 17q12-21, is beyond the scope of this current manuscript but based on these preliminary results, would be a worthwhile scientific endeavor.

      1. Quantification of histology and confocal images could provide an objective assessment to the readers. Possibly incorporation of co-localisation panels for the IF images showing membrane/cytoplasmic/nuclear localisation of the VDR under various conditions.

      We agree that quantification of histology and confocal images could provide an overview of VDR expression in the lungs. Given existing knowledge of VDR expression in a variety of cell types, including structural cells in the lungs, and the focus of this manuscript on CD4+ T cells, we concentrated on determining VDR expression in CD4+ T cells isolated from saline- and HDM-exposed lungs in the mouse models studied (Figure 2 C; Figure 2 - figure supplement 1 B & C; Figure 3 C; Figure 5 - figure supplement 1) as well as in vitro (Figure 2 - figure supplement 2; Figure 5 - figure supplement 2).

      1. Structure of the manuscript: At places the manuscript has a disrupted flow, as well as mislabeled figures (Figure 2SF1B is 1C, Fig 2c is 2b in the results). Flow gates can be arranged sequentially and consistent labelling of the gates and axis would ease interpretation.

      We appreciate this comment and have corrected the mislabeled figures and tried to improve the flow.

      1. In some places sample sizes mentioned do not match the dot graphs in the figures (figure 3K-L). In the same figure and others (Figure 5 Supplementary Figure 2), a comparison of all groups would be beneficial.

      We appreciate this comment and have checked the sample sizes. Each of these experiments compared two groups and these two groups were compared statistically. We corrected the sample size for Figure 5 Supplementary Figure 2 C in the manuscript.

      1. A restructuring of the results and corrections, could assist the reader.

      We have restructured both the results and the discussion, incorporating the changes noted here in the response to the reviewers, to make the flow of the manuscript easier to read.

      1. Also, a visualization of the VDAART analysis in the main figures, corroborating with the results sections would do justice to the interesting approach and findings.

      We have now added the below schematic to Figure 1-figure supplement 1C to summarize the analyses conducted on the VDAART data.

      Author response image 1.

      1. The clearances and approvals for the study also need to be incorporated into the manuscript.

      These were in the checklist and have been moved to the main text of the manuscript.

      1. If possible, the incorporation of a schematic showing the proposed pathway for VDR induced Ikzf3 and subsequent suppression of the genes present on Chr 17 loci to mitigate allergic airway inflammation would help.

      We have a figure for this (below) that we have incorporated into the manuscript as Figure 5 - figure supplement 3:

      Author response image 2.

      Cartoon Summarizing Vitamin D molecular genetics at 17q12-21

      Reviewer #2

      1. A few specific points: A number of immune concepts are studied without reference to the broader literature and the data presented on occasion counter these earlier findings. Examples of this include:

      a. Vitamin D can both enhance and inhibit IL-13 synthesis, demonstrated both in vitro and ex vivo, and these effects are clearly context-specific. I am not questioning the validity of the present experimental findings in this specific experimental model; the problem is that the experimental context is not discussed.

      We thank the reviewer for this comment. We have now included a sentence in the discussion section mentioning the contradictory results. It reads as follows:

      “We acknowledge that the impact of vitamin D on Th2 biology is conflicting in the literature. While several groups report Th2-promoting activity, we, and others, show inhibition of type 2 cytokine production 8–11. These discrepancies could be due to the model system studied, e.g., PBMC and purified CD4+ T cells, or the dose of vitamin D or the mouse strain.”

      b. Short-term bulk Th2 cultures are used with no indication of their enrichment for lineage specific markers or cytokine – their conclusions might be enhanced by this. Data on genes/markers of interest could be further enhanced by showing FACS plots of co-expression e.g., Th2 genes e.g., IL-13/GATA3 with these other markers.

      We appreciate this comment. The in vitro culture system used for Th2 cell differentiation has been well described in the literature. As shown in Figure 3 - figure supplement 2, Figure 4 E, and Figure 5 - figure supplement 2 D & E, the lineage-specific cytokine IL-13 is detectable at high levels.

      c. Are human Th2 cells enriched for VDR, since the backdrop to this study is human clinical and genetic data? For a study that has based its rationale on human clinical/genetic studies it would be great to confirm these findings in human Th2 cells.

      We appreciate this comment and are curious to explore this in future research. The VDAART trial is a double-blinded multicenter trial in which an immediate processing of the blood samples and an enrichment of different immune cell populations was not feasible. Other publicly available data sets report gene expression derived from mixed and peripheral (blood) cells and not local (lung) tissues. Published in vitro studies on human Th2 cells do not report VDR expression in comparison to other Th subsets, which would allow the assessment of enrichment.

      1. The Discussion might comment on some of these wider issues.

      We have rewritten the discussion to incorporate many of the issues raised in this review.

      1. Minor typos throughout, including in figure legends.

      We have edited all of the figure legends.

      References

      1. Holick, M. F. Vitamin D Is Not as Toxic as Was Once Thought: A Historical and an Up-to-Date Perspective. Mayo Clinic proceedings 90, 561–564; 10.1016/j.mayocp.2015.03.015 (2015).

      2. Hossein-nezhad, A. & Holick, M. F. Vitamin D for health: a global perspective. Mayo Clinic proceedings 88, 720–755; 10.1016/j.mayocp.2013.05.011 (2013).

      3. Hollis, B. W. & Wagner, C. L. New insights into the vitamin D requirements during pregnancy. Bone research 5, 17030; 10.1038/boneres.2017.30 (2017).

      4. Litonjua, A. A. et al. Effect of Prenatal Supplementation With Vitamin D on Asthma or Recurrent Wheezing in Offspring by Age 3 Years: The VDAART Randomized Clinical Trial. JAMA 315, 362–370; 10.1001/jama.2015.18589 (2016).

      5. Gianforcaro, A., Solomon, J. A. & Hamadeh, M. J. Vitamin D(3) at 50x AI attenuates the decline in paw grip endurance, but not disease outcomes, in the G93A mouse model of ALS, and is toxic in females. PloS one 8, e30243; 10.1371/journal.pone.0030243 (2013).

      6. Landel, V., Millet, P., Baranger, K., Loriod, B. & Féron, F. Vitamin D interacts with Esr1 and Igf1 to regulate molecular pathways relevant to Alzheimer's disease. Molecular neurodegeneration 11, 22; 10.1186/s13024-016-0087-2 (2016).

      7. Agrawal, T., Gupta, G. K. & Agrawal, D. K. Vitamin D supplementation reduces airway hyperresponsiveness and allergic airway inflammation in a murine model. Clinical and experimental allergy : journal of the British Society for Allergy and Clinical Immunology 43, 672–683; 10.1111/cea.12102 (2013).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors aim to address a critical challenge in the field of bioinformatics: the accurate and efficient identification of protein binding sites from sequences. Their work seeks to overcome the limitations of current methods, which largely depend on multiple sequence alignments or experimental protein structures, by introducing GPSite, a multi-task network designed to predict binding residues of various molecules on proteins using ESMFold.

      Strengths:

      • Benchmarking. The authors provide a comprehensive benchmark against multiple methods, showcasing the performances of a large number of methods in various scenarios.

      • Accessibility and Ease of Use. GPSite is highlighted as a freely accessible tool with user-friendly features on its website, enhancing its potential for widespread adoption in the research community.

      RE: We thank the reviewer for acknowledging the contributions and strengths of our work!

      Weaknesses:

      • Lack of Novelty. The method primarily combines existing approaches and lacks significant technical innovation. This raises concerns about the original contribution of the work in terms of methodological development. Moreover, the paper reproduces results and analyses already presented in previous literature, without providing novel analysis or interpretation. This further diminishes the contribution of this paper to advancing knowledge in the field.

      RE: The novelty of this work is primarily manifested in four key aspects. Firstly, although we have employed several existing tools such as ProtTrans and ESMFold to extract sequence features and predict protein conformations, these techniques had rarely been explored in the field of binding site prediction. We have successfully demonstrated the feasibility of substituting multiple sequence alignments with language model embeddings and training with predicted structures, providing a new solution to overcome the limitations of current methods for genome-wide applications. Secondly, though a few methods tend to capture geometric information based on protein surfaces or atom graphs, surface calculation and property mapping are usually time-consuming, while message passing on full atom graphs is memory-consuming and thus challenging for long sequences. Besides, these methods are sensitive to details and errors in the predicted structures. To facilitate large-scale annotations, we have innovatively applied geometric deep learning to protein residue graphs for comprehensively capturing backbone and sidechain geometric contexts in an efficient and effective manner (Figure 1). Thirdly, we have not only exploited multi-task learning to integrate diverse ligands and enhance performance, but also shown its capability to easily extend to the binding site prediction of other unseen ligands (Figure 4 D-E). Last but not least, as a “Tools and Resources” article, we have provided a fast, accurate and user-friendly webserver, as well as constructed a large annotation database for the sequences in Swiss-Prot. Leveraging this database, we have conducted extensive analyses on the associations between binding sites and molecular functions, biological processes, and disease-causing mutations (Figure 5), indicating the potential of our tool to unveil unexplored biology underlying genomic data.

      We have now revised the descriptions in the “The geometry-aware protein binding site predictor (GPSite)” section to highlight the novelty of our work in a clearer manner:

      “In conclusion, GPSite is distinguished from the previous approaches in four key aspects. First, profiting from the effectiveness and low computational cost of ProtTrans and ESMFold, GPSite is liberated from the reliance on MSA and native structures, thus enabling genome-wide binding site prediction. Second, unlike methods that only explore the Cα models of proteins 25,40, GPSite exploits a comprehensive geometric featurizer to fully refine knowledge in the backbone and sidechain atoms. Third, the employed message propagation on residue graphs is global structure-aware and time-efficient compared to the methods based on surface point clouds 21,22, and memory-efficient unlike methods based on full atom graphs 23,24. Residue-based message passing is also less sensitive towards errors in the predicted structures. Last but not least, instead of predicting binding sites for a single molecule type or learning binding patterns separately for different molecules, GPSite applies multi-task learning to better model the latent relationships among different binding partners.”
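      To make the residue-graph idea concrete, the sketch below builds a k-nearest-neighbour graph over Cα coordinates and runs one round of mean-aggregation message passing. It is a simplified illustration under stated assumptions, not GPSite's architecture: the kNN construction, the choice of k, and plain averaging are hypothetical stand-ins for GPSite's geometric featurizer and learned message functions. It shows why cost scales with the number of residues rather than atoms.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def knn_residue_graph(ca_coords: Sequence[Coord], k: int) -> List[List[int]]:
    """Hypothetical helper: for each residue, return the indices of its k nearest
    residues by C-alpha distance (self excluded; ties broken by residue index)."""
    n = len(ca_coords)
    neighbors = []
    for i in range(n):
        dists = sorted(
            (math.dist(ca_coords[i], ca_coords[j]), j) for j in range(n) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def mean_message_passing(
    features: List[List[float]], neighbors: List[List[int]]
) -> List[List[float]]:
    """One propagation step: each residue's new feature vector is the element-wise
    average of its own vector and those of its graph neighbours."""
    updated = []
    for i, nbrs in enumerate(neighbors):
        group = [features[i]] + [features[j] for j in nbrs]
        updated.append([sum(vals) / len(group) for vals in zip(*group)])
    return updated
```

      With four residues on a line at x = 0, 1, 2 and 10 Å and k = 1, residue 3 is linked only to residue 2, so one step pulls its feature toward residue 2's value while leaving the distant geometry untouched.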

      • Benchmark Discrepancies. The benchmark results vary, especially between the initial comparisons and those with PeSTo. GPSite achieves a PR AUC of 0.484 on the global benchmark but a PR AUC of 0.61 on the benchmark against PeSTo. For consistency, PeSTo should be included in the benchmark against all other methods. This suggests potential issues with the benchmark set or the stability of the method. This inconsistency needs to be addressed to validate the reliability of the results.

      RE: We thank the reviewer for the constructive comments. Since our performance comparison experiments involved numerous competitive methods whose training sets are disparate, it was difficult to compare or rank all these methods fairly using a single test set. Given the substantial overlap between our protein-binding site test set and the training set of PeSTo, we meticulously re-split our entire protein-protein binding site dataset to generate a new test set that avoids any overlap with the training sets of both GPSite and PeSTo and performed a separate evaluation, where GPSite achieves a higher AUPR than PeSTo (0.610 against 0.433). This is quite common in this field. For instance, in the study of PeSTo (Nat Commun 2023), the comparisons of PeSTo with MaSIF-site, SPPIDER, and PSIVER were conducted using one test set, while the comparison with ScanNet was performed on a separate test set.
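      The overlap-removal step described above can be sketched as a simple filter. This is an illustration only, assuming that the maximum sequence identity of each test protein to each training set has already been computed with an alignment tool; the function name and the dictionary inputs are hypothetical, and the 30% cut-off is the one used in the note below.

```python
from typing import Dict, List


def filter_non_redundant(
    test_ids: List[str],
    identity_maps: List[Dict[str, float]],
    threshold: float = 0.30,
) -> List[str]:
    """Keep test proteins whose highest identity to every training set is below threshold.

    identity_maps holds one {protein_id: max identity to that training set} dict per
    method (e.g. one for GPSite's training set and one for PeSTo's). A protein with no
    recorded hit in a map is treated as having identity 0.0 to that set.
    """
    kept = []
    for pid in test_ids:
        worst = max((m.get(pid, 0.0) for m in identity_maps), default=0.0)
        if worst < threshold:
            kept.append(pid)
    return kept
```

      Taking the maximum over all training sets is what makes the resulting test set fair to every method at once.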

      Based on the reviewer’s suggestion, we have now replaced this experiment with a direct comparison with PeSTo using the datasets from PeSTo, in order to enhance the completeness and convincingness of our results. The corresponding descriptions are now added in Appendix 1-note 2, and the results are added in Appendix 2-table 4. For convenience, we also attach the note and table here:

      “Since 340 out of 375 proteins in our protein-protein binding site test set share > 30% identity with the training sequences of PeSTo, we performed a separate comparison between GPSite and PeSTo using the training and test datasets from PeSTo. By simply re-training with the same hyperparameters, GPSite achieves better performance than PeSTo (AUPR of 0.824 against 0.797), as shown in Appendix 2-table 4. Furthermore, when ESMFold-predicted structures are used as input, the performance of PeSTo decreases substantially (AUPR of 0.691), making the advantage of our method even more pronounced. As in 24, the performance of ScanNet is also included (AUPR of 0.720); it is likewise clearly outperformed by GPSite.”

      Author response table 1.

      Performance comparison of GPSite with ScanNet and PeSTo on the protein-protein binding site test set from PeSTo 24

      Note: The results of ScanNet and PeSTo are taken directly from 24. PeSTo* denotes evaluation using the ESMFold-predicted structures as input. The metrics provided are the median AUPR, median AUC and median MCC. The best/second-best results are indicated by bold/underlined fonts.
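      Because the table reports medians of per-protein metrics, the sketch below shows one way such values could be computed. It is an assumption-laden illustration: AUPR is estimated here with the common step-wise average-precision formula, and MCC uses a hypothetical 0.5 score threshold; the exact estimator and threshold behind the reported numbers are not stated in this note.

```python
import math
import statistics
from typing import Callable, List, Sequence, Tuple


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPR estimate: mean precision at each true binding residue,
    ranking residues by descending score (ties keep input order)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AUPR is undefined for a protein with no binding residues")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            precisions.append(tp / rank)
    return sum(precisions) / n_pos


def mcc(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Matthews correlation coefficient after binarising scores at an assumed threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        pred = s >= threshold
        if pred and y == 1:
            tp += 1
        elif pred:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def median_metric(
    per_protein: List[Tuple[Sequence[int], Sequence[float]]],
    metric: Callable[[Sequence[int], Sequence[float]], float],
) -> float:
    """Median of a per-protein metric over a test set of (labels, scores) pairs."""
    return statistics.median([metric(y, s) for y, s in per_protein])
```

      Reporting medians rather than pooled values keeps a few very long or atypical proteins from dominating the summary.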

      • Interface Definition Ambiguity. There is a lack of clarity in defining the interface for the binding site predictions. Different methods are trained using varying criteria (surfaces in MaSIF-site, distance thresholds in ScanNet). The authors do not adequately address how GPSite's definition aligns with or differs from these standards and how this issue was addressed. It could indicate that the comparison of those methods is unreliable and unfair.

      RE: We thank the reviewer for the comments. The precise definition of ligand-binding sites is given in the “Benchmark datasets” section. Specifically, the datasets of DNA, RNA, peptide, ATP, HEM and metal ions used to train GPSite were collected from the widely acknowledged BioLiP database [PMID: 23087378]. In BioLiP, a residue is defined as binding if the smallest atomic distance between the target residue and the ligand is < 0.5 Å plus the sum of the van der Waals radii of the two nearest atoms. Meanwhile, most comparative methods for these ligands were also trained on data from BioLiP, thereby ensuring fair comparisons.
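      The BioLiP criterion above translates into a short check: find the closest residue-ligand atom pair, then compare that distance with the pair's summed van der Waals radii plus 0.5 Å. The sketch below is illustrative only; the radii are assumed Bondi values, and BioLiP's own radius table may differ. The function name and atom representation are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# (element symbol, (x, y, z) in Å)
Atom = Tuple[str, Tuple[float, float, float]]

# Assumed van der Waals radii in Å (Bondi values); BioLiP's table may differ.
VDW_RADII: Dict[str, float] = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "P": 1.80, "S": 1.80}


def is_binding_residue(
    residue_atoms: List[Atom], ligand_atoms: List[Atom], tolerance: float = 0.5
) -> bool:
    """True if the nearest residue-ligand atom pair lies closer than the sum of
    that pair's vdW radii plus the tolerance (0.5 Å in the BioLiP definition)."""
    d_min, elem_res, elem_lig = min(
        (math.dist(xyz_r, xyz_l), er, el)
        for er, xyz_r in residue_atoms
        for el, xyz_l in ligand_atoms
    )
    return d_min < VDW_RADII[elem_res] + VDW_RADII[elem_lig] + tolerance
```

      For an oxygen 3.5 Å from a carbon, the cut-off is 1.52 + 1.70 + 0.5 = 3.72 Å, so the residue counts as binding; at 4.0 Å it would not.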

      However, since BioLiP does not include data on protein-protein binding sites, studies for protein-protein binding site prediction may adopt slightly distinct label definitions, as the reviewer suggested. Here, we employed the protein-protein binding site data from our previous study [PMID: 34498061], where a protein-binding residue was defined as a surface residue (relative solvent accessibility > 5%) that lost more than 1 Å² absolute solvent accessibility after protein-protein complex formation. This definition was initially introduced in PSIVER [PMID: 20529890] and widely applied in various studies (e.g., PMID: 31593229, PMID: 32840562). SPPIDER [PMID: 17152079] and MaSIF-site [PMID: 31819266] have also adopted similar surface-based definitions as PSIVER. On the other hand, ScanNet [PMID: 35637310] employed an atom distance threshold of 4 Å to define contacts while PeSTo [PMID: 37072397] used a threshold of 5 Å. However, it is noteworthy that current methods in this field including ScanNet (Nat Methods 2022) and PeSTo (Nat Commun 2023) directly compared methods using different label definitions without any alignment in their benchmark studies, likely due to the subtle distinctions among these definitions. For instance, the study of PeSTo directly performed comparisons with ScanNet, MaSIF-site, SPPIDER, and PSIVER. Therefore, we followed these previous works, directly comparing GPSite with other protein-protein binding site predictors.
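      For comparison with the distance-based BioLiP rule, the PSIVER-style surface definition can be sketched as follows. This assumes per-residue absolute solvent accessibilities have already been computed (for example with DSSP) for the unbound chain and the complex, that relative accessibility is taken in the unbound state, and that a reference maximum accessibility is supplied per residue; the function name is hypothetical.

```python
from typing import List


def psiver_labels(
    asa_unbound: List[float],
    asa_complex: List[float],
    max_asa: List[float],
    rsa_cutoff: float = 0.05,
    delta_cutoff: float = 1.0,
) -> List[int]:
    """Label 1 for residues that are on the surface (RSA > 5%) and lose more than
    1 Å² of absolute solvent accessibility when the complex forms."""
    labels = []
    for unbound, bound, ref in zip(asa_unbound, asa_complex, max_asa):
        is_surface = unbound / ref > rsa_cutoff
        buried_on_binding = (unbound - bound) > delta_cutoff
        labels.append(int(is_surface and buried_on_binding))
    return labels
```

      The example inputs in the test show the two ways a residue can fail: being essentially buried already (RSA 1.5%) or losing too little area (0.5 Å²).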

      In the revised “Benchmark datasets” section, we have now provided more details for the binding site definitions in different datasets to avoid any potential ambiguity:

      “The benchmark datasets for evaluating binding site predictions of DNA, RNA, peptide, ATP, and HEM are constructed from BioLiP”; “A binding residue is defined if the smallest atomic distance between the target residue and the ligand is < 0.5 Å plus the sum of the van der Waals radii of the two nearest atoms”; “Besides, the benchmark dataset of protein-protein binding sites is directly from 26, which contains non-redundant transient heterodimeric protein complexes dated up to May 2021. Surface regions that become solvent inaccessible on complex formation are defined as the ground truth protein-binding sites. The benchmark datasets of metal ion (Zn²⁺, Ca²⁺, Mg²⁺ and Mn²⁺) binding sites are directly from 18, which contain non-redundant proteins dated up to December 2021 from BioLiP.”

      While GPSite demonstrates the potential to surpass state-of-the-art methods in protein binding site prediction, the evidence supporting these claims seems incomplete. The lack of methodological novelty and the unresolved questions in benchmark consistency and interface definition somewhat undermine the confidence in the results. Therefore, it's not entirely clear if the authors have fully achieved their aims as outlined.

      The work is useful for the field, especially in disease mechanism elucidation and novel drug design. The availability of genome-scale binding residue annotations GPSite offers is a significant advancement. However, the utility of this tool could be hampered by the aforementioned weaknesses unless they are adequately addressed.

      RE: We thank the reviewer for acknowledging the advancement and value of our work, as well as for pointing out areas where improvements can be made. As discussed above, we have now carried out the corresponding revisions in the revised manuscript to enhance the completeness and clarity of our work.

      Reviewer #2 (Public Review):

      Summary:

      This work provides a new framework, "GPSite", to predict DNA, RNA, peptide, protein, ATP, HEM, and metal ion binding sites on proteins. This framework comes with a webserver and a database of annotations. The core of the model is a geometric featurizer neural network that predicts the binding sites of a protein. One major contribution of the authors is that they feed this neural network with predicted structures from ESMFold for training and prediction (instead of native structures as in similar works) and a high-quality protein language model representation. The other major contribution is that it provides the public with a new lightweight framework to predict protein-ligand interactions for a broad range of ligands.

      The authors have demonstrated the value of their framework mainly with two techniques: ablation and benchmark.

      Strengths:

      • The performance of this framework as well as the provided dataset and web server make it useful to conduct studies.

      • The ablations of some core elements of the method, such as the protein Language Model part, or the input structure are very insightful and can help convince the reader that every part of the framework is necessary. This could also guide further developments in the field. As such, the presentation of this part of the work can hold a more critical place in this work.

      RE: We thank the reviewer for recognizing the contributions of our work and for noting that our experiments are thorough.

      Weaknesses:

      • Overall, we can acknowledge the important effort of the authors to compare their work to other similar frameworks. Yet, the lack of homogeneity of training methods and data from one work to the other makes the comparison slightly unconvincing, as the authors pointed out. Overall, the paper puts significant effort into convincing the reader that the method is beating the state of the art. Maybe, there are other aspects that could be more interesting to insist on (usability, interest in protein engineering, and theoretical works).

      RE: We sincerely appreciate the reviewer for the constructive and insightful comments. As to the concern of training data heterogeneity raised by the reviewer, it is noteworthy that current studies in this field, such as ScanNet (Nat Methods 2022) and PeSTo (Nat Commun 2023), directly compare methods trained on different datasets in their benchmark experiments. Therefore, we have adhered to the paradigm in these previous works. According to the detailed recommendations by the reviewer, we have now improved our manuscript by incorporating additional ablation studies regarding the effects of training procedure and language model representations, as well as case studies regarding the predicted structure’s quality and GPSite-based function annotations. We have also refined the Discussion section to focus more on the achievements of this work. A comprehensive point-by-point response to the reviewer’s recommendations is provided below.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      Overall, I think the work is slightly disserved by its presentation. Some improvements could be made to the paper to better highlight the significance of your contribution.

      RE: We thank the reviewer for recognizing the significance of our work!

      • Line 188: "As expected, the performance of these methods mostly decreases substantially utilizing predicted structures for testing because they were trained with high-quality native structures."

      This is a major ablation that was not performed in this case. You used predicted structures for training, while the others did not. A better way to assess the value of this approach would be to compare against a network trained only with native structures, quantifying the gain from using predicted structures, as you did later to assess other aspects of your method such as single-task versus multi-task learning.

      RE: We thank the reviewer for the valuable recommendation. We have now assessed the benefit of training with predicted instead of native structures, which brings an average AUPR increase of 4.2% as detailed in Appendix 1-note 5 and Appendix 2-table 9. For convenience, we also attach the note and table here:

      “We examined the performance under different training and evaluation settings as shown in Appendix 2-table 9. As expected, the model yields exceptional performance (average AUPR of 0.656) when trained and evaluated using native structures. However, if this model is fed with predicted structures of the test proteins, the performance substantially declines to an average AUPR of 0.573. This trend aligns with the observations for other structure-based methods as illustrated in Figure 2. More importantly, in the practical scenario where only predicted structures are available for the target proteins, training the model with predicted structures (i.e., GPSite) results in superior performance than training the model with native structures (average AUPR of 0.594 against 0.573), probably owing to the consistency between the training and testing data. For completeness, the results in Appendix 3-figure 2 are also included where GPSite is tested with native structures (average AUPR of 0.637).”

      Author response table 2.

      Performance comparison on the ten binding site test sets under different training and evaluation settings

      Note: The numbers in this table are AUPR values. “Pep” and “Pro” denote peptide and protein, respectively. “Avg” means the average AUPR values among the ten test sets. “native” and “predicted” denote applying native and predicted structures as input, respectively.

      • Line 263: "ProtTrans consistently obtains competitive or superior performance compared to the MSA profiles, particularly for the target proteins with few homologous sequences (Neff < 2)."

      This seems a bit far-fetched. We see clearly in the figure that the performance is far superior for Neff < 2, but it seems rather similar for higher Neff. Could the authors evaluate the significance of the improvement numerically? MSA profiles outperform GPSite on 4 intervals, and I don't know the distribution of the data.

      RE: We thank the reviewer for the valuable suggestion. We have now revised this sentence to avoid any potential ambiguity:

      “As evidenced in Figure 4B and Appendix 2-table 8, ProtTrans consistently obtains competitive or superior performance compared to the MSA profile. Notably, for the target proteins with few homologous sequences (Neff < 2), ProtTrans surpasses the MSA profile significantly, with an improvement of 3.9% in AUC (P-value = 4.3×10⁻⁸).”

      The detailed significance tests and data distribution are now added in Appendix 2-table 8 and attached below as Author response-table 3 for convenience:

      Author response table 3.

      Performance comparison between GPSite and the baseline model using MSA profile for proteins with different Neff values in the combined test set of the ten ligands

      Note: Significance tests are performed following the procedure in 12,25. If P-value < 0.05, the difference between the performance is considered statistically significant.

      • Line 285: "We first visualized the distributions of residues in this dataset using t-SNE, where the residues are encoded by raw feature vectors encompassing ProtTrans embeddings and DSSP structural properties, or latent embedding vectors from the shared network of GPSite. "

      Wouldn't embeddings from single-task training be more relevant to show the value of multi-task training here? Is the difference that big when comparing embeddings from single-task training to embeddings from multi-task training? Otherwise, I think the evidence from Figure 4E is sufficient: the value of multi-task learning could be well shown by single-task vs. multi-task AUPR and a few examples of predictions that are improved.

      RE: We thank the reviewer for the comment. In the second paragraph of the “The effects of protein features and model designs” section, we have compared the performance of multi-task and single-task learning. However, the visualization results in Figure 4D are related to the third paragraph, where we conducted a downstream exploration of the possibility to extend GPSite to other unseen ligands. This is based on the hypothesis that the shared network in GPSite may have captured certain common ligand-binding mechanisms during the preceding multi-task training process. We visualized the distributions of residues in an unseen carbohydrate-binding site dataset using t-SNE, where the residues are encoded by raw feature vectors (ProtTrans and DSSP), or latent embedding vectors from the shared network trained before. Although the shared network has not been specifically trained on the carbohydrate dataset, the latent representations from GPSite effectively improve the discriminability between the binding and non-binding residues as shown in Figure 4D. This finding indicates that the shared network trained on the initial set of ten molecule types has captured common binding mechanisms and may be applied to other unseen ligands.

      We have now added more descriptions in this paragraph to avoid potential ambiguity:

      “Residues that are conserved during evolution, exposed to solvent, or inside a pocket-shaped domain are inclined to participate in ligand binding. During the preceding multi-task training process, the shared network in GPSite should have learned to capture such common binding mechanisms. Here we show how GPSite can be easily extended to the binding site prediction for other unseen ligands by adopting the pre-trained shared network as a feature extractor. We considered a carbohydrate-binding site dataset from 54 which contains 100 proteins for training and 49 for testing. We first visualized the distributions of residues in this dataset using t-SNE 55, where the residues are encoded by raw feature vectors encompassing ProtTrans embeddings and DSSP structural properties, or latent embedding vectors from the shared network of GPSite trained on the ten molecule types previously.”

      • Line291: "Employing these informative hidden embeddings as input features to train a simple MLP exhibits remarkable performance with an AUC of 0.881 (Figure 4E), higher than that of training a single-task version of GPSite from scratch (AUC of 0.853) or other state-of-the-art methods such as MTDsite and SPRINT-CBH."

      Is it necessary to introduce other methods here? The single-task vs multi-task seems enough for what you want to show?

      RE: We thank the reviewer for the comment. As discussed above, here we aim to show the potential of GPSite for the binding site prediction of unseen ligand (i.e., carbohydrate) by adopting the pre-trained shared network as a feature extractor. Thus, we think it’s reasonable to also include the performance of other state-of-the-art methods in this carbohydrate benchmark dataset as baselines.

      • Line 321: "Specifically, a protein-level binding score can be generated for each ligand by averaging the top k predicted scores among all residues. Empirically, we set k to 5 for metal ions and 10 for other ligands, considering that the binding interfaces of metal ions are usually smaller."

      Since binding sites are usually not localized on one single amino-acid, we can expect that most of the top k residues are localized around the same area of the protein both spatially and along the sequence. Is it something you observe and could consider in your method?

      RE: We thank the reviewer for the comment. We employed a straightforward method (top-k average) to convert GPSite’s residue-level annotations into protein-level annotations, where k was set empirically based on the distributions of the numbers of binding residues per sequence observed in the training set. We have not put much effort into optimizing this strategy since it mainly serves as a proof-of-concept experiment (Figure 5 A-C) to show the potential of GPSite in discriminating ligand-binding proteins. We have now revised this sentence to better explain how we selected k:

      “Specifically, a protein-level binding score indicating the overall binding propensity to a specific ligand can be generated by averaging the top k predicted scores among all residues. Empirically, we set k to 5 for metal ions and 10 for other ligands, considering the distributions of the numbers of binding residues per sequence observed in the training set.”
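      The top-k averaging rule quoted above is simple enough to state directly in code. In this sketch the ligand name strings and the function name are hypothetical labels; k = 5 for metal ions and k = 10 otherwise follow the text, and proteins with fewer than k residues are assumed to average over all of their residues.

```python
from typing import Sequence

# Hypothetical ligand labels for the four metal ions in the benchmark.
METAL_IONS = {"Zn2+", "Ca2+", "Mg2+", "Mn2+"}


def protein_level_score(residue_scores: Sequence[float], ligand: str) -> float:
    """Average of the top-k residue-level scores: k = 5 for metal ions, 10 otherwise.
    Shorter proteins are averaged over all residues."""
    if not residue_scores:
        raise ValueError("empty protein")
    k = 5 if ligand in METAL_IONS else 10
    top = sorted(residue_scores, reverse=True)[:k]
    return sum(top) / len(top)
```

      Averaging a handful of top residues, rather than taking the single maximum, makes the protein-level score less sensitive to one spurious high-scoring residue.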

      As for the question raised by the reviewer, we can indeed expect that most of the top k predicted binding residues tend to cluster into several, but not necessarily one, areas. For instance, certain macromolecules like DNA may interact with several protein surface patches due to their elongated structures (e.g., Author response-figure 1A). Another case may be a protein binding to multiple molecules of the same ligand type (e.g., Author response-figure 1B).

      Author response image 1.

      The structures of 4XQK (A) and 4KYW (B) in PDB.

      • Line 327: "The accuracy of the GPSite protein-level binding scores is further validated by the ROC curves in Figure 5B, where GPSite achieves satisfactory AUC values for all ligands except protein (AUC of 0.608)."

      This may be a good place to compare yourself with others: do other frameworks experience the same problem? If so, AUC and AUPR may not be informative here; could you report some recall scores, for example?

      RE: We thank the reviewer for the valuable recommendation. We have conducted comprehensive method comparisons in the preceding “GPSite outperforms state-of-the-art methods” section, where GPSite surpasses all existing frameworks across various ligands. Here, the genome-wide analyses of Swiss-Prot in Figure 5 serve as a downstream demonstration of GPSite’s capacity for large-scale annotations. We did not compare with other methods since most of them are time-consuming or memory-consuming, and are thus unable to process sequences of substantial quantity or length. For example, it takes about 8 min for the MSA-based method GraphBind to annotate a protein with 500 residues, while it takes only about 20 s for GPSite (see Appendix 3-figure 1 for a detailed runtime comparison). It is also challenging for the atom-graph-based method PeSTo to process structures larger than 100 kDa (~1000 residues) on a 32 GB GPU, as its authors noted, while GPSite can easily process structures containing up to 2500 residues on a 16 GB GPU.

      Regarding the recall score mentioned by the reviewer, GPSite achieves a recall of 0.95 (threshold = 0.5) for identifying protein-binding proteins. This indicates that GPSite can accurately identify positive samples, but it also tends to misclassify negative samples as positive. In our original manuscript, we claimed that “This may be ascribed to the fact that protein-protein interactions are ubiquitous in living organisms while the Swiss-Prot function annotations are incomplete”. To better support this claim, we have now added two examples in Appendix 1-note 7, where GPSite confidently predicted the presence of the “protein binding” function (GO:0005515). Notably, this function was absent for these two proteins in the Swiss-Prot database at the time of manuscript preparation (release: 2023-05-03), but has been included in the latest release of Swiss-Prot (release: 2023-11-08). For convenience, we also attach the note here:

      “As depicted in Figure 5A, GPSite assigns relatively high prediction scores to the proteins without the “protein binding” function in the Swiss-Prot annotations, leading to a modest AUC value of 0.608 (Figure 5B). This may be ascribed to the fact that protein-protein interactions are ubiquitous in living organisms while the Swiss-Prot function annotations are incomplete. To support this hypothesis, we present two proteins as case studies, both sharing < 20% sequence identity with the protein-binding training set of GPSite. The first case is Aminodeoxychorismate synthase component 2 from Escherichia coli (UniProt ID: P00903). GPSite confidently predicted this protein as a protein-binding protein with a high prediction score of 0.936. Notably, this protein was not annotated with the “protein binding” function (GO:0005515) or any of its GO child terms in the Swiss-Prot database at the time of manuscript preparation (https://rest.uniprot.org/unisave/P00903?format=txt&versions=171, release: 2023-05-03). However, in the latest release of Swiss-Prot (https://rest.uniprot.org/unisave/P00903?format=txt&versions=174, release: 2023-11-08) during manuscript revision, this protein is annotated with the “protein heterodimerization activity” function (GO:0046982), which is a child term of “protein binding”. In fact, the heterodimerization activity of this protein was experimentally validated in 1996 (PMID: 8679677), indicating the potential incompleteness of the Swiss-Prot annotations. The other case is Hydrogenase-2 operon protein HybE from Escherichia coli (UniProt ID: P0AAN1), which was also predicted as a protein-binding protein by GPSite (score = 0.909). Similarly, this protein was not annotated with the “protein binding” function in the Swiss-Prot database at the time of manuscript preparation (https://rest.uniprot.org/unisave/P0AAN1?format=txt&versions=108). However, in the latest release of Swiss-Prot (https://rest.uniprot.org/unisave/P0AAN1?format=txt&versions=111), this protein is annotated with the “preprotein binding” function (GO:0070678), which is a child term of “protein binding”. In fact, the preprotein binding function of this protein was experimentally validated in 2003 (PMID: 12914940). These cases demonstrate the effectiveness of GPSite for completing the missing function annotations in Swiss-Prot.”
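      The version-to-version check in the note above can be reproduced by fetching two UniSave versions of an entry and testing their GO cross-references. The sketch below is an illustration under assumptions: it reuses the UniSave URL pattern cited in the note, assumes the response is the UniProt flat-file text with GO terms on "DR   GO;" lines, and hard-codes only the three GO IDs mentioned here, which is an incomplete stand-in for walking all descendants of GO:0005515. The helper names are hypothetical.

```python
import re
import urllib.request
from typing import Set

# Illustrative subset only: "protein binding" and the two child terms cited above.
# A complete check would traverse every is_a descendant of GO:0005515.
PROTEIN_BINDING_TERMS = {"GO:0005515", "GO:0046982", "GO:0070678"}

# GO cross-references in the UniProt flat file, e.g. "DR   GO; GO:0005515; F:protein binding; ..."
GO_LINE = re.compile(r"^DR   GO; (GO:\d{7});", re.MULTILINE)


def go_terms(flat_text: str) -> Set[str]:
    """Collect GO IDs from the DR lines of a flat-file entry."""
    return set(GO_LINE.findall(flat_text))


def has_protein_binding(flat_text: str) -> bool:
    return bool(go_terms(flat_text) & PROTEIN_BINDING_TERMS)


def fetch_unisave_entry(accession: str, version: int) -> str:
    """Download one archived entry version (requires network access)."""
    url = f"https://rest.uniprot.org/unisave/{accession}?format=txt&versions={version}"
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")
```

      For example, comparing has_protein_binding(fetch_unisave_entry("P00903", 171)) with the same call for version 174 would test whether the annotation changed between the two releases.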

      • Line 381: 'Despite the noteworthy advancements achieved by GPSite, there remains scope for further improvements. Given that the ESM Metagenomic Atlas 34 provides 772 million predicted protein structures along with pre-computed language model embeddings, self-supervised learning can be employed to train a GPSite model for predicting masked sequence and structure attributes, or maximizing the similarity between the learned representations of substructures from identical proteins while minimizing the similarity between those from different proteins using a contrastive loss function training from scratch. Additional opportunities for upgrade exist within the network architecture. For example, a variational Expectation-Maximization (EM) framework 58 can be adopted to handle the hierarchical graph structure inherent in proteins, which contains the top view of the residue graph and the bottom view of the atom graph inside a residue. Such an EM procedure enables training two separate graph neural networks for the two views while simultaneously allowing interaction and mutual enhancement between the two modules. Meta-learning could also be explored in this multi-task scenario, which allows fast adaptation to unseen tasks with limited labels.'

      I think this does not belong here. It feels like half of your discussion is not talking about the achievements of this paper but future very specific directions. Focus on the take-home arguments (performances of the model, ability to predict a large range of tasks, interest in key components of your model, easy use) of the paper and possible future direction but without being so specific.

      RE: We thank the reviewer for the valuable suggestion. We have now simplified the discussions on the future directions notably:

      “Despite the noteworthy advancements achieved by GPSite, there remains scope for further improvements. GPSite may be improved by pre-training on the abundant predicted structures in ESM Metagenomic Atlas, and then fine-tuning on binding site datasets. Besides, the hidden embeddings from ESMFold may also serve as informative protein representations. Additional opportunities for upgrade exist within the network architecture. For example, a variational Expectation-Maximization framework can be adopted to handle the hierarchical atom-to-residue graph structure inherent in proteins. Meta-learning could also be explored in this multi-task scenario, which allows fast adaptation to unseen tasks with limited labels.”

      • Overall, there is also a lack of displayed structures. You should try to select a few examples of binding sites that were identified correctly by your method and not by others and, if possible, get some insights on why. Some negative examples could also be useful to give a better idea of the method's strengths and limitations.

      RE: We thank the reviewer for the valuable recommendation. We have performed a case study for the structure of the glucocorticoid receptor in Figure 3 D-H to illustrate a potential reason for the robustness of GPSite. Moreover, we have now added a case study in Appendix 1-note 3 and Appendix 3-figure 5 to explain why GPSite sometimes is not as accurate as the state-of-the-art structure-based method. For convenience, we also attach the note and figure here:

      “Here we present an example of an RNA-binding protein, i.e., the ribosome biogenesis protein ERB1 (PDB: 7R6Q, chain m), to illustrate the impact of predicted structure’s quality. As shown in Appendix 3-figure 5, ERB1 is an integral component of a large multimer structure comprising protein and RNA chains (i.e., the state E2 nucleolar 60S ribosome biogenesis intermediate). Likely due to the neglect of interactions from other protein chains, ESMFold fails to predict the correct conformation of the ERB1 chain (TM-score = 0.24). Using this incorrect predicted structure, GPSite achieves an AUPR of 0.580, lower than GraphBind input with the native structure (AUPR = 0.636). However, the performance of GraphBind substantially declines to an AUPR of 0.468 when employing the predicted structure as input. Moreover, if GPSite adopts the native structure for prediction, a notable performance boost can be obtained (AUPR = 0.681).”

      Author response image 2.

      The prediction results of GPSite and GraphBind for the ribosome biogenesis protein ERB1. (A) The state E2 nucleolar 60S ribosome biogenesis intermediate (PDB: 7R6Q). The ribosome biogenesis protein ERB1 (chain m) is highlighted in blue, while other protein chains are colored in gray. The RNA chains are shown in orange. (B) The RNA-binding sites on ERB1 (colored in red). (C) The ESMFold-predicted structure of ERB1 (TM-score = 0.24). The RNA-binding sites are also mapped onto this predicted structure (colored in red). (D-G) The prediction results of GPSite and GraphBind for the predicted and native ERB1 structures. The confidence of the predictions is represented with a gradient of color from blue for non-binding to red for binding.

      Minor comments:

      • Line 169: "Note that since our test sets may partly overlap with the training sets of these methods, the results reported here should be the upper limits for the existing methods."

      Yes, but they were potentially not trained on the most recent structures in that case. These methods could also see improved performance with an updated training set.

      RE: We thank the reviewer for the comment. We have now deleted this sentence.

      • Line176: "Since 358 of the 375 proteins in our protein-binding site test set share > 30% identity with the training sequences of PeSTo, we re-split our protein-binding dataset to generate a test set of 65 proteins sharing < 30% identity with the training set of PeSTo for a fair evaluation."

      Too specific to be here in my opinion.

      RE: We thank the reviewer for the comment. We have now moved these details to Appendix 1-note 2. The description in the main text here is now more concise:

      “Given the substantial overlap between our protein-binding site test set and the training set of PeSTo, we conducted separate training and comparison using the datasets of PeSTo, where GPSite still demonstrates a remarkable improvement over PeSTo (Appendix 1-note 2).”

      • Figure 2. The authors should try to either increase Fig A's size or increase the font size. This could probably be done by compressing the size of Figure C into a single figure.

      RE: We thank the reviewer for the suggestion. We have now increased the font size in Figure 2A. Besides, the figures in the final version of the manuscript should be clearer, as we will be able to upload SVG files.

      • Have you tried using embeddings from more structure-aware pLMs, such as ESMFold embeddings (fine-tuned) or ProstTrans (which may be more recent than this study)?

      RE: We thank the reviewer for the insightful comment. We have not yet explored embeddings from structure-aware pLMs, but we acknowledge their potential as a promising avenue for future investigation. We have now added this point in our Discussion section:

      “Besides, the hidden embeddings from ESMFold may also serve as informative protein representations.”

      Reviewer #3 (Public Review):

      Summary

      The authors of this work aim to address the challenge of accurately and efficiently identifying protein binding sites from sequences. They recognize the limitations of current methods, including the reliance on multiple sequence alignments or experimental protein structures and the under-explored geometry of the structure, which limit performance and genome-scale applications. The authors have developed a multi-task network called GPSite that predicts binding residues for a range of biologically relevant molecules, including DNA, RNA, peptides, proteins, ATP, HEM, and metal ions, using a combination of sequence embeddings from protein language models and ESMFold-predicted structures. Their approach attempts to extract residual and relational geometric contexts in an end-to-end manner, surpassing current sequence-based and structure-based methods.

      Strengths

      • The GPSite model's ability to predict binding sites for a wide variety of molecules, including DNA, RNA, peptides, and various metal ions.

      • Based on the presented results, GPSite outperforms state-of-the-art methods in several benchmark datasets.

      • GPSite adopts predicted structures instead of native structures as input, enabling the model to be applied to a wider range of scenarios where native structures are rare.

      • The authors emphasize the low computational cost of GPSite, which enables rapid genome-scale binding residue annotations, indicating the model's potential for large-scale applications.

      RE: We thank the reviewer for recognizing the significance and value of our work!

      Weaknesses

      • One major advantage of GPSite, as claimed by the authors, is its efficiency. Although the manuscript mentioned that the inference takes about 5 hours for all datasets, it remains unclear how much improvement GPSite can offer compared with existing methods. A more detailed benchmark comparison of running time against other methods is recommended (including the running time of different components, since some methods like GPSite use predicted structures while some use native structures).

      RE: We thank the reviewer for the valuable suggestion. Empirically, existing MSA-based methods take about 5-20 min to make predictions for a protein with 500 residues, while GPSite takes only about 1 min (including structure prediction). However, it is worth noting that some predictors in our benchmark study are available solely as webservers, and it is challenging to compare the runtime of a standalone program with that of a webserver due to the disparity in hardware configurations. Therefore, we have now included comprehensive runtime comparisons between the GPSite webserver and other top-performing servers in Appendix 3-figure 1 to illustrate the practicality and efficiency of our method. For convenience, we also attach the figure here as Author response image 3. The corresponding description is now added in the “GPSite outperforms state-of-the-art methods” section:

      “Moreover, GPSite is computationally efficient, achieving comparable or faster prediction speed compared to other top-performing methods (Appendix 3-figure 1).”

      Author response image 3.

      Runtime comparison of the GPSite webserver with other top-performing servers. Five protein chains (i.e., 8HN4_B, 8USJ_A, 8C1U_A, 8K3V_A and 8EXO_A) comprising 100, 300, 500, 700, and 900 residues, respectively, were selected for testing, and the average runtime is reported for each method. Note that a significant portion of GPSite’s runtime (75 s, indicated in orange) is allocated to structure prediction using ESMFold.

      • Since the model uses predicted protein structure, the authors have conducted some studies on the effect of the predicted structure's quality. However, only the 0.7 threshold was used. A more comprehensive analysis with several different thresholds is recommended.

      RE: We thank the reviewer for the comment. We assessed the effect of the predicted structure's quality by evaluating GPSite’s performance on high-quality (TM-score > 0.7) and low-quality (TM-score ≤ 0.7) predicted structures. We did not employ multiple thresholds (e.g., 0.3, 0.5, and 0.7), as the majority of proteins in the test sets were accurately predicted by ESMFold. Specifically, as shown in Figure 3B, Appendix 3-figure 3 and Appendix 2-table 5, the numbers of proteins with TM-score ≤ 0.7 are small in most datasets (e.g., 42 for DNA and 17 for ATP). Consequently, except for the RNA test set, there are insufficient data for analyses with lower thresholds. Notably, Figure 3C presents a detailed inspection of the 104 proteins with TM-score < 0.5 in the RNA test set. Within this subset, GPSite consistently outperforms the state-of-the-art structure-based method GraphBind with predicted structures as input, regardless of the prediction quality of ESMFold. Only when structures are predicted with extremely low quality (TM-score < 0.3) does GPSite fall behind GraphBind using native structures. This result further demonstrates the robustness of GPSite. We have now added clearer explanations in the “GPSite is robust for low-quality predicted structures” section:

      “Figure 3B and Appendix 3-figure 3 show the distributions of TM-scores between native and predicted structures calculated by US-align in the ten benchmark datasets, where most proteins are accurately predicted with TM-score > 0.7 (see also Appendix 2-table 5)”; “Given the infrequency of low-quality predicted structures except for the RNA test set, we took a closer inspection of the 104 proteins with predicted structures of TM-score < 0.5 in the RNA test set.”

      • To demonstrate the robustness of GPSite, the authors performed a case study on human GR containing two zinc fingers, where the predicted structure is not perfect. The analysis could benefit from a more detailed explanation of why the model can still infer the binding site correctly even though the input structural information is slightly off.

      RE: We thank the reviewer for the comment. We have actually explained the potential reason for the robustness of GPSite in the second paragraph of the “GPSite is robust for low-quality predicted structures” section. In summary, although the whole structure of this protein is not perfectly predicted, the local structures of the binding domains of peptide, DNA and Zn2+ are actually predicted accurately as evidenced by the superpositions of the native and predicted structures in Figure 3D and 3E. Therefore, GPSite can still make reliable predictions. We have now revised this paragraph to explain these more clearly:

      “Figure 3D shows the structure of the human glucocorticoid receptor (GR), a transcription factor that binds DNA and assembles a coactivator peptide to regulate gene transcription (PDB: 7PRW, chain A). The DNA-binding domain of GR also consists of two C4-type zinc fingers to bind Zn2+ ions. Although the structure of this protein is not perfectly predicted (TM-score = 0.72), the local structures of the binding domains of peptide and DNA are actually predicted accurately, as shown by the superpositions of the native and predicted structures in Figure 3D and 3E. Therefore, GPSite can correctly predict all Zn2+ binding sites and precisely identify the binding sites of DNA and peptide with AUPR values of 0.949 and 0.924, respectively (Figure 3F, G and H).”

      • To analyze the relatively low AUC value for protein-protein interactions, the authors claimed that it is "due to the fact that protein-protein interactions are ubiquitous in living organisms while the Swiss-Prot function annotations are incomplete", which is unjustified. It is highly recommended to support this claim by showing at least one example where GPSite's prediction is a valid binding site that is not present in the current Swiss-Prot database or via other approaches.

      RE: We thank the reviewer for the valuable recommendation. To support this claim, we have now added two examples in Appendix 1-note 7, where GPSite confidently predicted the presence of the “protein binding” function (GO:0005515). Notably, this function was absent in these two proteins in the Swiss-Prot database at the time of manuscript preparation (release: 2023-05-03), but has been included in the latest release of Swiss-Prot (release: 2023-11-08). For convenience, we also attach the note below:

      “As depicted in Figure 5A, GPSite assigns relatively high prediction scores to the proteins without the “protein binding” function in the Swiss-Prot annotations, leading to a modest AUC value of 0.608 (Figure 5B). This may be ascribed to the fact that protein-protein interactions are ubiquitous in living organisms while the Swiss-Prot function annotations are incomplete. To support this hypothesis, we present two proteins as case studies, both sharing < 20% sequence identity with the protein-binding training set of GPSite. The first case is Aminodeoxychorismate synthase component 2 from Escherichia coli (UniProt ID: P00903). GPSite confidently predicted this protein as a protein-binding protein with a high prediction score of 0.936. Notably, this protein was not annotated with the “protein binding” function (GO:0005515) or any of its GO child terms in the Swiss-Prot database at the time of manuscript preparation (https://rest.uniprot.org/unisave/P00903?format=txt&versions=171, release: 2023-05-03). However, in the latest release of Swiss-Prot (https://rest.uniprot.org/unisave/P00903?format=txt&versions=174, release: 2023-11-08), published during manuscript revision, this protein is annotated with the “protein heterodimerization activity” function (GO:0046982), which is a child term of “protein binding”. In fact, the heterodimerization activity of this protein was experimentally validated in 1996 (PMID: 8679677), indicating the potential incompleteness of the Swiss-Prot annotations. The other case is Hydrogenase-2 operon protein HybE from Escherichia coli (UniProt ID: P0AAN1), which GPSite also predicted as a protein-binding protein (score = 0.909). Similarly, this protein was not annotated with the “protein binding” function in the Swiss-Prot database at the time of manuscript preparation (https://rest.uniprot.org/unisave/P0AAN1?format=txt&versions=108). However, in the latest release of Swiss-Prot (https://rest.uniprot.org/unisave/P0AAN1?format=txt&versions=111), this protein is annotated with the “preprotein binding” function (GO:0070678), which is a child term of “protein binding”. In fact, the preprotein binding function of this protein was experimentally validated in 2003 (PMID: 12914940). These cases demonstrate the effectiveness of GPSite for completing missing function annotations in Swiss-Prot.”

      • The authors reported that many GPSite-predicted binding sites are associated with known biological functions. Notably, for RNA-binding sites, there is a significantly higher proportion of translation-related binding sites. The analysis could benefit from further investigation of this observation, such as analyzing the percentage of such interactions in the training set. In addition, if there is sufficient data, it would also be interesting to see the cross-interaction-type performance of the proposed model, e.g., train the model on a dataset excluding specific binding sites and test its performance on that class of interactions.

      RE: We thank the reviewer for the suggestion. We would like to clarify that the analysis in Figure 5C was conducted at “protein-level” instead of “residue-level”. As described in the second paragraph of the “Large-scale binding site annotation for Swiss-Prot” section, a protein-level ligand-binding score was assigned to a protein by averaging the top k residue-level predicted binding scores. This protein-level score indicates the overall binding propensity of the protein to a specific ligand. We gathered the top 20,000 proteins with the highest protein-level binding scores for each ligand and found that their biological process annotations from Swiss-Prot were consistent with existing knowledge. We have now revised the corresponding sentence to explain these more clearly:

      “Exploiting the residue-level binding site annotations, we could readily extend GPSite to discriminate between binding and non-binding proteins of various ligands. Specifically, a protein-level binding score indicating the overall binding propensity to a specific ligand can be generated by averaging the top k predicted scores among all residues.”
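      The top-k averaging described above can be sketched in a few lines of Python. This is an illustrative sketch, not GPSite's code: the function name protein_level_score is hypothetical, residue scores are assumed to be per-residue binding probabilities, and k is left as a parameter because its value is not given here.

```python
import heapq

def protein_level_score(residue_scores, k):
    """Hypothetical helper: protein-level binding score computed as the
    mean of the k highest residue-level predicted binding scores."""
    if k < 1 or not residue_scores:
        raise ValueError("need k >= 1 and at least one residue score")
    # nlargest returns every score if the protein has fewer than k residues
    top_k = heapq.nlargest(k, residue_scores)
    return sum(top_k) / len(top_k)

# Toy example: four residues, mean of the two highest scores (0.9 and 0.7)
score = protein_level_score([0.1, 0.9, 0.5, 0.7], k=2)
```

      Averaging only the top k residues, rather than all of them, presumably keeps a protein with a small but confident binding patch from being diluted by its many non-binding residues.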

      As for the cross-interaction-type performance raised by the reviewer, we have now conducted cross-type evaluations to investigate the specificity of the ligand-specific MLPs and the inherent similarities among different ligands in Appendix 1-note 6 and Appendix 2-table 10. For convenience, we also attach the note and table here:

      “We conducted cross-type evaluations by applying different ligand-specific MLPs in GPSite for the test sets of different ligands. As shown in Appendix 2-table 10, for each ligand-binding site test set, the corresponding ligand-specific network consistently achieves the best performance. This indicates that the ligand-specific MLPs have specifically learned the binding patterns of particular molecules. We also noticed that the cross-type performance is reasonable for the ligands sharing similar properties. For instance, the DNA-specific MLP exhibits a reasonable AUPR when predicting RNA-binding sites, and vice versa. Similar trends are also observed between peptide and protein, as well as among metal ions as expected. Interestingly, the cross-type performance between ATP and HEM is also acceptable, potentially attributed to their comparable molecular weights (507.2 and 616.5, respectively).”

      Author response table 4.

      Cross-type performance by applying different ligand-specific MLPs in GPSite for the test sets of different ligands

      Note: “Pep” and “Pro” denote peptide and protein, respectively. The numbers in this table are AUPR values. The best/second-best result in each test set is indicated by bold/underlined font.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are pleased to send you a revised version of our manuscript entitled “voyAGEr: free web interface for the analysis of age-related gene expression alterations in human tissues” and the associated shiny web app, in which we incorporate the referees’ feedback. We would like to express our gratitude for their time and valuable insights, which have contributed to the improvement of our work. We appreciate the rigorous evaluation process that eLife maintains.

      In this letter, we address each of the reviewers' comments and concerns, point-by-point, offering detailed responses and clarifications. We have made several revisions to our manuscript following their recommendations.

      We note that the revised version of the manuscript has two new joint first authors, Rita Martins-Silva and Alexandre Kaizeler, who performed all the requested reanalyses, as the initial first author, Arthur Schneider, has since left our lab. We also took the opportunity to make the following minor, unsolicited improvements:

      • Added a comprehensive tutorial to the GitHub repository on how to navigate through voyAGEr’s features.

      • Implemented sample randomisation in the scatter plots depicting gene expression across the age axis to ensure data privacy.

      • Implemented minor adjustments within the web app to enhance user comprehension and clarity when visualizing the data.

      • Improved clarity of the methodological sections.

      Reviewer 1

      (1.1) While this may be obvious to others for some reason that escaped me, I was unsure what was the basis for the authors' choice of 16 years as the very specific sliding window size. If I'm not alone in this, it might add clarity for other readers and users if this parameter choice were explained and justified more explicitly.

      We apologise for not providing the rationale for this choice in the previous version. We chose 16 years as our sliding window size because this was the minimum needed to guarantee the presence of more than one sample per window across all the tissues considered in the study (Figure R1 below).

      We added the following sentence to the manuscript (v. Methods, ShARP-LM):

      “This was the minimum age span needed to guarantee the presence of more than one sample per window, across all considered tissues.”
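      The selection criterion above (the smallest age span for which every window holds more than one sample in every tissue) can be sketched as a brute-force search. This is a hypothetical illustration, not the voyAGEr preprocessing script: it assumes integer ages and windows of w consecutive years sliding in 1-year steps, which may differ from the exact window definition used in ShARP-LM, and the toy ages below are not GTEx data.

```python
def min_window_span(ages_by_tissue, age_min, age_max, min_samples=2):
    """Hypothetical helper: smallest window width w (in years) such that every
    window [start, start + w), sliding in 1-year steps from age_min until it
    reaches age_max, contains at least min_samples samples in every tissue."""
    for w in range(1, age_max - age_min + 2):
        starts = range(age_min, age_max - w + 2)
        if all(
            sum(start <= a < start + w for a in ages) >= min_samples
            for ages in ages_by_tissue.values()
            for start in starts
        ):
            return w
    return None  # no width satisfies the constraint

# Toy example with integer ages (not GTEx data)
toy = {"tissue_A": [20, 21, 25, 26, 30, 31], "tissue_B": [20, 21, 27, 28, 31]}
print(min_window_span(toy, 20, 31))  # prints 7
```

      Here tissue_B, with its gap between ages 21 and 27, sets the required span, mirroring how the sparsest tissue determines the window size when the constraint must hold across all tissues.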

      (1.2) "In particular, tissue-specific periods of major transcriptional changes in the fifth and eighth decades of human lifespan have been revealed, reflecting the so-called digital aging and consistently with what is observed in mice" here I think that "consistently" should be "consistent".

      We thank the reviewer for the comment and, following the suggestion, have changed 'consistently' to 'consistent', which is the correct usage in this sentence.

      (1.3) "On a different note, sex biases have been reported in for the expression of SALL1 and KAL1 in adipose tissue and lung, respectively." Here I think that "in for" should be "in".

      As recommended by the reviewer, we have replaced ‘in for’ with ‘in’. As we also replaced KAL1 with DDX43, the sentence now reads: “On a different note, sex biases have been reported in the expression of SALL1 and DDX43 in adipose tissue and lung, respectively”.

      (1.4) "We downloaded the matrix with the RNA-seq read counts for each gene in each GTEx v7 sample from the project's data portal (https://www.gtexportal.org/)." In my pdf manuscript this hyperlink appears to be broken.

      We appreciate the reviewer's attention to the broken link, and we have rectified the issue. The link should now be fully operational, effectively directing users to the GTEx Portal.

      (1.5) Under methods, I might suggest "Development platform" or "Development platforms" over "Development's platform" as a heading.

      We have modified the heading of this section in the methods to 'Development Platforms', as we believe it better reflects the information conveyed.

      Reviewer 2

      (2.1) In this tool/resource paper, it is crucial that the data used is up-to-date to provide the most comprehensive and relevant information to users. However, the authors utilized GTEx v7, which is an outdated (2016) version of the dataset. It is worth noting that GTEx v8 includes over 940 individuals, representing a 35% increase in individuals, and a 50% increase in the total number of samples. The authors should check the newer versions of GTEx and update the data.

      When the development of the voyAGEr web application began, GTEx version 7 was the most up to date. Nevertheless, we agree that version 8 offers a notably more extensive dataset, encompassing a larger number of individuals and samples and introducing new tissues. Consequently, we have updated our application to incorporate the data from GTEx version 8.

      (2.2) The authors did not address any correction for batch effects or RNA integrity numbers, which are known to affect transcriptome profiles. For instance, our analysis of GTEx v8 Cortex tissue revealed that after filtering out lowly expressed genes, in the same way authors did, PC1 (which accounts for 24% of the variation) had a Spearman's correlation value of 0.48 (p<6.1e-16) with RNA integrity number.

      We acknowledge the validity of the reviewer’s comment and appreciate the importance of such corrections for enhancing data interpretation. In response, we conducted a thorough unbiased investigation into potential batch effects, with the COHORT variable emerging as the primary driver of those observed across most tissues. Furthermore, SMRIN (as the reviewer pointed out), DTHHRDY, MHSMKYRS and the number of detected genes in each sample were consistently associated with the primary sources of variation. As a result, we implemented tissue-specific batch effect correction for these five covariates.

      We provide a detailed explanation of the batch effect correction methodology and its importance in the biological interpretation of results in the Methods section, specifically under "Read count data pre-processing". Additionally, we have included two new supplementary figures, Supplementary Figures 7 and 8, to illustrate a batch effect example in lung tissue and emphasise the critical role of this correction in data interpretation.

      (2.3) The data analyzed in the GTEx dataset is not filtered or corrected for the cause of death, which can range from violent and sudden deaths to slow deaths or cases requiring a ventilator. As a result, the data may not accurately represent healthy aging profiles but rather reflect changes in the transcriptome specific to certain diseases due to the age-related increase in disease risk. While the authors do acknowledge this limitation in the discussion, stating that it is not a healthy cohort and disease-specific analysis is not feasible due to the limited number of samples, it would be useful for users to have the option to analyze only cases of fast death, excluding ventilator cases and deaths due to disease. This is typically how GTEx data is utilized in aging studies. Alternatively, the authors should consider including the "cause of death" variable in the model.

      This comment is closely related to the prior discussion (point 2.2). Notably, two of the covariates selected for batch effect correction, namely DTHHRDY (Death classification based on the 4-point Hardy Scale¹) and COHORT (indicating whether the participant was a postmortem, organ, or surgical donor¹), are directly relevant to this issue, as both relate to the cause of death of the individual. By correcting for them, we therefore account for their influence on gene expression.

      ¹ According to the nomenclature of variables described in https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetListOfAllObjects.cgi?study_id=phs000424.v9.p2&object_type=variable

      This approach represents a compromise, as it is practically infeasible to ascertain the absence of underlying health conditions in the remaining samples, even when considering only cases of “fast death”. Hence, we opted to keep all samples, regardless of the donor's cause of death, to dilute potential effects associated with individual causes of death.

      (2.4) The age distribution varies across tissues which may impact the results of the study. The authors' claim that age distribution does not affect the outcomes is inconclusive. Since the study aims to provide cross-tissue analysis, it is important to note that differing age distributions across tissues can influence the overall results. To address this, the authors should conduct downsampling to different age distributions across tissues and evaluate the level of tissue-specific or common changes that remain after the distributions are made similar.

      We acknowledge that variations in age distributions are evident across different tissues, with brain tissues displaying a notably pronounced disparity (green density lines in Figure R2 below).

      To address this issue comprehensively, we performed tissue-specific downsampling, reducing the number of samples in each age window to the minimum sample size across all age windows of the given tissue. Figure R1 shows, for each tissue, the histograms (density plots) of the number of samples per 16-year age window considered in the ShARP-LM model, together with the minimum number of samples per age window. After downsampling, we computed the logFC and p-value of differential expression for each gene, per age window, and compared them (for all genes in a given age window) with those obtained using all samples.
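      The per-tissue downsampling step described above can be sketched as follows. This is a hypothetical illustration rather than the voyAGEr script: it assumes each age window of one tissue is given as a list of sample IDs, that each window is subsampled independently (overlapping sliding windows may share samples), and the window labels and IDs are made up.

```python
import random

def downsample_windows(samples_by_window, seed=0):
    """Hypothetical helper: randomly reduce every age window of one tissue to
    the size of its smallest window, so that each window is analysed with the
    same number of samples."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    n_min = min(len(ids) for ids in samples_by_window.values())
    return {window: rng.sample(ids, n_min) for window, ids in samples_by_window.items()}

# Made-up windows and sample IDs for a single tissue
windows = {"20-35": ["s1", "s2", "s3", "s4"], "36-51": ["s3", "s4", "s5"], "52-67": ["s6", "s7"]}
balanced = downsample_windows(windows)
```

      Differential expression would then be recomputed per window on the balanced subsets and compared with the full-sample results, which is where the loss of statistical power becomes visible.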

      Despite changes in logFC with downsampling, a considerable positive correlation is maintained (Figure R3, top panel), suggesting that the overall trends in gene expression changes persist. However, downsampling expectedly decreases the statistical power within each age window, owing to the smaller sample size, as evident from the shift of genes from the third to the first quadrant in Figure R3, bottom panel. Consequently, we have opted to keep the results based on all samples and to remove the paragraph in the Discussion that asserted that age distribution had no impact on the overall outcomes (“Indeed, we found no confounding between the distribution of samples’ ages and the trend of gene expression progression over age in any tissue.”), as we deem it inaccurate and potentially misleading. We have added a supplementary figure (Supplementary Figure 8, identical to Figure R3) illustrating the effect of downsampling, and the following paragraph to the manuscript’s Discussion section:

      “When downsampling to ensure a balanced age distribution, a loss of statistical power is apparent but a considerable positive correlation with the original results is maintained and a substantial number of significant alterations remain so (Supplementary Figure 8).”

      We acknowledge that this limitation may be addressed as human tissue transcriptomes continue to accumulate in publicly available databases, a trend we anticipate in the near future. We are committed to promptly updating voyAGEr with any new data releases that may resolve this concern.

      Nonetheless, we want to underscore, as the reviewer has astutely pointed out, that while voyAGEr can facilitate cross-tissue comparisons, it must be done with caution. In this regard, we inserted the following paragraph into the Discussion:

      “Due to the tissue-specific nature of the pre-processing steps (v. Read count data preprocessing in the Methods section), and given that most of the plotted gene expression distributions are centred and scaled by tissue, it is important to note that voyAGEr may not always be suited for direct comparisons between different tissues. For instance, it does not allow one to directly ascertain whether a gene exhibits different expression levels in different tissues or whether the expression of a particular gene changes more drastically with age in one tissue than in another.”

      (2.5) The GTEx resource is extremely valuable, however, it comes with challenges. GTEx contains tissue samples from the same individuals across different tissues, resulting in varying degrees of overlap in sample origin across tissues as not all tissues are collected for all individuals. This could affect the similar/different patterns observed across tissues. As this tool is meant for broader use by the community, it is crucial for the authors to either rule out this possibility by conducting a cross-tissue comparison using a non-parametric model that accounts for the dependency between samples from the same individual, or to provide information on the degree of similarity between samples so that the users can keep this possibility in mind when using the tool for hypothesis generation.

      We agree that the variable degrees of overlap between tissues (Figure R4) could lead to a confounding between trends in a population of common individuals and those associated with age. We therefore examined the contributions of variables 'donor,' 'tissue,' and 'age' to the overall variance in the data (Figure R5, panel A), having normalised the data collectively across all tissues. Tissue and donor contribute approximately 90% and 10% of the variance, respectively. Age exhibits minimal impact (around 1%), which may be attributed to the relative subtlety of its effects on gene expression and to the tissue specificity of ageing-associated changes. Notably, removing the 'donor' variable does not transfer this variance to 'age', suggesting a limited confounding between these variables (see Figure R5, panel B).

      We also specifically examined the pairs of tissues exhibiting the lowest (Brain Amygdala / Small Intestine), median (Pancreas / Heart Left Ventricle), and highest (Kidney Cortex / Muscle Skeletal) percentages of shared donors. We identified and selectively removed samples from shared donors while maintaining the original sample size imbalance between tissues. Subsequently, we calculated each gene’s mean expression within each age window from the ShARP-LM pipeline, followed by each gene’s Pearson’s correlation of expression between tissue pairs. The resulting coefficients, both with and without the removal of common donors, were compared in scatter plots (Figure R6, left plots). As this process inherently involves downsampling, which may impact results (v. comment 2.4), we performed additional downsampling by randomly removing samples from both tissues according to the proportions defined for the removal of common donors (Figure R6, right plots).

      In the chosen scenarios, we note a similar impact between the targeted removal of common donors and random downsampling. Nevertheless, the effects of removing samples may vary according to the absolute number of remaining samples, so singling out individual cases may not provide conclusive insights. To address this systematically, we represented all tissue pairs in a heatmap, colour-coded according to whether the removal of common donors is more impactful (red) or less impactful (blue) than random downsampling (Figure R7). The values depicted in the heatmap, denoted as the Impact of Common Donors (ICD), are computed for each tissue pair as follows. First, for each gene, we computed the absolute difference in the Pearson’s correlation between the two tissues of its mean expression within each ShARP-LM age window, obtained with the original data versus with the subset of data without common donors (DiffWoCD) or with random downsampling (DiffRD). Next, the medians of DiffWoCD and DiffRD are computed, and the difference between these medians gives the ICD of the tissue pair. Because correlation is symmetric (i.e., the results for tissue 1 vs tissue 2 mirror those for tissue 2 vs tissue 1), the resulting matrix is triangular.
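      The ICD calculation described above can be sketched with NumPy. This is a hypothetical reconstruction, not the authors' code: it assumes each tissue's data are summarised as a genes × age-windows matrix of mean expression, that Pearson's r is computed per gene across age windows, and that the ICD is taken as median(DiffWoCD) − median(DiffRD), so that a positive value corresponds to removal of common donors being more impactful (red in the heatmap).

```python
import numpy as np

def per_gene_correlation(mean_t1, mean_t2):
    """Pearson's r between two tissues for each gene. Inputs are arrays of
    shape (n_genes, n_windows) holding each gene's mean expression per age window."""
    a = mean_t1 - mean_t1.mean(axis=1, keepdims=True)
    b = mean_t2 - mean_t2.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))

def impact_of_common_donors(r_orig, r_without_common, r_random):
    """Hypothetical ICD: median |r change| after removing common donors minus
    median |r change| after random downsampling of the same size."""
    diff_wocd = np.abs(r_orig - r_without_common)
    diff_rd = np.abs(r_orig - r_random)
    return float(np.median(diff_wocd) - np.median(diff_rd))

# Toy data: two genes observed over three age windows in two tissues
t1 = np.array([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
t2 = np.array([[2.0, 4.0, 6.0], [3.0, 2.0, 1.0]])
r = per_gene_correlation(t1, t2)  # one coefficient per gene
```

      In a full analysis, per_gene_correlation would be applied three times per tissue pair (original data, data without common donors, randomly downsampled data) before the ICD is computed.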

      We have added a supplementary figure (Supplementary Figure 4, a composition of Figures R4-R7, together with a scatterplot relating the values of heatmaps R4 and R7) that aims to provide guidance to users when interpreting specific tissue pairs, acknowledging inherent limitations (refer to comment 2.4). We have also inserted the following paragraph into the manuscript’s Discussion section:

      “Furthermore, we must emphasise that the majority of GTEx donors contributed samples to multiple tissues (Supplementary Figure 4A), potentially introducing biases and confounders when comparing gene expression patterns between tissues. Our analyses of variance (Supplementary Figure 4B) and downsampling to control for common donors (Supplementary Figures 4C-E) suggest very limited global confounding between the impacts of donor and age on gene expression. They also suggest that any potential cross-tissue bias does not depend much on the proportion of common donors (Supplementary Figure 4E). However, this effect must be taken into account when comparing specific pairs of tissues (e.g., Colon – Transverse and Whole Blood, Supplementary Figure 4D).”

      (2.6) The authors aimed to create an open-source and ever-evolving resource that could be adapted and improved with new functionality. However, this goal was only partially achieved. Although the code for the web app is open source, crucial components such as the statistical tests or the linear model are not included in the repository, limiting the tool's customizability and adaptability.

      We greatly appreciate the reviewer’s concern and share their commitment to maintaining the principles of openness, reproducibility, and adaptability for voyAGEr. voyAGEr was primarily designed as a visualisation tool, displaying pre-processed results, and indeed only the code for the Shiny app itself was accessible through the project's GitHub repository.

      To address this shortcoming, we have made the entire data preprocessing script publicly available in the GitHub repository of voyAGEr. This script encompasses, among others, filtration, normalisation, batch effect correction, the ShARP-LM pipeline and statistical tests employed, and module definition. Moreover, the web app itself offers functionality to export relevant plots and tables.

      (2.7) Furthermore, the authors' choice of visualization platform (R shiny) may not be the best fit for extensibility and open-source collaboration, as it lacks modularity. A more suitable alternative could be production-oriented platforms such as Flask or FastAPI.

      We appreciate this thoughtful concern. The decision to use Shiny was primarily driven by our data having already been prepared in the R environment during pre-processing steps. Consequently, and as the web app serves the purpose of visualisation only (and not data processing), Shiny is a natural and convenient extension of our scripts, enabling seamless data visualisation.

      We acknowledge that Shiny may lack the modularity required for optimal open-source collaboration. While we recognise the merits of alternative platforms like Flask or FastAPI, we decided to keep Shiny because the current iteration of voyAGEr offers significant value to the community. Transitioning to a different platform would be a time-consuming endeavour that would postpone the release of this resource.

      However, the reviewer’s feedback regarding modularity and open-source collaboration is duly noted and highly valuable. We will certainly take it into account when developing new web applications within our laboratory.

      (2.8) To facilitate collaboration and improve the tool's adaptability, data resulting from the preprocessing pipeline should be made publicly available. This would make it easier for others to contribute and extend the tool's functionality, ultimately enhancing its value for the scientific community.

      As outlined in point 2.6 of this rebuttal letter, certain metadata used in our analysis are subject to restricted access. To address this, we have taken several measures to foster transparency and reproducibility of our analyses. First, we have made the scripts for data pre-processing publicly available, along with a comprehensive explanation of our methodology within the main manuscript. This empowers users to replicate our analyses and provides a foundation for those interested in contributing to the tool's development. Furthermore, we have created new issues on voyAGEr’s GitHub repository, outlining novel features and improvements we envision for the application in the future. We actively encourage users to engage with this section.

      (2.9) It is unfortunate that the manuscript has no line numbers, which makes pointing out language issues or typos cumbersome. Below are some minor typos present in the current version mostly due to inconsistent usage of British vs US English, and the authors would be advised to do a thorough proofreading for the final submission.

      • Page 12: Inconsistent spelling of "analyzed" and "analysed". Should be "analyzed", since US English is used throughout the rest of the paper.

      • Page 14: "randomised"

      • Page 15: "emphasise"

      We apologise for this and have included line numbers in the revised version. We have opted for British English and corrected the manuscript accordingly.

      (2.10) Some figures in the supplemental material have a low resolution (e.g. S. Fig 5). Especially figures that are not based on screenshots would ideally be of a higher resolution.

      As voyAGEr is designed as a web application for visualisation, it is inherent that some screenshots of the final resource may have lower resolutions. In response to this concern, we re-generated the figures in this manuscript with a resolution that maintains clarity and readability. We also recreated figures not derived from screenshots, further improving their resolution.

      We saved all figures in PDF format and are sending them together with this letter and the revised manuscript, to address any potential issues related to low-resolution figures that may occur during the export of the Word document.

      (2.11) In Fig. 1 in the bottom row the sex labels are hard to see.

      We have adapted the figure to address this concern.

      (2.12) Math symbols and equations are not well formatted. For example, the GE equation on p. 13, or Oiij equation should be properly typeset. Also, the Oiij notation might be confusing, I believe the authors meant to use a capital "I", i.e. OI_ij.

      We have incorporated these recommendations into the revised manuscript.

      (2.13) The Readme file in the git repo is very short. It would be helpful to have build and run instructions.

      We have updated the README file in the GitHub repository, which now contains, among other features, instructions for launching the Shiny app and building the associated Docker image. Additionally, a simple tutorial has also been included to assist users in navigating through voyAGEr's functionalities.

      (2.14) "Module" tab's UI is inconsistent with other tabs (i.e. "Gene" and "Tissue"), since it contains an "About" page. Adding the "About" page in the actual "Module" page might make the UI clearer.

      We believed that the Modules section, due to its distinct methodology, would benefit from an additional tab explaining its underlying rationale. We understand the reviewer's concern regarding the use of tabs throughout the application and have made changes to the app to ensure consistency.

      (2.15) I would suggest changing the type of the article to "Tools and Resources".

      We agree and followed the reviewer’s suggestion.

      Reviewer 3

      (3.1) In the gene-centric analyses section of the results, to improve this manuscript and database, linear regression tests accounting for the entire range of age should be added. The authors' algorithm, ShARP-LM, tests locally within a 16-year window, which gives it lower power than a linear regression test across all ages. I suspect that the power reduction is strongest in the younger age range, since GTEx donors are enriched in older ages. By adding the results from the lm tests, readers would gain more insight and evidence into how significantly their genes of interest change with age.

      We are grateful for the reviewer's thoughtful and pertinent recommendation and have thus conducted linear regression tests covering the entire age range. The outcomes of these tests have been integrated into the web application, denoted by a dotted orange line on the 'Gene Expression Alterations Over Age' plots. Additionally, a summary of statistics of overall changes, encompassing p-values, t-statistics, and logFC per year, has been included below the plot title. We have also updated the manuscript to include such changes (v. Methods, Gene-centric visualisation of tissue-specific expression changes across age):

      “We also applied a linear model across the entire age range, thereby providing users with more insight and supporting evidence into how a specific gene changes with age. For visualisation purposes, we incorporated a dashed orange line, with the logFC per year for the Age effect as slope, in the respective scatter plots (Figure 3B c). We depict the Sex effect therein by prominent dots on the average samples, with pink and blue denoting females and males, respectively.”
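      A minimal sketch of a whole-range fit follows, for readers who want to reproduce such a trend line. It fits an ordinary least-squares model of log2 expression on age and sex for a single gene with NumPy. It returns the age slope (logFC per year) and its t-statistic. This is a simplification under assumptions. The model used in voyAGEr may include further terms (e.g. an Age × Sex interaction) and a different fitting framework. P-values, which would come from the t distribution with n - rank degrees of freedom, are omitted. The function name `age_slope` is hypothetical.

```python
import numpy as np

def age_slope(log_expr, age, sex):
    """OLS fit of log_expr ~ 1 + age + sex for one gene (simplified model).

    log_expr: (n_samples,) log2 expression values.
    age: (n_samples,) donor ages in years.
    sex: (n_samples,) 0/1 indicator.
    Returns (slope per year, t-statistic of the age coefficient).
    """
    X = np.column_stack([np.ones_like(age, dtype=float), age, sex])
    beta, _, rank, _ = np.linalg.lstsq(X, log_expr, rcond=None)
    resid = log_expr - X @ beta
    dof = X.shape[0] - rank
    sigma2 = resid @ resid / dof                 # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)        # coefficient covariance
    return float(beta[1]), float(beta[1] / np.sqrt(cov[1, 1]))
```

      In a plot, the returned slope is what a whole-range line such as the one described above would use as its gradient.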

      Concerning the observation about the potential reduction in statistical power due to the limited number of samples in younger ages, we acknowledge its validity. Indeed, we have addressed this issue in the manuscript's Discussion (v. Supplementary Figure 6).

      (3.2) In line with the ShARP-LM test results, it is not clear which criterion was used to define the significant genes and the following enrichment analyses. I assume that the criterion is P < 0.05, but it should be clearly noted. Additionally, the authors should apply adjusted p-values for multiple-test correction. The ideal criterion is an adjusted P < 0.05. However, if none or only a handful of genes were found to be significant, the authors could relax the criteria, such as using a regular P < 0.01 or 0.05.

      We apologise for any confusion regarding the terminology "significant genes." Our choice to use non-adjusted p-values for determining the significance of gene expression changes with Age, Sex, and their interaction was deliberate, and we would like to clarify our reasoning:

      (1) In the "Gene" tab of the application, individual genes are examined. When users inquire about a specific gene, multiple-testing correction of the p-value does not apply.

      (2) In the "Tissue" tab, using adjusted p-values and a threshold of 0.05 yielded very few differentially expressed genes, limiting the utility of Peaks. Our objective therein is not to assess the significance of alterations in individual genes but to provide a metric for global alterations within a tissue. We then determine significance based on the False Discovery Rate (FDR), using the p-values as a nominal metric of gene expression alterations.

      To avoid using the concept of “differential expression”, commonly linked to significance, we now refer to 'altered genes' in both the manuscript and the app. For clarity and to align with voyAGEr's role as a hypothesis-generation tool, we define 'altered genes' as those with non-adjusted p-values < 0.01 or < 0.05, as discriminated in the Methods section.

      (3.3) In the gene-centric analyses section, authors should provide a full list of donor conditions and a summary table of conditions as supplementary.

      We appreciate the suggestion. Rather than adding this information as a supplementary table, we have included a reference that directs readers to those data. We would like to emphasise that the web app includes information on the donor conditions we hypothesise to affect gene expression.

      (3.4) The tissue-specific assessment section has poor sub-titles. Every title has to contain information.

      We agree and revised the sub-titles to more accurately reflect the information conveyed in each corresponding section.

      (3.5) I have an issue understanding the meaning of NES from GSEA in the tissue-specific assessment section. The authors performed GSEA for the DEGs against the background genes ordered by t-statistics (from positive to negative) calculated from the linear model. I understand the p-value was two-tailed. This means that both positive and negative NES are meaningful: they represent, respectively, the up-regulated (positive coefficient) and down-regulated (negative coefficient) expression directions with age within a window. However, in the GSEA section of Methods, the authors did not fully elaborate on this directionality but stated, "The NES for each pathway was used in subsequent analyses as a metric of its over- or downrepresentation in the Peak". The authors should clearly elaborate on how to interpret the NES from their results.

      We added the following paragraph to the manuscript’s Methods section, in order to clarify the NES’ directionality:

      “We extracted the GSEA normalised enrichment score (NES), which represents the degree to which a certain gene set is overrepresented at the extreme ends of the ranked list of genes. A positive NES corresponds to the gene set’s overrepresentation amongst up-regulated genes within the age window, whereas a negative NES signifies its overrepresentation amongst down-regulated genes. The NES for each pathway was used in subsequent analyses as a metric of its up- or down-regulation in the Peak.”
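      The sign convention can be illustrated with the running-sum enrichment score (ES) on which the NES is based. The Python sketch below ranks genes from the most positive to the most negative t-statistic and computes a weighted GSEA-style ES. A gene set concentrated among up-regulated genes receives a positive ES, and one concentrated among down-regulated genes a negative ES. The normalisation that turns ES into NES divides by the mean ES of the same sign over permuted gene sets. It does not change the sign, and it is omitted here. The function name is hypothetical.

```python
import numpy as np

def enrichment_score(t_stats, in_set, weight=1.0):
    """Weighted GSEA-style running-sum enrichment score for one gene set.

    t_stats: (n_genes,) t-statistics from the linear model in one age window.
    in_set: (n_genes,) boolean mask marking members of the gene set.
    weight: exponent applied to |t| for set members.
    """
    order = np.argsort(-t_stats)          # most up-regulated gene first
    t = t_stats[order]
    hits = in_set[order]
    w = np.abs(t) ** weight
    hit_step = np.where(hits, w / w[hits].sum(), 0.0)
    miss_step = np.where(~hits, 1.0 / (~hits).sum(), 0.0)
    running = np.cumsum(hit_step - miss_step)
    # ES is the running-sum value with the largest absolute deviation from zero
    return float(running[np.argmax(np.abs(running))])
```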

      (3.6) In the Modules of co-expressed genes section, the authors did not explain how or why they selected the four tissues: brain, skeletal muscle, heart (left ventricle), and whole blood. This should be elaborated on.

      We apologise for not explaining this selection in detail. As the 'Modules of co-expressed genes' section was primarily intended as a proof of concept, we included only tissues for which both a substantial number of samples and comprehensive cell type signatures were available. These four tissues were the ones that met both criteria. Nonetheless, as the diversity of cell type signatures increases (e.g., through the growing availability of scRNA-seq datasets), we plan to encompass a wider range of tissues in the near future. However, this task is time-demanding. To avoid a substantial delay in the release of voyAGEr, we opted to address it in the next version of the app. We have also opened a dedicated issue in the project's GitHub repository so that users can indicate which tissues they would like to see included next.

      We also added a brief sentence in this regard to the Methods section of the manuscript:

      “The four tissues (Brain - Cortex, Muscle - Skeletal, Heart - Left Ventricle, and Whole Blood) covered by the Module section of voyAGEr were selected due to their relatively high sample sizes and availability of comprehensive cell type signatures. The increasing availability of human tissue scRNA-seq datasets (e.g., through the Human Cell Atlas) will allow future updates of voyAGEr to encompass a wider range of tissues.”

      (3.7) In the modules of the co-expressed genes section, the authors did not provide an explanation of the "diseases-manual" sub-tab of the "Pathway" tab of the voyAGEr tool. It would be helpful for readers to understand how the candidate disease list was prepared and what the results represent.

      We greatly appreciate the reviewer's feedback, and in response, we have restructured the 'Modules of co-expressed genes' Methods section to provide a more comprehensive explanation of the 'diseases' sub-section. To clarify, we obtained a curated set of diseases and their associated genes from DisGeNET v.7.0. We assessed the enrichment of modules in relation to these diseases through two methods. The first was a manual approach using Fisher's tests (i.e. comparing the genes of a given module with the genes associated with a given disease). The second used the disgenet2r package, employing the function disease_enrichment. The significance of these enrichments was determined by adjusting p-values using the Benjamini-Hochberg correction.
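      The manual enrichment approach can be sketched in plain Python, under assumptions. The sketch uses a one-sided Fisher's exact test for over-representation, computed as the hypergeometric upper tail with the gene universe supplied as a set. It then applies the Benjamini-Hochberg adjustment across all module-disease pairs. Whether a one- or two-sided test was used is not stated above. The function names are hypothetical, and the sketch does not reproduce disgenet2r's `disease_enrichment`.

```python
from math import comb

def fisher_enrichment_p(module_genes, disease_genes, universe):
    """One-sided Fisher's exact test (hypergeometric upper tail) for
    over-representation of disease genes among a module's genes."""
    module = set(module_genes) & universe
    disease = set(disease_genes) & universe
    N, K, n = len(universe), len(disease), len(module)
    k = len(module & disease)                    # observed overlap
    upper_tail = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, min(K, n) + 1))
    return upper_tail / comb(N, n)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                 # largest p-value first
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

      Python's exact integer arithmetic in `math.comb` avoids overflow for genome-sized universes. The final division converts the ratio to a float.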

      (3.8) Most figures have low resolutions, and their fonts are too small to read.

      As already mentioned in issue 2.10, we have recreated all of the images with better resolution to enhance legibility. We also exported such figures in PDF, which we attach to this revision.

      (3.9) Authors used GTEx V7, which is not latest version. Although researchers have developed a huge amount of pipelines and tools for their research, most of them were neglected without a single update. I am sure many users, including myself, would appreciate it if the authors kept updating the database with GTEx V8 for the future version of the database.

      We express our gratitude to the reviewer for their valuable suggestion, and, as already explained in issue 2.1, we have incorporated GTEx V8 into voyAGEr.

      (3.10) I would like to have an option for downloading the results as a whole for gene, tissue, and coexpressed genes. This would be a great option for secondary analysis by users.

      Implementing such a feature would be a time-demanding endeavour that would delay the release of voyAGEr, and we therefore chose not to do so for this version. However, we agree that it would be a good resource for secondary analyses and acknowledge the possibility of adding this feature in the future. For now, voyAGEr allows the user to download all plots and corresponding data.

      (3.11) How were the orders of tissues in the heatmaps (both gene and tissue section) determined? Did the authors apply hierarchical clustering? If not, I would recommend the authors perform hierarchical clustering and add it to the heatmap display.

      We apologise for the oversight in explaining the process behind determining the order of tissues. To clarify, we employed hierarchical clustering to establish the tissue order for visualisation within the app. Although the reviewer suggested adding a dendrogram to illustrate this clustering, we decided against it because a dendrogram, while informative, is not essential for the app's primary purpose.

      (3.12) I understand that this is a vast amount of work, but I hope that the authors can expand the coexpressed module analysis to include other tissues in the future version of the database.

      Knowing which genes are co-expressed in line with aging, and their pathway and disease enrichments across tissues, would be highly informative, and I'm sure many users, including myself, would greatly appreciate it.

      We express our gratitude to the reviewer for the valuable suggestion and for acknowledging the extensive effort required to incorporate new tissues into the module section. We completely agree that understanding co-expressed genes across the aging process is of significant value, and we are committed to the ongoing inclusion of additional tissues. As already stated in issue 3.6, a comprehensive list of tissues slated for integration in future voyAGEr versions is available on voyAGEr's GitHub repository.

      Author response image 1.

      Density plots (“smoothed” histograms) of the distribution of numbers of samples per moving age window for the ShARP-LM pipeline, categorised by tissue. The numerical value within each rectangle represents the minimum number of samples observed across all age windows for that particular tissue.

      Author response image 2.

      Density lines (“smoothed” histograms) of the distribution of the age of donors per tissue. As depicted in the chart, there are more samples for older ages, particularly of brain tissues.

      Author response image 3.

      Effect of downsampling on ShARP-LM results. A – Per-tissue violin plots of gene-wide distributions of Pearson's correlation coefficients between original and downsampled logFC values for the Age variable across age windows. Tissues are coloured and ordered by increasing percentage of downsampling-associated reduction in the number of samples. B – Density scatter plots comparing original and downsampled p-values for each tissue, coloured by the downsampling percentage in each age window, highlighting the low range of p-values (from 0 to 0.1). Despite changes in logFC with downsampling, a considerable correlation in significance is maintained. Downsampling naturally results in a loss of statistical power, evident from the shift of points towards the first quadrant (dashed lines: p-value = 0.05).

      Author response image 4.

      Heatmap depicting the percentage of common donors between pairs of tissues. Each square shows the percentage of all samples of the tissue on the x-axis (Tissue 1) that are in common with the tissue on the y-axis (Tissue 2).

      Author response image 5.

      Assessment of the relative contributions of different sources to the dataset’s variance. A - tissue accounts for approximately 90% of the total variance, while donor contributes around 10%; age has a minimal impact (1%), likely due to the relative subtlety of its effects on gene expression and to the tissue specificity of ageing dynamics. B - Removal of the donor variable does not transfer variance to age, suggesting limited confounding between the two variables.

      Author response image 6.

      Impact of the relative proportion of common donors on gene expression correlation between tissue pairs. Panels A, B, and C showcase the tissue pairs with the highest (Muscle Skeletal / Kidney Cortex), median (Pancreas / Heart Left Ventricle), and lowest (Small Intestine / Brain Amygdala) percentages of common donors, respectively. The left panels illustrate gene-by-gene Pearson's correlations of gene expression between the two tissues, comparing the scenarios with (x-axis) and without (y-axis) the removal of common donors. The right panels depict the same comparisons, but with random downsampling (y-axis) in both tissues based on the proportions defined for common donor removal. The depicted examples show that the outcomes are comparable when removing common donors or employing random downsampling.

      Author response image 7.

      Comparison of the impacts of removing common donor samples and random downsampling across tissue pairs. The heatmap is coloured according to whether the removal of common donors has a greater (red) or lesser (blue) impact than random downsampling. The values depicted in the heatmap, denoted as the Impact of Common Donors (ICD), are computed for each tissue pair in several steps. First, the absolute difference in Pearson's correlation for each gene's mean expression within each age window from the ShARP-LM pipeline is determined. This difference is taken between the original data and either the subset of data without common donors (DiffWoCD) or the randomly downsampled data (DiffRD). Subsequently, the medians of DiffWoCD and DiffRD are computed, and the difference between these medians gives the ICD for each tissue pair. Because correlation is symmetric (i.e., the results for tissue 1 vs tissue 2 mirror those for tissue 2 vs tissue 1), the resulting matrix is triangular in form. Grey tiles denote NA values, i.e., where the tissue-tissue comparison is not meaningful, namely self-self comparisons and comparisons between sex-specific tissues. Top right insert: density line (“smoothed” histogram) of all ICD values.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Fernandez et al. investigate the influence of maternal behavior on bat pup vocal development in Saccopteryx bilineata, a species known to exhibit vocal production learning. The authors performed detailed longitudinal observations of wild mother-pup interactions to ask whether non-vocal maternal displays during juvenile vocal practice, or 'babbling', affect vocal production. Specifically, the study examines the durations of pup babbling events and the developmental babbling phase in relation to the amount of female display behavior, as well as pup age and the number of nearby singing adult males. Furthermore, the authors examine pup vocal repertoire size and maturation in relation to the number of maternal displays encountered during babbling. Statistical models identify female display behavior as a predictor of i) babbling bout duration, ii) the length of the babbling phase, iii) song composition, and iv) syllable maturation. Notably, these outcomes were not influenced by the number of nearby adult males (the pups' source of song models) and were largely independent of general maturation (pup age). These findings highlight the impact of non-vocal aspects of social interactions in guiding mammalian vocal development.

      We thank Reviewer 1 for the time and effort dedicated to the revision of our study. The suggestions for the revision of our manuscript were very helpful and have improved our manuscript considerably. 

      Strengths:

      Historically, work on developmental vocal learning has focused on how juvenile vocalizations are influenced by the sounds produced by nearby adults (often males). In contrast, this study takes the novel approach of examining juvenile vocal ontogeny in relation to non-vocal maternal behavior, in one of the few mammals known to exhibit vocal production learning. The authors collected an impressive dataset from multiple wild bat colonies in two Central American countries. This includes longitudinal acoustic recordings and behavioral monitoring of individual mother-pup pairs, across development.

      The identified relationships between maternal behavior and bat pup vocalizations have intriguing implications for understanding the mechanisms that enable vocal production learning in mammals, including human speech acquisition. As such, these findings are likely to be relevant to a broad audience interested in the evolution and development of social behavior as well as sensory-motor learning.

      We thank reviewer 1 for this assessment. 

      Weaknesses:

      The authors qualitatively describe specific patterns of female displays during pup babbling, however, subsequent quantitative analyses are based on two aggregate measures of female behavior that pool across display types. Consequently, it remains unclear how certain maternal behaviors might differentially influence pup vocalizations (e.g. through specific feedback contingencies or more general modulation of pup behavioral states).

      In analyzing the effects of maternal behavior on song maturation, the authors focus on the most common syllable type produced across pups. This approach is justified based on the syllable variability within and across individuals, however, additional quantification and visual presentation of categorized syllable data would improve clarity and potentially strengthen resulting claims.

      We agree that our analysis of maternal behaviour does not investigate potential contingencies between particular maternal behavioural displays and pup vocalizations (e.g. particular syllable types). The data on maternal behaviour collected for this study include direct observations, field notes and/or video recordings. A fine-grained analysis of potential contingencies between particular maternal displays and specific pup vocalizations will require high-speed cameras in future work. We have planned future studies investigating whether pup vocalizations elicit contingent maternal responses or vice versa. In the revised manuscript, we have included a comment noting that such contingencies will be investigated in greater detail in the future.

      As suggested by reviewer 1, in our revised manuscript we have included more information on methods to improve understandability. In particular, we have:

      -presented more information on different steps of our acoustic analyses

      -provided additional and clearer spectrogram figures representing the different syllable types and categorizations 

      -changed the figures accompanying our GLMM analyses following the suggestion of Reviewer 1

      Reviewer #2 (Public review):

      Summary:

      This study explores how maternal behaviors influence vocal learning in the greater sac-winged bat (Saccopteryx bilineata). Over two field seasons, researchers tracked 19 bat pups from six wild colonies, examining vocal development aspects such as vocal practice duration, syllable repertoire size, and song syllable acquisition. The findings show that maternal behaviors significantly impact the length of daily babbling sessions and the overall babbling phase, while the presence of adult male tutors does not.

      The researchers conducted detailed acoustic analyses, categorizing syllables and evaluating the variety and presence of learned song syllables. They discovered that maternal interactions enhance both the number and diversity of learned syllables and the production of mature syllables in the pups' vocalizations. A notable correlation was found between the extent of acoustic changes in the most common learned syllable type and maternal activity, highlighting the key role of maternal feedback in shaping pups' vocal development.

      In summary, this study emphasizes the crucial role of maternal social feedback in the vocal development of S. bilineata. Maternal behaviors not only increase vocal practice but also aid in acquiring and refining a complex vocal repertoire. These insights enhance our understanding of social interactions in mammalian vocal learning and draw interesting parallels between bat and human vocal development.

      We thank reviewer 2 for his/her time and effort dedicated to the revision of our study. The suggestions were very helpful in improving our manuscript. 

      Strengths:

      This paper makes significant contributions to the field of vocal learning by looking at the role of maternal behaviors in shaping the vocal learning phenotype of Saccopteryx bilineata. The paper uses a longitudinal approach, tracking the vocal ontogeny of bat pups from birth to weaning across six colonies and two field seasons, allowing the authors to assess how maternal interactions influence various aspects of vocal practice and learning, providing strong empirical evidence for the critical role of social feedback in non-human mammalian vocal learners. This kind of evidence highlights the complexity of the vocal learning phenotype and shows that it goes beyond the right auditory experience and having the right circuitry.

      The paper offers a nuanced understanding of how specific maternal behaviors impact the acquisition and refinement of the vocal repertoire, while showing the number of male tutors - the source of adult song - did not have much of an effect. The correlation between maternal activity and acoustic changes in learned syllable types is a novel finding that underscores the importance of non-vocal social interactions in vocal learning. In vocal learning research, with some notable exceptions, experience is often understood as auditory experience. This paper highlights how, even though that is one important piece of the puzzle, other kinds of experience directly affect the development of vocal behavior. This is of particular importance in the case of a mammalian species such as Saccopteryx bilineata, as this kind of result is perhaps more often associated with avian species.

      Moreover, the study's findings have broader implications for our understanding of vocal learning across species. By drawing parallels between bat and human vocal development (and in some ways to bird vocal development), the paper highlights common mechanisms that may underlie vocal practice and learning in both humans and other mammals. This interdisciplinary perspective enriches the field and encourages further comparative studies, ultimately advancing our knowledge of the evolutionary and developmental processes that shape vocal productive learning in all its dimensions.

      We thank reviewer 2 for this assessment. 

      Weaknesses:

      Some weaknesses can be pointed out, but in fairness, the authors acknowledge them in one way or another. As such, these are not flaws per se, but gaps that can be filled with further research.

      Experimental manipulations, such as controlled playback experiments or controlled environments, could strengthen the causal claims by directly testing the effects of specific maternal behaviors on vocal development. Certainly, the strengths of the paper will be consolidated after such work is performed.

      Another limitation is the reliance on the number of singing males as a proxy for social acoustic input. This measure does not account for the variability in the quality, frequency, or duration of the male songs to which the pups are exposed. A more detailed analysis of the acoustic environment, including direct measurements of song exposure and its impact on vocal learning, would provide a clearer understanding of the role of male tutors.

      Finally, although it is unlikely that these results are unique to Saccopteryx bilineata, the study's focus on a single species currently limits the generalizability of some of its findings to other vocal learning mammals. While the parallels drawn between bat and human vocal development are intriguing, the conclusions will be more robust when supported by comparative studies involving multiple species of vocal learners. This will help to identify whether the maternal influences on vocal development reported here are unique to Saccopteryx bilineata or represent a broader phenomenon in chiropteran, mammalian, or general vocal learning. Expanding the scope of research to include a wider range of species and incorporating cross-species comparisons will significantly enhance the contribution of this study to the field of vocal learning.

      Thank you for your suggestions and comments. 

      Regarding your main comment 1: In the future, we plan to implement temporary captivity experiments to investigate how maternal behaviours affect pup vocal development. This study provides the necessary basis for conducting future playback studies investigating specific behaviours in a controlled environment.

      Regarding your main comment 2: We completely agree that the number of singing males only represents a proxy for the acoustic input that pups receive during ontogeny. In the future, we plan to investigate in detail how the acoustic landscape influences pup vocal development and learning. This will include quantifying how long pups are exposed to song during ontogeny and assessing the influence of different tutors, including a detailed analysis of the song syllables of adult tutors to compare them with the vocal trajectories of song syllables in pups. 

      Regarding your main comment 3: We also fully agree that it is unlikely that these results are unique to Saccopteryx bilineata. We are certain that other mammalian vocal learners show parallels to the vocal development and learning processes of S. bilineata. Bats in particular are a promising taxon for comparative studies because their vocal production and perception systems are highly sophisticated (due to their ability to echolocate). Bats are also highly social, and the taxon encompasses a variety of social systems and vocal capacities (e.g. regarding vocal repertoire size, vocal learning capacities, information content, etc.) that support social learning and social feedback, as shown in our study. 

      As suggested, our revised manuscript now includes information on the validation of the ethogram. Furthermore, we have corrected all the spelling mistakes. Thank you very much for pointing them out!

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      The following comments and suggestions are offered to improve clarity and strengthen support for the paper's main claims.

      (1) Female displays as feedback:

      a) The authors rather broadly describe maternal behavior as feedback based on its occurrence during pup babbling. Feedback typically entails some degree of response contingency, which is not explicitly established here. Although the authors qualitatively describe a variety of female displays that only occur within the babbling context, they also state that "all these behaviors could occur singly or in an interactive way" (Line 102). The authors go on to use aggregate counts of these diverse female displays in their analyses. It would of course be interesting to know whether distinct female displays are evoked differentially by pup behavior and whether specific female behaviors, in turn, predict subsequent pup vocalizations. A display-specific approach might also reveal more about the mechanisms by which the female behavior shapes babbling (e.g. specific reinforcement signals vs. more graded social facilitation or 'audience effect'). However, even without identifying such fine-grained contingencies, the main text should at least mention the results shown in Figure 1A, namely that pups initiate ~80% of interactive behavioral sequences, suggesting that subsequent maternal displays are likely to be pup-contingent responses (i.e. feedback) and not simply co-occurring behavior.

      We fully agree with Reviewer 1 that it would be very informative to investigate whether distinct female displays are evoked differentially by pup behavior, such as specific syllables within babbling, or conversely, whether specific female behaviors precede particular pup vocalizations. For this study, we documented maternal behavior through direct observations, field notes, and/or video recordings. However, capturing potential contingencies between specific maternal displays and vocalizations occurring in the millisecond range will require other data collection methods (e.g. high-speed cameras) in the future. 

      Related to this, we have included the following statements (see below). Statement 1 also cites a very recent study in zebra finches, demonstrating that female calls can promote song learning success (Bistere et al. 2024, line 57, lines 304-305). 

      Lines 297-305: This finding serves as an initial indication that non-vocal interactions with the mother may influence a pup’s individual learning trajectories. Future studies will focus on the relationship between acoustic change, maternal feedback, and learning success, specifically investigating contingencies between particular pup vocalizations and maternal displays in natural settings. Playback experiments are an additional approach to test the impact of contingency on vocal learning. For example, one study in zebra finches demonstrated that contingent non-vocal maternal feedback affects imitation success (Carouso-Peck & Goldstein, 2019), while another recent study found that female calls can promote song learning but the role of contingency remains to be determined (Bistere et al., 2024).  

      Lines 332-334: This might also apply to S. bilineata, where pups initiated ~80% of social interactions, suggesting that maternal feedback is likely influenced by the pup’s vocal practice.  

      b) The authors claim that the number of maternal displays during babbling predicts the duration of babbling bouts (Figure 1D). I find this analysis - and others based on the raw number of behaviors during babbling - difficult to interpret given that the raw number of displays may depend upon the duration of the babbling bout over which they are counted. In other words, might the number of displays reflect the fact that more displays can occur within the interval of longer babbling bouts? It would be relatively straightforward to minimize this potential confound by testing whether female display *rates* predict longer bouts.

      We calculated display rates (maternal displays per unit of bout duration) and fitted the same GLMM as in our original manuscript (model 1), again after log-transformation and scaling.  

      GLMM

      summary(vocpracf)

      Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
      Family: Gamma ( log )
      Formula: bout_dur ~ age.z + behavioural_quotient.log.z + nomales.z + (1 | ID)
      Data: set1

      Author response table 1.

      Author response table 2.

      Author response table 3.

      Author response table 4.

      Author response table 5.

      Interpretation: Our analysis in the original manuscript shows that bout duration increases with the number of maternal displays. As Reviewer 1 points out, more time offers more opportunities for the mother to show displays, so the higher number of displays in longer bouts could simply reflect the longer period. This could be a potential confounding factor. However, our analysis with display rate as the explanatory variable shows that the relationship between bout duration and display rate is negative. In other words, the number of displays increases in longer bouts (as in the first analysis), but displays occur less frequently per unit time. This could indicate that in longer bouts the mother takes breaks or leaves longer intervals between displays, which lowers the display rate. Because the display rate tends to decrease rather than increase in longer bouts, the risk of a potential confound is minimized. In summary, the display rate does not appear to ‘favour’ longer bouts, as longer bouts are associated with a lower display rate. This speaks against the hypothesis that the number of displays increases only because of the longer bout duration. It also means that our analyses showing that maternal displays influence song syllable production are not biased or confounded by bout duration. This suggests that maternal behaviour is targeted and selective, represents a potentially contingent reaction to the pup’s vocal production, and is not simply determined by the duration of a bout.

      We added this analysis in our supplementary material (Table S2) and pointed this out in the revision of our main manuscript (lines 136-138). 
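      To illustrate why raw display counts can be confounded with bout duration, the short sketch below simulates bouts that all share the same underlying display rate and compares counts with rates. It also shows one way to build a log-transformed, z-scaled rate predictor. This is a minimal illustration in Python, not the authors' R analysis: the simulated rate (0.5 displays per second), the bout-duration range, the small offset added before taking the logarithm, and the function name scaled_log_rate are all hypothetical choices. The term behavioural_quotient.log.z in the model formula appears to be a predictor of this kind, but its exact construction is an assumption here.

```python
import numpy as np


def scaled_log_rate(n_displays, bout_dur_s, offset=1e-3):
    """Return displays per second, log-transformed and z-scaled (mean 0, SD 1).

    The small offset (a hypothetical choice) keeps bouts without displays finite after log().
    """
    rate = np.asarray(n_displays, dtype=float) / np.asarray(bout_dur_s, dtype=float)
    log_rate = np.log(rate + offset)
    return (log_rate - log_rate.mean()) / log_rate.std(ddof=1)


# Simulated bouts that all share the SAME underlying display rate (0.5 per second).
rng = np.random.default_rng(1)
durations = rng.uniform(5.0, 60.0, size=2000)  # bout durations in seconds
counts = rng.poisson(0.5 * durations)          # displays per bout

# Raw counts track duration even though the rate never changes...
r_counts = np.corrcoef(counts, durations)[0, 1]
# ...whereas rates (counts per second) are roughly uncorrelated with duration.
r_rates = np.corrcoef(counts / durations, durations)[0, 1]
z_rate = scaled_log_rate(counts, durations)

print(f"corr(counts, duration) = {r_counts:.2f}")
print(f"corr(rates, duration)  = {r_rates:.2f}")
```

      Dividing by duration removes the purely mechanical dependence of counts on bout length, which is why a rate-based predictor is the natural check for the confound raised by Reviewer 1.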

      c) The introduction states that "Pup babbling is not tied to a specific function." (Lines 75-78). This may be an important point worth exploring with this unique data set. For example, the termination of a babbling bout is defined in some cases by the onset of nursing. Have the authors (or others) tested whether babbling elicits nursing behavior? If so, this may represent a reinforcement mechanism that affects babbling rates and subsequent song outcomes. Similar functional shifts in developing vocal behavior have been reported in male chipping sparrows, in which juvenile begging calls, which initially elicit parental feeding behavior, can later be incorporated into 'sub-song' (i.e. babbling) during the development of courtship song (Liu, Wada, Nottebohm, PLOS ONE, 2009).

      Thank you for pointing out this interesting study on chipping sparrows! 

      To address your question: Strauss et al. (2010) conducted a study on pup and maternal behaviors, demonstrating that babbling did not consistently result in nursing. When denied care, pups often returned to resting or grooming, a pattern we also observed in our study. While nursing might provide an additional reinforcement mechanism, it is not what evokes babbling; this is what we mean by stating “pup babbling is not tied to a specific function”. Babbling is not a begging behavior as described by Liu et al. 2009. As mentioned in the review of ter Haar et al. 2021, babbling differs structurally from begging in that it is composed of both adult-like and juvenile syllables and lacks context specificity. To solicit care (i.e. begging), pups produce several isolation calls in a fast, repetitive manner. We added a more detailed explanation to make this distinction clear (lines 79-83).

      Another interesting fact and probably more comparable to the study of the chipping sparrows – in which begging calls are incorporated into subsong practice – might be the isolation call syllables of S. bilineata. Directly after birth, S. bilineata pups produce multisyllabic isolation calls (see Knörnschild & von Helversen 2008, Knörnschild et al. 2012, Fernandez & Knörnschild 2017) that serve to solicit maternal care. For the first 2.5 weeks, pups only produce innate vocalizations, including echolocation and isolation calls (Fernandez et al. 2021). During the babbling phase, the syllables encoding the individual (and group) signature of the isolation call are also incorporated into babbling bouts. The production of isolation calls might also mark an initial step in the vocal learning process. However, in contrast to the subsong of chipping sparrows, babbling bouts in S. bilineata also include syllables acquired through vocal imitation. Thus, although we find similarities in vocal practice and development between chipping sparrows and S. bilineata, there are also distinct differences. 

      (2) Are pups exposed to more male songs when the mother is present?

      The number of singing males in each colony was used as a reasonable proxy for the amount of social acoustic input. However, I wonder if pups are exposed to more adult male songs when the mother is present and, relatedly, if females tend to remain present for longer if a pup is babbling (potentially increasing its exposure to male songs during the babbling phase).

      The mother is always present when males are singing. In S. bilineata, males predominantly engage in territorial song twice daily: at dusk and dawn. After foraging at night, territorial singing males are the first to return to the roost, and females will only return when they hear male song. Pups are either attached to the mother’s belly or, when older, fly into the roost followed by the mother. In the evening, males sing approximately half an hour before leaving to forage. Females usually leave first, followed by their pups, and males leave last. Hence, females/mothers are always present when pups are exposed to male acoustic input.  

      (3) Pup sex differences:

      The authors test for sex differences within a subset of pups and briefly mention that vocal development is considered in both males and females. This presumably means that female pups also exhibit vocal imitation of adult male territorial songs, even though they only produce these vocalizations during the babbling phase, after which they stop singing entirely. If so, this would, to my knowledge, be a unique phenomenon among vocal learners and would be interesting to discuss in greater detail.

      We followed your recommendation and discussed this topic in greater detail. We included the following part in our discussion (lines 257-269): An intriguing aspect of this species is that, unlike in most song-learning songbird species, female pups show no differences from males in babbling behavior and vocal development (Fernandez et al. 2021). This study corroborated this finding: female pups received the same maternal feedback, and their song syllable imitation did not differ in any way from that of male pups (as also observed by Knörnschild et al. 2010). This phenomenon is rare among vocal learners and raises the question of why female pups match male vocal development despite not using the learned vocalizations later in life. One potential explanation might lie in the function of the territorial song for adult females: it serves as an acoustic signal that helps females locate new suitable colonies after dispersal. The territorial song exhibits different dialects, with females showing a preference for local over foreign dialects (Knörnschild et al., 2017). Practicing and producing song themselves early in life might enhance females’ ability to evaluate male song and thereby support their mating decisions.

      (4) Characterization of song syllables:

      The authors explain their acoustic analyses in detail within the methods, however, descriptions of the syllable classification procedures and acoustic movement analyses need to be presented more clearly in the main text, so readers unfamiliar with bioacoustics or previous work can follow the logic. Also, given the qualitative descriptions of the data and the two spectrogram examples provided (Figures 2 and S1), it is difficult for the reader to fully evaluate the suitability and output of these critical procedures.

      Suggestions:  

      - Qualitative descriptions of syllable characteristics (i.e. buzz, pulse, trill, ripple, gap, smeared noisy, precursor syllable, mature syllable, adult-like syllable, early vs. late babbling phase, syllable name, etc) should all be clearly-labeled in example spectrograms and used consistently, without using different terms interchangeably (e.g. mature vs. adult-like).

      We understand that we should provide a clearer description of the various terms essential to understanding this study. We added a “Terminology” box (line 158) to the main manuscript, defining the acoustic terms we are using throughout our study. Additionally, we enhanced Figure S1 by providing more detailed information on the spectrogram that displays the five distinct song syllable types. Moreover, we included an additional spectrogram in the supplementary material (Fig. S2) displaying examples of precursor and mature syllables for syllable B2. In the method section, “The acoustic movement during ontogeny”, we added a sentence clarifying the terms “early” and “late babbling phase” (Lines 605-606). 

      - Show as you tell. Plot the data, at least from a representative pup, for each major step in the analyses (labeled spectrogram, PCA plots with distinct syllable clusters, high vs. low versatility, precursor vs. mature variants, early vs. late syllables with Euclidean distances between centroids and relation to "generic" adult male syllables, etc.)

      To illustrate the acoustic analysis more comprehensively, we have made the following additions:

      - We included a figure (Fig. S3) in the supplementary materials showing an excerpt of a babbling bout with labelled syllables to illustrate how we analyzed a) total song syllable count per bout, b) versatility per bout, and c) the number of precursor versus mature B2 syllables (the most common syllable type).

      - Additionally, we included a spectrogram with three exemplary B2 syllables to illustrate the acoustic parameter extraction with Avisoft SASLab Pro software for subsequent analysis of vocal change during development (Fig. S4 A).

      - Lastly, we included a DFA for one of the colonies with three exemplary pups to illustrate how we calculated each pup's acoustic change during ontogeny (Fig. S4 B). 
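      The centroid-distance idea raised in the reviewer's suggestions (early vs. late syllables, and their relation to "generic" adult male syllables) can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes each syllable has already been projected into a shared acoustic space (for example, discriminant-function or principal-component scores), and the function name acoustic_movement and the toy coordinates are hypothetical.

```python
import numpy as np


def acoustic_movement(early, late, adult):
    """Euclidean distances between syllable centroids in a shared acoustic space.

    early, late, adult: (n_syllables, n_dims) arrays of scores, e.g. DFA or PCA
    scores (the choice of space is an assumption of this sketch).
    """
    c_early = np.mean(np.asarray(early, dtype=float), axis=0)
    c_late = np.mean(np.asarray(late, dtype=float), axis=0)
    c_adult = np.mean(np.asarray(adult, dtype=float), axis=0)
    return {
        "early_to_late": float(np.linalg.norm(c_late - c_early)),   # acoustic movement
        "early_to_adult": float(np.linalg.norm(c_adult - c_early)),
        "late_to_adult": float(np.linalg.norm(c_adult - c_late)),
    }


# Toy 2-D scores for one hypothetical pup's B2 syllables and a pooled adult reference.
d = acoustic_movement(early=[[0, 0], [2, 0]], late=[[4, 0], [4, 2]], adult=[[5, 1]])
print(d)
```

      In this toy example the early-to-late distance is about 3.16, and the distance to the adult centroid drops from about 4.12 to 1.0, i.e. the late syllables lie closer to the adult reference than the early ones.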

      (5) Minor Comments and Corrections:

      - Modeled data are log-transformed, however, the raw data are plotted on linear scales, and in most cases, data points are densely clustered and overlapping at lower values. Plotting the data on log scales would likely aid visibility.

      We appreciate this suggestion and changed the plots accordingly. 

      - Figure 1E displays 18 data points, (legend says n=19).

      The legend is correct; the figure includes 19 data points. Two mothers have the same activity score, so their points are at the same location and it looks like there are only 18 data points. 

      - Line 482: Is "VCL" media player meant to refer to "VLC" player?

      Yes, thank you for spotting that. We corrected it.  

      Reviewer #2 (Recommendations for the authors):

      I have only a couple of comments:

      - Perhaps it would be useful to briefly go over the validation used for the ethogram in Table S1.

      The behaviors listed in the ethogram were defined based on Strauss et al. (2010) and expanded based on our own observations. For consistency, we developed these definitions and trained the students analyzing behavioral data for this study. During the training phase, we validated their analyses until inter-observer reliability reached 100% (lines 507-508).  

      - The paper seems to be generally written in American English, yet there are some instances of British English spelling, e.g. "standardised"/"standardisation": table 1, table 2, lines 143, 228, 524, 525, 531, 546, 547, 554, 560, 561.

      Thank you for spotting these errors, we corrected them.  

      - Line 343: "at libitum" should be "ad libitum".

      Thank you for spotting this error. We corrected it.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors present a modelling study to test the hypothesis that horizontal gene transfer (HGT) can modulate the outcome of interspecies competition in microbiomes, and in particular promote bistability in systems across scales. The premise is a model developed by the same authors in a previous paper where bistability happens because of a balance between growth rates and competition for a mutual resource pool (common carrying capacity). They show that introducing a transferrable element that gives a "growth rate bonus" expands the region of parameter space where bistability happens. The authors then investigate how often (in terms of parameter space) this bistability occurs across different scales of complexity, and finally under selection for the mobile element (framed as ABR selection).

      Strengths:

      The authors tackle an important, yet complex, question: how do different evolutionary processes impact the ecology of microbial ecosystems? They do a nice job at increasing the scales of heterogeneity and asking how these impact their main observable: bistability.

      We thank the reviewer for agreeing with the potential value of our analysis. We are also grateful for the constructive comments and suggestions on further analyzing the influence of the model structure and the associated assumptions. We have fully addressed the raised issues in the updated manuscript and below.

      Weaknesses:

      The authors' starting point is their interaction LV model, and the manuscript then explores how this model behaves under different scenarios. Because the structure of the model and the underlying assumptions essentially dictate these outcomes, I would expect to see much more focus on how these two aspects relate to the specific scenarios that are discussed. For example:

      A key assumption is that the mobile element conveys a multiplicative growth rate benefit (1+lambda). However, the competition between the species is modelled as a factor gamma that modulates the competition for overall resource and thus appears in the saturation term (1+ S1/Nm + gamma2*S2/Nm). This means that gamma changes the perceived abundance of the other species (if gamma > 1, then from the point of view of S1 it looks like there are more S2 than there really are). Most importantly, the relationship between these parameters dictates whether or not there will be bistability (as the authors state).

      This decoupling between the transferred benefit and the competition can have different consequences. One of them is that, from the point of view of the mobile element, the mobile element competes with different strengths within a population than between populations. To what degree introducing such a mobile element modifies the baseline bistability expectation thus strongly depends on how it modifies gamma and lambda.

      Thus, this structural aspect needs to be much more carefully presented to help the reader follow how much of the results are just trivial given the model assumptions and which have more of an emergent flavour. From my point of view, this has an important impact on helping the reader understand how the model that the authors present can contribute to the understanding of the question "how microbes competing for a limited number of resources stably coexist". I do appreciate that this changes the focus of the manuscript from a presentation of simulation results to more of a discussion of mathematical modelling.

      We thank the reviewer for the insightful suggestions. We agree with the reviewer that the model structure and the underlying assumptions need to be carefully discussed, in order to understand the generality of the theoretical predictions. In particular, the reviewer emphasized that how HGT affects bistability might depend on how mobile genetic elements modified growth rates and competition. In the main text, we have shown that when mobile genes only influence species growth rates, HGT is expected to promote multistability (Fig. 1 and 2). However, when mobile genes modify species interactions, the effect of HGT on multistability is dependent on how mobile genes change competition strength (Fig. 3a to f). When mobile genes increase competition, HGT promotes multistability (Fig. 3c and e). In contrast, when mobile genes relax competition, HGT is expected to reduce multistability (Fig. 3d and f).

      In light of the reviewer’s comments, we have further generalized the model structure by accounting for the scenario where mobile genes simultaneously modify growth rates and competition. The effect of mobile genes on growth rates is represented by the magnitude of 𝜆’s, and the influence on competition is described by another parameter, 𝛿. By varying these two parameters, we can evaluate how the model structure and the underlying assumptions affect the baseline expectation. We performed additional simulations with broad ranges of 𝜆 and 𝛿 values. In particular, we analyzed whether HGT would increase the likelihood of bistability in two-species communities compared with the scenario without gene transfer (Fig. 3g-i). Our results suggested that: (1) with or without HGT, reducing 𝜆 (increasing neutrality) promotes bistability; (2) with HGT, increasing 𝛿 promotes bistability; (3) compared with the population without HGT, gene transfer promotes bistability when 𝛿 is zero or positive, but reduces bistability when 𝛿 is strongly negative. These results agree with the reviewer’s comment that the baseline bistability expectation depends on how HGT modifies gamma and lambda. In the updated manuscript, we have thoroughly discussed how the model structure and the underlying assumptions can influence the predictions (lines 238-253). 

      We further expanded our analysis by calculating how other parameters, including competition strength, growth rate ranges, and death/dilution rate, affect the multistability of communities undergoing horizontal gene transfer (Fig. S2, S3, S9, S10, S11, S12, S13, S15). Together with the results presented in the first draft, these analyses enable a more comprehensive understanding of how different mechanisms, including but not limited to HGT, collectively shape community multistability. In the updated manuscript, the reviewer can see the change of focus from exploring the effects of HGT to a more thorough discussion of the mathematical model. The revised text highlighted in blue and the supplementary figures reflect this change.

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors use a theoretical model to study the potential impact of horizontal gene transfer on the number of alternative stable states of microbial communities. For this, they use a modified version of the competitive Lotka-Volterra model (which accounts for the effects of pairwise, competitive interactions on species growth) that incorporates terms for the effects of both an added death (dilution) rate acting on all species and the rates of horizontal transfer of mobile genetic elements, which can in turn affect species growth rates. The authors analyze the impact of horizontal gene transfer in different scenarios: bistability between pairs of species, multistability in communities, and a modular structure in the interaction matrix to simulate multiple niches. They also incorporate additional elements into the model, such as spatial structure to simulate metacommunities and modification of pairwise interactions by mobile genetic elements. In almost all these cases, the authors report an increase in either the number of alternative stable states or the parameter region (e.g. growth rate values) in which they occur.

      In my opinion, understanding the role of horizontal gene transfer in community multistability is a very important subject. This manuscript is a useful approach to the subject, but I'm afraid that a thorough analysis of the role of different parameters under different scenarios, which is needed to support the authors' general claims, is missing. The authors have extended their analysis to increase its biological relevance, but I believe that the analysis still lacks comprehensiveness.

      Understanding the origin of alternative stable states in microbial communities and how often they may occur is an important challenge in microbial ecology and evolution. Shifts between these alternative stable states can drive transitions between e.g. a healthy microbiome and dysbiosis. A better understanding of how horizontal gene transfer can drive multistability could help predict alternative stable states in microbial communities, as well as inspire novel treatments to steer communities towards the most desired (e.g. healthy) stable states.

      Strengths:

      (1) Generality of the model: the work is based on a phenomenological model that has been extensively used to predict the dynamics of ecological communities in many different scenarios.

      (2) The question of how horizontal gene transfer can drive alternative stable states in microbial communities is important and there are very few studies addressing it.

      We thank the reviewer for the positive comments on the potential novelty and conceptual importance of our work. We are also grateful for the constructive suggestions on the generality and comprehensiveness of our analysis. In particular, we agree with the reviewer that a thorough analysis of the role of different parameters could further improve the rigor of this work. We have fully addressed the raised issues in the updated manuscript and below.

      Weaknesses:

      (1) There is a need for a more comprehensive analysis of the relative importance of the different model parameters in driving multistability. For example, there is no analysis of the effects of the added death rate on multistability. This parameter has been shown to determine whether a given pair of interacting species exhibits bistability or not (see e.g. Abreu et al 2019 Nature Communications 10:2120). Similarly, each scenario is analyzed for a single value of interspecies interaction strength, with the exception of the case for mobile genetic elements affecting interaction strength, which considers three specific values. Considering heterogeneous interaction strengths (e.g. sampling from a random distribution) could also lead to more realistic scenarios; the authors generally considered that all species pairs interact with the same strength. Analyzing a larger range of growth-rate effects of mobile genetic elements would also help generalize the results. In order to achieve a more generic assessment of the impact of horizontal gene transfer in driving multistability, its role should be systematically compared to the effects of the rest of the parameters of the model.

      We appreciate the suggestions. For each of the parameters that the reviewer mentioned, we have performed additional simulations to evaluate its importance in driving multistability. 

      For the added death rate, we have calculated the bistability feasibility of two-species populations under different values of 𝐷. Our results suggested that (1) varying death rate indeed changed the bistability probability of the system; (2) when the death rate was zero, mobile genetic elements that only modify growth rates would have no effects on system’s bistability. These results highlighted the importance of added death rate in driving multistability (Fig. S2, line 136-142). 
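      The role of the added death (dilution) rate can be illustrated with a rare-invader analysis. This sketch assumes a generic two-species competitive Lotka-Volterra form with dilution, dS_i/dt = mu_i*S_i*(1 - (S_i + gamma*S_j)/Nm) - D*S_i, with the same gamma in both directions; this is an assumption, not necessarily the authors' exact equations, and HGT is not simulated (a growth-rate difference simply stands in for a benefit conferred by a mobile element). Under this form, with D = 0 the invasion rate reduces to mu_inv*(1 - gamma), whose sign does not depend on growth rates, so changing growth rates alone cannot alter bistability; with D > 0, growth-rate differences can break it. The function names are hypothetical.

```python
def invasion_rate(mu_inv, mu_res, gamma, D):
    """Per-capita growth rate of a rare invader facing a resident at its monoculture equilibrium.

    Assumes dS_i/dt = mu_i*S_i*(1 - (S_i + gamma*S_j)/Nm) - D*S_i, so the resident
    sits at S_res/Nm = 1 - D/mu_res (the carrying capacity Nm cancels out).
    """
    if mu_res <= D:
        raise ValueError("resident cannot persist when mu_res <= D")
    resident_density = 1.0 - D / mu_res
    return mu_inv * (1.0 - gamma * resident_density) - D


def is_bistable(mu1, mu2, gamma, D):
    """Bistable if neither species can invade the other's monoculture."""
    return invasion_rate(mu1, mu2, gamma, D) < 0 and invasion_rate(mu2, mu1, gamma, D) < 0


print(is_bistable(1.0, 0.5, gamma=1.5, D=0.0))  # True: without dilution, growth rates do not matter
print(is_bistable(1.0, 0.5, gamma=1.5, D=0.3))  # False: dilution lets the faster grower invade
print(is_bistable(1.0, 1.0, gamma=1.5, D=0.3))  # True: equal growth rates restore bistability
print(is_bistable(1.0, 0.5, gamma=0.8, D=0.0))  # False: gamma < 1 permits mutual invasion
```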

      For the interspecies interaction strength, we first extended our analysis of two-species populations. By calculating the bistability probability under different values of 𝛾, we showed that when the interspecies interaction strength was smaller than 1, the influence of HGT on population bistability became weak (Fig. S3, line 143-147). We also considered heterogeneous interaction strengths in multispecies communities by randomly sampling 𝛾<sub>ij</sub> values from uniform distributions. While our results suggested that the heterogeneous distribution of 𝛾<sub>ij</sub> didn’t fundamentally change the main conclusion, the mean value and variance of 𝛾<sub>ij</sub> affected the influence of HGT on multistability. The effects of HGT on community multistability became stronger when the mean value of 𝛾<sub>ij</sub> exceeded 1 and the variance of 𝛾<sub>ij</sub> was small (Fig. S12, line 190-196).
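
      A common numerical way to count alternative stable states is to integrate a community from many random initial conditions and tally the distinct end states. The sketch below does this for an N-species competitive Lotka-Volterra model with death rate 𝐷 and off-diagonal 𝛾<sub>ij</sub> drawn from a uniform distribution of given mean and width. It is a simplified illustration under stated assumptions: the HGT subpopulations are omitted, states are identified only by their set of surviving species (which can miscount if two states share survivors or if the dynamics do not settle), and the function names, step size and thresholds are hypothetical choices rather than the manuscript's procedure.

```python
import numpy as np


def simulate_lv(mu, A, D, s0, dt=0.02, steps=10_000):
    """Forward-Euler integration of ds_i/dt = mu_i*s_i*(1 - sum_j A_ij*s_j) - D*s_i
    for a batch of initial conditions s0 with shape (n_init, N); A_ii = 1."""
    s = np.array(s0, dtype=float)
    for _ in range(steps):
        growth = mu * (1.0 - s @ A.T) - D           # per-capita net growth, (n_init, N)
        s = np.maximum(s + dt * s * growth, 0.0)    # clip tiny negative overshoots
    return s


def sample_gamma(n, mean, width, rng):
    """Off-diagonal gamma_ij ~ U(mean - width/2, mean + width/2); diagonal fixed to 1."""
    A = rng.uniform(mean - width / 2, mean + width / 2, size=(n, n))
    np.fill_diagonal(A, 1.0)
    return A


def count_stable_states(mu, A, D=0.0, n_init=60, threshold=1e-3, seed=0):
    """Heuristic count of alternative stable states: the number of distinct
    sets of surviving species reached from random initial conditions."""
    rng = np.random.default_rng(seed)
    s0 = rng.uniform(0.0, 1.0, size=(n_init, len(mu)))
    final = simulate_lv(np.asarray(mu, dtype=float), A, D, s0)
    return len({tuple(row > threshold) for row in final})


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    mu = rng.uniform(0.5, 1.5, size=5)   # hypothetical growth rates
    for width in (0.0, 0.4, 0.8):
        A = sample_gamma(5, mean=1.2, width=width, rng=rng)
        print(f"width = {width:.1f}: {count_stable_states(mu, A, D=0.1)} stable state(s)")
```

      Vectorizing over initial conditions keeps the loop count independent of how many starting points are sampled; a stiff solver or a convergence check would be a natural refinement.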

      We also analyzed different ranges of growth-rate effects of mobile genetic elements. In particular, we sampled 𝜆<sub>ij</sub> values from uniform distributions with given widths; a greater width led to a larger range of growth-rate effects. We used five-species populations as an example and tested different ranges. Our results suggested that multistability was more feasible when the growth-rate effects of MGEs were small. The qualitative relationship between HGT and community multistability did not depend on the range of growth-rate effects (Fig. S13, line 197-205).
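
      The width-based sampling of MGE growth-rate effects can be sketched as follows. Following the reviewers' description of the model, this assumes that a subpopulation p<sub>ij</sub> (species i carrying the MGE of species j) grows at 𝜇<sub>i</sub>(1 + 𝜆<sub>ij</sub>); the centering of the interval, the zero diagonal and the function names are illustrative assumptions, not details taken from the manuscript.

```python
import numpy as np


def sample_mge_effects(n, width, rng, center=0.0):
    """Draw lambda_ij ~ U(center - width/2, center + width/2) for an n x n community.

    The diagonal is set to 0 so that cells carrying only their own MGE keep the
    baseline growth rate mu_i (an assumption for this sketch).
    """
    lam = rng.uniform(center - width / 2, center + width / 2, size=(n, n))
    np.fill_diagonal(lam, 0.0)
    return lam


def subpopulation_growth_rates(mu, lam):
    """Growth rate of subpopulation p_ij, assumed to be mu_i * (1 + lambda_ij)."""
    return np.asarray(mu, dtype=float)[:, None] * (1.0 + lam)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = np.array([1.0, 0.8, 1.2, 0.9, 1.1])   # hypothetical baseline growth rates
    for width in (0.2, 0.6, 1.0):
        rates = subpopulation_growth_rates(mu, sample_mge_effects(len(mu), width, rng))
        print(f"width = {width:.1f}: rates span {rates.min():.2f} to {rates.max():.2f}")
```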

      (2) The authors previously developed this theoretical model to study the impact of horizontal gene transfer on species coexistence. In this sense, it seems that the authors are exploring a different (stronger interspecies competition) range of parameter values of the same model, which could potentially limit novelty and generality.

      We appreciate the comment. In a previous work (PMID: 38280843), we developed a theoretical model that incorporated the horizontal gene transfer process into the classic LV framework. This model provides opportunities to investigate the role of HGT in different open questions of microbial ecology. In the previous work, we considered one fundamental question: how competing microbes coexist stably. In this work, however, we focused on a different problem: how alternative stable states emerge in complex communities. While the basic theoretical tools that we applied in the two works were similar, the scientific questions, application contexts and implications of our analysis were largely different. The novelty of this work arises from the fact that it reveals the conceptual linkage between alternative stable states and a ubiquitous biological process, horizontal gene transfer. This linkage has been largely unexplored in previous studies. Exploring such a linkage naturally required us to consider stronger interspecies competition, which in general diminishes coexistence but gives rise to multistability. We believe that the analysis performed in this work provides novel and valuable insights for the field of microbial ecology.

      With all the supplementary simulations that we carried out in light of all the reviewer’s comments, we believe the updated manuscript also provides a unified framework to understand how different biological processes collectively shape the multistability landscape of complex microbiota undergoing horizontal gene transfer. The comprehensive analyses performed and the diverse scenarios considered in this study also contribute to the novelty and generality of this work.

      (3) The authors analyze several scenarios that, in my opinion, naturally follow from the results and parameter value choices in the first sections, making their analysis not very informative. For example, after showing that horizontal gene transfer can increase multistability both between pairs of species and in a community context, the way they model different niches does not bring significantly new results. Given that the authors showed previously in the manuscript that horizontal gene transfer can impact multistability in a community in which all species interact with each other, one might expect that it will also impact multistability in a larger community made of (sub)communities that are independent of (not interacting with) each other, which is the proposed way of modelling niches. A similar argument can be made regarding the analysis of (spatially structured) metacommunities. It is known that, for small enough dispersal rates, space can promote regional diversity by enabling each local community to remain in a different stable state. Therefore, in conditions in which horizontal gene transfer drives multistability, it will also drive regional diversity in a metacommunity.

      Thanks. Based on the reviewer’s comments, we have moved Figs. 3 and 4 to the Supplementary Information. In the updated manuscript, we have focused more on analyzing the roles of different parameters in shaping community multistability.

      (4) In some cases, the authors consider that mobile genetic elements can lead to ~50% growth rate differences. In the presence of an added death rate, this can be a relatively strong advantage that makes the fastest grower easily take over their competitors. It would be important to discuss biologically relevant examples in which such growth advantages driven by mobile genetic elements could be expected, and how common such scenarios might be.

      We appreciate the suggestion. Mobile genetic elements can drive large growth rate differences when they encode adaptive traits such as antibiotic resistance (line 197-198).

      We also analyzed different ranges of growth-rate effects of mobile genetic elements by sampling 𝜆<sub>ij</sub> values from uniform distributions with given widths. Our results suggested that multistability was more feasible when the fitness effects of MGEs were small (Fig. S13b). The qualitative relationship between HGT and community multistability did not depend on the range of growth-rate effects (Fig. S13a and b). We discussed these results in line 197-205 of the updated main text.

      Reviewer #3 (Public review):

      Hong et al. used a model they previously developed to study the impact of horizontal gene transfer (HGT) on microbial multispecies communities. They investigated the effect of HGT on the existence of alternative stable states in a community. The model most closely resembles HGT through the conjugation of incompatible plasmids, where the transferred genes confer independent growth-related fitness effects. For this type of HGT, the authors find that increasing the rate of HGT leads to an increasing number of stable states. This effect of HGT persists when the model is extended to include multiple competitive niches (under a shared carrying capacity) or spatially distinct patches (that interact in a grid-like fashion). Instead, if the mobile gene is assumed to reduce between-species competition, increasing HGT leads to a smaller region of multistability and fewer stable states. Similarly, if the mobile gene is deleterious an increase in HGT reduces the parameter region that supports multistability.

      This is an interesting and important topic, and I welcome the authors' efforts to explore these topics with mathematical modeling. The manuscript is well written and the analyses seem appropriate and well-carried out. However, I believe the model is not as general as the authors imply and more discussion of the assumptions would be helpful (both to readers + to promote future theoretical work on this topic). Also, given the model, it is not clear that the conclusions hold quite so generally as the authors claim and for biologically relevant parameters. To address this, I would recommend adding sensitivity analyses to the manuscript.

      We thank the reviewer for agreeing that our work addressed an important topic and was well conducted. We are also grateful for the suggestion on sensitivity analysis, which is very helpful to improve the rigor and generality of our conclusion. All the raised issues have been fully addressed in the updated manuscript and below.

      Specific points

      (1) The model makes strong assumptions about the biology of HGT, that are not adequately spelled out in the main text or methods, and will not generally prove true in all biological systems. These include:

      a) The process of HGT can be described by mass action kinetics. This is a common assumption for plasmid conjugation, but for phage transduction and natural transformation, people use other models (e.g. with free phage that adsorb to all populations and transfer in bursts).

      b) A subpopulation will not acquire more than one mobile gene, subpopulations can not transfer multiple genes at a time, and populations do not lose their own mobilizable genes. [this may introduce bias, see below].

      c) The species internal inhibition is independent of the acquired MGE (i.e. for p1 the self-inhibition is by s1).

      These points are in addition to the assumptions explored in the supplementary materials, regarding epistasis, the independence of interspecies competition from the mobile genes, etc. I would appreciate it if the authors could be more explicit in the main text about the range of applicability of their model, and in the methods about the assumptions that are made.

      We are grateful for the reviewer’s suggestions. In the main text and methods of the updated manuscript, we have made clear the assumptions underlying our analysis. For point (a), we have clarified that our model primarily focused on plasmid transfer dynamics (line 74, 101, 517). Therefore, the process of HGT can be described by mass action kinetics, which is commonly assumed for plasmid transfer (line 537-538). For point (b), our model allows a cell to acquire more than one mobile gene. Please see our response to point (3) for details. We have also made it clear that we assumed the populations would not completely lose their own mobile genes (line 526-527). For (c), we have also clarified it in the updated manuscript (line 111-112, 527-528).
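
      Mass action kinetics means that the rate of new transfers is proportional to the product of recipient and donor densities. A minimal, generic version for a single species split into plasmid-free (f) and plasmid-carrying (p) cells, in the spirit of classic conjugation models, is sketched below. It is not the manuscript's multispecies system; the parameter names (eta for the transfer rate, kappa for the plasmid loss rate, D for the death rate) and all numerical values are illustrative assumptions.

```python
def conjugation_rhs(f, p, mu_f, mu_p, eta, kappa, D):
    """Generic single-species plasmid model with mass-action transfer (illustrative).

    f: plasmid-free density, p: plasmid-carrying density (normalized capacity 1).
    """
    crowding = 1.0 - f - p      # logistic term with a shared carrying capacity
    transfer = eta * f * p      # mass action: new transconjugants ~ recipients x donors
    loss = kappa * p            # carriers losing the plasmid become plasmid-free
    df = mu_f * f * crowding - transfer + loss - D * f
    dp = mu_p * p * crowding + transfer - loss - D * p
    return df, dp


def simulate(f0, p0, t_end=200.0, dt=0.01, **params):
    """Forward-Euler integration; returns (f, p) at t_end."""
    f, p = f0, p0
    for _ in range(int(round(t_end / dt))):
        df, dp = conjugation_rhs(f, p, **params)
        f, p = max(f + dt * df, 0.0), max(p + dt * dp, 0.0)
    return f, p


if __name__ == "__main__":
    # Hypothetical values: the plasmid carries a 30% growth cost.
    base = dict(mu_f=1.0, mu_p=0.7, kappa=0.01, D=0.1)
    for eta in (0.0, 0.5):
        f, p = simulate(0.5, 0.1, eta=eta, **base)
        print(f"eta = {eta}: plasmid-free = {f:.3f}, carriers = {p:.3f}")
```

      Because the transfer term only moves cells from one class to the other, it leaves total density unchanged; with these hypothetical values the costly plasmid is lost without transfer but persists at high frequency when transfer is fast enough.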

      We have also performed a series of additional simulations to show the range of applicability of our model. In particular, we discuss the role of other mechanisms, including interspecies interaction strength, the growth rate effects of MGEs, MGE epistasis and microbial death rates in shaping the multistability of microbial communities undergoing HGT. These results were provided in Fig. S2, S3, S9, S10, S11, S12, S13 and S15.

      (2) I am not surprised that a mechanism that creates diversity will lead to more alternative stable states. Specifically, the null model for the absence of HGT is to set gamma to zero, resulting in pij=0 for all subpopulations (line 454). This means that a model with N^2 classes is effectively reduced to N classes. It seems intuitive that an LV-model with many more species would also allow for more alternative stable states. For a fair comparison, one would really want to initialize these subpopulations in the model (with the same growth rates - e.g. mu1(1+lambda2)) but without gene mobility.

      We appreciate the insightful comments. The reviewer was right that in our model HGT created additional subpopulations in the community. However, with or without HGT, we calculated the species diversity and multistability based on the abundances of the 𝑁 species (s<sub>i</sub> in our model), instead of all the p<sub>ij</sub> subpopulations. Therefore, although there exist more ‘classes’ in the model with HGT, the number of ‘classes’ considered when we calculated community diversity and multistability was equal. In light of the reviewer’s suggestion, we have also performed additional simulations, where we initialized the subpopulations in the model with nonzero abundances. Our results suggested that initializing the p<sub>ij</sub> subpopulations with non-zero abundances didn’t change the main conclusion (Fig. S11, line 188-189).

      (3) I am worried that the absence of double gene acquisitions from the model may unintentionally promote bistability. This assumption is equivalent to an implicit assumption of incompatibility between the genes transferred from different species. A highly abundant species with high HGT rates could fill up the "MGE niche" in a species before any other species have reached appreciable size. This would lead to greater importance of initial conditions and could thus lead to increased multistability.

      This concern also feels reminiscent of the "coexistence for free" literature (first described here http://dx.doi.org/10.1016/j.epidem.2008.07.001 ) which was recently discussed in the context of plasmid conjugation models in the supplementary material (section 3) of https://doi.org/10.1098/rstb.2020.0478 .

      We appreciate the comments. Our model didn’t assume incompatibility between MGEs transferred from different species. Instead, it allows a cell to acquire more than one MGE. In our model, p<sub>ij</sub> described the subpopulation in the 𝑖-th species that acquired the MGE from the 𝑗-th species. Here, p<sub>ij</sub> can overlap with p<sub>ik</sub> (𝑗 ≠ 𝑘). In other words, a cell can belong to p<sub>ij</sub> and p<sub>ik</sub> at the same time. The p<sub>ij</sub> subpopulation is allowed to carry the MGEs from the other species. In the model, we used to describe the influence of the other MGEs on the growth of p<sub>ij</sub>.

      We also thank the reviewer for bringing two papers to our attention. We have cited and discussed these papers in the updated manuscript (line 355-362).

      (4) The parameter values tested seem to focus on very large effects, which are unlikely to occur commonly in nature. If I understand the parameters in Figure 1b correctly for instance, lambda2 leads to a 60% increase in growth rate. Such huge effects of mobile genes (here also assumed independent from genetic background) seem unlikely except for rare cases. To make this figure easier to interpret and relate to real-world systems, it could be worthwhile to plot the axes in terms of the assumed cost/benefit of the mobile genes of each species.

      Thanks for the comments. In the main text, we presented one simulation result that assumed relatively large effects of MGEs on species fitness, as the reviewer pointed out. In the updated manuscript, we have supplemented numerical simulations that considered different ranges of fitness effects, including fitness effects as small as 10% (Fig. S13a). We have also plotted the relationship between community multistability and the assumed fitness effects of MGEs, as the reviewer suggested (Fig. S13b). Our results suggested that multistability was more feasible when the fitness effects of MGEs were small, and changing the range of MGE fitness effects didn’t fundamentally change our main conclusion. These results were discussed in line 197-205 of the updated main text.

      Something similar holds for the HGT rate (eta): given that the population of E. coli or Klebsiella in the gut is probably closer to 10^9 than 10^12 (they make up only a fraction of all cells in the gut), the assumed rates for eta are definitely at the high end of measured plasmid transfer rates (e.g. F plasmid transfers at a rate of 10^-9 mL/CFU h-1, but it is derepressed and considered among the fastest - https://doi.org/10.1016/j.plasmid.2020.102489 ). To adequately assess the impact of the HGT rate on microbial community stability it would need to be scanned on a log (rather than a linear) scale. Considering the meta-analysis by Sheppard et al. it would make sense to scan it from 10^-7 to 1 for a community with a carrying capacity around 10^9.

      We thank the reviewer for the constructive suggestion. We have carried out additional simulations by scanning the 𝜂 value from 10<sup>-7</sup> to 1. The results suggested that increasing HGT rates started to promote multistability when the 𝜂 value exceeded 10<sup>-2</sup> per hour (Fig. S9, line 337-346). This corresponds to a conjugation efficiency of 10<sup>-11</sup> cell<sup>-1</sup> ∙ hr<sup>-1</sup> ∙ mL when the maximum carrying capacity equals 10<sup>9</sup> cells mL<sup>-1</sup>, or a conjugation efficiency of 10<sup>-14</sup> cell<sup>-1</sup> ∙ hr<sup>-1</sup> ∙ mL when the maximum carrying capacity equals 10<sup>12</sup> cells mL<sup>-1</sup>.
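
      The conversion between the normalized HGT rate and a per-cell conjugation efficiency is a single division, assuming (as in a density-normalized mass-action model) that 𝜂 equals the raw efficiency multiplied by the carrying capacity. The helper below reproduces the 10<sup>-11</sup> and 10<sup>-14</sup> values quoted above and builds a log-spaced scan of the kind the reviewer requested; the function names and the number of grid points are illustrative choices.

```python
import numpy as np


def conjugation_efficiency(eta_normalized, carrying_capacity):
    """Normalized HGT rate (hr^-1) -> conjugation efficiency (mL cell^-1 hr^-1),
    assuming eta_normalized = efficiency * carrying_capacity (cells mL^-1)."""
    return eta_normalized / carrying_capacity


def normalized_rate(efficiency, carrying_capacity):
    """Inverse conversion: efficiency (mL cell^-1 hr^-1) -> normalized rate (hr^-1)."""
    return efficiency * carrying_capacity


# One grid point per decade, from 1e-7 to 1 hr^-1.
eta_grid = np.logspace(-7, 0, num=8)

if __name__ == "__main__":
    for capacity in (1e9, 1e12):
        eff = conjugation_efficiency(1e-2, capacity)
        print(f"N_m = {capacity:.0e} cells/mL: eta = 1e-2 /hr -> {eff:.0e} mL/cell/hr")
    # Under this assumption, a plasmid transferring at ~1e-9 mL/CFU/hr
    # in a population of 1e9 cells/mL corresponds to eta = 1 hr^-1.
    print(normalized_rate(1e-9, 1e9))
```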

      (5) It is not clear how sensitive the results (e.g. Figure 2a on the effect of HGT) are to the assumption of the fitness effect distribution of the mobile genes. This is related to the previous point that these fitness effects seem quite large. I think some sensitivity analysis of the results to the other parameters of the simulation (also the assumed interspecies competition varies from figure to figure) would be helpful to put the results into perspective and relate them to real biological systems.

      We appreciate the comments. In light of the reviewer’s suggestion, we have changed the range of the fitness effects and analyzed the sensitivity of our predictions to this range. As shown in Fig. S13, changing the range of MGE fitness effects didn’t alter the qualitative interplay between HGT and community multistability. We have also examined the sensitivity of the results to the strength of interspecies competition (Fig. S3, S10, S12). These results suggested that while the strength of interspecies interactions played an important role in shaping community multistability, the relationship between HGT rate and multistability was not fundamentally changed by varying interaction strength. In addition, we examined the role of death rates (Fig. S2). In the updated manuscript, we discussed the sensitivity of our prediction to these parameters in line 136-147, 190-205, 335-354.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Please find below a few suggestions that, in my opinion, could help improve the manuscript.

      TITLE

      It might not be clear what 'gene exchange communities' are. Perhaps it could be rewritten for more specificity (e.g. '...communities undergoing horizontal gene transfer').

      We have updated the title as the reviewer suggested.

      ABSTRACT

      The abstract could also be edited to improve clarity and specificity. Terms like 'complicating factors' are vague, and enumerating specific factors would be better. The results are largely based on simulations, no analytical results are plotted, so I find that the sentence starting with 'Combining theoretical derivation and numerical simulations' can be a bit misleading.

      We appreciate the suggestions. We have enumerated the specific factors and scenarios in the updated abstract (line 18-26). We have also replaced 'Combining theoretical derivation and numerical simulations' with ‘Combining mathematical modeling and numerical simulations’.

      INTRODUCTION

      -  Line 42, please revise this paragraph. The logical flow is not so clear, it seems a bit like a list of facts, but the main message might not be clear enough. Also, it would be good to define 'hidden' states or just rewrite this sentence.

      We appreciate the suggestion. In the updated manuscript, we have rewritten this paragraph to improve the logical flow and clarity (line 46-52).

      -  Line 54, there is little detail about both theoretical models and HGT in this paragraph, and mixing the two makes the paragraph less focused. I suggest dividing it into two paragraphs and expanding its content. For example, you could briefly explain some relevant implications of MGEs.

      We appreciate the suggestion. In the updated manuscript, we have divided this paragraph into two paragraphs, focusing on theoretical models and HGT, respectively (line 55-71). In particular, we have added explanations on the implications of MGEs (line 66-69), as the reviewer suggested.

      -  Line 72, as mentioned in the abstract, it would be better to explicitly mention which confounding factors are going to be discussed.

      Thanks for the suggestion. We have rewritten this part as “We further extended our analysis to scenarios where HGT changed interspecies interactions, where microbial communities were subjected to strong environmental selections and where microbes lived in metacommunities consisting of multiple local habitats. We also analyzed the role of different mechanisms, including interspecies interaction strength, the growth rate effects of MGEs, MGE epistasis and microbial death rates in shaping the multistability of microbial communities. These results created a comprehensive framework to understand how different dynamic processes, including but not limited to HGT rates, collectively shaped community multistability and diversity” (line 75-82).

      RESULTS

      -  The basic concepts (line 77) should be explained in more detail, keeping the unfamiliar reader in mind. The reader might not be familiar with the concept of bistability in terms of species abundance. Also, note that mutual inhibition does not necessarily lead to positive feedback, as an interaction strength between 0 and 1 might still be considered inhibition. In any case, in Figure 1 it is not obvious how the positive feedback is represented; the caption should explain it. Note that neither the main text nor the caption explains the metaphor of the landscape and the marble that you are using in Figure 1a.

      We have rewritten this paragraph to provide more details on the basic concepts (line 86-99). We have removed the statement about ‘mutual inhibition’ to avoid being misleading. We have also updated the caption of Fig. 1a to explain the metaphor of the landscape and the marble (line 389-396).

      -  In the classical LV model, bistability does not depend on growth rates, but only on interaction strength. Therefore, I think that much of the results are significantly influenced by the added death rate. I believe that if the death rate is set to zero, mobile genetic elements that only modify growth rates will have no effect on the system's bistability. Because of this, I think that a thorough analysis of the role of the added death (dilution) rate and the distribution of growth rates is especially needed.

      We are grateful for the reviewer’s insightful comments. In the updated manuscript, we have thoroughly analyzed the role of the added death (dilution) rate on the bistability of communities composed of two species (Fig. S2). Indeed, as the reviewer pointed out, if the death rate equals zero, mobile genetic elements that only modify growth rates will have no effect on the system's bistability. We have discussed the role of death rate in line 136-142 of the updated manuscript.
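
      The reviewer's argument can be made explicit in a reduced two-species model without HGT terms (a simplification made only for this derivation), with a uniform death or dilution rate 𝐷 and effective carrying capacities K<sub>i</sub>:

```latex
\begin{aligned}
\frac{ds_i}{dt} &= \mu_i s_i\,(1 - s_i - \gamma s_j) - D s_i
               = \mu_i s_i\,(K_i - s_i - \gamma s_j),
\qquad K_i = 1 - \frac{D}{\mu_i},\\[4pt]
\left.\frac{1}{s_j}\frac{ds_j}{dt}\right|_{(s_i,\,s_j)=(K_i,\,0)} &= \mu_j\,(K_j - \gamma K_i),\\[4pt]
\text{bistability} &\iff K_j < \gamma K_i \ \text{and}\ K_i < \gamma K_j
\iff \gamma > \max\!\left(\frac{K_i}{K_j},\,\frac{K_j}{K_i}\right)
\quad (K_i, K_j > 0).
\end{aligned}
```

      With 𝐷 = 0, K<sub>i</sub> = K<sub>j</sub> = 1 and the condition reduces to 𝛾 > 1 regardless of the growth rates. With 𝐷 > 0, any change in growth rate (for example, assuming MGE effects enter as multiplicative growth-rate factors) shifts K<sub>i</sub> and can move a species pair into or out of the bistable region.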

      We have also expanded our analysis of the distribution of growth rates. In particular, we considered different ranges of growth-rate effects of mobile genetic elements by sampling 𝜆<sub>ij</sub> values from uniform distributions with given widths (Fig. S13). A greater width led to a larger range of growth-rate effects. We used five-species populations as an example and tested different ranges.

      Our results suggested that multistability was more feasible when the growth-rate effects of MGEs were small (Fig. S13b). The qualitative relationship between HGT and community multistability did not depend on the range of growth-rate effects (Fig. S13a). These results are discussed in line 197-205 of the updated manuscript.

      -  The analysis uses gamma values that, in the absence of an added death rate, render a species pair bistable. Therefore, multistability would be quite expected for a 5 species community. Note that, multistability is possible in communities of more than 2 species even if all gamma values are smaller than 1. Analyzing a wide range of interaction strength distributions would really inform on the relative role of HGT in multistability across different community scenarios.

      We are grateful for the reviewer’s suggestion. In light of the reviewer’s comments, in the updated manuscript, we have performed additional analysis by focusing on a broader range of interaction strengths (Fig. S3, S10, S12), especially the gamma values below 1 (Fig. S10). Our results agreed with the reviewer’s notion that multistability was possible in communities of more than 2 species even if all gamma values were smaller than 1 (Fig. S10). 

      -  I would recommend the authors extend the analysis of the model used for Figures 1 and 2. Figures 3 and 4 could be moved to the supplement (see my point in the public review), unless the authors extend the analysis to explain some non-intuitive outcomes for niches and metacommunities.

      Thanks. In the updated manuscript, we have performed additional simulations to extend the analysis in Figures 1 and 2. These results were presented in Fig. S2, S3, S9, S10, S11, S12, and S13. We have also moved Figures 3 and 4 to the SI as the reviewer suggested.

      -  The authors seem to refer to fitness and growth rates as the same thing. This could lead to confusion - the strongest competitor in a species pair could also be interpreted as the fittest species despite being the slowest grower. I think there's no need to use fitness if they refer to growth rates. In any case, they should define fitness if they want to use this concept in the text.

      We are grateful for the insightful suggestion. To avoid confusion, we have used ‘growth rate’ throughout the updated manuscript.

      -  Across the text, the language needs some revision for clarity, specificity, and scientific style. In lines 105 - 109 there are some examples, like the use of 'in a lot of systems', and ' interspecies competitions' (I believe they mean interspecies interaction strengths).

      We thank the reviewer for pointing these out. We have thoroughly checked the text and made revisions wherever applicable to improve clarity and specificity.

      -  Many plots present the HGT rate on the horizontal axis. Could the authors explain why the rate of HGT is relatively important for the number of alternative stable states? I understand how from zero to a small positive number there is a qualitative change. Beyond that, it shouldn't affect bistability too much, I think. If I am right, then other parameters could be more informative to plot on the horizontal axis. If I am wrong, I think that providing an explanation for this would be valuable.

      Thanks. To address the reviewer’s comment, we have systematically analyzed the effects of HGT on community multistability by scanning the HGT rate from 10<sup>-7</sup> to 10<sup>0</sup> hr<sup>-1</sup>. In communities of two or more species, our simulation results showed that multistability gradually increased with HGT rate once the HGT rate exceeded 10<sup>-2</sup> hr<sup>-1</sup>. These results, presented in Fig. S9 and discussed in line 337-346, provide a more quantitative relationship between multistability and HGT rate.

      While in this work we showed the potential role of HGT in modulating community multistability, our results didn’t exclude the role of the other parameters. Motivated by the comments raised by the reviewers, in the updated manuscript, we have performed additional simulations to analyze the influence of other parameters in shaping community multistability. These parameters include death or dilution rate (Fig. S2), interaction strength (Fig. S3, S9, S10, S11, S12, S14, S15), 𝜆 range (Fig. S13, S15) and 𝛿 value (Fig. 3g, h, i). In many of the supplemented results (Fig. S2b, S3b, S13b, Fig. 3g, 3h and 3i), we have also plotted the data using these parameters as the x axis. We believe the updated work now provides a more comprehensive framework to understand how different mechanisms, including but not limited to HGT, might shape the multistability of complex microbiota. These points were discussed in line 136-147, 190-205, 238-253, 334-354 of the updated main text.

      -  My overall thoughts on the case of antibiotic exposure are similar to those of previous sections. Very few of the different parameters of the model are analyzed and discussed. In this case, the authors increased the interaction strength to ~0.4 times higher compared to previous sections. Was this necessary, and why?

      Thanks for the comments. In the previous draft, the interaction strength 𝛾 = 1.5 was tested as an example. Motivated by the reviewer’s comments, in the updated manuscript, we have examined different interaction strengths, including the strength (𝛾 = 1.1) commonly tested in other scenarios. The prediction held equally for different 𝛾 values (Fig. S15). We have also analyzed different 𝜆 ranges (Fig. S15). These results, together with the analyses presented in the earlier version of the manuscript, suggested the potential role of HGT in promoting multistability for communities under strong selection. The supplemented results were presented in Fig. S15 and discussed in line 293-295 of the updated manuscript.

      -  Line 195, if a gene encodes for the production of a public good, why would its HGT reduce interaction strength? I can think of the opposite scenario: the gene is a public good, and without HGT there is only one species that can produce it. Let's imagine that the public good is an enzyme that deactivates an antibiotic that is present in the environment, and then the species that produces has a positive interaction with another species in a pairwise coculture. If HGT happens, the second species becomes a producer and does not need the other one to survive in the presence of antibiotics anymore. The interaction can then become more competitive, as e.g. competition for resources could become the dominant interaction.

      We thank the reviewer for pointing this out. In the updated manuscript, we have removed this statement.

      DISCUSSION

      -  L 267 "by comparison with empirical estimates of plasmid conjugation rates from a previous study [42], the HGT rates in our analysis are biologically relevant in a variety of natural environments". The authors are using a normalized model and the relevance of other parameter values is not discussed. If the authors want to claim that they are using biologically relevant HGT, they should also discuss whether the rest of the parameter values are biologically relevant. I recommend relaxing this statement about HGT rates.

      We appreciate the suggestion. We agree with the reviewer that other parameters, including the death/dilution rate, interaction strength and 𝜆 ranges, are also important in shaping community multistability. We have performed additional analysis to show the effects of these parameters. In light of the reviewer’s suggestion, we have relaxed this statement and thoroughly discussed the context-dependent effect of HGT as well as the roles of different parameters (line 334-354).

      -  Last sentence: "Therefore, inhibiting the MGE spread using small molecules might offer new opportunities to reshape the stability landscape and narrow down the attraction domains of the disease states". It is not clear what procedure/technique the authors are suggesting. If they want to keep this statement, the authors should give more details on how small molecules can be/are used to inhibit MGE.

      We appreciate the comments. Previous studies have shown that some small molecules, such as unsaturated fatty acids, can inhibit the conjugative transfer of plasmids. By binding the type IV secretion traffic ATPase TrwD, these compounds limit pilus biogenesis and DNA translocation. We have provided more details regarding this statement in the updated manuscript (line 376-379).

      METHODS

      -  Line 439, mu_i should be presented as the maximum 'per capita' growth rate.

      We have updated the definition of 𝜇<sub>i</sub> following the suggestion (line 529).

      -  Line 444, this explanation is hard to follow, please expand it to provide more details. You could provide an example, like explaining that all individuals from S1 have the MGE1 and therefore they have mu_1 = mu_01 ... After HGT, their fitness changes if they get the plasmid from S2, so a term lambda2 appears.

      Thanks. In the updated manuscript, we have expanded the explanation by providing an example as the reviewer suggested (line 534-537).

      -  The normalization assumes a common carrying capacity Nm (Eqs 1-4) and then it's normalized (Eqs. 5-8). It would be better to start from a more general scenario in which each species has a different carrying capacity and then proceed with the normalization.

      We appreciate the suggestion. In the updated manuscript, we have started our derivation from the scenario where each species has a different carrying capacity before proceeding with the normalization (section 1 of Methods, line 516-554). The same equations can be obtained after normalization.

      -  I think that the meaning of kappa (the plasmid loss rate) is not explained in the text.

      Thanks for pointing it out. We have explained the meaning of kappa in the updated text (line 108, 154, 539-541, 586-587, 607).

      SUPPLEMENT

      -  Figure S4, what are the different colors in panel b?

      In panel b of Fig. S4, the different colors represent the simulation results repeated with randomized growth rates. We have made it clear in the updated SI.

      Reviewer #3 (Recommendations for the authors):

      (1) Please extend your description of the model, so it is easier to understand for readers who have not read the first paper. Especially the choice to describe the model as species and subpopulations, as opposed to writing it as MGE-carrying and MGE-free populations of each species makes it quite complicated to understand which parameters influence each other.

      Thanks for the suggestion. We have extended the model description in the updated manuscript, which provides a more detailed introduction on model configurations and parameter definitions (line 86-99, 101-113, 151-159). We have also updated the Methods to extend the model description.

      (2) Please define gamma_ji in equation 13 and eta_jki in equation 14 (how to map the indices onto the assumed directionality of the interaction).

      We have defined these two parameters in the updated manuscript (line 584-586, 630-632).

      (3)  Line 511: please add at the beginning of this paragraph that you are assuming a grid-like arrangement of patches which will be captured by dispersal term H.

      We have updated this paragraph to make this assumption clear (line 636-637).

      (4)  Line 540: "used in our model" (missing a word).

      We have corrected it in the updated manuscript.

      (5)  Currently the analyses looking at the types of growth effects HGT brings (Figures 5-7) feel very "tacked on". These are not just "confounding factors", but rather scenarios that are much more biologically realistic than the assumption of independent effects. I would introduce them earlier in the text, as I think many readers may not trust your results until they know this was considered (+ how it changes the conclusions).

      We are grateful for the suggestion. We agree with the reviewer that these biologically realistic scenarios should be introduced earlier in the text. In the updated manuscript, we have moved these analyses forward, as sections 3, 4 and 5. We have also avoided the term “confounding factors”. Instead, in the updated manuscript, we have separated these analyses into different sections, and clearly described each scenario in the section title (line 217-218, 254, 275).

      (6)  In some places the manuscript refers to HGT, in others to MGE presence (e.g. caption of Figure 6). These are not generally the same thing, as HGT could also occur due to extracellular vesicles or natural transformation etc. Please standardize the nomenclature and make it clearer which type of processes the model describes.

      We appreciate the comment. The model in this work focuses primarily on plasmid transfer, and we have made this clear throughout the main text.

      (7)  In many figures the y-axis starts at a value other than 0. This is a bit misleading. In addition, I would recommend changing the title "Area of bistability region" to "Area of bistability" or perhaps even "Area of multistability" (since more than two species are considered).

      Thanks for the suggestion. We have updated all the relevant figures to make sure that their y-axes start at 0. We have also changed the title “Area of bistability region” to “Area of multistability”, whenever it is applicable.

      (8)  Figure 7: what are the assumed fitness effects of the mobile genes in the simulation? Which distribution were they drawn from? Please add this info to the figure caption here and elsewhere.

      In Figure 7, we explored an extreme scenario for the fitness effects of the mobile genes, in which the population was subject to strong environmental selection and only cells carrying the mobile gene could grow. Therefore, carriage of the mobile gene changed the species growth rate from 0 to a positive value µ<sub>i</sub>. When calculating the number of stable states in the communities, we randomly drew the µ<sub>i</sub> values from a uniform distribution between 0.3 and 0.7 hr<sup>-1</sup>. We have added this information to the figure caption (line 505-508) and Methods (line 615-617) of the updated manuscript.
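      The sampling step described above can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function names `draw_growth_rates` and `effective_growth_rate` and the fixed seed are hypothetical, and only the uniform range (0.3 to 0.7 hr^-1) and the "growth rate 0 without the mobile gene" rule come from the response.

```python
import random


def draw_growth_rates(n_species, low=0.3, high=0.7, seed=None):
    """Draw maximum per-capita growth rates mu_i (hr^-1) for cells carrying
    the mobile gene from a uniform distribution on [low, high]."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n_species)]


def effective_growth_rate(mu_i, carries_gene):
    """Strong-selection assumption: only cells carrying the mobile gene
    grow, so gene-free cells have growth rate 0 and carriers grow at mu_i."""
    return mu_i if carries_gene else 0.0


if __name__ == "__main__":
    mus = draw_growth_rates(4, seed=1)
    for i, mu in enumerate(mus, start=1):
        print(f"species {i}: gene-free 0.0, carrier {effective_growth_rate(mu, True):.3f} hr^-1")
```

      In a full analysis, each such draw would parameterize one run of the community model, and the number of stable states would be recorded per draw; that model is not reproduced here.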

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Detection of early-stage colorectal cancer is of great importance. Recently, both laboratory scientists and clinicians have reported different exosomal biomarkers to identify colorectal cancer patients.

      Here, the authors present a full RNA landscape of plasma exosomes from 60 individuals, including 31 colorectal cancer (CRC) patients, 19 advanced adenoma (AA) patients, and 10 noncancerous controls. RNAs with high fold change, high absolute abundance, and varied module attribution were used to construct RT-qPCR-based RNA models for CRC and AA detection.

      Overall, this is a well-performed proof-of-concept study to highlight exosomal RNAs as potential biomarkers of early-stage colorectal cancer and its precancerous lesions.

      Thank you for your careful evaluation and valuable suggestions, which have guided the improvement of our paper. In response to your feedback, we have made the following improvements.

      (1) Depicting the full RNA landscape of circulating exosomes is still quite challenging. The authors annotated 58,333 RNA species in exosomes, most of which were lncRNAs, but the authors do not explain how they characterized those RNAs.

      Author response and action taken: Thanks for your comments. In the Supplementary Methods section titled "Identification of mRNAs and lncRNAs", we have provided a detailed explanation of how mRNAs and lncRNAs were characterized, to address your concern. Characterizing long RNAs is challenging. For lncRNA analysis, the transcriptome was assembled with Cufflinks and Scripture from reads mapped to the reference genome. The assembled transcripts were annotated using the Cuffcompare program from the Cufflinks package, and transcripts without a known annotation were screened for putative lncRNAs.

      (2) The authors tested their models in a medium size population of 124 individuals, which is not enough to obtain an accurate evaluation of the specificity and sensitivity of the biomarkers proposed here. External validation would be required.

      Author response and action taken: Thanks for your comments. We fully acknowledge the importance of external validation in evaluating diagnostic model performance. Unfortunately, as this is a pilot study, we currently lack the resources for a multicenter investigation. To mitigate bias and overfitting, we applied a rigorous variable selection strategy and assessed model stability with 10-fold cross-validation. We will continue to seek additional resources for external validation in future studies.
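      A minimal sketch of how 10-fold cross-validation can estimate a diagnostic panel's AUC follows. It assumes binary labels (1 = CRC, 0 = NC) and uses a toy mean-difference scorer as a stand-in for the Lasso and other algorithms used in the study; all function names are hypothetical, and the data are synthetic.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney statistic: the probability that a random
    positive sample scores higher than a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with a similar class ratio in each."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit_mean_difference(X, y):
    """Toy stand-in classifier: weight each feature (e.g. one RNA's level)
    by the difference between the class means in the training data."""
    pos = [x for x, lab in zip(X, y) if lab == 1]
    neg = [x for x, lab in zip(X, y) if lab == 0]
    return [sum(x[j] for x in pos) / len(pos) - sum(x[j] for x in neg) / len(neg)
            for j in range(len(X[0]))]


def cross_validated_auc(X, y, k=10, seed=0):
    """Train on k-1 folds, score the held-out fold, then compute a single
    AUC from the pooled out-of-fold scores."""
    scores = [0.0] * len(y)
    for test in stratified_folds(y, k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        w = fit_mean_difference([X[i] for i in train], [y[i] for i in train])
        for i in test:
            scores[i] = sum(wj * xj for wj, xj in zip(w, X[i]))
    return roc_auc(y, scores)


if __name__ == "__main__":
    # Synthetic two-feature data: cases are shifted upward relative to controls.
    rng = random.Random(1)
    y = [1] * 60 + [0] * 64
    X = [[rng.gauss(3.0 if lab else 0.0, 1.0) for _ in range(2)] for lab in y]
    print(f"10-fold cross-validated AUC: {cross_validated_auc(X, y):.3f}")
```

      Because every sample is scored only by a model that never saw it during training, the resulting AUC is less optimistic than one computed on the training data; it does not, however, replace validation in an independent cohort.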

      Reviewer #2:

      The authors present an important study on the potential of small extracellular vesicle (sEV)-derived RNAs as biomarkers for the early detection of colorectal cancer (CRC) and precancerous adenoma (AA). The authors provide a detailed analysis of the RNA landscape of sEVs isolated from participants, identifying differentially expressed sEV-RNAs associated with T1a stage CRC and AA compared to normal controls. The paper further categorises these sEV-RNAs into modules and constructs a 60-gene model that successfully distinguishes CRC/AA from NC samples. The authors also validate their findings using RT-qPCR and propose an optimised classifier with high specificity and sensitivity. Additionally, the authors discuss the potential of sEV-RNAs in understanding CRC carcinogenesis and suggest that a comprehensive biomarker panel combining sEV-RNAs and proteins could be promising for identifying both early and advanced CRC patients. Overall, the study provides valuable insights into the potential clinical application of sEV-RNAs in liquid biopsy for the early detection of CRC and AA.

      Major strengths:

      (1) Comprehensive sEV RNA profiling: The study provides a valuable dataset of the whole-transcriptomic profile of circulating sEVs, including miRNA, mRNA, and lncRNA. This approach adds to the understanding of sEV-RNAs' role in CRC carcinogenesis and facilitates the discovery of potential biomarkers.

      (2) Detection of early-stage CRC and AA: The developed 60-gene t-SNE model successfully differentiated T1a stage CRC/AA from normal controls with high specificity and sensitivity, indicating the potential of sEV-RNAs as diagnostic markers for early-stage colorectal lesions.

      (3) Independent validation cohort: The study combines RNA-seq, RT-qPCR, and modelling algorithms to select and validate candidate sEV-RNAs, maximising the performance of the developed RNA signature. The comparison of different algorithms and consideration of other factors enhance the robustness of the findings.

      Thank you for your careful evaluation and valuable suggestions. These comments have been highly valuable for the performance evaluation and clinical applications of our work. In response to your feedback, we have implemented the following improvements.

      (1). Lack of analysis on T1-only patients in the validation cohort: While the study identifies key sEV-RNAs associated with T1a stage CRC and AA, only about half of the patients in the validation cohort are T1 (25 out of 49). It would be better to perform an analysis using only the T1 patients in the validation cohort, so that the conclusion is not affected by the T2-T3 patients.

      Author response and action taken: Thanks for your comments. This analysis is important for confirming that the results are consistent with our previous findings. We therefore revalidated the various diagnostic panels using only Stage I patients (Figure 7—figure supplement 2). To minimize potential overfitting due to the smaller sample size after partitioning, we applied 10-fold cross-validation to each panel, and these panels showed promising performance in Stage I colorectal cancer (CRC) patients.

      Author response image 1.

      The ROC analysis of different sEV-RNA signatures in the prediction of Stage I CRC patients by different algorithms (a: 6-gene panel; b: 7-gene panel; c: 8-gene panel; d: 9-gene panel).

      (2). Lack of performance analysis across different demographic and tumor pathology factors listed in Supplementary Table 12. It's important to know if the sEV-RNAs identified in the study work better/worse in different age/sex/tumor size/Yamada subtypes etc.

      Author response and action taken: Thanks for your comments. This analysis is highly relevant to clinical diagnosis. As above, cross-validation was performed in this analysis. We assessed how well the panels discriminate CRC from NC across different age groups, genders, tumor sizes, and anatomical locations (Figure 7—figure supplement 3). Overall, these sEV RNA panels performed better in individuals under the age of 55 and in female patients. There was no significant difference in discrimination across tumor sizes, and discrimination was better for colon cancer than for rectal cancer.

      Author response image 2.

      The ROC analysis of different sEV-RNA signatures for predicting CRC patients using the Lasso regression algorithm across clinical parameters (a, b: age; c, d: gender; e, f: tumor size; g, h: anatomical position).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study puts forth the model that under IFN-β stimulation, liquid-phase WTAP coordinates with the transcription factor STAT1 to recruit MTC to the promoter region of interferon-stimulated genes (ISGs), mediating the installation of m<sup>6</sup>A on newly synthesized ISG mRNAs. This model is supported by strong evidence that the phosphorylation state of WTAP, controlled by PPP4, is regulated by IFN-β stimulation, and that this results in interactions between WTAP, the m<sup>6</sup>A methyltransferase complex, and STAT1, a transcription factor that mediates activation of ISGs. This was demonstrated via a combination of microscopy, immunoprecipitations, m<sup>6</sup>A sequencing, and ChIP. These experiments converge on a set of results that nicely demonstrate that IFN-β stimulation increases the interaction between WTAP, METTL3, and STAT1, that this interaction is lost with the knockdown of WTAP (even in the presence of IFN-β), and that IFN-β stimulation also induces METTL3-ISG interactions.

      Strengths:

      The evidence for the IFN-B stimulated interaction between METTL3 and STAT1, mediated by WTAP, is quite strong. Removal of WTAP in this system seems to be sufficient to reduce these interactions and the concomitant m<sup>6</sup>A methylation of ISGs. The conclusion that the phosphorylation state of WTAP is important in this process is also quite well supported.

      Weaknesses:

      The evidence that the above mechanism is fundamentally driven by different phase-separated pools of WTAP (regulated by its phosphorylation state) is weaker. These experiments rely relatively heavily on the treatment of cells with 1,6-hexanediol, which has been shown to have some off-target effects on phosphatases and kinases (PMID 33814344).

      Given that the model invoked in this study depends on the phosphorylation (or lack thereof) of WTAP, this is a particularly relevant concern.

      We are grateful for the reviewer’s positive comments and constructive feedback. 1,6-hexanediol (hex) is considered an inhibitor of hydrophobic interactions and is therefore capable of dissolving protein phase separation condensates. Hex (5%-10% w/v) is still widely used to explore the phase separation characteristics and functions of various proteins, including phosphorylated proteins such as pHSF1 and kinases such as NEMO[1-3]. Because hydrophobic interactions are involved in many types of protein-protein interaction, off-target effects of hex are unavoidable. It has been reported that 5% hex impairs the activity of RNA polymerase II CTD-specific phosphatases and kinases[4], which underlies the reviewer’s concern. To address this concern, we examined the phosphorylation level of WTAP across a gradient of hex concentrations. Surprisingly, neither 2% nor 5% hex affected the PPP4c-mediated dephosphorylation of WTAP, yet both still dissolved WTAP LLPS condensates (Figure 5-figure supplement 1H; Author response image 1), indicating that hex disperses WTAP phase separation in a phosphorylation-independent manner. Consistent with our findings, Ge et al. used 10% hex to dissolve WTAP phase separation condensates[5]. Additionally, the phosphorylation level of STAT1 was not affected by either 2% or 5% hex, suggesting that the off-target impairment of kinases or phosphatases by hex may not be universal (Figure 5-figure supplement 1H). Collectively, since the 5% hex we used did not influence WTAP dephosphorylation, the function of WTAP LLPS in mediating the interaction between the methyltransferase complex and STAT1 revealed by our results is independent of its phosphorylation status.

      Author response image 1.

      mCherry-WTAP-rescued HeLa cells were treated with 10 ng/mL IFN-β with or without 2% or 5% hex and 20 μg/mL digitonin for 1 hour, or left untreated. Phase separation of mCherry-WTAP was observed by confocal microscopy. WTAP condensates with a diameter over 0.4 μm were counted in n = 20 cells using ImageJ. Scale bars, 10 μm. Error bars show mean values ± SD; P-values were determined by unpaired two-tailed Student’s t-test of n = 20 cells in (B). For (A), similar results were obtained in three independent biological experiments.
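      The quantification in this legend (count condensates above 0.4 μm per cell, then compare conditions with an unpaired two-tailed Student's t-test) can be sketched as below. This is an illustrative sketch, not the authors' pipeline: it assumes per-cell lists of condensate diameters (for example, exported from ImageJ), the helper names are hypothetical, and the p-value is obtained by numerically integrating the t density rather than by calling a statistics package.

```python
import math
from statistics import mean, variance


def count_condensates(diameters_um, min_diameter=0.4):
    """Count condensates in one cell whose diameter exceeds the threshold (um)."""
    return sum(1 for d in diameters_um if d > min_diameter)


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * integral of the t density from 0 to |t|,
    with the integral evaluated by Simpson's rule (steps must be even)."""
    x = abs(t)
    if x == 0:
        return 1.0
    h = x / steps
    s = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * s * h / 3)


def student_t_test(a, b):
    """Unpaired two-tailed Student's t-test with pooled (equal) variance."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df, two_tailed_p(t, df)


if __name__ == "__main__":
    # Hypothetical per-cell condensate diameters (um) for two conditions.
    untreated = [[0.2, 0.5, 0.6], [0.3, 0.45], [0.7, 0.8, 0.9]]
    hex_treated = [[0.2], [0.3, 0.5], [0.1, 0.2]]
    a = [count_condensates(cell) for cell in untreated]
    b = [count_condensates(cell) for cell in hex_treated]
    t, df, p = student_t_test(a, b)
    print(f"counts {a} vs {b}: t = {t:.2f}, df = {df}, p = {p:.3f}")
```

      With n = 20 cells per condition, as in the legend, the test has 38 degrees of freedom. The pooled-variance form matches the "Student's" (equal-variance) test named in the legend; Welch's test would drop that assumption.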

      Related to this point, it is also interesting (and potentially concerning for the proposed model) that the initial region of WTAP that was predicted to be disordered is in fact not the region that the authors demonstrate is important for the different phase-separated states.

      A considerable number of proteins undergo phase separation via interactions between intrinsically disordered regions (IDRs). IDRs are enriched in charged and polar amino acids, which present multiple weakly interacting elements, and depleted of hydrophobic amino acids, which allows flexible conformations[6]. In our study, we used the PLAAC web server (http://plaac.wi.mit.edu/) to predict the IDR of WTAP, and a fragment (amino acids 234-249) was predicted to be a prion-like domain (PLD). However, deletion of this fragment failed to abolish the phase separation properties of WTAP, which may be the main source of the reviewer’s concern. To address this issue, we examined the WTAP structure (as part of the MTC complex) in the Protein Data Bank (https://www.rcsb.org/structure/7VF2) and found that the IDR prediction has been updated with newer algorithms: the predicted IDR of WTAP now spans amino acids 245-396, encompassing the entire CTD region. Our results demonstrate that the CTD is critical for WTAP LLPS, as CTD deficiency significantly inhibited the formation of liquid condensates both in vitro and in cells. In addition, phosphorylation sites in the CTD were important for the phase transition of WTAP. These observations highlight the phosphorylation status of the CTD region as a key driving force of phase separation, consistent with the established role of IDRs in regulating LLPS. We have revised our description of the WTAP IDR in the article following the reviewer’s suggestion.

      Taking all the data together, it is also not clear to me that one has to invoke phase separation in the proposed mechanism.

      In this article, we observed that WTAP underwent a phase transition during virus infection and IFN-β treatment, and proposed a novel mechanism whereby post-translational modification (PTM)-driven WTAP LLPS is required for the m<sup>6</sup>A modification of ISG mRNAs. To test this hypothesis, we first examined the relationship between PTM and phase separation of WTAP. We constructed WTAP 5ST-D and 5ST-A mutants to mimic the phosphorylated and dephosphorylated states of WTAP, respectively, and found that dephosphorylated WTAP underwent LLPS. We also found that PPP4 was the main phosphatase regulating WTAP dephosphorylation. To investigate further, we used the potent PPP4 inhibitor fostriecin. Consistent with our findings in the PPP4-deficient model, fostriecin treatment significantly inhibited the IFN-β-induced dephosphorylation of WTAP and disrupted its LLPS condensates, indicating that PPP4 is the key phosphatase recruited by IFN-β to regulate WTAP and confirming that PTM governs WTAP LLPS dynamics (Figure 2-figure supplement 1C and H). Furthermore, fostriecin-induced impairment of WTAP dephosphorylation and phase separation correlated with a reduced m<sup>6</sup>A modification level and an elevated ISG expression level (Figure 4C and Figure 4-figure supplement 1E). Together, these findings show that dephosphorylation is required for WTAP LLPS.

      As for the function of WTAP LLPS, we hypothesized that WTAP undergoes LLPS to sequester STAT1 together with the m<sup>6</sup>A methyltransferase complex (MTC), mediating co-transcriptional m<sup>6</sup>A deposition on ISG mRNAs under IFN-β stimulation. Given that hex dissolved WTAP LLPS condensates without affecting WTAP dephosphorylation (Figure 5-figure supplement 1H; Author response image 1), the experiments we had already performed confirmed the critical role of WTAP LLPS in m<sup>6</sup>A modification (Figure 4A), as well as the mechanism by which WTAP phase separation enhances the interaction between the MTC and STAT1 (Figure 5E-F). In response to the reviewer’s comments, we conducted additional experiments for further validation. We found that the liquid condensates formed by wild-type (WT) WTAP or the WTAP 5ST-A mutant (which mimics dephosphorylated WTAP) could be disassembled by hex, which also impaired the interaction among STAT1, METTL3 and WTAP (Figure 5-figure supplement 1E). In addition, in vitro experiments demonstrated that hex treatment significantly disrupted the interaction between recombinant GFP-STAT1 and mCherry-WTAP expressed in E. coli (Figure 5F and Figure 5-figure supplement 1G). Notably, this prokaryotic expression system lacks the endogenous phosphorylation machinery, so the recombinant mCherry-WTAP is non-phosphorylated. For further validation, we showed that the binding of WT or 5ST-A WTAP to ISG promoter regions, as well as the m<sup>6</sup>A modification of ISG mRNAs, was attenuated by dissolution of WTAP LLPS (Figure 4E and Figure 6E). Together, these findings reveal that WTAP LLPS is a critical mediator of the STAT1-MTC interaction, ISG promoter binding and co-transcriptional m<sup>6</sup>A modification.

      Collectively, our results demonstrate that IFN-β treatment recruits PPP4c to dephosphorylate WTAP, thereby driving the formation of WTAP liquid condensates that are essential for facilitating the STAT1-MTC interaction and m<sup>6</sup>A deposition, which in turn regulates ISG expression. Since we revealed a strong association between PTM-regulated WTAP phase transition and its m<sup>6</sup>A modification function, we consider WTAP LLPS a novel and functionally distinct mechanism that must be taken into account in this model.

      Author response image 2.

      Wild-type (WT) WTAP- or 5ST-A mutant-rescued WTAP<sup>sgRNA</sup> THP-1-derived macrophages were stimulated with or without 10 ng/mL IFN-β together with 2% or 5% 1,6-hexanediol (hex) and 20 μg/mL digitonin for 1 hour, or left untreated. antibody and imaged using a confocal microscope. Scale bars, 10 μm.

      Reviewer #2 (Public review):

      In this study, Cai and colleagues investigate how one component of the m<sup>6</sup>A methyltransferase complex, the WTAP protein, responds to IFNb stimulation. They find that viral infection or IFNb stimulation induces the transition of WTAP from aggregates to liquid droplets through dephosphorylation by PPP4. This process affects the m<sup>6</sup>A modification levels of ISG mRNAs and modulates their stability. In addition, the WTAP droplets interact with the transcription factor STAT1 to recruit the methyltransferase complex to ISG promoters and enhance m<sup>6</sup>A modification during transcription. The investigation dives into a previously unexplored area of how viral infection or IFNb stimulation affects m<sup>6</sup>A modification on ISGs. The observation that WTAP undergoes a phase transition is significant in our understanding of the mechanisms underlying m<sup>6</sup>A's function in immunity. However, there are still key gaps that should be addressed to fully accept the model presented.

      Major points:

      (1) More detailed analyses on the effects of WTAP sgRNA on the m<sup>6</sup>A modification of ISGs:

      a. A comprehensive summary of the ISGs, including the percentage of ISGs that are m<sup>6</sup>A-modified.

      b. The distribution of m<sup>6</sup>A modification across the ISGs.

      c. A comparison of the m<sup>6</sup>A modification distribution in ISGs with non-ISGs.

      In addition, since the authors propose a novel mechanism where the interaction between phosphorylated STAT1 and WTAP directs the MTC to the promoter regions of ISGs to facilitate co-transcriptional m<sup>6</sup>A modification, it is critical to analyze whether the m<sup>6</sup>A modification distribution holds true in the data.

      We appreciate the reviewer’s summary of our manuscript and the constructive assessment. As the reviewer suggested, we have analyzed the m<sup>6</sup>A modification of ISGs in our model. About 64.29% of core ISGs were m<sup>6</sup>A-modified under IFN-β stimulation (Figure 3-figure supplement 1B; Figure 3G), consistent with the percentages reported in previous studies[7,8]. The m<sup>6</sup>A distribution on ISG transcripts, including IFIT1, IFIT2, OAS1 and OAS2, was validated (Figure 3-figure supplement 1H).

      In general, m<sup>6</sup>A is preferentially deposited near the stop codon, although m<sup>6</sup>A modification is highly dynamic under different cellular conditions. We compared the topology of m<sup>6</sup>A deposition on ISGs with that on non-ISGs and found that m<sup>6</sup>A modification of ISG mRNAs showed a stronger preference for coding sequences (CDS) than global m<sup>6</sup>A deposition (Figure 3H). Similarly, several studies have found that m<sup>6</sup>A deposited co-transcriptionally is preferentially located in the CDS [9-11]. Since the m<sup>6</sup>A distribution on ISGs in our data is consistent with the characteristics of co-transcriptional m<sup>6</sup>A modification, we believe our results are consistent with our hypothesis.
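      The two summary statistics discussed above (the fraction of a gene set carrying m<sup>6</sup>A peaks, and the share of peaks falling in the 5'UTR, CDS or 3'UTR) can be sketched as follows. This is a simplified illustration under stated assumptions: peaks are given as positions in transcript coordinates, each gene has known CDS start and end positions, and the gene names and data are hypothetical toy inputs. Published metagene plots usually rescale positions within each region rather than simply counting peaks per region.

```python
def fraction_modified(genes, modified_genes):
    """Fraction of a gene set (e.g. core ISGs) carrying at least one m6A peak."""
    genes = set(genes)
    return len(genes & set(modified_genes)) / len(genes)


def peak_region(pos, cds_start, cds_end):
    """Assign a peak position (transcript coordinates) to 5'UTR, CDS or 3'UTR."""
    if pos < cds_start:
        return "5'UTR"
    if pos <= cds_end:
        return "CDS"
    return "3'UTR"


def region_fractions(peaks, cds_bounds):
    """peaks: (gene, position) pairs; cds_bounds: gene -> (cds_start, cds_end).
    Returns the fraction of peaks falling in each transcript region."""
    counts = {"5'UTR": 0, "CDS": 0, "3'UTR": 0}
    for gene, pos in peaks:
        start, end = cds_bounds[gene]
        counts[peak_region(pos, start, end)] += 1
    total = sum(counts.values())
    return {region: n / total for region, n in counts.items()}


if __name__ == "__main__":
    # Toy inputs only; not data from the study.
    cds_bounds = {"ISG_A": (100, 1200), "ISG_B": (80, 900), "GENE_C": (150, 1500)}
    isg_peaks = [("ISG_A", 400), ("ISG_A", 1150), ("ISG_B", 500)]
    other_peaks = [("GENE_C", 1490), ("GENE_C", 1700), ("GENE_C", 120)]
    print(fraction_modified(["ISG_A", "ISG_B", "ISG_C", "ISG_D"], ["ISG_A", "ISG_B"]))
    print("ISGs:", region_fractions(isg_peaks, cds_bounds))
    print("non-ISGs:", region_fractions(other_peaks, cds_bounds))
```

      Comparing the "CDS" fraction between the ISG and non-ISG peak sets is the kind of contrast described for Figure 3H; whether a difference is significant would need a separate test.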

      (2) Since a key part of the model includes the cytosol-localized STAT1 protein undergoing phosphorylation to translocate to the nucleus to mediate gene expression, the authors should focus on the interaction between phosphorylated STAT1 and WTAP in Figure 4, rather than the unphosphorylated STAT1. Only phosphorylated STAT1 localizes to the nucleus, so the presence of pSTAT1 in the immunoprecipitate is critical for establishing a functional link between STAT1 activation and its interaction with WTAP.

      Thank you for the constructive comments. As shown in Figure 4, we found an enhanced interaction between phase-separated WTAP and nuclear-translocated STAT1, which is phosphorylated under IFN-β stimulation, indicating the importance of STAT1 phosphorylation. We repeated the immunoprecipitation experiments to clarify the role of pSTAT1 in the WTAP interaction. Our results showed that IFN-β stimulation induced the interaction of WTAP with both pSTAT1 and STAT1 (Figure 5-figure supplement 1C), indicating that most of the WTAP-associated STAT1 was phosphorylated. Taken together, our data show that phase-separated WTAP binds nuclear-translocated pSTAT1 under IFN-β stimulation.

      (3) The authors should include pSTAT1 ChIP-seq and WTAP ChIP-seq on IFNb-treated samples in Figure 5 to allow for a comprehensive and unbiased genomic analysis for comparing the overlaps of peaks from both ChIP-seq datasets. These results should further support their hypothesis that WTAP interacts with pSTAT1 to enhance m<sup>6</sup>A modifications on ISGs.

      Thank you for raising this thoughtful comment. In this study, MeRIP-seq and RNA-seq, together with immunoprecipitation and immunofluorescence experiments, supported that the phase transition of WTAP enhances its interaction with nuclear-translocated pSTAT1 during virus infection and IFN-β stimulation, thereby mediating the m<sup>6</sup>A modification and expression of ISGs. However, how WTAP-mediated m<sup>6</sup>A modification is related to STAT1-mediated transcription remained unclear. Several studies have shown that the m<sup>6</sup>A methyltransferase complex (MTC) is recruited to the transcription start sites (TSS) of coding genes and to R-loop structures through interactions with the transcription factors STAT5B and SMAD2/3 and the DNA helicase DDX21, indicating that MTC-mediated m<sup>6</sup>A modification of nascent transcripts is engaged at the very beginning of transcription [11-13]. These studies suggested that phase-separated WTAP could be recruited to ISG promoter regions by interacting with nuclear-translocated pSTAT1. To validate this mechanism, we conducted ChIP-qPCR experiments targeting STAT1 and WTAP, which revealed that IFN-β induced comparable recruitment dynamics of STAT1 and WTAP to ISG promoter regions (Figure 6A-B). Notably, STAT1 deficiency significantly abolished the binding of WTAP to ISG promoter regions (Figure 6C). These findings establish that phase-separated WTAP is recruited to ISG promoter regions in a manner dependent on nuclear-translocated STAT1, substantiating our hypothesis within the current study. We agree that ChIP-seq data would provide stronger support for exploring this mechanism in depth. In future work, we will examine the genome-wide chromatin distribution of WTAP and further explore the functional effects of transcription factor-dependent interactions between WTAP and promoter regions.

      Minor points:

      (1) Since IFNb is primarily known for modulating biological processes through gene transcription, it would be informative if the authors discussed the mechanism of how IFNb would induce the interaction between WTAP and PPP4.

      Protein phosphatase 4 (PPP4) is a multi-subunit serine/threonine phosphatase complex that participates in diverse biological processes, including the DNA damage response (DDR), cell cycle progression and apoptosis[14]. Several signaling pathways, such as NF-κB and mTOR signaling, can be regulated by PPP4. Previous research showed that PPP4 deficiency enhanced IFN-β downstream signaling and ISG expression, which is consistent with our finding that knockdown of PPP4C impaired WTAP-mediated m<sup>6</sup>A modification and enhanced ISG expression[15,16]. Since PPP4 expression did not increase significantly during 0-3 hours of IFN-β stimulation in our results, we explored PTMs that may influence protein-protein interactions, such as ubiquitination. Intriguingly, we found that the ubiquitination level of PPP4 was enhanced after IFN-β stimulation, which may affect the interaction between PPP4 and WTAP (Author response image 3). We will further investigate the interaction between PPP4 and WTAP in future studies.

      Author response image 3.

      HEK 293T cells transfected with HA-ubiquitin (HA-Ub) and Flag-PPP4 were treated with 10 ng/mL IFN-β or left untreated. Whole cell lysate (WCL) was collected and immunoprecipitation (IP) with an anti-Flag antibody was performed, followed by immunoblotting. Similar results were obtained in three independent biological experiments.

      (2) The authors should include mCherry alone controls in Figure 1D to demonstrate that mCherry does not contribute to the phase separation of WTAP. Does mCherry have or lack a PLD?

      According to the crystal structure of mCherry in the RCSB PDB protein structure database, mCherry is a β-barrel protein with no IDRs or multivalent interaction domains (including PLDs), indicating that it cannot undergo phase separation on its own. This characteristic makes it a suitable tag for tracing the localization or expression levels of proteins, and a negative control in protein phase separation studies. As the reviewer suggested, we have included mCherry-alone controls in the revised version (Figure 1D).

      (3) The authors should clarify the immunoprecipitation assays in the methods. For example, the labeling in Figure 2A suggests that antibodies against WTAP and pan-p were used for two immunoprecipitations. Is that accurate?

      Because no phospho-WTAP-specific antibody was available, WTAP phosphorylation was detected after WTAP-antibody-mediated immunoprecipitation. We immunoprecipitated WTAP protein with a WTAP antibody and then used an antibody against phosphoserine/threonine/tyrosine (pan-p) to detect the phosphorylation level of WTAP. To avoid misunderstanding, we have changed the label from pan-p to pWTAP (pan-p) in the figures and revised the figure legends accordingly.

      (4) The authors should include overall m<sup>6</sup>A modification levels quantified of GFP<sup>sgRNA</sup> and WTAP<sup>sgRNA</sup> cells, either by mass spectrometry (preferably) or dot blot.

      We thank the reviewer for this useful suggestion. As shown in Figure 3F and J-K, m<sup>6</sup>A modification and degradation of ISG mRNAs were significantly reduced in WTAP<sup>sgRNA</sup> cell lines, indicating that WTAP function was impaired. We therefore used the WTAP<sup>sgRNA</sup> #2 cell line for most of our experiments. Furthermore, we found that global m<sup>6</sup>A levels were 46.4% lower in WTAP<sup>sgRNA</sup> THP-1 cells than in control cells, with or without IFN-β stimulation (Author response image 4), further confirming that m<sup>6</sup>A modification was significantly reduced in WTAP<sup>sgRNA</sup> cells. Taken together, our data confirm that m<sup>6</sup>A methyltransferase activity was efficiently inhibited in our WTAP<sup>sgRNA</sup> cells.

      Author response image 4.

      Control (GFP<sup>sgRNA</sup>) and WTAP<sup>sgRNA</sup> #2 THP-1-derived macrophages were treated with 10 ng/mL IFN-β for 4 hours. Global m<sup>6</sup>A levels were detected and quantified by ELISA. Error bars represent mean ± SEM; P-values were determined by two-way ANOVA of independent biological experiments.

      Reviewer #3 (Public review):

      Summary:

      This study presents a valuable finding on the mechanism used by WTAP to modulate the IFN-β stimulation. It describes the phase transition of WTAP driven by IFN-β-induced dephosphorylation. The evidence supporting the claims of the authors is solid, although major analysis and controls would strengthen the impact of the findings. Additionally, more attention to the figure design and to the text would help the reader to understand the major findings.

      Strength:

      The key finding is the revelation that WTAP undergoes phase separation during virus infection or IFN-β treatment. The authors conducted a series of precise experiments to uncover the mechanism behind WTAP phase separation and identified the regulatory role of 5 phosphorylation sites. They also succeeded in pinpointing the phosphatase involved.

      Weaknesses:

      However, as the authors acknowledge, it is already widely known in the field that IFN and viral infection regulate m<sup>6</sup>A mRNAs and ISGs. Therefore, a more detailed discussion could help the reader interpret the obtained findings in light of previous research.

      We are grateful for the positive comments and the unbiased advice from the reviewer. To place our findings in the context of previous research, we have carefully revised the manuscript and added a more detailed discussion of ISG mRNA m<sup>6</sup>A modification during virus infection or IFN stimulation.

      It is well-known that protein concentration drives phase separation events. Similarly, previous studies and some of the figures presented by the authors show an increase in WTAP expression upon IFN treatment. The authors do not discuss the contribution of WTAP expression levels to the phase separation event observed upon IFN treatment. Similarly, METTL3 and METTL14, as well as other proteins of the MTC are upregulated upon IFN treatment. How does the MTC protein concentration contribute to the observed phase separation event?

      We thank the reviewer for pointing out the importance of concentration-dependent phase transition. Previously, Ge et al. found that WTAP expression was up-regulated during LPS stimulation, thereby promoting WTAP phase separation and m<sup>6</sup>A modification, indicating that WTAP concentration indeed affects phase separation. In our article, we generated a phase diagram using different concentrations of recombinant mCherry-WTAP in vitro (Figure 1-figure supplement 1C). For in-cell experiments, we constructed TRE-mCherry-WTAP HeLa cells, in which mCherry-WTAP expression is induced by doxycycline (Dox) in a dose-dependent manner (Author response image 5A). We examined LLPS of mCherry-WTAP at different expression levels by increasing the Dox dose, and found that WTAP spontaneously underwent LLPS only at artificially high expression levels (Author response image 5B). However, the cells we used to detect WTAP phase separation in our article were mCherry-WTAP-rescued HeLa cells, in which mCherry-WTAP was introduced into WTAP<sup>sgRNA</sup> HeLa cells for stable expression. We carefully adjusted and verified the mCherry-WTAP expression level so that its abundance was similar to that of endogenous WTAP in wild-type (WT) HeLa cells (Author response image 5C). We also repeated the IFN-β stimulation experiments and found no significant increase in WTAP protein level (Figure 5-figure supplement 1A). These findings indicate that the WTAP phase separation observed in our study was not an artifact of excessively high protein expression.

      MTC protein expression levels are crucial for m<sup>6</sup>A modification during virus infection. Rubio et al. and Winkler et al. revealed that WTAP, METTL3 and METTL14 were up-regulated after 24 hours of HCMV infection[8,17]. Recently, Ge et al. proposed that WTAP protein was degraded after 12 hours of VSV and HSV infection[5,18]. However, these studies focused only on virus infection, and how MTC protein expression changes after direct IFN-β stimulation remained unclear. Our study investigated the phase transition of WTAP during short-term IFN-β stimulation. We measured the expression levels of MTC proteins within 6 hours of IFN-β treatment and found no significant increase in WTAP, METTL3 or METTL14 at either the protein or the mRNA level (Figure 5-figure supplement 1A and B). Our in vitro experiments showed that adding CFP-METTL3 protein had no significant influence on WTAP phase separation (Figure 4H). Additionally, in-cell experiments showed that increased METTL3 expression had no effect on WTAP phase separation (Author response image 5D). Taken together, WTAP phase separation can be promoted by a dramatic increase in WTAP concentration both in vitro and in cells, but the WTAP phase separation induced by IFN-β stimulation in our study was not related to the expression levels of MTC proteins.

      Author response image 5.

      (A) Immunoblot analysis of mCherry-WTAP expression in TRE-mCherry-WTAP HeLa cells treated with different doses of doxycycline (Dox). mCherry-WTAP protein levels were quantified and normalized to β-actin (n = 3 independent biological experiments). (B) Phase separation of mCherry-WTAP in TRE-mCherry-WTAP HeLa cells treated with different doses of Dox was observed by confocal microscopy. Representative images from three independent biological experiments are shown; the number of WTAP condensates with a diameter over 0.4 μm was counted in n = 80 cells and is shown as median with interquartile range. Dotted white lines indicate the location of the nucleus. Scale bars, 10 μm. (C) Immunoblot analysis of endogenous WTAP expression in wild-type (WT) HeLa cells and mCherry-WTAP-rescued WTAP<sup>sgRNA</sup> HeLa cells. (D) mCherry-WTAP-rescued HeLa cells were transfected with 0, 200 or 400 ng of Flag-METTL3, followed by treatment with 10 ng/mL IFN-β for 1 hour or left untreated (UT). Phase separation of mCherry-WTAP was observed by confocal microscopy. The number of WTAP condensates with a diameter over 0.4 μm was counted in n = 20 cells using ImageJ, and representative images are shown. In (A), error bars represent mean ± SD, and P-values were determined by unpaired two-tailed Student’s t-test (n = 3 independent biological experiments). For (A, C), similar results were obtained in three independent biological experiments.

      How is PP4 related to the IFN signaling cascade?

      Both Reviewer #2 and Reviewer #3 raised a similar point on the relationship between PPP4 and IFN signaling. In short, protein phosphatase 4 (PPP4) participates in diverse biological processes, including DDR, cell cycle progression and apoptosis[14], as well as several signaling pathways. Previous research showed that PPP4 deficiency enhanced IFN-β downstream signaling and ISG expression, which is consistent with our finding that PPP4C knockdown impaired WTAP-mediated m<sup>6</sup>A modification and enhanced ISG expression[15,16]. Since PPP4C expression did not increase significantly during 0-3 hours of IFN-β stimulation in our results, we explored post-translational modifications that may influence the protein-protein interaction, such as ubiquitination. Intriguingly, we found that the ubiquitination level of PPP4 was enhanced after IFN-β stimulation, which may affect the interaction between PPP4 and WTAP (Author response image 3). The PPP4-WTAP relationship will be investigated further in our future studies (see also minor point 1 of Reviewer #2).

      In general, it is very confusing to talk about WTAP KO as WTAPgRNA.

      As described above, all WTAP-deficient THP-1 cells were generated using the CRISPR-Cas9 system with WTAP-specific sgRNA, and bulk cell populations rather than monoclonal knockout cells were used for further experiments. Since monoclonal knockout cells were not obtained, we refer to these cells as WTAP<sup>sgRNA</sup> THP-1 cells rather than WTAP-KO THP-1 cells. We confirmed that WTAP expression was efficiently knocked down in WTAP<sup>sgRNA</sup> THP-1 cells and that the m<sup>6</sup>A modification level was significantly impaired (Figure 3I, Figure 3-figure supplement 1A and Author response image 4), making these cells suitable for mechanistic investigation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Related to the points raised in 'weaknesses' above, if the authors want to claim that this mechanism is reliant on WTAP phase-separated states, additional controls should be done to demonstrate this. Based on the available data it seems that it is just as likely that the phosphorylation state of WTAP is mediating interactions with other factors and/or altering its subcellular localization, which may or may not be related to phase separation.

      We are grateful for the constructive suggestions. As shown in Figure 5-figure supplement 1H and Author response image 1, and as explained above, 5% hex dispersed WTAP phase separation condensates without affecting WTAP phosphorylation status, demonstrating that the hex-sensitive interaction between STAT1 and the methyltransferase complex depends on WTAP LLPS rather than on its phosphorylation status (Figure 5E-H). To further confirm the recruitment of phase-separated WTAP to ISG promoter regions, we performed immunoprecipitation and ChIP-qPCR experiments in THP-1 cells rescued with wild-type (WT) WTAP, 5ST-D or 5ST-A. Our results showed that the interaction between the dephosphorylation-mimic WTAP mutant and STAT1, as well as its binding to ISG promoter regions, was diminished by hex without affecting its phosphorylation status (Author response image 2, Figure 5-figure supplement 1F, Figure 6E). Taken together, we identified the dephosphorylation event that regulates the phase transition of WTAP during IFN-β stimulation, and we propose that dephosphorylation-mediated WTAP LLPS is the key mechanism through which WTAP binds STAT1 and mediates m<sup>6</sup>A modification of ISG mRNAs.

      Reviewer #2 (Recommendations for the authors):

      The author order is different for the main text and the supplementary file.

      Thank you for pointing out this mistake. We have corrected it in our revised version.

      Reviewer #3 (Recommendations for the authors):

      Signaling molecules? Do you mean RNA or protein post-translational modification?

      Thank you for pointing out this problem. In this sentence, we mean the post-translational modification of signaling proteins. We have corrected this mistake in our revised version.

      Line 145: Do you mean Figure 1C?

      We have corrected it in our revised version.

      Line 214: Are the cells KO for WTAP? Do you mean CRISPR KO? In general, it is easier to present WTAPgRNA as WTAPKO.

      Thank you for the constructive suggestion. As explained above, all WTAP-deficient THP-1 cells in this project were generated using the CRISPR-Cas9 system with WTAP-specific sgRNA, and bulk cell populations rather than monoclonal knockout cells were used. We confirmed that WTAP expression was efficiently knocked down in WTAP<sup>sgRNA</sup> THP-1 cells and that the m<sup>6</sup>A modification level was significantly impaired (Figure 3I, Figure 3-figure supplement 1A and Author response image 4). Since monoclonal knockout cells were not obtained, we refer to these cells as WTAP<sup>sgRNA</sup> THP-1 cells rather than WTAP-KO THP-1 cells.

      Line 221: WTAP KO has no effect on the expression level of transcription factors?

      Thank you for pointing out this imprecise statement. We have corrected it in our revised version.

      Figure 3C: I would suggest removing the tracks and showing the expression levels in TPMs.

      Following the reviewer’s suggestion, we have reanalyzed the data and now show ISG expression levels as fold change in TPM (Figure 3D).

      Figure 4C: It seems that the IP efficiency from METTL3 is lower in WTAP KO cells. That may impact the author's conclusions.

      We have repeated the METTL3 immunoprecipitation experiments and confirmed that METTL3 immunoprecipitation (IP) efficiency did not differ between WTAP<sup>sgRNA</sup> cells and control cells (Figure 5C).

      I would suggest performing an IP of WTAP with the 5StoA mutation to confirm the missing interaction with WTAP.

      Following the reviewer’s suggestion, we investigated the interaction between STAT1 and WTAP in WT and WTAP 5ST-A-rescued THP-1 cells. Our results showed that the interactions among STAT1, METTL3 and WTAP were enhanced by the WTAP 5ST-A mutation and that this enhancement was diminished by hex treatment (Figure 5-figure supplement 1E). Consistently, binding of WT or 5ST-A WTAP to ISG promoter regions was attenuated by dissolution of WTAP condensates (Figure 6E). Taken together, the interaction between STAT1 and the MTC relies on WTAP LLPS.

      In the graphical abstract, it is confusing to represent WTAP as a line. All other proteins are presented as circles. It is easy to confuse WTAP protein with an RNA. Additionally, m<sup>6</sup>A is too big in size. It should be smaller than a protein.

      We thank the reviewer for raising this suggestion. We have modified the graphical abstract to avoid the confusion in our revised version (Figure 6F).

      References

      (1) Wegmann, S., Eftekharzadeh, B., Tepper, K., Zoltowska, K.M., Bennett, R.E., Dujardin, S., Laskowski, P.R., MacKenzie, D., Kamath, T., Commins, C., et al. (2018). Tau protein liquid-liquid phase separation can initiate tau aggregation. The EMBO journal 37. 10.15252/embj.201798049.

      (2) Lu, Y., Wu, T., Gutman, O., Lu, H., Zhou, Q., Henis, Y.I., and Luo, K. (2020). Phase separation of TAZ compartmentalizes the transcription machinery to promote gene expression. Nat Cell Biol 22, 453-464. 10.1038/s41556-020-0485-0.

      (3) Zhang, H., Shao, S., Zeng, Y., Wang, X., Qin, Y., Ren, Q., Xiang, S., Wang, Y., Xiao, J., and Sun, Y. (2022). Reversible phase separation of HSF1 is required for an acute transcriptional response during heat shock. Nat Cell Biol 24, 340-352. 10.1038/s41556-022-00846-7.

      (4) Duster, R., Kaltheuner, I.H., Schmitz, M., and Geyer, M. (2021). 1,6-Hexanediol, commonly used to dissolve liquid-liquid phase separated condensates, directly impairs kinase and phosphatase activities. J Biol Chem 296, 100260. 10.1016/j.jbc.2021.100260.

      (5) Ge, Y., Chen, R., Ling, T., Liu, B., Huang, J., Cheng, Y., Lin, Y., Chen, H., Xie, X., Xia, G., et al. (2024). Elevated WTAP promotes hyperinflammation by increasing m<sup>6</sup>A modification in inflammatory disease models. J Clin Invest 134. 10.1172/JCI177932.

      (6) Hou, S., Hu, J., Yu, Z., Li, D., Liu, C., and Zhang, Y. (2024). Machine learning predictor PSPire screens for phase-separating proteins lacking intrinsically disordered regions. Nat Commun 15, 2147. 10.1038/s41467-024-46445-y.

      (7) McFadden, M.J., McIntyre, A.B.R., Mourelatos, H., Abell, N.S., Gokhale, N.S., Ipas, H., Xhemalce, B., Mason, C.E., and Horner, S.M. (2021). Post-transcriptional regulation of antiviral gene expression by N6-methyladenosine. Cell Rep 34, 108798. 10.1016/j.celrep.2021.108798.

      (8) Winkler, R., Gillis, E., Lasman, L., Safra, M., Geula, S., Soyris, C., Nachshon, A., Tai-Schmiedel, J., Friedman, N., Le-Trilling, V.T.K., et al. (2019). m(6)A modification controls the innate immune response to infection by targeting type I interferons. Nat Immunol 20, 173-182. 10.1038/s41590-018-0275-z.

      (9) Li, Y., Xia, L., Tan, K., Ye, X., Zuo, Z., Li, M., Xiao, R., Wang, Z., Liu, X., Deng, M., et al. (2020). N(6)-Methyladenosine co-transcriptionally directs the demethylation of histone H3K9me2. Nat Genet 52, 870-877. 10.1038/s41588-020-0677-3.

      (10) Huang, H., Weng, H., Zhou, K., Wu, T., Zhao, B.S., Sun, M., Chen, Z., Deng, X., Xiao, G., Auer, F., et al. (2019). Histone H3 trimethylation at lysine 36 guides m(6)A RNA modification co-transcriptionally. Nature 567, 414-419. 10.1038/s41586-019-1016-7.

      (11) Barbieri, I., Tzelepis, K., Pandolfini, L., Shi, J., Millan-Zambrano, G., Robson, S.C., Aspris, D., Migliori, V., Bannister, A.J., Han, N., et al. (2017). Promoter-bound METTL3 maintains myeloid leukaemia by m(6)A-dependent translation control. Nature 552, 126-131. 10.1038/nature24678.

      (12) Hao, J.D., Liu, Q.L., Liu, M.X., Yang, X., Wang, L.M., Su, S.Y., Xiao, W., Zhang, M.Q., Zhang, Y.C., Zhang, L., et al. (2024). DDX21 mediates co-transcriptional RNA m(6)A modification to promote transcription termination and genome stability. Mol Cell 84, 1711-1726 e1711. 10.1016/j.molcel.2024.03.006.

      (13) Bhattarai, P.Y., Kim, G., Lim, S.C., and Choi, H.S. (2024). METTL3-STAT5B interaction facilitates the co-transcriptional m(6)A modification of mRNA to promote breast tumorigenesis. Cancer Lett 603, 217215. 10.1016/j.canlet.2024.217215.

      (14) Dong, M.Z., Ouyang, Y.C., Gao, S.C., Ma, X.S., Hou, Y., Schatten, H., Wang, Z.B., and Sun, Q.Y. (2022). PPP4C facilitates homologous recombination DNA repair by dephosphorylating PLK1 during early embryo development. Development 149. 10.1242/dev.200351.

      (15) Zhan, Z., Cao, H., Xie, X., Yang, L., Zhang, P., Chen, Y., Fan, H., Liu, Z., and Liu, X. (2015). Phosphatase PP4 Negatively Regulates Type I IFN Production and Antiviral Innate Immunity by Dephosphorylating and Deactivating TBK1. J Immunol 195, 3849-3857. 10.4049/jimmunol.1403083.

      (16) Raja, R., Wu, C., Bassoy, E.Y., Rubino, T.E., Jr., Utagawa, E.C., Magtibay, P.M., Butler, K.A., and Curtis, M. (2022). PP4 inhibition sensitizes ovarian cancer to NK cell-mediated cytotoxicity via STAT1 activation and inflammatory signaling. J Immunother Cancer 10. 10.1136/jitc-2022-005026.

      (17) Rubio, R.M., Depledge, D.P., Bianco, C., Thompson, L., and Mohr, I. (2018). RNA m(6)A modification enzymes shape innate responses to DNA by regulating interferon beta. Genes Dev 32, 1472-1484. 10.1101/gad.319475.118.

      (18) Ge, Y., Ling, T., Wang, Y., Jia, X., Xie, X., Chen, R., Chen, S., Yuan, S., and Xu, A. (2021). Degradation of WTAP blocks antiviral responses by reducing the m(6)A levels of IRF3 and IFNAR1 mRNA. EMBO Rep 22, e52101. 10.15252/embr.202052101.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The model of phosphotransfer from Y169 IKK to S32 IkBa is compelling and an important new contribution to the field. In fact, this model will not be without controversy, and publishing the work will catalyze follow-up studies for this kinase and others as well. As such, I am supportive of this paper, though I do also suggest some shortening and modification.

      We appreciate the reviewer’s candid assessment of the difficulty of this study and of the need for follow-up studies to confirm direct transfer of the phosphate. We have also edited the manuscript to make it shorter.

      Generally, the paper is well written, but several figures should be quantified, and experimental reproducibility is not always clear. The first 4 figures are slow-going and could be condensed to show the key points, so that the reader gets to Figures 6 and 7 which contain the "meat" of the paper.

      We now state the experimental reproducibility of each assay in the Methods section. We have shortened the sections describing Figures 1-4. However, after discussing the work with colleagues whose expertise lies outside kinases and IKK, we realized that some description was necessary to introduce the subsequent figures. Additionally, we added Fig. S6, showing that radiolabelled phospho-IKK2 Y169F is unable to transfer its own phosphate group(s) to the substrate IkBa.

      Reviewer #2 (Public Review):

      Phosphorylation of IκBα is observed after ATP removal, although there are ambiguous requirements for ADP.

      We agree with the reviewer that this observation is puzzling. We hypothesize that ADP simultaneously regulates the transfer process, likely by binding to the active site.

      It seems that the analysis hinges on the fidelity of pan-specific phosphotyrosine antibodies.

      We agree with the reviewer. To bolster our conclusion, we used monoclonal mouse anti-phosphotyrosine antibodies from two different sources: BD Biosciences (catalogue no. 610000) and EMD Millipore (catalogue no. 05-321X).

      The analysis often returns to the notion that tyrosine phosphorylation(s) (and critical active site Lys44) dictate IKK2 substrate specificity, but evidence for this seems diffuse and indirect. This is an especially difficult claim to make with in vitro assays, omitting the context of other cellular specificity determinants (e.g., localization, scaffolding, phosphatases).

      We agree that specificity could depend on other cellular determinants and have toned down our claims where necessary. However, we would like to point out that the specificity of IKK2 towards S32 and S36 of IkBa in cells in response to specific stimuli is well established. It is also well established that its non-catalytic scaffolding partner NEMO is critical for selectively bringing IkBa to IKK from a large pool of proteins. However, the exact mechanism by which IKK2 selects these two serines among the many others in the substrate is not clear.

      Multiple phosphorylated tyrosines in IKK2 were apparently identified by mass spectrometric analyses, but the data and methods are not described. It is common to find non-physiological post-translational modifications in over-expressed proteins from recombinant sources. Are these IKK2 phosphotyrosines evident by MS in IKK2 immunoprecipitated from TNFa-stimulated cells? Identifying IKK2 phosphotyrosine sites from cells would be especially helpful in supporting the proposed model.

      Mass spectrometric data identifying phosphotyrosines in purified IKK2 are now included (Figure S3A). Although we have not analyzed IKK2 from TNF-a-treated cells in this study, a previous study of the phosphorylation status of cellular IKK2 reported tyrosine phosphorylation (Meyer et al. 2013).

      Reviewer #3 (Public Review):

      The identity and purity of the used proteins is not clear. Since the findings are so unexpected and potentially of wide-reaching interest - this is a weakness. Similar specific detection of phospho-Ser/Thr vs phospho-Tyr relies largely on antibodies which can have varying degrees of specificity.

      We followed a stringent multi-step purification protocol (optimized for successful crystallization of IKK2) that removes most impurities (PMID: 23776406, PMID: 39227404). Samples analysed by ESI-MS showed no significant contaminating kinase from Sf9 cells.

      The sequence-specific phospho-antibodies used in this study are well characterized and have been used in the field for years (Basak et al. 2007, PMID: 17254973). We share the reviewer’s concern regarding pan-specific phospho-antibodies. Since phosphotyrosine detection is the crucial aspect of this study, we minimized such bias by using pan-specific phosphotyrosine antibodies from two independent sources.

      Reviewer #1 (Recommendations For The Authors):

      I understand that Figure 3 shows that K44M abolishes both S32/36 phosphorylation and tyrosine phosphorylation, but not PEST region phosphorylation. This suggests that autophosphorylation is reflective of its known specific biological role in signal transduction. But I do not understand why "these results strongly suggest that IKK2-autophosphorylation is critical for its substrate specificity". That statement would be supported by a mutant that no longer autophosphorylates, and as a result shows a loss of substrate specificity, i.e. phosphorylates non-specific residues more strongly. Is that the case? Maybe Darwech et al 2010 or Meyer et al 2013 showed this.

      Later figures seem to address this point, so maybe this conclusion should be stated later in the paper.

      We have now clarified this in the manuscript and moved the statement to the next section. We have consolidated the results in Figures 3 and 4 of the previous version into a single figure, and the text has been modified accordingly.

      Page 10: mentions DFG+1 without a proper introduction. The Chen et al 2014 paper appears to inform the author's interest in Y169 phosphorylation, or is it just an additional interesting finding? Does this publication belong in the Introduction or the Discussion?

      The position of Y169 at DFG+1 was intriguing, and the 2014 article by Chen et al. further bolstered our interest in investigating this residue. We think this publication is relevant to both sections.

      To understand the significance of Figure 4D, we need a WT IKK2 control: or is there prior literature to cite? This is relevant to the conclusion that Y169 phosphorylation is particularly important for S32 phosphorylation.

      We have now added a new supplementary figure comparing the activities of WT and Y169F IKK2 towards WT IkBa and S32/S36 mutants (Figure S3F). At similar concentrations, WT IKK2 is many-fold more active than the Y-to-F mutants (Fig. 4C). The experiments were performed simultaneously; samples were loaded on different gels but otherwise processed identically.

      The cold ATP quenching experiment is nice for testing the model that Y169 functions as a phospho sink that allows for a transfer reaction. However, there is only a single timepoint and condition, which does not allow for a quantitative analysis. Furthermore, a positive control would make this experiment more compelling, and Y169F mutant should show that cold ATP quenching reduces the phosphorylation of IkBa.

      We thank the reviewer for appreciating our experimental design and for pointing out these concerns. We set the cold ATP time point to the maximum time point of the non-competition experiment, and used 50 mM ATP to match the highest ADP concentration used. The rationale for using the maximum time and an ATP concentration comparable to that of ADP was to capture any competitive effect of ATP at its maximum under the given assay conditions, relative to the phospho-transfer setup without cold ATP. We agree that finer ranges of ATP concentration and time points would have enabled more quantitative analyses. We have now included data for different time intervals (Figure S5D).

      Why is the EE mutant recognized by anti-phospho-serine antibodies? In Figure 2F.

      We expect serine residues outside the activation loop to be phosphorylated when IKK2 is overexpressed in and purified from Sf9 cells. In addition, since Glu (E) mimics phospho-Ser, the antibody cross-reacts with IKK2-EE, which mimics IKK2 phosphorylated at Ser177 and Ser181.

      Figure 7B is clear, but 7C does not add much.

      We have removed Fig. 7C from the current version. Figure 7 has been renumbered as Figure 6, which no longer contains this cartoon.

      Reviewer #2 (Recommendations For The Authors):

      Regarding the specificity arguments (see above in public review), the authors note that NEMO is very important in IKK specificity, and - if I'm understanding correctly - most of these assays were performed without NEMO. Would the IKK2-NEMO complex change these conclusions?

      NEMO is a scaffolding protein whose role goes beyond activation of the IKK complex. In cells, NEMO brings IkBa, out of a pool of thousands of proteins, to its bona fide kinase when the cells encounter specific signals. In other words, NEMO channels IKK activity towards its bona fide substrate IkBa at that moment. Although direct proof is lacking, NEMO likely presents IkBa to IKK in the correct pose such that the S32/S36 region of IkBa is poised for phosphorylation. The mechanism proposed in the current study further ensures the specificity and fidelity of that phosphorylation event. We believe this mechanism will be preserved in the IKK-NEMO complex unless proven otherwise. As shown below, IKK2 undergoes tyrosine autophosphorylation in the presence of NEMO.

      Author response image 1.

      The work primarily focuses on Y169 as a candidate target for IKK autophosphorylation. This seems reasonable given the proximity to the ATP gamma phosphate. However, Y188F more potently disrupted IκBα phosphorylation. The authors note that this could be due to folding perturbations, but this caveat would also apply to Y169F. A test for global fold perturbations for both Tyr mutants would be helpful.

      Y188 is conserved in Ser/Thr kinases, and the corresponding residue in PKA (Y204) has been studied extensively using structural, biochemical and biophysical tools. In PKA, Y204 was found to participate in packing of the hydrophobic core of the large lobe, and disruption of this core by mutation allosterically affects kinase activity. We observed a similar engagement of Y188 in the large lobe of IKK2 and, by analogy with the experimental evidence from PKA, speculated that its mutation perturbs folding. What we meant was that mutation of Y188 would allosterically affect kinase activity. Y169, on the other hand, is unique at that position, and no experimental evidence on the effect of a phospho-ablative mutation of this residue exists in the literature. Hence, we refrained from speculating about its effect on folding or conformational allostery; however, such a possibility cannot be ruled out.

      I struggled to follow the rationalization of the results of Figure 4D, the series of phosphorylation tests of Y169F against IκBα with combinations of phosphoablative or phosphomimetic variants at Ser32 and Ser36. This experiment is hard to interpret without a direct comparison to WT IKK2.

      We agree with the reviewer’s concerns. Through this experiment we wanted to highlight the importance of Tyr-phosphorylation of IKK2 for the phosphorylation of S32 of IκBα, which is of vital importance in NF-κB signaling. We have now provided a comparison with WT IKK2 in supplementary Figure S3F. We hope this brings more clarity to the issue.

      MD simulations were performed to compare structures of unphosphorylated vs. Ser-phosphorylated (p-IKK2) vs. Ser+Tyr-phosphorylated (P-IKK2) forms of IKK2. These simulations were performed without ATP bound, and then a representative pose was subject to ADP or ATP docking. The authors note distortions in the simulated P-IKK2 kinase fold and clashes with ATP docking. Given the high cellular concentration of ATP, it seems more logical to approach the MD with the assumption of nucleotide availability. Most kinase domains are highly dynamic in the absence of substrate. Is it possible that the P-IKK2 poses are a result of simulation in a non-physiological absence of bound ATP? Ultimately, this MD observation is linked to the proposed model where ADP-binding is required for efficient phospho-relay to IκBα. Therefore, this observation warrants scrutiny. Perhaps the authors could follow up with binding experiments to directly test whether P-IKK2 binds ADP and fails to bind ATP.

      We thank the reviewer for bringing up this issue. This is an important issue, and we must admit that we do not fully understand it yet. We took a more rigorous approach this time, using three docking programs: ATP and ADP were docked to the kinase structures using LeDock and GOLD, followed by rescoring with AutoDock Vina. We found that ATP binding is highly unfavourable for P-IKK2 compared with ADP. To further address these issues, we performed detailed MM-PBSA (Molecular Mechanics Poisson-Boltzmann Surface Area) analyses after MD simulation to estimate the binding free energies and affinities of ADP and ATP for each of the three differently phosphorylated states of IKK2. These analyses (Figure S4 E and F) clearly indicate that phosphorylated IKK2 has a much higher preference for ADP than for ATP. However, this does not rule out ATP binding by P-IKK2 in a different pose that may not support kinase activity.

      We could not perform a binding experiment for the following reason. We incubated FL IKK2 WT with or without cold ATP for 30 min, then incubated these samples with <sup>32</sup>P-ATP and analysed them by autoradiography after resolving them on 10% SDS-PAGE. We found that even after pre-incubation with excess cold ATP, the kinase still underwent autophosphorylation when radioactive ATP was added, as shown below. This prevented us from performing a direct binding experiment with ATP, as it would not represent a true binding event. We also noticed that, after removal of bulk ATP following autophosphorylation, phosphorylated IKK2 is capable of further autophosphorylation when freshly incubated with ATP. We have not been able to identify a condition that would account only for binding of ATP and not its hydrolysis.

      Author response image 2.

      The authors could comment on whether robust phosphorylation of NEMO was expected (Figure 1D). On a related note, why is NEMO a single band in the left panel of Figure 1D and a double band on the right?

      No, we did not expect robust phosphorylation of NEMO. However, robust phosphorylation of NEMO is observed only in the absence of IκBα; in the presence of IκBα, phosphorylation of NEMO decreases drastically. The two panels used different preparations of NEMO. When TEV digestion to remove the His-tag is incomplete, two bands appear, because the tagged and untagged versions cannot be separated by size exclusion chromatography, which is the final purification step.

      Page 14, line 360. "...observed phosphorylation of tyrosine residue(s) only upon fresh ATP-treatment..." I'm not sure I understand the wording here (or the relevance of the citation). Is this a comment on unreported data demonstrating the rapid hydrolysis of the putative phosphotyrosine(s)? If so, that would be helpful to clarify and report in the supporting information.

      In our X-ray crystallographic studies with phosphorylated IKK2, we failed to observe any density for the phosphate moiety. Furthermore, this IKK2 underwent further autophosphorylation when incubated with fresh ATP. These two observations led us to believe that some of the autophosphorylation events are transient in nature. However, quantitative kinetic analyses of this dephosphorylation have not been performed.

      Figure S3 middle panel: The PKA substrate overlaid on IKK2 seems sterically implausible for protein substrate docking. Is that just a consequence of the viewing angle? (On a related note, Figure S3 may be mislabeled as S4 in the main text.)

      It is a consequence of the viewing angle. Also, we apologize for this inadvertent mislabelling. It has been corrected in the current version.

      Reviewer #3 (Recommendations For The Authors):

      The detection of phosphorylated amino acids relies largely on antibodies which can have a varying degree of specificity. An alternative detection mode of the phospho-amino acids for example by MS would strengthen the evidence.

      We agree with the concern about the specificity bias of antibodies. We tried to minimize such bias by using two different p-Tyr antibodies, as noted previously and in the methodology section. We were also able to detect phospho-tyrosine residues by MS/MS analyses; representative spectra have now been added (Figure S3A).

      IKK2 purity - protocol states "desired purity". What was the actual purity and how was it checked? MS would be useful to check for the presence of other kinases.

      The purity of the recombinantly purified IKK2s is routinely checked by silver staining. A representative silver-stained SDS-PAGE gel is shown (Figure S1C). It may be noted that expression level and solubility, and hence purification yield and quality, correlate directly with the activity of the kinase. Active IKK2s express at much higher levels and yield cleaner preparations. In our experience, inactive IKKs such as K44M give poor yield and purity. We analysed K44M by LC-MS/MS to identify other proteins present in the sample and did not find any significant contaminating kinase in the sample (Figure S1D). The MS/MS result is attached.

      Figure 1C&D: where are the Mw markers? What is the size of the band? What is the MS evidence for tyrosine phosphorylation?

      We have now indicated MW marker positions on these figures.

      MS/MS scan data for the two peptides containing pTyr169 and pTyr188 are shown separately (Figure S3A).

      Figure 2F: Why is fresh ATP necessary? Why was Tyr not already phosphorylated? The kinetics of this process appear to be unusual when the reaction runs to completion within 5 minutes?

      As stated earlier, we believe some of the autophosphorylation events are transient in nature. We think that Tyr phosphorylation is lost due to the action of cellular phosphatases. We agree with the reviewer’s concern that the reaction appears to reach completion within 5 minutes in Fig. 2F. We believe this is probably because the amount of kinase used in this experiment exceeds the linear portion of the dynamic range of the antibody used. At lower kinase concentrations, the reaction does not reach completion until 60 min, as shown in Fig. 2A.

      Figure 3: Can the authors exclude contamination with a Tyr kinase in the IKK2-K44M prep? The LC/MS/MS data should be included.

      We have reanalysed the sample on an Orbitrap to check whether any Tyr kinase or other kinase contamination is present. We used the Spodoptera frugiperda proteome available on the UniProt website for this analysis. These analyses confirmed that there is no significant kinase contaminant in the fraction (Figure S1D).

      What is the specificity of IKK-2 Inhibitor VII? Could it inhibit a contaminant kinase?

      This inhibitor is highly potent against IKK2 and the IKK complex, and to a lesser extent against IKK1. No literature is available on its activity against other kinases. In an unrelated study, this compound was used alongside the MAPK inhibitor SB202190, and the two inhibitors produced completely different outcomes (Matou-Nasri S, Najdi M, AlSaud NA, Alhaidan Y, Al-Eidi H, Alatar G, et al. (2022) Blockade of p38 MAPK overcomes AML stem cell line KG1a resistance to 5-Fluorouridine and the impact on miRNA profiling. PLoS ONE 17(5):e0267855. https://doi.org/10.1371/journal.pone.0267855). This study indirectly indicates that IKK Inhibitor VII does not interfere with the MAPK pathways. We have not found any literature on non-specific activity of this inhibitor.

      Figure 6B: the band corresponding to "p-IkBa" appears to be similar in the presence of ADP (lanes 4-7) or in the absence of ADP but the presence of ATP (lane 8).

      The level of radioactive p-IκBα is higher when ADP is added than in the absence of ADP. In the presence of cold ATP, the radioactive p-IκBα level remains unchanged. This result strongly indicates that the phosphate group is transferred to IκBα directly from the radioactively labelled kinase, and that this transfer is not competed out by cold ATP.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Summary:

      In the manuscript "Intergenerational transport of double-stranded RNA limits heritable epigenetic changes," Shugarts and colleagues investigate intergenerational dsRNA transport in the nematode C. elegans. By inducing oxidative damage, they block dsRNA import into cells, which affects heritable gene regulation in the adult germline (Fig. 2). They identify a novel gene, sid-1-dependent gene-1 (sdg-1), upregulated upon SID-1 inhibition (Fig. 3). Both transient and genetic depletion of SID-1 lead to the upregulation of sdg-1 and a second gene, sdg-2 (Fig. 5). Interestingly, while sdg-1 expression suggests a potential role in dsRNA transport, neither its overexpression nor loss-of-function impacts dsRNA-mediated silencing in the germline (Fig. 7).

      Strengths:

      • The authors employ a robust neuronal stress model to systematically explore SID-1-dependent intergenerational dsRNA transport in C. elegans.

      • They discover two novel SID-1-dependent genes, sdg-1 and sdg-2.

      • The manuscript is well-written and addresses the compelling topic of dsRNA signaling in C. elegans.

      Weaknesses:

      • The molecular mechanism downstream of SDG-1 remains unclear. Testing whether sdg-2 functions redundantly with sdg-1 could provide further insights.

      • SDG-1 dependent genes in other nematodes remain unknown.

      We thank the reviewer for highlighting the strengths of the work along with a couple of the interesting future directions inspired by the reported discoveries. The restricted presence of genes encoding SDG-1 and its paralogs within retrotransposons suggests intriguing evolutionary roles for these proteins. Future work could examine whether such fast-evolving or newly evolved proteins with potential roles in RNA regulation are more broadly associated with retrotransposons. Multiple SID-1-dependent proteins (including SDG-1 and SDG-2) could act together to mediate downstream effects. This possibility can be tested using combinatorial knockouts and overexpression strains. Both future directions have the potential to illuminate the evolutionarily selected roles of dsRNA-mediated signaling through SID-1, which remain a mystery.

      Reviewer #2 (Public review):

      Summary:

      RNAs can function across cell borders and animal generations as sources of epigenetic information for development and immunity. The specific mechanistic pathways by which RNA travels between cells and progeny remain an open question. Here, Shugarts, et al. use molecular genetics, imaging, and genomics methods to dissect specific RNA transport and regulatory pathways in the C. elegans model system. Ingestion of double-stranded RNA by larvae is noted not to cause continuous gene silencing throughout adulthood. Damage of neuronal cells expressing double-stranded target RNA is observed to repress target gene expression in the germline. Exogenous short or long double-stranded RNA required different genes for entry into progeny. It was observed that the SID-1 double-stranded RNA transporter showed different expression over animal development. Removal of the sid-1 gene caused upregulation of two genes, the newly described sid-1-dependent gene sdg-1 and sdg-2. Both genes were observed to be negatively regulated by other small RNA regulatory pathways. Strikingly, loss then gain of sid-1 through breeding still caused variability of sdg-1 expression for many, many generations. SDG-2 protein co-localizes with germ granules, intracellular sites for heritable RNA silencing machinery. Collectively, sdg-1 presents a model to study how extracellular RNAs can buffer gene expression in germ cells and other tissues.

      Strengths:

      (1) Very clever molecular genetic methods and genomic analyses, paired with thorough genetics, were employed to discover insights into RNA transport, sdg-1 and sdg-2 as sid-1-dependent genes, and sdg-1's molecular phenotype.

      (2) The manuscript is well cited, and figures reasonably designed.

      (3) The discovery of the sdg genes being responsive to the extracellular RNA cell import machinery provides a model to study how exogenous somatic RNA is used to regulate gene expression in progeny. The discovery of genes within retrotransposons stimulates tantalizing models of how regulatory loops may actually permit the genetic survival of harmful elements.

      Weaknesses:

      (1) The manuscript is broad, making it challenging to read and consider the data presented. Of note, since the original submission, the authors have improved the clarity of the writing and presentation.

      Comments on revised version:

      This reviewer thanks the authors for their efforts in revising the manuscript. In their rebuttal, the authors acknowledged the broad scope of their manuscript. I concur. While I still think the manuscript is a challenge to read due to its expansive nature, the current draft is substantially improved when compared to the previous one. This work will contribute to our general knowledge of RNA biology, small RNA regulatory pathways, and RNA inheritance.

      We thank the reviewer for highlighting the strengths of the manuscript and for helping us improve the presentation of our results and discussion.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In the manuscript "Intergenerational transport of double-stranded RNA limits heritable epigenetic changes," Shugarts and colleagues investigate intergenerational dsRNA transport in the nematode C. elegans. They induce oxidative damage in worms, blocking dsRNA import into cells (and potentially affecting the worms in other ways). Oxidative stress inhibits dsRNA import and the associated heritable regulation of gene expression in the adult germline (Fig. 2). The authors identify a novel gene, sid-1-dependent gene-1 (sdg-1), which is induced upon inhibition of SID-1 (Fig. 3). Both transient inhibition and genetic depletion of SID-1 lead to the upregulation of sdg-1 and a second gene, sdg-2 (Fig. 5). The expression of SDG-1 is variable, potentially indicating buffering regulation. While the expression of sdg-1 could be consistent with a role in intergenerational transport of dsRNA, neither its overexpression nor loss-of-function impacts dsRNA-mediated silencing in the germline (Fig. 7). It would be interesting to test if sdg-2 functions redundantly.

      In summary, the authors have identified a novel worm-specific protein (sdg-1) that is induced upon loss of dsRNA import via SID-1, but is not required to mediate SID-1 RNA regulatory effects.

      We thank the reviewer for highlighting our findings on SDG-1. We found that oxidative damage in neurons enhanced dsRNA transport into the germline and/or subsequent silencing.

      Remaining Questions:

      • The authors use an experimental system that induces oxidative damage specifically in neurons to release dsRNAs into the circulation. Would the same effect be observed if oxidative damage were induced in other cell types?

      It is possible that oxidative damage of other tissues using miniSOG (as demonstrated in Xu and Chisholm, 2016) could also enhance the release of dsRNA into the circulation from those tissues. However, future experiments would be needed to test this empirically, because it is also possible that the release of dsRNA depends on physiological properties (e.g., the molecular machinery promoting specific secretion) that are particularly active in neurons. We chose neurons as the source of dsRNA because, when dsRNA was expressed in a variety of tissues, neurons appeared to be the most efficient at exporting dsRNA, as measured using SID-1-dependent silencing in other tissues (Jose et al., PNAS, 2009).

      • Besides dsRNA, which other RNAs and cellular products (macromolecules and small signalling molecules) are released into the circulation that could affect the observed changes in germ cells?

      We do not yet know all the factors that could be released either in naive animals or upon oxidative damage of neurons that influence the uptake of dsRNA into other tissues. The dependence on SID-1 for the observed enhancement of silencing (Fig. 2) shows that dsRNA is necessary for silencing within the germline. Whether this import of dsRNA occurs in conjunction with other factors (e.g., the uptake of short dsRNA along with yolk into oocytes (Marré et al., PNAS, 2016)) before silencing within the germline will require further study. A possible approach could be the isolation of extracellular fluid (Banse and Hunter, J Vis Exp., 2012) followed by characterization of its contents. However, the limited material available using this approach and the difficulty in avoiding contamination from cellular damage by the needle used for isolating the material make it challenging.

      • SID-1 modifies RNA regulation within the germline (Fig. 7) and upregulates sdg-1 and sdg-2 (Fig. 5). However, SID-1's effects do not appear to be mediated via sdg-1. Testing the role of sdg-2 would be intriguing.

      We observe the accumulation of sdg-1 and sdg-2 RNA in two different mutants lacking SID-1, which led us to conservatively focus on the analysis of one of these proteins for this initial paper. We expect that more sensitive analyses of the RNA-seq data will likely reveal additional genes regulated by SID-1. With the ability to perform multiplexed genome-editing, we hope in future work to generate strains that have mutations in many SID-1-dependent genes to recapitulate the defects observed in sid-1(-) animals. Indeed, as surmised by the reviewer, we are focusing on sdg-2 as the first such SID-1-dependent gene to analyze using mutant combinations.

      • Are sdg-1 or sdg-2 conserved in other nematodes or potentially in other species? sdg-1 appears to be encoded or captured by a retro-element in the C. elegans genome and exhibits stochastic expression in different isolates. Is this a recent adaptation in the C. elegans genome, or is it present in other nematodes? Does loss-of-function of sdg-1 or sdg-2 have any observable effect?

      Clear homologs of SDG-1 and SDG-2 are not detectable outside of C. elegans. Consistent with the location of the sdg-1 gene within a Cer9 retrotransposon that appears to have integrated only within the C. elegans genome, sequence conservation between the genomes of related species is only observed outside the region of the retrotransposon (see Author response image 1, screenshot from UCSC browser). There were no obvious defects detected in animals lacking sdg-1 (Fig. 7) or in animals lacking sdg-2 (data not shown). It is possible that further exploration of both mutants and mutant combinations lacking additional SID-1-dependent genes would reveal defects. We also plan to examine these mutants in sensitized genetic backgrounds where one or more members of the RNA silencing pathway have been compromised.

      Author response image 1.

      Clarification for Readability:

      To enhance readability and avoid misunderstandings, it is crucial to specify the model organism and its specific dsRNA pathways that are not conserved in vertebrates:

      We agree with the reviewer and thank the reviewer for the specific suggestions provided below. To take the spirit of the suggestion to heart we have instead changed the title of our paper to clearly signal that the entire study only uses C. elegans. We have titled the study ‘Intergenerational transport of double-stranded RNA in C. elegans can limit heritable epigenetic changes’

      • In the first sentence of the paragraph "Here, we dissect the intergenerational transport of extracellular dsRNA ...", the authors should specify "in the nematode C. elegans". Unlike vertebrates, which recognise dsRNA as a foreign threat, worms and other invertebrates pervasively use dsRNA for signalling. Additionally, worms, unlike vertebrates and insects, encode RNA-dependent RNA polymerases that generate dsRNA from ssRNA substrates, enabling amplification of small RNA production. Especially in dsRNA biology, specifying the model organism is essential to avoid confusion about potential effects in humans.

      We agree with most statements made by the reviewer, although whether dsRNA is exclusively recognized as a foreign threat by all vertebrates of all stages remains controversial. Our changed title now eliminates all ambiguity regarding the organism used in the study.

      • Similarly, the authors should specify "in C. elegans" in the sentence "Therefore, we propose that the import of extracellular dsRNA into the germline tunes intracellular pathways that cause heritable RNA silencing." This is important because C. elegans small RNA pathways differ significantly from those in other organisms, particularly in the PIWI-interacting RNA (piRNA) pathways, which depend on dsRNA in C. elegans but use ssRNA in vertebrates. Specification is crucial to prevent misinterpretation by the reader. It is well understood that mechanisms of transgenerational inheritance that operate in nematodes or plants are not conserved in mammals.

      The piRNAs of C. elegans are single-stranded but are encoded by numerous independent genes throughout the genome. The molecules used for transgenerational inheritance of epigenetic changes that have been identified thus far are indeed different in different organisms. However, the regulatory principles required for transgenerational inheritance are general (Jose, eLife, 2024). Nevertheless, we have modified the title to clearly state that the entire study is using C. elegans.  

      • The first sentence of the discussion, "Our analyses suggest a model for ...", would also benefit from specifying "in C. elegans". The same applies to the figure captions. Clarification of the model organism should be added to the first sentence, especially in Figure 1.

      With the clarification of the organism used in the title, we expect that all readers will be able to unambiguously interpret our results and the contexts where they apply. 

      Reviewer #2 (Public review):

      Summary:

      RNAs can function across cell borders and animal generations as sources of epigenetic information for development and immunity. The specific mechanistic pathways by which RNA travels between cells and progeny remain an open question. Here, Shugarts, et al. use molecular genetics, imaging, and genomics methods to dissect specific RNA transport and regulatory pathways in the C. elegans model system. Ingestion of double-stranded RNA by larvae is noted not to cause continuous gene silencing throughout adulthood. Damage of neuronal cells expressing double-stranded target RNA is observed to repress target gene expression in the germline. Exogenous supply of short or long double-stranded RNA required different genes for entry into progeny. It was observed that the SID-1 double-stranded RNA transporter showed different expression over animal development. Removal of the sid-1 gene caused upregulation of two genes, the newly described sid-1-dependent gene sdg-1 and sdg-2. Both genes were observed to also be negatively regulated by other small RNA regulatory pathways. Strikingly, loss then gain of sid-1 through breeding still caused variability of sdg-1 expression for many, many generations. SDG-2 protein co-localizes with a Z-granule marker, an intracellular site for heritable RNA silencing machinery. Collectively, sdg-1 presents a model to study how extracellular RNAs can buffer gene expression in germ cells and other tissues.

      We thank the reviewer for highlighting our findings and underscoring the striking nature of the discovery that mutating sid-1 using genome-editing resulted in a transgenerational change that could not be reversed by changing the sid-1 sequence back to wild-type.

      Strengths:

      (1) Very clever molecular genetic methods and genomic analyses, paired with thorough genetics, were employed to discover insights into RNA transport, sdg-1 and sdg-2 as sid-1-dependent genes, and sdg-1's molecular phenotype.

      (2) The manuscript is well cited, and figures reasonably designed.

      (3) The discovery of the sdg genes being responsive to the extracellular RNA cell import machinery provides a model to study how exogenous somatic RNA is used to regulate gene expression in progeny. The discovery of genes within retrotransposons stimulates tantalizing models of how regulatory loops may actually permit the genetic survival of harmful elements.

      We thank the reviewer for the positive comments.

      Weaknesses:

      (1) As presented, the manuscript is incredibly broad, making it challenging to read and consider the data presented. This concern is exemplified in the model figure, that requires two diagrams to summarize the claims made by the manuscript.

      RNA interference (RNAi) by dsRNA is an organismal response where the delivery of dsRNA into the cytosol of a cell precedes the processing and ultimate silencing of the target gene within that cell. These two major steps are often not separately considered when explaining observations. Yet, the interpretation of every RNAi experiment is affected by both steps. To make the details that we have revealed in this work for both steps clearer, we presented the two models separated by scale: organismal vs. intracellular. We agree that this integrative manuscript appears very broad when the many different findings are each considered separately. The overall model revealed here forms the necessary foundation for the deep analysis of individual aspects in the future.

      (2) The large scope of the manuscript denies space to further probe some of the ideas proposed. The first part of the manuscript, particularly Figures 1 and 2, presents data that can be caused by multiple mechanisms, some of which the authors describe in the results but do not test further. Thus, portions of the results text come across as claims that are not supported by the data presented.

      We agree that one of the consequences of addressing the joint roles of transport and subsequent silencing during RNAi is that the scope of the manuscript appears large. We had suggested multiple interpretations for specific observations in keeping with the need for further work. To avoid any misunderstandings that our listing of possible interpretations be taken as claims by the reader, we have followed the instructions of the reviewer (see below) and moved some of the potential explanations we raised to the discussion section.

      (3) The manuscript focuses on the genetics of SDGs but not the proteins themselves. Few descriptions of the SDGs functions are provided nor is it clarified why only SDG-1 was pursued in imaging and genetic experiments. Additionally, the SDG-1 imaging experiments could use additional localization controls.

      We agree that more work on the SDG proteins will likely be informative, but this is beyond the scope of this already expansive paper. We began with the analysis of SDG-1 because it had the most support as a regulator of RNA silencing (Fig. 5f). Indeed, in other work (Lalit and Jose, bioRxiv, 2024), we find that AlphaFold 2 predicts the SDG-1 protein to be a regulator of RNA silencing that directly interacts with the dsRNA-editing enzyme ADR-2 and the endonuclease RDE-8. Furthermore, we expect that more sensitive analyses of the RNA-seq data are likely to reveal additional genes regulated by SID-1. Using multiplexed genome editing, we hope to generate mutant combinations lacking multiple sdg genes to reveal their function(s).

      We agree that given the recent discovery of many components of germ granules, our imaging data does not have sufficient resolution to discriminate between them. We have modified our statements and our model regarding the colocalization of SDG-1 with Z-granules to indicate that the overlapping enrichment of SDG-1 and ZNFX-1 in the perinuclear region is consistent with interactions with other nearby granule components.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Major

      (1) As presented, the manuscript is almost two manuscripts combined into one. This point is highlighted in Figure 7h, which basically presents two separate models. The key questions addressed in the manuscript start at Figure 3. Figures 1 and 2 are interesting observations but require more experiments to define further. For example, as the Results text describes for Figure 1, "These differences in the entry of ingested dsRNA into cells and/or subsequent silencing could be driven by a variety of changes during development. These include changes in the uptake of dsRNA into the intestine, distribution of dsRNA to other tissues from the intestine, import of dsRNA into the germline, and availability of RNA silencing factors within the germline." Presenting these (reasonable) mechanistic ideas detracted from the heritable RNA epigenetic mechanism explored in the later portion of the manuscript. There are many ways to address this issue, one being moving Figures 1 and 2 to the Supplement to focus on SID-1-related pathways.

      Since this manuscript addresses the interaction between intercellular transport of dsRNA and heritable epigenetic changes, it was necessary to establish the possible route(s) that dsRNA could take to the germline before any inference could be made regarding heritable epigenetic changes. As suggested below (pt. 2), we have now moved the alternatives we enumerated as possible explanations for some experimental results (e.g., for the differences quoted here) to the discussion section.

      (2) The manuscript includes detailed potential interpretations in the Results, making them seem like claims. Here is an example:

      "Thus, one possibility suggested by these observations is that reduction of sdg-1 RNA via SID-1 alters the amount of SDG-1 protein, which could interact with components of germ granules to mediate RNA regulation within the germline of wild-type animals."

      This mechanism is a possibility, but placing these ideas in the citable results makes it seem like an overinterpretation of imaging data. This text and others should be in the Discussion, where speculation is encouraged. Results sections like this example and others should be moved to the discussion.

      We have rephrased motivating connections between experiments like the one quoted above and also moved such text to the discussion section wherever possible.

      (3) A paragraph describing the SDG proteins will be helpful. Homologs? Conserved protein domains? mRNA and/or protein expression pattern across worm, not just the germline? Conservation across Caenorhabditis sp? These descriptions may help establish context why SDG-1 localizes to Z-granules.

      We have now added information about the conservation of the sdg-1 gene in the manuscript. AlphaFold predicts domains with low confidence for the SDG-1 protein, consistent with the lack of conservation of this protein (AlphaFold requires multiple sequence alignments to predict confidently). In the adult animal, the SDG-1 protein was only detectable in the germline. Future work focused on SDG-1, SDG-2 and other SDG proteins will further examine possible expression in other tissues and functional domains if any. Unfortunately, in multiple attempts of single-molecule FISH experiments using probes against the sdg-1 open reading frame, we were unable to detect a specific signal above background (data not shown). Additional experiments are needed for the sensitive detection of sdg-1 expression outside the germline, if any.  

      (4) Based on the images shown, SDG-1 could be in other nearby granules, such as P granules or mutator foci. Additional imaging controls to rule out these granules/condensates will greatly strengthen the argument that SDG-1 protein localizes to Z-granules specifically.

      We have modified the final model to indicate that the perinuclear colocalization is with germ granules broadly, and we agree that we do not have the resolution to claim that the overlap of SDG-1::mCherry with GFP::ZNFX-1 that we detect using Airyscan microscopy is specifically with Z granules. Our initial emphasis on Z granules was based on the prior report of SDG-1 being co-immunoprecipitated with the Z-granule surface protein PID-2/ZSP-1. However, through other work predicting possible direct interactions using AlphaFold (Lalit and Jose, bioRxiv, 2024), we were unable to detect any direct interactions between PID-2 and SDG-1. Indeed, many additional granules have been recently reported (Chen et al., Nat. Commun., 2024; Huang et al., bioRxiv 2024), making it possible that SDG-1 has specific interactions with a component of one of the other granules (P, Z, M, S, E, or D) or adjacent P bodies.

      Minor

      (1) "This entry into the cytosol is distinct from and can follow the uptake of dsRNA into cells, which can rely on other receptors." Awkward sentence. Please revise.

      We have now revised this sentence to read: “This entry into the cytosol is distinct from the uptake of dsRNA into cells, which can rely on other receptors.”

      (2) Presumably, the dsRNA fraction of the in vitro transcribed RNA differs from that of the 50 bp oligos that can be reliably annealed by heating and cooling. Other RNA secondary structure possibilities warrant further discussion.

      We agree that in vitro transcribed RNA could include a variety of undefined secondary structures in addition to dsRNAs of mixed length. Such structures could recruit or titrate away RNA-binding proteins in addition to the dsRNA structures engaging the canonical RNAi pathway, resulting in mixed mechanisms of silencing. Future work identifying such structures and exploring their impact on the efficacy of RNAi could be informative. We have now added these considerations to the discussion and thank the reviewer for highlighting these possibilities.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors engineer the endogenous left boundary of the Drosophila eve TAD, replacing the endogenous Nhomie boundary by either a neutral DNA, a wildtype Nhomie boundary, an inverted Nhomie boundary, or a second copy of the Homie boundary. They perform Micro-C on young embryos and conclude that endogenous Nhomie and Homie boundaries flanking eve pair with head-to-tail directionality to form a chromosomal stem loop. Abrogating the Nhomie boundary leads to ectopic activation of genes in the former neighboring TAD by eve embryonic stripe enhancers. Replacing Nhomie by an inverted version or by Homie (which pairs with itself head-to-head) transformed the stem loop into a circle loop. An important finding was that stem and circle loops differentially impact endogenous gene regulation both within the eve TAD and in the TADs bracketing eve. Intriguingly, an eve TAD with a circle loop configuration leads to ectopic activation of flanking genes by eve enhancers - indicating compromised regulatory boundary activity despite the presence of an eve TAD with intact left and right boundaries.

      Strengths:

      Overall, the results obtained are of high-quality and are meticulously discussed. This work advances our fundamental understanding of how 3D genome topologies affect enhancer-promoter communication.

      Weaknesses:

      Though convincingly demonstrated at eve, the generalizability of TAD formation by directional boundary pairing remains unclear, though the authors propose this mechanism could underlie the formation of all TADs in Drosophila and possibly even in mammals. Strong and ample evidence has been obtained to date that cohesin-mediated chromosomal loop extrusion explains the formation of a large fraction of TADs in mammals. 

      (1.1) The difficulty with almost all of the studies on mammalian TADs, cohesin, and CTCF roadblocks is that the sequencing depth is insufficient, so large bin sizes (>1 kb) are needed to visualize chromosome architecture.  The resulting contact profiles show TAD neighborhoods, not actual TADs.

      The problem with these studies is illustrated by comparing the contact profiles of mammalian MicroC data sets at different bin sizes in Author response image 1.  In this figure, the darkness of the “pixels” in panels E, F, G and H was enhanced by reducing brightness in Photoshop.

      Author response image 1.

      Mammalian MicroC profiles at different bin sizes

      Panels A and C show “TADs” using bin sizes typical of most mammalian studies (see Krietenstein et al. 2020).  At this level of resolution, TADs, the “trees” that are the building blocks of chromosomes, are not visible.  Instead, what is seen are TAD neighborhoods or “forests”.  Each neighborhood consists of several dozen individual TADs.  The large bins in these panels also artificially accentuate TAD:TAD interactions, generating a series of “stripes” and “dots” that correspond to TADs bumping into each other and sequences getting crosslinked.  For example, in panel A there is a prominent stripe on the edge of a “TAD” (blue arrow).  In panel C, this stripe resolves into a series of dots arranged as parallel but interrupted “stripes” (green and blue arrows).  At the next level of resolution, it can be seen that the stripe marked by the blue arrow and magenta asterisk is generated by contacts between the left boundary of the TAD indicated by the magenta bar and sequences in a TAD (blue bar) ~180 kb away.  While dots and stripes are prominent features in contact profiles visualized with larger bin sizes (A and C), the actual TADs that are observed with a bin size of 200 bp (examples are underlined by black bars in panel G) are not bordered by stripes, nor are they topped by obvious dots.  The one possible exception is the dot that appears at the top of the volcano triangle underlined in magenta.
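      As a rough illustration of how binning alone can turn interrupted dots into an apparently continuous stripe, the toy sketch below (Python with NumPy) sums a fine contact matrix into bins three times larger. The matrix size, the spacing of the contacts, the coarsening factor, and the helper names `coarsen` and `empty_fraction` are all arbitrary assumptions chosen for illustration; nothing here is taken from the data sets discussed above.

```python
import numpy as np


def coarsen(contacts: np.ndarray, factor: int) -> np.ndarray:
    """Sum a square contact matrix into bins `factor` times larger.

    Assumes the matrix size is divisible by `factor`.
    """
    n = contacts.shape[0]
    if n % factor:
        raise ValueError("matrix size must be divisible by factor")
    k = n // factor
    # Group rows and columns into blocks of `factor` and sum each block.
    return contacts.reshape(k, factor, k, factor).sum(axis=(1, 3))


def empty_fraction(row: np.ndarray) -> float:
    """Fraction of empty bins between the first and last non-zero bin of a row."""
    nz = np.flatnonzero(row)
    span = row[nz[0]: nz[-1] + 1]
    return float(np.mean(span == 0))


# Hypothetical toy data: 30 fine bins; one anchor bin contacts discrete
# sites spaced every 3 bins (a "dotted", interrupted stripe).
fine = np.zeros((30, 30))
fine[0, 3:30:3] = 1.0
fine[3:30:3, 0] = 1.0  # keep the matrix symmetric

coarse = coarsen(fine, 3)  # bins 3x larger

print(empty_fraction(fine[0]))    # → 0.64 (gaps between dots are visible)
print(empty_fraction(coarse[0]))  # → 0.0 (the same contacts look continuous)
```

      Summing, rather than averaging, keeps the total number of contacts unchanged, which mirrors how larger bins pool the same reads; the only thing that changes is whether the gaps between discrete contacts can still be seen.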

      The chromosome 1 DNA segment from the MicroC data of Hsieh et al. (2020) shows a putative volcano triangle with a plume (indicated by a V in Author response image 1 panels D, F and H).  Sequences in the V TAD don’t crosslink with their immediate neighbors, and this gives a “plume” above the volcano triangle, as indicated by the light blue asterisk in panels D, F and H.  Interestingly, the V TAD does contact two distant TADs, U on the left and W on the right.  The U TAD is ~550 kb from V, and the region of contact is indicated by the black arrow.  The W TAD is ~585 kb from V, and the region of contact is indicated by the magenta arrow.  While the plume still seems to be visible with a bin size of 400 bp (light blue asterisk), it is hard to discern when the bin size is 200 bp, as there are not enough reads.

      The evidence demonstrating that cohesin is required for TAD formation/maintenance is based on low resolution Hi-C data, and the effects that are observed are on TAD neighborhoods (forests) and not TADs (trees).  In fact, there is published evidence that cohesin is not required in mammals for TAD formation/maintenance.  In an experiment from Goel et al. 2023, the authors depleted the cohesin component Rad21 and then visualized the effects on TAD organization using the high resolution region capture MicroC (RCMC) protocol.  The MicroC contact map in this figure visualizes a ~250 kb DNA segment around the Ppm1g locus at 250 bp resolution.  On the right side of the diagonal is the untreated control, while the left side shows the MicroC profile of the same region after Rad21 depletion.  The authors indicated that there was a 97% depletion of Rad21 in their experiment.  However, as is evident from a comparison of the experimental and control, loss of Rad21 has no apparent effect on the TAD organization of this mammalian DNA segment.

      Several other features are worth noting.  First, unlike the MicroC experiments shown in Author response image 1, there are dots at the apex of the TADs in this chromosomal segment.  In the MicroC protocol, fixed chromatin is digested to mononucleosomes by extensive MNase digestion.  The resulting DNA fragments are then ligated, and dinucleosome-length fragments are isolated and sequenced. 

      DNA sequences that are nucleosome free in chromatin (which would be promoters, enhancers, silencers and boundary elements) are typically digested to oligonucleotides in this procedure and won’t be recovered.  This means that the dots shown here must correspond to mononucleosome-length elements that are MNase resistant.  This is also true for the dots in the MicroC contact profiles of the Drosophila Abd-B regulatory domain (see Fig. 2B in the paper).  Second, the TADs are connected to each other by 45° stripes (see blue and green arrowheads).  While it is not clear from this experiment whether the stripes are generated by an active mechanism (enzyme) or by some “passive” mechanism (e.g., sliding), the stripes in this chromosomal segment are not generated by cohesin, as they are unperturbed by Rad21 depletion.  Third, there are no volcano triangles with plumes in this chromosomal DNA segment.  Instead, the contact patterns (purple and green asterisks) between neighboring TADs closely resemble those seen for the Abd-B regulatory domains (compare Goel et al. 2023 with Fig. 2B in the paper).  This similarity suggests that the TADs in and around Ppm1g may be circle-loops, not stem-loops.  As volcano triangles with plumes also seem to be rare in the MicroC data sets of Krietenstein et al. (2020) and Hsieh et al. (2020) (with the caveat that these data sets are low resolution: see Author response image 1), it is possible that much of the mammalian genome is assembled into circle-loop TADs, a topology that can’t be generated by the cohesin loop extrusion (bolo tie clip)/CTCF roadblock model.

      While Rad21 depletion has no apparent effect on TADs, it does appear to impact TAD neighborhoods.  This is shown in a supplemental figure in Goel et al. (Goel et al. 2023).  In this figure, TADs in the Ppm1g region of chromosome 5 are visualized with bin sizes of 5 kb and 1 kb.  A 1.2 Mb DNA segment is shown for the 5 kb bin size, while an 800 kb DNA segment is shown for the 1 kb bin size.  As can be seen from comparing the MicroC profiles in Author response image 2 with those in Goel et al. 2023, individual TADs are not visible.  Instead, the individual TADs are binned into large TAD “neighborhoods” that consist of several dozen or more TADs.

      Unlike the individual TADs shown in Goel et al. 2023, the TAD neighborhoods in Author response image 2 are sensitive to Rad21 depletion.  The effects of Rad21 depletion can be seen by comparing the relative pixel density inside the blue lines before (above the diagonal) and after (below the diagonal) auxin-induced Rad21 degradation.  The reduction in pixel density is greatest for more distant TAD:TAD contacts (farthest from the diagonal).  By contrast, the TADs themselves are unaffected (Goel et al. 2023), as are contacts between individual TADs and their immediate neighbors.  In addition, contacts between partially overlapping TAD neighborhoods are also lost.  At this point it isn’t clear why contacts between distant TADs in the same neighborhood are lost when Rad21 is depleted; however, a plausible speculation is that it is related to the functioning of cohesin in holding newly replicated DNAs together until mitosis and whatever other role it might have in chromosome condensation.

      Author response image 2.

      Ppm1g full locus chr5

      Moreover, given the unique specificity with which Nhomie and Homie are known to pair (and exhibit "homing" activity), it is conceivable that formation of the eve TAD by boundary pairing represents a phenomenon observed at exceptional loci rather than a universal rule of TAD formation. Indeed, characteristic Micro-C features of the eve TAD are only observed at a restricted number of loci in the fly genome…..

      (1.2) The available evidence does not support the claim that nhomie and homie are “exceptional.”  To begin with, nhomie and homie rely on precisely the same set of factors that have been implicated in the functioning of other boundaries in the fly genome.  For example, homie requires (among other factors) the generic boundary protein Su(Hw) for insulation and long-distance interactions (Fujioka et al. 2024).  (This is also true of nhomie: unpublished data.)  The Su(Hw) protein (like other fly polydactyl zinc finger proteins) can engage in distant interactions.  This was first shown by Sigrist and Pirrotta (Sigrist and Pirrotta 1997), who found that the su(Hw) element from the gypsy transposon can mediate long-distance regulatory interactions (PRE dependent silencing) between transgenes inserted at different sites on homologous chromosomes (trans interactions) and at sites on different chromosomes.

      The ability to mediate long-distance interactions is not unique to the su(Hw) element, or homie and nhomie.  Muller et al. (Muller et al. 1999) found that the Mcp boundary from the Drosophila BX-C is also able to engage in long-distance regulatory interactions: both PRE-dependent silencing of mini-white and enhancer activation of mini-white and yellow.  The functioning of the Mcp boundary depends upon two other generic insulator proteins, Pita and the fly CTCF homolog (Kyrchanova et al. 2017).  Like Su(Hw), both are polydactyl zinc finger proteins, and they resemble the mammalian CTCF protein in that their N-terminal domain mediates multimerization (Bonchuk et al. 2020; Zolotarev et al. 2016).  Figure 6 from Muller et al. 1999 shows PRE-dependent “pairing sensitive silencing” interactions between transgenes carrying a mini-white reporter, the Mcp and scs’ (Beaf dependent) (Hart et al. 1997) boundary elements, and a PRE closely linked to Mcp.  In this experiment, flies homozygous for different transgene inserts were mated and the eye color was examined in their trans-heterozygous progeny.  As indicated in the figure, the strongest trans-silencing interactions were observed for inserts on the same chromosomal arm; however, transgenes inserted on the left arm of chromosome 3 can interact across the centromere with transgenes inserted on the right arm of chromosome 3. 

      Figure 5C (left) from Muller et al. 1999 shows a trans-silencing interaction between w#11.102 at 84D and w#11.16 approximately 5.8 Mb away, at 87D.  Figure 5C (right) shows a trans-silencing interaction across the centromere between w#14.29 on the left arm of chromosome 3 at 78F and w#11.102 on the right arm of chromosome 3 at 84D.  The eye color phenotype of mini-white-containing transgenes is usually additive: homozygous inserts have an eye color twice as dark as the corresponding hemizygous inserts.  Likewise, in flies trans-heterozygous for mini-white transgenes inserted at different sites, the eye color is equivalent to the sum of the two transgenes.  This is not true when mini-white transgenes are silenced by PREs.  In the combination shown in panel A, the trans-heterozygous fly has a lighter eye color than either of the parents.  In the combination in panel B, the trans-heterozygous fly is slightly lighter than either parent.

      As evident from the diagram in Figure 6 from Muller et al. 1999, all of the transgenes inserted on the 3rd chromosome that were tested were able to participate in long-distance (>Mb) regulatory interactions.  On the other hand, not all possible pairwise interactions are observed.  This would suggest that potential interactions depend upon the large-scale (Mb) 3D folding of the 3rd chromosome.

      When the scs boundary (Zw5 dependent) (Gaszner et al. 1999) was added to the transgene to give sMws’, it further enhanced the ability of distant transgenes to find each other and pair.  All eight of the sMws’ inserts that were tested were able to interact with at least one other sMws’ insert on a different chromosome and silence mini-white.  Vazquez et al. () subsequently tagged the sMws’ transgene with LacO sequences (ps0Mws’) and visualized pairing interactions in imaginal discs.  Trans-heterozygous combinations on the same chromosome were found paired in 94-99% of the disc nuclei, while a trans-heterozygous combination on different chromosomes was found paired in 96% of the nuclei (Table 3 from Vazquez et al. 2006).  Vazquez et al. also examined a combination of four transgenes inserted on the same chromosome (two at the same insertion site, and two at different insertion sites).  In this case, all four transgenes were clustered together in 94% of the nuclei (Table 3 from Vazquez et al. 2006).  Their studies also suggest that the distant transgenes remain paired for at least several hours.  A similar experiment was done by Li et al. (Li et al. 2011), except that the transgene contained only a single boundary, Mcp or Fab-7.  While pairing was still observed in trans-heterozygotes, the frequency was reduced without scs and scs’.

      It is worth pointing out that there is no plausible mechanism by which cohesin could extrude a loop through hundreds of intervening TADs and across the centromere (ff#13.101↔w#11.102: Figure 6 from Muller et al. 1999; w#14.29↔w#11.02: Figures 5 and 6 from Muller et al. 1999) and then come to a halt when it “encounters” Mcp-containing transgenes on different homologs.  The same is true for Mcp-dependent pairing interactions in cis (Fig. 7 in Muller et al. (Muller et al. 1999)) or Mcp-dependent pairing interactions between transgenes inserted on different chromosomes (Fig. 8 in Muller et al. (Muller et al. 1999); Line 8 in Table 3 from Vazquez et al. 2006). 

      These are not the only boundaries that can engage in long-distance pairing.  Mohana et al. (Mohana et al. 2023) identified nearly 60 meta-loops, many of which appear to be formed by the pairing of TAD boundary elements.  Two examples (at 200 bp resolution from 12-16 hr embryos) are shown in Author response image 3.

      Author response image 3.

      Metaloops on the 2nd and 3rd chromosomes: circle-loops and multiple stem-loops

      One of these meta-loops (panel A) is generated by the pairing of two TAD boundaries on the 2nd chromosome.  The first boundary, blue (indicated by the blue arrow), is located at ~2,006,500 bp between a small TAD containing the Nplp4 and CG15353 genes and a larger TAD containing 3 genes, CG33543, Obp22a and Npc2a.  Nplp4 encodes a neuropeptide.  The functions of CG15354 and CG33543 are unknown.  Obp22a encodes an odorant binding protein, while Npc2a encodes the Niemann-Pick type C-2a protein, which is involved in sterol homeostasis.  The other boundary (purple: indicated by the purple arrow) is located between two TADs 2.8 Mb away at 4,794,250 bp.  The upstream TAD contains the fipi gene (CG15630), which has neuronal functions in male courtship, while the downstream TAD contains CG3294, which is thought to be a spliceosome component, and schlaff (slf), which encodes a chitin binding protein.  As illustrated in the accompanying diagram, the blue boundary pairs with the purple boundary in a head-to-head orientation, generating a ~2.8 Mb loop with a circle-loop topology.  As a result of this pairing, the multi-gene (CG33543, Obp22a and Npc2a) TAD upstream of the blue boundary interacts with the CG15630 TAD upstream of the purple boundary.  Conversely, the small Nplp4:CG15353 TAD downstream of the blue boundary interacts with the CG3294:slf TAD downstream of the purple boundary.  Even if one imagined that the cohesin bolo tie clip was somehow able to extrude 2.8 Mb of chromatin and then know to stop when it encountered the blue and purple boundaries, it would have generated a stem-loop, not a circle-loop.

      The second meta-loop (panel B) is more complicated, as it is generated by pairing interactions between four boundary elements.  The blue boundary (blue arrow) located at ~4,801,800 bp (3L) separates a large TAD containing the RhoGEF64C gene from a small TAD containing CG7509, which encodes a predicted subunit of an extracellular carboxypeptidase.  As can be seen in the MicroC contact profile and the accompanying diagram, the blue boundary pairs with the purple boundary (purple arrow), which is located at ~7,013,500 bp (3L) just upstream of the 2nd internal promoter (indicated by the black arrowhead) of the Mp (Multiplexin) gene.  This pairing interaction is head-to-tail and generates a large stem-loop that spans ~2.2 Mb.  The stem-loop brings sequences upstream of the blue boundary and downstream of the purple boundary into contact (the strings below a bolo tie clip), just as was observed in the boundary bypass experiments of Muravyova et al. (Muravyova et al. 2001) and Kyrchanova et al. (Kyrchanova et al. 2008).  The physical interactions result in a box of contacts (right top) between sequences in the large RhoGEF64C TAD and sequences in a large TAD that contains an internal Mp promoter.  The second pairing interaction is between the brown boundary (brown arrow) and the green boundary (green arrow).  The brown boundary is located at ~4,805,600 bp (3L) and separates the TAD containing CG7590 from a large TAD containing CG1808 (predicted to encode an oxidoreductase) and the Dhc64C (Dynein heavy chain 64C) gene.  The green boundary is located at ~6,995,500 bp (3L), and it separates a TAD containing CG32388 and the biniou (bin) transcription factor from a TAD that contains the most distal promoter of the Mp (Multiplexin) gene (blue arrowhead).  As indicated in the diagram, the brown and green boundaries pair with each other head-to-tail, and this generates a small internal loop (the final configuration would resemble a bolo tie with two tie clips).  This small internal loop brings the CG7590 TAD into contact with the TAD that extends from the distal Mp promoter to the 2nd internal Mp promoter.  The resulting contact profile is a rectangular box with diagonal endpoints corresponding to the paired blue:purple and brown:green boundaries.  The pairing of the brown:green boundaries also brings the TADs immediately downstream of the brown boundary and upstream of the green boundary into contact with each other, and this gives a rectangular box of interactions between the Dhc64C TAD and sequences in the bin/CG3238 TAD.  This box is located on the lower left side of the contact map.
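      The contact rules used to read these meta-loops can be restated as simple bookkeeping.  The sketch below is a hypothetical Python encoding with made-up names (`Pairing`, `loop_topology`, `flanking_contacts`); for two paired boundaries L (left) and R (right) on the chromosome, it only records which flanking TADs are brought together under head-to-tail (stem-loop) versus head-to-head (circle-loop) pairing, as described in the text.  It does not model distances, contact frequencies, or the basis of pairing specificity.

```python
from enum import Enum


class Pairing(Enum):
    """Relative orientation of two paired boundaries (left = L, right = R)."""
    HEAD_TO_TAIL = "head-to-tail"
    HEAD_TO_HEAD = "head-to-head"


def loop_topology(pairing: Pairing) -> str:
    # Head-to-tail pairing closes a stem-loop; head-to-head closes a circle-loop.
    return "stem-loop" if pairing is Pairing.HEAD_TO_TAIL else "circle-loop"


def flanking_contacts(pairing: Pairing) -> set[tuple[str, str]]:
    """TADs flanking the paired boundaries that are brought into contact."""
    if pairing is Pairing.HEAD_TO_TAIL:
        # Stem-loop: the TADs outside the loop touch each other (the strings
        # below the bolo tie clip), as do the TADs just inside the loop.
        return {("upstream_L", "downstream_R"), ("downstream_L", "upstream_R")}
    # Circle-loop: the two upstream TADs touch, and so do the two downstream TADs.
    return {("upstream_L", "upstream_R"), ("downstream_L", "downstream_R")}


for p in Pairing:
    print(p.value, loop_topology(p), sorted(flanking_contacts(p)))
```

      Applied to panel A, head-to-head pairing of the blue and purple boundaries predicts upstream:upstream and downstream:downstream contacts, which is the circle-loop pattern described above; applied to the blue:purple and brown:green pairs in panel B, head-to-tail pairing predicts the stem-loop pattern.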

      Since the bin and Mp meta-loops in Author response image 3B are stem-loops, they could have been generated by “sequential” cohesin loop extrusion events.  Besides the fact that cohesin extrusion of 2 Mb of chromatin and breaking through multiple intervening TAD boundaries challenges the imagination, there is no mechanism in the cohesin loop extrusion/CTCF roadblock model to explain why cohesin complex 1 would come to a halt at the purple boundary on one side and the blue boundary on the other, while cohesin complex 2 would instead stop when it hits the brown and green boundaries.  This highlights another problem with the cohesin loop extrusion/CTCF roadblock model, namely that the roadblocks are functionally autonomous: they have an intrinsic ability to block cohesin that is entirely independent of the intrinsic ability of other roadblocks in the neighborhood.  As a result, there is no mechanism for generating specificity in loop formation.  By contrast, boundary pairing interactions are by definition non-autonomous and depend on the ability of individual boundaries to pair with other boundaries: specificity is built into the model.  The mechanism for pairing, and accordingly the basis for partner preferences/specificity, is reasonably well understood.  Probably the most common mechanism in flies is based on shared binding sites for architectural proteins that can form dimers or multimers (Bonchuk et al. 2021; Fedotova et al. 2017).  Flies have a large family of polydactyl zinc finger DNA binding proteins, and as noted above, many of these form dimers or multimers and also function as TAD boundary proteins.  This pairing principle was first discovered by Kyrchanova et al. (Kyrchanova et al. 2008).  This paper also showed that orientation-dependent pairing interactions are a common feature of endogenous fly boundaries.  Another mechanism for pairing is specific protein:protein interactions between different DNA binding factors (Blanton et al. 2003).  Yet a third mechanism would be proteins that bridge different DNA binding proteins together.  The boundaries that use these different mechanisms (BX-C boundaries, scs, scs’) depend upon the same sorts of proteins that are used by homie and nhomie.  Likewise, this same set of factors reappears in one combination or another in most other TAD boundaries.  As for the orientation of pairing interactions, this is most likely determined by the order of binding sites for chromosome architectural proteins in the partner boundaries.

      …and many TADs lack focal 3D interactions between their boundaries.

      (1.3) The idea that flies differ from mammals in that they “lack” focal 3D interactions is simply mistaken.  One of the problems with drawing this distinction is that almost all of the “focal 3D interactions” seen in mammalian Hi-C experiments are a consequence of binning large DNA segments in low resolution restriction enzyme-dependent experiments.  This is even true in the two “high” resolution MicroC experiments that have been published (Hsieh et al. 2020; Krietenstein et al. 2020).  As illustrated above in Author response image 1, most of the “focal 3D interactions” (the dots at the apex of TAD triangles) seen with large bin sizes (1 kb and greater) disappear when the bin size is 200 bp and TADs rather than TAD neighborhoods are being visualized.

      As described in point #1.1, in the MicroC protocol, fixed chromatin is first digested to mononucleosomes by extensive MNase digestion, processed/biotinylated, and ligated to give dinucleosome-length fragments, which are then sequenced.  Regions of chromatin that are nucleosome free (promoters, enhancers, silencers, boundary elements) will typically be reduced to oligonucleotides in this procedure and will not be recovered when dinucleosome-length fragments are sequenced.  The loss of sequences from typical paired boundary elements is illustrated by the Lar meta-loop shown in Author response image 4 (at 200 bp resolution).  Panels A and B show the contact profiles generated when the blue boundary (which separates two TADs that span the Lar (Leukocyte-antigen-related-like) transcription unit) interacts with the purple boundary (which separates two TADs in a gene-poor region ~620 kb away).  The blue and purple boundaries pair with each other head-to-head, and this pairing orientation generates yet another circle-loop.  In the circle-loop topology, sequences in the TADs upstream of both boundaries come into contact with each other, and this gives the small dark rectangular box to the upper left of the paired boundaries (Author response image 4A).  (Note that this small box corresponds to the two small TADs upstream of the blue and purple boundaries, respectively.  See panel B.)  Sequences in the TADs downstream of the two boundaries also come into contact with each other, and this gives the large box to the lower right of the paired boundaries.  While this meta-loop is clearly generated by pairing interactions between the blue and purple boundaries, the interacting sequences are degraded in the MicroC protocol, and sequences corresponding to the blue and purple boundaries aren’t recovered.  This can be seen in panel B (red arrow and red arrowheads).  When a different Hi-C procedure is used (dHS-C) that captures nucleosome-free regions of chromatin that are physically linked to each other (Author response image 4C & D), the sequences in the interacting blue and purple boundaries are recovered and generate a prominent “dot” at their physical intersection (blue arrow in panel D).

      Author response image 4.

      Lar metaloop. Panels A & B: MicroC. Panels C & D: dHS-C

      While sequences corresponding to the blue and purple boundaries are lost in the MicroC procedure, there is at least one class of elements that engage in physical pairing interactions whose sequences are (comparatively) resistant to MNase digestion.  This class of elements includes many PREs ((Kyrchanova et al. 2018); unpublished data), the boundary bypass elements in the Abd-B region of BX-C (Kyrchanova et al. 2023; Kyrchanova et al. 2019a; Kyrchanova et al. 2019b; Postika et al. 2018), and “tethering” elements (Batut et al. 2022; Li et al. 2023).  In all of the cases tested, these elements are bound in nuclear extracts by a large (>1000 kDa) GAGA factor-containing multiprotein complex called LBC.  LBC also binds to the hsp70 and eve promoters (unpublished data).  Indirect end-labeling experiments (Galloni et al. 1993; Samal et al. 1981; Udvardy and Schedl 1984) indicate that the LBC protects a ~120-180 bp DNA segment from MNase digestion.  It is likely that this is the reason why LBC-bound sequences can be recovered in MicroC experiments as dots when they are physically linked to each other.  One such example (based on the ChIP signatures of the paired elements) is indicated by the green arrow in panels B and D of Author response image 4.  Note that there are no dots corresponding to these two LBC elements within either of the TADs immediately downstream of the blue and purple boundaries.  Instead, the sequences corresponding to the two LBC elements are only recovered when the two elements pair with each other over a distance of ~620 kb.  The fact that these two elements pair with each other is consistent with other findings which indicate that, like classical boundaries, LBC elements exhibit partner preferences.  In fact, LBC elements can sometimes function as TAD boundaries.  For example, the Fab-7 boundary has two LBC elements, and full Fab-7 boundary function can be reconstituted with just these two elements (Kyrchanova et al. 2018).

      Reviewer #2 (Public Review):

      "Chromatin Structure II: Stem-loops and circle-loops" by Ke*, Fujioka*, Schedl, and Jaynes reports a set of experiments and subsequent analyses focusing on the role of Drosophila boundary elements in shaping 3D genome structure and regulating gene expression. The authors primarily focus on the region of the fly genome containing the even skipped (eve) gene; eve is expressed in a canonical spatial pattern in fly embryos and its locus is flanked by the well-characterized neighbor of homie (nhomie) and homie boundary elements. The main focus of investigation is the orientation dependence of these boundary elements, which had been observed previously using reporter assays. In this study, the authors use CRISPR/Cas9 editing followed by recombination-mediated cassette exchange to create a series of recombinant fly lines in which the nhomie boundary element is either replaced with exogenous sequence from phage 𝝀, an inversion of nhomie, or a copy of homie that has the same orientation as the endogenous homie sequence. The nhomie sequence is also regenerated in its native orientation to control for effects introduced by the transgenesis process.

      The authors then perform high-resolution Micro-C to analyze 3D structure and couple this with fluorescent and colorimetric RNA in situ hybridization experiments to measure the expression of eve and nearby genes during different stages of fly development. The major findings of these experiments are that total loss of the boundary sequence (replacement with λ DNA) results in major changes in 3D structure and the most prominent changes in gene expression, while inversion of the nhomie boundary or replacement with homie results in more modest changes in both 3D structure and gene expression, with a pattern of gene expression change distinct from that of the λ DNA replacement. As the samples in which the nhomie boundary is inverted or replaced with homie have similar Micro-C profiles at the eve locus and show similar patterns of spurious gene activation relative to the control, the observed effects appear to be driven by the relative orientation of the nhomie and homie boundary elements to one another.

      Collectively, the findings reported in the manuscript are of broad interest to the 3D genome field. Although extensive work has gone into characterizing the patterns of 3D genome organization in a whole host of species, the underlying mechanisms that structure genomes and their functional consequences are still poorly understood. Perhaps the best understood system, mechanistically, is the coordinated action of CTCF with the cohesin complex, which in vertebrates appears to shape 3D contact maps through a loop extrusion-pausing mechanism that relies on orientation-dependent sequence elements found at the boundaries of interacting chromatin loops.

      (2.1) The notion that the mammalian genome is shaped in 3D by the coordinated action of cohesin and CTCF has achieved the status of dogma in the field of chromosome structure in vertebrates.  However, as we have pointed out in #1.1, the evidence supporting this dogma is far from convincing.  To begin with, it is based on low-resolution Hi-C experiments that rely on large bin sizes to visualize so-called “TADs.”  In fact, the notion that cohesin/CTCF are responsible on their own for shaping the mammalian 3D genome appears to be the result of mistaking a series of forests for the actual trees that populate each of the forests.

      As illustrated in Author response image 1 above, the “TADs” that are visualized in these low resolution data sets are not TADs at all, but rather TAD neighborhoods consisting of several dozen or more individual TADs.  Moreover, the “interesting” features that are evident at low resolution (>1 kb)—the dots and stripes—largely disappear at resolutions appropriate for visualizing individual TADs (~200 bp).

      Here we consider one of the key experiments in Goel et al. (Goel et al. 2023).  In this experiment, the authors used RCMC to generate high-resolution (~250 bp) MicroC contact maps before and after Rad21 depletion.  Contrary to dogma, Rad21 depletion has absolutely no effect on TADs in a ~250 kb DNA segment, and these TADs look very much like the TADs we observe in the Drosophila genome, in particular in the Abd-B region of BX-C that is thought to be assembled into a series of circle-loops (see Fig. 2B).

      While Goel et al. (Goel et al. 2023) observed no effect of Rad21 depletion on TADs, they found that loss of Rad21 disturbs long-distance (but not short-distance) contacts in large TAD neighborhoods when their RCMC data set is visualized using bin sizes of 5 kb and 1 kb.  This is shown in Author response image 2.  The significance of this finding is, however, uncertain.  It could mean that the 3D organization of large TAD neighborhoods has a special requirement for cohesin activity.  On the other hand, since cohesin functions to hold sister chromosomes together after replication until they separate during mitosis (and might also participate in mitotic condensation), it is also possible that the loss of long-range contacts in large TAD neighborhoods when Rad21 is depleted simply reflects this particular activity.  Further studies will be required to address these possibilities.

      As for CTCF: a careful inspection of the ChIP data in Goel et al. 2023 indicates that CTCF is not found at each and every TAD boundary.  In fact, the notion that CTCF is the be-all and end-all of TAD boundaries in mammals is truly hard to fathom.  For one, the demands for specificity in TAD formation (and in regulatory interactions) are likely much greater than those in flies, and specificity can’t be generated by a single DNA binding protein.  For another, several dozen chromosomal architectural proteins have already been identified in flies.  This means that (unlike what is thought to be true in mammals) it is possible to use a combinatorial mechanism to generate specificity in, for example, the long-distance interactions in RFig 6 and 7.  As noted in #2.1 above, many of the known chromosomal architectural proteins in flies are polydactyl zinc finger proteins (just like CTCF).  There are some 200 different polydactyl zinc finger proteins in flies, and the function of only a handful of these is known at present.  However, it seems likely that a reasonable fraction of this class of DNA binding proteins will ultimately turn out to have an architectural function of some type (Bonchuk et al. 2021; Fedotova et al. 2017).  The number of different polydactyl zinc finger protein genes in mammals is nearly 3 times that of flies.  Is it really possible that of these, only CTCF is involved in shaping the 3D structure of the mammalian genome?

      Despite having a CTCF paralog and cohesin, the Drosophila genome does not appear to be structured by loop extrusion-pausing. The identification of orientation-dependent elements with pronounced structural effects on genome folding thus may shed light on alternative mechanisms used to regulate genome structure, which in turn may yield insights into the significance of particular folding patterns.

      (2.2) Here we would like to draw the reviewer’s and reader’s attention to Author response image 3, which shows that orientation-dependent pairing interactions have a significant impact on physical interactions between different sequences.  We would also refer the reader to two other publications.  One of these is Kyrchanova et al. (Kyrchanova et al. 2008), which was the first to demonstrate that orientation of pairing interactions matters.  The second is Fujioka et al. (Fujioka et al. 2016), which describes experiments indicating that nhomie and homie pair with each other head-to-tail and with themselves head-to-head.

      On the whole, this study is comprehensive and represents a useful contribution to the 3D genome field. The transgenic lines and Micro-C datasets generated in the course of the work will be valuable resources for the research community. Moreover, the manuscript, while dense in places, is generally clearly written and comprehensive in its description of the work. However, I have a number of comments and critiques of the manuscript, mainly centering on the framing of the experiments, the presentation of the Micro-C results, and the manner in which the data are analyzed and reported. They are as follows:

      Major Points:

      (1) The authors motivate much of the introduction and results with hypothetical "stem loop" and "circle loop" models of chromosome conformation, which they argue are reflected in the Micro-C data and help to explain the observed ISH patterns. While such structures may possibly form, the support for these specific models vs. the many alternatives is not in any way justified. For instance, no consideration is given to important biophysical properties such as persistence length, packing/scaling, and conformational entropy. As the biophysical properties of chromatin are a heavily trafficked topic, both in terms of experimentation and computational modeling, and are generally considered in the analysis of chromosome conformation data, the study would be strengthened by acknowledgement of this body of work and more direct integration of its findings.

      (2.3) The reviewer is not correct in claiming that “stem-loops” and “circle-loops” are “hypothetical.”  There is ample evidence that both types of loops are present in eukaryotic genomes, and that loop conformation has significant readouts in terms of not only the physical properties of TADs but also their functional properties.  Here we would draw the reviewer’s attention to Author response image 3 and Author response image 4 for examples of loops formed by the orientation-dependent pairing of yet other TAD boundary elements.  As evident from the MicroC data in these figures, circle-loops and stem-loops have readily distinguishable contact patterns.  The experiments in Fujioka et al. (Fujioka et al. 2016) demonstrate that homie and nhomie pair with each other head-to-tail, while they pair with themselves head-to-head.  The accompanying paper (Bing et al. 2024) also provides evidence that loop topology is reflected both in the pattern of activation of reporters and in the MicroC contact profiles.  We would also mention again Kyrchanova et al. (Kyrchanova et al. 2008), who were the first to report orientation-dependent pairing of endogenous fly boundaries.

      At this juncture it would be premature to try to incorporate computational modeling of chromosome conformation into our studies.  The reason is that the experimental foundations that would be essential for building accurate models are lacking.  As should be evident from RFigs. 1-3 above, studies on mammalian chromosomes are simply not of high enough resolution to draw firm conclusions about chromosome conformation: in most studies only the forests are visible.  While the situation is better in flies, there are still too many unknowns.  As just one example, it would be important to know the orientation of the boundary pairing interactions that generate each TAD.  While it is possible to infer loop topology from how TADs interact with their neighbors (a plume versus clouds), a conclusive identification of stem- and circle-loops will require a method to unambiguously determine whether a TAD boundary pairs with its neighbor head-to-head or head-to-tail.

      (2) Similar to Point 1, while there is a fair amount of discussion of how the observed results are or are not consistent with loop extrusion, there is no discussion of the biophysical forces that are thought to underlie compartmentalization, such as block-polymer co-segregation, and their potential influence. I found this absence surprising, as it is generally accepted that A/B compartmentalization essentially can explain the contact maps observed in Drosophila and other non-vertebrate eukaryotes (Rowley, ..., Corces 2017; PMID 28826674). The manuscript would be strengthened by consideration of this phenomenon.

      (2.4) Compartments in mammals have typically been identified and characterized using low-resolution data sets, and these studies have relied on visualizing compartments using quite large bin sizes (>>1 kb).  Our experiments have nothing to do with the large-scale compartments seen in these Hi-C experiments.  Instead, we are studying the properties of individual TADs: how TADs are formed, the relationship between TAD topology and boundary:boundary pairing, and the impact of TAD topology on interactions between TADs in the immediate neighborhood.  There is no evidence to date that these large compartments or “block polymer co-segregation” a) have any impact on the properties of individual boundary elements, b) have a role in determining which boundary elements actually come together to form a given TAD, c) impact the orientation of the interactions between boundaries that generate the TAD, or d) determine how TADs tend to interact with their immediate neighbors.

      In more recent publications (c.f., Harris et al. 2023) compartments have shrunk in size: instead of being units of several hundred kb, the median length of the “compartmental” unit in mammalian cells is about 12 kb.  This is not too different from the size of fly TADs.  However, the available evidence does not support the idea that block polymer co-segregation/co-repulsion drives the TAD:TAD interactions seen in MicroC experiments.  For example, according to this “micro-compartment” model, the specific patterns of interaction between TADs in the CG3294 meta-loop in Author response image 3 would be driven by block polymer co-segregation and co-repulsion. In this model, the TAD upstream of the blue boundary (which contains CG33543, the odorant binding protein gene Obp22a, and the Npc2a gene, which encodes a protein involved in sterol homeostasis) would share the same chromatin state/biophysical properties as the TAD upstream of the purple boundary, which has the fipi gene. While it is true that CG33543, Obp22a and also the fipi gene are not expressed in embryos, Npc2a is expressed at high levels during embryogenesis, yet it is part of the TAD that interacts with the fipi TAD.  The TAD downstream of the blue boundary contains CG15353 and Nplp4, and it interacts with the TAD downstream of the purple boundary, which contains CG3294 and slf.  CG15353 and Nplp4 are not expressed in the embryo and as such should share a compartment with a TAD that is also silent. However, slf is expressed at a high level in 12-16 hr embryos, while CG3294 is expressed at a low level.  In neither case would one conclude that the TADs upstream and downstream of the blue and purple boundaries, respectively, interact because of shared chromatin/biophysical states that drive block polymer co-segregation/co-repulsion.

      One might also consider several gedanken experiments involving the long-range interactions that generate the CG3294 meta-loop in Author response image 3.  According to the micro-compartment model, the patchwork pattern of crosslinking evident in the CG3294 meta-loop arises because the interacting TADs share the same biochemical/biophysical properties, and this drives block polymer co-segregation and co-repulsion.  If this model is correct, then this patchwork pattern of TAD:TAD interactions would remain unchanged if we were to delete the blue or the purple boundary.  However, given what we know about how boundaries can find and pair with distant boundaries (c.f., Figure 6 from Muller et al. 1999 and the discussion in #1.2), the results of these gedanken experiments seem clear: the patchwork pattern shown in Author response image 3A will disappear.  What would happen if we inverted the blue or the purple boundary? Would the TAD containing CG33543, Obp22a and Npc2a still interact with fipi, as would be expected from the compartment model?  Or would the pattern of interactions flip so that the CG33543, Obp22a and Npc2a TAD interacts with the TAD containing CG3294 and slf?  Again we can anticipate the results based on previous studies: the interacting TADs will switch when the CG3294 meta-loop is converted into a stem-loop.  If this happened, the only explanation possible in the compartment model is that the chromatin states change when the boundary is inverted, so that the TAD upstream of the blue boundary now shares the same chromatin state as the TAD downstream of the purple boundary, while the TAD downstream of the blue boundary shares the same state as the TAD upstream of the purple boundary.  However, there is no evidence that boundary orientation per se can induce a complete switch in “chromatin states” as would be required in the compartment model.

      While we have not done these experimental manipulations with the CG3294 meta-loop, an equivalent experiment was done in Bing et al. (Bing et al. 2024).  However, instead of deleting a boundary element, we inserted a homie boundary element together with two reporters (gfp and LacZ) 142 kb away from the eve TAD.  The result of this gedanken “reverse boundary deletion” experiment is shown in Author response image 5.  Panel A shows the MicroC contact profile in the region spanning the transgene insertion site and the eve TAD in wild type (read “deletion”) NC14 embryos.  Panel B shows the MicroC contact profile from 12-16 hr embryos carrying the homie dual reporter transgene inserted at -142 kb.  Prior to the “deletion”, the homie element in the transgene pairs with nhomie and homie in the eve TAD, and this generates a “mini-metaloop.”  In this particular insert, the homie boundary in the transgene (red arrow) is “pointing” in the opposite orientation from the homie boundary in the eve TAD (red arrow).  In this orientation, the pairing of the transgene homie with eve nhomie/homie brings the LacZ reporter into contact with sequences in the eve TAD.  Since a mini-metaloop is formed by homie → nhomie/homie pairing, sequences in TADs upstream and downstream of the transgene insert interact with sequences in TADs close to the eve TAD (Author response image 5B).  Taken together, these interactions correspond to the interaction patchwork that is typically seen in “compartments” (see boxed region and inset).  If this patchwork is driven, as per the model, by block polymer co-segregation and co-repulsion, then it should still be present when the transgene is deleted.  However, panel A shows that the interactions linking the transgene and the sequences in TADs next to the transgene to eve and TADs next to eve disappear when the homie boundary (plus transgene) is “deleted” in wild type flies.

      Author response image 5.

      Boundary deletion and compartments

      A second experiment would be to invert the homie boundary so that instead of pointing away from eve it points towards eve.  Again, if the compartmental patchwork is driven by block polymer co-segregation and co-repulsion, inverting the homie boundary in the transgene should have no effect on the compartmental contact profile.  Inspection of Fig. 7 in Bing et al. (Bing et al. 2024) will show that this prediction doesn’t hold either.  When homie is inverted, sequences in the eve TAD interact with the gfp reporter not the LacZ reporter.  In addition, there are corresponding changes in how sequences in TADs to either side of eve interact with sequences to either side of the transgene insert.  

      Yet another “test” of compartments generated by block polymer co-segregation/co-repulsion is provided by the plume above the eve volcano triangle.  According to the compartment model, sequences in TADs flanking the eve locus form the plume above the eve volcano triangle because their chromatin shares properties that drive block polymer co-segregation.  These same properties result in repulsive interactions with chromatin in the eve TAD, and this would explain why the eve TAD doesn’t crosslink with its neighbors.  If the distinctive chromatin properties of eve and the neighboring TADs drive block polymer co-segregation and co-repulsion, then inverting the nhomie boundary or introducing homie in the forward orientation should have absolutely no effect on the physical interactions between chromatin in the eve TAD and chromatin in the neighboring TADs.  However, Figures 4 and 6 in this paper indicate that boundary pairing orientation, not block polymer co-segregation/co-repulsion, is responsible for forming the plume above the eve TAD. Other findings also appear to be inconsistent with the compartment model. A) The plume topping the eve volcano triangle is present in NC14 embryos when eve is broadly expressed (and potentially active throughout the embryo).  It is also present in 12-16 hr embryos when eve is only expressed in a very small subset of cells and is subject to PcG silencing everywhere else in the embryo.  B) According to the compartment model, the precise patchwork pattern of physical interactions should depend upon the transcriptional program/chromatin state that is characteristic of a particular developmental stage or cell type.  As cell fate decisions are just being made during NC14, one might expect that most nuclei will share similar chromatin states throughout much of the genome.  This would not be true for 12-16 hr embryos.  At this stage the compartmental patchwork would be generated by a complex mixture of interactions in cells that have quite different transcriptional programs and chromatin states.  In this case, the patchwork pattern would be expected to become fuzzy, as a given chromosomal segment would be in compartment A in one group of cells and in compartment B in another.  Unlike 12-16 hr embryos, larval wing discs would be much more homogeneous and would likely give a distinct and relatively well resolved compartmental pattern. We’ve examined the compartment patchwork of the same chromosomal segments in NC14 embryos, 12-16 hr embryos and larval wing disc cells.  While there are some differences (e.g., changes in some of the BX-C TADs in the wing disc sample), the compartmental patchwork patterns are surprisingly similar in all three cases. Nor is there any “fuzziness” in the compartmental patterns evident in 12-16 hr embryos, despite the fact that there are many different cell types at this stage of development.  C) TAD interactions with their neighbors and compartmental patchworks are substantially suppressed in salivary gland polytene chromosomes.  This would suggest that features of chromosome structure might be the driving force behind many of the “compartmental” interactions, as opposed to distinct biochemical/biophysical properties of small chromosomal segments that drive polymer co-segregation/co-repulsion.

      (3) The contact maps presented in the study represent many cells and distinct cell types. It is clear from single-cell Hi-C and multiplexed FISH experiments that chromosome conformation is highly variable even within populations of the same cell, let alone between cell types, with structures such as TADs being entirely absent at the single cell level and only appearing upon pseudobulking. It is difficult to square these observations with the models of relatively static structures depicted here. The authors should provide commentary on this point.

      (2.5) As should be evident from Author response image 1, single-cell Hi-C experiments would not provide useful information about the physical organization of individual TADs, TAD boundaries or how individual TADs interact with their immediate neighbors.  In addition, since they capture only a very small fraction of the possible contacts within and between TADs, we suspect that these single-cell studies aren’t likely to be useful for making solid conclusions about TAD neighborhoods like those shown in Author response image 1 panels A, B, C and D, or Author response image 2.  While it might be possible to discern relatively stable contacts between pairs of insulators in single cells with the right experimental protocol, the stabilities/dynamics of these interactions may be better judged by the length of time that physical interactions are seen to persist in live imaging studies such as Chen et al. (2018), Vazquez et al. (2006) and Li et al. (2011).

      The in situ FISH data we’ve seen also seem problematic in that probe hybridization results in a significant decondensation of chromatin.  For two probe sets complementary to adjacent ~1.2 kb DNA sequences, the measured center-to-center distance that we’ve seen was ~110 nm.  This is between 1/4 and 1/3 of the length expected for a 1.2 kb naked DNA fragment, and about 1.8 times larger than that expected for a beads-on-a-string nucleosome array (~60 nm).  However, chromatin is thought to be compacted into a 30 nm fiber, which is estimated to reduce the length of DNA by at least another ~6 fold.  If this estimate is correct, FISH hybridization would appear to result in a ~10 fold decompaction of chromatin.  A decompaction of this magnitude would necessarily be accompanied by a significant distortion in the actual conformation of chromatin loops.
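      The back-of-the-envelope estimate above can be retraced in a few lines of Python. This is only a sketch: the 0.34 nm/bp helical rise of B-form DNA is a standard value we assume here, while the 110 nm measured distance, the ~60 nm beads-on-a-string length and the ~6-fold additional compaction in a 30 nm fiber are the figures quoted in the paragraph. The function name compaction_estimate is hypothetical.

```python
"""Back-of-the-envelope check of the FISH decompaction estimate (sketch)."""

# Assumed standard value (not from the response): helical rise of B-form DNA.
RISE_NM_PER_BP = 0.34


def compaction_estimate(separation_bp: float = 1200.0,
                        measured_nm: float = 110.0,
                        beads_nm: float = 60.0,
                        fiber_fold: float = 6.0) -> dict:
    """Compare a measured FISH probe distance with expected chromatin lengths.

    separation_bp: centre-to-centre separation of adjacent ~1.2 kb probe sets
    measured_nm:   measured centre-to-centre distance after hybridization
    beads_nm:      expected length of a beads-on-a-string nucleosome array
    fiber_fold:    additional compaction attributed to a 30 nm fiber
    """
    naked_nm = separation_bp * RISE_NM_PER_BP  # length of naked B-DNA
    fiber_nm = beads_nm / fiber_fold           # expected length in a 30 nm fiber
    return {
        "naked_nm": naked_nm,
        "fraction_of_naked": measured_nm / naked_nm,
        "ratio_to_beads": measured_nm / beads_nm,
        "fiber_nm": fiber_nm,
        "decompaction_vs_fiber": measured_nm / fiber_nm,
    }


if __name__ == "__main__":
    for key, value in compaction_estimate().items():
        print(f"{key}: {value:.2f}")
```

      With these inputs, naked DNA spanning 1.2 kb would be about 408 nm long, so the measured 110 nm is roughly 0.27 of that length, about 1.8 times the beads-on-a-string length, and roughly 11-fold longer than the ~10 nm expected for a 30 nm fiber.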

      (4) The analysis of the Micro-C data appears to be largely qualitative. Key information about the number of reads sequenced, reads mapped, and data quality is not presented. No quantitative framework for identifying features such as the "plumes" is described. The study and its findings would be strengthened by a more rigorous analysis of these rich datasets, including the use of systematic thresholds for calling patterns of organization in the data.

      Additional information on the number of reads and data quality have been included in the methods section. 

      (5) Related to Point 4, the lack of quantitative details about the Micro-C data make it difficult to evaluate if the changes observed are due to biological or technical factors. It is essential that the authors provide quantitative means of controlling for factors like sampling depth, normalization, and data quality between the samples.

      In our view the changes in the MicroC contact patterns for the eve locus and its neighbors when the nhomie boundary is manipulated are not only clear cut and unambiguous but are also readily evident in the Figs that are presented in the manuscript.  If the reviewer believes that there aren’t significant differences between the MicroC contact patterns for the four different nhomie replacements, it seems certain that they would also remain unconvinced by a quantitative analysis.

      The reviewer also suggests that biological and/or technical differences between the four samples could account for the observed changes in the MicroC patterns for the eve TAD and its neighbors.  If this were the case, then similar changes in MicroC patterns should be observed elsewhere in the genome.  Since much of the genome is analyzed in these MicroC experiments, there is an abundance of internal controls for each experimental manipulation of the nhomie boundary.  For two of the nhomie replacements, nhomie reverse and homie forward, the plume above the eve volcano triangle is replaced by clouds surrounding the eve volcano triangle.  If these changes in the eve MicroC contact patterns are due to significant technical (or biological) factors, we should observe precisely the same sorts of changes in TADs elsewhere in the genome that are volcano triangles with plumes.  Author response image 6 shows the MicroC contact pattern for several genes in the Antennapedia complex.  The deformed gene is included in a TAD which, like eve, is a volcano triangle topped by a plume.  A comparison of the deformed MicroC contact patterns for nhomie forward (panel B) with the MicroC patterns for nhomie reverse (panel C) and homie forward (panel D) indicates that while there are clearly technical differences between the samples, these differences do not result in the conversion of the deformed plume into clouds as is observed for the eve TAD.  The MicroC patterns elsewhere in the Antennapedia complex are also very similar in all four samples.  Likewise, comparisons of regions elsewhere in the fly genome indicate that the basic contact patterns are similar in all four samples.  So while there are technical differences, which are reflected in the relative pixel density in the TAD triangles and the LDC domains, these differences do not convert plumes into clouds, nor do they alter the basic patterns of TAD triangles and LDC domains.
      As for biological differences, the embryos in each sample are at roughly the same developmental stage and were collected and processed using the same procedures. Thus, the biological factors that could reasonably be expected to impact the organization of specific TADs (e.g., cell type specific differences) are not going to impact the patterns we see in our experiments.

      Author response image 6.

      (6) The ISH effects reported are modest, especially in the case of the HCR. The details provided for how the imaging data were acquired and analyzed are minimal, which makes evaluating them challenging. It would strengthen the study to provide much more detail about the acquisition and analysis and to include depiction of intermediates in the analysis process, e.g. the showing segmentation of stripes.

      The imaging analysis presented in Fig. 5 is just standard confocal microscopy.  Individual embryos were visualized and scored.  An embryo in which stripes could be readily detected was scored as ‘positive,’ while an embryo in which stripes couldn’t be detected was scored as ‘negative.’

      Recommendations for the authors:

      Editor comments:

      It was noted that the Jaynes lab previously published extensive genetic evidence to support the stem loop and circle loop models of Homie-Nhomie interactions (Fujioka 2016 Plos Genetics) that were more convincing than the Micro-C data presented here in proof of their prior model. Maybe the authors could more clearly summarize their prior genetic results to further try to convince the reader about the validity of their model.

      Reviewer #1 (Recommendations For The Authors):

      Below, I list specific comments to further improve the manuscript for publication. Most importantly, I recommend the authors tone down their proposal that boundary pairing is a universal TAD forming mechanism.

      (1) The title is cryptic.

      (2) The second sentence in the abstract is an overstatement: "In flies, TADs are formed by physical interactions between neighboring boundaries". Hi-C and Micro-C studies have not provided evidence that most TADs in Drosophila show focal interactions between their bracketing boundaries. The authors rely too strongly on prior studies that used artificial reporter transgenes to show that multimerized insulator protein binding sites or some endogenous fly boundaries can mediate boundary bypass, as evidence that endogenous boundaries pair.

      Please see responses #1.1 and #1.3 and figures Author response image 1 and Author response image 3.  Note that using dHS-C, most TADs that we’ve looked at so far are topped by a “dot” at their apex.

      (3) Line 64: the references do not cite the stated "studies dating back to the '90's'".

      The papers cited for that sentence are reviews which discussed the earlier findings.  The relevant publications are cited at the appropriate places in the same paragraph.  

      (4) Line 93: "On the other hand, while boundaries have partner preferences, they are also promiscuous in their ability to establish functional interactions with other boundaries." It was unclear what is meant here.

      Boundaries that a) share binding sites for proteins that multimerize, b) have binding sites for proteins that interact with each other, or c) have binding sites for proteins that can be bridged by a third protein can potentially pair with each other.  However, while these mechanisms enable promiscuous pairing interactions, they will also generate partner preferences (boundaries that share a greater number of a, b and/or c interactions will be favored partners).

      (5) It could be interesting to discuss the fact that it remains unclear whether Nhomie and Homie pair in cis or in trans, given that homologous chromosomes are paired in Drosophila.

      The studies in Fujioka et al. (Fujioka et al. 2016) show that nhomie and homie can pair both in cis and in trans.  Given the results described in #1.2, we imagine that they are paired in both cis and trans in our experiments.

      (6) Line 321: Could the authors further explain why they think that "the nhomie reverse circle-loop also differs from the nhomie deletion (λ DNA) in that there is not such an obvious preference for which eve enhancers activate expression"?

      The likely explanation is that the topology/folding of the altered TADs impacts the probability of interactions between the various eve enhancers and the promoters of the flanking genes.  

      (7) The manuscript would benefit from shortening the long Discussion by avoiding repeating points described previously in the Results.

      (8) Line 495: "If, as seems likely, a significant fraction of the TADs genome-wide are circle loops, this would effectively exclude cohesin-based loop extrusion as a general mechanism for TAD formation in flies". The evidence provided in this manuscript appears insufficient to discard ample evidence from multiple laboratories that TADs form by compartmentalization or loop extrusion. Multiple laboratories have, for example, demonstrated that cohesin depletion disrupts a large fraction of mammalian TADs. 

      Points made here and in #9 have been responded to in #1.1, #2.1 and #2.4 above. We would suggest that the evidence for loop extrusion falls short of compelling (as it is based on the analysis of TAD neighborhoods, not TADs; that is, forests, not trees) and, given the results reported in Goel et al. (in particular Fig. 4 and Sup Fig. 8), is clearly suspect. This is not to mention the fact that cohesin loop extrusion can’t generate circle-loop TADs, yet circle-loops clearly exist. Likewise, as discussed in #2.4, it is not clear to us that shared chromatin states, polymer co-segregation and co-repulsion account for the compartmental patchwork patterns of TAD:TAD interactions. The results from the experimental manipulations in this paper and the accompanying paper, together with studies by others (e.g., Kyrchanova et al. 2008; Mohana et al. 2023), would also seem to be at odds with the model for compartments as currently formulated.

      The unique properties of Nhomie and Homie, namely the remarkable specificity with which they physically pair over large distances (Fujioka et al. 2016) may rather suggest that boundary pairing is a phenomenon restricted to special loci. Moreover, it has not yet been demonstrated that Nhomie or Homie are also able to pair with the TAD boundaries on their left or right, respectively.

      Points made here were discussed in detail in #1.2. As described there, it is not the case that nhomie and homie are “unique” or “special.” Other fly boundaries can do the same things. As for whether nhomie and homie pair with their neighbors: we haven’t done transgene experiments (e.g., testing by transvection or boundary bypass). Likewise, in MicroC experiments there are no obvious dots at the apex of the neighboring TADs that would correspond to nhomie pairing with the neighboring boundary to the left and homie pairing with the neighboring boundary to the right. However, this is to be expected. As we discussed in #1.3 above, only MNase-resistant elements will generate dots in standard MicroC experiments. On the other hand, when boundary:boundary interactions are analyzed by dHS-C (c.f., Author response image 4), there are dots at the apex of both neighboring TADs. This would be direct evidence that nhomie pairs with the neighboring boundary to the left and homie pairs with the neighboring boundary to the right.

      (9) The comment in point 8 also applies to the concluding 2 sentences (lines 519-524) of the Discussion.

      See response to 8 above. Otherwise, the concluding sentences are completely accurate. Validation of the cohesin loop extrusion/CTCF roadblock model will require demonstrating a) that all TADs are either stem-loops or unanchored loops and b) that TAD endpoints are always marked by CTCF.

      The likely presence of circle-loops, together with evidence for TAD boundaries that lack CTCF (c.f., Goel et al. 2023), already suggests that this model can’t fully (and perhaps can’t at all) account for TAD formation in mammals.

      (10) Figs. 3 and 6: It would be helpful to add the WT screenshot in the same figure, for direct comparison.

      It is easy enough to scroll between Figs., especially since nhomie forward looks just like WT.

      (11) Fig. 6: It would be helpful to show a cartoon view of a circle loop to the right of the Micro-C screenshot, as was done in Fig. 3.

      Good idea. Added to the Fig.

      (12) Fig. 5: It would be helpful to standardize the labelling of the different genotypes throughout the figures and panels ("inverted" versus "reverse" versus an arrow indicating the direction).

      Fixed.

      Reviewer #2 (Recommendations For The Authors):

      Minor Points:

      (1) The Micro-C data does not appear to be deposited in an appropriate repository. It would be beneficial to the community to make these data available in this way.

      This has been done.

      (2) Readers not familiar with Drosophila development would benefit from a gentle introduction to the stages analyzed and some brief discussion on how the phenomenon of somatic homolog pairing might influence the study, if at all.

      We included a rough description of the stages that were analyzed for both the in situs and MicroC. We thought that an actual description of what is going on at each of the stages wasn’t necessary, as the process of development is not a focus of this manuscript. In other studies, we’ve found that there are only minor differences in MicroC patterns between the blastoderm stage and stage 12-16 embryos. While these minor differences are clearly interesting, we didn’t discuss them in the text. In all of our experiments, chromosomes are likely to be paired. In NC14 embryos (the stage for visualizing eve stripes and the MicroC contact profiles in Fig. 2), replication of euchromatic sequences is thought to be quite rapid. While homolog pairing is incomplete at this stage, sister chromosomes are paired. In stage 12-16 embryos, homologs will be paired, and if the cells are arrested in G2, then sister chromosomes will also be paired. So in all of our experiments, chromosomes (sisters and/or homologs) are paired. However, since we don’t have examples of unpaired chromosomes, our experiments don’t provide any information on how chromosome pairing might impact MicroC/expression patterns.

      (3) "P > 0.01" appears several times. I believe the authors mean to report "P < 0.01".

      Fixed.  

      References for Response

      Batut PJ, Bing XY, Sisco Z, Raimundo J, Levo M, Levine MS. 2022. Genome organization controls transcriptional dynamics during development. Science. 375(6580):566-570.

      Bing X, Ke W, Fujioka M, Kurbidaeva A, Levitt S, Levine M, Schedl P, Jaynes JB. 2024. Chromosome structure I: Loop extrusion or boundary:boundary pairing? eLife.

      Blanton J, Gaszner M, Schedl P. 2003. Protein:Protein interactions and the pairing of boundary elements in vivo. Genes Dev. 17(5):664-675.

      Bonchuk A, Boyko K, Fedotova A, Nikolaeva A, Lushchekina S, Khrustaleva A, Popov V, Georgiev P. 2021. Structural basis of diversity and homodimerization specificity of zinc finger-associated domains in drosophila. Nucleic Acids Res. 49(4):2375-2389.

      Bonchuk A, Kamalyan S, Mariasina S, Boyko K, Popov V, Maksimenko O, Georgiev P. 2020. N-terminal domain of the architectural protein ctcf has similar structural organization and ability to self-association in bilaterian organisms. Sci Rep. 10(1):2677.

      Chen H, Levo M, Barinov L, Fujioka M, Jaynes JB, Gregor T. 2018. Dynamic interplay between enhancer–promoter topology and gene activity. Nat Genet. 50(9):1296.

      Fedotova AA, Bonchuk AN, Mogila VA, Georgiev PG. 2017. C2h2 zinc finger proteins: The largest but poorly explored family of higher eukaryotic transcription factors. Acta Naturae. 9(2):47-58.

      Fujioka M, Ke W, Schedl P, Jaynes JB. 2024. The homie insulator has sub-elements with different insulating and long-range pairing properties. bioRxiv. 2024.02.01.578481.

      Fujioka M, Mistry H, Schedl P, Jaynes JB. 2016. Determinants of chromosome architecture: Insulator pairing in cis and in trans. PLoS Genet. 12(2):e1005889.

      Galloni M, Gyurkovics H, Schedl P, Karch F. 1993. The bluetail transposon: Evidence for independent cis‐regulatory domains and domain boundaries in the bithorax complex. The EMBO Journal. 12(3):1087-1097.

      Gaszner M, Vazquez J, Schedl P. 1999. The zw5 protein, a component of the scs chromatin domain boundary, is able to block enhancer-promoter interaction. Genes Dev. 13(16):2098-2107.

      Goel VY, Huseyin MK, Hansen AS. 2023. Region capture micro-c reveals coalescence of enhancers and promoters into nested microcompartments. Nat Genet. 55(6):1048-1056.

      Harris HL, Gu H, Olshansky M, Wang A, Farabella I, Eliaz Y, Kalluchi A, Krishna A, Jacobs M, Cauer G et al. 2023. Chromatin alternates between a and b compartments at kilobase scale for subgenic organization. Nat Commun. 14(1):3303.

      Hart CM, Zhao K, Laemmli UK. 1997. The scs' boundary element: Characterization of boundary element-associated factors. Mol Cell Biol. 17(2):999-1009.

      Hsieh TS, Cattoglio C, Slobodyanyuk E, Hansen AS, Rando OJ, Tjian R, Darzacq X. 2020. Resolving the 3d landscape of transcription-linked mammalian chromatin folding. Mol Cell. 78(3):539-553.e538.

      Krietenstein N, Abraham S, Venev SV, Abdennur N, Gibcus J, Hsieh TS, Parsi KM, Yang L, Maehr R, Mirny LA et al. 2020. Ultrastructural details of mammalian chromosome architecture. Mol Cell. 78(3):554-565.e557.

      Kyrchanova O, Chetverina D, Maksimenko O, Kullyev A, Georgiev P. 2008. Orientation-dependent interaction between drosophila insulators is a property of this class of regulatory elements. Nucleic Acids Res. 36(22):7019-7028.

      Kyrchanova O, Ibragimov A, Postika N, Georgiev P, Schedl P. 2023. Boundary bypass activity in the abdominal-b region of the drosophila bithorax complex is position dependent and regulated. Open Biol. 13(8):230035.

      Kyrchanova O, Kurbidaeva A, Sabirov M, Postika N, Wolle D, Aoki T, Maksimenko O, Mogila V, Schedl P, Georgiev P. 2018. The bithorax complex iab-7 polycomb response element has a novel role in the functioning of the fab-7 chromatin boundary. PLoS Genet. 14(8):e1007442.

      Kyrchanova O, Sabirov M, Mogila V, Kurbidaeva A, Postika N, Maksimenko O, Schedl P, Georgiev P. 2019a. Complete reconstitution of bypass and blocking functions in a minimal artificial fab-7 insulator from drosophila bithorax complex. Proceedings of the National Academy of Sciences. 201907190.

      Kyrchanova O, Wolle D, Sabirov M, Kurbidaeva A, Aoki T, Maksimenko O, Kyrchanova M, Georgiev P, Schedl P. 2019b. Distinct elements confer the blocking and bypass functions of the bithorax fab-8 boundary. Genetics.genetics. 302694.302019.

      Kyrchanova O, Zolotarev N, Mogila V, Maksimenko O, Schedl P, Georgiev P. 2017. Architectural protein pita cooperates with dctcf in organization of functional boundaries in bithorax complex. Development. 144(14):2663-2672.

      Li H-B, Muller M, Bahechar IA, Kyrchanova O, Ohno K, Georgiev P, Pirrotta V. 2011. Insulators, not polycomb response elements, are required for long-range interactions between polycomb targets in drosophila melanogaster. Mol Cell Biol. 31(4):616-625.

      Li X, Tang X, Bing X, Catalano C, Li T, Dolsten G, Wu C, Levine M. 2023. Gaga-associated factor fosters loop formation in the drosophila genome. Mol Cell. 83(9):1519-1526.e1514.

      Mohana G, Dorier J, Li X, Mouginot M, Smith RC, Malek H, Leleu M, Rodriguez D, Khadka J, Rosa P et al. 2023. Chromosome-level organization of the regulatory genome in the drosophila nervous system. Cell. 186(18):3826-3844.e3826.

      Muller M, Hagstrom K, Gyurkovics H, Pirrotta V, Schedl P. 1999. The mcp element from the drosophila melanogaster bithorax complex mediates long-distance regulatory interactions. Genetics. 153(3):1333-1356.

      Muravyova E, Golovnin A, Gracheva E, Parshikov A, Belenkaya T, Pirrotta V, Georgiev P. 2001. Loss of insulator activity by paired su(hw) chromatin insulators. Science. 291(5503):495-498.

      Postika N, Metzler M, Affolter M, Müller M, Schedl P, Georgiev P, Kyrchanova O. 2018. Boundaries mediate long-distance interactions between enhancers and promoters in the drosophila bithorax complex. PLoS Genet. 14(12):e1007702.

      Samal B, Worcel A, Louis C, Schedl P. 1981. Chromatin structure of the histone genes of D. melanogaster. Cell. 23(2):401-409.

      Sigrist CJ, Pirrotta V. 1997. Chromatin insulator elements block the silencing of a target gene by the drosophila polycomb response element (pre) but allow trans interactions between pres on different chromosomes. Genetics. 147(1):209-221.

      Udvardy A, Schedl P. 1984. Chromatin organization of the 87a7 heat shock locus of drosophila melanogaster. J Mol Biol. 172(4):385-403.

      Vazquez J, Muller M, Pirrotta V, Sedat JW. 2006. The mcp element mediates stable long-range chromosome-chromosome interactions in drosophila. Molecular Biology of the Cell. 17(5):2158-2165.

      Zolotarev N, Fedotova A, Kyrchanova O, Bonchuk A, Penin AA, Lando AS, Eliseeva IA, Kulakovskiy IV, Maksimenko O, Georgiev P. 2016. Architectural proteins pita, zw5, and zipic contain homodimerization domain and support specific long-range interactions in drosophila. Nucleic Acids Res. 44(15):7228-7241.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study examines the role of a host in conditions that shift pathogenicity of opportunistic microbes. The use of single-cell microbial transcriptomics and metabolomics to demonstrate the host's effects on pathogen dynamics is interesting and convincing. However, the connection to host antimicrobial peptides driving these effects is incomplete and would benefit from additional evidence and improved explanation in the text. This paper has the potential to be of broad interest to those working in host-microbe (microbiome and pathogen) interactions.

      We thank the editors for handling our manuscript and providing the eLife assessment. We went through each comment and carried out the necessary experiments. In response to the comments, we provide additional evidence in this revised manuscript that further supports our findings.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, Wang and colleagues used Drosophila-Serratia as a host-microbe model to investigate the impact of the host on gut bacteria. The authors showed that Drosophila larvae reduce S. marcescens abundance in the food likely due to a combination of mechanical force and secretion of antimicrobial peptides. S. marcescens exposed to Drosophila larvae lost virulence to flies and could promote larval growth similar to typical Drosophila gut commensals. These phenotypic changes were reflected in the transcriptome and metabolome of bacteria, suggesting that the host could drive the switch from pathogenicity to commensalism in bacteria. Further, the authors used single-cell bacterial RNA-seq to demonstrate the heterogeneity in gut bacterial populations.

      Strengths:

      This is a valuable work that addresses an important question of the effect of the host on its gut microbes. The authors could convincingly demonstrate that gut bacteria are strongly affected by the host with important consequences for both interacting partners. Moreover, the authors used state-of-the-art bacterial single-cell RNA-seq to reveal heterogeneity in host-associated commensal populations.

      Weaknesses:

      Some of the conclusions are not fully supported by the data.

      Specifically, in lines 142-143, the authors claim that larvae antagonize the pathogenicity of S. marcescens based on the survival data. I do not fully agree with this statement. An alternative possibility could be that, since there are fewer S. marcescens in larvae-processed food, flies receive a lower pathogen load and consequently survive. Can the authors rule this out?

      Also, the authors propose that Drosophila larvae induce a transition from pathogenicity to commensalism in S. marcescens and provide nice phenotypic and transcriptomic data supporting this claim. However, is it driven only by transcriptional changes? Considering high mutation rates in bacteria, it is possible that S. marcescens during growth in the presence of larvae acquired mutations causing all the observed phenotypic and transcriptional changes. To test this possibility, the authors could check how long S. marcescens maintains the traits it acquires during growth with Drosophila. If these traits persist after reculturing isolated bacteria, it is very likely they are caused by genome alterations, if not - likely it is a phenotypic switch driven by transcriptional changes.

      We thank the reviewer for providing a feasible method to distinguish a shift in the transcriptional profile from genomic mutations. Following this valuable suggestion, we checked phenotypic and transcriptional changes after re-culturing bacteria that had coexisted with larvae. We found that all phenotypes were restored after re-culturing. The new data support our previous conclusion that the phenotypic switch was driven by transcriptional changes rather than genomic mutations. We have added these results to the text with figure supplement 3 (lines 147-151 and 192-194). Please see the following text.

      “To rule out the possibility that phenotypic alterations could stem from genomic mutations, we examined the prodigiosin yield and CFUs of re-cultured S. marcescens that had coexisted with larvae. Our results showed that neither the prodigiosin yield nor the CFUs of re-cultured S. marcescens differed from those of the original strain (Figure 2-figure supplement 3A-C), suggesting that the phenotypic switch was driven primarily by transcriptional reprogramming.” “Consistent with the previous result that this phenotypic switch was driven by transcriptional changes, the expression of virulence and growth genes was restored after re-culturing (Figure 3-figure supplement 3D, E).”

      For the first question, we acknowledge the possibility that the higher mortality of flies could result from a higher pathogen load, because the bacterial load of S. marcescens cultured alone was increased. However, host pathogenesis is normally determined by the virulence of a pathogen rather than by the number of bacteria. For example, hosts constantly harbor enormous numbers of commensals in their guts yet remain healthy. This suggests that the properties of a microbe (its virulence) matter more than its abundance in determining the health status of the host. Moreover, an increase in the virulence of S. marcescens cultured alone was verified by real-time PCR (Fig. 2F) and TE (Fig. 2G). Taken together, we conclude that the impaired survival of flies challenged with S. marcescens cultured alone mainly arose from its increased virulence. Thanks for your understanding!

      Reviewer #2 (Public Review):

      Summary:

      While many studies have explored the impacts of pathogens on hosts, the effect of hosts on pathogens has received less attention. In this manuscript, Wang et al. utilize Drosophila melanogaster and an opportunistic pathogen, Serratia marcescens, to explore how the host impacts pathogenicity. Beginning with an observation that larval presence and density impacted microbial growth in fly vials (which they assess qualitatively as the amount of 'slick' and quantitatively as microbial load/CFUs), the authors focus on the impact of axenic/germ-free larvae on an opportunistic pathogen S. marcescens. Similar to their observations with general microbial load, they find that larvae reduce the presence of a pinkish slick of Sm, indicative of its secondary metabolite prodigiosin. The presence of larvae alters prodigiosin production, pathogen load, pathogen cellular morphology, and virulence, and this effect is through transcriptional and metabolic changes in the pathogen. Overall, they observe a loss of virulence factors/pathways and an increase in pathways contributing to growth. Given the important role the host plays in this lifestyle shift, the authors then examined host features that might influence these effects, focusing on the role of antimicrobial peptides (Amps). The authors combine the use of synthetic Amps and an Amp-deficient fly line and conclude much of the larval inhibitory effect is due to their production of AMPs.

      Strengths:

      This is a very interesting question and the use of Drosophila-Serratia marcescens is a great model to explore these interactions and effects.

      The authors have an interesting and compelling phenotype and are asking a unique question on the impact of the host on the pathogen. The use of microbial transcriptomics and metabolomics is a strength, especially in order to assess these impacts on the pathogen level and at the single-cell level to capture heterogeneity.

      Weaknesses:

      Overall, the writing style in the manuscript makes it difficult to fully understand and appreciate the data and its interpretation.

      The data on the role of AMPs would benefit from strengthening. Some of the arguments in the text of that section are also counterintuitive. The authors show that △AMP larvae have a reduced impact on Sm as compared to wt larvae, but it seems less mild of an effect than that observed with wt excreta (assuming the same as secreta in Figures 7, should be corrected or harmonized). Higher doses of AMPs give a phenotype similar to wt larvae, but a lower dose (40 ng/ul) gives phenotypes more similar to controls. The authors argue that this data suggests AMPs are the factor responsible for much of the inhibition, but their data seems more to support that it's synergistic- you seem to still need larvae (or some not yet defined feature larvae make, although secreta/excreta was not sufficient) + AMPs to see similar effects as wt. Based on positioning and color scheme guessing that AMP 40ng/ul was used in Figures 7D-H, but could not find this detail in the text, methods, or figure legend and it should be indicated. This section does not seem to be well supported by the provided data, and this inconsistency greatly dampened this reviewer's enthusiasm for the paper.

      We thank the reviewer for the valuable comments and suggestions. We acknowledge that some photos of the pinkish slick (prodigiosin) in Figure 7 and figure supplement 2B appear counterintuitive. The reason is as follows. S. marcescens cultured alone produced prodigiosin that stayed only on the surface of the fly agar medium. Larvae, however, agitate the food and create a stratification of prodigiosin, so that food containing a higher prodigiosin yield can still show a lighter surface slick. We mentioned this in lines 166-168 of the previous manuscript. This is why some photos of food treated with excreta or a lower dose of AMP look more intense than those with WT larvae. Because we precisely quantified the prodigiosin yield inside the food with a spectrophotometer, we present this quantification alongside the photos of the slick and base our conclusions mainly on it. We used cecropin A in these experiments and have added this information to the text. We hope that our replies can reignite your enthusiasm for our manuscript, and thanks for your great support!

      Reviewer #3 (Public Review):

      In this study, Wang and coworkers established a model of Drosophila-S. marcescens interactions and thoroughly examined host-microbe bidirectional interactions. They found that:

      (1) Drosophila larvae directly impact microbial aggregation and density;

      (2) Drosophila larvae affect microbial metabolism and cell wall morphology, as evidenced by reduced prodigiosin production and EPS production, respectively;

      (3) Drosophila larvae attenuate microbial virulence;

      (4) Drosophila larvae modulate the global transcription of microbes for adaptation to the host;

      (5) Microbial single-cell RNA sequencing (scRNA-seq) analysis revealed heterogeneity in microbial pathogenicity and growth;

      (6) AMPs are key factors controlling microbial virulence phenotypes.

      Taken together, they concluded that host immune factors such as AMPs are directly involved in the pathogen-to-commensal transition by altering microbial transcription.

      General comments:

      In general, this study is intriguing as it demonstrates that host immune effectors such as AMPs can serve as critical factors capable of modulating microbial transcription for host-microbe symbiosis. However, several important questions remain unanswered. One such question is: What is the mechanism by which AMPs modulate the pathogen-to-commensal transition? One hypothesis suggests that antimicrobial activity may influence microbial physiology, subsequently modulating transcription for the transition from pathogen to commensal. In this context, it is imperative to test various antibiotics with different modes of action (e.g., targeting the cell wall, transcription, or translation) at sub-lethal concentrations to determine whether sub-lethal doses of antimicrobial activity are sufficient to induce the pathogen-to-commensal transition.

      Thank you for the important comments on our manuscript. We checked the effects of two antibiotics (5 μg/μl kanamycin and 10 μg/μl ampicillin) on the virulence switch of S. marcescens. At these sub-lethal doses, both antibiotics similarly decreased the prodigiosin yield and the expression of virulence genes in S. marcescens. Intriguingly, both antibiotics also caused a dramatic decline in bacterial load and in the expression of genes involved in cell growth. These results suggest that the antibiotics reduced virulence primarily by suppressing most bacterial activities.

      In contrast, we found that larvae and AMPs at 40 μg/μl modestly decreased the bacterial load and increased the relative expression of genes involved in cellular proliferation, suggesting that AMPs could maintain S. marcescens in the exponential phase of growth. This result is consistent with the finding that Drosophila larvae can support the long-term persistence of commensals in a shared habitat (DOI: 10.1016/j.cmet.2017.11.011). Such inhibition could prevent the bacteria from rapidly exhausting their nutritional resources and thereby maintain symbiosis.

      Author response image 1.

      (A) Representative images of the surface slick with S. marcescens alone, with kanamycin (5 μg/μl) and with ampicillin (10 μg/μl). (B) Prodigiosin production of S. marcescens alone, with kanamycin (5 μg/μl) and with ampicillin (10 μg/μl). n = 6 for each. (C) Bacterial loads of S. marcescens alone, with kanamycin (5 μg/μl) and with ampicillin (10 μg/μl). n = 6 for each. (D, E) RT-qPCR analysis of the expression levels of downregulated and upregulated genes in S. marcescens alone, with kanamycin (5 μg/μl) and with ampicillin (10 μg/μl). n = 3 for each. Means ± SEMs. Variables with different letters are significantly different (p < 0.05); variables that share a letter are not significantly different (p > 0.05). ns, no significance. Kruskal-Wallis test followed by Dunn’s multiple comparisons test.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Here are some specific points that need to be addressed:

      (1) Lack of statistical analysis for many figures. The authors should perform and report the statistical analysis for all figures where it is currently lacking, specifically, Figures 2C, D, E, F, H; Figures 3E, F; Figures 7G, H; Figure S2E, Figures S3D, E.

      Thanks for your valuable suggestions. We re-checked the manuscript and performed the statistical analysis for these figures.

      (2) For graphs showing dots, it should be specified what exactly individual dots show and how many animals were used per replicate. Also, time points at which specific analysis was performed should be specified.

      We have provided this information in the figure legends of the revised manuscript.

      (3) Figure 2. No letters illustrating statistical significance are shown, although this is claimed in the legend (line 848).

      We added statistical significance in the updated Figure 2.

      (4) In Figure 7, the authors used AMPs of defined concentration, but it is not specified what exactly these AMPs are. Please provide the full composition of the AMP mix used.

      We used the antimicrobial peptide cecropin A from silkworm. We added this information to the Methods (lines 487-488) and the Figure 7 legend.

      (5) Figure S2B. To me, it looks like that medium with larvae is redder than after mechanical force. I find it hard to believe the quantification in panel C that the medium with larvae has 3 times less pigment as compared to the mechanical force.

      Larvae could only agitate the surface layer of the food (~0.4 cm), whereas sticks agitated the food completely, to a depth of up to 3 cm. Thus, the pink-pigmented layer of food after agitation was much deeper than that with larvae, which accounts for this counterintuitive appearance. We explained this in the previous manuscript (lines 166-168): “Of note, the surface of the slick with agitation appeared lighter than that of larvae, mainly due to a stratification of prodigiosin following agitation.”

      (6) The authors need to proofread the manuscript as there are missing words, terms that need definition, and wrong terms. For example, L86 - naked eye?, L117 - what do the authors mean by co-culture?, L309 - not resist but rather combat, L347 - Species? or competition?, Figure 2A - 2nd?

      We have corrected these errors in the revised manuscript. We added "eye" in L86. Co-culture refers to “S. marcescens in co-culture”, that is, S. marcescens cultured together with larvae. For L347, we mean interspecies competition for the same or similar nutrients and space in the habitat.

      (7) The authors should reorganize either the text or the figures' order in a way that the figures are described in a consecutive order (Figure 1A, B ... and not Figure 1D first and then 1A).

      Thanks for your valuable advice. We have reorganized the order of the text.

      (8) Do the authors have an idea which bacteria they quantified in Figures 1E to 1G? I didn't find the medium that was used for culturing. Also, in Figure 1F, Is the control group comprised of females or males?

      Mixed bacteria (bacteria in the living environment of Drosophila) were quantified on NA medium, which supports the growth of Drosophila microbiota (Jia Y, et al. Nat Commun. 2021; lines 474-475). The control group comprised males and females at a 1:1 ratio. Similarly, the aged group contained 100 50-day-old flies (male:female = 1:1). We provided these details in the Figure 1 legend (lines 849-850 and 851-852).

      (9) L118-129. it is not possible to make all these statements without any statistical analysis. To me, at 96h both treatments have the same CFUs, while the authors claim they are different.

      We added statistical analysis in the current version. In fact, the population of S. marcescens alone collapsed after 72 h post-inoculation, and its CFU count declined progressively thereafter. The bacterial load of S. marcescens in co-culture was comparable to (at 96 h post-inoculation, p > 0.05) or higher than (at 120 h post-inoculation, p < 0.001) that of S. marcescens alone, possibly because bacteria cultured alone rapidly exhausted the nutritional resources and collapsed through population suicide. We rewrote this sentence (lines 125-129) in the updated manuscript.

      (10) L136. term "symbionts" is not appropriate here.

      We changed “symbionts” to “S. marcescens”.

      (11) In Figure 1, the authors used flies of different fitness: weak, strong, and infertile. They should be specific and describe exactly what these terms mean, are these mutants or treatments that affect the fitness?

      We apologize for this missing information and have added it to the Methods and the figure legend: strong flies (wild-type CS flies), weak flies (yw; Sp/CyO; MKRS/TM6B) and infertile flies (dfmr150M null mutant) (Figure 1 legend, lines 849-850).

      (12) Figure S2. The title of this figure is misleading, please modify it. Mechanical force did affect S. marcescens but to a lesser degree as compared to larvae.

      Thank you for your suggestion. We admit that mechanical force affected S. marcescens but to a lesser degree as compared to larvae, so we changed the title to "Biological factors mainly determine S. marcescens lifestyle."

      Reviewer #2 (Recommendations For The Authors):

      General improvement to writing and presentation (see below):

      Describing "confluent growth" would make more sense than "slick", followed by descriptions such as "broken" or "colour intensity of the surface slick".

      We used "slick" to describe visible surface films of bacteria, a term used in a previous study (DOI: 10.1038/s43705-023-00307-8). "Slick" is equivalent to confluent growth but is simpler. To clarify this usage, we have added this reference to the text.

      We reorganized the text of Figure 1.

      I suggest more specific language to describe observations. For example, replace "bacterial loading" with "S. marcescens growth" (e.g., "the presence of dense fly populations reduced Sm growth").

      Thank you for the suggestion. We have replaced some of these terms.

      Symbiont, microbiota, microbiome, etc. were all used interchangeably throughout the manuscript, but I am not sure I would call Sm part of the indigenous microbiome. I suggest ensuring proper usage and then harmonizing it throughout the manuscript.

      We used microbes and microbiome to replace symbiont and microbiota, respectively.

      Some helpful details are missing from the Methods and Figure legends (especially Figure 7: what AMP concentration was used?).

      Thank you for these valuable comments. We have provided concrete details on the AMPs, including their source and concentration, in the Materials and methods and the Figure 7 legend (lines 487-488 and 954-955). Please see the response below.

      L73: Define "these issues", or lead in better from the prior sentence; it is not evident as currently written.

      We changed "To address these issues" to "To investigate whether and/or how the host modulates bacterial lifestyles," and merged the two paragraphs.

      L74: repetitive sentence with the above.

      Thanks for pointing out this detail. We deleted it.

      L86: naked 'eye'.

      Added.

      L87: what is meant by 'weak flies'?

      Genotypes were added in the updated manuscript. Weak fly stocks display weaker activity and generate fewer eggs than WT flies.

      L96: bacterial load, not loading.

      Corrected.

      L128: There is no evidence to support this; it could reflect increased numbers in dying/dead larvae that affect the total numbers in the vial.

      The number of CFUs of S. marcescens alone had gradually decreased by 96 h post-inoculation. In addition, we observed a pale biofilm on the surface of the medium at the late stage. Because the CFU numbers of S. marcescens alone at later stages were reduced compared with the peak load at 48 h post-inoculation, we inferred that the bacteria could undergo ecological suicide. Ecological suicide of a bacterial population has similarly been examined by recording the number of CFUs in the medium over time (Ratzke C, et al. Nat Ecol Evol. 2018). Taken together, we conclude that the bacteria possibly underwent ecological suicide.

      L129: This contradicts the prior sentence; the load is reduced only at early time points in the presence of larvae.

      Thank you for pointing out this detail. We added "before 72 h post-inoculation" to the sentence.

      L134: The data focus only on S. marcescens, so inferences about "symbionts" broadly are outside the scope of the study.

      We changed “symbionts” to “S. marcescens”.

      L139: sentence poorly written and confusing.

      We re-organized this sentence.

      To this end, we sought to examine the S. marcescens lifestyle switch from pathogenicity to commensalism by assessing the respective survival of flies on the fly medium that had been processed by single or coexisting S. marcescens.

      L189: Evidence for long-term symbiosis is not well established in this paper. I suggest editing this language throughout to reflect more specifically what the data support, and leaving such interpretations to the Discussion and future work.

      Thank you for your valuable advice. We deleted “long-term” and “thereby promoting the fitness of symbionts in the long maintenance.”

      L192: Metabolomics was used to assess the impact of larvae on bacterial metabolism; as currently written, the sentence does not make sense.

      We rewrote this sentence. “Next, we investigated whether larvae could further elicit changes in the metabolism of S. marcescens using untargeted metabolomics.”

      L331: The use of "monitored" here is odd or incorrect.

      We changed 'monitored' to 'reshaping'.

      L340: While the authors initially see a cost to Sm in reduced load (CFUs), at 120 h the populations associated with larvae become higher. There is also a cost to producing virulence factors, which their RNA-seq and metabolomics data support (trade-offs between growth and virulence).

      Thank you for your suggestion. We added “before 72 hours post-inoculation” to the sentence to define the early stage of bacterial growth.

      Reviewer #3 (Recommendations For The Authors):

      (1) Figures 1 A-D: What defines weak and strong flies, and what criteria determine the robustness of flies? How was the experiment conducted? The manuscript lacks details on this matter.

      Thank you for your comments. We do not have a formal criterion; the robustness of the flies is based on routine laboratory observation. Weak fly stocks display weaker activity and generate fewer eggs than WT flies. The genotypes with different robustness have been added to the legend in the updated manuscript.

      (2) The authors mentioned, "Noteworthily, the number of CFUs of S. marcescens alone was lower than S. marcescens in co-cultures at the late stage (at 96 h post inoculation), likely that bacteria rapidly exhausted their nutritional resources and underwent ecological suicide." How did they determine that the bacteria exhausted nutritional resources and underwent ecological suicide? One might speculate that larvae could have removed the bacteria simply by consuming them.

      Thank you for this comment. In fact, there were no larvae inside the vials with S. marcescens alone, so the bacterial cells were not consumed. However, the CFU numbers of S. marcescens alone at later stages were reduced compared with the peak load at 48 h post-inoculation, so we inferred that the bacteria could undergo ecological suicide. Ecological suicide of a bacterial population has been examined by recording the number of CFUs in the medium over time (Ratzke C, et al. Nat Ecol Evol. 2018), and we applied the same approach to the CFU numbers of S. marcescens. Taken together, we conclude that the bacteria possibly underwent ecological suicide.

      (3) Figure 2E: The experimental details should be provided in the text. What was the CFU of the bacteria used in this survival experiment?

      We have provided further experimental details in the legend (lines 869-870). The same inoculum size was used for both single and co-cultured S. marcescens.

      (4) The experimental data in Figures 2G and 2H do not sufficiently prove the relationship between the width of the cell wall and virulence, as it lacks experimental validation.

      Previous studies (DOI: 10.1371/journal.ppat.1005946) revealed that surface glucosylating toxins are primary virulence determinants, so an increased profile of surface-anchored polysaccharides and proteins promotes pathogen virulence. Alterations in the cell surface (e.g., the width of the cell wall) can be examined by TE, and TE has been used to observe changes in the virulence of S. marcescens (DOI: 10.1093/nar/gkab1186). We therefore think that the width of the cell wall can reflect virulence in S. marcescens.

      (5) While it's acknowledged that agitation decreases the color intensity of the bacteria, comparing mechanical agitation with larval crawling seems inappropriate, as the mechanical forces exerted by both methods are not of the same magnitude.

      Thank you for the suggestion. In fact, the food was agitated more heavily by glass sticks than by larvae, because larvae only agitated the surface of the food (to a depth of about 0.5 cm). If the decrease in bacterial load and color depended only on the magnitude of agitation, larvae would cause a smaller decrease in bacterial load than the sticks. Because larvae instead had the stronger effect, this further supports our conclusion that biological factors, more than mechanical force, are responsible for the inhibition of S. marcescens.

      (6) Figure 4D: with this metabolome data, they mentioned, "host suppresses differentiation of S. marcescens into the population with pathogenicity." What evidence supports the claim that downregulation of amino acid metabolism, phosphotransferase system, and ABC transporter directly correlates with decreased pathogenicity?

      Thank you for the comment. Earlier studies showed that amino acid-derived quorum-sensing molecules are closely related to bacterial pathogenicity (Defoirdt T. PLoS Pathog. 2019; Wen J, et al. Microbiol Spectr. 2022). Moreover, the phosphotransferase system and ABC transporters can transport and/or produce virulence factors. Therefore, we claimed that the downregulation of amino acid metabolism, the phosphotransferase system, and ABC transporters was directly related to decreased pathogenicity. To support this claim, we have added references to the updated manuscript (lines 662-664 and 827-830).

      (7) Serotonin: Does serotonin also reduce the virulence of S. marcescens?

      Our results showed that serotonin can indeed reduce the virulence of S. marcescens (figure supplement 4): in the presence of serotonin, the survival rate of adult flies increased and the expression levels of virulence-related genes in S. marcescens cultured alone decreased.

      (8) Figures 6D, E, H, I: The expression of key genes should be verified using quantitative real-time polymerase chain reaction (qRT-PCR), as scRNA-seq expression levels might not accurately reflect the true expression levels.

      Bacterial single-cell RNA-seq can evaluate alterations in gene expression at single-cell resolution. The expression of the key genes identified by scRNA-seq changed only in subpopulations, so their average expression would be comparable once these subpopulations are mixed into the whole population. We are concerned that qRT-PCR, which measures bulk expression, would not be able to verify gene expression changes within subpopulations.

      (9) Figure 7: The authors mentioned, "AMPs were supplemented to fly food." However, I could not find information regarding which AMPs and their respective concentrations (i.e., the concentration of each AMP) were used in this study. This is a critical aspect of the research; therefore, details should be provided.

      Thank you for your important suggestions. We used the antimicrobial peptide cecropin A, which is produced by silkworms. We have provided this information in the Methods (lines 487-488), and the concentrations of cecropin A have been added to the Figure 7 legend.

      (10) Figure 7: Delta AMP + AMP exhibited a stronger effect on the bacteria than AMP alone, indicating that immune effectors other than AMPs may be involved. Since the IMD pathway is necessary for most immune effectors, including AMPs, it would be interesting to test IMD pathway mutant animals and compare them with Delta AMP.

      We appreciate this important question. Indeed, Delta AMP + AMP exhibited a stronger effect on the bacteria than AMP alone. We acknowledge that immune effectors other than AMPs may be involved. Alternatively, mechanical force may, to a lesser extent, account for the stronger effect on the bacteria (as shown by larval agitation in figure supplement 2). To test the first possibility, we examined the effect of total immune effectors on the bacterial load and prodigiosin yield of S. marcescens using the IMD pathway mutant (RelE20 larvae). The optical density and prodigiosin yield in the Delta AMP group did not differ significantly from those in the RelE20 group. Moreover, the load of S. marcescens associated with the Delta AMP mutant was comparable to that associated with the RelE20 mutant. These results suggest that AMPs play a major role in recapitulating the response of S. marcescens to larvae.

      “To rule out the potential role of other immune effectors, we turned to the IMD pathway mutant RelE20 that is deficient in total immune effectors. Our result showed that the optical density and yield of prodigiosin in RelE20 group did not significantly differ from the ones in Delta AMP group (figure supplement 7A, B). Moreover, the load of S. marcescens associated with RelE20 mutant was comparable to that of S. marcescens associated with Delta AMP mutant (figure supplement 7C).”

      We have now added these results to the text (lines 326-331).

    1. Author response:

      The following is the authors’ response to the original reviews.

      List of major changes

      (1) We have emphasized the assumptions underlying our modeling approach in the third paragraph of the Introduction section.

      (2) We have included a new paragraph in the Discussion section to compare our model with a molecular mechanism-oriented model.

      (3) We have included a new paragraph at the end of the Introduction section to outline the main content of each subsection in Results and the logical connections between them. Correspondingly, the chapter hierarchy and section titles have been adjusted.

      (4) The Supplementary Material includes an additional table (Table S2) that provides detailed explanations of the symbols used in the model.

      (5) We have included a new paragraph in the Introduction section to explicitly emphasize the phenomenological nature of our model and its broad applicability.

      (6) In the Osmoregulation subsection, we have added a discussion on how our model can be directly generalized to scenarios involving the environmental uptake of osmolytes.

      (7) We have included a more detailed examination of the limitations inherent in our modeling approach in the second last paragraph of the Discussion section.

      (8) In the third last paragraph of the Discussion section, we have explicitly demonstrated that our model does not conflict with the observation that, in E. coli, cell wall synthesis is not directly regulated by the turgor pressure.

      Reviewer #1 (Public review):

      Summary:

      A theoretical model for microbial osmoresponse was proposed. The model assumes simple phenomenological rules: (i) the free water volume in the cell changes in response to osmotic imbalance, based on pressure balance; (ii) osmoregulation changes the proteome partitioning depending on the osmotic pressure, which affects the production of osmolyte-producing proteins; (iii) cell-wall synthesis is regulated by coupling changes in turgor pressure to the cell-wall synthesis efficiency so that turgor returns to its target value; and (iv) intracellular crowding slows biochemical reactions as crowding increases, and stops them when the protein density (protein mass divided by free water volume) reaches a critical value. The parameter values were taken from the literature or obtained by fitting to experimental data. The authors compare the model behavior with various microorganisms (E. coli, B. subtilis, S. cerevisiae, S. pombe) and successfully reproduce the overall trends (steady-state behavior for many of them, dynamics for S. pombe). In addition, the model predicts non-trivial behavior such as fast cell growth just after a hypoosmotic shock, which is consistent with experimental observations. The authors further make experimentally testable predictions regarding mutant behavior and transient dynamics.

      Strength:

      The theory assumes simple mechanistic dependence between core variables without going into specific molecular mechanisms of regulations. The simplicity allows the theory to apply to different organisms by adjusting the time scales with parameters, and the model successfully explains broad classes of observed behaviors. Mathematically, the model provides analytical expressions of the parameter dependences and an understanding of the dynamics through the phase space without being buried in the detail. This theory can serve as a base to discuss the universality and diversity of microbial osmoresponse.

      We would like to thank Reviewer 1 for thoroughly reading our work and appreciating our theoretical approach to investigating microbial osmotic response.

      Weakness:

      The core part of this model is that everything is coupled with growth physiology, and, as far as I understand, assumption (iv) (Eq. 8), which imposes a global dependence of reaction rates on crowding, plays a crucial role. I would think this is a strong and interesting assumption. However, the abstract and discussion do not discuss the importance of this assumption. In addition, the paper does not discuss gene regulation explicitly, and some comparison with a molecular mechanism-oriented model may be beneficial to highlight the pros and cons of the current approach.

      We thank Reviewer 1 for their very helpful feedback. We have significantly revised the manuscript as suggested by Reviewer 1. See the detailed answers in the following.

      Reviewer #1 (Recommendations for the authors)

      (1) Explicitly stating the assumption (iv) in the abstract and discussing its role would help readers understand.

      In the revised manuscript, we have significantly rewritten the third paragraph of the Introduction section to emphasize our key assumptions as suggested by Reviewer 1, including the relationship between global reaction rate and crowding:

      “Our model assumes the following phenomenological rules: (1) the change in free water volume within the cell is driven by osmotic imbalance (Cadart et al., Nature Physics, 2019; Rollin et al., Elife, 2023), while the remaining volume changes in proportion to protein production; (2) osmoregulation influences the production of osmolyte-producing protein, governed by intracellular protein density (Scott et al., Science, 2010); (3) cell-wall synthesis is regulated through a feedback mechanism, wherein turgor pressure modulates the efficiency of cell-wall synthesis, enabling the cell to maintain a relatively stable turgor pressure; and (4) intracellular crowding slows down biochemical reactions as cytoplasmic density increases, with reactions ceasing entirely when protein density reaches a critical threshold.”

      We have also modified the abstract to mention the crowding effects explicitly. Additionally, we have added a few sentences in the first and second paragraphs of the Discussion section to emphasize the importance of crowding effects to our conclusions regarding the growth rate reduction in steady states and the non-monotonic dependence of the growth rate peak on the shock amplitude after a hyperosmotic shock.
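      To make rule (4) concrete, the short sketch below assumes, purely for illustration, that reaction rates are scaled by a factor that decreases linearly with protein density and reaches zero at the critical density. The linear form and the names `crowding_factor`, `rho`, and `rho_c` are hypothetical; this is not the model's Eq. 8, only a minimal instance of the qualitative rule (slower reactions with more crowding, complete arrest at the threshold).

```python
def crowding_factor(rho: float, rho_c: float) -> float:
    """Hypothetical multiplier on biochemical reaction rates.

    Assumes a linear decrease with protein density rho that reaches zero
    at the critical density rho_c (illustrative form, not the model's Eq. 8).
    """
    if rho_c <= 0:
        raise ValueError("rho_c must be positive")
    # Full speed at zero density; reactions stop at or beyond rho_c.
    return max(0.0, 1.0 - rho / rho_c)


# Example: rates at half the critical density, and above it.
print(crowding_factor(0.5, 1.0))
print(crowding_factor(1.2, 1.0))
```

      Any monotonically decreasing function that vanishes at the critical density would express the same rule; the linear choice is only the simplest one to read.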

      (2) I found [Shen W, Gao Z, Chen K, Zhao A, Ouyang Q, Luo C. The regulatory mechanism of the yeast osmoresponse under different glucose concentrations. iScience. 2023 Jan 20;26(1)], which discusses the dependence of the response on medium glucose concentration, focusing on the gene regulatory circuit and the metabolic flux. As far as I understand, this paper considers the effect of resource reallocation but does not explicitly consider the mechanical part of the osmoresponse, such as pressure. It would be interesting to discuss the pros and cons in comparison with such a model. In principle, I would not be surprised if the current model does not differentiate much between glucose concentrations, since it is a more coarse-grained model, and I don't think this is a problem, but it would be good to have an explicit discussion.

      We appreciate Reviewer 1's insightful comment regarding the work by Shen et al. (iScience, 2023), which elucidates the two distinct osmoresponse strategies in yeast. By quantifying Hog1 nuclear translocation dynamics and downstream protein expression, the study reveals that in a rich medium, cells can leverage surplus glycolytic products as defensive reserves, reallocating metabolic flux to facilitate rapid adaptation to osmotic changes. Conversely, limited glycolytic intermediates in low-glucose environments necessitate increased enzyme synthesis for osmotic adaptation. 

      The paper highlighted by Reviewer 1 studies yeast's adaptive strategies under two stresses, nutrient limitation and osmotic pressure, and provides an important complement to our study.

      In our simplified model, we did not include the interaction between cell growth and osmolyte production; instead, we assumed a constant fraction of ribosomes translating ribosomal proteins, as supported by experiments in E. coli (Dai et al., mBio, 2018). We remark that competition for translational resources can be incorporated into our framework by changing the proportion of ribosomes translating ribosomal proteins (X<sub>r</sub>) from a constant to a function of the translation strategy of the osmolyte-producing enzyme (X<sub>a</sub>).

      In the revised manuscript, we have included a new discussion in the third paragraph of the Discussion section to compare our approach with the molecular mechanism-oriented model:

      “We remark that our model is intrinsically a coarse-grained model with many molecular details regarding gene expression regulation neglected, which allows us to gain more analytical insights. In [Shen et al., iScience, 2023], the authors studied the responses to osmotic stress in glucose-limited environments and found that cells exhibited stronger osmotic gene expression response under glucose-limited conditions than under glucose-rich conditions. Using a computational model based on molecular mechanisms combined with experimental measurements, the authors demonstrated that in a glucose-limited environment, glycolysis intermediates were limited, which required cells to express more glycerol-production enzymes for stress adaptation. In the current version of our model, we do not account for the interaction between cell growth and osmolyte production; instead, we assume a constant fraction of ribosomes dedicated to translating ribosomal proteins. Our model can be further generalized to include more complex interactions, including the coupling between biomass and osmolyte production, e.g., by allowing the fraction of ribosomes translating ribosomal proteins (X<sub>r</sub>) to depend on the translation strategy of the osmolyte-producing enzyme (X<sub>a</sub>).”

      (3) A minor comment: The authors call assumption (iii) (eq. 7) "positive feedback from turgor pressure to the cell-wall synthesis efficiency" (line 204). I have a hard time seeing this as positive feedback. It regulates the cell wall synthesis so that turgor pressure returns to the desired value; hence, isn't it negative feedback?

      We apologize for this confusion. We have removed the term "positive feedback" in the revised manuscript.
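      To illustrate why the reviewer is right that this regulation acts as negative feedback, the toy simulation below assumes that extra cell-wall synthesis is proportional to the turgor excess (P - P0) and that this extra synthesis relieves turgor at the same rate. The linear form, the gain, and the function name `simulate_turgor` are hypothetical and are not the model's Eq. 7. With a positive gain, the correction opposes the deviation and turgor relaxes back to the target; flipping the sign of the gain (a genuine positive feedback) makes the deviation grow instead.

```python
def simulate_turgor(p_init, p_target, gain, dt=0.01, steps=2000):
    """Toy turgor-feedback loop (hypothetical linear form, not Eq. 7).

    Extra cell-wall synthesis is taken as proportional to the turgor excess,
    and that extra synthesis is assumed to relieve turgor at the same rate.
    Returns the turgor trajectory, one value per time step.
    """
    p = p_init
    trajectory = [p]
    for _ in range(steps):
        # Turgor above target -> more wall synthesis; below target -> less.
        extra_synthesis = gain * (p - p_target)
        # Faster wall expansion lowers turgor, opposing the deviation.
        p -= extra_synthesis * dt
        trajectory.append(p)
    return trajectory


# Negative feedback (gain > 0): turgor relaxes back to the target value.
neg = simulate_turgor(p_init=2.0, p_target=1.0, gain=1.0)
# Sign-flipped loop (gain < 0) acts as positive feedback and runs away.
pos = simulate_turgor(p_init=2.0, p_target=1.0, gain=-1.0)
print(round(neg[-1], 6), pos[-1] > 10.0)
```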

      Reviewer #2 (Public review):

      Summary:

      In this study, Ye et al. have developed a theoretical model of osmotic pressure adaptation by osmolyte production and wall synthesis.

      Strengths:

      They validate their model predictions of a rapid increase in growth rate on osmotic shock experimentally using fission yeast. The study has several interesting insights which are of interest to the wider community of cell size and mechanics.

      Weaknesses:

      Multiple aspects of this manuscript require addressing, in terms of clarity and consistency with previous literature. The specifics are listed as major and minor comments.

      Major comments:

      (1) The motivation for the work is weak and needs more clarity.

      We thank Reviewer 2 for this very helpful comment, which we believe has significantly improved our manuscript. We would like to clarify the two major motivations of our study. 

      First, we aim to construct a systems-level and coarse-grained model capable of elucidating the complex processes underlying microbial osmoresponse. By leveraging the separation of timescales associated with mechanical equilibrium, cell-wall synthesis regulation, and osmoregulation, our model facilitates in-depth analytical and numerical analysis of how these various processes interact during cellular adaptation. In particular, we demonstrate the key physiological functions of osmoregulation and cell-wall synthesis regulation.

      Second, we seek to apply this model to interpret the phenomenon of supergrowth observed in fission yeast Schizosaccharomyces pombe (Knapp et al., Cell Systems, 2019). This application addresses an essential challenge in experimental studies: exclusive knockout experiments can be difficult, and mechanistic interpretations of experimental observations are often lacking. Our theoretical framework offers a valuable tool for understanding such phenomena, contributing to the fundamental knowledge of microbial physiology and developing predictive models for microbial behavior under osmotic stress.

      In the revised manuscript, we have included a new paragraph at the end of the Discussion section to emphasize our motivations better:

      “In this work, we construct a systems-level and coarse-grained model capable of elucidating the complex processes underlying microbial osmoresponse. By leveraging the separation of timescales associated with mechanical equilibrium, cell-wall synthesis regulation, and osmoregulation, our model facilitates in-depth analytical and numerical analysis of how these various processes interact during cellular adaptation. In particular, we demonstrate the key physiological functions of osmoregulation and cell-wall synthesis regulation. We then apply this model to interpret the unusual phenomenon of supergrowth observed in fission yeast. This application addresses an essential challenge in experimental studies: exclusive knockout experiments can be difficult, and mechanistic interpretations of experimental observations are often lacking. Our theoretical framework offers a valuable tool for understanding such phenomena, contributing to the fundamental knowledge of microbial physiology and developing predictive models for microbial behavior under osmotic stress.”

      (2) The link between sections is very frequently missing. The authors directly address the problem that they are trying to solve without any motivation in the results section.

      We are grateful to Reviewer 2 for their valuable feedback. In the revised manuscript, we have included a new paragraph at the end of the Introduction section to outline the main content of each subsection in Results and the logical connections between them:

      “In the following “Results” section, we begin by outlining the primary assumptions and equations of our model in the subsection "Model Description," which includes four parts, each addressing one of the four phenomenological rules. Additional details can be found in Methods. We then proceed to the subsection “Steady states in constant environments”, where we employ our theoretical framework to analyze steady-state growth and examine how the growth rate varies with external osmolarity. In the “Transient dynamics after a constant osmotic shock” subsection, we investigate the time-dependent osmoresponse after a constant hyperosmotic and hypoosmotic shock. Finally, in “Comparison with experiments: supergrowth phenomena after osmotic oscillation”, we address the supergrowth phenomena observed in S. pombe, utilizing our model to elucidate these experimental observations.”

      (3) The parameters used in the models (symbols) need to be explained better to make the paper more readable.

      We apologize for this confusion. In the revised Supplementary Material, we have included an additional table (Table S2) to explain the meanings of the symbols employed in the model to help the reader better understand.

      (4) Throughout the paper, the authors keep switching between organisms that they are modelling. There needs to be some consistency in this aspect where they mention what organism they are trying to model, since some assumptions that they make may not be valid for both yeast as well as bacteria.

      We thank Reviewer 2 for this very helpful comment. We would like to clarify that our model is coarse-grained without including detailed molecular mechanisms; therefore, it presumably applies to various species of microorganisms. Indeed, the predicted steady-state growth curves derived from our model and the experimental data obtained from various organisms agree reasonably well (Figure 2A of the main text). 

      In the revised manuscript, we have explicitly emphasized the nature of our phenomenological model and its broad applicability in the fourth paragraph of the Introduction section:

      “We remark that our model is coarse-grained, without including detailed molecular mechanisms, and is therefore applicable across diverse microbial species. Notably, the predicted steady-state growth rate as a function of internal osmotic pressure from our model aligns well with experimental data from diverse organisms. This alignment allows us to quantify the sensitivities of translation speed and regulation of osmolyte-producing protein in response to intracellular density. Additionally, we demonstrate that osmoregulation and cell-wall synthesis regulation enable cells to adapt to a wide range of external osmolarities and prevent plasmolysis. Our model also predicts a non-monotonic time dependence of growth rate and protein density as they approach steady-state values following a constant osmotic shock, in concert with experimental observations (Rojas et al., PNAS, 2014; Rojas et al., Cell systems, 2017). Moreover, we show that a supergrowth phase can arise following a sudden decrease in external osmolarity, driven by cell-wall synthesis regulation, either through the direct application of a hypoosmotic shock or the withdrawal of an oscillatory stimulus. Remarkably, the predicted amplitudes of supergrowth (i.e., growth rate peaks) quantitatively agree with multiple independent experimental measurements.”

      Furthermore, we have also included a comparison with a detailed molecular mechanism model in the third paragraph of the Discussion section:

      “We remark that our model is intrinsically a coarse-grained model with many molecular details regarding gene expression regulation neglected, which allows us to gain more analytical insights. In [Shen et al., iScience, 2023], the authors studied the responses to osmotic stress in glucose-limited environments and found that cells exhibited stronger osmotic gene expression response under glucose-limited conditions than under glucose-rich conditions. Using a computational model based on molecular mechanisms combined with experimental measurements, the authors demonstrated that in a glucose-limited environment, glycolysis intermediates were limited, which required cells to express more glycerol-production enzymes for stress adaptation. In the current version of our model, we do not account for the interaction between cell growth and osmolyte production; instead, we assume a constant fraction of ribosomes dedicated to translating ribosomal proteins. Our model can be further generalized to include more complex interactions, including the coupling between biomass and osmolyte production, e.g., by allowing the fraction of ribosomes translating ribosomal proteins (X<sub>r</sub>) to depend on the translation strategy of the osmolyte-producing enzyme (X<sub>a</sub>).”

      (5) The extent of universality of osmoregulation i.e the limitations are not very well highlighted.

      The osmoregulation mechanism described in our model primarily addresses changes in cytoplasmic osmolarity through the de novo synthesis of compatible solutes, which is widely observed across bacteria, archaea, and eukaryotic microorganisms. A review article (Gunde-Cimerman et al., FEMS Microbiology Reviews, 2018) provides an extensive summary of the primary compatible solutes used by organisms from all three domains of life, underscoring the prevalence of this osmoregulatory strategy. Furthermore, our model can be directly generalized to scenarios involving the uptake of osmolytes from the environment: one only needs to reinterpret the parameter k<sub>a</sub> in the production of osmolyte molecules as an uptake rate rather than a synthesis rate, and all the results remain equally applicable. In the revised manuscript, we have briefly discussed this point in the subsection “Osmoregulation.”

      We agree with Reviewer 2 that our model's coarse-grained nature makes it broadly applicable to diverse microbial taxa; however, more specialized adaptations are beyond its scope. In the revised manuscript, we have included a more detailed examination of the limitations inherent in our modeling approach in the second-to-last paragraph of the Discussion section:

      “We note several limitations of our current coarse-grained model. First, the high membrane tension that inhibits transmembrane flux of peptidoglycan precursors, leading to growth inhibition before the supergrowth peak (Rojas et al., Cell Systems 2017), is beyond our model. Second, in our current framework, osmoregulation and cell-wall synthesis regulation rely on the instantaneous cellular state. However, microorganisms can exhibit memory of external stimuli by adapting to their temporal order of appearance (Mitchell et al., Nature 2009). Notably, in the osmoregulation of yeast, a short-term memory, facilitated by post-translational regulation of the trehalose metabolism pathway, and a long-term memory, orchestrated by transcription factors and mRNP granules, have been identified (Jiang et al., Science Signaling 2020). In addition, our model does not account for the role of osmolyte export in osmoregulation (Tamas et al., Molecular Microbiology, 1999) or the interaction between biomass and osmolyte production (Shen et al., iScience 2023). Extending our model to include more realistic biological processes will be interesting.”

      (6) Lines 198-200: It is not clear in the text which organisms the authors are writing about here. "Experiments suggested that the turgor pressure induce cell-wall synthesis, e.g., through mechanosensors on cell membrane [45, 46], by increasing the pore size of the peptidoglycan network [5], and by accelerating the moving velocity of the cell-wall synthesis machinery [31]". This, however, is untrue for bacteria, as shown by reference 22 (E. Rojas, J. A. Theriot, and K. C. Huang, Response of Escherichia coli growth rate to osmotic shock, Proceedings of the National Academy of Sciences 111, 7807 (2014)).

      We thank Reviewer 2 for pointing out this very important issue and apologize for the confusion. References 45 and 46 (Dupres et al., Nature Chemical Biology 2009; Neeli-Venkata et al., Developmental Cell 2021) discuss how Wsc1 acts as a mechanosensor in S. pombe, detecting turgor pressure and activating pathways that reinforce the cell wall. Reference 5 (Typas et al., Cell 2010) explains the role of LpoA and LpoB, the two outer membrane lipoprotein regulators in E. coli, which modulate peptidoglycan synthesis in an extracellular manner. Reference 31 (Amir and Nelson, PNAS 2012) is a theoretical paper showing that turgor pressure may accelerate the moving velocity of the cell wall synthesis machinery in E. coli. In the revised manuscript, we have been more explicit about the organisms we refer to in the subsection “Cell-wall synthesis regulation.”

      Meanwhile, we agree with Reviewer 2 that cell wall synthesis may not be directly regulated by turgor pressure in E. coli (Rojas et al., PNAS 2014). We would like to clarify that this scenario is also included in our model, corresponding to H<sub>cw</sub> = 0 (Eq. (7) in the main text), in which the turgor pressure does not affect cell-wall synthesis. Therefore, the supergrowth phenomenon observed in S. pombe does not manifest under hypotonic stimulation in E. coli.

      In the revised manuscript, we have emphasized this point more explicitly in the third-to-last paragraph of the Discussion section:

      “Reference 22 (Rojas et al., PNAS, 2014) showed that the expansion of the E. coli cell wall is not directly regulated by turgor pressure, and this scenario is also included in our model as the case of H<sub>cw</sub> = 0. According to our model, the supergrowth phase is absent if H<sub>cw</sub> = 0 (Figure S8), consistent with the absence of a growth rate peak after a hypoosmotic shock in the experiments of E. coli (Rojas et al., PNAS, 2014). Meanwhile, our predictions are consistent with the growth rate peak after a hypoosmotic shock observed for B. subtilis (Rojas et al., Cell Systems, 2017).”

      (7) The time scale of reactions to hyperosmotic shocks does not agree with previous literature (reference 22). Therefore defining which organism you are looking at is important. Hence the statement " Because the timescale of the osmoresponse process, which is around hours (Figure 3B), is much longer than the timescale of the supergrowth phase, which is about 20 minutes, the turgor pressure at the growth rate peak can be well approximated by its immediate value after the shock." from line 447 does not seem to make sense. The authors need to address this.

      We apologize for this confusion. In the revised manuscript, we have clarified that the cited time scales are for the fission yeast S. pombe after Eq. (13) in the main text.

      Reviewer #2 (Recommendations for the authors):

      (1) Inconsistency in nomenclature: On line 117, the equation reads V<sub>b</sub> = αm<sub>p</sub>, where V<sub>b</sub> is the bound volume, whereas bound volume has been referred to as V<sub>bd</sub> previously and in Figure 1.

      Answer: We apologize for this confusion. In our model, the total bound volume V<sub>b</sub> comprises the volume of dry mass and bound water, V<sub>b</sub> = V<sub>bd</sub> + V<sub>bw</sub>, where V<sub>bd</sub> is the volume occupied by dry mass and V<sub>bw</sub> is the volume of bound water. In the revised manuscript, we have added a brief discussion of this point in the caption of Figure 1.

      (2) Line 180: Please define 𝜌 for equation 4.

      We apologize for this confusion. In the text, the symbol 𝜌 denotes the mass of a given substance per unit volume of free water, and its unit is g/ml. The specific substance under consideration is indicated by a subscript. For example, 𝜌<sub>p</sub> in Eq. (4) represents the protein density, and 𝜌<sub>c</sub> stands for the critical protein density, above which intracellular chemical reactions cease according to Eq. (8) of the main text. In the revised manuscript, we have clarified the meaning of 𝜌<sub>c</sub> after Eq. (4).

      (3) Line 187: Equation 5 also needs to be explained better. Hence there is a need to be more specific while stating the assumptions.

      The elastic modulus 𝐺 defined in Eq. (5) of the main text is a measure of the cell wall's resistance to volume expansion. We assume a constant 𝐺 for simplicity, which is reasonable when the cell wall deformation is mild. In the revised manuscript, we have been more explicit about our assumptions regarding the turgor pressure in the subsection “Cell-wall synthesis regulation.”

      (4) Line 225: For a biological audience some elaboration on "glass transition" may be required- either as a reference to a review or to a 1 sentence statement of relevance.

      We appreciate Reviewer 2’s helpful comment. In the revised manuscript, we have added a brief introduction to the glass transition and a citation to a review paper (Hunter and Weeks, Rep. Prog. Phys. 2012) at the beginning of the subsection “Intracellular crowding.”

      (5) Line 247: "All growth rates in steady states of cell growth are the same: 𝜇<sub>𝑓</sub> \= 𝜇<sub>r</sub> \= 𝜇<sub>cw</sub>". The authors need to explain in a line or two why this is true. Since the processes are independent, it is safe to assume that all 𝜇's are constant, but it is not obvious why they should all be equal.

      We apologize for the lack of a clear explanation regarding the equality of steady-state growth rates in our previous manuscript. In the revised manuscript, we have added a brief explanation of the equality of the three growth rates at the beginning of the subsection “Steady states in constant environments”:

      “When cell growth reaches a steady state, the proportions of all components, including free water volume, cell mass, and cell wall volume, must be constant relative to the total cell volume to ensure homeostasis. Therefore, all growth rates in steady states of cell growth must be the same: 𝜇<sub>𝑓</sub> \= 𝜇<sub>r</sub> \= 𝜇<sub>cw</sub>.”
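      A short derivation makes the step from constant proportions to equal rates explicit. It assumes, as for 𝜇<sub>𝑟</sub> elsewhere in this response, that each growth rate is a relative rate of change, and it reads 𝜇<sub>𝑓</sub> and 𝜇<sub>cw</sub> as the relative growth rates of the free-water volume 𝑉<sub>𝑓</sub> and the cell-wall volume 𝑉<sub>cw</sub>; that reading of the notation is our assumption. For any two positive components 𝑋 and 𝑌 with 𝜇<sub>𝑋</sub> = (1/𝑋)d𝑋/d𝑡:

```latex
\begin{aligned}
\frac{d}{dt}\left(\frac{X}{Y}\right)
  &= \frac{X}{Y}\left(\frac{1}{X}\frac{dX}{dt} - \frac{1}{Y}\frac{dY}{dt}\right)
   = \frac{X}{Y}\,(\mu_X - \mu_Y), \\
\frac{d}{dt}\left(\frac{V_f}{m_p}\right) = 0 \;&\Rightarrow\; \mu_f = \mu_r, \\
\frac{d}{dt}\left(\frac{V_{cw}}{m_p}\right) = 0 \;&\Rightarrow\; \mu_{cw} = \mu_r .
\end{aligned}
```

      Because every ratio 𝑋/𝑌 is positive, it can stay constant only if 𝜇<sub>𝑋</sub> = 𝜇<sub>𝑌</sub>; the same argument applies to any pair of components, including the total volume.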

      (6) Line 264: "Because the typical doubling times of microorganisms are around hours, we can estimate 𝜇<sub>𝑓</sub>/k<sub>w</sub> ∼ 10 Pa [51, 52] ..." since the authors are generalizing for yeast and bacteria, specifically E. coli, this is not a valid assumption to make. There is also a need to explain the basis of "𝜇<sub>𝑓</sub>/k<sub>w</sub> ∼ 10 Pa".  

      We appreciate the need for clarity in the estimation and its implications. The rough estimation of 𝜇<sub>𝑓</sub>/k<sub>w</sub> ~ 10 Pa in the main text is given by:

      Here, the typical value of 𝜇<sub>𝑓</sub> (which equals 𝜇<sub>r</sub> in steady state) is approximated by the inverse of the cell-cycle duration, which is around hours. The estimation above is employed to justify the assumption that 𝜇<sub>𝑓</sub>/k<sub>w</sub> is much smaller than the cytoplasmic osmotic and turgor pressures, which can be several atmospheric pressures.

      For the case of E. coli, based on the experimental results from Boer et al. (Boer et al., Biochemistry 2011), an 800 mM hypoosmotic shock leads to a rapid expansion of cell volume accomplished within a time scale of 0.1 s, from which we obtain:

      .

      Therefore, our assumption that 𝜇<sub>𝑓</sub>/k<sub>w</sub> is much smaller than the cytoplasmic osmotic and turgor pressures is still valid. 

      In the revised manuscript, we have increased the estimation ranges to include the case of E. coli in the first paragraph of the subsection “Steady states in constant environments.”
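      The comparison behind this assumption can be checked with simple arithmetic. The sketch below does not reconstruct the omitted estimation formula; it only converts doubling times of one to a few hours into growth rates (𝜇 = ln 2 / doubling time) and compares the quoted 𝜇<sub>𝑓</sub>/k<sub>w</sub> ≈ 10 Pa with cytoplasmic pressures of 1 to 10 atm, a range we assume as an illustration of "several atmospheric pressures".

```python
import math

ATM_PA = 101_325.0  # one standard atmosphere in pascals


def growth_rate_from_doubling(doubling_time_s: float) -> float:
    """Exponential growth rate (1/s) implied by a doubling time in seconds."""
    return math.log(2) / doubling_time_s


def pressure_ratio(mu_over_kw_pa: float, pressure_atm: float) -> float:
    """Ratio of the flux term mu_f/k_w (Pa) to a cytoplasmic pressure given in atm."""
    return mu_over_kw_pa / (pressure_atm * ATM_PA)


# Doubling times "around hours", as stated in the text.
for hours in (1.0, 3.0):
    mu = growth_rate_from_doubling(hours * 3600.0)
    print(f"doubling time {hours:.0f} h -> mu_f ~ {mu:.1e} 1/s")

# The quoted ~10 Pa against an assumed 1-10 atm pressure range.
for p_atm in (1.0, 10.0):
    print(f"{p_atm:.0f} atm: (mu_f/k_w) / pressure ~ {pressure_ratio(10.0, p_atm):.1e}")
```

      Even at the low end of the assumed range the ratio is of order 10<sup>-4</sup>, which is the sense in which the flux term is "much smaller" than the osmotic and turgor pressures.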

      (7) Lines 279-283 need to be explained better.  

      We apologize for the confusion. In the revised manuscript, we have explained more explicitly the meaning of the growth curve in the second paragraph of the subsection “Steady states in constant environments”:

      “Intriguingly, the relationship between the normalized growth rate and the normalized cytoplasmic osmotic pressure, which we refer to as the growth curve in the following, has only one parameter, 𝐻<sub>r</sub>/(𝐻<sub>𝑎</sub> + 1). Therefore, the growth curves of different organisms can be unified by a single formula, Eq. (10b), and different organisms may have different values of 𝐻<sub>r</sub>/(𝐻<sub>𝑎</sub> + 1).”

      (8) In Figure 3, an arrow representing the onset of osmotic shock would make the figure more intuitive to understand.

      We appreciate Reviewer 2 for this helpful suggestion. We have modified Figure 3 as suggested.

      (9) It is unclear to me if the growth rate 𝜇<sub>𝑟</sub> is representative of the growth of total protein. This can be motivated better.

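      We would like to clarify that the growth rate 𝜇<sub>𝑟</sub> is defined as the rate of change of total protein mass divided by the total protein mass: 𝜇<sub>𝑟</sub> = (1/𝑚<sub>𝑝</sub>)(d𝑚<sub>𝑝</sub>/d𝑡) = 𝑘<sub>𝑟</sub>𝑚<sub>𝑝,𝑟</sub>/𝑚<sub>𝑝</sub>.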

      Here, 𝑚<sub>𝑝,𝑟</sub> is the total mass of ribosomal proteins and 𝑘<sub>𝑟</sub> is a constant proportional to the elongation speed of the ribosome. The expression of 𝜇<sub>𝑟</sub> is a direct consequence of ribosomes being responsible for producing all proteins. In the revised manuscript, we have added more details in the introduction of the variable 𝜇<sub>𝑟</sub> in the last paragraph of the subsection “Cell growth”:

      “In this work, we assume that the dry-mass growth rate is proportional to the fraction of ribosomal proteins within the total proteome for simplicity, 𝜇<sub>𝑟</sub> = 𝑘<sub>r</sub>𝑚<sub>𝑝,𝑟</sub>/𝑚<sub>𝑝</sub> = 𝑘<sub>r</sub>𝜙<sub>𝑟</sub>. This assumption leverages the fact that ribosomes are responsible for producing all proteins. The proportionality coefficient 𝑘<sub>𝑟</sub> encapsulates the efficiency of ribosomal activity, being proportional to the elongation speed of the ribosome. We remark that 𝑘<sub>𝑟</sub> is influenced by the crowding effect, which we address later.”
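      To see why 𝜇<sub>𝑟</sub> = 𝑘<sub>r</sub>𝜙<sub>𝑟</sub> describes growth of the whole proteome, the sketch below integrates d𝑚<sub>𝑝</sub>/d𝑡 = 𝑘<sub>r</sub>𝑚<sub>𝑝,𝑟</sub> with forward Euler and recovers the exponential rate from the final mass. It is a minimal illustration, not the authors' model: 𝜙<sub>𝑟</sub> is held constant (as in a steady state), the crowding dependence of 𝑘<sub>𝑟</sub> is ignored, and the parameter values are arbitrary.

```python
import math


def simulate_protein_mass(k_r: float, phi_r: float, m0: float,
                          t_end: float, dt: float) -> float:
    """Forward-Euler integration of dm_p/dt = k_r * m_{p,r}, with m_{p,r} = phi_r * m_p.

    k_r: ribosomal efficiency (per unit time), taken as constant here
    phi_r: ribosomal-protein fraction of the proteome, held fixed
    """
    m_p = m0
    for _ in range(int(round(t_end / dt))):
        m_pr = phi_r * m_p        # mass of ribosomal proteins
        m_p += k_r * m_pr * dt    # ribosomes synthesize all proteins
    return m_p


# Illustrative parameters only; not fitted to any organism.
k_r, phi_r, t_end = 4.0, 0.2, 2.0
m_end = simulate_protein_mass(k_r, phi_r, m0=1.0, t_end=t_end, dt=1e-4)
mu_measured = math.log(m_end / 1.0) / t_end
print(f"measured mu_r = {mu_measured:.3f}, k_r * phi_r = {k_r * phi_r:.3f}")
```

      The measured rate matches 𝑘<sub>r</sub>𝜙<sub>𝑟</sub> up to the small Euler discretization error, because a fixed ribosomal fraction makes protein synthesis proportional to total protein mass.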

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #2:

      Line 295: was the time post-infection, which varies considerably between groups and across samples, taken into consideration when comparing responses between ChAT-Cre mice (4-9 weeks post-infection) and WT mice (four to five weeks post-infection)?

      Thank you for your comment. We did not originally assess the effects of time post-injection on DREADD response. Generally, AAV transgene expression has been demonstrated to be long-term and stable in the CNS of mice.[1] However, there is some variation in the reporting time of peak transgene expression[2], and this may potentially impact our results.

      In investigating this issue further, we discovered an error in our reporting as we did have n = 1 wild-type mouse that underwent EMG recordings 62 days (~9 weeks) post-AAV injection. This has been corrected in the manuscript (lines 87-88).

      Addressing this question is challenging due to the uneven distribution of time points within the 4–9-week windows for each group. Essentially, there were two groups per cohort, one studied at 4-5 weeks and one at 8-9 weeks. More specifically:

      - Wild-type cohort: n = 10 animals were studied 28–33 days post-injection, and n = 1 at 62 days.

      - ChAT-Cre cohort: n = 4 animals were studied 28–30 days post-injection, and n = 5 at 56–59 days.

      We performed Pearson correlation analyses between time post-injection and diaphragm EMG response to DREADD activation (peak amplitude and area under the curve, AUC) for both cohorts (Author response image 1):

      - ChAT-Cre: No significant correlations were found (peak amplitude: r<sup>2</sup> = -0.117, r = -0.1492, p = 0.702, Figure 1a-b; AUC: r<sup>2</sup> = -0.0883, r = 0.2184, p = 0.572, Figure 1c-d).

      - Wild type: Initial analysis of all data showed significant correlations (peak amplitude: r<sup>2</sup> = 0.362, r = 0.6523, p = 0.0296, Figure 1a; AUC: r<sup>2</sup> = 0.347, r = 0.6424, p = 0.033, Figure 1c), suggesting a moderate positive correlation between time post-injection and EMG response. However, when the single 8–9-week wild-type mouse was excluded, these correlations were no longer significant (peak amplitude: r<sup>2</sup> = 0.172, r = 0.5142, p = 0.128, Figure 1b; AUC: r<sup>2</sup> = 0.23, r = 0.5614, p = 0.0913, Figure 1d).
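      The r<sup>2</sup> values listed above cannot be plain squared Pearson coefficients: two are negative, and, for example, 0.6523<sup>2</sup> ≈ 0.425 rather than 0.362. They are, however, reproduced by the adjusted R<sup>2</sup> of a one-predictor linear regression, 1 − (1 − r<sup>2</sup>)(n − 1)/(n − 2), using the group sizes from the cohort breakdown above (n = 9 ChAT-Cre, n = 11 wild type, n = 10 with the late wild-type mouse excluded). This is our inference from the arithmetic, not a statement of how the values were computed.

```python
def adjusted_r2(r: float, n: int, predictors: int = 1) -> float:
    """Adjusted coefficient of determination for a linear fit with `predictors` regressors."""
    r2 = r * r
    return 1.0 - (1.0 - r2) * (n - 1) / (n - predictors - 1)


# (label, Pearson r, n, reported "r^2") taken from the text above
reported = [
    ("ChAT-Cre peak", -0.1492, 9, -0.117),
    ("ChAT-Cre AUC", 0.2184, 9, -0.0883),
    ("WT peak, all", 0.6523, 11, 0.362),
    ("WT AUC, all", 0.6424, 11, 0.347),
    ("WT peak, excluded", 0.5142, 10, 0.172),
    ("WT AUC, excluded", 0.5614, 10, 0.23),
]

for label, r, n, r2_reported in reported:
    print(f"{label}: r^2 = {r * r:.3f}, adjusted = {adjusted_r2(r, n):.4f}, "
          f"reported = {r2_reported}")
```

      Each adjusted value agrees with the reported one to within rounding, which suggests the figure labels would be more accurate as adjusted r<sup>2</sup>.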

      Comparing wild-type and ChAT-Cre groups directly was unreliable due to the single wild-type mouse studied at the later time point. We attempted to model time post-injection as a continuous variable (i.e., exact days post-injection) using a restricted maximum likelihood mixed linear model in JMP; however, the analysis could not be performed because there were insufficient overlapping time points between the two cohorts (i.e., not all days post-injection were represented in both groups). To mitigate this, we binned animals into two groups: 4–5 weeks and 8–9 weeks post-injection. This analysis returned a significant interaction between cohort and time post-injection (p = 0.0391); however, no multiple comparisons were significant on Tukey's post hoc test (i.e., p > 0.05).

      Based on these findings, we feel confident that time post-injection is unlikely to have a significant impact on diaphragm EMG response to DREADD activation in the ChAT-Cre cohort. However, in the wild-type cohort, it is difficult to draw definitive conclusions, as only one animal was studied at the 8–9-week time point. For similar reasons, it remains unclear whether the relationship between time post-AAV transduction and DREADD response differs between cohorts. Given the inconclusive nature of these results, we have elected not to include this analysis in the manuscript. Nevertheless, to ensure transparency, we have provided Author response image 1 below of peak amplitude and AUC plotted against time, allowing readers to evaluate the data independently.

      Author response image 1.

      Plots of diaphragm EMG peak amplitude (a-b) and area under the curve (c-d) vs. days post-AAV injection for wild-type (blue) and ChAT-Cre (orange) mice. Pearson correlation analyses were performed to assess the relationship between time post-AAV injection and diaphragm EMG DREADD response in wild-type and ChAT-Cre mouse cohorts. r<sup>2</sup>, r, and p-values are shown in each panel for both cohorts. Panels a and c display peak amplitude and AUC, respectively, including all animals. Panels b and d present the same variables with the n = 1 wild-type mouse at the 9-week time point excluded; ChAT-Cre data is unchanged between corresponding panels. Scatter points represent data from individual animals. Polynomial trendlines are displayed for each cohort with wild-type in blue and ChAT-Cre in orange.

      REFERENCES

      (1) Kim, J. Y., Grunke, S. D., Levites, Y., Golde, T. E. & Jankowsky, J. L. Intracerebroventricular viral injection of the neonatal mouse brain for persistent and widespread neuronal transduction. J Vis Exp, 51863 (2014). https://doi.org/10.3791/51863

      (2) Hollidge, B. S. et al. Kinetics and durability of transgene expression after intrastriatal injection of AAV9 vectors. Front Neurol 13, 1051559 (2022). https://doi.org/10.3389/fneur.2022.1051559


      The following is the authors’ response to the original reviews.

      Response to reviewer’s public reviews:

      We chose the dose of J60 based on a prior publication that established that off-target effects were possible at relatively high doses[1]. The dose that we used (0.1 mg/kg) was 30-fold less than the dose that was reported in that paper to potentially have off-target responses (3 mg/kg). Further, Author response image 1 shows the results of experiments in which J60 was given to animals that did not have the excitatory DREADD expressed in the spinal cord. This includes a sample of mice (n = 2) and rats (n = 3), recorded from using the same diaphragm EMG procedure described in the manuscript. The figure shows that there was no consistent response to the J60 at 0.1 mg/kg in the “control experiment” in which the DREADD was not expressed in the spinal cord.

      Author response image 1.

      Diaphragm EMG response to J60 administered to naïve rats and mice. Panels a-b show raw EMG values at baseline, following vehicle (saline) and J60 administration for the left and right hemidiaphragm. Panels c-d show EMG values normalized to baseline. Neither one-way RM ANOVA (panels a-b) nor paired t-test (panels c-d) returned significant p values (p < 0.05).

      Response to specific reviewer comments:

      Reviewer #1:

      How old were the animals at the time of AAV injection, and in subsequent experiments?

      The wild-type cohort of mice was 7-9 weeks old at the time of AAV injection, and DREADD experiments took place 4-5 weeks after AAV injection. ChAT-Cre mice were 6-10 weeks old at the time of AAV injection, and DREADD experiments took place 4-9 weeks after AAV injection. ChAT-Cre rats were 2-5 months old at the time of AAV spinal injection. These animals underwent plethysmography recordings 3-4 months post-AAV injection and subsequently phrenic nerve recording 3-8 weeks later. These details have been added to the Methods section.

      How many mice were excluded from electrophysiology experiments due to deteriorating electrode contact?

      No mice were excluded from electrophysiology experiments due to deteriorating electrode contact. If you are referring to the n = 1 excluded ChAT-Cre mouse (line 368), this animal was excluded because it showed no histological evidence of DREADD expression (lines 200-206).

      What was the urethane dose?

      The urethane dose for phrenic nerve recordings was 2.1 g/kg. See methods section line 395.

      A graphical timeline of the experimental progression for plethysmography and electrophysiology studies would enhance clarity.

      A graphical timeline has been added. See Figure S6.

      Significance indicators in the figures would greatly enhance clarity. It is a little awkward to have to refer to supplemental tables to figure out statistical differences.

      Significance indicators have been added. See Figures 1, 2, 4, and 5.

      In Figures 1, 2, and 5, individual data points should be shown, as in Fig 4.

      Thank you for this suggestion. We agree that, in general, it is best practice to scatter individual data points. However, when we drafted the new figures, it was apparent that including individual scatter points, in this case, created very “cluttered” figures that were very difficult to interpret.

      More detail regarding the plethysmography studies is needed. Was saline/J60 infused via a tail vein catheter? Were animals handled during the infusion? How long is the "IV" period? What volume of fluid was delivered?

      All IV infusions were delivered via a tail vein catheter. Animals were not handled during infusion or at any point during the recording. An IV catheter was externalized via a port in the plethysmograph, allowing IV infusion without handling the animal or opening the plethysmograph. The infusion period for both saline and J60 was standardized to 2 minutes. The volume of fluid for both saline and J60 was standardized to 0.6 mL. This information has been added to the methods section (lines 408-410, 415-416, 419-420).

      Reviewer #2:

      The abstract could be improved by briefly highlighting the rationale, scope, and novelty of the study - the intro does a great job of highlighting the scope of the study and the research questions.

      A brief explanation of the rationale, scope, and novelty of the study has been added to the abstract. See lines 2-8.

      Line 18: specify that this was done under urethane anesthesia.

      This detail has been added to the abstract (line 20).

      The methods section should be moved to the end of the manuscript according to Journal policy.

      The methods section has been moved to the end of the manuscript.

      The authors mention the use of both female and male rats but it is not indicated if they tested for and observed any differences between sexes across experiments.

      We included the use of both male and female animals in this study to improve the generalizability of the results. However, we were not adequately powered for sex comparisons and therefore did not perform any statistical analysis to assess differences between sexes across experiments. Text has been added to the methods section (lines 534-537) to clarify.

      Line 40, since delivery of J60 was performed in both IV and IP, this general statement should be updated.

      This detail has been revised to include both IV and IP. See line 43.

      Line 42. "First, we determined if effective diaphragm activation requires focal DREADD expression targeting phrenic motor neurons, or if non-specific expression in the immediate vicinity of the phrenic motor nucleus would be sufficient...." I don't think that in the experiments with wild-type mice the authors can claim that they selectively targeted the cervical propriospinal network (in isolation from the motoneurons). Given the fact that the histological analysis did not quantify interneurons or motoneurons in the spinal cord, authors should be cautious in proposing which neuronal population is activated in the non-specific approach.

      We agree, and this was a poorly worded statement in our original text. We agree that wild-type DREADD expression was not limited to the cervical propriospinal networks but likely a mix of interneurons and motoneurons. The text has been edited to reflect that (see lines 56-60).

      AAV virus source is not described.

      All AAVs were obtained from the UF Powell Gene Therapy Center. Details of virus source and production have been added to the methods section. See lines 336-347.

      Line 108-125. Because the diaphragm EMG recordings are only described for mice here, I would suggest editing this methods section to clearly state mice instead of vaguely describing "animals" in the procedure.

      “Animals” has been changed to “mice” to avoid ambiguity.

      Line 120, add parenthesis.

      Parenthesis has been added.

      Line 126. Whole body plethysmography protocol. Three hypercapnic hypoxic challenges are a lot for a rat within a 3-hour recording session in freely behaving rats. Did the authors verify with control/ vehicle experiments that repeated challenges in the absence of J60 do not cause potentiation of the response? I understand that it is not possible to invert the order of the injections (due to likely long-term effects of J60) or it is too late to perform vehicle and J60 injections on different days, but controls for repeated challenges should be performed in this type of experiment, especially considering the great variability in the response observed in Figure 4 (in normoxic conditions).

      We did not conduct control experiments to assess the impact of repeated hypercapnic hypoxic challenges on the naïve response (i.e., in the absence of J60). However, our experimental protocol was designed such that each experimental period (i.e., post-vehicle or post-J60 infusion) was normalized to baseline recordings taken immediately prior to the vehicle or J60 infusion. While repeated exposure to hypercapnic hypoxic challenges may have altered respiratory output, we are confident that normalizing each experimental period to its respective baseline effectively captures the impact of DREADD activation on ventilation, independent of any potential potentiation that may have occurred due to gas challenge exposure. We have included raw values for all plethysmography outcomes (see Figure 4, panels a-c) to ensure full data transparency. Still, we believe that the baseline-normalized values more accurately reflect the impact of DREADD activation on the components of ventilation.
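      The normalization described here can be sketched as dividing each post-infusion value by the mean of its own immediately preceding baseline. This is a minimal illustration under assumptions: the ratio form and the averaging over a baseline window are our reading (the exact windows are not stated in this response), and all numbers are invented.

```python
from statistics import mean


def normalize_to_baseline(period_values, baseline_values):
    """Return each post-infusion value as a fraction of its own preceding baseline mean."""
    baseline = mean(baseline_values)
    return [value / baseline for value in period_values]


# Invented minute-ventilation values (mL/min), for illustration only.
vehicle = normalize_to_baseline([210.0, 206.0], baseline_values=[200.0, 200.0])
# Suppose earlier gas challenges had raised the later baseline by 30%.
j60 = normalize_to_baseline([312.0, 300.0], baseline_values=[260.0, 260.0])
print("vehicle:", vehicle)
print("J60:", j60)
```

      A ratio of this kind cancels any change that scales the baseline and the subsequent post-infusion period by the same factor, which is the property the argument above relies on.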

      Furthermore, why are the responses to the hypercapnic hypoxic challenges not reported? These could be very interesting to determine the effects of DREADD stimulation on chemosensory responses and enhance the significance of the study.

      Response to the hypercapnic hypoxic challenges has been added to the manuscript. See Figure S3 and results section lines 162-167. Briefly, there were no statistically significant (p < 0.05) differences in tidal volume, respiratory rate, or minute ventilation between J60 vs sham condition during hypercapnic-hypoxic ventilatory challenges.

      Line 200: what is the reason behind performing a qualitative analysis of mCherry in various quadrants? This limits the interpretation of the results. If the authors used ChAT-Cre rats, the virus should only be in ChAT+ MN. Knowing how selective the virus is, and whether its expression was selective for phrenic MN versus other MN pools, could address several technical questions.

      We agree that detailed quantification of expression by motoneuron pool would be of value in future work. However, for these initial proof-of-concept experiments, we performed the quadrant-based qualitative analysis of mCherry expression to provide a simple comparison of mCherry expression between groups (i.e., ChAT-Cre vs. wild-type mice). This analysis allowed us to: 1) show the reader that each animal included in the study showed evidence of mCherry expression and 2) give the reader an idea of patterns of mCherry expression throughout the mid-cervical spinal cord. Additionally, it is important to note that while ChAT is a marker of motoneurons, some populations of interneurons also express ChAT (2-4).

      Given the increased values of Dia EMG AUC and no changes in respiratory rate, did the authors determine if there was a change in the inspiratory time with J60 administration?

      We did not assess inspiratory time.

      High death rate in DREADD WT mice: was histological analysis performed on these mice? Could it be due to the large volume injected into the spinal cord that affects not only descending pathways but also ascending ones? Or caused by neuronal death due to the large volume of viral solution injected in mice?

      Histological analysis was performed on these animals to assess mCherry expression only (i.e., no staining for NeuN or other markers was performed). While the reviewer's speculations are reasonable, we feel these reasons are unlikely to explain the death rate in DREADD WT mice, as ChAT-Cre mice received the same volume injected into their spinal cord and lived up until and during diaphragm EMG recordings. Additionally, WT mice lived for 4-5 weeks post-injection, which would be past the acute phase in which a large immune response to the viral dose would have occurred.

      Line 299-304. Can you please clarify whether these rats were tested under anesthesia?

      These rats were assessed under anesthesia. This detail has been added (line 146).

      Given some of the unexpected results on cardiovascular parameters in urethane anesthetized rats, did the authors test the effects of J60 in the absence of AAV construct infection?

      A small cohort (n = 2) of urethane anesthetized naïve wildtype rats were given the J60 ligand (IV, 0.1 mg/kg dose). We did observe a sudden drop in blood pressure after J60 administration that was sustained for the duration of the recording. One animal showed a 12% decrease in mean arterial blood pressure following J60 administration while the other showed a 35% decrease. Thus, it does appear that in this preparation the J60 ligand is producing a drop in arterial blood pressure.

      Line 393. I believe this comment refers to the intrapleural and diaphragmatic injection. Maybe this should be clarified in the sentence.

      This sentence has been revised for clarity (see lines 248-250).

      Figures 1 and 2. It would be informative to show raw traces of the Diaphragm EMG to demonstrate the increase in tonic EMG. It is not possible to determine that from the integrated traces in Figures 1A and B.

      Thank you for bringing up this concern. While the mean data in Figures 1F and 2F do indicate that, on average, animals had tonic diaphragm EMG responses to DREADD activation, the examples given in Figures 1A and 2A show minimal responses. This makes it difficult to fully appreciate the tonic response from those particular traces. However, clear tonic activity can be appreciated from Figures 5A and S2. In these figures, tonic activity is evident from the integrated EMG signals, presenting as a sustained increase in baseline activity between bursts—essentially an upward shift from the zero point.

      References

      (1) Van Savage, J. & Avegno, E. M. High dose administration of DREADD agonist JHU37160 produces increases in anxiety-like behavior in male rats. Behav Brain Res 452, 114553 (2023). https://doi.org/10.1016/j.bbr.2023.114553

      (2) Mesnage, B. et al. Morphological and functional characterization of cholinergic interneurons in the dorsal horn of the mouse spinal cord. J Comp Neurol 519, 3139-3158 (2011). https://doi.org/10.1002/cne.22668

      (3) Gotts, J., Atkinson, L., Yanagawa, Y., Deuchars, J. & Deuchars, S. A. Co-expression of GAD67 and choline acetyltransferase in neurons in the mouse spinal cord: A focus on lamina X. Brain Res 1646, 570-579 (2016). https://doi.org/10.1016/j.brainres.2016.07.001

      (4) Alkaslasi, M. R. et al. Single nucleus RNA-sequencing defines unexpected diversity of cholinergic neuron types in the adult mouse spinal cord. Nat Commun 12, 2471 (2021). https://doi.org/10.1038/s41467-021-22691-2

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Eaton et al. examine the regulation of transcription directionality using a powerful genomic approach (more about the methodology below). Their data challenge the notion that the polyadenylation signal-reading Cleavage and Polyadenylation (CPA) complex is responsible for controlling promoter directionality by terminating antisense transcription. Namely, depletion of the required CPA factor RBBP6 has little effect on antisense transcription measured by POINT. They find instead that initiation is intrinsically preferential in the sense direction and additionally maintained by the activities of an alternative processing complex called Integrator, together with the kinase CDK9. In the presence of CDK9 activity, depletion of Integrator endoribonuclease INTS11 leads to globally increased transcription in the antisense direction, and minor effects in the sense direction. However, CDK9 inhibition reveals that sense transcription is also sensitive to INTS11 depletion. The authors suggest that CDK9 activity is stronger in the sense direction, preventing INTS11-mediated premature termination of sense transcripts.

      Strengths:

      The combination of acute depletion of the studied factors using degron approaches (important to limit possible secondary effects), together with novel and very sensitive nascent transcriptomics methods POINT and sPOINT is very powerful. The applied spike-in normalization means the analysis is more rigorous than most. Using this methodology allowed the authors to revisit the interesting question of how promoter/transcription directionality is determined.

      The data quality appears very good and the fact that both global analysis as well as numerous gene-specific examples are shown makes it convincing.

      The manuscript is well written and hence a pleasure to read.

      We appreciate this positive assessment.

      Weaknesses:

      I am slightly worried about the reproducibility of the data - it is unclear to me from the manuscript if and which experiments were performed in replicate (lack of table with genomic experiments and GEO access, mentioned in more detail in below recommendations to authors), and the methods could be more detailed.

      All sequencing data was deposited with GEO. Multiple biological replicates were performed for each sequencing experiment.  Bigwig files are presented as a table in the GEO submissions. This data has now been made public.

      A separate discussion section would be useful, particularly since the data provided challenge some concepts in the field. How do the authors interpret U1 data from the Dreyfuss lab in light of their results? How about the known PAS-density directionality bias (more PAS present in antisense direction than in sense) - could the differential PAS density be still relevant to transcription directionality?

      As suggested, we have expanded our discussion to relate our findings to existing data. We think the results from the Dreyfuss lab are very important and highlight the role of U1 snRNA in enforcing transcriptional elongation.  It does this in part by shielding PAS sequences.  Recent work from our lab also shows that U1 snRNA opposes the Restrictor complex and PNUTS, which otherwise suppress transcription (Estell et al., Mol Cell 2023).  Most recently, the Adelman lab has demonstrated that U1 snRNA generally enhances transcription elongation (Mimoso and Adelman., Mol Cell 2023).  Our work does not challenge and is not inconsistent with these studies.

      The role of U1 in opposing PAS-dependent termination inspired the idea that antisense transcriptional termination may utilise PASs.  This was because such regions are rich in AAUAAA and comparatively poor in U1 binding sites. However, our RBBP6 depletion and POINT-seq data suggest that PAS-dependent termination is uncommon in the antisense direction. As such, other mechanisms suppress antisense transcription and influence promoter directionality. In our paper, we propose a major role for the Integrator complex.

      We do not completely rule out antisense PAS activity and discuss the prior work that identified polyadenylated antisense transcripts. Nevertheless, this was detected by oligo-dT primed RT-PCR/Northern blotting, which cannot determine the fraction of non-polyadenylated RNA that could result from PAS-independent termination (e.g. by Integrator).  To do that requires an analysis of total nascent transcription as achieved by our POINT-seq.  Based on these experiments, Integrator depletion has a greater impact on antisense transcription than RBBP6 depletion. 

      I find that the evidence that promoter directionality is, for the most part, due to preferential initiation in the sense direction should be stressed more. In my eyes this is the strongest effect, yet it is somewhat brushed under the rug.

      We agree that this is an important finding and have incorporated it into the title and abstract. As the reviewer recommends, we now highlight it further in the new discussion.

      References 12-17 report an effect of Integrator on the 5' ends of protein-coding genes, whereas the data in Figure 2 appear contradictory. Experiments in Figure 4 then show a global effect of INTS11 depletion on promoter-proximal sense transcription. In my opinion, data from the 2.5h time-point of depletion should be shown alongside 1.5h in Figure 2 so that it is clear that the authors found an effect similar to the above references. I find the current presentation somewhat misleading.

      We are grateful for this suggestion and present new analyses demonstrating that our experiment in Figure 2 concurs with previous findings (Supplemental Figures 2A and B). Our original heatmap (Figure 2E) shows a very strong and general antisense effect of INTS11 loss. On the same scale, the effects in the sense direction are not as apparent, which is also the case using metaplots.  New supplemental figure 2A now shows sense transcription from this experiment in isolation and on a lower scale, demonstrating that a subset of genes shows promoter-proximal increases in transcription following INTS11 depletion.  This is smaller and less general than the antisense effect but consistent with previous findings.  Indeed, our new analysis in supplemental figure 2B shows that affected protein-coding genes are lowly expressed, in line with Hu et al., Mol Cell 2023. This explains why a sense effect is not as apparent by metaplot, for which highly expressed genes contribute the most signal.

      As a result of our analyses, we are confident that the apparently larger effect at the 2.5hr timepoint (Figure 4) that we initially reported is due to experimental variability and not greater effects of extended INTS11 depletion. Overlaying the 1.5h and 2.5h datasets (Supplemental Figure 4B) revealed a similar number of affected protein-coding genes with a strong (83%) overlap between the affected genes.  To support this, we performed qPCR on four affected protein-coding transcripts which revealed no significant difference in the level of INTS11 effect after 2.5h vs 1.5h (Supplemental Figure 4C).

      We now present data for merged replicates in Figures 2 and 4 which reveal very similar average profiles for -INTS11 vs +INTS11 at both timepoints. Overall, we believe that we have resolved this discrepancy by showing that it amounts to experimental variability and because the most acutely affected protein-coding genes are lowly expressed. As detailed above, we show this in multiple ways (and validate it by qPCR). We have revised the text accordingly and removed our original speculation that differences reflected the timeframe of INTS11 loss.

      Conclusion/assessment:

      This important work substantially advances our understanding of the mechanisms governing the directionality of human promoters. The evidence supporting the claims of the authors is compelling, with among others the use of advanced nascent transcriptomics including spike-in normalization controls and acute protein depletion using degron approaches.

      In my opinion, the authors' conclusions are in general well supported.

      Not only the manuscript but also the data generated will be useful to the wide community of researchers studying transcriptional regulation. Also, the POINT-derived novel sPOINT method described here is very valuable and can positively impact work in the field.

      We are grateful for the reviewers' positive assessment of our study.

      Reviewer #2 (Public Review):

      Summary:

      Eaton and colleagues use targeted protein degradation coupled with nascent transcription mapping to highlight a role for the Integrator component INTS11 in terminating antisense transcription. They find that upon inhibition of CDK9, INTS11 can terminate both antisense and sense transcription, leading to a model whereby INTS11 can terminate antisense transcription and the activity of CDK9 protects sense transcription from INTS11-mediated termination. They further develop a new method called sPOINT which selectively amplifies nascent 5' capped RNAs and find that transcription initiation is more efficient in the sense direction than in the antisense direction. This is an excellent paper that uses elegant experimental design and innovative technologies to uncover a novel regulatory step in the control of transcriptional directionality.

      Strengths:

      One of the major strengths of this work is that the authors endogenously tag two of their proteins of interest, RBBP6 and INTS11. This tag allows them to rapidly degrade these proteins, increasing the likelihood that any effects they see are primary effects of protein depletion rather than secondary effects. Another strength of this work is that the authors immunoprecipitate RNAPII and sequence extracted full-length RNA (POINT-seq), allowing them to map nascent transcription. A technical advance from this work is the development of sPOINT, which allows the selective amplification of 5' capped RNAs < 150 nucleotides, allowing the direction of transcription initiation to be resolved.

      We appreciate this positive assessment.

      Weaknesses:

      While the authors provide strong evidence that INTS11 and CDK9 play important roles in determining promoter directionality, their data suggest that when INTS11 is degraded and CDK9 is inhibited there remains a bias in favour of sense transcription (Figures 4B and C). This suggests that there are other unknown factors that promote sense transcription over antisense transcription and future work could look to identify these.

      We agree that other (so far, unknown) factors promote sense transcription over antisense, which was demonstrated by our short POINT.  We have provided an expanded discussion on this in the revision. In our opinion, demonstrating that sense transcription is driven by preferential initiation in that direction is a key finding and we agree that the identification of the underlying mechanism constitutes an interesting avenue for future study.

      Reviewer #3 (Public Review):

      Summary:

      Using a protein degradation approach, Eaton et al. show that INTS11 can terminate sense and antisense transcription, but higher CDK9 activity in the sense direction protects sense transcription from INTS11-dependent termination. They developed sPOINT-seq, which detects nascent 5'-capped RNA. The technique allowed them to reveal more robust transcription initiation in the sense direction than in the antisense direction.

      Strengths:

      The strength of the paper is the acute degradation of proteins, eliminating the off-target effects. Further, the paper uses elegant approaches such as POINT and sPOINT-seq to measure nascent RNA and 5'-capped short RNA. Together, the combination of these three allowed the authors to make clean interpretations of data.

      We appreciate this positive assessment.

      Weaknesses:

      While the manuscript is well written, the figure panels lack sufficient detail. The methods could be elaborated to aid understanding. Additional discussion on how the authors' findings contradict the existing model of antisense transcription termination should be added.

      We have added more detail to the figure panels, which we hope will help readers to navigate the paper more easily. Specifically, the assay employed for each experiment is indicated in each figure panel. As requested, we provide a new and separate discussion section in the revision.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Congratulations on this important piece of work!

      Some specific suggestions.

      MAJOR

      -The data are not available (Accession "GSE243266" is currently private and is scheduled to be released on Sep 01, 2026.) This should be corrected and as a minimum, the raw sequencing files as well as the spike-in scaled bigwig files should be provided in GEO.

      We have made the data public. Raw and bigwig files are provided as part of the GEO upload.

      MINOR

      - It would be useful for readers if you could include catalog numbers of the reagents used in the study.

      We have included this information in our revision.

      - A table in experimental procedures summarizing the genomic experiments performed in this study as well as published ones reanalyzed here would be helpful.

      This is now provided as part of the resources table.

      - It would be easier for reviewers to evaluate the manuscript if the figure legends were included together with the figures on one page. This is now allowed by most journals.

      We have used this formatting in the revision.

      - Providing some captions for the results sections would be helpful.

      We have included subheadings as suggested.

      Reviewer #2 (Recommendations For The Authors):

      Generally, I would suggest writing the experiment-type above panels where it is not immediately obvious what they are so a reader can appreciate the figures without referencing the legend. E.g. write POINT-seq on Figure 1B just to make it obvious to someone looking at the figures what methodology they are looking at. Likewise, you could write RNAPII ChIP-seq for Supplementary Figures 3D and 3E.

      We have carried out this recommendation.

      Can a y-axis be indicated on POINT-seq genome browser tracks? This could make them easier to interpret.

      Y-axis scales are provided as RPKM as stated in the figure legends.
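      For readers unfamiliar with the unit, RPKM (reads per kilobase per million mapped reads) is conventionally defined as below, where r_i is the number of reads assigned to feature or bin i, L_i is its length in base pairs, and N is the total number of mapped reads. This is the standard definition only; how the spike-in scaling described elsewhere in the response is combined with it is not stated here, so any such combination should be treated as an assumption.

```latex
\mathrm{RPKM}_i = \frac{r_i}{\left(L_i / 10^{3}\right)\left(N / 10^{6}\right)} = \frac{10^{9}\, r_i}{L_i\, N}
```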

      The authors could address, or speculate on, why there is less POINT-seq signal for the antisense transcript in the treatment condition in Figure 1B, or could consider including a different example locus where this is not the case for clarity.

      Acute depletion of poly(A) factors (like RBBP6) results in a strong read-through beyond the poly(A) signal of protein-coding genes, as Figure 1 shows. However, it also causes a reduction in transcription levels, which can be seen in the figure and is correctly noted by the reviewer in this comment. We see this with other poly(A) factor depletions (e.g. CPSF73 and CPSF30 – Eaton et al., 2020 and Estell et al., 2021) and other labs have observed this too (e.g. for CPSF73-dTAG depletion (Cugusi et al., Mol Cell 2022)). Plausible reasons include a limited pool of free RNAPII due to impaired transcriptional termination or limited nucleotide availability due to their incorporation within long read-through transcripts. For these reasons, we have retained the example in Figure 1B as a typical representation of the effect. Moreover, the heatmap in Figure 1D fairly represents the spectrum of effects following RBBP6 loss, highlighting the strong read-through beyond poly(A) signals and the marginal antisense effects.

      "The established effect of INTS11 at snRNAs was detected in our POINT-seq data and demonstrates the efficacy of this approach (Figure 2B)." The authors could explain this point more clearly in the text and describe the data - e.g. As expected, depletion of INTS11 leads to increased POINT-seq signal at the 3' end of snRNAs, consistent with defects in transcriptional termination. This is highlighted by the RNU5A-1 and RNU5B-1 loci (Figure 2B).

      We agree and have added more context to clarify this.

      I would suggest adjusting the scale of the heatmap in Figure 2E - I think it would be easier to interpret if the value of 0 was white - with >0 a gradient of orange and <0 a gradient of blue (as is done in Figure 1C). I think making this change would make the point as written in the text clearer i.e. "heatmap analysis demonstrates the dominant impact of INTS11 on antisense versus sense transcription at most promoters (Figure 2E)." I'm assuming most of the sense transcription would be white (more clearly unchanging) when the scale is adjusted.

      We agree and have done this. The reviewer is correct that most sense transcription is unchanged by INTS11 loss.  However, as we alluded to in the original submission, a subset of transcripts shows a promoter-proximal increase after INTS11 depletion. We have expanded the analyses of this effect (see responses to other comments) but stress that it is neither as general nor as large as the antisense effect.

      The authors make the point that there is mildly increased transcription over the 5' end of some genes upon INTS11 depletion and show a track (Supplementary Fig 2A). It is not immediately obvious from the presentation of the meta-analysis in Figure 2D how generalisable this statement is. Perhaps the size of the panel or thickness of the lines in Figure 2D could be adjusted so that the peak of the control (in blue) could be seen. Perhaps an arrow indicating the peak could be added? I'm assuming the peak at the TSS is slightly lower in the control compared to INTS11 depletion based on the authors' statement.

      We have provided multiple new analyses of this data to highlight where there are promoter-proximal effects of INTS11 loss in the sense direction.  Please see our response to the public review of reviewer 1 and new supplemental figures 2A, 2B, 4A and 4B which highlight the sense transcription increased in the absence of INTS11.

      The authors label Figure 4 "Promoters lose their directionality when CDK9 is inhibited", but in INTS11-depleted cells treated with CDK9i they find that there still is a bias towards sense transcription. Suggested edit: "Some promoter directionality is lost when CDK9 is inhibited" or similar.

      We agree and have made this change.

      As the authors conclude that INTS11-mediated effects are the result of perturbation of the catalytic activities of Integrator, they should perform rescue experiments with the catalytically dead E203Q-INTS11 mutant.

      This is a very good suggestion and something we had intended to pursue.  However, as we will describe below (and shown in Supplemental Figure 4G), there were confounding issues with this experiment.

      The E203Q mutant of INTS11 is widely used in the literature to test for catalytic functions of INTS11.  However, we have found that this mutation impairs the ability of INTS11 to bind other Integrator modules in cells. Based on co-immunoprecipitation of flag-tagged WT and E203Q derivatives, INTS1 (backbone module), 10 (tail module), and 8 (phosphatase module) all show reduced binding to E203Q vs. WT. Because E203Q INTS11 is defective in forming Integrator complexes, rescue experiments might not fully distinguish the effects of INTS11 activity from those caused by defects in complex assembly. While this may at first seem unexpected, in the analogous 3’ end processing complex, catalytic mutants of CPSF73 (which is highly related to INTS11) negatively affect its interaction with other complex members (Kolev and Steitz, EMBO Reports 2005).

      We hypothesise that INTS11 activity is most likely involved in attenuating promoter-proximal transcription, but we cannot formally rule out other explanations and discuss this in our revision. Regardless of how INTS11 attenuates transcription, our main conclusion concerns its requirement to terminate antisense transcription, whether or not this involves its cleavage activity.

      The authors suggest that CDK9 modulates INTS11 activity/assembly and suggest this may be related to SPT5. Is there an effect of CDK9 inhibition on the snRNAs highlighted in Figure 2B?

      We believe that snRNAs are different from protein-coding genes with respect to CDK9 function. Shona Murphy’s lab previously showed that, unlike protein-coding gene transcription, snRNA transcription is insensitive to CDK9 inhibition, and that snRNA processing is impaired by CDK9 inhibition (Medlin et al., EMBO 2003 and EMBO 2005). We reproduce these findings by meta-analysis of 15 highly expressed and well-separated snRNAs and by qRT-PCR of unprocessed RNU1-1, RNU5A-1 and RNU7-1 snRNA following CDK9 inhibition. We observe snRNA read-through by POINT-seq following INTS11 loss whether CDK9 is inhibited or not (left panel, below). Note the higher TES-proximal signal in CDK9i conditions, which likely reflects the accumulation of unprocessed snRNA, as validated by qPCR for three example snRNAs (right panel, below).

      Author response image 1.

      For Figure 4, would similar results be observed using inhibitors targeting other transcriptional CDKs such as CDK7,12/13?

      In response to this suggestion, we analysed four selected protein-coding transcripts (the same four that we used to validate the CDK9i results) by qRT-PCR in a background of CDK7 inhibition using the THZ2 compound (new Supplemental Figure 4E). THZ2 suppresses transcription from these genes as expected. Interestingly, expression is restored by co-depleting Integrator, recapitulating our findings with CDK9 inhibition. There are two possible explanations. First, as CDK7 is the CDK-activating kinase for CDK9, its inhibition will also inhibit CDK9, so THZ2 may simply act on this pathway upstream of where CDK9 inhibitors act. Second, CDK7 may independently shield transcription from INTS11. We allude to both interesting possibilities.

      What happens to the phosphorylation state of anti-sense engaged RNAPII when INTS11 is acutely depleted and/or CDK9 is inhibited? This could be measured by including Ser5 and Ser2 antibodies in the sPOINT-seq assay and complemented with Western Blot analysis.

      We have performed the western blot for Ser5 and Ser2 phosphorylation as suggested.  Both signals are mildly enhanced by INTS11 loss, which is consistent with generally increased transcription.  Ser2p is strongly reduced by CDK9 inhibition, which is consistent with the loss of nascent transcription in this condition.  Interestingly, both modifications are partly recovered when INTS11 is depleted in conjunction with CDK9 inhibition. This is consistent with the effects that we see on POINT-seq and shows that the recovered transcription is associated with some phosphorylation of RNAPII CTD.  This presumably reflects the action(s) of kinases that can act redundantly with CDK9.

      We have not performed POINT-seq with Ser5p and Ser2p antibodies under these various conditions.  Our rationale is that our existing data uses an antibody that captures all RNAPII (regardless of its phosphorylation status), which we feel most comprehensively assays transcription in either direction. Moreover, the lab of Fei Chen (Hu et al., Mol Cell 2023) recently published Ser5p and Ser2p ChIP-seq following INTS11 loss. By ChIP-seq, they observe a bigger increase in antisense RNAPII occupancy vs. sense providing independent and orthogonal support for our POINT-seq data.  Interestingly, this antisense increase is not paralleled by proportional increases in Ser5p or Ser2p signals.  This suggests that the unattenuated antisense transcription resulting from INTS11 loss does not have high Ser5p or Ser2p.  Since CDK7 and 9 are major Ser5 and 2 kinases, this supports our model that their activity is less prevalent for antisense transcription.  We now discuss these data in our revision.   

      The HIV reporter RNA experiments should be performed with the CDK9 inhibitor added to the experimental conditions. Presumably CDK9 inhibition would result in no upregulation of the reporter upon addition of TAT and/or dTAG. Perhaps the amount of TAT should be reduced to still have a dynamic window in which changes can be detected. It is possible that reporter activation is simply at a maximum. Can anti-sense transcription be measured from the reporter?

      We have performed the requested CDK9 inhibitor experiment to confirm that TAT-activated transcription from the HIV promoter is CDK9-dependent (new supplemental figure 4F).  Consistent with previous literature on HIV transcription, CDK9 inhibition attenuates TAT-activated transcription.  Importantly, and in line with our other experiments, depletion of INTS11 results in significant restoration of transcription from the HIV promoter when CDK9 is inhibited. Thus, TAT-activated transcription is CDK9-dependent and, as for endogenous genes, CDK9 prevents attenuation by INTS11.

      While TAT-activated transcription is high, we do not think that the plasmid is saturated. When considering this question, we revisited previous experiments using this system to study RNA processing (Dye et al., Mol Cell 1999, Cell 2001, Mol Cell 2006). In these cases, mutations in splice sites or polyadenylation sites have a strong effect on RNA processing and transcription around HIV reporter plasmids. Effects on transcription and RNA processing are, therefore, apparent in the appropriate context. In contrast, we find that the complete elimination of INTS11 has no impact on RNA output from the HIV reporter. Our original experiment assessing the impact of INTS11 loss in +TAT conditions used total RNA. One possibility is that this allows non-nascent RNA to accumulate, which might confound our interpretation of INTS11 effects on ongoing transcription. However, the new experiment described in the paragraph above was performed on chromatin-associated (nascent) RNA to rule this out. This again shows no impact of INTS11 loss on HIV promoter-derived transcription in the presence of TAT.

      To our knowledge, antisense transcription is not routinely assayed from plasmids. They generally employ very strong promoters (e.g. CMV, HIV) to drive sense transcription.  Crucially, their circular nature means that RNAPII going around the plasmid could interfere with antisense transcription coming the other way which does not happen in a linear genomic context. This is why we restricted our use of plasmids to looking at the effects of stimulated CDK9 recruitment (via TAT) on transcription rather than promoter directionality.   

      The authors should clearly state how many replicates were performed for the genomics experiments. Ideally, a signal should be quantified and compared statistically rather than relying on average profiles only.

      We have stated the replicate numbers for sequencing experiments in the relevant figure legends. All sequencing experiments were performed in at least two biological replicates, but often three. In addition, we validated our key conclusions by qPCR or with orthogonal sequencing approaches.

      Reviewer #3 (Recommendations For The Authors):

      The authors provide strong evidence in support of their claims.

      ChIP-seq of Pol II Ser5P and Ser2P upon INTS11 depletion and CDK9 inhibition would strengthen the observation that transcription in the sense direction is more efficient.

      We view the analysis of total RNAPII as the most unbiased way of establishing how much RNAPII is going one way or the other. Importantly, ChIP-seq was very recently performed for Ser2p and Ser5p RNAPII derivatives in the lab of Fei Chen (Hu et al., Mol Cell 2023). Their data show that loss of INTS11 increases the occupancy of total RNAPII in the antisense direction more than in the sense direction, which is consistent with our finding. Interestingly, the increased antisense RNAPII was not paralleled by an increase in Ser2p or Ser5p. This suggests that, following INTS11 loss, the unattenuated antisense transcription is not associated with full/normal Ser2p or Ser5p. These modifications are normally established by CDK7 and CDK9; therefore, this published ChIP-seq suggests that these kinases are not fully active on antisense transcription when INTS11 is lost. This supports our overall model that CDK9 (and potentially CDK7, as suggested for a small number of genes in new Supplemental Figure 4E) is more active in the sense direction to prevent INTS11-dependent attenuation. We now discuss these data in our revision.

      In Supplementary Figure 2, eRNA expression increases upon INTS11 degradation. I wonder whether this affects transcription from the cognate promoters. Can the authors test some enhancer:promoter pairs?

      We noticed that some genes (e.g. MYC) that are regulated by enhancers show reduced transcription in the absence of INTS11. Whilst this could suggest a correlation, the transcription of other genes (e.g. ACTB and GAPDH) is also reduced by INTS11 loss although they are not regulated by enhancers.  A detailed and extensive analysis would be required to establish any link between INTS11-regulated enhancer transcription and the transcription of genes from their cognate promoters.  We agree that this would be interesting, but it seems beyond the scope of our short report on promoter directionality.

      Line 111, meta plot was done of 1316 genes. Details on this number should be provided. Overall, the details of methods and analysis need improvement. The layout of panels and labelling on graphs can be improved.

      We have now explained the 1316 gene set. In essence, these are the genes separated from an expressed neighbour by at least 10kb. This distance was selected because depletion of RBBP6 induces extensive read-through transcription beyond the polyadenylation site of protein-coding genes. To avoid including genes affected by transcriptional read-through from nearby transcription units, we selected those with a 10kb gap between them. This was the only selection criterion, so it is unlikely to introduce any unintended biases. Finally, we have added more information to the figure panels and their legends, which we hope will make our manuscript more accessible.
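      The isolation criterion described above can be sketched as a simple interval filter. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Gene` record, the `gap_bp` and `isolated_genes` helpers, the half-open [start, end) coordinates, and the choice to ignore strand are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record: half-open interval [start, end) in bp."""
    name: str
    chrom: str
    start: int
    end: int
    expressed: bool


def gap_bp(a: Gene, b: Gene) -> int:
    """Distance in bp between two intervals on the same chromosome (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def isolated_genes(genes: List[Gene], min_gap: int = 10_000) -> List[Gene]:
    """Keep expressed genes lying at least `min_gap` bp from every other
    expressed gene on the same chromosome (either strand).
    Unexpressed genes are neither candidates nor neighbours."""
    expressed = [g for g in genes if g.expressed]
    kept = []
    for g in expressed:
        # Quadratic scan: simple, and robust to nested or overlapping genes.
        neighbours = [h for h in expressed if h is not g and h.chrom == g.chrom]
        if all(gap_bp(g, h) >= min_gap for h in neighbours):
            kept.append(g)
    return kept


if __name__ == "__main__":
    demo = [
        Gene("A", "chr1", 0, 5_000, True),
        Gene("B", "chr1", 20_000, 30_000, True),
        Gene("C", "chr1", 32_000, 40_000, True),   # only 2 kb from B
        Gene("D", "chr1", 60_000, 70_000, False),  # not expressed, ignored
        Gene("E", "chr2", 0, 1_000, True),
    ]
    print([g.name for g in isolated_genes(demo)])  # ['A', 'E']
```

      The quadratic scan is chosen for clarity; at genome scale, an interval index or per-chromosome sorting would be faster.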

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary of the work: In this work, Fruchard et al. study the enzyme Tgt and how it modifies guanine in tRNAs to queuosine (Q), a modification essential for Vibrio cholerae's growth under aminoglycoside stress. Q's role in codon decoding efficiency and its proteomic effects during antibiotic exposure are examined, revealing that Q modification impacts tyrosine codon decoding and influences RsxA translation, affecting the SoxR oxidative stress response. The research proposes that regulation of Q modification by environmental cues reprograms the translation of genes with tyrosine codon bias, including DNA repair factors, which is crucial for the bacterial antibiotic response.

      The experiments are well-designed and conducted and the conclusions, for the most part, are well supported by the data. However, a few clarifications will significantly strengthen the manuscript.

      Thank you.

      Major:

      Figure S4 A-D. These growth curves are important data and should be presented in the main figures. Moreover, given that it is not possible to make an rsxA mutant, I wonder if it would be possible to connect rsx and tgt using the following experiment: expression of tgt results in resistance to TOB (in B), while expression of rsx alone lowers resistance to TOB (in D). Simultaneous overexpression of both tgt and rsx in the WT strain should then have either no effect on TOB resistance or increased resistance, relative to the WT. Perhaps the authors have done this, and if so, the data should be included as it will significantly strengthen their model.

      We thank the reviewer for this suggestion, we have tried to overexpress both tgt and rsxA simultaneously. However, this appears to be toxic as cells form small colonies and cannot grow well in liquid. We think that the presence of 2 plasmids and corresponding selection antibiotics amplify the toxicity of overexpressing rsxA, and even tgt. In fact, it can be seen that tgt overexpression in WT is already slightly deleterious, in the absence of tobramycin (figure 1B).

      Figure S4 - Is there a rationale for why it is possible to make rsx mutants in E. coli, but not in V. cholerae? For example, does E. coli have a second gene/protein that is redundant in function to rsxA, while V. cholerae does not? I think your data hint at this, since in the right panel growth data, your double mutant does not fully rescue back to rsx single mutant levels, suggesting another factor in tgt mutant also acts to lower resistance to TOB. If so, perhaps a line or two in text will be helpful for readers.

      This point raised by the referee is an interesting one that we have also asked ourselves on multiple occasions. In fact, the Rsx operon is linked with oxidative stress and respiration. Vibrio cholerae and E. coli show differences in genes involved in these pathways. V. cholerae lacks the cyo/nuo respiratory complex genes and does not encode a Suf operon. Moreover, deletion of the anaerobic respiration Frd pathway leads to a strong decrease in V. cholerae growth even in aerobic conditions (10.1128/spectrum.01730-23). We have previously also seen general differences between the two species in their response to stress (10.1128/AAC.01549-10) and in the way they deal with ROS (10.1371/journal.pgen.1003421). Therefore, we think that the fact that rsx is essential in V. cholerae and not in E. coli could be due either to the presence of an additional redundant pathway in E. coli, as suggested by the referee, or to more general differences in respiration and handling of ROS. We thank the referee for highlighting this and we have now included a comment about this in the manuscript.

      - For growth curves in Figure 2 and relative comparisons like in Figure 5D and Figure S4 (and others in the paper), statistics and error bars, along with replicate information should be provided.

      This information was given in the Methods section; we have now also added it to the figure legends.

      - Figure 6A - Is the transcript fold change linear or log-scaled? If linear, then tgt expression should not be classified as being upregulated in TOB. It is barely up by ~2-fold with TOB 0.6, which is a mild phenotype at best.

      We think that a 2-fold change in tgt expression can be sufficient to change tRNA modification levels. We agree that this is a mild induction; we have thus changed “increase” to “mildly increase” in the results.

      - Line 779- 780: "This indicates that sub-MIC TOB possibly induces tgt expression through the stringent response activation." To me, the data presented in this figure, do not support this statement. The experiment is indirect.

      We agree; we rephrased: “Tobramycin may induce tgt expression through stringent response activation or through an independent pathway.”

      - Figure 3B and D. - These samples only have tobramycin, correct? The legend says both carbenicillin and tobramycin.

      The legend is correct: samples also contain carbenicillin, because here we are testing growth with 2 synonymous beta-lactamase genes in the presence of beta-lactams.

      - Figure 5. The color schemes in bars do not match up with the color scheme in cartoons below panels B and C. That makes it confusing to read. Please fix.

      Fixed.

      - Many abbreviations have been used, which makes reading a bit cumbersome. Ideally, fewer abbreviations would be used.

      Fixed

      Reviewer #2 (Public Review):

      Fruchard et al. investigate the role of the queuosine (Q) modification of the tRNA (Q-tRNA) in the human pathogen Vibrio cholerae. First, the authors state that the absence of Q-modified tRNAs (tgt mutant) increases the translation of TAT codons and proteins with a high TAT codon bias. Second, the absence of Q increases rsxA translation, because rsxA gene has a high TAT codon bias. Third, increased RsxA in the absence of Q inhibits SoxR response, reducing resistance towards the antibiotic tobramycin (TOB). Authors also predict in silico which genes harbor a higher TAT bias and found that among them are some involved in DNA repair, experimentally observing that a tgt mutant is more resistant to UV than the wt strain. It is worth noting that authors employ a wide variety of techniques, both experimental and bioinformatic. However, some aspects of the work need to be clarified or reevaluated.
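As a minimal illustration of what a per-gene tyrosine codon bias can mean, the sketch below computes the fraction of tyrosine codons that are TAT in a coding sequence. The metric (TAT / (TAT + TAC)) and the helper name `tat_fraction` are assumptions for illustration only; the in silico pipeline used in the manuscript may define and normalize codon bias differently.

```python
def tat_fraction(cds: str) -> float:
    """Return TAT / (TAT + TAC) over the in-frame codons of a CDS.

    Hypothetical helper: reads the sequence in frame 0, accepts RNA
    (U) or DNA (T) input, and ignores any trailing partial codon.
    """
    seq = cds.upper().replace("U", "T")
    # Split into consecutive, non-overlapping codons starting at position 0
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    tat = codons.count("TAT")
    tac = codons.count("TAC")
    total = tat + tac
    if total == 0:
        raise ValueError("sequence contains no tyrosine codons")
    return tat / total
```

For example, `tat_fraction("ATGTATTACTATTAA")` returns 2/3, since two of its three tyrosine codons are TAT.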

      (1) The statement that the absence of Q increases the translation of TAT codons and proteins encoded by TAT-enriched genes presents the following problems that should be addressed:

      (1.1) The increase in TAT codon translation in the absence of Q is not supported by proteomics, since no statistically significant difference in TAT codon usage was detected among differentially expressed proteins. Furthermore, there are some problems regarding the statistics of proteomics. Some proteins shown in Table S1 have adjusted p-values higher than their p-values, which makes no sense. Maybe there is a mistake in the adjusted p-value calculation.

      We appreciate the reviewer’s thorough examination of our findings. In our study, we employed an adaptive Benjamini-Hochberg (BH) procedure to control the false discovery rate in our list of selected proteins, as explained in the Data Analysis section of the Proteomics MS and analysis part of our Materials and Methods. The classical BH procedure (10.1111/j.2517-6161.1995.tb02031.x) calculates the adjusted p-value for the i-th ranked p-value as $\tilde{p}_{(i)} = \min_{j \geq i} \frac{m \, p_{(j)}}{j}$, where $p_{(j)}$ is the j-th ranked p-value and $m$ is the number of tests (e.g. number of proteins) (see 10.1021/acs.jproteome.7b00170 for details). Since $m/j \geq 1$ and $p_{(j)} \geq p_{(i)}$ for $j \geq i$, it follows that $\frac{m \, p_{(j)}}{j} \geq p_{(i)}$ for all $j \geq i$, so adjusted p-values are always higher than or equal to the original p-values. Therefore, contrary to the reviewer's comment, it is a mathematical property of the classical Benjamini-Hochberg procedure that the adjusted p-value is greater than or equal to the original p-value.

      However, we want to underline that we used an « adaptive » BH procedure, which calculates the adjusted p-value for the i-th ranked p-value as $\tilde{p}_{(i)} = \min_{j \geq i} \frac{\pi_0 \, m \, p_{(j)}}{j}$, where $\pi_0$ is an estimate of the proportion of true null hypotheses (see 10.1021/acs.jproteome.7b00170 for details). Indeed, the classical BH procedure assumes that $\pi_0 = 1$, which is a strong assumption in an MS-based proteomics context. Consequently, because $\pi_0 \, m / j$ can be smaller than 1, the property that the adjusted p-value is greater than or equal to the original p-value does not always hold in our approach (it also depends on the $\pi_0$ parameter).
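To make the difference between the classical and adaptive procedures concrete, here is a minimal Python sketch of BH adjusted p-values with an optional π0 factor, following the BH definition described in this response. It is an illustration under stated assumptions, not the software used for the analysis: π0 is passed in as a fixed value rather than estimated from the data, adjusted values are capped at 1 (the usual convention), and the function name `bh_adjust` is hypothetical.

```python
import numpy as np


def bh_adjust(pvalues, pi0=1.0):
    """Benjamini-Hochberg adjusted p-values, optionally adaptive.

    pi0 = 1.0 gives the classical BH procedure; pi0 < 1 multiplies by an
    estimate of the proportion of true null hypotheses (supplied here,
    not estimated).
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                   # indices that sort p ascending
    ranks = np.arange(1, m + 1)             # j = 1..m
    scaled = pi0 * m * p[order] / ranks     # pi0 * m * p_(j) / j
    # min over j >= i: cumulative minimum taken from the largest rank down
    adj_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adj_sorted = np.minimum(adj_sorted, 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adj_sorted            # restore the input order
    return adjusted
```

With the toy p-values [0.01, 0.04, 0.03, 0.2], `pi0=1.0` never returns a value below the raw p-value, whereas `pi0=0.5` returns 0.1 for the last test (raw p-value 0.2).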

      In addition, it is not common to assume that proteins that are quantitatively present in one condition and absent in another are differentially abundant proteins. Proteomics data software typically addresses this issue and applies some corrections. It would be advisable to review that.

      We thank the reviewer for highlighting this point. Indeed, some software packages impute a random small value to replace missing values and then produce statistics based on these imputed data (10.1038/nmeth.3901). However, the validity and relevance of generating statistics in the absence of actual data is questionable.

      There are no universally accepted guidelines for handling this situation, and we believe it is more logical to set these values aside as potential interesting proteins. It is well-established that intensity values are often missing due to the detection limits of the spectrometer, suggesting that the missing values observed in several replicates of a condition are actually due to low values (see 10.1093/bioinformatics/btp362 and 10.1093/bioinformatics/bts193 for instance). It is thus logical to consider the associated proteins as potentially differentially abundant when comparing their complete absence in all replicates of one condition to their presence in several replicates of another condition.
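The presence/absence rule described here can be written as a simple filter. In this sketch (hypothetical data layout and function name), missing intensities are `None`, and a protein is set aside as potentially differentially abundant when it is absent from every replicate of one condition but quantified in at least `min_present` replicates of the other; the threshold of 2 is an assumption, not the value used in the study.

```python
def presence_absence_candidates(intensities_a, intensities_b, min_present=2):
    """Flag proteins quantified in one condition but missing in all
    replicates of the other.

    intensities_a / intensities_b: dict mapping protein -> list of
    replicate intensities, with None marking a missing value.
    """
    candidates = {}
    # Only proteins reported in both tables can be compared
    for protein in intensities_a.keys() & intensities_b.keys():
        n_a = sum(v is not None for v in intensities_a[protein])
        n_b = sum(v is not None for v in intensities_b[protein])
        if n_a == 0 and n_b >= min_present:
            candidates[protein] = "only in B"
        elif n_b == 0 and n_a >= min_present:
            candidates[protein] = "only in A"
    return candidates
```

Proteins missing in both conditions, or detected in too few replicates of the "present" condition, are simply not flagged.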

      (1.2) Problems with the interpretation of Ribo-seq data (Figure 4D). On the one hand, the Ribo-seq data should be corrected (normalized) with the RNA-seq data in each of the conditions to obtain ribosome profiling data, since some genes could be more transcribed in some of the conditions studied. In other articles in which this technique is used (such as Tuorto et al., EMBO J. 2018; doi: 10.15252/embj.201899777), the positions at which the ribosome moves most slowly (and which are therefore less efficiently translated) are interpreted as the most abundant. Assuming this interpretation, according to the hypothesis proposed in this work, the fragments enriched in TAT codons should have been less abundant in the absence of Q-tRNA (tgt mutant) in the Ribo-seq experiment. However, TAT-enriched fragments are observed to be more abundant in the tgt mutant, and yet the Ribo-seq results are interpreted like RNA-seq, stating that this is because the genes corresponding to those sequences have greater expression in the absence of Q.

      As recommended by the reviewer, we normalized the RiboSeq data with the RNAseq data to account for potential RNA variations. The updated Figure 4 demonstrates that this normalization does not alter our findings, confirming that variations at the RNAseq level do not contradict changes at the translational level. 
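A minimal sketch of this normalization, assuming normalized (e.g. per-million) counts per gene: translation efficiency is taken as the Ribo-seq/RNA-seq ratio, and the change between strains as the log2 ratio of the two efficiencies. The pseudocount and the function name are illustrative assumptions; dedicated differential translation-efficiency tools also model count noise statistically, which this simple ratio does not.

```python
import math


def log2_te_change(ribo_mut, rna_mut, ribo_wt, rna_wt, pseudocount=0.5):
    """log2 fold change in translation efficiency (Ribo-seq / RNA-seq).

    Inputs are normalized counts for a single gene; the pseudocount is
    an assumption that avoids division by zero for unexpressed genes.
    """
    te_mut = (ribo_mut + pseudocount) / (rna_mut + pseudocount)
    te_wt = (ribo_wt + pseudocount) / (rna_wt + pseudocount)
    return math.log2(te_mut / te_wt)
```

A gene whose Ribo-seq and RNA-seq signals both double gives a change of 0, so a pure transcriptional increase is normalized out; only a rise in ribosome footprints relative to mRNA gives a positive value.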

      The reviewer's observation that pauses at TAT codons would lead to ribosome accumulation and subsequent categorization as "up" genes is accurate. We must emphasize, however, that this category of “up genes” is probably quite diverse. The effect of ribosome stalling at TAT codons on total mRNA ribosome occupancy is likely highly variable, depending on the location of the TAT codon(s) within the CDS and the gene's expression level. We therefore think that genes in the "Up" category mainly correspond to genes that are more translated, because the impact of pausing at TAT codons is probably not strong enough to account for their increased ribosome occupancy. Note that, unlike what is usually done in bacterial RiboSeq experiments, we did not use any antibiotics to artificially freeze the ribosomes.

      On the other hand, it would be interesting to calculate the mean of the protein levels encoded by the transcripts with high and low ribosome profiling data.

      While this is a common request, we believe that comparing RiboSeq and proteomics data is not particularly informative. RiboSeq data directly measure translation, while proteomics provides information about protein abundance at steady state, reflecting the balance between protein synthesis and degradation. Furthermore, the number of proteins detectable by mass spectrometry is significantly smaller than the number of genes quantified by RiboSeq. Given these factors, there is often a low correlation between translation and protein abundance, making a direct comparison less relevant.

      (1.3) This statement is contrary to most previously reported studies on this topic in eukaryotes and bacteria, in which ribosome profiling experiments, among others, indicate that translation of TAT codons is slower than (or unaffected relative to) translation of TAC codons, and the same phenomenon is observed for the rest of the NAC/T codons. This is completely opposed to the results shown in Figure 4. However, the results of these studies are neither mentioned nor discussed in this work. Some examples of articles that should be discussed in this work:

      - "Queuosine-modified tRNAs confer nutritional control of protein translation" (Tuorto et al., 2018; 10.15252/embj.201899777)

      - "Preferential import of queuosine-modified tRNAs into Trypanosoma brucei mitochondrion is critical for organellar protein synthesis" (Kulkarni et al., 2021; doi:10.1093/nar/gkab567)

      - "Queuosine-tRNA promotes sex-dependent learning and memory formation by maintaining codon-biased translation elongation speed" (Cirzi et al., 2023; 10.15252/embj.2022112507)

      - "Glycosylated queuosines in tRNAs optimize translational rate and post-embryonic growth" (Zhao et al., 2023; 10.1016/j.cell.2023.10.026)

      - "tRNA queuosine modification is involved in biofilm formation and virulence in bacteria" (Diaz-Rullo and Gonzalez-Pastor, 2023; doi: 10.1093/nar/gkad667). In this work, the authors indicate that Q-tRNA increases NAT codon translation in most bacterial species. Could the regulation of TAT codon-enriched proteins by Q-tRNAs in V. cholerae be an exception? In addition, those authors used a bioinformatic method similar to the one used in this work to identify genes enriched in NAT codons, and to determine the biological processes in which the genes whose expression is affected by Q-tRNAs are involved (as discussed for the phenotype of UV resistance). It would be worth discussing all of this.

      Thank you for the detailed suggestions. We agree that this discussion was missing, and this comment gives us a chance to address it in the revised version of the manuscript.

      Regarding the references suggested by the referee, 4 of these papers were not mentioned in our manuscript: they were published while our manuscript was previously under review, and we realize that we had not cited them in the latest version. We thank the referee for highlighting this. We have now included a discussion of these studies.

      We included the following in the discussion:

      “However, the opposite codon preference was shown in E. coli {Diaz-Rullo, 2023 #1888}. In eukaryotes also, several recent studies indicate slower translation of U-ending codons in the absence of Q34 {Cirzi, 2023 #1887;Kulkarni, 2021 #1886;Tuorto, 2018 #1268}. It is important to note here that in V. cholerae ∆tgt, increased decoding of U-ending codons is observed only with tyrosine, and not with the other three NAC/U codons (Histidine, Aspartate, Asparagine). This is interesting because it suggests that what we observe with tyrosine may not adhere to a general rule about the decoding efficiency of U- or C-ending codons, but instead seems to be specific to Tyr tRNAs, at least in the context of V. cholerae. Exceptions may also exist in other organisms. For example, in human cells, queuosine increases the efficiency of decoding of U-ending codons and slows decoding of C-ending codons, except for AAC {Zhao, 2023 #1889}. In this case, the exception is for tRNA Asparagine. Moreover, in mammalian cells {Tuorto, 2018 #1268}, ribosome pausing at U-ending codons is strongly seen for Asp, His and Asn, but less so for Tyr. In Trypanosoma {Kulkarni, 2021 #1886}, reporters with a combination of the 4 NAC/NAU codons for Asp, Asn, Tyr and His have been tested, showing slow translation of the U-ending version of the reporter in the absence of Q, but the effect on individual codons (e.g. Tyr only) was not tested. In mice {Cirzi, 2023 #1887}, ribosome slowdown is seen for the Asn, Asp and His U-ending codons but not for the Tyr U-ending codon. In summary, Q generally increases the efficiency of decoding of U-ending codons in multiple organisms, but there appear to be additional unknown parameters that affect tyrosine UAU decoding, at least in V. cholerae. Additional factors such as mRNA secondary structures or mistranslation may also contribute to the better translation of UAU versions of tested genes. Mistranslation could be an important factor: if codon decoding fidelity impacts decoding speed, then mistranslation could also contribute to the decoding efficiency of Tyr UAU/UAC codons and to proteome composition.”

      (1.4) It is proposed that the stress produced by the TOB antibiotic causes greater translation of genes enriched in TAT codons. 

      Actually, it is the opposite: in the presence of TOB, tgt would be induced in the WT, leading to more Q on tRNA-Tyr and less translation of TAT.

      On the one hand, it is shown that the GFP-TAT version (gene enriched in TAT codons) and the RsxA-TAT-GFP protein (native gene naturally enriched in TAT) are expressed more than their TAC-enriched versions in a tgt mutant compared with a wt, in the presence of TOB (Fig. 5C).

      Figure 5C shows relative fluorescence, i.e. changes in fluorescence in delta-tgt compared to WT. So the TAT versions are not necessarily more expressed, but rather “more increased”.
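The distinction between a relative and an absolute increase can be illustrated with purely hypothetical numbers (arbitrary units, not data from the study): a construct can have the larger ∆tgt/WT ratio while still being less expressed in absolute terms. The function name below is an assumption for illustration.

```python
from statistics import mean


def relative_fluorescence(mutant, wild_type):
    """Mean mutant fluorescence divided by mean WT fluorescence."""
    return mean(mutant) / mean(wild_type)


# Purely hypothetical replicate values (arbitrary units), not study data.
tat_wt, tat_tgt = [100, 102, 98], [150, 148, 152]
tac_wt, tac_tgt = [300, 305, 295], [330, 335, 325]

rel_tat = relative_fluorescence(tat_tgt, tat_wt)   # 150 / 100 = 1.5
rel_tac = relative_fluorescence(tac_tgt, tac_wt)   # 330 / 300 = 1.1
```

Here the TAT construct is "more increased" (1.5 vs 1.1) even though its absolute signal in the mutant (150) stays below that of the TAC construct (330).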

      However, in the absence of TOB and in a wt context, although the two versions of GFP have a similar expression level (Fig. S3D), the same does not occur with RsxA, whose RsxA-TAT form (the native one) is expressed significantly more than the RsxA-TAC version (Fig. S3A). How can it be explained that in a wt context, in which tRNAs are also Q-modified, a gene naturally enriched in TAT is translated better than the same gene enriched in TAC?

      We thank the referee for this question based on careful assessment of our data. We agree, there appears to be significantly more RsxA-TAT in WT than RsxA-TAC. This could be due to other effects such as secondary structure formation on mRNA when the wt RsxA is recoded with TAC codons. This does not hinder the conclusion that the translation of the TAT version is increased in delta-tgt compared to WT.  

      It would be expected that in the presence of Q-tRNAs the two versions would be translated equally (as happens with GFP) or even that the TAT version would be less translated. On the other hand, in the presence of TOB the fluorescence of WT GFP(TAT) is higher than the fluorescence of WT GFP(TAC) (Figure S3E) (mean fluorescence data for the RsxA-GFP versions in the presence of TOB are not shown). These results may indicate that the apparent better translation of TAT versions could be due to indirect effects rather than to TAT codon translation.

      This is now mentioned in the manuscript

      “We cannot exclude, however, that additional factors such as mRNA secondary structures also contribute to the better translation of UAU versions of tested genes.”

      (2) Another problem is related to the already known role of Q in preventing stop codon readthrough, which is not discussed at all in the work. In the absence of Q, stop codon readthrough is increased. In addition, it is known that aminoglycosides (such as tobramycin) also increase stop codon readthrough ("Stop codon context influences genome-wide stimulation of termination codon readthrough by aminoglycosides"; Wangen and Green, 2020; 10.7554/eLife.52611). Absence of Q and presence of aminoglycosides can be synergistic, producing devastating increases in stop codon readthrough and a large alteration of global gene expression. All of this needs to be discussed in the work. Moreover, it is known that stop codon readthrough can alter gene expression, and that mRNA sequence context influences the likelihood of stop codon readthrough. Thus, this process could also affect the expression of the recoded GFP and RsxA versions.

      We included the following in the revised version of the manuscript (results):

      “Q modification impacts decoding fidelity in V. cholerae.

      To test whether a defect in Q34 modification influences the fidelity of translation in the presence and absence of tobramycin, we used previously developed reporter tools (Fabret & Namy, 2021) to measure stop codon readthrough in V. cholerae ∆tgt and wild-type strains. The system consists of vectors containing readthrough-promoting signals inserted between the lacZ and luc sequences, encoding β-galactosidase and luciferase, respectively. Luciferase activity reflects the readthrough efficiency, while β-galactosidase activity serves as an internal control of expression level, integrating a number of possible sources of variability (plasmid copy number, transcriptional activity, mRNA stability, and translation rate). We found increased readthrough at stop codons UAA and, to a lesser extent, UAG in ∆tgt, and this increase was amplified for UAG in the presence of tobramycin (Fig. S2, stop readthrough). In the case of UAA, tobramycin appears to decrease readthrough; this may be artefactual, due to the toxic effect of tobramycin on ∆tgt.

      Mistranslation at specific codons can also impact protein synthesis. To further investigate mistranslation levels by tRNATyr in WT and ∆tgt, we designed a set of gfp mutants where the codon for the catalytic tyrosine required for fluorescence (TAT at position 66) was substituted by near-cognate codons (Fig. S2). Results suggest that in this sequence context, particularly in the presence of tobramycin, non-modified tRNATyr mistakenly decodes Asp GAC, His CAC and also Ser UCC, Ala GCU, Gly GGU, Leu CUU and Val GUC codons, suggesting that Q34 increases the fidelity of tRNATyr. 

      In parallel, we replaced Tyr103 of the β-lactamase described above with Asp codons GAT or GAC. The expression of the resulting mutant β-lactamase is expected to yield a carbenicillin-sensitive phenotype. In this system, increased tyrosine misincorporation (more mistakes) by tRNATyr at the mutated Asp codon will lead to increased synthesis of active β-lactamase, which can be evaluated by carbenicillin tolerance tests. As such, amino-acid misincorporation leads here to phenotypic (transient) tolerance, while genetic reversion mutations result in resistance (growth on carbenicillin). The rationale is summarized in Fig. 3C. When the Tyr103 codon was replaced with either Asp codon, we observed increased carbenicillin tolerance (Fig. 3D, left), suggesting increased misincorporation of tyrosine by tRNATyr at Asp codons in the absence of Q, and again indicating that Q34 prevents misdecoding of Asp codons by tRNATyr.

      In order to test any effect on an additional tRNA modified by Tgt, namely tRNAAsp, we mutated the Asp129 (GAT) codon of the β-lactamase. When Asp129 was mutated to Tyr TAT (Fig. 3D, right), we observe reduced tolerance in ∆tgt, but not when it was mutated to Tyr TAC, suggesting less misincorporation of aspartate by tRNAAsp at the Tyr UAU codon in the absence of Q. In summary, absence of Q34 increases misdecoding by tRNATyr at Asp codons, but decreases misdecoding by tRNAAsp at Tyr UAU. 

      This supports the fact that tRNA Q34 modification is involved in translation fidelity during antibiotic stress, and that the effects can be different on different tRNAs, e.g. tRNATyr and tRNAAsp tested here.”

      Added figures: Figure S2, Figure 3CD
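The dual-reporter readout described in the quoted results is commonly summarized as a ratio of ratios. The sketch below assumes, as is usual for lacZ-luc readthrough vectors but not stated in this response, that the luciferase/β-galactosidase ratio of each test construct is expressed relative to an in-frame control construct in which the stop signal is replaced by a sense codon; the function name is hypothetical.

```python
def readthrough_percent(luc_test, bgal_test, luc_inframe, bgal_inframe):
    """Stop codon readthrough as a percentage of an in-frame control.

    Luciferase (downstream of the stop signal) is normalized to
    beta-galactosidase (upstream, internal expression control); the
    ratio is then expressed relative to an assumed in-frame control
    construct lacking the stop codon.
    """
    ratio_test = luc_test / bgal_test
    ratio_control = luc_inframe / bgal_inframe
    return 100.0 * ratio_test / ratio_control
```

Because β-galactosidase activity enters the denominator, a construct expressed twice as strongly gives twice the luciferase and twice the β-galactosidase, and therefore the same readthrough estimate.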

      (3) The statement that TOB resistance depends on RsxA translation, which is related to the presence of Q, also presents some problems:

      (3.1) It is observed that the absence of tgt produces a growth defect in V. cholerae when exposed to TOB (Figure 1A), and it is stated that this is mediated by an increase in the translation of RsxA, because its gene is TAT enriched. However, in Figure S4F, it is shown that the same phenotype is observed in E. coli, but its rsxA gene is not enriched in TAT codons. Therefore, the growth defect observed in the tgt mutant in the presence of TOB may not be due to the increase in the translation of TAT codons of the rsxA gene in the absence of Q. This phenotype is very interesting, but it may be related to another molecular process regulated by Q. Maybe the role of Q in preventing stop codon readthrough is important in this process, reducing cellular stress in the presence of TOB and allowing better growth.

      Fig. S4F (now Figure 5D) shows that rsxA can be toxic during growth in the presence of tobramycin, but it does not show that rsxA translation is increased in E. coli delta-tgt. However, we agree with the referee that there are probably additional processes regulated by Q that are also involved in the response to TOB stress. We had already mentioned this briefly in the discussion (“Note that, our results do not exclude the involvement of additional Q-regulated MoTTs in the response to sub-MIC TOB, since Q modification leads to reprogramming of the whole proteome.”), and we now discuss it further as follows:

      “As a consequence, transcripts with tyrosine codon usage bias are differentially translated. One such transcript codes for RsxA, an anti-SoxR factor. SoxR controls a regulon involved in oxidative stress response and sub-MIC aminoglycosides trigger oxidative stress in V. cholerae {Baharoglu, 2013 #720}, pointing to an involvement of oxidative stress response in the response to sub-MIC tobramycin stress.

      A link between Q34 and oxidative stress has also been previously found in eukaryotic organisms {Nagaraja, 2021 #1466}. Note that our results do not exclude the involvement of additional Q-regulated translation of other transcripts in the response to tobramycin. Q34 modification leads to reprogramming of the whole proteome, not only for other transcripts with codon usage bias, but also through an impact on the levels of stop codon readthrough and mistranslation at specific codons, as supported by our data.”

      (3.2) All experiments related to the effect of Q on the translation of TAT codons have been performed with the tgt mutant strain. Considering that the authors have a pSEVA-tgt plasmid to overexpress this gene, they would have to show whether tgt overexpression in a wt strain produces a decrease in the translation of proteins encoded by TAT-enriched genes such as RsxA. This experiment would allow them to conclude that Q reduces RsxA levels, increasing resistance to TOB.

      We agree that this would be interesting to test. However, as can be seen in Figure 1B, delta-tgt pSEVA-tgt (complemented strain) grows better than WT pSEVA-tgt (tgt overexpression). In fact, overexpression of tgt negatively impacts cell growth and yields smaller colonies, especially when cells carry a second plasmid (e.g. with gfp constructs). We have also seen this with overexpression of other RNA modification genes in the lab (unpublished). We believe that the expression of tgt is finely tuned, and since overexpression affects fitness, it is generally difficult to conduct experiments with overexpression plasmids for RNA modification genes. Nevertheless, we have done the experiment (with slow-growing bacteria), and when we normalize gfp expression in the presence of the tgt overexpression plasmid to the condition with no plasmid, we see little (1.5-fold) or no effect of tgt overexpression on fluorescence (see graph below). This is probably due to a toxic effect of overexpression, and we do not believe these results are biologically relevant.

      Author response image 1.

      (3.3) On the other hand, Fig. 1B shows that when the wt and tgt strains compete, both overexpressing tgt, the tgt mutant strain grows better in the presence of TOB. This result is not very well understood, since according to the hypothesis proposed, the absence of modification by Q of the tRNA would increase the translation of genes enriched in TAT, therefore, a strain with a higher proportion of Q-modified tRNAs as in the case of the wt strain overexpressing tgt would express the rsxA gene less than the tgt strain overexpressing tgt and would therefore grow better in the presence of TOB. For all these reasons, it would be necessary to evaluate the effect of tgt overexpression on the translation of RsxA.

      See our answer above about negative effect of tgt overexpression.

      (3.4) According to Figure 1I, the overexpression of tRNA-Tyr(GUA) caused better growth of the tgt mutant in comparison to WT. If the growth defect observed in the tgt mutant in the presence of TOB is due to better translation of the TAT codons of the rsxA gene, the overexpression of tRNA-Tyr(GUA) in the tgt mutant should have resulted in even better RsxA translation and worse growth, not the opposite result.

      We agree; we think that rsxA is not the only factor responsible for the growth defect of tgt in the presence of TOB (as now further discussed in the discussion). Overexpression of tRNA-Tyr possibly changes the equilibrium between the decoding of TAC vs TAT and may restore translation of TAC-enriched genes. As also suggested by Reviewer #3, we have measured decoding reporters for TAT/TAC while overexpressing tRNA-Tyr. This is now added to the results in fig S2C and the following:

      “We also tested decoding reporters for TAT/TAC in WT and ∆tgt overexpressing tRNATyr in trans (Fig. S1C). The presence of the plasmid (empty p0) amplified differences between the two strains with decreased decoding of TAC (and increased TAT, as expected) in ∆tgt compared to WT. Overexpression of tRNATyrGUA did not significantly impact decoding of TAT and increased decoding of TAC, as expected. Since overexpression of tRNATyrGUA rescues ∆tgt in tobramycin (Fig. 1I) and facilitates TAC decoding, this suggests that issues with TAC codon decoding contribute to the fitness defect observed in ∆tgt upon growth with tobramycin. Overexpression of tRNATyrAUA increased decoding of TAT in WT but did not change it in ∆tgt where it is already high. Unexpectedly, overexpression of tRNATyrAUA also increased decoding of TAC in WT. Thus, overexpression of tRNATyrAUA possibly changes the equilibrium between the decoding of TAC vs TAT and may restore translation of TAC enriched transcripts.” 

      Added figure: figure S1C

      (4) It cannot be stated that DNA repair is more efficient in the tgt mutant of V. cholerae, as indicated in the text of the article and in Fig 7. The authors only observe that the tgt mutant is more resistant to UV radiation and it is suggested that the reason may be TAT bias of DNA repair genes. To validate the hypothesis that UV resistance is increased because DNA repair genes are TAT biased, it would be necessary to check if DNA repair is affected by Q. UV not only produces DNA damage, but also oxidative stress. Therefore, maybe this phenotype is due to the increase in proteins related to oxidative stress controlled by RsxA, such as the superoxide dismutase encoded by sodA. It is also stated that these repair genes were found up for the tgt mutant in the Ribo-seq data, with unchanged transcription levels. Again, it is necessary to clarify this interpretation of the Ribo-seq data, since the fact that they are more represented in a tgt mutant perhaps means that translation is slower in those transcripts. Has it been observed in proteomics (wt vs tgt in the absence of TOB) whether these proteins involved in repair are more expressed in a tgt mutant?

      We agree that our results do not directly show that DNA repair is more efficient, only that delta-tgt responds better to UV. This has been modified in the manuscript. Regarding oxidative stress, we did not see a better or worse response of delta-tgt to H2O2. Moreover, since we see a better response of delta-tgt to UV only in V. cholerae and not in E. coli, we did not favor the hypothesis of an oxidative stress response. In proteomics, we do not detect changes for DNA repair genes except for RuvA, which is more abundant in delta-tgt. We have toned down the statement about DNA repair in the paper.

      (5) The authors demonstrate that in E. coli the tgt mutant does not show greater resistance to UV radiation (Fig. 7D), unlike what happens in V. cholerae. It should be discussed that in previous works it has been observed that overexpression in E. coli of the tgt gene or the queF gene (Q biosynthesis) is involved in greater resistance to UV radiation (Morgante et al., Environ Microbiol, 2015 doi: 10.1111/1462-2920.12505; and Díaz-Rullo et al., Front Microbiol. 2021 doi: 10.3389/fmicb.2021.723874). As an explanation, it was proposed (Diaz-Rullo and Gonzalez-Pastor, NAR 2023 doi: 10.1093/nar/gkad667) that the observed increase in the capacity to form biofilms in strains that overexpress genes related to Q modification of tRNA would be related to this greater resistance to UV radiation.

      We now mention the previous observations suggesting a link between tgt and UV, and we thank the referee for the references, which we had overlooked. Note that in our experiments, all cultures are planktonic and are not allowed to form biofilms. We thus prefer not to discuss biofilm-linked processes in this study.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript the authors begin with the interesting phenotype of sub-inhibitory concentrations of the aminoglycoside tobramycin proving toxic to a knockout of the tRNA-guanine transglycosylase (Tgt) of the important human pathogen, Vibrio cholerae. Tgt is important for incorporating queuosine (Q) in place of guanosine at the wobble position of tRNAs with GUN anticodons. The authors go on to define a mechanism of action where environmental stressors control expression of tgt to control translational decoding of particularly tyrosine codons, skewing the balance from TAC towards TAT decoding in the absence of the enzyme. The authors use advanced proteomics and ribosome profiling to reveal that the loss of tgt results in increased translation of proteins like RsxA and a cohort of DNA repair factors, whose genes harbor an excess of TAT codons in many cases. These findings are bolstered by a series of molecular reporters, mass spectrometry, and tRNA overexpression strains to provide support for a model where Tgt serves as a molecular pivot point to reprogram translational output in response to stress.

      Strengths:

      The manuscript has many strengths. The authors use a variety of strains, assays, and advanced techniques to discover a mechanism of action for Tgt in mediating tolerance to sub-inhibitory concentrations of tobramycin. They observe a clear phenotype for a tRNA modification in facilitating reprogramming of the translational response, and the manuscript certainly has value in defining how microbes tolerate antibiotics.

      We thank the referee for their time and comments. 

      Weaknesses:

      The conclusions of the manuscript are mostly very well-supported by the data, but in some places control experiments or peripheral findings cloud precise conclusions. Some additional clarification, discussion, or even experimental extension could be useful in strengthening these areas.

      (1) The authors have created and used a variety of relevant molecular tools. In some cases, using these tools in additional assays as controls would be helpful. For example, testing for compensation of the observed phenotypes by overexpression of the Tyrosine tRNA(GUA) in Figure 2A with the 6xTAT strain, Figure 5C with the rsxA-GFP fusion, and/or Figure 7B with UV stress would provide additional information on the ability of tRNA overexpression to compensate for the defect in these situations.

      Thank you for the suggestions. Since overexpression of tRNA-Tyr is not expected to decrease decoding of TAT, we do not necessarily expect any effect on UV resistance or rsxA expression. Overexpression of tRNA_GUA restores the fitness of ∆tgt in tobramycin, but this is probably independent of RsxA. As Reviewer #2 also suggested above, we now state in the Discussion that the effect seen in ∆tgt with tobramycin is not due only to RsxA expression but also to additional processes. Nevertheless, these suggestions are interesting, and we performed the following experiments to address these questions: 

      - “testing for compensation of the observed phenotypes by overexpression of the Tyrosine tRNA(GUA) in Figure 2A with the 6xTAT strain”: 

      This is now included in Figure S2C and in the Results section, as follows: 

      “We also tested decoding reporters for TAT/TAC in WT and ∆tgt overexpressing tRNA-Tyr in trans (Fig. S1C). The presence of the plasmid amplified differences between the two strains, with decreased decoding of TAC (and increased TAT, as expected) in ∆tgt carrying the empty plasmid compared to WT. Overexpression of tRNA_TyrGUA did not significantly impact decoding of TAT and increased decoding of TAC, as expected. Since overexpression of tRNA_TyrGUA rescues ∆tgt in tobramycin (Fig. 1I) and facilitates TAC decoding, this suggests that issues with TAC codon decoding contribute to the fitness defect observed in ∆tgt upon growth with tobramycin. Overexpression of tRNA_TyrAUA increased decoding of TAT in WT but did not change it in ∆tgt, where it is already high. Interestingly, overexpression of TyrAUA also increased decoding of TAC in WT. Thus, overexpression of tRNA_TyrAUA possibly changes the equilibrium between the decoding of TAC vs TAT and may restore translation of TAC-enriched transcripts.”

      - Figure 5C with the rsxA-GFP fusion: 

      When we overexpress tRNA_GUA, rsxA-GFP fluorescence is 2-fold higher in ∆tgt than in WT. However, fluorescence is strongly decreased compared to the condition without tRNA overexpression. Because we are not sure whether this apparent decrease is a technical issue (e.g. due to the presence of an additional plasmid), we prefer not to explore this further in this manuscript. Note that we could not obtain a ∆tgt strain carrying both plasmids expressing tRNA_GUA and rsxA, suggesting toxic overproduction of RsxA in this context.

      Author response image 2.

      - Figure 7B with UV stress: 

      Here again, delta-tgt overexpressing tRNA_GUA is still more UV resistant than WT overexpressing tRNA_GUA.

      Author response image 3.

      (2) The authors present a clear story with a reprogramming towards TAT codons in the knockout strain, particularly regarding tobramycin treatment. The control experiments often hint at other codons also contributing to the observed phenotypes (e.g., His or Asp), yet these effects are mostly ignored in the discussion. It would be helpful to discuss these findings at a minimum in the discussion section, or possibly experimentally address the role of His or Asp by overexpression of these tRNAs together with Tyrosine tRNA(GUA) in an experiment like that of Figure 1I to see if a more "wild type" phenotype would present. In fact, the synergy of Tyr, His, and/or Asp codons likely helps to explain the effects observed with the DNA repair genes in later experiments.

      We thank the referee for the suggestion. We agree that there could be synergies between these codons, and this is probably why the proteomics data do not clearly reflect the tyrosine codon usage bias. This is now further discussed in the Ideas and Speculation section. 

      Moreover, we have added Figure S3G and the following result:

      “Since not all TAT-biased proteins are found to be enriched in the ∆tgt proteomics data, the sequence context surrounding TAT codons could affect their decoding. To illustrate this, we inserted, after the gfp start codon, various tyrosine-containing sequences present in rsxA (Fig. S3G). The native tyrosines were all TAT codons; in our synthetic constructs they were either TAT or TAC, while the remaining sequence was kept unchanged. We observe that the production of GFP carrying the TEYTATLLL sequence from RsxA is increased in Δtgt compared to WT, while it is unchanged with TEYTACLLL. However, production of GFP with the sequences LYTATRLL/LYTACRLL and EYTATLR/EYTACLR was unaffected (or even decreased, for the latter) by the absence of tgt. Overall, our results demonstrate that RsxA is upregulated in the ∆tgt strain at the translational level, and that proteins with a codon usage bias towards tyrosine TAT are prone to be more efficiently translated in the absence of Q modification, but that this also depends on the sequence context.”

      (3) Regarding Figure 6D, the APB northern blot feels like an afterthought. It was loaded with different amounts of RNA as input and some samples are repeated three times, but Δcrp only once. Collectively, it makes this experiment very difficult to assess.

      A different amount of RNA was loaded only for ∆tgt, which shows a single band because of the absence of modification. For all other conditions, the same amount of RNA was loaded (0.9 µg). Additional Δcrp replicates were run on a separate gel, but only a representative gel is shown in the manuscript. This is now specified in the legend.

      Below we also attach an image of the gel of total RNA (SYBR Gold staining), showing that the lanes contain equivalent quantities of RNA, except for ∆tgt.

      Author response image 4.

      Minor Points:

      (3) Fig S2B, do the authors have a hypothesis why the Asp and Phe tRNAs lead to a growth decrease in the untreated samples? It appears like Phe(GAA) partially compensates for the defect.

      We agree. Unfortunately, at this stage we do not have a satisfactory explanation for this. It would be interesting to study further, but this is beyond the scope of the present study.

      (5) Lines 655 to 660 seem more appropriate as speculation in the discussion rather than as a conclusion in the results, where no direct experiments are performed. The authors might take advantage of the "Ideas and Speculation" section that eLife allows.

      Thank you very much for this suggestion, we added this section to the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor.

      - Figure 6 - Fonts for several mutants are of a different size/type. Fixed.

      - What is the Pm promoter? Please expand and give enough details so the reader can follow, especially as it is less commonly used in V. cholerae (the typical ones being the pBAD or pTAC promoters). Done.

      - Spacing where references are inserted should be checked. Done.

      - Lines 860-863 - "V. cholerae's response to sub-MIC antibiotic stress is transposable to other Gram-negative pathogens". This reads awkwardly. Consider rephrasing. Done.

      - Figure 7 - Text in A and C is very small and is very hard to read. Font for tgt is different.

      Fixed. tgt is now in italics.

      Reviewer #2 (Recommendations For The Authors):

      As specified in the public review, more evidence would be necessary to affirm that tRNAs not modified by Q have a greater preference for translating TAT codons, since there are several previous studies in which it is shown that Q-tRNAs have a greater preference for NAT codons (including TAT). For example, it is suggested to explore what happens with other recoded genes (enriched in TAT or TAC) if there is a high level of Q-tRNAs (overexpression of tgt in a wt context). It is also necessary to clarify how to interpret the Ribo-seq results, which apparently is different from how they have been interpreted in other studies.

      Please see above our responses and changes made to the manuscript.

      Minor corrections

      In Figure 8, replace "Epitranscriptomic adapation to stress" with "Epitranscriptomic adaptation to stress".

      Fixed, thank you for noticing!

      Reviewer #3 (Recommendations For The Authors):

      (1) Lines 48-50 and 110-112: the authors have a nice mechanism and story, yet the lines mentioned feel very qualified (e.g., "possibly", "plausibly") and cause the abstract to obscure the value and major conclusions of the study. The authors could consider revising or even removing these lines to focus on the take-home message in the abstract and at the end of the introduction/discussion. 

      Thank you for this comment, we modified the text.  

      (2) Additional description for the samples in the results section for Figure 1 would be helpful to the reader.

      Done

      (3) Figure S1, the line of experiments with rluF is interesting, but in the end the choice seems a little random. Have the authors assessed knockouts of other modifications on the ASL for effects? Since the modification is not well characterized in V. cholerae according to the authors, it might make sense to save this for a future paper.

      We removed S1, as we agree that this experiment does not really add something to the paper.

      (4) Line 334 and 353 are redundant.

      Fixed

      (5) It is likely beyond the scope of the study, but it would strengthen the paper to repeat Figure 3 with His and/or Asp based on the findings of 2C and 4E to better understand the contribution of His and Asp to Q biology.

      We repeated Figure 3 with Asp. Based on Fig. 2C (less efficient decoding of GAC in ∆tgt in tobramycin) and Fig. 4E (positive GAT codon bias in proteins upregulated in Ribo-seq in ∆tgt in tobramycin), we would expect β-lactamase with Asp GAC to be less efficiently decoded than with GAT in ∆tgt. 

      This was added to the manuscript

      “Like Tyr103, Asp129 was shown to be important for resistance to β-lactams (Doucet et al., 2004; Escobar et al., 1994; Jacob et al., 1990). When we replaced the native Asp129 GAT with the synonymous codon Asp129 GAC, the GAC version did not appear to produce functional β-lactamase in ∆tgt (Fig. 3B), suggesting increased mistranslation or inefficient decoding of the GAC codon by tRNA_Asp in the absence of Q. Decoding of the GAT codon was also affected in ∆tgt in the presence of tobramycin.”

      Added figure: Figure 3B

      (6) The authors could consider replacing 5D with S4A-D, which is easier to understand in our opinion.

      Done

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      From the Reviewing Editor:

      Four reviewers have assessed your manuscript on valence and salience signaling in the central amygdala. There was universal agreement that the question being asked by the experiment is important. There was consensus that the neural population being examined (GABA neurons) was important and the circular shift method for identifying task-responsive neurons was rigorous. Indeed, observing valenced outcome signaling in GABA neurons would considerably increase the role of the central amygdala in valence. However, each reviewer brought up significant concerns about the design, analysis and interpretation of the results. Overall, these concerns limit the conclusions that can be drawn from the results. Addressing the concerns (described below) would work towards better answering the question at the outset of the experiment: how does the central amygdala represent salience vs valence?

      A weakness noted by all reviewers was the use of the terms 'valence' and 'salience' as well as the experimental design used to reveal these signals. The two outcomes used emphasized non-overlapping sensory modalities and produced unrelated behavioral responses. Within each modality there are no manipulations that would scale either the value of the valenced outcomes or the intensity of the salient outcomes. While the food outcomes were presented many times (20 times per session over 10 sessions of appetitive conditioning) the shock outcomes were presented many fewer times (10 times in a single session). The large difference in presentations is likely to further distinguish the two outcomes. Collectively, these experimental design decisions meant that any observed differences in central amygdala GABA neuron responding are unlikely to reflect valence, but likely to reflect one or more of the above features.

      We appreciate the reviewers’ comments regarding the experimental design. When assessing fear versus reward, we chose stimuli that elicit known behavioral responses: freezing versus consumption. Stimuli of the same modality are unlikely to elicit easily definable fear or reward responses or to be precisely matched for sensory intensity. For example, sweet or bitter tastes can be used, but even these activate different taste receptors and differ in the duration of taste-specific signaling (e.g. how long the taste lingers in the mouth). The approach we employed is similar to that of Yang et al., 2023 (doi: 10.1038/s41586-023-05910-2), who used water reward and shock to characterize the response profiles of somatostatin neurons of the central amygdala. Similar to what was reported by Yang and colleagues, we observed that the majority of CeA GABA neurons responded selectively to one unconditioned stimulus (~52%). We observed that 15% of neurons responded in the same direction (either activated or inhibited) to both the food and shock US. These were defined as salience encoding based on the definitions of Lin and Nicolelis, 2008 (doi: 10.1016/j.neuron.2008.04.031), in which basal forebrain neurons responded similarly to reward or punishment irrespective of valence. The designation of valence encoding based on opposite responses to the food and shock (~10% of cells) is straightforward; however, we agree that designating modality-specific neurons as valence encoding is less straightforward.

      A second weakness noted by a majority of reviewers was a lack of cue-responsive units, and a lack of exploration of the diversity of response types and of the relationship between cue and outcome firing. The lack of large numbers of neurons increasing firing to one or both cues is particularly surprising given the critical contribution of central amygdala GABA neurons to the acquisition of conditioned fear (which the authors measured) as well as to conditioned orienting (which the authors did not measure). Regression-like analyses would be a straightforward means of identifying neurons varying their firing in accordance with these or other behaviors. It was also noted that appetitive behavior was not measured in a rigorous way. Instead of measuring time near the hopper, measures of licking would have been better. Further, measures of orienting behaviors such as startle were missing.

      The authors also missed an opportunity for clustering-like analyses, which could have been used to reveal neurons uniquely signaling cues, outcomes or combinations of cues and outcomes. If the authors' calcium imaging approach is not able to detect expected central amygdala cue responding, might it be missing other critical aspects of responding?

      As stated in the manuscript, we were surprised by the relatively low number of cue-responsive cells; however, when using a less stringent statistical method (Figure 5 - Supplement 2), we observed that 13% of neurons responded to the food-associated cue and 23% responded to the shock-associated cue. The differences are therefore likely a reflection of the stringency of the statistical criterion used to define responsive units. The number of CS-responsive units is lower than reported in the CeAl by Ciocchi et al., 2010 (doi: 10.1038/nature09559), who observed 30% activated by the CS and 25% inhibited, but is not that dissimilar from the results of Duvarci et al., 2011 (doi: 10.1523/JNEUROSCI.4985-10.2011), who observed 11% activated and 25% inhibited by the CS in the CeAl. These numbers are also consistent with previous single-cell calcium imaging of cell types in the CeA. For example, Yang et al., 2023 (doi: 10.1038/s41586-023-05910-2) observed that 13% of somatostatin neurons responded to a reward CS and 8% responded to a shock CS. Yu et al., 2017 (doi: 10.1038/s41593-017-0009-9) observed that 26.5% of PKCdelta neurons responded to the shock CS. It should also be noted that our analysis was not restricted to the CeAl. Finally, food learning was assessed in an operant chamber in freely moving mice with reward pellet delivery. Because liquids were not used as the reward US, licking could not be used as a metric.

      All reviewers point out that the evidence for salience encoding is even more limited than the evidence for valence. Although the specific concern for each reviewer varied, they all centered on an oversimplistic definition of salience. Salience ought to scale with the absolute value and intensity of the stimulus. Salience cannot simply be responding in the same direction. Further, even though the authors observed subsets of central amygdala neurons increasing or decreasing activity to both outcomes - the outcomes can readily be distinguished based on the temporal profile of responding.

      We thank the reviewers for their comments relating to the definition of salience and valence encoding by central amygdala neurons. We have addressed each of the concerns below.

      Additional concerns are raised by each reviewer. Our consensus is that this study sought to answer an important question - whether central amygdala signal salience or valence in cue-outcome learning. However, the experimental design, analyses, and interpretations do not permit a rigorous and definitive answer to that question. Such an answer would require additional experiments whose designs would address the significant concerns described here. Fully addressing the concerns of each reviewer would result in a re-evaluation of the findings. For example, experimental design better revealing valence and salience, and analyses describing diversity of neuronal responding and relationship to behavior would likely make the results Important or even Fundamental.

      We appreciate the reviewers’ comments and have addressed each concern below.

      Reviewer #2 (Public review):

      In this article, Kong and colleagues sought to determine the encoding properties of central amygdala (CeA) neurons in response to oppositely valenced stimuli and cues predicting those stimuli. The amygdala and its subregional components have historically been understood to be regions that encode associative information, including valenced stimuli. The authors performed calcium imaging of GABAergic CeA neurons in freely moving mice conditioned in Pavlovian appetitive and fear paradigms, and showed that CeA neurons are responsive to both appetitive and aversive unconditioned and conditioned stimuli. They used a variant of a previously published 'circular shifting' technique (Harris, 2021), which allowed them to delineate between excited, non-responsive and inhibited neurons. While there is considerable overlap of CeA neurons responding to both unconditioned stimuli (in this case, food and shock, deemed "salience-encoding" neurons), there are considerably fewer CeA neurons that respond to both conditioned stimuli that predict the food and shock. The authors finally demonstrated that there are no differences in the order of Pavlovian paradigms (reward then fear vs. fear then reward), which is an interesting result, and convincingly presented given their counterbalanced experimental design.

      In total, I find the presented study useful in understanding the dynamics of CeA neurons during a Pavlovian learning paradigm. There are many strengths of this study, including the important question and clear presentation, the circular shifting analysis was convincing to me, and the manuscript was well written. We hope the authors will find our comments constructive if they choose to revise their manuscript.

      While the experiments and data are of value, I do not agree with the authors interpretation of their data, and take issue with the way they used the terms "salience" and "valence" (and would encourage them to check out Namburi et al., NPP, 2016) regarding the operational definitions of salience and valence which differ from my reading of the literature. To be fair, a recent study from another group that reports experiments/findings which are very similar to the ones in the present study (Yang et al., 2023, describing valence coding in the CeA using a similar approach) also uses the terms valence and salience in a rather liberal way that I would also have issues with (see below). Either new experiments or revised claims would be needed here, and more balanced discussion on this topic would be nice to see, and I felt that there were some aspects of novelty in this study that could be better highlighted (see below).

      One noteworthy point of alarm is that it seems as if two data panels including heatmaps are duplicated (perhaps that panel G of Figure 5-figure supplement 2 is a cut and paste error? It is duplicated from panel E and does not match the associated histogram).

      We thank the reviewer for their insightful comments and assessment of the manuscript.

      Major concerns:

      (1) The authors wish to make claims about salience and valence. This is my biggest gripe, so I will start here.

      (1a) Valence scales for positive and negative stimuli. As stated in Namburi et al., NPP, 2016, we operationalize "valence" as having different responses for positive and negative values and no response for stimuli that are not motivationally significant (neutral cues that do not predict an outcome). The threshold for claiming salience, which we define as scaling with the absolute value of the stimulus and not responding to a neutral stimulus (Namburi et al., NPP, 2016; Tye, Neuron, 2018; Li et al., Nature, 2022), would require the lack of response to a neutral cue.

      We appreciate the reviewer’s comment on the definitions of salience and valence and agree that there is no consistent classification of these response types in the field. As stated above, we used the designation of salience encoding if cells responded in the same direction to different stimuli regardless of stimulus valence, similar to what was described previously (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031). Similar definitions of salience have also been reported elsewhere (for examples see: Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113P). Per the suggestion of the reviewer, we longitudinally tracked cells on the first day of Pavlovian reward conditioning and on the fear conditioning day. Although there were considerably fewer head entries on the first day of reward conditioning, we were able to identify 10 cells that were activated by both the food US and the shock US. We compared the responses to the first five and last five head entries, and to the first five and last five shocks. Consistent with what has been reported for salience-encoding neurons in the basal forebrain (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected and decreased in later trials.

      Author response image 1.

      (1b) The other major issue is that the authors choose to make claims about the neural responses to the USs rather than the CSs. However, being shocked and receiving sucrose also would have very different sensorimotor representations, and any differences in responses could be attributed to those confounds rather than valence or salience. They could make claims regarding salience or valence with respect to the differences in the CSs but they should restrict analysis to the period prior to the US delivery.

      Perhaps the reviewer missed this, but analyses of valence and salience encoding for the different CSs are presented in Figure 5G, Figure 5 - Supplement 1C-D, and Figure 5 - Supplement 2N-O. Responsiveness to CSFood and CSShock was analyzed during the conditioning sessions (Figure 3E-F, Figure 4B-C, Figure 5 - Supplement 2J-O and Figure 5 - Supplement 3K-L) and during recall probe tests for both CSFood and CSShock (Figure 5 - Supplement 1C-J).

      (1c) The third obstacle to using the terms "salience" or "valence" is the lack of scaling, which is perhaps a bigger ask. At minimum, either the scaling or the neutral cue would be needed to make claims about valence or salience encoding. Perhaps the authors disagree - that is fine. But they should at least acknowledge that there is literature that would say otherwise.

      (1d) In order to make claims about valence, the authors must take into account the sensory confound of the modality of the US (also mentioned in Namburi et al., 2016). The claim that these CeA neurons are indeed valence-encoding (based on their responses to the unconditioned stimuli) is confounded by the fact that the appetitive US (food) is a gustatory stimulus while the aversive US (shock) is a tactile stimulus.

      We provided the same analysis for the US and CS. The US responses were larger and more prevalent, but similar types of encoding were observed for the CS. We agree that the food reward and the shock are very different sensory modalities. As stated above, stimuli of the same modality are unlikely to elicit easily definable fear or reward responses or to be precisely matched for sensory intensity. We agree that cells responding to only one stimulus are difficult to classify as valence encoding rather than sensory-modality specific, and that without scaling of the stimulus this issue cannot be fully addressed. It should be noted, however, that if CeA cells were exclusively tuned to stimuli of different sensory modalities, we would expect a similar number of cells to respond to the CS tones (auditory) as to the food (taste) and shock (somatosensory), but we do not. Of the cells tracked longitudinally, 80% responded to the USs, with 65% of cells responding to food (activated or inhibited) and 44% responding to shock (activated or inhibited).

      (2) Much of the central findings in this manuscript have been previously described in the literature. Yang et al., 2023 for instance shows that the CeA encodes salience (as demonstrated by the scaled responses to the increased value of unconditioned stimuli, Figure 1 j-m), and that learning amplifies responsiveness to unconditioned stimuli (Figure 2). It is nice to see a reproduction of the finding that learning amplifies CeA responses, though one study is in SST::Cre and this one in VGAT::cre - perhaps highlighting this difference could maximize the collective utility for the scientific community?

      We agree that the analysis performed here is similar to that conducted by Yang et al., 2023, the major difference being the types of neurons sampled: Yang et al. imaged only somatostatin neurons, whereas we recorded all GABAergic cell types within the CeA. Moreover, because we imaged 10 mice, we sampled neurons that ostensibly covered the entire dorsal-to-ventral extent of the CeA (Figure 1 - Supplement 1). Remarkably, we found that the vast majority of CeA neurons (80%) are responsive to food or shock. Within this 80% there are 8 distinct response profiles, consistent with the heterogeneity of cell types within the CeA based on connectivity, electrophysiological properties, and gene expression. Moreover, we did not find any spatial distinction between food- or shock-responsive cells, with the responsive cell types being intermingled throughout the dorsal-to-ventral axis (Figure 5 - Supplement 3).

      (3) There is at least one instance of copy-paste error in the figures that raised alarm. In the supplementary information (Figure 5- figure supplement 2 E;G), the heat maps for food-responsive neurons and shock-responsive neurons are identical. While this almost certainly is a clerical error, the authors would benefit from carefully reviewing each figure to ensure that no data is incorrectly duplicated.

      We thank the reviewer for catching this error. It has been corrected.

      (4) The authors describe experiments to compare shock and reward learning; however, there are temporal differences in what they compare in Figure 5. The authors compare the 10th day of reward learning with the 1st day of fear conditioning, which effectively represent different points of learning and retrieval. At the end of reward conditioning, animals are utilizing a learned association to the cue, which demonstrates retrieval. On the day of fear conditioning, animals are still learning the cue at the beginning of the session, but they are not necessarily retrieving an association to a learned cue. The authors would benefit from recording at a later timepoint (to be consistent with reward learning- 10 days after fear conditioning), to more accurately compare these two timepoints. Or perhaps, it might be easier to just make the comparison between Day 1 of reward learning and Day 1 of fear learning, since they must already have these data.

      We agree that there are temporal differences between the food and shock US deliveries. This likely reflects the fact that shock delivery is passive and its timing is easily resolved, whereas the food responses are variable because they depend on consumption of the sucrose pellet. Because of these differences, the kinetics of the responses cannot be accurately compared, which is why we restricted our analysis to whether the cells were food or shock responsive. Aside from reporting the temporal differences in the signals, we did not draw major conclusions about differences in kinetics. In our experimental design, we counterbalanced animals that received fear conditioning first and then food conditioning, or food conditioning first and then fear conditioning, to ensure that order effects did not influence the outcome of the study. It is widely known that Pavlovian fear conditioning can establish conditioned responses to the CS after just a single day of conditioning. In contrast, Pavlovian reward conditioning generally progresses more slowly. Because of this, we compared the last day of reward conditioning with the first and only day of fear conditioning. However, as stated above, we also compared the responses of neurons defined as salience encoding on day 1 of reward conditioning and on the fear conditioning day. As would be predicted based on previous definitions of salience encoding (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected.

      (5) The authors make a claim of valence encoding in their title and throughout the paper, which is not possible to make given their experimental design. However, they would greatly benefit from actually using a decoder to demonstrate their encoding claim (decoding performance for shock-food versus shuffled labels) and simply make claims about decoding food-predictive cues and shock-predictive cues. Interestingly, it seems like relatively few CeA neurons actually show differential responses to the food and shock CSs, and that is interesting in itself.

      As stated above, valence and salience encoding were defined similarly to what has been previously reported (Li et al., 2019, doi: 10.7554/eLife.41223; Yang et al., 2023, doi: 10.1038/s41586-023-05910-2; Huang et al., 2024, doi: 10.1038/s41586-024-07819; Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113). Interestingly, many of these studies did not vary the US intensity.

      Reviewer #3 (Public review):

      Summary:

      In their manuscript, Kong and colleagues investigate the role of distinct populations of neurons in the central amygdala (CeA) in encoding valence and salience during both appetitive and aversive conditioning. The study expands on the work of Yang et al. (2023), which specifically focused on somatostatin (SST) neurons of the CeA. Thus, this study broadens the scope to other neuronal subtypes, demonstrating that CeA neurons in general are predominantly tuned to valence representations rather than salience.

      We thank the reviewer for their insightful comments and assessment of the manuscript.

      Strengths:

      One of the key strengths of the study is its rigorous quantitative approach based on the "circular-shift method", which carefully assesses correlations between neural activity and behavior-related variables. The authors' findings that neuronal responses to the unconditioned stimulus (US) change with learning are consistent with previous studies (Yang et al., 2023). They also show that the encoding of positive and negative valence is not influenced by prior training order, indicating that prior experience does not affect how these neurons process valence.

      Weaknesses:

      However, there are limitations to the analysis, including the lack of population-based analyses, such as clustering approaches. The authors do not employ hierarchical clustering or other methods to extract meaning from the diversity of neuronal responses they recorded. Clustering-based approaches could provide deeper insights into how different subpopulations of neurons contribute to emotional processing. Without these methods, the study may miss patterns of functional specialization within the neuronal populations that could be crucial for understanding how valence and salience are encoded at the population level.

      We appreciate the reviewer’s comments regarding clustering-based approaches. In order to classify cells as responsive to the US or CS we chose to develop a statistically rigorous method for classifying cell response types. Using this approach, we were able to define cell responses to the US and CS. Importantly, we identified 8 distinct response types to the USs. It is not clear how additional clustering analysis would improve cell classifications.
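      For readers unfamiliar with the circular-shift method that Reviewer 3 mentions, the following is a minimal Python sketch of the general idea. It rests on simplifying assumptions: a single calcium trace sampled in discrete frames, a fixed post-event window, and a one-sided test for increases only. The function names, window length, number of shifts, and add-one p-value are illustrative choices. They are not the exact procedure used in the manuscript, and the trace is synthetic.

```python
import random


def event_response(trace, events, window):
    """Mean activity in the `window` frames that follow each event onset."""
    values = []
    for onset in events:
        values.extend(trace[onset:onset + window])
    return sum(values) / len(values)


def circular_shift_test(trace, events, window, n_shifts=1000, seed=0):
    """One-sided circular-shift test for an event-evoked increase in activity.

    The null distribution is built by rotating the whole trace by a random
    nonzero offset. Rotation keeps the trace's own temporal autocorrelation
    but breaks its alignment with the event times.
    Returns (observed_response, p_value).
    """
    rng = random.Random(seed)
    n = len(trace)
    observed = event_response(trace, events, window)
    n_exceed = 0
    for _ in range(n_shifts):
        k = rng.randrange(1, n)            # shift in frames, never zero
        shifted = trace[-k:] + trace[:-k]  # rotate right by k frames
        if event_response(shifted, events, window) >= observed:
            n_exceed += 1
    # Add-one correction so that the p-value is never exactly zero.
    p_value = (n_exceed + 1) / (n_shifts + 1)
    return observed, p_value


# Hypothetical example: Gaussian noise plus a response after each event.
rng = random.Random(1)
trace = [rng.gauss(0.0, 0.1) for _ in range(2000)]
events = [150, 420, 610, 980, 1230, 1500, 1770]  # irregular onsets
for onset in events:
    for i in range(onset, onset + 10):
        trace[i] += 1.0

observed, p = circular_shift_test(trace, events, window=10, n_shifts=500)
print(f"observed = {observed:.2f}, p = {p:.4f}")
```

      The sketch rotates the trace rather than shuffling individual frames. Shuffling would destroy the slow autocorrelation of calcium signals and make the null distribution too narrow.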

      Furthermore, while salience encoding is inferred based on responses to stimuli of opposite valence, the study does not test whether these neuronal responses scale with stimulus intensity, a hallmark of classical salience encoding. This limits the conclusions that can be drawn about salience encoding specifically.

      As stated above, we used salience classifications similar to those previously described (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113). We agree that varying the stimulus intensity would provide a more rigorous assessment of salience encoding; however, several of the studies mentioned above classify cells as salience encoding without varying stimulus intensity. Additionally, the inclusion of recordings with varying US intensities on top of the Pavlovian reward and fear conditioning would further decrease the number of cells that can be longitudinally tracked and would likely decrease the number of cells that could be classified.

      In sum, while the study makes valuable contributions to our understanding of CeA function, the lack of clustering-based population analyses and the absence of intensity scaling in the assessment of salience encoding are notable limitations.

      Reviewer #4 (Public review):

      Summary:

      The authors have performed endoscopic calcium recordings of individual CeA neuron responses to food and shock, as well as to cues predicting food and shock. They claim that a majority of neurons encode valence, with a substantial minority encoding salience.

      Strengths:

      The use of endoscopic imaging is valuable, as it provides the ability to resolve signals from single cells, while also being able to track these cells across time. The recordings appear well-executed, and employ a sophisticated circular shifting analysis to avoid statistical errors caused by correlations between neighboring image pixels.

      Weaknesses:

      My main critique is that the authors didn't fully test whether neurons encode valence. While it is true that they found CeA neurons responding to stimuli that have positive or negative value, this by itself doesn't indicate that valence is the primary driver of neural activity. For example, they report that a majority of CeA neurons respond selectively to either the positive or negative US, and that this is evidence for "type I" valence encoding. However, it could also be the case that these neurons simply discriminate between motivationally relevant stimuli in a manner unrelated to valence per se. A simple test of this would be to check if neural responses generalize across more than one type of appetitive or aversive stimulus, but this was not done. The closest the authors came was to note that a small number of neurons respond to CS cues, of which some respond to the corresponding US in the same direction. This is relegated to the supplemental figures (3 and 4), and it is not noted whether the same-direction CS-US neurons are also valence-encoding with respect to different USs. For example, are the neurons excited by CS-food and US-food also inhibited by shock? If so, that would go a long way toward classifying at least a few neurons as truly encoding valence in a generalizable way.

      As stated above, valence and salience encoding were defined similarly to what has been previously reported (Li et al., 2019, doi: 10.7554/eLife.41223; Yang et al., 2023, doi: 10.1038/s41586-023-05910-2; Huang et al., 2024, doi: 10.1038/s41586-024-07819; Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113). As reported in Figure 5 and Figure 5 – Supplement 3, ~29% of CeA neurons responded to both food and shock USs (15% in the same direction and 13.5% in the opposite direction). In contrast, only 6 of 303 cells responded to both the CSfood and CSshock, all in the same direction.

      A second and related critique is that, although the authors correctly point out that definitions of salience and valence are sometimes confused in the existing literature, they then go on themselves to use the terms very loosely. For example, the authors define these terms in such a way that every neuron that responds to at least one stimulus is either salience or valence-encoding. This seems far too broad, as it makes essentially unfalsifiable their assertion that the CeA encodes some mixture of salience and valence. I already noted above that simply having different responses to food and shock does not qualify as valence-encoding. It also seems to me that having same-direction responses to these two stimuli similarly does not qualify a neuron as encoding salience. Many authors define salience as being related to the ability of a stimulus to attract attention (which is itself a complex topic). However, the current paper does not acknowledge whether they are using this, or any other definition of salience, nor is this explicitly tested, e.g. by comparing neural response magnitudes to any measure of attention.

      As stated in response to Reviewer 2, we longitudinally tracked cells across the first day of Pavlovian reward conditioning and the fear conditioning day. Although there were considerably fewer head entries on the first day of reward conditioning, we were able to identify 10 cells that were activated by both the food US and the shock US. We compared the responses to the first five and last five head entries, and to the first five and last five shocks. Consistent with what has been reported for salience-encoding neurons in the basal forebrain (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected and decreased in later trials.

      The impression I get from the authors' data is that CeA neurons respond to motivationally relevant stimuli, but in a way that is possibly more complex than what the authors currently imply. At the same time, they appear to have collected a large and high-quality dataset that could profitably be made available for additional analyses by themselves and/or others.

      Lastly, the use of 10 daily sessions of training with 20 trials each seems rather low to me. In our hands, Pavlovian training in mice requires considerably more trials in order to effectively elicit responses to the CS. I wonder if the relatively sparse training might explain the relative lack of CS responses?

      It is possible that learning would have occurred more quickly if we had used more than 20 trials per session. However, we have routinely used 20-25 trials for Pavlovian reward conditioning (doi: 10.1073/pnas.1007827107; doi: 10.1523/JNEUROSCI.5532-12.2013; doi: 10.1016/j.neuron.2013.07.044; and doi: 10.1016/j.neuron.2019.11.024).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This useful study integrates experimental methods from materials science with psychophysical methods to investigate how frictional stabilities influence tactile surface discrimination. The authors argue that force fluctuations arising from transitions between frictional sliding conditions facilitate the discrimination of surfaces with similar friction coefficients. However, the reliance on friction data obtained from an artificial finger, together with the ambiguous correlative analyses relating these measurements to human psychophysics, renders the findings incomplete.

      Our main goal with this paper was to show that the most common metric, the average friction coefficient, which is widely used in tactile perception and device design, is fundamentally unsound, and to offer a secondary parameter that is compatible with the fact that human motion is unconstrained, leading to dynamic interfacial mechanics.

      We understand that the Reviewers wanted us to demonstrate, through biomechanical measurements, that humans use instabilities. This seems reasonable, but in our individual responses we explain the significant challenges and fundamental unknowns associated with those experiments. We believe this paper is an important step toward addressing this problem. At the same time, we have made several changes to the discussion, conclusion, and title to clarify that our study correlates mechanical characterization with human testing.

      In short, several fundamental unknowns prevented us from basing the study on biomechanical measurements: (1) a decision-making model would need to be created, but it is unknown whether tactile decision making follows other models; (2) it is further unknown what constitutes “tactile evidence”, though at our manuscript’s conclusion we propose that friction instabilities are better suited to serve as tactile evidence than friction coefficients averaged over a narrow range of human exploration; (3) in the design of samples, from a friction mechanics and materials perspective, it is not yet possible to pre-program surfaces a priori to deliver friction instabilities, so these must instead be determined experimentally, especially when attempting to achieve this in controlled surfaces that do not create other overriding tactile cues, like macroscopic bumps or large differences in surface roughness; (4) given that the basis for tactile percepts, like which object feels “rougher” or “smoother”, is not sufficiently established, it is necessary to use a 3-alternative forced choice task, which avoids asking participants to compare objects along a preset perceptual dimension (a challenge recognized by Reviewer 3), but this brings issues of memory into the decision-making model; and (5) the prior points are compounded by the fact that, we believe, tactile exploration must be performed in an unconstrained manner, i.e., without an apparatus generating motion onto a stationary finger. Work by Liu et al. (IEEE ToH, 2024) showed that recreating friction obtained during free exploration onto a stationary finger was uninterpretable by the participants, hinting at the importance of efference copies.[1] We believe that resolving many of the above-mentioned issues would constitute a significant advance in knowledge and would require discussion and dissemination within the community.

      Our changes to the manuscript

      Page 1 & SI Page 1, Title

      “Alternatives to Friction Coefficient: Fine Touch Perception Correlates with Frictional Instabilities”

      Reviewer 1 (Public review):

      Summary:

      In this paper, Derkaloustian et al. look at the important topic of what affects fine touch perception. The observations that there may be some level of correlation with instabilities are intriguing. They attempted to characterize different materials by counting the frequency (occurrence #, not of vibration) of instabilities at various speeds and forces of a PDMS slab pulled lengthwise over the material. They then had humans make the same vertical motion to discriminate between these samples. They correlated the % correct in discrimination with differences in frequency of steady sliding over the design space as well as other traditional parameters such as friction coefficient and roughness. The authors pose an interesting hypothesis and make an interesting observation about the occurrences of instability regimes in different materials while in contact with PDMS, which is interesting for the community to see in the publication. It should be noted that the finger is complex, however, and there are many factors that may be quite oversimplified with the use of the PDMS finger, and the consideration and discounting of other parameters are not fully discussed in the main text or SI. Most importantly, however, the conclusions as stated do not align with the primary summary of the data in Figure 2.

      Strengths:

      The strength of this paper is in its intriguing hypothesis and important observation that instabilities may contribute to what humans are detecting as differences in these apparently similar samples.

      We thank Reviewer 1 for their time on the manuscript, recognizing the approach we took, and offering constructive feedback. We believe that our conclusions, in fact, are supported by the primary summary of the data in Fig. 2 but we believe that our use of R<sup>2</sup> could have led to misinterpretation. The trend with friction coefficient and percent correct was indeed statistically significant but was spurious because the slope was negative. In the revision, we add clarifying comments throughout, change from R<sup>2</sup> to r as to highlight the negative trend, and adjust the figures to better focus on friction coefficient.

      Finally, we added a new section to discuss the tradeoffs between using a real human finger versus a mock finger, and which situations may warrant the use of one or the other. In short, for our goal of characterizing surfaces to be used in tactile experiments, we believe a mock finger is more sustainable and practical than using real humans because human fingers are unique to each participant, humans move their fingers at constantly changing pressures and velocities, and friction generated by a freely exploring human cannot be satisfactorily replicated by moving a sample onto a stationary finger. That said, we agree that for other types of experiments, characterizing a human participant directly may be more advantageous.

      Weaknesses:

      Comment 1

      The most important weakness is that the findings do not support the statements of findings made in the abstract. Of specific note in this regard is the primary correlation in Figure 2B between SS (steady sliding) and percent correct discrimination. While the statistical test shows significance (and is interesting!), the R-squared value is 0.38, while the R-squared value for the "Friction Coefficient vs. Percent Correct" plot has an R-squared of 0.6 and a p-value of < 0.01 (including Figure 2B). This suggests that the results do not support the claim in the abstract: "We found that participant accuracy in tactile discrimination was most strongly correlated with formations of steady sliding, and response times were negatively correlated with stiction spikes. Conversely, traditional metrics like surface roughness or average friction coefficient did not predict tactile discriminability."

      We disagree that the trend with friction coefficient suggests the results do not support the claim because the correlation was found to be negative. However, we could have made the comparison more apparent and expanded on this point, given its novelty.

      While the R<sup>2</sup> value corresponding to the “Friction Coefficient vs. Percent Correct” plot is notably higher, our results show that the slope is negative, which would be statistically spurious. This is because a negative correlation between percent correct (accuracy in discriminating surfaces) and difference in friction coefficient means that the more similar two surfaces are (by friction coefficient), the easier it would be for people to tell them apart. That is, it incorrectly concludes that two identical surfaces would be much easier to tell apart than two surfaces with greatly different friction coefficients.
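      A small hypothetical example shows why switching from R<sup>2</sup> to r exposes a negative trend. The Python sketch below uses an ordinary Pearson correlation, not the GLMM fits reported in the manuscript, and the numbers are invented for illustration. For a simple linear fit, R<sup>2</sup> equals r squared. A negative trend and its mirror-image positive trend therefore report the same R<sup>2</sup>, and only the sign of r distinguishes them.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient; its sign follows the sign of the slope."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical surface pairs: difference in average friction coefficient
# versus percent correct discrimination (invented numbers).
delta_mu = [0.05, 0.10, 0.20, 0.30, 0.40]
percent_correct = [80.0, 78.0, 70.0, 62.0, 60.0]

r_neg = pearson_r(delta_mu, percent_correct)
# Mirror the trend so that accuracy increases with the friction difference.
r_pos = pearson_r(delta_mu, [160.0 - y for y in percent_correct])

# For a simple linear fit, R^2 = r^2, so both trends report the same R^2.
print(f"negative trend: r = {r_neg:.2f}, R^2 = {r_neg ** 2:.2f}")
print(f"positive trend: r = {r_pos:.2f}, R^2 = {r_pos ** 2:.2f}")
```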

      This is counterintuitive to nearly all existing results, but we believe our samples were well positioned to uncover this trend for two reasons. First, we minimized variability by controlling multiple physical parameters in the samples. Second, the friction coefficient, typically calculated in the field as an average, ignores all the dynamic changes in force present in elastic systems undergoing mesoscale friction, i.e., human touch, as seen in Fig. 1 for a mock finger and Fig. 3 for a real finger. By demonstrating this statistically spurious trend, we believe we strongly support our premise that an alternative to the friction coefficient is needed in the design of tactile psychophysics and haptic interfaces.

      We believe that this could have been misinterpreted, so we took several steps to improve clarity, given the importance of this finding: we separated the panel on friction coefficient to its own panel, we changed from R<sup>2</sup> to r throughout, and we added clarifying text. We also added a small section focusing on this spurious trend.

      Our changes to the manuscript

      Page 1, Abstract

      “In fact, the typical method of averaging friction coefficients led to a spurious correlation which erroneously suggests that distinct objects should feel identical and identical objects should feel distinct.”

      Page 7

      “As Fig. 1 was constructed from friction measurements, we can also calculate an average friction coefficient, µ, by averaging the friction coefficient obtained at each of the 16 combinations of masses and velocities (Table 1). This calculation is a standard approach in tactile studies for summarizing friction measurements, or in some cases, surfaces are never characterized at multiple masses and velocities. However, summarizing friction data in this manner has been considered as conceptually questionable by others from a mechanics perspective.[3] Fig. 1 shows that the type of instabilities and friction forces encountered on a single surface can vary widely depending on the conditions. As a result, large variations in the friction coefficient are expected, depending on the mass and velocity — even though measurements originate from the same surface. This variability in friction coefficient can be seen with the large interquartile range of friction coefficients, which shows that the variation in friction coefficient across a single surface is similar, or even larger, than the differences in average friction coefficient across two different surfaces. The observation that friction coefficients vary so widely on a single surface calls into question the approach of analyzing how humans may perceive two different objects based on their average friction coefficients.”
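      The averaging step described in this passage, together with the interquartile range (IQR) now reported in Table 1, can be sketched in Python as follows. The 4 × 4 grid of masses and velocities and all friction values are hypothetical placeholders, not the measured data. The sketch only shows how the IQR within one surface can exceed the difference in average friction coefficient between two surfaces.

```python
import itertools
import statistics


def summarize_surface(mu_by_condition):
    """Collapse per-condition friction coefficients into the conventional
    single average, and also return the interquartile range (IQR) that the
    average hides."""
    values = list(mu_by_condition.values())
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return statistics.mean(values), q3 - q1


# Hypothetical 4 x 4 grid of loading conditions (mass index, velocity index).
conditions = list(itertools.product(range(4), range(4)))

# Invented friction coefficients for two surfaces: surface B is surface A
# shifted up by 0.05 in every condition.
mu_a = [0.8 + 0.1 * i for i in range(16)]
surface_a = dict(zip(conditions, mu_a))
surface_b = {c: mu + 0.05 for c, mu in surface_a.items()}

mean_a, iqr_a = summarize_surface(surface_a)
mean_b, iqr_b = summarize_surface(surface_b)
print(f"difference in average mu: {abs(mean_b - mean_a):.2f}")
print(f"IQR within surface A:     {iqr_a:.2f}")
```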

      Page 9, Fig. 2 Caption

      “D) GLMM of accuracy vs. difference in average friction coefficient, showing a negative correlation. E) GLMMs of accuracy vs. other commonly used material properties or parameters: ΔAverage roughness R<sub>a</sub>, ΔHurst exponent H, and ΔWater contact angle hysteresis (º) (N = 10 participants, n = 600 total trials).”

      Page 9

      “Considering all instabilities individually, we found that only steady sliding was a positive, statistically significant predictor (r = 0.62, p < 0.05; Fig. 2B).”

      Page 10

      “To compare the value of looking at frictional instabilities, we also performed GLMM fits on common approaches in the field, like a friction coefficient or material property typically used in tactile discrimination, shown in Fig. 2D-E. Interestingly, in Fig. 2D, we observed a spurious, negative correlation between friction coefficient (typically, and often problematically, simplified as an average across all tested conditions) and accuracy (r = -0.64, p < 0.01); that is, the more different the surfaces are by friction coefficient, the less people can tell them apart. This spurious correlation is the opposite of intuition and further calls into question the common practice of using friction coefficients in touch-related studies. Notably, this spurious correlation was also found by Gueorguiev et al.[21] The alternative two-term model, which includes adhesive contact area for friction coefficient,[32] was even less predictive (see Fig. S6A of SI). We believe such a correlation could not have been uncovered previously, as our samples have minimal physical variations. Yet this common practice does not consider the dynamic changes in force even within a single sample, despite these being a key feature of mesoscale friction during human touch.

      We investigate different material properties in Fig. 2E. Differences in average roughness R<sub>a</sub> (or in other parameters, like root mean square roughness R<sub>rms</sub>; Fig. S6A of SI) did not show a statistically significant correlation with accuracy. Though roughness is a popular parameter, correlating any roughness parameter to human performance here could be moot: the limit of detecting roughness differences has previously been defined as 13 nm on structured surfaces[36] and much higher for randomly rough surfaces,[49] all of which are orders of magnitude larger than the roughness differences between our surfaces. The differences in contact angle hysteresis, as an approximation of the adhesion contributions,[50] do not present any statistically significant effects on performance.”

      Page 11-12

      “Despite the correlative nature of this study, we still obtained high correlations compared to existing biomechanical studies,[4,19,21] which we speculate is because instabilities are an important predictive phenomenon for models of human touch. We believe that biomechanical studies, including more sophisticated techniques like spatially resolved force maps from digital image correlation,[5,42] may yield stronger correlations and results if they analyze data based on instabilities.”

      Added References

      (2) Khamis, H. et al. Friction sensing mechanisms for perception and motor control: passive touch without sliding may not provide perceivable frictional information. J. Neurophysiol. 125, 809– 823 (2021).

      (6) Olczak, D., Sukumar, V. & Pruszynski, J. A. Edge orientation perception during active touch. J. Neurophysiol. 120, 2423–2429 (2018).

      Comment 2, Part 1

      Along the same lines, other parameters that were considered such as the "Percent Correct vs. Difference in Sp" and "Percent Correct vs. Difference in SFW" were not plotted for consideration in the SI. It would be helpful to compare these results with the other three metrics in order to fully understand the relationships.

      We have added these plots to the SI. We note that we had checked these relationships and discussed them briefly, but did not include the plot. The plots show that the type of instability was not as helpful as its presence or absence.

      Our changes to the manuscript

      Page 9

      “Furthermore, a model accounting for slow frictional waves alone specifically shows a significant, negative effect on performance (p < 0.01, Fig. S5 of SI), suggesting that in these samples and task, the type of instability was not as important.”

      “Fig. S5. GLMM fits of participant accuracy vs. the differences in instability incidence for individual instability types. Left: accuracy vs. differences in formation of slow frictional waves (SFW) between pairs. P1 and P5 have the same x-axis value and are shifted for clarity. Right: accuracy vs. differences in formation of stiction spikes (Sp).”

      SI Page 4

      “and no correlation between accuracy and stiction spikes (Fig. S5).”

      Comment 2, Part 2

      Other parameters such as stiction magnitude and differences in friction coefficient over the test space could also be important and interesting.

      We agree these are interesting and have thought about them. We are aware that others, like Gueorguiev et al., have studied stiction magnitudes, and though there was a correlation, the physical differences in surface roughness (glass versus PMMA) investigated made it unclear if these could be generalized further.[3] We are unsure how to proceed here with a satisfactory analysis of stiction magnitude, given that stiction spikes are not always generated. In fact, Fig. 1 shows that for many velocities and pressures, stiction spikes are not formed. In ongoing work, however, we are always cognizant that if stiction spikes are a dominant factor, then a secondary analysis on their magnitude would be important. We offer some speculation on why stiction spikes may be overrepresented in the literature:

      (1) They are prone to being created if the finger was loaded for a long time onto a surface prior to movement, thus creating adhesion by contact aging which is unlike active human exploration. We avoid this by discarding the first pull in our measurements, which is a standard practice in mechanical characterization if contact aging needs to be avoided.

      (2) The ranges of velocities and pressures explored by others were small.

      (3) In an effort to generate strong tactile stimuli, highly adhesive or rough surfaces are used.

      (4) Stiction spikes are visually distinctive on a plot, but we are unaware of any mechanistic reason that mechanoreceptors would be particularly sensitive to this low frequency event over other signals.

      We interpret “difference in friction coefficient over the test space” to mean, for a single surface like C4, the highest average friction coefficient obtained for a single combination of velocity and mass minus the lowest average friction coefficient obtained for a single combination of velocity and mass. In the manuscript, we calculated the difference in friction coefficient in the typical manner of the field, by averaging all data collected at all velocities and masses and assigning a single value to an entire surface, like C4. We had performed the range calculation and have the data, but we are wary of overinterpreting secondary and tertiary metrics because they have no fundamental basis in traditional tribology. Moreover, if humans used this value, it would imply that they rapidly explore a large parameter space to find a “maximum” and “minimum” friction. Furthermore, the range in friction across the test space, after averaging, can be smaller than the range of friction experienced at different masses and velocities on a single surface. We have tabulated and newly included these values (the interquartile range of friction coefficients across different masses and velocities per surface) in Table 1.

      Fig. 2D shows a GLMM fit between percent correct responses across our pairs and the differences in friction coefficient for each pair, where we see a spurious negative correlation. As we had the data of all average friction coefficients for each condition for a given material, we also looked at the difference in maximum and minimum friction coefficients. For our tested pairs, these differences also lined up on a statistically significant, negative GLMM fit (r = -0.86, p < 0.005). However, the values for a given surface can vary drastically, with an interquartile range of 1.20 to 2.09 on a single surface. We fit participant accuracy to the differences in these IQRs across pairs. This also led to a negative GLMM fit (r = -0.65, p < 0.05). However, we are hesitant to add this plot to the manuscript for the reasons stated previously.

      Comment 3, Part 1

      Beyond this fundamental concern, there is a weakness in the representativeness of the PDMS finger, the vertical motion, and the speed of sliding to real human exploration.

      Overall, this is a continuous debate that we think offers two solutions, and we are not advocating for an “either-or” case. There is always a tradeoff between using a synthetic model of a finger versus a real human finger, and there is a place for both models. That is, while our mock finger will be “better” the more similar it is to a human finger, it is not our goal to fully replace a human finger. Rather our goal is to provide a consistent method of characterizing surfaces that is sufficiently similar to human touch as to be a useful and predictive tool.

      The usefulness of the mock finger lies in isolating the features of each surface that are independent of human variability, i.e., instabilities that form without changes in loading conditions between sliding motions or even within one sliding motion. Of course, with this method, we still require confirmation that these features also form during human exploration, which we show in Fig. 3. We believe that characterizing surfaces at the mesoscale in this way will ultimately lead to more successful human studies on tactile perception. Currently, and as shown in the paper, traditional characterization techniques are less predictive or not predictive at all. These include the friction coefficient from a commercial tribometer (using a steel or hard metal ball), roughness (via atomic force microscopy or some other metrology), and surface energy. Thus, we believe this mock finger improves on the current state of the art for characterizing surfaces (we are also aware of a commercial mock finger company, but we were unable to purchase or obtain an evaluation model).

      One of the main – and severe – limitations of using a human finger is that all fingers are different, meaning any study focusing on a particular user may not apply to others or be recreated easily by other researchers. We do not think it is feasible to set a standard for replication around a real human finger as that participant may no longer be available, or willing to travel the world as a “standard”. Furthermore, the method in which a person changes their pressures and velocities is different. We note that this is a challenge unique to touch perception – how an object is touched changes the friction generated, and thus the tactile stimulus generated, whereas a standardized stimulus is more straightforward for light or sound.

      However, we do emphasize that we have carefully considered the balance between feasibility and ecological validity in the design of the mock finger. Our mock finger incorporates the three stiffness components of a human finger (more below). Furthermore, we have successfully used this mock finger in correlations with human psychophysics in previous work, where findings from our mechanical experiments were more predictive of human performance[4–7] than other available methods.

      Our changes to the manuscript: Added (Pages 2-3)

      “Mock finger as a characterization tool

      We use a mechanical setup with a PDMS (poly(dimethylsiloxane)) mock finger to derive tactile predictors, as opposed to direct biomechanical measurements on human participants. While there is a tradeoff in selecting a synthetic finger over a real human finger to model human touch, human fingers themselves are also highly variable,[23] both in their physical shape and in their use during human motion. Our goal is to design a consistent method of characterizing samples that can be easily accessed by other researchers and does not rely on a standard established around a single human participant. We believe that sufficient replication of surface properties, bulk properties, and contact geometry results in characterization that isolates consistent features of surfaces that are not derived from human-to-human variability. We have previously used this approach to successfully correlate human results with mock finger characterization.[8,9,24]

      The major component of a human finger, by volume, is soft tissue (~56%),[25] resulting in an effective modulus close to 100 kPa.[26,27] To achieve comparable softness, we crosslink PDMS in a 1×1×5 cm mold at a 30:1 elastomer:crosslinker ratio. In addition, two more features of the human finger impart significant mechanical differences. Human fingers have a bone at the fingertip, the distal phalanx,[26–28, 8–10] which we mimic with an acrylic “bone” within our PDMS network. The stratum corneum, the stiffer, glassier outer layer of skin,[29] is replicated by glassifying, or further crosslinking, the surface of the mock finger with 8 hours of UV-Ozone treatment.[30] This treatment also modifies the surface properties of the native PDMS to align more closely with those of a human finger: it minimizes the viscoelastic tack at the surface, resulting in a comparably non-sticky surface. After stabilizing for one day following treatment, the mock finger surface attains moderate hydrophilicity (contact angle ~60°), as is typically observed for a real finger.[11,31]

      The initial contact area formed before a friction trace is collected is a 1×1 cm rectangle. While this shape is not entirely representative of a human finger with curves and ridges, human fingers flatten out enough to reduce the effects of curvature even at very light pressures.[31–33] This implies that, for most realistic finger pressures, the contact area is largely load-independent, which is more accurately replicated with a rectangular mock finger.

      Lastly, we consider the role of fingerprint ridges. A key finding of our previous work is that while fingerprints enhanced frictional dynamics under certain conditions, key features were still maintained with a flat finger.[11] Furthermore, under some loading conditions, the amplified signals could also result in more similar friction traces for different surfaces. We have observed good agreement between these friction traces and human experiments.[8,9,22,34]”

      Pages 3-4, Materials and Methods

      “Mock Finger Preparation

      Friction forces across all six surfaces were measured using a custom apparatus with a polydimethylsiloxane (PDMS, Dow Sylgard 184) mock finger that closely mimics the mechanical properties and contact mechanics of a human finger exploring a surface.[8,9] PDMS and crosslinker were combined in a 30:1 ratio to achieve a stiffness of 100 kPa, comparable to a real finger, then degassed in a vacuum desiccator for 30 minutes. We are aware that the manufacturer-recommended crosslinking ratio for Sylgard 184 is 10:1 because of potential uncrosslinked liquid residues,[35] but the additional crosslinking concentrated at the surface prevents this. The prepared PDMS was then poured into a 1×1×5 cm mold that also contained a 3D-printed acrylic “bone” for attaching applied masses on top of the “fingertip” area that contacts a surface during friction testing. After crosslinking in the mold at 60 °C for 1 hour, the finger was removed from the mold and treated with UV-Ozone for 8 hours to minimize viscoelastic tack.

      Mechanical Testing

      A custom device using our PDMS mock finger was used to collect macroscopic friction force traces replicating human exploration.[8,9] After a sample surface was placed on a stage, the finger was lowered at a slight angle such that an initial 1×1 cm rectangle of “fingertip” contact area could be established. We considered a broad range of applied masses observed during a tactile discrimination task (M = 0, 25, 75, and 100 g), added onto the deadweight of the finger (6 g). Forces were measured via a 250 g Futek force sensor (k = 13.9 kN m⁻¹) threaded to the bone. The other side of the sensor was connected to a motorized stage (V-508 PIMag Precision Linear Stage, Physik Instrumente) to control both displacement (4 mm across all conditions) and sliding velocity (v = 5, 10, 25, and 45 mm s⁻¹). Forces were recorded at all 16 combinations of mass and velocity at an average sampling rate of 550 Hz with a Keithley 7510 digital multimeter (DMM). Force traces were collected in sets of 4 slides, discarding the first slide of each set because of contact aging. Because some mass-velocity combinations were near the boundaries of instability phase transitions, not all force traces under a given condition exhibited similar profiles. Thus, three sets were collected on fresh spots for each condition, giving a total of nine traces per combination for each surface, to capture enough occurrences of multiple instabilities.”
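      The test matrix described above can be enumerated in a few lines, which makes the bookkeeping (16 conditions, nine retained traces each) easy to check. This is a minimal sketch, not the authors' analysis code: the helper names are hypothetical, standard gravity (9.81 m s⁻²) is assumed to convert masses to normal loads, and the per-slide sample counts assume the reported 550 Hz average rate is constant and ignore stage acceleration, so they are rough estimates only.

```python
from itertools import product

# Values as stated in the Mechanical Testing text
APPLIED_MASSES_G = (0, 25, 75, 100)   # added masses M (g)
VELOCITIES_MM_S = (5, 10, 25, 45)     # sliding velocities v (mm/s)
FINGER_DEADWEIGHT_G = 6               # deadweight of the mock finger (g)
DISPLACEMENT_MM = 4                   # sliding displacement per slide (mm)
SAMPLING_RATE_HZ = 550                # reported *average* sampling rate
SLIDES_PER_SET = 4                    # first slide of each set is discarded (contact aging)
SETS_PER_CONDITION = 3                # each set is collected on a fresh spot

G = 9.81  # assumption: standard gravity (m/s^2), used only to convert mass to force


def test_matrix():
    """Return one dict per mass-velocity combination (hypothetical helper)."""
    rows = []
    for m_g, v in product(APPLIED_MASSES_G, VELOCITIES_MM_S):
        normal_load_n = (m_g + FINGER_DEADWEIGHT_G) / 1000 * G
        slide_time_s = DISPLACEMENT_MM / v
        rows.append({
            "mass_g": m_g,
            "velocity_mm_s": v,
            "normal_load_N": normal_load_n,
            "slide_time_s": slide_time_s,
            # rough estimate: assumes constant sampling rate and constant velocity
            "approx_samples": round(slide_time_s * SAMPLING_RATE_HZ),
        })
    return rows


def retained_traces_per_condition():
    # every set keeps all slides except the first one
    return SETS_PER_CONDITION * (SLIDES_PER_SET - 1)


if __name__ == "__main__":
    rows = test_matrix()
    print(len(rows), "conditions;", retained_traces_per_condition(), "traces each")
    for r in rows:
        print(f"M={r['mass_g']:>3} g  v={r['velocity_mm_s']:>2} mm/s  "
              f"N={r['normal_load_N']:.3f} N  t={r['slide_time_s']:.3f} s  "
              f"~{r['approx_samples']} samples")
```

      Under these assumptions, the slowest condition (5 mm s⁻¹) spans 0.8 s per slide, while the fastest (45 mm s⁻¹) spans under 0.1 s, which is worth keeping in mind when comparing instability counts across velocities.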

      Added References

      (23) Infante, V. H. P. et al. The role of skin hydration, skin deformability, and age in tactile friction and perception of materials. Sci. Rep. 15, 9935 (2025).

      (24) Nolin, A., Lo, C.-Y., Kayser, L. V. & Dhong, C. B. Transparent and Electrically Switchable Thin Film Tactile Actuators Based on Molecular Orientation. Preprint at https://doi.org/10.48550/arXiv.2411.07968 (2024).

      (25) Murai, M., Lau, H.-K., Pereira, B. P. & Pho, R. W. H. A cadaver study on volume and surface area of the fingertip. J. Hand Surg. 22, 935–941 (1997).

      (26) Abdouni, A. et al. Biophysical properties of the human finger for touch comprehension: influences of ageing and gender. R. Soc. Open Sci. (2017) doi:10.1098/rsos.170321.

      (27) Cornuault, P.-H., Carpentier, L., Bueno, M.-A., Cote, J.-M. & Monteil, G. Influence of physico-chemical, mechanical and morphological fingerpad properties on the frictional distinction of sticky/slippery surfaces. J. R. Soc. Interface (2015) doi:10.1098/rsif.2015.0495.

      (28) Qian, K. et al. Mechanical properties vary for different regions of the finger extensor apparatus. J. Biomech. 47, 3094–3099 (2014).

      (29) Yuan, Y. & Verma, R. Measuring microelastic properties of stratum corneum. Colloids Surf. B Biointerfaces 48, 6–12 (2006).

      (30) Fu, Y.-J. et al. Effect of UV-Ozone Treatment on Poly(dimethylsiloxane) Membranes: Surface Characterization and Gas Separation Performance. Langmuir 26, 4392–4399 (2010).

      Comment 3, Part 2

      The real finger has multiple layers with different moduli. In fact, the stratum corneum cells, which are the outer layer at the interface and determine the friction, have a much higher modulus than PDMS.

      We have approximated the softness of the finger with 100 kPa crosslinked PDMS, which is close to what has been reported for the bulk of a human fingertip.[9,10] However, as mentioned in the Materials and Methods, two additional features of the mock finger impart greater stiffness. The PDMS surrounds a rigid acrylic bone comparable to the distal phalanx, which provides an additional layer of higher modulus.[8] Additionally, the 8-hour UV-Ozone treatment decreases the viscoelastic tack of the pristine PDMS by glassifying, or further crosslinking, the surface of the finger,[12] thereby imparting greater stiffness at the surface, similar to the contribution of the stratum corneum, along with a similar surface energy.[13] This technique is widely used in wearables,[14] soft robotics,[15] and microfluidics[16] to induce both of these material changes. Finally, the finger is used at least a day after UV-Ozone treatment is completed so that its surface is stable and moderately hydrophilic, similar to the outermost layer of human skin.[17]

      Comment 3, Part 3

      In addition, the slanted position of the finger can cause non-uniform pressures across the finger. Both can contribute to making the PDMS finger have much more stick-slip than a real finger.

      To minimize any contribution from the slanted position of the finger, an initial contact area of 1×1 cm is established before sliding and recording friction measurements. Because the PDMS finger is soft, the portion in contact with a surface flattens, and the contact area remains largely unchanged during sliding. Any additional stick-slip after this alignment step is caused by contact aging at the interface; we therefore always discard the first trace so that we consider only stick-slip events caused by surface chemistry. We recognize that it is difficult to completely control the pressure distribution at the planar interface, but this is also expected when humans freely explore a surface.

      Comment 3, Part 4

      In fact, if you look at the regime maps, there is very little space that has steady sliding. This does not represent well human exploration of surfaces. We do not tend to use a force and velocity that will cause extensive stick-slip (frequent regions of 100% stick-slip) and, in fact, the speeds used in the study are on the slow side, which also contributes to more stick-slip. At higher speeds and lower forces, all of the materials had steady sliding regions.”

      We are not aware of published studies that extensively show that humans avoid stick-slip regimes. In fact, we are familiar with literature in which stiction spike formation is suppressed: a recent paper by AliAbbasi, Basdogan et al. investigates electroadhesion and friction at NaCl solution-infused interfaces, which result in significantly steadier forces.[18] We also directly showed evidence of instability formation during human exploration in Fig. 3B-C. These dynamic events are common, despite the lack of control over normal forces and sliding velocities. We also note that Reviewer 1, Comment 2, Part 2 suggested that we further explore possible trends from parameterizing the stiction spike.

      We note that many studies have not reached the velocities and masses required for stiction spikes, even though these masses and velocities would be routinely seen in free exploration; this is usually due to constraints of their equipment.[19] Sliding events during human free exploration of surfaces can exceed 100 mm/s for rapid touches. However, for the surfaces investigated here, we observe that large regions of stick-slip can emerge at velocities as low as 5 mm/s, depending on the applied load. The incidence of steady sliding appears to depend more on the applied mass, with almost no steady sliding observed at or above 75 g. Indeed, the categorization of forces along our transition zones is the main point of the paper.

      Comment 3, Part 5

      Further, on these very smooth surfaces, the friction and stiction are more complex and cannot dismiss considerations such as finger material property change with sweat pore occlusion and sweat capillary forces. Also, the vertical motion of both the PDMS finger and the instructed human subjects is not the motion that humans typically use to discriminate between surfaces.

      We did not describe the task sufficiently. Participants were only instructed to slide their finger along a single axis, from the top to the bottom of a sample; the motion was not vertical with respect to gravity. We have updated the wording in the manuscript to reflect this.

      Page 4

      “Participants could touch for as long as they wanted, but were asked to only use their dominant index fingers along a single axis to better mimic the conditions for instability formation during mechanical testing with the mock finger.”

      Page 11

      “The participant was then asked to explore each sample simultaneously, and ran over each surface in strokes along a single axis until the participant could decide which of the two had “more friction”.”

      Comment 3, Part 6

      Finally, fingerprints may not affect the shape and size of the contact area, but they certainly do affect the dynamic response and detection of vibrations.”

      We are aware of the nuance. Our previous work on the role of fingerprints in the friction experienced by a PDMS mock finger showed enhanced signals when ridges were incorporated on the finger, and used a rate-and-state model of a heterogeneous, elastic body to find corresponding trends (though no existing friction model can accurately reproduce experiments on mesoscale friction).[11] The key conclusion was that a flat finger still preserved key dynamic features, and that stronger or more numerous vibrations could result in more similar forces for different surfaces, depending on the sliding conditions.

      This is also in the context that we are seeking to provide a reasonable and experimentally accessible method to characterize surfaces, which will always improve as we more closely replicate a true human finger. Our goal here, however, was to replicate the finger sufficiently for use in human studies. We believe the more appropriate metric of success is whether the mock finger is more predictive than the traditional characterization experiments it would replace, such as friction coefficient, roughness, and surface energy measurements.

      Comment 4

      This all leads to the critical question, why are friction, normal force, and velocity not measured during the measured human exploration and in a systematic study using the real human finger? The authors posed an extremely interesting hypothesis that humans may alter their speed to feel the instability transition regions. This is something that could be measured with a real finger but is not likely to be correlated accurately enough to match regime boundaries with such a simplified artificial finger.

      We are excited that our manuscript offers a tractable way to test the hypothesis that tactile decision-making models use friction instabilities as evidence. However, there are challenges and barriers, which we lay out below along with how the scope of this paper leads us in that direction. We also clarify that our goals are to provide a method to characterize samples to better design tactile interfaces in haptics or in psychophysical experiments, and to raise awareness that the common methods of characterizing samples for touch by an average friction coefficient or roughness are fundamentally unsound. Throughout the paper, we have made changes to reflect that our study, at this point, is only correlative.

      As discussed in the summary, and with additional detail here, to further support our findings through observation on humans would require answering:

      (1) Which one, or which combination, of the multiple swipes that people make is responsible for a tactile decision? (This requires a decision-making model.)

      (2) Establish what is, or may be, tactile evidence.

      (3) Establish whether tactile decision-making models are similar to or different from existing decision-making models.

      (4) Design a task that does not require the use of subjective tactile descriptors, like “which one feels rougher”, which we have seen cause confusion in participants; such a task will likely require accounting for memory effects.

      We elaborate these points below:

      To successfully perform this experiment, we note that freely exploring humans make multiple strokes on a surface. Therefore, we would need to construct a decision-making model. It has not yet been demonstrated whether tactile decision making follows visual decision making, but perhaps to start, we can assume it does. Then, in the design of our decision-making paradigm, we immediately run into the problem: What is tactile evidence?

      From Fig. 3C, we can already see that identifying evidence is challenging. Prior to this manuscript, one might have chosen the average force, the highest force, or the average friction force. Then, after deciding on the evidence, we need a method to manipulate it, i.e., to create samples or a machine that produce high friction, etc. We show that during the course of human touch, due to the dynamic nature of friction, the average can change substantially, and sample design becomes a central barrier to experiments. Others may suggest immobilizing the finger and applying a known force, but given how much friction changes during human exploration, there is no known method for a machine to recreate temporally and spatially varying friction forces during sliding onto a stationary finger. Perhaps most importantly, in addition to these mechanical challenges, a study by Liu, Colgate et al. showed that even when they recorded the (2D) friction of a finger exploring a surface and then replayed the same friction forces onto a finger, the participant could not determine which surface the replayed friction force was supposed to represent.[1] This supports the importance of the efference copy: the forces arising in response to expected motion are important for determining friction. Finally, there is no known method to design instabilities a priori; they must be found through experiments, especially since introducing, say, a bump or a trough would bring in confounding variables for how participants tell surfaces apart.

      Furthermore, even if we had a consistent method to create tactile “evidence”, the paradigm also deserves some consideration. In our experience, the 3-AFC task we perform is important because a vocabulary for touch has not been established. That is, in 3-AFC, by asking participants to determine which sample is unlike the others, we do not have to ask questions like “which one is rougher” or “which one has less friction”. In contrast, 2-AFC, which is better suited to decision-making models because it does not involve memory, requires asking a perceptual question like “which one is rougher?”. In our ongoing work with two silane coatings, we found that participants could easily identify which surface was unlike the others above chance in a 3-AFC task, but participants, even within their own trials, could not consistently identify one silane as perceptually “rougher” by 2-AFC. To us, this calls into question the validity of tactile descriptors, but it is beyond the scope of this manuscript.

      Although this is not our only goal, in the context of human exploration we believed it was important, in this manuscript, to identify a mechanical parameter that was consistent with how humans explore surfaces but could also characterize a consistent property of a surface, irrespective of whether a human was touching it. We thought that designing human decision-making models and paradigms around the friction coefficient would not be successful.

      Given the scope of these challenges, we do not think it would be possible to establish these conceptual sequences in a single manuscript. However, we think that our manuscript brings an important step forward to approach this problem.

      Reviewer 2 (Public review):

      Summary:

      In this paper, the authors want to test the hypothesis that frictional instabilities rather than friction are the main drivers for discriminating flat surfaces of different sub-nanometric roughness profiles.

      They first produced flat surfaces with 6 different coatings giving them unique and various properties in terms of roughness (picometer scale), contact angles (from hydrophilic to hydrophobic), friction coefficient (as measured against a mock finger), and Hurst exponent.

      Then, they used those surfaces in two different experiments. In the first experiment, they used a mock finger (PDMS of 100 kPa molded into a fingertip shape) and slid it over the surfaces at different normal forces and speeds. They categorized the sliding behavior as steady sliding, sticking spikes, and slow frictional waves by visual inspection, and show that the surfaces have different behaviors depending on normal force and speed. In a second experiment, participants (10) were asked to discriminate pairs of those surfaces. It is found that each of those pairs could be reliably discriminated by most participants.

      Finally, the participant's discrimination performance is correlated with differences in the physical attributes observed against the mock finger. The authors found a positive correlation between participants' performances and differences in the count of steady sliding against the mock finger and a negative correlation between participants' reaction time and differences in the count of stiction spikes against the mock finger. They interpret those correlations as evidence that participants use those differences to discriminate the surfaces.

      Strengths:

      The created surfaces are very interesting as they are flat at the nanometer scale, yet have different physical attributes and can be reliably discriminated.

      We thank Reviewer 2 for their notes on our manuscript. The responses below address the reviewer’s comments and recommendations for revised work.

      Weaknesses:

      Comment 1

      In my opinion, the data presented in the paper do not support the conclusions. The conclusions are based on a correlation between results obtained on the mock finger and results obtained with human participants but there is no evidence that the human participants' fingertips will behave similarly to the mock finger during the experiment. Figure 3 gives a hint that the 3 sliding behaviors can be observed in a real finger, but does not prove that the human finger will behave as the mock finger, i.e., there is no evidence that the phase maps in Figure 1C are similar for human fingers and across different people that can have very different stiffness and moisture levels.

      We have made changes throughout the manuscript to acknowledge that our findings are correlative, and we have incorporated into the discussion how our work may enable biomechanical measurements and tactile decision-making models.

      The mechanical characterization conducted with the mock finger seeks to extract significant features of the friction traces of a set of surfaces to use as predictors of tactile discriminability. The goal is to find a consistent method to characterize surfaces for use in tactile experiments that can be replicated by others and used prior to any human experiments. However, in the overall response and in our response to a similar comment by Reviewer 1 (recreated below), we also explain why we believe that establishing this directly through experiments on humans is not yet feasible.

      First, we discuss the mock finger. The PDMS finger is treated to have surface and bulk properties comparable to those of a human finger. We have approximated the softness of the finger with 100 kPa crosslinked PDMS, which is close to what has been reported for the bulk of a human fingertip.[9,10] However, as mentioned in the Materials and Methods, two additional features of the mock finger impart greater stiffness. The PDMS surrounds a rigid acrylic bone comparable to the distal phalanx, which provides an additional layer of higher modulus.[8] Additionally, the 8-hour UV-Ozone treatment decreases the viscoelastic tack of the pristine PDMS by glassifying, or further crosslinking, the surface of the finger,[12] thereby imparting greater stiffness at the surface, similar to the contribution of the stratum corneum, along with a similar surface energy.[13] The finger is also used at least a day after UV-Ozone treatment is completed so that the surface returns to moderate hydrophilicity, similar to the outermost layer of human skin.[17] We also discuss the shape of the contact formed. To minimize any contribution from the slanted position of the finger, an initial contact area of 1×1 cm is established before sliding and recording friction measurements. Because the PDMS finger is soft, the portion in contact with a surface flattens, and the contact area remains largely unchanged during sliding. Any additional stick-slip after this alignment step is caused by contact aging at the interface; we therefore always discard the first trace so that we consider only stick-slip events caused by surface chemistry. We recognize that it is difficult to completely control the pressure distribution at the planar interface, but this is also expected when humans freely explore a surface. Finally, we consider flat vs. fingerprinted fingers. Our previous work on the role of fingerprints in the friction experienced by a PDMS mock finger showed enhanced signals when ridges were incorporated on the finger, and used a rate-and-state model of a heterogeneous, elastic body to find corresponding trends.[11] The key conclusion was that a flat finger still preserved key dynamic features, and that stronger or more numerous vibrations could result in more similar forces for different surfaces, depending on the sliding conditions. We note that we have subsequently used this flat mock finger in correlations with human psychophysics in previous work, where findings from our mechanical experiments were predictive of human performance.[4–7] We have added these details to the manuscript.

      With this adequately similar mock finger, we collected friction traces under controlled conditions of normal force and velocity to extract the signals unique to each material that are not caused by human variability. For example, we observe the smallest regions of steady sliding on our phase maps (Fig. 1C) for the short-chain alkylsilanes C4 and C5, while the increased intermolecular forces of the other silanes increase the incidence of steady sliding. We have also previously shown that comparisons of similarly collected mechanical data, using the cross-correlations between signals of two different materials, are predictive of human performance.[4–7] While different participants produce different raw signals, we see that broad categories of stick-slip, i.e., instabilities, can be extracted (Fig. 3B-C) and used as a cue in a tactile discrimination task. As mentioned above, we have added a section to the main manuscript about the usefulness of our mock finger, as well as its structure.

      Second, we lay out the challenges and barriers to demonstrating this in humans in the manner requested by the reviewer, and how the scope of this paper will lead us in that direction. We also clarify that our goals are to provide a method to characterize samples to better design tactile interfaces in haptics or in psychophysical experiments, and to raise awareness that the common methods of characterizing samples for touch by an average friction coefficient or roughness are fundamentally unsound.

      As discussed in the summary, and with additional detail here, to further support our findings through observation on humans would require answering:

      (1) Which one, or which combination, of the multiple swipes that people make is responsible for a tactile decision?

      (2) Establish what is, or may be, tactile evidence.

      (3) Establish whether tactile decision-making models are similar to or different from existing decision-making models.

      (4) Test the hypothesis, in these models, that friction instabilities are evidence, and not some other unknown metric.

      (5) Design a task that does not require the use of subjective tactile descriptors, like “which one feels rougher”, which we see cause confusion in participants; such a task will likely require accounting for memory effects.

      We elaborate these points below:

      To successfully perform this experiment, we note that freely exploring humans make multiple strokes on a surface. Therefore, we would need to construct a decision-making model. It has not yet been demonstrated whether tactile decision making follows visual decision making, but perhaps to start, we can assume it does. Then, in the design of our decision-making paradigm, we immediately run into the problem: What is tactile evidence?

      From Fig. 3C, we can already see that identifying evidence is challenging. Prior to this manuscript, one might have chosen the average force, the highest force, or the average friction force. Then, after deciding on the evidence, we need a method to manipulate it, i.e., to create samples or a machine that produce high friction, etc. We show that during the course of human touch, due to the dynamic nature of friction, the average can change substantially, and sample design becomes a central barrier to experiments. Others may suggest immobilizing the finger and applying a known force, but given how much friction changes during human exploration, there is no known method for a machine to recreate temporally and spatially varying friction forces during sliding onto a stationary finger. Perhaps most importantly, in addition to these mechanical challenges, a study by Liu, Colgate et al. showed that even when they recorded the (2D) friction of a finger exploring a surface and then replayed the same friction forces onto a finger, the participant could not determine which surface the replayed friction force was supposed to represent.[1] This supports the importance of the efference copy: the forces arising in response to expected motion are important for determining friction. Finally, there is no known method to design instabilities a priori; they must be found through experiments, especially since introducing, say, a bump or a trough would bring in confounding variables for how participants tell surfaces apart.

      Furthermore, even if we had a consistent method to create tactile “evidence”, the paradigm also deserves some consideration. In our experience, the 3-AFC task we perform is important because a vocabulary for touch has not been established. That is, in 3-AFC, by asking participants to determine which sample is unlike the others, we do not have to ask questions like “which one is rougher” or “which one has less friction”. In contrast, 2-AFC, which is better suited to decision-making models because it does not involve memory, requires asking a perceptual question like “which one is rougher?”. In our ongoing work with two silane coatings, we found that participants could easily identify which surface was unlike the others above chance in a 3-AFC task, but participants, even within their own trials, could not consistently identify one silane as perceptually “rougher” by 2-AFC. To us, this calls into question the validity of tactile descriptors, but it is beyond the scope of the current manuscript.

      Although this is not our only goal, in the context of human exploration we believed it was important, in this manuscript, to identify a mechanical parameter that was consistent with how humans explore surfaces but could also characterize a consistent property of a surface, irrespective of whether a human was touching it. We thought that designing human decision-making models and paradigms around the friction coefficient would not be successful.

      Given the scope of these challenges, we do not think it would be possible to establish this conceptual sequence in a single manuscript.

      See Reviewer 1, Comment 3, part 3 for changes to the manuscript

      Comment 2

      I believe that the authors collected the contact forces during the psychophysics experiments, so this shortcoming could be solved if the authors use the actual data and show that the participant responses can be better predicted by the occurrence of frictional instabilities than by the usual metrics on a trial-by-trial basis, or at least on a subject-by-subject basis, i.e., poor performers should show fewer signs of differences in the sliding behaviors than good performers.

      To fully implement this, a decision-making model is necessary because, as a counterexample, a participant could have generated ten swipes producing SFW and one swipe producing an Sp, but the Sp may have been the most important event for making a tactile decision. This type of scenario is not compatible with the suggested analysis, and similar counterpoints can be made for other seemingly straightforward analyses.

      While we are interested in and actively working on this, the study here is critical for establishing the types of evidence for a future decision-making model. We know humans change their friction constantly during real exploration, so it is unclear which of these constantly changing values we should input into the decision-making model; the future challenges we anticipate are explained in Weaknesses, Comment 1.

      Comment 3

      The sample size (10) is very small.

      We recognize that, all else being equal, this sample size is on the smaller end. However, we emphasize that the degree of sample control is far above typical, with minimal variation in sample properties such as surface roughness, and every sample for every trial was pristine. Furthermore, sample preparation (more than 300 individual wafers were used) became a limiting factor. Although not typically appropriate, and thus not included in the manuscript, a post-hoc power analysis on the 100 trials of P4, the pair closest to chance (53% accuracy versus 33% chance), showed a power of 98.2%, suggesting that the study was appropriately powered.
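      For readers who wish to run this type of calculation, a minimal sketch follows. It assumes a one-sided exact binomial test at α = 0.05, with 3-AFC chance (1/3) as the null and the observed 53% accuracy as the alternative; the test variant and software used for the reported 98.2% are not stated here, so the computed value may differ slightly. The function names are hypothetical.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def exact_binomial_power(n, p_null, p_alt, alpha=0.05):
    """Power of a one-sided exact binomial test of H0: p = p_null against H1: p > p_null."""
    # The upper tail shrinks as k grows, so the first k whose tail under H0 is <= alpha
    # is the smallest number of correct responses that rejects H0 (the critical count).
    k_crit = next(k for k in range(n + 1) if binom_upper_tail(k, n, p_null) <= alpha)
    size = binom_upper_tail(k_crit, n, p_null)  # achieved false-positive rate (<= alpha)
    power = binom_upper_tail(k_crit, n, p_alt)  # chance of reaching k_crit if accuracy is p_alt
    return k_crit, size, power


# Inputs for P4 as described above: 100 trials, 3-AFC chance of 1/3, observed accuracy of 53%.
k_crit, size, power = exact_binomial_power(n=100, p_null=1 / 3, p_alt=0.53)
print(f"critical count = {k_crit}, size = {size:.4f}, power = {power:.3f}")
```

      Using the observed accuracy as the alternative is exactly why post-hoc power is usually discouraged: it restates the observed effect rather than testing a pre-specified one.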

      Reviewer 2 (Recommendations for the authors):

      Comment 1

      Differences in SS and Sp (Table 2) are NOT physical or mechanical differences but are obtained by counting differences in the number of occurrences of each sliding behavior. It is rather a weird choice.

      We disagree that differences in SS and Sp are not physical or mechanical, as these are well-established phenomena in the soft matter and tribology literature.[20–22] They are known as “mechanical instabilities” and are generated by two physical factors: the elasticity of the finger (which is constant in our mechanical testing) and the friction forces present (which change with sample type). The motivation for using these different instability categories is that, under some conditions, the instabilities can be invariant to external factors like velocity. This would be quite advantageous for human exploration: the friction coefficient changes with nearly any factor, including velocity and mass, whereas an instability that is invariant to velocity would accurately characterize a unique identifier of the surface even though velocity may vary.

      This “weird choice” is the central innovation of this paper. This choice was necessary because we demonstrated that the common usage of the friction coefficient is fundamentally flawed: the friction coefficient suggests that surfaces which are more different would feel more similar; indeed, the most distinctive surfaces would be two surfaces that are identical, which is clearly spurious. Furthermore, Table 1 now includes the range of friction generated on each surface: the range of friction coefficients on a single surface is large, of the same order as the differences in friction between two surfaces. This is expected in soft sliding systems and underscores our concern with the use of the average friction coefficient in psychophysical design. One potential explanation for why we were able to observe this effect is that our surfaces have similar roughness (< 0.6 nm variability), removing potential confounding factors from large-scale roughness; to the best of our knowledge, this degree of roughness control has not been widely used in tactile studies.

      Comment 2

      Figures 2B-C: why are the x-data different than Table 2?

      The x-data in Fig. 2B-C are the absolute differences in the number of occurrences measured for a given instability type or material property out of 144 pulls. Modeling the human participant results in our GLMMs required the independent variables to be in this form rather than percentages. We initially chose to list percent differences in Table 2 to highlight the ranges of differences instead of an absolute value, but have added both for clarity.

      Our changes to the manuscript

      Page 7

      “To determine if humans can detect these three different instabilities, we selected six pairs of surfaces to create a broad range of potential instabilities present across all three types. These are summarized in Table 2, where the first column for each instability is the difference in occurrence of that instability formed between each pair, and the second is the percent difference.”

      “Thus, when comparing C4 versus C4-APTMS, they have a difference in steady sliding of 20 out of a maximum 144 pulls, for a |ΔSS| of 13.9%. The absolute value is taken to compare total differences present, as the psychophysical task does not distinguish between sample order.”
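      The arithmetic behind |ΔSS| can be sketched as follows. This is a minimal illustration assuming each surface contributes a count of pulls (out of 144) that produced a given instability; the function name and the two counts are hypothetical, chosen only so that they differ by 20 pulls as in the C4 versus C4-APTMS example.

```python
def occurrence_difference(count_a, count_b, total_pulls=144):
    """Difference in how often two surfaces produced one instability type.

    Returns the absolute count difference (the form used as a GLMM predictor)
    and the same difference as a percentage of all pulls, rounded to 0.1%.
    """
    delta = abs(count_a - count_b)  # absolute value: sample order is not distinguished
    return delta, round(100 * delta / total_pulls, 1)


# Hypothetical SS counts chosen only so that the two surfaces differ by 20 of 144 pulls.
print(occurrence_difference(62, 42))
```

      Because only the absolute difference enters, swapping the two surfaces gives the same result, mirroring a psychophysical task that does not distinguish sample order.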

      Comment 3

      “We constructed a set of coated surfaces with physical differences which were imperceptible by touch but created different types of instabilities based on how quickly a finger is slid and how hard a human finger is pressed during sliding.” Yet, in your experiment, participants could discriminate them, so this is incoherent.

      To clarify, macroscopic objects can differ in physical shape and in chemical composition. What we meant was that the physical differences, i.e., roughness, were below the limit (Skedung et al.) at which participants would be able to tell uncoated surfaces apart.[23] Therefore, participants could tell our surfaces apart because of the chemical composition of the surfaces, and not because of differences in roughness or physical effects like film stiffness (because the surface coatings are molecularly thin, they are mechanically negligible). However, we concede that at the molecular scale, the traditional macroscopic distinction between physical and chemical is blurred.

      We have made minor revisions to the wording of the abstract. We clarify that the surface coatings had physical differences in roughness smaller than 0.6 nm, which, based purely on roughness, would not be expected to be distinguishable by participants. Therefore, participants can tell these surfaces apart because of differences in friction generated by chemical composition, and we were able to minimize contributions from physical differences in the samples in our study.

      Our changes to the manuscript

      Page 1, Abstract

      “Here, we constructed a set of coated surfaces with minimal physical differences that by themselves, are not perceptible to people, but instead, due to modification in surface chemistry, the surfaces created different types of instabilities based on how quickly a finger is slid and how hard a human finger is pressed during sliding.”

      “In one experiment, we used a mechanical mock finger to quantify and classify differences in instability formation from different coated surfaces. In a second experiment, participants perform a discrimination task using the same coated surfaces. Using the data from these two experiments, we found that human discrimination response times were faster with surfaces where the mock finger produced more stiction spikes and discrimination accuracy was higher where the mock finger produced more steady sliding. Conversely, traditional metrics like surface roughness or average friction coefficient did not relate to tactile discriminability. In fact, the typical method of averaging friction coefficients led to a spurious correlation which erroneously suggests that distinct objects should feel identical and identical objects should feel distinct—similar to findings by others. Friction instabilities may offer a more predictive and tractable framework of fine touch perception than friction coefficients, which would accelerate the design of tactile interfaces.”

      Reviewer 3 (Public review):

      Strengths

      The paper describes a new perspective on friction perception, with the hypothesis that humans are sensitive to the instabilities of the surface rather than the coefficient of friction. The paper is very well written and with a comprehensive literature survey.

      One of the central tools used by the author to characterize the frictional behavior is the frictional instabilities maps. With these maps, it becomes clear that two different surfaces can have both similar and different behavior depending on the normal force and the speed of exploration. It puts forward that friction is a complicated phenomenon, especially for soft materials.

      The psychophysics study is centered around an odd-one-out protocol, which has the advantage of avoiding any external reference to what would mean friction or texture for example. The comparisons are made only based on the texture being similar or not.

      The results show a significant relationship between the distance between frictional maps and the success rate in discriminating two kinds of surface.

      We thank Reviewer 3 for their notes and interesting discussion points on our manuscript. Below, we address the reviewer’s feedback and comments on related works.

      Weaknesses:

      Comment 1

      The main weakness of the paper comes from the fact that the frictional maps and the extensive psychophysics study are not made at the same time, nor with the same finger. The frictional maps are produced with an artificial finger made out of PDMS which is a poor substitute for the complex tribological properties of skin.

      A similar comment was made by Reviewers 1 and 2. We agree in part and have made changes throughout to clarify that our study is correlative, but that it presents an important step toward these biomechanical measurements and corresponding decision-making models.

      We are not claiming that our PDMS fingers are superior to real fingers, but rather, we cannot establish standards in the field by using real human fingers that vary between subjects and researchers. We believe the mock finger we designed is a reasonable mimic of the human finger by matching surface energy, heterogeneous mechanical structure, and the ability to test multiple physiologically relevant pressures and sliding velocities.

      We achieve a heterogeneous mechanical structure that mimics the three primary components of stiffness of a human finger. The effective modulus of ~100 kPa, from soft tissue,[9,10] is obtained with a 30:1 ratio of PDMS to crosslinker. The PDMS also surrounds a rigid acrylic bone comparable to the distal phalanx, which provides an additional layer of higher modulus.[8] Additionally, the 8-hour UV-Ozone treatment decreases the viscoelastic tack of the pristine PDMS by glassifying, or further crosslinking, the surface of the finger,[12] thereby imparting greater stiffness at the surface, similar to the contribution of the stratum corneum, along with a similar surface energy.[13] The finger is used at least a day after the UV-Ozone treatment is completed so that the surface returns to moderate hydrophilicity, similar to the outermost layer of human skin.[17]

      We also discuss the shape of the contact formed. To ensure minimal contribution from the slanted position of the finger, an initial contact area of 1 cm × 1 cm is established before sliding and recording friction measurements. Because the PDMS finger is a soft object, the portion in contact with a surface flattens, and the contact area remains largely unchanged during sliding. We recognize that it is difficult to completely control the pressure distribution due to the planar interface, but this variation is also expected when humans freely explore a surface.

      Finally, we consider flat vs. fingerprinted fingers. Our previous work on the role of fingerprints in the friction experienced by a PDMS mock finger showed enhanced signals with the incorporation of ridges on the finger and used a rate-and-state model of a heterogeneous, elastic body to find corresponding trends.[11] The key conclusion was that a flat finger still preserved key dynamic features, and that stronger or more numerous vibrations could result in more similar forces for different surfaces, depending on the sliding conditions.

      We note that we have subsequently used the controlled mechanical data collected with this flat mock finger in correlations with human psychophysics in previous work, where findings from our mechanical experiments were predictive of human performance.[4–7] Ultimately, we see from our prior work and here that, despite the drawbacks of our mock finger, it outperforms other standard characterization techniques in providing information about the mesoscale that correlates with tactile perception. We have added these details to the manuscript.

      We also note that an intermediate option, replicating real fingers, even in a mold, may inadvertently limit trends from characterization to a specific finger. One of the main, and most severe, limitations of using a human finger is that all fingers are different, meaning any study focusing on a particular user may not apply to others or be easily recreated by other researchers. We cannot set a standard for replication around a real human finger, as that participant may no longer be available, or willing to travel the world as a “standard”. Furthermore, the way in which a single person changes their pressures and velocities as they touch a surface is highly variable. As we noted in the Summary Response, a study by Liu, Colgate et al. (IEEE ToH 2024) demonstrated that efference copies may be important, and thus constraining a human finger and replaying the forces recorded during free exploration will not lead to the participant identifying a surface with any consistency. Thus, it is important to allow humans to freely explore surfaces, but doing so creates nearly limitless variability in friction forces.

      This is also against the backdrop that we are seeking to provide a method to characterize surfaces. Indeed, the more features of a human finger we replicate in the mock finger, the more likely it is that the mechanical data will correlate with human performance. However, we have used this technique several times to achieve stronger correlations with human data than other available techniques. We believe the metric of success should be a comparison with the available characterization techniques, rather than a 1:1 reconstruction of the forces of an arbitrary human finger. Indeed, such a reconstruction would be limited to the finger of a single individual, perhaps even to that individual on a given day.

      See Reviewer 1, Weaknesses, Comment 2, part 2 for changes to the manuscript

      Comment 2

      The evidence would have been much stronger if the measurement of the interaction was done during the psychophysical experiment. In addition, because of the protocol, the correlation is based on aggregates rather than on individual interactions.

      We agree that this would have helped further establish our argument, but in the overall statement and in other reviewer responses, we describe the significant challenges to establishing this.

      To fully implement this, a decision-making model is necessary because, as a counterexample, a participant could have generated ten swipes producing SFW and one swipe producing an Sp, but the Sp may have been the most important event for making a tactile decision. We also clarify that our goal is to provide a method to characterize samples to better design tactile interfaces in haptics or in psychophysical experiments.

      As discussed in the summary and expanded on here, the challenges to developing a decision-making model are, in our view, as follows:

      (1) Which one, or which combination, of the multiple swipes that people make is responsible for a tactile decision?

      (2) Establish what is, or may be, tactile evidence.

      (3) Establish whether tactile decision-making models are similar to or different from existing decision-making models.

      (4) Test the hypothesis, in these models, that friction instabilities are evidence, and not some other unknown metric.

      (5) Design a task that does not require subjective tactile descriptors, like “which one feels rougher”, which we find confuse participants; such a task will likely require accounting for memory effects.

      (6) Design samples that vary in the amount of evidence generated, but this evidence cannot be controlled directly. Rather, the samples indirectly vary evidence by how likely it is for a human to generate different types of friction instabilities during standard exploration.

      We elaborate these points below:

      To successfully perform this experiment, we note that freely exploring humans make multiple strokes on a surface. Therefore, we would need to construct a decision-making model. It has not yet been demonstrated whether tactile decision making follows visual decision making, but perhaps to start, we can assume it does. Then, in the design of our decision-making paradigm, we immediately run into the problem: What is tactile evidence?

      From Fig. 3C, we can already see that identifying evidence is challenging. Prior to this manuscript, people may have chosen the average force, or the highest force. Alternatively, we may choose the average friction force. Then, after deciding on the evidence, we need to find a method to manipulate the evidence, i.e., create samples or a machine that causes high friction, etc. We show that during the course of human touch, due to the dynamic nature of friction, the average can change by a large amount, and sample design becomes a central barrier to experiments. Others may suggest immobilizing the finger and applying a known force, but given how much friction changes with human exploration, there is no known method to make a machine recreate temporally and spatially varying friction forces during sliding onto a stationary finger. Perhaps most importantly, in addition to the mechanical challenges, a study by Liu, Colgate et al. showed that even if they recorded the friction (2D) of a finger exploring a surface and then replicated the same friction forces onto a finger, the participant could not determine which surface the replayed friction force was supposed to represent.[1] This supports the importance of the efference copy, that is, that the forces generated in response to an expected motion are important for determining friction. Finally, there is no known method to design instabilities a priori. They must be found through experiments, especially since if we were to introduce, say, a bump or a trough, we would bring in confounding variables to how participants tell surfaces apart.

      Furthermore, even if we had a consistent method to create tactile “evidence”, the paradigm also deserves consideration. In our experience, the 3-AFC task we perform is important because a vocabulary for touch has not been established. That is, in 3-AFC, by asking participants to determine which sample is unlike the others, we do not have to ask questions like “which one is rougher” or “which one has less friction”. In contrast, 2-AFC, which is better suited to decision-making models because it does not involve memory, requires asking a perceptual question like “which one is rougher?”. In our ongoing work with two silane coatings, we found that participants could easily identify which surface was unlike the others above chance in a 3-AFC task, but participants, even within their own trials, could not consistently identify one silane as perceptually “rougher” in a 2-AFC task. To us, this calls into question the validity of tactile descriptors, but this is beyond the scope of the current manuscript.

      Although this is not our only goal, in the context of human exploration, we believed it was important in this manuscript to identify a mechanical parameter that was consistent with how humans explore surfaces but that could also characterize some consistent property of a surface, irrespective of whether a human was touching it. We did not think that designing human decision-making models and paradigms around the friction coefficient would be successful.

      Given the scope of these challenges, we do not think it would be possible to establish this conceptual sequence in a single manuscript.

      Comment 3

      The authors compensate with a third experiment where they used a 2AFC protocol and an online force measurement. But the results of this third study, fail to convince the relation.

      With this experiment, our central goal was to demonstrate that the instabilities we have identified with the PDMS finger also occur with a human finger. Several instances of SS, Sp, and SFW were recorded with this setup as a participant touched surfaces in real time.

      Comment 4

      No map of the real finger interaction is shown, bringing doubt to the validity of the frictional map for something as variable as human fingers.

      Real fingers change constantly during exploration, and friction is state-dependent, meaning that the friction will depend on how the person was moving the moment prior. Therefore, a map is only valid for a single human movement; even if all participants were instructed to take a single swipe starting from rest, humans are unable to maintain constant velocities and pressures. Clearly, this is not tractable for any analysis, and these drawbacks apply to any measured parameter, whether the instabilities suggested here or the friction coefficients used throughout the literature. We believe the difficulty of this approach emphasizes why a standard characterization map of a surface made with a mock finger, even with its drawbacks, is a viable path forward.

      Reviewer 3 (Recommendations for the authors):

      Comment 1

      It would be interesting to comment on a potential connection between the frictional instability maps and Schallamach waves.

      Schallamach waves are a subset of slow frictional waves (SFW), and they are very specifically defined in the field. They occur when pockets of air form between a soft sliding object and a rigid surface and then propagate rear-to-front (retrograde waves) relative to the sliding motion, forming buckles due to adhesive pinning. Wrinkles then form at the detached portion of the soft material until the interface reattaches and the process repeats.[24] There is typically a high burden of proof to establish a Schallamach wave over a more general slow frictional wave. We note that it would be exceedingly difficult to design samples that can reliably create subsets of SFW, but we are aware that this may be an interesting question at a future point in our work.

      Comment 2

      The force sensors look very compliant, and given the dynamic nature of the signal, it is important to characterize the frequency response of the system to make sure that the fluctuations are not amplified.

      Thank you for noticing. We mistyped the sensor spring constant as 13.9 N m<sup>-1</sup> instead of 13.9 kN m<sup>-1</sup>. Below, we show how the instabilities are derived from the mechanics at the interface due to the compliance of the finger. The “springs” of the force sensor and PDMS finger are connected in series. Since k<sub>sensor</sub> = 13.9 kN m<sup>-1</sup>, the spring constant of the overall system reflects the compliance of the finger and highlights the oscillations arising solely from stick-slip. A sample calculation is shown below.

      Author response image 1.

      Fitting a line to the initial slope of the force trace for C6 gives the equation y = 25.679x − 0.2149. The slope here represents force over time and is divided by the velocity (25 mm/s) to determine the spring constant of the system, k<sub>total</sub> = 1027.16 N/m. This value is lower than k<sub>sensor</sub> = 13.9 kN/m, indicating that the “springs” representing the force sensor and PDMS finger are connected in series:

      $$\frac{1}{k_{\text{total}}} = \frac{1}{k_{\text{sensor}}} + \frac{1}{k_{\text{finger}}}$$

      The finger is the compliant component of the system, with k<sub>finger</sub> = 1.11 kN/m, and of course, real human fingers are also compliant, so this matches our goals with the design of the mock finger.
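      The sample calculation can be reproduced with a short script. This is a minimal sketch assuming the fitted slope is in N/s and the drive velocity in m/s, with the fit intercept ignored; the function name is hypothetical.

```python
def series_spring_constants(slope_n_per_s, velocity_m_per_s, k_sensor_n_per_m):
    """Return (k_total, k_finger) in N/m for a sensor and finger acting as springs in series."""
    # Initial loading slope (N/s) divided by drive velocity (m/s) gives system stiffness (N/m).
    k_total = slope_n_per_s / velocity_m_per_s
    # Springs in series: 1/k_total = 1/k_sensor + 1/k_finger, solved here for k_finger.
    k_finger = 1.0 / (1.0 / k_total - 1.0 / k_sensor_n_per_m)
    return k_total, k_finger


# Values from the C6 trace: slope 25.679 N/s, velocity 25 mm/s, k_sensor = 13.9 kN/m.
k_total, k_finger = series_spring_constants(25.679, 25e-3, 13.9e3)
print(f"k_total = {k_total:.2f} N/m, k_finger = {k_finger / 1e3:.2f} kN/m")
```

      Because the sensor is roughly an order of magnitude stiffer than the finger, k<sub>total</sub> ends up close to, and slightly below, k<sub>finger</sub>, which is the signature of a series connection dominated by its most compliant element.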

      Our changes to the manuscript

      (Page 4) (k = 13.9 kN m<sup>-1</sup>)

      Comment 3

      The authors should discuss the stochastic nature of friction: Wiertlewski, Hudin, Hayward, IEEE WHC 2011; Greenspon, McLellan, Lieber, Bensmaia, JRSI 2020.

      We believe that, given the references, this comment on “stochastic” refers to the macroscopically observable fluctuations in friction (i.e., the mechanical “noise” that is not due to instrument noise) arising from the discordant network of stick-slip phenomena occurring throughout the contact zone, and not to the stochastic nature of nanoscale friction arising from thermal fluctuations or from statistical distributions in bond breaking associated with soft contact.

      We first note that our small-scale fluctuations do not arise from a periodic surface texture that dominates the frequency content. However, even on our comparatively smooth surfaces, we do expect fluctuations due to nanoscale variation in contact, stick-slip at microscale length scales occurring either concurrently or discordantly across the contact zone, and the nonlinear dependence of friction on nearly any variation in state and composition.[11]

      Perhaps most relevant to the manuscript, a major advantage of analysis by friction instabilities is that it sidesteps these ever-present microscale fluctuations, leading to more clearly defined classifiers or categories during analysis. Wiertlewski et al. showed that repeated measurements in their system ultimately gave rise to consistent frequencies[25] (we think their system was in a steady sliding regime and the patterning gave rise to underlying macroscopic waves). These consistent frequencies, at least in soft systems and absent obvious macroscopic patterned features, would be expected to arise from the instability categories, and we see them throughout.

      Comment 4

      It is stated that "we observed a spurious, negative correlation between friction coefficient and accuracy".

      What makes you qualify that correlation as spurious?

      We mean this as in the statistical definition of “spurious”.

      This correlation would indicate that, by the metric of friction coefficient, more different surfaces are perceived more similarly. Thus, by friction coefficient, two very different surfaces, like Teflon and sandpaper, would be expected to feel very similar, and two nearly identical surfaces would be expected to feel very different; of course, humans cannot consistently distinguish two identical surfaces. This finding is counterintuitive and refutes the idea that the friction coefficient is a reliable classifier of surfaces by touch. We do not think it is productive to determine a mechanism for a spurious correlation, but perhaps one reason we were able to observe it is that our study, to the best of our knowledge, is unique in having samples that are controlled in their physical differences in roughness and surface features.

      See response to Reviewer 1, Weaknesses, Comment 1 for changes to the manuscript

      Comment 5

      The authors should comment on the influence of friction on perceptual invariance. Despite inducing radically different frictional behavior for various conditions, these surfaces are stably perceived. Maybe this is a sign that humans extract a different metric?

      We agree; we are excited that frictional instabilities may offer a more stable perceptual cue because they are not prone to fluctuations (as discussed in Comment 3), and instability formation, under many conditions, is invariant to applied pressures and velocities, thus forming large zones where a human may reasonably encounter a given instability.

      Raw friction is highly prone to variation during human exploration (in alignment with Recommendations for the authors, Comment 3), but ongoing work seeks to explain tactile constancy, or the ability to identify objects despite these large changes in force. Very recently published work by Fehlberg et al. identified the role of modulating finger speed and normal force in amplifying the differences in friction coefficient between materials in order to identify them,[26] and we postulate that their work may be consistent with, and potentially streamlined by, the idea of friction instabilities, though we have not yet had a chance to discuss this in depth with the authors.

      We think that the instability maps show a viable path forward to understanding how surfaces are stably perceived, and instabilities themselves suggest a potential mechanism: mathematically, instabilities under given conditions can be invariant to velocity or mass, creating zones where a certain instability is encountered. This reduces the immense variability of friction to a smaller, more stable classification of surfaces (e.g., a 30% SS surface or a 60% SS surface). A given surface will typically produce the same instability at a specific condition (we found that some boundaries of experimental parameters are very condition-sensitive, but many conditions are not), whereas a single friction trace, which is highly prone to variation, is not a stable metric.
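      To illustrate how a surface could be summarized as, for example, a “30% SS surface”, a minimal sketch follows. It assumes only that each tested (normal force, velocity) condition on an instability map is assigned one instability type; the forces, velocities, labels, and function name below are hypothetical and are not our measured data.

```python
from collections import Counter


def instability_fractions(instability_map):
    """Percentage of mapped conditions that produced each instability type.

    instability_map: dict mapping a (normal_force, velocity) condition to a label
    such as "SS" (steady sliding), "Sp" (stiction spike), or "SFW" (slow frictional wave).
    """
    counts = Counter(instability_map.values())
    total = len(instability_map)
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical map: 2 normal forces x 5 sliding velocities = 10 conditions.
forces = [0.5, 1.0]                  # hypothetical normal forces (N)
velocities = [5, 10, 15, 20, 25]     # hypothetical sliding velocities (mm/s)
labels = ["SS", "SS", "SS", "Sp", "Sp", "Sp", "Sp", "SFW", "SFW", "SFW"]
conditions = [(f, v) for f in forces for v in velocities]
surface_map = dict(zip(conditions, labels))

print(instability_fractions(surface_map))
```

      A summary of this kind depends only on which instability each condition produces, not on the exact friction values within a trace, which is the property that makes it less sensitive to trace-to-trace variation.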

      Added Reference

      (53) M. Fehlberg, E. Monfort, S. Saikumar, K. Drewing and R. Bennewitz, IEEE Trans. Haptics, 2024, 17, 957–963.

      References

      (1) Liu, Z., Kim, J.-T., Rogers, J. A., Klatzky, R. L. & Colgate, J. E. Realism of Tactile Texture Playback: A Combination of Stretch and Vibration. IEEE Trans. Haptics 17, 441–450 (2024).

      (2) Waters, I., Alazmani, A. & Culmer, P. Engineering Incipient Slip Into Surgical Graspers to Enhance Grasp Performance. IEEE Transactions on Medical Robotics and Bionics 2, 541–544 (2020).

      (3) Gueorguiev, D., Bochereau, S., Mouraux, A., Hayward, V. & Thonnard, J.-L. Touch uses frictional cues to discriminate flat materials. Sci Rep 6, 25553 (2016).

      (4) Carpenter, C. W. et al. Human ability to discriminate surface chemistry by touch. Mater. Horiz. 5, 70–77 (2018).

      (5) Nolin, A. et al. Predicting human touch sensitivity to single atom substitutions in surface monolayers for molecular control in tactile interfaces. Soft Matter 17, 5050–5060 (2021).

      (6) Nolin, A. et al. Controlling fine touch sensations with polymer tacticity and crystallinity. Soft Matter 18, 3928–3940 (2022).

      (7) Swain, Z. et al. Self-Assembled Thin Films as Alternative Surface Textures in Assistive Aids with Users Who are Blind. J. Mater. Chem. B (2024) doi:10.1039/D4TB01646G.

      (8) Qian, K. et al. Mechanical properties vary for different regions of the finger extensor apparatus. J Biomech 47, 3094–3099 (2014).

      (9) Abdouni, A. et al. Biophysical properties of the human finger for touch comprehension: influences of ageing and gender. Royal Society Open Science (2017) doi:10.1098/rsos.170321.

      (10) Cornuault, P.-H., Carpentier, L., Bueno, M.-A., Cote, J.-M. & Monteil, G. Influence of physicochemical, mechanical and morphological fingerpad properties on the frictional distinction of sticky/slippery surfaces. Journal of The Royal Society Interface (2015) doi:10.1098/rsif.2015.0495.

      (11) Dhong, C. et al. Role of fingerprint-inspired relief structures in elastomeric slabs for detecting frictional differences arising from surface monolayers. Soft Matter 14, 7483–7491 (2018).

      (12) Fu, Y.-J. et al. Effect of UV-Ozone Treatment on Poly(dimethylsiloxane) Membranes: Surface Characterization and Gas Separation Performance. Langmuir 26, 4392–4399 (2010).

      (13) Yuan, Y. & Verma, R. Measuring microelastic properties of stratum corneum. Colloids Surf B Biointerfaces 48, 6–12 (2006).

      (14) Yu, G. et al. A wearable pressure sensor based on ultra-violet/ozone microstructured carbon nanotube/polydimethylsiloxane arrays for electronic skins. Nanotechnology 29, 115502 (2018).

      (15) Zheng, L. et al. Dual-Stimulus Smart Actuator and Robot Hand Based on a Vapor-Responsive PDMS Film and Triboelectric Nanogenerator. ACS Appl. Mater. Interfaces 11, 42504–42511 (2019).

      (16) Ma, K., Rivera, J., Hirasaki, G. J. & Biswal, S. L. Wettability control and patterning of PDMS using UV–ozone and water immersion. Journal of Colloid and Interface Science 363, 371–378 (2011).

      (17) Mavon, A. et al. Sebum and stratum corneum lipids increase human skin surface free energy as determined from contact angle measurements: A study on two anatomical sites. Colloids and Surfaces B: Biointerfaces 8, 147–155 (1997).

      (18) AliAbbasi, E. et al. Effect of Finger Moisture on Tactile Perception of Electroadhesion. IEEE Trans. Haptics 17, 841–849 (2024).

      (19) Corniani, G. et al. Sub-surface deformation of individual fingerprint ridges during tactile interactions. eLife 13 (2024).

      (20) Israelachvili, J. N. Intermolecular and Surface Forces. (Academic Press, 2011).

      (21) Das, S. et al. Stick–slip friction of gecko-mimetic flaps on smooth and rough surfaces. J R Soc Interface 12, 20141346 (2015).

      (22) Persson, B. N. J., Albohr, O., Creton, C. & Peveri, V. Contact area between a viscoelastic solid and a hard, randomly rough, substrate. The Journal of Chemical Physics 120, 8779–8793 (2004).

      (23) Skedung, L. et al. Feeling Small: Exploring the Tactile Perception Limits. Sci Rep 3, 2617 (2013).

      (24) Viswanathan, K., Sundaram, N. K. & Chandrasekar, S. Stick-slip at soft adhesive interfaces mediated by slow frictional waves. Soft Matter 12, 5265–5275 (2016).

      (25) Wiertlewski, M., Hudin, C. & Hayward, V. On the 1/f noise and non-integer harmonic decay of the interaction of a finger sliding on flat and sinusoidal surfaces. in 2011 IEEE World Haptics Conference 25–30 (2011). doi:10.1109/WHC.2011.5945456.

      (26) Fehlberg, M., Monfort, E., Saikumar, S., Drewing, K. & Bennewitz, R. Perceptual Constancy in the Speed Dependence of Friction During Active Tactile Exploration. IEEE Transactions on Haptics 17, 957–963 (2024).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Introduction to the revised manuscript:

      We thank all three reviewers for their time and insightful comments on our original submission. We are submitting a substantially revised manuscript that includes several new experiments, analyses, discussion points, and clarifications that we believe address all of the main concerns of the reviewers.

      To address the request of Reviewers 2 and 3 to reinforce key findings in a more physiologically intact preparation, we performed recordings of YH-HET SST neurons in brain slices and found that these neurons show impairments in AP generation similar to those observed in YH-HET SST cultured neurons. These data are summarized in a new figure (Fig. 9). Along these lines, we performed additional recordings in cultured neurons at room temperature compared with physiological temperature and found that WT and YH-HET PV neuronal properties were similarly altered by temperature increases, suggesting that our YH variant-induced neuronal phenotypes are not temperature dependent. These data are shown in a new supplemental figure (Supplemental Fig. 4-3). To address concerns of Reviewer 1 regarding our KNa and NaP current recordings, we performed new experiments to further assess the specificity of the VU170 blocker in KNa KO neurons (summarized in Supplemental Fig. 5-2) and to better characterize the time course over which TTX blocks the persistent Na+ current and the KNa current (summarized in Supplemental Fig. 7-1). These latter two experiments provide further clarity and confidence in the accuracy of our measurements of both KNa and NaP currents. Lastly, to address the concern of Reviewer 3 regarding statistical analyses of the modeling data, we’ve added a new table with the results of a repeated measures ANOVA analysis (Supplemental Table 6), and two new figures illustrating the relative changes in each neuron group compared to their controls (Supplemental Figures 6-2 and 7-2). 

      In addition to the new experiments and analyses, we’ve added three new paragraphs to the Discussion section. As the hyperexcitability phenotype in YH-HET PV neurons is somewhat unexpected, we’ve added a paragraph comparing our findings with those found in PV neurons in another KCNT1 GOF model. We’ve also added a paragraph to speculate on the contribution of YH-HET variant-induced alterations in SST and PV neurons to network behavior and seizure propensity. Lastly, we’ve added a paragraph to include the additional limitations and caveats of our study requested by the reviewers.  

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript reports the effects of a heterozygous mutation in the KCNT1 potassium channels on the properties of ion currents and the firing behavior of excitatory and inhibitory neurons in the cortex of mice expressing KCNT1-Y777H. In humans, this mutation, as well as multiple other heterozygous mutations, produces very severe early-onset seizures and a major disruption of all intellectual function. In contrast, in mice, this heterozygous mutation appears to have no behavioral phenotype or any increased propensity to seizures.

      Regarding the last sentence above, we wanted to clarify a point that we neglected to emphasize in the initial submission. In the Results section from our previous paper (Shore et al., 2020), we failed to observe seizures in 14 heterozygous mice, whereas 23/25 homozygous mice showed seizures by video-EEG. However, in the fifth paragraph of the Discussion section from that paper, we further stated that “during the preparation and review of [that] article, we observed seizures in two Kcnt1-Y777H heterozygous mice, one during a widefield Ca2+ imaging experiment and the other during a video-EEG experiment”. Thus, we concluded that “heterozygous expression can result in seizures in a rodent model, but apparently at a much lower frequency than that observed with homozygous expression”. To emphasize these findings, we’ve added a sentence to the Introduction in this manuscript about the occurrence of infrequent seizures in Kcnt1-Y777H heterozygous mice, along with a reference to the Discussion of our previous paper.

      A relevant phenotype is, however, evident in mice with the homozygous mutation, and the authors have previously published the results of similar experiments with the homozygotes. As perhaps expected, the neuronal effects of the heterozygous mutation presented in this manuscript are generally similar to, but markedly smaller than, the previously published findings in homozygotes. There are, however, some interesting differences, particularly in PV+ interneurons, which appear to be more excitable than wild type in the heterozygotes but less excitable in the homozygotes. This raises the interesting question (which could be discussed more explicitly by the authors) of whether the reported changes represent homeostatic events that suppress the seizure phenotype in the heterozygous mice or simply changes in excitability that do not reach the threshold for behavioral outcomes.

      That is an interesting question. We have added a new paragraph to the Discussion speculating about whether the alterations in SST and PV excitability suppress seizures or do not reach the threshold for behavioral outcomes. This seems to be requested by the second reviewer as well in Weaknesses point #2.

      Strengths and Weaknesses:

      (1) The authors find that the heterozygous mutation in PV+ interneurons increases their excitability, a result that is opposite from their previous observation in neurons with the corresponding homozygous mutation.

      We would like to provide a minor clarification to the above statement that, in this manuscript, we show that “the heterozygous mutation in PV+ interneurons increases their excitability, a result that is opposite from their previous observation in neurons with the corresponding homozygous mutation”. In our previous manuscript, we assessed YH-HOM phenotypes in NFS and FS GABAergic neurons, but did not specifically mark PV neurons. Although the YH-HOM FS neurons showed an increase in rheobase and a decrease in AP firing, the magnitudes of these effects were far less than those observed in the NFS population. More importantly, the FS GABAergic population likely consists of both PV- and SST-expressing neurons; thus, we cannot directly compare the results from the NFS and FS groups to the PV and SST groups, respectively (please see our response to Weaknesses point #3, Reviewer #2). We apologize for the confusion.

      They propose that this results from the selective upregulation of a persistent sodium current INaP in the PV+ interneurons. While the observations are very interesting, there are three issues concerning this interpretation that should be addressed:

      A) The protocol for measuring the INaP current could potentially lead to results that could be (mis)interpreted in different ways in different cells. First, neither K currents nor Ca currents are blocked in these experiments. Instead, TTX is applied to the cells relatively rapidly (within 1 second) and the ramp protocol is applied immediately thereafter. It is stated that, at this time, Na currents and INaP are fully blocked but that any effects on Na-activated K currents are minimal. In theory, this would allow the pre- to post-difference current to represent a relatively uncontaminated INaP. This would, however, only work if activation of KNa currents following Na entry is very slow, taking many seconds. A good deal of literature has suggested that the kinetics of activation of KNa currents by Na influx vary substantially between cell types, such that single action potentials and single excitatory synaptic events rapidly evoke KNa currents in some cell types. This is, of course, much faster than the time of TTX application. Most importantly, the kinetics of KNa activation may be different in different neuronal types, which would lead to errors that could produce different estimates of INaP in PV+ interneurons vs other cell types.

      First, we’d like to point out that we did not want to block K+ currents (which would also block KNa) when measuring INaP for these experiments, because our hypothesis was that the increased KNa current in YH-HET PV neurons was somehow causing an increase in INaP, and it is possible that this increase depends on an intact KNa. Thus, we decided to use a method based on the observation in our experiments, and previously made by others (Budelli et al., 2009), that the reduction of outward current after TTX addition is slow relative to the rapid reduction in Na+ current. We understand and agree with the reviewer that, if KNa currents were blocked more quickly by TTX in some neuron types than others, then our estimate of INaP using this method would be contaminated in these neuron types, which would lead to inaccurate measurements. To assess this possibility among the main neuron types used in this study, we performed new experiments in which we monitored the time course of INaP block and subsequent IKNa loss following TTX application in PV and SST neurons during slow voltage ramps. We note that action potentials are not present in the slow voltage ramps due to inactivation of the transient Na+ current. These new experiments show that, in SST and PV (both WT and Het) neurons, the block of INaP is nearly complete at the 6s time point, whereas the decay in IKNa is far slower (V50 of ≈ 25s), and importantly, these results do not differ substantially by cell type or genotype. These data suggest that our measurements of INaP are not significantly contaminated by IKNa, and that this method allows for the effective separation of these two currents. These data have been added as a supplemental figure (Supplemental Fig. 7-1) and are briefly described and referenced in the Results section.
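
      The separation argument above rests on timing: the TTX-sensitive difference current (pre-TTX ramp minus post-TTX ramp) is read out early, while IKNa declines slowly. The minimal sketch below assumes a single-exponential IKNa decay and treats the reported ≈25 s value as its half-time (both are assumptions for illustration, not the authors' fit). Under those assumptions it estimates how much IKNa is still present at a 6 s readout and forms the difference current from hypothetical ramp samples.

```python
def fraction_remaining(t_s: float, half_time_s: float) -> float:
    """Fraction of a decaying current still present at time t_s.

    Assumes I(t) = I0 * 2**(-t / half_time), i.e. a single-exponential
    decay; this form is an illustrative assumption, not a fit to data.
    """
    return 2.0 ** (-t_s / half_time_s)


def difference_current(pre_pA, post_pA):
    """TTX-sensitive current at each ramp sample: pre-TTX minus post-TTX."""
    if len(pre_pA) != len(post_pA):
        raise ValueError("ramp traces must have the same number of samples")
    return [a - b for a, b in zip(pre_pA, post_pA)]


# Hypothetical readout 6 s after TTX, with an assumed 25 s IKNa half-time.
left = fraction_remaining(6.0, 25.0)
print(f"IKNa still present at 6 s: {left:.0%}")

# Hypothetical ramp samples in pA; negative values are inward current.
print(difference_current([-120.0, -80.0, 40.0], [-20.0, -10.0, 45.0]))
```

      Under these assumptions, only the share of IKNa lost by the readout time leaks into the difference current. The rest is present in both traces and cancels in the subtraction.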

      B) As the authors recognize, INaP provides a major source of cytoplasmic sodium ions for KNa channel activation. An expected outcome of increased INaP is, therefore, further activation of KNa currents, rather than a compensatory increase in an inward current that counteracts the increase in KNa currents, as is suggested in the discussion.

      We agree that the increase in INaP could theoretically further increase IKNa, as veratridine was previously shown to increase IKNa (Hage & Salkoff, 2012). However, we do not believe that this would necessarily be the case, because as the reviewer notes in their next comment, there is insufficient information on the relative locations of the INaP and KCNT1 channels, as well as the kinetics of sodium transfer to KCNT1 channels, and even less is known in the context of KCNT1 GOF neurons. Thus, there are a couple of plausible reasons that increased INaP may not alter KNa currents in YH-HET PV neurons: (1) In YH-HET PV neurons, the particular sodium channels that are responsible for the increased INaP may not be located in close proximity to the KCNT1 channels. (2) Homeostatic mechanisms that alter the AIS length, or move the AIS further from the soma, in response to altered neuronal excitability are well described (Grubb & Burrone, 2010; Kuba et al., 2010); thus, it is possible that in YH-HET PV neurons, the length or location of the AIS is altered, uncoupling the sodium channels responsible for the increased INaP from the KCNT1 channels.

      C) Numerical simulations, in general, provide a very useful way to evaluate the significance of experimental findings. Nevertheless, while the in-silico modeling suggests that increases in INaP can increase firing rate in models of PV+ neurons, there is as yet insufficient information on the relative locations of the INaP channels and the kinetics of sodium transfer to KNa channels to evaluate the validity of this specific model.

      We completely agree; thus, we have described each of these limitations in the Discussion. We state that the model neurons may “lack more detailed features of ion channels, such as post-translational modifications and subcellular localizations”, and that our KCNT1 model conductance is “hampered by an incomplete understanding of the relationship between Na+ influx, membrane voltage, and channel gating in neurons”.  

      (2) The greatest effect of TTX application would be expected to be the elimination of large transient inward sodium currents. Why are no such currents visible in the control (pre-TTX) or the difference currents (Fig. 2)? Is it possible I missed something in the methods?

      We apologize for the confusion and for failing to mention this important feature of the displayed traces. To include all of the representative traces in the figures, and prevent overlap of the traces, we removed the large inward sodium currents using the masking tool in Adobe Illustrator in Figure 2 and Supplemental Figure 5-1. We have added that information to the relevant figure legends. We have also provided unmasked images of the representative traces from Figure 2 and Supplemental Figure 5-1 to illustrate the large transient inward sodium currents, and the significant reduction of these currents with TTX treatment.

      (3) As expected, the changes in many of the measured parameters are smaller in the present study with heterozygotes than those previously reported for the homozygous mutation. Some of the statements on the significance of some of the present findings need to be stated more clearly. For example, in the results section describing Fig. 2, it is stated that "In glutamatergic and NFS GABAergic YH-HET neurons, the overall KNa current was increased ...as measured by a significant effect of genotype ...." Later in the same paragraph it is stated that the increases in KNa current are not significant. Apparently, different tests lead to different conclusions. Both for the purpose of understanding the pathophysiological effects of changes in KNa current and for making further numerical simulations, more explicit clarifying statements should be made.

      We apologize for the confusion on the description of these statistics. The results come from the same test, which is a Generalized Linear Mixed Model (GLMM). The factors in our GLMM were voltage step, genotype, and a voltage step x genotype interaction term. The overall effect of genotype is significant in glutamatergic neurons, but pairwise tests at each voltage step show no significant effect of genotype at any given voltage. This is somewhat analogous to running a traditional ANOVA on multiple groups and finding a significant ANOVA p-value but no significant post-hoc multiple comparisons tests, and is not uncommon. Our interpretation of this is that heterozygous expression of the YH variant in glutamatergic neurons likely increases KNa currents across positive potentials (as was seen with the YH-HOM glutamatergic neurons), but only a small amount at each positive step; thus, we lack the statistical power to determine any particular voltage step where this occurs.
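
      The power argument above (a significant overall effect, but no single voltage step reaching significance) can be illustrated with a toy calculation. This is not the authors' GLMM. It is a swapped-in illustration using Stouffer's method for combining z-scores, with a hypothetical per-step z of 1.2 at each of 10 voltage steps and a Bonferroni-corrected threshold for the per-step tests.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def stouffer_z(zs):
    """Pool z-scores into one z (Stouffer's method; assumes independence)."""
    return sum(zs) / math.sqrt(len(zs))


# Hypothetical: a small but consistent genotype effect at 10 voltage steps.
per_step_z = [1.2] * 10
alpha = 0.05

per_step_p = two_sided_p(per_step_z[0])
bonferroni_alpha = alpha / len(per_step_z)
pooled_p = two_sided_p(stouffer_z(per_step_z))

print(f"per-step p = {per_step_p:.3f} (Bonferroni threshold {bonferroni_alpha:.3f})")
print(f"pooled p   = {pooled_p:.2e} (threshold {alpha})")
```

      Stouffer's method assumes independent tests, which repeated voltage steps recorded from the same neurons are not. The sketch therefore shows only why pooling many small effects gains power. It does not show how the GLMM itself handles that dependence.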

      (4) The effects of the KCNT1 channel blocker VU170 on potassium currents are somewhat larger and different from those of TTX, suggesting that additional sources of sodium may contribute to activating KCNT1, as suggested by the authors. Because VU170 is, however, a novel pharmacological agent, it may be appropriate to make more careful statements on this. While the original published description of this compound reported no effect on a variety of other channels, there are many that were not tested, including Na and cation channels that are known to activate KCNT1, raising the possibility of off-target effects.

      We agree and thank the reviewer for making this point. To address this question, we measured KNa currents in WT vs. Kcnt1/Kcnt2-dKO neurons using VU170 to illustrate the extent of outward current due to off-target effects of the drug. These data have been included as a supplemental figure (Supplemental Fig. 5-2). We have also added several sentences to the Results section referencing this figure. Interestingly, in Kcnt1/Kcnt2-dKO neurons, VU170 seems to be quite specific across the negative potentials, as no outward currents are apparent until approximately -10 mV onward, whereas across positive potentials, there is a VU170-sensitive outward current reaching ~1 nA by +50 mV. We have also included a note of caution in interpreting these data and added the possibility of off-target effects of VU170 as an alternative explanation for the differences observed on KNa currents between TTX and VU170 to the Discussion section.

      (5) The experiments were carried out at room temperature. Is it possible that different effects on firing patterns in heterozygotes and homozygotes would be observed at more physiological temperatures?

      Yes, it is reasonable to assume that an increased temperature would affect neuronal firing patterns in cultured neurons, as temperature differences have been shown to alter synaptic transmission and neuronal function, as assessed in both cultured neuron and slice recordings. All of our recordings were performed at room temperature in this study, and although they are valid with regard to between-group comparisons, this additional caveat is worth mentioning. We have added this to the paragraph describing study limitations in the Discussion section.

      To better understand the effects of temperature in our recordings, we have now compared membrane and AP generation parameters at room temperature (~22°C) and at a more physiological temperature (35°C) in a before-after study of 16 WT neurons, including both glutamatergic and GABAergic neuron types. Not surprisingly, we found robust alterations in all parameters assessed, excluding resting membrane potential and capacitance. We further assessed the effect of temperature on WT and YH-HET PV neurons, as the PV neurons expressing the YH variant showed the most unexpected phenotypes in our study. In our room temperature recordings, we showed that the YH-HET variant decreased the rheobase current, increased the AP amplitude, and increased the AP firing. In our before-after comparison (22°C vs. 35°C) of PV neurons (WT, n = 11; YH-HET, n = 10), the WT and YH-HET neurons showed the same temperature-dependent effects on these parameters, including increased rheobase, decreased AP amplitude, and a higher maximal firing rate, at 35°C compared to those at 22°C. These data have been added to the manuscript as a supplemental figure (Supplemental Fig. 4-3) and are briefly referenced and described in the Results section.

      Moreover, in our original manuscript, we showed that the effects of the homozygous YH variant on glutamatergic and NFS GABAergic neuron excitability were highly similar between cultured recordings at room temperature (~22°C) and slice recordings at 32°C. Taken together, these data suggest that the reported neurophysiological phenotypes downstream of the YH variant are likely not temperature dependent. 

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Shore et al. investigate the consequent changes in excitability and synaptic efficacy of diverse neuronal populations in an animal model of juvenile epilepsy. Using electrophysiological patch-clamp recordings from dissociated neuronal cultures, the authors find diverging changes in two major populations of inhibitory cell types, namely somatostatin (SST)- and parvalbumin (PV)-positive interneurons, in mice expressing a variant of the KCNT1 potassium channel. They further suggest that the differential effects are due to a compensatory increase in the persistent sodium current in PV interneurons in pharmacological and in silico experiments.

      Strengths:

      (1) Heterozygous KCNT1 gain of function variant was used which more accurately models the human disorder.

      (2) The manuscript is clearly written, and the flow is easy to follow. The authors explicitly state the similarities and differences between the current findings and the previously published results in the homozygous KCNT1 gain of function variant.

      (3) This study uses a variety of approaches including patch clamp recording, in silico modeling, and pharmacology that together make the claims stronger.

      (4) Pharmacological experiments are fraught with off-target effects and thus it bolsters the authors' claims when multiple channel blockers (TTX and VU170) are used to reconstruct the sodium-activated potassium current. Having said that, it would be helpful to see the two drug manipulations used in the same experiment. Notably, does the more selective blocker VU170 mimic the results of TTX for NFS GABAergic cells in Figure 2? And does it unmask a genotype difference for FS GABAergic cells like the one seen in PV interneurons in Figure 5C3?

      To illustrate the two drug manipulations in the same experiment, we recorded from WT SST and PV neurons (5 neurons/group) and blocked KNa currents first using TTX and then VU170, following wash out between the two drugs, in the same neurons. Below, we have plotted the points at each voltage step for each SST and PV neuron, and for each drug treatment, on the same graph to show how they vary directly. At each voltage step, lines connect the points representing the TTX-sensitive and VU-sensitive currents for each neuron to show the individual effects (left-most graphs). Summary data are shown across all voltages (middle graphs) and across negative voltages (right-most graphs).

      Author response image 1.

      We have not used VU170 on FS and NFS populations of GABAergic neurons. However, for reasons that are explained more extensively below in response to Weaknesses #3, we would not predict KNa currents recorded from SST- and PV-GABAergic neurons to mimic those of NFS- and FS-GABAergic neurons, respectively.

      Weaknesses:

      (1) This study relies on recordings in dissociated cortical neurons. Although specific WT interneurons showed intrinsic membrane properties like those reported for acute brain slices, it is unclear whether the same will be true for those cells expressing KCNT1 variants. This reviewer highly recommends confirming some of the key findings using an ex vivo slice preparation. This is especially important given the discrepant result of reduced excitability of PV cells reported by Gertler et al., 2022 (cited here in the manuscript but not discussed in this context) in acute hippocampal slices for a different KCNT1 gain of function variant.

      We thank the reviewer for this suggestion. To test whether SST-expressing YH-HET neurons show impairments similar to those observed in culture, we crossed the FVB-Tg(GadGFP)45704Swn/J transgenic mouse line (Jackson Labs #003718), also known as the GIN line, to the Kcnt1-YH line. Mice from the GIN line express eGFP in a subpopulation of SST-expressing neurons in the hippocampus and cortex. We performed slice recordings from GFP-expressing neurons in cortical layer 2/3 of P21-30 WT and YH-HET GIN mice. Although input resistance was not significantly decreased, YH-HET neurons had a higher rheobase and fired fewer APs across increasing current steps than WT neurons, supporting the main findings from SST-expressing neurons in culture. These data have been added to the manuscript in a new figure (Fig. 9).

      Regarding the previously published results on the effect of KCNT1 GOF on PV neuron excitability by Gertler et al., we have written a new paragraph in the Discussion section (last paragraph of the section, “Neuron-type-dependent KCNT1 GOF effects”) that discusses the differences between the findings by Gertler et al. and the current study. 

      To further investigate the effects of heterozygous YH variant expression on SST- vs. PV-expressing neuron excitability in ex vivo slice recordings, we are now crossing a Cre-inducible tdTomato reporter line (Ai9) to the Kcnt1-YH line. After obtaining Ai9Tg/Tg; Kcnt1m/+ mice, we will cross these to Sst-Cre and Pvalb-Cre lines so that we can record from labeled SST and PV neurons, both WT and YH-HET, in slices. We plan to submit results from these recordings as an eLife Research Advances article linked to this article.

      (2) It is unclear how different pieces of results fit together to form a story about the disease pathophysiology.

      We have added a paragraph to the Discussion to speculate on how these various GABAergic subtype-specific effects downstream of the YH variant may contribute to overall network/brain pathology and seizure propensity in heterozygous mice.

      For example, hyperexcitability of PV cells would suggest more inhibition, which would counter seizure propensity. However, spontaneous inhibitory postsynaptic currents show no change in pyramidal neurons. Moreover, how do the authors reconcile the reductions in synaptic inputs onto interneurons in Figure 3B with the increases in Figure 8? This should be discussed.

      Generally, network and synaptic alterations downstream of the heterozygous variant were quite minimal compared with those of the homozygous variant. Although there were reductions in the frequency of synaptic inputs onto inhibitory neurons, the changes were relatively small. Thus, we concluded that the neuronal effects downstream of the heterozygous YH variant were below some threshold to result in broader network effects on synaptic activity and connectivity similar to those in the homozygous YH model. The discrepancies between our GABAergic vs. FS/NFS vs. VIP/SST/PV data will be discussed in more detail in response to Weakness #3.   

      (3) Similarly, the results in this work are not entirely internally consistent. For example, given the good correspondence between FS and NFS GABAergic cells with PV and SST expression, why are FS GABAergic cells hyperexcitable in Figure 1? If anything, there is a tendency to show reduced excitability like the NFS GABAergic cells.

      In our neuron cultures, 76-80% of NeuN-expressing neurons are GFP+ (from the CamKII-eGFP virus used to mark glutamatergic neurons), and of the remaining ~20-24%, the majority are GABAergic (verified using the Dlx5/6-mRuby virus to mark GABAergic neurons and using electrophysiology to assess AP parameters and analyze evoked responses). In our original experiments, recordings sampled from this larger GABAergic population were used (Fig. 3), or this population was sorted almost equally into FS and NFS (Figs. 1 and 2).

      In later experiments, we isolated and cultured neurons from VIP-Cre, SST-Cre, and PV-Cre mouse lines and marked these neuron types in vitro with a Cre-inducible mCherry virus. In the VIP-Cre cultures, ~6% of the GFP- population, or 1.2% of the NeuN+ population, was mCherry+. In the SST-Cre cultures, ~20.5% of the GFP- population, or 4.7% of the NeuN+ population, was mCherry+. In the PV-Cre cultures, less than 1% of the NeuN+ population was mCherry+, which is not surprising considering the relatively late onset of PV expression compared with that of VIP and SST. Thus, we estimate that we are marking and recording from less than 30% of the total GABAergic population in these in vitro experiments, rather than the 80-90% that these three populations would sum to in vivo.

      Furthermore, using our original criteria for sorting GABAergic neurons into FS and NFS subtypes, all VIP recorded neurons were of the NFS type, PV of the FS type, whereas SST were of the FS (38%) and NFS (62%) types, which is not far off from the significant fraction of SST neurons that have been shown to be fast-spiking in slice experiments (Kvitsiani et al., 2013; Urban-Ciecko & Barth, 2016). Therefore, the FS group consists of both PV and SST neurons, and the NFS group consists of both VIP and SST neurons, and likely also contains immature PV neurons that have not yet developed a fast-spiking phenotype. Taken together, this suggests that the data from these two sets of experiments (FS/NFS vs. VIP/SST/PV) are not directly comparable.

      Also, why do the WT I-V curves look so different between Figures 2 and 5? This reviewer suggests at least a brief explanation in the discussion.

      As to the differences in appearance between the WT I-V curves in Figures 2 and 5, those plots are from different neuron types (Fig. 2: Glutamatergic, FS GABAergic, and NFS GABAergic vs. Fig. 5: VIP-, SST-, and PV-expressing), and the KNa currents are isolated using different methods (Fig. 2: TTX-subtraction vs. Fig. 5: VU170-subtraction). TTX blocks an inward Na+ current, which is apparent across subthreshold voltages in Fig. 2C1-3, whereas VU170 does not block this current, so it is not apparent in Fig. 5C1-3. Also, the bottom three panels in Fig. 2C1-3 show the KNa current from -80 to 0 mV, whereas those in Fig. 5C1-3 span -80 to -30 mV to better illustrate the range over which KNa currents increase, so their appearance is not directly comparable.

      (4) Given the authors' claim that the KCNT1 activation curve is a major contributor to the observed excitability differences in specific GABA cell subtypes, it would be helpful to directly measure the activation curve in the variants experimentally as was done for WT KCNT1 in Figure 6A and use the derived kinetics in the compartmental model.

      We apologize for the confusion. Although the activation curves among different GABAergic subtypes from WT KCNT1 are distinct, and we believe that these varying kinetics contribute to the neuron-type-specific phenotypes of KCNT1 GOF, we didn’t intend to suggest that the heterozygous Y777H variant itself causes neuron-type-specific alterations to the activation curves of the GABAergic subtypes. To clarify this point, below, we show the high similarity of the activation curves between WT KCNT1 and YH-HET KCNT1 in each of the GABAergic subtypes.

      Author response image 2.

      Reviewer #3 (Public Review):

      Summary:

      The present manuscript by Shore et al., entitled "Reduced GABAergic Neuron Excitability, Altered Synaptic Connectivity, and Seizures in a KCNT1 Gain-of-Function Mouse Model of Childhood Epilepsy", describes in vitro and in silico results obtained in cortical neurons from mice carrying the KCNT1-Y777H gain-of-function (GOF) variant in the KCNT1 gene, which encodes a subunit of the Na+-activated K+ (KNa) channel. This variant corresponds to the human Y796H variant found in a family with Autosomal Dominant Nocturnal Frontal lobe epilepsy. The occurrence of GOF variants in potassium channel-encoding genes is well known, and among potential pathophysiological mechanisms, impaired inhibition has been documented as responsible for KCNT1-related DEEs. The Authors build on a previous study by the same group performed in homozygous KI animals. Because the large majority of pathogenic KCNT1 variants in humans occur in the heterozygous state, they investigated the effects of heterozygous Kcnt1-Y777H expression on KNa currents and neuronal physiology in cortical glutamatergic neurons and in the 3 main classes of GABAergic neurons: those expressing vasoactive intestinal polypeptide (VIP), somatostatin (SST), and parvalbumin (PV). To do so, they crossed KCNT1-Y777H mice with VIP-, SST-, and PV-cre mouse lines and recorded from GABAergic neurons identified by their expression of mCherry (and lack of the GFP used to mark excitatory neurons).

      The results obtained revealed heterogeneous effects of the variant on KNa and action potential firing rates in distinct neuronal subpopulations, ranging from no change (glutamatergic and VIP GABAergic) to decreased excitability (SST GABAergic) to increased excitability (PV GABAergic). In particular, modelling and in vitro data revealed that an increase in persistent Na current occurring in PV neurons was sufficient to overcome the effects of KCNT1 GOF and cause an overall increase in AP generation.

      Strengths:

      The paper is very well written, the results clearly presented and interpreted, and the discussion focuses on the most relevant points.

      The recordings performed in distinct neuronal subpopulations are a clear strength of the paper. The finding that the same variant can cause opposite effects and trigger specific homeostatic mechanisms in distinct neuronal populations is very relevant for the field, as it narrows the existing gap between experimental models and clinical evidence.

      Weaknesses:

      My main concern is the epileptic phenotype of the heterozygous mice investigated. In their previous paper, the Authors state that "...Kcnt1-Y777H heterozygous mice did not exhibit any detectable epileptiform activity" (first sentence on page 4). However, in the present manuscript, they state twice in the Discussion that these mice exhibit "infrequent seizures". This discrepancy needs to be clarified before a role in seizure occurrence can be attributed to the novel pathophysiological mechanism. Were such infrequent seizures clearly identified on the EEG, or were they behavioral seizures? Could the authors quantify this "infrequent" value? This is also crucial for placing in proper perspective the Discussion statement regarding "... the increased INaP contribution to ... network hyperexcitability and seizures".

      We apologize for the confusion. Indeed, in the Results section from our previous paper, we failed to observe seizures in 14 heterozygous mice, whereas 23/25 homozygous mice showed seizures by video-EEG. However, in the fifth paragraph of the Discussion section from that paper, we further stated that “during the preparation and review of [that] article, we observed seizures in two Kcnt1-Y777H heterozygous mice, one during a widefield Ca2+ imaging experiment and the other during a video-EEG experiment”. Thus, we concluded that “heterozygous expression can result in seizures in a rodent model, but apparently at a much lower frequency than that observed with homozygous expression”. To emphasize these findings, we’ve added a sentence to the Introduction in this manuscript about the occurrence of infrequent seizures in Kcnt1-Y777H heterozygous mice, along with a reference to the Discussion of our previous paper.

      Seizures were observed in two heterozygous mice. The first was a Kcnt1-Y777H heterozygous mouse expressing a calcium indicator (after it was bred to the Snap25-GCaMP6s line), imaged in the Weston Lab at the University of Vermont. Its seizure was captured during a Ca2+ widefield imaging experiment and was accompanied by a time-locked video of the event. The second was a control animal in a video-EEG drug study. This Kcnt1-Y777H heterozygous mouse had multiple tonic seizures, as evidenced by EEG traces and the accompanying video, which were recorded and analyzed in the Frankel Lab at Columbia University. The seizures in heterozygous mice have not been formally quantified. They have been observed only rarely, across multiple different experiments using heterozygous mice at multiple institutions, which makes quantification quite difficult.

      Lastly, regarding attributing the role of the identified pathological mechanisms to seizure occurrence mentioned by the reviewer, we have added a paragraph to the Discussion to speculate on how the various GABAergic subtype-specific effects downstream of the YH variant may contribute to the general lack of network/brain pathology and seizure generation in heterozygous mice.  

      Also, some statistical analysis seems to be missing. For example, I could not find any for the data shown in Fig. 6. Thus, the following statement: "the model PV neurons responded to KCNT1 GOF with decreased AP firing and an increased rheobase" requires proper statistical evaluation.

      We thank the reviewer for this suggestion. We were initially hesitant to apply a formal statistical analysis to the modeling data because it differs in important ways from the experimental data. However, we have now provided statistical analyses of these data, with some caveats. Because we applied each KCNT1 GOF level (40, 35, and 30 mM) to the same set of neurons for each data set, we performed repeated-measures ANOVA to assess differences due to GOF in each subtype. We note that some changes are statistically significant but may not be physiologically relevant. For example, VIP neurons show changes in input resistance and rheobase only at the higher GOF level (30 mM). The magnitude of each change is quite small relative to those in SST neurons (Rin: 1.7 MΩ in VIP vs. 23.2 MΩ in SST; rheobase: 1.7 pA in VIP vs. 52.5 pA in SST). Likely as a consequence, there are no downstream effects on the AP firing rate at either GOF level in VIP neurons. It is important to examine the magnitude of the effects and interpret them in the context of the changes in other neuron types and in the experimental data. We have therefore provided two new figures to better illustrate the relative changes in each neuron type (Supplemental Figures 6-2 and 7-2). We have also added these statistical results to Figures 6E2, 6F2, 6G2, and 7E, and Supplemental Fig. 6-1, and we have described them in the Results section. A summary of the statistics has also been added in Supplemental Table 6.
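      As a minimal sketch of the design described above, the function below computes the F statistic of a one-way repeated-measures ANOVA. In this design every subject (here, a model neuron) is measured under every condition (here, each GOF level). The sketch assumes complete data and applies no sphericity correction. The function name and the toy numbers are hypothetical, and the response does not state which software produced Supplemental Table 6.

```python
def rm_anova_f(data):
    """One-way repeated-measures ANOVA.

    data: one row per subject (e.g. a model neuron), each row holding that
    subject's value under every condition (e.g. each GOF level).
    Returns (F, df_conditions, df_error).
    """
    n = len(data)       # number of subjects
    k = len(data[0])    # number of conditions
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need >= 2 subjects, >= 2 conditions, complete rows")
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    # Between-subject variability is removed from the error term;
    # this is what distinguishes the repeated-measures design.
    ss_err = ss_total - ss_cond - ss_subj
    df_cond, df_err = k - 1, (n - 1) * (k - 1)
    f_stat = (ss_cond / df_cond) / (ss_err / df_err)
    return f_stat, df_cond, df_err
```

      For example, `rm_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 6]])` gives F = 37 (up to floating-point rounding) with (2, 4) degrees of freedom. A p-value would then come from the upper tail of the F distribution, e.g. `scipy.stats.f.sf(f_stat, df_cond, df_err)`.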

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      In addition to addressing the weaknesses highlighted in the public review, this reviewer recommends using a KCNT1 agonist such as loxapine to see if activating the potassium channel mimics the KCNT1 GOF in SST and PV cells.

      Although we appreciate this suggestion, we’re not sure whether treating GABAergic subtypes with loxapine would provide much clarity in the absence of many supporting experiments. First, the amount of channel activation and any changes in kinetics caused by loxapine would need to be measured and compared to the YH-HET GOF effects in order to interpret the results. In addition, the aforementioned caveat about off-target effects of small molecules would also have to be considered, as loxapine inhibits many other channels at nanomolar concentrations.

      More importantly, we hypothesize that several of the GABAergic subtype-specific effects of KCNT1 GOF result from homeostatic or adaptive mechanisms due to long-term increases in KNa currents. For instance, PV-expressing YH-HET neurons had a lower rheobase, increased AP amplitude, and increased AP firing frequency, effects that we believe are due, not to increased KNa currents themselves, but to a compensatory increase in a persistent Na+ current. For the SST neurons, we hypothesize that the increased capacitance and soma size, together with the increased electrical coupling, exacerbate the hypoexcitability phenotype downstream of the YH variant. Thus, we would not necessarily expect that opening KCNT1 channels by acute loxapine treatment would mimic many of these effects.

      Indeed, in a previous study using a different KCNT1 GOF mouse model, loxapine treatment mimics KCNT1 GOF effects in some neuron types (reduced AP firing frequency in loxapine-treated, WT PV neurons mimics that observed in heterozygous KCNT1 GOF PV neurons), but not in others (reduced AP firing frequency in loxapine-treated, WT pyramidal neurons does not mimic the unaltered AP firing frequency observed in heterozygous and homozygous KCNT1 GOF pyramidal neurons) (Gertler et al., 2022).  

      Related to this suggestion by the reviewer, we are currently performing studies using a KCNT1 blocker in WT and Kcnt1-KO neurons to better understand the role of KCNT1 among cortical neuronal subtypes. These results will be published in a future manuscript.

      Reviewer #3 (Recommendations For The Authors):

      Though I realize that primary cultures allow for efficient identification of neuronal subclasses, it would have been useful to show that similar changes also occur in neurons with conserved in vivo connectivity, such as those recorded from brain slices.

      We thank the reviewer for this suggestion. We have added an additional figure (Fig. 9) showing that the hypoexcitability phenotype observed in SST neurons in culture recordings is conserved in SST neurons in slice recordings from GIN mice, which express GFP predominantly in SST-expressing neurons.

      In addition, further experiments in PV neurons from Kcnt1-Y777H homozygous mice would provide evidence for a gene-dosage role in the changes found in heterozygotes.

      For this manuscript, we chose to focus our efforts on understanding the effects of heterozygous Kcnt1 variant expression in various neuronal subtypes with the goal of better modeling GOF variant effects in human disease. However, we’re very interested in investigating the effects of homozygous expression of the YH variant on each of the GABAergic subtypes to compare with those found in this study, but this requires more rounds of breeding to get homozygous mice with GABAergic subtype-specific expression of cre recombinase. We look forward to reporting the results from these experiments in a future manuscript.

      Also, when addressing the issue regarding the different effects of the same GOF variant on the excitability of distinct neuronal populations in the Discussion or Introduction sections, the authors may want to cite the recent work on KCNQ2 and KCNQ3 by the Tzingounis group (https://pubmed.ncbi.nlm.nih.gov/37607817/).

      We thank the reviewer for bringing this manuscript to our attention. We have added this citation to a new paragraph in the Discussion section regarding neuron-type specific effects of ion channel variants (the last paragraph focusing on the effects in PV neurons).

      Budelli, G., Hage, T. A., Wei, A., Rojas, P., Jong, Y. J., O'Malley, K., & Salkoff, L. (2009). Na+-activated K+ channels express a large delayed outward current in neurons during normal physiology. Nat Neurosci, 12(6), 745-750. https://doi.org/10.1038/nn.2313

      Gertler, T. S., Cherian, S., DeKeyser, J. M., Kearney, J. A., & George, A. L., Jr. (2022). K(Na)1.1 gain-of-function preferentially dampens excitability of murine parvalbumin-positive interneurons. Neurobiol Dis, 168, 105713. https://doi.org/10.1016/j.nbd.2022.105713

      Grubb, M. S., & Burrone, J. (2010). Activity-dependent relocation of the axon initial segment fine-tunes neuronal excitability. Nature, 465(7301), 1070-1074. https://doi.org/10.1038/nature09160

      Hage, T. A., & Salkoff, L. (2012). Sodium-activated potassium channels are functionally coupled to persistent sodium currents. J Neurosci, 32(8), 2714-2721. https://doi.org/10.1523/JNEUROSCI.5088-11.2012

      Kuba, H., Oichi, Y., & Ohmori, H. (2010). Presynaptic activity regulates Na(+) channel distribution at the axon initial segment. Nature, 465(7301), 1075-1078. https://doi.org/10.1038/nature09087

      Kvitsiani, D., Ranade, S., Hangya, B., Taniguchi, H., Huang, J. Z., & Kepecs, A. (2013). Distinct behavioural and network correlates of two interneuron types in prefrontal cortex. Nature, 498(7454), 363-366. https://doi.org/10.1038/nature12176

      Shore, A. N., Colombo, S., Tobin, W. F., Petri, S., Cullen, E. R., Dominguez, S., Bostick, C. D., Beaumont, M. A., Williams, D., Khodagholy, D., Yang, M., Lutz, C. M., Peng, Y., Gelinas, J. N., Goldstein, D. B., Boland, M. J., Frankel, W. N., & Weston, M. C. (2020). Reduced GABAergic neuron excitability, altered synaptic connectivity, and seizures in a KCNT1 gain-of-function mouse model of childhood epilepsy. Cell Rep.

      Urban-Ciecko, J., & Barth, A. L. (2016). Somatostatin-expressing neurons in cortical networks. Nat Rev Neurosci, 17(7), 401-409. https://doi.org/10.1038/nrn.2016.53

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary

      The authors investigated the antigenic diversity of recent (2009-2017) A/H3N2 influenza neuraminidases (NAs), the second major antigenic protein after haemagglutinin. They used 27 viruses and 43 ferret sera and performed NA inhibition assays. This work was supported by a subset of mouse sera. Clustering analysis determined 4 antigenic clusters, mostly in concordance with the genetic groupings. Association analysis was used to estimate important amino acid positions, which were shown to be more likely to lie close to the catalytic site. Antigenic distances were calculated and a random forest model was used to determine potential important sites.

      This has the potential to be a very interesting piece of work. At present, there are inconsistencies in the methods, results and presentation that limit its impact. In particular, there are weaknesses in some of the computational work.

      Strengths

      (1) The data cover recent NA evolution and a substantial number (43) of ferret (and mouse) sera were generated and titrated against 27 viruses. This is laborious experimental work and is the largest publicly available neuraminidase inhibition dataset that I am aware of. As such, it will prove a useful resource for the influenza community.

      (2) A variety of computational methods were used to analyse the data, which give a rounded picture of the antigenic and genetic relationships and link between sequence, structure and phenotype.

      Weaknesses

      (1) Inconsistency in experimental methods

      Two ferret sera were boosted with H1N2, while the others were boosted with recombinant NA protein. This, and the underlying reason, are clearly explained in the manuscript. The authors note that boosting with live virus did not increase titres. Nevertheless, these results are included in the analysis when it would be better to exclude them (Figure 2 shows much lower titres against their own group than the other sera do).

      As an exercise, we excluded the H1N2-boosted ferret sera and observed no major impact on the antigenic grouping (see Author response image 1a). Another way to control for differences in immunogenicity is to normalize the NAI values by the homologous ELISA titer for each antigen. Clustering based on these ELISA-normalized NAI titers reveals the same 4 distinct antigenic groups, with one change: Kan17 shifts from group 1 to group 2 (Author response image 1b). Note that a homologous ELISA titer is not available for A/West-Virginia/17/2012, so this serum sample is not included in Author response image 1b.

      Author response image 1.

      Antigenic and phylogenetic relatedness of N2 NAs. Phylogenetic tree based on the N2 NA head domain amino acid sequences and heat map representing the average normalized neuraminidase inhibition titer per H6N2 [log2(max NAI/NAI)] determined in ferret sera after the boost (listed vertically). The red-to-blue scale indicates high-to-low NAI observed in ELLA against the H6N2 reassortants (listed at the bottom). UPGMA clustering of the H6N2 inhibition profiles is shown on top of the heat map and colored according to the phylogenetic groups. (a) Based on the ferret sera, excluding the sera obtained following prime-boost by infection with H1N2 (A/Estonia/91625/2015 and A/Stockholm/15/2014). (b) Based on serum NAI titers normalized by the homologous ELISA titer.
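      The caption defines the heat-map values as log2(max NAI/NAI) per H6N2 and clusters the H6N2 inhibition profiles with UPGMA. Below is a minimal standard-library sketch of those two steps. It rests on several assumptions:

      - titres are stored as a nested dict of serum to virus to titre;
      - every serum was tested against every virus;
      - below-detection values have already been replaced by positive numbers;
      - profiles are compared with Euclidean distance, which the caption does not state.

      Averaging over duplicate ferrets is not shown, and all function names are hypothetical.

```python
import math
from itertools import combinations

def normalize_per_virus(titres):
    """titres[serum][virus] -> log2(highest titre against that virus / titre).

    0 marks the serum that inhibits the virus most strongly; larger values
    mean weaker inhibition. All titres must be positive.
    """
    viruses = list(next(iter(titres.values())))
    norm = {serum: {} for serum in titres}
    for v in viruses:
        top = max(titres[serum][v] for serum in titres)
        for serum in titres:
            norm[serum][v] = math.log2(top / titres[serum][v])
    return norm

def profiles_by_virus(norm):
    """Transpose to {virus: [normalized value for each serum]}."""
    sera = list(norm)
    return {v: [norm[s][v] for s in sera] for v in norm[sera[0]]}

def upgma(profiles):
    """UPGMA (average-linkage) clustering; returns the merge tree as nested tuples.

    Profiles are compared with Euclidean distance (an assumption).
    """
    clusters = [(name, [name]) for name in profiles]  # (subtree, members)
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            mi, mj = clusters[i][1], clusters[j][1]
            # UPGMA linkage: mean of all member-to-member distances
            d = sum(math.dist(profiles[a], profiles[b]) for a in mi for b in mj)
            d /= len(mi) * len(mj)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]
```

      Normalizing per virus puts every column on a common "fold below the best serum" scale before clustering. As a result, viruses that are simply inhibited more weakly by all sera are not separated on absolute titre alone.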

      (2) Inconsistency in experimental results

      Clustering of the NA inhibition results identifies three viruses which do not cluster with their phylogenetic group. Again, this is clearly pointed out in the paper. Further investigation of this inconsistency is required to determine whether this has a genetic basis or is an experimental issue. It is difficult to trust the remaining data while this issue is unresolved.

      We understand the reviewer's concern. It is important to keep in mind that discrete grouping of antigens allows major antigenic drift to be visualized. However, within closely related groups, the cross-reactivity of antisera is more likely to be distributed along a continuum. We constructed an antigenic map with the antigenic cartography algorithm (as described by Smith D. et al., 2004). On this map, Kansas17, Wis15, and Ala15 were positioned closer to antigenic group 1 than the majority of other antigens classified as group 2 (Author response image 2a). Similar results were obtained when individual ferret sera from the biological duplicates were used (Author response image 2b). This antigenic cartography map has now been added to the revised manuscript as Figure 2 - figure supplement 3.

      Author response image 2.

      The antigenic cartography was constructed using averaged data from pairs of ferrets (a). A similar analysis was performed on individual ferret sera (b).
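      Smith et al. (2004) define the target distance between antigen i and serum j as log2(b_j) - log2(H_ij), where H_ij is the titre and b_j is the highest titre measured for serum j. Antigens and sera are then positioned so that map distances match these targets. The sketch below implements that target distance and a plain squared-error stress, minimized with crude gradient descent. It is not the software used for Figure 2 - figure supplement 3 and makes several simplifications:

      - it ignores below-detection (thresholded) titres, which the published method treats separately;
      - the learning rate, step count and seed are arbitrary;
      - all names are hypothetical.

```python
import math
import random

def table_distances(titres):
    """titres[serum][antigen] -> {(antigen, serum): log2(b) - log2(titre)},
    where b is the highest titre measured for that serum."""
    target = {}
    for serum, row in titres.items():
        b = max(row.values())
        for antigen, h in row.items():
            target[(antigen, serum)] = math.log2(b) - math.log2(h)
    return target

def stress(target, pos):
    """Sum of squared differences between target and map distances."""
    return sum((t - math.dist(pos[("ag", a)], pos[("sr", s)])) ** 2
               for (a, s), t in target.items())

def fit_map(target, dims=2, steps=2000, lr=0.01, seed=0):
    """Crude gradient descent on the stress; returns {point: coordinates}.

    Antigens and sera get separate keys ("ag", name) / ("sr", name) so a
    homologous serum named like its antigen is still a distinct point.
    """
    rng = random.Random(seed)
    points = sorted({("ag", a) for a, _ in target} | {("sr", s) for _, s in target})
    pos = {p: [rng.uniform(-1, 1) for _ in range(dims)] for p in points}
    for _ in range(steps):
        grad = {p: [0.0] * dims for p in points}
        for (a, s), t in target.items():
            pa, ps = pos[("ag", a)], pos[("sr", s)]
            d = math.dist(pa, ps) or 1e-9   # avoid division by zero
            coef = -2.0 * (t - d) / d        # chain rule through the distance
            for k in range(dims):
                g = coef * (pa[k] - ps[k])
                grad[("ag", a)][k] += g
                grad[("sr", s)][k] -= g
        for p in points:
            for k in range(dims):
                pos[p][k] -= lr * grad[p][k]
    return pos
```

      The fitted map is defined only up to translation, rotation and reflection. In practice, several random restarts are compared and the lowest-stress configuration is kept.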

      (3) Inconsistency in group labelling

      A/Hatay/4990/2016 & A/New Caledonia/23/2016 are in phylogenetic group 1 in Figure 2 and phylogenetic group 1 in Figure 5 - figure supplement 1 panel a.

      Our apologies: there was indeed a labeling mistake in Figure 5. A new antigenic cartography map was constructed and included in the revised manuscript. As a result, Figure 5 - figure supplement 1 has become redundant and was removed from the manuscript.

      A/Kansas/14/2017 is selected as a representative of antigenic group 2, when in Figure 2 it is labelled as AC1 (although Figure 2 - supplement 4 which the text is referring to shows data for A/Singapore/Infimh-16-0019/2016 as the representative of AC2). A/Kansas/14/2017 is coloured and labelled as AC2 in Figure 2 - supplement 5.

      Thank you for pointing out this inconsistency. Kan17 clustered antigenically in group 1 when NAI values were normalized relative to the serum with the maximal NAI value against each tested H6N2 virus. When NAI titers normalized to the homologous ELISA titer are used, Kan17 is positioned in group 2. Likewise, antigenic cartography positions Kan17 in group 2. Therefore, we conclude that A/Kansas/14/2017 NA is a representative of group 2.

      The colouring is changed for Figure 3a at the bottom. A/Heilongjiang-Xiangyang/1134/2011 is coloured the same as AC4 viruses when it is AC1 in Figure 2. This lack of consistency makes the figures misleading.

      We apologize for this mistake. The coloring in Figure 3a has been corrected.

      (4) Data not presented, without explanation

      The paper states that 44 sera and 27 H6N2 viruses were used (line 158). However, the results for the Kansas/14/2017 sera do not appear to be presented in any of the figures (e.g. Figure 2 phylogenetic tree, Figure 5 - figure supplement 1). It is not obvious why these data were not presented. The exclusion of this serum could affect the results as often the homologous titre is the highest and several heatmaps show the fold down from the highest titre.

      Serum against A/Kansas/14/2017 was not prepared, so it is not included in the analysis. We agree that such a homologous serum should ideally have been included, and in the NAI assay it would likely have given a high, if not the highest, titer. However, we noticed that homologous sera did not always have the highest titers, especially in panels like ours where some antigens are closely related. The highest titer obtained against Kan17 H6N2 was from the A/Bris/16 serum: 1/104, which is within the range of the homologous titers observed in the panel (Table S3). The Bris16 and Kan17 NAs differ by five amino acids. In summary, including a homologous Kan17 serum would likely not affect the analysis and interpretation of the results, because multiple heterologous serum samples strongly cross-inhibit Kan17.

      (5) The cMDS plot does not have sufficient quality assurance

      A cMDS plot is shown in Figure 5 - figure supplement 1, generated using classical MDS. The following support for the appropriateness of this visualisation is not given:

      a. Goodness of fit of the cMDS projection, including per point and per titre.
      b. Testing of the appropriate number of dimensions (the two sera from phylogenetic group 3 are clustered with phylogenetic group 2; additional dimensions might separate these groups).
      c. A measure of uncertainty in positioning, e.g. bootstrapping.
      d. A sensitivity analysis of the assumption about titres below the level of detection (i.e. that <20 = 10).

      Without this information, it is difficult to judge if the projection is reliable.

      We agree with these comments. We have removed Figure 5 – figure supplement 1, and added new figure 2 – figure supplement 3 (antigenic cartography) instead.

      (6) Choice of antigenic distance measure

      The measure of antigenic distance used here is the average difference between titres for two sera. This is dependent on which viruses have been included in the analysis and will be biased by the unbalanced number of viruses in the different clusters (12, 8, 2, 5).

      To verify the impact of the number of antigens on our analysis, we generated the matrix of differences with only 4 H6N2s, one from each phylogenetic group (Per09, Sin16, Hel823 and Ind11) (Author response image 3a). This matrix is very similar to the one calculated from all 27 antigens (Author response image 3b). We used the reduced matrix (Author response image 3a) in random forest to model antigenic distances and plotted the predictions against the observed differences calculated from the full data. The R² of predicted vs observed values dropped from 0.81 to 0.71 (Author response image 3e). This suggests that the number of antigens tested does not drastically affect the antigenic differences calculated from serum values. Importantly, amino acid substitutions potentially associated with increased antigenic distances are similarly identified (Author response image 3c, d and f).

      Author response image 3.

      Matrix of differences calculated using only 4 H6N2 antigens (a) or the full panel (b). The matrices from (c) 4 or (d) 27 antigens were used in random forest modeling to estimate the impact of amino acid changes. The rf predictions generated from the 4 H6N2s only were plotted and correlated with values calculated from the full panel of 27 H6N2s (e). The multi-way importance plot indicates in red that 7 of the 10 most important substitutions were identified by the analysis using only 4 H6N2s (f).
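      The response reports R² for predicted versus observed antigenic distances. Two different quantities commonly go by that name, and they can disagree sharply:

      - the squared Pearson correlation measures only linear association;
      - the coefficient of determination, 1 - SS_res/SS_tot, also penalizes any systematic offset or scaling between predictions and observations.

      A small standard-library sketch of both follows. The response does not state which one was computed, and the function names are hypothetical.

```python
def pearson_r2(pred, obs):
    """Squared Pearson correlation between predictions and observations."""
    n = len(pred)
    if n != len(obs) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    vp = sum((p - mp) ** 2 for p in pred)
    vo = sum((o - mo) ** 2 for o in obs)
    return cov * cov / (vp * vo)

def coefficient_of_determination(pred, obs):
    """1 - SS_res / SS_tot, with SS_tot taken around the mean of the observations."""
    mo = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for p, o in zip(pred, obs))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1 - ss_res / ss_tot
```

      Take predictions [1, 2, 3, 4] against observations [2, 4, 6, 8]. The squared correlation is 1.0, but the coefficient of determination is -0.5, because every prediction is off by a factor of two.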

      Interestingly, when the matrix of differences is calculated from 4 H6N2s that lack a representative of group 1 or of group 2, the correlation between the predicted values and the values obtained from the full panel is strongly affected (R² drops from 0.81 to 0.5 and 0.57). It is important to note that most of the sera were also raised against antigens from groups 1 and 2. As a consequence, poorer prediction for those antigens would affect the correlation more strongly. No drastic drop was observed when representative H6N2s from group 3 or 4 were excluded from the data (from 0.81 to 0.75 and 0.73; Author response image 4c and d).

      Author response image 4.

      Random forest analysis was repeated using only 4 antigens, but excluding representatives of one of the phylogenetic groups (a) no group 1, (b) no group 2, (c) no group 3, and (d) no group 4.

      We also used Euclidean distances as a measure of differences (Author response image 5). The rf predictions obtained with this metric have a slightly lower R² than those obtained using the average of differences.

      In conclusion, neither the unbalanced number of antigens per group nor the choice of distance metric appears to substantially affect our analysis.

      Author response image 5.

      Antigenic distances were calculated as Euclidean distances between sera. These antigenic distances were used in rf to estimate antigenic distance and the importance of each amino acid substitution.
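      The reviewer's concern in point (6) can be made concrete with a sketch of the two serum-to-serum distances discussed here:

      - a mean absolute difference of log2 titres across the viruses that both sera were tested against, which is our assumed reading of "average difference";
      - the Euclidean distance over the same log2 differences.

      Whether raw or normalized titres enter the calculation is not stated in the response. The function name and inputs are hypothetical.

```python
import math

def serum_distance(t1, t2, metric="mean_abs"):
    """Distance between two sera from their titres against shared viruses.

    t1, t2: dict virus -> positive titre. Titres are compared on a log2 scale.
    metric="mean_abs": mean absolute log2 difference (assumed 'average difference').
    metric="euclidean": Euclidean norm of the log2 differences.
    """
    shared = sorted(set(t1) & set(t2))
    if not shared:
        raise ValueError("the two sera share no tested viruses")
    diffs = [math.log2(t1[v]) - math.log2(t2[v]) for v in shared]
    if metric == "mean_abs":
        return sum(abs(d) for d in diffs) / len(diffs)
    if metric == "euclidean":
        return math.sqrt(sum(d * d for d in diffs))
    raise ValueError(f"unknown metric: {metric}")
```

      Take s1 = {"v1": 640, "v2": 640, "v3": 40} and s2 = {"v1": 640, "v2": 160, "v3": 40}. The mean absolute distance is 2/3 and the Euclidean distance is 2. Dropping "v3", on which the sera agree, raises the mean to 1.0 but leaves the Euclidean distance unchanged. This illustrates why a mean-based distance depends on how many viruses of each cluster are in the panel.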

      (7) Association analysis does not account for correlations

      For each H6N2 virus and position, significance was calculated by comparing the titres between sera that did or did not have a change at that position. This does not take into account the correlations between positions. For haemagglutinin, it can be impossible to determine the true antigenic effects of such correlated substitutions without mutagenesis studies.

      Most of the potential correlated effects cannot be addressed with this panel of N2s, except for combinations of substitutions that are included in the panel, such as 245/247 with or without 468. Only mutagenesis studies would shed light on epistatic effects. However, it is important to keep in mind that individual substitutions introduced in such studies likely do not reflect the natural evolution of N2 (cf. the importance of the NA charge balance; Wang et al., 2021: 10.7554/eLife.72516).
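      To illustrate the per-position comparison described in point (7), the sketch below runs a two-sided permutation test for one virus and one NA position. It compares the log2 titres of sera whose immunizing NA matches the virus at that position with those of sera that differ. The permutation test is a stand-in chosen for illustration; the statistical test used in the manuscript is not named here. As the reviewer notes, testing one position at a time cannot separate the effects of substitutions that always co-occur.

```python
import math
import random

def _mean(values):
    return sum(values) / len(values)

def position_association(titres, differs, n_perm=10000, seed=0):
    """Two-sided permutation test for one virus and one NA position.

    titres:  NAI titre of each serum against this virus (positive numbers).
    differs: one bool per serum, True if the serum's immunizing NA differs
             from the virus at the position being tested.
    Returns (mean log2 titre of matched sera minus that of differing sera,
             permutation p-value).
    """
    x = [math.log2(t) for t in titres]
    flags = list(differs)
    if len(x) != len(flags) or all(flags) or not any(flags):
        raise ValueError("need one flag per serum and both groups non-empty")

    def diff_of_means(labels):
        same = [v for v, d in zip(x, labels) if not d]
        other = [v for v, d in zip(x, labels) if d]
        return _mean(same) - _mean(other)

    observed = diff_of_means(flags)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(flags)  # relabel sera while keeping group sizes fixed
        if abs(diff_of_means(flags)) >= abs(observed) - 1e-12:
            extreme += 1
    # add-one correction keeps the estimated p-value above zero
    return observed, (extreme + 1) / (n_perm + 1)
```

      Consider six sera split three and three. The most extreme split occurs in only 2 of the 20 possible relabelings, so p converges to about 0.1 however large the titre difference. Small groups therefore cap the attainable significance.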

      (8) Random forest method

      25 features are used to classify 43 sera, which seems high (p/3 is typical for classification). By only considering mismatches, rather than the specific amino acid changes, some signals may be lost (for example, at a given position, one amino acid change might be neutral while another has a large antigenic effect). Features may be highly, or perfectly correlated, which will give them a lower reported importance and skew the results.

      The number of features was optimized over the range 5 to 80, with 25 being optimal (best R value for predicted vs observed antigenic distances). These features refer to the number of amino acid substitutions used in each tree. The number of trees was also optimized over the range 100 to 2000.

      In our random forest model, the matrix of differences records only the positions at which a pair of NAs differ, not the type of substitution. Indeed, substitutions with distinct effects at the same position may skew the results by lowering the reported importance of that position.

      We have highlighted such potential bias in our discussion:

      “Also, our modelling does not consider that substitution by other amino acids can have a distinct impact on the antigenic distance. As a consequence, predictions based on the model could underestimate or overestimate the importance of a particular amino acid residue substitution in some cases.”
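      The two encodings at issue in point (8) can be contrasted with a small sketch.

      - The position-only mismatch vector is what the response describes.
      - A substitution-aware alternative keeps, for example, K/N and K/E at the same site as separate features.
      - A helper finds positions with identical mismatch patterns, i.e. perfectly correlated features, which share and so dilute their importance.

      Positions are 0-based indices into pre-aligned sequences, not N2 numbering, and all names are hypothetical. For context, the response does not say whether R's randomForest package was used. If it was, its `mtry` argument is the number of variables sampled as candidates at each split, with defaults of p/3 for regression and sqrt(p) for classification.

```python
from itertools import combinations

def mismatch_features(seqs, positions):
    """Position-only encoding: for every pair of aligned NA sequences,
    1 where the two residues differ at a position, else 0."""
    return {
        (a, b): [int(seqs[a][p] != seqs[b][p]) for p in positions]
        for a, b in combinations(sorted(seqs), 2)
    }

def substitution_features(seqs, positions):
    """Substitution-aware alternative: the set of residue pairs that differ,
    labelled like 'N/S@1', so different substitutions at one site stay apart."""
    rows = {}
    for a, b in combinations(sorted(seqs), 2):
        feats = set()
        for p in positions:
            x, y = sorted((seqs[a][p], seqs[b][p]))
            if x != y:
                feats.add(f"{x}/{y}@{p}")
        rows[(a, b)] = feats
    return rows

def identical_columns(rows, positions):
    """Groups of positions whose mismatch pattern is identical over all pairs;
    a random forest cannot tell such perfectly correlated features apart."""
    pairs = sorted(rows)
    groups = {}
    for k, p in enumerate(positions):
        pattern = tuple(rows[pair][k] for pair in pairs)
        groups.setdefault(pattern, []).append(p)
    return [g for g in groups.values() if len(g) > 1]
```

      In the toy alignment used for testing, positions 1 and 2 always change together, so any importance the forest assigns to them is split between two columns that carry the same information.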

      Reviewer #2 (Public Review):

      Summary:

      The authors characterized the antigenicity of the N2 protein of 44 selected A(H3N2) influenza A viruses isolated from 2009-2017 using ferret and mouse immune sera. Four antigenic groups were identified, which correlated with their respective phylogenetic/genetic groups. Among the 102 amino acid positions that differed among the 44 selected N2 proteins, the authors identified residues that differentiate the antigenicity of the four groups and constructed a machine-learning model that estimates antigenic distance. Three recent A(H3N2) vaccine strains were tested in the model, but there were no experimental data to confirm the model predictions.

      Strengths:

      This study used N2 protein of 44 selected A(H3N2) influenza A viruses isolated from 2009-2017 and generated corresponding panels of ferret and mouse sera to react with the selected strains. The amount of experimental data for N2 antigenicity characterization is large enough for model building.

      Weaknesses:

      The main weakness is that the strategy of selecting 44 A(H3N2) viruses from 2009-2017 was not explained. It is not clear if they represent the overall genetic diversity of human A(H3N2) viruses circulating during this time. A comprehensive N2 phylogenetic tree of human A(H3N2) viruses from 2009-2017, with the selected 44 strains labeled in the tree, would be helpful to assess the representativeness of the strains included in the study.

      The antigens were selected using the method described by Bien and Tibshirani (2011) (doi: 10.1198/jasa.2011.tm10183). This method uses minimax (MinMax) distances to identify a central representative (prototype) of each of several distinct clusters.
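      Minimax linkage, as described by Bien and Tibshirani (2011), can be sketched as follows. The prototype of a cluster is the member whose largest distance to any other member is smallest. Two clusters are merged when the minimax radius of their union is the smallest available. This is an illustrative stand-alone implementation, not the authors' code. The sequence distance, the number of clusters k and the function names are assumptions.

```python
from itertools import combinations

def minimax_prototype(members, dist):
    """Return (prototype, radius): the member whose largest distance to any
    other member is smallest (ties broken by name)."""
    proto = min(members, key=lambda x: (max(dist(x, y) for y in members), x))
    return proto, max(dist(proto, y) for y in members)

def minimax_clustering(items, dist, k):
    """Agglomerative clustering with minimax linkage, stopped at k clusters.

    The linkage of two clusters is the minimax radius of their union.
    Returns a list of (prototype, sorted members).
    """
    clusters = [[x] for x in items]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            _, radius = minimax_prototype(clusters[i] + clusters[j], dist)
            if best is None or radius < best[0]:
                best = (radius, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return [(minimax_prototype(c, dist)[0], sorted(c)) for c in clusters]
```

      Unlike a centroid, the prototype is always an actual member of the cluster. This makes the method suitable for choosing real strains as panel representatives.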

      To facilitate visualization in a phylogenetic tree, only 180 representative N2 proteins from 2009-2017 were randomly selected (20 strains per year, unlabelled). These 180 representatives and the 44 readout panel strains (labelled) are shown in the phylogenetic tree below. The readout strains cover the major branches of the tree. The tree was built with PhyML 3.0 using the JTT substitution model and default parameters (Guindon S. et al., Systematic Biology 59(3):307-21, 2010) and visualized using ETE3 (Huerta-Cepas J. et al., Mol. Biol. Evol. 33(6):1635-38, 2016).

      Author response image 6.
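      The "20 strains per year" draw described above is a stratified random sample. A minimal sketch is below. The input format, the fixed seed and the function name are assumptions, since the response does not say how the draw was made. Tree building with PhyML and rendering with ETE3 are not reproduced.

```python
import random
from collections import defaultdict

def sample_per_year(strains, per_year=20, seed=0):
    """strains: iterable of (strain_name, year).

    Returns up to `per_year` randomly chosen names from each year, year by
    year; a year with fewer strains contributes all of them.
    """
    by_year = defaultdict(list)
    for name, year in strains:
        by_year[year].append(name)
    rng = random.Random(seed)   # fixed seed makes the draw reproducible
    chosen = []
    for year in sorted(by_year):
        names = sorted(by_year[year])   # order-independent of the input
        chosen.extend(rng.sample(names, min(per_year, len(names))))
    return chosen
```

      Sorting the names before sampling means the same seed always yields the same selection, regardless of the order in which sequences were downloaded.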

      The second weakness is the use of double-immune ferret sera (post-infection plus immunization with recombinant NA protein) or mouse sera (immunized twice with recombinant NA protein) to characterize the antigenicity of the selected A(H3N2) viruses. Conventionally, NA antigenicity is characterized using ferret sera after a single infection. Repeated influenza exposure in ferrets has been shown to enhance antibody binding affinity and may affect the cross-reactivity to heterologous strains (PMID: 29672713). The increased cross-reactivity is supported by the NAI titers shown in Table S3, as many of the double immune ferret sera showed the highest reactivity not against its own homologous virus but to heterologous strains. Although the authors used the post-infection ferret sera to characterize 5 viruses (Figure 2, Figure Supplement 4), the patterns did not correlate well. If the authors repeat the NA antigenic analysis using the post-infection ferret sera with lower cross-reactivity, will the authors be able to identify more antigenic groups instead of 4 groups?

      This is a very valuable remark. In their paper, Kosikova et al. (CID 2018) report that repeated infection of ferrets with antigenically slightly different H3N2 viruses results in a broader anti-HA response than a primary infection of an influenza-naïve ferret, which results in a narrower anti-HA response. In our ferret immunizations, the boost was performed with recombinant, enzymatically active NA that was homologous to the NA of the H1N2 virus used for priming by infection. We determined the NAI responses in sera from ferrets after H1N2 infection against 5 different H6N2 viruses (Figure 2 – figure supplement 5). Compared with the NAI responses in sera from H1N2-infected and subsequently NA protein-boosted ferrets, the NAI titers obtained after a single infection were considerably lower. Although the normalized NAI titers of day 14 and day 42 sera correlated well, we cannot exclude a degree of broadening of the NAI response in the NA protein boost sera (Author response image 7). On the other hand, repeated exposure to influenza antigens is the reality for the majority of people.

      Author response image 7.

      Correlation between NAI data from ferret sera collected at day 14 after infection and at day 42 after the boost.

      Another weakness is that the authors used the newly constructed model to predict the antigenic distance of three recent A(H3N2) viruses but there is no experimental data to validate their prediction (eg. if these viruses are indeed antigenically deviating from group 2 strains as concluded by the authors).

      Indeed, there are no experimental data for A/Hong_Kong/45/2018, A/Tasmania/503/2020, or A/Darwin/9/2021. Generating experimental values for these viruses would require new reassortant viruses (H1N2s), new recombinant proteins and the immunization of new ferrets. The ferret sera would then have to be analyzed against all 27 H6N2s, including duplicated control sera for normalization. The major point of the modeling was to evaluate whether it is possible to predict antigenic behavior based on amino acid substitutions.

      As an exercise, we re-ran the model, this time excluding the Swe17 and HK17 antigens from the data set. The Swe17 and HK17 sequences were then used to predict antigenic distances. The modeled versus experimental data are plotted in Author response image 8 and show a robust predictive outcome, with R² values of 0.94 and 0.91 for Swe17 and HK17, respectively.

      Author response image 8.

      Antigenic distances for Swe17 and HK17 predicted by the random forest model built without the experimental data from Swe17 and HK17. The predicted distances are plotted side by side with the experimental distances in (a), and the correlations are shown in (b).
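      The hold-out exercise described above can be sketched as follows. This is a minimal illustration that rests on several assumptions:
      - Each data row is a hypothetical (antigen name, feature vector, antigenic distance) triple.
      - R² is taken as the squared Pearson correlation between predicted and experimental distances. The response does not say whether R² means this or the coefficient of determination.
      - The random forest itself is not reimplemented. The `fit` argument stands in for any regressor; in practice one could wrap, for example, scikit-learn's `RandomForestRegressor` (`fit`/`predict`). The nearest-neighbour stand-in exists only to make the example runnable.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[str, Sequence[float], float]  # (antigen, features, antigenic distance)
Model = Callable[[Sequence[float]], float]

def squared_pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation (raises ZeroDivisionError if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) * (a - mx) for a in x)
    syy = sum((b - my) * (b - my) for b in y)
    return sxy * sxy / (sxx * syy)

def fit_nearest_neighbour(X: List[Sequence[float]], y: List[float]) -> Model:
    """Stand-in regressor: predicts the target of the closest training feature vector."""
    def predict(features: Sequence[float]) -> float:
        best = min(range(len(X)), key=lambda i: math.dist(X[i], features))
        return y[best]
    return predict

def hold_out_antigens(rows: List[Row], held_out: List[str],
                      fit: Callable[[List[Sequence[float]], List[float]], Model]) -> Dict[str, float]:
    """Train without any row of the held-out antigens, then score each held-out antigen."""
    train = [r for r in rows if r[0] not in held_out]
    model = fit([r[1] for r in train], [r[2] for r in train])
    scores = {}
    for antigen in held_out:
        test = [r for r in rows if r[0] == antigen]
        predicted = [model(r[1]) for r in test]
        observed = [r[2] for r in test]
        scores[antigen] = squared_pearson(predicted, observed)
    return scores
```

      This is a test of prediction for an unseen NA sequence because whole antigens are held out, not random rows. No row of the held-out antigen contributes to training.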

      Reviewer #3 (Public Review):

      Summary:

      This paper by Portela Catani et al. examines the antigenic relationships (measured using monotypic ferret and mouse sera) across a panel of N2 genes from the past 14 years, along with the underlying sequence differences and phylogenetic relationships. This is a highly significant topic given the recent increased appreciation of the importance of NA as a vaccine target, and the relative lack of information about NA antigenic evolution compared with what is known about HA. Thus, these data will be of interest to those studying the antigenic evolution of influenza viruses. The methods used are generally quite sound, though there are a few addressable concerns that limit the confidence with which conclusions can be drawn from the data/analyses.

      Strengths:

      • The significance of the work, and the (general) soundness of the methods.

      • Explicit comparison of results obtained with mouse and ferret sera.

      Weaknesses:

      • Approach for assessing the influence of individual polymorphisms on antigenicity does not account for the potential effects of epistasis.

      Indeed, possible epistatic effects of individual polymorphisms were not assessed; this is limited by the nature of the panel of N2s selected for the study. We now emphasize this in the discussion as follows:

      “Also, our modelling does not consider that substitution by different amino acids can have distinct impact on antigenic distance. As a consequence, predictions based on the model could underestimate the importance of a particular amino acid residue substitution in some cases.”

      • Machine learning analyses were neither experimentally validated nor shown to be better than simple, phylogenetic-based inference.

      This is a valid remark, and indeed we found a clear correlation between NAI cross-reactivity and phylogenetic relatedness. However, besides achieving good prediction of the experimental data (as shown in Figure 5 and in Figure R7), machine learning analysis has the potential to rank or flag major antigenic divergences based on available sequences before they have consolidated into a new clade. Machine learning can also support the selection and design of more broadly reactive antigens.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major corrections

      No major corrections, beyond the issues I touched on in the public review, for which I give a little more detail below:

      Point 2. If there's not a putative genetic basis for the unexpected clustering seen in the NAI, then reiterating a small subset of the data would show the reliability of the experimental methods and substantiate this unexpected finding.

      We thank the reviewer for this pertinent point and suggestion. We have modified our analysis by reiterating it with individual ferret data normalized to the homologous ELISA titers. The result is shown in Figure R1b. In this case, both Kan17 and Wis15 switch to antigenic group 2. The inhibition profile of the sera against these two strains, which shift from antigenic cluster 1 to 2, is clearly intermediate between the profiles observed in the two groups. Considering that antigenic evolution occurs gradually, it is not unexpected that such intermediate profiles swing from one side to the other when forced into discrete groups. Antigenic cartography, as in Smith et al. (2004), also indicated that these H6N2s lie closer to G1 than most G2 antigens do. The raw data distribution (maximum and minimum EC50) also does not indicate a potential bias in the analysis.

      Point 5. If you want to use antigenic cartography (Smith et al 2004), there is the R CRAN package (https://CRAN.R-project.org/package=Racmacs) which can handle threshold titres (like <20) and has functions for the diagnostic tools I describe, in order to quality assure the resulting plot. It does use a different antigenic distance metric than the paper currently uses, so you might not want to take that route.

      Thank you for this suggestion. We have performed antigenic cartography using the methodology described by Smith et al. (2004), as made accessible by Sam Wilks. The outcome of this analysis has been added to the manuscript as Figure 2 – Figure supplement 3.
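      Antigenic cartography, as described by Smith et al. (2004), places antigens and sera in a low-dimensional map so that map distances approximate the table distances derived from titers. The sketch below minimizes the squared error between map and table distances by plain gradient descent. It is a simplified illustration under stated assumptions: threshold titers such as <20 are not handled, there is a single random start, and the step size `lr` is fixed. It is not the Racmacs implementation, which should be used for real analyses.

```python
import math
import random
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (antigen index, serum index)
Points = List[List[float]]

def fit_map(table: Dict[Cell, float], n_ag: int, n_sr: int, dims: int = 2,
            steps: int = 3000, lr: float = 0.01, seed: int = 1) -> Tuple[Points, Points, float, float]:
    """Place antigens and sera so that map distances approximate table distances.

    Returns (antigen coords, serum coords, initial stress, final stress); stress is the sum of
    squared differences between map and table distances over the measured cells only.
    """
    rng = random.Random(seed)
    ag = [[rng.uniform(0, 1) for _ in range(dims)] for _ in range(n_ag)]
    sr = [[rng.uniform(0, 1) for _ in range(dims)] for _ in range(n_sr)]

    def stress() -> float:
        return sum((math.dist(ag[i], sr[j]) - t) ** 2 for (i, j), t in table.items())

    initial = stress()
    for _ in range(steps):
        g_ag = [[0.0] * dims for _ in range(n_ag)]
        g_sr = [[0.0] * dims for _ in range(n_sr)]
        for (i, j), t in table.items():
            d = math.dist(ag[i], sr[j])
            if d < 1e-12:
                continue  # gradient undefined when an antigen and a serum coincide
            coef = 2.0 * (d - t) / d  # derivative of (d - t)^2 via the chain rule through d
            for k in range(dims):
                step = coef * (ag[i][k] - sr[j][k])
                g_ag[i][k] += step
                g_sr[j][k] -= step
        for pts, grads in ((ag, g_ag), (sr, g_sr)):
            for p, g in zip(pts, grads):
                for k in range(dims):
                    p[k] -= lr * g[k]
    return ag, sr, initial, stress()
```

      The fit can settle in a local minimum. Tools such as Racmacs therefore run many optimizations and keep the lowest-stress map, whereas this sketch uses a single start.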

      Point 6. More robust measures of antigenic distance take into account the homologous titre, homologous and heterologous titres (Archetti & Horsfall, 1950) or use the highest observed titre for a serum (Smith et al 2004). A limitation of the first two is that the antigenic distance can only be calculated when you have the homologous titre, which will limit you as you only have this for 26/43 sera. They may give similar results to your average antigenic distance, in which case your analysis still stands. Calculating antigenic distance using the homologous or maximum titre only gives the antigenic distance between the antigen and the serum. If you want the distance between all the sera, then further analysis is required (making an antigenic map and outputting the serum-serum distances, see the point above).

      We thank the reviewer for these suggestions. A complete set of 43 H6N2 viruses matching all 43 sera would have been ideal. However, this would require the generation of 17 additional H6N2 viruses and their testing in ELLA, a significant investment of time and resources. Instead, we have generated an antigenic map of the 27 antigens and homologous sera (cf. our response to point 5 above). Despite the different methods, the outcome, showing 4 major antigenic groups, is consistent.
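      The two distance measures mentioned by the reviewer can be written down directly.
      - In the Smith et al. (2004) formulation, the table distance between antigen i and serum j is log2(b_j) − log2(H_ij). H_ij is the titer and b_j the highest titer observed for serum j.
      - The Archetti and Horsfall (1950) ratio is r = sqrt((H_21/H_11)(H_12/H_22)). It equals 1 for antigenically identical strains.

      The sketch assumes that unmeasured or below-threshold titers are passed as None and skipped, which is a simplification.

```python
import math
from typing import List, Optional

Table = List[List[Optional[float]]]

def smith_table_distances(titers: Table) -> Table:
    """titers[i][j]: titer of antigen i with serum j, or None if unmeasured/below threshold.

    Table distance = log2(highest titer of serum j) - log2(titer of antigen i with serum j).
    Each serum column needs at least one measured titer.
    """
    n_sera = len(titers[0])
    col_max = [max(row[j] for row in titers if row[j] is not None) for j in range(n_sera)]
    return [[None if t is None else math.log2(col_max[j]) - math.log2(t)
             for j, t in enumerate(row)] for row in titers]

def archetti_horsfall_r(h11: float, h12: float, h21: float, h22: float) -> float:
    """hab = titer of antigen a with serum b (serum b raised against antigen b).

    r = sqrt((h21 / h11) * (h12 / h22)); r = 1 for identical antigens, smaller r = more distinct.
    """
    return math.sqrt((h21 / h11) * (h12 / h22))
```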

      Minor corrections

      Table S1

      A/New_Castle/67/2016 should be A/Newcastle/67/2016

      A/Gambia/2012 is not the full virus name

      Corrected.

      Table S3 has multiple values of exactly 10.0. I think these should be <20 as they are below the threshold of detection for the assay.

      All the values lower than 20 in Table S3 were replaced by “< 20”.

      Line 376: A/Sidney/5/1997 should be A/Sydney/5/1997

      Corrected.

      Line 338: "25 randomly sampled data" is a bit vague, "25 randomly sampled features" would be better

      Corrected.

      Include RMSE of the random forest model.

      The RMSE (19.6) and the RMSE/mean ratio (0.207) are now mentioned in the manuscript.
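      For reference, here is a minimal sketch of the two reported error measures. It assumes that RMSE/mean means the RMSE divided by the mean of the observed antigenic distances; the denominator is an assumption. Under that assumption, the reported values imply a mean observed distance of about 19.6 / 0.207 ≈ 95.

```python
import math
from typing import Sequence, Tuple

def rmse_and_ratio(observed: Sequence[float], predicted: Sequence[float]) -> Tuple[float, float]:
    """Root-mean-square error, and that RMSE divided by the mean observed value."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse, rmse / (sum(observed) / n)
```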

      Figure 5 - supplement 1: These plots are difficult to interpret as the aspect ratio is not 1:1, and panels a & b are difficult to compare as they have not been aligned (using a Procrustes analysis). It would be neater if they were labelled with short names.

      We have generated an antigenic cartography map instead. As a consequence, the MDS has become redundant and Figure 5 – supplement 1 was removed.

      Line 562: 98 variable residues, where it is 102 elsewhere in the text.

      There are 4 variable positions near the end of the NA stalk domain that are not resolved in the N2 structure. Distances to these residues therefore cannot be calculated, which leaves 98 of the 102 variable residues for this analysis.

      No data availability statement. Some of the raw data is available in Table S3 and there is no link to the code.

      The data and code used to generate the random forest (rf) model were uploaded to GitHub and made available. The following statement has been added to the manuscript: “The data and code used for the generation of the rf model is available at https://github.com/SaelensLAB/RF..”

      Reviewer #2 (Recommendations For The Authors):

      (1) More than 42,000 NA sequences are available for the mentioned period on GISAID, it is therefore important to understand the selection criteria for the 44 strains and if these strains represent the overall genetic diversity of N2 of human A(H3N2) viruses. To demonstrate the representativeness of the 44 selected strains, please construct a representative N2 phylogenetic tree for human A(H3N2) viruses circulated in 2009-2017 and label the 44 selected strains on the tree.

      The antigens were selected using the method described by Bien and Tibshirani (2011; doi: 10.1198/jasa.2011.tm10183). This method uses minimax distances to identify a central representative, or prototype, within each distinct cluster.

      To facilitate visualization in a phylogenetic tree, only 180 representative N2 proteins from 2009-2017 were randomly selected (20 strains per year, unlabelled). These 180 representatives and the 44 readout panel strains (labelled) are shown in the phylogenetic tree below. The readout strains cover the major branches of the tree. The tree was built with PhyML 3.0 using the JTT substitution model and default parameters (Guindon S. et al., Systematic Biology 59(3):307-21, 2010) and visualized using ETE3 (Huerta-Cepas J. et al., Mol. Biol. Evol. 33(6):1635-38, 2016).

      Author response image 9.

      (2) Double immune ferret sera may increase antibody binding affinity and cross-reactivity against heterologous strains. Using single-infection ferret sera may yield different antigenic grouping results (eg. may identify more antigenic groups). Can the authors repeat the NA antigenic grouping using single-infection ferret sera? Although data from a subset of 5 strains was presented (Figure 2, Figure Supplement 4), the information was not sufficient to support if the use of single-infection or double immune ferret sera will yield similar antigenic grouping results.

      In our ferret immunizations the boost was performed with recombinant, enzymatically active NA that was homologous to the NA of the H1N2 virus that was used for the priming by infection. We determined the NAI responses in sera from ferrets after H1N2 infection against 5 different H6N2 viruses (Figure 2 – figure supplement 5). Compared to NAI responses in sera from H1N2 infected and subsequently NA protein boosted ferrets, the NAI titers obtained after a single infection were considerably lower. Although the normalized NAI titers of day 14 and day 42 sera correlated well, we cannot exclude a degree of broadening of the NAI response in the NA protein boost sera (Figure R6). On the other hand, repeated influenza antigen exposure is the reality for the majority of people.

      (3) NA antigenicity data are presented in heat maps, and the authors often describe matching heat map patterns without further explanation. In lines 234-235, the heat map of mouse sera (Figure 2. Figure supplement 5) was described as matching the results of ferret sera (Figure 2), but this tends to be subjective. A correlation analysis of 7 selected antigens showed a positive correlation; what about the other 37 antigens?

      The interpretation of heatmaps is indeed very subjective; for this reason, the correlation for the 7 selected antigens was also provided. The other 37 antigens were not tested. Based on the results with post-boost sera, a random forest simulation indicates that data from one antigen of each antigenic group are sufficient to achieve a reliable predictive output (R² = 0.71) (Figure R3 of this rebuttal).

      (4) Can the authors explain in more detail how data in Figure 4a was generated? According to the authors, residues close to the catalytic pocket are more likely to impact NAI. Can the authors explain how they define if a residue is close to the catalytic pocket?

      The correlation between residue distances and significance values is explained as follows. Consider 7 distinct elements distributed horizontally, shown as squares in the figure below (Author response image 10a). The elements highlighted in yellow have a numerical property (for the N2 neuraminidase, this was the significance value obtained in the association study). Taking P1 as the reference, we can calculate the distances (red arrows) between P1 and each of P2, P4 and P7. These distances can then be correlated with the intrinsic values of P2, P4 and P7, which yields the correlation coefficient Tau. The same process is repeated for each position (i.e., each amino acid residue), so that every position has its own correlation coefficient (Author response image 10b). These correlation coefficients can be represented as a heat map on the surface of N2.

      Author response image 10.

      The 2D scheme represents the strategy used to calculate the correlation (i.e. the Tau values) between distances and p-values. Tau values can then be presented in a heat map.
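      Here is a minimal sketch of the per-residue correlation described above. It assumes that Tau is Kendall's tau-b and that distances are Euclidean distances between residue coordinates (for example Cα atoms). Residues unresolved in the structure, such as the four stalk positions mentioned earlier, are skipped. The input dictionaries and function names are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]

def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b: (P - Q) / sqrt((P + Q + T) * (P + Q + U)), with T/U = ties only in x/y."""
    concordant = discordant = ties_x = ties_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both: counted in neither T nor U
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x) * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")

def tau_map(coords: Dict[int, Optional[Coord]], pvalues: Dict[int, float]) -> Dict[int, float]:
    """For each resolved residue, tau between its distances to scored residues and their p-values."""
    result = {}
    for ref, ref_xyz in coords.items():
        if ref_xyz is None:
            continue  # unresolved in the structure, no distances possible
        dists, vals = [], []
        for pos, p in pvalues.items():
            xyz = coords.get(pos)
            if pos == ref or xyz is None:
                continue
            dists.append(math.dist(ref_xyz, xyz))
            vals.append(p)
        if len(dists) >= 2:
            result[ref] = kendall_tau_b(dists, vals)
    return result
```

      The returned dictionary has one value per resolved residue, which is what gets painted onto the N2 surface as a heat map.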

      (5) Can the authors provide experimental data using the three recent A(H3N2) viruses as antigens and perform NAI assays to confirm whether they all deviate antigenically from group 2 viruses?

      Generating experimental values for A/Hong_Kong/45/2018, A/Tasmania/503/2020, or A/Darwin/9/2021 would require new reassortant viruses (H1N2s), new recombinant proteins and the immunization of new ferrets. The ferret sera would then have to be analyzed against all 27 H6N2s, including duplicated control sera for normalization. The major point of the modeling was to evaluate whether it is possible to predict antigenic behavior based on amino acid substitutions.

      As an exercise, we re-ran the model, this time excluding the Swe17 and HK17 antigens from the data set. The Swe17 and HK17 sequences were then used to predict antigenic distances. The modeled versus experimental data are plotted in Author response image 7 and show a robust predictive outcome, with R² values of 0.94 and 0.91 for Swe17 and HK17, respectively.

      (6) According to Ge et al. 2022 (PMID: 35387078), N2 NA's before 2014 (2007-2013) showed a 329-N-glycosylation and E344, and they were subsequently replaced by H3N2 viruses with E344K and 329 non-glycosylation changing the NI reactivity in ferret antisera towards later strains. Were these residues also predicted to be important to N2 antigenicity from your machine-learning method?

      Three of the N2 NAs in our panel, derived from A/Victoria/361/2011, A/Hong_Kong/3089/2017, and A/Tennessee/18/2017, lack this N-glycosylation motif. The E344K substitution is present in three other NAs, derived from A/Nagano/2153/2017, A/Minnesota/11/2010, and A/Indiana/08/2011. The importance of these mutations is among the lowest predicted by our modeling. However, the differences in NAI reported by Ge et al. are small (less than twofold). The experimental variability in our study potentially limits the identification of substitutions with a subtle impact on NAI. We have added the following to the discussion in our revised manuscript:

      “It has been reported that an N-glycosylation site at position 329 combined with E344 in NA from human H3N2 viruses from 2007 to 2013 was gradually lost in later H3N2 viruses (Ge et al., 2022). This loss of an N-glycosylation site at position 329 combined with an E344K substitution was associated with a change in NAI reactivity in ferret sera. Three N2 NAs in our panel, derived from A/Victoria/361/2011, A/Hong_Kong/3089/2017, and A/Tennessee/18/2017, lack this N-glycosylation motif. The E344K substitution is present in three other NAs, derived from A/Nagano/2153/2017, A/Minnesota/11/2010, and A/Indiana/08/2011. The importance of those mutations is among the lowest ones predicted by our modeling. However, the differences in NAI reported by Ge et al. are very modest (lower than twofold). The experimental variability in our study potentially limits the identification of substitutions with a subtle impact on NAI.”

      Reviewer #3 (Recommendations For The Authors):

      Specific suggestions:

      Line 132: Did the authors confirm the absence of compensatory mutations due to a heterologous H6 background that could potentially confound downstream NAI results?

      All NA genes of the rescued H6N2 viruses were fully sequenced and found to be identical to the expected NA sequences. The only exception was A/Tasmania/1018/2015, where a mixed population of wild-type and M467I NA was found. This substitution is located at the surface, at the top of the NA head domain, and thus could potentially affect NA antigenicity. However, the A/Tasmania/1018/2015 H6N2 had an inhibition profile similar to that of other H6N2s in phylogenetic and antigenic group 1. This indicates that, at least in this mixed population, antigenicity was not drastically affected by the M467I substitution.

      Line 96: how do these data rule out variation in the fraction of properly folded protein across NAs? They certainly show that properly folded NA protein is present, but not whether amounts vary between the different NAs.

      SEC-MALS (size exclusion chromatography with multi-angle light scattering) data and enzymatic activity were used as proxies for correctly folded NA. Although the specific activity of the recombinant N2 NAs is expressed per mass unit (microgram), we cannot exclude that the fraction of properly folded protein varies across the different recombinant NAs.

      Lines 262-269: this analysis approach (based on my reading) seems to consider each polymorphism in isolation and thus does not seem well suited for accounting for epistatic interactions within the NA. For example, the effect of a substitution on NAI may be contingent upon other alleles within NA that are not cleanly segregated between the two serum comparator groups. Can the authors address the potential of epistasis within NA to confound the results shown in Figure 3?

      Unfortunately, epistatic interactions cannot be solved using the panel of N2 selected for the study. This limitation is mentioned in our discussion:

      “It is important to highlight that co-occurring substitutions in our panel (the ones present in the main branches of the phylogenetic tree) cannot be individually assessed by association analysis or the random forest model. The individual weight of those mutations on NA drift thus remains to be experimentally demonstrated.”

      Line 331: is there a way to visualize and/or quantify how these two plots (F5 supplement 1a/b) reflect each other or not? Without this, it is hard to ascertain how they relate to each other.

      We have generated an antigenic cartography map instead. As a consequence, the MDS has become redundant and Figure 5 – supplement 1 was removed.

      Figure 4B structural images are not well labelled.

      The active site in 1 of the protomers is now indicated with an arrow in the top and side views of the NA tetramer.

      Lines 339-359: the ML predictions are just predictions and kind of meaningless without experimental validation of the predicted antigenic differences between recent NAs. This section would also be strengthened by an assessment of whether the ML approach obtains more accurate results than simply using phylogeny to predict antigenic relationships.

      Indeed, there are no experimental data for A/Hong_Kong/45/2018, A/Tasmania/503/2020, or A/Darwin/9/2021. Generating experimental values for these viruses would require new reassortant viruses (H1N2s), new recombinant proteins and the immunization of new ferrets. The ferret sera would then have to be analyzed against all 27 H6N2s, including duplicated control sera for normalization. The major point of the modeling was to evaluate whether it is possible to predict antigenic behavior based on amino acid substitutions.

      As an exercise, we re-ran the model, this time excluding the Swe17 and HK17 antigens from the data set. The Swe17 and HK17 sequences were then used to predict antigenic distances. The modeled versus experimental data are plotted in Figure R7 and show a robust predictive outcome, with R² values of 0.94 and 0.91 for Swe17 and HK17, respectively. A major advantage of antigenic modeling is its potential to rank or flag major antigenic divergences based on available sequences before they have consolidated into a new clade. Support in selecting or designing more broadly reactive antigens is another advantage of machine learning analysis.

      Lines 416-421: appreciate the direct comparison of results obtained from ferrets versus mice.

      We thank the reviewer for expressing this appreciation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      R1-01 - Does ank-G-GFP label all isoforms (190, 270 and 480kDa) of ankG? From the images of the AIS and noR it appears that the large forms (270 and 480 kDa) are probably tagged with GFP. Did the authors check for puncta along dendrites and in dendritic spines, which are thought to be formed by the small (190 kDa) isoform? Perhaps a western blot to show that Ank-G-GFP labels all isoforms would be a useful addition to this study.

      We believe that AnkG-GFP indeed labels the major Ank3 transcripts in the brain, including the 190, 270, and 480 kDa isoforms, based both on known mRNA exon usage and on Western blot analysis (data not shown). Thus, in principle, this model would be useful for examining the localization of 190 kDa ankyrin-G in dendritic spines. We attempted to examine this in tissue sections, but it was difficult to separate punctate ankyrin-G-GFP labeling from the background. However, these experiments were done in genetic crosses that label most pyramidal neurons in a given area (i.e., CaMKIIa-Cre). Given the Cre dependence of this model, future experiments could use sparse transduction with a Cre virus that also fills neurons with a soluble fluorophore (e.g., mCherry or tdTomato) to mark isolated neurons and identify dendritic spines, as exemplified in Fig. 2D. This would allow examination of the subcellular localization of ankyrin-G within single pyramidal cells before and after induction of synaptic plasticity.

      R1-02 - In Figure 2, does all the native Ank-G get replaced by Ank-G-GFP? In Fig. 2E the GFP signal along the AIS of CamKII +ve neurons does not appear to be very homogeneous compared to the BIV-spectrin label. Have the authors carried out more experiments like those in 2F, using antibodies that label AnkG together with the GFP fluorescence of the labeled AnkG? It would also be informative to know if, as one might expect, the total levels of ankG-GFP correlate with the levels of ankG at the AIS.

      We agree that this is an important point and conducted additional experiments to address your concerns. Of course, we cannot exclude that some unmodified ankyrin-G remains in the AIS or other structures. We expect the turnover of the protein to be rather slow, so native ankyrin-G likely remains to some degree. However, our quantification demonstrates that the ankyrin-G-GFP labeling is sufficiently homogeneous to accurately represent AIS size, indicating that GFP levels are proportional to native ankyrin-G levels. Animals were crossed with a CaMKIIa-Cre driver line, and ex vivo slices were imaged live and after immunolabeling. We found a strong correlation between the live ankyrin-G-GFP (patch-clamp chamber), post-fixation ankyrin-G-GFP, post-fixation ankyrin-G, and βIV-spectrin immunosignals of the same AIS. Furthermore, our measurements of AIS length using the intrinsic GFP signal in combination with ankyrin-G or βIV-spectrin antibodies showed significant overlap (see R1-03). We have now included these graphs as Supplemental Fig. S2 in the manuscript (pp. 8-9, ll. 173-177).

      R1-03 - Does the length and position of the AIS change when Ank-G is tagged with GFP? This seems like important information that is needed to make sure that there are no structural differences in AIS morphology when compared to native Ank-G.

      This is a very important point. We used the βIV-spectrin signal to compare the length of the AIS with and without GFP modification in acute slices after patch-clamp recordings (N = 3 animals, 27 GFP+ and 48 GFP- AIS). As a secondary control, we plotted the measurements of 160 AIS from a Thy1-GFP mouse line (N = 3 animals, 160 AIS). We found no significant difference in the length and position of the βIV-spectrin signal between GFP-positive and GFP-negative AIS (p=0.3364 unpaired t-test, p=0.6138 non-parametric Mann-Whitney test, respectively). We have now included this analysis as Supplemental Fig. S2A in the manuscript (pp. 8-9, ll. 173-177).
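      The rank-based comparison mentioned above can be sketched in a few lines. This is a textbook Mann-Whitney U test. It uses average ranks for ties and a two-sided p-value from the normal approximation with tie correction, without continuity correction. Statistics packages may use exact or continuity-corrected p-values for small samples, so their results can differ slightly. The function name is hypothetical, and the AIS measurements themselves are not reproduced here.

```python
import math
from typing import Sequence, Tuple

def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U for sample x, two-sided normal-approximation p-value)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))  # zero if all values are equal
    z = (u1 - mean) / math.sqrt(var)
    return u1, math.erfc(abs(z) / math.sqrt(2))
```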

      R1-04 - How was node length measured in Figure 3? Was this done using the endogenous ank-G signal? In this figure, it would be informative to also quantify the number of noRs with a Nav1.6 stain. Perhaps even check if there are correlations between Ank-G-GFP and Nav1.6 levels. In this figure, it appears that comparisons are carried out between Ank-G-GFP +ve and -ve neurons in the same cryosections, from Ank-G-GFP mice crossed with CamKIIa-Cre. I worry that this may not be comparing the same types of axons. What cells do the CamKIIa -ve axons belong to? Also, the labels on the bar graph are confusing - perhaps GFP+ve and GFP-ve would be clearer?

      The reviewer raises an important point. We had omitted to state which signal was used to measure node length. We have corrected this and now clearly state in the Fig. 3C legend that the ankyrin-G signal was used to quantify node length. Furthermore, CaMKIIa-Cre-mediated recombination triggers ankyrin-G-GFP expression only in a genetically defined subset of neurons. Nodes that do not belong to this subset might well have different properties. Yet, we cannot attribute potential differences in node length to the presence or absence of the GFP label, since we do not have an independent labeling technique for the very same subset of neurons. Since node lengths were similar and showed the same spread in our sample (Fig. 3C), we assume that the GFP label probably does not affect node length to a significant degree. We now discuss this limitation in the Results (p. 7, ll. 159-165) and Methods sections (p. 30, ll. 644-645) and provide Supplementary Fig. S1 for more clarity. As suggested by the reviewer, we measured mean fluorescence intensities of 91 GFP+ and 141 GFP- nodes using automated image processing in Imaris. The nodes were again defined by the ankyrin-G signal. We found no difference in length or ellipticity between the groups. We repeated this analysis for the fluorescence intensities of Nav1.6 and ankyrin-G antibodies and again found no statistical differences between the groups. We also investigated whether ankyrin-G-GFP interferes with the fluorescence intensities of sodium channels (Nav1.6) and ankyrin-G in general. The GFP signal showed a strong correlation with ankyrin-G, but we found no interdependence with the Nav1.6 signal. This indicates that the GFP label does not interfere with the general molecular composition of the nodes. We included these new analyses in Supplemental Fig. S1 (p. 7, ll. 159-165).

      R1-05 - In Figure 4 it would also be important to show the distribution of AIS molecules along the AIS, compared to the GFP signal, to establish whether this spatial arrangement of AIS-specific molecules remains intact. For example, Nav1.6 has been described as a more distally-located channel. As the authors point out, the example in A appears to show precisely this feature, but there is no quantification. The same applies to Kv1.2. This would also allow the authors to provide some quantification across multiple AISs, rather than just example images.

      We agree that quantifying and comparing AIS-associated proteins would be informative. We measured the intensity profiles of Nav1.6 and Kv2.1 in neighboring AIS and found no preference for either end of the AIS, in either GFP-positive or GFP-negative AIS. We note that not all neurons exhibit a distal localization of Nav1.6, and we hypothesize that our samples (neocortex layer II) fall into this group. We included these new graphs as Supplemental Fig. S2D and E in the manuscript (p. 9, ll. 180-184).
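      One simple way to quantify whether a channel prefers either end of the AIS is the intensity-weighted centroid of its line profile. The sketch below is a hypothetical metric for illustration and not necessarily the quantification used in Supplemental Fig. S2D and E. It assumes that the profile is sampled at evenly spaced points from the proximal AIS start to the distal end and that a background value has been estimated. A flat profile gives 0.5; values above 0.5 indicate distal enrichment.

```python
from typing import Sequence

def distal_bias(profile: Sequence[float], background: float = 0.0) -> float:
    """Intensity-weighted mean position along the AIS, scaled from 0 (proximal) to 1 (distal)."""
    weights = [max(v - background, 0.0) for v in profile]  # clip below-background pixels
    total = sum(weights)
    if total == 0 or len(profile) < 2:
        raise ValueError("profile needs at least two points and signal above background")
    last = len(profile) - 1
    return sum(i / last * w for i, w in enumerate(weights)) / total
```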

      R1-08 - In Figure 4, did the +Cre condition result in all cells showing a GFP-labelled AIS? If not, were the autocorrelations for +Cre-treated neurons done specifically on cells that expressed AnkG-GFP?

      We assume the reviewer refers to the autocorrelation in Figure 6. In this in vitro paradigm, we used virus-induced Cre expression, which triggered ankyrin-G-GFP expression in almost all neurons. The orange boxplots describe the autocorrelation of total ankyrin-G, detected with a C-terminal antibody as in Fig. 6C, in neurons that also express ankyrin-G-GFP. The green samples use the GFP signal of ankyrin-G-GFP. We clarified this in the graph and legend of Fig. 6C (pages 14-15).
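      Autocorrelation of an intensity line profile along the AIS is a common way to detect periodic scaffold organization in STED images. The sketch below computes the normalized autocorrelation of a profile and reports, as the dominant spacing, the lag with the highest correlation within a chosen window. The pixel size, lag window and function names are assumptions for illustration, not the authors' analysis pipeline.

```python
from typing import Sequence

def autocorrelation(profile: Sequence[float], lag: int) -> float:
    """Normalized autocorrelation of a mean-subtracted line profile at an integer pixel lag."""
    n = len(profile)
    mean = sum(profile) / n
    dev = [v - mean for v in profile]
    var = sum(d * d for d in dev)
    return sum(dev[i] * dev[i + lag] for i in range(n - lag)) / var

def dominant_spacing(profile: Sequence[float], pixel_nm: float, min_lag: int, max_lag: int) -> float:
    """Spacing (nm) of the strongest autocorrelation within [min_lag, max_lag] pixels."""
    best = max(range(min_lag, max_lag + 1), key=lambda k: autocorrelation(profile, k))
    return best * pixel_nm
```

      For a profile sampled at 10 nm per pixel, `dominant_spacing(profile, 10.0, 10, 30)` searches spacings between 100 and 300 nm. The window should exclude lag 0, which always gives the maximum.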

      R1-09 - As mentioned above in Figure 3, the comparisons in Figure 5 (GFP +ve and -ve neurons) may not be comparing like-for-like neurons. I imagine that many of the CamKII-ve cells in the cortex and hippocampus will be GABAergic interneurons, whereas presumably all of the CamKII+ve neurons will be pyramidal cells. Have the authors made sure that they are comparing across the same cell types? The fact that the number of axo-axonic synapses is similar across the two populations (Fig. 5B) does suggest that similar neuron types (presumably pyramidal cells) were compared in the hippocampus, but some other way of making sure would be a nice addition.

      We agree with the reviewer that the grey and green boxes are not sampled from the same subset of neurons, since only CaMKIIa-positive principal cells express ankyrin-G-GFP. However, we are confident that the selected AIS belong to pyramidal neurons in both cases. Principal neurons can be readily distinguished from interneurons not only by the size, shape, and position of their somas but also by the length and thickness of their AIS. We have previously studied the AIS of interneurons using genetic GAD and parvalbumin markers. Thus, we are confident that the plots in 5A and 5B are sampled from pyramidal neurons, though certainly from genetically different subsets. We now highlight and discuss this limitation in the Results section (p. 11, ll. 215-217) and have modified the graphs in Fig. 5A and 5B for clarity.

      R1-10 - In Figure 6, what was the promoter for the ΔCre and Cre+ lentivirus? Was this also driven by CamKIIa? In culture it is not always easy to be sure of neuronal identity - did the authors try to bias their analysis to specific neuronal types?

      Indeed, the promoter was not stated in the legend or the Methods section, which we have now corrected. We used lentiviral FUW-nGFP-Cre and FUW-nGFP-ΔCre constructs to trigger ankyrin-G-GFP expression. Both viruses use the CMV (cytomegalovirus) promoter, which drives constitutively high levels of gene expression in a wide range of cell types, including neurons. The majority of neurons in dissociated hippocampal cultures are excitatory, especially the larger cells with larger AIS, which were preferentially used in the analysis. Thus, we cannot claim that AIS nanostructure is intact in cultured interneurons, but the same holds for in vivo conditions in general. Since the mice did not show any obvious behavioral phenotypes, we are confident that interneuron function is preserved. We also note that the parallel expression of nuclear GFP in the infected neurons was undesired, but it did not affect STED imaging owing to the technique's high resolution.

      R1-11 - The ability to visualize the plasticity of the AIS in real-time is an important advance in the field. The loss of proximal Ank-G-GFP signal upon local application of 15 mM KCl is particularly interesting. The fact that neighboring AISs are not affected is surprising - do the authors know how local their KCl application was? Also, although the neighboring AISs are a nice control, the one control lacking here is the local application of normal solution (preferably 15 mM NaCl to account for osmolarity changes) to make sure that this does not affect the properties of the AIS.

      In previous, unrelated experiments using KCl puffs, we observed that only cells directly in front of the pipette are visibly depolarized by an acute KCl puff (measured by patch-clamp). Due to technical limitations, patched and live-imaged neurons were generally in the first 2-5 cell layers of the brain slice, which are well perfused by the constant flow of oxygenated ACSF. KCl is thus quickly diluted and carried away. We visualized the concentration gradient of the puff application by puffing the fluorescent marker fluorescein under the same recording conditions. The cone of fluorescence was visible only in front of the pipette and vanished less than a second after pressure application. To verify that it is indeed KCl, and not mechanical stress, that led to the loss of proximal Ank-G-GFP, one would indeed need an ACSF puff control, which we performed in other studies. However, this is not the point we wanted to make. Rather than studying live single-cell AIS plasticity, we want to demonstrate that such investigations are generally possible using the ankyrin-G-GFP line.

      Author response image 1.

      R1-12 - The ability to image AISs in vivo is another important finding. Were the authors able to image nodes of Ranvier (NoRs) as well?

      We believe that this is indeed the case. The panels in Figure 9C contain densely labeled puncta that also remain in position from week 1 to week 2. These are likely nodes of Ranvier, although we do not have the means to verify their presence at this time.

      Reviewer #2:

      R2-01 - Are there indeed different Ank-G-GFP isoforms expressed in this model and could they correspond to classical neuronal Ank-G isoforms?

      This is an important issue that was also raised by reviewer #1. Please consult the respective section R1-01 above for our response.

      R2-02 - What is the rationale of doing Ank-G co-labelling in the case of Ank-G-GFP expression, rather than Pan-Nav staining for example? The co-staining with Nav1.6 antibody, when present, is however convincing.

      We used the co-labeling to emphasize that the ankyrin-G-GFP construct allows reliable investigation of the whole AIS. This is why we wanted to demonstrate that the ankyrin-G-GFP signal overlaps with other AIS markers, as well as with all ankyrin-G in general (including potentially remaining native, unlabeled ankyrin-G). This point was also raised by Reviewer 1, which is why we provided some additional graphs (see response R1-02). However, we agree that staining with another independent marker, such as Nav1.6 or βIV-spectrin, was necessary.

      R2-03 - Figure 2D and F: what is the rationale for not using betaIV-Spectrin staining as in the other panels of this figure? Furthermore, could betaIV-Spectrin localization be affected by Ank-G-GFP expression, as betaIV-Spectrin is known to depend on Ank-G for its AIS targeting? Are there any other AIS markers, whose localization is known to be independent of Ank-G, that could have been used?

      We have compiled this figure from a multitude of different experimental setups from different labs to showcase the reliability and robustness of the ankyrin-G-GFP label. This is why the type of staining is not consistent among panels. However, we provide some quantification on the possible impact of ankyrin-G-GFP expression on the βIV-spectrin signal and the composition of the AIS in general. The STED image verifies that the basic subcellular arrangement of the cytoskeleton, including βIV-spectrin, remains intact (Fig. 6). Most AIS markers are at least in some way dependent on ankyrin-G expression, but FGF14 and neurofascin may be the most independent candidates (Fig. 4).

      R2-04 - Did the authors measure the mean AIS length and distance from cell soma in Ank-G-GFP-expressing neurons versus non-expressing ones (considering the same neuronal subtypes) to assess whether these were unaffected by Ank-G-GFP expression?

      This is an important point that was also raised by Reviewer 1 (see also our comments to R1-03). We have included this analysis now in the manuscript as Supplemental Fig. S2A (pp. 8-9, ll. 173-177).

      R2-05 - Figure 5C: the microglial staining and 3D reconstruction could have been clearer.

      We have modified the image and 3D rendering to make Figure 5C clearer to the reader. We hope that our changes suffice.

      R2-06 - Figure 8: do hippocampal neurons retain their electrophysiological properties after 20 DIV? It could strengthen this part of the work to have access to the electrophysiological data mentioned in the text. 

      This is an important issue. We did not perform any electrophysiological recordings in OTCs in the course of this study. Panel E uses acute hippocampal slices like in Fig. 7. We have performed patch-clamp experiments up to DIV 10 for an unrelated study (see graph for action potential firing, Author response image 2). There are not many studies performing electrophysiology in slice cultures due to the formation of a glial scar on top of the slices. However, multielectrode array (MEA) recordings demonstrated that hippocampal organotypic slice cultures remain viable and show electric activity past DIV 20 (though with decreased viability and activity). We kindly refer to the following publications on that matter:

      Author response image 2.

      Sample traces of action potentials triggered by current injections

      Gong W, Senčar J, Bakkum DJ, Jäckel D, Obien ME, Radivojevic M, Hierlemann AR. Multiple Single-Unit Long-Term Tracking on Organotypic Hippocampal Slices Using High-Density Microelectrode Arrays. Front Neurosci. 2016 Nov 22;10:537. doi: 10.3389/fnins.2016.00537. PMID: 27920665; PMCID: PMC5118563.

      Mohajerani MH, Cherubini E. Spontaneous recurrent network activity in organotypic rat hippocampal slices. Eur J Neurosci. 2005 Jul;22(1):107-18. doi: 10.1111/j.1460-9568.2005.04198.x. PMID: 16029200.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Tesmer and colleagues uses fiber photometry recordings, sophisticated analysis of movement, and deep learning algorithms to provide compelling evidence that activity in hypothalamic hypocretin/orexin neurons (HONs) correlates with net body movement over multiple behaviors. By examining projection targets, the authors show that hypocretin/orexin release differs in projection targets to the locus coeruleus and substantia nigra, pars compacta. Ablation of HONs does not cause differences in the power spectra of movements. The movement-tracking ability of HONs is independent of HON activity that correlates with blood glucose levels. Finally, the authors show that body movement is not encoded to the same extent in other neural populations.

      Strengths:

      The major strengths of the study are the combination of fiber photometry recordings, analysis of movement in head-fixed mice, and sophisticated classification of movement using deep learning algorithms. The experiments seem to be well performed, and the data are well presented, visually. The data support the main conclusions of the manuscript.

      We thank the reviewer for their supportive feedback.

      Weaknesses:

      The weaknesses are minor, mostly consisting of writing and data visualization throughout the manuscript. To some degree, it is already known that hypocretin/orexin neurons correlate with movement and arousal, although this manuscript studies this correlation with unprecedented sophistication and scale. It is also unfortunate that most of the experiments throughout the study were only performed in male mice. Taken together, this study is likely to be impactful to the field and our understanding of HONs across behavioral states.

      We agree that disentangling movement from arousal is an important aspect, and in the revised manuscript, we now include new data and analyses towards this (pupillometry to directly assess arousal, and multivariate analysis to assess the contributions of arousal vs movement to HON activity). In addition, we now implement many of the reviewer’s recommendations regarding writing, data presentation, and visual clarity (see our replies in the “recommendations for authors” section).

      Reviewer #1 (Recommendations for the authors):

      Some recommendations for the authors:

      (1) The first sentence of the Introduction states: "Neural activity related to body movement recently received much attention." I would rephrase or clarify this statement, as neuroscientists have been studying neural activity related to body movement for decades.

      The reviewer is correct. Our intention was to highlight the resurgence of movement-related neuroscience enabled by modern techniques such as deep learning applied to video data (e.g. DeepLabCut). The passage has been updated for clarity.

      (2) The Introduction also states that HONs orchestrate "consciousness and arousal." I would delete the word "consciousness," as consciousness represents a lofty, global concept that is challenging to define and quantify in humans, let alone mice.

      We used the word consciousness to be consistent with current literature on the function of the mouse hypothalamus (e.g. Nat Neurosci 2016 Feb;19(2):290-8). But we agree it is not necessary here, and so we followed the advice to delete it.

      (3) The authors state that HON dynamics were recorded while mice were head-fixed while on a running wheel. For clarity, it would be helpful to visualize this head-fixation in Figures 1A and 5B. It would also be helpful to clarify how certain behaviors (e.g. grooming, chewing) were performed and recorded while the mouse was head-fixed.

      In the revised manuscript, updated graphics with a head-fixed mouse have now been added to relevant figures. Representative RGB frames (colors representing sequential frames) of each behaviour have been added to Figure 2A.

      (4) In the legend for Figure 1A, the reference to Gonzalez et al. 2016 seems out of place (at least the reader should be informed why the text is referring to this previous study). Additionally, because the references are ordered by number instead of alphabetically, it would be more helpful to refer to a numbered reference rather than a name.

      Gonzalez et al. 2016 references the source of the AAV construct used in this figure. This has been moved to the methods. Following eLife formatting guidelines, references will be alphabetized upon publication.

      (5) In Figure 3F, it would be helpful to show visual validation that the HON-DTR method indeed ablates all HONs. This is depicted conceptually, but representative figures would be much more convincing.

      A representative histological slice is now included for both wild type (WT) and HON-DTR mice in the new Figure 4B.

      Reviewer #2 (Public review):

      Summary:

      Despite several methodological strengths, the major and highly significant drawback is the confound of arousal with movement. This confound is not resolved, so the results could be explained by previously established relationships between orexin and arousal/wakefulness.

      This is an excellent point, and we agree. To address this directly in the revised manuscript, we now include new data and analyses (pupillometry to directly assess arousal, and multivariate analysis to assess the contributions of arousal vs movement to HON activity).

      Strengths:

      The authors show that orexin neuron activity is associated with body movement and that this information is conveyed irrespective of the fasted state. They also report differences in different orexin target brain regions for orexin release during movement. This paper contains an impressive array of cutting-edge techniques to examine a very important brain system, the orexin-hypocretin system. The authors offer an original perspective on the function of this system. The authors showed that orexin neuron activity scales to some degree with the magnitude of body movement change; this is unaffected by a fasted state and seems to be somewhat unique to orexin neurons.

      The investigation of other genetically defined subcortical neuron populations to determine the specificity of the findings is also a strength, as is the ability to quantify movement and use deep learning to classify specific behaviors, which adds sophistication to the analysis. The authors also show heterogeneity in orexin projections to specific target nuclei, which is interesting.

      The authors "speculate that narcolepsy-cataplexy, caused by HON loss-of-function, is perhaps explained by oscillations into unwanted sleep-states and motor programs due to impaired control loops for wakefulness and movement". This is quite an interesting aspect of their work and deserving of further study.

      We thank the reviewer for their supportive feedback.

      Weaknesses:

      Despite the strengths, there are several major and minor weaknesses that detract significantly from the study.

      My main concern with this work is the confound of arousal with movement so that correlations with one might reflect a relationship instead with the other. The orexin system is well known to play an important role in arousal, with elevated activity of orexin neurons reported for waking and high arousal. Orexin signaling has also been strongly associated with motivation, which also is associated with arousal and movement. The authors offer no compelling evidence that the relationships they describe between different movements and orexin signaling do not simply reflect the known relationship between arousal and motivation.

      The authors could address this concern by including classical arousal measurements, eg, cortical EEG recorded simultaneously with movements. Often, EEG arousal occurs independently of movement, so this could provide one approach to disentangling this confound. The idea that orexin signaling plays a role in arousal rather than movement is supported by their finding that orexin lesions using the orexin-DTR mouse model did not impact movements. In contrast, prior lesion and pharmacologic studies have found that decreased orexin signaling significantly decreases arousal and waking.

      Another way they could test their idea would be to paralyze and respirate animals so that orexin activity could be recorded without movement. Alternatively, animals could be trained to remain motionless to receive a reward. Thus, there are several ways to test the overall hypothesis of this work that have not been examined here.

      The authors propose that "a simple interpretation of their results is that, via HON movement tracking, the brain creates a "wake up" signal in proportion to movement". This seems to argue for the role of the orexin system in arousal and motivation rather than in movement per se.

      Thank you. We agree that disentangling between arousal and movement is indeed critical. A classic approach is a multivariate analysis, wherein multiple simultaneously recorded “predictors” of HON activity – such as arousal and movement - can be directly compared. While EEG arousal is an option, another well-accepted metric for arousal is pupil diameter. Using n = 7 mice, we now simultaneously record HON activity, movement, running speed, pupil size fluctuations, and ocular movements:

      We then fit a partial least squares multivariate regression (a regression type more robust to collinearity) using the movement metric, pupil size, and ocular movements as predictors of orexin neuron activity. Consistent with previous publications, we found that pupil size alone has a positive correlation with hORX.GCaMP6s (r ≈ 0.45). However, using a drop-one feature analysis in the multivariate regression, we found that movement had the highest % contribution to statistically explaining orexin neuron activity. Here are the new results (which we now added as Fig. 7A-B).

      Author response image 1.
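      The drop-one logic can be sketched as follows, as an illustration only and not the code used for Fig. 7. We implement a minimal single-response partial least squares regression (PLS1 with NIPALS-style deflation) in NumPy, fit it to synthetic stand-ins for the movement metric, pupil size and ocular movements, and measure how much in-sample R² falls when each predictor is removed. The number of components (2), the synthetic data and the use of raw R² drops (rather than the normalization behind the reported % contribution, which is not specified here) are our assumptions.

```python
import numpy as np

def pls1_coefficients(X, y, n_components):
    """Regression coefficients of a PLS1 model on centered data (NIPALS deflation)."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # weight vector
        t = Xk @ w                      # component scores
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        c = (yk @ t) / tt               # y loading
        Xk = Xk - np.outer(t, p)        # deflate X
        yk = yk - c * t                 # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)   # B = W (P'W)^-1 q

def r_squared(X, y, n_components):
    b = pls1_coefficients(X, y, n_components)
    y_hat = (X - X.mean(axis=0)) @ b + y.mean()
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def drop_one_importance(X, y, names, n_components=2):
    """Drop in in-sample R^2 when each predictor is left out of the PLS model."""
    full = r_squared(X, y, min(n_components, X.shape[1]))
    drops = {}
    for j, name in enumerate(names):
        X_red = np.delete(X, j, axis=1)
        drops[name] = full - r_squared(X_red, y, min(n_components, X_red.shape[1]))
    return drops

# Synthetic example (not real data): pupil size is collinear with movement.
rng = np.random.default_rng(0)
n = 2000
movement = rng.normal(size=n)
pupil = 0.5 * movement + rng.normal(size=n)
ocular = rng.normal(size=n)
activity = 1.0 * movement + 0.3 * pupil + 0.5 * rng.normal(size=n)
X = np.column_stack([movement, pupil, ocular])
drops = drop_one_importance(X, activity, ["movement", "pupil", "ocular"])
```

      Because the collinear pupil predictor absorbs part of the movement signal when movement is dropped, the drop-one comparison reflects unique rather than shared explanatory power, which is the property that makes it useful for separating correlated predictors such as arousal and movement.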

      Furthermore, we also expanded this analysis to incorporate the different frequencies found in HON dynamics, using empirical mode decomposition. We found that pupil size had a maximum correlation at lower HON frequencies than the movement metric, while ocular movements were maximally correlated in higher frequencies (now added as Fig. 7D,E).

      Overall, this analysis suggests that – while HONs encode both movement and arousal – arousal and movement do not always co-fluctuate at the same timescales, and their impacts on HONs can be disentangled in a number of ways. We now mention this in revised text on page 5.

      There are several studies that have examined the effect of orexin antagonist treatment in rodents on locomotor and other motor activities. These studies have largely found no consistent effect of antagonizing orexin signaling, especially at the OxR1 receptor, on simple motor activity. These studies are not referenced here but should be taken into account in the authors' conclusions.

      We agree. Prior studies found that orexin antagonism – or optogenetic silencing of HONs – evokes either reduced locomotion, or no effect on locomotor movements. We now added text and references to paragraph 4 of Discussion, summarising this.

      Figure 3, panel F: I understand HON-DTR is a validated model but a picture of HONs ablation is necessary, including pictures of HONs outputs ablation within the SNc and LC.

      A representative histological slice is now included for both wild type (WT) and HON-DTR mice in the new Figure 4B. Because HONs are only found in the hypothalamus, somatic deletion of HONs in this region will result in axonal degradation in output regions.

      The discussion lacks a more extensive paragraph on the distinct signal and role of Ox-SNc and Ox-LC projections.

      We now added sentences discussing potential implications of this to Discussion (middle of paragraph 4).

      Reviewer #2 (Recommendations for the authors):

      Minor weaknesses

      A very important movement in rodents is head orientation, especially given the limitation in ocular movement. However, this paper used a fixed head model which obviated this movement and did not attempt to analyze ocular movements.

      Analysing ocular movements is something we had not considered but is very easy to check using pupillometry. In n = 7 mice, we recorded both orexin neurons, and ocular movements captured through an infrared camera under constant lighting. Ocular movements had a small positive correlation with orexin neuron photometry (r = ~0.26). See response to the public review above.

      Author response image 2.

      The "HON" abbreviation is not commonly used for orexin neurons, and I suggest replacing that with a more well-known abbreviation.

      To the best of our knowledge, there is no universally agreed or best-known abbreviation for hypocretin/orexin neurons (we agree it would be nice if there was one!). “HONs” is a simple first letter abbreviation of hypocretin/orexin neurons, which acknowledges the two names for this peptide given by the original discoverers (de Lecea et al, and Sakurai et al, in 1998). Although this may not be the perfect abbreviation, we have kept it for now, also to be consistent with the large number (>10) of other published studies that recently used this abbreviation.

      The graphs showing Pearson's r values do not demonstrate a very strong correlation between neural activity and movement change; they also lack validation of genetic expression/ablation in some cases. The results would more strongly support the conclusions if statistically significant correlations could be demonstrated between activity and movement.

      We agree that a correlation of ~0.68 is probably not worthy of a “very strong” classification. While there is no universal ruleset for categorizing the strength of a correlation, we have toned down our language throughout the manuscript.

      Comment regarding statistical testing of correlations: we are reluctant to rely on correlation significance tests for large sample sizes (~48’000 photometry and video samples in a 40-minute session). In our case, correlations were always extremely significant (p < 0.0001). The reason is that correlation p-values become “too big to fail” (see Lin et al. 2013) as the sample size grows. We therefore refrain from commenting on these p-values and instead report between- or within-subjects statistical tests, or tests against zero. See four example experiments below.

      Author response image 3.

      Citation: Lin, M., Lucas, H. C., Jr & Shmueli, G. Research Commentary—Too Big to Fail: Large Samples and the p-Value Problem. Information Systems Research 24, 906–917 (2013).
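      The “too big to fail” effect can be made concrete with the standard test statistic for a Pearson correlation, t = r·sqrt((n − 2)/(1 − r²)). The sketch below is illustrative; it uses a normal approximation to the t distribution (reasonable for large n) and assumes independent samples, an assumption that autocorrelated 20 Hz time series do not satisfy, so real p-values would be even less informative. The r value of 0.05 is hypothetical.

```python
import math

def pearson_p_value_large_n(r, n):
    """Two-sided p-value for H0: rho = 0 (normal approximation to the t distribution).

    Assumes n independent samples; consecutive video/photometry frames are not
    independent, so this overstates the evidence for real time series.
    """
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))   # P(|Z| > |t|)

# A weak, hypothetical correlation of r = 0.05:
p_session = pearson_p_value_large_n(0.05, 48000)   # one 40-min session at 20 Hz
p_small = pearson_p_value_large_n(0.05, 100)        # a small sample
```

      With 48’000 samples, even r = 0.05 gives t ≈ 11 and a vanishingly small p-value, whereas the same r with 100 samples is far from significant; the p-value mostly tracks n, not the strength of coupling.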

      The rationale for looking at running speed, general movement, and specific types of nonlocomotor movements could be clarified and explained more thoroughly in the introduction. Why is it important to distinguish between locomotion (represented here with running) and all other movements? Presumably, this is because orexin is known to regulate arousal/locomotion. What evidence is there for orexin's role in other types of movements, which are being grouped together in Figure 1? This could be laid out in more detail in the Introduction. Relatedly, it is not very clear in the text whether the correlation between movement and orexin neuron activity includes movement related to running.

      The main focus of our paper is on movement in general (i.e. video pixel difference, described in Results and Methods). This movement metric includes everything captured by the video; it is agnostic to the type of movement or behaviour. To connect it to some of the specific innate movements/behaviours typically studied in the mouse literature (running, grooming, sniffing, etc.), we also generated the plots in Figure 2. We have attempted to explain this better in the revised section 1 of the Results.
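      A frame-difference metric of this kind can be sketched as the mean absolute change in pixel intensity between consecutive grayscale frames. The exact definition used in the manuscript is given in its Methods; the function below (`movement_metric`, a hypothetical name) is a minimal sketch under that assumption, applied to a synthetic 5-frame video.

```python
import numpy as np

def movement_metric(frames):
    """Per-frame movement as mean absolute pixel change between consecutive frames.

    frames: array of shape (n_frames, height, width) of grayscale intensities.
    Returns an array of length n_frames - 1, agnostic to what moved.
    """
    # Cast to float first: subtracting uint8 frames directly would wrap around.
    frames = np.asarray(frames, dtype=np.float64)
    return np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))

# Synthetic video: static for three frames, then a 2x2 bright patch appears.
video = np.zeros((5, 4, 4), dtype=np.uint8)
video[3, :2, :2] = 200
metric = movement_metric(video)
```

      Because the metric only counts changed pixels, running, grooming and sniffing all contribute to it in proportion to how much of the image they change, which is why it can be compared across behaviours.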

      What exactly is being correlated in Figure 1C (and throughout the rest of the paper?) Is this the average signal correlated with the average movement change over the entire recording time? This could be more explicitly stated in methods/results. The correlations themselves/p-values could be shown in addition to/instead of Pearson's r values. Are the correlations themselves significant? This would strengthen the claim that orexin activity is strongly coupled to the magnitude of body movement change. As another example, in Figure 2D, there are no statistics reported on the correlation between movement metric and average neural signal. In Figure 6G, orexin neuron activity is more strongly correlated with movement than MVe glut neurons, but are either of these correlations significant? The correlation between MVe glut activity and movement overall seems similar to that of orexin neurons, and may be worth noting more explicitly.

      Throughout the paper, we recorded both neural activity (photometry) and movement at 20 Hz. A 40-minute session thus yields, for example, 48’000 samples each of photometry and movement. All samples were used to calculate Pearson’s r between variables. To clarify this, we have now added the label “whole-session” to the relevant figures, as well as a clarification in the Methods.

      Individual experiment correlations for orexin neurons and MVe glut neurons were always significant p<0.0001, even after a Bonferroni multiple comparisons correction was applied to each population. See the “too big to fail” nature of correlation hypothesis testing above.
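      The sample count and the whole-session correlation can be sketched as below: 20 samples/s × 60 s/min × 40 min = 48’000 samples, and Pearson’s r is computed over every sample without averaging or binning. The helper name `whole_session_r` and the synthetic signals are our assumptions, not the authors’ code.

```python
import numpy as np

SAMPLING_RATE_HZ = 20
SESSION_MIN = 40
n_samples = SAMPLING_RATE_HZ * SESSION_MIN * 60   # 48,000 samples per session

def whole_session_r(photometry, movement):
    """Pearson's r over every sample of a session (hypothetical helper)."""
    photometry = np.asarray(photometry, dtype=float)
    movement = np.asarray(movement, dtype=float)
    if photometry.shape != movement.shape:
        raise ValueError("signals must be sample-aligned")
    return float(np.corrcoef(photometry, movement)[0, 1])

# Synthetic, sample-aligned signals for one session (not real data).
rng = np.random.default_rng(0)
movement = rng.gamma(2.0, 1.0, size=n_samples)
photometry = 0.5 * movement + rng.normal(size=n_samples)
r = whole_session_r(photometry, movement)
```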

      It could be made clearer at the end of Figure 2 that orexin neuron activity is tracking the magnitude of movement change (shown in Figure 2D), not that it is encoding different types of movement.

      We intended for original Figure 2E to illustrate this concept, however this panel has caused a great deal of confusion to several readers and was perhaps ill conceived. We have replaced Figure 2E with a new panel more directly addressing the reviewer’s statement. We can construct three models where orexin neuron activity is predicted from the behavioral classification (sometimes called “one-hot” encoding) and/or the movement metric.

      Model 1 predicts orexin neuron activity using only a categorical predictor of behavioral state. Model 2 uses only the movement metric, and model 3 allows a different movement-metric correlation within each behavioral state. We can compare these models using the AIC (Akaike Information Criterion), which is a point estimate; lower AIC indicates a better balance between goodness of fit and model complexity. While the most complex model 3 was the best, model 2 was much closer to model 3 than model 1 was, and model 2 was much better than model 1. From this we conclude that the magnitude of movement change is a more powerful predictor than behavioral state (“type of movement”). This is now Figure 2E.
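      The three-model comparison can be sketched with ordinary least-squares fits and a Gaussian AIC, n·ln(RSS/n) + 2(k + 1), dropping additive constants that cancel between models. This is a minimal illustration on synthetic data, not the authors’ model specification: the one-hot encoding, the per-state intercept-and-slope form of model 3, and all numbers in the simulation are our assumptions.

```python
import numpy as np

def gaussian_aic(design, y):
    """AIC of an OLS fit with Gaussian errors, up to an additive constant.

    k counts regression coefficients; the +1 accounts for the error variance.
    """
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    n, k = design.shape
    return n * np.log(rss / n) + 2 * (k + 1)

def compare_models(state, movement, activity, n_states):
    onehot = np.eye(n_states)[state]                              # one-hot behavioral state
    m1 = onehot                                                   # model 1: state only
    m2 = np.column_stack([np.ones_like(movement), movement])      # model 2: movement only
    m3 = np.column_stack([onehot, onehot * movement[:, None]])    # model 3: per-state intercept and slope
    return {name: gaussian_aic(X, activity)
            for name, X in [("state", m1), ("movement", m2), ("state_x_movement", m3)]}

# Synthetic data in which activity follows movement magnitude, not state identity.
rng = np.random.default_rng(1)
n, n_states = 3000, 4
state = rng.integers(0, n_states, size=n)
movement = np.array([0.2, 0.5, 1.0, 2.0])[state] + rng.gamma(2.0, 0.2, size=n)
activity = 0.8 * movement + 0.3 * rng.normal(size=n)
aics = compare_models(state, movement, activity, n_states)
```

      In this simulated case, the state-only model misses the within-state variation in movement and therefore has a clearly higher AIC than the movement-only model; the ranking of model 3 depends on whether per-state slopes genuinely differ in the data.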

      It would be interesting to see the raw movement metric data as shown in Figures 1 and 2 in the DTR mice to show that ablating orexin neurons does not impair the movement profile seen in Figures 1 and 2.

      The requested visualization has been added to Figure 4B.

      Validation that orexin was selectively ablated in these mice would be ideal.

      Histology (see response to public review) was added to a new Figure 4B.

      Figure 4A - OxLight expression in SNc does not look very robust.

      Please note this is a membrane-targeted indicator; the staining it produces is thus much weaker than that of cytosolic indicators such as the calcium indicator GCaMP.

      Figure 4 - It would be beneficial to see the same correlations that were done in Figures 1 and 2 to show OxLight activity vs. movement metric. Are they correlated?

      Individual traces had significant correlations with OxLight and movement, and the population averages revealed similar trends:

      Author response image 4.

      Figure 6B - Targeting of MVe neurons does not look very specific. The sample size for orexin-targeted mice should be re-stated in the figure legend for clarity.

      Legend has been updated to clarify n = 15 for orexin targeted mice.

      Some citations didn't seem to match what was being referenced in the text. Similarly, in the legend for Figure 1C, the statistics do not match what is reported in the text. In Figure 1, the sample size is not noted in the text. When referring to running in Figure 1, is this referring to running speed? Perhaps the language could be more consistent.

      These typos (due to a rounding error) in the legend and text have been corrected. Sample size has been added to the text, and we have changed Figure 1D to clarify we are referring to running speed. We moved some citations to improve clarity.

      Methods - where were Cre mice obtained from?

      Sources now better referenced in Methods (JAX or Parlato et al).

      Figure 1, panel C: The authors compared Pearson's r-coefficient results for each animal and for each variable. However, it would be interesting to show the correlation curves for each variable as well here. Also, there is mention of a strong correlation but it is unclear whether these correlations are significant.

      See below for an example mouse.

      Author response image 5.

      Figure 3, panel F: I understand HON-DTR is a validated model but a picture of orexin ablation is necessary, including pictures of orexin fiber ablation within the SNc and LC.

      See our reply to the public review above.

      Figure 5, Panel A: Same comment as Figure 1, panel C.

      We have similarly clarified the panel and legend.

      Page 4: The authors mention "Within the 1st and 4th quartile of blood glucose, movement-HON correlations were not significantly different." Please add the figures.

      The requested plot has been added to Figure 6, panel G.

      Reviewer #3 (Public review):

      Summary

      The study presents an investigation into how hypothalamic orexin neurons (HONs) track body movement with high precision. Using techniques including fiber photometry, video-based movement metrics, and empirical mode decomposition (EMD), the authors demonstrate that HONs encode net body movement consistently across a range of behaviors and metabolic states. They compare the ability of HONs to track body movement with that of other subcortical neural populations, distinguishing HON activity from these populations.

      Strengths:

      The study characterizes HON activity as a key indicator of movement and arousal, and this method may have implications for understanding sleep disorders, energy regulation, and brain-body coordination. Overall, I think this is a very interesting story, with novel findings and implications about sensorimotor systems in animals. The manuscript is clearly written and the evidence presented is rigorous. The conclusions are well supported by experimental data with clear statistical analyses.

      We thank the reviewer for their supportive feedback.

      Weaknesses/suggestions:

      There are a couple of issues I think the authors could address to make the paper better and more complete:

      (1) The study primarily focuses on steady-state behaviors. It would be interesting if the authors' current dataset allows analyses of HON dynamics during transitions between behavioral states (e.g., resting to running or grooming to sniffing). This could provide additional insights into how HONs adapt to rapid changes in body movement.

      This is a fantastic idea, and easy to check using our classification CNN. We identified the six most frequent behavioral transitions and plotted them in Figure 2H. HONs show rapid dynamics in activity aligned with behavioral changes.

      These changes are very similar to the movement magnitude along these transitions, which is now also plotted in Figure 2G.

      (2) Given the established role of HONs in arousal and wakefulness, the study could further investigate how movement-related HON dynamics interact with arousal states. For example, does HON encoding of movement differ during sleep versus wakefulness?

      To further investigate how movement encoding interacts with arousal, we now include quantification and analysis of pupil-linked arousal (see new Figure 7). We agree it would be interesting to examine what happens during sleep, especially during REM sleep, when some HONs are thought to be active despite little or no body movement, but this is beyond the scope of the present study.

      (3) Although HON ablation experiments suggest that HONs do not shape movement frequency profiles, it would be more compelling if the authors could investigate whether HONs contribute to specific types of movements (e.g., fine motor vs. gross motor movements) or modulate movement initiation thresholds.

      We performed this analysis using the k-means classifier for small/large movements. Consistent with previous results, we found no significant effect (p = 0.2767) of genotype on the frequency of identified small (fine) or large (gross) movement clusters. This plot has been added to Figure 4E.

      (4) The heterogeneous movement-related orexin dynamics observed in the LC and SNc raise intriguing questions about the circuit-level mechanisms underlying these differences. Optogenetic or chemogenetic manipulation of these projections could validate the functional implications of these dynamics.

      We agree. We now discuss some implications of this in the revised Discussion (paragraph 4). Please note that previous work already demonstrated that orexin action in the SNc can produce locomotion (referenced in the paragraph), though we agree that further work would be valuable.

      Reviewer #3 (Recommendations for the authors):

      Additional feedback:

      (1) Figure 1C: the individual data points are hard to track or see. Consider using a larger marker face to help data visualization. Similar issues can be found in Figures 2C, 2E, 5E, 6C, 6F, and 6G.

      The line thickness and the scatterplot marker size have been increased.

      (2) First Section of Results: the authors claim to use a deep-learning network to automatically classify video recordings into five distinct behaviors. However, several issues need to be addressed here:

      a. In Results, the corresponding sentence lacks a reference to the Methods Section.

      Reference has been added to the text.

      b. In Methods, the description of the CNN model is quite limited and lacks many basic components, including references to published papers, details of model training and characterization (an overall accuracy alone is not enough), and the dataset definition, preparation, and augmentation (if any).

      We have expanded the methods section regarding the CNN model.

      (3) First Section of Results: in the second paragraph, the authors claim that "Overall, these results reveal HON population activity precisely tracks a general degree of body movement across recorded behaviors." This is not accurate. To indicate that HONs activity tracks the general degree of body movement across behavior states, they need to further show that behavioral states with similar levels of movement metrics can be differentiated via HON activities. However, as they showed in Figure 2D, some behaviors with similar values of movement metric do not seem to be easily discerned by HON activity levels.

      We agree with you, and this is also what we originally intended to convey – now reworded for clarity.

      (4) Technical issue: Figures 3B, 3C, 3G, using local regression to plot the solid lines makes them touch negative values, which does not make sense for "power proportion" (this quantity is always non-negative).

      This is a good point. To fix this, we first log-transformed the power metric, then performed a local regression, and used the inverse link function (exponentiation) to transform the model predictions back to %-units for visualization. This has been noted in the methods.
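      The log-scale smoothing step described above can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code: it assumes a simple LOESS variant (tricube weights, local linear fit, no robustness iterations) and strictly positive inputs, and the example values are invented. Because the curve is fitted to log(y) and then exponentiated, no fitted value can fall below zero.

```python
import math


def tricube(u):
    """Tricube kernel weight for a scaled distance u >= 0."""
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def local_linear(xs, ys, x0, frac=0.5):
    """One LOESS-style step: a weighted linear fit around x0, evaluated at x0.

    Uses the nearest ceil(frac * n) points and no robustness iterations.
    """
    n = len(xs)
    k = max(2, math.ceil(frac * n))
    dists = sorted(abs(x - x0) for x in xs)
    # Bandwidth: distance to the k-th nearest point, inflated slightly so that
    # this point still receives a small positive weight.
    h = dists[k - 1] * (1 + 1e-6) + 1e-12
    w = [tricube(abs(x - x0) / h) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


def loess_log_scale(xs, ys_percent, frac=0.5):
    """Smooth log(y), then back-transform with exp so every fitted value is > 0."""
    log_y = [math.log(y) for y in ys_percent]  # assumes all y > 0
    return [math.exp(local_linear(xs, log_y, x0, frac)) for x0 in xs]


# Toy "power proportion" values (%) that approach zero at higher frequencies.
freqs = list(range(1, 11))
power_pct = [40, 20, 8, 3, 1, 0.5, 0.3, 0.2, 0.2, 0.1]
fitted = loess_log_scale(freqs, power_pct)
```

      Smoothing on the raw % scale can produce negative fitted values where the data approach zero; on the log scale, the back-transformed curve is guaranteed to stay positive.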

      (5) Figure 3G: For a better comparison, consider combining the two plots into a single plot.

      The two plots have been merged as shown in Figure 4C.

      (6) Figure 5E: For a better data visualization, the current pair of plots can be consolidated into one single plot where the x-axis is Move and the y-axis is dGlu. In this way, it is easier to understand and the orthogonality as claimed in the manuscript can be more apparent.

      The requested plot has been added as Figure 6F.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This study takes a detailed approach to understanding the effect of menopausal hormone therapy (MHT) on brain aging in females. Neuroimaging data from the UK Biobank are used to explore brain aging and show an unexpected association between current MHT use and poorer brain health outcomes relative to never-users. There is considerable debate about the benefits of MHT, and of estrogens in particular, for brain health, and this analysis illustrates that the effects are certainly not straightforward and require greater consideration.

      Strengths:

      (1) The detailed approach to obtaining important information about MHT use from primary care records. Prior studies have suggested that factors such as estrogen/progestin type, route of administration, duration, and timing of use relative to menopause onset can contribute to whether MHT benefits brain health.

      (2) Consideration of the type of menopause (spontaneous or surgical) in the analysis, as well as sensitivity analyses to rule out the effect being driven by those with clinical conditions.

      (3) The incorporation of the brain age estimate along with hippocampal volume to address brain health.

      (4) The complex data are also well explained and interpretations are reasonable.

      (5) Limitations of the UK Biobank data are acknowledged.

      We thank the reviewer for their time and the positive evaluation of our manuscript.

      Weaknesses:

      (1) Lifestyle factors are listed and the authors acknowledge group differences (at least between current users and never users of MHT). I was not able to find these analyses showing these differences.

      We highlighted and tested for group differences in lifestyle scores, and the results are shown in Tables 1-3 (column "p-value"). As highlighted in the Methods section (page 9): “The lifestyle score was calculated using a published formula (69), and included data on sleep, physical activity, nutrition, smoking, and alcohol consumption (see supplementary Note 3, Table S2)”. In line with Reviewer 1's suggestion to the authors, we have now included an additional table in the supplementary materials (Table S2) testing for group differences in the specific lifestyle factors constituting the lifestyle score. Please find a more detailed response below (Recommendations for the authors, Response to Comment 1).

      (2) The distribution of women who were not menopausal was unequal across groups, and while the authors acknowledge this, one wonders to what extent this explains the observed findings.

      We agree with the reviewer that the unequal distribution of women across groups can influence the observed findings. We have made minor edits to highlight this important topic more explicitly in the discussion:

      Discussion (page 21): “Current MHT users were significantly younger than past- and never-users, and around 67% were menopausal relative to over 80% in the past- and never-user groups. The unequal distribution of age and menopausal status across groups may have influenced the observed findings. For instance, a larger proportion of the current users might be in the perimenopausal phase, which is often associated with debilitating neurological and vasomotor symptoms (1). MHT is commonly prescribed to minimize such symptoms. Although MHT initiation during perimenopause has been associated with improved memory and hippocampal function, as well as lower AD risk later in life (15), the need for MHT might in itself be an indicator of neurological changes (71); here potentially reflected in higher BAG and lower hippocampal volumes. After the transition to menopause, symptoms might subside and some perimenopausal brain changes might revert or stabilize in the postmenopausal phase (5). Although the UK Biobank lacks detailed information on menopausal symptoms and perimenopausal staging, our results might be capturing subtle disturbances during perimenopause that later stabilize. This could explain why the largely postmenopausal groups of past MHT users and never-users present with lower GM and WM BAG than the current user group. Considering the critical window hypothesis emphasizing perimenopause as a key phase for MHT action (29,43), future longitudinal studies are crucial to clarify the interplay between neurological changes and MHT use across the menopause transition.”

      Discussion (page 25): “In addition, previous studies highlight that UK Biobank participants are considered healthier than the general population based on several lifestyle and health-related factors (89, 90). This healthy volunteer bias increases with age, likely resulting in a disproportionate number of healthier older adults. Together with the imbalance in age distributions across groups, this might explain the less apparent brain aging in the older MHT user groups. We have previously highlighted that age is negatively associated with the number of APOE ε4 carriers in the UK Biobank (21), which is indicative of survivor bias.”

      (3) While the interpretations are reasonable, and relevant theories (healthy cell & critical window) are mentioned, the discussion is missing a more zoomed-out perspective of the findings. While I appreciate wanting to limit speculation, the reader is left having to synthesize a lot of complex details on their own. A particularly difficult finding to reconcile is under what conditions these women benefit from MHT and when do they not (and why that may be).

      We thank the reviewer for this comment. As the presented data is cross-sectional and does not enable causal inference, we have refrained from a more zoomed-out interpretation of the results to avoid undue speculations. However, where applicable, we have discussed our findings in a broader context such as the effects of MHT use on the brain across the menopausal transition (discussion page 21) and the effects of MHT use on the brain in the presence and absence of bilateral oophorectomy and/or hysterectomy (discussion page 25).

      To best inform the reader about the scope of our paper, we would like to highlight the following sentences in our discussion (page 24):

      “The current work represents the most comprehensive study of detailed MHT data, APOE ε4 genotype, and several brain measures in a large population-based cohort to date. Overall, our findings do not unequivocally support general neuroprotective effects of MHT, nor do they indicate severe adverse effects of MHT use on the female brain. The results suggest subtle yet complex relationships between MHT and brain health, highlighting the necessity for a personalized approach to MHT use. Importantly, our analyses provide a broad view of population-based associations and are not designed to guide individual-level decisions regarding the benefits versus risks of MHT use.”

      And the conclusion (page 25): “In conclusion, our findings suggest that associations between MHT use and female brain health might vary depending on duration of use and past surgical history. Although the effect sizes were generally modest, future longitudinal studies and RCTs, particularly focused on the perimenopausal transition window, are warranted to fully understand how MHT use influences female brain health. Importantly, considering risks and benefits, decisions regarding MHT use should be made within the clinical context unique to each individual.”

      Reviewer #1 (Recommendations for the authors):

      Can the authors provide:

      (1) More information about which aspects of lifestyle factors were different between the groups, and how these factors may have contributed to the observed findings (if possible, without burying this information in the supplemental)?

      We thank the reviewer for this suggestion. We have now added a table comparing the lifestyle factors contained in the lifestyle score by MHT user status, using t-tests (continuous variables) or χ² tests (categorical variables) (see Table S2). The results are referred to in the Results section of the main manuscript under “Sample characteristics”, and the table (Table S2) is provided in the supplementary materials so as not to overburden the main text, in line with input from Reviewer 3.
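      For a binary lifestyle factor compared across the three MHT user groups, the χ² test mentioned above reduces to a Pearson statistic on a 3 × 2 table of counts. The sketch below is illustrative only: the counts are invented (not study data), the t-tests used for continuous variables are not shown, and the closed-form p-value applies only to the 2-degree-of-freedom case; other tables would need a general χ² survival function (for example `scipy.stats.chi2.sf`).

```python
import math


def chi2_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c count table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of group and factor
            expected = row_tot[i] * col_tot[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df


def chi2_sf_df2(x):
    """Upper-tail p-value of a chi-squared statistic with exactly 2 degrees of freedom."""
    return math.exp(-x / 2)


# Hypothetical counts: rows = never, past, current users;
# columns = factor present, factor absent.
counts = [[10, 90], [20, 80], [30, 70]]
stat, df = chi2_independence(counts)
p_value = chi2_sf_df2(stat) if df == 2 else None
```

      With these toy counts the statistic is 12.5 on 2 degrees of freedom.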

      We updated the main text to refer to Table S2 and updated supplementary Note 3 (pages 2-3) to include the results of the comparison of the lifestyle factors contained in the lifestyle score by MHT user status.

      Methods, page 9: “The lifestyle score was calculated using a published formula (69), and included data on sleep, physical activity, nutrition, smoking, and alcohol consumption (see supplementary Note 3, Table S2).”

      Results, page 13: “Sample demographics including lifestyle score, stratified by MHT user group, surgical history among MHT users, and estrogen only MHT or combined MHT use, are summarized in Table 1, 2 and 3, respectively. MHT user group differences for each lifestyle factor contained in the lifestyle score are shown in Table S2.”

      “Note 3| Lifestyle Score

      The lifestyle score was calculated based on sleep duration, time spent watching television, current and past smoking status, alcohol consumption frequency, physical activity level (number of days per week of moderate/vigorous activity for at least 10 minutes), intake of fruits and vegetables, and intake of oily fish, beef, lamb/mutton, pork and processed meat (for details see (10)). Each unhealthy lifestyle factor was scored with 1 point (e.g., smoking), and participants’ points were summed to generate an unweighted score (from 0-9): the higher the lifestyle score, the unhealthier the participant’s lifestyle.

      A comparison of the lifestyle factors contained in the lifestyle score by MHT user status is presented in Table S2. In summary, we found that current MHT users were more often smokers than never-users, had a higher alcohol intake than never- and past MHT users, reported the lowest fruit and vegetable intake relative to never-users and past MHT users, and reported lower moderate activity levels relative to past MHT users. Past MHT users reported higher alcohol intake than never-users, spent more time watching TV relative to never- and current-users, consumed more beef, pork, lamb/mutton, and processed meat than never-users, and reported lower vigorous activity levels relative to never-users. However, oily fish intake and fruit and vegetable intake were higher among past MHT users relative to never- and current-users. Self-reported sleep duration did not differ between MHT user groups.”
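      The scoring rule in Note 3 amounts to counting unhealthy flags. A minimal sketch follows; the factor names are hypothetical, the cut-offs that define "unhealthy" for each factor come from the published formula and are deliberately not encoded, and treating beef, lamb/mutton and pork as a single red-meat item is an assumption made to arrive at nine items.

```python
# Hypothetical factor names; grouping beef, lamb/mutton and pork into one
# "red_meat" item is an assumption made to reach the nine items of a 0-9 score.
FACTORS = (
    "sleep_duration",
    "tv_time",
    "smoking",
    "alcohol",
    "physical_activity",
    "fruit_vegetables",
    "oily_fish",
    "red_meat",
    "processed_meat",
)


def lifestyle_score(unhealthy):
    """Unweighted score: one point per factor flagged as unhealthy (range 0-9).

    `unhealthy` maps each factor name to True/False; the thresholds that decide
    whether a factor counts as unhealthy are NOT encoded here.
    """
    missing = [f for f in FACTORS if f not in unhealthy]
    if missing:
        raise ValueError(f"missing factors: {missing}")
    return sum(1 for f in FACTORS if unhealthy[f])


example = {f: False for f in FACTORS}
example["smoking"] = True
example["alcohol"] = True
score = lifestyle_score(example)  # two unhealthy factors -> 2
```

      Because every factor carries the same weight, a higher score simply means more unhealthy factors, regardless of which ones.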

      (2) A greater description of the two main theories of MHT effects on the brain (healthy cell vs. critical window). Can the authors also provide a more thorough explanation of how the findings fit with these theories?

      We thank the reviewer for this comment. We have described our findings in the context of the critical window hypothesis (discussion, page 21, paragraph 2), the healthy cell bias hypothesis (discussion, page 22, paragraph 3), and healthy user bias hypothesis (discussion, page 22, paragraph 4). We refrained from a more thorough explanation to avoid undue speculations.

      (3) Reflect more on what the findings may indicate as to who benefits from MHT, and why. There are some references that the authors may want to add, particularly related to recent findings on premenopausal bilateral oophorectomy that also speak to when (and for whom) MHT use might be beneficial.

      We thank the reviewer for this feedback. We have included additional references in the revised manuscript as follows:

      Discussion, page 23: “It is also possible that the timing between MHT use and surgery is more tightly controlled and therefore more beneficial for brain aging (43). For instance, studies suggest that MHT may mitigate the potential long-term adverse effects of bilateral oophorectomy before natural menopause on bone mineral density as well as cardiovascular, cognitive and mental health (79-81). In addition, a 2024 UK Biobank study found that ever used MHT was associated with decreased odds of Alzheimer’s disease in women with bilateral oophorectomy (82).”  

      (79) Blumel JE, Arteaga E, Vallejo MS, et al. Association of bilateral oophorectomy and menopause hormone therapy with mild cognitive impairment: the REDLINC X study. Climacteric 2022;25:195-202.

      (80) Kaunitz AM, Kapoor E, Faubion S. Treatment of Women After Bilateral Salpingo-oophorectomy Performed Prior to Natural Menopause. JAMA 2021;326:1429-1430.

      (81) Stuursma A, Lanjouw L, Idema DL, de Bock GH, Mourits MJE. Surgical Menopause and Bilateral Oophorectomy: Effect of Estrogen-Progesterone and Testosterone Replacement Therapy on Psychological Well-being and Sexual Functioning; A Systematic Literature Review. J Sex Med 2022;19:1778-1789.

      (82) Calvo N, McFall GP, Ramana S, et al. Associated risk and resilience factors of Alzheimer's disease in women with early bilateral oophorectomy: Data from the UK Biobank. J Alzheimers Dis 2024;102:119-128.

      Reviewer #2 (Public review):

      Summary:

      In this observational study, Barth et al. investigated the association between menopausal hormone therapy (mHT) and brain health in middle- to older-aged women from the UK Biobank. The study evaluated detailed mHT data (never, current, or past user), duration of mHT use (age first/last used), history of hysterectomy with or without bilateral oophorectomy, APOE ε4 genotype, and brain characteristics in a large, population-based sample. The researchers found that current mHT use (compared to never-users), but not past use, was associated with a modest increase in gray and white matter brain age gap (GM and WM BAG) and a decrease in hippocampal volumes. No significant association was found between the age of mHT initiation and brain measures among mHT users. Longer duration of use and older age at last mHT use post-menopause were associated with higher GM and WM BAG, larger WMH volumes, and smaller hippocampal volumes. In a sub-sample, after adjusting for multiple comparisons, no significant associations were found between detailed mHT variables (formulations, route of administration, dosage) and brain measures. The association between mHT variables and brain measures was not influenced by APOE ε4 allele carrier status. Women with a history of hysterectomy with or without bilateral oophorectomy had lower GM BAG compared to those without such a history. Overall, these observational data suggest that the association between mHT use and brain health in women may vary depending on the duration of use and surgical history.

      Strengths:

      (1) The study has several strengths, including a large, population-based sample of women in the UK, and comprehensive details of demographic variables such as menopausal status, history of oophorectomy/hysterectomy, genetic risk factors for Alzheimer's disease (APOE ε4 status), age at mHT initiation, age at last use, duration of mHT, and brain imaging data (hippocampus and WMH volume).

      (2) In a sub-sample, the study accessed detailed mHT prescription data (formulations, route of administration, dosage, duration), allowing the researchers to study how these variables were associated with brain health outcomes. This level of detail is generally missing in observational studies investigating the association of mHT use with brain health.

      We thank the reviewer for their time and the positive evaluation of our manuscript.

      Weaknesses:

      (1) While the study has many strengths, it also has some weaknesses. As highlighted in an editorial by Kantarci & Manson (2023), women with symptoms such as subjective cognitive problems, sleep disturbances, and elevated vasomotor symptoms combined with sleep disturbances tend to seek mHT more frequently than those without these symptoms. The authors of this study have also indicated that the need for mHT, which might be associated with these symptoms, may itself indicate preexisting neurological changes, potentially reflected in worse brain health scores, including higher BAG, lower hippocampal volume, and/or higher WMH volume. However, the study could not report how many of the current users had these symptoms. Women with vasomotor symptoms who are using mHT are more likely to stay longer in the healthcare system than those without these symptoms and no mHT use history. The authors noted that the UK Biobank lacks detailed information on menopausal symptoms and perimenopausal staging, limiting the study's ability to understand how these variables influence outcomes.

      We thank the reviewer for the succinct synopsis of the limitations highlighted in the discussion, page 21. We have now also added the mentioned reference, the 2023 editorial by Kantarci & Manson, to the discussion (see reference 71).

      Discussion (page 21): “Current MHT users were significantly younger than past- and never-users, and around 67% were menopausal relative to over 80% in the past- and never-user groups. The unequal distribution of age and menopausal status across groups may have influenced the observed findings. For instance, a larger proportion of the current users might be in the perimenopausal phase, which is often associated with debilitating neurological and vasomotor symptoms (1). MHT is commonly prescribed to minimize such symptoms. Although MHT initiation during perimenopause has been associated with improved memory and hippocampal function, as well as lower AD risk later in life (15), the need for MHT might in itself be an indicator of neurological changes (71); here potentially reflected in higher BAG and lower hippocampal volumes. After the transition to menopause, symptoms might subside and some perimenopausal brain changes might revert or stabilize in the postmenopausal phase (5). Although the UK Biobank lacks detailed information on menopausal symptoms and perimenopausal staging, our results might be capturing subtle disturbances during perimenopause that later stabilize. This could explain why the largely postmenopausal groups of past MHT users and never-users present with lower GM and WM BAG than the current user group. Considering the critical window hypothesis emphasizing perimenopause as a key phase for MHT action (29,43), future longitudinal studies are crucial to clarify the interplay between neurological changes and MHT use across the menopause transition.”

      (2) Earlier observational studies have reported conflicting results regarding the association between mHT use and dementia risk and brain health. Contrary to some observational studies, three randomized trials (WHI, KEEPS, ELITE) (Espeland et al., 2013; Gleason et al., 2015; Henderson et al., 2016) demonstrated neither beneficial nor harmful effects of mHT (with varying doses and formulations) when initiated closer to menopause (<5 years). While strong efforts were made to run proper statistical analyses to investigate the association between mHT use and brain health, these results mainly reflect associations, not causal relationships, as also stated by the authors.

      We thank the reviewer for pointing that out.

      (3) Furthermore, observational studies have intrinsic limitations, such as a lack of control over switching mHT doses and formulations, a lack of laboratory measures to confirm mHT use, and reliance on self-reported data, which may not always be reliable. The authors caution that these findings should not guide individual-level decisions regarding the benefits versus risks of mHT use. However, the study raises new questions that should be addressed by randomized clinical trials to investigate the varying effects of MHT on brain health and dementia risk.

      We thank the reviewer for acknowledging the disclaimers we have provided in the discussion.

      Reviewer #2 (Recommendations for the authors):

      (1) The study could benefit from extending these findings by adding plasma biomarkers of AD and PET imaging markers to further study the association of mHT variables with brain health.

      We agree with the reviewer that such markers would be beneficial for elucidating the association between MHT variables and brain health. Unfortunately, these markers are not readily available in the UK Biobank.

      (2) The study's reliance on a predominantly white cohort limits the generalizability of the findings to more diverse populations. This homogeneity may not capture the full spectrum of responses to MHT across different ethnic and genetic backgrounds.

      We fully agree with the reviewer's statement and note this limitation in the discussion (page 25) as follows:

      “In addition to these inherent biases in aging cohorts, the ethnic background of the sample is homogeneous (> 96% white), further reducing the generalizability of the results.”

      (3) The study may benefit by editing the following information in the introduction: "In summary, WHIMS, HERS, and KEEPS mainly relied on orally administered CEE in older-aged or recently postmenopausal females." KEEPS used two routes and formulations (transdermal estradiol and oCEE, both with micronized progesterone).

      We thank the reviewer for catching this oversight. We removed the sentence to avoid ambiguity and revised the sentence specifically referring to the KEEPS study as follows:

      Introduction, page 3: “In contrast, administering oral CEE or transdermal estradiol plus micronized progesterone in recently postmenopausal females did not alter cognition in the Kronos Early Estrogen Prevention Study (KEEPS) (28).”

      (4) The study may benefit by editing the following statement in the introduction: "oral CEE use in combination with MPA seems to increase the risk for AD regardless of timing". I would suggest revising this statement, which is based on review article 29. The statement of an adverse effect of oCEE regardless of the timing of initiation contradicts earlier randomized clinical findings. I think it is important to make a distinction between the outcomes of randomized controlled trials and observational studies. WHIMS (Shumaker et al., 2003), a randomized controlled trial, reported an increased risk of dementia with oCEE + MPA compared to placebo in women who were more than 10 years from the onset of menopause when therapy was initiated. Two other long-duration randomized trials, which tested the effect of oral oestrogen and progesterone treatment on cognitive function in women who started treatment shortly after menopause (within 3 or 6 years), did not find evidence that treatment benefits or harms cognitive function compared with placebo (Gleason et al., 2015; Henderson et al., 2016). A short-term (4-month) randomized trial (Maki et al., 2007; mentioned in ref 29) reported a potential negative effect of CEE/MPA on verbal memory in women who started HT shortly after menopause (within 3 years). That study did not investigate the risk of dementia, and HT use was short-term.

      We thank the reviewer for this detailed input. After checking the provided references, we rephrased the sentence as follows:

      Introduction, page 4: “Although emerging evidence supports this hypothesis (30, 31), oral CEE use in combination with MPA has been found to increase the risk for memory decline regardless of timing (26, 29, 32).”

      We believe this formulation is more in line with the evidence provided by Shumaker et al. 2003, Maki et al. 2007 and the other references provided in the review paper by Maki and colleagues (mentioned in ref. 29). The reviewer further refers to Gleason et al. 2015 and Henderson et al. 2016, however both RCTs use micronized progesterone, not MPA, thereby not supporting the statement.

      (26) Shumaker SA, Legault C, Rapp SR, et al. Estrogen plus progestin and the incidence of dementia and mild cognitive impairment in postmenopausal women: the Women's Health Initiative Memory Study: a randomized controlled trial. JAMA 2003;289:2651-2662.

      (29) Maki PM. Critical window hypothesis of hormone therapy and cognition: a scientific update on clinical studies. Menopause 2013;20:695-709.

      (32) Maki PM, Gast MJ, Vieweg AJ, Burriss SW, Yaffe K. Hormone therapy in menopausal women with cognitive complaints: a randomized, double-blind trial. Neurology 2007;69:1322-1330.

      Reviewer #3 (Public review):

      In this study, Barth et al. present results of detailed analyses of the relationships between menopausal hormone therapy (MHT), APOE ε4 genotype, and measures of anatomical brain age in women in the UK Biobank. While past studies have investigated the links between some of these variables (including works by the authors themselves), this new study adds more detailed MHT variables, surgical status, and additional brain aging measures. The UK Biobank sample is large, but it is a population cohort and many of the MHT measures are self-reported (as the authors point out). Nevertheless, the authors present a solid analysis of the available information, which shows associations of MHT user status, length of MHT use, and surgical status with brain age. However, as the authors themselves state, the results do not unequivocally support a neuroprotective or adverse effect of MHT on the brain. I think this work strengthens the case for better-designed longitudinal studies investigating the effect of MHT on the brain in the peri/post-menopausal stage.

      Strengths:

      (1) The authors addressed the statistical analyses rigorously. For example, multiple testing corrections, outlier removal, and sensitivity analysis were performed carefully. Ample background information is provided in the introduction allowing even individuals not familiar with the field to understand the motivation behind the work. The discussion section also does a great job of addressing open questions and limitations. Very detailed results of all statistical tests are provided either in the main text or in the supplementary information.

      We thank the reviewer for their time and the positive evaluation of our manuscript.

      Weaknesses:

      (1) For me, the biggest weakness was the presentation of the results. As many variables are involved and past studies have investigated several of these questions, it would have helped to better clarify the analysis and questions that are addressed by this study in particular and what sets this work apart from past studies. The information is present in the manuscript but better organization might have helped. For example, a figure depicting the key questions near the beginning of the manuscript would have been very helpful for me. The Tables also contain a lot of information but I wonder if there might be a way to capture the most relevant information more succinctly (either in Table format or in a figure) for the main text.

      We thank the reviewer for this comment. We agree that with the large number of analyses it can be hard to keep an overview. We have now added a figure summarizing the main and sensitivity analyses by sample.

      (2) Another concern I had was with the linear models investigating the effects of these MHT variables on the brain age gap. The authors have included "age" as one of the parameters in this analysis. I wonder if adding a quadratic age term (age²) to the model might have improved the fit, since many brain phenotypes show quadratic age effects in the 40 to 80-year age range.

      We thank the reviewer for this suggestion. We reran the main analysis in the whole sample (model 1) with age squared as an additional covariate and compared the gray matter brain age gap model fits using the corrected Akaike Information Criterion (AICc). All models with age squared had a better model fit than models without age squared (see Author response table 1). Hence, in the revised manuscript, we added a sensitivity analysis rerunning model 1 with age squared to account for potential non-linear effects of age. The results were largely consistent. The manuscript was revised as follows to reflect the added analysis:

      Sensitivity analysis (Methods, Page 11): “To test whether the results were influenced by the inclusion of participants with ICD-10 diagnosis or by non-linear effects of age, the main analyses (models 1-2) were re-run excluding the sub-sample with diagnosed brain disorders (see supplementary Note 2) or adding age² as an additional covariate, respectively.”

      Sensitivity analysis (Results, Page 20): “The results were consistent after removing participants with ICD-10 diagnoses known to impact the brain (see Table S9 for model 1 analyses and Table S10 for model 2 analyses), after additionally adjusting for age² (see Table S11), and after removing extreme values (see Table S12 for model 1 analyses).”

      Author response table 1.

      Gray matter brain age gap model selection based on corrected Akaike Information Criterion (AICc)

      Abbreviations and explanations of parameters: MHT = menopausal hormone therapy, K = number of estimated parameters for each model, AICc = the information criterion requested for each model, ΔAICc = the appropriate delta AIC component depending on the information criterion selected, ModelLik = the relative likelihood of the model given the data, AICcWT = Akaike weights indicating the level of support in favor of any given model being the most parsimonious among the candidate model set, LL = log-likelihood of each model.
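      A minimal sketch of the model comparison described above, assuming ordinary least-squares models with Gaussian errors: it fits a model with age only and one with age plus age², then computes the columns of Author response table 1 (K, AICc, ΔAICc, ModelLik, AICcWT, LL) from their standard definitions. The simulated brain age gap, the sample size, and the function names (`fit_ols`, `aicc`, `aicc_table`) are hypothetical; this does not reproduce the authors' UK Biobank analysis or its covariates.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit; returns (K, LL) with K = coefficients + residual variance."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    n = len(y)
    rss = float(np.sum((y - X @ beta) ** 2))
    # Maximized Gaussian log-likelihood, using the ML variance estimate RSS / n
    ll = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1.0)
    return X.shape[1] + 1, float(ll)


def aicc(k, ll, n):
    """AIC with the small-sample correction: AIC + 2K(K+1) / (n - K - 1)."""
    return -2.0 * ll + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)


def aicc_table(models, n):
    """models maps name -> (K, LL); returns rows sorted from best to worst AICc."""
    scores = {name: aicc(k, ll, n) for name, (k, ll) in models.items()}
    best = min(scores.values())
    # Relative likelihood exp(-delta / 2), normalized into Akaike weights
    lik = {name: float(np.exp(-0.5 * (s - best))) for name, s in scores.items()}
    total = sum(lik.values())
    rows = []
    for name in sorted(scores, key=scores.get):
        k, ll = models[name]
        rows.append({"model": name, "K": k, "AICc": scores[name],
                     "dAICc": scores[name] - best, "ModelLik": lik[name],
                     "AICcWT": lik[name] / total, "LL": ll})
    return rows


# Hypothetical data: brain age gap with a built-in U-shaped age effect plus noise
rng = np.random.default_rng(0)
n = 2000
age = rng.uniform(45, 80, n)
bag = 0.02 * (age - 62) ** 2 - 2.0 + rng.normal(0, 3, n)

X_lin = np.column_stack([np.ones(n), age])
X_quad = np.column_stack([np.ones(n), age, age ** 2])
models = {"age": fit_ols(X_lin, bag), "age + age^2": fit_ols(X_quad, bag)}

for row in aicc_table(models, n):
    print(f"{row['model']:<12} K={row['K']} AICc={row['AICc']:.1f} "
          f"dAICc={row['dAICc']:.1f} AICcWT={row['AICcWT']:.3f} LL={row['LL']:.1f}")
```

      Here K counts the regression coefficients plus the residual variance, which matches how R's `logLik` counts degrees of freedom for `lm` fits. The best-supported model has ΔAICc = 0, and the Akaike weights sum to one across the candidate set.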

      Reviewer #3 (Recommendations for the authors):

      (1) Please note typo in Figures 2 and 3 legend "GM WM".

      We thank the reviewer for catching this typo and we changed it to BAG GM and BAG WM for all Figures for consistency.

    Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors demonstrated that NINJ1 promotes TF-positive MV release during pyroptosis and thereby triggers coagulation. Coagulation is one of the risk factors that can cause secondary complications in various inflammatory diseases, making it a highly important therapeutic target in clinical treatment. This paper effectively explains the connection between pyroptosis and MV release with Ninj1, which is a significant strength. It provides valuable insight into the potential of targeting Ninj1 as a therapeutic strategy.

      Although the advances in this paper are valuable, several aspects need to be clarified. Some comments are discussed below. 

      (1) Since it is not Ninj1 itself that regulates coagulation but rather the MVs released in a Ninj1-dependent manner, the title should reflect this. The current title makes it seem as if Ninj1 directly regulates inflammation and coagulation. It would be better to revise the title.

      Thanks for the thoughtful comments. We show that the release of procoagulant MVs by plasma membrane rupture (PMR) is a critical step in the activation of coagulation. In addition, the release of cytokines and danger molecules by PMR may also contribute to coagulation. In choosing the title, we are trying to emphasize NINJ1-dependent PMR as a common trigger for these biological processes.

      (2) Ninj1 is known to be an induced protein that is barely expressed in normal conditions. As shown in Fig 1G, control samples showed no detection of Ninj1. However, in Figure S1, all tissues (liver, lung, kidney, and spleen) expressed Ninj1 protein. If the authors stimulated the mice with Fla injection, this should be mentioned in the figure legend.

      We respectfully disagree with the comment that “Ninj1 is known to be an induced protein that is barely expressed in normal conditions”. NINJ1 protein is abundantly expressed (without induction) in tissues including liver, lung, kidney, and spleen, as shown in Fig S1. Consistently, other groups have shown abundant NINJ1 expression at baseline in tissues and cells such as liver (Kayagaki et al. Nature 2023) and BMDMs (Kayagaki et al. Nature 2021; Borges et al. eLife 2023). Fig 1G shows fibrin deposition as an indicator of coagulation, not NINJ1 protein.

      (3) In Fig 3A, Ninj1 protein in the Ninj1+/- BMDM cell lysate was higher in the unstimulated control than after Fla stimulation. However, in MVs, Ninj1 was not detected at all in the +/- control and was observed only after Fla stimulation. The authors need to provide an explanation for this observation. Additionally, in the MV β-actin lane, the band intensities appear to differ considerably between groups. It seems necessary to equalize the protein amounts; if that is difficult, they should be equalized at least between the +/+ and +/- controls.

      Thanks for the valuable comments. In Fla-stimulated Ninj1+/- BMDMs, most of the NINJ1 is released in MVs and is therefore not found in the cell lysate, as shown in Fig 3A. The difference in β-actin band intensity correlates with the MV numbers shown in Fig 3B. We ensured consistency by using the same number of cells.

      (4) Since the authors focused on Ninj1-dependent microvesicle (MV) release, they need to show MV characterization (EM, NTA, Western blotting for MV markers, etc.).

      Thanks for the suggestion. We have now added NTA of MVs from BMDMs in Fig S4C.

      (5) To clarify whether Ninj1-dependent MV induces coagulation, the authors need to determine whether platelet aggregation is reduced with isolated +/- MVs compared to +/+ MVs. 

      Thanks for the suggestion. We agree that platelet aggregation is closely linked to blood coagulation but would argue that one does not directly cause the other. While it would be interesting to examine whether MVs induce platelet aggregation, we hope the reviewer would agree that the outcome of this experiment would neither significantly support nor challenge our statement that NINJ1-dependent PMR promotes coagulation.

      (6) Even with the authors' well-established experiments in heterozygous mice, the reliance on heterozygous mice is a critical limitation of this paper. To improve the quality of this paper, the authors should consider confirming the findings in mouse macrophage cell lines, for example by generating Ninj1-/- RAW 264.7 cells, to examine the homozygous effect.

      Thanks for the valuable comments. We acknowledge the limitation of using heterozygous mice in this study. However, our data using primary macrophages provide strong evidence supporting the role of NINJ1-dependent plasma membrane rupture in blood coagulation.

      (7) A 2023 paper (Zhou, X. et al., NINJ1 Regulates Platelet Activation and PANoptosis in Septic Disseminated Intravascular Coagulation. Int. J. Mol. Sci. 2023) revealed a relationship between Ninj1 and coagulation. According to this paper, inhibition of Ninj1 in platelets prevents pyroptosis, leading to reduced platelet activation and, consequently, suppression of thrombosis. Is platelet activation altered in Ninj1+/- mice? The authors should cite this paper and discuss platelet function in their mice.

      Thanks for the valuable comments. In this manuscript, we examined prothrombin time (PT), plasma TAT, and tissue fibrin deposition as direct evidence of blood coagulation. We acknowledge that platelets play a key role in thrombosis; however, we hope the reviewer would agree that tissue factor-induced blood coagulation and platelet aggregation are linked yet distinct processes. Therefore, the role of NINJ1 in platelet aggregation falls beyond the scope of this manuscript.


      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Referring to previous research findings, the authors explain the connection between NINJ1 and MVs. Additional experiments and clarifications will strengthen the conclusions of this study.

      Below are some comments I feel could strengthen the manuscript: 

      (1) The authors mentioned their choice of using heterozygous NINJ1+/- mice on page 4, because of lethality and hydrocephalus. Nonetheless, there is a substantial number of references that use homozygous NINJ1-/- mice. Could there be any other specific reasons for using heterozygous mice in this study? 

      Thanks for the thoughtful comments. We are aware that a few homozygous NINJ1-/- mouse strains were used in several publications by different groups, including Drs. Kayagaki and Dixit (Genentech), from whom we obtained the heterozygous NINJ1+/- breeders. We do not have experience with the homozygous NINJ1-/- mice used by other groups. It is reasonable to assume that homozygous NINJ1-/- mice, if healthy, would have even stronger protection against coagulopathy than heterozygous NINJ1+/- mice. The only reason for not using homozygous mice in this study is that a majority of our homozygous NINJ1-/- mice develop hydrocephalus around weaning, and these mice must be euthanized under the rules of our DLAR facility. Although our homozygous NINJ1-/- mice develop hydrocephalus (as reported by Drs. Kayagaki and Dixit, PMID: 37196676, PMCID: PMC10307625), heterozygous NINJ1+/- mice remain healthy.

      (2) Figure S2 clearly shows the method of pyroptosis induction by flagellin. It is also necessary as a prerequisite for this paper to show the changes in flagellin-induced pyroptosis in heterozygous NINJ1+/- mice.

      Thanks for the valuable suggestions. We agree that a plasma LDH measurement as an indicator of pyroptosis in vivo would add to the manuscript. Therefore, we made several attempts to measure plasma LDH in flagellin-challenged WT and NINJ1+/- mice using the CytoTox 96 Non-Radioactive Cytotoxicity Assay (a Promega kit commonly used for LDH, Promega #G1780). Flagellin-challenged WT and NINJ1+/- mice develop hemolysis, which turns the plasma red. Because the plasma coloring interferes with the assay, we could not obtain a meaningful reading for an accurate comparison. We also tried the LDH-Glo Cytotoxicity Assay (luciferase-based, Promega #J2380) on both plasma and serum, without success. We hope the reviewer would agree that the reduced plasma MV count (Fig 3C) can serve as an alternative indicator of reduced pyroptosis.

      (3) IL-1β levels controlled by GSDMD were not affected by NINJ1 expression according to previous studies (Ref 37, 29, Nature volume 618, pages 1065-1071 (2023)). GSDMD also plays an important role in TF release in pyroptosis. Are GSDMD levels not altered in heterozygous NINJ1+/- mice?

      Thanks for raising these great points. It has been reported that IL-1β secretion into the cell culture supernatant was not affected by NINJ1 deficiency or inhibition when BMDMs were stimulated with LPS (Ref 29, 37, now Ref 29, 35) or nigericin (Ref 29). As the GSDMD pore has been shown to facilitate the release of mature IL-1β, these in vitro observations are reasonable given that NINJ1-mediated PMR occurs later than GSDMD pore formation. However, we observed that plasma IL-1β (as well as TNFα and IL-6) was significantly lower in Ninj1+/- mice. A few differences in experimental conditions might contribute to the discrepancy: first, there was no priming in our in vivo experiment, whereas BMDMs were primed in both in vitro studies before stimulation with LPS or nigericin; second, the flagellin used in our study engages a different inflammasome than either LPS or nigericin. Priming might change the expression and dynamics of IL-1β. More importantly, there might be unrecognized mechanisms of IL-1β secretion in vivo. We have now added a discussion of this in the main text.

      We examined GSDMD protein levels in liver, lung, kidney, and spleen from WT and NINJ1+/- mice by Western blotting. The data are now presented in the updated Fig S1; we did not observe an apparent difference in GSDMD expression between the two genotypes.

      (4) In Fig 1F, the authors used a fibrin-specific monoclonal antibody to stain fibrin, but the staining is not clearly defined. There may be a problem with the quality of the antibody or a technical issue. Considering this, exploring alternative methods to visualize fibrin might be beneficial. Fibrin is an acidophilic material, so H&E staining or Movat's pentachrome staining might help identify fibrin areas.

      Thanks for the valuable suggestions. The fibrin-specific monoclonal antibody in our study is mouse anti-fibrin monoclonal antibody (59D8). This antibody has been shown to bind to fibrin even in the presence of human fibrinogen at the concentration found in plasma [Hui et al. (1983). Science. 222 (4628); 1129-1132]. We apologize that we did not cite the reference in our initial submission. We obtained this antibody from Dr. Hartmut Weiler at Medical College of Wisconsin and Dr. Rodney M. Camire at the University of Pennsylvania, who were acknowledged in our initial submission.

      We performed H&E staining on serial sections of the same tissues shown in Figure 1F. The data are now presented as Fig S3.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors' main goal is to understand the mechanism by which pyroptosis (through the formation of Gasdermin D (GSDMD) pores in the plasma membrane) contributes to increased release of procoagulant Tissue Factor-containing microvesicles (MVs). Their previous data demonstrate that GSDMD is critical for the release of MVs that contain Tissue Factor (TF), thus linking pyroptosis to hypercoagulation. Given the recent identification of NINJ1 as the mediator of plasma membrane rupture (Kayagaki et al. Nature 2021), the authors wanted to determine whether NINJ1 is responsible for TF-containing MV release. Because the constitutive ninj1 KO mouse shows partial embryonic lethality, the authors decided to use a heterozygous ninj1 KO mouse (ninj1+/-). While the data are well controlled, there is limited understanding of the mechanism of action. Also, given that GSDMD pores have an ~18 nm inner diameter, large enough to release IL-1β, while larger molecules like LDH (140 kDa) and other DAMPs require plasma membrane rupture (likely mediated by NINJ1), it is not unexpected that large MVs require NINJ1-mediated plasma membrane rupture.

      Strengths: 

      The authors convincingly demonstrate that ninj1 haploinsufficiency leads to decreased prothrombin time, plasma TAT and plasma cytokines 90 minutes post-treatment in mice, which leads to partial protection from lethality. 

      Weaknesses: 

      - In the abstract, the authors say "...cytokines and protected against blood coagulation and lethality triggered by bacterial flagellin". This conclusion is not substantiated by the data, as you still see 70% mortality at 24 hours in the ninj1+/- mice. 

      Thanks for the thoughtful comments. We corrected the text to “partially protected against blood coagulation and lethality triggered by bacterial flagellin”.

      - The previous publication by the authors (Wu et al. Immunity 2019) clearly shows that GSDMD-dependent pyroptosis is required for inflammasome-induced coagulation and mouse lethality. However, because the authors cannot use the homozygous ninj1 KO mouse due to partial embryonic lethality, it is challenging to compare these two studies and the contributions of GSDMD vs. NINJ1. Comparing the contributions of GSDMD and NINJ1 in human blood-derived monocytes/macrophages, where both genes can be deleted and their relative contributions to TF-containing MV release assessed in the same background, would be crucial for determining how much NINJ1 contributes relative to what has been published for GSDMD. This would help support the in vivo findings and further corroborate the conclusions proposed in this manuscript.

      Thanks for the valuable question. We have shown that plasma MV TF activity was reduced in both GSDMD-deficient mice (Ref 23) and Ninj1+/- mice (present manuscript). Given that TF is a plasma membrane protein, MV TF most likely comes from ruptured plasma membrane. In flagellin-induced pyroptosis, GSDMD and NINJ1 deficiency equally blocked LDH release (plasma membrane rupture) in BMDMs (Ref 29). Further, in pyroptosis, glycine inhibits NINJ1 activation downstream of GSDMD pore formation (Ref 35). Therefore, GSDMD pore formation should be upstream of NINJ1 activation in pyroptosis (which may not be the case in other forms of cell death), and GSDMD and NINJ1 likely have equal effects on MV release in flagellin-induced pyroptosis. As the reviewer suggested, experiments using human blood-derived monocytes/macrophages would enable a direct comparison to determine their relative contributions. However, this approach presents a few technical difficulties: in our experience, it is not easy to manipulate gene expression in primary human monocytes/macrophages, and variable efficiency of GSDMD and NINJ1 manipulation would complicate the comparison. We hope the reviewer would agree that a direct comparison between GSDMD and NINJ1 is not required to support our conclusion that NINJ1-dependent membrane rupture is involved in inflammasome-pyroptosis-induced coagulation and inflammation.

      - What are the levels of plasma TAT, PT, and inflammatory cytokines if plasma is collected later than 90 minutes? Given that the majority (~70%) of the ninj1+/- mice are dead by 24 hours, it is imperative to determine whether the 90-minute data (in Fig 1A-G) are also representative of later time points. The question is whether ninj1 haploinsufficiency merely delays the increases in prothrombin time, plasma TAT, and plasma cytokines.

      Thanks for the valuable question. The time point (90 min) was chosen based on our in vitro observation that flagellin-induced pyroptosis in BMDMs largely occurs within 60-90 min.

      Because our focus is on the primary effect of flagellin in vivo, potential secondary effects at later time points may complicate the results and are hard to interpret. As the reviewer suggested, we measured PT and plasma TAT at 6 hours post-flagellin challenge. The significant difference in PT between Ninj1+/+ and Ninj1+/- mice was sustained (Fig A), suggesting that coagulation proteins remained more depleted in Ninj1+/+ mice than in Ninj1+/- mice. However, plasma TAT levels had returned to baseline (refer to Fig 1B in the main text) in both groups and showed no significant difference between groups (Fig B), which could be explained by the short half-life of TAT in the blood (less than 30 min). Since the flagellin challenge is a one-time hit, there might not be a second episode of coagulation after the 90-minute time point, at least not one triggered by flagellin, as supported by the plasma TAT levels at 6 hours. We now comment on this limitation at the end of the main text.
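      The half-life argument can be made concrete with a short calculation. This is a sketch that assumes simple first-order (exponential) clearance of TAT and no new TAT generation after the initial flagellin hit; the 30-minute value is the upper bound on the half-life stated above, not a value measured in this study, and the function name is hypothetical.

```python
def fraction_remaining(elapsed_min: float, half_life_min: float) -> float:
    """Fraction of an initial amount left after first-order (exponential) clearance."""
    return 0.5 ** (elapsed_min / half_life_min)


# Time from the 90-minute sampling point to the 6-hour sampling point
elapsed = 6 * 60 - 90  # 270 minutes, i.e. 9 half-lives at 30 min

# 30 min is the stated upper bound; any shorter half-life leaves even less TAT
upper_bound = fraction_remaining(elapsed, 30.0)
print(f"At most {upper_bound:.4%} of the TAT present at 90 min remains at 6 h")
```

      Under these assumptions, roughly 0.2% or less of the TAT present at 90 minutes would still circulate at 6 hours, which is consistent with both groups returning to baseline.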

      Based on our previous studies, plasma IL-1β and TNFα peak at early time points and diminish over time, whereas plasma IL-6 levels are maintained. As shown below, plasma IL-6 appeared higher in Ninj1+/+ than in Ninj1+/- mice, but the difference was not statistically significant (partly because one missing sample in the Ninj1+/+ group, n = 4 rather than 5, decreased the statistical power to detect a difference).

      Author response image 1.

      Mice were injected with Fla (500 ng LFn-Fla plus 3 μg PA). Blood was collected 6 hours after Fla injection. Prothrombin time (A), plasma TAT (B), and plasma IL-6 (C) were measured. Mann-Whitney tests were performed.
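      The point about statistical power (one missing Ninj1+/+ sample, n = 4 instead of 5) can be made concrete for the exact two-sided Mann-Whitney test. Assuming no tied values, the smallest p-value the test can return occurs when the two groups do not overlap at all, and it equals 2 / C(n1 + n2, n1). This sketch illustrates that general property only; it is not a reanalysis of the IL-6 data, and the function name is hypothetical.

```python
from math import comb


def min_exact_mwu_p(n1: int, n2: int) -> float:
    """Smallest two-sided exact Mann-Whitney p-value (no ties): complete separation."""
    # Only one of the C(n1 + n2, n1) equally likely rank arrangements is this extreme
    # in each direction, so the two-sided floor is 2 / C(n1 + n2, n1), capped at 1.
    return min(1.0, 2 / comb(n1 + n2, n1))


for n1, n2 in [(5, 5), (4, 5)]:
    print(f"n = {n1} vs {n2}: smallest attainable p = {min_exact_mwu_p(n1, n2):.4f}")
```

      With 4 versus 5 animals, the attainable floor is twice as high as with 5 versus 5 (about 0.016 versus 0.008), so a real difference has less room to reach significance.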

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors): 

      - Fig 1F: are there lower magnification images that capture the fibrin deposition? The IHC data seems at odds with the WB data in Fig. 1G where there is still significant fibrin detected in the heterozygous lungs and liver. Quantitating the Fig. 1G Western blot would also be helpful.

      IHC surveys a thin tissue section, whereas WB surveys a whole piece of tissue; therefore, fibrin deposition may be missed by IHC but detected by WB. That is why we used both methods. Below we provide lower-magnification images of fibrin deposition (about 2 × 1.6 mm area).

      Author response image 2.

      - Fig 1H: the lethality study uses a 5x higher dose of Fla than the earlier studies. In the lethality data, where ninj1+/- mortality is delayed, are the parameters (prothrombin time, plasma TAT, and plasma cytokines) measured at 90 minutes different between WT and ninj1+/- mice? This would be critical to confirm that the delay is not merely due to a delayed release of TF-containing MVs.

      We used a 5x lower dose of Fla in the coagulation study than in the lethality study because it is difficult to draw blood from septic mice given the higher dose of flagellin. Because the mice must be euthanized to collect blood for plasma measurements, these parameters were not measured in the mice used in the lethality study.

      - What is the effect of ninj1+/- on E. coli-induced lethality in mice? How do these data compare to E. coli infection of GSDMD-/- mice?

      We did not examine the effect of Ninj1+/- on E. coli-induced lethality. Since the initial submission of our manuscript, we have focused on Ninj1 flox/flox mice instead of Ninj1+/- mice to achieve NINJ1 deficiency. In our new studies, we are using mice with induced global Ninj1 deficiency to study polymicrobial infection-induced lethality.

      - Fig 2 - in the E. coli model, the prothrombin time, plasma TAT, and plasma cytokines are measured 6 hours post-infection. How were these time points chosen? Did the authors measure prothrombin time, plasma TAT, and plasma cytokines at different time points?  

      The in vivo time points for flagellin and E. coli were chosen based on our in vitro observations of the timelines of BMDM pyroptosis induced by flagellin and bacteria. This disparity probably arises from the distinct dynamics of purified proteins versus bacterial infection. Purified proteins can swiftly translocate into cells and take effect immediately after injection. Conversely, during bacterial infection, macrophages engulf and digest the bacteria to expose their antigens. These antigens subsequently initiate further effects, a process that takes some time to unfold.

      Because our focus is on the primary effect of flagellin in vivo, potential secondary effects at later time points may complicate the results and are hard to interpret. As the reviewer suggested, we measured PT and plasma TAT at 6 hours post-flagellin challenge. The significant difference in PT between Ninj1+/+ and Ninj1+/- mice was sustained (Fig A), suggesting that coagulation proteins remained more depleted in Ninj1+/+ mice than in Ninj1+/- mice. However, plasma TAT levels had returned to baseline (refer to Fig 1B in the main text) in both groups and showed no significant difference between groups (Fig B), which could be explained by the short half-life of TAT in the blood (less than 30 min). Since the flagellin challenge is a one-time hit, there might not be a second episode of coagulation after the 90-minute time point, at least not one triggered by flagellin, as supported by the plasma TAT levels at 6 hours. We now comment on this limitation at the end of the main text.

      Based on our previous studies, plasma IL-1β and TNFα peak at early time points and diminish over time, whereas plasma IL-6 levels are maintained. As shown below, plasma IL-6 appeared higher in Ninj1+/+ than in Ninj1+/- mice, but the difference was not statistically significant (partly because one missing sample in the Ninj1+/+ group, n = 4 rather than 5, decreased the statistical power to detect a difference).

      - Fig 3 - the sequence of figure panels listed in the legend needs to be corrected. Fig 3A requires quantitation of NINJ1 levels compared to beta-actin. Fig 3C - needs a control for equal MV loading. 

      Thanks for the recommendations. The figure sequence has been corrected. Because there are no established markers or loading controls for MVs, we used equal plasma volumes as the loading control.

      Additional comments: 

      (1) In Fig 3A, the size of NINJ1 appears to be increased in the NINJ1+/- group.

      This discrepancy is likely attributable to a technical issue during gel electrophoresis and protein transfer, which caused the image to tilt to one side.

      (2) Describe the method of BMDM isolation.

      Thanks for the recommendations. We now include the method of BMDM isolation. In brief, the mouse femur and tibia from one leg are harvested and rinsed in ice-cold PBS, followed by a brief rinse in 70% ethanol for 10-15 seconds. Both ends of the bones are then cut open, and the bone marrow is flushed out using a 10 ml syringe with a 26-gauge needle. The marrow is passed once through a 19-gauge needle to disperse the cells. After filtering through a 70-μm cell strainer, the cells are collected by centrifugation at 250 g for 5 minutes at 4 °C and then suspended in two 150 mm petri dishes, each containing 25 ml of L-cell conditioned medium (RPMI-1640 supplemented with 10% FBS, 2 mM L-glutamine, 10 mM HEPES, 15% LCM, and penicillin/streptomycin). After 3 days, 15 ml of LCM medium is added to each dish. The cells typically reach full confluency by days 5-7.
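      The protocol specifies centrifugation as relative centrifugal force (250 g), while many benchtop centrifuges are set in rpm. The conversion depends on the rotor radius; this sketch uses the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm² (r in cm). The 15 cm radius and the function names are hypothetical examples, not values reported by the authors.

```python
import math


def rpm_for_rcf(rcf_g: float, radius_cm: float) -> float:
    """Rotor speed (rpm) giving the requested RCF, from RCF = 1.118e-5 * r * rpm**2."""
    return math.sqrt(rcf_g / (1.118e-5 * radius_cm))


def rcf_for_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) produced at a given rotor speed and radius."""
    return 1.118e-5 * radius_cm * rpm ** 2


radius = 15.0  # hypothetical rotor radius in cm; check your centrifuge's specification
speed = rpm_for_rcf(250, radius)
print(f"250 g at r = {radius} cm is about {speed:.0f} rpm")
```

      Because the required speed scales with 1/√r, the same 250 g corresponds to a noticeably higher rpm on a smaller rotor.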

      (3) According to this method, BMDMs are seeded without any M-CSF or L929-cell conditioned medium. How many macrophages survive under this condition? 

      BMDMs are cultured and differentiated in medium supplemented with 15% L929-cell conditioned medium. For the experiment, the cells were seeded in Opti-MEM medium (Thermo Fisher Scientific, Cat# 51985034) without M-CSF or L929-cell conditioned medium. BMDMs can survive under this condition, as evidenced by low LDH and high ATP measurement (Fig S5).

      Reviewer #2 (Recommendations For The Authors): 

      - Significant information is missing from the methods, which makes it unclear how some of the experiments were performed. For example, the methods contain no detailed description of, or references for, how the in vivo experiments were performed. The methods section needs significantly more detail so that any reader can follow the protocols in this manuscript. References to previous work should also be included as needed.

      Thanks for the recommendations. Some of these details were previously given in the figure legends. We have now added details to the methods for better interpretation of our data.

      - Line numbers in the manuscript would be helpful when resubmitting the manuscript so that the reviewer can easily point to the main text when making comments. 

      Thanks for the recommendations. We have now added line numbers to the manuscript.

    Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The chromophore of animal and microbial rhodopsins is retinal, which forms a Schiff base linkage with a lysine in the seventh transmembrane helix. In most cases, the chromophore is positively charged by protonation of the Schiff base, which is stabilized by a negatively charged counterion. In animal opsins, three counterion sites have been experimentally identified: Glu94 in helix 2, Glu113 in helix 3, and Glu181 in extracellular loop 2, where a deprotonated glutamate acts as the counterion. In this paper, Sakai et al. investigated the molecular properties of anthozoan-specific opsin II (ASO-II opsins), which lack these glutamates. They found an alternative candidate, Glu292 in helix 7, from the sequences. Interestingly, the experimental data suggested that Glu292 is not the direct counterion in ASO-II opsins. Instead, they found that ASO-II opsins employ a chloride ion as the counterion. Among microbial rhodopsins, a chloride ion serves as the counterion in light-driven chloride pumps. This paper reports the first observation of a chloride ion as the counterion in an animal rhodopsin. Theoretical calculations using a QM/MM method support their experimental data. The authors also revealed the role of Glu292, which serves as the counterion in the photoproduct and is involved in G protein activation.

      The conclusions of this paper are well supported by data, while the following aspects should be considered for the improvement of the manuscript.

      We thank the reviewer for carefully reading the manuscript and providing important suggestions. Below, we address the specific comments.

      (1) Information on the sequence alignment appears only in Figure S2, not in the main figures. Figure S2 is hard to read because it includes so many opsins and residue positions. With such an organization, it will be difficult for general readers to follow the manuscript. I recommend that the authors show key residues in Figure 1, selected from Figure S2.

      We thank the reviewer for pointing this out. As suggested, we have selected key residues (potential counterion sites) from Fig. S2 and show them now as Fig. 1B in the revised manuscript. Fig. S2 has also been simplified by showing only the most important residues.

      (2) Halide size dependence. The authors observed a spectral red-shift for larger halides. Their observation is fully consistent with that for the chromophore in solution (Blatz et al. Biochemistry 1972), though the isomeric states are different (11-cis vs all-trans). This suggests that a halide ion is the hydrogen-bonding acceptor of the Schiff base N-H group both in solution and in ASO-II opsins. In contrast, a halide ion is not the hydrogen-bonding acceptor in the structure of halorhodopsin, whose halide size dependence is not clearly correlated with absorption maxima (Scharf and Engelhard, Biochemistry 1994). These results support their model structure (Figure 4) and are helpful for the QM/MM calculations.

      We appreciate the comment, which provides a deeper insight into our results and reinforces our conclusions. We have revised the discussion of the effect of halide size on the λmax shift to cite the prior work mentioned by the reviewer.

      (3) QM/MM calculations. According to the Materials and Methods, the authors added water molecules to the structure before performing their calculations. However, Figure 4 does not include these water molecules, and no information about them is given in the manuscript. In addition, no information is given on the chloride binding site (contact residues) in Figure 4. More detailed information should be shown in an additional supplementary figure.

      We thank the reviewer for making us realize that Fig. 4 was oversimplified.

      We have added the following text to the “Structural modelling and QM/MM calculations of the dark state of Antho2a” section:

      Lines 220 – 223

      “The chloride ion is also coordinated by two water molecules and the backbone of Cys187 which is part of a conserved disulfide bridge (Fig. S2). The retinylidene Schiff base region also includes polar (Ser186, Tyr91) and non-polar (Ala94, Leu113) residues (Fig. 4).”

      We have updated Fig. 4 and its legend to show a more detailed environment of the protonated Schiff base and the chloride ion, including water molecules and other nearby residues.

      (4) Figure 5 clearly shows much lower activity of E292A than that of WT, whose expression levels are unclear. How did the authors normalize (or not normalize) expression levels in this experiment?

      We thank the reviewer for this valuable comment. In the previous version of the manuscript, we did not normalize the activity based on expression levels. We have considered this in the amended version.

      First, we evaluated the expression levels of wild type and E292A Antho2a by comparing absorbances at λ<sub>max</sub> (± 5 nm) of these pigments, which were expressed and purified under the same conditions. Assuming that their molar absorption coefficients at the absorption maximum are approximately the same, this allows us to roughly compare their expression levels. The relative expression of the E292A mutant compared to the wild type (set as 1) was 0.81 at pH 6.5 and 140 mM NaCl, conditions under which 94.0% (E292A) and 99.8% (wild type) of the Schiff base is protonated (Fig. 3A and B). As we conducted the live cell Ca<sup>2+</sup> assay in medium at pH 7.0, we estimated the proportion of the protonated states of the wild type and the E292A mutant at the same pH. The amounts of the protonated states relative to the wild type at pH 6.5 (set as 1) were estimated to be 0.99 for the wild type and 0.84 for E292A. Taken together, the amount of protonated E292A pigment was calculated to be about 73% of that of the wild type at pH 7.0. In Fig. 5, the Ca<sup>2+</sup> response amplitude of the E292A mutant was 12.1% of that of the wild type, showing that even after normalizing for expression levels (~17% of wild type), the Ca<sup>2+</sup> response amplitude was lower in the E292A mutant than in the wild type. This leads to our conclusion that the E292A mutation can also influence the G protein activation efficiency.
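      The normalization arithmetic described above can be checked with a short calculation. The sketch below is illustrative only and rests on an assumption not stated explicitly in the response: that the 0.81 ratio is the ratio of absorbances at λ<sub>max</sub> at pH 6.5 (i.e., of protonated pigment), and that each pigment's protonated amount at pH 7.0 is scaled by its own change in protonated fraction between pH 6.5 and pH 7.0. The function name `protonated_ratio_ph70` is hypothetical. Under these assumptions, the quoted values reproduce the ~73% and ~17% figures.

```python
"""Illustrative normalization of the E292A Ca2+ response (assumptions noted below).

Assumption: the 0.81 expression ratio is an absorbance ratio at lambda_max at
pH 6.5 (protonated pigment), and each pigment is rescaled by its own change in
protonated fraction from pH 6.5 to pH 7.0.
"""


def protonated_ratio_ph70(abs_ratio_ph65: float,
                          mut_ph65: float, mut_ph70: float,
                          wt_ph65: float, wt_ph70: float) -> float:
    """Hypothetical helper: protonated mutant / protonated wild type at pH 7.0."""
    mutant_change = mut_ph70 / mut_ph65  # share of pH 6.5 protonated mutant left at pH 7.0
    wt_change = wt_ph70 / wt_ph65        # same for the wild type
    return abs_ratio_ph65 * mutant_change / wt_change


# Relative protonated fractions quoted in the response (wild type at pH 6.5 = 1)
ratio = protonated_ratio_ph70(0.81, mut_ph65=0.94, mut_ph70=0.84,
                              wt_ph65=1.0, wt_ph70=0.99)

raw_response_pct = 12.1                    # E292A amplitude as % of wild type (Fig. 5)
normalized_pct = raw_response_pct / ratio  # amplitude per unit of protonated pigment

print(f"protonated E292A / WT at pH 7.0: {ratio:.2f}")
print(f"normalized E292A response: {normalized_pct:.0f}% of wild type")
```

      Running it prints a protonated-pigment ratio of 0.73 and a normalized response of 17% of wild type, matching the values stated in the revised discussion.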

      We have added Fig. S11 showing the comparison of expression levels between the wild type and E292A of Antho2a (Fig. S11A) and maximum Ca<sup>2+</sup> responses after normalizing the expression levels (Fig. S11B).

      We have also revised the discussion section as follows:

      Lines 324 – 335

      “The relative expression level of the E292A mutant of Antho2a was approximately 0.81 of the wild type (set as 1), as determined by comparing absorbances at λ<sub>max</sub> for both pigments expressed and purified under identical conditions (Fig. S11A). Additionally, the fraction of protonated pigment relative to the wild type (set as 1 at pH 6.5) was estimated to be 0.94 for the E292A mutant at pH 6.5, and 0.99 and 0.84 for the wild type and the E292A mutant at pH 7.0, respectively (Fig. 3A and B). Since pH 7.0 corresponds to the conditions used in the live cell Ca<sup>2+</sup> assays, the effective amount of protonated pigment for the E292A mutant was approximately 73% of the wild type. Nevertheless, even after normalization for these differences, the Ca<sup>2+</sup> response amplitude of the E292A mutant remained significantly lower (~ 17% of wild type, compared to the observed 12% prior to normalization; Fig. 5 and Fig. S11B). These observations suggest that Glu292 serves not only as a counterion in the photoproduct but also plays an allosteric role in influencing G protein activation.”

      (5) The authors propose the counterion switching from a chloride ion to E292 upon light activation. A schematic drawing on the chromophore, a chloride ion, and E292 (and possible surroundings) in Antho2a and the photoproduct will aid readers' understanding.

      We thank the reviewer for this excellent suggestion. We have prepared a new figure with a schematic drawing of the environment of the protonated Schiff base depicting the counterion switch in Fig. S10.

      Reviewer #2 (Public review):

      Summary:

      This work reports the discovery of a new rhodopsin from reef-building corals that is characterized experimentally, spectroscopically, and by simulation. This rhodopsin lacks a carboxylate-based counterion, which is typical for this family of proteins. Instead, the authors find that a chloride ion stabilizes the protonated Schiff base and thus serves as a counterion.

      Strengths:

      This work focuses on the rhodopsin Antho2a, which absorbs in the visible spectrum with a maximum at 503 nm. Spectroscopic studies under different pH conditions, including the mutant E292A and different chloride concentrations, indicate that chloride acts as a counterion in the dark. In the photoproduct, however, the counterion is identified as E292.

      These results lead to a computational model of Antho2a in which the chloride is modeled in addition to the Schiff base. This model is improved using the hybrid QM/MM simulations. As a validation, the absorption maximum is calculated using the QM/MM approach for the protonated and deprotonated E292 residue as well as the E292A mutant. The results are in good agreement with the experiment. However, there is a larger deviation for ADC(2) than for sTD-DFT. Nevertheless, the trend is robust since the wt and E292A mutant models have similar excitation energies. The calculations are performed at a high level of theory that includes a large QM region.

      Weaknesses:

      I have a couple of questions about this study:

      We thank the reviewer for providing critical comments, particularly on the QM/MM calculations. We have carefully considered all comments and have addressed them as detailed below. Corresponding revisions have been made to the manuscript.

      (1) I find it suspicious that the absorption maximum is so close to that of rhodopsin when the counterion is very different. Is it possible that the chloride creates an environment for the deprotonated E292, which is the actual counterion?

      We think it is unlikely that the chloride ion merely facilitates deprotonation of Glu292 in such a way that Glu292 acts as the counterion of dark-state Antho2a. This conclusion is based on two results from our study. (1) The λ<sub>max</sub> of wild type Antho2a in the dark is positively correlated with the ionic radius of the halide in the solution; the λ<sub>max</sub> is red-shifted in the order Cl− < Br− < I− (Fig. 2E and F in the revised manuscript). This tendency is observed when the halide anion acts as a counterion of the protonated Schiff base (Blatz et al. Biochemistry 11: 848–855, 1972). (2) The QM/MM models of the dark state of Antho2a show that the calculated λ<sub>max</sub> of Antho2a with a protonated (neutral) Glu292 is much closer to the experimentally observed λ<sub>max</sub> than with a deprotonated (negatively charged) Glu292 (Fig. 4), suggesting that Glu292 is likely to be protonated even in the presence of the chloride ion. Therefore, we conclude that a solute anion, and not Glu292, acts as the counterion of the protonated Schiff base in the dark state of Antho2a. We have discussed this in the revised manuscript as follows:

      Lines 274 – 291

      “We found that the type of halide anions in the solution has a small but noticeable effect on the λ<sub>max</sub> values of the dark state of Antho2a. This is consistent with the effect observed in a counterion-less mutant of bovine rhodopsin, in which halide ions serve as surrogate counterions (Nathans, 1990; Sakmar et al., 1991). Similarly, our results align with earlier observations that the λ<sub>max</sub> of a retinylidene Schiff base in solution increases with the ionic radius of halides acting as hydrogen bond acceptors (i.e., I− > Br− > Cl−) (Blatz et al., 1972). In contrast, the λ<sub>max</sub> of halorhodopsin from Natronobacterium pharaonis does not clearly correlate with halide ionic radius (Scharf and Engelhard, 1994), as the halide ion in this case is not a hydrogen-bonding acceptor of the protonated Schiff base (Kouyama et al., 2010; Mizuno et al., 2018). Altogether, these findings support our hypothesis that in Antho2a, a solute halide ion forms a hydrogen bond with the Schiff base, thereby serving as the counterion in the dark state. Moreover, QM/MM calculations for the dark state of Antho2a suggest that Glu292 is protonated and neutral, further supporting the hypothesis that Glu292 does not serve as the counterion in the dark state. However, unlike in the dark state, Cl− has little to no effect on the visible light absorption of the photoproduct (Fig. S5). Therefore, we conclude that Cl− and Glu292, respectively, act as counterions for the protonated Schiff base of the dark state and photoproduct of Antho2a. This represents a unique example of counterion switching from an exogenous anion to a specific amino acid residue upon light irradiation (Fig. S10).”

      (2) The computational protocol states that water molecules have been added to the predicted protein structure. Are there water molecules next to the Schiff base, E292, and Cl-? If so, where are they located in the QM region?

      We have updated Fig. 4 to show amino acids and water molecules near the Schiff base, E292, and the chloride ion. These include Ser186, Tyr91, Ala94, Leu113, Cys187, and two water molecules coordinating the chloride ion. We have added the following text in the “Structural modelling and QM/MM calculations of the dark state of Antho2a” section of the revised manuscript.

      Lines 220 – 223

      “The chloride ion is also coordinated by two water molecules and the backbone of Cys187 which is part of a conserved disulfide bridge (Fig. S2). The retinylidene Schiff base region also includes polar (Ser186, Tyr91) and non-polar (Ala94, Leu113) residues (Fig. 4).”

      Water molecules, which have been modelled by homology to other GPCR structures, were not included in the QM region. In the revised version of the manuscript, we clarify this point in the “Computational modelling and QM/MM calculations” section as follows.

      Lines 515 – 517

      “The retinal-binding pocket also contains predicted water molecules (modelled based on homologous GPCR structures) close to the Schiff base and the chloride ion which were not included in the QM region.”

      (3) If the E292 residue is the counterion in the photoproduct state, I would expect the retinal Schiff base to rotate toward this side chain upon isomerization. Can this be modeled based on the recent XFEL results on rhodopsin?

      The recent XFEL studies of rhodopsin reveal that at very early stages (1 ps after photoactivation), structural changes in retinal are limited primarily to the isomerization around the C11=C12 bond of the polyene chain, without significant rotation of the Schiff base.

      Although modelling of a later active state with planar retinal and a rotated Schiff base is feasible (e.g., guided by high-resolution structures of the Meta II state of bovine rhodopsin such as PDB ID: 3PQR; see Author response image 1 below), active states of GPCRs typically exhibit substantial conformational flexibility and heterogeneity, which makes it challenging to generate structural models precise enough for accurate QM/MM calculations. Despite these uncertainties, this preliminary modelling does indicate that upon isomerization to the all-trans configuration, the retinal Schiff base would rotate towards E292, supporting our hypothesis that E292 serves as the counterion in the Antho2a photoproduct. This is now shown more clearly in the revised Fig. S10.

      Author response image 1.

      Reviewer #3 (Public review):

      Summary:

      The paper by Saito et al. studies the properties of anthozoan-specific opsins (ASO-II) from organisms found in reef-building coral. Their goal was to test if ASO-II opsins can absorb visible light, and if so, what the key factors involved are.

      The most exciting aspect of this work is their discovery that ASO-II opsins do not have a counterion residue (Asp or Glu) located at any of the previously known sites found in other animal opsins.

      This is very surprising. Opsins are only able to absorb visible (long-wavelength) light if the retinal Schiff base is protonated, and the latter requires (as the name implies) a "counter ion". However, the authors clearly show that some ASO-II opsins do absorb visible light.

      To address this conundrum, they tested if the counterion could be provided by exogenous chloride ions (Cl-). Their results find compelling evidence supporting this idea, and their studies of ASO-II mutant E292A suggest E292 also plays a role in G protein activation and is a counterion for a protonated Schiff base in the light-activated form.

      Strengths:

      Overall, the methods are well-described and carefully executed, and the results are very compelling.

      Their analysis of seven ASO-II opsin sequences undoubtedly shows they all lack a Glu or Asp residue at "normal" (previously established) counter-ion sites in mammalian opsins (typically found at positions 94, 113, or 181). The experimental studies clearly demonstrate the necessity of Cl- for visible light absorbance, as do their studies of the effect of altering the pH.

      Importantly, the authors also carried out careful QM/MM computational analysis (and corresponding calculation of the expected absorbance effects), thus providing compelling support for the Cl- acting directly as a counterion to the protonated retinal Schiff base, and thus limiting the possibility that the Cl- is simply altering the absorbance of ASO-II opsins through some indirect effect on the protein.

      Altogether, the authors achieved their aims, and the results support their conclusions. The manuscript is carefully written, and refreshingly, the results and conclusions are not overstated.

      This study is impactful for several reasons. There is increasing interest in optogenetic tools, especially those that leverage G protein-coupled receptor systems. Thus, the authors' demonstration that ASO-II opsins could be useful for such studies is of interest.

      Moreover, the finding that visible light absorbance by an opsin does not absolutely require a negatively charged amino acid to be placed at one of the expected sites (94, 113, or 181) typically found in animal opsins is very intriguing and will help future protein engineering efforts. The argument that the Cl- counterion system they discover here might have been a preliminary step in the evolution of amino acid based counterions used in animal opsins is also interesting.

      Finally, given the ongoing degradation of coral reefs worldwide, the focus on these curious opsins is very timely, as is the authors' proposal that the lower Schiff base pKa they discovered here for ASO-II opsins may cause them to change their spectral sensitivity and G protein activation due to changes in their environmental pH.

      We thank the reviewer for the comprehensive summary of the manuscript and for finding it well-described and impactful.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the authors):

      (1) p. 5, l. 102: The authors obtained three absorption spectra out of seven. Did the authors examine the reasons for no absorption spectra for the remaining four proteins?

      We have not identified the reasons for the absence of detectable absorption spectra for the remaining four opsins. We speculate that this could result from poor retinal binding under detergent-solubilized conditions, but we have not directly tested this possibility.

      (2) p. 7, l. 141: The pH value is 7.5 in the text and 7.4 in Figure S4B.

      We thank the reviewer for finding this mistake. The correct value is 7.4 and we have revised the text accordingly.

      Reviewer #2 (Recommendations for the authors):

      The structures and the simulations should be made available to the reader by providing them in a repository.

      We have deposited the Antho2a models in Zenodo (https://zenodo.org/; an open-access repository for research data). We have added the following description in the “Data and materials availability” section of the revised manuscript.

      Lines 559 – 560

      “The structural models of wild type Antho2a with a neutral or charged Glu292 and the Antho2a E292A mutant are available in Zenodo (10.5281/zenodo.15064942).”

      Reviewer #3 (Recommendations for the authors):

      (1) In the homology models for the ASO-II opsins, are there any other possible residues that could act as counter-ion residues outside of the "normal" positions at 94, 113, or 181?

      We have updated Fig. 4 to show all residues near the retinylidene Schiff base region, which include Cl−, Glu292, Ser186, Tyr91, Ala94, Leu113, Cys187, and two water molecules.

      Apart from Cl− and Glu292, the homology models of the ASO-II opsins do not reveal any other candidate as the counterion of Schiff base. This is also suggested by the sequence alignment between opsins of the ASO-II group and other animal opsins in Fig. S2, where we show amino acid residues near the Schiff base (in addition to key motifs important for G protein activation).

      (2) It is mentioned that the ASO-II opsins do not appear to be bistable opsins in detergents - do these opsins show any ability to photo-switch back and forth when in cellular membranes?

      We have not directly tested whether Antho2a exhibits photo-switching in cellular membranes due to technical limitations associated with high light scattering in spectroscopic measurements. Instead, we recorded absorption spectra from crude extracts of detergent-solubilized cell membranes expressing Antho2a wild type (without purification) in the dark and after sequential light irradiation (Fig. S3C). This approach, which retains cellular lipids, can better preserve the photochemical properties of opsins, such as thermal stability and photoreactivity of their photoproducts, similar to intact cellular membranes. The first irradiation with green light (500 nm) led to a decrease in absorbance around the 550 nm region and an increase around the 450 nm region, indicating the formation of a photoproduct, consistent with observations using purified Antho2a.

      However, subsequent irradiation with violet light (420 nm) did not reverse these spectral changes but resulted in only a slight decrease in absorbance around 400 nm. Re-exposure to green light produced no further spectral changes aside from baseline distortions. These findings suggest that the Antho2a photoproduct has limited ability to revert to its original dark state under these conditions. Nevertheless, because detergent solubilization may influence these observations, further studies in intact cellular membranes using live-cell assays will be required to conclusively assess bistability or photo-switching properties.

      (3) The idea that E292 acts as a counterion for the protonated active state is intriguing - do the authors think the retinal decay process after light activation occurs with hydrolysis of the non-protonated form with subsequent retinal release?

      We thank the reviewer for raising this important question. We first examined whether the increased UV absorbance observed after incubating the photoproduct for 20 hours in the dark (Fig. S3D, E, violet curves) originated from free retinal released from the opsin pigment. Acid denaturation (performed at pH 1.9) of this photoproduct resulted in a main product absorbing around 400 nm (Fig. S3G). Typically, when retinal binds opsin via the Schiff base (whether protonated or deprotonated), acid denaturation traps the retinal chromophore as a protonated Schiff base, yielding an absorption spectrum with a λ<sub>max</sub> at approximately 440 nm, as observed in the dark state of Antho2a (Fig. S3F). Our results thus indicate that the UV absorbance in the photoproduct did not result from a deprotonated Schiff base but rather from retinal released during incubation. We have not directly tested whether the protonated or deprotonated form is more prone to retinal release. However, the decay of visible absorbance (associated with the protonated photoproduct) occurred more rapidly under alkaline conditions (pH 8.0), which generally favors deprotonation of the Schiff base (Fig. S3H). Thus, it is possible that the deprotonated photoproduct releases retinal more rapidly than the protonated form, but further studies are necessary to confirm this hypothesis.

      To answer the comments (2) and (3) by the reviewer, we have added new panels (C and F–H) to Fig. S3.

      We have revised the Results section as follows:

      Lines 136 – 141

      “The photoproduct remained stable for at least 5 minutes (Fig. S3A, curves 2 and 3) but did not revert to the original dark state upon subsequent irradiation (Fig. S3A and C). Instead, it underwent gradual decay accompanied by retinal release over time (Fig. S3D–G). These findings indicate that purified Antho2a is neither strictly bleach resistant nor bistable (see also Fig. S3 legend). We also observed that the protonated photoproduct decayed more rapidly at pH 8.0 (Fig. S3H) than at pH 6.5 (Fig. 3A, D, E).”

      Text:

      (4) Page 3, line 38. Consider defining eumetazoan (for lay readers).

      As suggested, we have defined eumetazoans and revised the sentence as follows:

      Lines 38 – 40

      “Opsins are present in the genomes of all eumetazoans (i.e., all animal lineages except sponges), and based on their phylogenetic relationships, they can be classified into eight groups…”

      (5) Page 3, line 42. "But, furthermore, ..." should be changed to either word alone.

      Revised as suggested.

      (6) Page 18, line 447. The HPLC method is well-described and helpful. If possible, please add a Reference, or indicate if this is a new variation of the method.

      This is a well-established method for analyzing the composition of retinal isomers bound to different states of rhodopsin pigments. We have now cited a reference describing the methodology (Terakita et al. Vision Res. 6: 639–652, 1989).

      (7) Page 11, line 267. "..type of halide anions in the solution affected the λ<sub>max</sub> values of the dark state of".

      Since the changes are not large (but clearly occur), consider changing this sentence to "..type of halide anions in the solution has a small but visible effect on the λ<sub>max</sub> values of the dark state ..."

      We have revised this sentence as suggested.

      Figures:

      (9) Consider combining Figure S6 with Figure 2 (effect of anions on visible absorbance).

      As suggested, the previous Fig. S6 has been included in the main text as Fig. 2E and F in the revised manuscript.

    Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a valuable finding for the treatment of PCCs by sequencing 16 tumor specimens from five patients with pheochromocytomas by single-cell transcriptomics and proposing a new molecular classification criterion based on the sequencing results and characterization of tumor microenvironmental features. The evidence supporting the claims of the authors is solid, although the inclusion of more patient samples would strengthen the study's conclusions. The work will be of interest to clinicians or medical biologists working on rare pheochromocytomas (PCCs).

      Firstly, we sincerely appreciate the positive feedback from the editor and extend our gratitude to the three reviewers for their meticulous review and valuable comments. Our detailed responses to each recommendation are outlined below.

      Response to reviewers’ recommendations

      Reviewer #1 (Recommendations for The Authors):

      1) The section on transcriptomic clonal dynamics of different PCCs is well written. However, a larger sample size is needed to support the conclusions.

      Acknowledging the rarity of PCCs with an incidence of approximately 0.2 to 0.6 cases per 100,000 person-years (Farrugia & Charalampopoulos, 2019; Neumann et al, 2019), our study recognizes the limitation in sample size, as discussed in the limitations section (Page 22). In response to this concern, we are committed to undertaking further research with an expanded sample size to bolster the robustness of our conclusions, seeking a more comprehensive understanding of tumor microenvironment characterization and molecular classification in PCCs. We appreciate the valuable guidance provided by the reviewer.

      2) The clinical and biochemical data of the 5 cases could be analysed. Any findings in the different categories of the postulated classification could be noted for further studies, for example, epinephrine levels.

      We have now included the clinical information of 5 PCC patients, encompassing signs and symptoms, the tumor size, and laboratory test results in the revised manuscript as Supplemental Table S3 (Page 11-12). Notably, our analysis revealed that the kinase-type PCC patient (P4) exhibited higher blood pressures and plasma levels of catecholamine metabolites (3-methoxytyramine and normetanephrine) compared to metabolism-type PCC patients (P1-P3, and P5). This observation aligns with the elevated expression of phenylethanolamine N-methyltransferase (PNMT), an enzyme involved in the biosynthesis of catecholamine and linked to hypertension, in P4, as identified in the scRNA-seq data (Figure 4B and 4D) (Kennedy et al, 1993; Konosu-Fukaya et al, 2018; Nguyen et al, 2015). As suggested, we plan to conduct further research to explore the correlation of our molecular classification with plasma levels of catecholamine metabolites, and the relevant points have been discussed in the revision (Page 20).

      We would like to take this chance to again thank the reviewer for the careful review and very helpful guidance about how to improve our study.

      References for Reviewer #1:

      Farrugia FA, Charalampopoulos A (2019) Pheochromocytoma. Endocrine regulations 53: 191-212

      Neumann HPH, Young WF, Jr., Eng C (2019) Pheochromocytoma and Paraganglioma. The New England journal of medicine 381: 552-565

      Kennedy B, Elayan H, Ziegler MG (1993) Glucocorticoid hypertension and nonadrenal phenylethanolamine N-methyltransferase. Hypertension (Dallas, Tex : 1979) 21: 415-419

      Konosu-Fukaya S, Omata K, Tezuka Y, Ono Y, Aoyama Y, Satoh F, Fujishima F, Sasano H, Nakamura Y (2018) Catecholamine-Synthesizing Enzymes in Pheochromocytoma and Extraadrenal Paraganglioma. Endocrine pathology 29: 302-309

      Nguyen P, Khurana S, Peltsch H, Grandbois J, Eibl J, Crispo J, Ansell D, Tai TC (2015) Prenatal glucocorticoid exposure programs adrenal PNMT expression and adult hypertension. The Journal of endocrinology 227: 117-127

      Reviewer #2 (Recommendations for The Authors):

      1) Please revise all references to "malignant potential", "malignant behavior", etc. throughout the article, including the abstract and introduction, and replace them with the word "metastasis" as appropriate. Since all PCCs are malignant non-epithelial neuroendocrine neoplasms originating from the paraganglia, which are themselves malignant tumors, it is unacceptable to describe them in terms of "malignant potential" or "malignant behavior". Please review the 2022 WHO/IARC classification and description of pheochromocytoma/paraganglioma (reference: Mete O, Asa SL, Gill AJ, Kimura N, de Krijger RR, Tischler A. Overview of the 2022 WHO Classification of Paragangliomas and Pheochromocytomas. Endocr Pathol. 2022;33(1):90-114. doi:10.1007/s12022-022-09704-6).

      As suggested, we have replaced all occurrences of “malignant potential” or “malignant behavior” with “metastasis” throughout the revised manuscript. We have also included a citation to the 2022 WHO/IARC classification for further clarity.

      • Similarly, it is not advisable to use the PASS score to predict "malignant" PCC; this type of scoring system evaluates the "metastasis risk" or the "metastasis potential" of PCC.

      We appreciate the reviewer for this insight and have revised our statements accordingly.

      • Also, "malignant chromaffin cells" needs to be modified; in fact, it is the "tumor cells of PCC" that the authors are trying to describe.

      As suggested, we have amended the term “malignant chromaffin cells” to “PCC cells” in the revised manuscript (Page 9-10).

      2) How does the PASS score specifically relate to intra-tumor heterogeneity as reflected by scRNA-seq? In fact, the PASS score evaluates the histological or pathological invasiveness of PCC, and different sections of the same tumor tissue may have different histological manifestations, which may affect the score; however, scRNA-seq analyzes the cellular composition of the tumor, which is not the same as the information reflected by the PASS score. Both represent different levels and dimensions of intra-tumor heterogeneity and should be analyzed together. Please specifically list, one by one, the proportion of each item score of the PASS system and cell type of scRNA-seq for each sample and the results of the comparisons with each other to better present the conclusions.

      As suggested, we have included the proportion of each item score from the PASS system in the revised manuscript as Supplemental Table S2 (Page 8). Integrating this data with the cell type composition of each sample from Figure 2B, our analysis suggests that intra-tumor heterogeneity, as assessed by the PASS system, is more extensive compared to scRNA-seq. We concur with the reviewer’s judgement that scRNA-seq analysis and PASS score represent different levels and dimensions of intratumor heterogeneity, and we have adjusted our claim throughout the revised manuscript accordingly (Page 8, 9, and 19).

      3) Where is the specific mutation site of the VHL gene in patient 5? Please advise.

      The VHL gene mutation site, c.499C>T (missense mutation), in patient 5 was identified through whole exome sequencing (WES) analysis. We have now added the information to Supplemental Table S1 in the revised manuscript (Page 6).

      4) Please revise Supplementary Figure 1; the scale bar should not overlap the staining image of P5.

      As suggested, we have adjusted the position of the scale bar.

      Author response image 1.

      Hematoxylin-eosin staining and immunohistochemistry staining of CGA marker in formalin-fixed paraffin-embedded PCC tissue sections matched to scRNA-seq specimens. Scale bar, 100 μm.

      5) What were the clinical presentation and biochemical findings in the five patients?

      The information regarding tumor sizes, signs and symptoms, and plasma levels of catecholamine metabolites [3-methoxytyramine (3-MT), metanephrine (MN), and normetanephrine (NMN)] has been added to the revised manuscript as Supplemental Table S3 (Page 11-12).

      • Were there any preoperative symptoms of hypertension?

      With the exception of P2, preoperative symptoms of hypertension were observed in all PCC patients. The information has been added to the revised manuscript as Supplemental Table S3 (Page 11-12).

      • What was the size and catecholamine secretion phenotype of each tumor? What was the relationship between these data and the scRNA-seq results?

      The secretion phenotype showed that the kinase-type PCC patient (P4) exhibited higher plasma levels of catecholamine metabolites (3-methoxytyramine and normetanephrine) compared to metabolism-type PCC patients (P1-P3, and P5). This observation aligns with the elevated expression of phenylethanolamine N-methyltransferase (PNMT), an enzyme involved in the biosynthesis of catecholamine and linked to hypertension, in P4, as identified in the scRNA-seq data (Figure 4B and 4D) (Kennedy et al, 1993; Konosu-Fukaya et al, 2018; Nguyen et al, 2015). In contrast, we did not observe a correlation between tumor size and molecular classification. We have now included tumor sizes and laboratory test results of 5 PCC patients in the revised manuscript as Supplemental Table S3 (Page 11-12), and the relevant points have been discussed in the revision (Page 20).

      6) Please revise Figure 1A, the meaning shown in the figure appears to dissociate the tissues of the patient's normal adrenal glands, which can be misleading.

      We appreciate the reviewer for raising this concern. The schematic in Figure 1A has been revised accordingly.

      Author response image 2.

      (1A) Schematic of the experimental pipeline. 11 tumor specimens and 5 adjacent normal adrenal medullary specimens were isolated from 5 PCC patients, dissociated into single-cell suspensions, and analyzed using 10x Genomics Chromium droplet scRNA-seq.

      • Please revise the figure legend for Figure 1B, where the label (B) appears twice.

      As suggested, we have revised the figure legends for Figure 1B and 1C (Page 42).

      7) Please indicate in the figure legends and text what exactly is meant by "adjacent specimens": medulla, cortex, or normal tissue? I believe the authors mean adjacent normal adrenal medullary tissue; please check this throughout the article.

      As suggested, we have revised the term “adjacent specimens” to “adjacent normal adrenal medullary tissues” throughout the revised manuscript.

      8) Please review the pathologic diagnostic criteria of this study in light of the 2022 WHO/IARC guidelines for pathologic diagnosis: "For the pathological diagnosis, the inclusion criteria were neuroendocrine neoplasm originating from the adrenal medulla and retroperitoneal origin, i.e. pheochromocytoma and paraganglioma, with consistent morphologic and immunohistochemical confirmation in relevant cases and positivity for chromogranin A and synaptophysin. The exclusion criteria were adrenocortical neoplasm and metastatic tumors." It is not rigorous enough to diagnose a tumor as PCC based on positive CgA immunohistochemical staining results alone.

      We have revised the statements about pathologic diagnostic criteria in accordance with the suggestion and have cited the reference (Page 6).

      We would like to express our gratitude to the reviewer for the thorough review and invaluable guidance provided to enhance the quality of our study.

      References for Reviewer #2:

      Kennedy B, Elayan H, Ziegler MG (1993) Glucocorticoid hypertension and nonadrenal phenylethanolamine N-methyltransferase. Hypertension (Dallas, Tex: 1979) 21: 415-419

      Konosu-Fukaya S, Omata K, Tezuka Y, Ono Y, Aoyama Y, Satoh F, Fujishima F, Sasano H, Nakamura Y (2018) Catecholamine-Synthesizing Enzymes in Pheochromocytoma and Extraadrenal Paraganglioma. Endocrine pathology 29: 302-309

      Nguyen P, Khurana S, Peltsch H, Grandbois J, Eibl J, Crispo J, Ansell D, Tai TC (2015) Prenatal glucocorticoid exposure programs adrenal PNMT expression and adult hypertension. The Journal of endocrinology 227: 117-127

      Reviewer #3 (Recommendations For The Authors):

      I have several concerns and suggestions, which if addressed would improve the manuscript.

      1) The term “plasmas” in the manuscript and figures is confusing and should be revised to “plasma cells”.

      As suggested, we have revised the terminology from “plasmas” to “plasma cells” throughout the revised manuscript and figures.

      2) The marker genes used to define plasma cells (IGHG1 and IGLC2) were expressed in a low percentage of cells in Figure 1D. Please consider providing other marker genes for plasma cells.

      As suggested, we performed additional analysis to identify marker genes for the accurate definition of plasma cells. Applying stricter statistical criteria (p-value < 0.05, log2 fold change ≥ 1.5, and percentage of expressing cells ≥ 0.6), we identified XBP1 (a transcription factor that plays key roles in the final stages of plasma cell development) and IGKC (encoding the constant region of the immunoglobulin kappa light chain) (Todd et al, 2009; Poulsen et al, 2002) as the top significant differentially expressed genes (DEGs) suitable for defining plasma cells. These data are now presented as Figure 1D in the revised manuscript (Page 7).
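      The stricter marker-selection criteria amount to a three-way filter on a differential-expression table. The minimal Python sketch below only illustrates how the three cut-offs combine; it is not the pipeline used in the study, and the `MarkerStat` record (p-value, log2 fold change, and fraction of cluster cells expressing the gene) is a hypothetical structure introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MarkerStat:
    """Hypothetical per-gene differential-expression summary for one cluster."""
    gene: str
    p_value: float
    log2_fc: float
    pct_expressing: float  # fraction (0-1) of cluster cells expressing the gene


def passes_strict_criteria(stat: MarkerStat,
                           p_cut: float = 0.05,
                           lfc_min: float = 1.5,
                           pct_min: float = 0.6) -> bool:
    # p < 0.05, log2 fold change >= 1.5, expressing fraction >= 0.6
    return (stat.p_value < p_cut
            and stat.log2_fc >= lfc_min
            and stat.pct_expressing >= pct_min)


def select_markers(stats: List[MarkerStat]) -> List[str]:
    # Keep genes passing all three cut-offs, largest fold change first
    kept = [s for s in stats if passes_strict_criteria(s)]
    kept.sort(key=lambda s: s.log2_fc, reverse=True)
    return [s.gene for s in kept]
```

      Calling `select_markers` on one cluster's statistics returns only the genes meeting all three cut-offs, ranked by fold change; the choice of statistical test that produces the p-values is outside the scope of this sketch.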

      Author response image 3.

      (1D) Dot plot of representative marker genes for each cell type. The color scale represents the average marker gene expression level; dot size represents the percentage of cells expressing a given marker gene.

      3) The statement “Our clustering and cell type annotation analysis identified diverse adrenal cells, stromal cells, and immune cells within the PCC microenvironment” does not appear to be shown in Figure 1, so the clustering results for adrenal cells, stromal cells, and immune cells need to be added.

      As suggested, we performed clustering analysis of adrenal cells, stromal cells, and immune cells (including lymphocytes and myeloid cells) and visualized the results in a Uniform Manifold Approximation and Projection (UMAP) plot. These data have been added to the revised manuscript as Supplemental Figure S3 (Page 8).

      Author response image 4.

      Integration Analysis across 5 PCC Patients Revealing the Cell Type Composition of the PCC Microenvironment. UMAP plot depicting the distribution of adrenal cells, stromal cells, and immune cells (including lymphocytes and myeloid cells) within the PCC microenvironment.

      4) Given that the classification into “metabolism-type PCCs” and “kinase-type PCCs” is not presented in Figure 2D, the statement “Combined with our findings of a higher proportion of neutrophils and monocytes/macrophages in metabolism-type as compared with kinase-type” in Result 6 should be supported by additional data.

      As suggested, we performed additional analysis to evaluate the proportions of neutrophils and monocytes/macrophages in metabolism-type and kinase-type PCC patients. These data have been added to the revised manuscript as Supplemental Figure S4 (Page 14).

      Author response image 5.

      The frequency distribution of cell types within the microenvironment of metabolism-type and kinase-type PCC patients.

      5) What accounts for the difference between scRNA-seq analysis and multispectral immunofluorescent staining in assessing immune escape in PCCs? Please provide an explanation.

      We appreciate the reviewer's concern. scRNA-seq lacks spatial information, whereas multispectral immunofluorescent staining is limited in the number of proteins it can detect. To address this, we used both methods. scRNA-seq revealed limited communication between tumor cells and T cells, with lower HLA-I expression in kinase-type than in metabolism-type PCCs. This was supported by multispectral staining with antibodies against markers of CD4+ T cells, CD8+ T cells, M1 macrophages, and M2 macrophages, which showed sparse immune cell infiltration around tumor cells, mainly in the stroma (Figure 7A and 7B). This dual approach strengthens our understanding of immune escape in both PCC types. The explanation has been added to the revised manuscript (Page 21).

      6) Figure 7G missed the scale bar for the staining results of marker proteins. Please add the scale bar into the figure.

      As suggested, we have added the scale bar to Figure 7G.

      7) In the method part of the manuscript, the authors should describe the minimum and maximum number used for quality control of the number of genes and the percentage of mitochondrial genes.

      For quality control, we retained cells that expressed at least 200 and no more than 5000 genes, with a maximum mitochondrial gene percentage of 30%. These specific criteria have been added to the methods section of the revised manuscript (Page 25-26).
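      The quality-control thresholds can be expressed as a simple per-cell filter. The minimal Python sketch below is an illustration, not the authors' pipeline. It rests on several assumptions: counts are held in a hypothetical per-cell dictionary mapping gene to count, a gene counts as detected when its count is non-zero, human mitochondrial genes carry the conventional `MT-` prefix, and the bounds are applied inclusively (200 ≤ genes ≤ 5000, mitochondrial percentage ≤ 30%).

```python
from typing import Dict, List


def percent_mito(counts: Dict[str, int], prefix: str = "MT-") -> float:
    """Percentage of a cell's counts from mitochondrial genes (assumed 'MT-' prefix)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mito = sum(c for gene, c in counts.items() if gene.startswith(prefix))
    return 100.0 * mito / total


def passes_qc(counts: Dict[str, int],
              min_genes: int = 200,
              max_genes: int = 5000,
              max_pct_mito: float = 30.0) -> bool:
    # Detected genes = genes with a non-zero count in this cell
    n_genes = sum(1 for c in counts.values() if c > 0)
    return (min_genes <= n_genes <= max_genes
            and percent_mito(counts) <= max_pct_mito)


def filter_cells(cells: Dict[str, Dict[str, int]]) -> List[str]:
    # Return the barcodes of cells that pass every QC threshold
    return [barcode for barcode, counts in cells.items() if passes_qc(counts)]
```

      Single-cell toolkits such as Seurat and Scanpy provide equivalent filtering. The response does not state whether the bounds were inclusive or exclusive, so the comparison operators here are an assumption.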

      We express our gratitude to the reviewer for their supportive recommendations and invaluable guidance on enhancing the rigor of our data.

      References for Reviewer #3:

      Todd DJ, McHeyzer-Williams LJ, Kowal C, Lee AH, Volpe BT, Diamond B, McHeyzer-Williams MG, Glimcher LH (2009) XBP1 governs late events in plasma cell differentiation and is not required for antigen-specific memory B cell development. The Journal of experimental medicine 206: 2151-2159

      Poulsen TS, Silahtaroglu AN, Gisselø CG, Tommerup N, Johnsen HE (2002) Detection of illegitimate rearrangements within the immunoglobulin light chain loci in B cell malignancies using end sequenced probes. Leukemia 16: 2148-2155

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Li et al. investigates the metabolism-independent role of nuclear IDH1 in chromatin state reprogramming during erythropoiesis. The authors describe the accumulation and redistribution of histone H3K79me3 and the downregulation of SIRT1 as a cause of the dyserythropoiesis observed with IDH1 deficiency. The authors studied the consequences of IDH1 knockdown in normal human erythroid cells derived from hematopoietic stem and progenitor cells, and of targeted knockout of nuclear IDH1 in HUDEP2 cells. They further correlate some of the observations, such as nuclear localization of IDH1 and aberrant localization of histone modifications, in MDS and AML patient samples harboring IDH1 mutations. These observations are intriguing from a mechanistic perspective and hold therapeutic significance; however, there are major concerns that make the inferences presented in the manuscript less convincing.

      (1) The authors show the presence of nuclear IDH1 both by cell fractionation and IF, and employ an efficient strategy to knock out nuclear IDH1 (knockout IDH1/ Sg-IDH1 and rescue with the NES tagged IDH1/ Sg-NES-IDH1 that does not enter the nucleus) in HUDEP2 cells. However, some important controls are missing.

      A) In Figure 3C, for IDH1 staining, Sg-IDH1 knockout control is missing.

      We thank the reviewer for this suggestion. We have added the staining of Sg-IDH1 knockout cells and revised Figure 3C of the revised manuscript accordingly.

      B) A wild-type IDH1 rescue control (i.e., IDH1 without the NES tag) is missing; it is needed to gauge the maximum rescue possible with this system.

      We thank the reviewer for this suggestion. We overexpressed wild-type IDH1 in the IDH1-deficient HUDEP2 cell line and assessed the phenotype; the results are presented in Supplementary Figure 9 of the revised manuscript. As shown in Supplementary Figure 9A, IDH1 deficiency reduced HUDEP2 cell numbers, a phenotype that was rescued by overexpression of wild-type IDH1 but not of NES-IDH1. Given IDH1's well-established role in redox homeostasis through catalyzing the conversion of isocitrate to α-KG, we hypothesized that overexpression of either wild-type IDH1 or NES-IDH1 would significantly restore α-KG levels relative to the IDH1-deficient group. Supplementary Figure 9B shows that IDH1 depletion dramatically decreased α-KG levels, whereas overexpression of either wild-type IDH1 or NES-IDH1 almost completely restored them, as anticipated. These results suggest that NES-IDH1 overexpression restores the metabolic function of IDH1 as effectively as wild-type IDH1 overexpression. To investigate whether apoptosis contributes to the impaired cell expansion caused by IDH1 deficiency, we quantified apoptotic cells by Annexin V/PI staining. As shown in Supplementary Figure 9C and D, flow cytometry revealed no significant changes in apoptosis rates following IDH1 depletion or ectopic expression of wild-type IDH1 or NES-IDH1 in IDH1-deficient HUDEP2 cells.

      Flow cytometric analysis demonstrated that IDH1 deficiency triggered S-phase accumulation at day 8, indicative of cell cycle arrest. Ectopic expression of wild-type IDH1 significantly rescued this cell cycle defect, whereas overexpression of NES-IDH1 failed to ameliorate the S-phase accumulation induced by IDH1 depletion (Supplementary Figure 9E and F). Thus, although NES-IDH1 overexpression rescued the metabolic defect, it did not alleviate the cell cycle arrest induced by IDH1 deficiency, whereas wild-type IDH1 overexpression fully restored normal cell cycle progression. This functional dichotomy indicates that nuclear-localized IDH1 executes critical roles distinct from those of its cytoplasmic counterpart, and that wild-type IDH1 overexpression efficiently restores the functional impairment caused by loss of nuclear-localized IDH1.

      (2) Considering the nuclear knockout of IDH1 (Sg-NES-IDH1 referenced in the previous point) is a key experimental system that the authors have employed to delineate non-metabolic functions of IDH1 in human erythropoiesis, some critical experiments are lacking to make convincing inferences.

      A) The authors rely on IF to show the nuclear deletion of Sg-NES-IDH1 HUDEP2 cells. As mentioned earlier since a knockout control is missing in IF experiments, a cellular fractionation experiment (similar to what is shown in Figure 2F) is required to convincingly show the nuclear deletion in these cells.

      We sincerely thank the reviewer for raising this critical point. As suggested, we have performed additional IF experiments and cellular fractionation experiments to comprehensively address the subcellular localization of IDH1.

      The IF staining results are shown in Figure 3C of the revised manuscript. In control HUDEP2 cells, endogenous IDH1 was detected in both the cytoplasm and the nucleus; this dual localization may reflect its roles in cytoplasmic metabolic processes and potential nuclear functions under specific conditions. In Sg-IDH1 (IDH1 knockout) cells, no IDH1 signal was detected, confirming efficient depletion of the protein. In Sg-NES-IDH1 cells (IDH1-deficient cells overexpressing NES-IDH1), IDH1 accumulated predominantly in the cytoplasm, consistent with the added nuclear export signal excluding it from the nucleus. These localization patterns were further confirmed by cellular fractionation assays, as shown in Figure 3D.

      B) Since the authors attribute nuclear localization to a lack of metabolic/enzymatic functions, it is important to show the status of ROS and alpha-KG in the Sg-NES-IDH1 in comparison to control, wild type rescue, and knockout HUDEP2 cells. The authors observe an increase of ROS and a decrease of alpha-KG upon IDH1 knockdown. If nuclear IDH1 is not involved in metabolic functions, is there only a minimal or no impact of the nuclear knockout of IDH1 on ROS and alpha-KG, in comparison to complete knockout? These studies are lacking.

      We appreciate the reviewer's insightful suggestion and agree that measuring ROS and alpha-KG is useful for demonstrating the non-canonical function of IDH1. We measured alpha-KG concentrations in control, IDH1 knockout, and nuclear IDH1 knockout HUDEP2 cell lines. Alpha-KG content decreased significantly after complete knockout of IDH1, whereas it did not change significantly after nuclear knockout of IDH1 (Supplementary Figure 9B). Regarding ROS, the commercial ROS assay kits available to us are read in the PE (excitation: 565 nm; emission: 575 nm) and FITC (excitation: 488 nm; emission: 518 nm) channels of the flow cytometer. However, our Sg-IDH1 and Sg-NES-IDH1 HUDEP2 cell lines themselves express green fluorescent protein (excitation: 488 nm; emission: 507 nm) and Kusabira Orange fluorescent protein (excitation: 500 nm; emission: 561 nm). Because of this spectral overlap, we were unable to measure changes in ROS levels in these HUDEP2 cell lines with the available commercial kits.

      (3) The authors report abnormal nuclear phenotype in IDH1 deficient erythroid cells. It is not clear what parameters are used here to define and quantify abnormal nuclei. Based on the cytospins (eg., Figure 1A, 3D) many multinucleated cells are seen in both shIDH1 and Sg-NES-IDH1 erythroid cells, compared to control cells. Importantly, this phenotype and enucleation defects are not rescued by the administration of alpha-KG (Figures 1E, F). The authors study these nuclei with electron microscopy and report increased euchromatin in Figure 4B. However, there is no discussion or quantification of polyploidy/multinucleation in the IDH1 deficient cells, despite their increased presence in the cytospins.

      A) PI staining followed by cell cycle FACS will be helpful in gauging the extent of polyploidy in IDH1 deficient cells and could add to the discussions of the defects related to abnormal nuclei.

      We appreciate the reviewer’s insightful suggestion. Because PI is detected in the PE channel (excitation: 565 nm; emission: 575 nm) of the flow cytometer and our HUDEP2 cell lines express Kusabira Orange fluorescent protein (excitation: 500 nm; emission: 561 nm), we were unable to use PI staining for cell cycle analysis. EdU staining is another commonly used method for assessing cell cycle progression, so we performed EdU staining followed by flow cytometry on Control, Sg-IDH1, and Sg-NES-IDH1 HUDEP2 cells. Complete knockout of IDH1 resulted in S-phase arrest and increased polyploidy in HUDEP2 cells on day 8 of erythroid differentiation, and overexpression of NES-IDH1 did not reverse this phenotype (Supplemental Figure 9E-F). We have also added a discussion of the association between abnormal nuclei and impaired erythropoiesis.

      B) For electron microscopy quantification in Figures 4B and C, how the quantification was done and the labelling of the y-axis (% of euchromatin and heterochromatin) in Figure 4 C is not clear and is confusingly presented. The details on how the quantification was done and a clear label (y-axis in Figure 4C) for the quantification are needed.

      We thank the reviewer for this suggestion. We measured the areas of the nucleus, heterochromatin, and euchromatin using ImageJ software; the quantification strategy is described in the Supplementary Methods section of the revised Supplementary file. In addition, the y-axis label in Figure 4C has been changed to “the area percentage of euchromatin and heterochromatin”.
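      The y-axis quantity can be written out explicitly. The formula below assumes that each chromatin area measured in ImageJ is normalized to the total nuclear area of the same cell; this normalization is our assumption, and the authoritative description is the one in the Supplementary Methods. Here $A_{\text{nucleus}}$, $A_{\text{eu}}$, and $A_{\text{het}}$ denote the measured nuclear, euchromatin, and heterochromatin areas.

```latex
\[
\%\,\text{euchromatin} = \frac{A_{\text{eu}}}{A_{\text{nucleus}}} \times 100,
\qquad
\%\,\text{heterochromatin} = \frac{A_{\text{het}}}{A_{\text{nucleus}}} \times 100
\]
```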

      C) As mentioned earlier, what parameters were used to define and quantify abnormal nuclei (e.g. Figure 1A) needs to be discussed clearly. The red arrows in Figure 1A all point to bi/multinucleated cells. If this is the case, this needs to be made clear.

      We thank the reviewer for their suggestion. In our present study, nuclear malformations were defined as cells exhibiting binucleation or multinucleation based on cytospin analysis. A minimum of 300 cells per group were evaluated, and the proportion of aberrant nuclei was calculated as (number of abnormal cells / total counted cells) × 100%.
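      The counting rule reduces to a simple proportion. The minimal Python sketch below illustrates it. The function name is hypothetical, and the guard that rejects samples of fewer than 300 counted cells encodes the stated minimum as an illustrative assumption; it is not part of the original analysis.

```python
def aberrant_nuclei_percent(n_abnormal: int, n_total: int, min_cells: int = 300) -> float:
    """Percentage of bi-/multinucleated cells among all cells counted on a cytospin."""
    if n_total < min_cells:
        raise ValueError(f"at least {min_cells} cells must be counted, got {n_total}")
    if not 0 <= n_abnormal <= n_total:
        raise ValueError("n_abnormal must lie between 0 and n_total")
    # (number of abnormal cells / total counted cells) x 100%
    return 100.0 * n_abnormal / n_total
```

      For example, with hypothetical counts of 45 bi- or multinucleated cells among 300 counted cells, `aberrant_nuclei_percent(45, 300)` returns 15.0, i.e. 15%.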

      (4) The authors mention that their previous study (reference #22) showed that ROS scavengers did not rescue dyserythropoiesis in shIDH1 cells. However, in this referenced study they did report that vitamin C, a ROS scavenger, partially rescued enucleation in IDH1-deficient cells and completely suppressed abnormal nuclei in both control and IDH1-deficient cells, in addition to restoring redox homeostasis by scavenging reactive oxygen species in shIDH1 erythroid cells. In the current study, the authors used the ROS scavengers GSH and NAC in shIDH1 erythroid cells and showed that they do not rescue the abnormal nuclei phenotype and enucleation defects. The differences between the results of their previous study with vitamin C and those with GSH and NAC in the context of IDH1 deficiency need to be discussed.

      We appreciate the reviewer’s insightful observation. The apparent discrepancy between the effects of vitamin C (VC) in our previous study and those of glutathione (GSH) and N-acetylcysteine (NAC) in the current work can be attributed to molecular mechanisms beyond ROS scavenging. A growing body of evidence identifies vitamin C as a multifunctional regulator: in addition to acting as an antioxidant that maintains redox homeostasis, VC is a critical epigenetic modulator. VC has been identified as a cofactor for α-ketoglutarate (α-KG)-dependent dioxygenases, including TET2, which catalyzes the oxidation of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC) [1,2]. Structural studies confirm its direct interaction with the catalytic domain of TET2 to enhance enzymatic activity in vitro [3]. The biological significance of this epigenetic modulation is illustrated by the ability of vitamin C to improve the generation of induced pluripotent stem cells and to induce a blastocyst-like state in mouse embryonic stem cells by promoting demethylation of H3K9 and 5mC, respectively [4,5]. In contrast, GSH and NAC are canonical ROS scavengers without intrinsic epigenetic-modifying activity. Although they effectively neutralize oxidative stress (as validated by reduced ROS levels in our current data, Supplemental Figure 7), they do not rescue the nuclear abnormalities or enucleation defects in IDH1-deficient cells, suggesting that dyserythropoiesis driven by IDH1 deficiency is not solely ROS-dependent.

      References:

      (1) Blaschke K, Ebata KT, Karimi MM, Zepeda-Martínez JA, Goyal P, et al. Vitamin C induces Tet-dependent DNA demethylation and a blastocyst-like state in ES cells. Nature. 2013;500(7461):222-226.

      (2) Minor EA, Court BL, Young JI, Wang G. Ascorbate induces ten-eleven translocation (Tet) methylcytosine dioxygenase-mediated generation of 5-hydroxymethylcytosine. J Biol Chem. 2013;288(19): 13669-13674.

      (3) Yin R, Mao S, Zhao B, Chong Z, Yang Y, et al. Ascorbic acid enhances Tet-mediated 5-methylcytosine oxidation and promotes DNA demethylation in mammals. J Am Chem Soc. 2013;135(28):10396-10403.

      (4) Esteban MA, Wang T, Qin B, Yang J, Qin D, et al. Vitamin C enhances the generation of mouse and human induced pluripotent stem cells. Cell Stem Cell. 2010;6(1):71-79.

      (5) Chung T, Brena RM, Kolle G, Grimmond SM, Berman BP, et al. Vitamin C promotes widespread yet specific DNA demethylation of the epigenome in human embryonic stem cells. Stem Cells. 2010;28(10):1848-1855.

      (5) The authors describe an increase in euchromatin as the consequential abnormal nuclei phenotype in shIDH1 erythroid cells. However, in their RNA-seq, they observe an almost equal number of genes that are up and down-regulated in shIDH1 cells compared to control cells. If possible, an RNA-Seq in nuclear knockout Sg-NES-IDH1 erythroid cells in comparison with knockout and wild-type cells will be helpful to tease out whether a specific absence of IDH1 in the nucleus (ie., lack of metabolic functions of IDH) impacts gene expression differently.

      We thank the reviewer for this suggestion. ATAC-seq showed an overall increase in chromatin accessibility after IDH1 deletion, yet the number of up-regulated genes was only slightly larger than that of down-regulated genes, which may reflect metabolic changes caused by IDH1 deletion. To explore how changes in chromatin accessibility affect gene expression after IDH1 deletion, we analyzed differential gene expression at differential ATAC peak regions (Author response image 1). Genes at ATAC peak regions with increased chromatin accessibility were significantly up-regulated, suggesting that altered chromatin accessibility contributes to the observed gene expression changes.

      Author response image 1.

      Box plots of gene expression differences of differential ATAC peaks located in promoter for the signal increasing and decreasing groups.

      (6) In Figure 8, the authors show data related to SIRT1's role in mediating non-metabolic, chromatin-associated functions of IDH1.

      A) The authors show that SIRT1 inhibition leads to a rescue of enucleation and abnormal nuclei. However, whether this rescues the progression through the late stages of terminal differentiation and the euchromatin/heterochromatin ratio is not clear.

      We thank the reviewer for this suggestion. As shown in Supplementary Figures 14 and 15 of the revised Supplementary Data, neither treatment of normal erythroid cells with SRT1720 nor treatment of IDH1-deficient erythroid cells with a SIRT1 inhibitor affected terminal differentiation.

      (7) In Figure 4 and Supplemental Figure 8, the authors show the accumulation and altered cellular localization of H3K79me3, H3K9me3, and H3K27me2, and the lack of accumulation of other three histone modifications they tested (H3K4me3, H3K35me4, and H3K36me2) in shIDH1 cells. They also show the accumulation and altered localization of the specific histone marks in Sg-NES-IDH1 HUDEP2 cells.

      A) To aid comparison of these histone modifications, it would be helpful to show cell fractionation data for the three histone modifications that did not accumulate (H3K4me3, H3K35me4, and H3K36me2), similar to what was shown in Figure 4E for H3K79me3, H3K9me3, and H3K27me2.

      We appreciate the reviewer’s insightful suggestion. We collected erythroblasts on day 15 of erythroid differentiation of cord blood-derived CD34<sup>+</sup> hematopoietic stem cells and performed ChIP assays. As shown in Author response image 2, the concentration of DNA bound by H3K9me3, H3K27me2, and H3K79me3 was too low to meet sequencing quality requirements, consistent with the western blot results. In addition, ChIP-seq for H3K79me3 showed no marked peak signal.

      Author response image 2.

      ChIP-seq analysis shows no marked peak signal of H3K79me3 on day 15. (A) Quality control of ChIP assays for H3K9me3, H3K27me2, and H3K79me3. (B) Representative peak chart showing the normalized ChIP signal of H3K79me3 at gene body regions. (C) Heatmaps displaying the normalized ChIP signal of H3K79me3 at gene body regions. The window represents ±1.5 kb regions from the gene body. TES, transcriptional end site; TSS, transcriptional start site.

      C) Among the three histone marks that are dysregulated in IDH1 deficient cells (H3K79me3, H3K9me3, and H3K27me2), the authors show via ChIP-seq (Fig5) that H3K79me3 is the critical factor. However, the ChIP-seq data shown here lacks many details and this makes it hard to interpret the data. For example, in Figure 5A, they do not mention which samples the data shown correspond to (are these differential peaks in shIDH1 compared to shLuc cells?). There is also no mention of how many replicates were used for the ChIP seq studies.

      We thank the reviewer for pointing this out and apologize for not clearly describing the ChIP-seq data for H3K9me3, H3K27me2, and H3K79me3; we have revised the corresponding paragraphs. Because H3 proteins gradually translocate from the nucleus to the cytoplasm starting at day 11 (late Baso-E/Poly-E) of erythroid differentiation, we performed ChIP-seq for H3K9me3, H3K27me2, and H3K79me3 only in the shIDH1 group, with three independent biological replicates for each mark.

      Reviewer #2 (Public Review):

      Li and colleagues investigate the enzymatic activity-independent function of IDH1 in regulating erythropoiesis. This manuscript reveals that IDH1 deficiency in the nucleus leads to the redistribution of histone marks (especially H3K79me3) and chromatin state reprogramming. Their findings suggest a non-typical localization and function of the metabolic enzyme, providing new insights for further studies into the non-metabolic roles of metabolic enzymes. However, there are still some issues that need addressing:

      (1) Could the authors show the RNA and protein expression levels (without fractionation) of IDH1 on different days throughout the human CD34+ erythroid differentiation?

      We sincerely appreciate the reviewer’s constructive feedback. To address this point, we quantified IDH1 expression across the stages of erythropoiesis using qRT-PCR and western blot analyses. As shown in Supplementary Figure 1, IDH1 expression increased progressively during early erythropoiesis and then remained stable throughout terminal differentiation.

      (2) Even though the human CD34+ erythroid differentiation protocol was published and cited in the manuscript, it would be helpful to specify which erythroid stages correspond to cells on days 7, 9, 11, 13, and 15.

      We sincerely thank the reviewer for raising this important methodological consideration. Our research group has previously established a robust platform for staged human erythropoiesis characterization using cord blood-derived CD34<sup>+</sup> hematopoietic stem cells (HSCs) [6-9]. This standardized protocol enables high-purity isolation and functional analysis of erythroblasts at defined differentiation stages.

      Our previous work (Hu et al., Blood 2013; Han et al., Blood 2017; Wang et al., Blood 2021) isolated and functionally characterized human erythroblasts at distinct stages using cord blood. These studies showed that using cord blood-derived hematopoietic stem cells and purifying each stage of human erythroblasts facilitates comprehensive cellular and molecular characterization.

      Following isolation from cord blood, CD34<sup>+</sup> cells were cultured in serum-free medium and induced to undergo erythroid differentiation using our standardized protocol. Erythropoiesis in this system consisted of two phases. During the early phase (day 0 to day 6), hematopoietic stem and progenitor cells expanded and differentiated into erythroid progenitors, including BFU-E and CFU-E cells.

      During terminal erythroid maturation (day 7 to day 15), CFU-E cells progressively transitioned through defined erythroblast stages, as validated by daily cytospin morphology and expression of band 3/α4 integrin. The stage-specific composition was quantitatively characterized as follows:

      Author response table 1.

      The composition of erythroblast during terminal stage erythropoiesis.

      This differentiation progression from proerythroblasts (Pro-E) through basophilic (Baso-E), polychromatic (Poly-E), to orthochromatic erythroblasts (Ortho-E) recapitulates physiological human erythropoiesis, confirming the validity of our differentiation system for mechanistic studies.

      Reference:

      (6) Li J, Hale J, Bhagia P, Xue F, Chen L, et al. Isolation and transcriptome analyses of human erythroid progenitors: BFU-E and CFU-E. Blood. 2014;124(24):3636-3645.

      (7) Hu J, Liu J, Xue F, Halverson G, Reid M, et al. Isolation and functional characterization of human erythroblasts at distinct stages: implications for understanding of normal and disordered erythropoiesis in vivo. Blood. 2013;121(16):3246-3253.

      (8) Wang Y, Li W, Schulz VP, Zhao H, Qu X, et al. Impairment of human terminal erythroid differentiation by histone deacetylase 5 deficiency. Blood. 2021;138(17):1615-1627.

      (9) Li M, Liu D, Xue F, Zhang H, Yang Q, et al. Stage-specific dual function: EZH2 regulates human erythropoiesis by eliciting histone and non-histone methylation. Haematologica. 2023;108(9):2487-2502.

      (3) It is important to mention on which day the lentiviral knockdown of IDH1 was performed. Will the phenotype differ if the knockdown is performed in early vs. late erythropoiesis? In Figures 1C and 1D, on which day do the authors begin the knockdown of IDH1 and administer NAC and GSH treatments?

      We sincerely appreciate the reviewer’s question about the experimental timeline. The day CD34<sup>+</sup> cells were obtained was designated day 0. CD34<sup>+</sup> cells were transduced with IDH1-shRNA or Luciferase-shRNA lentivirus on day 2. Puromycin selection was initiated 24 h after transduction to eliminate non-transduced cells, and IDH1-KD cells were selected for 3 days. IDH1 knockdown efficiency was determined on day 7. NAC or GSH was added to the culture medium and replenished every 2 days.

      (4) While the cell phenotype of IDH1 deficiency is quite dramatic, yielding cells with larger nuclei and multiple nuclei, the authors attribute this phenotype only to defects in chromatin condensation. Is it possible that IDH1-knockdown cells also exhibit primary defects in mitosis/cytokinesis (not just secondary to the nuclear condensation defect), given the function of H3K79me in cell cycle regulation?

      We appreciate the reviewer’s insightful suggestion. We performed EdU-based cell cycle analysis on control, Sg-IDH1, and Sg-NES-IDH1 HUDEP2 cells. IDH1 deficiency resulted in an S-phase block and increased polyploidy in HUDEP2 cells on day 8 of erythroid differentiation. NES-IDH1 overexpression failed to rescue these defects, indicating that loss of nuclear IDH1 is the primary driver (Figure 3E,F). Recent studies have established a clear link between cell cycle arrest and nuclear malformation: chromosome mis-segregation, nuclear lamina disruption, mechanical stress on the nuclear envelope, and nucleolar dysfunction all contribute to nuclear abnormalities that trigger cell cycle checkpoints [10,11]. Based on these findings, we speculate that the cell cycle defect in IDH1-deficient cells might be caused by nuclear dysfunction.

      Reference:

      (10) Hong T, Hogger AC, Wang D, Pan Q, Gansel J, et al. CDK4/6 inhibition initiates cell cycle arrest by nuclear translocation of RB and induces a multistep molecular response. Cell Death Discov. 2024;10(1):453.

      (11) Hervé S, Scelfo A, Marchisio GB, Grison M, Vaidžiulytė K, et al. Chromosome mis-segregation triggers cell cycle arrest through a mechanosensitive nuclear envelope checkpoint. Nat Cell Biol. 2025;27(1):73-86. 

      (5) Why are there two bands of Histone H3 in Figure 4A?

      We sincerely appreciate the reviewer's insightful observation regarding the dual bands of histone H3 in our original Figure 4A. Upon careful investigation, we found that the doublet most likely resulted from lot-to-lot variability of the original commercial antibody. To resolve this technical artifact, we procured a new lot of histone H3 antibody and repeated the western blot under optimized conditions; the results show a single H3 band.

      (6) Displaying a heatmap and profile plots in Figure 5A between control and IDH1-deficient cells will help illustrate changes in H3K79me3 density in the nucleus after IDH1 knockdown.

      Thank you for your suggestion. As presented in Author response image 2, we performed ChIP assays on erythroblasts collected at day 15. However, the concentration of H3K79me3-bound DNA was insufficient to meet the quality threshold required for reliable sequencing. Consequently, we are unable to generate the requested heatmap and profile plots due to limitations in data integrity from this experimental condition.

      Reviewer #3 (Public Review):

      Li, Zhang, Wu, and colleagues describe a new role for nuclear IDH1 in erythroid differentiation independent from its enzymatic function. IDH1 depletion results in a terminal erythroid differentiation defect with polychromatic and orthochromatic erythroblasts showing abnormal nuclei, nuclear condensation defects, and an increased proportion of euchromatin, as well as enucleation defects. Using ChIP-seq for the histone modifications H3K79me3, H3K27me2, and H3K9me3, as well as ATAC-seq and RNA-seq in primary CD34-derived erythroblasts, the authors elucidate SIRT1 as a key dysregulated gene that is upregulated upon IDH1 knockdown. They furthermore show that chemical inhibition of SIRT1 partially rescues the abnormal nuclear morphology and enucleation defect during IDH1-deficient erythroid differentiation. The phenotype of delayed erythroid maturation and enucleation upon IDH1 shRNA-mediated knockdown was described in the group's previous co-authored study (PMID: 33535038). The authors' new hypothesis of an enzyme- and metabolism-independent role of IDH1 in this process is currently not supported by conclusive experimental evidence as discussed in more detail further below. On the other hand, while the dependency of IDH1 mutant cells on NAD+, as well as cell survival benefit upon SIRT1 inhibition, has already been shown (see, e.g., PMID: 26678339, PMID: 32710757), previous studies focused on cancer cell lines and did not look at a developmental differentiation process, which makes this study interesting.

      (1) The central hypothesis that IDH1 has a role independent of its enzymatic function is interesting but not supported by the experiments. One of the authors' supporting arguments for their claim is that alpha-ketoglutarate (aKG) does not rescue the IDH1 phenotype of reduced enucleation. However, in the group's previous co-authored study (PMID: 33535038), they show that when IDH1 is knocked down, the addition of aKG even exacerbates the reduced enucleation phenotype, which could indicate that aKG catalysis by cytoplasmic IDH1 enzyme is important during terminal erythroid differentiation. A definitive experiment to test the requirement of IDH1's enzymatic function in erythropoiesis would be to knock down/out IDH1 and re-express an IDH1 catalytic site mutant. The authors perform an interesting genetic manipulation in HUDEP-2 cells to address a nucleus-specific role of IDH1 through CRISPR/Cas-mediated IDH1 knockout followed by overexpression of an IDH1 construct containing a nuclear export signal. However, this system is only used to show nuclear abnormalities and (not quantified) accumulation of H3K79me3 upon nuclear exclusion of IDH1. Otherwise, a global IDH1 shRNA knockdown approach is employed, which will affect both forms of IDH1, cytoplasmic and nuclear. In this system and even the NES-IDH1 system, an enzymatic role of IDH1 cannot be excluded because (1) shRNA selection takes several days, prohibiting the assessment of direct effects of IDH1 loss of function (only a degron approach could address this if IDH1's half-life is short), and (2) metabolic activity of this part of the TCA cycle in the nucleus has recently been demonstrated (PMID: 36044572), and thus even a nuclear role of IDH1 could be linked to its enzymatic function, which makes it a challenging task to separate two functions if they exist.

      We appreciate the reviewer’s emphasis on rigorously distinguishing between enzymatic and enzyme-independent roles of IDH1. In our revised manuscript, we have removed all assertions of a "metabolism-independent" mechanism. Instead, we focus on demonstrating that nuclear-localized IDH1 contributes to chromatin state regulation during terminal erythropoiesis (e.g., H3K79me3 accumulation). While we acknowledge that the enzymatic activity of nuclear IDH1 may still play a role [12], our data emphasize its spatial association with chromatin remodeling. We now explicitly state that the function of nuclear IDH1 may involve both enzymatic and structural roles, and that further studies are required to dissect these mechanisms.

      Reference:

      (12) Kafkia E, Andres-Pons A, Ganter K, Seiler M, Smith TS, et al. Operation of a TCA cycle subnetwork in the mammalian nucleus. Sci Adv. 2022;8(35):eabq5206.

      (2) It is not clear how the enrichment of H3K9me3, a prominent marker of heterochromatin, upon IDH1 knockdown in the primary erythroid culture (Figure 4), goes along with a 2-3-fold increase in euchromatin. Furthermore, in the immunofluorescence (IF) experiments presented in Figure 4Db, it seems that H3K9me3 levels decrease in intensity (the signal seems more diffuse), which seems to contrast the ChIP-seq data. It would be interesting to test for localization of other heterochromatin marks such as HP1gamma. As a related point, it is not clear at what stage of erythroid differentiation the ATAC-seq was performed upon luciferase- and IDH1-shRNA-mediated knockdown shown in Figure 6. If it was done at a similar stage (Day 15) as the electron microscopy in Figure 4B, then the authors should explain the discrepancy between the vast increase in euchromatin and the rather small increase in ATAC-seq signal upon IDH1 knockdown.

      Thank you for raising this important point. We agree that while H3K9me3 and H3K27me2 modifications are detectable in the nucleus, their functional association with chromatin in this context remains unclear. Our ChIP-seq data did not reveal distinct enrichment peaks for H3K9me3 or H3K27me2 (unlike the well-defined H3K79me3 peaks), suggesting that these marks may not be stably bound to specific chromatin regions under the experimental conditions tested. However, we acknowledge that the absence of clear peaks in our dataset does not definitively rule out chromatin interactions, as technical limitations or transient binding dynamics could influence these results. To avoid over-interpretation, we have removed speculative statements about the chromatin-unbound status of H3K9me3 and H3K27me2 from the revised manuscript. This revision aligns with our broader effort to present conclusions strictly supported by the current data while highlighting open questions for future investigation.

      (3) The subcellular localization of IDH1, in particular its presence on chromatin, is not convincing in light of histone H3 being enriched in the cytoplasm on the same Western blot. H3 would be expected to be mostly localized to the chromatin fraction (see, e.g., PMID: 31408165 that the authors cite). The same issue is seen in Figure 4A.

      We sincerely appreciate the reviewer's insightful comment regarding the subcellular distribution of histone H3 in our study. We agree that histone H3 is classically associated with chromatin-bound fractions, and its cytoplasmic enrichment in our Western blot analyses appears counterintuitive at first glance. However, this observation is consistent with the unique biology of terminal erythroid differentiation, which involves drastic nuclear remodeling and histone release, a hallmark of terminal-stage erythropoiesis. Terminal erythroid differentiation is characterized by progressive nuclear condensation, chromatin compaction, and eventual enucleation. During this phase, global chromatin reorganization leads to the active eviction of histones from the condensing nucleus into the cytoplasm. This process has been extensively documented in erythroid cells, with studies demonstrating cytoplasmic accumulation of histones H3 and H4 as a direct consequence of nuclear envelope breakdown and chromatin decondensation preceding enucleation [13-16]. Our experiments specifically analyzed terminal-stage polychromatic and orthochromatic erythroblasts. At this stage, histone release into the cytoplasm is a dominant biological event, which explains the pronounced cytoplasmic H3 signal in our subcellular fractionation assays.

      In summary, the cytoplasmic enrichment of histone H3 in our data aligns with established principles of erythroid biology and reinforces the physiological relevance of our findings. We thank the reviewer for raising this critical point, which allowed us to better articulate the unique aspects of our experimental system.

      Reference:

      (13) Hattangadi SM, Martinez-Morilla S, Patterson HC, Shi J, Burke K, et al. Histones to the cytosol: exportin 7 is essential for normal terminal erythroid nuclear maturation. Blood. 2014;124(12):1931-1940.

      (14) Zhao B, Mei Y, Schipma MJ, Roth EW, Bleher R, et al. Nuclear Condensation during Mouse Erythropoiesis Requires Caspase-3-Mediated Nuclear Opening. Dev Cell. 2016;36(5):498-510.

      (15) Zhao B, Liu H, Mei Y, Liu Y, Han X, et al. Disruption of erythroid nuclear opening and histone release in myelodysplastic syndromes. Cancer Med. 2019;8(3):1169-1174. 

      (16) Zhen R, Moo C, Zhao Z, Chen M, Feng H, et al. Wdr26 regulates nuclear condensation in developing erythroblasts. Blood. 2020;135(3):208-219.

      (4) This manuscript will highly benefit from more precise and complete explanations of the experiments performed, the material and methods used, and the results presented. At times, the wording is confusing. As an example, one of the "Key points" is described as "Dyserythropoiesis is caused by downregulation of SIRT1 induced by H3K79me3 accumulation." It should probably read "upregulation of SIRT1".

      We sincerely thank the reviewer for highlighting the need for improved clarity in our experimental descriptions and textual precision. We fully agree that rigorous wording is essential to accurately convey scientific findings. Specific modifications have been made and are highlighted in Track Changes mode in the resubmitted manuscript.

      The reviewer correctly identified an inconsistency in the original phrasing of one key finding. The sentence in question ("Dyserythropoiesis is caused by downregulation of SIRT1 induced by H3K79me3 accumulation") has been revised to: "Dyserythropoiesis is caused by the upregulation of SIRT1 mediated through H3K79me3 accumulation." This correction aligns with our experimental data showing that H3K79me3 elevation promotes SIRT1 transcriptional activation. We apologize for this oversight and have verified the consistency of all regulatory claims in the text.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) It will be helpful to mention/introduce the cells used for the study at the beginning of the results section. For example, for Figure 1A neither the figure legend nor the results text includes information on the cells used.

      Thank you for the suggestion. Detailed information on the cells used in our study has been provided in the revised manuscript.

      (2) Important details for many figures are lacking. For example, in Figure 5, there is no mention of the replicates for ChIP-Seq studies. Also, the criteria used for quantifications of abnormal nuclei, % euchromatin vs heterochromatin, the numbers of biological replicates, and how many fields/cells were used for these quantifications are missing.

      We thank the reviewer for emphasizing the importance of methodological transparency, and the manuscript has been revised accordingly. The ChIP-seq data in Figure 5 were generated from three independent biological replicates to ensure reproducibility. ImageJ was used to measure nuclear, heterochromatin, and euchromatin areas and to quantify the percentages of euchromatin and heterochromatin. At least 300 cells per group were evaluated, and the proportion of aberrant nuclei was calculated as (number of abnormal cells / total counted cells) × 100%.
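The two quantities described above (percentage of aberrant nuclei and the euchromatin-to-heterochromatin area ratio) can be expressed as a short script. This is a minimal sketch, assuming that per-cell areas have already been measured in ImageJ and exported; the `NucleusMeasurement` record, its field names, and the helper functions are hypothetical and are not part of the authors' analysis. The 300-cell minimum mirrors the count stated in the response.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NucleusMeasurement:
    """One counted cell; field names are hypothetical placeholders."""
    abnormal: bool               # scored as an aberrant nucleus
    euchromatin_area: float      # euchromatin area measured in ImageJ
    heterochromatin_area: float  # heterochromatin area measured in ImageJ


def percent_aberrant(cells: List[NucleusMeasurement], min_cells: int = 300) -> float:
    """(number of abnormal cells / total counted cells) x 100%."""
    if len(cells) < min_cells:
        raise ValueError(f"need at least {min_cells} cells, got {len(cells)}")
    n_abnormal = sum(1 for c in cells if c.abnormal)
    return 100.0 * n_abnormal / len(cells)


def euchromatin_to_heterochromatin_ratio(cell: NucleusMeasurement) -> float:
    """Area ratio of euchromatin to heterochromatin for a single nucleus."""
    if cell.heterochromatin_area <= 0:
        raise ValueError("heterochromatin area must be positive")
    return cell.euchromatin_area / cell.heterochromatin_area


# Illustrative input only (not study data): 45 abnormal cells out of 300.
cells = [NucleusMeasurement(i < 45, 2.0, 1.0) for i in range(300)]
print(percent_aberrant(cells))  # 15.0
```

How per-cell ratios are aggregated per group (mean, median, or pooled areas) is not specified in the response, so the sketch stops at the per-nucleus value.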

      (3) It will be helpful if supplemental data are ordered according to how they are discussed in the text. Currently, the order of the supplemental data is hard to keep track of eg., the results section starts describing supplemental Figure 1, then the text jumps to supplemental Figure 5 followed by Supplemental Figure 3 (and so on).

      Thanks for the reviewer’s suggestion. It has been revised accordingly.

      (4) Overall, there are many incomplete sentences and typos throughout the manuscript including some of the figures e.g. on page 10 the sentence "Since the generation of erythroid with abnormal nucleus and reduction of mature red blood cells caused by IDH1 absence are notable characteristics of MDS and AML." is incomplete. On page 11, it reads "Histone post-modifications". This needs to be either histone modifications or histone post-translational modifications. In Figure 4C, the y-axis title is hard to understand "% of euchromatin and heterochromatin". Overall, the document needs to be proofread and revised carefully.

      Thank you for the suggestion. We have revised the manuscript accordingly. The sentence "Since the generation of erythroid with abnormal nucleus and reduction of mature red blood cells caused by IDH1 absence are notable characteristics of MDS and AML." has been revised to “The production of erythrocytes with abnormal nuclei and the reduction of mature erythrocytes due to IDH1 deletion are prominent features of MDS and AML.” The y-axis title “% of euchromatin and heterochromatin” in Figure 4C has been changed to “Area ratio of euchromatin to heterochromatin”.

      Reviewer #3 (Recommendations For The Authors):

      The following critique points aim to help the authors to improve their manuscript:

      (1) The authors reason (p. 10) that because mutant IDH1 has been shown to result in altered chromatin organization, this could be the case in their system, too. However, mutant IDH1 has an ascribed metabolic consequence, the generation of 2-HG, which further weakens the authors' argument for an enzymatically independent role of IDH1 in their system. The same is true for the authors' observation in Supplementary Figure 9B that in IDH1-mutant AML/MDS samples, H3K79me3 colocalized with the IDH1 mutants in the nucleus. Again, this speaks in favor of IDH1's role being linked to metabolism. The authors could re-write this manuscript, not so much emphasizing the separation of function between different subcellular forms of IDH1 but rather focusing on the chromatin changes and how they could be linked to the actual phenotype, the nuclear condensation and enucleation defect - if so, addressing the surprising finding of enrichment of both active and repressive chromatin marks will be important.

      Thank you for the suggestion. We agree with the reviewers and editors that the data presented in the current manuscript are not robust enough to rigorously distinguish between enzymatic and enzyme-independent roles of IDH1. In our revised manuscript, we have removed all assertions of a "metabolism-independent" mechanism. Instead, we focus on demonstrating that nuclear-localized IDH1 contributes to chromatin state regulation during terminal erythropoiesis (e.g., H3K79me3 accumulation).

      (2) How come so many genes were downregulated by RNA-seq (about an equal number as upregulated genes) but not more open by ATAC-seq? The authors should discuss this result.

      Thank you for the suggestion. ATAC-seq showed increased chromatin accessibility after IDH1 deletion, yet the number of upregulated genes was only slightly larger than the number of downregulated genes; the substantial downregulation may be caused by metabolic changes resulting from IDH1 deletion. To explore how changes in chromatin accessibility affect gene expression after IDH1 deletion, we analyzed differential gene expression at differential ATAC peak regions (as shown in the figure below). Genes located at ATAC peak regions with increased chromatin accessibility were significantly upregulated, which may explain how chromatin accessibility regulates gene expression in this setting.
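The association analysis described above (expression changes of genes located at differential ATAC peaks) can be illustrated with a toy sketch. It assumes peaks and genes are already called and annotated with log2 fold changes (IDH1-shRNA vs. control), and it links a gene to a peak when the gene's TSS ±1 kb overlaps that peak. The ±1 kb window, the `Peak`/`Gene` records, and the use of group medians are illustrative assumptions rather than the authors' pipeline, and no statistical test is performed here.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass
class Peak:
    chrom: str
    start: int
    end: int
    log2fc: float  # change in ATAC accessibility (hypothetical field)


@dataclass
class Gene:
    name: str
    chrom: str
    tss: int
    log2fc: float  # change in RNA-seq expression (hypothetical field)


def genes_at_peaks(peaks: List[Peak], genes: List[Gene], window: int = 1000) -> List[Gene]:
    """Return genes whose TSS +/- window overlaps at least one peak."""
    hits = []
    for g in genes:
        lo, hi = g.tss - window, g.tss + window
        if any(p.chrom == g.chrom and p.start <= hi and p.end >= lo for p in peaks):
            hits.append(g)
    return hits


def expression_by_peak_direction(peaks: List[Peak], genes: List[Gene],
                                 window: int = 1000) -> Dict[str, float]:
    """Median expression change of genes at gained vs. lost accessibility peaks."""
    groups = {
        "gained": [p for p in peaks if p.log2fc > 0],
        "lost": [p for p in peaks if p.log2fc < 0],
    }
    result = {}
    for label, subset in groups.items():
        hits = genes_at_peaks(subset, genes, window)
        result[label] = median(g.log2fc for g in hits) if hits else float("nan")
    return result


# Toy input only (not study data).
peaks = [Peak("chr1", 900, 1200, 1.5), Peak("chr1", 50_000, 50_400, -1.0)]
genes = [Gene("A", "chr1", 1000, 2.0), Gene("B", "chr1", 1500, 1.0),
         Gene("C", "chr1", 50_200, -0.5), Gene("D", "chr2", 1000, 3.0)]
print(expression_by_peak_direction(peaks, genes))  # {'gained': 1.5, 'lost': -0.5}
```

In a real analysis, peak-to-gene assignment would normally be done with established annotation tools and the group difference tested statistically; the nested loop here is chosen only for readability.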

      (3) For the ChIP-seq analyses of H3K79me3, H3K27me2, and H3K9me3, the authors should not just show genome-wide data but also several example gene tracks to demonstrate the differential abundance of peaks in control versus IDH1 knockdown. Furthermore, the heatmap shown in Figure 5A should include broader regions spanning the gene bodies, to visualize the intergenic H3K27me2 and H3K9me3 peaks. Expression could very well be regulated from these intergenic regions as they could bear enhancer regions. ChIP-seq for H3K27Ac in the same setting would be very useful to identify those enhancers.

      Thank you for the suggestion; the analysis has been revised accordingly. We reanalyzed the ChIP-seq signals of H3K79me3, H3K27me2, and H3K9me3 over a wider region spanning gene bodies plus ±5 kb flanks, and the H3K27me2 and H3K9me3 signals did not change significantly. Because H3K79me3 showed a stronger signal and was enriched mainly in promoter regions, we considered it more informative to focus our subsequent analyses on how H3K79me3 accumulation affects chromatin accessibility and gene expression.
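The gene-body heatmaps with ±5 kb windows described above follow the common "scale-regions" layout: a fixed upstream flank, the gene body rescaled to a fixed number of bins, and a fixed downstream flank. The sketch below is a minimal illustration assuming a per-base coverage list for one chromosome; the function names and bin counts are hypothetical, and such matrices are usually produced with dedicated tools (for example, deepTools `computeMatrix scale-regions`) rather than hand-written code.

```python
from typing import List, Sequence


def mean(values: Sequence[float]) -> float:
    """Average of a chunk; empty chunks (very short regions) count as 0.0."""
    return sum(values) / len(values) if values else 0.0


def bin_means(values: Sequence[float], n_bins: int) -> List[float]:
    """Split values into n_bins contiguous chunks and average each one."""
    n = len(values)
    return [mean(values[i * n // n_bins:(i + 1) * n // n_bins]) for i in range(n_bins)]


def scaled_gene_profile(coverage: Sequence[float], start: int, end: int,
                        strand: str = "+", flank: int = 5000,
                        flank_bins: int = 10, body_bins: int = 20) -> List[float]:
    """Upstream flank | gene body rescaled to body_bins | downstream flank.

    start < end are genomic coordinates of the gene body (0-based, end-exclusive).
    """
    left = bin_means(coverage[max(0, start - flank):start], flank_bins)
    body = bin_means(coverage[start:end], body_bins)
    right = bin_means(coverage[end:end + flank], flank_bins)
    profile = left + body + right
    # On the minus strand the TSS sits at `end`, so flip to read TSS -> TES.
    return profile[::-1] if strand == "-" else profile


# Toy coverage: signal only inside a 2 kb gene body flanked by 5 kb on each side.
coverage = [0.0] * 5000 + [1.0] * 2000 + [0.0] * 5000
profile = scaled_gene_profile(coverage, start=5000, end=7000)
```

Stacking one such profile per gene gives the rows of a heatmap, and averaging rows gives the profile plot; the signal normalization used for the figure is not specified here.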

      Author response image 3.

      ChIP-seq signals of H3K79me3, H3K27me2, and H3K9me3. (A) Heatmaps of normalized ChIP signal for H3K9me3, H3K27me2, and H3K79me3 across gene bodies; windows extend ±5 kb beyond the gene body. TES, transcriptional end site; TSS, transcriptional start site. (B) Representative tracks showing normalized ChIP signal of H3K9me3, H3K27me2, and H3K79me3 at gene body regions.

      (4) The absent or very mild delay (also no significance visible in the quantification plots) in the generation of orthochromatic erythroblasts on Day 13 upon IDH1 shRNA knockdown as per a4-integrin/Band3 flow cytometry does not correspond to the already quite prominent number of multinucleated cells at that stage seen by cytospin/Giemsa staining. Why do the authors think this is the case? Cytospin/Giemsa staining might be the better method to quantify this phenotype and the authors should quantify the cells at different stages in at least 100 cells from non-overlapping cytospin images.

      Thank you for the suggestion. We have added a cytospin assay, and the results are presented in Supplemental Figure 4.

      (5) The pull-down assay in Figure 7E does not show a specific binding of H3K79me3 to the SIRT1 promoter. Rather, there is just more H3K79me3 in the nucleus, thus leading to generally increased binding. The authors should show that H3K79me3 does not bind more just everywhere but to specific loci. The ChIP-seq data mention only categories but don't show any gene lists that could hint at the specificity of H3K79me3 binding at genes that would promote nuclear abnormalities and enucleation defects.

      We thank the reviewer for pointing this out. GSEA of the H3K79me3 peaks showed enrichment of chromatin-related biological processes, and the associated genes are listed in Figure 7B. In addition, we display the changes in H3K79me3 peak signals, ATAC peak signals, and gene expression at the loci of three chromatin-associated genes (SIRT1, KMT5A, and NUCKS1).

      (6) P. 12: "Representatively, gene expression levels and ATAC peak signals at SIRT1 locus were elevated in IDH1-shRNA group and were accompanied by enrichment of H3K9me3 (Figure 7F)." Figure 7F does not show an enrichment of H3K9me3, but if the authors found such, they should explain how this modification correlates with the activation of gene expression.

      Thank you for bringing this issue to our attention. We sincerely apologize for the mistake in the description of Figure 7F on page 12. We have already corrected this error in the revised manuscript.

      (7) Related to the mild phenotype by flow cytometry on Day 13, are the "3 independent biological replicates" from culturing and differentiating CD34 cells from 3 different donors? If all are from the same donor, experiments from at least a second donor should be performed to generalize the results.

      In our current study, CD34<sup>+</sup> cells were derived from different donors. 

      (8) If the images in Supplementary Figure 4 are only the indicated cell type, then it is not clear how the data were quantified since only some cells in each image are pointed at and others do not seem to have as large nuclei. There is also no explanation in the legend what the colors mean (nuclei were presumably stained with DAPI, not clear what the cytoplasm stain is - GPA?).

      We thank the reviewer for pointing this out and have revised the manuscript accordingly. Nuclei were stained with DAPI (blue), and the cell membrane was stained with GPA (red). This staining allows clear visualization of cell structure and helps to clarify the localization of the proteins of interest.

      (9) It is not clear to this reviewer whether Figure 4F is a quantification of the Western Blot or of the IF data.

      Figure 4F is a quantification of the Western Blot experiment.

      (10) The authors sometimes do not describe experiments well, e.g., "treatment of IDH1-deficient erythroid cells with IDH1-EX527" (p. 13). EX-527 is a SIRT1 inhibitor, which the authors only explicitly mention later in that paragraph. It is unclear to this reviewer, why the authors call it IDH1-EX527.

      Thank you for pointing out the unclear description in our manuscript. We apologize for the confusion caused by the unclear statement. We have revised the manuscript accordingly. The compound EX-527 is a SIRT1 inhibitor, and we have corrected the description to simply "EX-527" in the revised manuscript.

      (11) The end of the introduction needs revising to be more concise; the last paragraph on p. 4 ("Recently, the decreased expression of IDH1...") partially should be integrated with the previous paragraph, and partially is repeated in the last paragraph (top paragraph on p. 5). The last sentence on p. 4, "These findings strongly suggest that aberrant expression of IDH1 is also an important factor in the pathogenesis of AML and MDS.", should rather read "increased expression of IDH1", to distinguish it from mutant IDH1 (mutant IDH1 is also aberrantly expressed IDH1).

      We thank the reviewer for the helpful suggestion. Because this paragraph did not contribute meaningfully to the formulation of the scientific question, we removed it after careful consideration, and the revised introduction now flows more logically.

      (12) Abstract and last sentence of the introduction: "innovative perspective" should be re-worded, as the authors present data, not a perspective. Maybe could use "evidence".

      Thanks for the reviewer’s suggestion. It has been revised accordingly.

      (13) "IDH1-mut AML/MDS" on p. 11. The authors should provide more information about these AML/MDS samples. The legend contains no information about them/their mutational status. How many samples did the authors look at? Do these cells contain mutations other than IDH1?

      Thank you for the suggestion. Detailed information on these AML/MDS samples is provided in Supplemental Table 1. In the current study, we collected ten AML/MDS samples, and most of them carry only IDH1 mutations, at different sites.

      (14) The statement, "Taken together, these results indicated that IDH1 deficiency reshaped chromatin states and subsequently altered gene expression pattern, especially for genes regulated by H3K79me3, which was the mechanism underlying roles of IDH1 in modulation of terminal erythropoiesis." (p. 10), is not correct at that point in the manuscript as the authors have not yet introduced the RNA-seq data.

      Thanks for the reviewer’s suggestion. The statement has been revised to “Taken together, these results indicated that IDH1 deficiency reshaped chromatin states by altering the abundance and distribution of H3K79me3, which was the mechanism underlying roles of IDH1 in modulation of terminal erythropoiesis”.

      (15) For easier readability, the authors should present the data in order. For example, the supplemental data for IDH shRNA and siRNA should be presented together and not in Supplementary Figures 1 and 5. Supplementary Figure 3 is mentioned after Supplementary Figure 1, but before Supplementary Figure 2 - again, all data need to be presented in subsequent figures to be viewed together.

      Thank you for your suggestion regarding the order of data presentation. We have reorganized the figures in the manuscript to improve readability. We apologize for any confusion caused by the previous arrangement and hope that the revised version meets your expectations.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Fallah and colleagues characterize the connectivity between two basal ganglia output nuclei, the SNr and GPe, and the pedunculopontine nucleus, a brainstem nucleus that is part of the mesencephalic locomotor region. Through a series of systematic electrophysiological studies, they find that these regions target and inhibit different populations of neurons, with anatomical organization. Overall, SNr projects to PPN and inhibits all major cell types, while the GPe inhibits glutamatergic and GABAergic PPN neurons, and preferentially in the caudal part of the nucleus. Optogenetic manipulation of these inputs had opposing effects on behavior - SNr terminals in the PPN drove place aversion, while GPe terminals drove place preference.

      Strengths:

      This work is a thorough and systematic characterization of a set of relatively understudied circuits. They build on the classic notions of basal ganglia connectivity and suggest a number of interesting future directions to dissect motor control and valence processing in brainstem systems.

      We thank the reviewer for these positive comments.

      Weaknesses:

      Characterization of the behavioral effects of manipulations of these PPN input circuits could be further parsed, for a better understanding of the functional consequences of the connections demonstrated in the ephys analyses.

      We have further analyzed our behavioral data to reveal more nuanced functional effects and included these analyses in Figure S2.

      All the cell type recording studies showing subtle differences in the degree of inhibition and anatomical organization of that inhibition suggest a complex effect of general optogenetic manipulation of SNr or GPe terminals in the PPN. It will be important to determine if SNr or GPe inputs onto a particular cell type in PPN are more or less critical for how the locomotion and valence effects are demonstrated here.

      This is an interesting future direction, and we have expanded on these points in the discussion (lines 771-772 and 782-785).

      Reviewer #1 (Recommendations for the authors):

      (1) Overall these are really valuable studies and help set up a number of future directions.

      We thank the reviewer for their positive comments.

      (2) I don't have many specific suggestions, but more examples of viral targeting and cell type targeting, including potentially some validation of the genetic identity of the cells targeted, could be useful for considering the details of the ephys experiments.

      We agree that understanding which exact SNr and GPe neurons go to which exact PPN populations is an important next step and are planning to conduct future experiments investigating these important questions. Others have found that there is minimal overlap between the three cell types within the PPN discussed in this manuscript (Wang and Morales 2009; Yoo et al. 2017; Steinkellner, Yoo, and Hnasko 2019). One important line of future investigations is to look at the specific inputs onto recently identified subsets of the glutamatergic PPN neurons such as Chx10- and Rbp4-expressing neurons (Goñi-Erro et al. 2023; Ferreira-Pinto et al. 2021). We hope to explore the electrophysiological properties and connectivity of these subtypes in future projects.

      (3) More discussion of which PPN cell types might be mediating the optogenetic behavioral effects of bulk SNr or GPe terminal stimulation would be useful for connecting the ephys results with the behavior.

      We are also interested in the question of which PPN cell type is most critical for mediating the effects observed in bulk terminal stimulation. While the best experiment would be to stimulate the axons projecting to each specific cell type of the PPN, this is not currently possible due to methodological limitations and lack of studies dissecting which SNr and GPe subpopulations project to each cell type of the PPN. However, in future studies, we plan to leverage the ability of AAV1 to jump a synapse along with Cre/Flp viruses and mouse lines to selectively inhibit cholinergic, GABAergic, or glutamatergic PPN neurons that receive GPe or SNr input to elucidate the contribution of each cell type in mediating behavioral changes in movement and valence processing.

      To address these important future directions, we have added additional text in the discussion in lines 771-772 and 782-785.

      Reviewer #2 (Public review):

      Summary:

      Fallah et al carefully dissect projections from SNr and GPe - two key basal ganglia nuclei - to the PPN, an important brainstem nucleus for motor control. They consider inputs from these two areas onto 3 types of downstream PPN neurons: GABAergic, glutamatergic, and cholinergic neurons. They also carefully map connectivity along the rostrocaudal axis of the PPN.

      Strengths:

      The slice electrophysiology work is technically well done and provides useful information for further studies of PPN. The optogenetics and behavioral studies are thought-provoking, showing that SNr and GPe projections to PPN play distinct roles in behavior.

      We appreciate the reviewer’s positive evaluation.

      Weaknesses:

      Although the optogenetics and behavioral studies are intriguing, they are somewhat difficult to fit together into a specific model of circuit function. Perhaps the authors can work to solidify the connection between these two arms of the work.

      We have expanded on these topics in the discussion.

      Otherwise, there are a few questions whose answers could add context to the interpretation of these results:

      (1) Male and female mice are used, but the authors do not discuss any analysis of sex differences. If there are no sex differences, it is still useful to report data disaggregated by sex in addition to pooled data.

      We have added a supplementary figure (Figure S2) showing distance traveled during optical stimulation for male and female mice.

      (2) There is some lack of clarity in the current manuscript on the ages used - 2-5 months vs "at least 7 weeks." Is 7 weeks the time of virus injection surgery, then recordings 3 weeks later (at least 10 weeks)? Please clarify if these ages apply equally to electrophysiological and behavioral studies. If the age range used for the test is large, it may be useful to analyze and report if there are age-related effects.

      Thanks for pointing this out; we have clarified this in the methods. Mice used for electrophysiology were at least 7 weeks old at injection and were recorded at 2-5 months of age. For behavior, the youngest mice were 11 weeks old at the time of testing (8 weeks old at injection). Mice in the GPe-stimulated condition were 110 ± 7.4 days old (mean ± SEM), and mice in the SNr-stimulated condition were 132 ± 23.4 days old. We have added these details to the revised manuscript in lines 913 and 963-964.

      In addition, we correlated distance traveled at baseline and during stimulation with age for both the SN- and GPe-stimulated conditions. Baseline distance traveled did not correlate with age, but there was a trend toward more movement during stimulation in older mice in the SN axon stimulation group. We have included these plots in supplemental Figure S2.

      (3) Were any exclusion criteria applied, e.g. to account for missed injections?

      All injection sites and implant sites were within our range of acceptability, so we did not exclude any mice for missed injections or incorrect implant location.

      (4) 28-34 °C is a fairly wide range of temperatures for electrophysiological recording, which could affect kinetics.

      This is an important consideration, and we agree that the wide temperature range is not optimal. We plotted our main measurement, current amplitude, against temperature for the condition in which we found significant differences between rostral and caudal PPN (SNr to Vglut2 PPN neurons) and found no correlation (Pearson’s r = -0.0076). Similarly, we found no correlation between baseline (pre-opto) firing frequency and temperature (r = -0.068). See Author response image 1.

      Author response image 1.
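
      The temperature check described above amounts to a Pearson correlation between each cell's recording temperature and the measured value. A minimal, dependency-free sketch; the function name and example values are hypothetical, not the recorded data:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one sequence is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-cell values, for illustration only (not the authors' data).
temps_c = [28.5, 30.1, 31.0, 32.4, 33.8]
amplitudes_pa = [-152.0, -98.0, -210.0, -130.0, -171.0]
r = pearson_r(temps_c, amplitudes_pa)
print(f"Pearson r = {r:.3f}")
```

      An r near zero argues against a linear dependence on temperature but not against a nonlinear one, which is why plotting the values, as in the response image, remains useful.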

      (5) It would be good to report the number of mice used for each condition in addition to n=cells. Statistically, it would be preferable not to assume that each cell from the same mouse is an independent measurement and to use a nested ANOVA.

      For electrophysiology, 6 mice (3 male, 3 female) were used in each experiment. In the manuscript, ‘N’ denotes the number of mice and ‘n’ the number of cells. Because the number of healthy cells that can be recorded from one mouse is unpredictable, our experiments were designed with cells as the unit of analysis (n = cells) and are underpowered for a nested ANOVA.

      However, in many cases, rostral and caudal data were collected from the same mice. While we do not have sufficient paired data for each electrophysiological parameter, analyzing one of our main and most important findings with a paired comparison (with biological replicates being mice) shows a statistically significant difference in the inhibitory effect of SNr axon stimulation on firing rate between rostral and caudal glutamatergic neurons (p=0.031, Wilcoxon signed rank test). See Author response image 2.

      Author response image 2.
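
      For the paired comparison, an exact Wilcoxon signed-rank test can be computed by enumerating every assignment of signs to the ranked differences. A minimal sketch, assuming one rostral and one caudal value per mouse; the function names and inputs are hypothetical. With six pairs and no ties or zero differences, the smallest attainable two-sided p-value is 2/2^6 = 0.03125.

```python
from itertools import product


def abs_ranks(d):
    """Ranks of |d| (1 = smallest), with tied values given their average rank."""
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks


def wilcoxon_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples x, y.

    Zero differences are dropped. All 2**n sign patterns are enumerated,
    so this is only practical for small n (e.g. one pair per mouse).
    Returns (W_plus, p_value).
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    r = abs_ranks(d)
    total = sum(r)
    w_plus = sum(rk for rk, di in zip(r, d) if di > 0)
    observed = min(w_plus, total - w_plus)
    extreme = 0
    for signs in product((False, True), repeat=n):
        wp = sum(rk for rk, positive in zip(r, signs) if positive)
        if min(wp, total - wp) <= observed + 1e-9:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical per-mouse values (e.g. % inhibition), rostral vs caudal.
rostral = [10, 9, 8, 7, 6, 5]
caudal = [9, 7, 5, 3, 1, -1]
print(wilcoxon_exact(rostral, caudal))
```

      `scipy.stats.wilcoxon` can compute this test directly; the enumeration is shown only to make the null distribution explicit.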

      Reviewer #3 (Public review):

      Summary:

      The study by Fallah et al provides a thorough characterization of the effects of two basal ganglia output pathways on cholinergic, glutamatergic, and GABAergic neurons of the PPN. The authors first found that SNr projections spread over the entire PPN, whereas GPe projections are mostly concentrated in the caudal portion of the nucleus. Then the authors characterized the postsynaptic effects of optogenetically activating these basal ganglia inputs and identified the PPN's cell subtypes using genetically encoded fluorescent reporters. Activation of inputs from the SNr inhibited virtually all PPN neurons. Activation of inputs from the GPe predominantly inhibited glutamatergic neurons in the caudal PPN, and to a lesser extent GABAergic neurons. Finally, the authors tested the effects of activating these inputs on locomotor activity and place preference. SNr activation was found to increase locomotor activity and elicit avoidance of the optogenetic stimulation zone in a real-time place preference task. In contrast, GPe activation reduced locomotion and increased the time in the RTPP stimulation zone.

      Strengths:

      The evidence of functional connectivity of SNr and GPe neurons with cholinergic, glutamatergic, and GABAergic PPN neurons is solid and reveals a prominent influence of the SNr over the entire PPN output. In addition, the evidence of a GPe projection that preferentially innervates the caudal glutamatergic PPN is unexpected and highly relevant for basal ganglia function.

      The study also shows opposing effects of two basal ganglia outputs on locomotion and valence through their connectivity with the PPN.

      Overall, these results provide an unprecedented cell-type-specific characterization of the effects of basal ganglia inputs in the PPN and support the well-established notion of a close relationship between the PPN and the basal ganglia.

      We thank the reviewer for their positive comments.

      Weaknesses:

      The behavioral experiments require further analysis as some motor effects could have been averaged out by analyzing long segments.

      We have further analyzed our motor effects and included these analyses in supplemental figure S2 in the revised manuscript.

      Additional controls are needed to rule out a motor effect in the real-time place preference task.

      To address this comment, we analyzed the second day of RTPP, where no stimulation was applied in either chamber. Specifically, we evaluated the time spent in the stimulated chamber during the first minute of the unstimulated RTPP task. We found that the mice that had SNr axon stimulation still avoided the previously stimulated chamber and the mice that had GPe axon stimulation still preferred the previously stimulated chamber. These data have been added to Figure 7 and in the results section lines 564-575.

      Importantly, the location of the stimulation is not reported even though this is critical to interpret the behavioral effects.

      The implant locations were generally over the middle-to-rostral PPN, and we have clarified this in the revised manuscript. These locations are shown in Figure 7B.

      There are some concerns about the possible recruitment of dopamine neurons in the SNr experiments.

      We have added experiments stimulating the SNc dopaminergic neuron axons in the PPN and found very interesting behavioral effects. These are described in more detail below and in the results lines 595-624. These data are also included in Figure S3.

      Reviewer #3 (Recommendations for the authors):

      (1) Locomotor activity should be analyzed as trial averages instead of session averages. The effect of SNr on locomotion might be showing a rebound of activity in cholinergic neurons, which innervate dopamine neurons and induce locomotion. Furthermore, the variability between animals should be reported, Figure 7C doesn't show a standard deviation.

      This is an important point and could reveal different early and late effects of basal ganglia axon stimulation. We have added a time course graph of the trial averages for the distance traveled in the open field with higher temporal resolution (10s vs 1min). This is included in supplemental Figure S2A&B.

      The variability between animals was shown as a shaded area, but the shading was too light to see clearly in Figure 7C. We have replaced it with error bars for better visibility.
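
      Trial averages at 10 s resolution can be obtained by binning each trial's path length and then averaging bins across trials, with SEM as the between-animal variability. A minimal sketch; the tracking format `(t_s, x_cm, y_cm)`, the bin assignment rule and the function names are assumptions:

```python
import math
from statistics import mean, stdev


def binned_distance(samples, bin_s, n_bins):
    """Distance traveled (cm) in consecutive time bins for one trial.

    samples: time-ordered (t_s, x_cm, y_cm) tuples. Each frame-to-frame step
    is assigned to the bin containing its start time (an assumption).
    """
    bins = [0.0] * n_bins
    for (t0, x0, y0), (_t1, x1, y1) in zip(samples, samples[1:]):
        b = int(t0 // bin_s)
        if 0 <= b < n_bins:
            bins[b] += math.hypot(x1 - x0, y1 - y0)
    return bins


def trial_average(trials, bin_s=10.0, n_bins=6):
    """Per-bin mean and SEM of distance across trials (or animals)."""
    per_trial = [binned_distance(s, bin_s, n_bins) for s in trials]
    columns = list(zip(*per_trial))
    means = [mean(c) for c in columns]
    sems = [stdev(c) / math.sqrt(len(c)) if len(c) > 1 else 0.0 for c in columns]
    return means, sems


# Hypothetical trials, for illustration only.
trial_a = [(0, 0, 0), (5, 3, 4), (15, 3, 4), (25, 6, 8)]
trial_b = [(0, 0, 0), (10, 0, 1), (20, 0, 3)]
print(trial_average([trial_a, trial_b], bin_s=10.0, n_bins=3))
```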

      (2) SNc projects to the PPN. It has recently been shown that PPN neurons respond robustly to dopaminergic activation, including effects on motor activity (Juarez Tello et al., 2024). The transductions shown in Figure S1 clearly cover the entirety of the SNc. Dopamine blockers should be used in the ex vivo experiments to rule out dopaminergic effects.

      This is an important point, and one we were particularly interested in for the behavioral experiments. We thank the reviewer for raising it, because it led us to a very interesting result. We have now run an additional experiment using DAT-cre mice and a cre-dependent ChR2 at the same injection site as our constitutive ChR2 experiments. We found that selectively stimulating SNc dopaminergic axons replicates both the increased locomotion at high laser power and the absence of a change in locomotion at low laser power seen in our constitutive ChR2 experiment. However, selective activation of dopaminergic axons in the PPN is rewarding at both high and low power, whereas constitutive ChR2 activation is aversive. We have added these data to supplemental Figure S3 and have added text to the results (lines 595-624) and discussion (lines 695-734) about this new finding.

      While we cannot exclude a dopaminergic influence on the electrophysiology experiments (via changes in input resistance or channel properties), the fast synaptic currents we measured are uncharacteristic of inhibitory D2 receptor currents (which would be slow) and are blocked by the GABA-A receptor antagonist gabazine.

      (3) Activation of glutamatergic neurons in the caudal PPN elicits locomotion while the same stimulation in the rostral PPN terminates locomotion. In line with this, the authors report important differences in glutamatergic neurons in the rostral vs caudal PPN (Fig. 5). For the behavioral experiments, the location of the optic fiber is not reported. This is essential for the interpretation of the behavioral experiments. Based on the recent literature, inhibiting glutamatergic neurons in the rostral and in the caudal PPN will produce opposing effects.

      We absolutely agree that the rostral and caudal PPN differences are functionally important. In Figure 7B, we have mapped the location of the optical fiber tip for each experiment. Our implants were generally in the rostral-middle part of the PPN, and we have added this to the methods section of the revised manuscript in lines 887 and 1048. While few implants were specifically rostral or specifically caudal, we did evaluate the behavioral response for our most rostrally located implant and our most caudally located implant in the SN axon stimulation experiment. Low-power laser activation of nigral axons at the most rostral implant increased locomotion, whereas at the most caudal implant it decreased locomotion. Increased locomotion is exactly what we would expect when rostral PPN neurons (which normally inhibit movement) are preferentially inhibited, and decreased locomotion is what we would expect when caudal PPN neurons (which normally promote movement) are inhibited. Future experiments with more precise rostral and caudal implant locations will be needed to fully parse out the functional roles of rostral vs caudal PPN. See Figure S4 (two green implant sites are circled for one mouse because the implants were bilateral).

      (4) Even though the authors made an effort to dissect out the motor component during the RTPP task, this was not entirely achieved. Low laser power was still able to decrease activity following GPe stimulation, causing the animal to spend more time in the stimulated compartment. It is not clear the reason for using RTPP as opposed to CPP, which will not have the confound of the effects on motor activity. The interpretation of these data is problematic.

      This is an important consideration, and the reviewer is correct that we can’t completely eliminate a motor contribution to our RTPP experiment. We attempted to minimize potential motor confounds by utilizing unilateral stimulation and our supplemental videos show that the mice can escape the stimulated chamber.

      However, to address this comment, we analyzed the second day of RTPP, where no stimulation was applied in either chamber. Specifically, we evaluated the time spent in the stimulated chamber during the first minute of the unstimulated RTPP task. We found that the mice that had SN stimulation still avoided the previously stimulated chamber and the mice that had GPe axon stimulation still preferred the previously stimulated chamber. These data have been added as Figure 7G and in the results section lines 564-575.

      (5) The resting membrane potential for cholinergic, glutamatergic, and GABAergic neurons is not reported.

      Since a majority of PPN neurons are spontaneously active, we have reported the average membrane voltage during the pre-optical stimulation period in supplementary table 1.

      (6) During the RTPP, the animals were stimulated unilaterally with the purpose of reducing the optogenetic effects on locomotion, but no data support this claim. Please report the locomotor measurements during unilateral stimulation.

      To address this comment, we have analyzed the speed of the mouse in each compartment (stimulated vs non-stimulated) during the RTPP task. We found that the mean speed does differ, in the direction expected (i.e., mice are on average slower in the GPe stimulated zone where they spend more time, and mice are on average faster in the SNr stimulated zone where they spend less time). This is expected because when the mouse spends more time in a zone, it is more likely to spend time grooming or staying still, but it could still be evidence of motor response to the stimulation. To evaluate how fast the mouse is able to move with and without unilateral stimulation, we measured maximum speed in the stimulated and unstimulated zone. We found that maximum speed does not differ between stimulated and unstimulated zones in either the SNr or GPe group. See Author response image 3.

      Author response image 3.
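
      The mean-versus-maximum speed comparison can be sketched from tracking data labeled by zone. The sample format, the zone attribution rule and the function name are hypothetical:

```python
import math


def zone_speeds(samples):
    """Mean and maximum speed (cm/s) in the stimulated and unstimulated zones.

    samples: time-ordered (t_s, x_cm, y_cm, in_stim_zone) tuples.
    Each frame-to-frame speed is credited to the zone of the earlier sample
    (an assumption). Returns {in_stim_zone: (mean_speed, max_speed)}.
    """
    speeds = {True: [], False: []}
    for (t0, x0, y0, zone), (t1, x1, y1, _next_zone) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        speeds[zone].append(math.hypot(x1 - x0, y1 - y0) / dt)
    return {z: (sum(v) / len(v), max(v)) for z, v in speeds.items() if v}


# Hypothetical track, for illustration only.
track = [(0, 0, 0, True), (1, 3, 4, True), (2, 3, 4, False), (3, 3, 5, False)]
print(zone_speeds(track))
```

      Because a maximum over raw frame-to-frame speeds is sensitive to tracking jitter, smoothing the trajectory or using a high percentile instead of the strict maximum may be a more robust choice.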

      (7) Given the similarity of the parameters evaluated for all three PPN cell types, the results could be presented in a table, it will be easier to summarize.

      This is a good point and we have added supplemental tables 1-4 for key electrophysiological findings.

      (8) The text is repetitive in some parts.

      We have gone through the results to edit out repetitive text. For example, lines 244-260 and 274-287 have been rewritten for clarity and efficiency.

      (9) Lines 609-620: the behavioral effects after SNr stimulation are not mediated by the PPN, please correct.

      We have corrected this.

      (10) The number of patched GABAergic neurons in the caudal PPN is almost double the number of patched neurons in the rostral PPN. This contrasts with the high density of GABAergic neurons in the rostral PPN reported in the literature, and therefore, the probability of recording GABAergic neurons will be much higher in the rostral PPN. Please comment.

      It is true that there are more GABAergic neurons in the rostral region, but on a sagittal slice the rostral region occupies a smaller area than the caudal region, and there is a notable cluster of GABAergic neurons in the caudal region (Mena-Segovia et al. 2009). The number of visible, healthy cells with fluorescence clearly above background in the heavily myelinated PPN tissue is unpredictable, and it is possible that the dense packing of GABAergic neurons in the rostral region blurs the fluorescence of individual somata, making rostral neurons harder to identify. While we did our best to patch rostral and caudal neurons equally, based on our judgment during the experiment, neurons were ultimately designated as ‘rostral’ or ‘caudal’ after post-hoc staining for cholinergic neurons, as described below.

      (11) Describe how the rostral and caudal PPN regions were defined and how the authors ensured consistency across recordings.

      We have added more details about the definition of rostral vs caudal PPN to the methods in lines 1042-1053.

      (12) Please report the proportion of GABAergic neurons showing STD vs STP for rostral and caudal PPN. The data in Figure 3 might be averaging out some important differences. Figure 3L suggests some differences in the proportions.

      The variability within the GABAergic population was really interesting and we plan to pursue this in the future. We have defined STD as PPR<0.95 and STP as PPR>1.05 and added the proportions of caudal and rostral GABAergic PPN neurons with each type of short-term synaptic plasticity to lines 253-257.
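
      The classification rule stated above (STD for PPR < 0.95, STP for PPR > 1.05) can be written as a short function, with proportions tallied per region. A minimal sketch assuming PPR = second amplitude / first amplitude; the label "neither" for the intermediate band, the function names and the example amplitudes are hypothetical:

```python
def classify_ppr(amp1, amp2, std_below=0.95, stp_above=1.05):
    """Label one cell from its paired-pulse ratio PPR = amp2 / amp1."""
    ppr = amp2 / amp1
    if ppr < std_below:
        return "STD"
    if ppr > stp_above:
        return "STP"
    return "neither"  # hypothetical label for 0.95 <= PPR <= 1.05


def proportions_by_region(cells):
    """cells: (region, amp1, amp2) tuples -> {region: {label: fraction}}."""
    counts = {}
    for region, amp1, amp2 in cells:
        label = classify_ppr(amp1, amp2)
        region_counts = counts.setdefault(region, {})
        region_counts[label] = region_counts.get(label, 0) + 1
    return {
        region: {label: n / sum(c.values()) for label, n in c.items()}
        for region, c in counts.items()
    }


# Hypothetical IPSC amplitudes (pA); the sign cancels in the ratio.
cells = [("rostral", -80.0, -60.0), ("rostral", -50.0, -65.0),
         ("caudal", -40.0, -41.0), ("caudal", -90.0, -70.0)]
print(proportions_by_region(cells))
```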

      (13) Please report whether the mice’s compartment preferences during the habituation were taken into account for the selection of the laser-on compartment.

      Mice were not habituated to the chamber in the unstimulated condition prior to the RTPP experiment. Laser-on side was randomly chosen and counter-balanced between mice. Mice were also randomly assigned to have low laser power RTPP first or high laser power RTPP first. In each case, mice were given an unstimulated 10-minute trial on the day between the first and second RTPP experiment to ‘unlearn’ which side was stimulated and the second RTPP experiment stimulated the opposite chamber compared to the first RTPP experiment. For example, one mouse would have high power stimulation on the striped side on day 1, no stimulation on day 2, and low power stimulation on the spotted side on day 3. This is now explained more thoroughly in lines 564-575 and lines 992-998.
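
      The counterbalanced three-day design can be sketched as a schedule generator. This is one possible scheme, assuming the four combinations of laser-on side and first laser power are balanced across mice; the function name, side labels and seed are hypothetical:

```python
import random

SIDES = ("striped", "spotted")
POWERS = ("high", "low")


def rtpp_schedule(mouse_ids, seed=0):
    """Hypothetical 3-day RTPP plan for each mouse.

    Day 1: stimulation at one power on one side. Day 2: no stimulation.
    Day 3: the other power on the opposite side. The four (side, first power)
    combinations are shuffled once and cycled over shuffled mice, so each
    combination is used equally often when the group size is a multiple of 4.
    """
    rng = random.Random(seed)
    combos = [(side, power) for side in SIDES for power in POWERS]
    rng.shuffle(combos)
    mice = list(mouse_ids)
    rng.shuffle(mice)
    plan = {}
    for i, mouse in enumerate(mice):
        side1, power1 = combos[i % len(combos)]
        side3 = SIDES[1 - SIDES.index(side1)]
        power3 = POWERS[1 - POWERS.index(power1)]
        plan[mouse] = [(side1, power1), None, (side3, power3)]
    return plan


for mouse, days in sorted(rtpp_schedule(["m1", "m2", "m3", "m4"]).items()):
    print(mouse, days)
```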

      (14) Some references to figure panels are missing in the text.

      We have carefully reviewed the manuscript to ensure figure panels are referenced in the text.

      (15) The interpretation in lines 724-725 is not supported by the data given that GPe inputs to cholinergic neurons are negligible.

      We have reworded much of the discussion.

      (16) Some parts of the discussion should go into the “ideas and speculation” subsection of the discussion.

      We have rewritten sections of the discussion.

      References:

      Ferreira-Pinto, Manuel J., Harsh Kanodia, Antonio Falasconi, Markus Sigrist, Maria S. Esposito, and Silvia Arber. 2021. “Functional Diversity for Body Actions in the Mesencephalic Locomotor Region.” Cell 184 (17): 4564-4578.e18. https://doi.org/10.1016/j.cell.2021.07.002.

      Goñi-Erro, Haizea, Raghavendra Selvan, Vittorio Caggiano, Roberto Leiras, and Ole Kiehn. 2023. “Pedunculopontine Chx10+ Neurons Control Global Motor Arrest in Mice.” Nature Neuroscience 26 (9): 1516–28. https://doi.org/10.1038/s41593-023-01396-3.

      Mena-Segovia, J., B. R. Micklem, R. G. Nair-Roberts, M. A. Ungless, and J. P. Bolam. 2009. “GABAergic Neuron Distribution in the Pedunculopontine Nucleus Defines Functional Subterritories.” The Journal of Comparative Neurology 515 (4): 397–408. https://doi.org/10.1002/cne.22065.

      Steinkellner, Thomas, Ji Hoon Yoo, and Thomas S. Hnasko. 2019. “Differential Expression of VGLUT2 in Mouse Mesopontine Cholinergic Neurons.” eNeuro, July. https://doi.org/10.1523/ENEURO.0161-19.2019.

      Wang, Hui-Ling, and Marisela Morales. 2009. “Pedunculopontine and Laterodorsal Tegmental Nuclei Contain Distinct Populations of Cholinergic, Glutamatergic and GABAergic Neurons in the Rat.” The European Journal of Neuroscience 29 (2): 340–58. https://doi.org/10.1111/j.1460-9568.2008.06576.x.

      Yoo, Ji Hoon, Vivien Zell, Johnathan Wu, Cindy Punta, Nivedita Ramajayam, Xinyi Shen, Lauren Faget, Varoth Lilascharoen, Byung Kook Lim, and Thomas S. Hnasko. 2017. “Activation of Pedunculopontine Glutamate Neurons Is Reinforcing.” The Journal of Neuroscience: The Official Journal of the Society for Neuroscience 37 (1): 38–46. https://doi.org/10.1523/JNEUROSCI.3082-16.2016.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1 (Public review):

      Summary:

      This paper attempts to measure the complex changes of consciousness in the human brain as a whole. Inspired by the perturbational complexity index (PCI) from classic research, the authors introduce simulation PCI (sPCI) of a time series of brain activity as a measure of consciousness. They first use large-scale brain network modeling to explore its relationship with network coupling and input noise. Then the authors verify the measure with empirical data collected in previous research.

      Strengths:

      The conceptual idea of the work is novel. The authors measure the complexity of brain activity from the perspective of dynamical systems. They provide a comparison of the proposed measure with four other indexes. The text of this paper is very concise, supported by experimental data and theoretical model analysis.

      We would like to thank the reviewer for evaluation of our work and the positive feedback. In what follows we would like to clarify the ambiguities in our initial submission, and the respective changes to the manuscript.

      (1) Consciousness is a network phenomenon. The measure defined by the authors is to consider the maximal sPCI across the nodes stimulated. This measure is based on the time series of one node. The measure may be less effective in quantifying the ill relationship between nodes. This may contribute to the less predictive power of anesthesia (Figure 4b).

      Thank you for this comment; consciousness is indeed a network phenomenon. sPCI is in fact measured across the whole network: to compute sPCI, we apply PCI to the simulated activity of the whole network. The perturbation is applied to individual nodes of the network (a different node for each trial), and each time the response to the stimulus is measured through sPCI in the whole network. To make this explicit, the relevant section now reads:

      “In line with the PCI experimental protocol, we sampled from multiple initial conditions and stimulated regions, presenting the maximum sPCI for each regime (i.e., each {G,σ}). For each simulation, we measured the complexity of the activity of the whole network over a 10-second period post-stimulus.”
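
      The quoted protocol takes the maximum complexity over initial conditions and stimulated regions, each computed on the whole-network response. A minimal sketch, assuming the classic PCI recipe (binarize the post-stimulus activity, count Lempel-Ziv phrases, normalize by source entropy). The authors state that PCI was adapted for the synthetic data, so the z-score binarization threshold, the time-major flattening and all function names here are hypothetical:

```python
import math
import random


def lz76(bits):
    """Number of Lempel-Ziv (1976) phrases, via the Kaspar-Schuster algorithm."""
    n = len(bits)
    if n < 2:
        return n
    i, k, l = 0, 1, 1
    c, k_max = 1, 1
    while True:
        if bits[i + k - 1] == bits[l + k - 1]:
            k += 1
            if l + k > n:
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:  # no earlier match extends further: close the phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def normalized_lz(bits):
    """LZ76 phrase count normalized by L * H(p) / log2(L) (PCI-style)."""
    L = len(bits)
    p = sum(bits) / L
    if p in (0.0, 1.0):
        return 0.0  # no variability in the binary pattern
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return lz76(bits) * math.log2(L) / (L * h)


def binarize(response, base_mean, base_sd, z_thresh=2.0):
    """Hypothetical binarization of a whole-network response.

    response[t][r] is the activity of region r at post-stimulus sample t.
    A sample is 1 if |z| relative to the pre-stimulus baseline exceeds
    z_thresh. Flattened time-major (all regions at t, then t + 1, ...).
    """
    bits = []
    for frame in response:
        for r, v in enumerate(frame):
            z = (v - base_mean[r]) / base_sd[r]
            bits.append(1 if abs(z) > z_thresh else 0)
    return bits


def spci(trials, base_mean, base_sd):
    """Max complexity over trials keyed by (initial_condition, stimulated_node)."""
    return max(normalized_lz(binarize(resp, base_mean, base_sd))
               for resp in trials.values())


# Hypothetical usage: 4 regions, 50 post-stimulus samples, two trials.
rng = random.Random(1)
flat = [[0.0] * 4 for _ in range(50)]
noisy = [[rng.gauss(0, 3) for _ in range(4)] for _ in range(50)]
mu, sd = [0.0] * 4, [1.0] * 4
print(spci({(0, 0): flat, (0, 1): noisy}, mu, sd))
```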

      (2) One of the focuses of the work is the use of a dynamic model of brain networks. The explanation of the model needs to be in more detail.

      Thank you for your feedback. We expanded the method section.

      (3) The equations should be checked. For example, there should be no max on the left side of the first equation on page 13.

      We thank the reviewer for spotting this typo, and we removed the max on the left side of this equation, and also checked all the other equations for correctness. The equation now reads:

      (4) The quality of the figures should be improved.

      Thank you for your comment. We have made adjustments to several figures and hope that they are now clearer.

      (5) Figure 4 should be discussed and analyzed more in the text.

      Thank you for pointing this out. We added the following paragraph discussing the figure (now number 5) in the results section:

      “Classification results using a linear SVM are reported in Fig. 5. We report the cross-plots of PCI and each of the resting-state metrics for all subjects and conditions in Fig. 5a. Each point corresponds to the calculation of the given metric over the whole recording, normalized by its duration. We find that for fluidity (Fig. 5a, third panel), there is a perfect linear separation between Propofol and Xenon anesthesia on the left side and Wakefulness and Ketamine anesthesia on the right side. This corresponds to the classification accuracy of 100% for the consciousness class in Fig. 5b, which is the same as for PCI. As expected, PCI and fluidity perform poorly at classifying the presence of an anesthetic agent due to the confusion induced by Ketamine. However, the size of the functional repertoire yields an almost perfect classification for this grouping. Only one subject under Ketamine has a high functional repertoire (Fig. 5a, left panel), whereas all other subjects in the anesthesia condition have a functional repertoire size roughly under 100. Classification accuracies for complexity and GAP at the group level are lower but are shown for completeness.”
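
      Perfect linear separation on a single metric means that one threshold splits the two groups. A minimal sketch of that one-dimensional case, using the hard-margin solution (the midpoint of the gap between the classes, which is where a hard-margin linear SVM places its boundary in 1-D). This is not the authors' classifier: the accuracy below is in-sample, the orientation (the "conscious" class above the threshold) is an assumption, and the values are hypothetical:

```python
def max_margin_threshold(low_class, high_class):
    """1-D hard-margin linear separator.

    Returns the midpoint of the gap between the classes, or None if they
    overlap (not linearly separable). Assumes `high_class` should lie above.
    """
    lo_max, hi_min = max(low_class), min(high_class)
    if lo_max < hi_min:
        return (lo_max + hi_min) / 2
    return None


def accuracy(low_class, high_class, threshold):
    """Fraction of points on the correct side of the threshold (training set)."""
    correct = (sum(v < threshold for v in low_class)
               + sum(v > threshold for v in high_class))
    return correct / (len(low_class) + len(high_class))


# Hypothetical fluidity values: unconscious (low) vs conscious (high) class.
unconscious = [0.10, 0.20, 0.15]
conscious = [0.50, 0.70]
thr = max_margin_threshold(unconscious, conscious)
print(thr, accuracy(unconscious, conscious, thr))
```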

      (6) The usage of the terms PCI and sPCI should be distinguished.

      We would like to thank the reviewer for pointing out this ambiguity. The PCI metric had to be adapted for the synthetic data. We have now further emphasized this in the methods section “Perturbational Complexity”.

      Reviewer 2 (Public review):

      Summary:

      Breyton and colleagues analysed the emergent dynamics from a neural mass model, characterised the resultant complexity of the dynamics, and then related these signatures of complexity to datasets in which individuals had been anaesthetised with different pharmacological agents. The results provide a coherent explanation for observations associated with different time series metrics, and further help to reinforce the importance of modelling when integrating across scientific studies.

      Strengths:

      (1) The modelling approach was clear, well-reasoned, and explicit, allowing for direct comparison to other work and potential elaboration in future studies through the augmentation with richer neurobiological detail.

      (2) The results serve to provide a potential mechanistic basis for the observation that the Perturbational Complexity Index changes as a function of the consciousness state.

      We would like to thank the reviewer for assessing our work, and the valuable feedback.

      Weaknesses:

      (3) Coactivation cascades were visually identified, rather than observed through an algorithmic lens. Given that there are numerous tools for quantifying the presence/absence of cascades from neuroimaging data, the authors may benefit from formalising this notion.

      Thank you for bringing this to our attention. We added a quantification of the cascades in Figs. 2 and 3. We computed the absolute value of the mean signal across sources (following z-scoring) to obtain a cascade profile, and calculated the area under the curve as a quantification of the overall presence of cascades. As can be seen in the two figures, the presence of cascades is highest around the working point. We have also added the precise definition to the methods section, which now reads:

      “Coactivation Cascades

      The profile of cascades over time was computed, first by z-scoring each source activity, and second by averaging the absolute value of the activity across all sources. The quantification of cascades was then obtained by calculating numerically the Area Under the Curve (AUC) of the profile of cascades.”
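
      The cascade quantification can be sketched as follows. The response text describes the profile as the absolute value of the mean across z-scored sources, while the quoted methods describe averaging absolute values; the two differ (opposite-signed sources cancel in the first but not in the second), so the sketch exposes both through a flag. Population SD for z-scoring, the trapezoidal rule and all names are assumptions:

```python
from statistics import mean, pstdev


def zscore(series):
    """z-score one source's time series (population SD, an assumption)."""
    mu, sd = mean(series), pstdev(series)
    if sd == 0:
        raise ValueError("cannot z-score a constant series")
    return [(v - mu) / sd for v in series]


def cascade_profile(sources, abs_of_mean=True):
    """Per-time-step cascade profile from z-scored sources.

    abs_of_mean=True:  |mean over sources| (wording of the response text).
    abs_of_mean=False: mean over sources of |z| (wording of the quoted methods).
    """
    z = [zscore(s) for s in sources]
    if abs_of_mean:
        return [abs(mean(col)) for col in zip(*z)]
    return [mean([abs(v) for v in col]) for col in zip(*z)]


def cascade_auc(sources, dt=1.0, abs_of_mean=True):
    """Area under the cascade profile, by the trapezoidal rule."""
    p = cascade_profile(sources, abs_of_mean)
    return sum((a + b) / 2 * dt for a, b in zip(p, p[1:]))


# Hypothetical sources: a coactivating pair and an anticorrelated pair.
a = [1, -1, 1, -1]
b = [-1, 1, -1, 1]
print(cascade_auc([a, a]), cascade_auc([a, b]), cascade_auc([a, b], abs_of_mean=False))
```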

      (4) It was difficult to tell, graphically, where the model’s operating regime lay. Visual clarity here will greatly benefit the reader.

      Thank you for pointing out this ambiguity; we have marked the working point explicitly in Figure 3.

      Recommendations For The Authors

      Reviewer 1 (Recommendations for the authors):

      (1) In the method section, the technical details of the other four indexes should be elaborated.

      Thank you for your recommendation, we agree that the description in the submitted manuscript was too brief. We expanded the method section about the functional repertoire and the bursting potential.

      Reviewer 2 (Recommendations for the authors):

      (1) The authors could more clearly label the “working point” of their parameter space. Perhaps a label/arrow on Figure 2c that directs the readers’ eyes towards the location in state-space that you define as the working point?

      Thank you for pointing out this ambiguity; we updated Figure 3 to mark the working point precisely.

      (2) While ‘fluidity’ is quite an evocative term and does a great job of suggesting to the uninitiated reader the character of the time series in question, I wonder whether a more descriptive term might be better suited for this variable, even if as an adjunct to the term, fluidity. In the past, we (and others) have used the term dynamic functional connectivity variability (Müller et al., 2020 NeuroImage) to refer to this feature, as it links the measure directly to the technique from which it was estimated.

      Thank you for your feedback. You are correct that “dynamic functional connectivity variability” could have been a suitable term for some of our results. However, the term “fluidity” was chosen to convey a broader theoretical concept linked to dynamical systems but not exclusive to the brain. Here, dynamic functional connectivity variability is merely a measure of the fluidity of the system. We added the following to the methods section describing the metrics:

      “[...] Fluidity is related to previously defined metrics such as functional connectivity variability [10] that relied on a non-overlapping windowing procedure. We chose the term fluidity to convey a concept linked to dynamical systems in general and to state exploration. [...]”
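
      A minimal sketch of one way fluidity is computed in the large-scale modeling literature, assuming it is the variance of the off-diagonal values of a functional connectivity dynamics (FCD) matrix built from sliding, overlapping windows. The text above does not give the definition, so the window length, step, `min_lag` and all function names are hypothetical:

```python
import math
import random
from statistics import pvariance


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sab / math.sqrt(saa * sbb)


def fc_upper(window):
    """Upper triangle of the FC matrix for one window (window[r] = region r)."""
    n = len(window)
    return [pearson(window[i], window[j]) for i in range(n) for j in range(i + 1, n)]


def fluidity(signals, win, step, min_lag=1):
    """Variance of off-diagonal FCD entries (assumed definition of fluidity).

    signals: list of equal-length time series, one per region.
    win, step: sliding-window length and step in samples (step < win overlaps).
    min_lag: skip window pairs closer than this many steps (1 = skip only
    the diagonal); larger values exclude strongly overlapping windows.
    """
    T = len(signals[0])
    fcs = [fc_upper([s[t:t + win] for s in signals])
           for t in range(0, T - win + 1, step)]
    fcd = [pearson(fcs[a], fcs[b])
           for a in range(len(fcs)) for b in range(a + min_lag, len(fcs))]
    if not fcd:
        raise ValueError("too few windows for the requested min_lag")
    return pvariance(fcd)


# Hypothetical example: 4 regions of white noise, 200 samples each.
rng = random.Random(0)
signals = [[rng.gauss(0, 1) for _ in range(200)] for _ in range(4)]
print(f"fluidity = {fluidity(signals, win=40, step=10):.4f}")
```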

      (3) The term “bursting potential” is also potentially problematic, as “bursting” refers to a different concept at the cellular level (i.e., multiple action potentials in a short window of time) than it does in the context that the authors are presumably using it here (i.e., the capacity for the dynamics of the population to “burst” into the fat-tail of their activity distribution). To avoid ambiguity, it could be worth considering altering this terminology, perhaps again by using a term that is descriptive of the technique used to estimate it, rather than the concept that it evokes.

      Thank you for pointing out this ambiguity in the naming of the bursting potential. We have renamed it to “Global Activation Potential (GAP)” as we believe this term is a better description of the metric. We have switched to this term across the whole manuscript.

      (4) There is a range of other modelling studies that have compared brain dynamics in the awake vs. anaesthetised patient. In my opinion, the reader would benefit from the ability to place this work into the broader context created by the literature, particularly as there are subtle (yet potentially important) differences in the models used in each case. Note - as this is a subjective opinion, I don’t view this as a crucial addition to the paper’s potential strength of evidence, though I do believe that it would have a positive effect on its potential impact.

      We thank you for the suggestion. We have modified the penultimate paragraph of the Discussion to place our work in the context of existing models of anesthesia and wakefulness:

      “Several studies have employed computational modeling approaches to investigate the differences in brain dynamics across states of consciousness. These studies present varying degrees of physiological detail and focus on complementary aspects of unconsciousness. They range from simple abstract models (Ising model), addressing for example the increased correlation between structural and functional connectivity in anesthesia [15], to oscillator-based models (Hopf model) capturing a brain-state-dependent response to simulated perturbation [4]. More neurobiologically realistic models (Dynamic Mean Field) have also been used to combine multimodal imaging data with receptor density maps to address the macroscopic effects of general anesthesia and their relationship to spatially heterogeneous properties of the neuronal populations [8]. Similarly, using anatomically constrained parameters for brain regions has been shown to increase the predictive value of brain network models [6, 18]. Furthermore, biophysically grounded mean-field and spiking neuron models (AdEx) make it possible to address phenomena that propagate across multiple scales of description, such as the molecular effects of anesthetics targeting specific receptor types [12]. Related work has shown that adaptation reproduces dynamical regimes consistent with NREM sleep and wakefulness [3], with correspondingly realistic PCI values (Goldman et al., 2021). Here, we do not address these biological questions but rather provide a proof of concept that large-scale brain models can help understand the dynamics related to brain function. We used a model derived from QIF neurons [9] that lacks biological parameters such as ion concentrations or synaptic adaptation. Nevertheless, we demonstrate that even the symmetry breaking caused by the connectome is sufficient to set the global working point of the brain, which is in turn linked to the brain’s capacity for generating complex behavior in the different paradigms, that is, rest and stimulation.”

      (5) I saw the label ”digital brain twin” in the abstract but then did not find a location in the main text/methods wherein this aspect of the modelling was explained.

      Thank you for pointing out this discrepancy. We have removed the term “digital brain twin” and replaced it with “whole-brain model” throughout.

      References

      John M. Beggs and Dietmar Plenz. Neuronal Avalanches in Neocortical Circuits. The Journal of Neuroscience, 23(35):11167–11177, December 2003.

      A. G. Casali, O. Gosseries, M. Rosanova, M. Boly, S. Sarasso, K. R. Casali, S. Casarotto, M.-A. Bruno, S. Laureys, G. Tononi, and M. Massimini. A Theoretically Based Index of Consciousness Independent of Sensory Processing and Behavior. Science Translational Medicine, 5(198):198ra105, August 2013.

      Anna Cattani, Andrea Galluzzi, Matteo Fecchio, Andrea Pigorini, Maurizio Mattia, and Marcello Massimini. Adaptation shapes local cortical reactivity: From bifurcation diagram and simulations to human physiological and pathological responses. eNeuro, 10(7):ENEURO.0435-22.2023, July 2023.

      Gustavo Deco, Joana Cabral, Victor M. Saenger, Melanie Boly, Enzo Tagliazucchi, Helmut Laufs, Eus Van Someren, Beatrice Jobst, Angus Stevner, and Morten L. Kringelbach. Perturbation of whole-brain dynamics in silico reveals mechanistic differences between brain states. NeuroImage, 169:46–56, April 2018.

      Rahul S. Desikan, Florent Ségonne, Bruce Fischl, Brian T. Quinn, Bradford C. Dickerson, Deborah Blacker, Randy L. Buckner, Anders M. Dale, R. Paul Maguire, Bradley T. Hyman, Marilyn S. Albert, and Ronald J. Killiany. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage, 31(3):968–980, July 2006.

      Xiaolu Kong, Ru Kong, Csaba Orban, Peng Wang, Shaoshi Zhang, Kevin Anderson, Avram Holmes, John D. Murray, Gustavo Deco, Martijn van den Heuvel, and B. T. Thomas Yeo. Sensory-motor cortices shape functional connectivity dynamics in the human brain. Nature Communications, 12(1), November 2021.

      A. Lempel and J. Ziv. On the Complexity of Finite Sequences. IEEE Transactions on Information Theory, 22(1):75–81, January 1976.

      Andrea I. Luppi, Pedro A. M. Mediano, Fernando E. Rosas, Judith Allanson, John D. Pickard, Guy B. Williams, Michael M. Craig, Paola Finoia, Alexander R. D. Peattie, Peter Coppola, Adrian M. Owen, Lorina Naci, David K. Menon, Daniel Bor, and Emmanuel A. Stamatakis. Whole-brain modelling identifies distinct but convergent paths to unconsciousness in anaesthesia and disorders of consciousness. Communications Biology, 5(1), April 2022.

      Ernest Montbrió, Diego Pazó, and Alex Roxin. Macroscopic Description for Networks of Spiking Neurons. Physical Review X, 5(2):021028, June 2015.

      Eli J. Müller, Brandon Munn, Luke J. Hearne, Jared B. Smith, Ben Fulcher, Aurina Arnatkevičiūtė, Daniel J. Lurie, Luca Cocchi, and James M. Shine. Core and matrix thalamic sub-populations relate to spatio-temporal cortical connectivity gradients. NeuroImage, 222:117224, November 2020.

      J. Matias Palva, Alexander Zhigalov, Jonni Hirvonen, Onerva Korhonen, Klaus Linkenkaer-Hansen, and Satu Palva. Neuronal long-range temporal correlations and avalanche dynamics are correlated with behavioral scaling laws. Proceedings of the National Academy of Sciences, 110(9):3585–3590, February 2013.

      Maria Sacha, Federico Tesler, Rodrigo Cofre, and Alain Destexhe. A computational approach to evaluate how molecular mechanisms impact large-scale brain activity. Nature Computational Science, 5(5):405–417, May 2025.

      Simone Sarasso, Melanie Boly, Martino Napolitani, Olivia Gosseries, Vanessa Charland-Verville, Silvia Casarotto, Mario Rosanova, Adenauer Girardi Casali, Jean-Francois Brichant, Pierre Boveroux, Steffen Rex, Giulio Tononi, Steven Laureys, and Marcello Massimini. Consciousness and Complexity during Unresponsiveness Induced by Propofol, Xenon, and Ketamine. Current Biology, 25(23):3099–3105, December 2015.

      Pierpaolo Sorrentino, Rosaria Rucco, Fabio Baselice, Rosa De Micco, Alessandro Tessitore, Arjan Hillebrand, Laura Mandolesi, Michael Breakspear, Leonardo L. Gollo, and Giuseppe Sorrentino. Flexible brain dynamics underpins complex behaviours as observed in Parkinson’s disease. Scientific Reports, 11(1):4051, February 2021.

      S. Stramaglia, M. Pellicoro, L. Angelini, E. Amico, H. Aerts, J. M. Cortés, S. Laureys, and D. Marinazzo. Ising model with conserved magnetization on the human connectome: Implications on the relation structure-function in wakefulness and anesthesia. Chaos: An Interdisciplinary Journal of Nonlinear Science, 27(4), April 2017.

      Enzo Tagliazucchi, Pablo Balenzuela, Daniel Fraiman, and Dante Chialvo. Criticality in Large-Scale Brain fMRI Dynamics Unveiled by a Novel Point Process Analysis. Frontiers in Physiology, 3, 2012.

      David C. Van Essen, Stephen M. Smith, Deanna M. Barch, Timothy E.J. Behrens, Essa Yacoub, and Kamil Ugurbil. The WU-Minn Human Connectome Project: An Overview. NeuroImage, 80:62–79, October 2013.

      Peng Wang, Ru Kong, Xiaolu Kong, Raphaël Liégeois, Csaba Orban, Gustavo Deco, Martijn P. van den Heuvel, and B.T. Thomas Yeo. Inversion of a large-scale circuit model reveals a cortical hierarchy in the dynamic resting human brain. Science Advances, 5(1), January 2019.

      Farnaz Zamani Esfahlani, Youngheun Jo, Joshua Faskowitz, Lisa Byrge, Daniel P. Kennedy, Olaf Sporns, and Richard F. Betzel. High-amplitude cofluctuations in cortical activity drive functional connectivity. Proceedings of the National Academy of Sciences of the United States of America, 117(45):28393–28401, November 2020.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate the role of HSPA2 during mouse preimplantation development. Knocking down HSPA2 in zygotes, the authors describe lower chances of developing into blastocysts, which show a reduced number of inner cell mass cells. They find that HSPA2 mRNA and protein levels show some heterogeneity among blastomeres at the 4-cell stage and propose that HSPA2 could contribute to skewing their relative contribution to embryonic lineages. To test this, the authors try to reduce HSPA2 expression in one of the 2-cell stage blastomeres and propose that this biases its contribution towards extra-embryonic lineages. To explain this, the authors propose that HSPA2 would interact with CARM1, which controls chromatin accessibility around genes regulating differentiation into embryonic lineage.

      Strengths:

      (1) The study offers simple and straightforward experiments with large sample sizes.

      Thanks for your kind recognition.

      (2) Unlike most studies in the field, this research often relies on both mRNA and protein levels to analyze gene expression and differentiation.

      Thanks for your kind recognition.

      Weaknesses:

      (1) Image and statistical analyses are not well described.

      Thank you for this helpful comment. We have rewritten the description of the image and statistical analyses in the revised version (lines 255-257).

      (2) The functionality of the overexpression construct is not validated.

      Thank you for this suggestion. We have validated the functionality of the overexpression construct in the revised version (Figure S3).

      (3) Tracking of KD cells in embryos injected at the 2-cell stage with GFP is unclear.

      Thank you for this suggestion. We co-injected green fluorescent protein (Gfp) mRNA as a lineage tracer with either Hspa2-siRNA or NC-FAM into one randomly chosen blastomere of 2-cell embryos, and then monitored embryo development to the blastocyst stage (lines 342-344).

      (4) A key rationale of the study relies on measuring small differences in the levels of mRNA and proteins using semi-quantitative methods to compare blastomeres. As such, it is not possible to know whether those subtle differences are biologically meaningful. For example, the lowest HSPA2 level of the embryo with the highest level is much higher than the top cell from the embryo with the lowest level. What does this level mean then? Does this mean that some blastomeres grafted from strong embryos would systematically outcompete all other blastomeres from weaker embryos? That would be very surprising. I think the authors should be more careful and consider the lack of quantitative power of their approach before reaching firm conclusions. Although to be fair, the authors only follow a long trend of studies with the same intrinsic flaw of this approach.

      Thank you for this helpful comment. Although our approach draws on previous research (Zhou, Cell 2018), we are aware that it can only reflect relative comparisons: we detected and compared relative differences among blastomeres of the same embryo and did not compare absolute mRNA levels between embryos. We also performed simple and straightforward experiments with large sample sizes to support this conclusion.

      (5) Some of the analyses on immunostaining do not take into account that this technique only allows for semi-quantitative measurements and comparisons.

      a) Some of the microscopy images are shown with an incorrect look-up table.

      b) Some of the schematics are incorrect and misleading.

      Thank you for this helpful comment. We have revised the microscopy images and schematics in the revised version.

      Reviewer #2 (Public review):

      Summary:

      In this study, Gao et al. use RNA-seq to identify Hspa2 as one of the earliest transcripts heterogeneously distributed between blastomeres. Functional studies are performed using siRNA knockdown showing Hspa2 may bias cells toward the ICM lineage via interaction with the known methyltransferase CARM1.

      Strengths:

      This study tackles an important question regarding the origins of the first cell fate decision in the preimplantation embryo. It provides novelty in its identification of Hspa2 as a heterogeneous transcript in the early embryo and proposes a plausible mechanism showing interactions with Carm1. Multiple approaches are used to validate their functional studies (FISH, WB, development rates, proteomics). Given only 4 other transcripts/RNA have been identified at or before the 4-cell stage (LincGET, CARM1, PRDM14, HMGA1), this would be an important addition to our understanding of how TE vs ICM fate is established.

      Thanks for your kind recognition.

      The RNA-seq results leading the authors to focus on Hspa2 are not included in the manuscript. This dataset would serve as an important resource but is neither included nor discussed. Nor is it mentioned whether Hspa2 was identified in prior RNA-seq embryos studies (for example Deng Science 2014).

      Thank you for this helpful comment. To identify genes showing significantly high variability across blastomeres of the same embryo, we regressed out the embryo effect using a new method we established, which will be published and deposited in a database in the future. For this reason, the RNA-seq results that led us to focus on Hspa2 are not included in the manuscript.
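      The authors' own method is not described here. As a minimal sketch of the general idea only, assuming per-blastomere log-expression values grouped by embryo, the embryo effect can be removed by centering each gene within its embryo, which is equivalent to regressing expression on an embryo indicator (a fixed effect); the residual variance then measures blastomere-to-blastomere variability within embryos. The function name and data layout are hypothetical.

```python
from statistics import mean


def within_embryo_variance(expr_by_embryo):
    """Residual variance of one gene after removing each embryo's mean.

    expr_by_embryo: list of embryos; each embryo is a list of per-blastomere
    log-expression values for the gene (hypothetical layout).
    """
    residuals = []
    for blastomeres in expr_by_embryo:
        embryo_mean = mean(blastomeres)  # the "embryo effect" for this embryo
        residuals.extend(x - embryo_mean for x in blastomeres)
    # one degree of freedom is spent estimating each embryo mean
    dof = len(residuals) - len(expr_by_embryo)
    return sum(r * r for r in residuals) / dof
```

      Because only within-embryo deviations enter the estimate, an embryo whose overall expression is uniformly shifted contributes exactly the same variability as an unshifted one.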

      In addition, the functional studies are centered on Hspa2 knockdown at the zygote (1-cell) stage, which would largely target maternal transcript. Given the proposed mechanism relies on Hspa2 heterogeneity post-ZGA (late 2-cell stage), the knockdown studies don't necessarily test this and thus don't provide direct support to the authors' conclusions. The relevance of the study would be improved if the authors could show that zygotic knockdown leads to symmetric Hspa2 levels at the late 2-cell and/or 4-cell stage. It may be possible that zygotic knockdown leads to lower global Hspa2 levels, but that asymmetry is still generated at the 4-cell stage.

      Thank you for this helpful comment. In the revised version, we now show Hspa2 levels at the late 2-cell and 4-cell stages after zygotic knockdown (Figure S1G-H, lines 450-452).

      Furthermore, the authors show that Hspa2 knockdown at the 1-cell stage lowers total Carm1 levels at the 4-cell stage. However, it is unclear how total abundance within the embryo alters lineage specification within blastomeres. The authors go on to propose a plausible mechanism involving Hspa2 and Carm1 interaction, but do not discuss how expression levels may be involved.

      Thank you for this helpful comment. Previous research suggests that heterogeneous activity of the methyltransferase CARM1 results in differential methylation of histone H3R26, which modulates the establishment of lineage specification (Zernicka-Goetz, Cell 2018). Therefore, we did not discuss how total abundance within the embryo alters lineage specification.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      (1) Major issue with analyses:

      Image analysis needs to be much better explained than simply saying that ImageJ was used. Where are cells measured (at their equatorial plane? What is the size of the ROI?)? Ideally, the ROI and/or raw measurements should be provided.

      Thank you for this helpful comment. We have rewritten the description of the image analysis in the revised version (lines 187-194).

      What are the objective criteria determining whether a cell is counted as GFP positive, CDX2 positive, or OCT4 positive? This is very unclear and key to the interpretation of many experiments.

      Thank you for this helpful comment. Cells with fluorescence signals above background noise were counted as positive.
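      One way to make such a positivity criterion explicit and reproducible is a fixed threshold derived from background measurements, for example the background mean plus k standard deviations. The sketch below assumes this rule with a hypothetical k = 3; it illustrates what an objective criterion could look like and is not the procedure used in the study.

```python
from statistics import mean, stdev


def call_positive(cell_intensities, background, k=3.0):
    """Flag cells whose intensity exceeds mean(background) + k * SD(background).

    k = 3.0 is a hypothetical default, not a value taken from the study.
    Returns the per-cell flags and the threshold that was applied.
    """
    threshold = mean(background) + k * stdev(background)
    return [x > threshold for x in cell_intensities], threshold
```

      Reporting the threshold together with the counts would let readers check how many cells sit close to the cutoff.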

      Statistical analyses mention ANOVA in the methods but Student's t-test in the figure legend. Which is which? Most data are heavily normalized, which is unlikely to meet the assumptions of Student's t-test.

      Thank you for this helpful comment. We have rewritten the description of the statistical analyses in the Materials and Methods (lines 253-260).

      Figure 5H describes a relative fluorescence intensity with control at 1. The legend describes a normalization to "DNA" (I guess the authors meant DAPI), which is unlikely to give 1. This suggests that additional normalization was done and is not described. Is that the case? Also, since the authors propose that HSPA2 would control Histone modification and chromatin packing, I do not think that using DAPI is an appropriate way of normalizing the fluorescence signal.

      Thank you for this helpful comment. We have replaced “DNA” with “DAPI” in the revised version. Following previous studies (Zhou, Cell 2018; Zernicka-Goetz, Cell 2018), we used the DAPI signal to normalize fluorescence intensity.
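      To illustrate the reviewer's point about the control value of 1, the sketch below shows one hypothetical two-step normalization under which the control group averages exactly 1: per-cell target/DAPI ratios are divided by the mean control ratio. This is an assumed reconstruction for illustration, not the authors' documented pipeline.

```python
from statistics import mean


def relative_intensity(target, dapi, control_target, control_dapi):
    """Per-cell target/DAPI ratios, rescaled so that the control group averages 1."""
    ratios = [t / d for t, d in zip(target, dapi)]
    control_ratios = [t / d for t, d in zip(control_target, control_dapi)]
    reference = mean(control_ratios)  # second normalization step: control mean -> 1
    return [r / reference for r in ratios]
```

      Under this scheme a value of 0.5 in a knockdown cell means its target/DAPI ratio is half the average control ratio.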

      Figure 1E shows data normalized to the lowest level while Figure 1H is normalized to the highest level. A consistent representation would be welcome.

      Thank you for this helpful comment. We have revised Figure 1H accordingly.

      Is Figure 1C showing a t-test between correlations?

      Yes, Figure 1C shows a t-test between correlations.

      (2) Major issue with the interpretation of semi-quantitative methods and measurements:

      qPCR, WB, immunostaining are all semi-quantitative methods that require some kind of normalization due to non-linear bias in the way the molecules are picked up. Such normalization makes it difficult to know whether a detectable difference is meaningful biologically speaking i.e. if a difference of 1 CT between blastomeres can be detected after qPCR, is it meaningful? If that were the case, then embryos with lower CT than others (Figure 1D) would not be able to develop into blastocyst, like siRNA injected embryos, or grafting a blastomere with a high CT onto an embryo with low CT would lead to the systematic differentiation of these strong blastomeres into ICM.

      Thank you for this helpful comment. The CT values represent the relative mRNA levels of Hspa2 between blastomeres; a higher CT value indicates lower Hspa2 mRNA expression. Figure 1D shows the Hspa2 mRNA levels between blastomeres. The blastomere with low Hspa2 mRNA expression is not biased toward an ICM fate.
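      For reference, in the widely used comparative CT (2^-ΔΔCt) method, and assuming roughly 100% amplification efficiency, a difference of 1 CT corresponds to about a two-fold difference in template, which is the scale of difference the reviewer's question concerns. A minimal sketch, with a hypothetical reference gene and calibrator sample:

```python
def fold_change(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Relative expression by the comparative CT (2^-ddCt) method.

    Assumes ~100% amplification efficiency for both the target and the
    (hypothetical) reference gene; the calibrator is the sample compared against.
    """
    delta_sample = ct_target - ct_ref
    delta_calibrator = ct_target_cal - ct_ref_cal
    return 2.0 ** -(delta_sample - delta_calibrator)
```

      For example, a blastomere whose target CT is one cycle later than the calibrator's, with equal reference CTs, comes out at 0.5, i.e. half the expression.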

      The same goes for fluorescence analyses (Figure 1F). Can the authors also provide the measurements for DAPI as they did for HSPA2? I am sure that with enough measurements, DAPI is variable enough to give a statistical difference among blastomeres with questionable biological meaning.

      I think the reasoning used here (unfortunately following the reasoning that has been used in a series of studies by other groups) of ranking blastomeres after semi-quantitative measurement is fundamentally flawed.

      Thank you for this helpful comment. The DAPI signal was quantified over the maximal area using a custom Python script. Following previous studies (Zhou, Cell 2018), we used the DAPI signal for normalization. This approach normalizes technical embryo-to-embryo variance.

      (3) Major issue with overexpression experiment:

      While the siRNA experiment is partially validated by qPCR and WB measurements of HSPA2 after KD, the overexpression experiment is not. Do the authors have any evidence that the construct they use is produced into protein and functional? Can the authors check by WB? Can the authors rescue the siRNA with their overexpression?

      Thank you for this helpful comment. We have verified the overexpression by WB in the revised version (Figure S3, lines 360-361). Because siRNA degrades mRNA and prevents its translation, we did not co-inject the siRNA with the overexpression construct.

      The lack of effect of HSPA2 overexpression on blastocyst formation is difficult to reconcile with the interpretation from the authors that levels of HSPA2 bias lineages.

      Have the authors tried lower concentrations? Have the authors tried FISH on their half-injected 2-cell embryos? Of course, if the antibody against HSPA2 worked for immunostaining, that would be ideal.

      Thank you for this helpful comment. We chose the concentrations based on previous research (Zernicka-Goetz, Cell 2016). To verify successful injection into one blastomere at the 2-cell stage, we checked for green fluorescence after co-injecting GFP mRNA with either siRNA or NC-FAM into one blastomere of 2-cell embryos. Therefore, we did not perform FISH on half-injected 2-cell embryos. We attempted immunostaining with several HSPA2 antibodies (Proteintech: 12797-1-AP; Abcam: ab108416), but none gave satisfactory results.

      Author response image 1.

      (4) Major issue with tracking of injected cells:

      It is unclear what counts as a GFP-positive cell. In Figure 3D, most cells appear to have the same level of GFP.

      Thank you for this helpful comment. In Figure 3D, cells with green fluorescence signals above background noise were counted as GFP-positive. Most cells appear to have the same level of GFP because they are daughter cells of the GFP-injected blastomere.

      In the images of GFP-expressing cells used to track control or KD cells shown in Figure 3A, it seems that the control embryos have mostly GFP cells in the ICM. Is that the case, or just a bad example?

      Thank you for this helpful comment. The green fluorescent signal in Figure 3A represents OCT4 protein, an ICM marker.

      Can the authors do FISH against HSPA2 and visualize their GFP cells to validate the heterogeneous expression in situ?

      Thank you for this helpful comment. We have verified the heterogeneous expression of HSPA2 in Figure 1.

      (5) Issue with fluorescent images:

      Many images are shown with inappropriate look-up tables with saturated DAPI, OCT4, CDX2, and FISH. This raises the doubt that analyses were made on saturated images, which would be incorrect.

      The LUT of Figure 5H should be adjusted similarly between the control and siRNA.

      Thank you for this helpful comment. We have revised the images that used inappropriate look-up tables in the revised version. The LUT of Figure 5H has been adjusted identically for the control and siRNA images.

      (6) Issue with schematics:

      Schematics of blastomere isolation grown into blastocyst-like structures are misleading since the final blastocyst-like structure should not have a zona pellucida and should have fewer cells than regular blastocysts.

      Thank you for this helpful comment. We have revised the schematics of isolated blastomeres grown into morulae in the revised version (Figure 1A and Figure S1A).

      The summary schematics in the final figure should not state HSPA2 -/- since experiments in the study did not use KO but KD.

      Thank you for this helpful comment. We have revised the summary schematic accordingly.

      The blastocysts are the same sizes as the cleavage stage or morula embryos which implies that cells lose volume to the lumen, which is not the case.

      Thank you for this helpful comment. We have revised the schematics accordingly.

      (7) Issue with data presentation:

      In the tables within the figures, the number of decimals given should be the same for the mean and SE (one decimal should be more than enough).

      Thank you for this helpful comment. We have revised Figure 2H accordingly.

      The comparison of cell number and distribution within embryos (e.g. Figure 2B) would be best represented by a correlation analysis of TE vs ICM cells.

      Thank you for this helpful comment. We have added a correlation analysis of TE vs ICM cells in the revised version (Figure 3B).

      The docking simulations are described in the main text as "experiments".

      Thank you for this helpful comment. We have revised the description of the docking simulations in the revised version.

      (8) Issue with data interpretation:

      The reduced number of ICM cells is interpreted as a slowed-down cell cycle. This could also be explained by failed cytokinesis and the generation of binucleated or polyploid cells. Have the authors checked for that? For example, by looking at their DAPI staining. 

      Thank you for this helpful comment. Our RNA-seq results revealed that the differentially expressed genes (DEGs) in HSPA2-knockdown blastocysts are closely related to negative regulation of the cell cycle, the G1/S transition of the mitotic cell cycle, mitotic cell cycle phase transition, and regulation of mitotic cell cycle phase transition. In addition, a previous study demonstrated that knockdown of HSPA2 reduced cell proliferation and led to G1/S cell cycle arrest (Hu, Ann Transl Med 2019). The lower ICM cell number may also be associated with failed cytokinesis and the generation of binucleated or polyploid cells. We therefore speculate that HSPA2 has a role in establishing the ICM lineage, although half of the ICM cells were able to survive with HSPA2 deficiency (lines 463-472).

      It is unclear to me why reduced ICM should lead to fewer blastocysts. Blastocysts should be able to form as long as their TE is fine. In Figure 2G, embryos seem to be cultured in close proximity, which is fine if they are healthy but not if some of the embryos start dying and releasing toxic compounds (e.g. ROS). Have the authors tried removing the dying KD embryos to see if the development of the remaining embryos would improve?

      Thank you for this helpful comment. We think HSPA2 may affect blastocyst development through other signaling pathways, and the enriched GO terms were closely related to blastocyst development (Figure 2E). There was no significant difference in morula formation rate between the Hspa2-KD and NC groups, so it seems unlikely that toxic compounds released by dying embryos caused the reduced blastocyst rate. Indeed, the blastocyst formation rate of Hspa2-KD embryos remained significantly reduced when a few embryos were cultured separately. In addition, we discussed the possibility that the lower ICM cell number may also be associated with failed cytokinesis and the generation of binucleated or polyploid cells.

      Author response image 2.

      Reviewer #2 (Recommendations for the authors):

      One of the significant findings in the paper is the discovery portion where Hspa2 is identified as a heterogeneous transcript. To improve the logic and impact of the manuscript, it may benefit from reorganizing some of the figures and text. For example:

      (1) The paragraph in the introduction (Lines 56-68) should be moved to the discussion as the Hspa2 reveal should be in section 3.1, not prior to the RNA-seq results presented in Figure 1.

      Thank you for this helpful comment. We think it is more logical to introduce HSPA2 in the Introduction.

      (2) Add text at the beginning of Section 3.1 to describe the rationale and results for the RNA-seq. It would help the readers if the authors clearly stated why they chose the 4-cell stage.

      Thank you for this helpful comment. We now explain why we chose the 4-cell stage in the revised version (lines 272-273).

      (3) As this is the first time Hspa2 is identified, consider moving Figure S1C to the main figure to show expression throughout development.

      Thank you for this helpful comment. We have moved Figure S1C to the main figure in the revised version (lines 286-291).

      (4) Figure 1C: the correlation between Hspa2 and ICM markers would be strengthened if additional transcripts were used (Oct4, Sox2, Sox21). The graph in 1C would also be more informative if represented as a scatter plot with correlation coefficients (Nanog log2TPM vs Hspa2 log2TPM), rather than bar graphs.

      Thank you for this helpful comment. We chose Nanog because, among the ICM markers, it showed the strongest correlation with Hspa2. Figure 1C shows that the positive correlation between Nanog and Hspa2 expression is stronger than that of random gene pairs (n = 100, where n is the number of random gene pairs). We therefore think that the bar graph in Figure 1C is easier to understand.

      (5) Figure 1D: how were individual blastomeres grouped into B1-4? Individually run and then pooled based on relative expression?

      Thank you for this helpful comment. Blastomeres are named B1 to B4 in order of increasing Hspa2 level (Figure 1E).

      (6) Figures 1F, 1I, 5H: the DAPI channel appears to be saturated, but is used to normalize fluorescence intensity and may incorrectly account for light scattering within the embryo. Please clarify by adding more details regarding image analysis. Were partial stacks through the nucleus used for analysis, or max projections? Graph axes should be "relative fluorescence intensity."

      Thank you for this helpful comment. We have added details of the fluorescence image analysis and revised the graph axes in the revised version.

      (7) Line 278: the results in Figure S1C would benefit from more text regarding expression patterns throughout development. The maternal transcript appears to have a sharp downregulation by the early 2-cell stage, and is then upregulated coinciding with ZGA.

      Thank you for this helpful comment. We have added a more detailed description of this figure to the main text (lines 285-290).

      (8) For the analyses in Figure 2 I-J and 2K-L, were arrested embryos excluded from analysis? This is an important detail as including arrested embryos would significantly bias the RNA-seq results. 

      Thank you for this helpful comment. Arrested embryos were excluded from the analyses in Figure 2I-J and 2K-L.

      (9) Figures 2G-H would be aided by converting the table in 2H to a bar graph and adding development rates for all stages (2-, 4-, 8-, morula, and blast). This would also show when an arrest occurs.

      Thank you for this helpful comment. We have converted the table in Figure 2H to a bar graph.

      (10) Blast rates are represented with too many significant digits (Figures 2H, 4B). They should only be reported to the closest ones given the unit of measure (number of blasts divided by number of zygotes). For instance, a blast rate of 81.63 ± 2.000 reflects excessive precision that is not measured in the data; it should rather read 82 ± 2%. This is also true for % cells (Figures 3E, 4H).

      Thank you for this helpful comment. Values are now rounded down to one decimal place.
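      The reviewer's rule (report the mean only to the precision supported by its standard error) can be sketched as follows, assuming the SE is rounded to one significant figure and the mean to the same decimal place. Note that this differs from rounding every value down to a fixed single decimal; the function name is hypothetical.

```python
import math


def format_mean_se(mean, se, sig_figs=1):
    """Format 'mean ± SE' with the SE rounded to sig_figs significant figures
    and the mean rounded to the same decimal place."""
    if se <= 0:
        raise ValueError("SE must be positive")
    # position of the SE's leading digit (0 for 2.0, -1 for 0.43)
    leading = math.floor(math.log10(se))
    ndigits = sig_figs - 1 - leading
    decimals = max(0, ndigits)
    return f"{round(mean, ndigits):.{decimals}f} ± {round(se, ndigits):.{decimals}f}"
```

      For example, 81.63 with an SE of 2.000 gives 82 ± 2, matching the reviewer's example, while an SE of 0.43 would justify 81.6 ± 0.4.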

      (11) The clarity and impact of Figure 3A and 3D would benefit from 2D slices through the ICM. 

      Thank you for this helpful comment. To give a more complete view of the 3D structure of the blastocysts in Figures 3A and 3D, we did not use 2D slices.

      (12) To improve clarity and logic, separate the 1-cell and 2-cell knockdown experiments in the text and figures:

      a) 1-cell knockdown with RNA-seq results (Fig 2A-F).

      b) 1-cell knockdown showing less ICM/pluripotency markers in (combine Figures 2G-M and Figures 3A-B; "new Fig 3").

      c) 2-cell knockdown tracing lineage (Figures 2D-E; "new Fig 4").

      The new Figures 3 and 4 should mirror one another (i.e. for each knockdown experiment, development rates and cell counts should be included). For the 2-cell knockdown (Figures 2 D-E), what were the developmental rates (8-cell, morula, blast)?

      Thank you for this helpful comment. However, to preserve the overall logic of the article, we did not separate the 1-cell and 2-cell knockdown experiments in the text and figures. We have added the developmental rates (8-cell, morula, blastocyst) of the 2-cell knockdown group in the revised version (Figure S2).

      For the overexpression experiment (Figure 4), why were injections performed at the zygote stage versus the 2-cell stage? Given the significant downregulation of maternal transcript demonstrated in Figure S1C, it seems plausible that the injected RNA was also downregulated.

      Thank you for this helpful comment. For the overexpression experiment, we first injected Hspa2 mRNA at the zygote stage and found that Hspa2 overexpression does not bias blastomeres toward an ICM fate. qRT-PCR showed that Hspa2 expression in the overexpression group was significantly higher than in the normal group at the 4-cell and blastocyst stages (Figure 4C, 4D). In addition, there is no guarantee that an equal amount of Hspa2 mRNA would be injected into each blastomere at the 2-cell stage. We therefore did not microinject Hspa2 mRNA at the 2-cell stage.

      The subheading of Section 3.5 overstates the results, as the Hspa2-Carm1 interaction is not linked to lineage segregation. For example, a more specific subtitle might be, "Hspa2 interacts with Carm1 and alters H3R26me2 levels."

      Thank you for this helpful comment. We have revised the subtitle accordingly (line 376).

      Figures 5B-C and 5D-E. The qRT-PCR and WB analysis of knockdown blasts shows a correlation between Hspa2 downregulation and Carm1 downregulation. However, if the proposed mechanism is Hspa2 binding to Carm1 to mediate downstream methylation, why would it be expected to alter transcript levels at the 4-cell or blast stage? Please add further details and discussion in the results and discussion sections.

      Thank you for this helpful comment. We chose the 4-cell stage because previous studies on CARM1 have focused on this stage (Zernicka-Goetz, Cell 2018, 2016).

      In the discussion, the statement in Lines 430-431 is an overinterpretation: "the heterogeneity of HSPA2... acts as an upstream factor to drive [the] first cell-fate decision." The knockdown experiments don't alter heterogeneity per se, but total abundance. Furthermore, the results do not show that heterogeneity drives heterogeneity in H3R26me2 patterns, for example.

      Thank you for this helpful comment. We have rephrased the relevant statement in the Discussion.

      More needs to be said regarding the ICM cells that persisted in the 1-cell KD experiment (Fig 3B). Lines 449-450 point out this result, but do not propose any plausible explanations. For instance, ICM cells may still form due to the incomplete knockdown achieved or the possibility that redundant pathways exist.

      Thank you for this helpful comment. We have rewritten the relevant statement in the revised version (lines 468-473).

      The 5th paragraph of the discussion seems incomplete. The authors point out a possible link between Hspa2 and Hippo and Wnt signaling pathways, but need to expand their discussion on how this may act as an additional mechanism incorporating Hspa2 with lineage segregation.

      Thank you for this helpful comment. We have rewritten the 5th paragraph of the Discussion (lines 483-494).

      Statistics: all comparisons with greater than 2 groups should be performed with a one-way ANOVA and multiple comparisons, rather than Student's t-test (Figures 1B, 1D, 1E, 1F).

      All figure legends lack statistical test details.

      Thank you for this helpful comment. Details of the statistical tests have been added to all figure legends and to the statistical analysis section.
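      For comparisons across more than two groups (Figures 1B, 1D, 1E, 1F), the reviewer recommends a one-way ANOVA followed by multiple comparisons rather than pairwise Student's t-tests. The dependency-free sketch below computes only the ANOVA F statistic; the group values in the usage line are hypothetical. In practice, `scipy.stats.f_oneway` returns F together with its p-value, and `scipy.stats.tukey_hsd` (SciPy 1.8 or later) performs Tukey's post-hoc comparisons.

```python
from statistics import fmean


def one_way_anova_f(*groups: list[float]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    if len(groups) < 2:
        raise ValueError("need at least two groups")
    values = [v for g in groups for v in g]
    grand_mean = fmean(values)
    group_means = [fmean(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

For example, `one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])` returns `(27.0, 2, 6)`. A single omnibus test followed by a post-hoc procedure controls the family-wise error rate, which repeated pairwise t-tests do not.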

      Minor comments:

      In all graphs, individual blastomere expression levels should be represented as box-and-whisker/bar/scatter/violin plots since the comparison is groups rather than time points (i.e. symbols should not be connected with a line in Figures 1B, 1D, 1F-G, 1I, S1D, S1F).

      Thank you for this helpful comment. Each colored line represents a single cell, and dots of the same color represent the blastomeres of that same cell. We therefore use lines to connect the values of individual blastomeres.

      For all fluorescent images, having two representative images may be confusing for the reader. Figures may be improved by just including one representative image for each stage/treatment (Figures 1F, 1I, S1F, 3A, 3D, 4E, 4G).

      Thank you for this helpful comment. In the revised version, the figures include only one representative image for each stage. For the treatments, two representative images from each group are still shown (Figures 3A, 3D, 4E, 4G).

      The manuscript would be improved with thorough grammar and typo editing.

      For example:

      (1) Lines 18, 73, the wording is confusing, consider: "knockdown of Hspa2 in one of the two-cell blastomeres biased its progeny towards the trophectoderm lineage.".

      (2) Line 23, overstatement. Consider: "we demonstrated that HSPA2 levels correlate with ICM-associated genes and that it interacts with the CARM1.".

      (3) Line 25 confusing wording, "via the execution of commitment and differentiation phases.".

      (4) Line 37, replace "that" with "of;" replace "cell-fate decisions" with "cell-fate decision".

      (5) Line 40: needs space before (CARM1).

      (6) Line 43: the wording is confusing, consider "can result in higher expression levels of".

      (7) Line 45: wording, consider "Recent [studies have] further suggested".

      (8) Line 70: plurality, consider "analyzed gene expression pattern".

      (9) Line 73 typo: "prevents its".

      (10) Line 76-77 wording, consider "Hspa2 expression patterns can bias cell fate in the mouse embryo".

      (11) Line 276: remove "in whole embryos," since MII eggs are not embryos.

      (12) Line 617 "There" should be "Three".

      (13) Axis label in Fig 3b "Totle" should be "Total".

      (14) Lines 417, 419 missing spaces.

      (15) Line 448 missing word, "interfering [with] the cell cycle".

      (16) Line 462 incorrect word, "[a]polar cells being specified as ICM".

      (17) Line 469 incorrect plural, "cell differentiation".

      Thank you for these helpful comments. We have carefully revised the whole manuscript according to the reviewers' suggestions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Summary:

      The manuscript by Zhang et al describes the use of a protein language model (pLM) to analyse disordered regions in proteins, with a focus on those that may be important in biological phase separation. While the paper is relatively easy to read overall, my main comment is that the authors could perhaps make it clearer which observations are new, and which support previous work using related approaches. Further, while the link to phase separation is interesting, it is not completely clear which data supports the statements made, and this could also be made clearer.

      We thank the reviewer for their thoughtful evaluation of our manuscript and for the supportive comments. As outlined in the responses below, we have made substantial revisions to clarify the novel observations presented in our study and to strengthen the connection between sequence conservation and phase separation.

      Comment 1: With respect to putting the work in a better context of what has previously been done before, this is not to say that there is not new information in it, but what the authors do is somewhat closely related to work by others. I think it would be useful to make those links more directly.

      We have addressed the specific comments as outlined below.

      Comment 1a: Alderson et al (reference 71) analysed in detail the conservation of IDRs (via pLDDT, which is itself related to conservation) to show, for example, that conserved residues fold upon binding. This analysis is very similar to the analysis used in the current study (using ESM2 as a different measure of conservation). Thus, the result that "Given that low ESM2 scores generally reflect mutational constraint in folded proteins, the presence of region a among disordered residues suggests that certain disordered amino acids are evolutionarily conserved and likely functionally significant" is in some ways very similar to the results of that (Alderson et al) paper.

      We thank the reviewer for the comment. However, we would like to clarify that our findings show subtle but important differences from those reported by Alderson et al. Specifically, Alderson et al. used AlphaFold2 predictions to identify IDRs that undergo disorder-to-order transitions, which the authors termed conditionally folded IDRs. These regions could potentially be functionally important, assuming that IDR function requires folding.

      We argue, however, that the validity of this structure-function relationship for IDRs remains to be tested. In our opinion, the most direct way to evaluate functional significance is to assess evolutionary conservation.

      As shown in Author response image 1, the correlation between pLDDT scores and the conservation score, while noticeable, is markedly weaker than that between the ESM2 score and the conservation score.

      Author response image 1.

      Comparison of the correlation between AlphaFold2 pLDDT scores and conservation scores with the correlation between ESM2 scores and conservation scores. Calculations were performed using proteins in the MLO-hProt dataset. (A) Correlation between the mean AlphaFold2 pLDDT scores and conservation scores for various amino acids. Pearson correlation coefficients (r) are indicated in the figure legends. The four panels on the right present analogous correlation plots for amino acids grouped by structural order, as defined by their pLDDT scores. (B) Same as (A), but for ESM2 scores.

      Therefore, we believe that ESM2 score is a better indicator than AlphaFold2 pLDDT score for functional relevance.
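      A minimal sketch of the per-amino-acid comparison in Author response image 1 (A) follows. It assumes that, for each residue, the amino-acid identity, a predictor score (pLDDT or ESM2) and a conservation score are already available; the function names and toy values are hypothetical, not the authors' pipeline. Scores are averaged within each amino-acid type and Pearson's r is computed on those means. Because low ESM2 scores indicate constraint while high conservation scores indicate conservation, the ESM2 correlation is expected to be negative, so magnitudes are what should be compared.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_by_residue_type(residues: str, scores: list[float]) -> dict[str, float]:
    """Average a per-residue score within each amino-acid type."""
    sums: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for aa, s in zip(residues, scores):
        sums[aa] += s
        counts[aa] += 1
    return {aa: sums[aa] / counts[aa] for aa in sums}


def r_with_conservation(residues: str, predictor: list[float],
                        conservation: list[float]) -> float:
    """Pearson r between per-type mean predictor and per-type mean conservation."""
    pred = mean_by_residue_type(residues, predictor)
    cons = mean_by_residue_type(residues, conservation)
    types = sorted(pred)
    return pearson_r([pred[t] for t in types], [cons[t] for t in types])
```

Calling `r_with_conservation` once with pLDDT and once with ESM2 scores on the same residues gives the two coefficients being compared.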

      Furthermore, for the human IDRs, we explicitly selected amino acids with pLDDT scores ≤ 70. These would be classified as structureless, disordered amino acids according to the study by Alderson et al. Nevertheless, as shown in Figures 2 and 3 of the main text, our analyses still identify conserved regions. These regions may therefore function via mechanisms distinct from the disorder-to-order transition.

      We now discuss the novelty of our work in the context of existing studies in the newly added Conclusions and Discussion: Related Work, as quoted below.

      “Numerous studies have sought to identify functionally relevant amino acid groups within IDRs [cite]. For instance, using multiple sequence alignment, several groups have identified evolutionarily conserved residues that contribute to phase separation [cite]. Alderson et al. employed AlphaFold2 to detect disordered regions with a propensity to adopt structured conformations, suggesting potential functional relevance [alderson et al].

      In contrast, our approach based on ESM2 is more direct: it identifies conserved residues without relying on alignment or presupposing that functional significance requires folding into stable 3D structures. Notably, many of the conserved residues identified in our analysis exhibit low pLDDT scores (Figure 2), implying potential functional roles independent of stable conformations.”

      Comment 1b: Dasmeh et al, Lu et al and Ho & Huang analysed conservation in IDRs, including aromatic residues and their role in phase separation.

      We thank the reviewer for bringing these works to our attention! We now explicitly discuss these studies in the Discussion section, as mentioned above, and in the Introduction, as quoted below.

      “Evolutionary analysis of IDRs is challenging due to difficulties in sequence alignment [cite], though several studies have attempted alignment of disordered proteins with promising results [Dasmeh et al, Lu et al and Ho & Huang].”

      Comment 1c: A number of groups have performed proteome-wide saturation scans using pLMs, including variants of the ESM family, including Meier (reference 89, but cited about something else) and Cagiada et al (https://doi.org/10.1101/2024.05.21.595203) that analysed variant effects in IDRs using a pLM. Thus, I think statements such as "their applicability to studying the fitness and evolutionary pressures on IDRs has yet to be established" should possibly be qualified.

      We added a new paragraph in the Introduction to discuss the application of protein language models to IDRs and cited the suggested references.

      “While protein language models have been widely applied to structured proteins [cite], it is important to emphasize that these models themselves are not inherently biased toward folded domains. For example, the Evolutionary Scale Model (ESM2) [cite] is trained as a probabilistic language model on raw protein sequences, without incorporating any structural or functional annotations. Its unsupervised learning paradigm enables ESM2 to capture statistical patterns of residue usage and evolutionary constraints without relying on explicit structural information. Thus, the success of ESM2 in modeling the mutational landscapes of folded proteins [cite] reflects the model’s ability to learn sequence-level constraints imposed by natural selection, a property that is equally applicable to IDRs if those regions are also under functional selection. Indeed, protein language models are increasingly being used to analyze variant effects in IDRs [cite].”

      Comment 2: On page 4, the authors write, "The conserved residues are primarily located in regions associated with phase separation." These results are presented as a central part of the work, but it is not completely clear what the evidence is.

      We thank the reviewer for this insightful comment. We realize that our wording was not as precise as it should have been. We meant to state that the regions associated with phase separation are significantly enriched in these conserved residues. This is a significant finding and indicates that phase separation could be a source of evolutionary pressure in dictating IDP sequence conservation. However, we do not intend to suggest that phase separation is the only evolutionary pressure.

      The sentence has been revised to

      “Notably, regions associated with phase separation are significantly enriched in these conserved residues.”

      We further replaced the section title "Conserved, Disordered Residues Localize in Regions Driving Phase Separation" with "Regions Driving Phase Separation Are Enriched with Conserved, Disordered Residues" to further clarify our findings and avoid overinterpretation.

      Finally, we revised the following sentence in the discussion

      “Notably, these conserved, disordered residues are predominantly located in regions actively involved in phase separation, contributing to the formation of membraneless organelles.”

      to

      “Notably, regions actively involved in phase separation are enriched with these conserved, disordered residues, supporting their potential role in the formation of membraneless organelles.”

      The submitted manuscript provides clear evidence supporting the enrichment of conserved residues in MLO-driving IDRs. Specifically, Figures 4A and 4C demonstrate that these IDRs exhibit a substantially higher fraction of conserved residues compared to other IDRs involved in phase separation.

      In this analysis, the nMLO-hIDR group serves as a baseline, representing the distribution of conservation in disordered regions lacking MLO-related functions. In contrast, IDRs from MLO-associated groups show a pronounced downward shift in their median and interquartile ranges, indicating stronger evolutionary constraints. Within the dMLO cohort, the degree of conservation follows a distinct gradient: driving residues exhibit the highest levels of conservation, followed by participant residues, with non-participant residues showing values closer to the nMLO baseline. This pattern reflects the relative functional importance of each group in phase separation, with conservation levels corresponding to their roles in MLO scaffolding.

      To further support this, we computed, for each IDR, the fraction of conserved amino acids. As shown in Figure S11B, for IDRs that actively contribute to phase separation, the fraction is indeed higher than those not involved in phase separation. This analysis is now included in SI.

      During the revision, we explicitly evaluated whether conserved residues are preferentially located in regions associated with phase separation. To this end, for each protein in the MLO-hProt dataset, we calculated the probability p of finding conserved residues within regions contributing to phase separation. These regions include both "driving" and "participating" segments as defined in Figure 4 of the main text.

      Figure S11A presents the distribution of p across all proteins. For comparison, we also include the distribution of 1− p, representing the probability of finding conserved residues in regions not associated with phase separation. On average, p exceeds 0.5, suggesting a tendency for conserved residues to be more frequently located in phase-separating regions. However, the difference between the two distributions is not statistically significant. This result may be due to the generally low density of conserved residues in IDRs, which makes the estimation of p challenging for individual proteins. Additionally, some conserved sites may be involved in functions unrelated to phase separation.
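      The per-protein quantity p described above can be sketched as follows. The sketch assumes a boolean per-residue mask marking "driving" or "participating" segments and flags conserved residues with an ESM2-score cutoff; the cutoff value, function names and toy data are hypothetical, since the exact definition of a conserved residue is not restated here. Proteins without conserved residues are skipped because p is undefined for them, and no particular statistical test for comparing p with 1 − p is implied.

```python
from typing import Optional


def conserved_mask(esm2_scores: list[float], cutoff: float) -> list[bool]:
    """Flag residues whose ESM2 score falls below a (hypothetical) conservation cutoff."""
    return [s < cutoff for s in esm2_scores]


def p_conserved_in_ps(conserved: list[bool], in_ps_region: list[bool]) -> Optional[float]:
    """Fraction of a protein's conserved residues that lie in phase-separating regions."""
    if len(conserved) != len(in_ps_region):
        raise ValueError("masks must have equal length")
    n_conserved = sum(conserved)
    if n_conserved == 0:
        return None  # p is undefined for proteins without conserved residues
    n_in_ps = sum(1 for c, r in zip(conserved, in_ps_region) if c and r)
    return n_in_ps / n_conserved


def p_and_complement(proteins):
    """Distributions of p and 1 - p over (conserved, in_ps_region) pairs, one per protein."""
    ps = []
    for conserved, in_ps_region in proteins:
        p = p_conserved_in_ps(conserved, in_ps_region)
        if p is not None:
            ps.append(p)
    return ps, [1.0 - p for p in ps]
```

Proteins with only a handful of conserved residues yield p values such as 0, 0.5 or 1, which is one way to see why the estimate is noisy at the level of individual proteins.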

      We added the following text to the Discussion section of the main text.

      “We emphasize that the results presented in Figure 4 do not directly demonstrate that conserved residues are preferentially located in regions associated with phase separation. Although these regions are more enriched in conserved amino acids, their total sequence length can be smaller than that of non-phase-separating regions. As a result, the absolute number of conserved residues may still be higher outside phase-separating regions. To quantitatively assess this, we calculated, for each protein in the MLO-hProt dataset, the probability p of finding conserved residues within regions contributing to phase separation. These regions include both "driving" and "participating" segments, as defined in Figure 4 of the main text. Figure S11 shows the distribution of p across all proteins. For comparison, we also present the distribution of 1− p, which reflects the probability of finding conserved residues in non-phase-separating regions. While the average value of p exceeds 0.5, indicating a trend toward conserved residues being more frequently located in phase-separating regions, the difference between the two distributions is not statistically significant. Future studies with expanded datasets may be necessary to clarify this trend.”

      Comment 3: It would be useful to have an assessment of what controls the authors used to assess whether there are folded domains within their set of IDRs.

      We acknowledge that our previous labeling may have caused some confusion. Protein sequences used in Figures 2 and 3 include both folded and disordered domains. Results presented in these figures were constructed using full-length protein sequences to highlight the similarities and differences in ESM2 scores between folded and disordered domains.

      In contrast, the analyses presented in Figures 4 and 5 focus exclusively on IDRs to examine their role in phase separation.

      To prevent further confusion, we have renamed the dataset used in Figures 2 and 3 as MLO-hProt, emphasizing that the analysis pertains to entire protein sequences. The term MLO-hIDR is now reserved for a new dataset that includes only disordered residues, as used in Figures 4 and 5, and corresponding SI Figures.

      For the dMLO-IDR dataset, all except one amino acid (P40967, residue G592) are annotated as disordered in the MobiDB database (https://mobidb.org/). This database characterizes disordered regions based on a combination of predictive algorithms and experimental data. As illustrated in Figure S5A, 25.5% of the proteins in the dataset have direct experimental evidence supporting their disorderedness. These experimental annotations are derived from a diverse range of techniques (Figure S5B). For the remaining proteins, disorder was predicted by one or more computational tools. Although not all tools were applied to every protein, each protein in the dataset was identified as disordered by at least one method.

      For human proteins, IDRs were identified based on AlphaFold2 pLDDT scores, using a threshold of 70. As established in prior studies [1, 2], the pLDDT score provides a quantitative measure of local structural confidence, with lower values indicating greater structural disorder. IDRs associated with conditional folding or disorder-to-order transitions generally exhibit high pLDDT values (e.g., >70).

      Author response image 2 shows a violin plot of AlphaFold2 pLDDT scores for the various MLO-hIDR groups. The consistently low scores support the conclusion that these regions are structurally disordered.

      We also cross-checked the MLO-hIDR regions against the MobiDB database. As shown in Figure S6, approximately 76% of the proteins in the dataset are predicted to contain disordered regions. Among the non-labeled segments with pLDDT scores ≤ 70, the majority are relatively short, with segments of 1–5 residues accounting for approximately 80%.

      Author response image 2.

      AlphaFold pLDDT scores of hIDRs in different MLO-related groups.
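      The pLDDT-based IDR definition and the MobiDB cross-check described above can be sketched as below. The sketch assumes per-residue pLDDT values and a per-residue boolean MobiDB disorder annotation have already been retrieved (retrieval is not shown); the function names and toy values are hypothetical. "Non-labeled segments" is interpreted here as contiguous runs of residues with pLDDT ≤ 70 that carry no MobiDB disorder label.

```python
def runs(mask: list[bool]) -> list[tuple[int, int]]:
    """Contiguous True runs in a boolean mask, as half-open (start, end) pairs."""
    segments, start = [], None
    for k, flag in enumerate(mask):
        if flag and start is None:
            start = k
        elif not flag and start is not None:
            segments.append((start, k))
            start = None
    if start is not None:
        segments.append((start, len(mask)))
    return segments


def unlabeled_low_plddt_segments(plddt: list[float], disorder_labels: list[bool],
                                 cutoff: float = 70.0) -> list[tuple[int, int]]:
    """Runs of residues with pLDDT <= cutoff that carry no disorder label."""
    mask = [p <= cutoff and not lab for p, lab in zip(plddt, disorder_labels)]
    return runs(mask)


def short_segment_fraction(segments: list[tuple[int, int]], max_len: int = 5) -> float:
    """Fraction of segments whose length is at most max_len residues."""
    if not segments:
        return 0.0
    return sum(1 for s, e in segments if e - s <= max_len) / len(segments)
```

The same `runs` helper applied to `[p <= 70 for p in plddt]` yields the pLDDT-defined IDR segments themselves.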

      In addition to renaming the dataset, we also revised the manuscript to highlight the validation of disorder in the Results section (Regions Driving Phase Separation Are Enriched with Conserved, Disordered Residues), as quoted below.

      “The presence of evolutionarily conserved disordered residues raises the question of their functional significance. To explore this, we identified disordered regions of MLO-hProt using a pLDDT score less than 70 and partitioned these regions into two categories: drivers (dMLO-hIDR), which actively drive phase separation, and clients (cMLO-hIDR), which are present in MLOs under certain conditions but do not promote phase separation themselves [cite]. Additionally, IDRs from human proteins not associated with MLOs, termed nMLO-hIDR, were included as a control. To enhance statistical robustness, we extended our dataset by incorporating driver proteins from additional species [cite], resulting in the expanded dMLO-IDR dataset. Beyond the pLDDT-based classification, the majority of residues in these datasets are also predicted to be disordered by various computational tools and supported by experimental evidence (Figures S5 and S6).”

      Recommendation 1: The authors use the terms "evolutionary fitness of IDRs" (abstract and p. 5, for example), "fitness of amino acids" (p. 4), and "quantify the fitness of particular residues at specific sites" (p. 5). It is not clear what is meant by fitness in this context.

      We thank the reviewer for pointing out the ambiguity in the term fitness. To enhance clarity, we have replaced “fitness” with “mutational tolerance” to more directly emphasize the evolutionary conservation of specific residues.

      Recommendation 2: The authors write (p. 6) "Previous studies have demonstrated a strong correlation between ESM2 scores and changes in free energy related to protein structure stability". While that may be true, it might be worth noting that ESM2 scores report on the effects of mutations and function more broadly than stability because these models have previously been shown to capture conservation effects beyond stability.

      We fully agree with the reviewer’s comment and have revised the main text accordingly. Specifically, the referenced sentence has been revised and relocated, as shown below.

      “Our analysis demonstrated that HP1α’s structured domains consistently yield low ESM2 scores, reflecting strong mutational constraints characteristic of folded regions. These constraints are further evident in the local LLR predictions, as shown in Figure 2B, where we illustrate the folded region G120-T130. Given the functional importance of preserving the 3D structure of folded domains, mutations with greater detrimental effects are likely to disrupt protein folding substantially. This interpretation is consistent with previous studies reporting a significant correlation between ESM2 LLRs and changes in free energy associated with protein structural stability [cite].”

      Recommendation 3: p. 10: The authors write "To exclude sequences that no longer qualify as homologs, we filtered for sequences with at least 20% identity to the reference". How did they decide on 20% and why? And over which residues is this 20% calculated?

      We apologize for the earlier lack of clarity. Sequence alignment was performed using the full-length protein sequences, encompassing both folded and disordered regions. For each sequence, we calculated the percent identity by counting the number of positions, denoted as n, at which the amino acid matches the reference. The percent identity was then computed as n/N, where N represents the total length of the aligned reference sequence. This total includes residues in folded and disordered regions, as well as gap positions introduced during alignment.

      We updated the Methods section of the main text to clarify.

      “We performed multi-sequence alignment (MSA) analysis using HHblits from the HH-suite3 software suite [citations], a widely used open-source toolkit known for its sensitivity in detecting sequence similarities and identifying protein folds. HHblits builds MSAs through iterative database searches, sequentially incorporating matched sequences into the query MSA with each iteration. Sequence alignment was performed using the full-length protein sequences, encompassing both folded and disordered regions.

      ...

      To refine alignment quality by focusing on closely related homologs, we filtered out sequences with ≤ 20% identity to the query, excluding weakly related sequences where only short segments show similarity to the reference. For each sequence, we calculated the percent identity by counting the number of positions, denoted as n, at which the amino acid matches the reference. The percent identity was then computed as n/N, where N represents the total length of the aligned reference sequence. This total includes residues in folded and disordered regions, as well as gap positions introduced during alignment.”
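      The percent-identity definition in the quoted Methods can be sketched as follows. The sketch assumes each HHblits hit has already been rendered as a gapped string aligned column-by-column to the gapped reference row (parsing of HHblits output is not shown); the function names are hypothetical. Following the Methods wording, hits with identity ≤ 20% are excluded, so a hit at exactly 20% is dropped.

```python
def percent_identity(aligned_ref: str, aligned_hit: str, gap: str = "-") -> float:
    """Percent identity n/N as described in the Methods.

    n: aligned positions where the hit matches the (non-gap) reference residue.
    N: length of the aligned reference, gap columns included.
    """
    if len(aligned_ref) != len(aligned_hit):
        raise ValueError("aligned sequences must have the same length")
    n = sum(1 for r, h in zip(aligned_ref, aligned_hit) if r == h and r != gap)
    return 100.0 * n / len(aligned_ref)


def filter_homologs(aligned_ref: str, hits: list[str],
                    min_identity: float = 20.0) -> list[str]:
    """Drop hits with identity <= min_identity (keep strictly above the cutoff)."""
    return [h for h in hits if percent_identity(aligned_ref, h) > min_identity]
```

For example, `percent_identity("AC-DE", "ACGDF")` is 60.0: three matching residues over five aligned columns, with the gap column counted in N but never as a match. Because N includes gaps and both folded and disordered columns, a hit that aligns well only over a short domain is penalized.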

      We selected a 20% sequence identity threshold to balance inclusion of true homologs with exclusion of distant matches that may not share functional relevance. To determine this cutoff, we compared identity thresholds of 0%, 10%, 20%, and 40% and examined the resulting distributions of conservation and ESM2 scores across aligned residues for MLO-hProt dataset (Author response image 3). Thresholds of 10%, 20%, and 40% produced qualitatively similar results, with a consistent correspondence between low ESM2 scores and high conservation. Lower thresholds introduced highly divergent sequences that added noise to the alignment, resulting in reduced overall conservation scores. In contrast, higher thresholds excluded homologs with potentially meaningful conservation, particularly in disordered regions where conservation scores tend to be relatively low.

      Author response image 3.

      Histograms of the ESM2 score and the conservation score, presented in a format consistent with Figure 3B of the main text. The conservation scores were computed using aligned sequences with identity thresholds of ≥0%, ≥10%, ≥20%, and ≥40% (left to right). Contour lines represent different levels of −log P(CS, ESM2), where P is the joint probability density of conservation score (CS) and ESM2 score. Contours are spaced at 0.5-unit intervals, highlighting regions of distinct density.

      Recommendation 4: In their description of the "motif" searching algorithm (p. 20), I think that the search algorithm would give a different result depending on whether the search is performed N->C or C->N (because the first residue (i) needs to have a score <0.5, but the last (j) could have a score >0.5 as long as the average is below 0.5). Is that correct? And if so, why did they choose an asymmetric algorithm?

      We thank the reviewer for highlighting the asymmetry in our motif-search algorithm.

      To investigate this issue, we repeated the algorithm starting from the C-terminus and compared the resulting motifs with those obtained from the N-terminal scan. We found that the two sets of motifs overlap entirely: each motif identified from the C-terminal direction has a corresponding counterpart from the N-terminal scan. However, the motifs are not identical. The directionality of the search introduces additional amino acids—referred to here as peripheral residues—at the motif boundaries, which differ between the two sets.

      As shown in Author response image 4, the number of peripheral residues is small relative to the total motif length.

      To eliminate asymmetry and ambiguity, we have revised our method to perform bidirectional scans—from both the N- and C-termini—and define each motif as the overlapping region identified by both directions. This approach emphasizes the conserved core and avoids the inclusion of spurious terminal residues. The updated procedure is described in Methods: Motif Identification.

      “To identify motifs within a given IDR, we implemented the following iterative procedure. Starting from either the N- or C-terminus of the sequence, we first locate the initial residue i whose ESM2 score falls below 0.5. From i, residues are sequentially appended in the direction toward the opposite terminus until the segment’s average ESM2 score exceeds 0.5; the first residue to breach this threshold is denoted j. The segment (i, i+1, ..., j−1) is then recorded as a candidate motif. This process repeats starting from j until the end of the IDR is reached.

      We perform this full procedure independently from both termini and designate the final motif as the intersection of the two candidate-motif sets. This bidirectional overlap strategy excludes terminal residues that might transiently satisfy the average-score criterion only because of adjacent low-scoring regions, thereby isolating the conserved core of each motif. All remaining residues (those not retained in the intersection) are classified as non-motif regions, minimizing peripheral artifacts.”

      Author response image 4.

      Number of peripheral residues and their length relative to the full motif length, for motifs identified from each direction. (A) Unique motifs identified in the N-to-C direction. (B) Unique motifs identified in the C-to-N direction.
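      The bidirectional motif search quoted above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' code: it takes a list of per-residue ESM2 scores for one IDR, uses strict comparisons (a score < 0.5 starts a motif, a running average > 0.5 ends it), and interprets "intersection of the two candidate-motif sets" at the residue level, re-splitting shared residues into contiguous runs. All function names and the toy score profile are hypothetical.

```python
def scan_motifs(scores: list[float], threshold: float = 0.5) -> list[tuple[int, int]]:
    """One-directional scan (index 0 -> end); motifs as half-open (i, j) ranges."""
    motifs = []
    n, k = len(scores), 0
    while k < n:
        if scores[k] >= threshold:  # not a valid starting residue i
            k += 1
            continue
        i, j, total = k, k, 0.0
        # Append residues until the running average first exceeds the threshold at j.
        while j < n:
            total += scores[j]
            if total / (j - i + 1) > threshold:
                break
            j += 1
        motifs.append((i, j))  # candidate motif covers residues i .. j-1
        k = j  # resume the scan from j
    return motifs


def to_mask(motifs: list[tuple[int, int]], n: int) -> list[bool]:
    """Residue-level membership mask for a list of half-open ranges."""
    mask = [False] * n
    for start, end in motifs:
        for k in range(start, end):
            mask[k] = True
    return mask


def runs(mask: list[bool]) -> list[tuple[int, int]]:
    """Contiguous True runs in a mask, as half-open (start, end) pairs."""
    out, start = [], None
    for k, flag in enumerate(mask):
        if flag and start is None:
            start = k
        elif not flag and start is not None:
            out.append((start, k))
            start = None
    if start is not None:
        out.append((start, len(mask)))
    return out


def bidirectional_motifs(scores: list[float], threshold: float = 0.5) -> list[tuple[int, int]]:
    """Keep only residues assigned to a motif by both the N->C and C->N scans."""
    n = len(scores)
    forward = scan_motifs(scores, threshold)
    # Scan the reversed sequence, then map reversed range (a, b) back to (n - b, n - a).
    reverse = [(n - b, n - a) for a, b in scan_motifs(scores[::-1], threshold)]
    shared = [f and r for f, r in zip(to_mask(forward, n), to_mask(reverse, n))]
    return runs(shared)
```

With the toy profile `[0.9, 0.1, 0.2, 0.9, 0.9, 0.3]`, the forward scan returns `[(1, 4), (5, 6)]`, absorbing residue 3 (score 0.9) at the right edge of the first motif, while the reverse scan absorbs residue 0 (score 0.9) at its left edge. `bidirectional_motifs` returns `[(1, 3), (5, 6)]`, so both peripheral residues drop out, which is the asymmetry the reviewer describes.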

      Accordingly, we have updated the Supplementary material (ESM2_motif_with_exp_ref.csv) to list the newly identified motifs common to both the N-terminal and C-terminal searches. The set of motifs changed slightly, but these changes do not affect the main conclusions. Figures 5C, 5D, and S6 have been revised accordingly.

      Reviewer #2:

      Summary:

      Unfortunately, I do not believe that the results can be trusted. ESM2 has not been validated for IDRs through experiments. The authors themselves point out its little use in that context. In this study, they do not provide any further rationale for why this situation might have changed. Furthermore, they mention that experimental perturbations of the predicted motifs in in vivo studies may further elucidate their functional importance, but none of that is done here. That some of the motifs have been previously validated does not give any credibility to the use of ESM2 here, given that such systems were probably seen during the training of the model.

      We thank the reviewer for their detailed and thoughtful critique of our manuscript. We recognize the importance of careful model validation, especially in the context of IDRs, and appreciate the opportunity to clarify the scope and rationale of our study. Below, we respond point-by-point to the main concerns.

      (1) The use of ESM2 is not validated for IDRs, and the authors provide no rationale for its applicability in this context.

      We thank the reviewer for raising this important point.

      First, we emphasize that ESM2 is a probabilistic language model trained entirely on amino acid sequences, without any structural supervision. The model does not receive any input about protein structure — folded or disordered — during training. Instead, it learns to estimate the likelihood of each amino acid at a given position, conditioned on the surrounding sequence context. This makes ESM2 agnostic to whether a sequence is folded or disordered; the model’s capacity to identify patterns of residue usage arises solely from the statistics of natural sequences.

      As such, ESM2 is not inherently biased toward folded proteins, even though previous studies have demonstrated its usefulness in identifying conserved and functionally constrained residues in structured domains [3–9]. These findings support the broader utility of language models for uncovering evolutionary constraints — and by extension, suggest that similar signatures could exist in IDRs, particularly if they are under functional selection.

      Indeed, if certain residues or motifs in IDRs are conserved due to their importance in biological processes (e.g., phase separation), we would expect such selection to be reflected in sequence-based features, which ESM2 is designed to detect. The model’s applicability to IDRs, then, is a natural extension of its core probabilistic architecture.
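      To make the probabilistic framing concrete, here is a small Python sketch of the widely used wild-type-marginal log-likelihood ratio (LLR) for substitutions at a single position: LLR = log p(mutant) − log p(wild type), with both probabilities read from the model's predicted distribution at that position. The probability dictionary is made-up input standing in for model output, and this is not necessarily how the manuscript defines its ESM2 score.

```python
import math
from typing import Dict


def substitution_llrs(probs: Dict[str, float], wild_type: str) -> Dict[str, float]:
    """Log-likelihood ratio of every alternative residue versus the wild type.

    probs: predicted probability of each amino acid at one position
    (hypothetical input here; with ESM2 these would be the model's
    predicted probabilities at that position).
    """
    log_wt = math.log(probs[wild_type])
    return {aa: math.log(p) - log_wt for aa, p in probs.items() if aa != wild_type}


# Toy distribution: the wild-type residue dominates, so substitutions score negative.
position_probs = {"G": 0.70, "A": 0.20, "S": 0.10}
llrs = substitution_llrs(position_probs, wild_type="G")
for aa, value in sorted(llrs.items()):
    print(f"G->{aa}: {value:.3f}")  # G->A: -1.253, then G->S: -1.946
```

      A strongly negative LLR means the model, conditioned on the surrounding sequence, finds the substitution much less likely than the wild type; nothing in the calculation refers to whether the position is folded or disordered.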

      To further evaluate this, we carried out an independent in silico validation using multiple sequence alignments (MSAs). This analysis allowed us to compute the evolutionary conservation of individual amino acids without any reliance on ESM2. We then compared these conservation scores to ESM2 scores and found a strong correlation between the two. This provides direct, quantitative support for the idea that ESM2 is capturing biologically meaningful sequence constraints — even in disordered regions.
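      A minimal Python sketch of this kind of comparison. It assumes, for illustration only, that conservation is measured per alignment column as log2(20) minus the Shannon entropy of the non-gap residue frequencies, and that agreement is summarized with a Spearman rank correlation; the manuscript's exact conservation score and statistic may differ. Because low ESM2 scores mark constrained residues, agreement appears as a negative correlation under this convention. Gap-only columns are not handled.

```python
import math
from collections import Counter
from typing import List, Sequence


def column_conservation(column: str) -> float:
    """log2(20) minus the Shannon entropy (bits) of non-gap residues in one MSA column."""
    residues = [c for c in column if c != "-"]
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(20) - entropy


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, averaging over ties."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy alignment: each string is one column (one residue position across homologs).
columns = ["KKKK", "KKKR", "KRAE", "LIVF"]
esm2_scores = [0.2, 0.8, 2.9, 2.1]  # hypothetical per-residue ESM2 scores
conservation = [column_conservation(c) for c in columns]
print(round(spearman(esm2_scores, conservation), 3))  # -0.949
```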

      While we agree that experimental testing would ultimately provide the most compelling validation, we believe that our MSA-based comparison constitutes a strong and arguably ideal computational validation of the model’s predictions. It offers an orthogonal measure of evolutionary pressure that confirms the biological plausibility of ESM2 scores.

      We added the following text in the introduction to highlight the applicability of ESM2 to IDRs.

      “While protein language models have been widely applied to structured proteins, it is important to emphasize that these models themselves are not inherently biased toward folded domains. For example, the Evolutionary Scale Model (ESM2) [cite] is trained as a probabilistic language model on raw protein sequences, without incorporating any structural or functional annotations. It operates by estimating the likelihood of observing a given amino acid at a particular position, conditioned on the entire surrounding sequence context. This unsupervised learning paradigm enables ESM2 to capture statistical patterns of residue usage and evolutionary constraints without relying on explicit structural information. Thus, the success of ESM2 in modeling fitness landscapes of folded proteins reflects the model’s ability to learn sequence-level constraints imposed by natural selection, a property that is equally applicable to IDRs if those regions are also under functional selection. Indeed, protein language models are increasingly being used to analyze variant effects in IDRs [cite].”

      (2) There is no experimental validation of the ESM2-based predictions in this study.

      We agree that experimental validation would provide definitive support for the utility of ESM2 in IDRs, and we explicitly state this as a limitation in the revised manuscript as quoted below.

      “Limitations: Despite the promising findings, our study has several limitations. Most notably, our analysis is purely computational, relying on ESM2-derived predictions and sequence-based conservation without accompanying experimental validation. While the strong correlation between ESM2 scores and evolutionary conservation provides compelling evidence that the identified motifs are functionally constrained, the precise biological roles of these motifs remain uncharacterized. ESM2 is well-suited for highlighting regions under selective pressure, but it does not provide mechanistic insights into how conserved motifs contribute to specific molecular functions such as phase separation, molecular recognition, or dynamic regulation. Determining these roles will require targeted experimental investigations, including mutagenesis and biophysical characterization.”

      In addition, we revised the manuscript title from “Protein Language Model Identifies Disordered, Conserved Motifs Driving Phase Separation" to “Protein Language Model Identifies Disordered, Conserved Motifs Implicated in Phase Separation". This revision softens the original claim to better reflect the absence of direct experimental evidence for the motifs’ role in phase separation.

      However, we also emphasize that the goal of our study is not to claim definitive predictive power, but rather to explore whether ESM2-derived mutational profiles align with known biological features of IDRs — and in doing so, to generate new, testable hypotheses.

      In addition, while no in vivo experiments were performed, our study does include an in silico validation step, as detailed in the response to the previous comment. The strong correlation between ESM2 scores and conservation scores provides direct support for the utility of ESM2 in identifying residues under evolutionary constraint in disordered regions.

      (3) The overlap between predicted motifs and known ones may be due to training data leakage.

      We respectfully clarify that training data leakage is not possible in this case, as ESM2 is trained using unsupervised learning on raw protein sequences alone. The model has no access to experimental annotations, functional labels, or knowledge of which motifs are involved in phase separation. It only models statistical sequence patterns derived from evolutionarily observed proteins.

      Therefore, any agreement between ESM2-derived predictions and previously validated motifs arises not from memorization of experimental data, but from the model’s ability to learn meaningful sequence constraints from the natural distribution of proteins.

      (4) The authors should revamp the study with a testable predictive framework.

      We respectfully suggest that a full revamp is not necessary or appropriate in this context.

      As outlined in our previous responses, we believe that certain misunderstandings about the nature and capabilities of ESM2 may have influenced the reviewer’s assessment.

      Importantly, both Reviewer 1 and Reviewer 3 express strong support for the significance and novelty of this work, and recommend publication following minor revisions.

      In this context, we believe the manuscript provides a useful contribution as a first step toward understanding disordered regions using language models, and that it has value even in the absence of direct experimental testing. We have now better positioned the manuscript in this light, clarified limitations, and suggested concrete next steps for follow-up research.

      We hope these clarifications and revisions address the reviewer’s concerns, and we thank them again for helping us strengthen the framing, rigor, and clarity of our study.

      Reviewer #3:

      Summary:

      This is a very nice and interesting paper to read about motif conservation in protein sequences and mainly in IDRs regions using the ESM2 language model. The topic of the paper is timely, with strong biological significance. The paper can be of great interest to the scientific community in the field of protein phase transitions and future applications using the ESM models. The ability of ESM2 to identify conserved motifs is crucial for disease prediction, as these regions may serve as potential drug targets. Therefore, I find these findings highly significant, and the authors strongly support them throughout the paper. The work motivates the scientific community towards further motif exploration related to diseases.

      Strengths:

      (1) Revealing conserved regions in IDRs by the ESM-2 language model.

      (2) Identification of functionally significant residues within protein sequences, especially in IDRs.

      (3) Findings supported by useful analyses.

      We appreciate the reviewer’s thoughtful words and support for our work.

      Weaknesses:

      (1) Lack of examples demonstrating the potential biological functions of these conserved regions.

      As detailed in the Response to Recommendation 6, we conducted additional analyses to connect the identified conserved regions with their biological functions.

      (2) Very limited discussion of potential future work and of limitations.

      We have substantially revised the Conclusions and Discussion section to provide a detailed analysis of the study’s limitations and to propose several directions for future research, as elaborated in our Response to Recommendation 5 below.

      Recommendation 1: The authors describe the ESM2 score such that lower scores are associated with conserved residues, stating that "lower scores indicate higher mutational constraint and reduced flexibility, implying that these residues are more likely essential for protein function, as they exhibit fewer permissible mutational states." However, when examining intrinsically disordered regions (IDRs), which are known to drive phase separation, I observe that the ESM2 score is relatively high (Figure 3C, pLDDT < 50, and Supplementary Figure S2). Could the authors clarify how this relatively high score aligns with the conservation of motifs that drive phase separation?

      We thank the reviewer for this insightful comment. We would like to clarify that most amino acids in IDRs are not conserved, even in IDRs that contribute to phase separation. Only a small set of amino acids in these IDRs, which we term motifs, are evolutionarily conserved and have low ESM2 scores. Therefore, the ESM2 scores exhibit a bimodal distribution with high and low modes, as shown in Figures 4A and 4C of the manuscript. When averaged over all amino acids, the mean ESM2 scores plotted in Figure 3C are relatively high because of the dominant population of non-conserved amino acids.
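      The averaging effect can be written as a two-population mixture: if a fraction f of residues is conserved with scores near s_low and the remaining 1 − f score near s_high, the mean is pulled toward the larger population. The numbers below are purely illustrative and are not taken from the data.

```latex
\bar{s} = f\,s_{\mathrm{low}} + (1-f)\,s_{\mathrm{high}},
\qquad
f = 0.2,\; s_{\mathrm{low}} = 0.5,\; s_{\mathrm{high}} = 3.0
\;\Rightarrow\;
\bar{s} = 0.2 \times 0.5 + 0.8 \times 3.0 = 0.1 + 2.4 = 2.5
```

      Even with one residue in five strongly conserved, the mean (2.5) sits close to the non-conserved mode, so a high average score does not rule out conserved motifs within the same IDR.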

      Recommendation 2: The authors mention: "We first analyzed the relationship between ESM2 and pLDDT scores for human Heterochromatin Protein 1 (HP1, residues 1-191)". I appreciate this example as a demonstration of amino acid conservation in IDRs. However, could the authors provide more examples supporting amino acid conservation within IDRs together with lower ESM2 scores (e.g., additional examples of "conserved disordered" regions in various proteins that are associated with relatively low ESM2 scores, as in Figure 2A)?

      We thank the reviewer for this valuable suggestion. We note that conserved residues in IDRs are prevalent, as indicated in Figures 2D and 3B. To further illustrate the prevalence of “conserved disordered” regions, we generated ESM2 versus pLDDT score plots for the full dMLO–hProt dataset (82 proteins) in Figure S2. In these plots, residues with pLDDT ≤ 70 are highlighted in blue to denote structural disorder (dMLO-hIDR), and disordered residues with ESM2 score ≤ 1.5 are shown in purple to indicate conserved disordered segments.
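      The two cut-offs used for Figure S2 amount to a simple per-residue rule, sketched below in Python. The function name and label strings are hypothetical; only the thresholds (pLDDT ≤ 70 for disorder, ESM2 score ≤ 1.5 for conservation) come from the text.

```python
def classify_residue(plddt: float, esm2_score: float) -> str:
    """Label a residue using the Figure S2 thresholds (label strings are illustrative)."""
    if plddt > 70:
        return "folded"
    # pLDDT <= 70: structurally disordered residue
    if esm2_score <= 1.5:
        return "conserved disordered"
    return "disordered"


residues = [(92.1, 0.4), (45.0, 1.2), (38.5, 3.7), (70.0, 1.5)]  # toy (pLDDT, ESM2) pairs
print([classify_residue(p, s) for p, s in residues])
# ['folded', 'conserved disordered', 'disordered', 'conserved disordered']
```

      Both comparisons are inclusive, so a residue sitting exactly at pLDDT 70 and ESM2 1.5 counts as conserved disordered.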

      Recommendation 3: Could the authors plot a Violin conservation score plot for Figure 4A to emphasise the relationship between ESM2 scores and conservation scores of disordered residues?

      We thank the reviewer for this suggestion. We included a violin plot illustrating the distribution of conservation scores for disordered residues across all four IDR groups, shown in Author response image 5. Consistent with the findings in Figure 4A, the phase separation drivers (dMLO-hIDR and dMLO-IDR) exhibit a higher proportion of conserved amino acids than the client group (cMLO-hIDR).

      We also note that the nMLO-hIDR group may contain conserved residues due to functions unrelated to MLO formation, which could contribute to the higher observed levels of conservation in this group.

      Author response image 5.

      Violin plots illustrating the distribution of conservation scores for disordered residues across the nMLO–hIDR, cMLO–hIDR, dMLO–hIDR, and dMLO–IDR datasets. Pairwise statistical comparisons were conducted using two-sided Mann–Whitney U tests on the conservation score distributions (null hypothesis: the two distributions are identical, i.e., neither group tends to have larger values). P-values indicate the probability of observing rank differences at least as extreme as those observed under the null hypothesis. Statistical significance is denoted as follows: ***: p < 0.001; **: p < 0.01; *: p < 0.05.
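      The test in this caption can be outlined in a few lines. This Python sketch computes the two-sided Mann–Whitney U test with the large-sample normal approximation (tie-corrected variance, no continuity correction) and maps p to the star convention above. It is dependency-free for illustration; in practice `scipy.stats.mannwhitneyu(x, y, alternative="two-sided")` is the usual choice, and for small or heavily tied samples its exact or corrected p-values will differ from this approximation. The toy scores are made up.

```python
import math
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> Tuple[List[float], List[int]]:
    """1-based ranks with ties averaged, plus the size of each tie group."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    tie_sizes: List[int] = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney_two_sided(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for x, two-sided p) via the normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, ties = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in ties) / (n * (n - 1))
    sd_u = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    if sd_u == 0:  # every value tied: no evidence of a difference
        return u1, 1.0
    z = (u1 - mean_u) / sd_u
    return u1, math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))


def stars(p: float) -> str:
    """Significance labels used in the caption."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "n.s."


driver = [1.2, 1.5, 1.9, 2.3, 2.8, 3.1]  # toy conservation scores
client = [0.2, 0.4, 0.5, 0.9, 1.0, 1.1]
u, p = mann_whitney_two_sided(driver, client)
print(u, stars(p))  # 36.0 **
```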

      Recommendation 4: It will be appreciated if the authors could add to Figure 4 Violin plots, a statistical comparison between the different groups.

      We thank the reviewer for this valuable suggestion. We included the p-values for Figures 4A and 4C to quantify the statistical significance of differences in the distributions.

      Most comparisons are highly significant (p < 0.001). The largest p-value (p = 0.089), for the conservation scores of the driving versus non-participating groups (Figure 4C), does not reach significance at the 0.05 level but still suggests a trend.

      Recommendation 5: Could the authors expand more on potential future research directions using ESM2, given its usefulness in identifying conserved motifs? Specifically, how do the authors envision conserved motifs will contribute to future discoveries/applications/models using ESM (e.g, discuss the importance of conserved motifs, especially in IDRs motifs, in protein phase transition prediction in relation to diseases).

      We thank the reviewer for this insightful comment. To further assess the functional relevance of the conserved motifs, we incorporated pathogenic variant data from ClinVar [10, 11] to evaluate mutational impacts. As shown in Figure S12A and B, a substantial number of pathogenic variants in MLO-hProt proteins are associated with low ESM2 LLR values. This pattern holds for both folded and disordered residues.

      Moreover, we observed that variants located within motifs are more frequently pathogenic compared to those outside motifs (Figure S12C). In the main text, motifs were defined only for driver proteins; however, the available variant data for this subset are limited (6 data points). To improve statistical power, we extended motif identification to include both client and driver human proteins, following the same methodology described in the main text. Consistent with previous findings, variants within motifs in this expanded set are also more likely to be pathogenic. These results further support the functional importance of both low ESM2-scoring residues and the conserved motifs in which they reside.
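      The in-motif versus out-of-motif comparison reduces to a 2×2 table of variant counts. This Python sketch computes the pathogenic fraction on each side and the odds ratio. The record format, positions, and counts are made-up placeholders; variants not labelled pathogenic are simply treated as benign here, and no significance test is implied.

```python
from typing import Iterable, List, Tuple

Variant = Tuple[int, bool]  # (residue position, is_pathogenic): hypothetical record format


def in_any_motif(pos: int, motifs: List[Tuple[int, int]]) -> bool:
    """True if pos falls inside any half-open motif interval [start, end)."""
    return any(start <= pos < end for start, end in motifs)


def motif_enrichment(variants: Iterable[Variant],
                     motifs: List[Tuple[int, int]]) -> Tuple[float, float, float]:
    """Pathogenic fraction inside motifs, outside motifs, and the 2x2 odds ratio."""
    # counts[inside_motif][is_pathogenic]
    counts = {True: {True: 0, False: 0}, False: {True: 0, False: 0}}
    for pos, pathogenic in variants:
        counts[in_any_motif(pos, motifs)][pathogenic] += 1

    def frac(inside: bool) -> float:
        total = counts[inside][True] + counts[inside][False]
        return counts[inside][True] / total if total else float("nan")

    a, b = counts[True][True], counts[True][False]    # inside: pathogenic, other
    c, d = counts[False][True], counts[False][False]  # outside: pathogenic, other
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return frac(True), frac(False), odds_ratio


motifs = [(10, 20)]
variants = [(12, True), (15, True), (18, False), (5, True), (30, False), (40, False), (50, False)]
print(motif_enrichment(variants, motifs))  # inside 2/3, outside 1/4, odds ratio 6.0
```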

      The following text was added in the Discussion section of the manuscript to discuss these results and outline future research directions.

      “Several promising directions could extend this work, both to refine our mechanistic understanding and to explore clinical relevance. One avenue is testing the hypothesis that conserved motifs in scaffold proteins act as functional stickers, mediating strong intermolecular interactions. This could be evaluated computationally via free energy calculations or experimentally via interaction assays. Deletion of such motifs in client proteins may also reduce their partitioning into condensates, illuminating their roles in molecular recruitment.

      To explore potential clinical implications, we analyzed pathogenicity data from ClinVar [10, 11]. As shown in Figure S12A, single-point mutations with low LLR values (indicative of constrained residues) are enriched among clinically reported pathogenic variants, while benign variants typically exhibit higher LLR values. Moreover, mutations within conserved motifs are significantly more likely to be pathogenic than those in non-motif regions (Figure S12B). These findings highlight the potential of ESM2 as a first-pass screening tool for identifying clinically relevant residues and suggest that the conserved motifs described here may serve as priorities for future studies, both mechanistic and therapeutic.”

      Moreover, the functional significance of conserved motifs, particularly their implications in disease and pathology, warrants further investigation. As an initial analysis, we incorporated ClinVar pathogenic variant data [citation] to assess mutational effects within our datasets. As illustrated in Figure R12A, single-point mutations with low LLR values are enriched among clinically reported pathogenic variants, whereas benign variants are more commonly associated with higher LLR values. Notably, mutations within conserved motifs are substantially more likely to be pathogenic than those in non-motif regions. These findings highlight the potential of ESM2 as a first-pass tool for identifying residues of clinical relevance. The conserved motifs identified here may be prioritized in future studies aimed at elucidating their biological roles and evaluating their viability as therapeutic targets.

      Recommendation 6: The authors mention: "Our findings provide strong evidence for evolutionary pressures acting on specific IDRs to preserve their roles in scaffolding phase separation mechanisms, emphasizing the functional importance of entire motifs rather than individual residues in MLO formation." They also present a word cloud of functional motifs in Figure 5D. Although it makes sense that evolutionarily conserved motifs, especially within the IDRs regions, act as functional units, I think there is no direct evidence for such functionality (e.g., examples of biological pathways associated with IDRs and phase separation). Hence, there is no justification to write in the figure caption: "ESM2 Identifies Functional Motifs in driving IDRs" unless the authors provide some examples of such functionality. This will even make the paper stronger by establishing a clear connection to biological pathways, and hence these motifs can serve as potential drug targets.

      We thank the reviewer for this insightful suggestion. We have replaced “functional motifs" with “conserved motifs" in the figure caption.

      Identifying the precise biological pathways associated with the conserved motifs is a complex task and a comprehensive investigation lies beyond the scope of this study. Nonetheless, as an initial effort, we explored the potential functions of these motifs using annotations available in DisProt (https://disprot.org/).

      DisProt is the leading manually curated database dedicated to IDPs, providing both structural and functional annotations. Expert curators compile experimentally validated data, including definitions of disordered regions, associated functional terms, and supporting literature references. Author response image 6 presents a representative DisProt entry for DNA topoisomerase 1 (UniProt ID: P11387), illustrating its structural and biological annotation.

      For each motif, we located the corresponding DisProt entry and assigned a functional annotation based on the annotated IDR from which the motif originates. We emphasize that this functional assignment should be regarded as an approximation. Because experimental annotations often pertain to the entire IDR, regions outside the motif may also contribute to the reported function.

      Nevertheless, the annotations provide valuable insights.

      Author response image 6.

      Screenshot of information provided by the DisProt database. Detailed annotations of biological functions and structural features, along with experimental references, are accessible via mouse click.

      Approximately 50% of ESM2-predicted IDR motifs lack functional annotations. Among those that are annotated, motifs from the dMLO-IDR dataset are predominantly associated with “molecular condensate scaffold activity,” followed by various biomolecular binding functions (Author response image 7A). These findings support the role of these motifs in MLO formation.

      For comparison, we applied the same identification procedure (described in Methods: Motif Identification) to motifs from the nMLO-hIDR dataset. In contrast to the dMLO-IDR motifs, these exhibit a broader range of annotated functions related to diverse cellular processes. Collectively, these results suggest that motifs identified by ESM2 are aligned with biologically relevant functions captured in current databases.
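      The assignment rule described above (each motif inherits the function terms of the annotated IDR it originates from) and the tally behind Author response image 7 can be sketched as an interval-containment lookup. The region records and terms below are hypothetical stand-ins for parsed DisProt annotations, not the DisProt API or file format; requiring full containment is one choice, and counting any overlap would be a looser alternative.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class AnnotatedRegion(NamedTuple):
    start: int                  # 1-based, inclusive (hypothetical convention)
    end: int                    # inclusive
    functions: Tuple[str, ...]  # function terms curated for this region


def assign_functions(motif: Tuple[int, int], regions: List[AnnotatedRegion]) -> Tuple[str, ...]:
    """Function terms of every annotated region that fully contains the motif (inclusive coords)."""
    m_start, m_end = motif
    terms: List[str] = []
    for region in regions:
        if region.start <= m_start and m_end <= region.end:
            terms.extend(region.functions)
    return tuple(dict.fromkeys(terms))  # de-duplicate while keeping order


def function_distribution(motifs: List[Tuple[int, int]], regions: List[AnnotatedRegion]) -> Counter:
    """Count terms across motifs; motifs without any term count as 'unannotated'."""
    counts: Counter = Counter()
    for motif in motifs:
        terms = assign_functions(motif, regions)
        counts.update(terms if terms else ("unannotated",))
    return counts


regions = [
    AnnotatedRegion(1, 60, ("molecular condensate scaffold activity",)),
    AnnotatedRegion(40, 90, ("RNA binding",)),
]
motifs = [(5, 12), (45, 55), (120, 130)]
print(function_distribution(motifs, regions))
```

      A motif nested in two annotated regions contributes to both terms, so the counts are per term rather than per motif.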

      Finally, as illustrated in Figure S12 and discussed in the Response to Recommendation 5, variants occurring within identified motifs are more likely to be pathogenic than those in non-motif regions, further underscoring their functional importance.

      Author response image 7.

      Biological functions of ESM2-predicted motifs. (A) Distribution of biological functions associated with all identified motifs from dMLO-IDR driving groups. (B) Distribution of biological functions associated with all identified motifs from nMLO-hIDR groups.

      Recommendation 7: In Figure 2C the authors present FE (I assume this is free energy); some discussion of the difference in free energy for the "a" region is missing (i.e., both "Folded" and "Disordered" regions are associated with low ESM2 scores but with low and high free energy (FE), respectively).

      We thank the reviewer for the comments. FE indeed abbreviates free energy. To improve clarity and avoid confusion, we have updated all figure captions by replacing “FE” with “−logP” to explicitly denote the negative logarithm of probability in the contour density plots.

      We used “a” in Figures 2C and 2D to refer to regions with low ESM2 scores, which appear as a local minimum in both plots. Since most residues in folded regions are conserved, region a has lower free energy than region b in Figure 2C. On the other hand, as most residues in disordered regions are not conserved (as elaborated in the Response to Recommendation 1), region a has a lower population and higher free energy than region b in Figure 2D.
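      For readers unfamiliar with the notation, the contour values are −log P estimated from a two-dimensional histogram of (ESM2 score, pLDDT) pairs, so densely populated regions get low values and "lower free energy" simply means "more residues". A minimal Python sketch, assuming natural logarithms and arbitrary bin widths and toy points chosen for illustration:

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def neg_log_density(points: List[Tuple[float, float]], dx: float, dy: float) -> Dict[Tuple[int, int], float]:
    """-ln P for each occupied 2D bin; empty bins are absent (their -ln P is infinite)."""
    counts = Counter((math.floor(x / dx), math.floor(y / dy)) for x, y in points)
    total = sum(counts.values())
    return {bin_: -math.log(n / total) for bin_, n in counts.items()}


# Toy (ESM2 score, pLDDT) pairs: three residues share one bin, one sits alone.
points = [(0.2, 92.0), (0.4, 95.0), (0.6, 91.0), (3.2, 41.0)]
for bin_, value in sorted(neg_log_density(points, dx=1.0, dy=10.0).items()):
    print(bin_, round(value, 3))  # (0, 9) 0.288, then (3, 4) 1.386
```

      Only differences between bins carry meaning here: the gap of ln 3 between the two bins reflects the three-to-one population ratio.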

      To avoid confusion, we have replaced “a" and “b" in Figure 2D with “I" and “II".

      Recommendation 8: Figure S2: It would be useful to plot the same figure for structured and disordered regions as well.

      We are not certain we fully understood this comment, as we believe the requested analysis has already been addressed. In Figure S2, we used the AlphaFold2 pLDDT score to represent the structural continuum of different protein regions, where residues with pLDDT > 70 (red and light-red bars) are classified as structured, while those with pLDDT ≤ 70 (blue and light-blue bars) are classified as disordered.

      Minor suggestion 1: Could the authors clarify the meaning of the abbreviation "FE" in the colorbar of the contour line? I assume this is free energy.

      We have updated all contour density plot figure captions by replacing “FE” with “−logP” to explicitly denote the negative logarithm of probability.

      Minor suggestion 2: In Figure 2A - do the authors mean "Conserved folded" instead of just "Folded"? If so, could the authors indicate this?

      We thank the reviewer for this comment. The ESM2 scores indeed suggest that, within folded regions, there may be multiple distinct groups exhibiting varying degrees of evolutionary conservation. However, as our primary focus is on IDRs, we chose not to investigate these distinctions further.

      Figure 2A illustrates a randomly selected folded region based on AlphaFold2 pLDDT scores.

      References

      (1) Ruff, K. M.; Pappu, R. V. AlphaFold and Implications for Intrinsically Disordered Proteins. Journal of Molecular Biology 2021, 433, 167208.

      (2) Alderson, T. R.; Pritišanac, I.; Kolarić, Đ.; Moses, A. M.; Forman-Kay, J. D. Systematic Identification of Conditionally Folded Intrinsically Disordered Regions by AlphaFold2. Proceedings of the National Academy of Sciences of the United States of America 2023, 120, e2304302120.

      (3) Brandes, N.; Goldman, G.; Wang, C. H.; Ye, C. J.; Ntranos, V. Genome-Wide Prediction of Disease Variant Effects with a Deep Protein Language Model. Nature Genetics 2023, 55, 1512–1522.

      (4) Lin, Z. et al. Evolutionary-Scale Prediction of Atomic-Level Protein Structure with a Language Model. Science 2023, 379, 1123–1130.

      (5) Zeng, W.; Dou, Y.; Pan, L.; Xu, L.; Peng, S. Improving Prediction Performance of General Protein Language Model by Domain-Adaptive Pretraining on DNA-binding Protein. Nature Communications 2024, 15, 7838.

      (6) Gong, J. et al. THPLM: A Sequence-Based Deep Learning Framework for Protein Stability Changes Prediction upon Point Variations Using Pretrained Protein Language Model. Bioinformatics 2023, 39, btad646.

      (7) Lin, W.; Wells, J.; Wang, Z.; Orengo, C.; Martin, A. C. R. Enhancing Missense Variant Pathogenicity Prediction with Protein Language Models Using VariPred. Scientific Reports 2024, 14, 8136.

      (8) Saadat, A.; Fellay, J. Fine-Tuning the ESM2 Protein Language Model to Understand the Functional Impact of Missense Variants. Computational and Structural Biotechnology Journal 2025, 27, 2199–2207.

      (9) Chu, S. K. S.; Narang, K.; Siegel, J. B. Protein Stability Prediction by Fine-Tuning a Protein Language Model on a Mega-Scale Dataset. PLOS Computational Biology 2024, 20, e1012248.

      (10) Landrum, M. J.; Lee, J. M.; Riley, G. R.; Jang, W.; Rubinstein, W. S.; Church, D. M.; Maglott, D. R. ClinVar: Public Archive of Relationships among Sequence Variation and Human Phenotype. Nucleic Acids Research 2014, 42, D980–D985.

      (11) Landrum, M. J. et al. ClinVar: Improving Access to Variant Interpretations and Supporting Evidence. Nucleic Acids Research 2018, 46, D1062–D1067.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      (1) A major issue throughout the paper is that Hox expression analysis is done exclusively through quantitative PCR, with values ranging from 2-fold to several thousand-fold upregulation, with no antibody validation for any Hox protein (presumably they are all upregulated).

      Thank you for your comment.

      We tried to verify the upregulated Hox expression pattern by in situ hybridization. Although in early embryos (E9.5) we could clearly detect Hox expression patterns in the neural tube by whole-mount in situ hybridization (e.g., Hox8 and Hox9; Author response image 1), we failed to detect a clear pattern in the brainstem at E18.5, either in whole-mount tissue or on sections. This is one reason we turned to single-nucleus RNA-seq instead.

      This is likely due to their low expression levels at late developmental stages, which would require a more sensitive detection method. However, we estimate that the upregulated expression levels of the representative Hox genes are at least comparable to the physiological levels in the posterior spinal cord, and thus sufficient to evoke a functional effect.

      Author response image 1.

      Hox8 and Hox9 expression patterns in E9.5 embryos.

      (2) In Figure 1, massive upregulation of most Hox genes in the brainstem is shown after e16.5, but the paper quickly focuses on analysis of PN nuclei. What are the other consequences of this broad upregulation of Hox genes in the brainstem? There is no discussion of the overall phenotype of the mice, the structure of the brainstem, the migration of neurons, etc. The very narrow focus on motor cortex projections to PN nuclei seems bizarre without broad characterization of the mice, and the brainstem in particular. There is only a mention of "severe motor deficits" from previous studies, but given the broad expression of Rnf220, the fact that it is a global knockout, and the effects on spinal cord populations shown previously, the justification for focusing on PN nuclei does not seem strong.

      Thank you for your comment.

      Although RNF220 is important for the dorsal-ventral patterning of the spinal cord as well as the hindbrain during embryonic development, early neural patterning and differentiation are normal in Rnf220+/- mice (Wang et al., 2022). However, these mice show reduced survival and motility to varying degrees postnatally (Ma et al., 2019; Ma et al., 2021), suggesting a dosage-dependent role of RNF220 in maintaining late neural development. As our microarray assay showed deregulation of Hox genes in the brain, we followed this direction in this study and narrowed down the affected region to the pons. Our single-nucleus RNA-seq (snRNA-seq) data further show that Hox deregulation mainly occurs in 3 clusters of neurons. However, the pons is complex and contains tens of nuclei, and the current resolution of our data does not allow us to assign a clear identity to each cluster. Although other nuclei are likely affected as well, the PN (cluster 7) is the only cluster we could identify and follow in the current study.

      As to the general effect of RNF220 haploinsufficiency on the brainstem, we carried out Nissl staining and found no clear difference in neuronal organization between WT and Rnf220+/- pons (revised Figure 2-figure supplement 2).

      (3) It is stated that cluster 7 in scRNA-seq corresponds to the PN nuclei. The modest effect shown on Hox3-5 expression in that data in Figure 1 is inconsistent with the larger effect shown in Figure 2.

      Thank you for your comment.

      Due to the low capture efficiency of snRNA-seq and the limited sequencing depth, quantification of Hox expression from the snRNA-seq data is likely less accurate than qRT-PCR. In addition, snRNA-seq captures only nuclear mRNAs, whereas mRNAs from both the nucleus and the cytoplasm were reverse-transcribed and measured in the qRT-PCR assays in Figure 2A.
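      For context on how qRT-PCR fold changes are commonly derived, here is a sketch of the standard 2^−ΔΔCt (Livak) calculation, assuming roughly 100% amplification efficiency for both target and reference genes. The response does not state which normalization was used for Figure 2A, so the reference gene and the Ct values below are hypothetical.

```python
from statistics import mean
from typing import Sequence


def fold_change_ddct(target_test: Sequence[float], ref_test: Sequence[float],
                     target_ctrl: Sequence[float], ref_ctrl: Sequence[float]) -> float:
    """Relative expression (test vs control) by the 2^-ddCt method.

    Each argument is a list of Ct values from replicates; assumes ~100%
    PCR efficiency, so one cycle corresponds to a twofold difference.
    """
    d_ct_test = mean(target_test) - mean(ref_test)  # normalize to the reference gene
    d_ct_ctrl = mean(target_ctrl) - mean(ref_ctrl)
    dd_ct = d_ct_test - d_ct_ctrl
    return 2 ** (-dd_ct)


# Hypothetical Ct values: the target crosses threshold 3 cycles earlier in the test sample.
print(fold_change_ddct([28.0, 28.0], [18.0, 18.0], [31.0, 31.0], [18.0, 18.0]))  # 8.0
```

      Because fold change grows exponentially with ΔΔCt, a shift of about 10 cycles already corresponds to roughly a thousandfold change, which is how qRT-PCR can report values spanning 2-fold to several thousand-fold.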

      (4) Presumably, Hox genes are not the only targets of Rnf220 as shown in the microarray/RNA-sequencing data. There is no definitive evidence that any phenotypes observed (which are also not clear) are specifically due to Hox upregulation. The only assay the authors use to look at a Hox-dependent phenotype in the brainstem is the targeting of PN nuclei by motor cortex axons. This is only done in 2 animals and there are no details as to how the data was analyzed and quantified. The only 2 images shown are not convincing of a strong phenotype, they could be taken at slightly different levels or angles. At the very least, serial sections should be shown and the experiment repeated in more animals. There is also no discussion of how these phenotypes, if real, would relate to previous work by the Rijli group which showed very precise mechanisms of synaptic specificity in this system.

      Thank you for your comments and suggestions.

      Hox deregulation is the most obvious phenomenon observed in the RNA-seq data, and we tried to assign it a specific phenotypic effect in this study. As the roles of Hox genes in PN patterning and circuit formation are well established, we focused on the PN in the subsequent experiments. Based on the literature, we carried out a circuit analysis to examine the targeting of PN neurons by motor cortex axons. An additional cohort of animals (n=10 WT and n=9 Rnf220+/-) was used to repeat the experiment, and we reached the same conclusion. More detailed information on data analysis, together with serial images, has been included in the revised manuscript and figure legends.

      (5) The temporal aspect of this regulation in vivo is not clear. The authors show some expression changes begin at E16.5 but are also present at 2 months. Is the presumed effect on neural circuits a result of developmental upregulation at late embryonic stages or does the continuous overexpression in adult mice have additional influence? Are any of the Hox genes upregulated normally expressed in the brainstem, or PN specifically, at 2 months? Why perform single-cell sequencing experiments at 2 months if this is thought to be mostly a developmental effect? Similarly, the significance of the upregulated WDR5 in the pons and pontine nuclei at 2 months in Figure 3 is not clear.

      Thank you for your comment.

      The spatial and temporal expression pattern of Hox genes is established at early embryonic stages and then maintained throughout development in mammals. As we have shown, de-repression of Hox genes in Rnf220+/- mice is a long-lasting defect that begins at late embryonic stages. Since neuronal circuits are established after birth in mice, we speculate that the circuit defects from the motor cortex to PN neurons result from the long-lasting upregulation of Hox genes in PN neurons. However, we cannot distinguish whether the effect on the neural circuit results from developmental upregulation of Hox genes or from their continued overexpression in adult mice. An inducible knockout mouse model may help to answer this question in the future. A discussion of this point has been included in the revised manuscript.

      We performed snRNA-seq on pons tissue from adult mice to identify the specific cell population with Hox upregulation, which we had been unable to identify by in situ hybridization.

      We repeated the related experiments in the original Figure 3 and some of the blot images were replaced and quantified.

      (6) In Figure 3C, the levels of RNF220 in wt and het don't seem to be that different.

      We repeated the experiments and changed the related image in the revised Figure 3C.

      (7) Based on the single-cell experiments, and the PN nuclei focus, the rescue experiments are confusing. If the Rnf220 deletion has a sustained effect for up to 2 months, why do the injections in utero? If the focus is the PN nuclei why look at Hox9 expression and not Hox3-5 which are the only Hox genes upregulated in PN based on sc-sequencing? No rescue of behavior or any phenotype other than Hox expression by qPCR is shown and it is unclear whether upregulation of Hox9 paralogs leads to any defects in the first place. The switch to the Nes-cre driver is not explained. Also, it seems that wdr5 mRNA levels are not so relevant and protein levels should be shown instead (same for rescue experiments in P19 cells).

      Thank you for your comments.

      Since our data suggest that the upregulation of Hox gene expression is a long-lasting effect beginning at the late embryonic stage (E16.5), we performed the rescue experiments by in utero injection of a WDR5 inhibitor at E15.5 and examined Hox gene expression at E18.5. It would also be informative to examine whether the rescue by WDR5 inhibitor injection persists into adulthood, but it is difficult to track individual treated embryos after birth. As a complementary approach, we performed rescue assays by genetic ablation of Wdr5, and the results showed that deleting a single Wdr5 allele could reverse the upregulation of Hox genes caused by RNF220 haploinsufficiency in the hindbrain at P15.

      Most of the upregulated Hox genes, including both Hox9 and Hox3-5, were examined in our rescue experiments. Since this study focuses on the PN nuclei, the results for Hox3-5 genes are shown in the revised main Figure 6.

      We performed the rescue experiments by deleting Wdr5 in neural tissue using Nestin-Cre mice because Wdr5+/- mice are embryonic lethal. Upregulation of Hox genes was also observed in the hindbrains of Rnf220fl/wt; Nestin-Cre mice. Although Rnf220fl/wt; Wdr5fl/wt; Nestin-Cre mice are viable and survive to adulthood, they show developmental defects in the forebrain, including the cerebral cortex and hippocampus. Therefore, behavioral rescue was not tested in this study. We believe that the role of WDR5 in forebrain development is beyond the scope of this study.

      Potential defects caused by upregulation of Hox9 paralogs await further investigation.

      Wdr5 mRNA levels were first examined to confirm genetic deletion or siRNA-mediated knockdown of Wdr5. We have now examined WDR5 protein levels by western blot, and the results are included in the revised Figure 3.

      (8) What is the relationship between retinoic acid and WDR5? In Figure 3E there is no change in WDR5 levels without RA treatment in Rnf KO but an increase in expression with RA treatment and Rnf KO. However, the levels of WDR5 do not seem to change with RA treatment alone. Does Rnf220 only mediate WDR5 degradation in the presence of RA? This does not seem to be the case in experiments in 293 cells in Figure 4.

      Thank you for your comment.

      We believe that the regulation of WDR5 and Hox expression by RNF220 is context-dependent and precisely controlled in vivo, depending on the molecular and epigenetic state of the cell; in P19 cells, this state is provided by RA treatment. The experiments in Figure 4 are based on exogenous overexpression, which may not fully reflect the situation in vivo.

      (9) Why are the levels of Hox upregulation after RA treatment so different in Figure 5 and Figure Supplement 5?

      In Figure 5C, Hox expression levels were normalized to the control group in the presence of RA, whereas in Figure Supplement 5 they were normalized to the control group without RA treatment.
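      To illustrate why the two panels differ in scale, the short Python sketch below normalizes one set of hypothetical expression values to each of the two reference groups. The values and condition names are invented for illustration and are not data from Figure 5C or Figure Supplement 5; a plain ratio to the reference group is used, which simplifies the relative quantification usually applied to qRT-PCR data.

```python
# Hypothetical relative Hox expression values (arbitrary units), for illustration only;
# these are NOT data from Figure 5C or Figure Supplement 5.
expression = {
    "control_noRA": 1.0,
    "control_RA": 20.0,
    "knockdown_RA": 40.0,
}


def normalize(values, reference):
    """Express every condition as a ratio to the chosen reference condition."""
    ref = values[reference]
    return {condition: value / ref for condition, value in values.items()}


# Figure 5C-style normalization: reference is the control group treated with RA.
vs_control_RA = normalize(expression, "control_RA")
# Figure Supplement 5-style normalization: reference is the untreated control group.
vs_control_noRA = normalize(expression, "control_noRA")

print(vs_control_RA["knockdown_RA"])    # 2.0
print(vs_control_noRA["knockdown_RA"])  # 40.0
```

      In this toy case the knockdown-to-control ratio under RA is 2.0 under either normalization; only the reference point, and therefore the apparent magnitude on the y-axis, changes.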

      (10) In Figures 4B+C which lanes are input and which are IP? There is no quantitation of Figure 4D, from the blot it does look that there is a reduction in the last 2 columns as well. The band in the WT flag lane seems to have a bubble. Need to quantitate band intensities. Same for E, the effect does not seem to be completely reversed with MG132.

      Thanks for pointing this out. The labels were included in the revised Figure 4B and 4C.

      We repeated the experiments for Figures 4D and 4E. Some of the blot images were replaced and quantified in the revised Figures 4D and 4E.

      Reviewer 2:

      (1) Figure 1E shows that Rnf220 knockdown alone could not induce an increase in Hox expression without RA, which indicates that Rnf220 might endogenously upregulate Retinoic acid signaling. The authors should test if RA signaling is downstream of Rnf220 by looking at differences in the expression of Retinaldehyde dehydrogenase genes (as a proxy for RA synthesis) upon Rnf220 knockdown.

      Thank you for your comment and suggestion.

      Two sequential reactions are required for RA synthesis from retinol, catalyzed respectively by alcohol dehydrogenases (ADHs)/retinol dehydrogenases (RDHs) and by retinaldehyde dehydrogenases (RALDHs, also known as ALDHs). When RA is no longer needed, it is catabolized by cytochrome P450 enzymes of the CYP26 family (Niederreither and Dollé, 2008; Kedishvili, 2016). Here, we examined the expression of ADHs, ALDHs, and CYP26 enzymes in E16.5 WT and Rnf220-/- embryos.

      The results are as follows. ADH7 and ADH10 were slightly upregulated. ALDH1 was upregulated and ALDH3 downregulated in Rnf220-/- embryos, but there was no significant change in the expression of ALDH2, which plays a key role in RA synthesis during embryonic development (Niederreither and Dollé, 2008). Furthermore, Cyp26a1, which is responsible for RA catabolism, was upregulated in Rnf220-/- embryos. Collectively, these data do not support a clear effect of RNF220 on RA signaling.

      Author response image 2.

      The effect of Rnf220 on RA synthesis and degradation pathways

      (2) In Figure 2C-D further explanation is required to describe what criteria were used to segment the tissue into Rostral, middle, and caudal regions. Additionally, it is unclear whether the observed change in axonal projection pattern is caused due to physical deformation and rearrangement of the entire Pons tissue or due to disruption of Hox3-5 expression levels. Labeling of the tissue with DAPI or brightfield image to show the structural differences and similarities between the brain regions of WT and Rnf220 +/- will be helpful.

      Thank you for your comment and suggestion.

      More information on the quantification of the results shown in Figure 2C-D was included in our revised manuscript. We carried out Nissl staining assays using coronal sections of the brainstem and found that there is no significant difference in neuronal cell organization between WT and Rnf220+/- (revised Figure 2-figure supplement 2).

      (3) Lines 192-195. These roles of PcG and trxG complexes are inconsistent with their initial descriptions in the text (lines 73-74).

      We apologize for this error. We have carefully revised the related descriptions to ensure consistency. Thank you.

      (4) In Figure 4D, the band in the gel seems unclear and erased. Please provide a different example for the same data which shows a full band with lower intensity. These data show that neither Rnf220 nor wdr5 directly regulates Hox gene expression. The effect of double knockdown in the presence of RA suggests that they work together to suppress Hox gene expression via a different downstream target. This point should be addressed in the text and discussion section of the paper.

      Thank you for your suggestion.

      We repeated the experiment of Figure 4D and some of the blot images were replaced in the revised Figure 4D.

      Indeed, in the presence of RA, knockdown of Rnf220 alone upregulates the expression of Hox genes (Figure 5C). Knockdown of Wdr5 reverses the upregulation of Hox genes in Rnf220 knockdown cells, suggesting that Rnf220 regulates Hox gene expression in a Wdr5-dependent manner. However, in the absence of RA, neither Rnf220 knockdown, Wdr5 knockdown, nor Rnf220/Wdr5 double knockdown had a significant effect on Hox gene expression in P19 cells. RA signaling therefore appears to play a crucial role in the regulation of WDR5 by RNF220 in P19 cells, and a discussion of this point has been included in the revised manuscript.

      (5) In Figure 4G the authors could provide some form of quantitation for changes in ubiquitination levels to make it easier for the reader. They should also describe the experimental procedures and conditions used for each of the pull-down and ubiquitination assays in greater detail in the methods section.

      Thank you for your suggestion.

      The quantitation and statistics for the original Figure 4G were included in the revised Figure 4. More information on the biochemical assays was included in the “Methods and Materials” section of our revised manuscript.

      (6) Figure 5 shows that neither Rnf220 nor wdr5 directly regulate Hox gene expressions. The effect of double knockdown in the presence of RA suggests that they work together to suppress Hox gene expression via a different downstream target.

      Thank you for your comment.

      In fact, knockdown of Rnf220 alone upregulates the expression of Hox genes in the presence of RA (Figure 5C). Furthermore, knockdown of Wdr5 reverses the upregulation of Hox genes in Rnf220 knockdown cells, which suggests that Rnf220 regulates Hox gene expression in a Wdr5-dependent manner. However, in the absence of RA, neither Rnf220 knockdown, Wdr5 knockdown, nor Rnf220/Wdr5 double knockdown had a significant effect on Hox gene expression in P19 cells. RA signaling therefore appears to play a crucial role in the regulation of WDR5 by RNF220 in P19 cells, and a discussion of this point has been included in the revised manuscript.

      (7) In Figure 6, while the reversal of changes in Hox gene expression upon concurrent Rnf220; Wdr5 inhibition highlights the importance of Wdr5 in this regulatory process, the mechanistic role of wdr5 and its functional consequences are unclear. To answer these questions, the authors need to: (i) Assay for activated and repressive epigenetic modifications upon double knockdown of Rnf220 and Wdr5 similar to that shown in Figure 3- supplement 1. This will reveal if wdr5 functions according to its intended role as part of the TrxG complex. (ii) The authors need to assay for changes in axon projection patterns in the double knockdown condition to see if Wdr5 inhibition rescues the neural circuit defects in Rnf220 +/- mice.

      Thank you for your suggestion.

      It would also be informative to examine whether the rescue by in utero WDR5 inhibitor injection has a long-lasting effect on the neuronal circuit in adulthood, but it is difficult to track individual treated embryos after birth. Although Rnf220fl/wt;Wdr5fl/wt;Nestin-Cre mice are viable and survive to adulthood, they show developmental defects in the forebrain, including the cerebral cortex and hippocampus. Therefore, rescue of the behavioral and neuronal circuit defects was not examined in this study. A PN-specific inducible Cre mouse line may help address this question in the future.

      We performed ChIP-qPCR to examine active and repressive epigenetic modifications upon double knockdown of Rnf220 and Wdr5 in P19 cells, and found that double knockdown partially rescued the Hox epigenetic modifications (Figure 6-figure supplement 1).

      References

      Kedishvili, N.Y. 2016. Retinoic acid synthesis and degradation. Subcell Biochem, 81:127-161. DOI: 10.1007/978-94-024-0945-1_5, PMID: 2783050

      Ma, P., Li, Y., Wang, H., Mao, B., Luo, Z.-G. 2021. Haploinsufficiency of the TDP43 ubiquitin E3 ligase RNF220 leads to ALS-like motor neuron defects in the mouse. Journal of Molecular Cell Biology, 13: 374-382. DOI: 10.1093/jmcb/mjaa072, PMID: 33386850

      Ma, P., Song, N.-N., Li, Y., Zhang, Q., Zhang, L., Zhang, L., Kong, Q., Ma, L., Yang, X., Ren, B., Li, C., Zhao, X., Li, Y., Xu, Y., Gao, X., Ding, Y.-Q., Mao, B. 2019. Fine-Tuning of Shh/Gli Signaling Gradient by Non-proteolytic Ubiquitination during Neural Patterning. Cell Rep, 28: 541-553.e544. DOI: 10.1016/j.celrep.2019.06.017, PMID: 31291587

      Niederreither, K., Dollé, P. 2008. Retinoic acid in development: towards an integrated view. Nat Rev Genet, 9: 541-53. DOI: 10.1038/nrg2340, PMID: 18542081

      Wang, Y.-B., Song, N.-N., Zhang, L., Ma, P., Chen, J.-Y., Huang, Y., Hu, L., Mao, B., Ding, Y.-Q. 2022. Rnf220 is Implicated in the Dorsoventral Patterning of the Hindbrain Neural Tube in Mice. Front Cell Dev Biol, 10. DOI: 10.3389/fcell.2022.831365, PMID: 35399523

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for the thoughtful consideration of our work, including both reviewers’ constructive comments. Our apologies for taking some extra time for this revision; we wanted to address the comments thoroughly with new analyses, not to mention a PhD defense, parental leave, and my teaching ultimately being the bottleneck for the team’s work!

      Reviewer #1 (Public Review):

      The authors use a combination of structural and MD simulation approaches to characterize phospholipid interactions with the pentameric ligand-gated ion channel, GLIC. By analyzing the MD simulation data using clusters of closed and open states derived previously, the authors also seek to compare lipid interactions between putative functional states. The ultimate goal of this work is to understand how lipids shape the structure and function of this channel.

      The strengths of this article include the following:

      1) The MD simulation data provide extensive sampling of lipid interactions in GLIC, and these interactions were characterized in putative closed and open states of the channel. The extensive sampling permits confident delineation of 5-6 phospholipid interaction sites per subunit. The agreement in phospholipid binding poses between structures and the all-atom MD simulations supports the utility of MD simulations to examine lipid interactions.

      2) The study presents phospholipid binding sites/poses that agree with functionally-important lipid binding sites in other pLGICs, supporting the notion that these sites are conserved. For example, the authors identify interactions of POPC at an outer leaflet intersubunit site that is specific for the open state. This result is quite interesting as phospholipids or drugs that positively modulate other pLGICs are known to occupy this site. Also, the effect of mutating W217 in the inner leaflet intersubunit site suggests that this residue, which is highly conserved in pLGICs, is an important determinant of the strength of phospholipid interactions at this site. This residue has been shown to interact with phospholipids in other pLGICs and forms the binding site of potentiating neurosteroids in the GABA(A) receptor.

      Weaknesses of this article include the following:

      1) The authors describe in detail state-dependent lipid interactions from the MD simulations; however, the functional significance of these findings is unclear. GLIC function appears to be insensitive to lipids, although this understanding is based on experiments where GLIC proteoliposomes were fused to oocyte membranes, which may not be optimal to control the lipid environment. Without functional studies of GLIC in model membranes, the lipid dependence of GLIC function is not definitively known. Therefore, it is difficult to interpret the meaning of these state-dependent lipid interactions in GLIC.

      2) It is unlikely that the bound phospholipids in the GLIC structures, which are co-purified from E. coli membranes, are POPC. Rather, these are most likely PE or PG lipids. While it is difficult to accommodate mixed phospholipid membranes in all-atom MD simulations, the choice of POPC for this model, while practically convenient, seems suboptimal, especially since it is not known if PE or PG lipids modulate GLIC function. Nevertheless, it is striking that the overall binding poses of POPC from the simulations agree with those identified in the structures. It is possible that the identity of the phospholipid headgroup will have more of an impact on the strength of interactions with GLIC rather than the interaction poses (see next point).

      3) The all-atom MD simulations provide limited insight into the strength of the POPC interactions at each site, which is important to interpret the significance of these interactions. It is unlikely that the system has equilibrated within the 1.7 microseconds of simulation for each replicate preventing a meaningful assessment of the lipid interaction times. Although the authors report exchange of up to 4 POPC interacting at certain residues in M4, this may not represent binding/unbinding events (depending on how binding/interaction is defined), since the 4 Å cutoff distance for lipid interactions is relatively small. This may instead be a result of small movements of POPC in and out of this cutoff. The ability to assess interaction times may have been strengthened if the authors performed a single extended replicate up to, for example, 10-20 microseconds instead of extending multiple replicates to 1.7 microseconds.

      Reviewer #2 (Public Review):

      The authors convincingly show multiple inner and outer leaflet non-protein (lipid) densities in a cryo-EM closed state structure of GLIC, a prokaryotic homologue of canonical pentameric ligand-gated ion channels, and observe lipids in similar sites during extensive simulations at both resting and activating pH. The simulations not only corroborate structural observations, but also suggest the existence of a state-dependent lipid intersubunit site only occupied in the open state. These important findings will be of considerable interest to the ion channel community and provide new hypotheses about lipid interactions in conjunction with channel gating.

      Recommendations for the authors: please note that you control which revisions, if any, to undertake

      In particular, a discussion of whether the timescale of the simulations permit measurements of residence or interaction times of the lipids should be addressed.

      Reviewer #1 (Recommendations for the authors):

      Comment 1.1: The authors may consider expanding the discussion about the significance of state-dependent lipid interactions. On the one hand, they emphasize state-dependent interactions of POPC with closed and open states in the outer leaflet in the results. On the other hand, they state that GLIC is insensitive to its lipid environment. What is the significance of the state-dependent interactions of POPC in GLIC, if any? Is it possible that GLIC agonist responses are sensitive to phospholipids (such as PE or PG found in E. coli)? The state-dependent differences in lipid interaction identified in this study support this possibility and suggest the need to better understand the effects of phospholipids on GLIC function.

      Response 1.1: We agree with the reviewer that this is an interesting question, and we have therefore extended the discussion with additional references on the functional effects of various lipid membranes on GLIC:

      p. 11 (Discussion)

      “Sampling was further simplified by performing simulations in a uniform POPC membrane. Prior experiments have been conducted to assess the sensitivity of GLIC in varying lipid environments (Labriola et al., 2013; Carswell et al., 2015; Menny et al., 2017), indicating that GLIC remains fully functional in pure POPC bilayers. In our cryo-EM experiments, the protein was recombinantly expressed from E. coli, which means that the experimental density would likely represent phosphatidylglycerol or phosphatidylethanolamine lipids. However, as the molecular identities of bound lipids could not be precisely determined, POPC lipids were built for straightforward comparison with simulation poses. While it appears that GLIC is capable of gating in a pure POPC bilayer, it remains plausible that its function could be influenced by different lipid species, especially due to the presence of multiple charged residues around the TMD/ECD interface which might interact differently with different lipid head groups. Further experiments would be needed to confirm whether the state dependence observed in simulations is also lipid-dependent. It is possible that certain types of lipids bind in one but not the other state, or that certain states are stabilized by a particular lipid type.”

      Comment 1.2: It would be helpful to state in the discussion that the co-purified lipids from GLIC structures are likely PE or PG from E. coli membranes. Nevertheless, it is interesting that the phospholipid poses from the structures generally agree with those identified from the MD simulations using PC.

      Response 1.2: Good point. We have clarified in the discussion that the native lipids in the cryo-EM structure are likely PG or PE lipids, as quoted in the preceding Response.

      Comment 1.3: The authors describe a more deeply penetrating interaction of POPC in the outer intrasubunit cleft in the open state, but this is difficult to appreciate from the images in Fig. 4B, 4E or S3B. The same is true of the deep POPC interaction at the outer intersubunit site. It may be helpful to show these densities from a different perspective to appreciate the depth of these binding poses.

      Response 1.3: We have added Figure 4 – figure supplement 1 to better show the depth of lipid binding poses, especially the ones in the outer leaflet intrasubunit cleft and at the inner intersubunit site, and cited the figure on p. 7 (Results).

      Comment 1.4: The representation of the lipid densities in Fig. 4B is not easy to interpret. First, the meaning of resting versus activating conditions and closed versus open states can be easily missed for readers who are not familiar with the authors' previous study. It may be helpful to describe this (i.e. how open and closed state clusters were generated from structures determined in resting and activating conditions) in greater detail in either the figure legend, results or methods. Second, the authors state that there are differences in lipid poses between the closed and open states but not resting and activating conditions. With the exception of the intersubunit density, this is difficult to appreciate from Fig. 4B. As stated in point #3, the difference, for example, in the complementary intrasubunit site may be better appreciated with an image from a different perspective.

      Response 1.4: Acknowledged: the distinction between resting and activating conditions vs. open and closed states can be confusing. We have tried to clarify these differences at the beginning of the results section, the methods section, and in the caption of Figure 4. Regarding differences in lipid poses between open and closed states, we agree it is difficult to appreciate from Figure 4, but here we refer the reader to Figure 4 – figure supplement 2 for an overlay between open and closed densities. Additionally, we have now added Figure 1 – figure supplement 1, which provides lipid densities for all five subunits and overlays with the built cryo-EM lipids, possibly making differences easier to appreciate. Regarding images from different perspectives, we trust the new figure supplement described in Response 1.3 provides a better perspective.

      p. 3 (Results)

      “For computational quantification of lipid interactions and binding sites, we used molecular simulations of GLIC conducted under either resting or activating conditions (Bergh et al., 2021a). As described in Methods, resting conditions corresponded to neutral pH with most acidic residues deprotonated; activating conditions corresponded to acidic pH with several acidic residues protonated. Both open and closed conformations were present in both conditions, albeit with different probabilities.”

      p. 8 (Figure 4)

      “Overlaid densities for each state represent simulations conducted under resting (dark shades) or activating (light shades) conditions, which were largely superimposable within each state.”

      p. 24 (Methods)

      “We analyzed previously published MSMs of GLIC gating under both resting and activating conditions (Bergh et al., 2021a). Resting conditions corresponded to pH 7, at which GLIC is nonconductive in functional experiments, with all acidic residues modeled as deprotonated. Activating conditions corresponded to pH 4.6, at which GLIC is conductive and has been crystallized in an open state (Bocquet et al., 2009). These conditions were modeled by protonating a group of acidic residues (E26, E35, E67, E75, E82, D86, D88, E177, E243; H277 doubly protonated) as previously described (Nury et al., 2011).”

      Comment 1.5: The new closed GLIC structure was obtained by merging multiple datasets. What were the conditions of the datasets used? Was it taken from samples in resting or also activating conditions?

      Response 1.5: We have updated the Results, Discussion, and Methods to clarify this important point, in particular by merging datasets and rerunning the classification:

      p. 3 (Results)

      “In our cryo-EM work, a new GLIC reconstruction was generated by merging previously reported datasets collected at pH 7, 5, and 3 (Rovšnik et al., 2021). The predominant class from the merged data corresponded to an apparently closed channel at an overall resolution of 2.9 Å, the highest resolution yet reported for GLIC in this state (Figure 1 – figure supplement 2, Table 1).”

      p. 11 (Discussion)

      “Interestingly, the occupational densities varied remarkably little between resting and activating conditions (Figure 1 – figure supplement 1), indicating state- rather than pH-dependence in lipid interactions, also further justifying the approach of merging closed-state GLIC cryo-EM datasets collected at different pH conditions to resolve lipids.”

      p. 14 (Methods)

      “After overnight thrombin digestion, GLIC was isolated from its fusion partner by size exclusion in buffer B at pH 7, or in buffer B with citrate at pH 5 or 3 substituted for Tris. The purified protein was concentrated to 3–5 mg/mL by centrifugation. [...] Data from three different grids, at pH 7, 5, and 3, were merged and processed together.”

      Comment 1.6: In Fig. 3D, do the spheres represent the double bond? If so, please state in the legend

      Response 1.6: We have clarified in the legend of Figure 3D that the yellow spheres on the lipid tails represent a double bond.

      Comment 1.7: In Fig. 3E, what is the scale of the color representation?

      Response 1.7: We have clarified in the legend of Figure 3E that colors span 0 (white) to 137015 contacts (dark red).

      Reviewer #2 (Recommendations For The Authors):

      Comment 2.1: I'm not sure I fully understand how the final lipids were modeled (built). Fig. 1 caption suggests they may have been manually built? I understand that the idea was to place them in the overlap of simulation densities and structure densities, but can the authors please clarify if there were any quantifiable conditions that were employed during this process or if this was entirely manual placement in a pose that looked good? Regardless, it would be helpful to see an overlay of the built lipids with both the cryo and simulation densities (e.g., overlay of Fig. 1F/H and G/H) to better visualize how the final built lipids compare.

      Response 2.1: We thank the reviewer for pointing out unclarities regarding our methods. We have extended the methods section to clarify how the lipids were manually built in the cryo-EM structure. We have also added Figure 1 – figure supplement 1 showing overlays of the computational densities and built cryo-EM lipids.

      p. 15 (Methods)

      “Lipids were manually built in COOT by importing a canonical SMILES format of POPC (Kim et al., 2021) and adjusting it individually into the cryo-EM density in each of the sites associated with a single subunit, based in part on visual inspection of lipid densities from simulations, as described above. After building, 5-fold symmetry was applied to generate lipids at the same sites in the remaining four subunits.”

      Comment 2.2: Regarding the state-dependent lipid entry to the outer leaflet intersubunit site associated with channel opening, if the authors could include a movie depicting this process that would be great. The current short explanation does not do this justice. Also, what were the dynamics of this process? Beyond the correlation between site occupancy and the pore being open, how did the timing of lipid entry/exit and pore opening/closing correlate?

      Response 2.2: The point regarding the timing of state-dependent lipid binding at the subunit interface and pore opening is indeed an interesting one. We have added Figure 4 – figure supplement 3D showing that the state-dependent P250 lipid interaction precedes pore opening, as quantified by pore hydration levels, indicating a potential role in gating. The interaction between lipid binding and conformational change of the protein is also depicted in the newly added Figure 4 - video supplement 1, which we hope will be able to better communicate the conclusions regarding state-dependent interactions. We have also expanded the results and discussion to better explain these results:

      p. 9 (Results)

      “The lipid head made particularly close contacts with residue P250 on the M2-M3 loop, which undergoes substantial conformational change away from the pore upon channel opening, along with outer-leaflet regions of M1–M3 (Figure 4E, Figure 4—figure Supplement 3A,B,C, Figure 4—video 1). These conformational changes were accompanied by a flip of M1 residue F195, which blocked the site in the closed state but rotated inward to allow closer lipid interactions in the open state (Figure 4—figure Supplement 3C, Figure 4—video 1). Indeed, P250 was predominantly located within 3 Å of the nearest lipid atom in open- but not closed-state frames (Figure 4F). Despite being restricted to the open state, interactions with P250 were among the longest duration in all simulations (Figure 2C) and as these binding events preceded pore opening, it is plausible to infer a role for this state-dependent lipid interaction in the gating process (Figure 4 – figure supplement 3D).”
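      A frame-based contact criterion of the kind described above (a residue counted as lipid-contacting in a frame when any of its atoms lies within 3 Å of the nearest lipid atom) can be sketched in plain Python as follows. This is a hypothetical illustration with toy coordinates, not the analysis code used in this work; the function names and the frame layout (one list of residue atom coordinates and one list of lipid atom coordinates per frame) are assumptions.

```python
import math


def min_distance(residue_atoms, lipid_atoms):
    """Smallest Euclidean distance (in Å) between any residue atom and any lipid atom."""
    return min(math.dist(a, b) for a in residue_atoms for b in lipid_atoms)


def contact_fraction(frames, cutoff=3.0):
    """Fraction of frames in which the residue lies within `cutoff` Å of the nearest lipid atom."""
    in_contact = [min_distance(res, lip) <= cutoff for res, lip in frames]
    return sum(in_contact) / len(in_contact)


# Toy frames (hypothetical coordinates in Å): one residue atom and one lipid atom per frame.
open_frames = [
    ([(0.0, 0.0, 0.0)], [(2.5, 0.0, 0.0)]),  # 2.50 Å: contact
    ([(0.0, 0.0, 0.0)], [(0.0, 2.0, 2.0)]),  # ~2.83 Å: contact
    ([(0.0, 0.0, 0.0)], [(4.0, 0.0, 0.0)]),  # 4.00 Å: no contact
]
closed_frames = [
    ([(0.0, 0.0, 0.0)], [(6.0, 0.0, 0.0)]),  # 6.00 Å: no contact
    ([(0.0, 0.0, 0.0)], [(0.0, 5.0, 0.0)]),  # 5.00 Å: no contact
]

print(f"{contact_fraction(open_frames):.2f}")  # 0.67
print(contact_fraction(closed_frames))         # 0.0
```

      Splitting frames by conformational state before computing the fraction, as in the open versus closed comparison above, is what turns a per-residue contact criterion into a state-dependence measure.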

      p. 12 (Discussion)

      “The state-dependent binding event at this site preceded pore opening in MSMs, where lipid binding coincided with crossing a smaller energy barrier between closed and intermediate states, followed by pore opening at the main energy barrier between intermediate and open states (Figure 4 – figure supplement 3D). Further, since the P250–lipid interaction was characterized by relatively long residence times (Figure 2), it is possible that this lipid interaction plays a role in GLIC gating.”
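      One simple way to check the "binding precedes opening" ordering within a single trajectory is to compare the first frame at which P250 contacts a lipid with the first frame at which the pore becomes hydrated. The sketch below is a hypothetical illustration, not the analysis used in the manuscript (which relied on MSMs): the 3 Å contact cutoff is taken from the text, while the per-frame inputs and the water-count threshold for calling the pore open (`open_waters`) are assumptions.

```python
import numpy as np


def first_frame(mask):
    """Index of the first True frame, or None if the event never occurs."""
    idx = np.flatnonzero(mask)
    return int(idx[0]) if idx.size else None


def binding_to_opening_lag(min_dist_angstrom, pore_waters,
                           contact_cutoff=3.0, open_waters=10):
    """Frames from first P250-lipid contact to first hydrated-pore frame.

    min_dist_angstrom: per-frame distance from P250 to the nearest lipid atom.
    pore_waters: per-frame count of water molecules inside the pore.
    open_waters is a hypothetical hydration threshold for an "open" pore.
    A positive lag means binding precedes opening; None means one event never occurs.
    """
    bound = np.asarray(min_dist_angstrom) < contact_cutoff
    opened = np.asarray(pore_waters) >= open_waters
    t_bind, t_open = first_frame(bound), first_frame(opened)
    if t_bind is None or t_open is None:
        return None
    return t_open - t_bind


# Toy trajectory: contact forms at frame 2, the pore hydrates at frame 4.
dist = [8.0, 6.0, 2.5, 2.0, 2.0, 2.0]
waters = [0, 1, 2, 5, 12, 15]
print(binding_to_opening_lag(dist, waters))  # 2
```

      Aggregating such lags over many trajectories gives only a crude ordering; the MSM-based analysis described above additionally accounts for the energy barriers between closed, intermediate, and open states.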

      Comment 2.3: Although the interaction times are helpful, I didn't get a great sense of how mobile the lipids are during the simulations. Can the authors discuss this a bit more. For example, are interaction times dominated by lipids that jiggle a bit away from a residue and then back again, vs how often are lipids exchanging with other lipids initially further away from the protein?

      Response 2.3: We have now added various measures of lipid diffusion, both for initially interacting lipids and for bulk lipids, which are summarized in the new Figure 2 – figure supplement 1. We have further addressed the question of simulation timescales in Results, Discussion, and Methods. These numbers highlight that it is possible for lipids several nanometers away from the protein surface to exchange with lipids of the first lipid shell.

      pp. 3, 6 (Results)

      “Lateral lipid diffusion coefficients were estimated at 1.47 nm²/µs for bulk lipids and 0.68 nm²/µs for lipids of the first lipid shell (Figure 2 – figure supplement 1A), which is relatively slow compared to the timescale of each trajectory (1.7 µs). However, multiple residues throughout the M1, M3, and M4 helices exchanged contacts with 2–4 different lipid molecules in individual simulations (Figure 2C). Furthermore, the root mean square displacement after 1.7 µs was 2.15 nm for lipids originally in the first lipid shell and 3.16 nm for lipids in the bulk bilayer, indicating that such exchanges are not limited to nearby lipids (Figure 2 – figure supplement 1B). Thus, exchange events and diffusion estimates indicate that the duration of lipid contacts observed in this work can be at least partly attributed to interaction stability and not solely to sampling limitations.”
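      The displacement values quoted above follow directly from the diffusion coefficients through the two-dimensional Einstein relation, MSD = 4Dt, so that the root mean square displacement is √(4Dt). A quick arithmetic check, offered as an illustration rather than as part of the manuscript's analysis:

```python
import math


def lateral_rms_displacement(d_nm2_per_us, t_us):
    """Root mean square lateral displacement for 2D diffusion: sqrt(4 * D * t)."""
    return math.sqrt(4.0 * d_nm2_per_us * t_us)


for label, d in [("first lipid shell", 0.68), ("bulk", 1.47)]:
    print(label, round(lateral_rms_displacement(d, 1.7), 2))
# first lipid shell 2.15
# bulk 3.16
```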

      p. 11 (Discussion)

      “Indeed, the unrestrained atomistic MD simulations studied here were not expected to capture the maximal duration of stable contacts, as indicated by some interaction times approaching the full 1.7-µs trajectory (Figure 2). Nevertheless, simulations were of sufficient length to sample the exchange of up to four lipids, particularly around the M4 helix. The calculated lateral diffusion coefficients correspond to average displacements of 2.15 nm by the end of the simulations for lipids initially interacting with the protein surface, roughly corresponding to lipids diffusing out to the 4th lipid shell. Diffusion of bulk lipids was faster, allowing lipids originally 3.16 nm away from the protein surface to enter the first lipid shell. This observation underscores the potential for lipid exchange events even among lipids initially distant from the protein surface. Of course, the duration of exceptionally stable interactions, such as those involving T274 (Figure 2C), inevitably remains bounded by the length of our simulations. Still, diffusion metrics, supported by robust statistical analysis encompassing diverse starting conditions (500 trajectories), enable confident estimation of relative interaction times.”
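      The reviewer's distinction between a lipid that "jiggles" away and returns and a genuine exchange with a different lipid can be made concrete with a per-frame contact series. The sketch below is a simplified, hypothetical representation (one lipid ID per residue per frame, `None` when no lipid is within the cutoff); it is not prolintpy's implementation, whose exact definitions of duration and exchange may differ. Here a brief gap splits a contact into two runs of the same lipid, whereas a new lipid ID counts as an exchange.

```python
from itertools import groupby


def contact_runs(lipid_ids, dt_us):
    """Split a per-frame contact series into uninterrupted runs.

    lipid_ids[k] is the ID of the lipid within the cutoff of one residue at
    frame k, or None if no lipid is within the cutoff.
    Returns (lipid_id, duration_us) tuples in temporal order.
    """
    runs = []
    for lipid, frames in groupby(lipid_ids):
        if lipid is not None:
            runs.append((lipid, len(list(frames)) * dt_us))
    return runs


def summarize_contacts(lipid_ids, dt_us):
    """Number of runs, number of distinct lipids, and longest run duration."""
    runs = contact_runs(lipid_ids, dt_us)
    durations = [d for _, d in runs]
    return {
        "n_runs": len(runs),
        "n_distinct_lipids": len({lipid for lipid, _ in runs}),
        "max_duration_us": max(durations, default=0.0),
    }


# L1 leaves briefly and returns (two runs of one lipid), then L2 and L3 bind.
series = ["L1", "L1", "L1", None, "L1", "L2", "L2", None, "L3"]
print(summarize_contacts(series, dt_us=0.1))
```

      Tools often add a tolerance so that very short gaps do not split a run; that choice directly affects reported mean durations.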

      p. 13 (Methods)

      “Time-based measures of protein-lipid interactions, such as mean duration times and exchange of interactions, were calculated for the 100 × 1.7-µs simulations using prolintpy (Sejdiu and Tieleman, 2021) with a 4 Å interaction cutoff. Analysis of lateral lipid diffusion in individual simulations was carried out for two disjoint sets of lipids: the first lipid shell, defined as lipids with any part within 4 Å of the protein surface (~90 lipids), and bulk lipids consisting of all other lipids (~280 lipids). Mean square displacements of each lipid set were calculated using GROMACS 2021.5 (Abraham et al., 2015b) with contributions from the protein center of mass removed. Diffusion coefficients for each set, D_A, were calculated using the Einstein relation (Equation 1) by estimating the slope of the linear curve fit to the data.

      $$D_A = \lim_{t \to \infty} \frac{1}{4t} \left\langle \lVert \mathbf{r}_i(t) - \mathbf{r}_i(0) \rVert^2 \right\rangle_{i \in A} \qquad (1)$$

      where r_i(t) is the coordinate of the center of mass of lipid i of set A at time t and D_A is the self-diffusion coefficient.”
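      Equation 1 in practice reduces to fitting a line to the mean square displacement and dividing the slope by 4. The sketch below illustrates that step in NumPy under stated assumptions: coordinates are already unwrapped with protein center-of-mass motion removed, and only frame 0 is used as the time origin for brevity (GROMACS `gmx msd` instead averages over many time origins). It is an illustration, not the authors' script.

```python
import numpy as np


def lateral_msd(xy):
    """Mean square lateral displacement relative to the first frame.

    xy: array of shape (n_frames, n_lipids, 2) with lipid center-of-mass x, y
    coordinates in nm, assumed unwrapped and with protein center-of-mass
    motion already removed.
    """
    disp = xy - xy[0]
    return (disp ** 2).sum(axis=-1).mean(axis=-1)


def diffusion_coefficient(t_us, msd_nm2):
    """D from the slope of a linear fit, using MSD = 4 D t in two dimensions."""
    slope, _intercept = np.polyfit(t_us, msd_nm2, 1)
    return slope / 4.0


# Synthetic check: a linear MSD with D = 0.68 nm^2/us plus a constant offset.
t = np.linspace(0.0, 1.7, 18)
msd = 4 * 0.68 * t + 0.05
print(round(diffusion_coefficient(t, msd), 3))  # 0.68
```

      Fitting the slope rather than dividing a single MSD value by 4t makes the estimate insensitive to a constant offset, such as short-time ballistic or local rattling contributions.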

      Comment 2.4: How symmetric or asymmetric are the cryo and simulation densities across subunits and was there subunit asymmetry in the final built lipids? I could not tell from any of the figures beyond the casual observation that they maybe look somewhat similar in Fig. 1?

      Response 2.4: We thank the reviewer for this useful remark. We have clarified in the methods that the cryo-EM lipids were built in C5-symmetry, and thus the positions are symmetric. The computational densities were calculated independently for each subunit and are thus not necessarily symmetric. We have added Figure 1 – figure supplement 1 showing densities for all five subunits, also serving as an indication of convergence of the results.

      p. 3 (Results) “Although the stochastic nature of simulations resulted in nonidentical lipid densities associated with the five GLIC subunits, patterns of lipid association were notably symmetric (Figure 1 – figure supplement 1).”

      p. 14-15 (Methods)

      “A smaller subset of particles was used to generate an initial model. All subsequent processing steps were done using 5-fold symmetry. […] A monomer of that model was fit to the reconstructed density and 5-fold symmetry was applied with PHENIX 1.19.2-4158 through NCS restraints detected from the reconstructed cryo-EM map, to generate a complete channel. […] After building, 5-fold symmetry was applied to generate lipids at the same sites in the remaining four subunits.”
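      Applying 5-fold symmetry to the built lipids amounts to rotating each atom by multiples of 72° about the channel axis. The sketch below assumes the symmetry axis is the z axis through the origin, a simplification made for illustration; in the workflow described above, PHENIX derives the NCS operators from the reconstructed map, and this code is not that procedure.

```python
import numpy as np


def c5_copies(coords, n=5):
    """Return n rotated copies of (N, 3) coordinates about the z axis.

    Assumes the symmetry axis passes through the origin along z; real maps
    require NCS operators refined against the density instead.
    """
    copies = []
    for k in range(n):
        theta = 2.0 * np.pi * k / n  # 72 degrees per subunit when n = 5
        c, s = np.cos(theta), np.sin(theta)
        rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        copies.append(coords @ rot.T)
    return np.stack(copies)  # shape (n, N, 3)


lipid_atom = np.array([[10.0, 0.0, 5.0]])  # hypothetical coordinates (Å)
copies = c5_copies(lipid_atom)
print(copies.shape)  # (5, 1, 3)
```

      Because every copy is generated by the same operators, cryo-EM lipid positions are symmetric by construction, whereas simulation densities computed per subunit are free to differ.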

      Minor comments:

      Comment 2.5: Fig. 1 is probably not easy to follow for the general reader and the caption is very brief. I suggest adding an additional explanation to the caption and/or additional annotations to the figure to help a general reader step through this.

      Response 2.5: We have expanded the caption of Figure 1 and clarified the meanings of colors, labels, and annotations.

      Comment 2.6: Fig. 1B - Caption is confusing. I would not call the state separation lines outlines as they are not closed loops. Also, I see red/orange and two shades of blue whereas the caption mentions orange and blue only. The caption should also explicitly say what the black lines are (other cluster separations).

      Response 2.6: We have edited the caption to better describe colors, annotations, and the meaning of the data:

      p. 4 (Figure 1)

      “(B) Markov state models were used to cluster simulations conducted under resting (R) or activating (A) conditions into five states, including closed (left of the light or dark orange lines) and open (right of the light or dark blue lines). Black lines mark edges of other state clusters derived from MSM eigenvectors. Experimental structures are highlighted as white circles.”
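      For readers less familiar with Markov state models, the core object is a transition matrix estimated from discretized trajectories; its eigenvectors describe the slow processes from which state boundaries can be derived. The sketch below shows the simplest (non-reversible, row-normalized count) estimator and its stationary distribution on a toy two-state trajectory. It is a generic illustration, not the pipeline used in the manuscript; dedicated MSM software typically enforces detailed balance and validates the lag time. It also assumes every state is visited before the final lag window, otherwise a row would be empty.

```python
import numpy as np


def transition_matrix(dtraj, n_states, lag=1):
    """Row-normalized transition counts at the given lag (non-reversible estimate)."""
    counts = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1
    return counts / counts.sum(axis=1, keepdims=True)


def stationary_distribution(T):
    """Left eigenvector of T with eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(T.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()


dtraj = [0, 0, 1, 1, 0, 1]  # toy discretized trajectory with two states
T = transition_matrix(dtraj, n_states=2)
pi = stationary_distribution(T)  # approximately [3/7, 4/7]
```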

      Comment 2.7: Fig. 3F caption appears to conflict with data where interaction with W217A appears longer than W217. I think the authors want to suggest here that W217A reduces contact time with T274 as stated in the main text.

      Response 2.7: We have clarified in this legend that “Mutation of residue W217, lining this pocket, reveals shortened interactions at the T274 binding site” (p. 6, Figure 3).

      Comment 2.8: Ref 25 and 26 are the same.

      Response 2.8: Apologies; this mistake has been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      This study investigated the role of CD47 and TSP1 in extramedullary erythropoiesis by utilization of both global CD47-/- mice and TSP1-/- mice. 

      Strengths:  

      Flow cytometry combined with spleen bulk and single-cell transcriptomics were employed. The authors found that stress-induced erythropoiesis markers were increased in CD47-/- spleen cells, particularly genes that are required for terminal erythroid differentiation. Moreover, a CD47-dependent erythroid precursor population was identified by spleen scRNA sequencing. In contrast, the same cells were not detected in TSP1-/- spleen. These findings provide strong evidence to support the conclusion that CD47 and TSP1 play differential roles in extramedullary erythropoiesis in mouse spleen. 

      Weaknesses: 

      Methods and data analysis are appropriate. However, some clarifications are required. The discussion section needs to be expanded.  

      (1) The sex of mice that were used in the study is unknown.  

      (2) In the method of Single-cell RNA sequencing (page 10), it mentioned that single cell suspensions from mouse spleens were depleted of all mature hematopoietic cell lineages by passing through CD8a microbeads and CD8a+ T cell isolation Kit. As described, it is confusing what cell types are obtained for performing scRNAseq. More information is required for clarity.  

      (3) The constitutive CD47 knockout mouse model is utilized in this study. The observed accumulation of erythroid precursors in the spleens of CD47-/- mice suggests a chronic effect of CD47 on spleen function. Can the current findings be extrapolated to acute scenarios involving CD47 knockdown or loss, as this may have more direct relevance to the potential side effects associated with anti-CD47-mediated cancer therapy? Please expand on this topic in the discussion section.  

      (1) The missing information on mouse sex has been incorporated into the revised manuscript. For flow cytometry, two male and two female mice of each genotype were used. For single cell RNA sequencing, two female mice and one male mouse of each genotype were used. For the bulk RNA sequencing, four male cd47−/− mice and four male wildtype mice were used.

      (2) We apologize for the confusing presentation, which has been corrected. The bulk RNA sequencing analysis identified elevated expression of erythropoietic genes in CD8+ spleen cells from cd47−/− versus wildtype mice that were obtained using magnetic bead depletion of all other lineages. Therefore, we used the same Miltenyi negative selection kit as the first step to prepare the cells for single cell RNA sequencing. These untouched cells were then depleted of most mature CD8 T cells using a Miltenyi CD8a(Ly2) antibody positive selection kit. An important consideration underlying this approach was recognizing that the commercial magnetic bead depletion kits used for preparing specific immune cell types are optimized to give relatively pure populations of the intended immune cells using wildtype mice. Our previous experience studying NK cell development in the cd47−/− mice taught us that NK precursors, which are rare in wildtype mouse spleens, accumulate in cd47−/− spleens and were not removed by the antibody cocktail optimized for wildtype spleen cells (Nath et al Front Immunol 2018). The present data indicate that erythroid precursors behave similarly.

      (3) The Discussion was edited as recommended. Anemia is a prevalent side effect of several CD47 therapeutic antibodies being developed for cancer therapy. This anemia would be expected to induce erythropoiesis in bone marrow and possibly at extramedullary sites. Human spleen cells are not accessible to directly evaluate extramedullary erythropoiesis in cancer patients, but analysis of circulating erythroid precursors or liquid biopsy methods could be useful to detect induction of extramedullary erythropoiesis by these therapeutics. We are currently investigating the ability of CD47 antibodies to directly induce erythropoiesis using a human in vitro model.

      Reviewer #2 (Public Review):

      Summary: 

      The authors used existing mouse models to compare the effects of ablating the CD47 receptor and its signaling ligand Thrombospondin. The CD47-KO model used in this study was generated by Kim et al, 2018, where hemolytic anemia and splenomegaly were reported. This study analyzes the cell composition of the spleens from CD47-KO and Thsp-KO, focusing on early hematopoietic and erythroid populations. The data broadly shows that splenomegaly in the CD47-KO is largely due to an increase in committed erythroid progenitors as seen by Flow Cytometry and single-cell sequencing, whereas the Thsp-KO shows a slight depletion of committed erythroid progenitors but is otherwise similar to WT in splenic cell composition.  

      Strengths:

      The techniques used are appropriate for the study and the data support the main conclusions of the study. This study provides novel insights into a putative role of Thsp-CD47 signaling in triggering definitive erythropoiesis in the mouse spleen in response to anemic stress and constitutes a good resource for researchers seeking to understand extramedullary erythropoiesis.  

      Weaknesses:

      The flow cytometry data alone support the authors' main conclusion, and single-cell sequencing confirms it but does not add further information beyond that already observed in the flow data. The single-cell sequencing analysis and presentation could be improved by using alternate clustering methods as well as separating the data by genotype and displaying them in order for readers to fully grasp the nuanced differences in marker expression between the genotypes. Further, it is not clear from the authors' description of their results whether the increased splenic erythropoiesis is a direct consequence of CD47-KO or a response to the anemic stress in this mouse model. The enrichment of cKit+ Ter119+ Sca1- cells in CD47-KO indicates that these are likely stress erythroid progenitors. Another CD47-KO mouse model (Lindberg et al 1996) has no reported erythroid defects and was also not examined in this study.  

      (1) The reviewer asked, “whether the increased splenic erythropoiesis is a direct consequence of CD47-KO or a response to the anemic stress in this mouse model.” Our data support both a direct role for CD47 and an indirect role resulting from the response to anemic stress. We cited our previous publications describing increased Sox2+ stem cells in spleens of Cd47 and Thbs1 knockout mice, but we neglected to emphasize another study where we found that bone marrow from cd47−/− mice subjected to the stress of ionizing radiation exhibited more colony forming units for erythroid (CFU-E) and burst-forming unit-erythroid (BFU-E) progenitors compared to bone marrow from irradiated wildtype mice (Maxhimer Sci Transl Med 2009). Taken together, our published data demonstrate that loss of CD47 results in an intrinsic protection of hematopoietic stem cells from genotoxic stress. This function of CD47 is thrombospondin-1-dependent and is consistent with the up-regulation of early erythroid precursors in the spleens of both knockout mice but cannot explain why the Thbs1−/− mice have fewer committed erythroid precursors than wildtype. We cited studies that documented increased red cell turnover in cd47−/− mice but less red cell turnover in Thbs1−/− mice compared to wildtype mice. Increased red cell clearance in cd47−/− mice is mediated by loss of the “don’t eat me” function of CD47 on red cells. In wildtype mice, clearance is augmented by thrombospondin-1 binding to the clustered CD47 on aging red cells (Wang, Aging Cell 2020). Thus, anemic stress in the mouse strains studied here decreases in the order cd47−/− > WT > Thbs1−/−. This is consistent with the increased committed erythroid progenitors reported here in cd47−/− spleens and decreased committed progenitors in the Thbs1−/− spleens. 

      (2) Based on the reviewer’s question regarding alternative mechanisms and the publication of Yang et al 2022 identifying a role for CD47 in stress erythropoiesis through transfer of mitochondria to erythroblasts, we asked whether cd47-/- erythroid precursors would show decreased mRNA expression for genes encoded on the mitochondrial chromosome (new Figure 4−figure supplement 3C). Some of these mRNAs were more abundant in cd47-/- and thbs1-/- erythroid cells, which is the opposite of what we expected based on Yang 2022 but consistent with our previous publications identifying thrombospondin-1 and CD47 as negative regulators of mitochondrial homeostasis in muscle cells and T cells.

      (3) The cd47−/− mice used for the current study are the same strain as those reported by Lindberg et al in 1996, with additional backcrossing onto a C57BL/6 background.

      Recommendations For The Authors:

      Reviewer #2 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data, or analyses.  

      Significant efforts went into analyzing the type of erythroid progenitors by marker expression, but typical Flow cytometry strategies using Ter119 and CD44 combined with forward scatter can be used to stage the committed erythroid progenitors precisely.  

      We appreciate this suggestion to extend the flow data. However, the upcoming retirement of the PI required closing our breeding colony, and the mice are no longer available.  

      How can the difference between the erythroid phenotypes of the Lindberg et al 1996 CD47-KO (exon2 Neo knock-in) and Kim et al 2018 CD47-ko (exon1 26bp indel) be explained?  

      We are not convinced that the erythroid phenotypes of the Lindberg and Kim CD47-KO mice differ at the age used in our studies. Kim et al. focused on progressive hemolytic anemia and changes in T cells in spleen that emerge at 26 weeks age, whereas the mice used here were younger. The Lindberg and Kim mice have similar spleen enlargement at the age we used.

      Another manuscript under review from our lab suggests that cis-regulation of an adjacent colinear gene could contribute to some phenotypes observed when perturbing the Cd47 gene. The Lindberg mouse exhibits minimal perturbation of that adjacent gene, but we have no data regarding the Kim et al mouse. The reviewer’s question brought to our attention that we neglected to state in the Methods that the mice used here are the Lindberg mice, not the Kim mice. This omission is now corrected.

      The authors used Lindberg mouse for 2018 study on NK cells and observed splenomegaly. Did they check for extramedullary erythropoiesis there?  

      Retrospective examination of the RNAseq data for the spleen cells enriched in NK precursors used in our 2018 publication (Nath, 2018) reveals significantly elevated expression for a majority of the extramedullary erythroid markers listed in Table 1, but they were generally less abundant than observed for the lineage-depleted spleen cells used in the present manuscript.   

      Author response table 1.

      To clarify the stress erythropoiesis issue, it might be helpful to examine the sc-seq data for the expression of specific stress erythropoiesis markers in CD47-KO. Targets of BMP4 and Hedgehog signaling can also be examined. Further colony assays can help determine if stress BFU-Es are prevalent in the CD47-KO spleens and depleted in Thsp-KO.

      As noted in Table 1, twelve of the genes we studied are established markers of stress-induced extramedullary erythropoiesis, and most of these were included in the scRNA seq data presented. Our previous publication demonstrated that bone marrow from cd47−/− mice subjected to the stress of ionizing radiation exhibited more colony forming units for erythroid (CFU-E) and burst-forming unit-erythroid (BFU-E) progenitors compared to bone marrow from irradiated wildtype mice (Maxhimer Sci Transl Med 2009). We have not performed colony formation assays using spleen.

      To address the reviewer’s question regarding BMP4 and hedgehog signaling, we performed gene set enrichment analysis for known BMP4 and hedgehog signaling signatures. Using GSE26351_UNSTIM_VS_BMP_PATHWAY_STIM_HEMATOPOIETIC_PROGENITORS, cd47-/- cells in cluster 12 or their CD34+ or CD34- subsets did not show significant enrichment for BMP4 targets compared to WT. Thbs1-/- cells in clusters 12 and 14 showed marginally significant depletion of the BMP4 signature (p=0.04 and p=0.023, respectively). Using the KEGG_HEDGEHOG_SIGNALING_PATHWAY, we did not find any significant enrichment. However, only a few genes in this pathway were detectable in the scRNAseq data. These data suggest that BMP4 signaling may be regulated by thrombospondin-1, but properly testing this hypothesis would require achieving greater sequencing depth combined with a cell isolation method that better enriches the early hematopoietic progenitors that are known to utilize the BMP4 pathway.

      In the reclustering of erythroid progenitors in Figure 5, inclusion of Gata1 as a selection marker may help capture more of the early erythroid progenitors from the dataset and provide a more complete picture of the erythroid populations. 

      We thank the reviewer for suggesting inclusion of Gata1. We repeated the reclustering including Gata1 and found the selected cell count increased from 876 cells to 1007 cells. However, most of the increase was not in the erythroid cluster, which increased from 413 cells to 419 cells. Most of the increase represented Gata1+ T cells (548 cells including Gata1 versus 463 cells without). The revised manuscript presents genotype-dependent differential gene expression based on including Gata1 selection, but none of the specific conclusions were changed from the initial submission. The new Table 4 and Figure 7−figure supplement 1 enabled us to compare differential expression of erythropoietic genes obtained using supervised and unsupervised clustering and show that both methods yield comparable results.
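      Why adding Gata1 enlarged the selection mostly outside the erythroid cluster follows from how marker-based selection behaves, assuming a cell is kept when it has nonzero counts for any selection marker: adding a marker can only add cells, and the added cells are those expressing Gata1 transcripts without the other markers, which can include non-erythroid cells. In the toy sketch below, every gene other than Gata1 and all counts are hypothetical and do not reflect the markers or data used in the manuscript.

```python
import numpy as np


def select_cells(counts, genes, markers):
    """Mask of cells (rows) with a nonzero count for at least one marker gene."""
    cols = [genes.index(g) for g in markers]
    return (counts[:, cols] > 0).any(axis=1)


genes = ["Klf1", "Gypa", "Gata1", "Cd3e"]  # hypothetical marker panel
counts = np.array([
    [2, 0, 0, 0],  # erythroid-like cell
    [0, 0, 1, 5],  # T-like cell with a few Gata1 reads
    [0, 0, 0, 3],  # T-like cell without markers
    [0, 3, 1, 0],  # erythroid-like cell
])
before = select_cells(counts, genes, ["Klf1", "Gypa"])
after = select_cells(counts, genes, ["Klf1", "Gypa", "Gata1"])
print(before.sum(), after.sum())  # 2 3
```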

      Just out of curiosity, was there an attempt to make a CD47 Thsp double KO? Is it viable?  

      Cd47 KO mice are somewhat difficult breeders, and several previous attempts to cross with other transgenics have produced viable homozygous offspring that could not be propagated.

      Recommendations for improving the writing and presentation.  

      Perhaps readers would find it more intriguing if the paper led with the single-cell sequencing showing enrichment of erythroid populations in CD47-KO, and later confirmed with Flow Cytometry (even if this was not necessarily the order in which the experiments were done). 

      We considered this suggestion but believe that some of the flow cytometry data is needed to understand why we focused on CD34+ and CD34- subsets and proliferation markers when analyzing the scRNAseq data.

      The single-cell sequencing data in Figure 3 might benefit from UMAP clustering as well. In addition, it would greatly help readers if the data points were separated by genotype and displayed after clustering. A similar analysis has been done in this paper: doi:10.1038/s41556-022-00898-9 by clustering different conditions together but displaying them separately by condition. 

      We initially explored tSNE and UMAP clustering and obtained similar results. We have added violin plots separated by genotype in Figure 4-figure supplement 2. We also included improved clusters separated by genotype in the revised Figure 3 panels C and D and for the reclustering in Figure 6D. UMAP plots provided better presentation for the reclustering (revised Figure 7). All data have been updated to the latest pipeline as noted in the Methods.

      Minor corrections to the text and figures.  

      Figure 4: Labels and plot legends are illegible in general, please relabel manually and if possible, redo plots with bigger font size and legends (relatively easy using ggplot2) 

      All figure panels were relabeled using larger fonts.

      Figure 5D: Individual plots are stacked randomly atop each other and in many cases, gene names are not visible. Please restack the layers and ensure that the gene names are visible 

      Panel D was made a separate figure with enlarged labels (now Figure 7).

      Supp Fig 2: Layout can be organized a little better. Consider splitting into two figures for better organization  

      The figure was split as recommended and is now Figure 1-figure supplement 2 and Figure 2-figure supplement 1.

      Abstract Line 10: "...mRNA expression of Kit, Ermap, and Tfrc, Induction of committed erythroid precursors is...". Replace comma after "Tfrc" with period   

      Done.

      Discussion Page 9 Line 8: "...WT spleens, s. mRNAs for some markers of committed erythroid cells including Nr3c1 mRNA...". Remove ", s" after spleens.   

      Done.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The paper from Hsu and co-workers describes a new automated method for analyzing the cell wall peptidoglycan composition of bacteria using liquid chromatography and mass spectrometry (LC/MS) combined with newly developed analysis software. The work has great potential for determining the composition of bacterial cell walls from diverse bacteria in high-throughput, allowing new connections between cell wall structure and other important biological functions like cell morphology or host-microbe interactions to be discovered. In general, I find the paper to be well written and the methodology described to be useful for the field. However, there are areas where the details of the workflow could be clarified. I also think the claims connecting cell wall structure and stiffness of the cell surface are relatively weak. The text for this topic would benefit from a more thorough discussion of the weak points of the argument and a toning down of the conclusions drawn to make them more realistic.

      Thank you for your thorough and insightful review of our manuscript. We greatly appreciate your positive and constructive feedback on our methodology. We have carefully reviewed your comments and have responded to each point as follows:

      Specific points:

      1) It was unclear to me from reading the paper whether or not prior knowledge of the peptidoglycan structure of an organism is required to build the "DBuilder" database for muropeptides. Based on the text as written, I was left wondering whether bacterial samples of unknown cell wall composition could be analyzed with the methods described, or whether some preliminary characterization of the composition is needed before the high-throughput analysis can be performed. The paper would be significantly improved if this point were explicitly addressed in the main text.

      We apologize for not making this clearer. Prior knowledge of the peptidoglycan structure of an organism is indeed required to build the “DBuilder” database and accurately identify muropeptides; otherwise, the false discovery rate might increase. While the peptidoglycan structures of certain organisms may not have been extensively studied, users retain the flexibility to adapt the muropeptide compositions to their study, referencing closely related species for database construction. We have addressed this aspect in the main text to ensure a clearer understanding.

      “(Section HAMA platform: a High-throughput Automated Muropeptide Analysis for Identification of PGN Fragments) …(i) DBuilder... Based on their known (or putative) PGN structures, all possible combinations of GlcNAc, MurNAc and peptide were input into DBuilder to generate a comprehensive database that contains monomeric, dimeric, and trimeric muropeptides (Figure 1b)."
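      The reason prior structural knowledge matters can be seen from how quickly a combinatorial database grows. The sketch below enumerates monomers as every glycan-peptide pairing and multimers as unordered combinations of monomers; the building-block names are hypothetical, cross-link positions and donor/acceptor direction are ignored, and this is not DBuilder's actual code. Even two glycan forms and three stem peptides yield 83 candidates, and every extra building block widens the search space in which spurious matches (false discoveries) can occur.

```python
from itertools import combinations_with_replacement, product


def build_database(glycans, peptides, max_units=3):
    """Enumerate monomers, dimers and trimers from building-block lists.

    Monomers are every glycan-peptide pairing; multimers are unordered
    combinations of monomers. Cross-link positions and donor/acceptor
    direction are ignored, so the counts are illustrative only.
    """
    monomers = [f"{g}-{p}" for g, p in product(glycans, peptides)]
    database = {1: monomers}
    for n in range(2, max_units + 1):
        database[n] = [" = ".join(c) for c in combinations_with_replacement(monomers, n)]
    return database


glycans = ["GlcNAc-MurNAc", "GlcNAc-anhMurNAc"]  # hypothetical glycan forms
peptides = ["Tri", "Tetra", "Penta"]               # hypothetical stem peptides
db = build_database(glycans, peptides)
print({n: len(entries) for n, entries in db.items()})  # {1: 6, 2: 21, 3: 56}
```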

      2) The potential connection between the structure of different cell walls from bifidobacteria and cell stiffness is pretty weak. The cells analyzed are from different strains such that there are many possible reasons for the change in physical measurements made by AFM. I think this point needs to be explicitly addressed in the main text. Given the many possible explanations for the observed measurement differences (lines 445-448, for example), the authors could remove this portion of the paper entirely. Conclusions relating cell wall composition to stiffness would be best drawn from a single strain of bacteria genetically modified to have an altered content of 3-3 crosslinks.

      We understand your concern regarding the weak connection between cell wall structure and cell stiffness. We will make a clear and explicit statement in the main text to acknowledge that the cells analyzed are derived from different strains, introducing the possibility of various factors influencing the observed changes in physical measurements as determined by AFM. Furthermore, we greatly appreciate your suggestion to consider genetically modified strains to investigate the role of cross-bridge length in determining cell envelope stiffness. In this regard, we are in the process of developing a CRISPR/Cas genome editing toolbox for Bifidobacterium longum, and we plan to pursue this avenue of investigation in future work.

      Reviewer #2 (Public Review):

      The authors introduce "HAMA", a new automated pipeline for architectural analysis of the bacterial cell wall. Using MS/MS fragmentation and a computational pipeline, they validate the approach using well-characterized model organisms and then apply the platform to elucidate the PG architecture of several members of the human gut microbiota. They discover differences in the length of peptide crossbridges between two species of the genus Bifidobacterium and then show that these species also differ in cell envelope stiffness, resulting in the conclusion that crossbridge length determines stiffness.

      We appreciate your thoughtful review of our manuscript and your recognition of the potential significance of our work in elucidating the poorly characterized peptidoglycan (PGN) architecture of the human gut microbiota.

      The pipeline is solid and revealing the poorly characterized PG architecture of the human gut microbiota is worthwhile and significant. However, it is unclear if or how their pipeline is superior to other existing techniques - PG architecture analysis is routinely done by many other labs; the only difference here seems to be that the authors chose gut microbes to interrogate.

      We apologize if this could have been clearer. The HAMA platform stands apart from other pipelines by automatically analyzing LC-MS/MS data to identify muropeptides. In contrast, routine PGN architecture analyses typically use LC-UV/Vis or LC-MS platforms, for which only the automated PGFinder software is available. To the best of our knowledge, a comparable pipeline for automatically analyzing LC-MS/MS data was reported by Bern et al., who used the commercial Byonic software with an in-house FASTA database and specific glycan modifications. They achieved accurate and sensitive identification of monomeric muropeptides but struggled with cross-linked muropeptides due to the limitations of the Byonic software. We believe that our pipeline, which enables automatic and comprehensive muropeptide identification (particularly for Gram-positive bacterial peptidoglycans), would be a valuable addition to the field. To enhance clarity, we have adjusted the text as follows:

      (Introduction) … Although they both demonstrated great success in identifying muropeptide monomers, the accurate identification of muropeptide multimers and other various bacterial PGN structures still remains unresolved. This is because deciphering the compositions requires MS/MS fragmentation, but it is still challenging to automatically annotate MS/MS spectra from these complex muropeptide structures."
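      Matching an observed precursor m/z to database entries within a ppm tolerance is the step that MS1-only workflows rely on, and it cannot by itself distinguish candidates of identical or near-identical mass; that is the gap MS/MS fragment annotation fills. The sketch below shows only the precursor-matching step, with hypothetical candidate names, masses, and a 10 ppm default tolerance chosen for illustration; it does not reproduce HAMA's scoring.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def match_precursor(observed_mz, database, tol_ppm=10.0):
    """Database entries within tol_ppm of the observed m/z, closest first."""
    hits = []
    for name, mz in database.items():
        err = ppm_error(observed_mz, mz)
        if abs(err) <= tol_ppm:
            hits.append((name, err))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Hypothetical theoretical m/z values, for illustration only.
database = {"candidate_A": 1000.000, "candidate_B": 1000.020, "candidate_C": 1200.000}
print(match_precursor(1000.005, database))
```

      Widening the tolerance to 20 ppm would return both candidate_A and candidate_B, illustrating why fragment-level evidence is needed to decide between close precursor matches.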

      I do not agree with their conclusions about the correlation between crossbridge length and cell envelope stiffness. These experiments are done on two different species of bacteria and their experimental setup therefore does not allow them to isolate crossbridge length as the only differential property that can influence stiffness. These two species likely also differ in other ways that could modulate stiffness, e.g. turgor pressure, overall PG architecture (not just crossbridge length), membrane properties, teichoic acid composition etc.

      Regarding the conclusions drawn about the correlation between cross-bridge length and cell envelope stiffness, we understand your point and appreciate your feedback. We have revisited this section of the manuscript and toned down the conclusions drawn from this part of the study. We also recognize the importance of other factors that could influence stiffness, as you mention above. We therefore now note in the main text the need for further investigations, potentially involving genetically modified strains, to isolate and accurately determine the impact of bridge length on cell envelope stiffness.

      Reviewer #1 (Recommendations For The Authors):

      Minor points:

      1) One thing to consider would be testing the robustness of the analysis pipeline with one of the well-characterized bacteria studied, but genetically modifying it to change the cell wall composition in predictable ways. Does the analysis pipeline detect the expected changes?

      We appreciate the reviewer's suggestion and would like to provide a clear response. Regarding testing the pipeline with genetically modified strains, our lab previously worked on a genetically modified S. maltophilia strain (KJΔmrdA).1 Inactivation of mrdA increased the level of N-acetylglucosaminyl-1,6-anhydro-N-acetylmuramyl-L-alanyl-D-glutamyl-meso-diaminopimelic acid-D-alanine (GlcNAc-anhMurNAc tetrapeptide) in the muropeptide profile; this muropeptide is the critical activator ligand for β-lactamase expression in the ΔmrdA mutant. In this case, our platform could provide rapid PGN analysis to verify the expected change in the muropeptide profile (see Author response image 1). In addition, if the predictable changes involve genetic modifications affecting the interpeptide bridges within the PGN structure, for example of the femA/B genes of S. aureus, which are involved in the synthesis of interpeptide bridges,2 our current HAMA pipeline is capable of detecting these anticipated changes. However, if the genetic modifications introduce novel components into the PGN structure, a dedicated database specific to the genetically modified strain would need to be created.

      Author response image 1.

      2) Line 368: products catalyzed > products formed

      The sentence has been revised.

      “(Section Inferring PGN Cross-linking Types Based on Identified PGN Fragments) …Based on the muropeptide compositional analysis mentioned above, we found high abundances of M3/M3b monomer and D34 dimer in the PGNs of E. faecalis, E. faecium, L. acidophilus, B. breve, B. longum, and A. muciniphila, which may be the PGN products formed by Ldts.”

      3) Lines 400-402: Is it possible the effect is related to porosity, not "hardness".

      Thank you for the suggestion. The possibility that the slower hydrolysis rate of purified PGN from B. breve is related to porosity is indeed noteworthy. Although this could be a contributing factor, little existing literature directly addresses the relationship between PGN architecture and porosity, possibly because current methods for assessing cell wall porosity have limitations. We therefore propose a speculative explanation for the observed effect: the tighter PGN architecture resulting from shorter interpeptide bridges in B. breve could contribute to its harder texture. This speculation is based on the idea that a more compact PGN structure might lead to increased stiffness, which is consistent with our observation of higher cell stiffness in B. breve.

      4) Lines 403-408: See point #2 above.

      Thank you for the suggestion. We have explicitly addressed this point in the main text:

      “(Section Exploring the Bridge Length-dependent Cell Envelope Stiffness in B. longum and B. breve) … Taken together, we speculate that a tight peptidoglycan network woven by shorter interpeptide bridges or 3-3 cross-linkages could give bacteria stiffer cell walls. However, it is important to note that cell stiffness is a mechanical property that also depends on PGN thickness, overall architecture, and turgor pressure. These parameters may vary among different bacterial strains. Hence, carefully controlled, genetically engineered strains with similar characteristics will be needed to dissect the role of cross-bridge length in cell envelope stiffness.”

      5) Lines 428-429: It is not clear to me how mapping the cell wall architecture provides structural information about the synthetic system. It is also not clear how antibiotic resistance can be inferred. More detail is needed here to flesh out these points.

      Thank you for the suggestion. To provide further clarity on these important aspects, the context in the manuscript has been revised.

      “(Discussion) …Importantly, our HAMA platform provides a powerful tool for mapping peptidoglycan architecture, giving structural information on the PGN biosynthesis system. This includes the ability to infer possible PGN cross-linkages from the types of PGN fragments obtained by hydrolysis. For instance, the identification of 3-3 cross-linkages formed by L,D-transpeptidases (Ldts) is of particular significance. Unlike the formation of 4-3 cross-linkages, the formation of 3-3 cross-linkages is resistant to inhibition by most β-lactam antibiotics, a class of antibiotics that commonly targets bacterial cell wall synthesis by interfering with 4-3 cross-linking. Therefore, by elucidating the specific cross-linkage types within the peptidoglycan architecture, our approach offers insights into antibiotic resistance mechanisms.”

      6) Line 478: "maneuvers are proposed for" > "work is needed to generate". Also, delete "innovative". Also "in silico" > "in silico-based".

      The sentence has been revised.

      “(Discussion) …To achieve a more comprehensive identification of muropeptides, future work is needed to generate an expanded database, in silico-based fragmentation patterns, and improved MS/MS spectra acquisition.”

      7) Line 485: "Its" > "It has potential"

      The sentence has been revised.

      “(Discussion) …It has potential applications in identifying activation ligands for antimicrobial resistance studies, characterizing key motifs recognized by pattern recognition receptors for host-microbiota immuno-interaction research, and mapping peptidoglycan in cell wall architecture studies.”

      8) Figure 1 legend: Define Gb and Pb.

      Gb and Pb are the abbreviations for glycosidic bonds and peptide bonds, respectively. We have revised the legend of Figure 1 as follows:

      “(Figure legend 1) …(b) DBuilder constructs a muropeptide database containing monomers, dimers, and trimers with two types of linkage: glycosidic bonds (Gb) and peptide bonds (Pb).”
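      To illustrate what a database builder of this kind computes, here is a hedged sketch that is not DBuilder's code: the monomer labels and masses are hypothetical, and it assumes that forming each glycosidic or peptide bond is a condensation that releases one water molecule.

```python
from itertools import combinations_with_replacement

WATER = 18.010565  # monoisotopic mass of H2O (Da)


def oligomer_mass(unit_masses):
    """Neutral mass of n linked units: forming each glycosidic or peptide bond releases one water."""
    return sum(unit_masses) - (len(unit_masses) - 1) * WATER


def build_database(monomers, max_units=3, linkages=("Gb", "Pb")):
    """Enumerate monomers, dimers and trimers for every linkage-type composition.

    monomers: dict mapping a (hypothetical) monomer label to its neutral monoisotopic mass.
    Unit order and bond order are ignored here because they do not change the mass.
    """
    database = {}
    for n in range(1, max_units + 1):
        for units in combinations_with_replacement(sorted(monomers), n):
            mass = oligomer_mass([monomers[u] for u in units])
            # n units are joined by n - 1 bonds, each glycosidic (Gb) or peptide (Pb).
            for bonds in combinations_with_replacement(linkages, n - 1):
                key = "-".join(units) + ("|" + ",".join(bonds) if bonds else "")
                database[key] = mass
    return database


if __name__ == "__main__":
    db = build_database({"A": 500.0, "B": 600.0})
    for key, mass in sorted(db.items(), key=lambda item: item[1]):
        print(f"{key:14s} {mass:.4f}")
```

      Because both linkage types remove one water, a Gb dimer and a Pb dimer of the same two monomers are isobaric in this sketch, so only their MS/MS fragments can tell them apart.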

      9) Figure 2: It is hard to see what is going on in panels a and c with all the labels. Consider removing them and showing a zoomed inset with labels in addition to an unlabeled full chromatogram.

      We apologize for not making this clearer. Panels a and c of Figure 2 were generated directly by the Analyzer as software screenshots of the peak annotations on the chromatogram. Our intention was to present comprehensive PGN mapping with this platform (approximately 70% of the peak area was assigned to muropeptide signals). We understand that the label density might affect clarity, so we have added the output tables of all muropeptide identifications as source data (Table 1–Source Data 1&2). Additionally, we have uploaded the Analyzer output files (see Additional Files), which can be better visualized in the Viewer program; the Viewer also allows users to zoom in for detailed labeling information.

      10) Figure 3: It is worth pointing out what features of the MS/MS fingerprints are helping to discriminate between species.

      Thank you for the suggestion. We have revised Figure 3 and its legend as follows:

      “(Figure legend 3) …The sequence of each isomer was determined using in silico MS/MS fragmentation matching, with the identified sequence having the highest matching score. The key MS/MS fragments that discriminate between two isomers are labeled in bold brown.”

      Author response image 2.
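      The legend describes choosing the isomer with the highest in silico matching score. As a hedged sketch only, assuming a simple score that counts how many predicted fragment ions are found among the observed peaks within a fixed tolerance (HAMA's actual scoring function is not described here, and the 0.02 Da tolerance and fragment lists are illustrative), the selection and the isomer-specific fragments could be computed as follows:

```python
def match_score(observed_peaks, theoretical_fragments, tol_da=0.02):
    """Number of in silico fragment ions with at least one observed peak within tol_da."""
    return sum(
        any(abs(peak - frag) <= tol_da for peak in observed_peaks)
        for frag in theoretical_fragments
    )


def best_isomer(observed_peaks, candidates, tol_da=0.02):
    """Pick the candidate sequence with the highest match score; also return all scores."""
    scores = {
        name: match_score(observed_peaks, frags, tol_da)
        for name, frags in candidates.items()
    }
    return max(scores, key=scores.get), scores


def discriminating_fragments(frags_a, frags_b, tol_da=0.02):
    """Fragments predicted for isomer A with no counterpart within tol_da among isomer B's fragments."""
    return [f for f in frags_a if all(abs(f - g) > tol_da for g in frags_b)]


if __name__ == "__main__":
    peaks = [100.0, 200.05, 300.0]  # hypothetical observed MS/MS peaks (m/z)
    candidates = {"iso1": [100.0, 200.0, 300.0], "iso2": [100.0, 250.0, 350.0]}
    print(best_isomer(peaks, candidates))
    print(discriminating_fragments(candidates["iso1"], candidates["iso2"]))
```

      Fragments shared by both isomers add the same amount to each score, so only the discriminating fragments decide which isomer wins.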

      11) Figure 4 and 5 legend: Can you condense the long descriptions of the abbreviations - or at least only refer to them once?

      Certainly. To enhance clarity and conciseness in the figure legends, we have revised the legend of Figure 5 as follows:

      “(Figure legend 5) …(b) Heatmap displaying …. Symbols: M, monomer; D, dimer; T, trimer (numbers indicate amino acids in stem peptides). Description of symbol abbreviations as in Figure legend 4, with the addition of "Glycan-T" representing trimers linked by glycosidic bonds.”

      Reviewer #2 (Recommendations For The Authors):

      1. Please read the manuscript carefully for spelling errors.

      We appreciate your careful review of our manuscript. We have thoroughly rechecked the entire manuscript for spelling errors and have made the necessary corrections to ensure the accuracy and quality of the text.

      2. Line 46 - "multilayered" is likely only true for Gram-positive bacteria.

      We thank reviewer #2 for bringing up this concern. Indeed, Gram-negative bacteria mostly possess a single layer of peptidoglycan, although it could be up to three layers thick in some parts of the cell surface.3, 4 To avoid confusion, we have rewritten the text as follows: “(Introduction) …PGN is a net-like polymeric structure composed of various muropeptide molecules, with their glycans linearly conjugated and short peptide chains cross-linked through transpeptidation.”

      3. Methods section: It seems like pellets from a 10 mL bacterial culture were ultimately suspended in 1.5 L (750 mL water + 750 mL tris) - why such a large volume? And how were PG fragments subsequently washed (centrifugation? There is no information on this in the Methods).

      We apologize for the mislabeled units. The correct volume is “1.5 mL (750 µL water + 750 µL tris)”, which we have corrected in the Methods section (lines 99-100). To wash the purified PGN, we added 1 mL of water, centrifuged at 10,000 rpm for 5 minutes, and removed the supernatant. This information has been added to the Methods section (lines 95-98).

      4. Line 183 - why were 6 modifications chosen as the cutoff? Please make the rationale clearer.

      We thank reviewer #2 for the comment. We set the maximum number of modifications to 6 on the assumption of one modification on each sugar of a trimeric muropeptide. A lower cutoff effectively limits the identification of muropeptides with unlikely numbers of modifications, whereas a higher cutoff allows multiple modifications on a muropeptide. In our hands, muropeptide modifications in E. coli are mostly N-deacetyl-MurNAc and anhydro-MurNAc, and modifications in the gut microbes used here are mostly N-deacetyl-GlcNAc, anhydro-MurNAc, O-acetyl-MurNAc, loss of GlcNAc, and amidated iso-Glu. While we recommend starting data analysis with a cutoff of 6 modifications, users are free to adjust it for their studies.
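      To show how the cap bounds the search space, here is a hedged sketch (not HAMA's code) that enumerates every combination of the five gut-microbe modification types listed above, up to a given total. It ignores site constraints (for example, which sugar or residue can carry which modification), so the counts are only an upper bound on what a real search would consider.

```python
from itertools import combinations_with_replacement
from math import comb

# Modification types reported above for the gut microbes studied.
GUT_MODIFICATIONS = (
    "N-deacetyl-GlcNAc",
    "anhydro-MurNAc",
    "O-acetyl-MurNAc",
    "loss of GlcNAc",
    "amidated iso-Glu",
)

# One modification per sugar of a trimer: 3 disaccharides x 2 sugars.
MAX_MODS = 3 * 2


def modification_sets(mods, max_mods):
    """Yield every multiset of modifications with 0..max_mods members (site constraints ignored)."""
    for k in range(max_mods + 1):
        yield from combinations_with_replacement(mods, k)


def count_sets(n_types, max_mods):
    """Closed form for the number of multisets of size <= max_mods from n_types: C(n + k, k)."""
    return comb(n_types + max_mods, max_mods)


if __name__ == "__main__":
    for cap in (2, 4, MAX_MODS):
        print(cap, sum(1 for _ in modification_sets(GUT_MODIFICATIONS, cap)))
```

      Under these simplifications, raising the cap from 2 to 6 grows the candidate set from 21 to 462 modification combinations per unmodified structure, which illustrates the trade-off between coverage and search space.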

      5. Line 339 - define donor vs. acceptor here (can be added in parentheses after explaining the relevant chemical reactions further above in the text)

      Thank you for the suggestion. To provide greater clarity regarding the roles of the donor and acceptor substrates in the transpeptidation process, we have revised the content in the manuscript as follows:

      “(Section Inferring PGN Cross-linking Types Based on Identified PGN Fragments) …In general, there are two types of PGN cross-linkage…. Transpeptidation involves two stem peptides, which function as the acyl donor and acceptor substrates, respectively. As the enzyme names imply, the donor substrates bound by Ddts and Ldts terminate in D,D- and L,D-stereocenters, respectively, which structurally means pentapeptides and tetrapeptides. During D,D-transpeptidation, Ddts recognize D-Ala4-D-Ala5 of the donor stem (pentapeptide) and remove the terminal D-Ala5 residue, forming an intermediate. The intermediate is then cross-linked to the NH2 group at the third position of the neighboring acceptor stem, forming a 4-3 cross-link.”
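      The donor/acceptor rules above can be condensed into a small lookup. The sketch below is a hypothetical helper, not part of HAMA; the Ddt row follows the text above, and the Ldt row adds the standard counterpart (a tetrapeptide donor releases D-Ala4, and its residue 3 links to residue 3 of the acceptor). In both cases the donor stem ends up one residue shorter than it started.

```python
# Illustrative summary of the transpeptidation rules (hypothetical helper, not part of HAMA).
TRANSPEPTIDASES = {
    # enzyme: donor stem length, residue released from the donor, resulting cross-link
    "Ddt": {"donor_stem": 5, "released": "D-Ala5", "crosslink": "4-3"},
    "Ldt": {"donor_stem": 4, "released": "D-Ala4", "crosslink": "3-3"},
}


def donor_stem_after(enzyme: str) -> int:
    """Length of the donor stem peptide once its terminal residue has been released."""
    return TRANSPEPTIDASES[enzyme]["donor_stem"] - 1


def crosslink_type(enzyme: str) -> str:
    """Cross-link formed: donor position first, acceptor position (3 in both cases) second."""
    return TRANSPEPTIDASES[enzyme]["crosslink"]


if __name__ == "__main__":
    for enzyme, rule in TRANSPEPTIDASES.items():
        print(enzyme, crosslink_type(enzyme), "releases", rule["released"],
              "leaving a donor stem of", donor_stem_after(enzyme), "residues")
```

      This is why a dimer containing a tripeptide stem, such as D34, is consistent with, though not proof of, L,D-transpeptidation: stems can also be shortened by carboxypeptidases.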

      6. Line 366 and following - can you calculate % crosslinks based on these numbers? What does "high abundance" of 3,3 crosslinks mean in this context? Is this the majority of PG?

      Thank you for your questions. Calculating the percentage of cross-links from the muropeptide composition is a valid consideration. However, it is important to note that the muropeptides we analyzed were hydrolyzed by mutanolysin, so a % cross-link value derived from these data might not truly represent the degree of cross-linking within the PGN network. A more precise determination of % cross-linking would require methods such as solid-state NMR of purified peptidoglycan. Our research characterizes PGN fragments and allows us to infer potential PGN cross-linkage types, and the enzymes involved, from the dominant muropeptide fragments. In this context, "high abundance" indicates that the M3b/M4b monomer and D34 dimer muropeptides represent a significant portion of the hydrolysis products, that is, they are major constituents of the PGN fragments obtained by enzymatic hydrolysis.

      7. Line 375 - I am not sure PG is a meaningful diffusion barrier for drugs and signaling molecules, given that even larger proteins can apparently diffuse through the pores.

      Thank you for raising this point. Peptidoglycan indeed possesses relatively wide pores that allow for the diffusion of larger molecules, including proteins.5 Research has provided a rough estimate of the porosity of the PGN meshwork, suggesting that it allows for the diffusion of proteins with a maximum molecular mass of around 50 kDa.6 Considering this, we acknowledge that PGN may not serve as a significant diffusion barrier for drugs and signaling molecules. The porosity of the PGN scaffold, which is defined by the degree of cross-linking, plays a role in influencing the transport of molecules to the cell membrane. Thus, while PGN may not serve as a strict diffusion barrier, its structural characteristics still impact bacterial cell mechanics and interactions. We have revised the manuscript to reflect this understanding:

      “(Section Exploring the Bridge Length-dependent Cell Envelope Stiffness in B. longum and B. breve) …The porosity of the PGN scaffold, defined by the degree of cross-linking, influences the transport of larger molecules such as proteins. Therefore, modifications to PGN structure are anticipated to significantly affect bacterial cell mechanics and interactions.”

      8. Line 400 - what does "slower hydrolysis rate" refer to? Is this chemical or enzymatic (autolysins?) hydrolysis? Also, I am not sure the hydrolysis rate of either modality allows for solid conclusions about how hard (line 402) the PG is.

      Thank you for your comments. The hydrolysis rate here refers to enzymatic hydrolysis, specifically cleavage of the β-N-acetylmuramyl-(1,4)-N-acetylglucosamine linkage by mutanolysin. We agree that there is no direct correlation between the hydrolysis rate and the hardness of the PGN architecture, although structural rigidity is a key determinant of protein digestibility.7 Because the enzymatic hydrolysis rate depends on the accessibility of the substrate to the enzyme, we proposed that a tighter PGN architecture could also lead to a slower hydrolysis rate. This speculation is consistent with our observations of higher cell stiffness, or a more compact PGN structure, in B. breve and its slower hydrolysis rate. We understand that this is indirect evidence, so the revised sentence now reads:

      “(Section Exploring the Bridge Length-dependent Cell Envelope Stiffness in B. longum and B. breve) …Furthermore, B. breve also showed a slower enzymatic hydrolysis rate in purified PGNs, implying that the cell wall structure of B. breve is characterized by a compact PGN architecture.”

      9. Line 424 - I am not convinced this pipeline can detect PG architectures that other pipelines cannot; likely, the difference between previous analyses and theirs is due to different growth conditions (3,3 crosslink formation is often modulated by environmental factors/growth stage). In the next sentence, it sounds like mutanolysin treatment is a novelty in PG analysis (which it is not).

      We apologize for the lack of clarity and have revised the paragraph to describe our study more accurately. We agree that different growth conditions could influence PGN architecture, and that other pipelines can identify PGN architectures manually, or automatically if they are not too complex. Our original intention was to highlight the ability of the HAMA program to automatically identify unreported PGN structures. The revised sentences read:

      “(Discussion) …We speculate that this finding may be influenced by the comprehensive mass spectrometric approaches we employed or by variations in growth conditions. Moreover, we utilized the well-established enzymatic method involving mutanolysin to cleave the β-N-acetylmuramyl-(1,4)-N-acetylglucosamine linkage, which preserves the original peptide linkage in intact PGN subunits.”

      10. Line 440-442: As outlined in more detail above, I don't think you can conclude anything about the relationship between bridge length and envelope stiffness based on these data.

      Thank you for your valuable feedback. We agree that our data may not definitively support a direct conclusion about the relationship between bridge length and envelope stiffness in Bifidobacterium species. We have therefore rephrased this section to present the observed correlations accurately without overgeneralizing:

      “(Discussion) … Notably, our study suggests a potential correlation between cell stiffness and the compactness of bacterial cell walls in Bifidobacterium species (Figure 5). B. longum, which predominantly harbors tetrapeptide bridges (Ser-Ala-Thr-Ala), exhibits a trend towards lower stiffness, whereas B. breve, characterized by PGN cross-linked with monopeptide bridges (Gly), demonstrates a trend towards higher stiffness. These findings suggest that increased rigidity may be correlated with the more compact PGN architecture built by shorter cross-linking bridges.”

      References:

      1. Huang, Y.-W.; Wang, Y.; Lin, Y.; Lin, C.; Lin, Y.-T.; Hsu, C.-C.; Yang, T.-C., Impacts of Penicillin Binding Protein 2 Inactivation on β-Lactamase Expression and Muropeptide Profile in Stenotrophomonas maltophilia. mSystems 2017, 2 (4), 00077-00017.

      2. Jarick, M.; Bertsche, U.; Stahl, M.; Schultz, D.; Methling, K.; Lalk, M.; Stigloher, C.; Steger, M.; Schlosser, A.; Ohlsen, K., The serine/threonine kinase Stk and the phosphatase Stp regulate cell wall synthesis in Staphylococcus aureus. Sci. Rep. 2018, 8 (1), 13693.

      3. Labischinski, H.; Goodell, E. W.; Goodell, A.; Hochberg, M. L., Direct proof of a "more-than-single-layered" peptidoglycan architecture of Escherichia coli W7: a neutron small-angle scattering study. J. Bacteriol. 1991, 173 (2), 751-756.

      4. Rohde, M., The Gram-Positive Bacterial Cell Wall. Microbiol. Spectr. 2019, 7 (3), gpp3-0044-2018.

      5. Vollmer, W.; Höltje, J. V., The architecture of the murein (peptidoglycan) in gram-negative bacteria: vertical scaffold or horizontal layer(s)? J. Bacteriol. 2004, 186 (18), 5978-5987.

      6. Vollmer, W.; Blanot, D.; De Pedro, M. A., Peptidoglycan structure and architecture. FEMS Microbiol. Rev. 2008, 32 (2), 149-167.

      7. Li, Q.; Zhao, D.; Liu, H.; Zhang, M.; Jiang, S.; Xu, X.; Zhou, G.; Li, C., "Rigid" structure is a key determinant for the low digestibility of myoglobin. Food Chem.: X 2020, 7, 100094.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      Type 1 diabetes mellitus (T1DM) progression is accelerated by oxidative stress and apoptosis. Eugenol (EUG) is a natural compound previously documented as anti-inflammatory, anti-oxidative, and anti-apoptotic. In this manuscript by Jiang et al., the authors study the effects of EUG on T1DM in MIN6 insulinoma cells and a mouse model of chemically induced T1DM. The authors show that EUG increases nuclear factor E2-related factor 2 (Nrf2) levels. This results in a reduction of pancreatic beta-cell damage, apoptosis, oxidative stress markers, and a recovery of insulin secretion. The authors highlight these effects as indicative of the therapeutic potential of EUG in managing T1DM.

      Strengths

      Relevant, timely, and addresses an interesting question in the field. The authors consistently observe enhanced beta cell functionality following EUG treatment, which makes the compound a promising candidate for T1DM therapy.

      Weaknesses

      (1) The in vivo experiments have too few biological replicates. With an n=3 (as all figure legends indicate) in complex mouse studies such as these, drawing robust conclusions becomes challenging. It is important to reproduce these results in a larger cohort, to validate the conclusions of the authors.

      Thanks for your comments. In the figure legends of the original manuscript, n=3 meant at least 3 biological replicates, whereas n=30 in the Materials and Methods section meant the sample size. Each group contained 30 mice, for a total of 150 mice in this study. This information has been added to the revised manuscript, and the allocation of mice across the in vivo experiments is shown below.

      Author response image 1.

      (2) Another big concern is the lack of quantifications and statistical analysis throughout the manuscript. Although the authors claim statistical significance in various experiments, the limited information provided makes it difficult to verify. The authors use vague and minimal descriptions of their experiments, which further reduces the reader's comprehension and the reproducibility of the experiments.

      Thanks for your constructive suggestion. We have repeated the quantitative and statistical analyses for the entire manuscript using GraphPad Prism. Additionally, we have improved the descriptions of the experiments in the revised manuscript.

      (3) Finally, the use of Min6 cells as a model for pancreatic beta cells is a strong limitation of this study. Future studies should seek to reproduce these findings in a more translational model and use more relevant in vitro cell systems (eg. Islets).

      Thanks for your professional comments. MIN6 is a permanent mouse insulinoma cell line derived from a mouse islet β-cell tumor, and it reflects functional changes in islet β cells. As a model of mature β cells, MIN6 cells have been widely used in studies of type 1 diabetes mellitus[1-4], so we used them as the in vitro cell model in this study. In future studies, we will try to reproduce our findings in more relevant in vitro cell systems (e.g., islets).

      References:

      (1) WU M, CHEN W, ZHANG S, et al. Rotenone protects against β-cell apoptosis and attenuates type 1 diabetes mellitus [J]. Apoptosis, 2019, 24(11-12): 879-91.

      (2) LUO C, HOU C, YANG D, et al. Urolithin C alleviates pancreatic β-cell dysfunction in type 1 diabetes by activating Nrf2 signaling [J]. Nutr Diabetes, 2023, 13(1): 24.

      (3) LAKHTER A J, PRATT R E, MOORE R E, et al. Beta cell extracellular vesicle miR-21-5p cargo is increased in response to inflammatory cytokines and serves as a biomarker of type 1 diabetes [J]. Diabetologia, 2018, 61(5): 1124-34.

      (4) LIN Y, SUN Z. Antiaging Gene Klotho Attenuates Pancreatic β-Cell Apoptosis in Type 1 Diabetes [J]. Diabetes, 2015, 64(12): 4298-311.

      Reviewer #3 (Public Review):

      Summary:

      This study by Jiang et al. aims to establish the streptozotocin (STZ)-induced type 1 diabetes mellitus (T1DM) mouse model in vivo and the STZ-induced pancreatic β cell MIN6 cell model in vitro to explore the protective effects of Eugenol (EUG) on T1DM. The authors tried to elucidate the potential mechanism by which EUG inhibits the NRF2-mediated anti-oxidative stress pathway. Overall, this study is well executed with solid data, offering an intriguing report from animal studies for a potential new treatment strategy for T1DM.

      Strengths:

      The in vivo efficacy study is comprehensive and solid. Given that STZ-induced T1DM is a devastating and harsh model, the in vivo efficacy of this compound is really impressive.

      Weaknesses:

      (1) The mechanism is linked to the antioxidant property of the compound, which is common to many natural compounds, such as flavonoids and polyphenols. However, such compounds have rarely been developed successfully into clinically used therapeutics. Indeed, if that is the case, Vitamin C or Vitamin E could be used here as the positive control.

      Thanks for your comments. In fact, many antioxidant drugs are used clinically, including in the management of diabetes mellitus and its complications. For example, lipoic acid has been used to treat diabetic peripheral neuropathy[5]. Vitamin E can effectively scavenge free radicals, protect cell membranes, and significantly reduce the risk of cardiovascular disease in selected patient populations, such as those studied in the SPACE and ICARE trials[6]. Glutathione plays crucial roles in the cellular detoxification and antioxidant systems and has been used intravenously to treat acute poisoning and chronic liver diseases[7]. Therefore, by modulating oxidative stress pathways, eugenol may improve the management of type 1 diabetes mellitus and holds potential as a future therapeutic option for clinical application. In future studies, we will try to use Vitamin C or Vitamin E as the positive control.

      References:

      (5) ZIEGLER D, PAPANAS N, SCHNELL O, et al. Current concepts in the management of diabetic polyneuropathy [J]. J Diabetes Investig, 2021, 12(4): 464-75.

      (6) VARDI M, LEVY N S, LEVY A P. Vitamin E in the prevention of cardiovascular disease: the importance of proper patient selection [J]. J Lipid Res, 2013, 54(9): 2307-14.

      (7) HONDA Y, KESSOKU T, SUMIDA Y, et al. Efficacy of glutathione for the treatment of nonalcoholic fatty liver disease: an open-label, single-arm, multicenter, pilot study [J]. BMC Gastroenterol, 2017, 17(1): 96.

      Reviewer #1 (Recommendations For The Authors):

      • For each of the figure panels the authors should indicate the exact number of biological replicates (how many mice or how many independent in vitro experiments). For IF panels, the number of mice, the number of histology slides per mouse, number of fields analyzed should be indicated.

      Thanks for your constructive suggestion. These details have been added to the revised manuscript.

      • The methods state n=30 and Figure 1 states n=3. N=3 is too little for such a complex in vivo study and would severely reduce the reliability of the in vivo experiments.

      Thanks for your suggestion. In the figure legends of the original manuscript, n=3 meant at least 3 biological replicates, whereas n=30 in the Materials and Methods section meant the sample size. Each group contained 30 mice, for a total of 150 mice in this study, allocated across the in vivo experiments as shown in our response to the public review (Author response image 1). The in vivo data for Figure 1 have been supplemented in the revised manuscript.

      • Individual data points should be included in each of the graphs from this manuscript.

      Thanks for your reminder. The revised manuscript now shows the individual data points in each graph.

      • The quantifications and statistics in the manuscript need improvement. Several experiments are missing quantifications and/or statistical tests (e.g. Figure 1J). Other experiments show a quantification but without any explanation of replicates (e.g. Figures 2B and 2G). None of the experiments show individual data points, and as in the previous comment, these should be included.

      Thanks for your comments. In the revised manuscript, statistical analyses and replicate information have been added for the experimental data, and individual data points are shown in each graph.

      • What is the reason for intragastric administration? The previous studies on which the dosages were based used oral administration (gavage). (Discussed in methods 4.2).

      Thanks for your professional comments. In previous studies, T1DM mice have been treated either by oral administration[8] or by oral gavage[9-11]. Owing to limited experimental conditions, it was not feasible to house each mouse in a single cage, which makes it difficult to control precisely the actual daily dose each mouse receives through oral administration. To ensure that each mouse received a dose according to its body weight and the intended dosage, we used gavage. In addition, oral gavage is more convenient and easier to perform than oral administration. Therefore, eugenol was administered by gavage in the in vivo experiments of this study. These details have been added to the revised manuscript.

      References:

      (8) ZHAO H, WU H, DUAN M, et al. Cinnamaldehyde Improves Metabolic Functions in Streptozotocin-Induced Diabetic Mice by Regulating Gut Microbiota [J]. Drug Des Devel Ther, 2021, 15: 2339-55.

      (9) XING D, ZHOU Q, WANG Y, et al. Effects of Tauroursodeoxycholic Acid and 4-Phenylbutyric Acid on Selenium Distribution in Mice Model with Type 1 Diabetes [J]. Biol Trace Elem Res, 2023, 201(3): 1205-13.

      (10) SUDIRMAN S, LAI C S, YAN Y L, et al. Histological evidence of chitosan-encapsulated curcumin suppresses heart and kidney damages on streptozotocin-induced type-1 diabetes in mice model [J]. Sci Rep, 2019, 9(1): 15233.

      (11) YAO H, SHI H, JIANG C, et al. L-Fucose promotes enteric nervous system regeneration in type 1 diabetic mice by inhibiting SMAD2 signaling pathway in enteric neural precursor cells [J]. Cell Commun Signal, 2023, 21(1): 273.
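      Weight-based gavage dosing, as described in the response above, reduces to simple arithmetic per mouse. The sketch below is a generic illustration, not the authors' protocol: the function name, the 25 g body weight, and the 2 mg/mL stock concentration are hypothetical, while the 5, 10 and 20 mg/kg/day doses are the EUG group doses listed in this response.

```python
def gavage_volume_ml(body_weight_g: float, dose_mg_per_kg: float, stock_mg_per_ml: float) -> float:
    """Volume (mL) of dosing solution that delivers dose_mg_per_kg to one mouse."""
    if body_weight_g <= 0 or stock_mg_per_ml <= 0:
        raise ValueError("body weight and stock concentration must be positive")
    dose_mg = dose_mg_per_kg * body_weight_g / 1000.0  # convert body weight from g to kg
    return dose_mg / stock_mg_per_ml


if __name__ == "__main__":
    STOCK_MG_PER_ML = 2.0  # hypothetical stock concentration
    for dose in (5, 10, 20):  # EUG group doses in mg/kg/day
        volume = gavage_volume_ml(25.0, dose, STOCK_MG_PER_ML)
        print(f"{dose} mg/kg -> {volume:.4f} mL for a 25 g mouse")
```

      Recomputing the volume from each mouse's current body weight keeps the delivered dose proportional to weight as the animals gain or lose weight during the study.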

      • Urine volume cannot be specified per mouse (methods 4.4) unless the mice were single-housed or if the different groups were not mixed, both are not ideal study set-ups. Please clarify in the methods section.

      Thanks for your constructive suggestion. After T1DM modeling, the successfully modeled mice were grouped as described in Methods 4.2: Control, T1DM, T1DM + EUG (5 mg/kg/day), T1DM + EUG (10 mg/kg/day), and T1DM + EUG (20 mg/kg/day). To ensure consistency among groups, each group consisted of 5 mice and received equal amounts of diet (100 g) and drinking water (250 mL) under the same environmental conditions. The urine-soaked area for each group was recorded to quantify urine volume, and the conditions were the same for every group. The description in Methods 4.4 has been improved in the revised manuscript.

      • OGTT (Figure 1H) of week 2 is missing. This is an important control time point, as it would show the effect of STZ before EUG treatment.

      Thanks for your careful review. OGTT (Figure 1H) of week 2 has been added in the revised manuscript.

      • In Figure 1J, the control group does not follow the expected ITT trajectory. If possible, add the 120-minute time point to see if the blood glucose levels return to baseline in the control group. The graph shows increased basal glucose levels in the experimental groups, but no differences in insulin tolerance. It also misses the AUC calculations. It is probably not significantly different, which should be noted in the text.

      Thanks for your suggestion. T1DM is primarily characterized by pancreatic β-cell damage and an absolute deficiency of insulin secretion, which disrupts glucose metabolism in vivo. The oral glucose tolerance test (OGTT) measures plasma glucose concentrations at a series of time points within 2 h after oral gavage of a fixed dose of glucose; it is a standard method for evaluating an individual's capacity to regulate blood glucose and for assessing islet β-cell function. Insulin resistance refers to a reduced ability of insulin to promote glucose uptake and utilization, which the body compensates for by secreting excess insulin, leading to hyperinsulinemia that keeps blood glucose stable. The insulin tolerance test (ITT) is commonly used to detect insulin resistance in T2DM and is of limited relevance to T1DM. Therefore, the ITT experiment in Figure 1J and the related description have been removed from the revised manuscript.

      • The staining and FACS data on the effects of STZ+EUG+/- ML385 are not convincing (Figure 6 and Figure 7) and do not seem to align with the bar graphs and the conclusions in the text. It would be good to include immunofluorescent staining for insulin to further validate the effects of STZ+EUG+/- ML385 on insulin expression.

      Thanks for your comments.

      (1) Because the statistical results did not fully match the representative images, we re-quantified the NRF2 and HO-1 immunofluorescence staining in the revised manuscript, as follows:

      (1) NRF2 immunofluorescence staining:

      Author response image 2.

      Group 1

      Author response image 3.

      Group 2

      Author response image 4.

      Group 3

      Author response image 5.

      Group 4

      Author response image 6.

      Group 5

      Author response image 7.

      NRF2 immunofluorescence staining statistics:

      (2) HO-1 immunofluorescence staining:

      Author response image 8.

      Group 1

      Author response image 9.

      Group 2

      Author response image 10.

      Group 3

      Author response image 11.

      Group 4

      Author response image 12.

      Group 5

      Author response image 13.

      HO-1 immunofluorescence staining statistics:

      (2) The quadrants of the flow cytometry plots are defined as follows: Q1 represents necrotic cells (PI-positive, Annexin V-negative); Q2 represents late apoptotic cells (both PI- and Annexin V-positive); Q3 represents early apoptotic cells (Annexin V-positive, PI-negative); and Q4 represents living cells (both PI- and Annexin V-negative). The number of apoptotic cells was calculated as the sum of late apoptotic cells in Q2 and early apoptotic cells in Q3. As shown in Figure 9F-G, these results were consistent with those observed in Figure 6G, 6J and Figure 7D-F.
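      Under the standard Annexin V/PI convention (Q1 PI-positive/Annexin V-negative, Q2 double-positive, Q3 Annexin V-positive/PI-negative, Q4 double-negative), and counting apoptotic cells as Q2 + Q3 as the authors do, the apoptosis rate can be sketched as below. This is an illustrative sketch only: the gate thresholds and event intensities are hypothetical, and in practice gates are set from unstained and single-stained controls in the cytometer software.

      ```python
      from collections import Counter
      from typing import Iterable, Tuple

      # Each event is (annexin_v_intensity, pi_intensity) in arbitrary fluorescence units.
      Event = Tuple[float, float]

      def classify(event: Event, annexin_gate: float, pi_gate: float) -> str:
          """Assign one event to a quadrant; gate values are assumed, not from the study."""
          annexin, pi = event
          annexin_pos = annexin >= annexin_gate
          pi_pos = pi >= pi_gate
          if pi_pos and not annexin_pos:
              return "Q1"  # necrotic: PI-positive, Annexin V-negative
          if pi_pos and annexin_pos:
              return "Q2"  # late apoptotic: PI-positive, Annexin V-positive
          if annexin_pos:
              return "Q3"  # early apoptotic: Annexin V-positive, PI-negative
          return "Q4"      # live: both negative

      def apoptosis_rate(events: Iterable[Event], annexin_gate: float, pi_gate: float) -> float:
          """Percentage of events in Q2 + Q3 (late + early apoptotic)."""
          counts = Counter(classify(e, annexin_gate, pi_gate) for e in events)
          total = sum(counts.values())
          if total == 0:
              raise ValueError("no events to classify")
          return 100.0 * (counts["Q2"] + counts["Q3"]) / total

      # Hypothetical events and gates, for illustration only.
      events = [(50.0, 900.0), (800.0, 900.0), (800.0, 40.0), (30.0, 20.0)]
      print(apoptosis_rate(events, annexin_gate=100.0, pi_gate=100.0))
      ```

      With one event in each quadrant, half of the events fall in Q2 or Q3, so the sketch reports an apoptosis rate of 50%.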

      (3) MIN6 cells, a mouse islet β-cell line, secrete insulin. STZ treatment causes an absolute decrease in the number of islet β cells, so insulin immunofluorescence staining would mainly reflect the reduced number of MIN6 cells in each group. In addition, insulin protein is typically assessed by ELISA of the insulin secreted into the cell culture supernatant; Figure 6E shows the ELISA results for insulin secretion in the supernatant.

      • The experimental design for the in vitro experiments was unclear from the text. Consider including a schematic to show when cells were treated with STZ, EUG, and ML385.

      Thanks for your suggestion. A schematic of the experimental design for the in vitro experiments has been added to Figure 6A of the revised manuscript.

      • As stated in the Discussion, the use of the insulinoma line Min6 as a model instead of primary pancreatic beta cells is a clear limitation of the study. The mechanistic data would be stronger if validated on a more relevant system (eg. untransformed Islets).

      Thanks for your comments. The MIN6 cell line is an immortalized line derived from mouse islet β-cell tumors (insulinomas) and can reflect functional changes in islet β cells. Because they retain features of mature β cells, MIN6 cells have been widely used as an in vitro model in diabetes research to investigate β-cell function[1, 2, 12]. We therefore used MIN6 cells as the in vitro model in this study. In future studies, we will try to validate our findings in more physiologically relevant in vitro systems (e.g., primary islets).

      References:

      (1) WU M, CHEN W, ZHANG S, et al. Rotenone protects against β-cell apoptosis and attenuates type 1 diabetes mellitus [J]. Apoptosis, 2019, 24(11-12): 879-91.

      (2) LUO C, HOU C, YANG D, et al. Urolithin C alleviates pancreatic β-cell dysfunction in type 1 diabetes by activating Nrf2 signaling [J]. Nutr Diabetes, 2023, 13(1): 24.

      (12) CHEN H, LOU Y, LIN S, et al. Formononetin, a bioactive isoflavonoid constituent from Astragalus membranaceus (Fisch.) Bunge, ameliorates type 1 diabetes mellitus via activation of Keap1/Nrf2 signaling pathway: An integrated study supported by network pharmacology and experimental validation [J]. J Ethnopharmacol, 2024, 322: 117576.

      • The use of small molecule inhibitors such as ML385 can have unspecific effects. Genetic manipulation or the use of siRNAs to inhibit the NRF2 pathway would have been preferable for the in vitro experiments.

      Thanks for your constructive suggestion. ML385 is a commonly used and stable NRF2 inhibitor that has been used in studies of a variety of diseases[13-15]. The MIN6 cells used in this study are difficult to culture and grow slowly, and because of the cytotoxicity of siRNA transfection reagents, a large proportion of MIN6 cells died after transfection. We therefore used the small-molecule inhibitor ML385 in this study. In future studies, we will try to validate our findings using siRNAs.

      References:

      (13) DANG R, WANG M, LI X, et al. Edaravone ameliorates depressive and anxiety-like behaviors via Sirt1/Nrf2/HO-1/Gpx4 pathway [J]. J Neuroinflammation, 2022, 19(1): 41.

      (14) WANG Z, YAO M, JIANG L, et al. Dexmedetomidine attenuates myocardial ischemia/reperfusion-induced ferroptosis via AMPK/GSK-3β/Nrf2 axis [J]. Biomed Pharmacother, 2022, 154: 113572.

      (15) LI J, DENG S H, LI J, et al. Obacunone alleviates ferroptosis during lipopolysaccharide-induced acute lung injury by upregulating Nrf2-dependent antioxidant responses [J]. Cell Mol Biol Lett, 2022, 27(1): 29.

      • The study proposes a mechanism in which EUG-induced disruption of KEAP1 and NRF2 interaction leads to NRF2 translocation to the nucleus and upregulation of proteins required to prevent oxidative stress. In Figure 6H it is unclear whether the nuclear NRF2 increases. Please add quantifications of the immunostainings.

      Thanks for your reminder. Figure 6J shows the quantifications of the immunostainings of NRF2 in the revised manuscript.

      • Some of the figure legends lack important information. In Figure 5A, 6E for instance, what is the protein expression normalized to?

      Thanks for your constructive suggestion. Normalization to a loading control is an essential step in Western blotting because it allows protein levels to be compared across samples. β-Actin is generally not suitable as a loading control for nuclear protein blots because it is predominantly cytoplasmic. Nuclear NRF2 (N-NRF2) was therefore normalized to Lamin B, a nuclear reference protein. The missing normalization information for the Western blots has been added to the figure legends of the revised manuscript.
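      A common densitometry workflow divides each target band by the loading-control band from the same lane (for example, N-NRF2 by Lamin B) and then scales the ratios so the control group averages 1. The response does not describe the exact computation, so the sketch below is an assumption about a typical approach; the band intensities are hypothetical and the helper names are illustrative.

      ```python
      from statistics import mean
      from typing import List, Sequence

      def normalize_to_loading_control(target: Sequence[float], loading: Sequence[float]) -> List[float]:
          """Per-lane ratio of target band intensity to loading-control band intensity."""
          if len(target) != len(loading):
              raise ValueError("need one target and one loading-control value per lane")
          return [t / l for t, l in zip(target, loading)]

      def relative_to_control(ratios: Sequence[float], control_lanes: Sequence[int]) -> List[float]:
          """Scale each ratio so that the mean of the control lanes equals 1."""
          ref = mean(ratios[i] for i in control_lanes)
          return [r / ref for r in ratios]

      # Hypothetical densitometry values (arbitrary units), not data from this study.
      n_nrf2 = [1200.0, 600.0, 900.0]    # nuclear NRF2 bands
      lamin_b = [1000.0, 1000.0, 900.0]  # Lamin B bands from the same nuclear lysates
      ratios = normalize_to_loading_control(n_nrf2, lamin_b)
      print(relative_to_control(ratios, control_lanes=[0]))
      ```

      Normalizing within each lane first corrects for unequal loading; scaling to the control group afterwards makes fold changes comparable across blots.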

      • Please acknowledge previous literature on the effects of EUG/clove oil in diabetes models. The meta-analytical review by Carvalho et al. (DOI: 10.1016/j.phrs.2020.105315) should be cited and discussed.

      Thanks for your suggestion. It has been cited and discussed in the revised manuscript.

      • Consider revising the text for grammar, language mistakes, and readability. The text is not always precise (e.g. in the explanation of gamma-H2AX in the results), does not explain terminology (e.g. the oxidative stress markers - line 204+205), or simplifies conclusions (e.g. "improved islet function" based on glucose tolerance test", line 129).

      Thanks for your comments. These issues have been addressed in the revised manuscript. In addition, we sent the manuscript to a professional English-language editing company to improve the language, and the editing certificate has been submitted as a supplementary document.

      • In the current format, some figures are out of focus. Please make sure to upload a high-quality version for publication.

      Thanks for your suggestion. High-quality versions of the figures have been uploaded. The figures may appear out of focus because the large file was compressed after upload, so all figures in this study have also been uploaded separately for download in the review system.

      Reviewer #2 (Recommendations For The Authors):

      Below are specific points of criticism on the experiments presented.

      (1a) There is no comparison among eugenol treatments with regard to fasting weight, blood glucose, water intake, food intake, and, crucially, OGTT. All three treatments appear to show very similar effects, but has this been statistically assessed? The reported statistical significance of ketonuria between the untreated and high-dose eugenol groups seems exaggerated.

      Thanks for your comments. EUG had a dose-dependent effect on T1DM, and according to Figure 1B-I, 20 mg/kg EUG had the greatest effect. Fasting body weight, blood glucose, water intake, food intake, and OGTT were statistically compared in Figure 1 of the revised manuscript. In addition, we repeated the statistical analysis of ketonuria between the untreated and high-dose eugenol groups, and we have revised our description of eugenol's efficacy to be more objective.
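      Dose groups in an OGTT are often compared using the area under the glucose curve (AUC) computed for each mouse with the trapezoidal rule. The response does not state how the OGTT was analysed, so the sketch below is one common approach; the sampling times and glucose values are assumptions for illustration only.

      ```python
      from typing import Sequence

      def auc_trapezoid(times: Sequence[float], values: Sequence[float]) -> float:
          """Total area under a glucose curve by the trapezoidal rule (units: value x time)."""
          if len(times) != len(values) or len(times) < 2:
              raise ValueError("need matching time and value lists with at least two points")
          area = 0.0
          for i in range(1, len(times)):
              dt = times[i] - times[i - 1]
              if dt <= 0:
                  raise ValueError("time points must be strictly increasing")
              # Each interval contributes its width times the mean of its two endpoints.
              area += dt * (values[i] + values[i - 1]) / 2.0
          return area

      # Assumed sampling schedule (min) and hypothetical blood glucose values (mmol/L).
      times = [0, 15, 30, 60, 120]
      glucose = [20.0, 30.0, 28.0, 25.0, 22.0]
      print(auc_trapezoid(times, glucose))
      ```

      The per-mouse AUC values could then be compared across the EUG dose groups with a suitable multi-group test, which reduces each curve to a single summary value per animal.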

      (b) ITT is not used to detect T1DM (line 126).

      Thanks for your suggestion. T1DM is primarily characterized by pancreatic β-cell damage and an absolute deficiency of insulin secretion, which disrupts glucose metabolism in vivo. The oral glucose tolerance test (OGTT) measures plasma glucose concentrations at a series of time points within 2 h after oral gavage of a fixed dose of glucose; it is a standard method for evaluating an individual's capacity to regulate blood glucose and for assessing islet β-cell function. Insulin resistance refers to a reduced ability of insulin to promote glucose uptake and utilization, which the body compensates for by secreting excess insulin, leading to hyperinsulinemia that keeps blood glucose stable. The insulin tolerance test (ITT) is commonly used to detect insulin resistance in T2DM and is of limited relevance to T1DM. Therefore, the ITT experiment and the related description have been removed from the revised manuscript.

      (2) Here it is hard to reconcile the gradual increase of Ins protein levels in (STZ) and (STZ + increasing eugenol) samples with (a) the results in 1, which suggest that the dose of eugenol does not significantly affect the outcome, and (b) Ins expression, which is essentially undetectable in both STZ and STZ+EUG mice. A likely explanation is that EUG just postpones beta cell death. I assume that these analyses were done in week 10 but it is not stated.

      Thanks for your professional suggestion. The INS bands may appear faint because the figure file was compressed. In fact, STZ treatment significantly decreased INS expression compared with the Control group, and this decrease was alleviated by EUG treatment. However, because INS levels in the Control group were much higher than those in the STZ and STZ + EUG groups, the INS bands in the latter groups are difficult to see. As described in Methods 4.12-4.13, the Western blot and PCR samples were collected from mice at week 10.

      (3) The γH2Ax stainings provided are weak and do not fully correspond to the quantitation - the 5 mg/Kg EUG treatment appears less severe than the 10 mg/Kg. In contrast, changes in the PCD pathway are convincingly demonstrated.

      Thanks for your reminder. γH2AX immunohistochemical staining was quantified within the islets by counting β cells with brown staining, rather than by measuring the brown-stained area. The zoomed-in images of γH2AX staining show that 10 mg/kg EUG produced a greater improvement than 5 mg/kg. As a marker of DNA damage, γH2AX localizes to the nucleus and is absent from the cytoplasm. Therefore, in Figure 4C-D, we quantified the proportion of cells with brown staining, and in Figure 4C, black arrows indicate brown-stained islet β cells.

      (4) Is there a reason for looking at mRNA levels of Ho-1 but not KEAP1 or NQO-1 ? What is the expression of Nrf2 itself at the RNA level? Please give in the text what the abbreviations MDA, SOD, CAT GSH-Px stand for. Are these protein levels or activity assays? Units in the y-axis of graphs?

      Thanks for your constructive suggestion. Primers for KEAP1 and NQO-1 have been synthesized, and the corresponding data have been added to the revised manuscript. NRF2 expression at the RNA level is shown as T-NRF2 (total NRF2). MDA, SOD, CAT, and GSH-Px stand for malondialdehyde, superoxide dismutase, catalase, and glutathione peroxidase, respectively, and these definitions have been added to the revised manuscript. These are activity assays performed on serum, and units have been added to the y-axes of the graphs in the revised manuscript.

      (5) The Ins levels in the culture medium of STZ + ML treated cells are much lower than the levels in STZ treated cells (6D). This is not consistent with the results of Ins cell content or Ins expression as stated (6B and D).

      Thanks for your careful review. In the revised manuscript, Figure 6C shows protein extracted from the cells of each group, whereas Figure 6E shows INS measured in the cell culture supernatant of each group. ML385, an NRF2 inhibitor, effectively suppresses the NRF2 signaling pathway and aggravates MIN6 cell damage; consequently, INS levels in the STZ+ML385 group are lower than those in the STZ group in both Figure 6C and Figure 6E. Although the two assays use different sample sources and the size of the change differs slightly, INS levels in the STZ+ML385 group are consistently lower than those in the STZ group.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Chen et al. identified the role of endocardial id2b expression in cardiac contraction and valve formation through pharmaceutical, genetic, electrophysiology, calcium imaging, and echocardiography analyses. CRISPR/Cas9-generated id2b mutants displayed defective AV valve formation, impaired excitation-contraction coupling, reduced endocardial cell proliferation in the AV valve, retrograde blood flow, and lethality.

      Strengths:

      Their methods, data and analyses broadly support their claims.

      Weaknesses:

      The molecular mechanism is somewhat preliminary.

      We thank the reviewer for the positive assessment of our work. A detailed point-by-point response has been incorporated in the response to “Recommendations for the authors” section.

      Reviewer #2 (Public review):

      Summary:

      Biomechanical forces, such as blood flow, are crucial for organ formation, including heart development. This study by Shuo Chen et al. aims to understand how cardiac cells respond to these forces. They used zebrafish as a model organism due to its unique strengths, such as the ability to survive without heartbeats, and conducted transcriptomic analysis on hearts with impaired contractility. They thereby identified id2b as a gene that is regulated by blood flow and is crucial for proper heart development, in particular for the regulation of myocardial contractility and valve formation. Using both in situ hybridization and transgenic fish, they showed that id2b is specifically expressed in the endocardium, and its expression is affected by both pharmacological and genetic perturbations of contraction. They further generated a null mutant of id2b to show that loss of id2b results in heart malformation and early lethality in zebrafish. Atrioventricular (AV) valve formation and excitation-contraction coupling were also impaired in id2b mutants. Mechanistically, they demonstrate that Id2b interacts with the transcription factor Tcf3b to restrict its activity. When id2b is deleted, the repressor activity of Tcf3b is enhanced, leading to suppression of the expression of nrg1 (neuregulin 1), a key factor for heart development. Importantly, injecting tcf3b morpholino into id2b-/- embryos partially restores the reduced heart rate. Moreover, treatment of zebrafish embryos with the Erbb2 inhibitor AG1478 results in decreased heart rate, in line with a model in which Id2b modulates heart development via the Nrg1/Erbb2 axis. The research identifies id2b as a biomechanical signaling-sensitive gene in endocardial cells that mediates communication between the endocardium and myocardium, which is essential for heart morphogenesis and function.

      Strengths:

      The study provides novel insights into the molecular mechanisms by which biomechanical forces influence heart development and highlights the importance of id2b in this process.

      Weaknesses:

      The claims are in general well supported by experimental evidence, but the following aspects may benefit from further investigation:

      (1) In Figure 1C, the heatmap demonstrates the up-regulated and down-regulated genes upon tricaine-induced cardiac arrest. Aside from the down-regulation of id2b expression, it was also evident that id2a expression was up-regulated. As id2a is a predicted paralog of id2b, it would be interesting to see whether the up-regulation of id2a in response to tricaine treatment was a compensatory response to the down-regulation of id2b expression.

      We thank the reviewer for the comment. As suggested, we performed qRT-PCR analysis to assess id2a expression in tricaine-treated heart. Our results demonstrate a significant upregulation of id2a following the inhibition of cardiac contraction, suggesting a potential compensatory response to the decreased id2b. These new results have been incorporated into the revised manuscript (Figure 1D).

      (2) The study mentioned that id2b is tightly regulated by the flow-sensitive primary cilia-klf2 signaling axis; however, aside from showing the reduced expression of id2b in klf2a and klf2b mutants, there was no further evidence to solidify the functional link between id2b and klf2. It would therefore be ideal, in the present study, to demonstrate how Klf2, which is a transcriptional regulator, transduces biomechanical stimuli to Id2b.

      We have examined the expression levels of id2b in both klf2a and klf2b mutants. The whole mount in situ results clearly demonstrate a decrease in id2b signal in both mutants (Figure 3E). As noted by the reviewer, klf2 is a transcriptional regulator, suggesting that the regulation of id2b may occur at the transcriptional level. However, dissecting the molecular mechanisms underlying the crosstalk between klf2 and id2b is beyond the scope of the present study.

      (3) The authors showed the physical interaction between ectopically expressed FLAG-Id2b and HA-Tcf3b in HEK293T cells. Although the constructs being expressed are of zebrafish origin, it would be nice to show in vivo that the two proteins interact.

      We thank the reviewer for this insightful comment. As suggested, we synthesized Flag-id2b and HA-tcf3b mRNA and co-injected them into 1-cell stage zebrafish embryos. We collected 100-300 embryos at 12, 24, and 48 hpf and performed western blot analysis using the same anti-HA and anti-Flag antibodies validated in HEK293 cell experiments. Despite multiple independent attempts, we were unable to detect clear bands of the tagged proteins in zebrafish embryos. We speculate that this could be due to mRNA instability, translational efficiency, or the low abundance of Id2b and Tcf3b proteins. We have acknowledged these technical limitations in the revised manuscript and clarified that the HEK293 cell data support a potential interaction between Id2b and Tcf3b, while confirming their endogenous interaction will require further investigations (Lines 295-296).

      Reviewer #3 (Public review):

      Summary:

      How mechanical forces transmitted by blood flow contribute to normal cardiac development remains incompletely understood. Using the unique advantages of the zebrafish model system, Chen et al make the fundamental discovery that endocardial expression of id2b is induced by blood flow and required for normal atrioventricular canal (AVC) valve development and myocardial contractility by regulating calcium dynamics. Mechanistically, the authors suggest that Id2b binds to Tcf3b in endocardial cells, which relieves Tcf3b-mediated transcriptional repression of Neuregulin 1 (NRG1). Nrg1 then induces expression of the L-type calcium channel component LRRC1. This study significantly advances our understanding of flow-mediated valve formation and myocardial function.

      Strengths:

      Strengths of the study are the significance of the question being addressed, use of the zebrafish model, and data quality (mostly very nice imaging). The text is also well-written and easy to understand.

      Weaknesses:

      Weaknesses include a lack of rigor for key experimental approaches, which led to skepticism surrounding the main findings. Specific issues were the use of morpholinos instead of genetic mutants for the bmp ligands, cilia gene ift88, and tcf3b, lack of an explicit model surrounding BMP versus blood flow induced endocardial id2b expression, use of bar graphs without dots, the artificial nature of assessing the physical interaction of Tcf3b and Id2b in transfected HEK293 cells, and artificial nature of examining the function of the tcf3b binding sites upstream of nrg1.

      We thank the reviewer for the positive assessment and the constructive suggestions. We have performed additional experiments and data analysis to address these issues. A detailed point-by-point response has been incorporated in the response to “Recommendations for the authors” section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Questions/Concerns:

      (1) In the introduction, it would be beneficial to include background information on the id2b gene, what is currently known about its function in heart development/regeneration and in other animal models than just the zebrafish.

      We thank the reviewer for the constructive suggestion. In the revised manuscript, we have added a paragraph in the Introduction to provide background on id2b and its role in heart development. Specifically, we discuss its function as a member of the ID (inhibitor of DNA binding) family of helix-loop-helix (HLH) transcriptional regulators and highlight its involvement in cardiogenesis in both zebrafish and mouse models. These additions help place our findings in a broader developmental and evolutionary context (Lines 91-100).

      (2) Of the 6 differentially expressed genes identified in Figure 1C, why did the authors choose to focus on id2b and not the other 5 downregulated genes?

      We thank the reviewer for the comments. As suggested, we have added a sentence in the revised manuscript to clarify the rationale for selecting id2b as the focus of the present study (Lines 117-121).

      (3) As the authors showed representative in situ images for id2b expression with blebbistatin treatment in Figure 1E, and tnn2a MO in Figure 1F, it would also be beneficial to show relative mRNA expression levels for id2b in conditions of blebbistatin treatment and tnn2a MO knockdown. In Fig. 1C: id2b is downregulated with tricaine, but id2a is upregulated with tricaine. Do these genes perform similar or different functions, results of gene duplication events?

      We thank the reviewer for the thoughtful suggestion. Our in situ hybridization results demonstrate reduced id2b expression following tricaine, blebbistatin, and tnn2 morpholino treatment. To further validate these observations and enhance cellular resolution, we generated an id2b:eGFP knockin line. Analysis of this reporter line confirmed a significant reduction in id2b expression in the endocardium upon inhibition of cardiac contraction and blood flow (Figure 3A-D), supporting our in situ results. The divergent expression patterns of id2a and id2b in response to tricaine treatment likely reflect functional specification following gene duplication in zebrafish. While our current study focuses on characterizing the role of id2b in zebrafish heart development, the specific function of id2a remains to be determined. 

      (4) In Fig. 2b, could the authors compare the id2b fluorescence with RNAscope ISH at 24, 48, and 72 hpf? RNAscope ISH allows for the visualization of single RNA molecules in individual cells. The authors should at least compare these in the heart to demonstrate that id2b accurately reflects the endogenous id2b expression. In Fig. 2E: Suggest showing the individual fluorescent images for id2b:eGFP and kdrl:mCherry in the same colors as top panel images instead of in black and white. In Fig. 2F: The GFP fluorescence from id2b:eGFP signals looks overexposed.

      We thank the reviewer for the valuable comment. In response, we attempted RNAscope in situ hybridization on embryos carrying the id2b:eGFP reporter to directly compare fluorescent reporter expression with endogenous id2b transcripts. However, we encountered a significant reduction in id2b:eGFP fluorescence following the RNAscope procedure, and even subsequent immunostaining with anti-GFP antibodies yielded only weak signals. Despite this technical limitation, the RNAscope results independently confirmed id2b expression in endocardial cells (Figure 2E), supporting the specificity and cell-type localization observed with the reporter line. As suggested by the reviewer, we have updated Figure 2G to display id2b:eGFP and kdrl:mCherry images in the same color scheme as the top panel to improve consistency and clarity. Additionally, we have replaced the images in Figure 2F to avoid overexposure and better represent the spatial distribution of id2b:eGFP in adult heart.

      (5) In Fig. 3A: are all the images in panel A taken with the same magnification? In Fig. 3e, could the authors show the localization of klf2 and id2b and confirm their expression in the same endocardial cells? In Fig. 3, the authors conclude that klf2-mediated biomechanical signaling is essential for activating id2b expression. This statement is somewhat overstated because they only demonstrated that knockout of klf2 reduced id2b expression.

      We thank the reviewer for these constructive comments. All images presented in Figure 3A were captured using the same magnification, as now clarified in the revised figure legend. We appreciate the reviewer’s question regarding the localization of klf2 and id2b. While we were unable to directly visualize both markers in the same embryos due to the current unavailability of klf2 reporter lines, prior studies using klf2a:H2B-eGFP transgenic zebrafish have demonstrated that klf2a is broadly expressed in endocardial cells, with enhanced expression in the atrioventricular canal region (Heckel et al., Curr Biol 2015, PMID: 25959969; Gálvez-Santisteban et al., Elife 2019, PMID: 31237233). Our id2b:eGFP reporter analysis revealed a similarly broad endocardial expression pattern. These independent observations support the likelihood that klf2a and id2b are co-expressed in the same endocardial cell population.

      We also appreciate the reviewer’s comments regarding the connection between biomechanical signaling and id2b expression. Previous studies have already established that biomechanical cues directly regulate klf2 expression in zebrafish endocardial cells (Vermot et al., PLoS Biol 2009, PMID: 19924233; Heckel et al., Curr Biol 2015, PMID: 25959969). In the present study, we observed a significant reduction in id2b expression in both klf2a and klf2b mutants, suggesting that id2b acts downstream of klf2. Together, these observations support a biomechanical cue-klf2-id2b signaling axis in endocardial cells. Nevertheless, we agree with the reviewer that further investigation is required to elucidate the precise mechanism by which klf2 regulates id2b expression.

      (6) In Fig. 4: What's the mRNA expression for id2b in WT and id2b mutant fish hearts?

      We performed qRT-PCR analysis on purified zebrafish hearts and observed a significant reduction in id2b mRNA levels in id2b mutants compared to wild-type controls. These new results have been incorporated into the revised manuscript (Figure 4A).

      (7) In Fig. 5E, the heart rate shows no difference between id2b+/+ and id2b-/- fish according to echocardiography analysis. However, Fig. 5B indicates a difference in heart rate. Could the authors explain this discrepancy?

      We thank the reviewer for this insightful observation. In our study, we observed a reduction in heart rate in id2b mutants during embryonic stages (120 hpf), as shown in Figure 5B. However, this difference was not evident in adult fish based on echocardiography analysis (Figure 5E). While the exact reason for these changes during development remains unclear, it is possible that the reduction in cardiac output observed in id2b mutants during early development triggers compensatory mechanisms over time, ultimately restoring heart rate in adulthood. Given that heart rate is primarily regulated by pacemaker activity, further investigation will be required to determine whether such compensatory adaptations occur and to elucidate the underlying mechanisms.

      (8) In Fig. 6A: it's a little hard to read the gene names in the left most image in the panel. In Fig. 6B, the authors conducted qRT-PCR analysis of 72 hpf embryonic hearts and validated decreased nrg1 levels in id2b-/- compared to control. Since nrg1 is not specifically expressed in endocardial cells in the developing heart, the authors should isolate endocardial cells and compare nrg1 expression in id2b-/- to control. This would ensure that the loss of id2b affects nrg1 expression derived from endocardial cells rather than other cell types. In Supp Figure S6: Suggest adding an image of the UMAP projection to show tcf3b expression in endocardial cells from sequencing analysis.

      We thank the reviewer for these helpful suggestions. In response, we have increased the font size of gene names in the leftmost panel of Figure 6A to improve readability. Regarding nrg1 expression, we acknowledge the importance of assessing its cell-type specificity. Unfortunately, due to the lack of reliable transgenic or knock-in tools for nrg1, its precise expression pattern in embryonic hearts remains unclear. We attempted to isolate endocardial cells from embryonic hearts using FACS, but the limited number of cells obtained at this stage precluded reliable qRT-PCR analysis. Nonetheless, our data show that id2b is specifically expressed in endocardial cells, and publicly available single-cell RNA-seq datasets also support that nrg1 is predominantly expressed in endocardial, but not myocardial or epicardial cells during embryonic heart development (Figure 6-figure supplement 1). These findings suggest that id2b may regulate nrg1 expression in a cell-autonomous manner within the endocardium. As suggested, we have also added a UMAP image to Figure 7-figure supplement 1 to show tcf3b expression in endocardial cells, further supporting the cell identity in single-cell dataset.

      (9) In Fig. 6, Nrg1 knockout shows no gross morphological defects and normal trabeculation in larvae. Could the authors explain why they propose that endocardial id2b promotes nrg1 synthesis, thereby enhancing cardiomyocyte contractile function? Did Nrg1 knockdown with Mo lead to compromised calcium signaling and cardiac contractile function? Nrg2a has been reported to be expressed in endocardial cells in larvae, and its loss leads to heart function defects. Perhaps Nrg2a plays a more important role than Nrg1.

      We thank the reviewer for raising this important point. Although we did not directly test nrg1 knockout in our study, previous reports have shown that genetic deletion of nrg1 in zebrafish does not impair cardiac trabeculation during embryonic stages (Rasouli et al., Nat Commun 2017, PMID: 28485381; Brown et al., J Cell Mol Med 2018, PMID: 29265764). However, reduced trabecular area and signs of arrhythmia were observed in juvenile and adult fish (Brown et al., J Cell Mol Med 2018, PMID: 29265764), suggesting a potential role for nrg1 in maintaining cardiac structure and function later in development. Whether calcium signaling and cardiac contractility are affected at these stages remains to be determined. Given that morpholino-induced knockdown is limited to early embryonic stages, it is not suitable for assessing nrg1 function in juvenile or adult hearts.

      As noted by the reviewer, nrg2a is expressed in endocardial cells, and its deletion has been associated with cardiac defects (Rasouli et al., Nat Commun 2017, PMID: 28485381). To assess its potential involvement in our model, we performed qRT-PCR analysis and observed increased nrg2a expression in id2b mutant hearts (Author response image 1). This upregulation may reflect a compensatory response to the loss of id2b. Therefore, nrg2a is unlikely to play an essential role in mediating the depressed cardiac function in this context.

      Author response image 1.

      Expression levels of nrg2a. qRT-PCR analysis of nrg2a mRNA in id2b<sup>+/+</sup> and id2b<sup>-/-</sup> adult hearts. Data were normalized to the expression of actb1. N=5 biological replicates, with each sample containing two adult hearts.

      (10) In Fig. 7A of the IP experiment, it is recommended that the authors establish a negative control using control IgG corresponding to the primary antibody source. This control helps to differentiate non-specific background signal from specific antibody signal.

      As suggested, we have included an IgG control corresponding to the primary antibody species in the immunoprecipitation (IP) experiment to distinguish specific from non-specific binding. The updated data are presented in Figure 7A of the revised manuscript.

      (11) In Pg. 5, line 115: there is no reference included for previous literature on blebbistatin.

      We have added the corresponding reference (Line 126, Reference #5).

      In Pg. 5, lines 118-119; pg. 6 line 144: It would be beneficial to include a short sentence describing why a tnnt2a morpholino knockdown was chosen, to help provide mechanistic context to readers.

      We thank the reviewer for the constructive suggestion. In cardiomyocytes, tnnt2a encodes a sarcomeric protein essential for cardiac contraction, and its knockdown is a well-established method for abolishing heartbeat and blood flow in zebrafish embryos, thereby allowing investigation of flow-dependent gene regulation. In the revised manuscript, we have added a sentence and corresponding reference to clarify the rationale for using tnnt2a morpholino in our study (Lines 128-129, Reference #35).

      In Pg. 6, line 140: Results title of "Cardiac contraction promotes endocardial id2b expression through primary cilia but not BMP" is misleading and contradicts the results presented in this section and corresponding figure. For example, the bmp Mo knockdown experiments led to decreased id2b fluorescence and the last statement of this results section contradicts the title that BMP does not promote endocardial id2b in lines 179-180: "Collectively, these results suggest that BMP signaling and blood flow modulate id2b expression in a developmental-stage-dependent manner." It would be helpful to clarify whether BMP signaling is involved in id2b expression or not.

      We apologize for any confusion caused by the section title. Our results demonstrate that id2b expression is regulated by both BMP signaling and biomechanical forces in a developmental-stage-specific manner. Specifically, morpholino-mediated knockdown of bmp2b, bmp4, and bmp7a at the 1-cell stage significantly reduced id2b:eGFP fluorescence at 24 hpf (Figure 3-figure supplement 1A, B), suggesting that id2b is responsive to BMP signaling during early embryonic development. However, treatment with the BMP inhibitor Dorsomorphin during later stages (24-48 or 36-60 hpf) did not significantly alter id2b:eGFP fluorescence intensity in individual endocardial cells, although a modest reduction in total endocardial cell number was noted (Figure 3-figure supplement 1C, D). These results suggest that BMP signaling is required for id2b expression during early development but becomes dispensable at later stages, when biomechanical cues may play a more prominent role. To address this concern and better reflect the data, we have revised the Results section title to: "BMP signaling and cardiac contraction regulate id2b expression". This revised title more accurately reflects the dual regulation of id2b expression (Line 153).

      In line 205: Any speculation on why the hemodynamics was preserved between id2b mutant and WT siblings at 96 hpf?

      As suggested, we have included a sentence to address this observation. “Surprisingly, the pattern of hemodynamics was largely preserved in id2b<sup>-/-</sup> embryos compared to id2b<sup>+/+</sup> siblings at 96 hpf (Figure 4-figure supplement 1E, Video 1, 2), suggesting that the reduced number of endocardial cells in the AVC region was not sufficient to induce functional defects.” (Lines 223-225)

      In line 246: Fig. 6k and 6j are referenced, but should be figure 5k and 5j.

      We have corrected this in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      The manuscript was overall well explained, aside from a few minor points that would help facilitate reader comprehension:

      (1) The last paragraph of the introduction could be a brief summary of the study.

      We thank the reviewer for this constructive suggestion. As recommended, we have included a paragraph in the Introduction section summarizing our key findings to provide clearer context for the study (Lines 96-100).

      (2) Lines 127-128: 'revealed a substantial recapitulation of the... of endogenous id2b expression' may need to be rephrased.

      We thank the reviewer for the valuable suggestion. In the revised manuscript, we have changed the sentence to: “Comparison of id2b:eGFP fluorescence with in situ hybridization at 24, 48, and 72 hpf revealed that the reporter signal closely recapitulates the endogenous id2b expression pattern.” (Lines 137-139)

      (3) Line 182: '... in a developmental-stage-dependent manner' sounds a bit ambiguous, may need to slightly elaborate/ clarify what this means.

      We thank the reviewer for the helpful comment. To improve clarity, we have revised the statement to: “Collectively, these results suggest that id2b expression is regulated by both BMP and biomechanical signaling, with the relative contribution of each pathway varying across developmental stages.” (Lines 195-197)

      Reviewer #3 (Recommendations for the authors):

      (1) The conclusion that BMP signaling prior to 24 hpf is necessary for id2b expression is not fully supported by the data. How do the authors envision pre-linear heart tube BMP signaling impacting endocardial id2b expression during later chamber stages? Id2b reporter fluorescence can be clearly visualized in the linear heart tube in panel B from Figure 1. Does id2b expression initiate prior to contraction? Can the model be refined by showing when id2b endocardial reporter fluorescence is first observed, and whether this early/pre-contractile expression is dependent on BMP signaling?

      We thank the reviewer for the important comment. As suggested, we performed morpholino-mediated knockdown of bmp2b, bmp4, and bmp7a at the 1-cell stage. Live imaging at 24 hpf showed significantly reduced id2b:eGFP fluorescence compared to controls (Figure 3-figure supplement 1A, B), suggesting that id2b is responsive to BMP signaling during early embryonic development. However, treatment with the BMP inhibitor Dorsomorphin during 24-48 or 36-60 hpf did not significantly impact id2b:eGFP fluorescence intensity in individual endocardial cells, although a reduction in endocardial cell number was observed (Figure 3-figure supplement 1C, D). These results suggest that BMP signaling is essential for id2b expression during early embryonic development, while it becomes dispensable at later stages, when biomechanical cues exert a more significant role.

      (2) Overexpressing tagged versions of TCF3b and Id2b in HEK293 cells is a very artificial way to make the major claim that these two proteins interact in endogenous endocardial cells. Can this be done in zebrafish embryonic or adult hearts?

      We thank the reviewer for this insightful comment. As suggested, we synthesized Flag-id2b and HA-tcf3b mRNA and co-injected them into 1-cell stage zebrafish embryos. We collected 100-300 embryos at 12, 24, and 48 hpf and performed western blot analysis using the same anti-HA and anti-Flag antibodies validated in HEK293 cell experiments. Despite multiple independent attempts, we were unable to detect clear bands of the tagged proteins in zebrafish embryos. We speculate that this could be due to mRNA instability, translational efficiency, or the low abundance of Id2b and Tcf3b proteins. We have acknowledged these technical limitations in the revised manuscript and clarified that the HEK293 cell data support a potential interaction between Id2b and Tcf3b, while confirming their endogenous interaction will require further investigations (Lines 295-296).

      (3) The data presented are consistent with the claim that the tcf3b binding sites are functional upstream of nrg1 to repress its transcription. To fully support this idea, those two sites should be disrupted with gRNAs if possible.

      We thank the reviewer for the valuable suggestion. In response, we attempted to disrupt the tcf3b binding sites using sgRNAs. However, we encountered technical difficulties in identifying sgRNAs that specifically and efficiently target these binding sites without affecting adjacent regions. Despite these challenges, our luciferase reporter assays, using tcf3b mRNA overexpression and morpholino knockdown, demonstrated that tcf3b regulates the activity of the nrg1 promoter region. Nevertheless, we acknowledge that future studies using genome editing will be necessary to validate the direct binding of tcf3b to the nrg1 promoter.

      Minor Points:

      (1) Must remove all of the "data not shown" statements and add the primary data to the Supplemental Figures.

      As suggested, we have removed all of the “data not shown” statements and added the original data to the revised manuscript (Figure 4E, middle panels, and Figure 4-figure supplement 1F).

      (2) Must present the order of the panels in the figure as they are presented in the text. One example is Figure 6 where 6E is discussed in the text before 6C and 6D.

      We thank the reviewer for bringing up this important point. We have carefully revised the manuscript to ensure that the order of figure panels matches the sequence in which they are discussed in the text. Specifically, we have reorganized the presentation of Figure 6 so that panels 6C and 6D are discussed before panel 6E, and the figure and corresponding text have been updated accordingly.

      (3) Change the italicized gene names (e.g. tcf3b) to non-italicized names with the first letter capitalized (e.g. Tcf3b) when referencing the protein.

      As suggested, we have revised the manuscript to use non-italicized names with the first letter capitalized when referring to proteins.

      (4) All bar graphs should be replaced with dot bar graphs.

      We have replaced all bar graphs with dot bar graphs throughout the manuscript.

      (5) The new id2b mutant allele should be validated as a true null using quantitative RT-PCR to show that the message becomes destabilized through non-sense mediated decay or by immunostaining/western blot analysis if there is a zebrafish Id2b-specific antibody available.

      We thank the reviewer for this important suggestion. We have performed qRT-PCR analysis and detected a significant reduction in id2b mRNA levels in id2b<sup>-/-</sup> compared to id2b<sup>+/+</sup> controls. These new results are presented in Figure 4A of the revised manuscript.

      (6) Was tricaine used to anesthetize embryos for capturing heart rate and percent fractional area change? This analysis should be performed with no or very limited tricaine as it affects heart rate and systolic function. These parameters were captured at 120 hpf, but the authors should also look earlier, at 72 hpf, a time when valves are not present but calcium transients are necessary to support heart function.

      We thank the reviewer for this important comment. When performing live imaging to assess cardiac contractile function, we used low-dose tricaine (0.16 mg/mL) to anesthetize the zebrafish embryos. We have included this important information in the Methods section (Line 503). As suggested, we have also included the heart function results at 72 hpf, which are now presented in Figure 5-figure supplement 2A-C of the revised manuscript.

      (7) The alpha-actinin staining in Figure 5-supplement 2D is very pixelated and unconvincing. This should be repeated and imaged at a higher resolution.

      As suggested, we have re-performed the α-actinin staining and acquired higher-resolution images. The updated results are now presented in Figure 5-figure supplement 2G of the revised manuscript.

      (8) The authors claim that reductions in id2b mutant heart contractility are due to perturbed calcium transients instead of sarcomere integrity. Why do the authors think that regulation of calcium dynamics was not observed in the DEG enriched GO-terms? Was significant downregulation of cacna1 identified in the bulk RNAseq?

      We thank the reviewer for raising this important point. In our bulk RNAseq dataset comparing id2b mutant and control hearts, GO term enrichment was primarily associated with pathways related to cardiac muscle contraction and heart contraction (Figure 5-figure supplement 1B). We speculate that the transcriptional changes related to calcium dynamics may be relatively subtle and thus were not captured as significantly enriched GO terms. In addition, our qRT-PCR analysis revealed a significant reduction in cacna1c expression in id2b mutant hearts compared to controls, suggesting that id2b deletion impairs calcium channel expression. However, this change was not detected by RNA-seq, likely due to limitations in sensitivity.

      (9) In line 277, the authors say, "To determine whether this interaction occurs in zebrafish, Flag-id2b and HA-tcf3b were co-expressed in HEK293 cells...". This should be re-phrased to, "To determine if zebrafish Id2b and Tcf3b interact in vitro, Flag-id2b and HA-tcf3b were co-expressed in HEK293 cells for co-immunoprecipitation analysis." The sentence in line 275 should be changed to, "....heterodimer with Tcf3b to limit its function as a potent transcriptional repressor."

      We thank the reviewer for these constructive comments and have revised the text accordingly (Lines 291-294).

      (10) Small text corrections or ideas:

      Line 63: emphasized

      We have corrected this in the revised manuscript.

      Line 71: studied signaling pathways

      We have corrected this in the revised manuscript.

      Line 106: the top 6 DEGs (I think that the authors mean top 6 GO-terms) and is Id2b in one of the enriched GO categories?

      id2b is one of the top DEGs. This point has been clarified in the revised manuscript (Lines 116-117).

      Line 125: a knockin id2b:eGFP reporter line

      We have corrected this in the revised manuscript (Line 136).

      Line 138: This paragraph could use a conclusion sentence.

      We have added a conclusion sentence in the revised manuscript (Lines 150-151).

      Line 190: id2b-/- zebrafish experienced early lethality

      We have revised the statement as suggested (Line 206).

      Line 193: The prominent enlargement of the atrium with a smaller ventricle has been characterized as cardiomyopathy in zebrafish (Weeks et al. Cardiovasc Res, 2024, PMID: 38900908), which has also been associated with disruptions in calcium transients (Kamel et al J Cardiovasc Dev Dis, 2021, PMID: 33924051 and Kamel et al, Nat Commun 2021, PMID: 34887420). This information should be included in the text along with these references.

      We thank the reviewer for this helpful suggestion. We have incorporated these important references into the revised manuscript and included the relevant information to acknowledge the established link between atrial enlargement, cardiomyopathy, and disrupted calcium transients in zebrafish models (Reference #41, 42, and 45; Lines 210 and 260).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      [...] Weaknesses

      Showing that A-2 and especially A-3 are outliers in the PCA analysis is useful, but it may be hiding other interesting signals in the data. The other strains are remarkably colinear on these plots, hinting that if the outliers were removed, one main component would emerge along which they are situated. It also seems possible that this additional analysis step would allow the second dimension to better differentiate them in a way that is interesting with respect to their mutator status or mutations in key metabolic or regulatory genes.

      We thank the reviewer for their positive comments and their constructive feedback on the manuscript. Following the reviewer’s recommendation, we performed the PCA analysis on metabolism data after removing A-2 and A-3 data. We have detailed those results below. Consistent with a similar analysis performed on RNA-seq datasets in our previous publication, we find that removing these outliers has only a modest effect on separating mutators from non-mutators. While the new PC2 separates most mutators from the non-mutators, the separation is rather weak. Moreover, we do not see a similar distinction when looking at metabolic data in the stationary phase. In the interest of improving the readability of the manuscript, we recommend not including these analyses in the final manuscript. We have presented the data for the reviewer’s benefit in Author response images 1, 2 and 3.

      Author response image 1.

      Author response image 2.

      Author response image 3.

      There is a missed opportunity to connect some key results to what is known about LTEE mutations that reduce the activity of pykF (pyruvate kinase I). This gene is mutated in all 12 LTEE populations, and often these mutations are frameshifts or transposon insertions that should completely knock out its activity. At first glance, inactivating an enzyme for a step in glycolysis does not make sense when the nutrient source in the growth medium is glucose, even though PykF is only one of two isozymes E. coli encodes for this reaction. There has been speculation that inactivating pykF increases the concentration of phosphoenolpyruvate (PEP) in cells and that this can lead to increased rates of glucose import because PEP is used by the phosphotransferase system of E. coli to import glucose (see https://doi.org/10.1002/bies.20629). The current study has confirmed the higher PEP levels, which is consistent with this model.

      We thank the reviewer for pointing out this missed opportunity. We have expanded the discussion around the role of pykF mutations and the elevated concentrations of PEP observed in our data in section 3.4.

      In the introduction, the papers cited to show the importance of changes in metabolism for adaptation do not seem to fit the focus of this study very well. They stress production of toxins and secondary metabolites, which do not seem to be mechanisms that are at work in the LTEE. I can think of two areas of background that would be more relevant: (1) studies of how bacterial metabolism evolves in adaptive laboratory evolution (ALE) experiments to optimize metabolic fluxes toward biomass production (for example, https://doi.org/10.1038/nature01149), and (2) discussions of how cross-feeding, metabolic niche specialization, and metabolic interdependence evolve in microbial communities, including in other evolution experiments (for example, https://doi.org/10.1073/pnas.0708504105 and https://doi.org/10.1128/mBio.00036-12).

      We thank the reviewer for pointing out missed citations in our introduction. We agree that these papers are relevant to the topic and have added their citations. Additionally, following the suggestion of another reviewer, we have reorganized the introduction so that the concept of the role of metabolism in evolution is presented first and the LTEE second.

      Reviewer #2 (Public Review):

      [...] Overall, this is a significant and well-executed research study. It offers new insights into the complex relationship between genetic changes and observable traits in evolving populations and utilizes metabolomics in the LTEE, a novel approach in combination with RNA-seq and mutation datasets.

      However, the paper's overall clarity is lacking. It is spread too thin and covers many topics without a clear focus. I strongly recommend a substantial rewrite of the manuscript, emphasizing structure and readability. The science is well executed, but the current writing does not do it justice.

      We thank the reviewer for their positive comments and their constructive feedback on the lack of clarity in writing. Following the reviewer’s suggestions, we have rewritten parts of the manuscript and reorganized a few sections to improve readability. We hope the revised manuscript is significantly improved.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      1) Title and Abstract: Add the study organism to the abstract, and probably also the title. Currently, E. coli is not mentioned in either! I'm also not sure that the LTEE is a sufficiently well-known acronym to abbreviate this in the title.

      We have revised the title of the manuscript and now spell out LTEE and included E. coli in the title and the abstract.

      2) Abstract: I would switch the usage of metabolome to metabolism in a few more places. For example, "changes in its metabolism", "networked and convoluted nature of metabolism". The metabolome, the concentrations of all metabolites, is what is being measured, but I think of this as a phenotypic readout of how metabolism is evolving.

      We have changed “metabolome” to “metabolism” in cases where we refer to what is evolving and use “metabolome” when we refer to what is being measured.

      3) Line 16: Technically, the 12 LTEE populations were not initially identical. The Ara- differed from the Ara+ ancestors by one intentional mutation and one unintentional mutation that was not discovered until whole genomes were sequenced. I would rephrase this to "where 12 replicate populations of E. coli are propagated" or something similar so that it can be correct without needing to describe this unnecessary detail.

      The line has been rephrased as suggested.

      4) General Note: The text refers to populations as Ara-3 but the figures use A-3. I'd suggest going with A-3 and similar throughout for consistency.

      Instances of Ara have been changed to A+/-, and a sentence explaining this naming convention has been added to the introduction.

      5) Lines 43-44, 97-98. My understanding is that both S and L ecotypes in A-2 can use both glucose and acetate, but that the differentiation is related to their specialization that leads to each one being better on one or the other nutrient. The descriptions make it sound like each grows at a different time. Also, by definition, cells are not growing during "stationary phase". The change from glucose utilization (and acetate secretion) to acetate utilization during one cycle of growth is better described as a diauxic shift.

      We have reworded this part to remove mention of “growth” during stationary phase and changed the wording such that it no longer sounds like they grow at different times.

      6) Line 54: The statement "provide the ability to test hypotheses from previous data" is vague. Either provide an example or delete.

      We have removed this sentence as suggested.

      7) Lines 71-72: The terms "interphase" and "intraphase" sound too much like parts of the cell cycle. I'd suggest describing the comparisons as between and within growth phases.

      The use of intra and interphase have been changed as suggested.

      8) Line 79: The citrate is presumably still a chelating agent, so change phrasing to "Citrate is present in the medium because it was originally added as a chelating agent" or something similar.

      This sentence has been rewritten as suggested.

      9) Line 83: Write out "mutation accumulations" so it is easier to understand as "the number of mutations that have accumulated".

      The phrase has been changed as suggested.

      10) Line 116: It's unclear whether the abundances of metabolites are "strategies of survival" in stationary phase. An equally valid explanation is that there is less selection on the metabolome to have a specific composition during stationary phase to have high fitness.

      We have added a line about the possibility for alternative hypotheses.

      11) Figure 1: There seems to be some information missing from the legend. What are R06 and R07 in Panels A and B? Is panel D exponential phase and panel E stationary phase?

      This information was inadvertently missing from the caption and has been added.

      12) Figures 2 and 3: Gene names should be in italics. To me, the gray for deleted genes is hard to tell apart from the blue/red. Perhaps you could put a little X in these boxes instead? I think that having a little triangle pointing from each gene or metabolite name its corresponding abundance panel would help the reader track which information goes with which features. In Fig. 3 the placement of L-aspartate is a bit awkward. I'd suggest moving it down so the dashed line does not have to go through the abundance panel.

      These figures have been edited to include small triangles that link a gene or metabolite and its heatmap. Additionally, an X has been added where genes have suffered inactivating mutations and the placement of some elements has been moved to improve overall clarity.

      13) Lines 183-185: It would be easier to see and judge the consistency of these argR related relationships if a correlation graph of some kind was shown, probably as a supplemental figure. This plot could, for example, have genes/metabolites across the x-axis and fold-change on the y-axis with lines connecting points corresponding to each of the twelve populations across these categories (like Fig S8 but with lines added). Alternatively, it could be a heat map with the populations across one axis and the genes/metabolites across the other axis (like Fig S3).

      We have added a supplementary figure consisting of heatmaps showing the consistency of these changes within an evolved line. It is now Figure S9.

      14) Line 195: I think adding a sentence elaborating on what exactly mutation accumulation means in this context would be helpful to readers.

      We have attempted to clarify the meaning of this by specifically stating that it is due to the accumulation of deleterious mutations.

      15) Line 293: Is standard LTEE medium DM25? These omics experiments with the LTEE sometimes use similar media with different glucose concentrations, and this is a very important detail to precisely specify.

      We reference “standard” LTEE medium in the methods section and have additionally specified the amount of sugar to make it clear that we are not supplementing the media with additional sugar.

      16) Figure S8B. Is "cystine" used instead of "cysteine" on purpose here since the compound is oxidized in the metabolomics treatment?

      The use of cystine is intentional; we detect the oxidized compound.

      Reviewer #2 (Recommendations For The Authors):

      Title:

      The abbreviation "LTEE" should not be in the title. Most readers will not recognize what it means. Instead, either the full name of the experiment, "Long-Term Evolution Experiment with E. coli," should be used, or the title should be rephrased to "Linking genotypic and phenotypic changes during a long-term evolution experiment using metabolomics."

      We have spelled out LTEE and included E. coli in the title.

      Abstract:

      Sentence 1: Consider softening the statement: "Do changes in an organism's environment, genome, or gene expression patterns often lead to changes in its metabolome?"

      We have rephrased this sentence to “Changes in an organism's environment, genome, or gene expression patterns can lead to changes in its metabolism”.

      Sentence 4: Use a hyphen for "Long-Term."

      This addition has been made.

      Sentence 4: Replace "transduce" with a more appropriate term: "...how the effects of mutations can be distributed through a cellular network to eventually affect metabolism and fitness."

      We have rewritten this sentence as “to understand how mutations can eventually affect metabolism and perhaps fitness”.

      Sentence 5: Clarify the use of "both" to refer to the ancestor of the LTEE and its descendant populations as two classes.

      We have reworded this sentence so it’s clear that the ancestors and evolved lines are two separate classes “We used mass-spectrometry to broadly survey the metabolomes of the ancestral strains and all 12 evolved lines…”.

      Sentence 6: Reverse the order for better emphasis: "Our work provides a better understanding of how mutations might affect fitness through the metabolome in the LTEE, and thus provides a major step in developing a complete genotype-phenotype map for this experimental system."

      We have rearranged this sentence per the reviewer’s suggestion.

      Introduction:

      Revise the introduction for clarity, readability, and logical narrative progression. Start with the second paragraph to set up the basic scientific principles being studied and then transition to describing the LTEE as a model system to examine those principles.

      The introduction has been rearranged and reworded in parts to increase clarity.

      Sentence 1: Revise for clarity: "The Long-Term Evolution Experiment (LTEE) has studied 12 initially identical populations of Escherichia coli as they have evolved in a carbon-limited, minimal glucose medium under a daily serial transfer regime."

      Sentence 2: Suggestion: "Begun in 1988, the LTEE populations have evolved for more than 75,000 generations, making it the longest-running experiment of its kind."

      Paragraph 2, sentence 2: Italicize "Drosophila."

      Paragraph 3, sentence 2: Make an important distinction: "Ara-3 is unique in that it evolved the ability to grow aerobically on citrate."

      Paragraph 3, sentence 4: Introduce the IS-mediated loss of the rbs operon in the LTEE as if it has not been described elsewhere.

      These suggestions have been incorporated into the manuscript.

      Results:

      Section 3.1: The use of samples from hours 2 and 24 to represent exponential and stationary phase may present some issues. For instance, capturing Ara-3 during its exponential growth on glucose, but not citrate, at hour 2. Furthermore, except for Ara-3, the LTEE populations reach stationary phase after approximately 4 hours, and there could be significant differences between early, mid, and late stationary phase. This possibility should be acknowledged, and future follow-up work should consider exploring these differences.

      We have added sentences in the first paragraph of the results section to include these details. We have also added a short paragraph to the conclusions suggesting additional studies of stationary phase, citing work on evolution of E. coli during long term stationary phase.

      Paragraph 3: While Turner et al. 2017 is an essential reference regarding resource use differences between Ara-3 and other LTEE populations, it would be more suitable to reference Blount et al. 2012 for the mutations that enabled access to citrate. Also, it is important to note that the difference lies in the ability to grow aerobically on citrate, rather than the ability to metabolize it.

      This citation has been added.

      Paragraph 4: As mentioned elsewhere, most LTEE populations exhibit balanced polymorphisms. Therefore, it is more appropriate to state that Ara-2 is the best-understood example of long-term diversity. It is likely that there are important metabolic differences between co-existing lineages in other LTEE populations.

      We now refer to Ara-2 as being the best-understood example of long-term diversity.

      Paragraph 5: The first sentence of this paragraph should likely end with "levels."

      The word “levels” was added to the end of this sentence.

      Figure 3: It is preferable to refer to the "Superpathway of arginine and polyamine biosynthesis," citing EcoCyc as a reference, rather than a descriptor.

      This has been changed to a reference.

      Section 3.3, Paragraph 3: While higher intracellular amino acid abundances may facilitate higher translation rates and faster growth, the higher abundances themselves do not evaluate the hypothesis. To evaluate the hypothesis, it is necessary to demonstrate that higher abundances are associated with higher translation or growth rates. Therefore, the final sentence of this paragraph is not meaningful.

      We have reworded this sentence to say that, given only these data, it is not possible to tell what the additional amino acids are being used for, and that additional experiments are needed to test this hypothesis.

      Section 3.4: The first paragraph of this section misstates how evolution works. The low level of glucose in the LTEE does not drive innovation; instead, innovation occurs at random through the introduction of variation by mutation. Although the existence of the citrate resource acts as a reward that selects for variation that provides access to it, it is essential to remember that evolution is blind to such a reward. Moreover, regarding the evolution of the Cit+ trait, it is incorrect to assert that low glucose contributed to its evolution. As shown by Quandt et al. (2015), it seems probable that Cit+ evolution was potentiated by adaptation to specialization on acetate, which is produced by overflow metabolism resulting from rapid growth on glucose. This rapid growth only occurs when glucose is relatively abundant. The level of glucose seems low to us because it is low relative to traditional levels in bacteriological media, but not to the bacteria.

      We agree that this is a semantic but important distinction. We have reworded this part so as not to suggest that evolution has any forward-thinking properties; evolution is indeed blind to any rewards that might occur as the result of adaptation.

      In general, all instances of "utilize" and its cognates should be replaced with "use" and its cognates.

      Instances of “utilize” have been changed to “use” and its cognates.

      There is some uncertainty about the expectation of ramping up the TCA cycle in the LTEE. Overflow metabolism and acetate production appear to be prevalent in the LTEE, suggesting that many lineages only partially oxidize carbon derived from glucose, thereby bypassing the TCA cycle. While it is possible that this interpretation is incorrect, it would be helpful to see it addressed in the manuscript.

      We agree that this is a plausible hypothesis; we have added a paragraph at the end of this section that discusses the implications of overflow metabolism as an alternative hypothesis.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Several concerns arise from the current study.

      1) Previous studies showed that iTregs generated in vitro from culturing naïve T cells with TGF-b are intrinsically unstable and prone to losing Foxp3 expression due to lack of DNA demethylation in the enhancer region of the Foxp3 locus (Polansky JK et al, Eur J Immunol., 2008, PMID: 18493985). It is known that removing TGF-b from the culture media leads to rapid loss of Foxp3 expression. In the current study, TGF-b was not added to the media during iTreg restimulation, therefore, the primary cause for iTreg instability should be the lack of the positive signal provided by TGF-b. NFAT signal is secondary at best in this culturing condition.

      During restimulation, the absence of TGF-β is necessary to induce iTreg instability; otherwise, the setup is similar to the iTreg-inducing environment (Author response image 1). Moreover, the ultimate goal of this study is to model a scenario that bears some resemblance to clinical treatment, where TGF-β may not be available. The reviewer is correct that TGF-β is essential for iTreg stability; here we study the role played by NFAT in iTreg instability in vitro and, possibly, in the potential clinical use of iTregs.

      Author response image 1.

      Restimulation in the presence of TGF-β maintains the iTreg-inducing environment, resulting in less pronounced instability. Sorted Foxp3-GFP+ iTregs were rested for 1 d, and then rested or restimulated in the presence of TGF-β for 2 d. Percentages of Foxp3+ cells were analyzed by intracellular staining of Foxp3 after 2 d.

      2) It is not clear whether the NFAT pathway is unique in accelerating the loss of Foxp3 expression upon iTreg restimulation. It is also possible that enhancing T cell activation in general could promote iTreg instability. The authors could explore blocking T cell activation by inhibiting other critical pathways, such as NF-kb and c-Jun/c-Fos, to see if a similar effect could be achieved compared to CsA treatment.

      We thank the reviewer for this suggestion. We performed this experiment accordingly to assess the extent of the role that NFAT plays and whether other major pathways are involved. As Author response image 2 shows, inhibiting NFAT alone effectively rescued iTreg instability. Inhibition of NF-κB (BAY 11-7082), c-Jun (SP600125), or the c-Jun/c-Fos complex (T5224) had no discernible effect or, in one case, possibly further reduced stability. These results may indicate that NFAT plays a crucial and specific role in the TCR activation that leads to iTreg instability. Other pathways, at least as this experiment was designed, do not appear to be significantly involved.

      Author response image 2.

      Comparing the effects of NFAT, NF-κB and c-Jun/c-Fos inhibitors on iTreg instability. Sorted Foxp3-GFP+ iTregs were rested for 1 d, then restimulated with anti-CD3 and anti-CD28 in the presence of the listed inhibitors. Percentages of Foxp3+ cells were analyzed by intracellular staining after 2 d of restimulation.

      3) The authors linked chromatin accessibility and increased expression of T helper cell genes to the loss of Foxp3 expression and iTreg instability. However, it is not clear how the former can lead to the latter. It is also not clear whether NFAT binds directly to the Foxp3 locus in the restimulated iTregs and inhibits Foxp3 expression.

      T helper gene activation is likely to cause instability in iTregs through increased secretion of inflammatory cytokines, for example IL-21, as shown in Figure Q9. Further investigation is needed to understand exactly how these genes contribute to Foxp3 instability. With our limited insight, we see two possibilities. (1) IL-21 directly affects Foxp3 through its effect on certain inflammation-related transcription factors (TFs). (2) The relationship is indirect: when iTreg instability appears, NFAT has a greater tendency to bind those inflammatory TFs, promoting the upregulation of Th genes as in activated T cells, while being less likely to bind SMAD and Foxp3, reflecting competition between partners. At the moment, we cannot resolve the intricacies that lead to the differential effects on T helper genes and Treg-related genes.

      With that said, we have previously attempted to explore the direct effect of NFAT on the Foxp3 locus. Foxp3 transcription in iTregs relies primarily on histone modifications such as H3K4me3 (Tone et al., 2008; Lu et al., 2011) rather than on DNA demethylation (Ohkura et al., 2012; Hilbrands et al., 2016). Previous studies have reported that NFAT and SMAD3 can together promote histone acetylation of the Foxp3 gene (Tone et al., 2008). In our previous set of experiments, we simultaneously obtained information on NFAT binding sites and H3K4me3. At the Foxp3 locus, we observed a decreasing trend in NFAT binding to the CNS3 region in restimulated iTregs compared to resting iTregs (Author response image 3). Additionally, H3K4me3 in the Foxp3 CNS3 region decreased upon iTreg restimulation, but inhibiting NFAT nuclear translocation with CsA maintained this modification at its original level (Author response image 3).

      Author response image 3.

      NFAT binding and histone modification at the Foxp3 locus. Genome track visualization of NFAT binding profiles and H3K4me3 profiles at the Foxp3 CNS3 locus in two batches of data.

      Based on these preliminary explorations, we tentatively conclude that NFAT can bind directly to the Foxp3 locus and that this binding decreases upon restimulation, accompanied by a decrease in H3K4me3; together, these observations suggest a close association between NFAT and Foxp3 instability. However, because of limited sample replicates, these data need to be verified before firmer conclusions can be drawn. We speculate that during iTreg induction, NFAT may recruit histone-modifying enzymes to open the Foxp3 CNS3 region, in synergy with SMAD. When instability occurs upon restimulation, NFAT binding to Foxp3 weakens in the absence of SMAD's assistance, reducing the recruitment of histone-modifying enzymes and ultimately inhibiting Foxp3 transcription.

      Reviewer #2 (Public Review):

      (1) Some concerns about data processing and statistic analysis.

      The authors did not provide sufficient information on statistical data analysis; e.g. lack of detailed descriptions about

      -the precise numbers of technical/biological replicates of each experiment

      -the method of how the authors analyze data of multiple comparisons... Student t-test alone is generally insufficient to compare multiple groups; e.g. figure 1.

      These inappropriate data handlings are ruining the evidence level of the precious findings.

      We thank the reviewer for pointing out this important aspect. In the figure legend, numbers of independently-performed experiment repeats are shown as N, biological replicates of each experiment as n. Student’s t test was used for comparing statistical significance between two groups. In this manuscript, all calculations of significant differences were based on comparisons between two groups. There were no multiple conditions compared simultaneously within a single group, and thus, no other calculation methods were used.

      (2) Untransparent data production; e.g. the method of Motif enrichment analysis was not provided. Thus, we should wait for the author's correction to fully evaluate the significance and reliability of the present study.

      Per this reviewer’s request, we have provided detailed descriptions of the data analysis for Fig 5, including both the method section and the Figure legend, as presented below:

      “Peak annotation was performed with the “annotatePeak” function in the R package ChIPseeker (Yu et al, 2015). Cut&Tag signals over a set of genomic regions were calculated using the “computeMatrix” function in deepTools and plotted using the “plotHeatmap” and “plotProfile” functions in deepTools. Motif enrichment analysis was performed using the "findMotifsGenome.pl" command in HOMER with default parameters. Motif occurrences in each peak were identified using FIMO (MEME suite v5.0.4) with the following settings: a first-order Markov background model, a P value cutoff of 10⁻⁴, and PWMs from the mouse HOCOMOCO motif database (v11).”

      Additionally, we have also supplemented the method section with further details on the analysis of RNA-seq and ATAC-seq data.

      (3) Lack of evidence in human cells. I wonder whether human PBMC-derived iTreg cells are similarly regulated.

      This is a rather complicated issue. Human T cells express FoxP3 upon TCR stimulation (PNAS, 103(17): 6659–6664); there, its function is likely to protect T cells from activation-induced cell death, and it does not confer Treg-like properties. In contrast, in mice FoxP3 can be used as an indicator of Tregs. Because FoxP3 is currently not a definitive marker of Tregs in humans, our FoxP3-based readouts do not directly apply. Nevertheless, we have now investigated whether inhibiting calcium signaling or NFAT could enhance the stability of human iTregs. As shown in Author response image 4, the proportion of Foxp3-expressing cells did not change significantly across conditions, whereas MFI analysis revealed that CsA-treated iTregs had higher Foxp3 expression levels than both restimulated and resting iTregs. However, CM-4620 had no significant effect on Foxp3 stability, consistent with its limited efficacy in suppressing long-term activation of human iTregs. In summary, our results suggest that inhibiting NFAT signaling with CsA can help maintain higher levels of Foxp3 expression in human iTregs.

      Author response image 4.

      Effect of inhibiting NFAT and calcium on human iTreg stability. Human naïve CD4 cells from PBMC were subjected to a two-week induction process to generate human iTregs. Subsequently, human iTregs were restimulated for 2 days with Dynabeads, followed by 2 days of rest in the presence of CsA and CM-4620. Four days later, percentages of Foxp3+ cells and Foxp3 mean fluorescence intensity (MFI) were analyzed by intracellular staining.

      (4) NFAT regulation did not explain all of the differences between iTregs and nTregs, as the authors mentioned as a limitation. Also, it is still an open question whether NFAT can directly modulate the chromatin configuration on the effector-type gene loci, or whether NFAT exploits pre-existing open chromatin due to the incomplete conversion of Treg-type chromatin landscape in iTreg cells. The authors did not fully demonstrate that the distinct pattern of chromatin regional accessibility found in iTreg cells is the direct cause of an effector-type gene expression.

      To our surprise, inhibition of NF-κB (BAY 11-7082), c-Jun (SP600125), and the c-Jun/c-Fos complex (T5224) resulted in minimal alterations, as shown in Fig Q1. This seems to argue that NFAT may play a more specific role in the events leading to iTreg instability.

      We hypothesize that NFAT takes advantage of a pre-existing open chromatin state resulting from the incomplete conversion of the chromatin landscape in iTreg cells, because iTreg cells already exhibit inherent chromatin instability after induction, with highly open inflammatory genes. Furthermore, when iTreg cells were restimulated, the subsequent change in chromatin accessibility was relatively limited and was not rescued by NFAT inhibitor treatment (Author response image 5). Therefore, in iTreg cells, we propose that NFAT exploits the easy access to these inflammatory genes, leading to rapid destabilization of iTreg cells in the short term.

      In contrast, tTreg cells start with a relatively stable chromatin structure. It would be interesting to investigate whether NFAT or calcium signaling could disrupt chromatin accessibility during the activation or expansion of tTreg cells. It is possible that NFAT might cause the loss of the originally established demethylation map and open up inflammatory loci, thereby shifting gene transcriptional profiles and likewise leading to instability.

      Author response image 5.

      Chromatin accessibility of rest, restimulated, and CsA/ORAIinh-treated restimulated iTregs. PCA visualization of chromatin accessibility profiles of different cell types. Color indicates cell type.

      To establish a direct relationship between gene locus accessibility and its overexpression, a controlled experimental approach can be employed. One such method involves precise manipulation of the accessibility of a specific genomic locus using CRISPR-mediated epigenetic modifications at targeted loci. Subsequently, the impact of this manipulation on the expression level of the target gene can be precisely examined. By conducting these experiments, it will be possible to determine whether the augmented gene accessibility directly causes the observed gene overexpression.

      Reviewer #1 (Recommendations For The Authors):

      1) It might be helpful to add TGF-b to the iTreg restimulation culture to remove the influence of the lack of TGF-b from the equation, and measure the influence of SOCE/NFAT on iTreg instability.

      Please refer to Author response image 1.

      2) Alternatively, authors can also culture iTreg cells with TGF-b for 2 weeks when they undergo epigenetic changes and become more stabilized (Polansky JK et al, Eur J Immunol., 2008, PMID: 18493985). At this point, the stabilized iTregs can be used to measure the influence of SOCE/NFAT on iTreg instability.

      In the study by Polansky et al., Figure 1 shows that prolonged exposure to TGF-β fails to induce stable Foxp3 expression and demethylation of the Treg-specific demethylated region (TSDR). Based on this finding, we could consider alternative approaches to obtain a more stable iTreg population. One such approach would be to isolate Foxp3+Helios−Nrp1− Treg cells directly from the periphery in vivo; these are known as pTregs. Generally, pTreg cells generated in vivo tend to be more stable than iTreg cells induced in vitro, and they already exhibit partial demethylation of the Treg signature, as shown in Fig 6C (Polansky JK et al, Eur J Immunol., 2008, PMID: 18493985). Investigating the role of NFAT and calcium signaling in pTreg cells would provide further insight into additional roles of NFAT in Treg phenotypic transitions, particularly its role in chromatin accessibility.

      3) In Figure 3, NFAT binding to the inflammatory genes in iTreg cells was even stronger than in activated T conventional cells. This is possibly due to Tconv cells being stimulated only once while iTregs were restimulated. A fair comparison should be conducted with restimulated activated conventional T cells.

      Figure 3 shows the accessibility of inflammatory gene loci, rather than NFAT binding. Comparing restimulated Tconvs with restimulated iTreg cells is indeed a valuable suggestion, as their activation state and the polarization of iTregs could lead to distinct chromatin accessibility. Although one population is activated long-term under standard conditions and the other under iTreg polarization, the chromatin of both activated Tconvs and iTreg cells is very likely highly open, especially at inflammatory genes. This comparison may give us a new perspective on iTreg cells but is unlikely to affect our central conclusion.

      4) In the in vivo experiment in Figure 6, a control condition without OVA immunization should be included as a baseline.

      We have performed this experiment in the absence of OVA, as depicted in Author response image 6. Without OVA immunization, both WT-ORAI and DN-ORAI iTregs were substantially stable, although DN-ORAI iTregs showed a slightly less stable trend. Upon activation with 40 μg and 100 μg of OVA, DN-ORAI iTregs were more stable than WT-ORAI iTregs, maintaining a higher proportion of Foxp3 expression.

      Author response image 6.

      Stability of DN-ORAI iTreg in vivo with or without OVA immunization. WT-ORAI/DN-ORAI-GFP+-transfected CD45.2+ Foxp3-RFP+ OT-II iTregs were transferred i.v. into CD45.1 mice. Recipients were left or immunized with OVA323-339 in Alum adjuvant. On day 5, mLN were harvested and analyzed for Foxp3 expression by intracellular staining.

      Reviewer #2 (Recommendations For The Authors):

      Major

      Some concerns about the data processing and statistic analysis, as mentioned in the public review. In the figure legend, what does it mean e.g. n=3, N=3? Technical triplicate experiments? Three mice? Independently-performed three experiments? The authors should define it at least in the "Statistical analysis" in the method section otherwise the readers cannot determine the reason why they mainly use SEM for the data description.

      Moreover, in some cases, the number of experiments was not sure; e.g., Fig.1B, Fig. 5.

      How did the authors analyze data including multiple comparisons? Student t-test alone is generally insufficient to compare multiple groups; e.g. figure 1.

      We thank the reviewer for pointing out this omission. In the figure legends, the number of independently performed experiment repeats is now shown as N, and the number of biological replicates within each experiment as n. For Fig. 1B, N=2; for Fig 5, we acquired NFAT Cut&Tag data twice (N=2). Student’s t test was used to assess statistical significance between two groups. In this manuscript, all tests of significance were comparisons between two groups. No multiple conditions were compared simultaneously within a single group, and thus no method other than Student's t-test was used.

      In Figure 1A, the difference in suppressiveness seemed subtle. Data collection of multiple doses of Tconv:Treg ratio will enhance the reliability of such kind of analysis.

      We have now attempted the suppression assay with varying Treg:Tconv ratios and observed that the suppressive effect of iTreg was more obvious than that of tTreg when co-cultured at a 1:1 ratio with Tconv cells. However, as the cell number of tTreg and iTreg decreased, the inhibitory effects converged.

      Author response image 7.

      Comparison of suppressive function across multiple Tconv:Treg ratios. CFSE-labelled OT-II T cells were stimulated with OVA-pulsed DCs, then different numbers of Foxp3-GFP+ iTregs and tTregs were added to the culture to suppress OT-II proliferation. After 4 days, CFSE dilution was analyzed. Left, representative histograms of CFSE in divided Tconvs. Right, percentage of divided Tconvs.

      In Figure 3F, to which group did the shaded peaks belong? In this context, the authors should focus on "Activation Region" peaks (open chromatin signature in both TcAct & iTreg defined in Fig. 4D) but I did not find the peak in the focusing DNA regions in TcAct (e.g. the shaded regions in IL-4 loci). The clear attribution of the peaks to the heatmap will enhance the visibility and understanding of readers.

      We have selected some typical peaks that belong to Fig 3D. These genes encompass some T-cell activation-associated transcription factors, such as Irf4, Atf3, as well as multiple members of the Tnf family including Lta, Tnfsf4, Tnfsf8, and Tnfsf14. Additionally, genes related to inflammation such as Il12rb2, Il9, and Gzmc are included. These genes show elevated accessibility upon T-cell activation, partially open in activated nTreg cells, referred to as the "Activation Region." They collectively exhibit high accessibility in iTreg cells, which may contribute to their instability.

      Author response image 8.

      Chromatin accessibility of selected “Activation Region” loci. Genomic tracks showing chromatin accessibility of Irf4, Atf3, Lta, Tnfsf8, Tnfsf4, Tnfsf14, Il12rb2, Il9, and Gzmc in activated Tconvs and iTregs.

      In Figure 4A/S4A, the information on cell death will help the understanding of readers because the sustained SOCE is associated with cell survival as shown in Fig. S2. The authors can discuss the relationships between cell death and Foxp3 retention, which potentially leads to a further interesting question; e.g. the selective/resistance to activation-induced cell death as the identity of Treg cells.

      As shown in Author response image 9, activated iTreg cells do exhibit a certain degree of cell death compared to resting iTreg cells. Inhibition of NFAT by CsA increases the survival of iTreg cells, whereas inhibition of ORAI by CM-4620 leads to more severe cell death. Because CsA and CM-4620 have opposite effects on cell death, there may not be a directly proportional relationship between cell death and Foxp3 expression or Treg identity.

      Author response image 9.

      Relationship between cell death and Foxp3 stability in restimulated iTregs. Sorted Foxp3-GFP+ iTregs were rested for 1 d, then restimulated with anti-CD3 and anti-CD28 in the presence of CsA or CM-4620. After 2 d of restimulation, the percentage of live cells was analyzed by Live/Dead fixable Aqua staining, and percentages of Foxp3+ cells were analyzed by intracellular staining of Foxp3. Upper, live cell percentage of iTregs. Lower, percentage of Foxp3+ iTregs.

      In Figure 5, the information for the data interpretation was insufficient.

      We have provided detailed descriptions of the data analysis for Fig 5, including both the method section and the Figure legend, as presented below:

      “Peak annotation was performed with the “annotatePeak” function in the R package ChIPseeker (Yu et al, 2015). Cut&Tag signals over a set of genomic regions were calculated using the “computeMatrix” function in deepTools and plotted using the “plotHeatmap” and “plotProfile” functions in deepTools. Motif enrichment analysis was performed using the "findMotifsGenome.pl" command in HOMER with default parameters. Motif occurrences in each peak were identified using FIMO (MEME suite v5.0.4) with the following settings: a first-order Markov background model, a P value cutoff of 10⁻⁴, and PWMs from the mouse HOCOMOCO motif database (v11).”

      Additionally, we have also supplemented the method section with further details on the analysis of RNA-seq and ATAC-seq data.

      The correlation between the open chromatin status of the gene loci described in Fig.5E and the expression at mRNA level? e.g.; Do iTreg-Act cells produce a higher level of IL-21 than nTreg-act? The analysis in Fig.5F-G should be performed in parallel with nTreg cells to emphasize the distinct NFAT-chromatin regulation in iTreg cells.

      We have now used ELISA to compare IL-21 secretion by tTregs and iTregs upon activation, with or without CsA or CM-4620 treatment. As shown in Author response image 10, tTregs did not secrete detectable IL-21 regardless of activation status, while iTregs did not secrete IL-21 at rest but did secrete IL-21 after 48 h of activation. Moreover, IL-21 secretion was inhibited by CsA and CM-4620 treatment. This observation aligns with our earlier finding that NFAT binds the gene loci of these cytokines in the nucleus, enhancing their expression and rendering iTregs unstable under inflammatory conditions. These findings further suggest that inhibiting calcium and NFAT signaling may help stabilize iTregs by suppressing the secretion of inflammatory cytokines.

      Author response image 10.

      IL-21 secretion by tTregs and iTregs upon activation. iTregs and tTregs were sorted and restimulated with anti-CD3 and anti-CD28 antibodies in the presence of CsA or CM-4620. Culture supernatants were harvested after 2 d of restimulation, and IL-21 secretion was analyzed by ELISA.

      Performing a parallel comparison of NFAT activity between tTreg and iTreg cells was initially part of our experimental plan. However, it proved challenging in practice, as we encountered difficulties in efficiently infecting tTreg cells with NFAT-flag. Consequently, we could not obtain a sufficient number of tTreg cells for conducting Cut&Tag experiments.

      Based on our observations, we speculate that there might be substantial differences in the accessibility of genes in tTreg cells, leading to considerable variations in the repertoire of genes available for NFAT to regulate. As a result, we expect significant differences in the nuclear localization and activity of NFAT between iTreg and tTreg cells.

      In Figure 6C, what does the FCM plot between Foxp3-CFSE look like?

      The authors can discuss the mechanism of ORAI-DN-mediated through such analysis; e.g. the possibility that selective proliferation defect by ORAI-DN in Foxp3- cells led to an increased percentage of Foxp3, not only just unstable transcription of Foxp3.

      This is an in vitro experiment to assess the suppressive effect of iTregs on Tconv proliferation. CFSE was therefore used to label Tconv cells but not iTreg cells, so we did not assess iTreg proliferation.

      Minor

      Confusing terminology of "tTreg" at line 47, etc. "natural Treg" contains both thymic-derived Treg and periphery-derived Treg cells. (A Abbas et al. Nat Immunol. 2013)

      We have now changed the designation to tTreg at line 47. tTreg refers to thymus-derived regulatory T cells, while nTreg includes both tTreg and pTreg. However, note that the Treg cells used in our study were isolated from the spleens of 2-4-month-old Foxp3-GFP or Foxp3-RFP mice. CD4+ T cells were first enriched using the CD4 Isolation kit, and a FACSAriaII was used to collect CD4+ Foxp3-GFP/RFP+ Treg cells. Subsequent Helios and Nrp-1 staining revealed that the majority of these cells were tTregs, with only approximately 6% being pTregs. Overall, we consider the cells we used to be tTregs.

      In all FCM analyses, the authors should clarify how to detect Foxp3 expression; Foxp3-GFP/Foxp3-RFP/Intracellular staining like Figure S5A (but not specified in the other FCM plots)

      Foxp3 expression throughout the article was assessed by intracellular staining, as described in the methods section, and we have added this specification to each figure legend. We used intracellular staining because in Foxp3-IRES-GFP mice, GFP and Foxp3 are not fused into a single protein but are expressed as separate proteins. During induction, the appearance of GFP can therefore indicate the presence of Foxp3. However, when Foxp3 is unstable, degradation of GFP may not be fully synchronized with that of Foxp3, making GFP an unreliable indicator of Foxp3 levels. Therefore, we used Foxp3-GFP/RFP fluorescence to purify iTreg cells, and intranuclear staining of Foxp3 to monitor instability.

      In Figure 6B, captions are missing from the two graphs on the right side.

      The two restimulation conditions, 0.125+0.25 and 0.25+0.5, have been added to the right side of Fig 6B.

      In Figure S2, the x- and y-axis annotations are missing.

      Added.

      A reference is missing at line 292.

      References 42-46 have been added.

      In the methods section, the authors should provide further product information for antibodies and reagents to enhance reproducibility and transparency. A list that clarifies the suppliers, antibody clones, product IDs, etc. is encouraged. The authors did not specify the supplier of the recombinant proteins or which type of TGF-beta was used (TGF-beta 1, 2, or 3?).

      A detailed description of the mice, antibodies, peptides, recombinant proteins, commercial kits, and software has been added to the methods section.

      In the methods section, the authors should clarify which Foxp3 reporter strain was used, as many Foxp3 reporter strains exist. At line 373, is "FoxP3-IRES-GFP transgenic mice" accurate? Is this a knock-in strain or a BAC transgenic?

      This strain was a gift from Hai Qi’s lab at Tsinghua University, which obtained it from the Jackson Laboratory (strain name B6.Cg-Foxp3tm2Tch/J, Strain #: 006772). It is a knock-in strain: an IRES-EGFP-SV40 polyA sequence was inserted immediately downstream of the endogenous Foxp3 translational stop codon, but upstream of the endogenous polyA signal, generating a bicistronic locus encoding both Foxp3 and EGFP.

      The age of the mice used in the experiments should be specified, and vague words such as "young" should not be used in the method descriptions (e.g., line 405).

      The detailed mouse age has been added in the methods section. “To prepare Tconv, tTreg and iTreg for experiments, spleen was isolated from 2-4-month-old Foxp3-GFP mice for Tconv and tTreg sorting, and 6-week-old mice for iTreg induction.”

      How the original ATAC-seq and Cut & Tag data were generated is not described in the methods section.

      This has now been added to the methods section.

      The reference section is incomplete, and the style is not consistent, e.g., refs 7, 24, 25, 26 ... I gave up checking them all.

      The styles of refs 7, 22, 24, 26, 28, 31, 33 and 35 have been corrected.

      Changes in manuscript:

      Author Name: “Huiyun Lv” to “Huiyun Lyu”.

      Fig 1A was updated according to Reviewer 2’s suggestion.

      Fig S3E and the associated description were added according to Reviewer 2’s suggestion.

      Fig S4C and the associated description were added according to Reviewer 1’s suggestion.

      Fig 5H and the associated description were added according to Reviewer 2’s suggestion.

      Fig 6D was updated according to Reviewer 1’s suggestion.

      Fig 2D was corrected: the labels for gapdh and actin in the iTreg panel had been inadvertently switched. The mistake has been rectified, and the original gel image will be provided.

      Fig 2A and Fig 4A were updated.

      The styles of Fig 6B and Fig S2A were modified.

      Method:

      Mice: more description of the FoxP3-IRES-GFP strain was added.

      Flow Cytometry sorting and FACS: the detailed mouse age has been added. RNA-seq analysis, ATAC-sequencing, ATAC-seq analysis, Cut&Tag assay, Cut&Tag data analysis: more description was added.

      Statistical analysis: the sentence “Numbers of independently-performed experiment repeats are shown as N, biological replicates of each experiment as n.” was added.

      References: Refs 42-46 and 49-52 were added. The styles of refs 7, 22, 24, 26, 28, 31, 33 and 35 were corrected.

      A detailed description of the mice, antibodies, peptides, recombinant proteins, commercial kits, and software has been provided.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides potentially important, new information about the combination of information from the two eyes in humans. The data included frequency tagging of each eye's inputs and measures reflecting both cortical (EEG) and sub-cortical processes (pupillometry). Binocular combination is of potentially general interest because it provides, in essence, a case study of how the brain combines information from different sources and through different circuits. The strength of supporting evidence appears to be solid, showing that temporal modulations are combined differently from spatial modulations, with additional differences between subcortical and cortical pathways. However, the manuscript's clarity could be improved, including by providing more convincing motivation for the approaches used.

      We thank the editor and reviewers for their detailed comments and suggestions regarding our paper. We have implemented most of the suggested changes. In doing so we noticed a minor error in our analysis code that affected the functions shown in Figure 2e (previously Figure 1e), and have fixed this and rerun the modelling. Our main results and conclusions are unaffected by this change. We have also added a replication data set to the Appendix, as this bears on one of the points raised by a reviewer, and included a co-author who helped run this experiment.

      Reviewer #1 (Public Review):

      In this paper, the interocular/binocular combination of temporal luminance modulations is studied. Binocular combination is of broad interest because it provides a remarkable case study of how the brain combines information from different sources. In addition, the mechanisms of binocular combination are of interest to vision scientists because they provide insight into when/where/how information from two eyes is combined.

      This study focuses on how luminance flicker is combined across two eyes, extending previous work that focused mainly on spatial modulations. The results appear to show that temporal modulations are combined in different ways, with additional differences between subcortical and cortical pathways.

      1. Main concern: subcortical and cortical pathways are assessed in quite different ways. On the one hand, this is a strength of the study (as it relies on unique ways of interrogating each pathway). However, it is also a problem when the results from the two approaches are combined, leading to an attribution problem: are the differences due to genuine differences between cortical and subcortical binocular combination, or are they due to the different methods? For example, the results suggest that subcortical binocular combination is nonlinear, but it is not clear where this nonlinearity occurs. If it occurs in the final stage that controls pupillary responses, it has quite different implications.

      At the very least, this work should clearly discuss the limitations of using different methods to assess subcortical and cortical pathways.

      The modelling indicates that the nonlinearity is primarily interocular suppression, and that this is stronger in the subcortical pathway. Moreover, the suppression acts before binocular combination, so this is quite a specific location. We now say more about this in the Discussion, and also suggest that fMRI might avoid the limits that using different methods places on the conclusions we can draw.

      2. Adding to the previous point, the paper needs to do a better job of justifying not only the specific methods but also other details of the study (e.g., why certain parameters were chosen). To illustrate with a semi-positive example: the reason for using 2Hz modulation is explained only on page 7, whereas the methods for 2Hz modulation are described in detail on page 3. No justification is provided for most of the other experimental choices. The paper should be expanded to better explain this area of research to non-experts. A notable strength of this paper is that it should be of interest to those not working in this particular field, but this goal is not achieved if the paper is written for a specialist audience. In particular, the introduction should be expanded to better explain this area of research, the methods should include justifications for important empirical decisions, and the discussion should make the work more accessible (in addition to addressing the issues raised in point 1 above). The results also need more context. For example, why do the EEG data have overtones but the pupillometry data do not?

      We now explain the choice of frequency in the final paragraph of the introduction as follows:

      ‘We chose a primary flicker frequency of 2Hz as a compromise between the low-pass pupil response (see Barrionuevo et al., 2014; Spitschan et al., 2014), and the relatively higher-pass EEG response (Regan, 1966).’

      We also mention why the pupil response is low-pass:

      ‘The pupil response can be modulated by periodic changes in luminance, and is temporally low-pass (Barrionuevo et al., 2014; Spitschan et al. 2014), most likely due to the mechanical limitations of the iris sphincter and dilator muscles’.

      Reviewer #2 (Public Review):

      Previous studies have extensively explored the rules by which patterned inputs from the two eyes are combined in the visual cortex. Here the authors explore these rules for un-patterned inputs (luminance flicker) at both the level of the cortex, using Steady-State Visual Evoked Potentials (SSVEPs) and at the sub-cortical level using pupillary responses. They find that the pattern of binocular combination differs between cortical and sub-cortical levels with the cortex showing less dichoptic masking and somewhat more binocular facilitation.

      Importantly, the present results with flicker differ markedly from those with gratings (Hou et al., 2020, J Neurosci; Baker and Wade, 2017, Cereb Cortex; Norcia et al., 2000, Neuroreport; Brown et al., 1999, IOVS). When SSVEP responses are measured under dichoptic conditions where each eye is driven with a unique temporal frequency, in the case of grating stimuli, the magnitude of the response in the fixed contrast eye decreases as a function of contrast in the variable contrast eye. Here, by contrast, the response increases by varying (small) magnitudes. The authors favor a view that cortex and perception pool binocular flicker inputs approximately linearly using cells that are largely monocular. The lack of a decrease below the monocular level when modulation strength increases is taken to indicate that the normalization mechanism previously observed in pattern vision does not play a substantial role in the processing of flicker. The authors present a computational model of binocular combination that captures features of the data when fit separately to each data set. Because the model has no frequency dependence and is based on scalar quantities, it cannot make joint predictions for the multiple experimental conditions, which is one of its limitations.

      A strength of the current work is the use of frequency-tagging of both pupil and EEG responses to measure responses for flicker stimuli at two anatomical levels of processing. Flicker responses are interesting but have been relatively neglected. The tagging approach allows one to access responses driven by each eye, even when the other eye is stimulated, which is a great strength. The tagging approach can be applied at both levels of processing at the same time when stimulus frequencies are low, which is an advantage as they can be directly compared. The authors demonstrate the versatility of frequency tagging in a novel experimental design which may inspire other uses, both within the present context and others. A disadvantage of the tagging approach for studying sub-cortical dynamics via pupil responses is that it is restricted to low temporal frequencies given the temporal bandwidth of the pupil. The inclusion of a behavioral measure and a model is also a strength, but there are some limitations in the modeling (see below).

      The authors suggest in the discussion that luminance flicker may preferentially drive cortical mechanisms that are largely monocular and in the results that they are approximately linear in the dichoptic cross condition (no effect of the fixed contrast stimulus in the other eye). By contrast, prior research using dichoptic dual frequency flickering stimuli has found robust intermodulation (IM) components in the VEP response spectrum (Baitch and Levi, 1988, Vision Res; Stevens et al., 1994, J Ped Ophthal Strab; France and Ver Hoeve, 1994, J Ped Ophthal Strab; Suter et al., 1996, Vis Neurosci). The presence of IM is a direct signature of binocular interaction and suggests that at least under some measurement conditions, binocular luminance combination is "essentially" non-linear, where essential implies a point-like non-linearity such as squaring of excitatory inputs. The two views are in striking contrast. It would thus be useful if the authors could show spectra for the dichoptic, two-frequency conditions to see if non-linear binocular IM components are present.

      This is an excellent point, and one whose importance we had not previously appreciated. We have generated a figure (Fig 8) showing the IM responses in the cross-frequency conditions. There is a clear response at 0.4Hz (2-1.6Hz) in the pupillometry data, and at 3.6Hz (2+1.6Hz) in the EEG data. We therefore agree that this shows the system is essentially nonlinear, despite binocular combination appearing approximately linear. We now say in the Discussion:

      ‘In the steady-state literature, one hallmark of a nonlinear system is the presence of intermodulation responses at the sums and differences of fundamental flicker frequencies (Baitch & Levi, 1988; Tsai et al., 2012). In Figure 8 we plot the amplitude spectra of conditions from Experiment 1 in which the two eyes were stimulated at different frequencies (2Hz and 1.6Hz) but at the same contrast (48%; these correspond to the binocular cross and dichoptic cross conditions in Figures 2d,e and 3d,e). Consistent with the temporal properties of pupil responses and EEG, Figure 8a reveals a strong intermodulation difference response at 0.4Hz (red dashed line), and Figure 8b reveals an intermodulation sum response at 3.6Hz (red dashed line). The presence of these intermodulation terms is predicted by nonlinear gain control models of the type considered here (Baker and Wade, 2017; Tsai et al., 2012), and indicates that the processing of monocular flicker signals is not fully linear prior to the point at which they are combined across the eyes.’

      If the IM components are indeed absent, then there is a question of the generality of the conclusions, given that several previous studies have found them with dichoptic flicker. The previous studies differ from the authors' in terms of larger stimuli and in their use of higher temporal frequencies (e.g. 18/20 Hz, 17/21 Hz, 6/8 Hz). Either retinal area stimulated (periphery vs central field) or stimulus frequency (high vs low) could affect the results and thus the conclusions about the nature of dichoptic flicker processing in cortex. It would be interesting to sort this out as it may point the research in new directions.

      This is a great suggestion about retinal area. As chance would have it, we had already collected a replication data set where we stimulated the periphery, and we now include a summary of this data set as an Appendix. In general the results are similar, though we obtain a measurable (though still small) second harmonic response in the pupillometry data with this configuration, which is a further indication of nonlinear processing.

      Whether these components are present or absent is of interest in terms of the authors' computational model of binocular combination. It appears that the present model is based on scalar magnitudes, rather than vectors as in Baker and Wade (2017), so it would be silent on this point. The final summation of the separate eye inputs is linear in the model. In the first stage of the model, each eye's input is divided by a weighted input from the other eye. If we take this input as inhibitory, then IM would not emerge from this stage either.

      We have performed the modelling using scalar values here for simplicity and transparency, and to make the fitting process computationally feasible (even so, it took several days). This type of model is quite capable of processing sine waves as inputs, and producing a complex output waveform which is Fourier transformed and then analysed in the same way as the experimental data (see e.g. Tsai, Wade & Norcia, 2012, J Neurosci; Baker & Wade, 2017, Cereb Cortex). However, our primary aim here was to fit the model, and make inferences about the parameter values, rather than to use a specific set of parameter values to make predictions. We now say more about this family of models and how they can be applied in the methods section:

      “Models from this family can handle both scalar contrast values and continuous waveforms (Tsai et al., 2012) or images (Meese and Summers, 2007) as inputs. For time-varying inputs, the calculations are performed at each time point, and the output waveform can then be analysed using Fourier analysis in the same way as for empirical data. This means that the model can make predictions for the entire Fourier spectrum, including harmonic and intermodulation responses that arise as a consequence of nonlinearities in the model (Baker and Wade, 2017). However for computational tractability, we performed fitting here using scalar contrast values.”

      As a side point, there are many ways to produce intermodulation terms, so they are not as diagnostic as one might suppose. We demonstrate this in Author response image 1, which shows the Fourier spectra produced by a toy model that multiplies its two inputs together (for an interactive Python notebook that allows various nonlinearities to be explored, see here). Intermodulation terms also arise when two inputs of different frequencies are summed, followed by exponentiation. So it would be possible to have an entirely linear binocular summation process, followed by squaring, and have this generate IM terms (not that we think this is necessarily what is happening in our experiments).
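      The time-domain procedure described in this passage can be illustrated with a minimal Python sketch. It is not the fitted model from the paper: the single-stage divisive form (each eye divided by a constant `z`, its own drive, and `w` times the other eye's drive, then summed across eyes), the values `z = 1` and `w = 1`, the 0-1 drive waveforms, and the 100Hz sampling rate are all assumptions chosen for illustration. Flicker at 2Hz and 1.6Hz is passed through the model sample by sample, and the output is Fourier analysed in the same way as empirical data.

```python
import numpy as np

FS = 100.0   # sampling rate in Hz (assumed)
N = 1000     # 10 s of samples, giving 0.1 Hz frequency resolution
t = np.arange(N) / FS

def flicker(freq_hz):
    """Non-negative drive signal flickering between 0 and 1 (assumed input format)."""
    return 0.5 * (1.0 + np.sin(2.0 * np.pi * freq_hz * t))

def divisive_model(left, right, z=1.0, w=1.0):
    """Hypothetical single-stage interocular suppression, applied at each time point."""
    resp_left = left / (z + left + w * right)
    resp_right = right / (z + right + w * left)
    return resp_left + resp_right  # binocular summation of the two channels

def amplitude_spectrum(signal):
    """Single-sided amplitude spectrum of a real-valued signal."""
    amps = np.abs(np.fft.rfft(signal)) * 2.0 / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / FS)
    return freqs, amps

output = divisive_model(flicker(2.0), flicker(1.6))
freqs, amps = amplitude_spectrum(output)

def amp_at(freq_hz):
    """Amplitude in the FFT bin closest to freq_hz."""
    return amps[np.argmin(np.abs(freqs - freq_hz))]

for f in (0.4, 1.6, 2.0, 3.6):
    print(f"{f:.1f} Hz: {amp_at(f):.4f}")
```

      The 0.4Hz and 3.6Hz entries are intermodulation terms. In this sketch they arise because each eye's denominator contains the other eye's signal, even though the final binocular summation is linear.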

      Author response image 1
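      The point that intermodulation terms are not uniquely diagnostic can be checked with a short Python sketch in the spirit of Author response image 1 (it is not the notebook mentioned above). Two sinusoids at 2Hz and 1.6Hz are combined in three hypothetical ways: linear summation, multiplication, and summation followed by squaring. The sampling rate and 10-second duration are assumptions.

```python
import numpy as np

FS = 100.0
N = 1000  # 10 s at 100 Hz -> 0.1 Hz frequency resolution

t = np.arange(N) / FS
left = np.sin(2 * np.pi * 2.0 * t)   # 2 Hz input
right = np.sin(2 * np.pi * 1.6 * t)  # 1.6 Hz input
freqs = np.fft.rfftfreq(N, d=1.0 / FS)

def spectrum(x):
    """Single-sided amplitude spectrum."""
    return np.abs(np.fft.rfft(x)) * 2.0 / len(x)

def amp(spec, freq_hz):
    """Amplitude in the bin closest to freq_hz."""
    return spec[np.argmin(np.abs(freqs - freq_hz))]

outputs = {
    "linear sum": left + right,
    "product": left * right,               # multiplicative toy model
    "sum then square": (left + right) ** 2,  # linear summation, then exponentiation
}
for name, y in outputs.items():
    s = spectrum(y)
    print(f"{name:16s} 0.4 Hz: {amp(s, 0.4):.3f}  3.6 Hz: {amp(s, 3.6):.3f}")
```

      Only the linear sum lacks energy at 0.4Hz and 3.6Hz. Both nonlinear versions produce intermodulation terms, including the one in which the inputs are summed linearly before the nonlinearity is applied.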

      Related to the model: One of the more striking results is the substantial difference between the dichoptic and dichoptic-cross conditions. They differ in that the latter has two different frequencies in the two eyes while the former has the same frequency in each eye. As it stands, if fit jointly on the two conditions, the model would make the same prediction for the dichoptic and dichoptic-cross conditions. It would also make the same prediction whether the two eyes were in-phase temporally or in anti-phase temporally. There is no frequency/phase-dependence in the model to explain differences in these cases or to potentially explain different patterns at the different VEP response harmonics. The model also fits independently to each data set which weakens its generality. An interpretation outside of the model framework would thus be helpful for the specific case of differences between the dichoptic and dichoptic-cross conditions.

      As mentioned above, the limitations the reviewer highlights are features of the specific implementation, rather than the model architecture in general. Furthermore, although this particular implementation of the model does not have separate channels for different phases, these can be added (see e.g. Georgeson et al., 2016, Vis Res, for an example in the spatial domain). In future work we intend to explore the phase relationship of flicker, but do not have space to do this here.

      Prior work has defined several regimes of binocular summation in the VEP (Apkarian et al., 1981, EEG Journal). It would be useful for the authors to relate their use of the terms "facilitation" and "suppression" to these regimes and to justify/clarify differences in usage, when present. In Experiment 1, Fig. 3 shows cases where the binocular response is more than twice the monocular response. Here the interpretation is clear: the responses are super-additive and would be classed as involving facilitation in the Apkarian et al. framework. In the Apkarian et al. framework, a ratio of 2 indicates independence/linearity. Ratios between 1 and 2 indicate sub-additivity and are diagnostic of the presence of binocular interaction, but are noted by them to be difficult to interpret mechanistically. This should be discussed. A ratio of <1 indicates frank suppression, which is not observed here with flicker.

      Operationally, we use facilitation to mean an increase in response relative to a monocular baseline, and suppression to mean a decrease in response. We now state this explicitly in the Introduction. Facilitation greater than a factor of 2 indicates some form of super-additive summation. In the context of the model, we also use the term suppression to indicate divisive suppression between channels; however, this feature does not always result in empirical suppression (it depends on the condition and the inhibitory weight). We think that interpretation of results such as these is greatly aided by the use of a computational modelling framework, which is why we take this approach here. The broad applicability of the model we use in the domain of spatial contrast lends it credibility for our stimuli here.

      Can the model explore the full range of binocular/monocular ratios in the Apkarian et al framework? I believe much of the data lies in the "partial summation" regime of Apkarian et al and that the model is mainly exploring this regime and is a way of quantifying varying degrees of partial summation.

      Yes, in principle the model can produce the full range of behaviours. When the weight of suppression is 1, binocular and monocular responses are equal. When the weight is zero, the model produces linear summation. When the weight is greater than 1, suppression occurs. It is also possible to produce super-additive summation effects, most straightforwardly by changing the model exponents. However this was not required for our data here, and so we kept these parameters fixed. We agree that the model is a good way to unify the results across disparate experimental paradigms, and that is our main intention with Figure 7i.
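      The relationship between the suppressive weight and the binocular/monocular ratio can be sketched in Python. This is a deliberately simplified, hypothetical single-stage model rather than the fitted model from the paper: it has no exponents (so it cannot reach the super-additive regime, which the response notes requires changing the exponents), and the constant `z = 0.01` is an assumption. The limiting ratios of 2, 1 and below 1 hold only when contrast is large relative to `z`. The labels follow the regimes summarised by Reviewer #2 from Apkarian et al. (1981).

```python
def model_response(c_left, c_right, w, z=0.01):
    """Hypothetical divisive interocular suppression followed by linear summation."""
    left = c_left / (z + c_left + w * c_right)
    right = c_right / (z + c_right + w * c_left)
    return left + right

def binocular_ratio(w, contrast=1.0):
    """Ratio of the binocular response to the monocular response at one contrast."""
    monocular = model_response(contrast, 0.0, w)
    binocular = model_response(contrast, contrast, w)
    return binocular / monocular

def classify_ratio(ratio, tol=0.05):
    """Label a binocular/monocular ratio (regimes after Apkarian et al., 1981)."""
    if ratio > 2.0 + tol:
        return "super-additive facilitation"
    if abs(ratio - 2.0) <= tol:
        return "linear summation"
    if abs(ratio - 1.0) <= tol:
        return "binocular equals monocular"
    if ratio > 1.0:
        return "partial summation"
    return "suppression"

for w in (0.0, 0.5, 1.0, 2.0):
    r = binocular_ratio(w)
    print(f"w = {w:.1f}: ratio = {r:.3f} ({classify_ratio(r)})")
```

      In this toy form, a weight of 0 gives linear summation, 1 makes binocular and monocular responses approximately equal, intermediate weights fall in the partial-summation regime, and weights above 1 give suppression.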

      Reviewer #3 (Public Review):

      This manuscript describes interesting experiments on how information from the two eyes is combined in cortical areas, sub-cortical areas, and perception. The experimental techniques are strong and the results are potentially quite interesting. But the manuscript is poorly written and tries to do too much in too little space. I had a lot of difficulty understanding the various experimental conditions, the complicated results, and the interpretations of those results. I think this is an interesting and useful project so I hope the authors will put in the time to revise the manuscript so that regular readers like myself can better understand what it all means.

      Now for my concerns and suggestions:

      The experimental conditions are novel and complicated, so readers will not readily grasp what the various conditions are and why they were chosen. For example, in one condition different flicker frequencies were presented to the two eyes (2Hz to one and 1.6Hz to the other) with the flicker amplitude fixed in the eye presented to the lower frequency and the flicker amplitude varied in the eye presented to the higher frequency. This is just one of several conditions that the reader has to understand in order to follow the experimental design. I have a few suggestions to make it easier to follow. First, create a figure showing graphically the various conditions. Second, come up with better names for the various conditions and use those names in clear labels in the data figures and in the appropriate captions. Third, combine the specific methods and results sections for each experiment so that one will have just gone through the relevant methods before moving forward into the results. The authors can keep a general methods section separate, but only for the methods that are general to the whole set of experiments.

      We have created a new figure (now Fig 1) that illustrates the conditions from Experiment 1, and is referenced throughout the paper. We have kept the names constant, as they are rooted in a substantial existing literature, and it will be confusing to readers familiar with that work if we diverge from these conventions. We did consider separating out the methods section, but feel it helps the flow of the results section to keep it as a single section.

      I wondered why the authors chose the temporal frequencies they did. Barrionuevo et al (2014) showed that the human pupil response is greatest at 1Hz and is nearly a log unit lower at 2Hz (i.e., the change in diameter is nearly a log unit lower; the change in area is nearly 2 log units lower). So why did the authors choose 2Hz for their primary frequency? And why did the authors choose 1.6Hz which is quite close to 2Hz for their off frequency? The rationale behind these important decisions should be made explicit.

      We now explain this in the Introduction as follows:

      ‘We chose a primary flicker frequency of 2Hz as a compromise between the low-pass pupil response (see Barrionuevo et al., 2014; Spitschan et al., 2014), and the relatively higher-pass EEG response (Regan, 1966).’

      It is a compromise frequency that is not optimal for either modality, but generates a measurable signal in both. The choice of 1.6Hz was made for similar reasons: for a 10-second trial (frequency resolution 0.1Hz), it is four frequency bins away from the primary frequency, so it can be unambiguously isolated in the spectrum.
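      The bin arithmetic behind this choice can be checked with a few lines of Python. The FFT bin spacing is the reciprocal of the trial duration, so a 10-second trial gives 0.1Hz bins. The sketch assumes the whole trial is analysed as a single epoch, and also checks that the second harmonics and the intermodulation terms at 0.4Hz and 3.6Hz fall in bins of their own.

```python
def bin_index(freq_hz, duration_s):
    """FFT bin index of a frequency: bin spacing is 1 / duration, so index = f * T."""
    return round(freq_hz * duration_s)

DURATION = 10.0     # trial length in seconds
F1, F2 = 2.0, 1.6   # primary and secondary flicker frequencies in Hz

print("bin spacing (Hz):", 1.0 / DURATION)
print("separation (bins):", bin_index(F1, DURATION) - bin_index(F2, DURATION))

# Fundamentals, second harmonics and intermodulation terms should not share bins
components = {
    "F1": F1, "F2": F2,
    "2F1": 2 * F1, "2F2": 2 * F2,
    "F1-F2": F1 - F2, "F1+F2": F1 + F2,
}
bins = {name: bin_index(f, DURATION) for name, f in components.items()}
print(bins)
print("all distinct:", len(set(bins.values())) == len(bins))
```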

      By the way, I wondered if we know what happens when you present the same flicker frequencies to the two eyes but in counter-phase. The average luminance seen binocularly would always be the same, so if the pupil system is linear, there should be no pupil response to this stimulus. An experiment like this has been done by Flitcroft et al (1992) on accommodation where the two eyes are presented stimuli moving oppositely in optical distance and indeed there was no accommodative response, which strongly suggests linearity.

      We have not tried this yet, but it’s on our to-do list for future work. The accommodation work is very interesting, and we now cite it in the manuscript as follows:

      ‘Work on the accommodative response indicates that binocular combination there is approximately linear (Flitcroft et al. 1992), and can even cancel when signals are in antiphase (we did not try this configuration here).’
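      The reviewer's reasoning about counterphase flicker can be illustrated with a small Python sketch. Two 2Hz luminance modulations in antiphase are combined either by a purely linear binocular sum or, hypothetically, by squaring each eye's signal before summation (one arbitrary example of a monocular nonlinearity, chosen only for illustration). The sampling rate, modulation depth and 10-second duration are assumptions.

```python
import numpy as np

FS = 100.0
N = 1000   # 10 s at 100 Hz
t = np.arange(N) / FS
F = 2.0    # flicker frequency in Hz

left = 0.5 * np.sin(2 * np.pi * F * t)            # luminance modulation, left eye
right = 0.5 * np.sin(2 * np.pi * F * t + np.pi)   # same frequency, in antiphase

freqs = np.fft.rfftfreq(N, d=1.0 / FS)

def amp(signal, freq_hz):
    """Amplitude of a real signal in the FFT bin closest to freq_hz."""
    spectrum = np.abs(np.fft.rfft(signal)) * 2.0 / len(signal)
    return spectrum[np.argmin(np.abs(freqs - freq_hz))]

linear_sum = left + right               # modulations cancel: constant input
squared_sum = left ** 2 + right ** 2    # hypothetical nonlinearity before summation

print(f"linear sum at {F} Hz: {amp(linear_sum, F):.2e}")
print(f"squared sum at {2 * F} Hz: {amp(squared_sum, 2 * F):.3f}")
```

      A linear binocular sum therefore predicts no response, consistent with the accommodation result cited above, whereas a nonlinearity applied to each eye before summation predicts a response at twice the flicker frequency.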

      Figures 1 and 2 are important figures because they show the pupil and EEG results, respectively. But it's really hard to get your head around what's being shown in the lower row of each figure. The labeling for the conditions is one problem. You have to remember how "binocular" in panel c differs from "binocular cross" in panel d. And how "monocular" in panel d is different than "monocular 1.6Hz" in panel e. Additionally, the colors of the data symbols are not very distinct so it makes it hard to determine which one is which condition. These results are interesting. But they are difficult to digest.

      We hope that the new Figure 1 outlining the conditions has helped with interpretation here.

      The authors make a strong claim that they have found substantial differences in binocular interaction between cortical and sub-cortical circuits. But when I look at Figures 1 and 2, which are meant to convey this conclusion, I'm struck by how similar the results are. If the authors want to continue to make their claim, they need to spend more time making the case.

      Indeed, it is hard to make direct comparisons across figures; this is why Figure 4 plots the ratio of binocular to monocular responses, which shows a clear divergence between the EEG and pupillometry results at high contrasts.

      Figure 5 is thankfully easy to understand and shows a very clear result. These perceptual results deviate dramatically from the essentially winner-take-all results for spatial sinewaves shown by Legge & Rubin (1981); whom they should cite by the way. Thus, very interestingly the binocular combination of temporal variation is quite different than the binocular combination of spatial variation. Can the pupil and EEG results also be plotted in the fashion of Figure 5? You'd pick a criterion pupil (or EEG) change and use it to make such plots.

      We now cite Legge & Rubin. We see what you mean about plotting the EEG and pupillometry results in the same coordinates as the matching data, but we don’t think this is especially informative as we would end up only with data points along the axes and diagonal of the plot, without the points at other angles. This is a consequence of how the experiments were conducted.

      My main suggestion is that the authors need to devote more space to explaining what they've done, what they've found, and how they interpret the data. I suggest therefore that they drop the computational model altogether so that they can concentrate on the experiments. The model could be presented in a future paper.

      We feel that the model is central to the understanding and interpretation of our results, and have retained it in the revised version of the paper.

      Reviewer #2 (Recommendations For The Authors):

      I found the terms for the stimulus conditions confusing. I think a simple schematic diagram of the conditions would help the reader.

      Now added (the new Fig 1).

      In reporting the binocular to monocular ratio, please clarify whether the monocular data was from one eye alone (and how that eye was chosen) or from both eyes and then averaged, or something else. It would be useful to plot the results from the dichoptic condition in this form, as well.

      These were averaged across both eyes. We now say in the Methods section:

      ‘We confirmed in additional analyses that the monocular consensual pupil response was complete, justifying our pooling of data across the eyes.’

      Also, clarify whether the term facilitation is used as above throughout (facilitation being > 2 times monocular response under binocular condition) or if a different criterion is being used. If we take facilitation to mean a ratio > 2, then facilitation depends on temporal frequency in Figure 4.

      We now explain our use of these terms in the final paragraph of the Introduction:

      ‘Relative to the response to a monocular signal, adding a signal in the other eye can either increase the response (facilitation) or reduce it (suppression).’

      The magnitude of explicit facilitation attained is interesting, but not without precedent. Ratios of binocular to mean monocular > 2, have been reported previously and values of summation depend strongly on the stimulus used (see for example Apkarian et al., EEG Journal, 1981, Nicol et al., Doc Ophthal, 2011).

      We now mention this in the Discussion as follows:

      ‘(however we note that facilitation as substantial as ours has been reported in previous EEG work by Apkarian et al. (1981))’

      In Experiment 3, the authors say that the psychophysical matching results are consistent with the approximately linear summation effects observed in the EEG data of Experiment 1. In describing Fig. 3, the claim is that the EEG is non-linear, e.g. super-additive - at least at high contrasts. Please reconcile these statements.

      We think that the ‘superadditive’ effects are close enough to linear that we do not want to overstate them; they could reflect measurement error, for example. We therefore use terms such as near-linear or approximately linear when referring to them throughout.

      Reviewer #3 (Recommendations For The Authors):

      Let me make some more specific comments using a page/paragraph/line format to indicate where in the text they're relevant.

      1/2 (middle)/3 from end. "In addition" seems out of place here.

      Removed.

      1/3/4. By "intensities" do you mean "contrasts"?

      Fixed.

      1/3/last. "... eyes'...".

      Fixed.

      2/5/3. By "one binocular disc", do you mean "one perceptually fused disc"?

      Rewritten as: ‘to help with their perceptual fusion, giving the appearance of a single binocular disc’

      3/1/1. "calibrated" seems like the wrong word here. I think you're just changing the vergence angle to enable fusion, right?

      Now rewritten as: ‘Before each experiment, participants adjusted the angle of the stereoscope mirrors to achieve binocular fusion’

      3/1/1. "adjusting the angles...". And didn't changing the mirror angles affect the shapes of the discs in the retinal images?

      Perhaps very slightly, but this is well within the tolerance of the visual system to compensate for in the fused image, especially for such high contrast edges.

      3/3/5. "fixed contrast" is confusing here because it's still a flickering stimulus if I follow the text here. Reword.

      Now ‘fixed temporal contrast’

      3/4/1. It would be clearer to say "pupil tracker" rather than "eye tracker" because you're not really doing eye tracking.

      True, but the device is a commercial eye tracker, so this is the appropriate term regardless of what we are using it for.

      3/5/6. I'm getting lost here. "varying contrast levels" applies to the dichoptic stimulus, right?

      Yes, now reworded as ‘In the other interval, a target disc was displayed, flickering at different contrast levels on each trial, but with a fixed interocular contrast ratio across the block.’

      3/5/7. Understanding the "ratio of flicker amplitudes" is key to understanding what's going on here. More explanation would be helpful.

      Addressed in the above point.

      4/3/near end. Provide some explanation about why the Fourier approach is more robust to noise.

      Added ‘(which can make the phase and amplitude of a fitted sine wave unstable)’

      Figure 1. In panel a, explain what the numbers on the ordinate mean. What's zero, for example? Which direction is dilation? Same question for panel b. It's interesting in panel c that the response in one eye to 2Hz increases when the other eye sees 1.6Hz. Would be good to point that out in the text.

      Good idea about panel (a) - we have changed the y-axis to ‘Relative amplitude’ for clarity, and now note in the figure caption that ‘Negative values indicate constriction relative to baseline, and positive values indicate dilation.’ Panel (b) is absolute amplitude, so is unsigned. Panel (c) only contains 2Hz conditions, but there is some dichoptic suppression across the two frequencies in panels (d,e) - we now cover this in the text and include statistics.

      6/2/1. Make clear in the text that Figure 1c shows contrast response functions for the pupil.

      Now noted in the caption.

      Figure 3. I'm lost here. I feel like I should be able to construct this figure from Figures 1 and 2, but don't know how. More explanation is needed at least in the caption.

      Done. The caption now reads:

      ‘Ratio of binocular to monocular response for three data types. These were calculated by dividing the binocular response by the monocular response at each contrast level, using the data underlying Figures 2c, 3c and 3f. Each value is the average ratio across N=30 participants, and error bars indicate bootstrapped standard errors.’
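      As a minimal sketch of the calculation this caption describes, the following Python computes a binocular/monocular ratio at each contrast level and a bootstrapped standard error. It assumes that the ratio is computed per participant before averaging and that the bootstrap resamples participants with replacement; neither detail is stated explicitly above, and the function name and number of resamples are hypothetical.

```python
import numpy as np


def ratio_with_bootstrap_se(binocular, monocular, n_boot=1000, seed=0):
    """Mean binocular/monocular ratio and bootstrapped SE at each contrast.

    binocular, monocular: arrays of shape (n_participants, n_contrasts).
    Assumes ratios are taken per participant, then averaged across participants.
    """
    binocular = np.asarray(binocular, dtype=float)
    monocular = np.asarray(monocular, dtype=float)

    # Per-participant ratio at each contrast level
    ratios = binocular / monocular
    mean_ratio = ratios.mean(axis=0)

    # Bootstrap: resample participants with replacement, recompute the mean ratio
    rng = np.random.default_rng(seed)
    n_participants, n_contrasts = ratios.shape
    boot_means = np.empty((n_boot, n_contrasts))
    for b in range(n_boot):
        idx = rng.integers(0, n_participants, size=n_participants)
        boot_means[b] = ratios[idx].mean(axis=0)

    # Standard deviation of the bootstrap distribution estimates the standard error
    se = boot_means.std(axis=0, ddof=1)
    return mean_ratio, se
```

      With N=30 participants the inputs would have 30 rows; a ratio above 2 at a given contrast would correspond to facilitation in the sense discussed above.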

      9/1/1-2. I didn't find the evidence supporting this statement compelling.

      We now point the reader to Figure 4 as a reminder of the evidence for this difference.

      9/1/6-9. You said this. But this kind of problem can be fixed by moving the methods sections as I suggested above.

      As mentioned, we feel that the results section flows better with the current structure.

      Figure 4. Make clear that this is EEG data.

      Now added to caption.

      Figure 5 caption. Infinite exponent in what equation?

      Now clarified as: ‘models involving linear combination (dotted) or a winner-take-all rule (dashed)’

      Figure 6. I hope this gets dropped. No one will understand how the model predictions were derived. And those who look at the data and model predictions will surely note (as the authors do) that they are rather different from one another.

      As noted above, we feel that the model is central to the paper and have retained this figure. We have also worked out how to correct the noise parameter in the model for the number of participants included in the coherent averaging, which fixes the discrepancy at low contrasts. The correspondence between the data and model is now very good, and we have plotted the data points and curves in the same panels, which makes the figure less busy.

      12/1. Make clear in this paragraph that "visual cortex" is referring to EEG and perception results and that "subcortical" is referring to pupil. Explain clearly what "linear" would be and what the evidence for "non-linear" is.

      Good suggestion, we have added qualifiers linking to both methods. Also tidied up the language to make it clearer that we are talking about binocular combination specifically in terms of linearity, and spelled out the evidence for each point.

      12/2/6-9. Explain the Quaia et al results enough for the reader to know what reflexive eye movements were studied and how.

      We now specify that these eye movements are also known as the ‘ocular following response’ and were measured using scleral search coils.

      12/2/9-10. Same for Spitschan and Cajochen: more explanation.

      Added:

      “(melatonin is a hormone released by the pineal gland that regulates sleep; its production is suppressed by light exposure and can be measured from saliva assays)”

      12/3/2-3. Intriguing statements about optimally combining noisy signals, but explain this more. It won't be obvious to most readers.

      We have added some more explanation to this section.

      13/1. This is an interesting paragraph where the authors have a chance to discuss what would be most advantageous to the organism. They make the standard argument for perception, but basically punt on having an argument for the pupil.

      Indeed, we agree that this point is necessarily speculative; however, we think it is interesting for the reader to consider.

      13/2/1. "Pupil size affects the ..." is more accurate.

      Fixed.

      13/2/2 from end. Which "two pathways"? Be clear.

      Changed to ‘the pupil and perceptual pathways’

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The mechanism by which STAMBPL1 mediates GRHL3 transcription through its interaction with FOXO1 is not sufficiently discussed, especially in relation to how STAMBPL1 regulates FOXO1. Some reported effects are modest.

      We appreciate the reviewer’s comments. In response, we have added a discussion on the potential mechanisms by which STAMBPL1 regulates FOXO1 transcriptional activity in Discussion, highlighted in red on page 18, lines 342 to 352. The specific reply content is as follows: “The transcriptional activity of FOXO1 is primarily regulated by its nucleocytoplasmic shuttling process (Van Der Heide, Hoekman et al. 2004). The PI3K/AKT pathway promotes the phosphorylation of FOXO1, resulting in the formation of a complex with members of the 14-3-3 family (including 14-3-3σ, 14-3-3ε, and 14-3-3ζ), which facilitates its export from the nucleus and inhibits its transcriptional activity (Huang and Tindall 2007, Tzivion, Dobson et al. 2011). It has been reported that TDAG51 prevents the binding of 14-3-3ζ to FOXO1 in the nucleus by interacting with FOXO1, thereby enhancing its transcriptional activity through increased accumulation within the nucleus (Park, Jeon et al. 2023). Our results indicate that the overexpression of STAMBPL1 and STAMBPL1-E292A did not affect the protein levels of FOXO1 (Fig.7E and Fig.S5E), but STAMBPL1 co-localizes with FOXO1 in the nucleus (Fig.7M) and interacts with it (Fig.7N and Fig.S5I-J). This suggests that STAMBPL1 enhances the transcriptional activity of FOXO1 on GRHL3 by interacting with nuclear FOXO1.” The result was added to Supplementary Figure 5 as Fig.S5E.

      Reviewer #2 (Public review):

      (1) A potential limitation of the study is the reliance on specific cellular and animal models, which may constrain the extrapolation of these findings to the broader spectrum of human TNBC biology. Furthermore, while the study provides evidence for a novel regulatory axis involving STAMBPL1, FOXO1, and GRHL3, the multifaceted nature of angiogenesis may implicate additional regulatory factors not exhaustively addressed in this research.

      We appreciate the valuable suggestions provided by the reviewer. In Discussion, we have added an in-depth discussion of the limitations of the study, as well as an analysis of the regulatory factors related to tumor angiogenesis, which are highlighted in red on pages 20 to 21, lines 396 to 412. The relevant content added is as follows: “In this study, we utilized two triple-negative breast cancer cell lines, HCC1806 and HCC1937, along with human primary umbilical vein endothelial cells (HUVECs) and a nude mouse breast orthotopic transplantation tumor model to investigate the regulatory mechanism by which STAMBPL1 activates the GRHL3/HIF1α/VEGFA signaling pathway through its interaction with FOXO1, thereby promoting angiogenesis in TNBC. The results of this study have certain limitations regarding their applicability to human TNBC biology. Furthermore, in addition to the HIF1α/VEGFA signaling pathway emphasized in this study, tumor cells can continuously release or upregulate various pro-angiogenic factors, such as Angiopoietin and FGF, which activate endothelial cells, pericytes (PCs), cancer-associated fibroblasts (CAFs), endothelial progenitor cells (EPCs), and immune cells (ICs). This leads to capillary dilation, basement membrane disruption, extracellular matrix remodeling, pericyte detachment, and endothelial cell differentiation, thereby sustaining a highly active state of angiogenesis (Liu, Chen et al. 2023). It is important to collect clinical TNBC tissue samples in the future to analyze the expression of the STAMBPL1/FOXO1/GRHL3/HIF1α/VEGFA signaling axis. Furthermore, patient-derived organoid and xenograft models would be useful for elucidating the regulatory relationship of this axis in TNBC angiogenesis.”

      Reviewer #3 (Public review):

      The main weaknesses of this work are that the relevance of this molecular axis to the pathogenesis of TNBC is not clear, and it is not clearly established whether this is a regulatory pathway that occurs in hypoxic conditions or independently of oxygen levels.

      (1) With respect to the first point, both FOXO1 and GRHL3 have been previously described as tumor suppressors, with reports of FOXO1 inhibiting tumor angiogenesis. Therefore, this work describes an apparently contradictory function of these proteins in TNBC. While it is not surprising that the same genes perform divergent functions in different tumor contexts, stronger evidence in support of the oncogenic function of these two genes should be provided to make the data more convincing. As an example, the data in support of high STAMBPL1, FOXO1 and GRHL3 gene expression in TNBC TCGA specimens provided in Figure 8 are not very strong, and it is not clear what the non-TNBC specimens are (whether other breast cancers or other tumors, perhaps tumors in which these genes perform tumor suppressive functions). To strengthen the notion that STAMBPL1, FOXO1 and GRHL3 are overexpressed in TNBC, the authors could provide a comparison with normal tissue, as well as the analysis of other publicly available datasets (like the NCI Clinical Proteomic Tumor Analysis Consortium as an example). Finally, it is not clear what the basal protein expression levels of STAMBPL1 are in the cell lines used in this study, as based on the data presented in Figures 2D and F it appears that the protein is not expressed if not exogenously overexpressed. It would be helpful if the authors addressed this issue and provided further evidence of STAMBPL1 expression in TNBC cell lines.

      We appreciate the suggestions. In this study, we utilized the BCIP online tool to analyze the Metabric database, incorporating adjacent normal tissues as controls. Although the expression levels of STAMBPL1, FOXO1, and GRHL3 in breast cancer tissues are not uniformly higher than those in adjacent tissues, their expression levels in triple-negative breast cancer (TNBC) are significantly elevated compared to non-TNBC. The results of this re-analysis have been added in Supplementary Figure 6 as Fig.S6A-C.

      Regarding the basal protein expression levels of STAMBPL1 in the cell lines used in this study, Fig. 2A shows the endogenous level of STAMBPL1 in HCC1806 and HCC1937. In Fig. 2D and 2F, the overexpressed STAMBPL1 was fused with a 3xFlag tag, resulting in a higher molecular weight than that of endogenous STAMBPL1. In the revised Figure 2, we have indicated the positions of the endogenous (Endo.) and exogenous (OE.) STAMBPL1 bands with arrows.

      (2) Linked to these considerations is the second major criticism, namely that it is not made clear if this new regulatory axis is proposed to act in normoxic or hypoxic conditions. The experiments presented in this paper are performed in both conditions but a clear explanation as to why cells are exposed to hypoxia is not given and would be necessary being that HIF-1a transcription and not protein stability is being analyzed. Also, different hypoxic conditions are sometimes used, resulting in different mRNA levels of HIF-1a and its downstream targets and quite significant fluctuations within the same cell line from one experimental setting to the next. The authors should provide an explanation as to why experimental conditions are changed and, more importantly, the experiments presented in Figure 2 should be performed also in normoxia.

      Thanks for the comments. Under normoxic conditions, HIF1α is recognized by pVHL due to hydroxylation and is rapidly degraded via the proteasomal pathway. In contrast, under hypoxic conditions, HIF1α protein is accumulated. To investigate the effect of STAMBPL1 knockdown on HIF1A gene transcription levels, we conducted experiments under hypoxic conditions to avoid interference from the rapid degradation of HIF1α at the protein level, as shown in Figures 2B-C. Furthermore, under normoxic conditions, the overexpression of STAMBPL1 had been demonstrated to significantly enhance the protein levels of HIF1α and upregulate the transcription of VEGFA through HIF1α. To avoid the potential impact of excessive accumulation of HIF1α protein under hypoxic conditions on its protein level detection and the transcription of downstream VEGFA, the related experiments shown in Figure 2D-G were performed under normoxic conditions. We have explained the corresponding experimental conditions in the “Result” and “Figure legends” according to the reviewer's comments, highlighted in red.

      (3) Another critical point is that necessary experimental controls are sometimes missing, and this is reducing the strength of some of the conclusions enunciated by the authors. As examples, experiments where overexpression of STAMBPL1 is coupled to silencing of FOXO1 to demonstrate dependency lack FOXO1 silencing in the absence of STAMBPL1 overexpression. Because diminishing FOXO1 expression affects HIF-1a/VEGF transcription even in the absence of STAMBPL1 (shown in Figure 7C, D), it is not clear if the data presented in Figure 7G are significant. The difference between HIF-1a expression upon FOXO1 silencing should be compared in the presence or absence of STAMBPL1 overexpression to understand if FOXO1 impacts HIF-1a transcription dependently or independently of STAMBPL1.

      Thank you for this comment. For Fig.7G-H, our experimental objective was to determine whether STAMBPL1 activates HIF1A/VEGFA transcription via FOXO1. Therefore, under STAMBPL1 overexpression, we knocked down FOXO1 to investigate whether FOXO1 silencing could reverse the upregulation of HIF1A/VEGFA transcription induced by STAMBPL1 overexpression.

      (4) In addition, some minor comments to improve the quality of this manuscript are provided.

      (4.1) As a general statement, the manuscript is extremely synthetic. While this is not necessarily a negative feature, sometimes results are discussed in the figure legends and not in the main text (as an example, western blots showing HIF-1a expression) and this makes it hard to read through the data in an easy and enjoyable manner.

      Thank you for this suggestion. We have revised the figure legends to make them clearer and more concise, highlighted in red.

      (4.2) The effect of STAMBPL1 overexpression on HIF-1a transcription is minor (Figure 2). The authors should explain why they think this is the case and whether hypoxia may provide a molecular environment that is more permissive to this type of regulation.

      Thank you for the comment. Under normoxic conditions, we performed western blotting to examine the protein expression of HIF1α after the overexpression of STAMBPL1 and the knockdown of HIF1α. To visually illustrate the impact of STAMBPL1 overexpression on HIF1α protein levels, as well as the effectiveness of HIF1α knockdown, we annotated the grayscale analysis results of the bands in Figures 2D and 2F. As the reviewer pointed out, under normoxic conditions, HIF1α is rapidly degraded, which may explain why the upregulation of HIF1α protein levels by STAMBPL1 overexpression is not very pronounced.

      (4.3) HIF-1a does not appear upregulated at the protein level by STAMBPL1 or GRHL3 overexpression, even though this is stated in the legends of Figures 2 and 6. The authors should show unsaturated western blot images and provide quantitative data from independent experiments to make this point.

      Thank you for this comment. We have added the unsaturated image of HIF1α into Fig.2D, and performed a grayscale analysis of the HIF1α bands in Fig.2D and Fig.6A to indicate the relative protein level of HIF1α.

      Reviewer #1 (Recommendations for the authors):

      (1) The authors previously reported that STAMBPL1 stabilizes MKP1 in TNBC. However, in this study, they focus on HIF1a. Given that STAMBPL1 affects HIF1a expression, it would be valuable to examine the levels of ROS in TNBC cells with or without STAMBPL1, as ROS is known to influence HIF1a stability.

      Thank you for your comments. It’s known that STAMBPL1 functions as a deubiquitinating enzyme. However, our study reveals that the upregulation of HIF1α by STAMBPL1 is independent of its deubiquitinating activity. This conclusion is supported by the observation that overexpression of the deubiquitinase active site mutant, STAMBPL1-E292A, also upregulated HIF1α expression (Figure 1F). Moreover, STAMBPL1 overexpression enhanced HIF1α transcription (Figures 4E and S3E), while STAMBPL1 knockdown was able to inhibit the transcription of HIF1α (Figures 2B-C). These results indicate that STAMBPL1 mediates the transcription of HIF1α but does not affect the stability of HIF1α. For these reasons, we think that it is unnecessary to examine the ROS levels.

      (2) Figure 1A: The regulation of HIF1a mRNA by STAMBPL1, but not its protein levels, could be better addressed by using MG132 to rule out the impact of protein degradation.

      Thanks for this comment. Under normoxic conditions, the oxygen-sensitive prolyl hydroxylases PHD1-3 act on HIF1α, specifically inducing hydroxylation at the proline 402 and 564 residues. These hydroxylated residues are recognized by the pVHL/E3 ubiquitin ligase complex, leading to ubiquitination and subsequent degradation via the proteasome pathway. Conversely, under hypoxic conditions, PHD1-3 are inactivated, and non-hydroxylated HIF1α is not recognized by the pVHL/E3 ubiquitin ligase complex, thereby avoiding ubiquitination and proteasomal degradation (DOI: 10.1073/pnas.95.14.7987, DOI: 10.1515/BC.2004.016, and DOI: 10.1042/BJ20040620). The mechanism of HIF1α accumulation under hypoxia is analogous to the action of the proteasome inhibitor MG132. When we treated cells with hypoxia, the ubiquitination and proteasomal degradation pathway of HIF1α was blocked. At this time, STAMBPL1 knockdown could downregulate the expression of HIF1α (Fig.1A). Meanwhile, since the knockdown of STAMBPL1 significantly downregulated the mRNA level of HIF1α under hypoxia (Fig.2B-C), we concluded that STAMBPL1 affects the expression of HIF1α by mediating its transcription. In addition, MG132 will block all proteasomal substrate degradation and may affect HIF1α mRNA levels indirectly.

      (3) Figure 2D and 2F: The effect of STAMBPL1 in promoting HIF1a expression is quite mild, and the effect of HIF1a knockdown is also modest. Given the high levels of STAMBPL1 in TNBC cell lines (Figure 2A), it would be better to repeat these experiments in a STAMBPL1-knockdown setting for clearer insights.

      We appreciate this insightful suggestion. Considering that the regulation of HIF1α expression by STAMBPL1 occurs at the transcriptional level, and to prevent excessive accumulation of HIF1a during hypoxia that could confound the effect of STAMBPL1 overexpression on HIF1α regulation, we opted to overexpress STAMBPL1 under normoxic conditions and subsequently knock down HIF1α, as shown in Fig.2D and Fig.2F. This approach allowed us to observe that STAMBPL1 overexpression can upregulate HIF1a expression to some extent. Additionally, in response to the reviewer's suggestion to knock down STAMBPL1, we have conducted the corresponding experiments, with results presented in Fig.1A-E and Fig.2B-C.

      (4) Figure 4A: Why does the RNA-seq pattern differ significantly between the two siRNAs? Additionally, the authors should clarify why they focus primarily on transcription factors, as other mechanisms, such as mRNA stability and RNA modification, could also influence gene transcription.

      Thank you for this comment. Two siRNAs for STAMBPL1 were designed and synthesized by a biotechnology company. Although both siRNAs target STAMBPL1, they target different sequences. While both siRNAs effectively knocked down STAMBPL1 (Fig. 1A and Fig. 2A), the possibility of off-target effects cannot be completely ruled out. Therefore, we used both siRNAs for RNA-seq and focused on genes downregulated by both, to ensure that the gene expression changes observed are due to the knockdown of STAMBPL1. Additionally, among the 27 genes downregulated by both siRNAs, only 18 genes were annotated. Of these 18 genes, except for GRHL3, which is a transcription factor reported to be involved in gene transcription regulation, the remaining 17 genes have no documented association with RNA transcription, stability, or modification. Therefore, we focused on the GRHL3 gene.

      (5) Figure 5G: To investigate whether STAMBPL1 and GRHL3 function epistatically in the pathway, a double knockdown of STAMBPL1 and GRHL3 should be examined. Additionally, a double knockdown of STAMBPL1 and FOXO1 should be assessed.

      Thank you for your comment. In Figure 5G, we aimed to assess the knockdown efficiency of GRHL3 using siRNAs. To determine whether STAMBPL1 upregulates the HIF1a/VEGFA axis via GRHL3, we overexpressed STAMBPL1 and subsequently knocked down GRHL3. Our findings indicated that STAMBPL1 overexpression indeed enhanced the HIF1a/VEGFA axis, which was rescued by the knockdown of GRHL3, as shown in Figures 4E-F and S3E-F. Similarly, upon overexpressing STAMBPL1 and knocking down FOXO1, we observed that STAMBPL1 overexpression increased the GRHL3/HIF1a/VEGFA axis, which could also be rescued by knocking down FOXO1, as shown in Figures 7F-H. These results suggest that STAMBPL1 upregulates the GRHL3/HIF1a/VEGFA axis through FOXO1. We therefore do not think that double knockdown of STAMBPL1 and FOXO1, or of STAMBPL1 and GRHL3, is an appropriate approach here.

      (6) Figure 7: It remains unclear how STAMBPL1 regulates FOXO1. The authors show that STAMBPL1 increases the transcriptional activation of FOXO1 at the GRHL3 promoter, but it is not clear if STAMBPL1 is required for FOXO1 binding to the GRHL3 promoter. To address this, STAMBPL1-knockdown should be included to examine its effect on FOXO1 binding to the GRHL3 promoter. Furthermore, it would be important to determine whether the STAMBPL1-FOXO1 interaction is essential for GRHL3 transcription. Since the interaction sites of STAMBPL1-FOXO1 have been mapped, a mutant disrupting the interaction would provide better insight into how STAMBPL1 promotes GRHL3 transcription by interacting with FOXO1.

      Thank you for this comment. It has been reported that FOXO1 promotes the transcription of the GRHL3 gene by interacting with its promoter (DOI: 10.1093/nar/gkw1276). We also verified through ChIP assay that FOXO1 can bind to the promoter of GRHL3 gene (Fig.7I) and mediate its transcription. Specifically, knocking down FOXO1 significantly down-regulated the mRNA level of GRHL3 (Fig.7B), and the GRHL3 promoter lacking FOXO1 binding site almost completely lost transcriptional activity (Fig.7J), indicating that FOXO1 is crucial for the transcriptional activity of the GRHL3 promoter. Overexpression of STAMBPL1 enhances the activating effect of FOXO1 on the transcriptional activity of the GRHL3 promoter (Fig.7K). However, the up-regulation of GRHL3 transcription by overexpression of STAMBPL1 is completely blocked by FOXO1 knockdown (Fig.7F), and the knockdown of FOXO1 essentially blocks the binding of STAMBPL1 to the GRHL3 promoter (Fig.7L), suggesting that STAMBPL1 affects the transcriptional expression of GRHL3 through FOXO1. As we added in Discussion, the transcription factor activity of FOXO1 is mainly regulated by its nucleocytoplasmic shuttling process, and the accumulation of FOXO1 in the nucleus can enhance its transcription factor activity (DOI: 10.1042/BJ20040167; DOI: 10.15252/embj.2022111867). In our research, neither STAMBPL1 nor its deubiquitinase active-site mutant affected the expression of FOXO1 (Fig.S5E), but STAMBPL1 and FOXO1 co-localized in the nucleus (Fig.7M), and they interacted with each other (Fig.7N, Fig.S5I-J). Therefore, we speculate that STAMBPL1 interacts with FOXO1 in the nucleus, obstructs the binding of FOXO1 to members of the 14-3-3 family, and inhibits the nuclear export of FOXO1, thereby enhancing its transcriptional activity. This interaction between STAMBPL1 and FOXO1 does not necessarily affect the binding of FOXO1 to DNA, including the GRHL3 promoter.

      (7) Figure 8 A-C: What is the correlation among the expressions of STAMBPL1, FOXO1, and GRHL3 in TNBC tumors compared to non-TNBC tumors?

      Thank you for your comment. In Figure 8A-C, we analyzed the expression levels of STAMBPL1, FOXO1, and GRHL3 in both TNBC and non-TNBC samples using the BCIP. The results indicate that the expression levels of these three genes are significantly higher in TNBC compared to non-TNBC samples. To investigate the correlation among the expressions of STAMBPL1, FOXO1, and GRHL3 in TNBC versus non-TNBC, we further utilized the Metabric data. Besides the positive correlation trend between STAMBPL1 and GRHL3 expression in TNBC clinical samples (Pearson R = 0.27), no significant correlation was observed in the expression levels of STAMBPL1, FOXO1, and GRHL3 in TNBC and non-TNBC clinical samples (as shown in Author response image 1 below). Since STAMBPL1 and FOXO1 are involved as protein molecules in the transcriptional regulation of GRHL3 gene, and the data obtained from the Metabric database are the transcriptional levels of these three genes, this might be the reason why the correlation between their expressions was not observed.

      Author response image 1.

      Reviewer #2 (Recommendations for the authors):

      The authors have thoroughly elucidated the role of STAMBPL1 in TNBC. However, it would be beneficial to discuss the potential clinical implications of these findings, such as how targeting STAMBPL1 or FOXO1 might impact current treatment strategies for TNBC. In addition, several issues need to be addressed.

      Major:

      (1) While the study provides an exhaustive analysis of the molecular mechanisms, a comparison with other subtypes of breast cancer could enhance our understanding of the specificity of the STAMBPL1/FOXO1/GRHL3/HIF1α/VEGFA axis in TNBC.

      Thank you for your comment. According to a previous report, STAMBPL1 is significantly associated with the mesenchymal characteristics of breast cancer (DOI: 10.1038/s41416-020-0972-x). We utilized cBioPortal (http://www.cbioportal.org/) to analyze the expression of STAMBPL1 across various clinical subtypes of breast cancer. The results indicated that STAMBPL1 is highly expressed in invasive breast cancer, which has been added to Supplementary Figure 6 as Fig.S6D. Given that TNBC is an aggressive type of invasive breast cancer, we further examined the expression of STAMBPL1 in TNBC compared to non-TNBC using BCIP (http://omicsnet.org/bcancer/database). Our findings revealed that the expression level of STAMBPL1 in TNBC was elevated relative to its levels in non-TNBC (Fig.8A). Additionally, since tumor angiogenesis is a critical factor influencing the metastasis of cancer cells, our study focused specifically on the pro-angiogenic effects of STAMBPL1 in TNBC.

      (2) The authors might consider discussing any potential off-target effects of the siRNA and shRNA used in the study to bolster the conclusions drawn from the knockdown experiments.

      We appreciate the reviewer's suggestion. It is well-known that siRNA or shRNA have off-target effects. To address this concern, we employed two siRNAs for each gene knockdown in our study. Specifically, we knocked down genes such as STAMBPL1, FOXO1, GRHL3, and HIF1A in two TNBC cell lines, HCC1806 and HCC1937, using two siRNAs. Except for siRNA#1 targeting HIF1A, which did not show a significant knockdown effect in HCC1806 cells (Fig.2D and Fig.6A), the knockdown effects of other siRNAs on their respective genes were effective, and the resulting phenotypes were consistent. As shown in Fig.2F and Fig.S4H, siRNA#1 targeting HIF1A had a significant knockdown effect in HCC1937 cells. The lower knockdown efficiency of this siRNA in HCC1806 cell line might be attributed to cell-specific factors.

      (3) It would be advantageous if the authors could provide further details on the patient demographics and tumor characteristics in the TCGA database analysis to better comprehend the clinical relevance of their findings.

      Thanks for the reviewer's suggestions. We have now indicated the number of clinical samples in each group in the legend of Fig.8A-C. Since we utilized the BCIP online database to analyze and compare the expression levels of the three genes STAMBPL1, FOXO1, and GRHL3 in TNBC and non-TNBC, we are unable to obtain more specific information regarding the tumor characteristics of each sample. However, our analysis clearly shows that the expression levels of these three genes are significantly higher in TNBC compared to non-TNBC.

      (4) The authors should consider discussing any limitations regarding the generalizability of their findings, such as potential variations among different TNBC subtypes or the specificity of their observations to certain stages of the disease.

      We appreciate the reviewer's comment. Accordingly, we have added a discussion on the limitation of this study in Discussion, highlighted in red font on pages 20 to 21, lines 396 to 412. In addition, we utilized the bc-GenExMiner online database to conduct a comparative analysis of STAMBPL1 expression in different subtypes of non-TNBC and TNBC. The result indicates that STAMBPL1 is highly expressed in mesenchymal-like and basal-like TNBC, which has been added into Supplementary Figure 6 as Fig.S6E. Since these two subtypes of TNBC are highly invasive and metastatic, it suggests that targeting the signaling pathway of STAMBPL1/FOXO1/GRHL3/HIF1α/VEGFA may offer clinical benefits for patients with invasive TNBC.

      Minor:

      The paper is generally well-written, but it's crucial to maintain vigilance for subject-verb agreement, proper use of tense, and consistent terminology.

      Thank you for this suggestion. We have thoroughly revised the article for grammatical issues, including tense, subject-verb agreement, and terminology.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Summary:

      In this manuscript (eLife-RP-RA-2024-103904), the authors identified that NOLC1 was upregulated in gastric cancer samples, which promoted cancer progression and cisplatin resistance. They further found that NOLC1 could bind to p53 and decrease its nuclear transcriptional activity, then inhibit p53-mediated ferroptosis. There are several major concerns regarding the conclusions.

      Strengths:

      This study identified that NOLC1 could bind to p53 and decrease its nuclear transcriptional activity, then inhibit p53-mediated ferroptosis in gastric cancer.

      Weaknesses:

      The major conclusions were not sufficiently supported by the results. The experiments were not conducted in a comprehensive manner.

      Major concerns

      (1) The authors investigated NOLC1 expression in gastric cancer (GC) using clinical samples, which is valuable; however, the sample array includes only 3 patients. This sample size is insufficient to support conclusions for human samples. Please increase the sample size and apply a more robust statistical analysis. Additionally, specify the statistical methods used in the figure legend.

      Thanks very much for the kind comments and great suggestions. As suggested, we have increased the sample size of GC patients. The new data (six paired samples) are shown in Fig. S1A and further confirm that NOLC1 is upregulated in gastric cancer (GC). In addition, the statistical methods have been added to each figure legend.

      (2) These data are not sufficient to support the key conclusion of this study "NOLC1 is significantly upregulated in GC tissues and Cis-resistant GC cells". There is no convincing data showing that NOLC1 upregulation is specific to cancer cells or any other cell types. Based on the following results that NOLC1 expressed in cancer cells can support cancer cell survival and drug resistance, the authors switched to investigating the role of NOLC1 in cancer cells without demonstrating cancer cells indeed highly upregulate NOLC1.

      Thanks for raising this good question. As shown in Fig. 1E-F, TCGA data show that NOLC1 is upregulated in GC. We further analyzed NOLC1 expression in other cancer types using the Human Protein Atlas (https://www.proteinatlas.org/). The results indicated that the NOLC1 mRNA level was much higher in almost all cancers except acute myeloid leukemia (LAML). In addition, according to gene expression profiling interactive analysis (GEPIA, http://gepia.cancer-pku.cn/index.html), the NOLC1 mRNA level was above 100 nTPM in most gastric cancer cell lines but below 100 nTPM in most non-cancerous cell lines, indicating that NOLC1 is upregulated in gastric cancer.

      Author response image 1.

      The mRNA level of NOLC1 in different GC cells and non-cancerous cells.

      (3) The authors primarily use MGC-803 cells for experiments; however, MGC-803 is known to be a HeLa-contaminated cell line. Could the authors explain this choice of using this cell line only? Did they validate key findings with additional cell lines? This is particularly important for assays such as cisplatin resistance validation, in vivo experiments, TEM imaging, and MitoPeDPP fluorescence imaging.

      Thanks for raising this good question. We did not use MGC-803 cells exclusively: the key in vitro findings were also validated in MKN-45 cells (Fig. 2), and the in vivo findings were also validated in a mouse forestomach carcinoma (MFC) cell tumor-bearing 615 mouse model (Fig. 7). Furthermore, we added several experiments in MKN-45 cells. TEM imaging showed that NOLC1 significantly inhibited cisplatin (Cis)-induced lipid membrane damage in MKN-45 cells (Fig. S6A). Moreover, the MitoPeDPP fluorescence assay analyzed by FCAs also indicated that ROS rapidly accumulated in the mitochondria of MKN-45 cells (Fig. 4E, Fig. S6J).

      (4) In Figure 2, did the authors perform assays with NOLC1 overexpression? If so, please include these results to strengthen the conclusions.

      Thanks very much for the kind comments and great suggestions. As suggested, we added new data from NOLC1 overexpression assays. The Cell Counting Kit-8 (CCK-8) assay shows that the NOLC1-overexpression group is more resistant to Cis than the vector group (Fig. S4E, S5A).

      (5) The authors show in Figures 2A-B that shNOLC1 without cisplatin treatment does not affect cell viability. However, Figures 2D-E suggest increased apoptosis in shNOLC1 cells without cisplatin treatment. Additionally, in vivo studies in Figure 3 show no significant difference between the shNC+PBS and shNOLC1+PBS groups, which appears contradictory to the apoptosis assays. Similarly, Ki67 staining shows decreased scores in the shNOLC1 group compared to shNC. Could the authors clarify this inconsistency?

      Thanks for raising this good question. In Fig. 2D-E, the differences in the proportion of dead cells between the PBS-treated shNOLC1 and shNC groups were only 3% (MGC-803) and 7% (MKN-45), much smaller than the differences observed with cisplatin treatment in vitro. In vivo, the average tumor volume in the shNOLC1+PBS group was smaller than that in the shNC+PBS group, but the difference was not statistically significant (p = 0.3962). Moreover, tumor proliferation is a complex process regulated by many factors [1,2]; the Ki67 level is therefore not equivalent to the rate of tumor proliferation, although the two may be positively correlated.

      (6) In Figure 4, NOLC1 knockdown appears to enhance cisplatin-induced ferroptosis rather than apoptosis. Given p53's role in apoptosis, did the authors compare the effects of NOLC1 on cisplatin-induced apoptosis vs. ferroptosis? If so, please clarify whether NOLC1 predominantly regulates apoptosis or ferroptosis.

      Thanks for raising this good question. We did compare the effects of NOLC1 on cisplatin-induced apoptosis vs. ferroptosis. As shown in Fig. 5A, NOLC1 knockdown markedly increased the protein level of BCL-2, an anti-apoptotic protein that is regulated by p53 through protein interaction in the cytoplasm [3,4]; this may be caused by the increased level of cytoplasmic p53 (Fig. 6I). In addition, TEM imaging showed classic ferroptotic rather than apoptotic morphological changes (Fig. 5A, S6A). Taken together, NOLC1 mainly regulates p53-mediated ferroptosis rather than apoptosis.

      (7) Did the authors perform co-IP assays with p53 or HA antibodies to immunocapture NOLC1? If not, please add this experiment to support protein interactions. The mechanistic correlation between p53 and NOLC1 can be supported by adding experiments using multiple GC cell lines with various p53 alterations (such as loss-of-function or gain-of-function mutations/deletions). This is critical because the authors specifically claimed that NOLC1 can inhibit p53-mediated ferroptosis, but not other tumor suppressors.

      Thanks very much for the kind comments and great suggestions. As suggested, we performed a Co-IP assay with an anti-HA antibody to immunocapture NOLC1-FLAG. As shown in Fig. 5K, the HA-tagged p53 DNA-binding domain (DBD) co-immunoprecipitated NOLC1, further indicating that NOLC1 binds to the p53 DBD. We agree with the reviewer that experiments using multiple p53 alterations would be informative; however, different p53 mutants have completely different functional consequences. We therefore used siRNA to knock down p53 in MGC-803 cells. The results showed that NOLC1-mediated resistance disappeared and the GPX4 level increased (Fig. S10). These data indicate that NOLC1 promotes GC resistance by modulating p53 function.

      (8) In Figure S5B, can the LDH release be blocked by Fer-1?

      Thanks for raising this good question. As suggested, we tested this, and Fer-1 (20 μmol/mL) significantly blocked LDH release in the NOLC1 knockdown group (Fig. S6E). These data further confirm that NOLC1 suppresses Cis-induced ferroptosis.

      (9) How about the ubiquitination assay in MGC-803 cells?

      Thanks for raising this good question. As suggested, we also performed the ubiquitination assay in MGC-803 cells. The results showed that NOLC1 also increased the ubiquitination level of p53 (Fig. 6H).

      (10) In Figure 6H, the DBD domain of NOLC1 is required for inhibiting P53 ubiquitination.

      Thanks for your opinion. However, our manuscript refers only to the p53 DBD, not to a NOLC1 DBD, and we found no reported DNA-binding function of NOLC1 in the PubMed database. We would therefore like to ask the reviewer to confirm whether this comment is correct.

      (11) In Figure 8B, the CD3 antibody is not specific, please change it to a new one.

      Thanks very much for the kind comments and great suggestions. As suggested, we used a new CD3 antibody, and the new data have been added to Fig. 8B.

      (12) The authors report that NOLC1 influences peripheral blood lymphocytes with cisplatin treatment, with or without PD-1. Could the authors explain why NOLC1 would affect peripheral blood lymphocytes? Additionally, did they assess immune cell infiltration in the tumor microenvironment (TME) by flow cytometry?

      Thanks for raising this good question. The tumors in the knockdown group treated with Cis + PD-1 were too small (less than 100 mg) to yield enough infiltrating immune cells (fewer than 10,000 CD45<sup>+</sup> cells), so we analyzed immune cells in the blood of the mice instead, because tumor-infiltrating immune cells, including CTLs, originate from the peripheral circulation. Under normal conditions, several tumor behaviors shape the TME to limit immune responses and create barriers to cancer therapy. For example, tumors can express or secrete negative regulators such as PD-L1, which prevent immune cells from recognizing tumor cells and infiltrating tumor tissue. Ferroptosis, as a new form of immunogenic cell death (ICD), can damage tumor cell membranes and release large amounts of tumor-associated and tumor-specific antigens, leading to immune cell priming and activation. The activated immune cells in the peripheral blood then travel to the tumor site and infiltrate the tumor tissue under favorable co-stimulatory conditions, guided by chemokine gradients. Once within the tumor microenvironment, these activated T cells can control tumor growth through direct tumor cell killing and cytokine-mediated processes [5–8].

      To assess immune cell infiltration in the TME, we analyzed tumor-infiltrating CD3<sup>+</sup> and CD8<sup>+</sup> immune cells in tumor tissue by immunofluorescence (Fig. 8B). We therefore consider that the peripheral blood lymphocyte profile can reflect immune cell infiltration in the tumor.

      Minor concerns:

      (1) Please clarify the statistical methods in each figure legend.

      Thanks for your opinion. We have added the statistical methods to each figure legend.

      (2) In Figure 2D, please provide statistical data of cleaved-caspase3 expression.

      Thanks for your opinion. Quantification of relative cleaved-caspase3 levels is now provided in Fig. S5B-C.

      (3) Please ensure that the canonical expressions used in the research paper are adhered to.

      Thanks for your opinion. We have carefully revised the expressions used in our manuscript.

      (4) Please pay more attention to the grammar and formatting of texts.

      Thanks for your opinion. We have had our manuscript edited by the American Journal Experts (AJE) service.

      Reviewer #2:

      Summary:

      Shengsheng Zhao et al. investigated the role of nucleolar and coiled-body phosphoprotein 1 (NOLC1) in regulating gastric cancer (GC) development and cisplatin-induced drug resistance in GC. They found a significant correlation between high NOLC1 expression and the poor prognosis of GC. Meanwhile, upregulation of NOLC1 was associated with cis-resistant GC. Experimentally, the authors demonstrate that knocking down NOLC1 increased GC sensitivity to Cis possibly by regulating ferroptosis. Mechanistically, they found NOLC1 suppressed ferroptosis by blocking the translocation of p53 from the cytoplasm to the nucleus and promoting its degradation. In addition, the authors also evaluated the effect of combinational treatment of anti-PD-1 and cisplatin in NOLC1-knockdown tumor cells, revealing a potential role of NOLC1 in the targeted therapy for GC.

      Strengths:

      Chemoresistance is considered a major reason causing failure of tumor treatment and death of cancer patients. This paper explored the role of NOLC1 in the regulation of Cis-mediated resistance, which involves a regulated cell death named ferroptosis. These findings provide more evidence highlighting the study of regulated cell death to overcome drug resistance in cancer treatment, which could give us more potential strategies or targets for combating cancer.

      Weaknesses:

      More evidence supporting the regulation of ferroptosis induced by Cisplatin by NOLC1 should be added. Particularly, the role of ferroptosis in the cisplatin-resistance should be verified and whether NOLC1 regulates ferroptosis induced by additional FINs should be explored. Besides, the experiments to verify the regulation of ferroptosis sensitivity by NOLC1 are sort of superficial. The role of MDM2/p53 in ferroptosis or cisplatin resistance mediated by NOLC1 should be further studied by genetic manipulation of p53, which is the key evidence to confirm its contribution to NOLC1 regulation of GC and relative cell death.

      Major points:

      (1) More evidence supporting the regulation of ferroptosis induced by Cisplatin by NOLC1 should be added. Particularly, the role of ferroptosis in the cisplatin-resistance should be verified and whether NOLC1 regulates ferroptosis induced by additional FINs should be explored.

      Thanks very much for the kind comments and great suggestions. As suggested, we further analyzed the ferroptosis-inhibiting ability of NOLC1 in MGC-45 cells treated with Erastin, a commonly used ferroptosis activator. As shown in Fig. S6B, Erastin-induced ferroptosis was also blocked by NOLC1.

      (2) In Figure 1J, the CR cell line should obviously have less expression of the apoptosis marker c-PARP, which means these cells are resistant to apoptosis induced by CR. Thus, it would be more rational to study the role of apoptosis regulation by NOLC1. Why did the later data shift to the study of ferroptosis?

      Thanks for raising this good question. In the CR cells, the expression levels of many genes were changed, so it is uncertain whether the decreased level of cleaved-PARP in the resistant cells is caused by NOLC1 upregulation. To explore the specific mechanism of NOLC1-mediated resistance, we performed TEM imaging (Fig. 4A, S6A), and the results showed that the cells exhibited classic ferroptotic morphological changes. Moreover, BCL-2 (an anti-apoptotic protein regulated by p53 through protein interaction in the cytoplasm) was increased after NOLC1 knockdown (Fig S5A). This may be caused by the increased p53 level in the cytoplasm [3,4] (Fig 5I). Taken together, these findings led us to shift to the study of cisplatin-induced ferroptosis.

      (3) Besides, how about the regulation of apoptosis during cis-resistance by NOLC1 in GC?

      Thanks for raising this good question. As mentioned above, Cis-induced apoptosis was less pronounced than ferroptosis, owing to an increase in BCL-2 (a key anti-apoptotic protein), which is regulated by p53 through protein interaction in the cytoplasm. NOLC1 increased the cytoplasmic p53 level, which subsequently increased the BCL-2 level.

      (4) The experiments to verify the regulation of ferroptosis sensitivity by NOLC1 are sort of superficial. The role of MDM2/p53 in ferroptosis or cisplatin resistance mediated by NOLC1 should be further studied by genetic manipulation of p53, which is the key evidence to confirm its contribution to NOLC1 regulation of GC and relative cell death.

      Thanks for raising this good question. As shown in Fig S10, after p53 was knocked down by siRNA, NOLC1 could no longer promote Cis resistance and the GPX4 level increased, indicating that NOLC1 promotes Cis resistance by modulating p53 function.

      (5) In Figure 2, the data indicated that the knockdown of NOLC1 increased γH2AX in the presence of Cisplatin, which indicated that NOLC1 might regulate DNA damage-related cellular function. These functions should be more relevant to cisplatin resistance, considering the fundamental effect of this chemo drug.

      Thanks very much for the kind comments and great suggestions. Indeed, we found that DNA damage was more obvious in the knockdown groups, but ferroptotic changes such as ROS accumulation and mitochondrial membrane damage also differed significantly in the knockdown groups. As a chemotherapeutic drug, cisplatin not only damages DNA but also acts as a stressor that can activate various signaling pathways, including apoptosis, ferroptosis, pyroptosis, and necroptosis, depending on drug concentration and treatment time [9–11]. Therefore, it is important to identify the pathway that NOLC1 predominantly blocks in GC.

      (6) In Figure 4, ferroptosis inhibitors like Fer-1 or DFO should be used to verify the regulation of ferroptosis by Cisplatin and NOLC1.

      Thanks very much for the kind comments and great suggestions. As suggested, we performed an additional LDH release assay. The results showed that Fer-1 also blocked cisplatin-induced LDH release in the NOLC1 knockdown groups (Fig. S6E).

      (7) In Figure 4H, Cisplatin decreased FSP1 and GPX4, which could be enhanced in the NOLC1-knockdown cell line. Meanwhile, the knockdown of NOLC1 increased the ACSL4 level. These findings could be the key reason for the regulation of ferroptosis by NOLC1 rather than p53 since they all are direct regulators of ferroptosis.

      Thanks very much for the kind comments and great suggestions. We have rewritten the text as you suggested. It has also recently been reported that ACSL4-regulated ferroptosis is related to p53, but the exact mechanism is still unclear [12]. Further studies of the specific relationship between NOLC1 and FSP1/ACSL4 will be conducted in the future.

      (8) Whether p53 mediates the regulation of ferroptosis and cisplatin resistance by NOLC1 should be thoroughly studied using p53-KO cell lines.

      Thanks very much for the kind comments and great suggestions. As previously mentioned, when p53 was knocked down by siRNA, NOLC1-mediated Cis resistance was blocked (Fig. S10). Meanwhile, the GPX4 level was increased in the p53/NOLC1 double-knockdown groups compared with the NOLC1 knockdown group. These data indicate that NOLC1 suppresses ferroptosis by modulating p53 function.

      Reviewer #3:

      The authors have put forth a compelling argument that NOLC1 is indispensable for gastric cancer resistance in both in vivo and in vitro models. They have further elucidated that NOLC1 silencing augments cisplatin-induced ferroptosis in gastric cancer cells. The mechanistic underpinning of their findings suggests that NOLC1 modulates the p53 nuclear/plasma ratio by engaging with the p53 DNA Binding Domain, which in turn impedes p53-mediated transcriptional regulation of ferroptosis. Additionally, the authors have shown that NOLC1 knockdown triggers the release of ferroptosis-induced damage-associated molecular patterns (DAMPs), which activate the tumor microenvironment (TME) and enhance the efficacy of the anti-PD-1 and cisplatin combination therapy.

      Strengths:

      The manuscript presents a robust dataset that substantiates the authors' conclusion. They have identified NOLC1 as a potential oncogene that confers resistance to immuno-chemotherapy in gastric cancer through the mediation of ferroptosis and subsequent TME reprogramming. This discovery positions NOLC1 as a promising therapeutic target for gastric cancer treatment. The authors have delineated a novel mechanistic pathway whereby NOLC1 suppresses p53 transcriptional functions by reducing its nuclear/plasma ratio, underscoring the significance of p53 nuclear levels in tumor suppression over total protein levels.

      Weaknesses:

      While the overall findings are commendable, there are specific areas that could benefit from further refinement. The authors have posited that NOLC1 suppresses p53-mediated ferroptosis; however, the mRNA levels of ferroptosis genes regulated by p53 have not been quantified, which is a critical gap in the current study. In Figure 4A, transmission electron microscopy (TEM) results are reported solely for the MGC-803 cell line. It would be beneficial to include TEM data for the MKN-45 cell line to strengthen the findings. The authors have proposed a link between NOLC1-mediated reduction in the p53 nuclear/plasma ratio and gastric cancer resistance, yet the correlation between this ratio and patient prognosis remains unexplored, which is a significant limitation in the context of clinical relevance.

      Thanks very much for the kind comments and great suggestions. Recent studies have reported that CDKN1A (also called p21, a protein transcriptionally regulated by p53) can promote ferroptosis [13]; as suggested, the mRNA levels of ferroptosis genes regulated by p53 have now been quantified (Fig. S8G-H). Moreover, we performed TEM imaging in MKN-45 cells, and the results were consistent with those in MGC-803 cells, suggesting that NOLC1 broadly promotes drug resistance in gastric cancer. It has also been reported that gastric cancer subtypes with transcriptionally active and transcriptionally inactive p53 include patients with intermediate prognosis and recurrence rates, with the p53-active group showing a better prognosis [14]. Because p53 transcriptional activity depends on p53 nuclear accumulation, we hypothesize that a low p53 nuclear/plasma ratio may be associated with poor prognosis in gastric cancer. We will collect enough samples and their prognostic information to analyze the relationship between NOLC1-mediated reduction in the p53 nuclear/plasma ratio and gastric cancer resistance.

      References

      (1) Z. Seferbekova, A. Lomakin, L.R. Yates, M. Gerstung, Spatial biology of cancer evolution, Nat Rev Genet 24 (2023) 295–313. https://doi.org/10.1038/s41576-022-00553-x.

      (2) T. Matsuoka, M. Yashiro, Molecular Mechanism for Malignant Progression of Gastric Cancer Within the Tumor Microenvironment, IJMS 25 (2024) 11735. https://doi.org/10.3390/ijms252111735.

      (3) Y. Liu, Z. Su, O. Tavana, W. Gu, Understanding the complexity of p53 in a new era of tumor suppression, Cancer Cell (2024) S1535610824001338. https://doi.org/10.1016/j.ccell.2024.04.009.

      (4) R. Pan, V. Ruvolo, H. Mu, J.D. Leverson, G. Nichols, J.C. Reed, M. Konopleva, M. Andreeff, Synthetic Lethality of Combined Bcl-2 Inhibition and p53 Activation in AML: Mechanisms and Superior Antileukemic Efficacy, Cancer Cell 32 (2017) 748-760.e6. https://doi.org/10.1016/j.ccell.2017.11.003.

      (5) E. Catanzaro, M. Beltrán-Visiedo, L. Galluzzi, D.V. Krysko, Immunogenicity of cell death and cancer immunotherapy with immune checkpoint inhibitors, Cell Mol Immunol 22 (2024) 24–39. https://doi.org/10.1038/s41423-024-01245-8.

      (6) G. Lei, L. Zhuang, B. Gan, The roles of ferroptosis in cancer: Tumor suppression, tumor microenvironment, and therapeutic interventions, Cancer Cell 42 (2024) 513–534. https://doi.org/10.1016/j.ccell.2024.03.011.

      (7) E. Catanzaro, R. Demuynck, F. Naessens, L. Galluzzi, D.V. Krysko, Immunogenicity of ferroptosis in cancer: a matter of context?, Trends in Cancer 10 (2024) 407–416. https://doi.org/10.1016/j.trecan.2024.01.013.

      (8) X. Jiang, B.R. Stockwell, M. Conrad, Ferroptosis: mechanisms, biology and role in disease, Nat Rev Mol Cell Biol 22 (2021) 266–282. https://doi.org/10.1038/s41580-020-00324-8.

      (9) J.-L. Roh, E.H. Kim, H. Jang, D. Shin, Nrf2 inhibition reverses the resistance of cisplatin-resistant head and neck cancer cells to artesunate-induced ferroptosis, Redox Biology 11 (2017) 254–262. https://doi.org/10.1016/j.redox.2016.12.010.

      (10) X. Wang, Y. Zhou, D. Wang, Y. Wang, Z. Zhou, X. Ma, X. Liu, Y. Dong, Cisplatin-induced ototoxicity: From signaling network to therapeutic targets, Biomedicine & Pharmacotherapy 157 (2023) 114045. https://doi.org/10.1016/j.biopha.2022.114045.

      (11) J. Liang, G. Bi, Y. Huang, G. Zhao, Q. Sui, H. Zhang, Y. Bian, J. Yin, Q. Wang, Z. Chen, C. Zhan, MAFF confers vulnerability to cisplatin-based and ionizing radiation treatments by modulating ferroptosis and cell cycle progression in lung adenocarcinoma, Drug Resistance Updates 73 (2024) 101057. https://doi.org/10.1016/j.drup.2024.101057.

      (12) M.Y. Kosim, T. Fukazawa, M. Miyauchi, N. Hirohashi, K. Tanimoto, p53 status modifies cytotoxic activity of lactoferrin under hypoxic conditions, Front. Pharmacol. 13 (2022) 988335. https://doi.org/10.3389/fphar.2022.988335.

      (13) Q. Gao, J. Chen, C. Li, J. Zhan, X. Yin, B. Li, H. Dong, L. Luo, Z. Li, CDKN1A promotes Cis-induced AKI by inducing cytoplasmic ROS production and ferroptosis, Food and Chemical Toxicology 193 (2024) 115003. https://doi.org/10.1016/j.fct.2024.115003.

      (14) R. Cristescu, Molecular analysis of gastric cancer identifies subtypes associated with distinct clinical outcomes, Nature Medicine (2015).

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This study aimed at replicating two previous findings that showed (1) a link between prediction tendencies and neural speech tracking, and (2) that eye movements track speech. The main findings were replicated which supports the robustness of these results. The authors also investigated interactions between prediction tendencies and ocular speech tracking, but the data did not reveal clear relationships. The authors propose a framework that integrates the findings of the study and proposes how eye movements and prediction tendencies shape perception.

      Strengths:

      This is a well-written paper that addresses interesting research questions, bringing together two subfields that are usually studied in separation: auditory speech and eye movements. The authors aimed at replicating findings from two of their previous studies, which was overall successful and speaks for the robustness of the findings. The overall approach is convincing, methods and analyses appear to be thorough, and results are compelling.

      Weaknesses:

      Linking the new to the previous studies could have been done in more detail, and the extent to which results were replicated could have been discussed more thoroughly.

      Eye movement behavior could have been presented in more detail and the authors could have attempted to understand whether there is a particular component in eye movement behavior (e.g., microsaccades) that drives the observed effects.

      We would like to thank you for your time and effort in reviewing our work and we appreciate the positive comments!

      We extended our manuscript, now providing intermediate results on individual prediction tendency, which can be compared to our results from Schubert et al. (2023).

      Furthermore, we expanded our discussion now detailing the extent to which our results (do not) replicate the previous findings (e.g. differences in horizontal vs. vertical ocular speech tracking, lack of distractor tracking, link between ocular speech tracking and behavioral outcomes).

      While we agree with the reviewer that the extent to which individual features of gaze behavior (such as microsaccades and blinks) contribute to the ocular speech tracking effect is an important and highly interesting question, it is beyond the scope of the current manuscript. It will be methodologically and conceptually challenging to distinguish these features from one another and to relate them to diverse cognitive processes. We believe that a separate manuscript is needed to give these difficult questions sufficient space for new methodological approaches and control analyses. The primary goal of this manuscript was to replicate the findings of Gehmacher et al. (2024) using similar methods and to relate them to prediction tendencies, attention, and neural speech tracking.

      Reviewer #2 (Public review):

      Summary

      Schubert et al. recorded MEG and eye-tracking activity while participants were listening to stories in single-speaker or multi-speaker speech. In a separate task, MEG was recorded while the same participants were listening to four types of pure tones in either structured (75% predictable) or random (25%) sequences. The MEG data from this task was used to quantify individual 'prediction tendency': the amount by which the neural signal is modulated by whether or not a repeated tone was (un)predictable, given the context. In a replication of earlier work, this prediction tendency was found to correlate with 'neural speech tracking' during the main task. Neural speech tracking is quantified as the multivariate relationship between MEG activity and speech amplitude envelope. Prediction tendency did not correlate with 'ocular speech tracking' during the main task. Neural speech tracking was further modulated by local semantic violations in the speech material, and by whether or not a distracting speaker was present. The authors suggest that part of the neural speech tracking is mediated by ocular speech tracking. Story comprehension was negatively related to ocular speech tracking.

      Strengths

      This is an ambitious study, and the authors' attempt to integrate the many reported findings related to prediction and attention in one framework is laudable. The data acquisition and analyses appear to be done with great attention to methodological detail (perhaps even with too much focus on detail; see below). Furthermore, the experimental paradigm used is more naturalistic than was previously done in similar setups (i.e. stories instead of sentences).

      Weaknesses

      For many of the key variables and analysis choices (e.g. neural/ocular speech tracking, prediction tendency, mediation) it is not directly clear how these relate to the theoretical entities under study, and why they were quantified in this particular way. Relatedly, while the analysis pipeline is outlined in much detail, an overarching rationale and important intermediate results are often missing, which makes it difficult to judge the strength of the evidence presented. Furthermore, some analysis choices appear rather ad-hoc and should be made uniform and/or better motivated.

      We would like to thank you very much for supporting our paper and your thoughtful feedback!

      To address your concern that our theoretical entities as well as some of our analytical choices lack transparency, we have expanded our manuscript in several ways:

      (1) We now provide the intermediate results of our prediction tendency analysis (see new Figure 2 of our manuscript). These results are comparable to our findings from Schubert et al. (2023), demonstrating that on a group level there is a tendency to pre-activate auditory stimuli of high probability and illustrating the distribution of this tendency value in our subject population.

      (2) We expanded our methods section in order to explain our analytical choices (e.g. why this particular entropy modulation paradigm was used to measure individual prediction tendency).

      (3) We now provide an operationalisation of the terms “neural speech tracking” and “ocular speech tracking” at their first mention, to make these metrics more transparent to the reader.

      (4) We now summarize important methodological information ahead of each results section, in order to give the reader a comprehensible background without the need to read through the detailed methods section.

      (5) We expanded our discussion section, with a special emphasis on relating the key variables of the current investigation to theoretical entities.

      Reviewer #3 (Public review):

      Summary:

      In this paper, the authors measured neural activity (using MEG) and eye gaze while individuals listened to speech from either one or two speakers, which sometimes contained semantic incongruencies.

      The stated aim is to replicate two previous findings by this group: (1) that there is "ocular speech tracking" (that eye-movements track the audio of the speech), (2) that individual differences in neural response to tones that are predictable vs. not-predictable in their pitch is linked to neural response to speech. In addition, here they try to link the above two effects to each other, and to link "attention, prediction, and active sensing".

      Strengths:

      This is an ambitious project, that tackles an important issue and combines different sources of data (neural data, eye-movements, individual differences in another task) in order to obtain a comprehensive "model" of the involvement of eye-movements in sensory processing.

      The authors use many adequate methods and sophisticated data-analysis tools (including MEG source analysis and multivariate statistical models) in order to achieve this.

      Weaknesses:

      Although I sympathize with the goal of the paper and agree that this is an interesting and important theoretical avenue to pursue, I am unfortunately not convinced by the results and find that many of the claims are very weakly substantiated in the actual data.

      Since most of the analyses presented here are derivations of statistical models and very little actual data is presented, I found it very difficult to assess the reliability and validity of the results, as they currently stand. I would be happy to see a thoroughly revised version, where much more of the data is presented, as well as control analyses and rigorous and well-documented statistical testing (including addressing multiple comparisons).

      We thank you for your thoughtful feedback. We appreciate your concerns and will address them below in greater detail.

      These are the main points of concern that I have regarding the paper, in its current format.

      (1) Prediction tendencies - assessed by listening to sequences of rhythmic tones, where the pitch was either "predictable" (i.e., followed a fixed pattern, with 25% repetition) or "unpredictable" (no particular order to the sounds). This is a very specific type of prediction, which is a general term that can operate along many different dimensions. Why was this specific design selected? Is there theoretical reason to believe that this type of prediction is also relevant to "semantic" predictions or other predictive aspects of speech processing?

      Theoretical assumptions and limitations of our quantification of individual prediction tendency are now briefly summarized in the first paragraph of our discussion section. With this paradigm we focus on anticipatory “top-down” predictions while controlling for potentially confounding “bottom-up” processes. Since this study aimed to replicate our previous work, we chose the same entropy-modulation paradigm as in other studies from our group (e.g. Demarchi et al. 2019; Schubert et al. 2023, 2024; Reisinger et al. 2024), which has yielded reproducible findings of feature-specific pre-activations of sounds in a low-entropy context. One advantage of this design is that it allows us to directly compare the processing of “predictable” and “unpredictable” sounds of the same frequency in a time-resolved manner (this argument is now also included in the Methods section).

      Regarding the question to what extent this type of prediction might also be relevant to “semantic” predictions we would like to refer to our previous study (Schubert et al., 2023), where we explicitly looked at the interaction between individual prediction tendency and encoding of semantic violations in the cortex. (In short, there we found a spatially dissociable interaction effect, indicating an increased encoding of semantic violations that scales with prediction tendency in the left hemisphere, as well as a disrupted encoding of semantic violations for individuals with stronger prediction tendency in the right hemisphere.) We did not aim to replicate all our findings in the current study, but instead we focused on merging the most important results from two independent phenomena in the domain of speech processing and bringing them into a common framework. However, as now stated in our discussion, we believe that “predictions are directly linked to the interpretation of sensory information. This interpretation is likely to occur at different levels along the cognitive (and anatomical) hierarchy…” and that “this type of prediction is relevant for acoustic processing such as speech and music, whose predictability unfolds over time.”

      (2) On the same point - I was disappointed that the results of "prediction tendencies" were not reported in full, but only used later on to assess correlations with other metrics. Even though this is a "replication" of previous work, one would like to fully understand the results from this independent study. On that note, I would also appreciate a more detailed explanation of the method used to derive the "prediction tendency" metric (e.g, what portion of the MEG signal is used? Why use a pre-stimulus and not a post-stimulus time window? How is the response affected by the 3Hz steady-state response that it is riding on? How are signals integrated across channels? Can we get a sense of what this "tendency" looks like in the actual neural signal, rather than just a single number derived per participant (an illustration is provided in Figure 1, but it would be nice to see the actual data)? How is this measure verified statistically? What is its distribution across the sample? Ideally, we would want enough information for others to be able to replicate this finding).

      We now included a new figure (similar to Schubert et al. 2023) showing the interim results of the “prediction tendency” effect as well as individual prediction tendency values of all subjects.

      Furthermore, we expanded the description of the “prediction tendency” metric in the Methods section, where we explain our analytical choices in more detail. In particular, we used a pre-stimulus time window in order to capture “anticipatory predictions”; the temporally predictable design gives us the opportunity to capture this type of prediction. The integration across channels is handled by the multivariate pattern analysis (MVPA), which inherently integrates multidimensional data (as mentioned in the Methods section, we used data from 102 magnetometers) and links it to (in this case) categorical information.
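      To make the logic of a pre-stimulus decoding measure concrete, here is a minimal sketch on synthetic data. It is not the authors' pipeline: a nearest-centroid classifier stands in for the MVPA step, and the scoring rule (how much more often pre-stimulus activity in ordered sequences is classified as the expected next tone than in random sequences), together with all function and variable names, is an illustrative assumption.

```python
import numpy as np

def fit_centroids(X, y):
    """Nearest-centroid classifier, a simple stand-in for the MVPA step.
    X: trials x channels, y: integer tone labels (e.g. four carrier frequencies)."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(classes, centroids, X):
    # Assign each trial to the class with the closest centroid (squared Euclidean distance).
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return classes[np.argmin(d, axis=1)]

def prediction_tendency(X_train, y_train, X_pre_ordered, expected_ordered,
                        X_pre_random, expected_random):
    """Hypothetical score: train on post-stimulus data from random sequences,
    apply to pre-stimulus data, and compare how often the expected next tone
    is 'decoded' before it is played in ordered vs. random sequences."""
    classes, centroids = fit_centroids(X_train, y_train)
    hit_ordered = np.mean(predict(classes, centroids, X_pre_ordered) == expected_ordered)
    hit_random = np.mean(predict(classes, centroids, X_pre_random) == expected_random)
    return hit_ordered - hit_random
```

      A positive score indicates that, before a tone is played, the activity pattern already resembles the tone the context makes most probable; in random sequences no such pre-activation is expected, so that condition serves as the baseline.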

      (3) Semantic violations - half the nouns ending sentences were replaced to create incongruent endings. Can you provide more detail about this - e.g., how were the words selected? How were the recordings matched (e.g., could they be detected due to audio editing?)? What are the "lexically identical controls that are mentioned"? Also, is there any behavioral data to know how this affected listeners? Having so many incongruent sentences might be annoying/change the nature of listening. Were they told in advance about these?

      We expanded the Methods section and included the missing information: 

      “We randomly selected half of the nouns that ended a sentence (N = 79) and replaced them with the other half to induce unexpected semantic violations. The nouns were swapped in the written script before the audio material was recorded, in order to avoid any effects of audio clipping. Narrators were aware of the semantic violations and had been instructed to read out the words as normal. Consequently, all target words occurred twice in the text within each trial, once in a natural context (serving as lexical controls) and once in a mismatched context (serving as semantic violations), resulting in two sets of lexically identical words that differed greatly in their contextual probabilities (see Figure 1F for an example). Participants were unaware of these semantic violations.” Since we replaced only 79 words with semantic violations in a total of ~24 minutes of audio material, we believe that natural listening was not impaired. In fact, none of the participants reported having noticed the semantic violations during debriefing (even though the violations had an effect on speech tracking in the brain).

      (4) TRF in multi-speaker condition: was a univariate or multivariate model used? Since the single-speaker condition only contains one speech stimulus - can we know if univariate and multivariate models are directly comparable (in terms of variance explained)? Was any comparison to permutations done for this analysis to assess noise/chance levels?

      For mTRF models, whether the model is comparable to a univariate model depends on its direction (“encoding” vs. “decoding”). In our case of an encoding model, the TRFs are fitted to each MEG channel independently. This allows us to explore the effect over different areas (whereas a multivariate “decoding” model would yield only one speech reconstruction value).
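      The point that an encoding model is fitted to each channel independently can be illustrated with a ridge-regression TRF. The sketch below is a NumPy stand-in for the mTRF toolbox (Crosse et al., 2016), not the toolbox itself; the lag range, the regularization parameter `alpha` and the function names are assumptions. Because the ridge solution treats each column of the response matrix separately, fitting all channels at once gives the same TRFs as fitting them one at a time.

```python
import numpy as np

def lagged_design(stim, lags):
    """Time-lagged copies of a 1-D stimulus feature (e.g. the speech envelope).
    Column j holds stim delayed by lags[j] samples (zero-padded at the edges)."""
    n = len(stim)
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = stim[:n - lag]
        else:
            X[:lag, j] = stim[-lag:]
    return X

def fit_trf(X, Y, alpha):
    """Ridge solution W = (X'X + alpha*I)^-1 X'Y for Y of shape time x channels.
    Column k of W is the TRF of channel k and depends only on Y[:, k]."""
    XtX = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ Y)
```

      With a single input feature (the attended envelope), `W` has one TRF per channel, which is what makes channel-wise topographies of the tracking effect possible.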

      In both conditions (single- and multi-speaker), a single input feature (the envelope of the attended speech stream) was used. Of course, it would be possible to fit a multivariate encoding model that predicts the brain’s response to the total sound input. This would, however, address a slightly different question, as we aimed to investigate how much of the attended speech is tracked.

      Regarding your suggestion of a comparison to permutations to assess noise levels, we would like to point out that we chose the same methodological approach as in our previous studies, which we aimed to replicate here. In these original studies, no permuted versions were used, with the exception of the mediation analysis (where comparing a model with an additional input predictor to a single-predictor model would not be a fair comparison). We conducted the mTRF analysis following the guidelines of Crosse et al. (2016) to the best of our knowledge and in accordance with similar studies in this field.

      Crosse, M. J., Di Liberto, G. M., Bednar, A., & Lalor, E. C. (2016). The multivariate temporal response function (mTRF) toolbox: a MATLAB toolbox for relating neural signals to continuous stimuli. Frontiers in human neuroscience, 10, 604.

      (5) TRF analysis at the word level: from my experience, 2-second segments are insufficient for deriving meaningful TRFs (see for example the recent work by Mesik & Wojtczak). Can you please give further details about how the analysis of the response to semantic violations was conducted? What was the model trained on (the full speech or just the 2-second long segments?) Is there a particular advantage to TRFs here, relative - say - to ERPs (one would expect a relatively nice N400 response, not)? In general, it would be nice to see the TRF results on their own (and not just the modulation effects).

      We fully agree with the reviewer’s statement that 2-second segments would have been too short to derive meaningful TRFs. To investigate the effect of semantic violations, we used the same TRFs trained on the whole dataset (with 4-fold cross-validation). The resulting true and predicted data were then segmented into 2-second single-word epochs. We selected the semantic violations as well as their lexically identical controls and correlated true with predicted responses for every word. Thus, we conducted the same analysis as for the overall encoding effect, focusing on only part of the data. We have reformulated the Methods section accordingly to clear up this misunderstanding. Since these TRFs are identical to the standard TRFs from the overall neural speech tracking analysis, they are not informative about the semantic violation effect. However, since the mTRF approach is the key method throughout the manuscript (and our main focus is not on brain responses to semantic violations), we favoured this approach over a classical ERF analysis.
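      The word-level procedure described above (one TRF trained on the whole recording, then true and predicted responses cut into 2-second word epochs and correlated) could look roughly like the sketch below. The sampling rate, the onset format and the function name are hypothetical, and every epoch is assumed to lie fully inside the recording.

```python
import numpy as np

def word_level_tracking(y_true, y_pred, onsets, fs, epoch_s=2.0):
    """Pearson r between true and predicted responses within each word epoch.
    y_true, y_pred: time x channels, both from a TRF fitted on the whole recording.
    onsets: word onset samples. Returns an array of shape words x channels."""
    n = int(round(epoch_s * fs))
    out = []
    for on in onsets:
        t = y_true[on:on + n]
        p = y_pred[on:on + n]
        # z-score each channel within the epoch; the mean product is Pearson's r
        t = (t - t.mean(axis=0)) / t.std(axis=0)
        p = (p - p.mean(axis=0)) / p.std(axis=0)
        out.append((t * p).mean(axis=0))
    return np.array(out)
```

      Averaging the rows for semantic violations and for their lexically identical controls, e.g. `r[is_violation].mean(0) - r[~is_violation].mean(0)` with a hypothetical boolean mask `is_violation`, then gives a per-channel violation effect without refitting any model on short segments.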

      (6) Another related point that I did not quite understand - is the dependent measure used for the regression model "neural speech envelope tracking" the r-value derived just from the 2sec-long epochs? Or from the entire speech stimulus? The text mentions the "effect of neural speech tracking" - but it's not clear if this refers to the single-speaker vs. two-speaker conditions or to the prediction manipulation. Or is it different in the different analyses? Please spell out exactly what metric was used in each analysis.

      As suggested we now provide a clear definition of each dependent metric for each analysis.

      “Neural speech tracking” refers to the correlation coefficients between predicted and true brain responses from the aforementioned encoding model, trained and tested on the whole audio material within condition (single vs. multi-speaker).
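      A minimal sketch of this definition, assuming contiguous folds, a ridge TRF refitted on the three training folds, and averaging of the per-fold Pearson r values; the fold layout, the averaging rule and `alpha` are our assumptions rather than documented settings. `X` is a time-by-lags design matrix of the attended envelope and `Y` the time-by-channels MEG data of one condition.

```python
import numpy as np

def crossval_tracking(X, Y, alpha, n_folds=4):
    """Cross-validated encoding accuracy: each contiguous fold is predicted by a
    ridge TRF trained on the remaining folds. Returns the mean Pearson r per channel."""
    idx = np.arange(len(X))
    rs = []
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        A = X[train].T @ X[train] + alpha * np.eye(X.shape[1])
        W = np.linalg.solve(A, X[train].T @ Y[train])
        pred = X[test] @ W
        # Pearson r per channel between held-out data and its prediction
        zt = (Y[test] - Y[test].mean(axis=0)) / Y[test].std(axis=0)
        zp = (pred - pred.mean(axis=0)) / pred.std(axis=0)
        rs.append((zt * zp).mean(axis=0))
    return np.mean(rs, axis=0)
```

      Because every r value is computed on data the model has not seen, a channel that is unrelated to the envelope yields values near zero rather than inflated in-sample fits.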

      Recommendations for the authors:

      Reviewing Editor Comments:

      The reviewers have provided a number of recommendations to improve the manuscript, particularly requesting that more data be reported, with an emphasis on the measurements themselves (eye movements and TRFs) rather than just the numerical outputs of mathematical models.

      We appreciate all the reviewers' and editor’s comments and effort to improve our manuscript. In the revised version we provide interim findings and missing data, updated figures that include an intuitive illustration of the metrics (such as TRFs), and a thoroughly revised discussion section that focuses on the relationship between our observed quantities and theoretical entities. We now offer operationalized definitions of the relevant concepts (“prediction tendency”, “active ocular sensing” and “selective attention”) and suggest how these entities might be related in the context of speech processing, based on the current findings. We are confident that this revision has considerably improved the quality of our paper, and we are grateful for all the feedback and suggestions.

      Reviewer #1 (Recommendations for the authors):

      (1) Participants had to fixate throughout the tasks. How did the authors deal with large eye movements that violated the instructed fixation?

      As described in the Methods section: “Participants were instructed to look at a black fixation cross at the center of a grey screen.” This instruction was not intended to enforce strict fixation but rather to provide a general reference point, encouraging participants to keep their gaze on the grey screen and avoid freely scanning the room or closing their eyes. Unlike trial-based designs, where strict fixation is feasible due to shorter trial durations, this approach did not impose rigid fixation requirements. Consequently, the threshold for "instruction violation" was inherently more flexible, and no additional preprocessing was applied to the gaze vectors.

      Fixating for such an extended period of time (1.5 hours?) is hard. Did fixation behavior change over time? Could (fixation) fatigue affect the correlations between eye movements and speech tracking? For example, fatigued participants had to correct their fixation more often and this drives, in part, the negative correlation with comprehension?

      Yes, participants spent approximately 2 hours in the MEG, including preparation time (~30 minutes). However, participants were given opportunities to rest their eyes between different parts and blocks of the experiment (e.g., resting state, passive listening, and audiobook blocks), which should help mitigate fatigue to some extent.

      That said, we agree that it is an intriguing idea that fatigue could drive the ocular speech tracking effect, with participants potentially needing to correct their gaze more as the experiment progresses. However, our analysis suggests this is unlikely for several reasons:

      (1) Cross-validation in encoding models: Ocular speech tracking effects were calculated using a 4-fold cross-validation approach (this detail has now been added to the Methods section; please see our response to public review #3). This approach reduces the influence of potential increases in gaze corrections over time, as the models are trained and validated on independent data splits.  Moreover, if there were substantial differences in underlying response magnitudes between folds - for instance, between the first and fourth fold - this would likely compromise the TRF's ability to produce valid response functions for predicting the left-out data. Such a scenario would not result in significant tracking, further supporting the robustness of the observed effects.

      (2) TRF time-course stability: If fatigue were driving increased gaze corrections, we would expect this to be reflected in a general offset (capturing the mean difference between folds) in the TRF time-courses shown in Figure 4 (right panel). However, no such trend or offset is evident.

      (3) Comparison of eye movement data: To directly investigate this possibility, we compared the total amount of eye movement between the first and last blocks for both the single- and multi-speaker conditions. Total movement was computed by first taking the differences in pixel values between consecutive eye positions on both the x- and y-axes. The Euclidean distance was then computed for each difference, providing a measure of movement between successive time points. Summing these distances yielded the total movement for each block. Statistical analysis was performed separately for the single-speaker (ASS) and multi-speaker (AMS) conditions. For each condition, paired comparisons were made between the first and last blocks (we resorted to non-parametric tests if assumptions of normality were violated):

      For the single-speaker condition (ASS), the normality assumption was not satisfied (p ≤ 0.05, Kolmogorov-Smirnov test). Consequently, a Wilcoxon signed-rank test was conducted, which revealed no significant difference in total movements between the first and last blocks (z = −1.330, p = 0.184). For the multi-speaker condition (AMS), the data met the normality assumption (p > 0.05), allowing the use of a paired t-test. The results showed no significant difference in total movements between the first and last blocks (t = −0.184, p = 0.855).
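      The total-movement measure described above reduces to the summed Euclidean length of the gaze path within a block. A minimal sketch, assuming gaze coordinates in pixels sampled at a constant rate and already free of missing samples (the function name is hypothetical):

```python
import numpy as np

def total_movement(x, y):
    """Total gaze path length in pixels for one block: the sum of Euclidean
    distances between consecutive gaze samples."""
    dx = np.diff(np.asarray(x, dtype=float))
    dy = np.diff(np.asarray(y, dtype=float))
    return float(np.sum(np.hypot(dx, dy)))
```

      The per-participant totals for the first and last block can then be compared with a paired test such as `scipy.stats.ttest_rel` or, when normality is violated, `scipy.stats.wilcoxon`.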

      The results are visualized in a bar plot (see below), where individual data points are displayed alongside the mean and standard error for each block. Statistical annotations indicate that neither condition demonstrated significant differences between the blocks. These findings suggest that total eye movements remained stable across the experimental conditions, regardless of whether participants were exposed to a single or multiple speakers.

      Author response image 1.

      (4) Behavioral responses: Participants’ behavioral responses did not indicate any decrease in comprehensibility for later blocks compared to earlier ones. Specifically, a comparison of comprehension scores between the first and last blocks revealed no significant difference in either the single-speaker condition (ASS; Wilcoxon signed-rank test Z=−0.5911, p=0.5545) or the multi-speaker condition (AMS; Wilcoxon signed-rank test: Z=0.5018, p=0.6158). These findings suggest that participants maintained consistent levels of comprehension throughout the experiment, regardless of the condition or block order. The results are visualized in a bar plot (see below), where individual data points are displayed alongside the mean and standard error for each block. Statistical annotations indicate that neither condition demonstrated significant differences between the blocks.

      Author response image 2.

      Together, these factors suggest that fatigue is unlikely to be a significant driver of the ocular speech tracking effects observed in this study.

      (2) The authors should provide descriptive statistics of fixation behavior /fixational eye movements. What was the frequency and mean direction of microsaccades, do they follow the main sequence, etc., quantify drift and tremor?

      Thank you for this suggestion regarding descriptive statistics. To address it, we computed the rates of microsaccades (extracted using the microsaccade detection algorithm proposed in Liu, B., Nobre, A. C. & van Ede, F. Functional but not obligatory link between microsaccades and neural modulation by covert spatial attention. Nat. Commun. 13, 3503 (2022)) and of fixations, as these metrics are directly relevant to our study and to the requests above.
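      For readers unfamiliar with how such rates are obtained, the sketch below counts events with a deliberately simplified velocity-threshold rule and converts the count to Hz. It is a stand-in, not the Liu et al. (2022) algorithm used here; the threshold factor `k`, the merging window `min_gap_s` and the function name are assumptions.

```python
import numpy as np

def microsaccade_rate(x, y, fs, k=5.0, min_gap_s=0.1):
    """Simplified velocity-threshold detector (NOT the Liu et al., 2022 method).
    Flags onsets where gaze speed exceeds k times its median, merges onsets
    closer than min_gap_s, and returns the number of events per second (Hz)."""
    speed = np.hypot(np.diff(x), np.diff(y)) * fs  # pixels per second
    above = speed > k * np.median(speed)
    # an onset is a supra-threshold sample whose predecessor was below threshold
    onsets = np.flatnonzero(above & ~np.concatenate(([False], above[:-1])))
    kept = []
    for on in onsets:
        if not kept or on - kept[-1] >= min_gap_s * fs:
            kept.append(on)
    duration_s = len(x) / fs
    return len(kept) / duration_s
```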

      Microsaccade Rates:

      - Single-speaker condition: Mean = 2.306 Hz, SD = 0.363 Hz.
      - Multi-speaker condition: Mean = 2.268 Hz, SD = 0.355 Hz.

      Fixation Rates:

      - Single-speaker condition: Mean = 2.858 Hz, SD = 1.617 Hz.
      - Multi-speaker condition: Mean = 2.897 Hz, SD = 1.542 Hz.

      These values fall within the expected ranges reported in the literature (fixation rates: 2–4 Hz, microsaccade rates: ~0.5–2.5 Hz) and serve as a sanity check, confirming the plausibility of our eye-tracking data. Regarding the reviewer’s request for additional metrics (e.g., microsaccade direction, main sequence analysis, drift, and tremor), extracting these features would require advanced algorithms and analyses not supported by our current preprocessing pipeline or dataset. We hope that the provided metrics, which were the main focus of this study, serve as a sufficient sanity check and highlight the robustness of our data.

      Related to this, I am wondering whether microsaccades are the feature that drives speech tracking.

      This is an important and pressing question that we aim to address in future publications. Currently, our understanding - and the reason microsaccades and blinks are not analysed in this manuscript - is limited by methodological constraints. Specifically, microsaccades are binary response vectors, which are not compatible with TRF analyses. Addressing this would require adapting future models to handle time-continuous binary response data or exploring alternative approaches, such as regression-based ERFs (for example as in Heilbron et al. 2022). As the primary goal of this manuscript was to replicate the findings of Gehmacher et al. (2024) using similar methods and to integrate these findings into an initial unified framework, we did not investigate additional eye movement features here. However, we agree that microsaccades (and also blinks, see below) likely contribute, at least in part, to the observed ocular speech tracking effects, and we now suggest this in the Discussion:

      “Relatedly, it remains an open question whether microsaccades are a key feature driving ocular speech tracking. However, our current study does not analyze microsaccades due to methodological constraints: microsaccades are binary response vectors, which are incompatible with TRF analyses used here. Addressing this would require adapting models to handle time-continuous binary response data or potentially exploring alternative approaches, such as regression-based ERFs (e.g., as in Heilbron et al., 2022). While these limitations preclude microsaccade analysis in the current study, we hypothesize that they could enhance temporal precision and selectively amplify relevant sensory input, supporting auditory perception. Future studies should explore this possibility to uncover the specific contributions of microsaccades to speech tracking.”

      (3) Can the authors make sure that interpolated blinks did not drive any of the effects? Can interpolated blink trials be excluded?

      Using continuous audiobooks as stimuli meant that we could not exclude blink periods from the analysis without introducing substantial discontinuity artifacts in the TRF analysis. Importantly, the concept of covert motor routines and active sensing suggests that participants engage more strongly in motor routines - including ocular behaviors such as microsaccades and blinks - during tasks like speech tracking. These motor routines are inherently tied to individual gaze patterns, making microsaccades and blinks correlated with other ocular behaviors. This complicates efforts to disentangle their individual contributions to the observed ocular speech tracking effects.

      Engagement in these motor routines, as posited by active sensing, would naturally load onto various viewing behaviors, further intertwining their roles.

      Even if we were to examine correlations, such as between the number of blinks and the ocular speech tracking effect, this would be unlikely to provide a clearer understanding because of these inherent overlaps. The methodological and conceptual challenge lies in distinguishing these features from one another and understanding their respective roles in driving the observed effects.

      However, the aim of this manuscript was not to dissect the ocular speech tracking effect in greater detail, but rather to relate it - based on analytical choices similar to those in Gehmacher et al. (2024) - to prediction tendencies, attention, and neural speech tracking. While it will be crucial in future work to differentiate these patterns and their connections to diverse cognitive processes, it is beyond the scope of this study to address all these questions comprehensively.

      We acknowledge that eye movements, including microsaccades and blinks (however, see challenges for this in response 2), remain underexplored in many experimental paradigms. Their interplay with cognitive processes - such as attention, prediction, and sensory integration - will undoubtedly be an important focus for future studies. 

      (4) Could the authors provide more details on how time shuffling was done for the eye-movement predictor, and include a circularly shifted version (or a version that does not destroy temporal contiguity) in their model comparisons? Some types of shuffling can result in unrealistic time series, which would end up in an unfair comparison with the model that has the real eye movement traces as predictors.

      We thank the reviewer for their insightful question regarding the time-shuffling procedure for the eye-movement predictor and for suggesting the inclusion of a circularly shifted version in our model comparisons. Below, we provide further details about our approach and the rationale behind it:

      (1) Random Shuffling: In our analysis, the eye-movement predictor was randomly shuffled over time, meaning that the order of its individual samples was randomly permuted. This method completely disrupts the temporal structure of the signal, providing a null model that directly tests whether the observed temporal mediation is due to the specific temporal relationship between ocular movements and envelope tracking.

      (2) Circular Shifting: While circular shifting maintains temporal contiguity, it introduces certain challenges in the context of TRF analysis. Specifically:

      - Adaptation to Shifts: The TRF model could adapt to the introduced shift, potentially reducing the validity of the null comparison.

      - Similarity due to Repetition: The broadband envelope exhibits strong repetitive patterns over time, such as rhythms inherent to speech. Circular shifting can therefore produce predictors that are very similar to the original signal. As a result, this similarity may lead to null distributions that do not adequately disrupt the temporal mediation we aim to test, making it less robust as a control.

      (3) Rationale for Random Shuffling: The primary goal of our mediation analysis is to determine whether there is a temporal mediation of envelope tracking by ocular movements. By deliberately destroying the temporal structure through random shuffling, we ensure that the null model tests for the specific temporal relationship that is central to our hypothesis. Circularly shifted predictors, on the other hand, may partially preserve temporal dependencies, making them less suitable for this purpose.
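      The difference between the two null models can be shown on a toy rhythmic signal (not the authors' data). Random shuffling removes the sample-to-sample dependence, whereas a circular shift keeps it intact; for a strictly periodic signal, a shift by exactly one period even reproduces the original, which is the "similarity due to repetition" concern above in its most extreme form. Function names are illustrative.

```python
import numpy as np

def random_shuffle(x, rng):
    """Null predictor by permutation: destroys all temporal structure."""
    return rng.permutation(x)

def circular_shift(x, shift):
    """Alternative null: rotate the series, which preserves its autocorrelation."""
    return np.roll(x, shift)

def lag1_autocorr(x):
    """Lag-1 autocorrelation as a simple index of temporal structure."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.sum(x[1:] * x[:-1]) / np.sum(x * x))
```

      On a 4 Hz sinusoid sampled at 100 Hz, the lag-1 autocorrelation stays near 0.97 after any circular shift but drops to about zero after shuffling, and `circular_shift(x, 25)` returns the original signal.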

      In summary, while circular shifting is a valuable approach in other contexts, it is less appropriate for the specific goals of this study. We hope this explanation clarifies our methodological choices and demonstrates their alignment with the aims of our analysis.

      (5) Replication: I want to point out that it is great that the previous findings were in principle replicated. However, I would like to suggest a more nuanced evaluation of the replication:

      a) Instead of a (direct) replication, the present study should be called a 'conceptual replication', since modifications in design and procedure were made.

      Thank you very much for this suggestion! We now use the term ‘conceptual replication’ throughout the manuscript.

      b) Not all the findings from the Gehmacher et al., 2024 study were replicated to a full extent:

      Did the authors find indications of a vertical vs. horizontal tracking difference in the Gehmacher 2024 data? Could they check this in the Gehmacher 2024 data?

      The findings for horizontal and vertical gaze tracking in Gehmacher et al. (2024) are detailed in the supplementary material of that publication. Both single-speaker and multi-speaker target conditions showed significant speech tracking effects in both horizontal and vertical directions. However, there was a slightly stronger tracking effect for the single-speaker condition in the vertical direction. The highly predictable structure of the five-word sequences in Gehmacher et al. (2024) probably boosted the effects overall compared to continuous audiobook listening, which likely led to the differentiation between horizontal and vertical gaze. See the figures in the supplementary material of Gehmacher et al. (2024) for reference.

      c) Another difference between their previous and this study is the non-existent tracking of the multi-speaker distractor in this study. The authors should point this out clearly in the discussion and potentially provide an explanation.

      Thank you for highlighting this point! We now address this in the discussion:

      “Importantly, in contrast to Gehmacher et al. (2024), we did not observe ocular tracking of the multi-speaker distractor in this study. This difference is likely attributable to the simplistic single-trial, 5-word task structure in Gehmacher et al., which resulted in high temporal overlap between the target and distractor speech streams and likely drove the significant distractor-tracking effects observed in that study. The absence of such an effect during continuous listening in our study suggests that ocular tracking is indeed more specific to selective attention.”

      Minor:

      (1) I was a little surprised to not see an indication of eyes/eye movements in Figure 6. The intention of the authors might have been to create a general schematic illustration, but I find this a bit misleading. This paper provides nice evidence for a specific ocular effect in speech tracking. There is, to my knowledge, no indication that speech would be influenced by different kinds of active sensing (if there are, please include them in the discussion). Given that the visuomotor system is quite dominant in humans, it might actually be the case that the speech tracking the authors describe is specifically ocular.

      Taking into account all the reviewers' remarks on the findings and interpretations, we have updated this figure (now Fig. 7) in the manuscript to make it more specific and aligned with the revised discussion section. Throughout the manuscript, we now explicitly refer to active ocular sensing in relation to speech processing and have avoided the broader term 'active sensing' in this context. We hope these revisions address the concerns raised.

      (2) I find the part in the discussion (page 2, last paragraph) on cognitive processes hard to follow. I don't agree that 'cognitive processes' are easily separable from any of the measured responses (eye and brain). Referring to the example they provide, there is evidence that eye movements are correlated with brain activity that is correlated with memory performance. How, and more importantly, why would one separate those?

      Thank you for raising this important point. We have carefully considered your comments, particularly regarding the interplay between cognitive processes and measured responses (eye and brain), as well as the challenge of conceptually separating them. Additionally, we have combined Reviewer #2's query (13) with this point into a single, complementary line of reasoning. In response, we have rewritten the relevant paragraph in the Discussion to explain more clearly and in more detail how ocular and neural responses contribute interdependently to speech processing. We hope this revision addresses your concerns and offers a more precise and coherent discussion of this topic:

      “Despite the finding that eye movements mediate neural speech tracking, the behavioural relevance for semantic comprehension appears to differ between ocular and neural speech tracking. Specifically, we found a negative association between ocular speech tracking and comprehension, indicating that participants with lower comprehension performance exhibited increased ocular speech tracking. Interestingly, no significant relationship was observed between neural tracking and comprehension.

      In this context, the negative association between ocular tracking and comprehension might reflect individual differences in how participants allocate cognitive resources. Participants with lower comprehension may rely more heavily on attentional mechanisms to process acoustic features, as evidenced by increased ocular tracking. This reliance could represent a compensatory strategy when higher-order processes, such as semantic integration or memory retrieval, are less effective. Importantly, our comprehension questions (see Experimental Procedure) targeted a broad range of processes, including intelligibility and memory, suggesting that this relationship reflects a trade-off in resource allocation between low-level acoustic focus and integrative cognitive tasks.

      Rather than separating eye and brain responses conceptually, our analysis highlights their complementary contributions. Eye movements may enhance neural processing by increasing sensitivity to acoustic properties of speech, while neural activity builds on this foundation to integrate information and support comprehension. Together, these systems form an interdependent mechanism, with eye and brain responses working in tandem to facilitate different aspects of speech processing.

      This interpretation is consistent with the absence of a difference in ocular tracking for semantic violations (e.g., words with high surprisal versus lexically matched controls), reinforcing the view that ocular tracking primarily reflects attentional engagement with acoustic features rather than direct involvement in semantic processing. This aligns with previous findings that attention modulates auditory responses to acoustic features (e.g., Forte et al., 2017), further supporting the idea that ocular tracking reflects mechanisms of selective attention rather than representations of linguistic content.

      Future research should investigate how these systems interact and explore how ocular tracking mediates neural responses to linguistic features, such as lexical or semantic processing, to better understand their joint contributions to comprehension.”

      (3) Attention vs. predictive coding. I think the authors end up with an elegant description of the observed effects, "as an "active sensing" mechanism that implements the attentional optimization of sensory precision." However, I feel the paragraph starts with the ill-posed question "whether ocular speech tracking is modulated not by predictive, but other (for example attentional) processes". If ocular tracking is the implementation of a process (optimization of sensory precision, aka attention), how could it be at the same time modulated by that process? In my opinion, adding the notion that there is a modulation by a vague cognitive concept like attention on top of what the paper shows does not improve our understanding of how speech tracking in humans works.

      Thank you for raising this point. We agree that it is critical to clarify the relationship between ocular speech tracking, attention, and predictive processes, and we appreciate the opportunity to refine this discussion.  

      To avoid the apparent contradiction that active ocular sensing is, on the one hand, an implementation of selective attention and, on the other hand, modulated by it, we now use the formulation “ocular speech tracking reflects attentional mechanisms rather than predictive processes.”

      To address your concern that the conceptualization of attention seems rather vague, we have revised the whole paragraph to redefine the theoretical entities in question (including selective attention) and to provide a clearer and more precise picture (see also our revised version of Fig. 6, now Fig. 7). We now highlight the distinct yet interdependent roles of selective attention and individual prediction tendencies in speech tracking:

      “With this speculative framework we attempt to describe and relate three important phenomena with respect to their relevance for speech processing: 1) “Anticipatory predictions” that are created in the absence of attentional demands and contain probabilistic information about stimulus features (here, inferred from frequency-specific pre-activations during passive listening to sound sequences). 2) “Selective attention” that allocates resources towards relevant (whilst suppressing distracting) information (which was manipulated by the presence or absence of a distractor speaker). And finally 3) “active ocular sensing”, which refers to gaze behavior that is temporally aligned to attended (but not unattended) acoustic speech input (inferred from the phenomenon of ocular speech tracking). We propose that auditory inflow is, at a basic level, temporally modulated via active ocular sensing, which “opens the gates” in the sensory periphery at relevant timepoints. How exactly this mechanism is guided (for example, where the information about crucial timepoints comes from, if not from prediction, and whether it requires habituation to a speech stream, etc.) is as yet unclear. Unlike predictive tendencies, active ocular sensing appears to reflect selective attention, manifesting as a mechanism that optimizes sensory precision. Individual differences in anticipatory predictions, on the other hand, seem to be independent of the other two entities, but are nevertheless relevant for speech processing. We therefore support the notion that representational content is interpreted based on prior probabilistic assumptions. If we consider the idea that “a percept” of an (auditory) object is actually temporally and spatially distributed (across representational spacetime; see Fig. 7), the content of information depends on where and when it is probed (see, for example, Dennett, 1991, for similar ideas on consciousness).
Having to select from multiple interpretations across space and time requires a careful balance between the weighting of internal models and the allocation of resources based on current goals. We suggest that in the case of speech processing, this challenge results in an independent adaptation of feature-based precision-weighting by predictions on the one hand and temporal precision-weighting by selective attention on the other.”

      Reviewer #2 (Recommendations for the authors):

      My main recommendation is outlined in the Weaknesses above: the overarching rationale for many analysis choices should be made explicit, and intermediate results should be shown where appropriate, so the reader can follow what is being quantified and what the results truly mean. Specifically, I recommend to pay attention to the following (in no particular order):

      (1) Define 'neural speech tracking' early on. (e.g.: 'The amount of information in the MEG signal that can multivariately be explained by the speech amplitude envelope.' (is that correct?))

      Thank you for pointing out that this important definition is missing. It is now defined at the first mention in the Introduction as follows: “Here (and in the following) “neural speech tracking” refers to a correlation coefficient between actual brain responses and responses predicted from an encoding model based solely on the speech envelope”.
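      To make this definition concrete, the sketch below computes a tracking value of this kind for a single response channel (an MEG sensor, or equally a horizontal or vertical gaze trace). It is a minimal illustration under stated assumptions, not the authors' pipeline: the manuscript estimates TRFs by boosting, whereas this sketch swaps in ridge regression on time-lagged copies of the envelope. The function names, the lag range and the regularisation value `alpha` are hypothetical, and the correlation is computed in-sample for brevity, whereas a real analysis would evaluate it on held-out data.

```python
import numpy as np


def lagged_design(envelope, lags):
    """Design matrix with one time-shifted copy of the envelope per lag (in samples)."""
    n = len(envelope)
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = envelope[:n - lag]  # response at t depends on envelope at t - lag
        else:
            X[:lag, j] = envelope[-lag:]     # negative lag: envelope after the response
    return X


def speech_tracking(envelope, response, lags, alpha=1.0):
    """Fit a ridge-regression TRF (hypothetical stand-in for boosting) and return (trf, r).

    trf is the filter kernel h; r is the Pearson correlation between the observed
    response and the response predicted by convolving the envelope with h.
    Signals are assumed to be zero-mean, so no intercept is fitted.
    """
    X = lagged_design(envelope, lags)
    trf = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ response)
    predicted = X @ trf
    r = np.corrcoef(predicted, response)[0, 1]
    return trf, r
```

      In this sketch, `r` plays the role of the “neural speech tracking” (or, for gaze traces, “ocular speech tracking”) value that enters the Bayesian regression models, and `trf` corresponds to the time-resolved filter kernel.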

      (2) Same for 'ocular speech tracking'. Here even reading the Methods does not make it unambiguous how this term is used.

      It is now defined at the first mention in the Introduction as follows: “Ocular speech tracking” (similarly to “neural speech tracking”) refers to the correlation coefficient between actual eye movements and movements predicted from an encoding model based on the speech envelope”.

      In addition, we now also define both metrics (neural and ocular speech tracking) in the Methods section.

      (3) Related to this: for ocular speech tracking, are simply the horizontal and vertical eye traces compared to the speech envelope? If so, this appears somewhat strange: why should the eyes move more rightward/upward with a larger envelope? And the direction here depends on the (arbitrary) sign of right = positive, etc. (It would make more sense to quantify 'amount of movement' in some way, but if this is done, I missed it in Methods.)

      Thank you for your insightful comments. You are correct that the horizontal and vertical traces were used for ocular speech tracking, and no additional details were included in the Methods. While we agree that the observed rightward/upward movement may seem unusual, this pattern is consistent with previous findings, including those reported in Gehmacher et al. (2024). In that study, we discussed how ocular speech tracking could reflect a broader engagement of the motor system during speech perception. For example, we observed a general right-lateralized gaze bias when participants attended to auditory speech, which we hypothesized might resemble eye movements during text reading, with a similar temporal alignment (~200 ms). We also speculated that this pattern might differ in cultures that read text from right to left.

      We appreciate your suggestion to explore alternative methods for quantifying gaze patterns, such as the "amount of movement" or microsaccades. While these approaches hold promise for future studies, our primary aim here was to replicate previous findings using the same signal and analysis methods to establish a basis for further exploration.  

      (4) In the Introduction, specifically blink-related ocular activity is mentioned as being related to speech tracking (for which a reference is, incidentally, missing), while here, any blink-related activity is excluded from the analysis. This should be motivated, as it appears in direct contradiction.

      Thank you for pointing this out. The mention of blink-related ocular activity in the Introduction refers to findings by Jin et al. (2018), where such activity was shown to align with higher-order syntactic structures in artificial speech. We have now included the appropriate reference for clarity.

      While Jin et al. focused on blink-related activity, in the present study we focused on gaze patterns to investigate ocular speech tracking, replicating findings from Gehmacher et al. (2024). This approach was motivated by our goal to validate previous results using the same methodology. Importantly, blinks were excluded from our analysis because of a methodological constraint of TRF analysis, which requires a continuous response signal; blinks, being discrete and artifact-prone, are incompatible with this approach.

      To address your concern, we revised the Introduction to clarify this distinction and provide explicit motivation for focusing on gaze patterns. It now reads:

      “Along these lines, it has been shown that covert, mostly blink-related eye activity aligns with higher-order syntactic structures of temporally predictable, artificial speech (i.e. monosyllabic words; Jin et al., 2018). In support of ideas that the motor system is actively engaged in speech perception (Galantucci et al., 2006; Liberman & Mattingly, 1985), the authors suggest a global entrainment across sensory and (oculo)motor areas which implements temporal attention.

      In another recent study from our lab (Gehmacher et al., 2024), we showed that eye movements continuously track intensity fluctuations of attended natural speech, a phenomenon we termed ocular speech tracking. In the present study, we focused on gaze patterns rather than blink-related activity, both to replicate findings from Gehmacher et al. (2024) and because blink activity is unsuitable for TRF analysis due to its discrete and artifact-prone nature. Hence, “ocular speech tracking” (similarly to “neural speech tracking”) refers to the correlation coefficient between actual eye movements and movements predicted from an encoding model based on the speech envelope.”

      Jin, P., Zou, J., Zhou, T., & Ding, N. (2018). Eye activity tracks task-relevant structures during speech and auditory sequence perception. Nature communications, 9(1), 5374.

      (5) The rationale for the mediation analysis is questionable. Let speech envelope = A, brain activity = B, eye movements = C. The authors wish to claim that A -> C -> B. But it is equally possible that A -> B -> C. They reflect on this somewhat in Discussion, but throughout the rest of the paper, the mediation analysis is presented as specifically testing whether A -> B is mediated by C, which is potentially misleading.

      Indeed, we share your concern regarding the directionality of the relationships in the mediation analysis. Our choice of ocular movements as a mediator was motivated by the fact that the relationship between acoustic speech and neural activity is well established, as well as by previous results indicating that oculomotor activity contributes to cognitive effects in auditory attention (Popov et al., 2022).

      Here, we treat both interpretations (“ocular movements contribute to neural speech tracking” versus “neural activity contributes to ocular speech tracking”) as equally plausible. We now emphasise this point thoroughly in the Discussion:

      “It is important to note that our current findings do not allow for inference on directionality. Our choice of ocular movements as a mediator was motivated by the fact that the relationship between acoustic speech and neural activity is well established, as well as previous results indicating that oculomotor activity contributes to cognitive effects in auditory attention (Popov et al., 2022). However, an alternative model may suggest that neural activity mediates the effect of ocular speech tracking. Hence, it is possible that ocular mediation of speech tracking may reflect a) active (ocular) sensing for information driven by (top-down) selective attention or b) improved neural representations as a consequence of temporally aligned increase of sensory gain or c) (not unlikely) both. In fact, when rejecting the notion of a single bottom-up flow of information and replacing it with a model of distributed parallel and dynamic processing, it seems only reasonable to assume that the direction of communication (between our eyes and our brain) will depend on where (within the brain) as well as when we look at the effect. Thus, the regions and time-windows reported here should be taken as an illustration of oculo-neural communication during speech processing rather than an attempt to "explain" neural speech processing by ocular movements.”

      (6) The mediation analysis can be improved by a proper quantification of the effect (sizes or variance explained). E.g. how much % of B is explained by A total, and how much of that can in turn be explained by C being involved? For drawing directional conclusions perhaps Granger causality could be used.

      In Figure 4 (now Figure 5) of our manuscript, we use standardized betas (which correspond to effect sizes) to illustrate the mediation effect. With the current mTRF approach, however, it is not possible (or insightful) to compare the variance explained. It is reasonable to assume that variance in neural activity will be explained better when oculomotor behavior is included as a second predictor along with acoustic stimulation. However, this increase gives no indication of the extent to which this oculomotor behavior was task-relevant or task-irrelevant, since all kinds of “arbitrary” movements will be captured in brain activity and will therefore increase the variance explained. For this reason, we chose to pursue the widely accepted framework of mediation (Baron & Kenny, 1986). This (correlational) approach is indeed limited in its interpretation (see previous response); however, the goal of the current study was to replicate and illustrate the triadic relationship between acoustic speech input, neural activity and ocular movements, with no particular hypotheses on directionality.
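      For readers less familiar with the Baron & Kenny (1986) framework, the sketch below shows the basic decomposition on standardized variables: the total effect c of x on y splits into a direct effect c' and an indirect effect a·b that passes through the mediator m. It is a hypothetical, single-level ordinary-least-squares illustration (x standing in for the speech envelope, m for gaze, y for an MEG channel); the analysis in the manuscript is time-resolved and Bayesian, so this is not the authors' implementation. For OLS with an intercept fitted on the same sample, c = c' + a·b holds exactly.

```python
import numpy as np


def zscore(v):
    """Standardize so that regression slopes are standardized betas."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()


def ols(y, *predictors):
    """OLS slopes of y on the given predictors (intercept fitted, then dropped)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


def mediation(x, m, y):
    """Baron & Kenny style decomposition on z-scored variables (hypothetical helper)."""
    x, m, y = zscore(x), zscore(m), zscore(y)
    c = ols(y, x)[0]            # total effect: x -> y
    a = ols(m, x)[0]            # path a: x -> m
    c_prime, b = ols(y, x, m)   # direct effect x -> y and path b m -> y, each controlling for the other
    return {"total": c, "a": a, "b": b, "direct": c_prime, "indirect": a * b}
```

      Note that this decomposition is symmetric in the sense discussed above: swapping the roles of m and y is equally valid arithmetically, which is why the analysis alone cannot establish directionality.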

      (7) Both prediction tendency and neural speech tracking depend on MEG data, and thus on MEG signal-to-noise ratio (SNR). It is possible some participants may have higher SNR recordings in both tasks, which may result in both higher (estimated) prediction tendency and higher (estimated) speech tracking. This would result in a positive correlation, as the authors observe. This trivial explanation should be ruled out, by quantifying the relative SNR and testing for the absence of a mediation here.

      We agree that individual MEG SNR plays an important role for both approaches (MVPA and mTRF models). This concern has been raised previously and was addressed in our previous manuscript (Schubert et al., 2023). First, our prediction tendency value is the result of a condition contrast (rather than a simple decoding accuracy), which compensates for the influence of subject-specific signal-to-noise ratio, as no systematic difference in SNR is to be expected between conditions. Second, in our previous study we also used frequency decoding accuracy as a control variable to correlate with the speech tracking variables of interest and found no significant effect.

      (8) Much of the analysis pipeline features temporal response functions (TRFs). These should be shown in a time-resolved manner as a key intermediate step.

      We have now included the neural speech tracking TRFs in the figure (now Figure 3).

      (9) Figure 2 shows much-condensed results from different steps in the pipeline. If I understand correctly, 2A shows raw TRF weights (averaged over some time window?), while 2B-F shows standardized mean posterior regressor weights after Bayesian stats? It would be very helpful to make much more explicit what is being shown here, in addition to showing the related TRFs.

      Thank you for pointing this out! The figure description has indeed not been very informative on this issue so far. We have now adapted the caption and hope this clarifies the confusion: “Neural speech tracking is related to prediction tendency and word surprisal, independent of selective attention. A) Envelope (x) - response (y) relationships are estimated using deconvolution (Boosting). The TRF (filter kernel, h) models how the brain processes the envelope over time. This filter is used to predict neural responses via convolution. Predicted responses are correlated with actual neural activity to evaluate model fit and the TRF's ability to capture response dynamics. Correlation coefficients from these models are then used as dependent variables in Bayesian regression models. (Panel adapted from Gehmacher et al., 2024b). B) Temporal response functions (TRFs) depict the time-resolved neural tracking of the speech envelope for the single speaker and multi speaker target condition, shown here as absolute values averaged across channels. Solid lines represent the group average. Shaded areas represent 95% confidence intervals. C–H) The beta weights shown in the sensor plots are derived from the Bayesian regression models described in A). For Panel C, this statistical model is based on correlation coefficients computed from the TRF models (further details can be found in the Methods Section). C) In a single speaker condition, neural tracking of the speech envelope was significant for widespread areas, most pronounced over auditory processing regions. D) The condition effect indicates a decrease in neural speech tracking with increasing noise (1 distractor). E) Stronger prediction tendency was associated with increased neural speech tracking over left frontal areas. F) However, there was no interaction between prediction tendency and conditions of selective attention. G) Increased neural tracking of semantic violations was observed over left temporal areas. H) There was no interaction between word surprisal and speaker condition, suggesting a representation of surprising words independent of background noise. Marked sensors indicate ‘significant’ clusters, defined as at least two neighboring channels showing a significant result. N = 29.”
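      The sensor-marking rule in the caption can be stated precisely as code. The sketch below is a hypothetical illustration that assumes a per-channel boolean significance mask (for example, a 94% HDI excluding zero) and a symmetric channel-adjacency matrix without self-connections. It only encodes the “at least two neighboring significant channels” criterion; it is not a cluster-based permutation correction for multiple comparisons.

```python
import numpy as np


def mark_sensors(significant, adjacency):
    """Keep a channel only if it is significant AND at least one neighbor is too,
    i.e. it belongs to a group of >= 2 neighboring significant channels.

    significant: (n_channels,) boolean mask of per-channel results (assumed input).
    adjacency:   (n_channels, n_channels) symmetric boolean matrix with a False
                 diagonal (assumed to come from the sensor layout).
    """
    significant = np.asarray(significant, dtype=bool)
    adjacency = np.asarray(adjacency, dtype=bool)
    # Count, for each channel, how many of its neighbors are significant.
    n_sig_neighbors = (adjacency & significant[None, :]).sum(axis=1)
    return significant & (n_sig_neighbors >= 1)
```

      For example, on a chain of five channels where channels 0, 1 and 3 are significant, only channels 0 and 1 would be marked, because channel 3 has no significant neighbor.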

      Gehmacher, Q., Schubert, J., Kaltenmaier, A., Weisz, N., & Press, C. (2024b). The "Ocular Response Function" for encoding and decoding oculomotor related neural activity. bioRxiv, 2024-11.

      (10) Bayesian hypothesis testing is not done consistently. Some parts test for inclusion of 0 in 94% HDI, while some parts adopt a ROPE approach. The same approach should be taken throughout. Additionally, Bayes factors would be very helpful (I appreciate these depend on the choice of priors, but the default Bambi priors should be fine).

      Our primary aim in this study was to replicate two recent findings: (1) the relationship between individual prediction tendencies and neural speech tracking, and (2) the tracking of the speech envelope by eye movements. To maintain methodological consistency with the original studies, we did not apply a ROPE approach when analyzing these replication effects. Instead, we followed the same procedures as the original work, focusing on the inclusion of 0 in the HDI for the neural effects and using the same methods for the ocular effects. Additionally, we were not specifically interested in potential null effects in these replication analyses, as our primary goal was to test whether we could reproduce the previously reported findings.

      For the mediation analysis, however, we chose to extend the original approach by not only performing the analysis in a time-resolved manner but also applying a ROPE approach. This decision was motivated by our interest in gaining more comprehensive insights — beyond the replication goals — by also testing for potential null effects, which can provide valuable information about the presence or absence of mediation effects.
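      The two decision rules can be contrasted on posterior samples. The sketch below is a hypothetical illustration: it computes a 94% highest density interval (the shortest interval containing 94% of the samples; to our understanding, 0.94 is also the default probability of ArviZ's `hdi` function), checks whether that interval excludes 0 (the rule used for the replication analyses), and applies a ROPE rule (as used for the mediation analysis). The ROPE bounds of ±0.1 are an assumption for illustration and are not taken from the manuscript.

```python
import numpy as np


def hdi(samples, prob=0.94):
    """Shortest interval containing a fraction `prob` of the posterior samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    k = int(np.floor(prob * n))
    widths = x[k:] - x[:n - k]     # widths of all candidate intervals
    i = int(np.argmin(widths))     # the narrowest one is the HDI
    return x[i], x[i + k]


def excludes_zero(samples, prob=0.94):
    """Replication rule: effect is credible if the HDI does not contain 0."""
    lo, hi = hdi(samples, prob)
    return lo > 0 or hi < 0


def rope_decision(samples, rope=(-0.1, 0.1), prob=0.94):
    """ROPE rule (hypothetical bounds): also allows concluding a practical null."""
    lo, hi = hdi(samples, prob)
    if hi < rope[0] or lo > rope[1]:
        return "outside ROPE"      # credible, non-negligible effect
    if lo >= rope[0] and hi <= rope[1]:
        return "inside ROPE"       # effect practically equivalent to zero
    return "undecided"
```

      The design difference is that the zero-exclusion rule can only detect an effect, whereas the ROPE rule can additionally support the absence of a meaningful effect, which is the motivation given above for using it in the mediation analysis.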

      We appreciate your thoughtful feedback and hope this clarifies our rationale for the differing approaches in our Bayesian hypothesis testing. 

      Regarding Bayes factors: we understand that some researchers find Bayes factors appealing, as they offer a seemingly simple and straightforward way to evaluate the evidence for or against H0 relative to H1 (e.g. BF10 > 100 = decisive, according to the Jeffreys scale). In practice, however, Bayes factors are often misunderstood, e.g. by interpreting them as posterior odds or by not acknowledging that they quantify relative evidence (see Wong et al., 2022). Instead of using Bayes factors, we prefer to estimate and report the posterior distribution of parameters given the data, prior and model assumptions (in the form of the 94% HDI). This allows for a continuous evaluation of the evidence for a given hypothesis, which in our view is easier to interpret than a Bayes factor.

      Jeffreys, Harold (1998) [1961]. The Theory of Probability (3rd ed.). Oxford, England. p. 432. ISBN 9780191589676.

      Wong, T. K., Kiers, H., & Tendeiro, J. (2022). On the Potential Mismatch Between the Function of the Bayes Factor and Researchers’ Expectations. Collabra: Psychology, 8(1), 36357. https://doi.org/10.1525/collabra.36357

      (11) It would be helpful if Results could be appreciated without a detailed read of Methods. I would recommend a recap of each key methodological step before introducing the relevant Result. (This may also help in making the rationale explicit.)

      In addition to the short recaps of methods that were already present, and the information on how neural and ocular tracking are quantified and on the Bayesian statistics (see responses 1, 2 and 9), we have now added the following passages to the Results sections. Please read them in the context of the manuscript, where they complement the recap of the methodological steps and rationale needed to readily understand each analysis and its results:

      Individual prediction tendency is related to neural speech tracking:

      “Thus, this measure is a single value per subject, which comprises a) differences between two contextual probabilities (i.e. ordered vs. random) in b) feature-specific tone representations c) in advance of their observation (summed over a time window from -0.3 to 0 s). Importantly, this prediction tendency was assessed in an independent entropy modulation paradigm (see Fig. 1). On a group level, we found an increased tendency to pre-activate a stimulus of high probability (i.e. forward transition) in an ordered context compared to a random context (see Fig. 2A). This effect replicates results from our previous work (Schubert et al., 2023, 2024). Using the summed difference between entropy levels (ordered - random) across pre-stimulus time, one value was extracted per subject (Fig. 2B). This value was used as a proxy for “individual prediction tendency” and correlated with encoding of clear speech across different MEG sensors. [...]
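      As a worked illustration of this summary measure, the sketch below assumes hypothetical time-resolved pre-activation time courses for the ordered and random conditions (for example, classifier evidence for the forward-transition tone) and sums their difference over the -0.3 to 0 s window. Because the value is a within-subject contrast, any additive subject-specific offset shared by both conditions cancels; this illustrates, in simplified form, the SNR argument made in response to point (7). Function and argument names are assumptions, not the authors' code.

```python
import numpy as np


def prediction_tendency(times, preact_ordered, preact_random, window=(-0.3, 0.0)):
    """Summed (ordered - random) pre-activation within the pre-stimulus window.

    times:          sample times in seconds relative to tone onset (assumed input)
    preact_ordered: hypothetical pre-activation time course, ordered context
    preact_random:  hypothetical pre-activation time course, random context
    """
    times = np.asarray(times, dtype=float)
    mask = (times >= window[0]) & (times <= window[1])
    diff = np.asarray(preact_ordered, dtype=float) - np.asarray(preact_random, dtype=float)
    return float(diff[mask].sum())
```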

      Neural speech tracking, quantified as the correlation coefficients between predicted and observed MEG responses to the speech envelope, was used as the dependent variable in Bayesian regression models. These models included condition (single vs. multi-speaker) as a fixed effect, with either prediction tendency or word surprisal as an additional predictor, and random effects for participants.”

      Eye movements track acoustic speech in selective attention:

      “For this, we separately predicted horizontal and vertical eye movements from the acoustic speech envelope using temporal response functions (TRFs). The resulting model fit (i.e. correlation between true and predicted eye movements) is commonly referred to as “speech tracking”. Bayesian regression models were applied to evaluate tracking effects under different conditions of selective attention (single speaker, attended multi-speaker, unattended multi-speaker). Furthermore, we assessed whether individual prediction tendency or semantic word surprisal influenced ocular speech tracking.”

      Neural speech tracking is mediated by eye movements:

      “This model evaluates to what extent gaze behaviour functions as a mediator between acoustic speech input and brain activity.”

      Neural and ocular speech tracking are differently related to comprehension:

      “Bayesian regression models were used to investigate relationships between neural/ocular speech tracking and comprehension or difficulty. Ocular speech tracking was analyzed separately for horizontal and vertical eye movements.”

      (12) The research questions in the Introduction should be sharpened up, to make explicit when a question concerns a theoretical entity, and when it concerns something concretely measured/measurable.

      We have sharpened them as follows:

      “Taking into account the aforementioned study by Schubert and colleagues (2023), the two recently uncovered predictors of neural tracking (individual prediction tendency and ocular tracking) raise several empirical questions regarding the relationship between predictive processes, selective attention, and active ocular sensing in speech processing:

      (1) Are predictive processes related to active ocular sensing in the same way they are to neural speech tracking? Specifically, do individuals with a stronger tendency to anticipate predictable auditory features, as quantified through prestimulus neural representations in an independent tone paradigm, show increased or even decreased ocular speech tracking, measured as the correlation between predicted and actual eye movements? Or is there no relationship at all?

      (2) To what extent does selective attention influence the relationship between prediction tendency, neural speech tracking, and ocular speech tracking? For example, does the effect of prediction tendency or ocular speech tracking on neural tracking differ between a single-speaker and multi-speaker listening condition?

      (3) Are individual prediction tendency and ocular speech tracking related to behavioral outcomes, such as comprehension and perceived task difficulty? Speech comprehension is assessed through accuracy on comprehension questions, and task difficulty is measured through subjective ratings.

      Although predictive processes, selective attention, and active sensing have been shown to contribute to successful listening, their potential interactions and specific roles in naturalistic speech perception remain unclear. Addressing these questions will help disentangle their contributions and establish an integrated framework for understanding how neural and ocular speech tracking support speech processing.”

      (13) The negative relationship between story comprehension and ocular speech tracking appears to go against the authors' preferred interpretation, but the reflection on this in the Discussion is very brief and somewhat vague.

      Thank you for pointing this out. We have taken your comments into careful consideration and combined them with Reviewer #1's query (Minor point 2) into a single, complementary line of reasoning. We have rewritten the relevant paragraph in the Discussion to provide a clearer and more detailed explanation. We hope this revision offers a more precise and less vague discussion of this important point.

      “Despite the finding that eye movements mediate neural speech tracking, the behavioural relevance for semantic comprehension appears to differ between ocular and neural speech tracking. Specifically, we found a negative association between ocular speech tracking and comprehension, indicating that participants with lower comprehension performance exhibited increased ocular speech tracking. Interestingly, no significant relationship was observed between neural tracking and comprehension.

      In this context, the negative association between ocular tracking and comprehension might reflect individual differences in how participants allocate cognitive resources. Participants with lower comprehension may rely more heavily on attentional mechanisms to process acoustic features, as evidenced by increased ocular tracking. This reliance could represent a compensatory strategy when higher-order processes, such as semantic integration or memory retrieval, are less effective. Importantly, our comprehension questions (see Experimental Procedure) targeted a broad range of processes, including intelligibility and memory, suggesting that this relationship reflects a trade-off in resource allocation between low-level acoustic focus and integrative cognitive tasks.

      Rather than separating eye and brain responses conceptually, our analysis highlights their complementary contributions. Eye movements may enhance neural processing by increasing sensitivity to acoustic properties of speech, while neural activity builds on this foundation to integrate information and support comprehension. Together, these systems form an interdependent mechanism, with eye and brain responses working in tandem to facilitate different aspects of speech processing.

      This interpretation is consistent with the absence of a difference in ocular tracking for semantic violations (e.g., words with high surprisal versus lexically matched controls), reinforcing the view that ocular tracking primarily reflects attentional engagement with acoustic features rather than direct involvement in semantic processing. This aligns with previous findings that attention modulates auditory responses to acoustic features (e.g., Forte et al., 2017), further supporting the idea that ocular tracking reflects mechanisms of selective attention rather than representations of linguistic content.

      Future research should investigate how these systems interact and explore how ocular tracking mediates neural responses to linguistic features, such as lexical or semantic processing, to better understand their joint contributions to comprehension.”

      (14) Page numbers would be helpful.

      We added the page numbers.

      Reviewer #3 (Recommendations for the authors):

      Results

      (1) Figure 2 - statistical results are reported in this figure, but they are not fully explained in the text, nor are statistical values provided for any of the analyses (as far as I can tell).

      Also, how were multiple comparisons dealt with (the choice of two neighboring channels seems quite arbitrary)? Perhaps for this reason, the main result - namely the effect of "prediction tendency" and "semantic violations" - is quite sparse and might not survive a more rigorous statistical criterion. I would feel more comfortable with these results if the reporting of the statistical analysis had been more thorough (ideally, including comparison to control models).

      We would like to thank you again for your detailed queries, comments, and questions on our work. First, we adapted this figure (now Figure 3 in the manuscript; please see responses 8 and 9 to Reviewer #2) to help readers understand the metrics and values within each statistical analysis. Second, we had indeed not included the detailed statistics in the text. We have now added the missing statistical reports, calculated as averages over ‘clusters’:

      “Replicating previous findings (Schubert et al., 2023), we found widespread encoding of clear speech (average over cluster: β = 0.035, 94%HDI = [0.024, 0.046]), predominantly over auditory processing regions (Fig. 3C), that was decreased (β = -0.018, 94%HDI = [-0.029, -0.006]) in a multi-speaker condition (Fig. 3D). Furthermore, a stronger prediction tendency was associated with increased neural speech tracking (β = 0.014, 94%HDI = [0.004, 0.025]) over left frontal sensors (see Fig. 3E). We found no interaction between prediction tendency and condition (see Fig. 3F).” [...] “In a direct comparison with lexically identical controls, we found an increased neural tracking of semantic violations (β = 0.039, 94%HDI = [0.007, 0.071]) over left temporal areas (see Fig. 3G). Furthermore, we found no interaction between word surprisal and speaker condition (see Fig. 3H).”

      Regarding the "prediction tendency" effect, it is important to note that this finding replicates a result from Schubert et al. (2023). The left frontal location of this effect is also consistent across studies, which convinces us of the robustness of the finding. Furthermore, testing this relationship properly requires a mixed-effects model, both to account for the variability across subjects that is not explained by the fixed effects and to respect the repeated-measures design. For this reason, a random intercept had to be fitted for each subject ((1|subject) in the respective model formula). This statistical requirement motivated our decision to use Bayesian statistics, as (at least to our knowledge) there is no implementation of a cluster-based permutation mixed-effects model (yet). Because Bayesian statistics do not require a multiple-comparison correction, we additionally imposed the requirement of a “clustered” effect in order to provide a more conservative criterion.
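      The credibility criterion used above (a 94% HDI that excludes zero) can be illustrated with a minimal sketch. It assumes that posterior draws for one coefficient are available as a plain list of floats (for example, draws already averaged over the channels of a cluster). The sample-based "narrowest window" estimator below is a common way to compute an HDI, but it is not necessarily the routine used by the authors' software; the function names `hdi` and `excludes_zero` and the simulated draws are hypothetical.

```python
import math
import random


def hdi(samples, prob=0.94):
    """Return the narrowest interval containing a fraction `prob` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = math.ceil(prob * n)  # number of samples the interval must contain
    if not 1 <= k <= n:
        raise ValueError("prob * len(samples) must be between 1 and len(samples)")
    # Slide a window over k consecutive sorted samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def excludes_zero(interval):
    """True if the interval lies entirely above or entirely below zero."""
    low, high = interval
    return low > 0 or high < 0


# Hypothetical posterior draws centred on the clear-speech effect size reported above.
random.seed(0)
draws = [random.gauss(0.035, 0.006) for _ in range(4000)]
interval = hdi(draws)
print(interval, excludes_zero(interval))
```

      An interval such as [0.024, 0.046] counts as credible under this criterion, whereas one spanning zero would not.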

      The choice of using two neighboring channels is consistent with the setting commonly used with FieldTrip’s cluster-based permutation testing (cfg.minnbchan = 2). This parameter specifies the minimum number of neighboring channels that must also pass the threshold for a sample to be included in the clustering algorithm, ensuring spatial consistency in the identified clusters. This alignment ensures that our methodology is comparable to numerous prior studies in the field, where such thresholds are standard. While it is true that all statistical analyses involve some degree of arbitrariness in parameter selection (e.g., alpha levels or clustering thresholds), our approach reflects established conventions and ensures comparability with previous findings.
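      The two-neighbour requirement can be sketched as follows. This is a from-scratch Python illustration of the idea behind FieldTrip's `cfg.minnbchan` (a channel that passes the threshold is kept only if at least that many of its neighbours also pass it), followed by grouping of the surviving channels into connected clusters. It is not FieldTrip's implementation, and the channel names and neighbour layout are hypothetical.

```python
from collections import deque


def apply_min_neighbours(significant, neighbours, min_nb=2):
    """Drop significant channels that have fewer than `min_nb` significant neighbours.

    Neighbour counts are taken on the original mask, so a single pass is made.
    """
    return {
        ch: is_sig and sum(significant.get(nb, False) for nb in neighbours.get(ch, ())) >= min_nb
        for ch, is_sig in significant.items()
    }


def find_clusters(mask, neighbours):
    """Group surviving channels into spatially connected clusters (breadth-first search)."""
    seen, clusters = set(), []
    for ch, on in mask.items():
        if not on or ch in seen:
            continue
        seen.add(ch)
        queue, members = deque([ch]), []
        while queue:
            current = queue.popleft()
            members.append(current)
            for nb in neighbours.get(current, ()):
                if mask.get(nb, False) and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(sorted(members))
    return clusters


# Hypothetical sensor layout: A, B, C form a triangle; C-D and D-E form a chain.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
neighbours = {}
for a, b in edges:
    neighbours.setdefault(a, []).append(b)
    neighbours.setdefault(b, []).append(a)

significant = {"A": True, "B": True, "C": True, "D": False, "E": True}
kept = apply_min_neighbours(significant, neighbours, min_nb=2)
print(find_clusters(kept, neighbours))  # E is dropped: its only neighbour, D, is not significant
```

      With `min_nb=0` the isolated channel E would survive as its own one-channel cluster, which is exactly the kind of spatially unsupported effect the two-neighbour rule is meant to exclude.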

      While the original study utilized source space analyses, we replicated this effect using only 102 magnetometers. This choice was made for computational simplicity, demonstrating that the effect is robust even without source-level modeling. Similarly, the "semantic violation" effect, while perceived as sparse, is based solely on magnetometer data and - in our opinion - should not be viewed as overly sparse given the methods employed. This effect aligns with the two-neighbor clustering approach, ensuring spatial consistency across magnetometers. The results reflect the robustness of the effects within the constraints of magnetometer-level analyses.

      Overall, the methodological choices, including the choice of a Bayesian linear mixed-effects model, the use of two neighboring channels and the reliance on magnetometers, are grounded in established practices and methodological considerations. While stricter thresholds or alternative approaches might yield different results, our methods align with best practices in the field and ensure the robustness, comparability, and replicability of our findings.

      (2) Figure 3 - the difference between horizontal and vertical eye-movements. This result is quite confusing and although the authors do suggest a possible interpretation for this in the discussion, I do wonder how robust this difference is or whether the ocular signal (in either direction) is simply too noisy or the effect too small to be detected consistently across conditions. Also, the ocular-TRFs themselves are not entirely convincing in suggesting reliable response/tracking of the audio - despite the small-but-significant increase in prediction accuracy.

      The horizontal versus vertical comparison was conducted to explore potential differences in how these dimensions contribute to ocular tracking of auditory stimuli (please also see our response to Reviewer #1, Response 5b, which includes the vertical vs. horizontal effects of Gehmacher et al. 2024). It would indeed be interesting to develop a measure that combines the two directions into a more natural representation of 'viewing,' such as a combined vector. However, this approach would require the use of complex numbers to represent both magnitude and direction simultaneously, and hence the development of novel TRF algorithms capable of modeling this multidimensional signal. While beyond the scope of the current study, this presents an exciting avenue for future research and would allow us to move closer to understanding ocular speech tracking and the robustness of these effects, above and beyond the already successful replication.
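      The combined-vector idea mentioned above can be illustrated with a minimal sketch: each pair of horizontal and vertical gaze samples becomes one complex number whose magnitude is the displacement and whose phase is the direction. Only the representation is shown; estimating a complex-valued TRF on such a signal is the open methodological step described in the text and is not attempted here. The function names and example values are hypothetical.

```python
import cmath
import math


def gaze_to_complex(horizontal, vertical):
    """Combine horizontal (real part) and vertical (imaginary part) gaze samples into one signal."""
    return [complex(h, v) for h, v in zip(horizontal, vertical)]


def magnitude_and_direction(signal):
    """Return (displacement, direction in degrees) for each complex gaze sample."""
    return [(abs(z), math.degrees(cmath.phase(z))) for z in signal]


# Hypothetical gaze samples (arbitrary units): rightward, upward, and diagonal.
signal = gaze_to_complex([1.0, 0.0, 3.0], [0.0, 2.0, 4.0])
for displacement, direction in magnitude_and_direction(signal):
    print(f"{displacement:.2f} at {direction:.1f} deg")
```

      The real and imaginary parts return the original horizontal and vertical channels unchanged, so the combined signal loses no information relative to analysing the two directions separately.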

      It is also important to emphasize that ocular-TRFs are derived from (viewing) behavioral data rather than neural signals, and are thus inherently subject to greater variability across participants and time. This higher variability does not necessarily indicate a small or unreliable effect but reflects the dynamic and task-dependent nature of eye movement behavior. The TRFs with shaded error margins represent this variability, highlighting how eye movements are influenced by both individual differences and moment-to-moment changes in task engagement.

      Despite this inherent variability, the significant prediction accuracy improvements confirm that ocular-TRFs reliably capture meaningful relationships between eye movements and auditory stimuli. The observed differences between horizontal and vertical TRFs further support the hypothesis that these dimensions are differentially involved in the task, possibly driven by the specific roles they play in sensorimotor coupling.

      (3) Figure 4 - this figure shows source distribution of 3 PCA components, derived from the results of the mediation effect of eye movements on the speech-tracking. Here too I am having difficulty in interpreting what the results actually are. For one, all three components are quite widespread and somewhat overlapping, so although they are statistically "independent" it is hard to learn much from them about the brain regions involved and whether they truly represent separable contributions. Similarly difficult to interpret are the time courses, which share some similarities with the known TRFs to speech (especially PC3). I would have expected to find a cleaner "auditory" response, and clearer separation between sensory regions and regions involved in the control of eye movements. I also wonder why the authors chose not to show the source localization of the neural and ocular speech-tracking responses alone - this could have helped us better understand what "mediation" of the neural response might look like.

      We appreciate the reviewer’s interest in better understanding the source distribution and time courses of the PCA components. While we acknowledge that the widespread and overlapping nature of the components may make a more fine-grained interpretation challenging, it is important to emphasize that our analysis simply reflects the data; hence we can only present and interpret what the analysis revealed.

      Regarding your suggestion to show the source localization of ocular speech tracking and neural speech tracking alone, we would like to point out that ocular tracking is represented by only one channel for vertical and one channel for horizontal eye movements. Thus, in this case the estimated source of the effect is the eyes themselves. Moreover, the source localization of neural speech tracking has been studied thoroughly (locating it to perisylvian, auditory areas with a stronger preference for the left hemisphere) and can also be seen in Schubert et al. (2023). Nevertheless, we believe the observed PCA components provide valuable and, most importantly, novel insights into the interplay between eye movements and neural responses in speech tracking.

      Discussion/interpretation

      (1) Although I appreciate the authors' attempt to propose a "unified" theoretical model linking predictions about low-level features to higher features, and the potential involvement of eye movements in 'active sensing' I honestly think that this model is overambitious, given the data presented in the current study. Moreover, there is very little discussion of past literature and existing models of active sensing and hierarchical processing of speech, that could have helped ground the discussion in a broader theoretical context. The entire discussion contains fewer than 20 citations (some of which are by these authors) and needs to be substantially enriched in order to provide context for the authors' claims.

      Thank you very much for your thoughtful feedback and for appreciating our approach. We hope that the revised manuscript addresses your concerns. Specifically, we now emphasize that our proposal is a conceptual framework, with the main goal to operationalise “prediction tendency”, “active ocular sensing”, and “selective attention” and to “organise these entities according to their assumed function for speech processing and to describe their relationship with each other.” We did this by thoroughly revising our discussion section with a clear emphasis on the definition of terms, for example: 

      “With this speculative framework we attempt to describe and relate three important phenomena with respect to their relevance for speech processing: 1) “Anticipatory predictions” that are created in absence of attentional demands and contain probabilistic information about stimulus features (here, inferred from frequency-specific pre-activations during passive listening to sound sequences). 2) “Selective attention” that allocates resources towards relevant (whilst suppressing distracting) information (which was manipulated by the presence or absence of a distractor speaker). And finally 3) “active ocular sensing”, which refers to gaze behavior that is temporally aligned to attended (but not unattended) acoustic speech input (inferred from the discovered phenomenon of ocular speech tracking).”

      Our theoretical proposals are now followed by a recap of our results that support the respective idea, for example: 

      “...these predictions are formed in parallel and carry high feature-specificity but low temporal precision (as they are anticipatory in nature). This idea is supported by our finding that pure-tone anticipation is visible over a widespread prestimulus interval, instead of being locked to sound onset”

      “....we suggest that active (ocular) sensing does not necessarily convey feature- or content-specific information; it is merely used to boost (and conversely filter) sensory input at specific timescales (similar to neural oscillations). This assumption is supported by our finding that semantic violations are not encoded differently in gaze behaviour from lexical controls.”

      And we put a strong focus on highlighting the boundaries of these ideas, in order to avoid theoretical confusion, misunderstandings, or implicit theoretical assumptions that are not grounded in data, in particular: 

      “In fact, when rejecting the notion of a single bottom-up flow of information and replacing it with a model of distributed parallel and dynamic processing, it seems only reasonable to assume that the direction of communication (between our eyes and our brain) will depend on where (within the brain) as well as when we look at the effect. Thus, the regions and time-windows reported here should be taken as an illustration of oculo-neural communication during speech processing rather than an attempt to "explain" neural speech processing by ocular movements.”

      “Even though the terminology [“hierarchy”] is suggestive of a fixed sequence (similar to a multi-storey building) with levels that must be traversed one after another (and even the more spurious idea of a rooftop, where the final perceptual experience is formed and stored into memory), we distance ourselves from these (possibly unwarranted) ideas. Our usage of “higher” or “lower” simply refers to the observation that the probability of a feature at a higher (as in more associative) level affects the interpretation (and thus the representation and prediction) of a feature at lower (as in more segregated) levels (Caucheteux et al., 2023).”

      Additionally, we have made substantial efforts to present complementary results (see response to Reviewer #2, point 8) to further substantiate our interpretation. Importantly, we have updated the illustration of the model (see response to Reviewer #, minor point 1) and refined both our interpretations and the conceptual language in the Discussion. Furthermore, we have included additional citations where appropriate to strengthen our argument.

      We would also like to briefly note that this section of the Discussion aimed to highlight existing literature that bridges the gap our model seeks to address. However, as this is a relatively underexplored area, the references available are necessarily limited.

      (2) Given my many reservations about the data, as presented in the current version of the manuscript, I find much of the discussion to be an over-interpretation of the results. This might change if the authors are able to present more robust results, as per some of my earlier comments.

      We sincerely hope that our comprehensive revisions have addressed your concerns and improved the manuscript to your satisfaction.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The manuscript by Hariani et al. presents experiments designed to improve our understanding of the connectivity and computational role of Unipolar Brush Cells (UBCs) within the cerebellar cortex, primarily lobes IX and X. The authors develop and cross several genetic lines of mice that express distinct fluorophores in subsets of UBCs, combined with immunocytochemistry that also distinguishes subtypes of UBCs, and they use confocal microscopy and electrophysiology to characterize the electrical and synaptic properties of subsets of so-labelled cells, and their synaptic connectivity within the cerebellar cortex. The authors then generate a computer model to test the possible computational functions of such interconnected UBCs.

      Using these approaches, the authors report that:

      1) GRP-driven tdTomato is expressed exclusively in a subset (20%) of ON-UBCs, defined electrophysiologically (excited by mossy fiber afferent stimulation via activation of UBC AMPA and mGluR1 receptors) and immunocytochemically by their expression of mGluR1.

      2) UBCs ID'd/tagged by mCitrine expression in Brainbow mouse line P079 are expressed in a similar minority subset of OFF-UBCs defined electrophysiologically (inhibited by mossy fiber afferent stimulation via activation of UBC mGluR2 receptors) and immunocytochemically by their expression of Calretinin. However, such mCitrine expression was also detected in some mGluR1 positive UBCs, which may not have shown up electrophysiologically because of the weaker fluorophore expression without antibody amplification.

      This is correctly stated with the exception that the P079 mouse line itself expresses mCitrine. The Brainbow mouse line was used in the connectivity study by crossing it to the GRP-Cre or Calretinin-Cre lines.

      3) Confocal analysis of crossed lines of mice (GRP X P079) stained with antibodies to mGluR1 and calretinin documented the existence of all possible permutations of interconnectivity between cells (ON-ON, ON-OFF, OFF-OFF, OFF-ON), but their overall abundance was low, and neither their absolute nor relative abundance was quantified.

      They were certainly rare to observe using our approaches, but we reasoned that the densities of such connections are not possible to estimate accurately. Please see discussion below.

      4) A computational model (NEURON) indicated that the presence of an intermediary UBC (in a polysynaptic circuit from MF to UBC to UBC) could prolong bursts (MF-ON-ON), prolong pauses (MF-ON-OFF), cause a delayed burst (MF-OFF-OFF), or cause a delayed pause (MF-OFF-ON), relative to solely MF to UBC synapses, which would simply exhibit long bursts (MF-ON) or long pauses (MF-OFF).
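      The qualitative outcomes listed in point 4 follow a simple sign rule, sketched below under assumptions that are ours rather than the model's: an ON UBC passes the sign of its input (+1) and an OFF UBC inverts it (-1); a positive net sign gives a burst and a negative one a pause; the effect is "delayed" when the intermediate UBC is OFF (it acts by pausing, so the downstream change arises from removal of ongoing input) and "prolonged" when it is ON. This only restates the summary for chains with at most one intermediate UBC; it does not model timing or durations, which require the NEURON simulations. The names `SIGN` and `predicted_response` are hypothetical.

```python
SIGN = {"ON": +1, "OFF": -1}  # assumed: ON UBCs pass the input sign, OFF UBCs invert it


def predicted_response(chain):
    """Qualitative response of the last UBC in a mossy fibre -> UBC (-> UBC) chain."""
    if not 1 <= len(chain) <= 2:
        raise ValueError("only MF-UBC and MF-UBC-UBC chains are covered by this sketch")
    net = 1
    for cell in chain:
        net *= SIGN[cell]
    kind = "burst" if net > 0 else "pause"
    if len(chain) == 1:
        return f"long {kind}"
    # An OFF intermediate acts by pausing, so the downstream change is delayed.
    return f"{'delayed' if chain[0] == 'OFF' else 'prolonged'} {kind}"


for chain in [("ON",), ("OFF",), ("ON", "ON"), ("ON", "OFF"), ("OFF", "OFF"), ("OFF", "ON")]:
    print("MF-" + "-".join(chain), "->", predicted_response(chain))
```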

      The authors thus conclude that the pattern of interconnected UBCs provides an extended and more nuanced pattern of firing within the cerebellar cortex that could mediate longer-lasting sensorimotor responses.

      The cerebellum's long-known role in motor skills and reflexes, and associated disorders, combined with our nascent understanding of its role in cognitive, emotional, and appetitive processing, makes understanding its circuitry and processing functions of broad interest to the neuroscience and biomedical community. The focus on UBCs, which are largely restricted to vestibular lobules of the cerebellum reduces the breadth of likely interest somewhat. The overall design of specific experiments is rigorous and the use of fluorophore expressing mouse lines is creative. The data that is presented and the writing are clear. However, the overall experimental design has issues that reduce overall interpretation (please see specific issues for details), which combined with a lack of thorough analysis of the experimental outcomes severely undermines the value of the NEURON model results and the advance in our understanding of cerebellar processing in situ (again, please see specific issues for details).

      Specific issues:

      1) All data gathered with inhibition blocked. All of the UBC response data (Fig. 1) was gathered in the presence of GABAAR and Glycine R blockers. While such an approach is appropriate generally for isolating glutamatergic synaptic currents, and specifically for examining and characterizing monosynaptic responses to single stimuli, it becomes problematic in the context of assaying synaptic and action potential response durations for long-lasting responses, and in particular for trains of stimuli, when feed-forward and feed-back inhibition modulates responses to afferent stimulation. That is, even for single MF stimuli, given the >500ms duration of UBC synaptic currents, there is plenty of time for feedback inhibition from Golgi cells (or feedforward, from MF to Golgi cell excitation) to interrupt AP firing driven by the direct glutamatergic synaptic excitation. This issue is compounded further for all of the experiments examining trains of MF stimuli. Beyond the impact of feedback inhibition on the AP firing of any given UBC, it would also obviously reduce/alter/interrupt that UBC's synaptic drive of downstream UBCs. This issue fundamentally undermines our ability to interpret the simulation data of Vm and AP firing of both the modeled intermediate and downstream UBC, in terms of applying it to possible cerebellar cortical processing in situ.

      The goal of Figure 1 was to determine the cell types of labeled UBCs in transgenic mouse lines, which is determined entirely by their synaptic responses to glutamate (Borges-Merjane and Trussell, 2015). Thus, blocking inhibition was essential to produce clear results in the characterization of GRP and P079 UBCs. While GABAergic/glycinergic feedforward and feedback inhibition is certainly important in the intact circuit, it was not our intention, nor was it possible, to study its contribution in the present study. Leaving inhibition unblocked does not lead to a physiologically realistic stimulation pattern in acute brain slices, because electrical stimulation produces synchronous excitation and inhibition by directly exciting Golgi cells, rather than their synaptic inputs. The main inhibition that UBCs receive that are crucial to determining burst or pause durations is not via GABA/glycine, but instead through mGluR2, which lasts for 100-1000s of milliseconds. The main excitation that drives UBC firing is mGluR1 and AMPA, which both last 100-1000s of milliseconds. Thus, these large conductances are unlikely to be significantly shaped by 1-10 ms IPSCs from feedforward and feedback GABA/glycine inhibition. Recent studies that examined the duration of bursting or pausing in UBCs had inhibition blocked in their experiments, presumably for the reasons outlined above (Guo et al., 2021; Huson et al., 2023).

      In Author response image 1 is an example showing the synaptic currents and firing patterns in an ON UBC before and after blocking inhibition. The GABA/glycinergic inhibition is fast, occurs soon after the stimuli and has little to no effect on the slow inward current that develops after the end of stimulation, which is what drives firing for 100s of milliseconds.

      Author response image 1.

      Example showing small effect of GABAergic and glycinergic inhibition on excitatory currents and burst duration. A) Excitatory postsynaptic currents in response to a train of 10 presynaptic stimuli at 50 Hz before (black) and after (grey) blocking GABA and glycine receptors. The slow inward current that occurs at the end of stimulation is little affected. B) Expanded view of the synaptic currents evoked during the train of stimuli. GABA/glycine receptors mediate the fast outward currents that occur immediately after the first couple of stimuli. C) Three examples of the bursts caused by the 50 Hz stimulation in the same cell without blocking GABA and glycine receptors. D) Three examples in the same cell after blocking GABA and glycine receptors.

      2) No consideration for the involvement of polysynaptic UBCs driving UBC responses to MF stimulation in electrophysiology experiments. Given the established existence (in this manuscript and Dino et al. 2000 Neurosci, Dino et al. 2000 ProgBrainRes, Nunzi and Mugnaini 2000 JCompNeurol, Nunzi et al. 2001 JCompNeurol) of polysynaptic connections from MFs to UBCs to UBCs, the MF evoked UBC responses established in this manuscript, especially responses to trains of stimuli could be mediated by direct MF inputs, or to polysynaptic UBC inputs, or possibly both (to my awareness not established either way). Thus the response durations could already include extension of duration by polysynaptic inputs, and so would overestimate the duration of monosynaptic inputs, and thus polysynaptic amplification/modulation, observed in the NEURON model.

      We are confident that the synaptic responses shown are monosynaptic for several reasons. UBCs receive a single mossy fiber input on their dendritic brush, and thus if our stimulation produces a reliable, short-latency response consistent with a monosynaptic input, then there is not likely to be a disynaptic input, because the main input is accounted for by the monosynaptic response. In all cells included in our data set, the fast AMPA receptor-mediated currents always occurred with short latency (1.24 ± 0.29 ms; mean ± SD; n = 13), high reliability (no failures to produce an EPSC in any of the 13 GRP UBCs in this data set), and low jitter (SD of latency; 0.074 ± 0.046 ms; mean ± SD; n = 13). These measurements have been added to the results section. In some rare cases, we did observe disynaptic currents, which were easily distinguishable because a single electrical stimulation produced a burst of EPSCs at variable latencies. Please see example in Author response image 2. These cases of disynaptic input, which have been reported by others (Diño et al., 2000; Nunzi and Mugnaini, 2000; van Dorp and De Zeeuw, 2015) support the conclusion that UBCs receive input from other UBCs.
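      The latency, reliability and jitter criteria used above to identify monosynaptic input can be computed as in the minimal sketch below. It assumes one list of EPSC onset latencies per cell (one entry per stimulus, with `None` marking a failure), that jitter is the per-cell standard deviation of latency, and that the population values are the mean ± SD of the per-cell values across cells. These conventions, the function names, and the example numbers are our assumptions for illustration, not the authors' analysis code or data.

```python
from statistics import mean, stdev


def cell_latency_stats(latencies_ms):
    """Per-cell mean latency, jitter (SD of latency) and failure rate.

    `latencies_ms` holds one entry per stimulus; None marks a failure (no EPSC).
    At least two successful trials are needed to estimate jitter.
    """
    hits = [lat for lat in latencies_ms if lat is not None]
    failure_rate = 1 - len(hits) / len(latencies_ms)
    return mean(hits), stdev(hits), failure_rate


def population_summary(cells):
    """Mean and SD across cells of the per-cell latency and jitter."""
    stats = [cell_latency_stats(cell) for cell in cells]
    latencies = [s[0] for s in stats]
    jitters = [s[1] for s in stats]
    return {
        "latency_ms": (mean(latencies), stdev(latencies)),
        "jitter_ms": (mean(jitters), stdev(jitters)),
        "any_failures": any(s[2] > 0 for s in stats),
    }


# Hypothetical recordings from three cells (ms); not the data reported above.
cells = [[1.10, 1.18, 1.05, 1.12], [1.40, 1.52, 1.45, 1.38], [0.98, 1.02, 1.05, 0.95]]
print(population_summary(cells))
```

      A disynaptic input of the kind shown in Author response image 2 would show up in such a summary as a larger per-cell jitter and, typically, a longer and more variable first latency.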

      Author response image 2.

      Example of GRP UBC with disynaptic input. Three examples of the effect of a single presynaptic stimulus (triangle) in a GRP UBC with presumed disynaptic input. Note the variable latency of the first evoked EPSC, bursts of EPSCs, and spontaneous EPSCs.

      3) Lack of quantification of subtypes of UBC interconnectivity. Given that it is already established that UBCs synapse onto other UBCs (see refs above), the main potential advance of this manuscript in terms of connectivity is the establishment and quantification of ON-ON, ON-OFF, OFF-ON, and OFF-OFF subtypes of UBC interconnections. But, the authors only establish that each type exists, showing specific examples, but no quantification of the absolute or relative density was provided, and the authors' unquantified wording explicitly or implicitly states that they are not common. This lack of quantification and likely small number makes it difficult to know how important or what impact such synapses have on cerebellar processing, in the model and in situ.

      As noted by the reviewer, the connections between UBCs were rare to observe. We decided against attempting to quantify the absolute or relative density of connections for several reasons. A major reason for rare observations of anatomical connections between UBCs is likely due to the sparse labeling. First, the GRP mouse line only labels 20% of ON UBCs and we are unable to test whether postsynaptic connectivity of GRP ON UBCs is the same as that of the rest of the population of ON UBCs that are not labeled in the GRP mouse line. Second, the Brainbow reporter mouse only labels a small population of Cre expressing cells for unknown reasons. Third, the Brainbow reporter expression was so low that antibody amplification was necessary, which then limited the labeled cells to those close to the surface of the brain slices, because of known antibody penetration difficulties. Therefore, we refrained from estimating the density of these connections, because each of these variables reduced the labeling to unknown degrees and we reasoned that extrapolating our rare observations to the total population would be inaccurate.

      A paper that investigated UBC connectivity using organotypic slice cultures from P8 mice suggests that 2/3 of the UBC population receives UBC input, based on the observation that 2/3 of the mossy fibers did not degenerate as would be expected after 2 days in vitro if they were severed from a distant cell body (Nunzi and Mugnaini, 2000). It remains to be seen if this high proportion is due to the young age of these mice or is also the case in adult mice. Even if these connections are indeed rare, they are expected to have profound effects on the circuit, as each UBC has multiple mossy fiber terminals (Berthie and Axelrad, 1994), and mossy fiber terminals are estimated to contact 40 granule cells each (Jakab and Hamori, 1988). We have added a comment regarding this point to the discussion.

      4) Lack of critical parameters in NEURON model.

      A) The model uses # of molecules of glutamate released as the presumed quantal content, and this factor is constant. However, no consideration of changes in # of vesicles released from single versus trains of APs from MFs or UBCs is included. At most simple synapses, two sequential APs alters release probability, either up or down, and release probability changes dynamically with trains of APs. It is therefore reasonable to imagine UBC axon release probability is at least as complicated, and given the large surface area of contact between two UBCs, the number of vesicles released for any given AP is also likely more complex.

      B) the model does not include desensitization of AMPA receptors, which in the case of UBCs can paradoxically reduce response magnitude as vesicle release and consequent glutamate concentration in the cleft increases (Linney et al. 1997 JNeurophysiol, Lu et al. 2017 Neuron, Balmer et al. 2021 eLIFE), as would occur with trains of stimuli at MF to ON-UBCs.

      A) The model produces synaptic AMPA and mGluR2 currents that reproduce those we recorded in vitro. We did not find it necessary to implement changes in glutamate release during a train as the model was fit to UBC data with the assumption that the glutamate transient did not change during the train. If there is a change in neurotransmitter release during a train, it is therefore built into the model, which has the advantage of reducing its complexity. UBCs are a special case where the postsynaptic currents are mediated mostly by the total amount of transmitter released. Most of the evoked current occurs tens to hundreds of milliseconds after neurotransmitter release and is therefore much more sensitive to total release and less sensitive to how it is released during the train. Author response image 3 shows the effect of reducing the amount of glutamate released by 10% on each stimulus in the model. Despite a significant change in the pattern of neurotransmitter release, as well as a reduction in the total amount of glutamate, the slow EPSC still decays over the course of hundreds of milliseconds.
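      The size of the manipulation in Author response image 3 can be checked with simple arithmetic. Assuming that the 10% reduction compounds from one stimulus to the next (each release is 90% of the previous one) and a 10-stimulus train (both are our assumptions; the train length is not stated for this simulation), the last release is about 39% of the first and the whole train delivers about 65% of the undepressed total. The function name is hypothetical.

```python
def release_per_stimulus(n_stimuli=10, depression=0.10):
    """Relative glutamate per stimulus when each release is `depression` smaller than the last."""
    return [(1 - depression) ** k for k in range(n_stimuli)]


amounts = release_per_stimulus()
last_relative_to_first = amounts[-1] / amounts[0]  # 0.9 ** 9
total_relative = sum(amounts) / len(amounts)       # relative to a train with no depression
print(f"last release: {last_relative_to_first:.1%} of first")
print(f"train total: {total_relative:.1%} of undepressed total")
```

      Under these assumptions the total glutamate delivered falls by roughly a third, yet, as the figure shows, the slow EPSC still decays over hundreds of milliseconds.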

      Author response image 3.

      Effect of short-term depression of neurotransmitter release. A) The top trace shows the glutamate transient that drives the AMPA receptor model used in our study. No change in release is implemented, although the slow tail of each transient summates during the train. The bottom trace shows the modeled AMPA receptor mediated current. B) In this model the amount of glutamate released is reduced by 10% on each stimulus. The duration of the slow AMPA current that develops at the end of stimulation is similar, despite a profound change in the pattern of neurotransmitter exposure.
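
      The release scheme in Author response image 3B can be sketched as a geometric sequence. This is a minimal illustration, not the model code used in the study: the relative initial release (1.0) and the 20-stimulus train length are hypothetical values chosen only to show how a 10% reduction per stimulus changes both the pattern and the total amount of glutamate released.

```python
# Minimal sketch of the release scheme in Author response image 3B:
# each stimulus in a train releases 10% less glutamate than the one before.
# The initial amount (1.0) and train length (20) are hypothetical values,
# not parameters taken from the study.


def release_train(n_stimuli: int, initial: float = 1.0, depression: float = 0.10) -> list[float]:
    """Return the relative amount of glutamate released on each stimulus."""
    amounts = []
    current = initial
    for _ in range(n_stimuli):
        amounts.append(current)
        current *= 1.0 - depression  # reduce release by `depression` per stimulus
    return amounts


constant = release_train(20, depression=0.0)  # as in panel A: no change in release
depressing = release_train(20)                # as in panel B: 10% less per stimulus

print(f"total, constant release:   {sum(constant):.2f}")
print(f"total, depressing release: {sum(depressing):.2f}")
print(f"last/first ratio:          {depressing[-1] / depressing[0]:.3f}")
```

      Because most of the slow UBC current occurs tens to hundreds of milliseconds after release, the summed amount printed here is the quantity the argument about total release depends on.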

      B) The detailed kinetic AMPA receptor model used here accurately reproduces desensitization; in fact, recovery from desensitization is what mediates the slow ON UBC current. This AMPA receptor model has 13 states: 5 closed states with 0-4 glutamates bound, 4 open states with 1-4 glutamates bound, and 4 desensitized states with 1-4 glutamates bound. The forward and reverse rates between states were fit to AMPA receptor currents recorded from dissociated UBCs, and the model accurately reproduced the ON UBC currents evoked by synaptic stimulation in our previous work (Balmer et al., 2021).
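
      The state count of the kinetic scheme can be checked by enumerating the states. This assumes the closed states are the five with 0-4 glutamates bound and that open and desensitized states each carry 1-4 bound glutamates. It is only a sketch of the state layout: the state names (C0, O1, D1, ...) are hypothetical labels, and no transition rates are included because they are not given here.

```python
from collections import Counter


def ampa_states() -> list[tuple[str, str, int]]:
    """Return (name, kind, glutamates_bound) for each state of the scheme.

    State names are hypothetical labels; only the layout follows the text.
    """
    states = [(f"C{n}", "closed", n) for n in range(0, 5)]          # 0-4 bound
    states += [(f"O{n}", "open", n) for n in range(1, 5)]           # 1-4 bound
    states += [(f"D{n}", "desensitized", n) for n in range(1, 5)]   # 1-4 bound
    return states


states = ampa_states()
print(len(states))  # 13
print(Counter(kind for _, kind, _ in states))
```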

      5) Lack of quantification of various electrophysiological responses. UBCs are defined (ON or OFF) based on an inward or outward synaptic response, but no information is provided about the range of the key parameter of duration across cells, which seems most critical to the current considerations. There is a similar lack of quantification across cells of AP duration in response to stimulation or current injections, or during baseline. The latter is particularly problematic because, in agreement with previous publications, the raw data in Fig. 1 show ON UBCs as quiescent until MF stimulation and OFF UBCs as firing spontaneously until MF stimulation. However, at least one ON UBC in the NEURON model fires spontaneously until synaptically activated by an OFF UBC (Fig. 11A), and an OFF UBC is silent until stimulated by a presynaptic OFF UBC (Fig. 11C). This may be expected or explainable theoretically, but if so, such cells should also be observed in the raw data.

      To address this reasonable concern about a general lack of quantification of electrophysiological responses, we have added data characterizing the slow inward and outward currents evoked by synaptic stimulation in GRP and P079 UBCs: the peak times and decay time constants of these currents in ON and OFF UBCs are now reported in the Results section and shown in new panels in Figure 1. We also report the action potential pause lengths in P079 UBCs and burst lengths in ON UBCs in the Results section. However, we favor the durations of the currents over the lengths of bursts and pauses, because the currents do not depend on a stable resting membrane potential, which is itself difficult to determine in intracellular recordings of these small cells.
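
      Peak times and decay time constants of the slow currents could be extracted along the following lines. This is a minimal sketch, assuming a single-exponential decay after the peak and a noise-free trace; the function name peak_and_tau and the log-linear fitting approach are illustrative choices, not the analysis pipeline used in the study.

```python
import math


def peak_and_tau(times_ms, current_pa):
    """Return (peak_time_ms, tau_ms) for a slow synaptic current.

    The peak is the sample with the largest absolute amplitude, so inward
    (negative) and outward (positive) currents are handled the same way.
    The decay after the peak is fit as I(t) = A * exp(-t / tau) by linear
    regression of ln|I| on time.
    """
    magnitudes = [abs(i) for i in current_pa]
    k = max(range(len(magnitudes)), key=magnitudes.__getitem__)

    xs, ys = [], []
    for t, m in zip(times_ms[k:], magnitudes[k:]):
        if m > 0:  # the logarithm is undefined at zero
            xs.append(t - times_ms[k])
            ys.append(math.log(m))

    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx  # slope of ln|I| versus time equals -1 / tau
    return times_ms[k], -1.0 / slope
```

      For noisy recordings, a log-linear fit is biased as the current approaches baseline, so a nonlinear least-squares fit (for example with scipy.optimize.curve_fit) over a window that ends before the noise floor would usually be preferred.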

      In a series of recent publications that focused on UBC firing, the authors argue that cell-attached recordings are necessary to accurately determine burst and pause lengths, as well as spontaneous firing rates (Guo et al., 2021; Huson et al., 2023). (The trade-off of these extracellular recordings is that the monosynaptic nature of the input is nearly impossible to confirm.) Spontaneous firing rates varied within both GRP and P079 UBCs, from silent to firing regularly or in bursts, as previously reported for UBCs (Kim et al., 2012; van Dorp and De Zeeuw, 2015). For clarity, we chose to model GRP UBCs as silent and P079 UBCs as active unless receiving synaptic input. As the reviewer suggests, we have observed UBCs firing in patterns similar to those of the model UBCs that receive input from a spontaneously active presynaptic UBC. Some examples are shown in Author response image 4.

      Author response image 4.

      Examples of UBCs that receive spontaneous input. A) Three ON UBCs that had spontaneous EPSCs, suggesting the presence of an active presynaptic UBC. B) Two OFF UBCs that had spontaneous outward currents.
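
      Pause lengths, burst lengths, and spontaneous firing rates can all be computed from spike times. The definitions below are assumptions for illustration (pause: time from the end of stimulation to the next spike; burst: first-to-last spike of the post-stimulus run in which no interspike interval exceeds a hypothetical 100 ms threshold), and the study's own criteria may differ.

```python
def pause_length(spike_times_ms, stim_end_ms):
    """Time from the end of stimulation to the first spike after it (OFF UBCs).

    Returns None if the cell does not fire again within the record.
    """
    after = [t for t in spike_times_ms if t > stim_end_ms]
    return after[0] - stim_end_ms if after else None


def burst_length(spike_times_ms, stim_end_ms, max_isi_ms=100.0):
    """First-to-last spike duration of the burst after stimulation (ON UBCs).

    The burst ends at the first interspike interval longer than max_isi_ms;
    this threshold is a hypothetical choice for illustration.
    """
    after = [t for t in spike_times_ms if t > stim_end_ms]
    if not after:
        return 0.0
    last = after[0]
    for t in after[1:]:
        if t - last > max_isi_ms:
            break
        last = t
    return last - after[0]


def firing_rate_hz(spike_times_ms, start_ms, stop_ms):
    """Mean firing rate in the window [start_ms, stop_ms), in spikes per second."""
    n = sum(start_ms <= t < stop_ms for t in spike_times_ms)
    return n / ((stop_ms - start_ms) / 1000.0)
```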

      Reviewer #2 (Public Review):

      In this paper, the authors presented a compelling rationale for investigating the role of UBCs in prolonging and diversifying signals. Based on the two known UBC subtypes, ON and OFF, they highlighted existing gaps in understanding UBC connectivity and the need to investigate whether UBCs target UBCs of the same subtype, a different subtype, or both. This knowledge is important for understanding how sensory signals are extended and diversified in the granule cell layer.

      The authors designed very interesting approaches to study UBC connectivity, using transgenic mice expressing GFP and RFP in UBCs, a Brainbow approach, immunohistochemical and electrophysiological analyses, and computational models of how feed-forward circuits of interconnected UBCs transform their inputs.

      This study provided evidence for the existence of distinct ON and OFF UBC subtypes based on their electrophysiological properties, anatomical characteristics, and expression patterns of mGluR1 and calretinin in the cerebellum. The findings support the classification of GRP UBCs as ON UBCs and P079 UBCs as OFF UBCs and suggest the presence of synaptic connections between the ON and OFF UBC subtypes. In addition, they found that GRP and P079 UBCs form parallel and convergent pathways and have different membrane capacitance and excitability. Furthermore, they showed that UBCs of the same subtype provide input to one another and modify the input to granule cells, which could provide a circuit mechanism to diversify and extend the pattern of spiking produced by mossy fiber input. Accordingly, they suggested that these transformations could provide a circuit mechanism for maintaining a sensory representation of movement for seconds.

      Overall, the article is well written and detailed, and the discovery and proposed model are very interesting. However, I have some comments and suggestions that may help to improve this manuscript:

      • The discovery of UBCs innervating each other and their own subtypes, which suggests the presence of feed-forward networks in the cerebellum, is a fascinating and exciting finding, followed by an intriguing model from the authors. However, it is worth considering an alternative model as well. I acknowledge that visualizing such interactions using current tools and methods can be challenging ("The approaches used here were not able to determine the existence of networks of more than 2 UBCs connected one after the other. If present, 3 or more UBCs in series could extend and transform the input in even more dramatic ways. The temporal diversity that UBC circuits generate may underlie the flexibility of the cerebellum to coordinate movements over a broad range of behaviors."). If more than 2 UBCs are in fact connected one after the other, then an alternative model, perhaps resembling the basal nuclei with their direct and indirect circuits (maybe a type of circular model), could be considered. The basal nuclei circuits are also regulated by modulators: upon dopamine release, D1 dopamine receptors in the direct pathway cause depolarization, and D2 dopamine receptors in the indirect pathway cause hyperpolarization. This approach could use computational models to gain insight into potential alternatives within this pathway (perhaps as a future direction).

      Thank you for this suggestion to consider the potentially similar circuit interactions in the basal nuclei. We will certainly investigate this further as we move forward with modeling the feed-forward networks in the cerebellum.

      • GRP UBCs are more densely distributed in lobes VI-IX, while P079 UBCs are more densely distributed in the dorsal leaflet of lobe X in sagittal sections. Since the cerebellum is well known for its characteristic striped pattern, are UBC distributions the same in coronal/transverse sections?

      UBCs of different types, based on their expression of specific proteins, have overlapping but somewhat distinct distributions in coronal sections. The densities of calretinin-expressing UBCs are higher within Zebrin II-positive zones and form sagittal stripes, whereas the densities of mGluR1-expressing and PLCβ4-expressing UBCs vary less but are highest at the midline (Chung et al., 2009; Sekerkova et al., 2014). The difference noted by the reviewer between the dorsal and ventral leaflets of lobe X is the most distinct that we know of in the GRP and P079 populations.

      • The extension of the axons from both subtypes of UBCs shows that they are long enough to pass several UBCs, and some even project toward the white matter (e.g. Fig 9A), suggesting that they target UBCs or granule cells in other lobules. Does this suggest UBC connectivity between different lobules (perhaps longitudinal connectivity)? Are there any observations in coronal/transverse sections that visualize mediolateral connectivity?

      This is certainly worth exploring in future work. UBCs have been reported to project their axons into and across the white matter (Diño et al., 2000). To our knowledge, whether UBCs project their axons out of one lobule and into another has not been examined.

      • The limitation in identifying networks involving more than two sequentially connected UBCs was briefly noted. Including a paragraph that describes the limitations and discusses the implications of the findings would enhance the overall impact of the research and broaden our understanding of cerebellar function.

      • It is a pity that there is no clear conclusion to the discussion of this very interesting study. I suggest providing the key points as a conclusion.

      Thank you for these suggestions. Limitations and implications are included throughout the discussion section and we feel that the summary figure and significance statement now sufficiently convey the key conclusions of the study.

      • Please correct the typographical error in Figure 2A by relabeling the lobules as IXa, IXb, and IXc.

      Fixed

      • I recommend rotating Figure 7A to align its orientation with the other figures for consistency.

      Fixed

      Reviewer #1 (Recommendations For The Authors):

      Minor comments that should be addressed for clarity:

      1) In the NEURON model, why were the reversal potential of the leak conductance and the Gmax of Ih different for the two types of UBCs? Relatedly, why is Erev for GABAB -95 mV if EK is -90 mV?

      The h-current (Ih) was estimated from a hyperpolarizing current step in both cell types, and these data have been added to the Results section and as a panel in Figure 1. The Ih conductance in the model cells was adjusted accordingly, with OFF UBCs having ~3 times that of ON UBCs, which approximated the measured voltage sag, as we now describe in the Methods section. The reversal potential of the model mGluR2 current (which is based on a model of GABAB) has been fixed.
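
      The voltage sag used to constrain Ih can be quantified as a sag ratio. This sketch assumes a common definition, (V_steady - V_peak) / (V_baseline - V_peak) for a hyperpolarizing step, and hypothetical function and argument names; it is not the study's analysis code.

```python
def sag_ratio(v_baseline_mv, v_peak_mv, v_steady_mv):
    """Fractional voltage sag during a hyperpolarizing current step.

    v_peak_mv is the most negative voltage reached during the step and
    v_steady_mv the voltage at the end of the step. Ih activation pulls
    the membrane back from the peak toward baseline; 0 means no sag.
    """
    return (v_steady_mv - v_peak_mv) / (v_baseline_mv - v_peak_mv)


def sag_from_trace(v_mv, step_start, step_stop, tail=10):
    """Sag ratio from a sampled voltage trace (indices mark the current step).

    Baseline is the mean before the step, the peak is the minimum during the
    step, and steady state is the mean of the last `tail` samples of the step.
    """
    baseline = sum(v_mv[:step_start]) / step_start
    during = v_mv[step_start:step_stop]
    peak = min(during)
    last = during[-tail:]
    steady = sum(last) / len(last)
    return sag_ratio(baseline, peak, steady)
```

      With this definition, a larger Ih conductance, such as the ~3-fold larger value used for model OFF UBCs, would be expected to produce a larger sag ratio for the same step.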

      2) On line 69, the justification for the dual genetic approach is a bit too strong: "Paired recordings not possible". It may be difficult, but it is certainly possible.

      Reworded

      3) Confusing wording; only one statistic for two parameters? Line 93: These currents were produced by both mGluR1 and AMPA receptors, as they were blocked by their antagonists JNJ16259685 and GYKI53655, respectively (92.86% ± 3.25; paired t-test; P = 0.0066; n = 9; mean ± SEM) (Fig 1D-E).

      Reworded
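
      Reporting each antagonist's effect separately would require its own mean ± SEM and paired test. A minimal sketch using only the standard library is below; it returns the t statistic and degrees of freedom, and obtaining the P value would additionally require a t-distribution (for example scipy.stats.ttest_rel, which performs the same paired test).

```python
import math
import statistics


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for matched measurements.

    Each pair is one cell measured before and after drug application.
    """
    diffs = [a - b for a, b in zip(after, before)]
    m, sem = mean_sem(diffs)
    return m / sem, len(diffs) - 1
```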

      References

      Balmer TS, Borges-Merjane C, Trussell LO (2021) Incomplete removal of extracellular glutamate controls synaptic transmission and integration at a cerebellar synapse. eLife 10:e63819.

      Berthie B, Axelrad H (1994) Granular layer collaterals of the unipolar brush cell axon display rosette-like excrescences. A Golgi study in the rat cerebellar cortex. Neuroscience Letters 167:161–165.

      Borges-Merjane C, Trussell LO (2015) ON and OFF unipolar brush cells transform multisensory inputs to the auditory system. Neuron 85:1029–1042.

      Chung SH, Sillitoe RV, Croci L, Badaloni A, Consalez G, Hawkes R (2009) Purkinje cell phenotype restricts the distribution of unipolar brush cells. Neuroscience 164:1496–1508.

      Diño MR, Schuerger RJ, Liu Y-B, Slater NT, Mugnaini E (2000) Unipolar brush cell: a potential feedforward excitatory interneuron of the cerebellum. Neuroscience 98:625–636.

      Guo C, Huson V, Macosko EZ, Regehr WG (2021) Graded heterogeneity of metabotropic signaling underlies a continuum of cell-intrinsic temporal responses in unipolar brush cells. Nat Commun 12:5491.

      Huson V, Newman LN, Regehr WG (2023) A continuum of response properties across the population of Unipolar Brush Cells in the Dorsal Cochlear Nucleus. J Neurosci Available at: https://www.jneurosci.org/content/early/2023/07/26/JNEUROSCI.0873-23.2023 [Accessed August 15, 2023].

      Jakab RL, Hamori J (1988) Quantitative morphology and synaptology of cerebellar glomeruli in the rat. Anatomy and embryology 179:81–88.

      Kim JA, Sekerkova G, Mugnaini E, Martina M (2012) Electrophysiological, morphological, and topological properties of two histochemically distinct subpopulations of cerebellar unipolar brush cells. Cerebellum 11:1012–1025.

      Nunzi M-G, Mugnaini E (2000) Unipolar brush cell axons form a large system of intrinsic mossy fibers in the postnatal vestibulocerebellum. Journal of Comparative Neurology 422:55–65.

      Sekerkova G, Watanabe M, Martina M, Mugnaini E (2014) Differential distribution of phospholipase C beta isoforms and diaglycerol kinase-beta in rodents cerebella corroborates the division of unipolar brush cells into two major subtypes. Brain structure & function 219:719–749.

      van Dorp S, De Zeeuw CI (2015) Forward signaling by unipolar brush cells in the mouse cerebellum. Cerebellum 14:528–533.

    1. Author Response

      The following is the authors’ response to the original reviews.

      First and foremost, we would like to thank all the editors and reviewers for their thoughtful and thorough evaluations of our manuscript. We greatly appreciate their assessment about the novelty and strength in this study and have revised the manuscript according to their recommendations. Below are our detailed responses and revisions based on the reviewer recommendations.

      Reviewer #1 (Recommendations For The Authors):

      1) The rationale for choosing the P35-42 adolescent window for stimulating the mesofrontal dopamine system is unclear.

      The dopaminergic innervation in the mesofrontal circuit exhibits a protracted maturation from P21 to P56 (Kalsbeek, Voorn et al. 1988, Niwa, Kamiya et al. 2010, Naneix, Marchand et al. 2012, Hoops and Flores 2017). P35-42 is in the center of this period and captures the mid-adolescent stage in rodents (Spear 2000). We have previously shown that increasing dopamine neuron activity by wheel running or optogenetic stimulation during this period, but not adulthood, can induce formation of mesofrontal dopaminergic boutons and enhance mesofrontal circuit activity in wild-type mice (Mastwal, Ye et al. 2014). We therefore chose the P35-P42 adolescent window to stimulate the mesofrontal dopamine circuit and test the long-term effect of this intervention on the frontal circuit and memory-guided decision-making deficits in mutant mice. We have detailed this rationale in the revised manuscript when we first introduced this intervention.

      2) Please provide a justification for choosing to optically record M2 neuronal activity instead of activity in the prelimbic prefrontal cortex, which is known to have the highest levels of dopamine terminals.

      While the prelimbic area has the highest level of dopamine terminals among frontal cortical regions, a robust presence of dopaminergic terminals and dopamine release in the M2 frontal cortex has been well documented (Berger, Gaspar et al. 1991, Mastwal, Ye et al. 2014, Aransay, Rodriguez-Lopez et al. 2015, Patriarchi, Cho et al. 2018). The M2 cortex plays an important role in action planning, generating the earliest neural signals among frontal cortical regions that are related to upcoming choice during spatial navigation (Sul, Kim et al. 2010, Sul, Jo et al. 2011). Our chemogenetic inactivation experiments (Supplementary Fig 1) have further confirmed the involvement of M2 in the memory-guided Y-maze navigation task used in this study. Technically, M2 has the advantage of being more amenable to optical recording of neuronal activity without the tissue damage caused by implanting a lens, which would be necessary for deeper areas such as the prelimbic cortex. We have provided this justification in the revised manuscript.

      3) What was the rationale for using the 3-day chemogenetic stimulation paradigm?

      Our previous work in wild-type adolescent mice showed that a single optogenetic stimulation session or a 2-hr wheel running session is sufficient to induce bouton formation in mesofrontal dopaminergic axons (Mastwal, Ye et al. 2014). In this study, we sought to rescue existing structural and functional deficits in the mesofrontal dopaminergic circuits due to genetic mutations. Because previous studies suggested that an optimal level of dopamine is important for normal cognitive function (Arnsten, Cai et al. 1994, Robbins 2000, Floresco 2013), we elected to do multiple stimulation sessions to boost the potential rescue effects. We tested both a 3-day and a 3-week stimulation paradigm, and found that the 3-day, but not the 3-week paradigm led to robust functional improvement (Fig. 5). These results indicate that moderate but not excessive stimulation of dopamine neurons can provide functional improvement of a deficient mesofrontal circuit. We have revised our text to clarify the rationale for these experiments.

      4) A major maturational event occurring in the prefrontal cortex is the gain of local GABAergic transmission, which is crucial for proper performance in Y-maze tasks. I am wondering whether the authors have any thoughts about what is happening at the postsynaptic level following adolescent dopamine stimulation.

      The developmental increases in dopaminergic innervation of the frontal cortex and in local GABAergic transmission are likely synergistic processes, which both contribute to the maturation of high-order cognitive functions supported by the frontal cortex (Caballero and Tseng 2016, Larsen and Luna 2018). Previous electrophysiological studies have suggested that dopamine can act on five different receptors expressed in both excitatory and inhibitory postsynaptic neurons (Seamans and Yang 2004, Tseng and O'Donnell 2007, O'Donnell 2010). At the network level, dopaminergic signaling can increase the signal-to-noise ratio and temporal synchrony of neural activity during cognitive tasks (Rolls, Loh et al. 2008, Vander Weele, Siciliano et al. 2018, Lohani, Martig et al. 2019). As the frontal GABAergic inhibitory network undergoes major functional remodeling during adolescence (Caballero and Tseng 2016), adolescent stimulation of dopamine neurons may interact with this maturational process to promote a network configuration conducive to synchronous, high signal-to-noise neural computation (Porter, Rizzo et al. 1999, Murty, Calabro et al. 2016, Mukherjee, Carvalho et al. 2019). The microcircuit mechanisms underlying changes induced by adolescent dopamine stimulation, particularly in GABAergic inhibitory neurons, will be an exciting direction for future research. We have extended our discussion of these points in the revised manuscript.

      5) A change in the density of dopamine boutons is unlikely to be limited to the M2 region in Arc-/- mice. The authors should provide some data illustrating that similar changes are widespread across the medial prefrontal cortex, and that the optical recording in the M2 region was preferred for technical limitations and to avoid damaging areas in the frontal cortex.

      As discussed above, this study focused on the M2 region of the frontal cortex because it is functionally required for memory-guided Y-maze navigation, generates behavioral choice-related neural signals during spatial navigation, and is optically most accessible. The medial prefrontal regions (anterior cingulate, prelimbic and infralimbic) ventral to M2 also receive dense dopaminergic innervation and can act in concert with M2 in decision making (Sul, Kim et al. 2010, Sul, Jo et al. 2011, Barthas and Kwan 2017). As dopaminergic innervation of the frontal cortical regions progresses in a ventral-to-dorsal direction during development (Kalsbeek, Voorn et al. 1988, Hoops and Flores 2017), how the changes induced by adolescent dopamine stimulation proceed spatiotemporally across different frontal subregions requires more extensive investigation in the future. We have added this discussion to the revised manuscript.

      Reviewer #2 (Public Review):

      The manuscript by Mastwal and colleagues explores how transient adolescent stimulation of ventral midbrain neurons that project to the frontal cortex may help to improve performance on certain memory tasks. The manuscript provides an interesting set of observations that DREADD-based activation over only 3 days during adolescence provides a fast-acting and long-lasting improvement in performance on Y-maze spontaneous alternation as well as aspects of neuronal function as assessed using in vivo imaging methods. While interesting, there are several weaknesses. First and foremost, it is not clear that the effects the authors are observing are mediated by dopamine. It has been clearly documented that the DAT-Cre line provides a better representation of midbrain dopamine cells in the mouse, particularly near the midline of the ventral midbrain (Lammel et al., Neuron 2015). This is precisely where the cells that project to the frontal cortex are located. Therefore, the selection of TH-Cre is problematic. It is very likely that the authors are labeling a substantial number of non-dopaminergic cells.

      We agree with Reviewer 2 that the DAT-Cre line can provide specific labeling of midbrain dopamine neurons, particularly those projecting to the striatum, as discussed in the cited study (Lammel, Steinberg et al. 2015). DAT transports extracellularly released dopamine back into presynaptic terminals, but it is not essential for dopamine synthesis and release (Sulzer, Cragg et al. 2016). Mesocortical dopamine neurons in the ventral tegmental area (VTA) express very little DAT (Sesack, Hawrylak et al. 1998, Lammel, Hetzel et al. 2008, Li, Qi et al. 2013), which limits the use of the DAT-Cre line to target these neurons (Lammel, Steinberg et al. 2015). Because mesocortical dopamine neurons have strong expression of TH, a key enzyme involved in dopamine synthesis, TH-Cre lines have been extensively used to study the mesocortical pathway (Lammel, Lim et al. 2012, Gunaydin, Grosenick et al. 2014, Ellwood, Patel et al. 2017, Vander Weele, Siciliano et al. 2018, Lohani, Martig et al. 2019). We provide more details below about our rationale for using TH-Cre rather than DAT-Cre mice in our study and the revisions we made in response to the reviewer’s specific recommendations.

      Reviewer #2 (Recommendations For The Authors):

      1) The authors should rigorously demonstrate that there is a reasonable midbrain DA projection to the coordinates that they are assessing and that their effects are due to DA release from these cells. It is not clear that there is a VTA dopaminergic projection to M2; it does not appear, for example, in the Allen Mouse Brain Connectivity Atlas (https://connectivity.brain-map.org/projection/experiment/siv/160540751?imageId=160541123&imageType=TWO_PHOTON,SEGMENTATION&initImage=TWO_PHOTON&x=17321&y=15284&z=3). Though there is a projection to the mPFC, at the coordinates the authors report, there does not appear to be any signal from DAT-Cre mice. However, there is much more signal when expression is not restricted to dopamine cells (https://connectivity.brain-map.org/projection/experiment/siv/165975096?imageId=165975158&imageType=TWO_PHOTON,SEGMENTATION&initImage=TWO_PHOTON&x=17950&y=11504&z=3). The argument that these cells may express less TH is not relevant for this particular issue. Therefore, it is possible that the vast majority of observed effects are not in fact mediated by dopamine but by another neurotransmitter such as glutamate. While the experiment using SCH23390 does suggest DA receptors may be involved, this result in isolation doesn't alleviate this caveat; there can be, for example, DA release from NE cells (e.g., Takeuchi et al., Nature 2016). This does not entirely invalidate the authors' results, as the effects of stimulating ventral midbrain cells that project to the forebrain don't necessarily have to occur via dopamine, but the mechanism by which they occur needs to be made clear.

      While the prelimbic area has the highest level of dopaminergic terminals among frontal cortical regions, a robust presence of midbrain dopaminergic projections and dopamine release in the M2 frontal cortex has been well established by immunostaining, viral labeling, single-cell axon tracing, and in vivo imaging with recently developed dopamine biosensors (Berger, Gaspar et al. 1991, Mastwal, Ye et al. 2014, Aransay, Rodriguez-Lopez et al. 2015, Ye, Mastwal et al. 2017, Patriarchi, Cho et al. 2018). It has also been reported repeatedly that mesocortical dopamine neurons in the VTA express very little DAT, unlike mesostriatal dopamine neurons (Sesack, Hawrylak et al. 1998, Lammel, Hetzel et al. 2008, Li, Qi et al. 2013). This limitation in the use of the DAT-Cre line to target mesocortical dopamine neurons has been acknowledged in previous studies (Lammel, Steinberg et al. 2015) and is consistent with the reviewer’s observation of DAT-Cre labeling in the Allen Mouse Brain Connectivity Atlas. Additionally, and interestingly, a recent extensive evaluation of the DAT-Cre line reported ectopic labeling of multiple non-dopaminergic neuronal populations (Soden, Miller et al. 2016, Stagkourakis, Spigolon et al. 2018, Papathanou, Dumas et al. 2019). Our own evaluation of the DAT-Cre line’s utility for cortical imaging also revealed sparse axonal labeling and sporadic ectopic labeling of cortical cell somata. We have included representative DAT-Cre images in Author response image 1 to highlight the limitations of this line for studying the dopaminergic mesocortical circuit.

      Author response image 1.

      Example images from DAT-Cre/Ai14 mice. The leftmost panel shows little axonal labeling in layer 5/6 of M2. The center panel shows sparse axonal labeling in layer 1/2 of M2, but also ectopic labeling of cell somata. The right panel shows a lack of labeling in L1/2 of the prelimbic cortex as well. Scale bars, 50 µm.

      We, as well as others, have confirmed that TH immunoreactivity in the frontal cortex labels dopaminergic axons originating from the VTA, and that ablation of VTA dopaminergic neurons removes this labeling (Niwa, Jaaro-Peled et al. 2013, Ye, Mastwal et al. 2017). Because mesocortical dopamine neurons have much stronger TH expression than DAT expression (Sesack, Hawrylak et al. 1998, Lammel, Hetzel et al. 2008, Li, Qi et al. 2013, Lammel, Steinberg et al. 2015), TH-Cre lines have been frequently used to label these neurons and study the mesocortical pathway (Lammel, Lim et al. 2012, Gunaydin, Grosenick et al. 2014, Ellwood, Patel et al. 2017, Vander Weele, Siciliano et al. 2018, Lohani, Martig et al. 2019). While TH-Cre expression itself is not restricted to dopaminergic neurons, we targeted our viral injections to the VTA and our optogenetic stimulation to the cortical dopaminergic projection target area in M2 (Patriarchi, Cho et al. 2018) to specifically modulate mesofrontal dopaminergic axons. In addition, we tested the effects of a D1 antagonist in our manipulations. Although we targeted dopamine neurons in our adolescent stimulation, the final behavioral outcome likely includes contributions from co-released neurotransmitters such as glutamate, and from non-dopaminergic neurons via network effects (Morales and Margolis 2017, Lohani, Martig et al. 2019), which will be interesting directions for future research. We have revised our Results and Discussion sections to highlight our rationale for using the TH-Cre line and the open mechanistic questions for future studies.

      2) SSFOs don't increase excitability like DREADDs, but rather cause long-lasting hyperactivity through continuous passage of cations. It is not clear what the actual firing properties of these cells are over a long period of time.

      We did not measure the precise firing patterns of the dopaminergic neurons targeted by SSFOs but evaluated the effects of SSFO activation on the frontal cortex. Similar to our DREADD-Gq mediated activity changes in the mesofrontal circuit, we found increased frontal cortical activity post-light stimulation of frontal dopamine axons in our SSFO treated animals (Fig 6a-c, S6e). While quantitatively the firing patterns of DREADD-Gq and SSFO activated dopaminergic neurons likely differ, qualitatively both of these manipulations lead to increased mesofrontal circuit activity and improvements in cognitive behaviors. In our previous work with wild-type adolescent mice, both wheel running and a single 10-min session of phasic optogenetic stimulation of the VTA resulted in dopaminergic bouton outgrowth in the frontal cortex (Mastwal, Ye et al. 2014). Taken together, these results suggest that adolescent dopaminergic mesofrontal projections are highly responsive to neural activity changes and a variety of adolescent stimulation paradigms are sufficient to elicit lasting changes in this circuit. We have added this discussion of the limitations and implications of our study into the revised manuscript.

      3) It is not clear what the increase in boutons means, given that DA release is thought to largely occur via non-synaptic release.

      Although many dopamine boutons are not associated with defined postsynaptic structures, these axonal boutons and the active zones they contain are the major release sites for dopamine (Goldman-Rakic, Leranth et al. 1989, Arbuthnott and Wickens 2007, Sulzer, Cragg et al. 2016, Liu, Goel et al. 2021). Past studies have established a consistent association between increased dopaminergic innervation in the frontal cortex and increased dopamine levels (Niwa, Kamiya et al. 2010, Naneix, Marchand et al. 2012). Our previous work also found that increasing dopaminergic boutons through adolescent VTA stimulation led to prolonged frontal local field potential responses with high-frequency oscillations (Mastwal, Ye et al. 2014), which are characteristic of increased dopaminergic signaling (Lewis and O'Donnell 2000, Gireesh and Plenz 2008, Wood, Kim et al. 2012, Lohani, Martig et al. 2019). Importantly, in our quantification of the structural changes in this study, we evaluated boutons labeled with synaptophysin, a molecular marker indicating the presence of synaptic vesicle release machinery (Li, Tasic et al. 2010, Oh, Harris et al. 2014). Thus, our study, taken in the context of previous work, suggests that the increased number of boutons signifies an increase in dopaminergic signaling within the mesofrontal circuit. We have added this discussion to the revised manuscript.

      4) The use of Arc and DISC1 mutants as models of schizophrenia is perhaps a bit overstated; while deficits in prefrontal innervation certainly occur, there are many differences between these models and the human disease states. Language should be toned down accordingly, particularly in the introduction.

      We strived to avoid overstating the extent to which the mouse lines are models for specific diseases, but we can appreciate that this may not have been clear in our original writing. We have adjusted our language to better distinguish between the utility of the animal models for the purposes of our study and their relationship to specific human disease states. Particularly in the introduction, we stated that: “Genetic disruptions of several genes involved in synaptic functions related to psychiatric disorders, such as Arc and DISC1, lead to hypoactive mesofrontal dopaminergic input in mice (Niwa, Kamiya et al. 2010, Niwa, Jaaro-Peled et al. 2013, Fromer, Pocklington et al. 2014, Purcell, Moran et al. 2014, Wen, Nguyen et al. 2014, Manago, Mereu et al. 2016). Although there are many differences between these mouse lines and specific human disease states, these mice offer opportunities to test whether genetic deficits in frontal cortex function can be reversed through circuit interventions.”

      5) Some experiments are missing proper controls, e.g., Figure 3g-i where a WT mouse should be used as a positive control.

      The goal of this experimental design (Fig 3g-i) was to evaluate the potential effects of chemogenetic VTA stimulation in the Arc-/- mice. We used Arc-/- mice with mCherry injections to control for the potential effects of CNO administration. While WT mice could be used to determine whether adolescent VTA stimulation leads to long-lasting enhancement of VTA-to-cortical transmission, this would not necessarily be a positive control for our experiments, but rather a separate line of inquiry. As dopamine's effects often display an inverted-U dose-response curve (Vijayraghavan, Wang et al. 2007, Floresco 2013), evaluating the effects of adolescent VTA stimulation in the absence of underlying dopamine deficiency could be an interesting future research direction. We have added this discussion into the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      1) Did the SSFO stimulation of the TH+ axons in PFC during adolescence lead to the same long-term change in DA bouton number the authors saw with DREADDs?

      We did not examine the degree of bouton growth in the SSFO cohort, which is a limitation of this study. Accurate quantification of dopamine boutons requires the co-injection of another AAV vector encoding Synaptophysin-GFP to label the boutons. Because we used light to directly stimulate SSFO-labeled dopaminergic axons in the frontal cortex, we were concerned that co-injecting another AAV vector may dilute SSFO-labeling of axons and reduce the efficacy of optogenetic stimulation. Given the behavioral benefits we observed, we would expect an increase in bouton density after optogenetic stimulation. A systematic optimization of viral co-labeling and optogenetic stimulation protocols will facilitate examination of the impact of SSFO stimulation at the structural level in future studies. We have added a discussion of the limitation of this study in the revised manuscript.

      2) The DISC1 section is far less detailed than the Arc section, and it was not completely clear to me that the mechanisms of dysfunction and rescue were the same in these mice compared with the Arc mice. For example, there was no mention of DA bouton density or the patterned firing of the PFC neurons at the time of decision making.

      The initial motivation of this study was to test whether adolescent dopamine stimulation can rescue the deficits in the mesofrontal dopaminergic circuit and cognitive function of Arc-/- mice, which were identified in our previous studies (Manago, Mereu et al. 2016). We first conducted multiple levels of analyses, including viral tracing, in vivo calcium imaging, and behavioral tests, to establish the coherent impacts of adolescent dopamine neuron stimulation on circuits and behaviors. We then examined a range of stimulation protocols to assess the efficacy requirements for cognitive improvement, which is our primary goal. Finally, we included DISC1 mice in our study to test whether adolescent dopamine stimulation can also reverse the cognitive deficit in another genetic model of mesofrontal dopamine deficiency. By demonstrating a similar cognitive rescue effect of adolescent VTA stimulation in an independent mouse model, this study provides a foundation for future research comparing the detailed cellular mechanisms that underlie the functional rescue in different genetic models. We have added a discussion of the scope and limitations of this study to the revised manuscript.

      References

      Aransay, A., C. Rodriguez-Lopez, M. Garcia-Amado, F. Clasca and L. Prensa (2015). "Long-range projection neurons of the mouse ventral tegmental area: a single-cell axon tracing analysis." Front Neuroanat 9: 59.

      Arbuthnott, G. W. and J. Wickens (2007). "Space, time and dopamine." Trends Neurosci 30(2): 62-69.

      Arnsten, A. F., J. X. Cai, B. L. Murphy and P. S. Goldman-Rakic (1994). "Dopamine D1 receptor mechanisms in the cognitive performance of young adult and aged monkeys." Psychopharmacology (Berl) 116(2): 143-151.

      Barthas, F. and A. C. Kwan (2017). "Secondary motor cortex: where ‘sensory’ meets ‘motor’ in the rodent frontal cortex." Trends Neurosci 40(3): 181-193.

      Berger, B., P. Gaspar and C. Verney (1991). "Dopaminergic innervation of the cerebral cortex: unexpected differences between rodents and primates." Trends Neurosci 14(1): 21-27.

      Caballero, A. and K. Y. Tseng (2016). "GABAergic Function as a Limiting Factor for Prefrontal Maturation during Adolescence." Trends Neurosci 39(7): 441-448.

      Ellwood, I. T., T. Patel, V. Wadia, A. T. Lee, A. T. Liptak, K. J. Bender and V. S. Sohal (2017). "Tonic or Phasic Stimulation of Dopaminergic Projections to Prefrontal Cortex Causes Mice to Maintain or Deviate from Previously Learned Behavioral Strategies." J Neurosci 37(35): 8315-8329.

      Floresco, S. B. (2013). "Prefrontal dopamine and behavioral flexibility: shifting from an "inverted-U" toward a family of functions." Front Neurosci 7: 62.

      Fromer, M., A. J. Pocklington, D. H. Kavanagh, H. J. Williams, S. Dwyer, P. Gormley, L. Georgieva, E. Rees, P. Palta, D. M. Ruderfer, N. Carrera, I. Humphreys, J. S. Johnson, P. Roussos, D. D. Barker, E. Banks, V. Milanova, S. G. Grant, E. Hannon, S. A. Rose, K. Chambert, M. Mahajan, E. M. Scolnick, J. L. Moran, G. Kirov, A. Palotie, S. A. McCarroll, P. Holmans, P. Sklar, M. J. Owen, S. M. Purcell and M. C. O'Donovan (2014). "De novo mutations in schizophrenia implicate synaptic networks." Nature 506(7487): 179-184.

      Gireesh, E. D. and D. Plenz (2008). "Neuronal avalanches organize as nested theta- and beta/gamma-oscillations during development of cortical layer 2/3." Proc Natl Acad Sci U S A 105(21): 7576-7581.

      Goldman-Rakic, P. S., C. Leranth, S. M. Williams, N. Mons and M. Geffard (1989). "Dopamine synaptic complex with pyramidal neurons in primate cerebral cortex." Proc Natl Acad Sci U S A 86(22): 9015-9019.

      Gunaydin, L. A., L. Grosenick, J. C. Finkelstein, I. V. Kauvar, L. E. Fenno, A. Adhikari, S. Lammel, J. J. Mirzabekov, R. D. Airan, K. A. Zalocusky, K. M. Tye, P. Anikeeva, R. C. Malenka and K. Deisseroth (2014). "Natural neural projection dynamics underlying social behavior." Cell 157(7): 1535-1551.

      Hoops, D. and C. Flores (2017). "Making Dopamine Connections in Adolescence." Trends Neurosci 40(12): 709-719.

      Kalsbeek, A., P. Voorn, R. M. Buijs, C. W. Pool and H. B. Uylings (1988). "Development of the dopaminergic innervation in the prefrontal cortex of the rat." J Comp Neurol 269(1): 58-72.

      Lammel, S., A. Hetzel, O. Haeckel, I. Jones, B. Liss and J. Roeper (2008). "Unique properties of mesoprefrontal neurons within a dual mesocorticolimbic dopamine system." Neuron 57(5): 760-773.

      Lammel, S., B. K. Lim, C. Ran, K. W. Huang, M. J. Betley, K. M. Tye, K. Deisseroth and R. C. Malenka (2012). "Input-specific control of reward and aversion in the ventral tegmental area." Nature 491(7423): 212-217.

      Lammel, S., E. E. Steinberg, C. Foldy, N. R. Wall, K. Beier, L. Luo and R. C. Malenka (2015). "Diversity of transgenic mouse models for selective targeting of midbrain dopamine neurons." Neuron 85(2): 429-438.

      Larsen, B. and B. Luna (2018). "Adolescence as a neurobiological critical period for the development of higher-order cognition." Neurosci Biobehav Rev 94: 179-195.

      Lewis, B. L. and P. O'Donnell (2000). "Ventral tegmental area afferents to the prefrontal cortex maintain membrane potential 'up' states in pyramidal neurons via D(1) dopamine receptors." Cereb Cortex 10(12): 1168-1175.

      Li, L., B. Tasic, K. D. Micheva, V. M. Ivanov, M. L. Spletter, S. J. Smith and L. Luo (2010). "Visualizing the distribution of synapses from individual neurons in the mouse brain." PLoS One 5(7): e11503.

      Li, X., J. Qi, T. Yamaguchi, H. L. Wang and M. Morales (2013). "Heterogeneous composition of dopamine neurons of the rat A10 region: molecular evidence for diverse signaling properties." Brain Struct Funct 218(5): 1159-1176.

      Liu, C., P. Goel and P. S. Kaeser (2021). "Spatial and temporal scales of dopamine transmission." Nat Rev Neurosci 22(6): 345-358.

      Lohani, S., A. K. Martig, K. Deisseroth, I. B. Witten and B. Moghaddam (2019). "Dopamine Modulation of Prefrontal Cortex Activity Is Manifold and Operates at Multiple Temporal and Spatial Scales." Cell Rep 27(1): 99-114 e116.

      Manago, F., M. Mereu, S. Mastwal, R. Mastrogiacomo, D. Scheggia, M. Emanuele, M. A. De Luca, D. R. Weinberger, K. H. Wang and F. Papaleo (2016). "Genetic Disruption of Arc/Arg3.1 in Mice Causes Alterations in Dopamine and Neurobehavioral Phenotypes Related to Schizophrenia." Cell Rep 16(8): 2116-2128.

      Mastwal, S., Y. Ye, M. Ren, D. V. Jimenez, K. Martinowich, C. R. Gerfen and K. H. Wang (2014). "Phasic dopamine neuron activity elicits unique mesofrontal plasticity in adolescence." J Neurosci 34(29): 9484-9496.

      Morales, M. and E. B. Margolis (2017). "Ventral tegmental area: cellular heterogeneity, connectivity and behaviour." Nat Rev Neurosci 18(2): 73-85.

      Mukherjee, A., F. Carvalho, S. Eliez and P. Caroni (2019). "Long-Lasting Rescue of Network and Cognitive Dysfunction in a Genetic Schizophrenia Model." Cell 178(6): 1387-1402 e1314.

      Murty, V. P., F. Calabro and B. Luna (2016). "The role of experience in adolescent cognitive development: Integration of executive, memory, and mesolimbic systems." Neurosci Biobehav Rev 70: 46-58.

      Naneix, F., A. R. Marchand, G. Di Scala, J. R. Pape and E. Coutureau (2012). "Parallel maturation of goal-directed behavior and dopaminergic systems during adolescence." J Neurosci 32(46): 16223-16232.

      Niwa, M., H. Jaaro-Peled, S. Tankou, S. Seshadri, T. Hikida, Y. Matsumoto, N. G. Cascella, S. Kano, N. Ozaki, T. Nabeshima and A. Sawa (2013). "Adolescent stress-induced epigenetic control of dopaminergic neurons via glucocorticoids." Science 339(6117): 335-339.

      Niwa, M., A. Kamiya, R. Murai, K. Kubo, A. J. Gruber, K. Tomita, L. Lu, S. Tomisato, H. Jaaro-Peled, S. Seshadri, H. Hiyama, B. Huang, K. Kohda, Y. Noda, P. O'Donnell, K. Nakajima, A. Sawa and T. Nabeshima (2010). "Knockdown of DISC1 by in utero gene transfer disturbs postnatal dopaminergic maturation in the frontal cortex and leads to adult behavioral deficits." Neuron 65(4): 480-489.

      O'Donnell, P. (2010). "Adolescent maturation of cortical dopamine." Neurotox Res 18(3-4): 306-312.

      Oh, S. W., J. A. Harris, L. Ng, B. Winslow, N. Cain, S. Mihalas, Q. Wang, C. Lau, L. Kuan, A. M. Henry, M. T. Mortrud, B. Ouellette, T. N. Nguyen, S. A. Sorensen, C. R. Slaughterbeck, W. Wakeman, Y. Li, D. Feng, A. Ho, E. Nicholas, K. E. Hirokawa, P. Bohn, K. M. Joines, H. Peng, M. J. Hawrylycz, J. W. Phillips, J. G. Hohmann, P. Wohnoutka, C. R. Gerfen, C. Koch, A. Bernard, C. Dang, A. R. Jones and H. Zeng (2014). "A mesoscale connectome of the mouse brain." Nature 508(7495): 207-214.

      Papathanou, M., S. Dumas, H. Pettersson, L. Olson and A. Wallen-Mackenzie (2019). "Off-Target Effects in Transgenic Mice: Characterization of Dopamine Transporter (DAT)-Cre Transgenic Mouse Lines Exposes Multiple Non-Dopaminergic Neuronal Clusters Available for Selective Targeting within Limbic Neurocircuitry." eNeuro 6(5).

      Patriarchi, T., J. R. Cho, K. Merten, M. W. Howe, A. Marley, W. H. Xiong, R. W. Folk, G. J. Broussard, R. Liang, M. J. Jang, H. Zhong, D. Dombeck, M. von Zastrow, A. Nimmerjahn, V. Gradinaru, J. T. Williams and L. Tian (2018). "Ultrafast neuronal imaging of dopamine dynamics with designed genetically encoded sensors." Science 360(6396): eaat4422.

      Porter, L. L., E. Rizzo and J. P. Hornung (1999). "Dopamine affects parvalbumin expression during cortical development in vitro." J Neurosci 19(20): 8990-9003.

      Purcell, S. M., J. L. Moran, M. Fromer, D. Ruderfer, N. Solovieff, P. Roussos, C. O'Dushlaine, K. Chambert, S. E. Bergen, A. Kahler, L. Duncan, E. Stahl, G. Genovese, E. Fernandez, M. O. Collins, N. H. Komiyama, J. S. Choudhary, P. K. Magnusson, E. Banks, K. Shakir, K. Garimella, T. Fennell, M. DePristo, S. G. Grant, S. J. Haggarty, S. Gabriel, E. M. Scolnick, E. S. Lander, C. M. Hultman, P. F. Sullivan, S. A. McCarroll and P. Sklar (2014). "A polygenic burden of rare disruptive mutations in schizophrenia." Nature 506(7487): 185-190.

      Robbins, T. W. (2000). "Chemical neuromodulation of frontal-executive functions in humans and other animals." Exp Brain Res 133(1): 130-138.

      Rolls, E. T., M. Loh, G. Deco and G. Winterer (2008). "Computational models of schizophrenia and dopamine modulation in the prefrontal cortex." Nat Rev Neurosci 9(9): 696-709.

      Seamans, J. K. and C. R. Yang (2004). "The principal features and mechanisms of dopamine modulation in the prefrontal cortex." Prog Neurobiol 74(1): 1-58.

      Sesack, S. R., V. A. Hawrylak, C. Matus, M. A. Guido and A. I. Levey (1998). "Dopamine axon varicosities in the prelimbic division of the rat prefrontal cortex exhibit sparse immunoreactivity for the dopamine transporter." J Neurosci 18(7): 2697-2708.

      Soden, M. E., S. M. Miller, L. M. Burgeno, P. E. M. Phillips, T. S. Hnasko and L. S. Zweifel (2016). "Genetic Isolation of Hypothalamic Neurons that Regulate Context-Specific Male Social Behavior." Cell Rep 16(2): 304-313.

      Spear, L. (2000). "Modeling adolescent development and alcohol use in animals." Alcohol Res Health 24(2): 115-123.

      Stagkourakis, S., G. Spigolon, P. Williams, J. Protzmann, G. Fisone and C. Broberger (2018). "A neural network for intermale aggression to establish social hierarchy." Nat Neurosci 21(6): 834-842.

      Sul, J. H., S. Jo, D. Lee and M. W. Jung (2011). "Role of rodent secondary motor cortex in value-based action selection." Nat Neurosci 14(9): 1202-1208.

      Sul, J. H., H. Kim, N. Huh, D. Lee and M. W. Jung (2010). "Distinct roles of rodent orbitofrontal and medial prefrontal cortex in decision making." Neuron 66(3): 449-460.

      Sulzer, D., S. J. Cragg and M. E. Rice (2016). "Striatal dopamine neurotransmission: regulation of release and uptake." Basal Ganglia 6(3): 123-148.

      Tseng, K. Y. and P. O'Donnell (2007). "Dopamine modulation of prefrontal cortical interneurons changes during adolescence." Cereb Cortex 17(5): 1235-1240.

      Vander Weele, C. M., C. A. Siciliano, G. A. Matthews, P. Namburi, E. M. Izadmehr, I. C. Espinel, E. H. Nieh, E. H. S. Schut, N. Padilla-Coreano, A. Burgos-Robles, C. J. Chang, E. Y. Kimchi, A. Beyeler, R. Wichmann, C. P. Wildes and K. M. Tye (2018). "Dopamine enhances signal-to-noise ratio in cortical-brainstem encoding of aversive stimuli." Nature 563(7731): 397-401.

      Vijayraghavan, S., M. Wang, S. G. Birnbaum, G. V. Williams and A. F. Arnsten (2007). "Inverted-U dopamine D1 receptor actions on prefrontal neurons engaged in working memory." Nat Neurosci 10(3): 376-384.

      Wen, Z., H. N. Nguyen, Z. Guo, M. A. Lalli, X. Wang, Y. Su, N. S. Kim, K. J. Yoon, J. Shin, C. Zhang, G. Makri, D. Nauen, H. Yu, E. Guzman, C. H. Chiang, N. Yoritomo, K. Kaibuchi, J. Zou, K. M. Christian, L. Cheng, C. A. Ross, R. L. Margolis, G. Chen, K. S. Kosik, H. Song and G. L. Ming (2014). "Synaptic dysregulation in a human iPS cell model of mental disorders." Nature 515(7527): 414-418.

      Wood, J., Y. Kim and B. Moghaddam (2012). "Disruption of prefrontal cortex large scale neuronal activity by different classes of psychotomimetic drugs." J Neurosci 32(9): 3022-3031.

      Ye, Y., S. Mastwal, V. Y. Cao, M. Ren, Q. Liu, W. Zhang, A. G. Elkahloun and K. H. Wang (2017). "Dopamine is Required for Activity-Dependent Amplification of Arc mRNA in Developing Postnatal Frontal Cortex." Cereb Cortex 27(7): 3600-3608.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      The main weaknesses of the paper are a lack of significance in key findings, and relatedly, concluding effects from insignificant findings. Additional elements could be improved to help strengthen this overall well-rounded and intriguing set of results.

      In the original manuscript, we reported that chemogenetic silencing of POA-social neurons (previously called POA-iso neurons; more details on rationale for renaming below in our responses to reviewer recommendations) tended to reduce mounting in both single-housed female and single-housed male mice, although these effects were non-significant. We have added samples to both datasets and now report that chemogenetic silencing of POA-social neurons significantly reduces the proportion of trials with mounting in both sexes (Fig. 2C and Fig. 6G). 

      We have also included new analyses to test whether optogenetic activation of POA-social neurons in group-housed females promotes social investigation (in addition to USV production, as reported in the original manuscript). We now report that optogenetic activation of POA-social neurons significantly increases the probability of social investigation (Fig. 4E-F) and significantly increases the duration of social investigation bouts (Fig. 4G). 

      Additional recommendations from the reviewer are addressed in detail below. Thank you for your critical and insightful feedback.

      Reviewer 2:

      All the activity-dependent labeling experiments with TRAP mice, including the subsequent neural activity manipulation experiments (Figures 2, 3, 4, 5E-F), were conducted by labeling neurons only in socially isolated animals, not group-housed animals. The authors labeled neurons after 30-minute social interactions, raising the possibility that the labeled neurons simply represent a "social interaction/behavior population" (mediating mounting and USVs in females and males) rather than a set of neurons specific to social isolation.

      I strongly recommend including experimental groups that involve labeling neurons after 30-minute social interactions in group-housed female or male mice and inhibit TRAPed neurons after social isolation or activate TRAPed neurons after group housing. If manipulating the group-housed TRAP neurons has similar effects to manipulating the isolated TRAP neurons, it would suggest the current labeling paradigm is not isolating neurons specific to the effect of social isolation per se. Rather, the neurons may mediate more general social interaction or motivation-related activities. Given the known role of POA in male mating behavior, a group-housed TRAP experiment in males with a female visitor is especially important for understanding the selectivity of the labeled cells.

      Without proper controls, referring to the labeled neurons as "POA-iso" neurons is potentially misleading. The data thus far suggests these neurons may predominantly reflect a "POA social behavior" population rather than a set of cells distinctly responsive to isolated housing.

      We agree with the reviewer that the POA neurons we are studying regulate the production of social behaviors in females and males, rather than representing a set of cells distinctly responsive to single housing. To more clearly reflect our thinking, we have changed the name of the neurons from “POA-iso neurons” to “POA-social neurons”. Thank you for this helpful criticism.

      Our Fos data are consistent with the idea that the POA may regulate social behaviors in group-housed females (not just single-housed females). Namely, we found that counts of Fos-positive POA neurons are significantly related to rates of social investigation (p = 0.01) and tend to be related to USV rates (p = 0.05) in group-housed females that engaged in same-sex interactions (Fig. S1C). We now include two new sets of experiments aimed at further testing this idea. 

      First, we include 2 control groups in which TRAPing sessions were performed in group-housed females following same-sex interactions. We find that chemogenetic silencing of group-housed-TRAPed POA neurons fails to reduce social behaviors in females that are subsequently single-housed and given a same-sex social interaction (Fig. 5A-D), and that optogenetic activation of group-housed-TRAPed POA neurons fails to promote female social behavior (Fig. 5E-H). At face value, these findings do not support the idea that the POA contains neurons that regulate social behaviors in group-housed females.

      However, one important caveat is that group-housed females engage in low rates of social behaviors (low investigation time, no mounting, and few USVs), and thus TRAP-based labeling may not work efficaciously in these mice. There may be POA neurons that regulate social behaviors in group-housed females but that do not upregulate Fos following production of relatively low rates of social behaviors. To test this idea, we also include females in which POA neurons are chemogenetically silenced using a viral strategy that does not depend on activity-dependent labeling. In this new experiment, we report that silencing of POA neurons significantly reduces USV production in group-housed females (Fig. 5J-L) and significantly reduces social investigation, mounting, and USV production when these same females are retested following single-housing (Fig. 5M-O). Together, these experiments suggest that the POA may regulate the production of social behaviors during same-sex interactions in group-housed females, but that these effects may be difficult to detect in some cases given the low rates at which group-housed females engage in social behaviors during same-sex interactions relative to single-housed females.

      Finally, we want to highlight an additional new dataset that supports the idea that POA-social neurons regulate social behaviors, rather than encoding the “state” of social isolation. We now include a control group for the chemogenetic silencing of female POA-social neurons, in which females were single-housed but were not given a social interaction prior to 4-OHT treatment (N = 5 non-social controls). Rates of social behaviors were subsequently unaffected following CNO delivery in these females (Fig. S2D-G). These new data support the conclusion that POA-social neurons regulate the production of social behaviors, rather than encoding the state of social isolation. 

      Reviewer 3:

      While the authors should be commended for performing and reporting multiple circuit perturbation experiments (e.g., chemogenetics, ablation), the conflicting effects on behavior are hard to interpret without additional experiments. For example, chemogenetic silencing of the POA neurons (using DREADDs) attenuated all three behavioral measures but the ablation of the same POA neurons (using caspase) decreased mounting duration without impacting social investigation or USV production. Similarly, optogenetic activation of POA neurons was sufficient to generate USV production as reported in earlier studies but mounting or social investigation remained unaffected. 

      Do these discrepancies arise due to the efficiency differences between DREADD-mediated silencing vs. Casp3 ablation? Or does the chemogenetic result reflect off-manifold effects on downstream circuitry whereas a more permanent ablation strategy allows other brain regions to compensate due to redundancy? It is important to resolve whether these arise due to technical reasons or whether these reflect the underlying (perhaps messy) logic of neural circuitry. Therefore, while it is clear that POA neurons likely contribute to multiple behavioral readouts of social isolation, understanding their exact roles in any greater detail will require further experiments.

      We have added new analyses to consider the possibility that optogenetic activation of female POA-social neurons promotes social investigation. In the original manuscript, we analyzed the duration of social investigation bouts in POA-social-ChR2 females according to whether they overlapped with laser stimulation or whether they did not overlap. We realized that we made an error in this first analysis and inadvertently included social investigation bouts that occurred during the first 5 minutes of the social sessions, prior to any laser stimulation. Because these earlier bouts tend to be longer duration than later bouts, this mistake washed out the effect of laser stimulation on social bout duration. After correcting that error, we now report that optogenetic activation of female POA-social neurons lengthens social investigation bout duration (Fig. 4G). Inspired by this interesting finding, we also included analyses of the probability of social investigation following laser stimulation (Fig. 4E-F; excluding laser stimulations that were preceded by social investigation in the pre-laser baseline period). These analyses support the conclusion that optogenetic activation of POA-social neurons promotes both USV production and social investigation in group-housed females.  
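      To make the bout classification described above concrete, the following is a minimal sketch, not the authors' analysis code. The Interval type, the function names, and the rule that bouts beginning before an assumed 300 s (5 min) first-laser onset are discarded are all illustrative assumptions; the actual analysis may define overlap and exclusion windows differently.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    start: float  # seconds from session start
    end: float


def overlaps(a: Interval, b: Interval) -> bool:
    # Two half-open intervals overlap if each starts before the other ends
    return a.start < b.end and b.start < a.end


def split_bouts_by_laser(bouts, lasers, first_laser_onset=300.0):
    """Split investigation bouts into laser-overlapping and non-overlapping sets.

    Hypothetical helper: bouts starting before the first laser onset
    (assumed here to be 300 s, i.e. the first 5 minutes) are dropped, so
    that long early bouts do not inflate the non-laser group.
    """
    eligible = [b for b in bouts if b.start >= first_laser_onset]
    with_laser = [b for b in eligible if any(overlaps(b, s) for s in lasers)]
    without_laser = [b for b in eligible if not any(overlaps(b, s) for s in lasers)]
    return with_laser, without_laser


def mean_duration(bouts):
    # Mean bout length in seconds; NaN when there are no bouts
    return sum(b.end - b.start for b in bouts) / len(bouts) if bouts else float("nan")
```

      Dropping the pre-stimulation bouts before comparing groups mirrors the correction described in the response: if long early bouts were left in the non-laser group, they could mask a lengthening of bouts during stimulation.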

      The majority of the females that we used in our TRAP2-based ablation experiments were heterozygous for TRAP2 (N = 11 of 15 POA-social-caspase subjects were TRAP2;Ai14 females), whereas all females used in our chemogenetic silencing experiments were homozygous for TRAP2. To test whether a more effective ablation of POA-social neurons might drive decreases in social investigation and USV production, we set up additional TRAP2 homozygous POA-social-caspase females and directly compared the effects of ablation between the two genotypes (Fig. S3; N = 11 hets in total and N = 9 homozygotes in total). These experiments revealed that effects on mounting were more pronounced following POA-social ablation in TRAP2 homozygotes vs. heterozygotes, but that neither group exhibited decreased social investigation or USV production following 4-OHT treatment.

      To ask whether caspase-mediated ablation in TRAP2 homozygotes was effective in eliminating neural activity associated with social behaviors in females, we performed Fos immunostaining in a subset of the POA-social-caspase TRAP2 homozygotes following a same-sex interaction. We found that POA Fos expression was robustly reduced in these females relative to control group-housed and control single-housed females that also engaged in same-sex interactions, down to levels seen in group-housed and single-housed females that did not engage in a social interaction (comparison shown in Fig. S3D; control female data same as in Fig. 1). Moreover, the remaining POA Fos in these TRAP2 homozygotes was no longer positively correlated with social investigation or USV production (Fig. S3E-F). Together, these findings lead us to favor the interpretation suggested by the reviewer, that permanent ablation of POA-social neurons leads to compensation from other brain regions due to redundancy. In addition, our finding that optogenetic activation of POA-social neurons promotes both USV production and social investigation supports the idea that POA-social neurons directly regulate these behaviors. We agree with the reviewer that additional work is needed to understand the complex sex- and context-dependent role played by the POA in the regulation of mouse social behaviors.

      Recommendations for the Authors:

      Reviewer 1 Recommendations:

      (1) The largest issue is that many of the stated "key" behavioral findings are not statistically significant.

      (1a) Figure 2C is not significant and Figure 5G is not significant

      We have added N = 5 POA-social-hM4Di females, N = 3 POA-social-hM4Di males, and N = 3 POA-social-GFP males to the dataset. The decrease in mounting following chemogenetic silencing of POA-social neurons is now statistically significant in both sexes (p < 0.05 for both; see current Figs. 2C and 6G). We also simplified our statistical analysis of mounting in these experiments to consider the proportion of trials with and without resident-initiated mounting on saline vs. CNO days, using McNemar’s test for paired proportions. 
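      For readers less familiar with McNemar's test, the sketch below shows how it can be applied to paired per-animal outcomes (resident-initiated mounting present or absent on the saline day vs. the CNO day). It assumes the exact binomial form of the test; the response does not state whether the exact or chi-square version was used, and the function name and input format are hypothetical.

```python
from math import comb


def mcnemar_exact(pairs):
    """Two-sided exact McNemar test on paired binary outcomes.

    pairs: list of (mounted_on_saline, mounted_on_cno) booleans, one per
    animal (hypothetical input format). Only discordant pairs carry
    information about a change in the proportion of trials with mounting.
    """
    # b: mounted on the saline day only; c: mounted on the CNO day only
    b = sum(1 for sal, cno in pairs if sal and not cno)
    c = sum(1 for sal, cno in pairs if cno and not sal)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a change
    # Under the null hypothesis, b ~ Binomial(n, 0.5); double the smaller tail
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Illustrative counts only, not the study's data
print(mcnemar_exact([(True, False)] * 6 + [(False, False)] * 4))  # → 0.03125
```

      Animals that mounted on both days, or on neither day, do not change the result; in the illustrative call, six animals mounting after saline but not after CNO (and none showing the reverse) give p = 2 × 0.5^6 = 0.03125.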

      (1b) Mounting graphs are completely omitted in Figure 4. 

      Given that mounting was only observed infrequently in POA-social-ChR2 females, we simply report this information in the Results text (lines 382-388). In our prior summary of the mounting results, we reported that mounting was observed in a total of 3 trials from 2 females, but we inadvertently included information from a duplicate trial from one of the POA-social-ChR2 females in this summary (all other analyses of the POA-social-ChR2 females included one trial per female). We have corrected that error and now report that we observed mounting following laser stimulation in 1 trial from 1 POA-social-ChR2 female. We have expanded our consideration of potential effects of optogenetic activation of POA-social neurons on social investigation and include these new analyses as part of Figure 4 (Fig. 4E-G), following the existing analyses of USV production.

      (1c) Figure 3C shows a reduction of mounting following the ablation of POA (although no stats on the graph to denote significance), but this ablation approach can't resolve whether POA is required to encode the state produced by the short period of isolation, and/or whether it needs to be online at test.

      We have now added an asterisk in Fig. 3C to denote a p value less than 0.05. Thank you for catching our oversight.

      We designed our activity-dependent labeling experiments to TRAP and express viruses in POA neurons that increase their activity in conjunction with the production of social behaviors in single-housed females. We believe our findings are most consistent with the conclusion that these neurons regulate the production of social behaviors, rather than encoding the state of social isolation, and we have renamed these neurons as “POA-social” neurons to better reflect our thinking.

      We also now include control experiments (albeit chemogenetic inhibition, not caspase ablation) in which the TRAP2 strategy is used to express hM4Di in the POA of single-housed females that do not experience a social interaction prior to 4-OHT delivery (non-social controls, Fig. S2D-G). We report that chemogenetic inhibition of these neurons does not decrease social behavior in single-housed females during a subsequent same-sex interaction (p > 0.05 for saline vs. CNO rates of social investigation, mounting, and USVs). These additional findings support the idea that the activity of POA-social neurons is related to the production of social behaviors rather than to the state of social isolation. 

      The reviewer is correct that our ablation approach cannot resolve the question of whether POA-social neuronal activity is required online during testing, but our reversible chemogenetic inhibition experiments provide evidence that the activity of POA-social neurons is required online at the time of testing to regulate social behavior.

      (1d) A similar issue is seen regarding investigation (a general lack of significance with most of the LOF and GOF manipulations).

      As reported in the original manuscript, we find that chemogenetic inhibition of POA-social neurons reduces social investigation in females, while caspase-mediated ablation of female POA-social neurons does not. Our original caspase dataset used mostly, but not exclusively, TRAP2 heterozygous females (N = 11 TRAP2 heterozygotes (TRAP2;Ai14), generated by crossing TRAP2 mice with Ai14 mice so that the absence of tdTomato labeling could be used to estimate the spread of the caspase virus; and N = 4 TRAP2 homozygotes). By adding to the TRAP2 homozygous caspase dataset and comparing the effects of ablating POA-social neurons on female social behavior in TRAP2 heterozygous vs. TRAP2 homozygous females, we now provide evidence that the attenuation of mounting is more efficacious in TRAP2 homozygous females than in heterozygotes (Fig. S3B). Nonetheless, we fail to see effects on social investigation and USV production, even when caspase ablation of POA-social neurons is performed in TRAP2 homozygous females (Fig. S3A,C).

      In spite of the lack of effect on these behaviors, we show that caspase-mediated ablation of POA-social neurons in TRAP2 homozygous females leads to a dramatic reduction in social interaction-induced Fos expression in the POA. POA Fos expression in these caspase females is reduced to the levels seen in control group-housed and single-housed females that are not given social interactions and are significantly lower than Fos expression in group-housed and single-housed females that are given a same-sex interaction (Fig. S3D). Moreover, the remaining POA Fos expression in the caspase females is no longer related to rates of social investigation (Fig. S3E), as is normally the case in group-housed and single-housed control females (Fig. S1C, left). Together, these data support the idea that some type of neuronal compensation outside of the POA is occurring following ablation of POA-social neurons, and this compensation permits normal levels of USV production and social investigation.

      As in the original manuscript, we report that chemogenetic inhibition of POA-social neurons in male mice reduces mounting but does not reduce social investigation (or USV production). We now include quantification of social behaviors produced by male and female POA-social-hM4Di mice in the TRAPing sessions that preceded 4-OHT delivery (Fig. S5). These measurements show that males spent significantly more time than females engaged in mounting, and we speculate that this bias in TRAPing session behavior might have led to a bias in TRAP-mediated viral labeling of male POA neurons that regulate mounting, at the expense of male POA neurons that regulate social investigation (or USV production).

      We have added new analyses to consider the possibility that optogenetic activation of female POA-social neurons promotes social investigation. In the original manuscript, we analyzed the duration of social investigation bouts in POA-social-ChR2 females according to whether they overlapped with laser stimulation or whether they did not overlap. We realized that we made an error in this first analysis and inadvertently included social investigation bouts that occurred during the first 5 minutes of the social sessions, prior to any laser stimulation. Because these earlier bouts tend to be longer duration than later bouts, this mistake washed out the effect of laser stimulation on social bout duration. After correcting that error, we now report that optogenetic activation of female POA-social neurons lengthens social investigation bout duration (Fig. 4G). Inspired by this encouraging finding, we also included analyses of the probability of social investigation following laser stimulation (Fig. 4E-F; excluding laser stimulations that were preceded by social investigation in the pre-laser baseline period). These analyses support the conclusion that optogenetic activation of POA-social neurons promotes both USV production and social investigation in group-housed females.

      (2) In Figure 1 and elsewhere, the authors use a Mann-Whitney U test, which should be used for non-parametric data, but in other places, they use statistical tests for normally distributed data. Why? How was the normality of distributions tested?

      We tested the normality of data distributions using the Shapiro-Wilk test. Parametric tests were used for analyses that contained normally distributed data, and non-parametric tests were used for analyses that contained non-normally distributed data. This information is included in the Methods (lines 997-1000), and full details of statistical analyses can be found in Table S1.

      (3) The method for "trapping" neurons that are part of the short-term isolation ensemble has some caveats that have not been adequately addressed. First, 4-OHT was administered after social interaction, but before 24 hours of isolation, making it unclear exactly WHAT is being trapped.

      i) Is it neurons that encode the recent 3-day iso experience? (seems unlikely, as this would have been hours after the end of that iso window)

      We now include a group of control females to directly test this possibility (Fig. S2D-G). These TRAP2 females were single-housed for 3 days but were not given a social interaction prior to 4-OHT treatment (N = 5 non-social controls). Presumably, POA neurons TRAPed in these females might encode the experience of short-term isolation. However, we found that chemogenetic inactivation of these TRAPed neurons during a subsequent same-sex interaction failed to decrease social behaviors in single-housed females (Fig. S2E-G; p > 0.05 for CNO vs. saline rates of social investigation, mounting, and USV production). These control experiments support the idea that we are TRAPing neurons whose activity is related to the production of social behaviors, and we have renamed the neurons as “POA-social” neurons to reflect this thinking.

      ii) Is it neurons that encode the recent behavior impacted by the 3-day iso? (this seems to be the goal, but the authors do not provide evidence that the time course of their injection is efficient enough to recruit the recently activated neurons, nor do they provide evidence that opening the trapping window directly after the behavior is better than directly before)

      We opted to perform IP injections of 4-OHT immediately following the behavior session, rather than before it, due to concern that handling the mice and delivering IP injections prior to behavior sessions would stress the mice, leading to lower rates of social behaviors. The non-social female hM4Di experiments described above support the idea that we are TRAPing neurons related to the production of social behaviors, as the reviewer suggests.

      iii) Is it trapping neurons active during the subsequent 24 hours of isolation? (seems possible, but this would mean that the authors are looking at a different population of neurons than they claim).

      Given that chemogenetic silencing of POA neurons that were TRAPed following 3 days of social isolation, but in the absence of a social interaction (N = 5 non-social controls, Fig. S2D-G), does not alter social behaviors, there is no compelling reason to hypothesize that silencing POA neurons activated during the 24 hours of social isolation that follow a social interaction would do so. Moreover, in the original study characterizing the TRAP2 mice (DeNardo et al., 2019), the authors performed experiments to characterize the time course of TRAPing relative to 4-OHT treatment and concluded that the majority of TRAPing occurs within a 6-hour window centered around the 4-OHT injection.

      (4) Relatedly, the authors seem to find a fair bit of variability in their TRAP-mediated experiments. This begs the question - are the effects of their GOF and LOF approaches

      i) dependent on the iso-behaviors that were "trapped" for each animal (in other words, how does behavior at test 1 correlate with behavior at test 2)? 

      To test the reviewer’s idea, we compared rates of TRAPing session behaviors for the POA-social-hM4Di females to the subsequent effects of neuronal silencing on these behaviors (calculated as CNO behavior – saline behavior). These correlations are shown in Fig. S2A-C and are all non-significant. We also include below for the reviewer the same types of correlations for the other datasets in our study (loss-of-function experiments: female POA-social-caspase, male POA-social-hM4Di; and gain-of-function experiments: female POA-social-ChR2).

      Author response image 1.

      The only loss-of-function experiment comparison in the above figure that reveals a negative and significant correlation is the mounting comparison for the POA-social-hM4Di males (time spent mounting during TRAPing session vs. (CNO time spent mounting – saline time spent mounting)). This significant correlation likely reflects the fact that (1) no males mounted in the CNO session and (2) mounting rates for individual males are relatively consistent over time (in comparison to female mounting, which is more variable; see Author response image 2 below of TRAPing session vs. saline mounting in male vs. female POA-social-hM4Di experiments). The correlation between TRAPing session and testing session mounting is significant for the POA-social-ChR2 females, but despite the significant correlation, we would want to see more instances of optogenetically-elicited mounting to make any claim about its relationship to TRAPing session behavior.

      Author response image 2.

      Nonetheless, we agree with the reviewer’s intuition that one would expect the effects of POA activity manipulations on different behaviors to scale with the rates at which these behaviors were performed during the TRAPing session. We speculate that variability in the TRAPing process might have obscured such a relationship. There is inevitable variability in the exact body cavity placement of IP injections, which can affect drug absorption. In addition, we delivered a fixed volume of 4-OHT (10 mg/mL 4-OHT in 150 µL filtered corn oil) to all mice in the study, regardless of their weight, which likely added variability in TRAPing efficacy from animal to animal. This detail was reported inaccurately in the Methods, and that error has been corrected (line 920). With regard to our male POA-social-hM4Di dataset, we find that these males spend more time mounting during their TRAPing sessions than female POA-social-hM4Di mice (Fig. S5; males also spent less time investigating and tended to produce fewer USVs than females), a fact that we hypothesize may have led to a bias toward TRAPing mounting-related POA neurons in male subjects. In addition, however, the fact that male mice typically weigh more than females and would have received a slightly lower effective dosage of 4-OHT may also have contributed to the weaker effects on behavior in the male POA-social-hM4Di experiments relative to the female POA-social-hM4Di experiments.
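      To make the dosing point concrete: a fixed 150 µL injection of 10 mg/mL 4-OHT delivers 1.5 mg per mouse, so the weight-normalized dose falls as body mass rises. The sketch below assumes hypothetical body masses of 20 g and 25 g (illustrative values, not measurements from the study); under those assumptions the effective doses are 75 mg/kg and 60 mg/kg, respectively.

```python
def effective_dose_mg_per_kg(conc_mg_per_ml, volume_ul, body_mass_g):
    """Weight-normalized dose for a fixed-volume IP injection."""
    dose_mg = conc_mg_per_ml * volume_ul / 1000.0  # convert uL to mL
    return dose_mg / (body_mass_g / 1000.0)  # convert g to kg


# Same fixed injection (10 mg/mL, 150 uL); body masses are hypothetical.
female_dose = effective_dose_mg_per_kg(10.0, 150.0, 20.0)
male_dose = effective_dose_mg_per_kg(10.0, 150.0, 25.0)
```

      In this example, a 25% heavier mouse receives a 20% lower dose per kilogram, which is the direction of the effect proposed above for male subjects.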

      We also want to highlight that interpreting correlations for females between time spent mounting during the TRAPing session and time spent mounting during the test sessions can be complicated. For example, we see 2 cases in the female POA-social-hM4Di dataset in which the female did not mount in the TRAPing session, and then mounted on the saline day (12 s and 10 s total mounting for those 2 females) but not on the CNO day. One interpretation of the data from these 2 females is that mounting on the TRAPing day is not required to attenuate mounting on the later test days. However, female mounting behavior itself is variable, both across different females and across different tests of a given female, as noted above. If we consider all single-housed females included in our dataset for which we quantified control behavioral data (i.e., behavior trials from unmanipulated females and TRAPing sessions from females that were later manipulated), we find that mounting is not observed in ~30% of the females (24 of 83). In ongoing behavioral experiments not included in this manuscript, we are investigating factors that regulate female mounting following single-housing. In that dataset, we also see little evidence that female mounting in one social interaction predicts mounting in a subsequent interaction (i.e., there don’t appear to be stable “high mounters” and “low mounters” following single housing). Thus, the small number of cases in which females did not mount in the TRAPing session and then displayed mounting on the saline day only are difficult to interpret.

      Two additional considerations are that TRAPing may not be equally efficacious for POA neurons that regulate different behaviors, and that different behaviors may be differentially sensitive to perturbations of the POA. Previous elegant calcium imaging work has shown that different subsets of Esr1+ POA neurons exhibit activity that is “tuned” to specific behaviors (sniffing vs. mounting in males interacting with females; Yang et al., 2023). However, it is possible that these subsets of neurons display differential levels of Fos expression following the production of their preferred behavior and that some behavior-related subsets may thus be more easily TRAPed than others. It may also be the case that some behaviors are more easily disrupted by POA activity manipulations than others (e.g., perturbation in a smaller percentage of behavior-related POA neurons may be required to disrupt some behaviors relative to others). 

      Despite these caveats, we have two lines of evidence that the effects of chemogenetic silencing of POA-social neurons depend on the behaviors produced during the TRAPing sessions.

      (1) Social behavior is required during the TRAPing session to see subsequent effects on social behavior following chemogenetic silencing of TRAPed POA neurons. In control females that were single-housed but were not given a social interaction prior to 4-OHT treatment, social behaviors are not reduced by chemogenetic silencing of TRAPed POA neurons (Figs. S2D-G).

      (2) To directly test whether mounting in the TRAPing session is required to see attenuation of mounting during subsequent chemogenetic silencing of POA-social neurons, we performed control experiments in which single-housed females interacted with a female visitor that was placed under a cup during the TRAPing session prior to 4-OHT treatment. Mounting was not possible in this context, and we also found that females produced lower rates of USVs during the TRAPing session relative to single-housed females engaged in free social interaction. However, subject females spent more time engaged in social investigation of the visitor relative to single-housed females engaged in free social interactions (see Author response image 3 below).

      Author response image 3.

      Unfortunately, none of the experimental females in this cohort displayed mounting in the CNO or saline sessions. Given that we could not use this dataset to address the intended question, we did not include it in the manuscript. However, it is quite interesting that female subjects displayed higher than normal social investigation and lower than normal USV production in their TRAPing sessions (relative to single-housed females engaged in free interactions), and subsequently, chemogenetic inhibition of TRAPed POA neurons decreased social investigation but did not decrease USV production (Author response image 4 below).

      Author response image 4.

      Together, we think our data support the idea that the POA neurons that are TRAPed are related to the social behaviors performed by the animals, but these relationships may be complex and difficult to detect from comparisons across animals within a single experimental group.

      And/or are they

      ii) influenced by the spread or amount of virus for each animal? These correlations could help shed light on what exactly is being trapped - is it specific behaviors or is it the "state" of short-term isolation?

      Our control experiments with females that were single-housed but did not receive a social interaction prior to 4-OHT treatment provide evidence that the production of social behaviors is required to see subsequent effects on behavior following chemogenetic inhibition of TRAPed POA neurons (Figs. S2D-G).

      The same volume of virus was injected across all activity manipulation experiments (200 nL). Because of the trajectory of our POA viral injections (performed at a slight rostral angle relative to vertical), we did sometimes see viral labeling that spread into the AH caudal to the POA. For this reason, we included the AH TRAPed control group (Fig. 2), to rule out the possibility that viral spread into the AH could account for the effects of chemogenetic silencing of POA-social neurons on female social behaviors. Also because of the injection angle used, we don’t see substantial viral spread rostral to our injection coordinates. In short, there isn’t systematic variability in the targeting or spread of our POA viral injections that can account for variability in the effects on USV production and social investigation of our LOF and GOF manipulations (female hM4Di and female ChR2 experiments).

      In older lesion studies in male rodents and birds, there is some support for the idea that rostral vs. caudal POA neurons differentially regulate appetitive vs. consummatory sexual behaviors (as reviewed in Balthazart and Ball, 2007). However, all of our viral injections were placed in what that review paper would have considered ‘caudal’ POA. We also note that more recent imaging studies have reported that subsets of POA neurons are differentially tuned to male sniffing vs. male mounting (Yang et al., 2023), and these subsets must be relatively co-localized given that they are imaged in the same field of view. Whether distinct subsets of POA neurons regulate the production of different female social behaviors, and if so, how these subsets are localized within the POA, remains an important question for future study.

      (5) The authors label their region of interest as the "POA" but images throughout (e.g. their fos image, Figure 1E), look more like the MPO. Why label it POA?

      The POA neurons in our study are found in a band that spans the medial POA, as well as a bit of the lateral POA. To avoid over-specifying, we call this region the POA more generally.

      (6) In all the experiments, mice are isolated and then re-group housed with siblings. Do all the siblings in the group belong to the same experimental group, or are siblings naïve? This may be critical to help determine whether some of the effects observed may be "group" effects.

      In general, multiple (although not always all) mice in a cage belonged to the same experimental group. In our inhibitory DREADDs experiments, it is unclear how that could drive our observed effects on behavior, given that home cage behavior would only be expected to differ for a given mouse in the time period following their CNO session. 

      For the female POA-social-caspase mice, we cannot rule out the possibility that their home cage behaviors differed in the time period following 4-OHT treatment and re-group-housing and prior to post-4-OHT behavior measurements. However, given that the only social behavior affected by ablation of POA-social neurons was mounting, and that rates of mounting would be expected to be very low in group-housed females within home cages, it is unclear how our experimental result could be attributed to group effects.

      If by “group” effects the reviewer means “litter” effects, we include a plot below that shows the CNO vs. saline behaviors for the POA-social-hM4Di females, separated by cage ID. There is no evidence that the effects of chemogenetic silencing in POA-social-hM4Di females are being driven by only certain cages (only social investigation and USVs are shown, because mounting was uniformly low (1 of 17 females mounted) in the CNO session).

      Author response image 5.

      (7) For chemogenetic experiments, the authors state that CNO and Saline were given in a counterbalanced order (eg line 189). Did the authors see any order effects?

      We did not see order effects, and we include plots of those data below for the female and male POA-social-hM4Di groups, with mice plotted according to which treatment they received first.

      Author response image 6.

      (8) In the control experiments in Figure 2 where VMH or AH are chemogenetically silenced, it isn't clear whether these groups include mice that were subjected to 3 days of isolation. Please clarify.

      Yes, these female groups were also subjected to 3 days of isolation (first prior to the TRAPing session, and for a second time prior to the onset of the CNO/saline testing sessions). That information has been clarified in the Results section (line 214) and in the Methods (lines 935-938).

      (9) Line 312. The title for this section, "POA neurons increase their activity....." is somewhat misleading. It sounds like the authors imaged trapped neurons. I think what they mean is that more POA neurons are activated following opposite-sex interactions with males.

      Thanks for this catch. We have modified the section title, as well as the title of the first results sub-section.

      (10) Figure 5A, right panels. The authors fail to find an increase in the investigation of male-male pairs following the short-term isolation of one. This contrasts with the main finding in Matthews et al., 2016 Cell, where short periods of isolation are said to promote pro-social behaviors. The authors could comment on this discrepancy in their discussion (eg difference in testing apparatus/test type? Difference in the number of days of isolation? etc.).

      In current Fig. 6A, there is no significant interaction between the two main effects, but each main effect is significant: single-housed males spend more time investigating partners than group-housed males, and males spend more time investigating female partners than male partners. The significant main effect of housing condition is consistent with the findings of Matthews et al., 2016 and is included within the Results (lines 486-492). 

      (11) Figure 5F, the authors seem to have a main effect of virus (more overall investigation in dreadds mice). Nothing about this is addressed.

      We sometimes see differences in social behavior between cohorts of males when they are tested at different times and, correspondingly, with different groups of female social partners. Our POA-social-hM4Di and POA-social-GFP males were set-up and tested at largely non-overlapping times. We have added a brief note to the Results section to include this information (lines 535-539).

      Reviewer 2 Recommendations:

      (1) (C)ritical control experiments are missing to support this claim (that a population of preoptic hypothalamic neurons contribute to the effects of short-term social isolation on the social behaviors of female mice).  

      (1a) All the activity-dependent labeling experiments with TRAP mice, including the subsequent neural activity manipulation experiments (Figures 2, 3, 4, 5E-F), were conducted by labeling neurons only in socially isolated animals, not group-housed animals. The authors labeled neurons after 30-minute social interactions, raising the possibility that the labeled neurons simply represent a "social interaction/behavior population" (mediating mounting and USVs in females and males) rather than a set of neurons specific to social isolation behaviors of mice)… The data thus far suggests these neurons may predominantly reflect a "POA social behavior" population rather than a set of cells distinctly responsive to isolated housing.

      We agree with the reviewer that the POA neurons we are studying regulate the production of social behaviors in females and males, rather than representing a set of cells distinctly responsive to single housing. To more clearly reflect our thinking, we have changed the name of the neurons from “POA-iso neurons” to “POA-social neurons”. Thank you for this helpful criticism.

      Our Fos data are consistent with the idea that the POA may regulate social behaviors in group-housed females (not just single-housed females). Namely, we found that counts of Fos-positive POA neurons are significantly related to rates of social investigation (p = 0.01) and tend to be related to USV rates (p = 0.05) in group-housed females that engaged in same-sex interactions (Fig. S1C). We now include two new sets of experiments aimed at further testing this idea.

      First, we include 2 control groups in which TRAPing sessions were performed in group-housed females following same-sex interactions. We find that chemogenetic silencing of these group-housed-TRAPed POA neurons fails to reduce social behaviors in females that are subsequently single-housed and given a same-sex social interaction (Fig. 5A-D; GH-TRAPed POA hM4Di females), and that optogenetic activation of group-housed-TRAPed POA neurons fails to promote female social behavior (Fig. 5E-H; GH-TRAPed POA ChR2 females). At face value, these findings do not support the idea that the POA contains neurons that regulate social behaviors in group-housed females.

      However, one important caveat is that group-housed females engage in low rates of social behaviors (low investigation time, no mounting, and few USVs), and thus TRAP-based labeling may not work efficaciously in these mice. There may be POA neurons that regulate social behaviors in group-housed females but that do not upregulate Fos following production of relatively low rates of social behaviors. To test this idea, we also include females in which POA neurons are chemogenetically silenced using a viral strategy that does not depend on activity-dependent labeling. In this new experiment, we report that silencing of POA neurons significantly reduces USV production in group-housed females (Fig. 5J-L) and significantly reduces social investigation, mounting, and USV production when these same females are retested following single-housing (Fig. 5M-O).

      (2) Please add strain background information of subject animals in the methods.

      This information has been added to the Animals section within the Methods (lines 788-802).

      Responses to Reviewer 3 Recommendations:

      (1a) (T)he conflicting effects on behavior are hard to interpret without additional experiments….Similarly, optogenetic activation of POA neurons was sufficient to generate USV production as reported in earlier studies but mounting or social investigation remained unaffected. 

      We have added new analyses to consider the possibility that optogenetic activation of female POA-social neurons promotes social investigation. In the original manuscript, we analyzed the duration of social investigation bouts in POA-social-ChR2 females according to whether they overlapped with laser stimulation or whether they did not overlap. We realized that we made an error in this first analysis and inadvertently included social investigation bouts that occurred during the first 5 minutes of the social sessions, prior to any laser stimulation. Because these earlier bouts tend to be longer duration than later bouts, this mistake washed out the effect of laser stimulation on social bout duration. After correcting that error, we now report that optogenetic activation of female POA-social neurons lengthens social investigation bout duration (Fig. 4G). Inspired by this interesting finding, we also included analyses of the probability of social investigation following laser stimulation (Fig. 4E-F; excluding laser stimulations that were preceded by social investigation in the pre-laser baseline period). These analyses support the conclusion that optogenetic activation of POA-social neurons promotes both USV production and social investigation in group-housed females.

      (1b) Do these discrepancies (between hM4Di and caspase) arise due to the efficiency differences between DREADD-mediated silencing vs. Casp3 ablation? Or does the chemogenetic result reflect off-manifold effects on downstream circuitry whereas a more permanent ablation strategy allows other brain regions to compensate due to redundancy? It is important to resolve whether these arise due to technical reasons or whether these reflect the underlying (perhaps messy) logic of neural circuitry.  

      At face value, the possibility that the difference in effects on behavior between chemogenetic silencing and caspase ablation arises for technical reasons seems inconsistent with the findings of previous experiments, in which ablation of large numbers of POA neurons failed to reduce USV production in male mice (POA lesions in Bean et al., 1981; ablation of VGAT+ POA neurons by Gao et al., 2018). These findings stand in contrast to those using chemogenetic silencing of large numbers of POA neurons, which report reduced USV production in male mice (VGAT+/Esr1+ in Karigo et al., 2021; Esr1+ in Chen et al., 2021).

      However, it is the case that the majority of the females that we used in our TRAP2-based ablation experiments were heterozygous for TRAP2 (N = 11 of 15 POA-social-caspase subjects were TRAP2;Ai14 females), whereas all females used in our chemogenetic silencing experiments were homozygous for TRAP2. To test whether a more effective ablation of POA-social neurons might drive decreases in social investigation and USV production, we set up additional TRAP2 homozygous POA-social-caspase females and directly compared the effects of ablation between the two genotypes (Fig. S3; N = 11 hets in total and N = 9 homozygotes in total). These experiments revealed that effects on mounting were more pronounced following POA-social ablation in TRAP2 homozygotes vs. heterozygotes, but that neither group exhibited decreased social investigation or USV production following 4-OHT treatment.

      To ask whether caspase-mediated ablation in TRAP2 homozygotes was effective in eliminating neural activity associated with social behaviors in females, we performed Fos immunostaining in a subset of the POA-social-caspase TRAP2 homozygotes following a same-sex interaction. We found that POA Fos expression was robustly reduced in these females relative to control group-housed and control single-housed females that also engaged in same-sex interactions, down to levels seen in group-housed and single-housed females that did not engage in a social interaction (comparison shown in Fig. S3D; control female data same as in Fig. 1). Moreover, the remaining POA Fos in these TRAP2 homozygotes was no longer positively correlated to social investigation or USV production (Fig. S3E-F). Together, these findings lead us to favor the interpretation suggested by the reviewer below, that permanent ablation of POA-social neurons leads to compensation from other brain regions due to redundancy.

      Given the negative results above, we favor this possibility and indicate so in our Discussion. In addition, our finding that optogenetic activation of POA-social neurons promotes both USV production and social investigation supports the idea that POA-social neurons directly regulate these behaviors. We agree with the reviewer that additional work is needed to understand the complex sex- and context-dependent role played by the POA in the regulation of mouse social behaviors.

      (2) L 49: Please define Mesolimbic circuitry the first time it is mentioned.

      We have added a definition (lines 52-53).

      (3) L 210: In Figure 2C, the mounting duration baseline (saline) distribution seems lower than the same experimental baseline in Figures 1C and 3C. Does this reflect natural variability in the behavioral assay and might this be mitigated by additional sampling of animals?

      Yes, there is substantial variability in the display of mounting behavior by single-housed females, including in the proportion of trials with mounting as well as in the total duration of mounting. In the revised manuscript, we have simplified our analysis of mounting in our TRAP-based experiments to quantify the proportion of trials with mounting, rather than considering the total time spent mounting. After adding N = 5 additional females to the POA-social-hM4Di dataset, we now report a statistically significant decrease in the proportion of trials with mounting following chemogenetic silencing of POA-social neurons (Fig. 2C; McNemar’s test for paired proportions).
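      A McNemar test for paired proportions can be sketched as below. This is a minimal illustration, not the authors' analysis code: it assumes each pair is one subject (or one trial) scored only as mounting vs. no mounting under saline and under chemogenetic silencing, and it uses the exact binomial form of the test (whether the exact or the chi-square form was used is an assumption here). The function name `mcnemar_exact` and the example data are hypothetical.

```python
from math import comb


def mcnemar_exact(before, after):
    """Exact (binomial) McNemar test for paired binary outcomes.

    before[i] / after[i]: whether pair i showed the behavior (e.g. any mounting)
    in the control (saline) and in the silencing condition, respectively.
    Returns (b, c, two-sided p-value), where b and c count discordant pairs.
    """
    b = sum(1 for x, y in zip(before, after) if x and not y)  # behavior lost
    c = sum(1 for x, y in zip(before, after) if y and not x)  # behavior gained
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs: no evidence either way
    # Under H0 each discordant pair is equally likely to go either way (p = 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)


# Hypothetical example: 5 pairs lose mounting, none gain it, 5 are concordant.
saline = [True] * 5 + [False] * 3 + [True] * 2
silenced = [False] * 5 + [False] * 3 + [True] * 2
print(mcnemar_exact(saline, silenced))  # (5, 0, 0.0625)
```

      Only discordant pairs (mounting in one condition but not the other) enter the test, which is why it suits paired proportions better than an unpaired comparison of mounting rates.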

      (4) L 310: The authors claim that "These findings suggest that a subset of POAiso neurons overlap with GABAergic, PAG-projecting POA neurons that have been demonstrated in previous work to promote USVs via disinhibition of excitatory PAG neurons important to USV production (Chen et al., 2021; Michael et al., 2020)." I think the data reported suggests the opposite since only 18.3% of all POA->PAG neurons are cFos+. Perhaps better rephrased as "A subset (18.3%) of POA->PAG neurons are labelled by cFos and that is sufficient to drive the production of USVs". Is it surprising?

      We modified the phrasing (lines 468-469), but a bit differently than suggested above, because although we suspect that optogenetic activation of the PAG-projecting neurons within the larger population of POA-social neurons is responsible for eliciting USV production, we did not technically demonstrate this to be the case in the current dataset. 

      We do find it surprising that so few (only ~20%) of PAG-projecting POA neurons upregulate Fos following female-female interactions marked by high rates of USV production. Even though optogenetic activation of PAG-projecting POA neurons elicits USV production, our finding suggests that the majority of PAG-projecting POA neurons may not play a role in regulating vocalization. In future work, it may be useful to apply an intersectional approach to further understand how the POA regulates USV production (for example, measure or manipulate activity selectively in projection-defined subsets of POA-social neurons).

      (5) Given the considerable prior evidence of POA->PAG circuit in promoting USVs, it is hard to understand why chemogenetic inactivation of POA neurons in males affects mounting but not USV production (Figures 5F-H). Any potential explanation for this discrepancy?

      We have two ideas about this surprising result. First, we examined the TRAPing session social behaviors of female and male POA-social-hM4Di mice. We found that male POA-social-hM4Di mice spent more time than female subjects mounting during the TRAPing sessions, and conversely, males spent less time investigating visitors and tended to produce fewer USVs than female subjects (Fig. S5). Given that our labeling method is activity-dependent, one possibility is that this bias in behavior is reflected in a bias toward labeling of POA neurons related to mounting.

      Second, each mouse in the TRAP2-based hM4Di datasets received an IP injection of the same amount of 4-OHT (150 nL of 10 mg/mL 4-OHT in filtered corn oil) not adjusted for weight of the mouse. This information was not reported accurately in the Methods, and we have adjusted that section accordingly (line 920). As a result, because male mice typically weigh more than females and would have received a lower effective dosage of 4-OHT, another possibility is that TRAPing in males was less efficient than in females and accounts for the less complete effects on social behaviors. We have added language to the Results to discuss these possibilities (lines 540-560).

      (6) L 472: Typo. "we found that short-term isolation exerts more robust on the effects of male behavior during subsequent interactions with females than during interactions with males."

      Thank you for catching this mistake.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors address whether the dorsal nucleus of the inferior colliculus (DCIC) in mice encodes sound source location within the front horizontal plane (i.e., azimuth). They do this using volumetric two-photon Ca2+ imaging and high-density silicon probes (Neuropixels) to collect single-unit data. Such recordings are beneficial because they allow large populations of simultaneous neural data to be collected. Their main results and the claims about those results are the following:

      (1) DCIC single-unit responses have high trial-to-trial variability (i.e., neural noise);

      (2) approximately 32% to 40% of DCIC single units have responses that are sensitive to sound source azimuth;

      (3) single-trial population responses (i.e., the joint response across all sampled single units in an animal) encode sound source azimuth "effectively" (as stated in title) in that localization decoding error matches average mouse discrimination thresholds;

      (4) DCIC can encode sound source azimuth in a similar format to that in the central nucleus of the inferior colliculus (as stated in Abstract);

      (5) evidence of noise correlation between pairs of neurons exists; and

      (6) noise correlations between responses of neurons help reduce population decoding error.

      While simultaneous recordings are not necessary to demonstrate results #1, #2, and #4, they are necessary to demonstrate results #3, #5, and #6.

      Strengths:

      - Important research question to all researchers interested in sensory coding in the nervous system.

      - State-of-the-art data collection: volumetric two-photon Ca2+ imaging and extracellular recording using high-density probes. Large neuronal data sets.

      - Confirmation of imaging results (lower temporal resolution) with more traditional microelectrode results (higher temporal resolution).

      - Clear and appropriate explanation of surgical and electrophysiological methods. I cannot comment on the appropriateness of the imaging methods.

      Strength of evidence for claims of the study:

      (1) DCIC single-unit responses have high trial-to-trial variability - The authors' data clearly shows this.

      (2) Approximately 32% to 40% of DCIC single units have responses that are sensitive to sound source azimuth - The sensitivity of each neuron's response to sound source azimuth was tested with a Kruskal-Wallis test, which is appropriate since response distributions were not normal. Using this statistical test, only 8% of neurons (median for imaging data) were found to be sensitive to azimuth, and the authors noted this was not significantly different from the false positive rate. The Kruskal-Wallis test was not performed on electrophysiological data. The authors suggested that the low numbers of azimuth-sensitive units resulting from the statistical analysis may be due to the combination of high neural noise and a relatively low number of trials, which would reduce the statistical power of the test. This may be true, but if single-unit responses were moderately or strongly sensitive to azimuth, one would expect them to pass the test even with relatively low statistical power. At best, if their statistical test missed some azimuth-sensitive units, they were likely only weakly sensitive to azimuth. The authors went on to perform a second test of azimuth sensitivity (a chi-squared test) and found 32% (imaging) and 40% (e-phys) of single units to have statistically significant sensitivity. This feels a bit like fishing for a lower p-value. The Kruskal-Wallis test should have been left as the only analysis. Moreover, the use of a chi-squared test is questionable because it is meant to be used between two categorical variables, and neural responses had to be binned before applying the test.

      The determination of what is a physiologically relevant “moderate or strong azimuth sensitivity” is not trivial, particularly when comparing tuning across different relays of the auditory pathway like the CNIC, auditory cortex, or in our case DCIC, where physiologically relevant azimuth sensitivities might be different. This is likely the reason why azimuth sensitivity has been defined in diverse ways across the literature (see Groh, Kelly & Underhill, 2003 for an early discussion of this issue). These diverse approaches include reaching a certain percentage of maximal response modulation, as used by Day et al. (2012, 2015, 2016) in CNIC, and ANOVA tests, as used by Panniello et al. (2018) and Groh, Kelly & Underhill (2003) in auditory cortex and IC, respectively. Moreover, the influence of response variability and of biases in response distribution estimation due to limited sampling has not usually been accounted for in the determination of azimuth sensitivity.

      As Reviewer #1 points out, in our study we used an appropriate ANOVA test (Kruskal-Wallis) as a starting point to study response sensitivity to stimulus azimuth in DCIC. Please note that the alpha = 0.05 used for this test is not based on experimental evidence about physiologically relevant azimuth sensitivity but is instead an arbitrary p-value threshold. Using this test on the electrophysiological data, we found that ~21% of the simultaneously recorded single units reached significance (n = 4 mice). Nevertheless, these percentages, in our small sample (n = 4), were not significantly different from our false positive detection rate (p = 0.0625, Mann-Whitney; see Author response image 1 below). Consequently, for both our imaging (Fig. 3C) and electrophysiological data, we could not ascertain whether the neurons reaching significance in these ANOVA tests were indeed meaningfully sensitive to azimuth or whether this was due to chance.
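      The per-unit Kruskal-Wallis screen and its comparison against the false positive rate can be sketched as below. This is a hedged, standard-library-only sketch: the tie-corrected H statistic and the chi-square approximation for its p-value are textbook choices, and the helper names (`chi2_sf`, `kruskal_wallis`, `fraction_significant`) are hypothetical, not taken from the authors' pipeline. Ties matter here because sparse DCIC responses produce many identical (zero) values.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        # Even df: closed form of the regularized upper incomplete gamma Q(df/2, z).
        return math.exp(-z) * sum(z ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: Q(n + 1/2, z) = erfc(sqrt(z)) + exp(-z) * sum_{i=1..n} z^(i-1/2) / Gamma(i+1/2)
    n = (df - 1) // 2
    return math.erfc(math.sqrt(z)) + math.exp(-z) * sum(
        z ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, n + 1)
    )


def average_ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Tie-corrected H and its p-value; one group of trial responses per azimuth."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    s, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        s += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * s - 3 * (n + 1)
    # Many identical (e.g. zero) responses create ties; correct H for them.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - ties / (n ** 3 - n)
    if correction > 0:
        h /= correction
    return h, chi2_sf(h, len(groups) - 1)


def fraction_significant(units, alpha=0.05):
    """Fraction of units with p < alpha; about alpha is expected by chance alone."""
    return sum(kruskal_wallis(u)[1] < alpha for u in units) / len(units)


# Hypothetical unit with three azimuths and three trials each.
h, p = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h, p)  # H is about 7.2, p is about 0.027
```

      With 13 azimuths the test has 12 degrees of freedom. `scipy.stats.kruskal` offers an equivalent, better-tested implementation, including the tie correction.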

      Author response image 1.

      Percentage of the neuropixels recorded DCIC single units across mice that showed significant median response tuning, compared to false positive detection rate (α = 0.05, chance level).

      We reasoned that the markedly variable responses observed in DCIC units, which frequently failed to respond in many trials (Fig. 3D, 4A), in combination with the limited number of trial repetitions we could collect, result in under-sampled estimates of the response distributions. This under-sampling can bias the determination of stochastic dominance across azimuth response samples in Kruskal-Wallis tests. We would like to highlight that we decided not to implement resampling strategies to artificially increase the azimuth response sample sizes with “virtual trials”, in order to avoid “fishing for a smaller p-value”, when our collected samples might not accurately reflect the actual response population variability.

      As an alternative to hypothesis testing based on ranking and determining stochastic dominance of one or more azimuth response samples (Kruskal-Wallis test), we evaluated the overall statistical dependency of the collected responses on stimulus azimuth. To do this, we implemented the Chi-square test by binning neuronal responses into categories. Binning responses into categories can reduce the influence of response variability to some extent, which constitutes an advantage of the Chi-square approach, but we note the important consideration that these response categories are arbitrary.
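      The binning-based Chi-square test of independence described above might look like the following sketch. The bin edges are an assumption (the response categories actually used are not specified here); with `edges=[0.5]` on spike counts, for example, each trial is simply scored as "no response" or "response". Rows are azimuths, columns are response categories, and empty rows or columns are dropped so that expected counts are never zero. All names are hypothetical.

```python
import math
from bisect import bisect_right


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        return math.exp(-z) * sum(z ** i / math.factorial(i) for i in range(df // 2))
    n = (df - 1) // 2
    return math.erfc(math.sqrt(z)) + math.exp(-z) * sum(
        z ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, n + 1)
    )


def contingency_table(responses_by_azimuth, edges):
    """Count trials per (azimuth, response category); categories split at `edges`."""
    table = []
    for trials in responses_by_azimuth:
        row = [0] * (len(edges) + 1)
        for r in trials:
            row[bisect_right(edges, r)] += 1
        table.append(row)
    return table


def chi2_independence(table):
    """Pearson chi-square test of independence between azimuth and response category."""
    rows = [r for r in table if sum(r) > 0]
    if not rows:
        return 0.0, 0, 1.0
    col_totals = [sum(col) for col in zip(*rows)]
    keep = [j for j, t in enumerate(col_totals) if t > 0]
    rows = [[r[j] for j in keep] for r in rows]
    col_totals = [col_totals[j] for j in keep]
    row_totals = [sum(r) for r in rows]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df, (chi2_sf(stat, df) if df > 0 else 1.0)


# Hypothetical unit: silent at one azimuth, responding on every trial at another.
silent = [0] * 10
responsive = [1, 2, 1, 3, 1, 1, 2, 1, 1, 1]
table = contingency_table([silent, responsive], edges=[0.5])
print(table, chi2_independence(table)[:2])  # [[10, 0], [0, 10]] (20.0, 1)
```

      A common rule of thumb is that most expected counts should be at least 5, which limits how finely responses can be binned when only about 14 trials per azimuth are available.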

      Altogether, we acknowledge that our Chi-square approach to defining azimuth sensitivity is not free of limitations and, despite enabling the interrogation of azimuth sensitivity in DCIC, its interpretability might not extend to other brain regions like CNIC or auditory cortex. Nevertheless, we hope the aforementioned arguments justify why the Kruskal-Wallis test simply could not “have been left as the only analysis”.

      (3) Single-trial population responses encode sound source azimuth "effectively" in that localization decoding error matches average mouse discrimination thresholds - If only one neuron in a population had responses that were sensitive to azimuth, we would expect that decoding azimuth from observation of that one neuron's response would perform better than chance. By observing the responses of more than one neuron (if more than one were sensitive to azimuth), we would expect performance to increase. The authors found that decoding from the whole population response was no better than chance. They argue (reasonably) that this is because of overfitting of the decoder model (too few trials used to fit too many parameters) and provide evidence from decoding combined with principal components analysis which suggests that overfitting is occurring. What is troubling is the performance of the decoder when using only a handful of "top-ranked" neurons (in terms of azimuth sensitivity) (Fig. 4F and G). Decoder performance seems to increase when going from one to two neurons, then decreases when going from two to three neurons, and doesn't get much better for more neurons than for one neuron alone. It seems likely there is more information about azimuth in the population response, but decoder performance is not able to capture it because spike count distributions in the decoder model are not being accurately estimated due to too few stimulus trials (14, on average). In other words, it seems likely that decoder performance is underestimating the ability of the DCIC population to encode sound source azimuth.

      To get a sense of how effective a neural population is at coding a particular stimulus parameter, it is useful to compare population decoder performance to psychophysical performance. Unfortunately, mouse behavioral localization data do not exist. Therefore, the authors compare decoder error to mouse left-right discrimination thresholds published previously by a different lab. However, this comparison is inappropriate because the decoder and the mice were performing different perceptual tasks. The decoder is classifying sound sources to 1 of 13 locations from left to right, whereas the mice were discriminating between left or right sources centered around zero degrees. The errors in these two tasks represent different things. The two data sets may potentially be more accurately compared by extracting information from the confusion matrices of population decoder performance. For example, when the stimulus was at -30 deg, how often did the decoder classify the stimulus to a lefthand azimuth? Likewise, when the stimulus was +30 deg, how often did the decoder classify the stimulus to a righthand azimuth?
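      The reviewer's suggestion of reading left/right performance off the decoder's confusion matrix could be implemented roughly as below. This sketch assumes the confusion matrix is stored as counts, with rows indexed by the presented azimuth and columns by the decoded azimuth, both in the same order as `azimuths`; midline (0°) stimuli are skipped because they have no correct side. The function name and the example matrix are hypothetical.

```python
def correct_side_rate(azimuths, confusion):
    """For each lateral azimuth, the fraction of trials decoded to the same side of the midline."""
    rates = {}
    for true_az, row in zip(azimuths, confusion):
        if true_az == 0:
            continue  # midline stimuli have no "correct side"
        total = sum(row)
        # Same side means the presented and decoded azimuths share a sign.
        same_side = sum(count for pred_az, count in zip(azimuths, row)
                        if pred_az * true_az > 0)
        rates[true_az] = same_side / total if total else float("nan")
    return rates


# Hypothetical 3-azimuth confusion matrix (rows: presented, columns: decoded).
azimuths = [-30, 0, 30]
confusion = [[8, 1, 1],
             [2, 6, 2],
             [0, 3, 7]]
print(correct_side_rate(azimuths, confusion))  # {-30: 0.8, 30: 0.7}
```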

      The azimuth discrimination error reported by Lauer et al. (2011) comes from engaged and highly trained mice, which is a very different context from our experimental setting with untrained mice passively listening to stimuli from 13 random azimuths. Therefore, we did not perform analyses or interpret our results based on the behavioral task from Lauer et al. (2011), and only made the qualitative observation, for discussion, that the errors match.

      We believe it is also important to clarify that Lauer et al. (2011) tested the ability of mice to discriminate between a positively conditioned stimulus (reference speaker at 0º center azimuth associated with a liquid reward) and a negatively conditioned stimulus (coming from one of five comparison speakers positioned at 20º, 30º, 50º, 70º and 90º azimuth, associated with an electrified lickport) in a conditioned avoidance task. In this task, mice are not precisely “discriminating between left or right sources centered around zero degrees”, which makes further analyses comparing the experimental design of Lauer et al. (2011) and ours even more challenging to interpret validly.

      (4) DCIC can encode sound source azimuth in a similar format to that in the central nucleus of the inferior colliculus - It is unclear what exactly the authors mean by this statement in the Abstract. There are major differences in the encoding of azimuth between the two neighboring brain areas: a large majority of neurons in the CNIC are sensitive to azimuth (and strongly so), whereas the present study shows a minority of azimuth-sensitive neurons in the DCIC. Furthermore, CNIC neurons fire reliably to sound stimuli (low neural noise), whereas the present study shows that DCIC neurons fire more erratically (high neural noise).

      Since sound source azimuth is reported to be encoded by population activity patterns at CNIC (Day and Delgutte, 2013), we refer to a population activity pattern code as the “similar format” in which this information is encoded at DCIC. Please note that this is a qualitative comparison and we do not claim this is the “same format”, due to the differences the reviewer precisely describes in the encoding of azimuth at CNIC where a much larger majority of neurons show stronger azimuth sensitivity and response reliability with respect to our observations at DCIC. By this qualitative similarity of encoding format we specifically mean the similar occurrence of activity patterns from azimuth sensitive subpopulations of neurons in both CNIC and DCIC, which carry sufficient information about the stimulus azimuth for a sufficiently accurate prediction with regard to the behavioral discrimination ability.

      (5) Evidence of noise correlation between pairs of neurons exists - The authors' data and analyses seem appropriate and sufficient to justify this claim.

      (6) Noise correlations between responses of neurons help reduce population decoding error - The authors show convincing analysis that performance of their decoder was higher when simultaneously measured responses were tested (which include noise correlation) than when scrambled-trial responses were tested (eliminating noise correlation). This makes it seem likely that noise correlation in the responses improved decoder performance. The authors mention that the naïve Bayesian classifier was used as their decoder for computational efficiency, presumably because it assumes no noise correlation and, therefore, assumes responses of individual neurons are independent of each other across trials to the same stimulus. The use of a decoder that assumes independence seems key here in testing the hypothesis that noise correlation contains information about sound source azimuth. The logic of using this decoder could be more clearly spelled out to the reader. For example, if the null hypothesis is that noise correlations do not carry azimuth information, then a decoder that assumes independence should perform the same whether population responses are simultaneous or scrambled. The authors' analysis showing a difference in performance between these two cases provides evidence against this null hypothesis.

      We sincerely thank the reviewer for this careful and detailed consideration of our analysis approach. Following the reviewer’s constructive suggestion, we justified the decoder choice in the results section at the last paragraph of page 18:

      “To characterize how the observed positive noise correlations could affect the representation of stimulus azimuth by DCIC top ranked unit population responses, we compared the decoding performance obtained by classifying the single-trial response patterns from top ranked units in the modeled decorrelated datasets versus the acquired data (with noise correlations). With the intention to characterize this with a conservative approach that would be less likely to find a contribution of noise correlations as it assumes response independence, we relied on the naive Bayes classifier for decoding throughout the study. Using this classifier, we observed that the modeled decorrelated datasets produced stimulus azimuth prediction error distributions that were significantly shifted towards higher decoding errors (Fig. 5B, C) and, in our imaging datasets, were not significantly different from chance level (Fig. 5B). Altogether, these results suggest that the detected noise correlations in our simultaneously acquired datasets can help reduce the error of the IC population code for sound azimuth.”
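      One common way to build decorrelated ("scrambled-trial") datasets like those referred to above is to shuffle trial order independently for each unit within each azimuth. This keeps every unit's response distribution at each azimuth (its tuning and variability) intact while destroying trial-by-trial co-fluctuations between units, i.e. noise correlations. The sketch below assumes responses are stored as `responses[azimuth][unit] -> list of trials`; whether the modeled decorrelated datasets were built exactly this way is an assumption, and all names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two units' trial-by-trial responses to one azimuth."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom > 0 else float("nan")


def decorrelate(responses, seed=None):
    """Shuffle each unit's trials independently within each azimuth.

    Per-unit, per-azimuth response distributions are preserved, but which
    trials co-occur across units (the source of noise correlations) is not.
    """
    rng = random.Random(seed)
    shuffled = {}
    for azimuth, units in responses.items():
        shuffled[azimuth] = []
        for trials in units:
            copy = list(trials)  # leave the original data untouched
            rng.shuffle(copy)
            shuffled[azimuth].append(copy)
    return shuffled


# Hypothetical data: two units whose trial-to-trial fluctuations co-vary at 0°.
data = {0: [[1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 12]]}
print(pearson(*data[0]))  # 1.0 for the original, perfectly co-varying trials
scrambled = decorrelate(data, seed=1)
```

      Decoding both `data` and `scrambled` with the same independence-assuming classifier then isolates the contribution of noise correlations, since nothing else about the single-unit responses differs between the two datasets.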

      Minor weakness:

      - Most studies of neural encoding of sound source azimuth are done in a noise-free environment, but the experimental setup in the present study had substantial background noise. This complicates comparison of the azimuth tuning results in this study to those of other studies. One is left wondering if azimuth sensitivity would have been greater in the absence of background noise, particularly for the imaging data where the signal was only about 12 dB above the noise. The description of the noise level and signal + noise level in the Methods should be made clearer. Mice hear from about 2.5 - 80 kHz, so it is important to know the noise level within this band as well as specifically within the band overlapping with the signal.

      We agree with the reviewer that this information is useful. In our study, the background R.M.S. SPL during imaging across the mouse hearing range (2.5-80kHz) was 44.53 dB and for neuropixels recordings 34.68 dB. We have added this information to the methods section of the revised manuscript.
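      For reference, a level such as the background R.M.S. SPL reported above is conventionally computed as 20·log10(p_rms / 20 µPa). The sketch below assumes the pressure samples (in pascals) have already been band-limited to the band of interest, e.g. 2.5-80 kHz; that filtering step is not shown. It also shows how two incoherent sources, such as a stimulus and background noise, combine in level. All names are hypothetical.

```python
import math

P_REF = 20e-6  # reference pressure (Pa) for dB SPL in air


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def spl_db(pressure_pa):
    """Sound pressure level (dB SPL re 20 uPa) of band-limited pressure samples in Pa."""
    return 20.0 * math.log10(rms(pressure_pa) / P_REF)


def combine_levels_db(*levels):
    """Level of incoherent sources played together: add powers, not decibels."""
    return 10.0 * math.log10(sum(10 ** (level / 10.0) for level in levels))


# A sinusoid with 0.02 Pa RMS (peak 0.02 * sqrt(2) Pa), sampled over whole cycles.
n, cycles = 1000, 10
tone = [0.02 * math.sqrt(2) * math.sin(2 * math.pi * cycles * k / n) for k in range(n)]
print(round(spl_db(tone), 6))  # 60.0
```

      Because levels add on a power scale, a source 12 dB above the background raises the combined level by only about 0.27 dB.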

      Reviewer #2 (Public Review):

      In the present study, Boffi et al. investigate the manner in which the dorsal cortex of the inferior colliculus (DCIC), an auditory midbrain area, encodes sound location azimuth in awake, passively listening mice. By employing volumetric calcium imaging (scanned temporal focusing or s-TeFo), complemented with high-density electrode electrophysiological recordings (neuropixels probes), they show that sound-evoked responses are exquisitely noisy, with only a small portion of neurons (units) exhibiting spatial sensitivity. Nevertheless, a naïve Bayesian classifier was able to predict the presented azimuth based on the responses from small populations of these spatially sensitive units. A portion of the spatial information was provided by correlated trial-to-trial response variability between individual units (noise correlations). The study presents a novel characterization of spatial auditory coding in a non-canonical structure, representing a noteworthy contribution specifically to the auditory field and generally to systems neuroscience, due to its implementation of state-of-the-art techniques in an experimentally challenging brain region. However, nuances in the calcium imaging dataset and the naïve Bayesian classifier warrant caution when interpreting some of the results.

      Strengths:

      The primary strength of the study lies in its methodological achievements, which allowed the authors to collect a comprehensive and novel dataset. While the DCIC is a dorsal structure, it extends up to a millimetre in depth, making it optically challenging to access in its entirety. It is also more highly myelinated and vascularised compared to e.g., the cerebral cortex, compounding the problem. The authors successfully overcame these challenges and present an impressive volumetric calcium imaging dataset. Furthermore, they corroborated this dataset with electrophysiological recordings, which produced overlapping results. This methodological combination ameliorates the natural concerns that arise from inferring neuronal activity from calcium signals alone, which are in essence an indirect measurement thereof.

      Another strength of the study is its interdisciplinary relevance. For the auditory field, it represents a significant contribution to the question of how auditory space is represented in the mammalian brain. "Space" per se is not mapped onto the basilar membrane of the cochlea and must be computed entirely within the brain. For azimuth, this requires the comparison of minuscule differences in the timing and intensity of sounds arriving at each ear. It is now generally thought that azimuth is initially encoded in two opposing hemispheric channels, but the extent to which this initial arrangement is maintained throughout the auditory system remains an open question. The authors observe only a slight contralateral bias in their data, suggesting that sound source azimuth in the DCIC is encoded in a more nuanced manner than at earlier processing stages of the auditory hindbrain. This is interesting, because the DCIC is also known to receive more descending inputs from the cortex.

      Systems neuroscience continues to strive for the perfection of imaging novel, less accessible brain regions. Volumetric calcium imaging is a promising emerging technique, allowing the simultaneous measurement of large populations of neurons in three dimensions. But this necessitates corroboration with other methods, such as electrophysiological recordings, which the authors achieve. The dataset moreover highlights the distinctive characteristics of neuronal auditory representations in the brain. Its signals can be exceptionally sparse and noisy, which provide an additional layer of complexity in the processing and analysis of such datasets. This will be undoubtedly useful for future studies of other less accessible structures with sparse responsiveness.

      Weaknesses:

      Although the primary finding that small populations of neurons carry enough spatial information for a naïve Bayesian classifier to reasonably decode the presented stimulus is not called into question, certain idiosyncrasies, in particular the calcium imaging dataset and model, complicate specific interpretations of the model output, and the readership is urged to interpret these aspects of the study's conclusions with caution.

      I remain in favour of volumetric calcium imaging as a suitable technique for the study, but the presently constrained spatial resolution is insufficient to unequivocally identify regions of interest as cell bodies (which are instead referred to as "units", akin to those of electrophysiological recordings). It remains possible that the imaging set is inadvertently influenced by non-somatic structures (including neuropil), which could report neuronal activity differently than cell bodies. Due to the lack of a comprehensive ground-truth comparison in this regard (which to my knowledge is impossible to achieve with current technology), it is difficult to estimate how many such informative units might have been missed because their signals were influenced by spurious, non-somatic signals, which could have subsequently misled the models. The authors reference the original Nature Methods article (Prevedel et al., 2016) throughout the manuscript, presumably in order to avoid having to repeat previously published experimental metrics. But the DCIC is neither the cortex nor the hippocampus (for which the method was originally developed) and may not have the same light scattering properties (not to mention neuronal noise levels). Although the corroborative electrophysiology data largely alleviate these concerns for this particular study, the readership should be cognisant of such caveats, in particular those who are interested in implementing the technique for their own research.

      A related technical limitation of the calcium imaging dataset is the relatively low number of trials (14) given the inherently high level of noise (both neuronal and imaging). Volumetric calcium imaging, while offering a uniquely expansive field of view, requires relatively high average excitation laser power (in this case nearly 200 mW), a level of exposure the authors may have wanted to minimise by maintaining a low number of repetitions, but I yield to them to explain.

      We assumed that the levels of heating by excitation light measured in the neocortex by Prevedel et al. (2016) were also representative for DCIC. Nevertheless, we recognize this approximation might not be very accurate, due to differences in tissue architecture and vascularization between these two brain areas, to name just a few factors. The limiting factor preventing us from collecting more trials in our imaging sessions was that we observed signs of discomfort or slight distress in some mice after ~30 min of imaging in our custom setup, which we established as a humane end point to prevent distress. Consequently, imaging sessions were kept to 25 min in duration, limiting the number of trials collected. However, we cannot rule out that more extensive habituation prior to experiments would allow longer imaging sessions without these signs of discomfort, nor that factors specific to our custom setup, such as potential heating of the brain by the illumination light, caused the observed distress. Nevertheless, we note that previous work has shown that ~200 mW average power is a safe regime for imaging in the cortex, keeping brain heating minimal (Prevedel et al., 2016), without producing the lasting damage observed by immunohistochemistry against apoptosis markers above 250 mW (Podgorski and Ranganathan 2016, https://doi.org/10.1152/jn.00275.2016).

      Calcium imaging is also inherently slow, requiring relatively long inter-stimulus intervals (in this case 5 s). This unfortunately renders any model designed to predict a stimulus (in this case sound azimuth) from particularly noisy population neuronal data like these as highly prone to overfitting, to which the authors correctly admit after a model trained on the entire raw dataset failed to perform significantly above chance level. This prompted them to feed the model only with data from neurons with the highest spatial sensitivity. This ultimately produced reasonable performance (and was implemented throughout the rest of the study), but it remains possible that if the model was fed with more repetitions of imaging data, its performance would have been more stable across the number of units used to train it. (All models trained with imaging data eventually failed to converge.) However, I also see these limitations as an opportunity to improve the technology further, which I reiterate will be generally important for volume imaging of other sparse or noisy calcium signals in the brain.

      Transitioning to the naïve Bayesian classifier itself, I first openly ask the authors to justify their choice of this specific model. There are countless types of classifiers for these data, each with their own pros and cons. Did they actually try other models (such as support vector machines), which ultimately failed? If so, these negative results (even if mentioned en passant) would be extremely valuable to the community, in my view. I ask this specifically because different methods assume correspondingly different statistical properties of the input data, and to my knowledge naïve Bayesian classifiers assume that predictors (neuronal responses) are assumed to be independent within a class (azimuth). As the authors show that noise correlations are informative in predicting azimuth, I wonder why they chose a model that doesn't take advantage of these statistical regularities. It could be because of technical considerations (they mention computing efficiency), but I am left generally uncertain about the specific logic that was used to guide the authors through their analytical journey.

      One of the main reasons we chose the naïve Bayesian classifier is indeed because it assumes that the responses of the simultaneously recorded neurons are independent and therefore it does not assume a contribution of noise correlations to the estimation of the posterior probability of each azimuth. This model would represent the null hypothesis that noise correlations do not contribute to the encoding of stimulus azimuth, which would be verified by an equal decoding outcome from correlated or decorrelated datasets. Since we observed that this is not the case, the model supports the alternative hypothesis that noise correlations do indeed influence stimulus azimuth encoding. We wanted to test these hypotheses with the most conservative approach possible that would be least likely to find a contribution of noise correlations. Other relevant reasons that justify our choice of the naive Bayesian classifier are its robustness against the limited numbers of trials we could collect in comparison to other more “data hungry” classifiers like SVM, KNN, or artificial neuronal nets. We did perform preliminary tests with alternative classifiers but the obtained decoding errors were similar when decoding the whole population activity (Author response image 2A). Dimensionality reduction following the approach described in the manuscript showed a tendency towards smaller decoding errors observed with an alternative classifier like KNN, but these errors were still larger than the ones observed with the naive Bayesian classifier (median error 45º). Nevertheless, we also observe a similar tendency for slightly larger decoding errors in the absence of noise correlations (decorrelated, Author response image 2B). Sentences detailing the logic of classifier choice are now included in the results section at page 10 and at the last paragraph of page 18 (see responses to Reviewer 1).
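      A naive Bayes decoder of the kind discussed above can be sketched with Gaussian class-conditional likelihoods and leave-one-out cross-validation. The Gaussian likelihood, the variance floor, the uniform prior over azimuths, and the leave-one-out scheme are all assumptions made for illustration; the likelihood model actually used is not stated in this passage. The key step is the sum of per-unit log-likelihoods: that sum is exactly the independence assumption, so the decoder cannot exploit noise correlations.

```python
import math
import statistics
from collections import defaultdict


def fit_gaussian_nb(trials, labels, var_floor=1e-3):
    """Per-azimuth mean and variance of every unit (variance floored to avoid 1/0)."""
    by_azimuth = defaultdict(list)
    for x, y in zip(trials, labels):
        by_azimuth[y].append(x)
    model = {}
    for azimuth, xs in by_azimuth.items():
        columns = list(zip(*xs))
        means = [sum(c) / len(c) for c in columns]
        variances = [max(sum((v - m) ** 2 for v in c) / len(c), var_floor)
                     for c, m in zip(columns, means)]
        model[azimuth] = (means, variances)
    return model


def predict(model, x):
    """Azimuth with the highest log-likelihood (uniform prior over azimuths)."""
    def log_likelihood(params):
        means, variances = params
        # Naive (independence) assumption: the joint likelihood factorizes over units,
        # so per-unit log-likelihoods simply add and noise correlations are ignored.
        return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                   for xi, m, v in zip(x, means, variances))
    return max(model, key=lambda azimuth: log_likelihood(model[azimuth]))


def leave_one_out_errors(trials, labels):
    """Absolute decoding error (degrees) for each held-out trial."""
    errors = []
    for i in range(len(trials)):
        model = fit_gaussian_nb(trials[:i] + trials[i + 1:], labels[:i] + labels[i + 1:])
        errors.append(abs(predict(model, trials[i]) - labels[i]))
    return errors


# Hypothetical responses of two units on six trials at two azimuths.
trials = [[5, 0], [6, 0], [5.5, 0.2], [0, 5], [0.1, 6], [0, 5.5]]
labels = [-30, -30, -30, 30, 30, 30]
errors = leave_one_out_errors(trials, labels)
print(statistics.median(errors))
```

      With only ~14 trials per azimuth, the number of fitted parameters (two per unit per azimuth) grows quickly with population size, which is one way to see why restricting decoding to top-ranked units can help.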

      Author response image 2.

      A) Cumulative distribution plots of the absolute cross-validated single-trial prediction errors obtained using different classifiers (blue; KNN: K-nearest neighbors; SVM: support vector machine ensemble) and the chance level distribution (gray) on the complete populations of imaged units. B) Cumulative distribution plots of the absolute cross-validated single-trial prediction errors obtained using a Bayes classifier (naive approximation for computational efficiency) to decode the single-trial response patterns from the 31 top ranked units in the simultaneously imaged datasets across mice (cyan), modeled decorrelated datasets (orange), and the chance level distribution associated with our stimulation paradigm (gray). Vertical dashed lines show the medians of the cumulative distributions. K.S. w/Sidak: Kolmogorov-Smirnov test with Sidak correction.
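      One simple way to build a chance-level distribution of absolute errors for a fixed stimulus grid, sketched below, is to assume the decoder guesses uniformly among the presented azimuths, so that every (true, guessed) pair is equally likely. The 15° spacing from -90° to +90° is a hypothetical grid chosen only because it yields 13 positions; the paradigm's actual positions would need to be substituted.

```python
import numpy as np

# Hypothetical 13-position azimuth grid (degrees)
azimuths = np.arange(-90, 91, 15)

# Uniform guessing: every (true, guessed) pair is equally likely
chance_errors = np.abs(azimuths[:, None] - azimuths[None, :]).ravel()


def ecdf(values):
    """Empirical cumulative distribution, as drawn in cumulative error plots."""
    x = np.sort(values)
    return x, np.arange(1, len(x) + 1) / len(x)


x, F = ecdf(chance_errors)
print(len(chance_errors), np.median(chance_errors))  # 169 60.0
```

      Decoder error distributions can then be compared against this reference, for example with a two-sample Kolmogorov-Smirnov test such as `scipy.stats.ks_2samp`.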

      That aside, there remain other peculiarities in model performance that warrant further investigation. For example, what spurious features (or lack of informative features) in these additional units prevented the models of imaging data from converging?

      Considering the amount of variability observed throughout the neuronal responses both in imaging and neuropixels datasets, it is easy to suspect that the information about stimulus azimuth carried in different amounts by individual DCIC neurons can be mixed up with information about other factors (Stringer et al., 2019). In an attempt to study the origin of these features that could confound stimulus azimuth decoding we explored their relation to face movement (Supplemental Figure 2), finding a correlation to snout movements, in line with previous work by Stringer et al. (2019).

      On an orthogonal note, did the most spatially sensitive units share any detectable tuning features? In contrast, a different model trained with electrophysiology data did not collapse in the range of top-ranked units plotted. Did this model collapse at some point after adding enough units, and how well did that correlate with the model for the imaging data?

      Our electrophysiology datasets were much smaller in size (number of simultaneously recorded neurons) compared to our volumetric calcium imaging datasets, resulting in a much smaller total number of top ranked units detected per dataset. This precluded the determination of a collapse of decoder performance due to overfitting beyond the range plotted in Fig 4G.

      How well did the form (and diversity) of the spatial tuning functions as recorded with electrophysiology resemble their calcium imaging counterparts? These fundamental questions could be addressed with more basic, but transparent analyses of the data (e.g., the diversity of spatial tuning functions of their recorded units across the population). Even if the model extracts features that are not obvious to the human eye in traditional visualisations, I would still find this interesting.

      The diversity of the azimuth tuning curves recorded with calcium imaging (Fig. 3B) was qualitatively larger than that of the curves recorded with electrophysiology (Fig. 4B), potentially due to the larger sampling obtained with volumetric imaging. We did not perform a detailed comparison of their form or a more quantitative comparison of their diversity because the signals compared are quite different: the calcium indicator signal is subject to nonlinearities due to Ca2+ binding cooperativity and to low-pass filtering due to binding kinetics. We feared this could lead to misleading interpretations about the similarities or differences between the azimuth tuning functions in imaged and electrophysiology datasets. Our model uses the statistical dependency of single-trial responses on stimulus azimuth, which does not rely on features of a descriptive statistic like the mean response tuning. In this context, visualizing the trial-to-trial responses as a function of azimuth reveals “features that are not obvious to the human eye in traditional visualizations” (Fig. 3D, left inset).

      Finally, the readership is encouraged to interpret certain statements by the authors in the current version conservatively. How the brain ultimately extracts spatial neuronal data for perception is anyone's guess, but it is important to remember that this study only shows that a naïve Bayesian classifier could decode this information, and it remains entirely unclear whether the brain does this as well. For example, the model is able to achieve a prediction error that corresponds to the psychophysical threshold in mice performing a discrimination task (~30°). Although this is an interesting coincidental observation, it does not mean that the two metrics are necessarily related. The authors correctly do not explicitly claim this, but the manner in which the prose flows may lead a non-expert into drawing that conclusion.

      To avoid misleading the non-expert readers, we have clarified in the manuscript that the observed correspondence between decoding error and psychophysical threshold is explicitly coincidental.

      Page 13, end of middle paragraph:

      “If we consider the median of the prediction error distribution as an overall measure of decoding performance, the single-trial response patterns from subsamples of at least the 7 top ranked units produced median decoding errors that coincidentally matched the reported azimuth discrimination ability of mice (Fig 4G, minimum audible angle = 31º) (Lauer et al., 2011).”

      Page 14, bottom paragraph:

      “Decoding analysis (Fig. 4F) of the population response patterns from azimuth dependent top ranked units simultaneously recorded with neuropixels probes showed that the 4 top ranked units are the smallest subsample necessary to produce a significant decoding performance that coincidentally matches the discrimination ability of mice (31° (Lauer et al., 2011)) (Fig. 5F, G).”

      We also added sentences to the Discussion clarifying that a relationship between these two variables remains to be determined, and that it also remains to be determined whether the DCIC indeed performs a Bayesian decoding computation for sound localization.

      Page 20, bottom:

      “… Concretely, we show that sound location coding does indeed occur at DCIC on the single trial basis, and that this follows a comparable mechanism to the characterized population code at CNIC (Day and Delgutte, 2013). However, it remains to be determined if indeed the DCIC network is physiologically capable of Bayesian decoding computations. Interestingly, the small number of DCIC top ranked units necessary to effectively decode stimulus azimuth suggests that sound azimuth information is redundantly distributed across DCIC top ranked units, which points out that mechanisms beyond coding efficiency could be relevant for this population code.

      While the decoding error observed from our DCIC datasets obtained in passively listening, untrained mice coincidentally matches the discrimination ability of highly trained, motivated mice (Lauer et al., 2011), a relationship between decoding error and psychophysical performance remains to be determined. Interestingly, primary sensory representations should theoretically be even more precise than behavioral performance, as reported in the visual system (Stringer et al., 2021).”

      Moreover, the concept of redundancy (of spatial information carried by units throughout the DCIC) is difficult for me to disentangle. One interpretation of this formulation could be that there are non-overlapping populations of neurons distributed across the DCIC that each could predict azimuth independently of each other, which is unlikely to be what the authors meant. If the authors meant generally that multiple neurons in the DCIC carry sufficient spatial information, then a single neuron would have been able to predict sound source azimuth, which was not the case. I have the feeling that they actually mean "complementary", but I leave it to the authors to clarify my confusion, should they wish.

      We observed that the response patterns from relatively small fractions of the azimuth sensitive DCIC units (4-7 top ranked units) are sufficient to generate an effective code for sound azimuth, while 32-40% of all simultaneously recorded DCIC units are azimuth sensitive. In light of this observation, we interpreted that the azimuth information carried by the population should be redundantly distributed across the complete subpopulation of azimuth sensitive DCIC units.

      In summary, the present study represents a significant body of work that contributes substantially to the field of spatial auditory coding and systems neuroscience. However, limitations of the imaging dataset and of the model as applied in the study muddle concrete conclusions about how the DCIC precisely encodes sound source azimuth, and even more so about sound localisation in a behaving animal. Nevertheless, it presents a novel and unique dataset, which, regardless of secondary interpretation, corroborates the general notion that auditory space is encoded in an extraordinarily complex manner in the mammalian brain.

      Reviewer #3 (Public Review):

      Summary:

      Boffi and colleagues sought to quantify the single-trial, azimuthal information in the dorsal cortex of the inferior colliculus (DCIC), a relatively understudied subnucleus of the auditory midbrain. They used two complementary recording methods while mice passively listened to sounds at different locations: a large volume but slow sampling calcium-imaging method, and a smaller volume but temporally precise electrophysiology method. They found that neurons in the DCIC were variable in their activity, unreliably responding to sound presentation and responding during inter-sound intervals. Boffi and colleagues used a naïve Bayesian decoder to determine if the DCIC population encoded sound location on a single trial. The decoder failed to classify sound location better than chance when using the raw single-trial population response but performed significantly better than chance when using intermediate principal components of the population response. In line with this, when the most azimuth dependent neurons were used to decode azimuthal position, the decoder performed equivalently to the azimuthal localization abilities of mice. The top azimuthal units were not clustered in the DCIC, possessed a contralateral bias in response, and were correlated in their variability (e.g., positive noise correlations). Interestingly, when these noise correlations were perturbed by inter-trial shuffling decoding performance decreased. Although Boffi and colleagues display that azimuthal information can be extracted from DCIC responses, it remains unclear to what degree this information is used and what role noise correlations play in azimuthal encoding.

      Strengths:

      The authors should be commended for collection of this dataset. When done in isolation (which is typical), calcium imaging and linear array recordings have intrinsic weaknesses. However, those weaknesses are alleviated when done in conjunction with one another - especially when the data largely recapitulates the findings of the other recording methodology. In addition to the video of the head during the calcium imaging, this data set is extremely rich and will be of use to those interested in the information available in the DCIC, an understudied but likely important subnucleus in the auditory midbrain.

      The DCIC neural responses are complex; the units unreliably respond to sound onset, and at the very least respond to some unknown input or internal state (e.g., large inter-sound interval responses). The authors do a decent job in wrangling these complex responses: using interpretable decoders to extract information available from population responses.

      Weaknesses:

      The authors observe that neurons with the most azimuthal sensitivity within the DCIC are positively correlated, but they use a Naïve Bayesian decoder, which assumes independence between units. Although this is a bit strange given their observation that some of the recorded units are correlated, it is unlikely to be a critical flaw. At one point the authors reduce the dimensionality of their data through PCA and use the loadings onto these components in their decoder. PCA incorporates the correlational structure when finding the principal components and constrains these components to be orthogonal and uncorrelated. This should alleviate some of the concern regarding the use of the naïve Bayesian decoder because the projections onto the different components are independent. Nevertheless, the decoding results are a bit strange, likely because there is not much linearly decodable azimuth information in the DCIC responses. Raw population responses failed to provide sufficient information concerning azimuth for the decoder to perform better than chance. Additionally, it only performed better than chance when certain principal components or top ranked units contributed to the decoder, but not as more components or units were added. So, although there does appear to be some azimuthal information in the recorded DCIC populations, it is somewhat difficult to extract and likely not an 'effective' encoding of sound localization as their title suggests.
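      The reviewer's point that principal-component projections are mutually uncorrelated can be checked numerically. The sketch below uses synthetic responses with a hypothetical shared noise source (not the recorded data), projects them onto their principal components via an SVD, and confirms that the off-diagonal covariance of the projections vanishes. Two caveats apply: zero correlation implies independence only for Gaussian data, and the decorrelation holds for the covariance pooled over all trials, whereas the naive Bayes independence assumption concerns responses within each azimuth class.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic responses: 182 trials x 10 units sharing one noise source
shared = rng.standard_normal((182, 1))
X = 0.8 * shared + 0.6 * rng.standard_normal((182, 10))

# PCA via SVD of the mean-centered data
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T  # projection of each trial onto each principal component

cov_raw = np.cov(Xc, rowvar=False)
cov_scores = np.cov(scores, rowvar=False)
off_diag = cov_scores - np.diag(np.diag(cov_scores))

print("raw covariance, units 0 and 1:", round(cov_raw[0, 1], 2))
print("largest off-diagonal PC covariance:", np.abs(off_diag).max())
```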

      As described in the responses to reviewers 1 and 2, we chose the naïve Bayes classifier as a decoder to determine the influence of noise correlations through the most conservative approach possible, as this classifier would be the least likely to find a contribution of correlated noise. We also chose this decoder for its robustness to the limited number of trials collected, in comparison to “data hungry” nonlinear classifiers like KNN or artificial neural networks. Lastly, we observed that small populations of noisy, unreliable DCIC neurons (which do not respond on every trial) can encode stimulus azimuth in passively listening mice with an error matching the discrimination ability of trained mice. Therefore, while this encoding is definitely not efficient, it can still be considered effective.

      Although this is quite a worthwhile dataset, the authors present relatively little about the characteristics of the units they've recorded. This may be due to the high variance in responses seen in their population. Nevertheless, the authors note that units do not respond on every trial but do not report what percentage of trials fail to evoke a response. Is it that neurons are noisy because they do not respond on every trial, or is it also that when they do respond they have variable response distributions? It would be nice to gain some insight into the heterogeneity of the responses.

      The limited number of azimuth trial repetitions that we could collect precluded any quantification of the unreliability (failures to respond) and the variability of the response distributions of the units we recorded, as we feared such quantification could be misleading. In qualitative terms, “due to the high variance in responses seen” in the recordings and the limited trial sampling, it is hard to make any generalization. Consequently, we referred to the observed response variance altogether as neuronal noise. Given these limitations, our datasets are publicly available for further exploration of the response characteristics.

      Additionally, is there any clustering at all in response profiles or is each neuron they recorded in the DCIC unique?

      We attempted to qualitatively visualize response clustering using dimensionality reduction, observing different degrees of clustering or lack thereof across the azimuth classes in the datasets collected from different mice. It is likely that the limited number of azimuth trials we could collect and the high response variance contribute to an inconsistent response clustering across datasets.

      They also only report the noise correlations for their top ranked units, but it is possible that the noise correlations in the rest of the population are different.

      For this study, since our aim was to interrogate the influence of noise correlations on stimulus azimuth encoding by DCIC populations, we focused on the noise correlations of the top ranked unit subpopulation, which likely carries the bulk of the sound location information. Noise correlations can be defined as correlations in the trial-to-trial response variation of neurons. In this respect, it is hard to ascertain whether the rest of the population, outside the top ranked unit percentage, is really responding and showing response variation from which this correlation could be evaluated, or is simply not responding at all and showing unrelated activity altogether. This makes observations about noise correlations from “the rest of the population” potentially hard to interpret.

      It would also be worth digging into the noise correlations more: are units positively correlated because they respond together (e.g., if unit x responds on trial 1 so does unit y), or are they also modulated around their mean rates on similar trials (e.g., units x and y respond and both are responding more than their mean response rate)? A large portion of trials with no response can occlude noise correlations. More transparency around the response properties of these populations would be welcome.

      Due to the limited number of azimuth trial repetitions collected, we evaluated noise correlations using the nonparametric Kendall tau correlation coefficient, a measure of pairwise rank correlation (ordinal association) in the responses to each azimuth. A positive rank correlation indicates that neurons are more likely to respond together. Evaluating response modulation “around their mean rates on similar trials” would require assumptions about the response distributions, which we avoided due to the potential biases associated with limited sample sizes.
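      The rank-based noise-correlation measure described above can be sketched as follows: Kendall's tau-b is computed between two units' single-trial responses separately for each azimuth, so that shared tuning (signal correlation) does not contribute. Averaging the per-azimuth values into one number per pair is an assumption of this sketch, as is the synthetic data. `scipy.stats.kendalltau` computes the same tau-b statistic by default; a dependency-free version is written out here to make the definition explicit.

```python
import numpy as np


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation (handles ties), O(n^2) pairs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(x), k=1)
    dx = np.sign(x[i] - x[j])
    dy = np.sign(y[i] - y[j])
    n_x = np.count_nonzero(dx)  # pairs not tied in x
    n_y = np.count_nonzero(dy)  # pairs not tied in y
    if n_x == 0 or n_y == 0:
        return np.nan  # undefined if one unit never varies (e.g. never responds)
    # sum(dx * dy) = concordant pairs - discordant pairs
    return np.sum(dx * dy) / np.sqrt(n_x * n_y)


def noise_correlation(r1, r2, y):
    """Mean within-azimuth Kendall tau between two units' responses."""
    taus = [kendall_tau_b(r1[y == a], r2[y == a]) for a in np.unique(y)]
    return np.nanmean(taus)


# Synthetic pair: different tuning, shared trial-to-trial fluctuation
rng = np.random.default_rng(2)
y = np.repeat(np.arange(-90, 91, 15), 14)
shared = rng.standard_normal(y.size)
r1 = np.cos(np.deg2rad(y)) + shared + 0.5 * rng.standard_normal(y.size)
r2 = np.sin(np.deg2rad(y)) + shared + 0.5 * rng.standard_normal(y.size)

print("noise correlation:", noise_correlation(r1, r2, y))
```

      Because only the ordering of responses enters, many tied zero responses (trials where a unit fails to respond) shrink the number of usable pairs rather than distorting a mean, which is one reason a rank statistic suits small, sparse samples.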

      It is largely unclear what the DCIC is encoding. Although the authors are interested in azimuth, sound location seems to be only a small part of DCIC responses. The authors report responses during inter-sound intervals and unreliable sound-evoked responses. Although they have video of the head during recording, we only see a correlation to snout and ear movements (which are peculiar since in the example shown it seems the head movements predict the sound presentation). Additional correlates could be eye movements or pupil size. Eye movements are of particular interest due to their known interaction with IC responses, especially if the DCIC encodes sound location in relation to eye position instead of head position (though much of the eye-position IC work was done in primates and not rodents). Alternatively, much of the population may only encode sound location if an animal is engaged in a localization task. Ideally, the authors could perform more substantive analyses to determine if this population is truly noisy or if the DCIC is integrating un-analyzed signals.

      We unsuccessfully attempted eye tracking and pupillometry in our videos. We suspect this failed because the pupil was generally overly dilated under the low visible-light illumination conditions we used, which were necessary to protect the PMT of our custom microscope.

      It is likely that DCIC population activity integrates un-analyzed signals, like the signals associated with spontaneous behaviors including face movements (Stringer et al., 2019), which we observed at the level of spontaneous snout movements. However, investigating whether and how these signals are integrated into stimulus azimuth coding requires extensive behavioral testing and experimentation, which is beyond the scope of this study. For the purpose of our study, we referred to trial-to-trial response variation as neuronal noise. We note that this definition of neuronal noise can, and likely does, include an influence from un-analyzed signals like those from spontaneous behaviors.

      Although this critique is ubiquitous among decoding papers in the absence of behavioral or causal perturbations, it is unclear what - if any - role the decoded information may play in neuronal computations. The interpretation of the decoder means that there is some extractable information concerning sound azimuth - but not if it is functional. This information may just be epiphenomenal, leaking in from inputs, and not used in computation or relayed to downstream structures. This should be kept in mind when the authors suggest their findings implicate the DCIC functionally in sound localization.

      Our study builds upon previous reports by other independent groups relying on “causal and behavioral perturbations” and implicating DCIC in sound location learning induced experience dependent plasticity (Bajo et al., 2019, 2010; Bajo and King, 2012), which altogether argues in favor of DCIC functionality in sound localization.

      Nevertheless, we clarified in the discussion of the revised manuscript that a relationship between the observed decoding error and the psychophysical performance, or the ability of the DCIC network to perform Bayesian decoding computations, both remain to be determined (please see responses to Reviewer #2).

      It is unclear why positive noise correlations amongst similarly tuned neurons would improve decoding. A toy model exploring how positive noise correlations in conjunction with unreliable units that inconsistently respond may anchor these findings in an interpretable way. It seems plausible that inconsistent responses would benefit from strong noise correlations, simply by units responding together. This would predict that shuffling would impair performance because you would then be sampling from trials in which some units respond, and trials in which some units do not respond - and may predict a bimodal performance distribution in which some trials decode well (when the units respond) and poor performance (when the units do not respond).

      In samples with more than two dimensions, the relationship between signal and noise correlations is more complex than in two-dimensional samples (Montijn et al., 2016), which makes constructing simple, interpretable toy models of this challenging. Montijn et al. (2016) provide a detailed characterization and model describing how the accuracy of a multidimensional population code can improve when including “positive noise correlations amongst similarly tuned neurons”. Unfortunately, we could not successfully test their model based on Mahalanobis distances, as we could not verify that the recorded DCIC population responses followed a multivariate Gaussian distribution, due to the limited azimuth trial repetitions we could sample.
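      For reference, the squared Mahalanobis distance between two class-mean response vectors is Δμᵀ Σ⁻¹ Δμ, where Σ is the shared noise covariance. The two-unit sketch below (hypothetical numbers, not fitted to the data) illustrates only the simpler two-dimensional case that the response contrasts with higher dimensions: relative to a diagonal (decorrelated) covariance, the same positive noise correlation lowers the distance when the mean difference lies along the correlation axis and raises it when the mean difference lies across it.

```python
import numpy as np


def mahalanobis_sq(mu_a, mu_b, cov):
    """Squared Mahalanobis distance between two class means under shared covariance."""
    delta = np.asarray(mu_a, dtype=float) - np.asarray(mu_b, dtype=float)
    return float(delta @ np.linalg.solve(cov, delta))


rho = 0.5  # hypothetical positive noise correlation between two units
cov_corr = np.array([[1.0, rho], [rho, 1.0]])
cov_diag = np.eye(2)  # same variances, correlations removed

similar = ([1.0, 1.0], [0.0, 0.0])   # both units respond more to stimulus A
opposite = ([1.0, 0.0], [0.0, 1.0])  # units prefer different stimuli

for name, (mu_a, mu_b) in {"similar": similar, "opposite": opposite}.items():
    print(name, mahalanobis_sq(mu_a, mu_b, cov_corr), mahalanobis_sq(mu_a, mu_b, cov_diag))
```

      Using `np.linalg.solve` instead of an explicit matrix inverse is the numerically preferred way to evaluate Σ⁻¹Δμ; with many units and few trials, the estimated Σ may also need regularization before it can be solved reliably.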

      Significance:

      Boffi and colleagues set out to parse the azimuthal information available in the DCIC on a single trial. They largely accomplish this goal and are able to extract this information when allowing the units that contain more information about sound location to contribute to their decoding (e.g., through PCA or decoding on top unit activity specifically). The dataset will be of value to those interested in the DCIC and also to anyone interested in the role of noise correlations in population coding. Although this work is a first step into parsing the information available in the DCIC, it remains difficult to interpret if/how this azimuthal information is used in localization behaviors of engaged mice.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      General:

      The manuscript is generally well written, but could benefit from a quick proof by a native English speaker (e.g., "the" inferior colliculus is conventionally used with its article). The flow of arguments is also generally easy to follow, but I would kindly ask the authors to consider elaborating or clarifying the following points (including those already mentioned in my public review).

      (1) Choice of model:

      There are countless ways one can construct a decoder or classifier that can predict a presented sensory stimulus based on a population neuronal response. Given the assumptions of independence as mentioned in my public review, I would ask the authors to explicitly justify their choice of a naïve Bayesian classifier.

      A section detailing the logic of classifier choice is now included in the results section at page 10 and the last paragraph of page 18 from the revised version of the manuscript.

      (2) Number of imaging repetitions:

      For particularly noisy datasets, 14 repetitions is indeed quite few. I reckon this was not the choice of the authors, but rather limited by the inherent experimental conditions. Despite minimisation of required average laser power during the development of s-TeFo imaging, the authors still required almost 200 mW (which is still quite a lot of exposure). Although 14 repetitions for 13 azimuthal locations every 5 s is at face value a relatively short imaging session (~15 min.), at 191 mW, with the desire to image mice multiple times, I could imagine that this is a practical limitation the authors faced (to avoid excessive tissue heating or photodamage, which was assessed in the original Nature Methods article, but not here). Nevertheless, this logic (or whatever logic they had) should be explained for non-imaging experts in the readership.

      This is now addressed in the answers to the public reviews.

      (3) Redundancy:

      It is honestly unclear to me what the authors mean by this. I don't speculate that they mean there are "redundant" (small) populations of neurons that sufficiently encode azimuth, but I'm actually not certain. If that were the case, I believe this would need further clarification, since redundant representations would be inconsistent with the general (perhaps surprising) finding that large populations are not required in the DCIC, which is thought to be the case at earlier processing stages.

      In the text we are referring to the azimuth information being redundantly distributed across DCIC top ranked units. We do not mention redundant “populations of neurons”.

      (4) Correspondence of decoding accuracy with psychometric functions in mice: While this is an interesting coincidental observation, it should not be interpreted to mean that the neuronal detection threshold in the DCIC is somehow responsible for its psychometric counterpart (which is an interesting yet exceedingly complex question). Although I do not believe the authors intended to suggest this, I would personally be cautious in the way I describe this correspondence. I mention this because the authors point it out multiple times in the manuscript (whereas I would have just mentioned it once in passing).

      This is now clarified in the revised manuscript.

      (5) Noisy vs. sparse:

      I'm confident that the authors understand the differences between these terms, both in concept (stochastic vs. scattered) and in context (neuronal vs. experimental), but I personally would be cautious in the way I use them in the description of the study. Indeed, auditory neuronal signals are to my knowledge generally thought to be both sparse and noisy, which is in itself interesting, but the study also deals with substantial experimental (recording) noise, and I think it's important for the readership to understand when "noise" refers to the recordings (in particular the imaging data) and to neuronal activity. I mention this specifically because "noisy" appears in the title.

      We have clarified this issue at the bottom of page 5 by adding the following sentences to the revised manuscript:

      “In this section we used the word “noise” to refer to the sound stimuli used and recording setup background sound levels or recording noise in the acquired signals. To avoid confusion, from now on in the manuscript the word “noise” will be used in the context of neuronal noise, which is the trial-to-trial variation in neuronal responses unrelated to stimuli, unless otherwise noted.”

      (6) More details in the Methods:

      The Methods section is perhaps the least-well structured part of the present manuscript in my view, and I encourage the authors to carefully go through it and add the following information (in case I somehow missed it).

      a. Please also indicate the number of animals used here.

      Added.

      b. How many sessions were performed on each mouse?

      This is already specified in the methods section in page 25:

      “mice were imaged a total of 2-11 times (sessions), one to three times a week.”

      We added for clarification:

      “Datasets here analyzed and reported come from the imaging session in which we observed maximal calcium sensor signal (peak AAV expression) and maximum number of detected units.”

      c. For the imaging experiments, was it possible to image the same units from session to session?

      This is not possible for s-TeFo 2P data due to the low spatial resolution, which makes precisely matching neuron ROIs across sessions challenging.

      d. Could the authors please add more detail to the analyses of the videos (to track facial movements) or provide a reference?

      Added citation.

      e. The same goes for the selection of subcellular regions of interest that were used as "units."

      Added to page 25:

      “We used the CaImAn package (Giovannucci et al., 2019) for automatic ROI segmentation through constrained non-negative matrix factorization and selected ROIs (Units) showing clear Ca transients consistent with neuronal activity, and IC neuron somatic shape and size (Schofield and Beebe, 2019).”

      Specific: In order to maximise the efficiency of my comments and suggestions (as there are no line numbers), my numbered points are organised in sequential order.

      (1) Abstract: I wouldn't personally motivate the study with the central nucleus of the IC (i.e. I don't think this is necessary). I think the authors can motivate it simply with the knowledge gaps in spatial coding throughout the auditory system, in which such large data sets such as the ones presented here are of general value.

      (2) Page 4: 15-50 kHz "white" noise is incorrect. It should be "band-passed" noise.

      Changed.

      (3) Supplemental figure 1, panel A: Since the authors could not identify cell bodies unequivocally from their averaged volume timeseries data, it would be clearer to the readership if larger images were shown, so that they can evaluate (speculate) for themselves what subcellular structures were identified as units. Even better would be to include a planar image through a cross-section. As mentioned above, not everything determined for the cortex or hippocampus can be assumed to be true for the DCIC.

      The raw images and segmentations are publicly available for detailed inspections.

      (4) Supplemental figure 2, panel A: This panel requires further explanation, in particular the panel on the right. I assume that to be a simple subtraction of sequential frames, but I'm thrown off by the "d(Grey)" colour bar. Also, if "grey" refers to the neutral colour, it is conventionally spelled "gray" in US-American English.

      Changed.

      (5) Supplemental figure 2, panel B: I'm personally curious why the animals exhibited movement just prior to a stimulus. Did they learn to anticipate the presentation of a sound after some habituation? Is that somehow a pre-emptive startle response? We observe that in our own experiments (but as we stochastically vary the inter-trial-intervals, the movement typically occurs directly after the stimulus). I don't suggest the authors dwell on this, but I find it an interesting observation.

      It is indeed interesting, but we can’t conclude much about it without comparing it to random inter-trial-intervals.

      (6) Supplemental figure 3: I personally find these data (decoding of all electrophysiological data) of central relevance to the study, since they mirror the analyses presented for their imaging data counterpart, and I encourage the authors to move them to the main text.

      Changed.

      (7) Page 12: Do the authors have any further analyses of spatial tuning functions? We all know they can be parametrically obscure (i.e., bi-lobed, non-monotonic, etc.), but having these parameters (even if just in a supplemental figure) would be informative for the spatial auditory community.

      We dedicated significant effort to attempt to parametrize and classify the azimuth response dependency functions from the recorded DCIC cells in an unbiased way. Nevertheless, given the observed response noise and the “obscure” properties of spatial tuning functions mentioned by the reviewer, we could only reach the general qualitative observation of having a more frequent contralateral selectivity.

      (8) Page 14 (end): Here, psychometric correspondence is referenced. Please add the Lauer et al. (2011) reference, or, as I would, remove the statement entirely and save it for the discussion (where it is also mentioned and referenced).

      Changed.

      (9) Figure 5, Panels B and C: Why don't the authors report the Kruskal-Wallis tests (for increasing number of units training the model), akin to e.g., Panel G of Figure 4? I think that would be interesting to see (e.g., if the number of required units to achieve statistical significance is the same).

      Within-class randomization produced a moderate effect on decoder performance, achieving statistical significance at similar numbers of units, as seen in figure 5 panels B and C. We did not include these plots to avoid cluttering the figure with dense distributions and blurring the differences between the distributions shown.

      (10) Figure 5, Panels B and C (histograms): I see a bit of skewedness in the distributions (even after randomisation). Where does this come from? This is just a small talking point.

      We believe this is potentially due to more than one distribution of pairwise correlations combined into one histogram (like in a Gaussian mixture model).

      (11) Page 21: Could the authors please specify that the Day and Delgutte (2013) study was performed on rabbits? Since rabbits have an entirely different spectral hearing range compared to mice, spatial coding principles could very well be different in those animals (and I'm fairly certain such a study has not yet been published for mice).

      Specified.

      (12) Page 22: I'd encourage the authors to remove the reference to Rayleigh's duplex theory, since mice hardly (if at all) use interaural time differences for azimuthal sound localisation, given their generally high-frequency hearing range.

      That sentence is meant to discuss, beyond the mouse model, an exciting implication of our findings in light of previous reports: a hypothetical functional relationship between the tonotopy in DCIC and the spatial distribution of azimuth-sensitive DCIC neurons. We have now clarified this in the text.

      (13) Page 23: I believe the conventional verb for gene delivery with viruses is still "transduce" (or "infect", but not "induce"). What was the specific "syringe" used for stereotactic injections? Also, why were mice housed separately after surgery? This question pertains to animal welfare.

      Changed. The syringe was a 10 ml syringe used to generate positive or negative pressure, coupled to the glass needle through silicone tubing via a Luer 3-way T valve. Single housing was chosen to avoid mice compromising each other's implantations. This can therefore be seen as a refinement of our method to maximize the chances of successful imaging per implanted mouse.

      (14) Page 25: Could the authors please indicate the refractory period violation time window here? I had to find it buried in the figure caption of Supplementary figure 1.

      Added.

      (15) Page 27: What version of MATLAB was used? This could be important for reproduction of the analyses, since The Mathworks is infamously known to add (or even more deplorably, modify) functions in particular versions (and not update older ones accordingly).

      Added.

      Reviewer #3 (Recommendations For The Authors):

      Overall I thought this was a nice manuscript and a very interesting dataset. Here are some suggestions and minor corrections:

      You may find this work of interest - 'A monotonic code for sound azimuth in primate inferior colliculus' 2003, Groh, Kelly & Underhill.

      We thank the reviewer for pointing out this extremely relevant reference, which we regrettably failed to cite. It is now included in the revised version of the manuscript.

      In your introduction, you state "our findings point to a functional role of DCIC in sound location coding". Though your results show that there is azimuthal information contained in a subset of DCIC units there's no evidence in the manuscript that shows a functional link between this representation and sound localization.

      This is now addressed in the answers to the public reviews.

      I found the variability in your DCIC population quite striking, especially during the intersound intervals. The entrainment of the population in the imaging dataset suggests some type of input activating the populations; maybe these are avenues for further probing the variability here:

      (1) I'm curious if you can extract eye movements from your video. Work from Jennifer Groh shows that some cells in the primate inferior colliculus are sensitive to different eye positions (Groh et al., 2001). With recent work showing eye movements in rodents, it may explain some of the variance in the DCIC responses.

      This is now addressed in the answers to the public reviews.

      (2) I was also curious if the motor that moves the speaker made noise. It could be possible that some of the 'ongoing' activity is a sound-evoked response.

      We were careful to set the stepper motor speed so that it produced low frequency noise, within a band mostly outside of the hearing range of mice (<4kHz). Nevertheless, we cannot fully rule out that a very quiet but perhaps very salient component of the motor noise could influence the activity during the inter trial periods. The motor was stationary and quiet for a period of at least one stimulus duration before and during stimulus presentation.  

      (3) Was the sound you present frozen or randomly generated on each trial? Could there be some type of structure in the noise you presented that sometimes led cells to respond to a particular azimuth location but not others?

      The sound presented was frozen noise. This is now clarified in the methods section.

      It may be useful to quantify the number of your units that had refractory period violations.

      Our manual curation of sorted units was very stringent to avoid mixing differently tuned neurons. The single units analyzed had very infrequent refractory period violations, in less than ~5% of the spikes, considering a 2 ms refractory period.
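      As an illustration of the criterion described above, the sketch below computes the fraction of a unit's spikes that follow the preceding spike by less than a 2 ms refractory period. It is a minimal sketch assuming spike times in seconds for one sorted unit; the function name `refractory_violation_fraction` is hypothetical and this is not the authors' curation pipeline.

```python
from typing import Sequence


def refractory_violation_fraction(spike_times_s: Sequence[float],
                                  refractory_s: float = 0.002) -> float:
    """Return the fraction of spikes occurring less than `refractory_s`
    seconds after the preceding spike of the same unit.

    Assumption (hypothetical helper): spike times in seconds for a single
    unit; the 2 ms default mirrors the refractory period stated above.
    """
    times = sorted(spike_times_s)
    if not times:
        return 0.0
    # Inter-spike intervals between consecutive spikes.
    isis = [later - earlier for earlier, later in zip(times, times[1:])]
    violations = sum(1 for isi in isis if isi < refractory_s)
    # Normalise by the number of spikes, matching "percent of the spikes".
    return violations / len(times)


# Usage with toy spike times: two of five spikes fall within 2 ms of the previous one.
spikes = [0.000, 0.001, 0.010, 0.020, 0.0215]
fraction = refractory_violation_fraction(spikes)
print(f"{fraction:.0%} of spikes violate the refractory period")
```

      Under this definition, a unit would satisfy the stated curation level when the returned fraction is below about 0.05.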

      Was the video recording contralateral or ipsilateral to the recording?

      The side of the face ipsilateral to the imaged IC was recorded. Added to methods.

      I was struck by the snout and ear movements - in the example shown in Supplementary Figure 2B it appears as they are almost predicting sound onset. Was there any difference in ear movements in the habituated and non-habituated animals? Also, does the placement of the cranial window disturb any of the muscles used in ear movement?

      Mouse snout movements appear to be quite active, perhaps reflecting arousal (Stringer et al., 2019). We cannot rule out that the cranial window implantation disturbed ear movement, but while the head-fixed mice were moving we observed what could be considered normal ear movements.

      Did you correlate time-point by time-point in the average population activity and movement, or did you try different temporal lags/leads in case the effect of the movements was delayed in some way?

      Point by point, due to the 250 ms time resolution of imaging.
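      For readers interested in the lag/lead analysis the reviewer suggests, the sketch below correlates a population activity trace with a movement trace at a range of frame offsets, assuming both are sampled on the same 250 ms imaging frames. The helper names and toy traces are hypothetical; the authors report only the zero-lag (point-by-point) correlation.

```python
import math
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlations(activity: Sequence[float],
                        movement: Sequence[float],
                        max_lag_frames: int) -> Dict[int, float]:
    """Correlate activity with movement at offsets -max..+max frames.

    A positive lag pairs activity[t + lag] with movement[t] (activity
    trailing movement); a negative lag means activity leads movement.
    Both traces are assumed to have the same length.
    """
    n = len(activity)
    result = {}
    for lag in range(-max_lag_frames, max_lag_frames + 1):
        if lag >= 0:
            a, m = activity[lag:], movement[:n - lag]
        else:
            a, m = activity[:n + lag], movement[-lag:]
        result[lag] = pearson_r(a, m)
    return result


FRAME_S = 0.25  # frame period from the 250 ms imaging resolution
movement = [0, 1, 0, 0, 2, 0, 1, 0, 0, 3, 0, 0]
activity = [0] + movement[:-1]  # toy trace: activity trails movement by one frame
corr = lagged_correlations(activity, movement, max_lag_frames=2)
best_lag = max(corr, key=corr.get)
print(f"peak r = {corr[best_lag]:.2f} at {best_lag * FRAME_S:+.2f} s")
```

      With only 250 ms per frame, each lag step is coarse, which is consistent with the authors' choice to report the zero-lag correlation.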

      Are the video recordings only available during the imaging? It would be nice to see the same type of correlations in the neuropixel-acquired data as well.

      Only imaging. For neuropixels recordings, we were skeptical about face videography, as we suspected that face movements were likely influenced by the acute nature of the preparation procedure. Our cranial window preparation, on the other hand, involved a recovery period of at least 4 weeks. We therefore chose to perform face videography in these mice instead.

      If you left out more than 1 trial do you think this would help your overfitting issue (e.g. leaving out 20% of the data).

      Due to the relatively small number of trial repetitions collected, fitting the model with an even smaller training dataset is unlikely to help overfitting and will likely decrease decoder performance.

      It would be nice to see a confusion matrix - even though azimuthal error and cumulative distribution of error are a fine way to present the data - a confusion matrix would tell us which actual sounds the decoder is confusing. Just looking at errors could result in some funky things where you reduce the error generally but never actually estimate the correct location.

      We considered confusion matrices early on in our study but they were not easily interpretable or insightful, likely due to the relatively low discrimination ability of the mouse model with +/- 30º error after extensive training. Therefore, we reasoned that in passively listening mice (and likely trained mice too) with limited trial repetitions, an undersampled and diffuse confusion matrix is expected which is not an ideal means of visualizing and comparing decoding errors. Hence we relied on cumulative error distributions.
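      To make the cumulative error summary concrete, the sketch below computes per-trial absolute azimuth errors from decoder predictions and the fraction of trials with error at or below each threshold. It is a minimal sketch assuming azimuths in degrees within a range that needs no 360° wrap-around; the function names and toy values are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def absolute_errors(true_deg: Sequence[float],
                    predicted_deg: Sequence[float]) -> List[float]:
    """Absolute decoding error in degrees for each test trial.

    Assumption: azimuths lie within a limited range, so no wrap-around
    at 360 degrees is needed.
    """
    return [abs(p - t) for t, p in zip(true_deg, predicted_deg)]


def cumulative_error_distribution(errors: Sequence[float],
                                  thresholds: Sequence[float]
                                  ) -> List[Tuple[float, float]]:
    """Fraction of trials whose error is <= each threshold (empirical CDF)."""
    ordered = sorted(errors)
    n = len(ordered)
    return [(th, bisect_right(ordered, th) / n) for th in thresholds]


true_az = [0, 30, 60, -30]
pred_az = [0, 60, 0, -30]
errs = absolute_errors(true_az, pred_az)
for threshold, frac in cumulative_error_distribution(errs, [0, 30, 60]):
    print(f"error <= {threshold:>2} deg: {frac:.2f} of trials")
```

      When azimuths are presented at discrete positions, the value at the 0° threshold is the fraction of exactly correct predictions, so the curve also shows whether the decoder ever hits the true location.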

      Do your top-ranked units have stronger projections onto your 10-40 principal components?

      It would be interesting to know if the components are mostly taking into account those 30ish percent of the population that is dependent upon azimuth.

      Inspection of PC loadings across units ranked based on response dependency to stimulus azimuth does not show a consistent stronger projection of top ranked units onto the first 10-40 principal components (Author response image 3).

      Author response image 3.

      PC loading matrices for each recorded mouse. The units recorded in each mouse are ranked in descending order of response dependency to stimulus azimuth based on the p value of the chi-square test. Units above the red dotted line display a chi-square p value < 0.05; units below this line have p values >= 0.05.

      How much overlap is there in the tuning of the top-ranked units?

      This is quite varying from mouse to mouse and imaging vs electrophysiology, which makes it hard to make a generalization since this might depend on the unique DCIC population sampled in each mouse.

      I'm not really sure I follow what the nS/N adds - it doesn't really measure tuning but it seems to be introduced to discuss/extract some measure of tuning.

      nS/N is used to quantify how noisy neurons are, independent of how sensitive their responses are to the stimulus azimuth.

      Is the noise correlation - observed to become more positive - for more contralateral stimuli a product of higher firing rates due to a more preferred stimulus presentation or a real effect in the data? Was there any relationship between distance and strength of observed noise correlation in the DCIC?

      We observed a consistent and homogeneous trend of pairwise noise correlation distributions either shifted or tailed towards more positive values across stimulus azimuths, for imaging and electrophysiology datasets (Author response image 3). The lower firing frequency observed in neuropixels recordings in response to ipsilateral azimuths could have affected the statistical power of the comparison between the pairwise noise correlation coefficient distribution to its randomized chance level, but the overall histogram shapes qualitatively support this consistent trend across azimuths (Author response image 4).

      Author response image 4.

      Distribution histograms for the pairwise correlation coefficients (Kendall tau) from pairs of simultaneously recorded top ranked units across mice (blue) compared to the chance level distribution obtained through randomization of the temporal structure of each unit’s activity to break correlations (purple). Vertical lines show the medians of these distributions. Imaging data comes from n = 12 mice and neuropixels data comes from n = 4 mice.
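      The comparison above, of observed pairwise Kendall tau values against a randomized chance level, can be sketched as follows. This is a minimal sketch assuming each unit's responses are listed per repeated trial of one stimulus, and it breaks correlations by shuffling one unit's trial order; that shuffle is a hypothetical stand-in for the authors' randomization of temporal structure, and all names and values are illustrative.

```python
import math
import random
import statistics
from typing import List, Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b (tie-corrected) via an O(n^2) pair count."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both; excluded from both tie counts
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x) *
                      (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


def shuffled_tau_distribution(x: Sequence[float], y: Sequence[float],
                              n_shuffles: int = 1000,
                              seed: int = 0) -> List[float]:
    """Chance-level taus after shuffling the trial order of one unit."""
    rng = random.Random(seed)
    y_perm = list(y)
    taus = []
    for _ in range(n_shuffles):
        rng.shuffle(y_perm)  # breaks trial-by-trial co-fluctuation
        taus.append(kendall_tau_b(x, y_perm))
    return taus


# Hypothetical spike counts of two units over 8 repeats of one azimuth.
unit_a = [3, 5, 4, 6, 2, 5, 7, 4]
unit_b = [2, 4, 4, 5, 1, 3, 6, 3]
observed = kendall_tau_b(unit_a, unit_b)
chance = shuffled_tau_distribution(unit_a, unit_b, n_shuffles=500)
print(f"observed tau = {observed:.2f}, "
      f"median shuffled tau = {statistics.median(chance):.2f}")
```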

      Typos:

      'a population code consisting on the simultaneous" > should on be of?

      'half of the trails' > trails should be trials?

      'referncing the demuxed channels' > should it be demixed?

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Tateishi et al. report a Tn-seq-based analysis of genetic requirements for growth and fitness in 8 clinical strains of Mycobacterium intracellulare (Mi), and compare the findings with the type strain ATCC13950. The study finds a core set of 131 genes that are essential in all nine strains, and therefore are reasonably argued as potential drug targets. Multiple other genes required for fitness in clinical isolates have been found to be important for hypoxic growth in the type strain.

      Strengths:

      The study has generated a large volume of Tn-seq datasets of multiple clinical strains of Mi from multiple growth conditions, including from mouse lungs. The dataset can serve as an important resource for future studies on Mi, which despite being clinically significant remains a relatively understudied species of mycobacteria.

      Thank you for reviewing our manuscript and finding the significance of our data.

      Weaknesses:

      The paper lacks clarity in data presentation and organization. For example, some of the key data on cfu counts of clinical Mi strains in a mouse model can be presented along with the Tn-seq dataset in Figure 6, the visualization of which can be improved with volcano plots, etc. Improvement in data visualization is perhaps necessary throughout the paper.

      Thank you for the comment on the data presentation of in vivo studies. We previously reported time-course data on CFUs, animal survival, and tissue pathology for the wild-type strains (Tateishi Y. BMC Microbiol. 2023; new Ref #22). Based on these data, we expected to be able to harvest a sufficient number of colonies from mice infected with M.i.27 or M.i.198, and we performed in vivo TnSeq studies using these two strains. We now refer to our previous publication (new Ref #22) on the virulence in mice of the MAC-PD strains used in this study in the revised manuscript (page 12, line 212).

      The CFU counts are shown in new Supplementary Fig. 3b. In the manuscript text, we explain as follows (page 12, lines 212-216): “The time course of the changes in the bacterial burden showed a pattern similar to those of the wild-type strains M.i.198 and M.i.27, respectively, except that it was not possible to harvest sufficient colonies (as few as 10^4/mouse) in the few mice infected with the M.i.27 Tn mutant strain in week 8 and week 16” (new Supplementary Fig. 3b, new Supplementary Table 8).

      Regarding the suggestion to include volcano plots, we appreciate the proposal but chose not to adopt this format, as the main aim of this study was to identify genes commonly required for in vitro and in vivo fitness across multiple M. intracellulare strains, rather than to highlight differential genetic requirements within a single strain. Volcano plots are useful for visualizing differential values and significance for a single dataset but are less suited for cross-strain comparisons of shared gene sets. Our approach is aligned with the methodology used by Carey et al. (PLoS Pathog. 2018; new Ref#8), who similarly focused on identifying conserved genetic requirements across M. tuberculosis genotypes without employing volcano plots.

      [References]

      Tateishi, Y. et al. Virulence of Mycobacterium intracellulare clinical strains in a mouse model of lung infection - role of neutrophilic inflammation in disease severity. BMC Microbiol 23, 94 (2023).

      Carey, A.F. et al. TnSeq of Mycobacterium tuberculosis clinical isolates reveals strain-specific antibiotic liabilities. PLoS Pathog 14, e1006939 (2018).

      The primary claim of the study that the clinical strains are better adapted for hypoxic growth is not well-supported by the data presented in Figure 7.

      Thank you for the comments on the difference in adaptation for hypoxic growth between ATCC13950 and clinical MAC-PD strains. To clarify, growth rates shown in Figure 7 were calculated at the inflection point (midpoint) of the growth curves, which were modeled using a four-parameter logistic (4P logistic) model. As described in the Discussion, we inferred the pattern of hypoxic adaptation of the clinical MAC-PD strains from the shapes of their growth curves. Considering the impact of bacterial growth on the progression of MAC-PD, slow growth with early entry into log phase implies a continuous impact on the infected host during logarithmic bacterial growth, which may contribute to the persistent and steadily progressive course of MAC-PD over years in humans.

      Unlike time-lapse imaging assays, continuous sampling of cultures for CFU assays is not possible. Nevertheless, we collected sufficient timepoints to allow reliable curve fitting with the 4P logistic model, and we therefore consider our growth data a valid approximation of continuous growth dynamics.
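      To illustrate how a growth rate at the midpoint follows from a 4P logistic fit, the sketch below assumes the parametrisation y(t) = lower + (upper - lower) / (1 + exp(-rate * (t - t_mid))), for which the midpoint and the inflection point coincide at t_mid. Differentiating gives dy/dt = rate * (upper - lower) * e / (1 + e)^2 with e = exp(-rate * (t - t_mid)), which at t = t_mid reduces to rate * (upper - lower) / 4. The parametrisation, function names, and parameter values are assumptions for illustration, not the authors' fitted model or data.

```python
import math


def logistic4(t: float, lower: float, upper: float,
              rate: float, t_mid: float) -> float:
    """Four-parameter logistic growth curve (assumed parametrisation).

    lower/upper: asymptotes (e.g. log10 CFU); rate: steepness;
    t_mid: time of the midpoint, which is also the inflection point.
    """
    return lower + (upper - lower) / (1.0 + math.exp(-rate * (t - t_mid)))


def growth_rate_at_midpoint(lower: float, upper: float, rate: float) -> float:
    """Slope of logistic4 at t = t_mid, i.e. rate * (upper - lower) / 4."""
    return rate * (upper - lower) / 4.0


# Illustrative parameters only (not fitted to any data in this study).
params = dict(lower=3.0, upper=7.0, rate=0.5, t_mid=10.0)
analytic = growth_rate_at_midpoint(params["lower"], params["upper"], params["rate"])

# Central finite difference as a numerical cross-check of the analytic slope.
h = 1e-5
numeric = (logistic4(params["t_mid"] + h, **params)
           - logistic4(params["t_mid"] - h, **params)) / (2 * h)
print(f"analytic slope {analytic:.4f}, numeric slope {numeric:.4f}")
```

      Under this parametrisation, two strains can share a midpoint slope while differing in when they enter log phase (t_mid), which is why curve shape and growth rate are reported separately.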

      Regarding the suggestion of mixed culture experiments, we agree that such studies could be informative. However, co-culture conditions introduce additional variables, including inter-strain competition or synergy, which can obscure the specific contributions of hypoxic adaptation in each strain. Therefore, we believe that the current approach using monoculture growth curves under defined oxygen conditions offers a clearer interpretation of strain-specific hypoxic responses.

      The title of the paper is misleading as the study doesn't provide any mechanistic aspect of hypoxic adaptation in Mi.

      Thank you for the comment on the article title. We admit that this paper does not directly reveal the mechanism of hypoxic adaptation in M. intracellulare strains, but it provides data on the different patterns of hypoxic adaptation among M. intracellulare strains in relation to differences in genetic requirements. Therefore, we revised the title to “Functional genomics reveals strain-specific genetic requirements conferring hypoxic growth in Mycobacterium intracellulare”.

      Reviewer #2 (Public Review):

      Summary:

      In the study titled "Functional genomics reveals the mechanism of hypoxic adaptation in nontuberculous mycobacteria" by Tateishi et al., the authors have used TnSeq to identify the common essential and growth-defect-associated genes that represent the genomic diversity of clinical M. intracellulare strains in comparison to the reference type strain. By estimating the frequency of Tn insertion, the authors speculate that genes involved in gluconeogenesis, the type VII secretion system, and cysteine desulfurase are more critical in the clinical MAC-PD strains than in the type strain, both for extracellular survival and in a mouse lung infection model.

      Based on their analysis, the authors proposed to identify the mechanism of hypoxic adaptation in nontuberculous mycobacteria (NTM) which offer promising drug targets in the strains causing clinical Mycobacterium avium-intracellulare complex pulmonary disease (MAC-PD).

      Strengths:

      A major strength of the manuscript is the performance of the exhaustive set of TnSeq experiments with multiple strains of M. intracellulare during in vitro growth and animal infection.

      Thank you for reviewing our manuscript and acknowledging the performance of producing datasets in this study.

      Weaknesses:

      (1) The study suffers from the authors' preconceived bias toward a small subset of genes involved in hypoxic pellicle formation in ATCC13950.

      Thank you for the comment regarding a potential bias toward a small subset of genes involved in hypoxic pellicle formation in ATCC13950. The rationale for the importance of hypoxic pellicle genes in clinical MAC-PD strains is that the profiles of genetic requirements in each bacterial strain reflect the adaptation to the environment in which each strain lives. When the strains are placed in a special environment, they can adapt to the situation by altering the profiles of genetic requirements, resulting in the remodeling of metabolic pathways.

      In this study, we found that several of these pellicle-associated genes also showed increased genetic requirements in the clinical MAC-PD strains, suggesting a possible overlap in hypoxic adaptation mechanisms. We did not claim that clinical MAC-PD strains showed increased genetic requirements for all genes involved in hypoxic pellicle formation. Apart from the gene sets involved in hypoxic pellicle formation in ATCC13950, almost no genome-wide information is available on the pathogenesis of nontuberculous mycobacterial disease, unlike tuberculosis. Building on this finding, we investigated the effect of gene silencing on bacterial growth, and the preferential hypoxic adaptation of clinical MAC-PD strains compared with ATCC13950 as observed by growth kinetics. At first glance, focusing on the gene sets of hypoxic pellicle formation may seem “biased”, but we have proceeded with this research step by step, building on our previous findings. We consider that these data provide valuable information on the pathogenesis of MAC-PD caused by clinical MAC-PD strains.

      We have added the description of the rationale for the importance of hypoxic pellicle genes in clinical MAC-PD strains in the revised manuscript (page 9, lines 148-155).

      (2) An important set of data with the ATCC13950 reference strain is missing in the mouse infection study. In the absence of this, it is difficult to establish whether the identified genes are critical for infection/intracellular proliferation, specifically in the clinical isolates that are relatively more adapted for hypoxia.

      Thank you for the comment on the necessity of including ATCC13950 as a control strain in the mouse TnSeq experiment. Including ATCC13950 as a control strain in mouse infection experiments would be ideal. However, we previously showed that ATCC13950 is eliminated within 4 weeks of infection (Tateishi Y. BMC Microbiol. 2023; new Ref#22). This means that an in vivo TnSeq study of this strain is not possible, because a sufficient number of colonies cannot be harvested.

      [Reference]

      Tateishi, Y. et al. Virulence of Mycobacterium intracellulare clinical strains in a mouse model of lung infection - role of neutrophilic inflammation in disease severity. BMC Microbiol 23, 94 (2023).

      (3) Statistical enrichment analysis of gene sets by GSEA wrongly involves genes required for hypoxic pellicle formation in ATCC13950 together with the gene sets found essential in the clinical MAC-PD strains, to claim that a significant % of genes belong to hypoxia-adaptation pathways. It could be factually incorrect because a majority of these might overlap with those found critical for the in vitro survival of MAC-PD strains (and may not be related to hypoxia).

      Thank you for the suggestion on the re-analysis of gene enrichment analysis of genes required for M.i.27 and M.i.198 in vivo infection, individually with genes involved in hypoxic pellicle formation in ATCC13950 and with those showing increased genetic requirements in clinical MAC-PD strains compared to ATCC13950.

      About 50% (92 and 94 out of 181 genes from Day 1 to Week 16 and from Week 4 to Week 16 of infection) and 40% (70 and 79 out of 179 genes from Day 1 to Week 16 and from Week 4 to Week 16 of infection) of genes required for hypoxic pellicle formation in ATCC13950 were listed as enriched in genes required for mouse lung infection in M.i.27 and M.i.198, respectively. In addition, about 42% (54 and 56 out of 128 genes from Day 1 to Week 16 and from Week 4 to Week 16 of infection) and 40% (79 and 68 out of 179 genes from Day 1 to Week 16 and from Week 4 to Week 16 of infection) of genes showing increased requirements in clinical MAC-PD strains compared to ATCC13950 were listed as enriched in genes required for mouse lung infection in M.i.27 and M.i.198, respectively.

      These data indicate that about 40-50% genes required for in vitro hypoxic pellicle formation are shared with the genes required for in vivo bacterial growth, and that about 40% strain-dependent/accessory essential genes are shared with the genes required for in vivo bacterial growth. Thus, the genes required for the growth of M.i.27 and M.i.198 in mouse lungs are enriched individually with those involved in hypoxic pellicle formation in ATCC13950, and with the gene sets found critical for in vitro growth. We have added the description of the reanalyzed data of GSEA in the manuscript (pages 16-17, lines 287-290). And the details of reanalyzed data of GSEA have been shown in Supplementary Fig. 5 and 6 as well as Supplementary Tables 15 and 16.

      (4) Validation of mouse infection experiments with individual mutants is missing.

      Thank you for the suggestion on the validation of the TnSeq hit genes on the in vivo survival. We acknowledge the importance of validating the TnSeq-hit genes by constructing knockout mutants. We have recently succeeded in constructing the vectors for making knockout strains of M. intracellulare (Tateishi. Microbiol Immunol. 2024). We will proceed to the infection experiment of knockout mutants by using our system for constructing them.

      [Reference]

      Tateishi Y. et al. Construction of knockout mutants in Mycobacterium intracellulare ATCC13950 strain using a thermosensitive plasmid containing negative selection marker rpsL. Microbiol Immunol 68, 339-347 (2024).

      (5) Phenotypes with TnSeq and CRISPRi-based KD exhibit poor correlation with misleading justifications by the authors.

      Thank you for the comment on the inconsistent results between TnSeq and CRISPRi-based knockdown. We acknowledge that some inconsistencies were observed, particularly among strain-dependent/accessory essential or growth-defect-associated genes. By contrast, TnSeq and CRISPRi-based knockdown results were consistent for universal essential genes such as glcB, inhA, gyrB and embB. Although the mechanism has not yet been fully proven, we consider that these inconsistent phenotypes between TnSeq and CRISPRi-based knockdown may be related to the recently described bypass mechanism of gene essentiality, which is characteristically observed in strain-specific/accessory essential genes (Rosconi F. Nat Microbiol. 2022; new Ref#14). Rosconi et al. proposed this bypass mechanism for strain-dependent/accessory essential or growth-defect-associated genes based on ‘forced-evolution experiments’ in 36 clinical Streptococcus pneumoniae strains. For example, knockout mutants were successfully recovered from transformation experiments targeting genes that were strain-specific/accessory essential in TnSeq, such as the cytidine monophosphate kinase gene cmk, the formate-tetrahydrofolate ligase gene fhs and the farnesyl-diphosphate synthase gene fpp. Bypassing of gene essentiality can be suggested by observing suppressor mutations and synthetic lethality in knockout strains. By contrast, universal essential genes share three characteristics: i) high levels of conservation within and often across species, ii) limited genetic diversity, and iii) high and stable expression levels. Consequently, universal essential genes are rigid, largely immutable key components of an organism’s survival, and knockout recovery for these genes fails, yielding either no colonies or only merodiploids.
Taking into consideration such bypass mechanism of gene essentiality in strain-dependent/accessory essential genes, the lower effect of gene silencing of strain-dependent/accessory essential genes on bacterial growth may reflect pathway rewiring that helps the bacterial growth under suppression of the target gene expression.

      We have added a description of the possible reasons for the inconsistency between TnSeq and CRISPRi results to the Results and Discussion of the revised manuscript (page 21, lines 367-376; pages 28-29, lines 489-519).

      [Reference]

      Rosconi, F. et al. A bacterial pan-genome makes gene essentiality strain-dependent and evolvable. Nat Microbiol 7, 1580–1592 (2022).

      In summary, this study is unable to provide mechanistic insights into why and how different MAC-PD mutant strains exhibit differential survival (in vitro and in animals) and adaptation to hypoxia. It remains to understand why the clinical strains show better adaptation to hypoxia and what is the impact of other stresses on their growth rates.

      Thank you for the comments on our inability to establish the mechanisms of MAC-PD pathogenesis and adaptation to hypoxia. We admit that the original manuscript did not provide a clear explanation of the mechanisms of MAC-PD pathogenesis and adaptation to hypoxia. Following the comment, we have modified the manuscript title to “Functional genomics reveals strain-specific genetic requirements conferring hypoxic growth in Mycobacterium intracellulare”.

      However, we revealed the diversity of genetic requirements within the species M. intracellulare, including the type strain ATCC13950 and clinical MAC-PD strains. We characterized the genetic requirements of clinical MAC-PD strains, namely increased requirements for gluconeogenesis, the type VII secretion system and cysteine desulfurase, the former two of which are also required for hypoxic pellicle formation in ATCC13950. In addition, we demonstrated differences in growth behavior under hypoxia between clinical MAC-PD strains and ATCC13950. Overall, we consider that these data provide basic information suggesting that differences in genetic requirements among strains are involved in the pathogenesis of MAC-PD.

      Reviewer #3 (Public Review):

      Summary:

      The study by Tateishi et al. utilized TnSeq in nine genetically diverse M. intracellulare strains, identifying 131 common essential and growth-defect-associated genes across those strains, which could serve as potential drug targets. The authors also provided an overview of the differences in gene essentiality required for hypoxic growth between the reference strain and the clinical strains. Furthermore, they validated the universal and accessory/strain-dependent essential genes by knocking down their expression using the CRISPRi technique. Overall, this study offers a comprehensive assessment of gene requirements in different clinical strains of M. intracellulare.

      Thank you for reviewing our manuscript and finding the significance of our data.

      (1) The rationale for using ATCC13950 versus clinical strains needs to be clarified. The reference strain ATCC13950 was obtained from the abdominal lymph node of a patient around 10 years ago and is therefore considered a clinical strain that has undergone passages in vitro. How many mutations have accumulated during these in vitro passages? Are these mutations significant enough to cause the behavior of ATCC13950 to differ from other recently sampled clinical strains? From the phylogenetic tree, ATCC13950 is located between M018 and M.i.27. Did the authors observe a similarity in gene essentiality between ATCC13950 and its neighbor strains? What is the key feature that separates ATCC13950 from these clinical strains? The authors should provide a strong rationale for how to interpret the results of this comparison in a clinical or biological context.

      Thank you for the comments on the rationale for using ATCC13950 versus clinical strains and the key feature that separates ATCC13950 from clinical MAC-PD strains.

      ATCC13950 was isolated in 1949 (not around 10 years ago) from the abdominal lymph node of a 34-month-old girl (Cuttino. Am J Pathol 1949; new Ref#11). Of note, the clinical background of this patient is quite different from that of patients with MAC pulmonary disease (MAC-PD). The incidence of MAC-PD has been increasing worldwide among people without predisposing immunological disorders. ATCC13950 has historically been regarded as the type strain of M. intracellulare, and it was the first M. intracellulare strain to be sequenced, in 2012 (Kim. J Bacteriol 2012; new Ref#59).

      The rationale for comparing ATCC13950 with clinical MAC-PD strains was to determine whether essential genes and genetic requirements are similar or different between clinical MAC-PD strains and the historical type strain ATCC13950. To date, two TnSeq studies have compared genetic requirements between clinical mycobacterial strains and type strains. One was in M. tuberculosis (Carey AF. PLoS Pathogens. 2018; new Ref#8) and the other was in M. abscessus (Akusobi C. mBio. 2025; new Ref#9, published after submission of our manuscript). Both reported differences and diversity in genetic requirements between clinical strains and type strains such as M. tuberculosis H37Rv and M. abscessus ATCC19977. We have added these previous reports to explain the rationale for setting the type strain ATCC13950 as a referential control strain. (page 5, lines 83-87)

      Genetic and functional analyses of clinical MAC-PD strains had long been lacking. In 2021, we revealed the genomic diversity between clinical MAC-PD strains and ATCC13950 by comparative genomic analysis (Tateishi BMC Microbiol. 2021; new Ref#5). Apart from our TnSeq study of ATCC13950 (Tateishi Sci Rep 2020; new Ref#10), no functional analysis had been conducted in clinical M. intracellulare strains. As part of our ongoing research on clinical MAC-PD strains, we expected to reveal their functional genomic characteristics by setting ATCC13950 as a referential control strain for analyzing the TnSeq data.

      It is an interesting viewpoint to consider how mutations accumulated through in vitro passages since the first isolation of ATCC13950 relate to the phenotypic differences between ATCC13950 and recently sampled clinical MAC-PD strains. However, no time-course samples of ATCC13950 isolates are available. Therefore, we can neither investigate how many mutations have accumulated over time nor evaluate how much the accumulated mutations influence the phenotype of ATCC13950. It is possible that mutations accumulated in vitro cause ATCC13950 to behave differently from clinical MAC-PD strains. However, it remains to be elucidated which factors specifically account for the characteristics that distinguish ATCC13950 from clinical MAC-PD strains.

      It is also an interesting viewpoint to investigate the similarity of gene essentiality between genetically neighboring strains. However, we focused on an overview of the gene essentiality profiles of clinical MAC-PD strains compared with ATCC13950. Elucidating the details of gene essentiality within each phylogenetic clade to which clinical MAC-PD strains belong was therefore beyond the scope of this study. For an overview of the phylogeny, please refer to our previous comparative genomic analysis of 55 strains (Tateishi Y. BMC Microbiol. 2021; new Ref#5, new Supplementary Fig. 1). Fig. 1 shows the phylogenetic tree extracted for the subject strains. A detailed analysis of gene essentiality in each genetic clade would require including a considerable number of the strains used in our 2021 comparative genomic analysis (Tateishi Y. BMC Microbiol. 2021; new Ref#5). Furthermore, it would be necessary to set a referential control strain other than ATCC13950 for comparing gene essentiality. Such a detailed comparison between phylogenetic clades is not our highest priority at present, and we plan to address it in a future study.

      The key features that separate ATCC13950 from clinical MAC-PD strains have not yet been identified. This contrasts with M. tuberculosis, where such features include mutations in the gene encoding the response regulator PhoPR in the type strain H37Rv versus most clinical strains. However, the features separating ATCC13950 from clinical MAC-PD strains may not be explained by a single genetic factor. Instead, they may arise from complex factors such as epigenetic and/or regulatory factors. For example, the attenuated virulence of H37Ra compared with H37Rv could not be explained by simple genetic differences (Brosch R. Infect Immun. 1999).

      In summary, we set the historical type strain ATCC13950, which was derived from infant abdominal lymphadenitis, as a referential control strain for the TnSeq analysis. We did so because we intended to reveal the gene essentiality and genetic requirements characteristic of clinical MAC-PD strains by comparing them with this non-MAC-PD reference strain. We consider that the profiles of gene essentiality and genetic requirements specific to clinical MAC-PD strains contribute to pathogenesis in the increasing number of MAC-PD patients worldwide without predisposing immunological disorders.

      [References]

      Cuttino, J.T. & Mc, C.A. Pure granulomatous nocardiosis, a new fungus disease distinguished by intracellular parasitism; a description of a new disease in man due to a hitherto undescribed organism, Nocardia intracellularis, n. sp., including a study of the biologic and pathogenic properties of this species. Am J Pathol 25, 1-47 (1949).

      Kim, B.J. et al. Complete genome sequence of Mycobacterium intracellulare clinical strain MOTT-64, belonging to the INT1 genotype. J Bacteriol 194, 3268 (2012).

      Carey, A.F. et al. TnSeq of Mycobacterium tuberculosis clinical isolates reveals strain-specific antibiotic liabilities. PLoS Pathog 14, e1006939 (2018).

      Akusobi, C. et al. Transposon-sequencing across multiple Mycobacterium abscessus isolates reveals significant functional genomic diversity among strains. mBio 16, e0337624 (2025).

      Tateishi, Y. et al. Comparative genomic analysis of Mycobacterium intracellulare: implications for clinical taxonomic classification in pulmonary Mycobacterium avium-intracellulare complex disease. BMC Microbiol 21, 103 (2021).

      Tateishi, Y. et al. Genome-wide identification of essential genes in Mycobacterium intracellulare by transposon sequencing - Implication for metabolic remodeling. Sci Rep 10, 5449 (2020).

      Brosch, R. et al. Genomic analysis reveals variation between Mycobacterium tuberculosis H37Rv and the attenuated M. tuberculosis H37Ra strain. Infect Immun 67, 5768-5774 (1999).

      (2) Regarding the 'nine representative strains of M. intracellulare with diverse genotypes in this study,' how were these nine strains selected? To what extent do they represent the genetic diversity of the M. intracellulare population? A phylogenetic tree illustrating the global genetic diversity of the M. intracellulare population, with these strains marked on it, would be important to demonstrate their genetic representativeness.

      Thank you for the comments on the selection of the 9 subject strains. We selected the 9 strains based on the phylogenetic tree we published in 2021 (BMC Microbiol 2021; new Ref#5). We have shown the global phylogenetic tree of the M. intracellulare population in new Supplementary Fig. 1. For this TnSeq study, we selected 4 or 5 strains from each of the two major groups: the typical M. intracellulare group and the M. paraintracellulare-M. indicus pranii group.

      [Reference]

      Tateishi, Y. et al. Comparative genomic analysis of Mycobacterium intracellulare: implications for clinical taxonomic classification in pulmonary Mycobacterium avium-intracellulare complex disease. BMC Microbiol 21, 103 (2021).

      (3) The authors observed a considerable amount of differential gene requirements in clinical strains. However, the genetic underpinning underlying the differential requirement of genes in clinical strains was not investigated or discussed. Because M. intracellulare has a huge number of accessory genes, the authors should at least check whether the differential requirement could be explained by the existence of a second copy of functional analogous genes or duplications.

      Thank you for the comments on the effect of gene duplication on differences in genetic requirements between strains. Following the comments, we conducted BLAST searches for the 162 genes showing increased Tn insertion reads in each subject strain. We found that M019 carries duplicate copies of OCU_RS44705, which encodes adenosylhomocysteinase (LOCUS_42940: ahcY_1, LOCUS_21000: ahcY_2). However, no duplicate genes were found for the remaining 161 genes showing increased Tn insertion reads.

      From these results, we consider that gene duplication has minor effects on the change of genetic requirements between strains. Rather, sequence differences and accessory genes may play a key role in determining the difference of genetic requirements.

      We have added a description of the above-mentioned result in the Result section (pages 11-12, lines 191-199).
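
      The duplicate-gene check described above could be sketched as follows. This is a minimal illustration and not the authors' pipeline. It assumes BLAST+ tabular output (`-outfmt 6`, default 12 columns) from searching each query gene against the genes of the same strain. The cut-offs `min_pident`, `max_evalue` and `min_length`, and the function name `candidate_paralogs`, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Default column order of BLAST+ tabular output (-outfmt 6)
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def candidate_paralogs(lines: Iterable[str], min_pident: float = 40.0,
                       max_evalue: float = 1e-10,
                       min_length: int = 100) -> Dict[str, List[str]]:
    """Return, for each query gene, other genes in the same genome that it hits
    above the (hypothetical) identity, E-value and alignment-length cut-offs."""
    hits = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        query, subject = row["qseqid"], row["sseqid"]
        if query == subject:
            continue  # a gene always hits itself; that is not a duplicate
        if (float(row["pident"]) >= min_pident
                and float(row["evalue"]) <= max_evalue
                and int(row["length"]) >= min_length):
            hits[query].add(subject)
    return {query: sorted(subjects) for query, subjects in hits.items()}
```

      Genes with an empty result under such a filter have no detectable second copy at the chosen thresholds. In the response above, this was the case for 161 of the 162 genes.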

      (4) Growth in aerobic and hypoxic conditions: The authors concluded that clinical strains are better adapted to hypoxia, as reflected by their earlier entry into the log phase. They presented the 'Time at midpoint' and 'Growth rate at midpoint.' However, after reviewing the growth curves, I noticed that ATCC13950 had a longer lag phase compared to other strains under hypoxic conditions, and its phylogenetic neighbor M018 also had a longer lag phase. Hence, I do not believe a conclusion can be drawn that clinical strains are better adapted to hypoxia, as this behavior could be specific to a particular clade. It's also possible that the ATCC13950 strain has adapted to aerobic growth. I would suggest that the authors include growth curves in the main figures. The difference in 'Time at midpoint' could be attributed to several factors, and visualizing the growth curves would provide additional context and clarity.

      Thank you for the comments on the possibility that genotype determines the growth pattern of M. intracellulare. Following the comments, we performed aerobic and hypoxic growth assays in the two strains (M005 and M016) that neighbor ATCC13950.

      Author response image 1.

      The phylogenetic relationship between M005, M016 and ATCC13950. The former two strains are boxed in blue.

      M005 reached the midpoint later than ATCC13950 under both aerobic and hypoxic conditions. By contrast, M016 reached the midpoint three quarters earlier than ATCC13950 under hypoxic conditions. Growth rates did not differ significantly among M005, M016 and ATCC13950 under either aerobic or hypoxic conditions. The only borderline result was M005 vs ATCC13950 under aerobic conditions (P = 0.0512, Steel’s multiple comparisons test).

      From the growth patterns of M005 and M016, we suggest that genotype, in addition to gene essentiality, may have some impact on bacterial growth patterns under hypoxia. However, the timing of hypoxic adaptation differed significantly among ATCC13950 and its neighboring strains. The growth pattern under hypoxia is therefore likely determined by multiple factors, such as genetic requirements and as-yet-unproven regulatory systems. Considering that many genetically diverse strains lie outside the ATCC13950 clade, many clinical MAC-PD strains are possibly better adapted to hypoxia.

      Responding to the reviewer’s recommendation, we have added a description of the above-mentioned result to the revised manuscript (page 18, lines 313-322). We have also shown the growth curves of the original 9 subject strains in new Fig. 7 and added the growth curves of M005 and M016 in new Supplementary Fig. 7. Additionally, we have corrected the y-axis label in new Fig. 7a and new Supplementary Fig. 7a. We have also added the description “Data are represented as CFUs in 4 μl sample at each timepoint.” to the figure legends (page 58, lines 1027-1028 and Supplementary Fig. 7a legend).

      (5) Lack of statistical statement: The authors emphasized the role of pellicle-formation-associated genes in strain-dependent essential and accessory essential genes. Additionally, the authors observed that 10% of the genes required for mouse infection are also required for hypoxic pellicle formation. However, these are merely descriptive statements. There is no enrichment analysis to justify whether pellicle-formation-associated genes are significantly enriched in these groups.

      Thank you for the comments on the enrichment of pellicle-formation-associated genes among strain-dependent essential and accessory essential genes. We performed GSEA and found that 16 of 175 genes (9.1%) were in the core enrichment. Of these, 4 genes were also identified as showing increased genetic requirements by the resampling plus HMM analyses. These were phosphoenolpyruvate carboxykinase pckA (OCU_RS48660), type VII secretion-associated serine protease mycP5 (OCU_RS38275), type VII secretion protein eccC5 (OCU_RS38345) and glycine cleavage system aminomethyltransferase gcvT (OCU_RS35955).

      Author response image 2.

      We have added the description of GSEA result in the revised manuscript (page 8, lines 137-144; Supplementary Fig. 2; Supplementary Table 5).
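
      For readers unfamiliar with the GSEA "core enrichment" set, the sketch below computes the classic weighted running-sum enrichment score (Subramanian et al., 2005, weight p = 1) and its leading-edge subset. The leading-edge subset is what GSEA reports as core enrichment. This is an illustrative sketch under stated assumptions. The ranking metric used by the authors is not stated here, and permutation-based significance and normalized enrichment scores are omitted. The function name `enrichment_score` is hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def enrichment_score(ranked: Sequence[Tuple[str, float]], gene_set: Set[str],
                     weight: float = 1.0) -> Tuple[float, List[str]]:
    """Return (ES, leading-edge genes) for (gene, score) pairs sorted by
    descending score. Permutation testing is deliberately not included."""
    hits = [gene in gene_set for gene, _ in ranked]
    n_hits = sum(hits)
    n_misses = len(ranked) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("gene set must partially overlap the ranked list")
    # Hits step up in proportion to |score|^weight; misses step down uniformly
    hit_norm = sum(abs(s) ** weight for (_, s), h in zip(ranked, hits) if h)
    if hit_norm == 0:
        raise ValueError("all gene-set members have zero score")
    miss_step = 1.0 / n_misses

    running, best, best_idx = 0.0, 0.0, 0
    for i, ((_, score), hit) in enumerate(zip(ranked, hits)):
        running += (abs(score) ** weight) / hit_norm if hit else -miss_step
        if abs(running) > abs(best):  # ES is the maximum deviation from zero
            best, best_idx = running, i

    # Leading edge: members at or before the peak (positive ES),
    # or at or after the peak (negative ES)
    window = range(0, best_idx + 1) if best >= 0 else range(best_idx, len(ranked))
    leading_edge = [ranked[i][0] for i in window if hits[i]]
    return best, leading_edge
```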

      Reviewer #1 (Recommendations For The Authors):

      Tn-seq and hypoxia adaption in clinical isolates of M. intracellulare (Mi): The authors claim that clinical strains are better adapted to hypoxia because their genetic requirements for optimum fitness overlap with genetic requirements for fitness of the type strain under hypoxia. This is a reasonable hypothesis, but it has not been well-supported by the data presented in Figure 7. The growth rates (Figure 7b) of most of the clinical strains under hypoxia appear to be less than the type strain, although they all seem to grow better than the type strain under normoxia. Perhaps a continuous growth curve of each strain, both as pure and mixed cultures under these conditions will provide a clearer picture.

      Thank you for the comments on the difference in hypoxic adaptation between ATCC13950 and MAC-PD strains. To clarify, the growth rates shown in Figure 7 were calculated at the inflection point (midpoint) of the growth curves, which were modeled with a four-parameter logistic (4P logistic) model. As described in the Discussion, we inferred the hypoxic adaptation characteristics of the clinical MAC-PD strains from the shapes of the growth curves. Consider the impact of bacterial growth on the progression of MAC-PD. Slow growth with early entry into the log phase implies a sustained impact on the infected host during logarithmic bacterial growth. This may underlie the persistent, steadily progressive course of MAC-PD over years in humans.

      Unlike time-lapse imaging assays, CFU assays cannot sample cultures in a completely continuous manner. Nevertheless, we collected sufficient timepoints for reliable curve fitting with the 4P logistic model. We therefore consider our growth data to be a valid approximation of continuous growth dynamics.
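
      To show how a "time at midpoint" and a "growth rate at midpoint" can be read off a 4P logistic fit, here is a minimal sketch using SciPy's `curve_fit`. It rests on three assumptions. First, it uses the parameterization y(t) = lower + (upper - lower) / (1 + exp(-rate (t - t_mid))); the exact 4P form used by the authors' software is not stated here. Second, it fits log10 CFU values. Third, the helper names `logistic4` and `fit_growth_curve` are hypothetical. With this form, the slope at the inflection point is rate × (upper - lower) / 4.

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic4(t, lower, upper, rate, t_mid):
    """Assumed four-parameter logistic curve; t_mid is the inflection (midpoint) time."""
    return lower + (upper - lower) / (1.0 + np.exp(-rate * (t - t_mid)))


def fit_growth_curve(t, log_cfu):
    """Fit logistic4 to (time, log10 CFU) data and report midpoint quantities."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(log_cfu, dtype=float)
    # Crude starting values: observed min/max as asymptotes, median time as midpoint
    p0 = [y.min(), y.max(), 1.0, float(np.median(t))]
    (lower, upper, rate, t_mid), _ = curve_fit(logistic4, t, y, p0=p0, maxfev=10000)
    return {
        "t_mid": t_mid,
        # Derivative of logistic4 at t = t_mid equals rate * (upper - lower) / 4
        "rate_at_mid": rate * (upper - lower) / 4.0,
        "lower": lower,
        "upper": upper,
    }
```

      A later `t_mid` corresponds to later entry into the logarithmic phase. `rate_at_mid` summarizes how steep the curve is at that point, so the two quantities can vary independently between strains.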

      Regarding the suggestion of mixed culture experiments, we agree that such studies could be informative. However, co-culture conditions introduce additional variables, including inter-strain competition or synergy, which can obscure the specific contributions of hypoxic adaptation in each strain. Therefore, we believe that the current approach using monoculture growth curves under defined oxygen conditions offers a clearer interpretation of strain-specific hypoxic responses.

      In vivo studies: It is unclear how virulent the two clinical strains, Mi27 and Mi198 are in the mouse model. The CFU data in Figure S1b reports the bacterial burden of the Tn libraries of the two strains, of which the overall population of Mi27 library seems to be declining. Without any information on the CFU, animal survival, and tissue pathology from the pure strains, data from the library will have limited implications.

      Thank you for the comments on the presentation of the in vivo data. We previously reported time-course data on CFUs, animal survival and tissue pathology for the pure strains (Tateishi Y. BMC Microbiol. 2023; new Ref#22). Based on these data, we expected to be able to harvest a sufficient number of colonies from mice infected with M.i.27 or M.i.198, and we performed in vivo TnSeq studies using these two strains. In the revised manuscript, we have cited our previous publication on the virulence in mice of the MAC-PD pure strains used in this study (page 12, line 212; new Ref #22).

      The CFU counts are shown in new Supplementary Figure 3b. In the manuscript text, we explained as follows (page 12, lines 212-216): “The time course of the changes in the bacterial burden showed a pattern similar to those of the wild-type strains M.i.198 and M.i.27, respectively (Tateishi Y. BMC Microbiol. 2023; new Ref#22), except that it was not possible to harvest sufficient colonies (as few as 10⁴/mouse) in the few mice infected with the M.i.27 Tn mutant strain in week 8 and week 16 (new Supplementary Fig. 3b, new Supplementary Table 8)”. The decline in the overall population of the M.i.27 Tn mutant library in the infected lungs can be explained by the lower virulence of the M.i.27 pure strain. M.i.27 shows an intermediate-virulence phenotype, whereas M.i.198 shows a high-virulence phenotype.

      [References]

      Tateishi, Y. et al. Virulence of Mycobacterium intracellulare clinical strains in a mouse model of lung infection - role of neutrophilic inflammation in disease severity. BMC Microbiol 23, 94 (2023).

      Data presentation: The manuscript suffers from a lack of clarity in data visualization and presentation, especially the Tn-Seq datasets. Panels describe the experimental workflow with a densely-worded paragraph, making it difficult to navigate through the major findings.

      Thank you for the comments on Fig. 1b. Following the suggestion, we have revised it as new Fig. 1b, entitled “Strategy of the procedure of TnSeq analyses”.

      Language: The paper should be extensively revised for language. Often the authors have mixed-up the terms like 'core' and 'accessory' 'genes' in lines 116-119 with 'core and accessory genomes' in Figure 2c, which is not even mentioned in the paper. It is further unclear how they identified 3153 and 5824 core and accessory genes, respectively, from 55 different strains of Mi. Line 251: ..."involved by confer..." needs revision. The terms "increased gene essentiality" and "growth-defected associated genes" are very confusing. The essentiality of a gene is either absolute or conditional but is not quantitative. Similarly, 'growth-defect associated' can be replaced with a better phrase that alludes to fitness loss in the clone. Additional typos were found throughout the paper that need to be fixed.

      Thank you for the comments on the scientific terms used in this study, including “core and accessory genomes” and “gene essentiality”.

      Following Rosconi’s paper (Panel C of Fig. 1 in Nat Microbiol. 2022; new Ref#14), we used the phrases “accessory genome” and “core genome” to mean the complete sets of accessory and core genes, respectively. To avoid confusion and keep consistency, we have replaced the term “genomes” with “genes” in the revised manuscript.

      In our previous comparative genomic analysis, we identified 3153 core and 5824 accessory genes from 55 M. intracellulare strains (Tateishi Y. BMC Microbiol. 2021; new Ref #5). The pan-genome analysis was performed with the Bacterial Pan Genome Analysis (BPGA) pipeline (Chaudhari NM. Sci Rep 2016).
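
      The core/accessory split can be illustrated with a small presence/absence sketch. It assumes that orthologous gene clusters have already been built, for example by a pan-genome pipeline such as BPGA. It also assumes that "core" means present in every strain. BPGA additionally reports "unique" genes found in a single strain. Whether those are counted within the 5824 accessory genes in Ref #5 is not stated here, so they are kept separate below. The function name `classify_pangenome` is hypothetical.

```python
from typing import Dict, Set


def classify_pangenome(presence: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Split gene clusters into core / accessory / unique.

    presence maps a gene-cluster ID to the set of strains that carry it.
    """
    all_strains: Set[str] = set().union(*presence.values()) if presence else set()
    classes: Dict[str, Set[str]] = {"core": set(), "accessory": set(), "unique": set()}
    for cluster, carriers in presence.items():
        if not carriers:
            continue  # a cluster with no carriers is not part of the pan-genome
        if carriers == all_strains:
            classes["core"].add(cluster)       # present in every strain
        elif len(carriers) == 1:
            classes["unique"].add(cluster)     # present in exactly one strain
        else:
            classes["accessory"].add(cluster)  # present in some, but not all, strains
    return classes
```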

      We agree that gene essentiality is a qualitative rather than a quantitative trait. We have replaced the term "increased gene essentiality" with "increased genetic requirements" throughout the manuscript.

      We used the term “growth-defect (GD)” based on the classification of gene essentiality by the Hidden Markov Model (HMM) implemented in the TRANSIT software (DeJesus. PLoS Comput Biol. 2015; new Ref#12). The HMM classifies genes as essential (ES), GD, non-essential (NE) or growth-advantage (GA). GD denotes difficulty of growth (growth deficiency) under aerobic conditions in vitro, as indicated by less frequent Tn insertions. The suggested phrases “fitness loss” or “less fit” may imply a comparison between two different conditions, such as two culture conditions applied to a single bacterial strain. The HMM analysis, however, is performed on data from a single strain under a single condition. We therefore consider phrases including “fitness” somewhat unsuitable for describing this classification of gene essentiality. For this reason, it is difficult for us to replace GD with a term that implies a comparison of fitness levels between two conditions in a single bacterial strain.

      [References]

      Rosconi, F. et al. A bacterial pan-genome makes gene essentiality strain-dependent and evolvable. Nat Microbiol 7, 1580–1592 (2022).

      Tateishi, Y. et al. Comparative genomic analysis of Mycobacterium intracellulare: implications for clinical taxonomic classification in pulmonary Mycobacterium avium-intracellulare complex disease. BMC Microbiol 21, 103 (2021).

      Chaudhari, N.M. et al. BPGA- an ultra-fast pan-genome analysis pipeline. Sci Rep 6, 24373 (2016).

      DeJesus, M.A. et al. TRANSIT--A Software Tool for Himar1 TnSeq Analysis. PLoS Comput Biol 11, e1004401 (2015).

      Reviewer #2 (Recommendations For The Authors):

      Major Comments:

      (1) Result 1 (Page 6-7): Common essential and growth-defect-associated genes representing the genomic diversity of M. intracellulare strains.

      (1a) From Table S1, it is observed that the numbers of Tn-inserted TA sites significantly vary (p >0.05) among biological replicates for each strain when compared with the reference strain ATCC13950. The authors should provide an explanation of how they overcame this variation in their analysis.

      Thank you for the comment on the issue of a variable number of Tn-inserted TA sites among biological replicates for each strain of MAC-PD.

      In TRANSIT, we set the replicate option to Sum to combine read counts across replicates. We also used the Beta-Geometric correction (BGC) to normalize the datasets to fit an “ideal” geometric distribution with a variable probability parameter ρ.

      Following the comment, we have added a description of the options we used for handling replicate data and normalization (page 36, lines 640-643).
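
      Conceptually, combining replicates with a "Sum" option adds the read counts of each TA site across libraries. The sketch below shows only that summation. It assumes the replicate count vectors list the same TA sites in the same genomic order. It does not reproduce the Beta-Geometric correction, nor the order in which TRANSIT applies normalization and combination. The function name `sum_replicates` is hypothetical.

```python
from typing import List, Sequence


def sum_replicates(replicates: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-TA-site read counts from replicate libraries by summation.

    Assumes every replicate covers the same TA sites in the same order.
    """
    if not replicates:
        return []
    n_sites = len(replicates[0])
    if any(len(rep) != n_sites for rep in replicates):
        raise ValueError("all replicates must cover the same TA sites in the same order")
    # zip(*replicates) yields the counts of one TA site across all replicates
    return [sum(site_counts) for site_counts in zip(*replicates)]
```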

      (1b) Importantly, saturation in most of the strains is only ~50-60%. In such a case, there will be a high probability that Tn will not hit a nonessential region due to chance instead of selection (See DeJesus et al., mBio, 2017). It has been observed that the sequence pattern (GC)GNTANC(GC) is strongly associated with non-permissiveness. As shown earlier, the authors need to carefully look for the potential non-permissive sites before concluding the fate of a gene. Also, they should acknowledge the potential limitations of their approach due to the suboptimal level of saturation.

      Thank you for the comment on the saturation of the Tn mutant libraries. Our method of comparing genetic requirements between strains is the same as that of a previous report (Carey. PLoS Pathog 2018; new Ref#8). That study used duplicate Tn mutant libraries of clinical Mtb strains of different genotypes and triplicate Tn mutant libraries of H37Rv to identify increased genetic requirements in clinical Mtb strains. Our method is also based on the coauthor’s TnSeq study on H37Rv (Minato Y. mSystems 2019; new Ref#61). Moreover, when replicates are combined, the saturation of our Tn mutant libraries is 62-79%: ATCC13950: 67.6%, M001: 72.9%, M003: 63.0%, M018: 62.4%, M019: 74.5%, M.i.27: 76.6%, M.i.198: 68.0%, MOTT64: 77.6%, M021: 79.9%. That is, we calculated gene essentiality from Tn mutant libraries with 62-79% saturation in each strain. These saturation levels are similar to those in the very recent TnSeq analysis by Akusobi. In that study, libraries with 52-80% saturation (so-called “high-density” transposon libraries) were used for HMM and resampling analyses (Supplemental Methods Table 1 [merged saturation] in Akusobi. mBio. 2025; new Ref#9). The saturation of Tn insertion in individual replicates of our libraries is also comparable to that reported by DeJesus (Table S1 in mBio 2017; new Ref#57). Thus, we consider our TnSeq method acceptable both for identifying essential genes and for detecting differences in genetic requirements between clinical MAC-PD strains and ATCC13950.
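
      The gain in saturation from combining replicates follows from how saturation is defined: the fraction of TA sites with at least one insertion read. Different replicates hit partly different sites, so pooled libraries cover more sites than any single one. The minimal sketch below illustrates this with toy counts. It assumes aligned per-TA-site count vectors, and the function names are hypothetical.

```python
from typing import Sequence


def saturation(counts: Sequence[int]) -> float:
    """Fraction of TA sites with at least one insertion read."""
    if not counts:
        raise ValueError("no TA sites")
    return sum(1 for c in counts if c > 0) / len(counts)


def merged_saturation(replicates: Sequence[Sequence[int]]) -> float:
    """Saturation after pooling replicates: a TA site counts as hit if any
    replicate has at least one read there (assumes aligned site order)."""
    return saturation([sum(site) for site in zip(*replicates)])
```

      For example, two toy replicates that are each 40% saturated become 60% saturated when merged, because they cover partly different TA sites.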

      As the Reviewer indicates, mariner transposon mutagenesis has a non-permissive sequence pattern. Using more than 10 replicate Tn mutant libraries is a highly accurate approach for detecting essential genes among small non-structural genes such as small regulatory RNAs. However, DeJesus showed that, for large genes with more than 10 TA sites, the numbers of essential genes identified from 2 and from 14 TnSeq datasets are comparable (Supplementary Fig 2 in mBio 2017; new Ref#57). Most of these large genes appear to be structural genes. Thus, we do not consider that we have seriously misclassified the essentiality of most structural genes that encode proteins. With respect to the coverage of non-permissive sites, our TnSeq method might need to be improved if the aim were to classify the essentiality of small genes, including small regulatory RNAs, with high accuracy.

      We investigated the non-permissive TA sites in ATCC13950. There are 4136 non-permissive TA sites in ATCC13950 (6.43% of total ORFs). This is fewer than in H37Rv (9% of total ORFs) (DeJesus MA. mBio 2017; new Ref#57) and in M. abscessus ATCC19977 (8.1% of total ORFs) (Rifat D. mBio. 2021; new Ref#58). Among larger ORFs (≥ 10 TA sites), 89 common “essential (ES)” or “growth-defect-associated (GD)” genes contain non-permissive TA sites (4.82% of the 1844 larger ORFs in ATCC13950). Among smaller ORFs (2-9 TA sites), 41 common ES or GD genes contain non-permissive TA sites (1.35% of the 3021 smaller ORFs in ATCC13950).
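
      A scan for the non-permissive pattern (GC)GNTANC(GC), that is [GC]G[ACGT]TA[ACGT]C[GC], can be sketched with a regular expression. The motif is its own reverse complement, so scanning one strand finds the sites on both. The lookahead `(?=...)` allows overlapping matches. Positions are 0-based and point at the T of each TA dinucleotide. This is an illustrative sketch, not the authors' script, and the function names are hypothetical.

```python
import re
from typing import List

# (GC)GNTANC(GC): S-G-N-T-A-N-C-S; the TA sits at offsets 3-4 of the 8-mer
NONPERMISSIVE = re.compile(r"(?=[GC]G[ACGT]TA[ACGT]C[GC])")
TA_SITE = re.compile(r"(?=TA)")


def ta_sites(seq: str) -> List[int]:
    """0-based positions of the T in every TA dinucleotide (overlaps allowed)."""
    return [m.start() for m in TA_SITE.finditer(seq.upper())]


def nonpermissive_ta_sites(seq: str) -> List[int]:
    """0-based positions of the T in TA sites embedded in the non-permissive motif."""
    return [m.start() + 3 for m in NONPERMISSIVE.finditer(seq.upper())]
```

      Dividing `len(nonpermissive_ta_sites(orf_seq))` by `len(ta_sites(orf_seq))` for each ORF would flag the genes in which a lack of insertions may partly reflect motif bias rather than selection. Here `orf_seq` is a hypothetical ORF sequence string.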

      We appreciate the idea of determining gene essentiality from the presence or absence of non-permissive TA sites. However, we cannot determine the essentiality classification solely from the presence or absence of potential non-permissive sites. Strictly speaking, gene essentiality cannot be established without functional analysis using genetic manipulation: TnSeq can “predict” gene essentiality but cannot fully guarantee its functional significance. Nevertheless, most recent TnSeq studies have been published on the basis of TnSeq analysis alone, without functional analysis of genetically manipulated strains for all of the identified targets. Taking such limitations of TnSeq, including non-permissive sites, into consideration, we consider that the essentiality of the detected genes should be confirmed in further studies. These should mainly include biological experiments such as functional studies using genetically manipulated strains.

      We have added the above-mentioned contents in the revised manuscript (pages 32-33, lines 559-580).

      [References]

      Carey, A.F. et al. TnSeq of Mycobacterium tuberculosis clinical isolates reveals strain-specific antibiotic liabilities. PLoS Pathog 14, e1006939 (2018).

      Minato, Y. et al. Genomewide assessment of Mycobacterium tuberculosis conditionally essential metabolic pathways. mSystems 4, e00070-19 (2019).

      Akusobi, C. et al. Transposon-sequencing across multiple Mycobacterium abscessus isolates reveals significant functional genomic diversity among strains. mBio 16, e0337624 (2025).

      DeJesus, M.A. et al. Comprehensive essentiality analysis of the Mycobacterium tuberculosis genome via saturating transposon mutagenesis. mBio 8, e02133-16 (2017).

      Rifat, D., Chen, L., Kreiswirth, B.N. & Nuermberger, E.L. Genome-wide essentiality analysis of Mycobacterium abscessus by saturated transposon mutagenesis and deep sequencing. mBio 12, e0104921 (2021).

      (1c) Line 100: Authors report a total of 131 genes identified as essential or growth-defect-associated with the HMM analysis across all M. intracellulare strains. It should be explained in more detail how gene essentiality was determined (see above comment in (1b)). Furthermore, in Table S3 authors should mention the essential and growth defective trait of each of the 131 genes.

      Thank you for the comment on how the 131 genes were classified as essential or growth-defect-associated by the HMM analysis across all M. intracellulare strains. As replied in (1b), the saturation of Tn insertion in our libraries is 62-79% when duplicate or triplicate data are combined for each strain. These saturation levels are similar to those in the very recent TnSeq analysis by Akusobi. In that study, libraries with 52-80% saturation (so-called “high-density” transposon libraries) were used for HMM and resampling analyses, and most triplicate libraries ranged from 70% to 79% saturation (Supplemental Methods Table 1 [merged saturation] in Akusobi. mBio. 2025; new Ref#9). The saturation of Tn insertion in individual replicates of our libraries is also comparable to that reported by DeJesus (Table S1 in mBio 2017; new Ref#57). Thus, we consider our TnSeq libraries acceptable for identifying essential and growth-defect-associated genes by the HMM method.

      We used the HMM method reported by DeJesus (DeJesus. PLoS Comput Biol. 2015; new Ref#12). The HMM method categorizes gene essentiality throughout the genome as “Essential”, “Growth-defect”, “Non-essential” or “Growth-advantage”. “Essential” genes are defined as having no insertions in all or most of their TA sites. “Non-essential” genes are defined as regions with usual read counts. “Growth-defect” genes are defined as regions with unusually low read counts. “Growth-advantage” genes are defined as regions with unusually high read counts.
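
      The four-state idea can be illustrated with a toy Viterbi decoder over TA-site read counts. In this sketch each state emits counts from a geometric distribution whose mean is near zero (ES), low (GD), typical (NE) or high (GA). The specific means (0.01, 0.25×, 1× and 5× the mean of non-zero counts) and the stay probability are illustrative assumptions, not the parameters used by TRANSIT. Converting site labels into gene-level calls is also not shown. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

STATES = ("ES", "GD", "NE", "GA")


def geometric_logpmf(k: int, mean: float) -> float:
    """log P(k) for a geometric distribution on {0, 1, 2, ...} with the given mean (> 0)."""
    # P(k) = (1 - p)^k * p with p = 1 / (mean + 1)
    return k * (math.log(mean) - math.log(mean + 1.0)) - math.log(mean + 1.0)


def viterbi_essentiality(counts: Sequence[int], stay: float = 0.99) -> List[str]:
    """Label each TA site ES/GD/NE/GA with the most likely state path (Viterbi)."""
    if not counts:
        return []
    # Illustrative emission means (assumption), scaled from the mean of non-zero counts
    nonzero = [c for c in counts if c > 0]
    ref = sum(nonzero) / len(nonzero) if nonzero else 1.0
    means: Dict[str, float] = {"ES": 0.01, "GD": 0.25 * ref, "NE": ref, "GA": 5.0 * ref}
    log_stay = math.log(stay)
    log_switch = math.log((1.0 - stay) / (len(STATES) - 1))

    def log_trans(prev: str, cur: str) -> float:
        return log_stay if prev == cur else log_switch

    # Uniform initial state probabilities
    score = {s: math.log(1.0 / len(STATES)) + geometric_logpmf(counts[0], means[s])
             for s in STATES}
    backpointers: List[Dict[str, str]] = []
    for c in counts[1:]:
        new_score: Dict[str, float] = {}
        pointer: Dict[str, str] = {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda prev: score[prev] + log_trans(prev, cur))
            new_score[cur] = (score[best_prev] + log_trans(best_prev, cur)
                              + geometric_logpmf(c, means[cur]))
            pointer[cur] = best_prev
        backpointers.append(pointer)
        score = new_score

    # Trace back from the best final state
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return path[::-1]
```

      A run of consecutive TA sites with no reads, flanked by sites with typical counts, is decoded as ES. This mirrors the "no insertions in all or most of their TA sites" criterion above.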

      Following the previous report (Carey AF. PLoS Pathog 2018; new Ref#8), the annotation for the clinical MAC-PD strains was adapted from that of ATCC13950. The START and END coordinates of each ORF in the clinical MAC-PD strains were adjusted according to their alignment with the corresponding ORFs of ATCC13950. Gene essentiality was then classified by the HMM analysis using the adjusted annotation table.
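
      Transferring ORF coordinates through an alignment can be sketched as follows. A gapped pairwise alignment between the ATCC13950 reference and a clinical strain is turned into a position map, and each ORF's START and END are looked up in it. This is a hypothetical illustration. The alignment tool, the strand handling and the treatment of endpoints that fall in gaps are not described here. The sketch assumes both sequences are on the same strand and simply returns `None` when an endpoint is unaligned. The toy sequences in the usage check are invented.

```python
from typing import Dict, Optional, Tuple


def build_position_map(ref_aln: str, query_aln: str) -> Dict[int, int]:
    """Map 1-based reference positions to 1-based query positions from a
    gapped pairwise alignment ('-' marks a gap)."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_pos = query_pos = 0
    mapping: Dict[int, int] = {}
    for r, q in zip(ref_aln, query_aln):
        if r != "-":
            ref_pos += 1
        if q != "-":
            query_pos += 1
        if r != "-" and q != "-":
            mapping[ref_pos] = query_pos  # only columns aligned in both sequences
    return mapping


def lift_orf(start: int, end: int, mapping: Dict[int, int]) -> Optional[Tuple[int, int]]:
    """Transfer ORF START/END; return None if either endpoint falls in an alignment gap."""
    if start in mapping and end in mapping:
        return mapping[start], mapping[end]
    return None
```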

      We have added an explanation of how we identified essential and growth-defect-associated genes to the Methods (pages 35-36, lines 620-632). In addition, following the comment, we have added the gene essentiality classification of the 131 genes to the new Supplementary Table 3 in the revised manuscript.

      [Reference]

      DeJesus, M.A. et al. TRANSIT--A Software Tool for Himar1 TnSeq Analysis. PLoS Comput Biol 11, e1004401 (2015).

      Carey, A.F. et al. TnSeq of Mycobacterium tuberculosis clinical isolates reveals strain-specific antibiotic liabilities. PLoS Pathog 14, e1006939 (2018).

      Akusobi, C. et al. Transposon-sequencing across multiple Mycobacterium abscessus isolates reveals significant functional genomic diversity among strains. mBio 16, e0337624 (2025).

      DeJesus, M.A. et al. Comprehensive essentiality analysis of the Mycobacterium tuberculosis genome via saturating transposon mutagenesis. mBio 8, e02133-16 (2017).

      (1d) In Table S4, the authors show strain-specific putative essential genes from the core and accessory gene sets. For the sake of clarity, it is important to have the name of all the strains against each gene in which it is predicted essential or growth defective.

      Thank you for the comment on identifying the strains in which each gene classified as strain-specific or accessory putative essential or growth-defect-associated was hit. Following the comment, we have added these strains to the new Supplementary Table 4 in the revised manuscript.

      (1e) Lines 123-126: It is not clear what the relevance is of highlighting genes involved in hypoxic pellicle formation in ATCC13950. These appear to be randomly distributed across different clinical isolates, and it is not clear whether they correlate with differential susceptibility of the reference strain and clinical isolates to hypoxia.

      Thank you for the comment on the relevance of highlighting genes involved in hypoxic pellicle formation in ATCC13950. Our rationale for emphasizing hypoxic pellicle genes in clinical MAC-PD strains is that the profile of genetic requirements in each bacterial strain reflects its adaptation to the environment in which it lives. When strains are placed in a particular environment, they can adapt by altering their profiles of genetic requirements, resulting in the remodeling of metabolic pathways. We indeed found that the genetic requirements of several hypoxic pellicle genes were increased in clinical MAC-PD strains under in vitro conditions. These data suggest that the hypoxic pellicle genes are more important for in vitro growth in clinical MAC-PD strains than in ATCC13950.

      Moreover, hypoxia is known to be one of the characteristic conditions in vivo, including in clinical lesions (McKeown. Br J Radiol. 2014). We consider it reasonable to expect that strains derived from MAC-PD patients without predisposing immunological disorders may have adapted to hypoxic conditions to maintain bacterial survival. Therefore, we highlighted the genes involved in hypoxic pellicle formation in ATCC13950.

      We have added the rationale for the importance of hypoxic pellicle genes in clinical MAC-PD strains to the revised manuscript (page 9, lines 148-155).

      [Reference]

      McKeown, S.R. et al. Defining normoxia, physoxia and hypoxia in tumours-implications for treatment response. Br J Radiol 87, 20130676 (2014).

      (2) Result 2 (pages 8-10): Genes with increased gene essentiality in clinical MAC-PD strains are also required for hypoxic pellicle formation in the type strain.

      (2a) As reported by authors (lines 123-126), only a small fraction of genes showing essentiality in clinical MAC-PD strains are required for hypoxic pellicle formation in the reference strain, which might be due to random distribution. Authors should avoid making such a generalised statement that reflects the association of the entire essential gene pool in clinical MAC-PD strains with hypoxic pellicle formation.

      Thank you for the comment that only a small fraction of the genes showing increased genetic requirements in clinical MAC-PD strains are shared with the genes required for hypoxic pellicle formation in the type strain ATCC13950. We admit that the section title may mislead readers into thinking that the genes required for hypoxic pellicle formation account for the entire essential gene pool of clinical MAC-PD strains. Following the comment, we have revised the section title to “Partial overlap of the genes showing increased genetic requirements in clinical MAC-PD strains with those required for hypoxic pellicle formation in ATCC13950” (page 9, lines 146-147).

      We consider it unlikely that the partial overlap between the genes showing essentiality in clinical MAC-PD strains and the genes required for hypoxic pellicle formation in ATCC13950 is a mere coincidence, because we obtained supporting data such as the pattern of genetic requirements suggesting a gluconeogenic metabolic shift (Fig. 5) and the different hypoxic growth curves of clinical MAC-PD strains and ATCC13950 (Fig. 7).
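      Whether an overlap of a given size exceeds chance can be checked with a one-sided hypergeometric test. The sketch below is not an analysis reported in the manuscript; the gene-universe size and set sizes would have to come from the authors' annotation tables, and the toy numbers used here are purely illustrative.

```python
from math import comb


def overlap_pvalue(universe: int, set_a: int, set_b: int, overlap: int) -> float:
    """One-sided hypergeometric P(X >= overlap): the chance that two gene sets of
    sizes set_a and set_b, drawn from `universe` genes, share at least `overlap` genes."""
    if not (0 <= overlap <= min(set_a, set_b) and max(set_a, set_b) <= universe):
        raise ValueError("inconsistent set sizes")
    total = comb(universe, set_b)
    tail = sum(comb(set_a, x) * comb(universe - set_a, set_b - x)
               for x in range(overlap, min(set_a, set_b) + 1))
    return tail / total


def expected_overlap(universe: int, set_a: int, set_b: int) -> float:
    """Overlap expected by chance alone."""
    return set_a * set_b / universe


# Toy example: 10 genes, two sets of 5; sharing all 5 happens in 1 of C(10, 5) = 252 draws.
print(overlap_pvalue(10, 5, 5, 5), expected_overlap(10, 5, 5))
```

      Comparing the observed overlap with expected_overlap gives a quick sense of scale before the p-value is considered.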

      (2b) I fail to understand how the number of Tn insertions determines "more" or "less" essentiality of a gene particularly with 50-60% saturation. To my understanding, essentiality is a qualitative trait. Either a gene will be essential (based on no Tn insertion despite having the permissive sites), critical (poor representation of Tn insertions at the permissive sites due to growth defect of the strain in the pool), non-essential (expected frequency of insertion) or growth-advantageous (higher representation of Tn insertions at the permissive sites due to growth advantage of the strain in the pool). Hence, authors should avoid quantifying the essentiality of a gene.

      Thank you for the comments on the nature of gene essentiality. We agree that essentiality is a qualitative, not a quantitative, trait. Because the number of Tn insertions reflects a “greater” or “lesser” requirement for a gene, we have revised the manuscript to use the phrase “genetic requirements” instead of “gene essentiality”.

      As mentioned earlier, our method of comparing genetic requirements between strains is the same as that of a previous report, which used duplicate Tn mutant libraries of clinical Mtb strains of different genotypes and triplicate Tn mutant libraries of H37Rv to identify increased genetic requirements in clinical Mtb strains (Carey AF. PLoS Pathog 2018; new Ref#8). Moreover, as described in our response to (1b), the saturation of our Tn mutant libraries after combining replicates is 62-79%: ATCC13950, 67.6%; M001, 72.9%; M003, 63.0%; M018, 62.4%; M019, 74.5%; M.i.27, 76.6%; M.i.198, 68.0%; MOTT64, 77.6%; and M021, 79.9%. That is, we calculated gene essentiality from Tn mutant libraries with 62-79% saturation in each strain. These saturation levels are similar to those in the recent TnSeq analysis by Akusobi et al., in which libraries with 52-80% saturation (“high-density” transposon libraries) were used for HMM and resampling analyses (Supplemental Methods Table 1 [merged saturation] in Akusobi C. mBio. 2025; new Ref#9).

      Thus, we consider our data on the differences in genetic requirements between clinical MAC-PD strains and ATCC13950 to be acceptable.

      [Reference]

      Akusobi, C. et al. Transposon-sequencing across multiple Mycobacterium abscessus isolates reveals significant functional genomic diversity among strains. mBio 16, e0337624 (2025).

      (2c) From Figures 3-4, it seems the authors intend to highlight the insertion frequencies of certain genes in the clinical isolates compared to those in the reference strain to conclude whether a gene has become more critical and its disruption results in the growth defective phenotype (poor representation) in the clinical isolates, or a critical/essential gene has become dispensable in these strains.

      Based on these arguments, I suggest that the authors modify the title of this result to, for example, "Tn insertion reveals differential requirement of genes for in vitro growth of clinical MAC-PD strains" or "Identification of genes differentially required for in vitro growth of clinical MAC-PD strains", as this is precisely the information we gain from this section of the study. Also, it is suggested to re-draft the rationale of this section, as only 4 genes associated with hypoxic pellicle formation were found to exhibit reduced insertion frequencies in the clinical isolates out of a total of 283 genes. Hypoxia-related genes can be highlighted in the next section (see below).

      Thank you for the suggestion to modify the section title and to re-draft the rationale of the section. Following the comment, we have modified the section title to “Partial overlap of the genes showing increased genetic requirements in clinical MAC-PD strains with those required for hypoxic pellicle formation in ATCC13950” (page 9, lines 146-147).

      Following the suggestion, we have revised the rationale of this section as follows: “The sharing of strain-dependent and accessory essential or growth-defect-associated genes with genes required for hypoxic pellicle formation in ATCC13950 prompts us to consider that the profiles of gene essentiality in clinical MAC-PD strains may be associated with the genes required for hypoxic pellicle formation in ATCC13950.” (page 9, lines 151-155)

      The reviewer points out that only 4 genes associated with hypoxic pellicle formation were found to exhibit reduced insertion frequencies in the clinical isolates out of a total of 283 genes. However, to assess what proportion of genes showed increased requirements in clinical MAC-PD strains compared with ATCC13950, the relevant denominator is the 121 genes with increased requirements in clinical MAC-PD strains, excluding the 162 genes that are indispensable for clinical MAC-PD strains. Thus, we described in the manuscript that 4 genes associated with hypoxic pellicle formation exhibited reduced insertion frequencies in the clinical isolates out of the 121 genes having significantly fewer Tn insertions than in ATCC13950 (Fig. 3).

      (3) Result 3 (Page 10-14): Requirement of genes with increased gene essentiality in the clinical MAC-PD strains for mouse lung infection.

      (3a) The title should be modified to "Identification of genes in the clinical MAC-PD strains required for mouse lung infection".

      Following the comment, we have modified the section title to "Identification of genes in the clinical MAC-PD strains required for mouse lung infection" (page 12, lines 201-202).

      (3b) Further, the rationale of this experiment needs to be modified. As mentioned above, up until now the impact of hypoxic pellicle formation genes in the growth of MAC-PD strains remains unconvincing. The rationale of mouse infection experiments could be straightforward- "to identify genes critical for animal infection of the clinical isolates".

      Thank you for the comment on the rationale of the in vivo TnSeq experiment. Following the comment, we have revised the rationale to “The impact of hypoxia on mycobacteria under various ecological circumstances implies that the genes required for pathogenesis of MAC-PD may overlap to some degree with the genes with increased requirements in the clinical MAC-PD strains compared to ATCC13950, and also with the genes required for hypoxic pellicle formation in ATCC13950. To identify genes required for in vivo infection of clinical MAC-PD strains,” in the revised manuscript (page 12, lines 204-210).

      (3c) The authors should avoid using the term "genes with increased essentiality" for the reasons explained above in point #2b.

      Following the comment, we have changed the term to “genes with increased requirements” in the revised manuscript (page 12, line 207).

      (3d) From Tables S8 and S9, I can find 93 genes in Mi198Tn and 74 genes in Mi27Tn for which Tn insertion mutants are under-represented in TnSeq at all time points from Day 1 to Wk 16 in comparison to input. Importantly, excluding results from Day 1 when the infection has just settled, I find 172 and 121 genes in Mi198Tn and Mi27Tn, respectively, under-represented in lungs between Wk 4-16. My suggestion is that authors should focus more on such genes and identify the characteristics of these genes and what fraction belongs to those involved in hypoxic pellicle formation in the reference strain. I am perplexed why authors have categorically ignored other genes and only focused on a set of genes that correspond to ~10-12% of entire differentially abundant mutant pool.

      Thank you for the suggestion to analyze whether the genes whose Tn insertion mutants are under-represented in the infected mouse lungs from Week 4 to Week 16 overlap with the genes involved in hypoxic pellicle formation in the type strain ATCC13950. We found that 74 and 99 genes were under-represented at all time points from Day 1 to Week 16 in lungs infected with M.i.27Tn and M.i.198Tn, respectively. Of these, 21 (28.3%) and 12 (12.1%) genes, respectively, were among the genes required for hypoxic pellicle formation in the type strain. We also found that 121 and 172 genes were under-represented at all time points from Week 4 to Week 16 in lungs infected with M.i.27Tn and M.i.198Tn, respectively. Of these, 21 (23.1%) and 30 (18.0%) genes, respectively, were involved in hypoxic pellicle formation in the type strain. The hypoxic pellicle-associated genes detected in both M.i.27 and M.i.198 at all time points from Day 1 to Week 16 were involved in methionine synthesis or encoded acyl-CoA dehydrogenase, isocitrate lyase, and an MMPL family transporter. In addition, genes encoding multifunctional oxoglutarate decarboxylase/dehydrogenase, proteasome subunits, ABC transporter ATP-binding protein/permease, and lipase chaperone were detected at all time points from Week 4 to Week 16. We have described these results in the Results section (page 14, lines 236-248) and in new Supplementary Tables 12 and 13.
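      The two gene lists above (genes under-represented at every time point from Day 1 to Week 16, and from Week 4 to Week 16) are set intersections across time points, followed by an overlap count against the hypoxic pellicle gene set. A minimal sketch, assuming the significantly under-represented genes at each time point (relative to the input library) have already been determined; the time-point labels (including the intermediate wk8), gene names, and sets are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def depleted_at_all(depleted_by_time: Dict[str, Set[str]], timepoints: Iterable[str]) -> Set[str]:
    """Genes significantly under-represented at every listed time point."""
    sets = [depleted_by_time[t] for t in timepoints]
    return set.intersection(*sets) if sets else set()


def overlap_with(genes: Set[str], reference: Set[str]) -> Tuple[int, float]:
    """Number and percentage of `genes` that also belong to `reference`."""
    shared = genes & reference
    pct = 100.0 * len(shared) / len(genes) if genes else 0.0
    return len(shared), pct


# Hypothetical per-time-point calls and hypoxic pellicle gene set.
depleted = {
    "day1": {"a", "b", "c", "d"},
    "wk4": {"a", "b", "c", "e"},
    "wk8": {"a", "b", "c", "e"},
    "wk16": {"a", "b", "e", "f"},
}
hypoxic_pellicle = {"a", "e", "z"}

all_points = depleted_at_all(depleted, ["day1", "wk4", "wk8", "wk16"])
late_points = depleted_at_all(depleted, ["wk4", "wk8", "wk16"])
print(sorted(all_points), overlap_with(all_points, hypoxic_pellicle))
print(sorted(late_points), overlap_with(late_points, hypoxic_pellicle))
```

      Dropping the Day 1 time point can only enlarge the intersection, which is consistent with the Week 4 to Week 16 lists being longer than the Day 1 to Week 16 lists.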

      For M. intracellulare, conditionally essential genes have not been identified except for those required for hypoxic pellicle formation in ATCC13950, which we reported previously (Tateishi Y. Sci Rep. 2020; new Ref#10). This study is the first to focus on the relationship between differences in genetic requirements among strains and hypoxic adaptation. We found that a certain proportion of the genes required for mouse lung infection overlapped with those required for hypoxic pellicle formation in ATCC13950. We therefore consider it reasonable to focus on the category of genes required for hypoxic pellicle formation when analyzing the TnSeq datasets from mice.

      [Reference]

      Tateishi, Y. et al. Genome-wide identification of essential genes in Mycobacterium intracellulare by transposon sequencing - Implication for metabolic remodeling. Sci Rep 10, 5449 (2020).

      (3e) Page 13, lines 224-227: "Despite the differences in the profiles of the genes required for in vivo infection between strains, these data suggest that increased gene essentiality for hypoxic growth confers advantages for pathogenesis in vivo."

      For the reason described above, I find it a misleading hypothesis that hypoxic growth confers advantages for pathogenesis in vivo. How come only 10-12% of the entire gene sets which include genes of varying functions, can be the sole contributors to bacterial survival in host organelles during infection?

      More importantly, the mouse is not considered a good model for hypoxia, as mouse infection does not lead to the formation of solid granulomas with a hypoxic core. Though I am not convinced by the authors' bias toward hypoxia-related genes, if they aim to investigate the role of such genes by unbiased enrichment of TnSeq mutants, they should have used C3HeJ mice, which are known to form granulomas (Boute et al., 2017 (doi: 10.1186/s13567-017-0477-7)).

      Thank you for the comments on the contribution of genes required for hypoxic growth and on the differences in hypoxia levels between mouse strains. We did not intend to state that the set of genes required for hypoxic growth is the sole contributor to bacterial survival in host organs during infection. As we discussed in the Discussion section, we acknowledge that adaptation to the difference in carbon sources between in vitro culture and in vivo infection (i.e., preferential use of lipid carbon sources in vivo) is involved in the pathogenesis of mycobacterial diseases (Yang. Front Microbiol 2018; new Ref#33, Gouzy. Proc Natl Acad Sci U S A 2021; new Ref#29, Quinonez. mBio 2022; new Ref#40, Pandey. Proc Natl Acad Sci U S A 2008; new Ref#41). We consider that not only the genes required for hypoxic pellicle formation but also strain-dependent/accessory genes involved in other types of metabolism may contribute to in vivo mouse lung infection.

      We have modified the sentence to express our intention clearly as follows: “These in vivo TnSeq data suggest that, despite the differences in the profiles of the genes required for in vivo infection between strains, increased genetic requirements for hypoxic growth contribute in part to pathogenesis in vivo.” (pages 15-16, lines 269-271)

      Performing TnSeq in C3HeJ mice is an interesting idea. Granulomas formed in C3HeJ mice become extremely hypoxic (less than 1% O2, corresponding to “pathological” hypoxia), a level that falls within the detection range of pimonidazole. Our model might not detect the effect of such pathological hypoxia on granuloma formation. However, lesions formed in C57BL/6 mice reach a “physiological” level of hypoxia (5% O2) (McKeown SR. Br J Radiol. 2014), which is the same O2 level at which M. intracellulare forms pellicles. In principle, oxygen levels inside the human body are physiologically hypoxic, and many biological events are investigated experimentally under this condition. Thus, we consider that we were able to observe the effect of physiological hypoxia on M. intracellulare growth both in vitro (hypoxic pellicles) and in vivo (infected C57BL/6 mice).

      [Reference]

      Yang, T. et al. Pan-genomic study of Mycobacterium tuberculosis reflecting the primary/secondary genes, generality/individuality, and the interconversion through copy number variations. Front Microbiol 9, 1886 (2018).

      Gouzy, A., Healy, C., Black, K.A., Rhee, K.Y. & Ehrt, S. Growth of Mycobacterium tuberculosis at acidic pH depends on lipid assimilation and is accompanied by reduced GAPDH activity. Proc Natl Acad Sci U S A 118, e2024571118 (2021).

      Quinonez, C.G. et al. The role of fatty acid metabolism in drug tolerance of Mycobacterium tuberculosis. mBio 13, e0355921 (2022).

      Pandey, A.K. & Sassetti, C.M. Mycobacterial persistence requires the utilization of host cholesterol. Proc Natl Acad Sci U S A 105, 4376-4380 (2008).

      McKeown, S.R. et al. Defining normoxia, physoxia and hypoxia in tumours-implications for treatment response. Br J Radiol 87, 20130676 (2014).

      (3f) An important set of data with the ATCC13950 reference strain is missing here. It is suggested that authors perform this study with the reference strain to identify whether the enrichment of genes is similar across all strains or is specific to the clinical isolates.

      Thank you for the comment on including ATCC13950 as a control strain in the mouse infection experiment. However, we have shown that the bacterial burden of ATCC13950 decreases continuously from 4 weeks of infection and that ATCC13950 is almost completely eliminated from 8 to 16 weeks of infection (BMC Microbiol 2023; new Ref#22). Therefore, it is not possible to perform TnSeq to detect the genes required for persistent infection in mice infected with ATCC13950.

      [Reference]

      Tateishi, Y. et al. Virulence of Mycobacterium intracellulare clinical strains in a mouse model of lung infection - role of neutrophilic inflammation in disease severity. BMC Microbiol 23, 94 (2023).

      (3g) Pages 13-14, lines 228-245: "We have performed a statistical enrichment analysis of gene sets by GSEA...".

      The comparison made here is not clear to me. It seems the authors do compare genes required for the growth of M.i.27 and M.i.198 in mouse lungs with the gene sets required for hypoxic pellicle formation in ATCC13950 together with the gene sets showing increased gene essentiality observed in the clinical MAC-PD strains, and claim that a significant % of genes belong to hypoxia-adaptation pathways. It is factually incorrect because a majority of these might overlap with those found critical for the in vitro survival of MAC-PD strains. It is suggested that authors re-analyze their data by comparing genes required for the growth of M.i.27 and M.i.198 in mouse lungs individually with those involved in hypoxic pellicle formation in ATCC13950, and with the gene sets found critical for in vitro growth, and present accordingly.

      Thank you for the suggestion to re-analyze the gene set enrichment of the genes required for in vivo infection of M.i.27 and M.i.198, individually against the genes involved in hypoxic pellicle formation in ATCC13950 and against those showing increased genetic requirements in clinical MAC-PD strains compared to ATCC13950.

      About 50% (92 and 94 of 181 genes from Day 1 to Week 16 and from Week 4 to Week 16 of infection, respectively) and 40% (70 and 79 of 179 genes from Day 1 to Week 16 and from Week 4 to Week 16 of infection, respectively) of the genes required for hypoxic pellicle formation in ATCC13950 were enriched among the genes required for mouse lung infection in M.i.27 and M.i.198, respectively. In addition, about 42% (54 and 56 of 128 genes from Day 1 to Week 16 and from Week 4 to Week 16 of infection, respectively) and 40% (79 and 68 of 179 genes from Day 1 to Week 16 and from Week 4 to Week 16 of infection, respectively) of the genes showing increased requirements in clinical MAC-PD strains compared to ATCC13950 were enriched among the genes required for mouse lung infection in M.i.27 and M.i.198, respectively.

      The tables and graphs of GSEA results are shown in Supplementary Figs. 5, 6.

      These data indicate that 40-50% of the genes required for in vitro hypoxic pellicle formation and of the strain-dependent/accessory essential genes are individually and significantly enriched among the genes required for in vivo bacterial growth. We have added the results of the re-analyzed GSEA to the Results (pages 16-17, lines 287-290), with details shown in Supplementary Figs. 5, 6 and Supplementary Tables 15, 16.
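      For readers unfamiliar with GSEA, its core statistic is a weighted running-sum enrichment score (Subramanian et al., 2005). The sketch below computes that score and the leading-edge subset for one gene set; it assumes genes are pre-ranked by a hypothetical depletion metric (larger means more depleted in mouse lungs) and omits the permutation-based significance testing and normalization that a full GSEA run performs.

```python
from typing import List, Sequence, Set, Tuple


def enrichment_score(ranked: Sequence[Tuple[str, float]], gene_set: Set[str],
                     weight: float = 1.0) -> Tuple[float, List[str]]:
    """Weighted running-sum enrichment score and leading-edge genes.

    `ranked` holds (gene, metric) pairs sorted from most to least depleted.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n, n_hit = len(ranked), sum(hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hit_norm = sum(abs(m) ** weight for (_, m), h in zip(ranked, hits) if h)
    if hit_norm == 0:
        raise ValueError("gene set members all have a zero metric")
    miss_step = 1.0 / (n - n_hit)

    running, best, best_idx = 0.0, 0.0, 0
    for i, ((_, m), h) in enumerate(zip(ranked, hits)):
        # Step up by the weighted metric for a member, down by a constant otherwise.
        running += abs(m) ** weight / hit_norm if h else -miss_step
        if abs(running) > abs(best):
            best, best_idx = running, i

    # Leading edge: members at or before the peak (positive ES) or at or after it (negative ES).
    span = range(best_idx + 1) if best >= 0 else range(best_idx, n)
    leading = [ranked[i][0] for i in span if hits[i]]
    return best, leading


ranked = [("g1", 3.0), ("g2", 2.0), ("g3", 1.0), ("g4", -1.0), ("g5", -2.0)]
es, leading_edge = enrichment_score(ranked, {"g1", "g2"})
print(es, leading_edge)
```

      In this toy ranking both set members sit at the top of the list, so the score peaks at 1.0 and both belong to the leading edge; in a real analysis, the leading-edge genes are the ones that would be reported as "enriched".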

      (3h) Since authors have used Tnseq of pooled mutants, which often yields misleading information, it is important to validate some of their findings upon mouse infection with individual mutants that yield prominent as well as baseline reduction at different time points. In the absence of validation, it remains a mere speculation of the role of these genes in the infection of these strains to animals.

      Thank you for the suggestion to validate the effect of the TnSeq-hit genes on in vivo survival. We acknowledge the importance of validating TnSeq-hit genes with knockout mutants. We have recently succeeded in constructing vectors for generating knockout strains of M. intracellulare (Tateishi Y. Microbiol Immunol. 2024), and we plan to perform infection experiments with knockout mutants constructed using this system.

      [Reference]

      Tateishi, Y. et al. Construction of knockout mutants in Mycobacterium intracellulare ATCC13950 strain using a thermosensitive plasmid containing negative selection marker rpsL+. Microbiol Immunol 68, 339-347 (2024).

      (4) Result 4 (Page 14-15): Preferential hypoxic adaptation of clinical MAC-PD strains evaluated with bacterial growth kinetics.

      (4a) "The metabolic remodeling, such as the increased gene essentiality of gluconeogenesis and the type VII secretion system..". As stated above, the essentiality of a gene, being a qualitative trait, should not be presented in quantitative terms. The authors should re-phrase this statement.

      Following the comment, we have revised the phrase to “The metabolic remodeling, such as the increased genetic requirements of gluconeogenesis and the type VII secretion system” (page 17, lines 296-297).

      (4b) "overlap of the genes required for mouse lung infection and those required for hypoxic pellicle formation involved by conferring these metabolic pathways..". There is a syntax error in this statement and needs revision.

      Following the comment, we have revised the phrase to “overlap of the genes required for mouse lung infection with those required for hypoxic pellicle formation, which are involved in these metabolic pathways” (page 17, lines 297-299).

      (4c) The altered requirement of genes in different clinical strains for survival provides only circumstantial evidence of metabolic remodeling. Authors are suggested to perform metabolic profiling of representative clinical and reference strains, as it is important to examine whether these bacteria indeed undergo metabolic shift.

      Thank you for the comment on metabolic profiling of representative clinical and reference strains. We previously published TnSeq results for ATCC13950, and the current data were organized in light of those findings (Fig. 4 in Tateishi Y. Sci Rep 2020; new Ref#10). The priority of the current study was to elucidate the differences and diversity in genetic requirements between clinical MAC-PD strains and ATCC13950. We consider that even circumstantial evidence of metabolic remodeling from TnSeq is of value, because it provides a strong rationale for subsequent studies, including metabolomic analysis.

      [Reference]

      Tateishi, Y. et al. Genome-wide identification of essential genes in Mycobacterium intracellulare by transposon sequencing - Implication for metabolic remodeling. Sci Rep 10, 5449 (2020).

      (5) Result 5 (Page 16-18): Effects of knockdown of universal and accessory/strain-dependent essential or growth-defect-associate genes in clinical MAC-PD strains.

      (5a) Lines 273-277: The rationale of using CRISPRi should be correctly presented to evaluate the effect of individual genes' suppression on the downstream phenotype and not to establish the CRISPRi silencing tool in MAC.

      Thank you for the comment on the rationale of the CRISPR-i section. Following the comment, we have modified the sentence as follows: “With an intention to evaluate the effect of suppressing TnSeq-hit genes on bacterial growth.” (page 19, lines 333-334 in the revised manuscript).

      (5b) Line 278: pRH2052/pRH2521 are the plasmids and not the CRISPRi system.

      Following the comment, we have corrected the phrase as “pRH2052/pRH2521 clustered regularly interspaced short palindromic repeats interference (CRISPR-i) plasmids.” (page 19, lines 334-335 in the revised manuscript).

      (5c) Line 280: Other pioneering studies on the use of CRISPRi for gene silencing in mycobacteria (Choudhary et al., Nat Commun; Rock et al., Nat Microbiol) should also be cited.

      Thank you for the comment on adding the reference papers on CRISPR-i in mycobacteria. We have added the two suggested papers in the revised manuscript as new Ref #30 and #31. (page 19, line 336).

      (5d) Lines 282-283: It is not clear why M001 and MOTT64 could not be transformed. Did the authors use any control plasmid to evaluate the transformation efficiency of these strains?

      Thank you for the comment on the failure of transformation in M001 and MOTT64.

      Following the comment, we evaluated the transformation efficiency of the nine M. intracellulare strains used in this study, using the E. coli-mycobacteria shuttle vector pSO246KM-Prhsp65-luc, which expresses firefly luciferase, as a control plasmid (Aoki K. J Biol Chem 2004). To obtain transformed colonies, we used 7H10/OADC agar plates containing the same concentration of kanamycin as that used for preparing the Tn mutant libraries and for obtaining CRISPR-i knockdown strains.

      No colonies grew on agar plates for MOTT64 after electroporation of the pSO246KM-Prhsp65-luc plasmid. In most of the remaining strains, transformed colonies had fully emerged by day 10 of culture after electroporation. However, M001 required twice as long for transformed colonies to emerge; by day 21, the number of M001 colonies had finally become comparable to that of the other strains. We measured the luciferase activity of 6-12 colonies of each strain except MOTT64 and confirmed transformation by the higher luciferase activity of colonies obtained after electroporation of the plasmid than of those without electroporation.

      The inability to obtain CRISPR-i transformants of MOTT64 may be due to its extremely low efficiency of acquiring foreign DNA. The inability to obtain CRISPR-i transformants of M001 may be due to a lower tolerance of the stress caused by plasmid transformation compared with other M. intracellulare strains. For M001, the pSO246KM-Prhsp65-luc plasmid may cause tolerable stress, resulting in the delayed emergence of transformed colonies, whereas the CRISPR-i plasmids may cause greater stress that M001 cannot tolerate.

      Author response table 1.

      Author response image 3.

      Luciferase activities before and after transformation with the pSO246KM-Prhsp65-luc plasmid. Fifty microliters of culture were mixed with 50 µL of assay reagent (Luciferase assay system E1500, Promega), and luciferase activity was measured with a luminometer (FilterMax F5, Molecular Devices). Data are shown as mean ± SD of 6-12 colonies.

      [Reference]

      Aoki, K. et al. Extracellular mycobacterial DNA-binding protein 1 participates in Mycobacterium-lung epithelial cell interaction through hyaluronic acid. J Biol Chem 279, 39798-39806 (2004).

      (5e) Lines 283-286: "To confirm the gene essentiality detected with the HMM analysis, we evaluated the consequent growth inhibition in the knockdown strains of representative universal essential or growth-defect-associated genes, including glcB, inhA, gyrB, and embB.." It is not clear what level of suppression of these genes was achieved in the respective KD strains. Authors should also report the level of suppression of these genes by qRT-PCR.

      Thank you for the comment on the suppression of gene expression in knockdown strains of universal essential genes. Following the comment, we have evaluated it by qRT-PCR and observed comparable knockdown efficiency for universally essential genes and strain-specific/accessory essential genes (new Supplementary Fig. 9). Overall, gene expression in the knockdown strains was suppressed to 20-70% of that in vector control strains, which do not express sgRNA.
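      Relative expression values of this kind are commonly computed with the 2^-ΔΔCt method, normalizing to a reference gene (sigA, as in the reviewer's comment) and to the vector control. The response does not state the exact calculation used, so the sketch below is an assumption; it also assumes near-100% amplification efficiency for both primer pairs, and the Ct values are hypothetical.

```python
def relative_expression(ct_target_kd: float, ct_ref_kd: float,
                        ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Knockdown expression as a fraction of the vector control, by 2^-ddCt.

    Assumes ~100% amplification efficiency for target and reference (e.g. sigA) primers.
    """
    delta_kd = ct_target_kd - ct_ref_kd        # normalize knockdown to the reference gene
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # normalize control to the reference gene
    return 2.0 ** -(delta_kd - delta_ctrl)


# Hypothetical Ct values: the target crosses threshold one cycle later in the knockdown.
print(relative_expression(25.0, 20.0, 24.0, 20.0))  # 0.5
```

      A value of 0.5 corresponds to expression suppressed to 50% of the control, which would fall inside the 20-70% range reported above.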

      We have added the qRT-PCR data for knockdown strains of the universal essential genes glcB, inhA, gyrB, and embB (new Supplementary Fig. 9) and revised the Results and Discussion accordingly (page 21, lines 367-376; page 28, lines 490-497).

      (5f) Lines 293-: I am unable to establish any correlation between the growth of the knockdown strains and the Tn insertion reads in the respective genes. For instance, pckA exhibits reduced Tn insertion reads in almost all the strains except M.i.27, but the effect of its KD on growth is seen only in M.i.198 and M003; glpX exhibits reduced Tn insertion reads in M003, M019, and M021, but the effect of its KD on growth is seen only in M003; csd exhibits reduced Tn insertion reads in M.i.198, M003, and M019, but the effect of its KD on growth is seen only in M.i.198 and M003. The authors' arguments that these contradictory phenotypes are due to difficulties in the effective operation of genetically modified systems using foreign genes from different bacterial species in MAC-PD strains (Lines 310-312), or that the desired effect on growth could not be observed due to the inability of CRISPRi to yield >99% suppression (Line 314), are not valid justifications. Indeed, a close look at the RT-PCR data (Figure S5) reveals that pckA levels are ~0.22, 0.5, 0.2, 0.22, 0.2, 0.5, and 0.3 fold relative to sigA in M.i.198, M.i.27, ATCC13950, M018, M019, M003 and M021, respectively, but the effect of its suppression on growth by CRISPRi is seen only in M.i.198 and M003. Secondly, >99% suppression is not a universal prerequisite for all genes to show a growth defect (as might be the case with the glcB, inhA, gyrB, and embB genes in this study). Hence, it remains unclear why contrasting results are obtained for most of the genes by TnSeq and CRISPRi.

      Thank you for the comments on the inconsistent results between TnSeq and CRISPR-i based knockdown. We acknowledge that some inconsistencies were observed, particularly among strain-dependent/accessory essential or growth-defect associated genes. By contrast, TnSeq and CRISPR-i based knockdown results were consistent for universal essential genes. Having measured the suppression levels of gene expression in the knockdown strains of universal essential genes, we recognize that low knockdown efficiency does not explain the discrepancy between the TnSeq and CRISPR-i results, because knockdown efficiencies were comparable between strain-dependent/accessory essential genes and universally essential genes.

      Although the current study alone cannot establish the mechanism, we consider that such inconsistencies between TnSeq and CRISPR-i based knockdown phenotypes may be related to the recently described bypass of gene essentiality, which is characteristically observed in strain-dependent/accessory essential or growth-defect-associated genes. According to Rosconi et al. (Nat Microbiol. 2022; new Ref#14), who reported ‘forced-evolution experiments’ in 36 clinical Streptococcus pneumoniae strains, gene essentiality can be bypassed by several mechanisms, including the composition of the accessory genome and pathway rewiring. They successfully recovered knockout mutants from transformation experiments targeting strain-specific/accessory essential genes such as cytidine monophosphate kinase, the folate pathway enzyme formate tetrahydrofolate ligase, and the undecaprenyl phosphate biosynthesis enzyme farnesyl-diphosphate synthase. Bypass of gene essentiality was suggested by the observation of suppressor mutations and synthetic lethality in knockout strains. By contrast, universal essential genes were reported to fulfill three criteria: high conservation within and often across species, limited genetic diversity, and high, stable expression levels. Consequently, universal essential genes are considered to be rigid, largely immutable components key to an organism’s survival.

      We consider that the same applies to our study of NTM, because NTM have a pan-genome. Knockdown of universal essential genes resulted in clear growth suppression, whereas knockdown of strain-dependent/accessory essential genes did not consistently suppress growth. We consider that bypass of gene essentiality can explain the inconsistent effect of silencing strain-dependent/accessory genes on bacterial growth.

      We have added the above-mentioned description in the Discussion (pages 28-29, lines 497-519).

      [Reference]

      Rosconi, F. et al. A bacterial pan-genome makes gene essentiality strain-dependent and evolvable. Nat Microbiol 7, 1580–1592 (2022).

      Minor Comments:

      (1) The authors should mention the cut-off of fold-change for all the experiments in the methods section.

      Thank you for the comment on the cut-off of fold-change. For the fold-change analysis, we set the cut-off as an adjusted P-value < 0.05, and we have added this description to the Methods section (page 41, lines 724-725).

      (2) Figure 7 legend (Lines 888-889): "Data are shown as the means ± SD of triplicate experiments. Data from one experiment representative of three independent experiments (N = 3) are shown."

      Figure S3 legend: Data on the growth curves are the means of triplicate experiments. Data from one experiment representative of three independent experiments (N = 3) are shown.

      Figure S4 legend: Data are shown as the means ± SD of triplicate experiments. Data from one experiment representative of two independent experiments (N = 2) are shown.

      Figure S5 legend: Gene expression data are the means ± SD of triplicate experiments. Data from one experiment representative of two independent experiments (N = 2) are shown.

      These statements need clarification. Were multiple independent experiments (biological repeats) performed, each with 2-3 technical replicates, and do the data shown represent one of these biological repeats?

      Thank you for the comments on the number of experiments performed and the number of replicates. We performed two or three independent experiments, each with 2-3 technical replicates. The data shown represent one of these independent experiments.

      (3) Figure 7b: Statistics are missing in the bar graph for growth rate under aerobic conditions.

      Thank you for the comment on the statistics of the data regarding growth rate under aerobic conditions. We have added the statistics in the new Fig. 7c.

      (4) The authors should check the y-axis in Figure 7b, as it is not clear whether bacteria indeed show a growth rate of 1-3 CFUs/day.

      Thank you for the comment on the y-axis in Figure 7b. We have corrected the y-axis label to “log10[CFUs]/day” in the new Fig. 7c. Additionally, we have corrected the y-axis label in the new Fig. 7a and added the description “Data are represented as CFUs in 4 μl sample at each timepoint.” to the Fig. 7a legend.

      Reviewer #3 (Recommendations For The Authors):

      (1) It's notable that strains M001 and MOTT64 failed to undergo a transformation, while seven other strains did. Given that M001, MOTT64, and M019 belong to the same phylogenetic clade, it raises questions about why particular strains within this clade showed different transformation outcomes. It might be valuable for them to discuss this discrepancy in their study.

      Thank you for the comment on the difference in transformation capacity between strains belonging to the same genomic subgroup. Although the mechanism directly determining competence for foreign DNA has not been elucidated in M. intracellulare or other pathogenic NTM species, studies in other bacteria indicate that foreign DNA is more difficult to introduce into clinical strains than into laboratory strains. As reported in Staphylococcus aureus (Corvaglia AR. PNAS. 2010; new Ref#55), some clinical strains have systems that eliminate foreign nucleic acids, such as a type III-like restriction endonuclease. As reported in Gram-negative bacteria (Qin J. Sci Rep. 2022; new Ref#56), cell surface structures may differ between strains, so that polymyxin B nonapeptide, which targets the cell membrane, was needed to transform clinical isolates. The efficiency of eliminating foreign DNA may therefore depend on various strain-specific factors, including restriction endonucleases, native CRISPR interference systems, and cell wall structures, rather than on a single genotypic factor.

      We have added the description on the difference of capability in transformation in the Discussion. (page 31, lines 546-558)

      [References]

      Corvaglia, A.R., François, P., Hernandez, D., Perron, K., Linder, P. & Schrenzel, J. A type III-like restriction endonuclease functions as a major barrier to horizontal gene transfer in clinical Staphylococcus aureus strains. Proc Natl Acad Sci U S A 107, 11954-11958 (2010).

      Qin, J., Hong, Y., Pullela, K., Morona, R., Henderson, I.R. & Totsika, M. A method for increasing electroporation competence of Gram-negative clinical isolates by polymyxin B nonapeptide. Sci Rep 12, 11629 (2022).

      (2) The authors should consider specifying M. intracellulare in their title.

      Thank you for the comment on the manuscript title. Following the comments from all Reviewers, we have modified the title as “Functional genomics reveals strain-specific genetic requirements conferring hypoxic growth in Mycobacterium intracellulare”.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a valuable finding on the immunophenotypes of cancer treatment-related pneumonitis. The evidence supporting the claims of the authors is solid, although the inclusion of controls, as suggested by one of the reviewers, strengthened the study. The work will be of interest to cancer immunologists.

      Response: We are thankful for the editor's recognition of the contribution our study makes to understanding the immunophenotypes associated with cancer treatment-related pneumonitis. We agree that the inclusion of control data is pivotal for benchmarking biomarkers. While our initial study design was constrained by the limited availability of BALF from healthy individuals in clinical settings, we addressed this limitation by incorporating scRNA-seq data from healthy control and COVID-19 BALF cells from the GSE145926 dataset. This analysis provides a baseline for comparison, revealing that CD16 is expressed in a minority of T cells in healthy BALF, specifically 1.0% of CD4+ T cells and 1.6% of CD8+ T cells. These data, now included as Figures 6H and 6I in our manuscript, provide context for the significant increase in CD16-expressing T cells observed in patients with PCP and thereby strengthen our study's conclusions.

      Author response image 1.

      Reviewer #1 (Recommendations For The Authors):

      Many thanks for giving me the opportunity to review your paper. I really enjoyed the way you carried out this work - for example, your use of a wide panel of markers and the use of two analytical methods - you have clearly given great thought to bias avoidance. I also greatly appreciated your paragraph on the limitations, as there are several, but you do not 'over-sell' your conclusions so there is no issue here for me.

      To improve the piece, there are a few typos (eg 318 - specific to alpha-myosin) and I was briefly confused about the highlighted clusters in Figure 4. Perhaps mention why they are highlighted when they first appear in 4D instead of E?

      Response: We have corrected the typos, and we have rearranged the sequence of Figures 3E and 3F, as well as 4D and 4E, to ensure a logical flow. Citrus-generated violin plots are now presented prior to the heatmap of the clusters, which better illustrates the progression of our analysis and the derivation of the clusters.

      In terms of improvements to the data, obviously it would have been ideal if you had had some sort of healthy control as a point of reference for all cohorts, but working in the field I understand the difficulties in getting healthy BAL. It would be worth your while however trying to find more supportive data in the literature in general. There are studies which assess various immune markers in healthy BAL eg https://journal-inflammation.biomedcentral.com/articles/10.1186/1476-9255-11-9. and so I think it is worth looking wrt the main findings. For example, are CD16+ T cells seen in healthy BAL or any other conditions (at present the COVID study is being over-relied on)? Could these cells be gamma deltas? (gamma deltas frequently express CD8 and CD16, and can switch to APC like phenotypes).

      Response: We are grateful for the reviewer's consideration of the practical challenges associated with collecting BALF from healthy individuals. As an alternative, we have supplemented our analysis with single-cell RNA sequencing data from BALF cells of healthy controls reported in the existing literature (Nature Medicine 2020; 26: 842-844). We accessed GSE145926 and downloaded data on BALF cells from healthy controls (n=3) and patients with severe COVID-19 (n=6). The filtered gene-barcode matrix was first normalized using the ‘NormalizeData’ method in Seurat v.4 with default parameters. The top 2,000 variable genes were then identified using the ‘vst’ method of the Seurat FindVariableFeatures function, after which PCA and UMAP were performed. T cells were identified as CD2 > 1 and CD3E > 1, and FCGR3A expression was explored using an expression threshold of 0.5. Violin plots and bar plots were generated with the ggplot function.
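      The gating step described above can be sketched as follows. This is a minimal Python/NumPy illustration, not the authors' Seurat (R) code: it assumes a cells × genes matrix, mimics Seurat's default ‘LogNormalize’ (counts divided by each cell's total, multiplied by 10,000, then natural log of 1 + x), and applies the stated thresholds (CD2 > 1 and CD3E > 1 for T cells, FCGR3A > 0.5 for CD16). The function names and the use of strict ">" comparisons are assumptions; variable-gene selection, PCA, and UMAP are omitted because they do not affect the threshold-based percentages.

```python
import numpy as np


def log_normalize(counts, scale_factor=10_000):
    """Per-cell log-normalization in the style of Seurat's default LogNormalize.

    Each row (cell) is divided by its total counts, multiplied by scale_factor,
    and transformed with log1p. Assumes every cell has a non-zero total.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / totals * scale_factor)


def cd16_fraction_of_t_cells(expr, genes, t_thresh=1.0, cd16_thresh=0.5):
    """Gate T cells (CD2 and CD3E above t_thresh), then return the fraction of
    those T cells with FCGR3A (CD16) above cd16_thresh, plus the T cell count."""
    idx = {g: i for i, g in enumerate(genes)}
    is_t = (expr[:, idx["CD2"]] > t_thresh) & (expr[:, idx["CD3E"]] > t_thresh)
    n_t = int(is_t.sum())
    if n_t == 0:
        return 0.0, 0  # no T cells passed the gate
    cd16_pos = expr[is_t, idx["FCGR3A"]] > cd16_thresh
    return float(cd16_pos.mean()), n_t
```

      For example, with a toy normalized matrix of four cells in which three pass the CD2/CD3E gate and one of those has FCGR3A above 0.5, cd16_fraction_of_t_cells returns a fraction of 1/3 and a T cell count of 3. Applying the same gate separately to CD4+ and CD8+ subsets would yield per-subset percentages like those reported above.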

      Regarding the pivotal finding of increased CD16-expressing T cells in patients with PCP, mining of the scRNA-seq data indicates that CD16 is expressed by a minority of T cells in healthy BALF: 1.0% of CD4+ T cells and 1.6% of CD8+ T cells. These figures, now incorporated into our revised manuscript as Figures 6H and 6I, substantiate our findings. These cells could be gamma delta T cells, but we could not confirm this with the limited data available; we will investigate this possibility in future studies. The main text has been updated to reflect these findings.

      Author response image 2.

      I would agree with your approach of not going down the transcript route, so just focus on protein expression.

      I think you need to mention more about the impact of ICI on PD1 expression - in the methods you lose one approach owing to low T cell expression (132) but in the discussion you mention ICI induced high expression (311) as previously reported. This apparent contradiction needs an explanation.

      Response: We acknowledge the need for clarification regarding the impact of ICIs on PD-1 expression. As noted in the Methods section, PD-1 expression on T cells was poorly detected in patients treated with nivolumab because the PD-1 detection antibody EH12.2 competes with nivolumab. As reported by Suzuki et al. (International Immunology 2020; 32: 547-557), T cells from patients with ICI-induced ILD, including those treated with nivolumab, exhibit upregulated PD-1 expression when detected with the PD-1 antibody clone MIH4. Conversely, as outlined by Yanagihara et al. (BBRC 2020; 527: 213-217), the 155Gd-conjugated PD-1 detection antibody clone EH12.2 (#3155009B) used in our study cannot detect PD-1 in patients under nivolumab treatment because of competitive inhibition. The lack of a metal-conjugated PD-1 antibody of the MIH4 clone was a limitation of our study. Ideally, we would have conjugated the MIH4 antibody with 155Gd for our analysis, a refinement we aim to incorporate in future research. We have now included this discussion in our manuscript to reconcile the methodological limitation with the high PD-1 expression induced by ICIs reported in the literature. This addition will guide readers through the nuances of antibody selection and its implications for detecting PD-1 expression in the context of ICI treatment.

      Finally, since you have the severity data, it would be good to assess all the significantly different clusters against this metric, as you have done for CD16+ T cells. Not only may this reveal more wrt the impact of other immune populations, but it'll also give a point of reference for the CD16+ T cell data.

      Response: Thank you for the suggestion to assess all significantly different clusters against the disease severity metric. We have expanded our analysis to include a thorough correlation study between the disease severity and intensity of various T-cell markers. Notably, we observed that intensity of CCR7 expression correlates with the disease severity. Although the precise biological significance of this correlation remains to be elucidated, it may suggest a role for CCR7+ T cells in the pathogenesis or progression of the disease. We have considered the potential implications of this finding and included it as Supplementary Figure 5. We have also discussed this observation in the discussion section.

      Author response image 3.

      Overall though I think this is a really nice study, with a potentially very significant finding in linking CD16+ T cells with severity. Congratulations.

      Response: We would like to thank the reviewer for the heartfelt comments on our manuscript.

      Reviewer #2 (Recommendations For The Authors):

      General:

      1) The fact that this is a retrospective study should be indicated earlier in the paper.

      Response: We now mention the retrospective nature of the study in the Methods section as follows: In this retrospective study, patients who were newly diagnosed with PCP, DI-ILD, and ICI-ILD and had undergone BALF collection at Kyushu University Hospital from January 2017 to April 2022 were included. The retrospective study was approved by the Ethics Committee of Kyushu University Hospital (reference number 22117-00).

      2) tSNE and UMAP are dimensionality reduction techniques that do not cluster the cells; the authors should specify what clustering algorithm was used subsequently (e.g. FlowSOM).

      Response: Clusters were determined manually on the basis of their marker expression patterns.

      3) With regards to the role of CD16 in a potential exacerbated cytotoxicity in the fatal PCP case, the authors could measure the levels of C3a related proteins in patient serum to link to a common immunopathogenic pathway with COVID.

      Response: We did not collect serum from the patients in this study as our research protocol was approved by the Ethics committee for the use of BALF only. However, we agree with your assessment that the measurement of serum C3a levels would be informative. In future studies, we will incorporate the measurement of serum C3a levels to provide more comprehensive insights into the impact of C3a on immune function. Thank you for your valuable feedback and for helping us to improve the quality of our research.

      Line-specific:

      101 The authors should provide some information on how the cryopreservation of the BALF was carried out.

      Response: Upon collection, BALF samples were immediately centrifuged at 300 g for 5 minutes to pellet the cells. The resultant cell pellets were then resuspended in Cellbanker 1 cryopreservation solution (Takara, catalog #210409). This suspension was aliquoted into cryovials and gradually frozen to –80°C using a controlled-rate freezing method to ensure cell viability. The samples were stored at –80°C until required for experimental analysis. We have added this information to the Methods section.

      Fig 3B: It would be very helpful if the authors could add a supplementary figure with marker expression on the UMAP projection.

      Response: We have added Supplementary Figure 4 with marker expression on the UMAP projection in Figure 3B.

      Fig 4A: Same as Fig 3B

      Response: We have added Supplementary Figure 5 with marker expression on the UMAP projection in Figure 4A.

      Fig 5B: Same as Fig 3B

      Response: We have added Supplementary Figure 6 with marker expression on the tSNE projection in Figure 5B.

      266 Authors should state if the data is not shown with regards to differences in myeloid cell fractions

      430 Marker intensity is not shown in panel D

      Re: Corrected as follows: “Citrus network tree visualizing the hierarchical relationship of each marker between identified T cell ~”

      446 The legend says patients have IPF, CTD-ILD, sarcoidosis but the figure shows PCP, DI-ILD, ICI-ILD.

      Re: Corrected.

      451 What do the authors mean in "Graphical plots represent individual samples"? Panel B is a dot plot of all samples.

      Response: Corrected as “Dot plots represent ~”.

      472 What do the authors mean in "Graphical plots represent individual samples"? Panel C is a dot plot of all samples.

      Response: Corrected as “Dot plots represent ~”.

      Reviewer #3 (Recommendations For The Authors):

      An important thing is to add comparisons against healthy donors, at least. A common baseline is needed to firmly establish any biomarkers.

      Response: We acknowledge the reviewer's concern regarding the comparison with healthy donors. Although our study did not initially include BALF collection from healthy controls due to the constraints of clinical practice, we recognize the importance of a control baseline to validate biomarkers. To address this, we have integrated scRNA-seq data from healthy control BALF cells available in public datasets (Nature Medicine 2020; 26: 842-844), accessed from GSE145926. This dataset includes BALF cells from healthy controls (n=3) alongside severe COVID-19 patients (n=6). Data mining confirmed that CD16 is expressed in a minority of T cells in healthy BALF: 1.0% of CD4+ T cells and 1.6% of CD8+ T cells. We have included this comparative data in our manuscript as Figures 6H and 6I to provide context for the observed increase in CD16-expressing T cells in PCP patients, which substantiates our findings.

      Author response image 4.

      Data analysis needs to go deeper. There are several other tools on Cytobank alone that would allow a more quantitative analysis of the data. Fold changes in marker expressions would be very important as measurements of phenotypic changes.

      Response: We thank the reviewer for their constructive feedback on the depth of our data analysis. We acknowledge the value of a more quantitative approach, including the use of fold change measurements to assess phenotypic alterations, and recognize the potential insights such tools on Cytobank could provide. Due to the scope and limited space of the current study, we have focused our analysis on the most pertinent findings relevant to our research questions. We believe the present analysis serves the immediate objectives of this study. However, we agree that further quantitative analysis would enhance the understanding of the data. We have expanded our analysis to include a thorough correlation study between the disease severity of PCP and intensity of various T-cell markers. Notably, we observed that intensity of CCR7 expression correlates with the disease severity of PCP. Although the precise biological significance of this correlation remains to be elucidated, it may suggest a role for CCR7+ T cells in the pathogenesis or progression of the disease. We have considered the potential implications of this finding and included it as Supplementary Figure 5. We have also discussed this observation in the discussion section. We aim to consider these approaches in future work to build upon the foundation laid by this study. Your suggestions are invaluable and will be kept at the forefront as we plan subsequent research phases.

      Author response image 5.

      Reviewer #1 (Public Review):

      Cytotoxic agents and immune checkpoint inhibitors are the most commonly used and efficacious treatments for lung cancers. However, their use brings two significant pulmonary side-effects; namely Pneumocystis jirovecii infection and resultant pneumonia (PCP), and interstitial lung disease (ILD). To observe the potential immunological drivers of these adverse events, Yanagihara et al. analysed and compared cells present in the bronchoalveolar lavage of three patient groups (PCP, cytotoxic drug-induced ILD [DI-ILD], and ICI-associated ILD [ICI-ILD]) using mass cytometry (64 markers). In PCP, they observed an expansion of the CD16+ T cell population, with the highest CD16+ T proportion (97.5%) in a fatal case, whilst in ICI-ILD, they found an increase in CD57+ CD8+ T cells expressing immune checkpoints (TIGIT+ LAG3+ TIM-3+ PD-1+), FCRL5+ B cells, and CCR2+ CCR5+ CD14+ monocytes. Given the fatal case, the authors also assessed for, and found, a correlation between CD16+ T cells and disease severity in PCP, postulating that this may be owing to endothelial destruction. Although n numbers are relatively small (n=7-9 in each cohort; common numbers for CyTOF papers), the authors use a wide panel (n=65) and two clustering methodologies, giving greater strength to the conclusions. The differential populations discovered using one or two of the analytical methods are robust: whole population shifts with clear and significant clustering. These data are an excellent resource for clinical disease specialists and pan-disease immunologists, with a broad and engaging contextual discussion about what they could mean.

      Strengths:

      • The differences in immune cells in BAL in these specific patient subgroups is relatively unexplored.

      • This is an observational study, with no starting hypothesis being tested.

      • Two analytical methods are used to cluster the data.

      • A relatively wide panel was used (64 markers), with particular strength in the alpha beta T cells and B cells.

      • Relevant biomarkers, beta-D-glucan and KL-6, were also analysed.

      • Appropriate statistics were used throughout.

      • Numbers are low (7 cases of PCP, 9 of DI-ILD, and 9 of ICI-ILD) but these are difficult samples to collect and so in relative terms, and considering the use of CyTOF, these are good numbers.

      • Beta-D-glucan shows potential as a biomarker for PCP (as previously reported) whilst KL-6 shows potential as a biomarker for ICI-ILD (not reported before). Interestingly, KL-6 was not seen to be increased in DI-ILD patients.

      • Despite the relatively low n numbers and lack of matching there are some clear differentials. The CD4/CD8+CD16+HLA-DR+CXCR3+CD14- T cell result is striking - up in PCP (with EM CD4s significantly down) - whilst the CD8 EMRA population is clear in ICI-ILD and 'non-exhausted' CD4s, with lower numbers of EMRA CD8s in DI-ILD.

      • The authors identify 17/31 significantly differentiated clusters of myeloid cells, eg CD11bhi CD11chi CD64+ CD206+ alveolar macrophages with HLA-DRhi in PCP.

      • With respect to B cells, the authors found that FCRL5+ B cells were more abundant in patients with ICI-ILD compared to those with PCP and DI-ILD, suggesting these FCRL5+ B cells may have a role in irAE.

      • One patient's extreme CD16+ T cell proportion (97.5% positive) and death led the authors to consider CD16+ T cells as an indicator of disease severity in PCP. This was then tested and found to be correct.

      • The authors discuss the results in the context of the literature, leading them to suggest that CD16+ T cells may target endothelial cells and to wonder whether anti-complement therapy may be efficacious in PCP.

      • Great discussion on auto-reactive T cell clones where the authors suggest that in ICI-ILD CD8s may react against healthy lung, driving ILD.

      • An observation of CXCR3 in different CD8 populations in ICI-ILD and PCP led the authors to hypothesise on the chemoattractants in the microenvironment.

      • Excellent point suggesting CD57 may not always be a marker of senescence on T cells - reflective of growing change within the community.

      • Well considered suggestion that FCRL5+ B cells may be involved in ICI-ILD driven autoimmunity.

      • The authors discuss the main weaknesses in the discussion and stress that the findings detailed in the paper "demonstrate a correlation rather than proof of causation".

      • Figures and legends are clear and pleasing to the eye.

      Weaknesses:

      • This is an observational study, with no starting hypothesis being tested.

      • Only patients who were able to have a lavage taken have been recruited.

      • One set of analyses wasn't carried out for one subgroup (ICI-ILD) as PD1 expression was negative owing to the use of nivolumab.

      • Some immune cell subsets wouldn't be picked up with the markers and gating strategies used; e.g. NK cells.

      • Some immune cells would be disproportionately damaged by the storage, thawing and preparation of the samples; e.g. granulocytes.

      • Numbers are low (7 cases of PCP, 9 of DI-ILD, and 9 of ICI-ILD), sex, age and adverse event matching wasn't performed, and treatment regimen are varied and 'suspected' (suggesting incomplete clinical data) - but these are difficult samples to collect. These numbers drop further for some analyses e.g. T cell clustering owing to factors such as low cell number.

      • The disease comparisons are with each other, there is no healthy control.

      • Samples are taken at one time point.

      • The discussion of probably the standout result, the CD16+ T cells in PCP, relies on two papers, leading to a slightly skewed emphasis on one paper on CD16+ cells in COVID. There are other papers out there that have observed CD16+ T cells in other conditions. It is also worth bearing in mind that, given the markers used, these CD16+ T cells may be gamma deltas.

      • The discussion of ICI patients consistently showing increased PD1 could have been expanded: given that the ICI targets PD1, one would expect the opposite, as commented on, and observed, in the methods section.

      Reviewer #2 (Public Review):

      Yanagihara and colleagues investigated the immune cell composition of bronchoalveolar lavage fluid (BALF) samples in a cohort of patients with malignancy undergoing chemotherapy and with lung adverse reactions, including Pneumocystis jirovecii pneumonia (PCP) and immune-checkpoint inhibitor (ICI)- or cytotoxic drug-induced interstitial lung diseases (ILDs). Using mass cytometry, their aim was to characterize the cellular and molecular changes in BAL to improve our understanding of their pathogenesis and identify potential biomarkers and therapeutic targets. In this regard, the authors identify a correlation between CD16 expression in T cells and the severity of PCP and an increased infiltration of CD57+ CD8+ T cells expressing immune checkpoints and FCRL5+ B cells in ICI-ILD patients.

      The conclusions of this paper are mostly well supported by data, but some aspects of the data analysis need to be clarified and extended.

      1) The authors should elaborate on why different sets of markers were selected for each analysis step. E.g., different sets of markers were used for UMAP, CITRUS and viSNE in the T cell and myeloid analysis.

      2) The authors should state if a normality test for the distribution of the data was performed. If not, non-parametric tests should be used.

      3) The authors should explore the correlation between CD16 intensity and the CTCAE grade in T cell subsets such as EMRA CD8 T cells, effector memory CD4, etc as identified in Figure 1B.

      4) The authors could use CITRUS to better assess the B cell compartment.

      Reviewer #3 (Public Review):

      The authors collected BALF samples from lung cancer patients newly diagnosed with PCP, DI-ILD or ICI-ILD. CyTOF was performed on these samples, using two different panels (T-cell and B-cell/myeloid cell panels). Results were collected, cleaned-up, manually gated and pre-processed prior to visualisation with manifold learning approaches t-SNE (in the form of viSNE) or UMAP, and analysed by CITRUS (hierarchical clustering followed by feature selection and regression) for population identification - all using Cytobank implementation - in an attempt to identify possible biomarkers for these disease states. By comparing cell abundances from CITRUS results and qualitative inspection of a small number of marker expressions, the authors claimed to have identified an expansion of CD16+ T-cell population in PCP cases and an increase in CD57+ CD8+ T-cells, FCRL5+ B-cells and CCR2+ CCR5+ CD14+ monocytes in ICI-ILD cases.

      By the authors' own admission, there is an absence of healthy donor samples and, perhaps as a result of retrospective experimental design, also an absence of pre-treatment samples. The entire analysis effectively compares three yet-established disease states with no common baseline - what really constitutes a "biomarker" in such cases? The introduction asserts that "By characterizing the cellular and molecular changes in BAL from patients with these complications, we aim to improve our understanding of their pathogenesis and identify potential therapeutic targets" (lines 82-84). Given these obvious omissions, no real "changes" have been studied in the paper. These are very limited comparisons among three, and only these three, states.

      Even assuming more thorough experimental design, the data analysis is unfortunately too shallow and has not managed to explore the wealth of information that could potentially be extracted from the results. CITRUS is accessible and convenient, but it also makes a couple of big assumptions that could affect data analysis: 1) Is it justified to concatenate all FCS files to analyse the data in one batch / small batches? Could there be batch effects or other biological events that could confuse the algorithm? 2) With a relatively small number of samples, and after internal feature selection of CITRUS, is the regression model suitable for population identification, or would it be too crude and miss rare populations? There are plenty of other established methods that could be used instead. Have those methods been considered?

      Colouring t-SNE or UMAP plots (e.g. Figure 6C) by marker expression is useful for quick identification of cell populations, but it is not a quantitative analysis. In a CyTOF analysis like this, it is common to calculate fold changes of marker expression between conditions. It is inadequate to judge expression levels and infer differences simply by looking at colours.
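      As a concrete illustration of the quantitative comparison suggested above, the sketch below compares one marker between two conditions by number rather than by colour. It is a minimal example, not the authors' or Cytobank's pipeline: it assumes raw per-cell intensities for a single gated population, applies the arcsinh transform with a cofactor of 5 (a common CyTOF convention, assumed here), and reports the shift in median transformed expression. The function names are hypothetical.

```python
import math
from statistics import median


def arcsinh_transform(intensities, cofactor=5.0):
    """Variance-stabilising arcsinh transform; cofactor 5 is an assumed CyTOF convention."""
    return [math.asinh(x / cofactor) for x in intensities]


def median_expression_shift(cells_a, cells_b, cofactor=5.0):
    """Difference in median arcsinh expression of one marker (condition B minus condition A).

    cells_a, cells_b: raw per-cell intensities of the same marker in the same
    gated population (hypothetical inputs, e.g. exported from FCS files).
    A positive result means higher expression in condition B.
    """
    return median(arcsinh_transform(cells_b, cofactor)) - median(
        arcsinh_transform(cells_a, cofactor)
    )
```

      In practice such a value would be computed per sample and the per-sample values then compared across patient groups with an appropriate statistical test, rather than computed on cells pooled across patients.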

      The relatively small number of samples also means that most results presented in the paper are not statistically significant. Whilst it is understandable that it is not always possible to collect a large number of patient samples for studies like this, having several entire major figures showing "n.s." (e.g. Figures 3A, 4B and 5C), together with limitations in the comparisons themselves and inadequate analysis, makes the observations unconvincing, and even less so for the single fatal PCP case where N = 1.

      It would also be good scientific practice to show evidence of sample data quality control. Were individual FCS files examined? Did the staining work? Some indication of QC would also be great.

      The dataset generated and studied by the authors has the potential to address the question they set out to answer and thus to be useful for the field. However, in its current state of presentation, more evidence and more thorough data analysis are needed to draw any conclusions, or correlations, as the authors would like to frame them.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This paper provides useful information about how the ionome of Arabidopsis thaliana adapts to very high CO2-levels, backed up by solid evidence and carefully designed studies. However, the broader claims of the paper about climate change and food security - heavily emphasized in the abstract, introduction, and discussion - are inappropriate, as there is no direct link to the presented work.

      We sincerely thank you for the work you have done in reviewing our manuscript. We very much appreciate your overall positive assessment of the experimental work as a whole, its value and robustness.

      In this revised version, we took on board the majority of your suggestions and your comments. In particular, we understood your critical point about overstating our objectives, which might in turn seem uncorrelated with our results. We fully agree with the comments that have been made on this point. Consequently, we have made substantial modifications and corrections in order to clarify our objectives and their implications: exploring in depth the natural variation of the shoot ionome response to elevated CO2, and generating a valuable resource allowing a better understanding of the genetic and molecular mechanisms involved in the regulation of plant mineral nutrition by the elevation of atmospheric CO2.

      We also made modifications in response to the other suggestions, including a clarification of the functional experiments carried out on the function of TIP2;2 in response to elevated CO2. Figure 7 now includes the comparison between ambient and elevated CO2 conditions, which is much more informative than what appeared in the previous version.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study's abstract, introduction, and conclusions are not supported by the methods and results conducted. In fact, the results presented suggest that Arabidopsis could easily adapt to an extremely high CO2 environment.

      We understand the reviewer’s comment. Although our work is considered useful, robust and well designed, we agree with the reviewer's point. We have certainly overemphasized the significance of our work to address the issue of food security in response to rising atmospheric CO2, at the expense of the factual description of the results of our fundamental study of the mechanisms at the interface between CO2 and mineral nutrition. We have clarified this focus by modifying the text of the introduction, objectives and discussion. We hope that these modifications will enable readers to better appreciate the core of this work.

      Regarding the last part of the comment, our results do suggest that genetic variation could allow adaptation to rising atmospheric CO2, and our study does indeed aim to identify the extent and basis of this genetic variation.

      This study offers good evidence pointing to a genetic basis for Arabidopsis thaliana's response to elevated CO2 (eCO2) levels and its subsequent impact on the leaf ionome. The natural variation analyses in the study support the hypothesis that genetic factors, rather than local adaptation, guide the influence of eCO2 on the ionome of rosette leaves in Arabidopsis. However, the manuscript's claim regarding its role in "the development of biofortified crops adapted to a high-CO2 world" (line 23) is overstated, especially given the absence of any analysis on the influence of eCO2 on the seed ionome and Arabidopsis is a poor model for harvest index for any crop. The manuscript, in its current form, necessitates massive revisions, particularly in clarifying its broader implications and in providing more substantial evidence for some of its assertions.

      We thank the reviewer for this comment and for the positive appreciation of our identification of a genetic basis for Arabidopsis thaliana's response to elevated CO2 and its subsequent impact on the leaf ionome. Nevertheless, it is true that the study of the leaf ionome is far from being able to lead to the development of biofortified plants. Some papers have described the nutrient harvest index in Arabidopsis as a potential indicator of nutrient use efficiency (for instance, Masclaux-Daubresse and Chardon, Journal of Experimental Botany 2011, or Aranjuelo et al., Journal of Experimental Botany 2013). However, as we did not include any seed ionome data in the paper, we added clear mentions that our analyses were made on leaves (lines 56/57/250/319) and a comment in the Discussion section to address this limitation (lines 325-328).

      Major Drawbacks and Questions:

      (1) Evidence for the Central Premise:

      The foundational premise of the study is the assertion that rising atmospheric CO2 levels result in a decline in plant mineral content. This phenomenon is primarily observed in C3 plants, with C4 plants seemingly less affected. The evidence provided on this topic is scant and, in some instances, contradicts the authors' own references. The potential reduction of certain minerals, especially in grains, can be debated. For instance, reduced nitrogen (N) and phosphorus (P) content in grains might not necessarily be detrimental for human and animal consumption. In fact, it could potentially mitigate issues like nitrogen emissions and phosphorus leaching. Labeling this as a "major threat to food security" (line 30) is exaggerated. While the case for microelements might be more compelling, the introduction fails to articulate this adequately. Furthermore, the introduction lacks any discussion on how eCO2 might influence nutrient allocation to grains, which would be crucial in substantiating the claim that eCO2 poses a threat to food security. A more comprehensive introduction that clearly delineates the adverse effects of eCO2 and its implications for food security would greatly enhance the manuscript.

      We partially agree with this comment. The decline in mineral status of C3 plants under elevated atmospheric CO2 has been widely described in the literature, and specifically documented for cereal grains. While the magnitude of this effect varies (depending on species, ecotype and cultivar), the effect itself is not in dispute. Here are just a few of the many works describing this effect, both on a global scale and at the level of the individual plant (Cotrufo MF (1998) Elevated CO2 reduces the nitrogen concentration of plant tissues. Global Change Biology 4: 43-54; Loladze I (2014) Hidden shift of the ionome of plants exposed to elevated CO2 depletes minerals at the base of human nutrition. eLife 3: e02245; Myers SS (2014) Increasing CO2 threatens human nutrition. Nature 510: 139-142; Poorter H (1997) The effect of elevated CO2 on the chemical composition and construction costs of leaves of 27 C3 species. Plant, Cell & Environment 20: 472-482; Soares JC (2019) Preserving the nutritional quality of crop plants under a changing climate: importance and strategies. Plant and Soil 443: 1-26; Stitt M (1999) The interaction between elevated carbon dioxide and nitrogen nutrition: the physiological and molecular background. Plant, Cell & Environment 22: 583-621; Uddling J (2018) Crop quality under rising atmospheric CO2. Curr Opin Plant Biol 45: 262-267).

      In addition, the threat to food security posed by this alteration in plant mineral status has also been well described in the literature through several modeling studies (Beach RH (2019) Combining the effects of increased atmospheric carbon dioxide on protein, iron, and zinc availability and projected climate change on global diets: a modelling study. Lancet Planet Health 3: e307-e317; Ebi KL (2019) Elevated atmospheric CO2 concentrations and climate change will affect our food's quality and quantity. Lancet Planet Health 3: e283-e284; Medek DE (2017) Estimated Effects of Future Atmospheric CO2 Concentrations on Protein Intake and the Risk of Protein Deficiency by Country and Region. Environ Health Perspect 125: 087002; Smith MR (2018) Impact of anthropogenic CO2 emissions on global human nutrition. Nature Climate Change 8: 834-839; Weyant C (2018) Anticipated burden and mitigation of carbon-dioxide-induced nutritional deficiencies and related diseases: A simulation modeling study. PLoS Med 15: e1002586; Zhu C (2018) Carbon dioxide (CO2) levels this century will alter the protein, micronutrients, and vitamin content of rice grains with potential health consequences for the poorest rice-dependent countries. Sci Adv 4: eaaq1012). To reinforce this point, we have added a sentence and references (lines 30-33). Nevertheless, we understand the reviewer's call for nuance regarding the severity of this potential threat. We have therefore modified the text, replacing "major threat" with "significant threat" (lines 3 and 29).

      We would also like to answer the reviewer’s comment on the potential environmental benefit associated with reduced N and P content in grains (mitigation of N emissions and P leaching). If this reduced N and P content results from a lower use efficiency of soil nutrients by plants, as suggested by several studies (Bloom 2010, Cassan 2023, Gojon 2023 and references therein), it may, on the contrary, favor N oxide emissions and P leaching from the soil.

      (2) Exaggerated Concerns:

      The paper begins with the concern that carbon fertilization will lead to carbon dilution in our foods. While we indeed face numerous genuine threats in the coming decades, this particular issue is manageable. The increase in CO2 alone offers many opportunities for boosting yield. However, the heightened heat and increased evapotranspiration will pose massive challenges in many environments.

      While we indeed face multiple threats in the coming decades, we do not fully agree with this comment. At present, there is no evidence that the negative effect of CO2 on plant mineral content will be manageable. Furthermore, there is compelling evidence that altered mineral nutrition and mineral status of plants will be an important factor limiting the high CO2-induced increase in yield, as will heat or increased evapotranspiration (see for instance Coskun et al (2016) Nutrient constraints on terrestrial carbon fixation: The role of Nitrogen. J. Plant Physiol. 203: 95-109; Jiang M (2020) Low phosphorus supply constrains plant responses to elevated CO2: A meta-analysis. Glob Chang Biol 26: 5856-5873; Reich PB (2006) Nitrogen limitation constrains sustainability of ecosystem response to CO2. Nature 440: 922-925). Thus, although we do not deny the crucial importance of heat and water stress, we believe it is relevant to study the basic mechanisms responsible for the negative effect of CO2 on plant mineral composition.

      Figure 4 in fact suggests that 43% of the REGMAP panel (cluster 3) is already pre-adapted to very high CO2 levels. This suggests annual species could adapt very rapidly.

      We agree with the reviewer. This indeed suggests that genetic variation exists in some ecotypes that could support adaptation to elevated CO2. The purpose of this work is precisely to identify this genetic variation in order to characterize the underlying mechanisms.

      (3) Assumptions on CO2 Levels:

      The assumption of 900ppm seems to be based on a very extreme climate change scenario. Most people believe we will overshoot the 1.5°C scenario, however, it seems plausible that 2.5 to 3°C scenarios are more likely. This would correspond to around 500ppm of CO2. https://www.nature.com/articles/s41597-022-01196-7/tables/4

      We agree with the reviewer that the CO2 concentration we used corresponds to a high value in the IPCC projections. That said, this value is currently considered very plausible: the following figure (from Smith and Myers (2018) Nature Climate Change) shows that current CO2 emissions align with the IPCC's most extreme scenario (RCP 8.5), which would result in a CO2 concentration of around 900 ppm in 2100. Furthermore, nothing in the 6th IPCC report allows the 4°C scenario to be excluded.

      Author response image 1.

      (4) Focus on Real Challenges:

      We have numerous real challenges, such as extreme heat and inconsistent rainfall, to address in the context of climate change. However, testing under extreme CO2 conditions and then asserting that carbon dilution will negatively impact nutrition is exaggerated.

      While we fully agree that several threats linked to climate change exist, and all deserve to be studied, we find it questionable to consider that the potential effect of high CO2 on the mineral nutrition of plants is not a real challenge. The mineral nutrition of plants is already a current major environmental challenge. This perspective seems to reflect the reviewer's personal opinion rather than an analysis of our work.

      In contrast, the FACE experiments are fundamental and are conducted at more realistic eCO2 levels. Understanding the interaction between a 20% increase in CO2 and new precipitation patterns is key for global carbon flux prediction.

      Again, we do not fully understand this comment, as the aim of our study was not to perform a global carbon flux prediction, but to unravel the genes and mechanisms underlying the negative effect of elevated CO2 on the nutrient content of Arabidopsis rosettes. However, we agree with the reviewer’s comment that FACE facilities are useful to explore the CO2 response in more natural environments, and we highlight that the decrease in mineral status of C3 plants has been widely documented in FACE studies. FACE experiments do not, however, allow fully controlled conditions (temperature, rainfall, wind and light intensity are not controllable in FACE), which are needed to disentangle the mechanisms by which elevated CO2 regulates the signaling pathways associated with plant mineral composition. In the longer term, studying the mechanisms we have identified in the broader context of climate change could be highly relevant.

      As I look at the literature on commercial greenhouse tomato production, 1000ppm of eCO2 is common, but it also looks like the breeders and growers have already solved for flavor and nutrition under these conditions.

      Indeed, tomato is often cultivated in CO2-enriched greenhouses at 1000 ppm. According to the literature, this results in a 20-25% reduction in vitamin C or lycopene, and requires a significantly higher nitrogen and water intake to reach expected sugar levels (Doddrell H (2023) Horticulture Research). In addition, the negative effect of elevated CO2 on tomato nutrient content seems to have significant repercussions on nutrition-health properties (Boufeldja (2023), Molecules).

      Conclusion:

      While the study provides valuable insights into the genetic underpinnings of Arabidopsis thaliana's response to elevated CO2 levels, it requires an entirely revised writeup, especially in its abstract, broader claims and implications. The manuscript would benefit from a more thorough introduction, a clearer definition of its scope, and a clear focus on the limits of this study.

      We thank the reviewer for the comments made on our manuscript. In addition to the responses that we provide to these comments, we have modified the main text of the introduction, objectives and discussion to take these comments into consideration. We believe that this will significantly improve the manuscript.

      Reviewer #2 (Public Review):

      Strengths:

      The authors have conducted a large, well-designed experiment to test the response to eCO2. Overall, the experimental design is sound and appropriate for the questions about how a change in CO2 affects the ionome of Arabidopsis. Most of the conclusions in this area are well supported by the data that the authors present.

      We thank the reviewer for this positive appreciation.

      Weakness:

      While the authors have done good experiments, it is a big stretch from Arabidopsis grown in an arbitrary concentration of CO2 to relevance to human and animal nutrition in future climates. Arabidopsis is a great model plant, but its leaves are not generally eaten by humans or animals.

      We agree with the reviewer’s comment. We recognize that implying a direct contribution of our work to human nutrition in future climates is an overstatement, as also noted by Reviewer 1. This overstatement was not intentional, as we have always been convinced that our work contributes to understanding the basic mechanisms involved in the negative regulation of plant mineral nutrition by high CO2. We have significantly modified the text to correct any misunderstanding of our work’s implications.

      The authors don't justify their choice of a CO2 concentration. Given the importance of the parameter for the experiment, the rationale for selecting 900 ppm as elevated CO2 compared to any other concentration should be addressed. And CO2 is just one of the variables that plants will have to contend with in future climates, other variables will also affect elemental concentrations.

      We agree with this comment. We added a justification of the high CO2 concentration used in this work in the Material and Methods section (lines 343-344). The rationale for this choice is also explained in our response to Reviewer 1’s point 3.

      Given these concerns, I think the emphasis on biofortification for future climates is unwarranted for this study.

      Again, we agree with this comment, and we have significantly modified the text to correct any misunderstanding of our work’s implications.

      Additionally, I have trouble with these conclusions:

      -Abstract "Finally, we demonstrate that manipulating the function of one of these genes can mitigate the negative effect of elevated CO2 on the plant mineral composition."

      -Discussion "Consistent with these results, we show that manipulating TIP2;2 expressions with a knock-out mutant can modulate the Zn loss observed under high CO2."

      The authors have not included the data to support this conclusion as stated. They have shown that this mutant increases the Zn content of the leaves when compared to WT but have not demonstrated that this response is different than in ambient CO2. This is an important distinction: one way to ameliorate the reduction of nutrients due to eCO2 is to try to identify genes that are involved in the mechanism of eCO2-induced reduction. Another way is to increase the concentration of nutrients so that the eCO2-induced reduction is not as important (i.e. a 10% reduction in Zn due to eCO2 is not as important if you have increased the baseline Zn concentration by 20%). The authors identified tip2 as a target from the GWAS on difference, but their validation experiment only looks at eCO2.

      We thank the reviewer for this comment, and we agree with it. It is much more informative, especially in the context of this paper, to analyze the function of a candidate gene under both ambient and elevated CO2 rather than under elevated CO2 alone. Therefore, we added to Figure 7 the expression of TIP2;2 in contrasting haplotypes under ambient CO2, alongside the data already presented under elevated CO2 (now Fig. 7C and 7D). This showed that TIP2;2 expression is also lower in haplotype 0 under ambient CO2. We also added to Figure 7 (Fig. 7E) the Zn level in WT and the tip2;2-1 mutant under ambient CO2, alongside the data already presented under elevated CO2. This showed that the tip2;2-1 mutant line did not show any decrease in shoot Zn content in response to elevated CO2, in contrast to the WT.

      We have added comments on these new results in the Results and Discussion sections (lines 233-242 in the Results section, and lines 310-314 in the Discussion section).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Reviewer Comments on the Article's Approach to Ionome Analysis

      (1) Omission of Phosphorus from the Ionome:

      It's surprising that phosphorus (P) was not measured in the ionome. After nitrogen (N), P is often the most limiting mineral for plant development and yield, making it a significant component of the ionome. Why did the authors omit this crucial element?

      We agree with the reviewer that P is an important mineral for plant growth. The absence of data on P content is due to feasibility constraints rather than oversight. The MP-AES instrument we used to analyze the ionome (except N and C, which we obtained from an Elementar analyzer) would have required an extra step and an extra analysis to obtain data for macronutrients such as P or K. In the context of this large-scale experiment, we had to compromise and proceed without these data.

      (2) Relationship Between Leaf Ionome and Seed:

      The manuscript lacks evidence demonstrating the relationship between the leaf ionome and the seed. This connection is vital to establish the study's aims as outlined in lines 20-24. If the central argument is that eCO2 threatens food security, it's essential for the authors to either:

      • Provide evidence that eCO2 induces changes in the ionome profiles of seeds.

      • Show that changes in the rosette leaf ionome lead to alterations in seed ionome profiles.

      We agree with the reviewer. Although we know that the seed ionome composition of Arabidopsis model accessions such as Columbia is indeed negatively affected by eCO2, we do not provide data supporting some of the terms used in lines 20-24. The correspondence between leaf and seed ionomes in natural populations under eCO2 is certainly a question that we will address next. Therefore, to align our stated objectives with our data, we have modified the sentence in lines 20-24. We also added a comment on this point in the Discussion section (lines 324-328).

      (3) Analysis of Ionome in Rosette Leaves:

      Why did the authors choose to analyze the ionome specifically in rosette leaves? Is there a known correlation between the ionome profile in rosette leaves and seeds?

      See our answer to the above comment.

      (4) Experimental Design Comments:

      • The layout of the accession growouts, the methods of randomization, blocking, and controls/checks should be detailed.

      • Were BLUEs (Best Linear Unbiased Estimators) or BLUPs (Best Linear Unbiased Predictors) employed to account for experimental design conditions? If not, it's recommended that they be used.

      We thank the reviewer for this comment. A note on replicates has been added in the Method/Plant Material section. Concerning the BLUEs/BLUPs, although I am not familiar with their use, I do not think that these approaches are relevant in our experimental design. Indeed, we pooled 3 to 5 replicates for each accession to measure the ionome (as mentioned in the Method/Ionome analysis section – we realized this was perhaps not clear enough, and thus we reinforced this point in this section). Therefore, we do not have the variance data required to perform BLUEs/BLUPs.

      (5) Carbon Dilution Effect:

      The statement, "The first component of the PCA described a clear antagonistic trend between C content and the change of other mineral elements (Fig. 3B)..." suggests a well-understood carbon dilution effect. These results are anticipated and align with existing knowledge.

      We thank the reviewer for this comment. However, this sentence does not relate to the biomass dilution hypothesis referred to by the reviewer. Indeed, the content of each element (C and others) is expressed as a percentage of biomass, not as an absolute value. Therefore, this more likely reflects an effect of the increase in carbon compounds (notably soluble sugars), which could influence mineral composition.

      (6) Heritability Estimates:

      The authors should report both the broad-sense heritability and an estimate of heritability based on a GRM or Kinship matrix.

      We thank the reviewer for this suggestion. We are skeptical of using a kinship matrix to estimate heritability in our study. Estimating narrow-sense heritability using a kinship matrix is conceptually based on the infinitesimal model of Fisher, thereby meaning that phenotypic variation is driven by hundreds to thousands of QTLs with small effects. If this is the case, GWAS conducted on several hundred (or even thousands) of genotypes will not be powerful enough to detect such QTLs. Accordingly, estimates of broad-sense heritability based on estimates of variance components can drastically differ from estimates of narrow-sense heritability based on the use of a kinship matrix, as illustrated in the study of Bergelson et al. (2019 Scientific Reports).
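      For readers less familiar with the distinction the authors draw, the standard quantitative-genetics definitions are summarised below. The notation (V for variance components, K for the kinship matrix, X and β for fixed effects) is the usual textbook one and is not taken from the manuscript. The kinship-based estimate only captures variance that tracks genome-wide relatedness, which is why it can diverge from broad-sense heritability estimated from variance components.

```latex
\begin{aligned}
V_P &= V_G + V_E, \qquad V_G = V_A + V_D + V_I,\\
H^2 &= \frac{V_G}{V_P} \quad \text{(broad sense)}, \qquad
h^2 = \frac{V_A}{V_P} \quad \text{(narrow sense)},\\
\mathbf{y} &= X\boldsymbol{\beta} + \mathbf{g} + \mathbf{e}, \quad
\mathbf{g} \sim \mathcal{N}\!\left(\mathbf{0}, K\sigma_g^2\right), \quad
\mathbf{e} \sim \mathcal{N}\!\left(\mathbf{0}, I\sigma_e^2\right), \quad
\hat{h}^2_{K} = \frac{\hat{\sigma}_g^2}{\hat{\sigma}_g^2 + \hat{\sigma}_e^2}.
\end{aligned}
```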

      (7) Application of the Breeder's Equation:

      It would be beneficial if the authors applied the breeder's equation to estimate the species' potential rate of response. Based on the allele frequency of the adapted cluster 3 (69 ecotypes or 43% frequency of Figure 3B), it seems plausible that the populations could adapt within 23 generations.

      We thank the reviewer for this suggestion. Indeed, it would be really interesting to test whether sub-populations could adapt in comparison with others, and over what period of time. It is nevertheless not possible to do so using the Breeder’s equation in our case, as this requires fitness data under conditions of ambient or elevated CO2 (i.e. production of seeds) to be applied, and we do not have these data at the level of the whole population.
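      For reference, the breeder's equation in its standard single-generation form is given below, where R is the response to selection per generation, h² the narrow-sense heritability and S the selection differential (mean of the selected parents minus the population mean). As the authors note, S must be measured on a fitness-related trait (e.g. seed production) under each CO2 condition. Extrapolating over t generations as t·h²·S assumes that h² and S stay constant, which is a strong simplifying assumption.

```latex
R = h^2 S, \qquad R_{\text{cumulative}}(t) \approx t\, h^2 S \quad \text{(assuming constant } h^2 \text{ and } S\text{)}
```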

      (8) Overall Quality:

      In general, the authors have executed a high-quality ionome mapping experiment. However, the abstract, introduction, and discussion should be entirely rewritten and reframed.

      We thank the reviewer for the positive evaluation of our experiment. As previously mentioned, we are for the most part in agreement with the comments made about the need to align our stated objectives with our experimental data and conclusions. To do so, we have rewritten part of the abstract, introduction and discussion. The details of these modifications are described in the responses made to each comment.

      Here's a line-by-line list of suggestions on writing:

      Line 30 would read better with a comma after thus (or by replacing thus with therefore and then a comma at the start of the sentence).

      Line 33 nevertheless would read better in between commas.

      Lines 45 - 48 sentence is too long, could probably divide it into two.

      Lines 90 - 94 are hard to interpret, recommend rephrasing for clarity.

      Line 130 - keep verbs in the past tense for consistency (ran instead of run).

      Line 194 - what do the authors mean by crossed? I'm inferring they looked at the intersection of DEGs with the list of genes identified by GWA mapping, probably should use a more concise word.

      There's a recurrent use of the adjective strong (Lines 80, 142, 144, 197, 245). I would advise using a more precise adjective or avoiding its use to let the reader form their own opinion on the data.

      Lines 174-176 the cited reference (No. 15) is incorrect. The study by Katz et al. (2022) does not provide information on the role of ZIF1 in zinc sequestration mechanisms under elevated CO2 conditions.

      We thank the reviewer for these detailed recommendations. We have corrected or rephrased the text according to these suggestions.

      Reviewer #2 (Recommendations For The Authors):

      Technical points:

      900 ppm as elevated CO2: Given the importance of the parameter for the experiment, the rationale for selecting 900 ppm as elevated CO2 compared to any other concentration should be addressed.

      We acknowledge the reviewer's point, which we addressed earlier in our response. In line with this, we have included a justification for this parameter in the Methods section.

      The authors do not mention what genotype was used for their root/shoot RNAseq experiment.

      We thank the reviewer for this comment, and indeed, this information was not mentioned. This is now done, in the Method section.

      Line 125: Spelling error "REGMPA".

      This has been corrected.

      Line 338: Removal of outlier observations - "Prior to GWAS and multivariate analyses such as PCA or clustering, mineral composition measures were pre-processed to remove technical outliers". The authors should mention the exact number of outliers that were removed and what the explicit criteria were for removal.

      The number of outliers removed from each dataset is now indicated in Supplemental Table 7 (this is cited in the Methods section). The explicit criterion used for this analysis is stated in the corresponding Methods section: “the values positioned more than 5 median absolute deviations away from the median were removed from the dataset”.
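      The stated outlier criterion can be sketched as follows. This is a minimal illustration, not the authors' code. It applies the rule to one element's measurements at a time, uses the raw (unscaled) MAD because the quoted criterion does not mention the usual 1.4826 consistency factor, and leaves the data unchanged when the MAD is zero. All three choices are assumptions, and `remove_mad_outliers` is a hypothetical name.

```python
from statistics import median


def remove_mad_outliers(values, k=5.0):
    """Drop values more than k median absolute deviations (MAD) from the median.

    Uses the raw MAD without 1.4826 scaling (an assumption, since the scaling is
    not specified). Returns (kept_values, number_removed), so the removal count
    can be reported per dataset.
    """
    values = list(values)
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate case (assumed handling): no spread, so nothing is flagged.
        return values, 0
    kept = [v for v in values if abs(v - med) <= k * mad]
    return kept, len(values) - len(kept)
```

      For example, in [1.0, 2.0, 3.0, 4.0, 100.0] the median is 3 and the MAD is 1, so only 100.0 lies more than 5 MADs from the median and is removed.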

      Line 379: "Lowly expressed genes with an average value across conditions under 25 reads were excluded from the analysis". Providing information about the number of the lowly expressed genes that were removed from the analysis can help with the interpretation of the likelihood of the candidates selected being correct.

      This is a standard procedure in RNAseq analysis. It avoids many false positives in the differential analysis of gene expression based on ratios (where a very small number in the denominator can lead to a very large apparent change in expression, of no real significance). For information, this step led to the removal of 11607 and 10121 genes from the shoot and root datasets, respectively.
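      The filter described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: counts are taken as raw read counts per gene across all samples of one tissue, the 25-read threshold is applied to their arithmetic mean, and genes at exactly 25 reads are kept. The function name and input layout are hypothetical.

```python
def filter_low_expression(counts, min_mean_reads=25.0):
    """Keep genes whose mean read count across all samples is at least min_mean_reads.

    counts: dict mapping gene ID -> list of raw read counts (one per sample).
    Returns (kept_counts, number_of_genes_removed).
    """
    kept = {
        gene: reads
        for gene, reads in counts.items()
        if sum(reads) / len(reads) >= min_mean_reads
    }
    return kept, len(counts) - len(kept)
```

      Applied separately to the shoot and root count tables, the second return value gives the number of genes removed from each dataset.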

      Line 384: It's not clear how many biological replicates were used.

      This has been corrected.

      Additional comment: We have also become aware of a confusion concerning one of the candidate genes located close to GWA peaks: line 180 of the first version, we mentioned CAX1 (AT1G16380) for its role on nutrient deficiency response. There are actually two genes annotated as CAX1 in TAIR (both are cation exchangers), but the one involved in nutrient deficiency response is AT2G38170. We therefore removed the sentence mentioning AT1G16380/CAX1 as a potential candidate gene.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This paper presents a functional analysis of the poorly characterized pseudo-phosphatase Styxl2, one of the targets of the Jak/Stat pathway in muscle cells. The authors propose that Styxl2 is essential for de novo sarcomere assembly by regulating the autophagic degradation of non-muscle myosin IIs (NM IIs). Although a previous study by Fero et al. (2014) already reported that Styxl2 is essential for the integrity of sarcomeres, this study provides new mechanistic insights into the phenomenon. The in vivo studies in this manuscript are compelling; however, I feel the contribution of autophagy to the degradation of NM IIs is still unclear.

      Major concerns:

      1) The contribution of autophagy to the degradation of Myh9 is still unclear to this reviewer.

      It has been reported that autophagy is dispensable for sarcomere assembly in mice (Cell Metab, 2009, PMID: 1994508). In Fig. 7A, the authors showed that overexpressed Styxl2 downregulated the amount of ectopically expressed Myh9 in an ATG5-dependent manner in C2C12 cells; however, this experiment is far from physiological conditions. Therefore, the authors should test the effect of ATG5 knockdown and the genetic interaction between Styxl2 and ATG5 in vivo, that is, 1) the effect of ATG5 loss on sarcomere assembly in zebrafish, and 2) the genetic interaction between Styxl2 and ATG5, by co-injecting Styxl2 mRNA and ATG5-MO into zebrafish embryos.

      Our response: In fact, the reference cited by the reviewer (Cell Metab, 2009; PMID: 19945408) clearly indicated that autophagy is required for sarcomere assembly. Moreover, another paper using the fish extraocular muscle regeneration model (Autophagy, 2014, PMID: 27467399) also showed that the sarcomere structure was disrupted in regenerated muscles when autophagy was inhibited by chloroquine. In addition, other references (Nature Medicine, 2007, PMID: 17450150; Autophagy, 2010, PMID: 20431347) showed that loss of Atg5 in mouse cardiac muscle led to disorganized sarcomere structure. We also performed the Atg5 knockdown experiments suggested by the reviewer. However, the sarcomere structure defects were not as obvious as those caused by Styxl2 knockdown (see Author response image 1 below). In fact, it has been reported that Atg5 knockdown may not be a desirable strategy to disrupt autophagy, because “... only a small amount of Atg5 is needed for autophagy, knockdown of Atg5 to levels low enough to block autophagy might be difficult to achieve, ...” (Nature Medicine, 2007, PMID: 17450150). Because the Atg5 MO was ineffective in our assays, we did not perform the second experiment suggested by the reviewer. Moreover, as Styxl2 is not a key component of the autophagy machinery, it is unlikely that overexpression of Styxl2 alone could rescue the autophagy defects caused by Atg5 knockdown.

      Author response image 1.

      The fish zygotes were injected with Atg5 or Ctrl MO. At 48 hpf, the fish were stained with an anti-Actinin antibody. Some fast muscle fibers were disrupted when Atg5 was knocked down. At the bottom of each image, the numerator indicates the number of embryos showing a normal Actinin staining pattern, and the denominator indicates the total number of embryos examined. Scale bar, 10 µm.

      2) As referenced, Yamamoto et al. reported that Myh9 is degraded by autophagy. Mechanistically, Nek9 acts as an autophagic adaptor that bridges Atg8 and Myh9 by interacting with both. Inconsistent with this model, the authors state on page 12, lines 365-367, "A recent report showed that Myh9 could also undergo Nek9-mediated selective autophagy (Yamamoto et al., 2021), suggesting that Myh9 is ubiquitinated". I think it has not yet been explored whether autophagic degradation of Myh9 requires its ubiquitination. Moreover, I cannot judge from the data in Fig. 7C whether Myh9 is ubiquitinated in a Styxl2-dependent manner. The authors should test whether Nek9 is required for Myh9 degradation in muscles. If Nek9 plays a role in Myh9 degradation, it would be better to remove Fig. 7C.

      Our response: Indeed, as pointed out by the reviewer, it has not been explored whether Myh9 is ubiquitinated. However, it is well established that some proteins undergoing autophagic degradation are ubiquitinated and are linked to Atg8/LC3 via p62 and NBR1 (Mol Cell, 2009, PMID: 19250911; J Biol Chem, 2007, PMID: 17580304). To improve the data quality, we repeated the Myh9 ubiquitination experiment in cells with or without Styxl2 using a slightly different strategy: as shown in the revised Figure 7C, we first co-transfected HEK 293T cells with HA-Myh9, Myc-ubiquitin, and Flag-Styxl2. We then immunoprecipitated Myc-tagged ubiquitin from the whole-cell lysates and blotted for HA-Myh9. We detected an obvious increase in ubiquitin-conjugated HA-Myh9 (revised Figure 7C). As suggested by the reviewer, we also tested whether knockdown of Nek9 affects the degradation of Myh9, but we failed to detect an obvious effect of Nek9 knockdown (see Author response image 2 below). One possible explanation for this negative result is that Nek9 itself is a negative regulator of selective autophagy (J Biol Chem, 2020, PMID: 31857374); knocking it down is therefore expected to enhance, rather than impair, the function of the autophagy machinery. This may explain why we failed to detect an effect on Myh9 degradation simply by knocking down Nek9. To further elucidate whether Nek9 is involved in Myh9 degradation in myoblasts, we may need to use a dominant-negative mutant of Nek9 lacking the LCIII-binding motif, as described by Yamamoto et al. (Nat Commun, 2021, PMID: 34078910). This will be addressed in our future study.

      Author response image 2.

      C2C12 cells were transfected with negative control siRNA (NC), siNek9#2 or siNek9#3. 18 h later, the cells were transfected with plasmids HA-Myh9 and Flag-Styxl2 or Flag-Stk24. After another 24 h, the cells were harvested for RT-qPCR (left panel) or western blot (right panel).

      3) In Fig. 5F, the protein levels of Styxl2 and Myh10 should be checked because the efficiency of the Myh10-MO is not shown anywhere in this manuscript.

      Our response: As suggested by the reviewer, a Western blot of Myh10 protein levels is now included in Figure 5-figure supplement 1B.

      Reviewer #2 (Public Review):

      The authors investigated the role of the Jak1-Stat1 signaling pathway in myogenic differentiation by screening the transcriptional targets of Jak1-Stat1 and identified Styxl2, a pseudophosphatase, as one of them. Styxl2 expression was induced in differentiating muscles. The authors used a zebrafish knockdown model and conditional knockout mouse models to show that Styxl2 is required for de novo sarcomere assembly but is dispensable for the maintenance of existing sarcomeres. Styxl2 interacts with the non-muscle myosin IIs, Myh9 and Myh10, and promotes the replacement of these non-muscle myosin IIs by muscle myosin IIs through inducing autophagic degradation of Myh9 and Myh10. This function is independent of its phosphatase domain.

      A previous study using zebrafish found that Styxl2 (previously known as DUSP27) is expressed during embryonic muscle development and is crucial for sarcomere assembly, but its mechanism remains unknown. This paper provides important information on how Styxl2 mediates the replacement of non-muscle myosin with muscle myosin during differentiation. This study may also explain why autophagy deficiency in muscles and the heart causes sarcomere assembly defects in previous mouse models.

      Reviewer #3 (Public Review):

      Wu and colleagues characterise the function of Styxl2 during muscle development, a pseudo-phosphatase that had already been described as having some function in sarcomere morphogenesis or maintenance (Fero et al. 2014). The authors verify a role for Styxl2 in sarcomere assembly/maintenance using zebrafish embryonic muscles by morpholino knockdown and a conditional Styxl2 allele in mice (knocked out in satellite cells with Pax7 Cre).

      Experiments using a tamoxifen inducible Cre suggest that Styxl2 is dispensable for sarcomere maintenance and only needed for sarcomere assembly.

      BioID experiments with Styxl2 in C2C12 myoblasts suggest binding of nonmuscle myosins (NMs) to Styxl2. Interestingly, both NMs are downregulated when muscles differentiate after birth or during regeneration in mice. This down-regulation is reduced in the Styxl2 mutant mice, suggesting that Styxl2 is required for the degradation of these NMs.

      Impressively, reducing one NM (zMyh10) by double morpholino injection in Styxl2 morphant zebrafish does improve zebrafish mobility and sarcomere structure. Degradation of Myh9 is also stimulated in cell culture if Styxl2 is co-expressed. Surprisingly, the phosphatase domain is not needed for these degradation and sarcomere-structure rescue effects. Inhibitor experiments suggest that Styxl2 promotes the degradation of NMs through the selective autophagy pathway.

      Strengths:

      A major strength of the paper is the combination of various systems, mouse and fish muscles in vivo to test Styxl2 function, and cell culture including a C2C12 muscle cell line to assay protein binding or protein degradation as well as inhibitor studies that can suggest biochemical pathways.

      Weakness:

      The weakness of this manuscript is that the sarcomere phenotypes and the western blots are not quantified. Hence, we have to judge the results from a single image or blot. Also, the role of Styxl2 in sarcomere biology was not entirely novel.

      Few high-resolution sarcomere images are shown, and myosins have not been stained.

      Reviewer #1 (Recommendations For The Authors):

      Minor concerns:

      4) The position of molecular weight markers should be shown in all Western blot data.

      Our response: As suggested by the reviewer, molecular weight markers have been added to all Western blots.

      5) Schematic models of the Styxl2deltaN509 and N513 constructs would be helpful for readers.

      Our response: A schematic has been added in Figure 6B (upper panel) to show Styxl2deltaN509 and Styxl2N513.

      6) Several data were described but not shown (data not shown). I think the data need to be included in the main or supplemental figures.

      Our response: As suggested by the reviewer, the raw data are now included in Figure 6-figure supplement 1A and Figure 7-figure supplement 1.

      Reviewer #2 (Recommendations For The Authors):

      1) In Fig. 5E, the authors suggest that the needle touch response was improved by additional knockdown of Myh10. This is a bit confusing because the germline knockout of Myh10 is lethal (line 445). The authors should provide more explanation on this point. Additionally, it would be better to include Myh10-MO in Fig. 5E.

      Our response: In line 445 of our original manuscript, we stated that germline knockout of the mouse Myh10 gene is lethal, based on a published report (Proc Natl Acad Sci USA, 1997, PMID: 9356462). Here, we only knocked down zMyh10 in zebrafish zygotes, so we did not expect a lethal phenotype. In addition, other groups who knocked down Myh10 in fish also did not observe a lethal phenotype (Dev Biol, 2015, PMID: 25446029). As for the Myh10-MO control in the experiment in Fig. 5E, we did include it in our experiments. As we did not observe any obvious effects on either motility or sarcomere structure, we did not include this data set in the figure.

      2) It was suggested that Myh9 and Myh10 form a complex (Rao et al. PLoS One 9, e114087, 2014). Thus, the IP experiments do not rule out the possibility that Styxl2 directly interacts with either Myh9 or Myh10 and indirectly with the other.

      Our response: In known myosin II complexes, different myosin molecules can associate with each other through their tail domains (Bioarchitecture, 2013, PMID: 24002531). Thus, if we used full-length myosin molecules in our co-immunoprecipitation assays, it would be difficult to exclude the possibility raised by the reviewer. However, using truncated myosin proteins, we showed that the head domain of either Myh9 or Myh10 could interact with Styxl2 in the absence of the tail domain (Figure 4E, F). This result strongly suggests that Myh9 and Myh10 can each interact with Styxl2 independently.

      Reviewer #3 (Recommendations For The Authors):

      1) The western blot shown in Figure 3B supporting the induced deletion of Styxl2 should be quantified. Ideally, some other blots, e.g., in Figure 5, too. Please add the age of the mice in Figure 5B to the figure legend.

      Our response: As suggested by the reviewer, we quantified the data in Figures 3B, 3F, 5B, 5D, and 7A, and the quantifications are included in the revised figures. For Fig. 5B, the age of the mice (i.e., P1) was already indicated in the legend.

      2) A quantification of the sarcomere phenotypes in the double knock-down of zMyh10 and Styxl2 compared to Styxl2 single would make the paper significantly stronger. Furthermore, a double morpholino control should be included to rule out any RNAi machinery 'dilution effect'.

      Our response: As suggested by the reviewer, we quantified the sarcomere structures using line scan analysis in ImageJ, and the line scans were placed as insets in the upper corner of the immunofluorescence images (revised Figures 5F and 6C). To avoid potential “dilution effects”, in all experiments involving two different MOs, the total amount of MO was kept the same in all samples, including controls, by adding a control MO (e.g., samples treated with one specific MO also received an equal amount of control MO, while samples without any specific MO received twice as much control MO).

      3) The sarcomere phenotypes in figure 6 should also be better quantified, for example using simple line scans of the alpha-actinin stains and assay periodicity or calculating the autocorrelation coefficients. How about myosin stains?

      Our response: We quantified Figure 6C as suggested by the reviewer. We also performed myosin staining, and the results were similar to those obtained with the α-actinin antibody (see revised Figure 6-Fig supplement 1B).
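      The autocorrelation measure the reviewer suggests can be sketched as below for a single α-actinin line-scan profile. This is illustrative only and is not the ImageJ line-scan analysis used in the revised figures; the function names are hypothetical, and the period is returned in pixels (multiply by the pixel size to obtain sarcomere length in µm). The sketch takes the first autocorrelation peak after the curve first drops below zero, so that the central peak around zero lag is not mistaken for the repeat.

```python
def autocorrelation(profile, lag):
    """Normalized autocorrelation of an intensity profile at a given lag (pixels)."""
    n = len(profile)
    mean = sum(profile) / n
    dev = [x - mean for x in profile]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0  # flat profile: no periodic signal
    num = sum(dev[i] * dev[i + lag] for i in range(n - lag))
    return num / denom


def dominant_period(profile):
    """Repeat length (pixels): first positive autocorrelation peak after the initial decay.

    Returns None if no such peak exists within half the profile length.
    """
    max_lag = len(profile) // 2
    ac = [autocorrelation(profile, lag) for lag in range(max_lag + 1)]
    lag = 1
    while lag < max_lag and ac[lag] > 0:  # skip the central peak around lag 0
        lag += 1
    for k in range(lag, max_lag):
        if ac[k] > ac[k - 1] and ac[k] >= ac[k + 1] and ac[k] > 0:
            return k
    return None


# A synthetic striated profile with a 10-pixel repeat.
print(dominant_period([abs((i % 10) - 5) for i in range(100)]))  # 10
```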

      4) Do the authors see periodic NM patterns in developing mouse muscle fibers, as indicated by the model in Figure 7D? It is unclear whether nonmuscle myosin is present in a PERIODIC pattern in early myofibrils. The NM periodic patterns that have been observed have a periodicity of only about 1 µm, fitting the shorter length of NM bipolar filaments (only about 300 nm, PMID 28114270).

      Our response: The reviewer raised a good point here. Ideally, we should examine developing mouse muscle fibers to prove that NM shows periodic patterns. However, because of the difficulty of catching myocytes undergoing sarcomere assembly, most studies of NM in sarcomeres use cultured cardiomyocytes. In TA muscles from P1 newborn mice, we failed to detect NM in sarcomeres (see Author response image 3 below); nearly all the myofibers showed a mature sarcomere pattern without NM signal. More work is needed to examine developing mouse fibers at different embryonic stages for the presence of NM in developing sarcomeres.

      Author response image 3.

      The TA muscles were collected from male and female P1 mice. The muscles were sectioned and co-stained for α-actinin (Actn) and Myh9. The majority of myofibrils are mature, without NM II signal. Scale bar, 10 µm.

      5) Recent work suggested that mechanical tension is key to assemble the first long periodic myofibril containing immature sarcomeres. Tension is likely produced by a combination of NM and Mhc in the assembling sarcomeres themselves. This could be included in the introduction or discussion (PMIDs 24631244, 29316444, 29702642, 35920628).

      Our response: We thank the reviewer for pointing us to additional relevant references. We have added them to the Introduction.

      6) I suggest replacing "sarcomeric muscles" with "striated muscles".

      Our response: We revised the term in the manuscript as suggested by the reviewer.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      We appreciate the valuable and constructive comments of Reviewer #1 on our manuscript. We have addressed the comments from Reviewer #1 in the public review in the response to the recommendations for the authors, as the public review comments largely overlap with that of the recommendations for the authors.

      Reviewer #1 (Recommendations For The Authors):

      (1.1) Figure 1 did not use a mock-infected control for the development of R-loops, but only a time point before infection. I think it would have been a good control to show that, after the same amount of time, non-infected cells did not show increases in R-loops, and that this is not a product of the cell cycle.

      We prepared our DRIPc-seq library using cell extracts harvested at 0, 3, 6, and 12 h post-infection (hpi), all at the same post-seeding time point; HIV-1 infection of each sample was staggered accordingly. Therefore, it is unlikely that the host cellular R-loop induction observed in our DRIPc-seq results was due to R-loop formation during the cell cycle. In Lines 93–95 of the Results section of the revised manuscript, we have provided a more detailed description of our DRIPc-seq library experimental scheme. Thank you.

      (1.2) Figure 2 should have included a figure showing the proportion of DRIPc-seq peaks located in different genome features relative to one another instead of whether they were influenced by time post-infection. Figure 2C was performed in HeLa cells, but primary T cell data would have been more relevant as primary CD4+ T cells are more relevant to HIV infection.

      We have included a new figure presenting the relative proportion of DRIPc-seq peaks mapped to different genomic features at each hpi (Fig. 2C of the revised manuscript). We found that the proportion of DRIPc-seq peaks mapped to various genomic compartments remained consistent over the hours following the HIV-1 infection. This further supports our original claim that HIV-1 infection does not induce R-loop enrichment at specific genomic features but that the accumulation of R-loops after HIV-1 infection is widely distributed.

      We considered HeLa cells the primary in vitro infection model; therefore, we conducted RNA-seq only on HeLa cells. However, we agree with the reviewer that data from primary CD4+ T cells may be more physiologically relevant. Nevertheless, as demonstrated in the new figure (Fig. 2C of the revised manuscript), HIV-1 infection did not significantly alter the proportion of R-loop peaks mapped to specific genomic compartments, such as gene body regions, in HeLa, primary CD4+ T, and Jurkat cells. Therefore, we anticipate no clear correlation between changes in gene expression levels and R-loop peak detection upon HIV-1 infection, even in primary T cells. Thank you.

      (1.3) Figure 5G is very hard to see when printed, is there a change in brightness or contrast that could be used? The arrows are helpful but they don't seem to be pointing to much.

      We have enhanced the intensity of the PLA foci and magnified the images in Fig. 5G of the revised manuscript. While editing the images according to your suggestion, we found a misannotation of the multiplicity of infection in the graph quantifying the number of PLA foci per nucleus in Fig. 5G of the original manuscript. We have corrected this issue and hope that the figure is now much clearer.

      (1.4) The introduction provided a good background for those who may not have a comprehensive understanding of DNA-RNA hybrids and R-loops, but the rationale that integration in non-expressed sequence implies that R-loops may be involved is very weak and was not addressed experimentally. A better rationale would have been to point out that, although integration in genes is strongly associated with gene expression, the association is not perfect, particularly in that some highly expressed genes are, nonetheless, poor integration targets.

      In accordance with the reviewer's comment, we revised the Introduction. We deleted the statement and reference “... the most favored region of HIV-1 integration is an intergenic locus, ...”, which may overstate the relevance of R-loops to HIV-1 integration events in non-expressed sequences. Instead, following the reviewer's suggestion in public review 2)-(a), we introduced a more recent finding that high levels of gene expression do not always predict high levels of integration, together with the corresponding citation (Lines 46–47 of the revised manuscript).

      (1.5) The discussion was seriously lacking in connecting their conclusions regarding R-loop targeting of integration to how integration works at the structural level, where it is very clear that concerted integration on the two DNA strands ca. 5 bp apart is essential for correct, 2-ended integration. It is very difficult to visualize how this would be possible with the triple-stranded R-loop as a target. The manuscript would be greatly strengthened by an experiment showing concerted integration into a triple-stranded structure in vitro using PICs or pure integrase.

      We believe there has been a misunderstanding of our interpretation of the putative role of R-loop structures in HIV-1 integration site selection, caused by some misleading statements in our original manuscript. Based primarily on our current data, we believe that R-loop structures are bound by HIV-1 integrase proteins, leading to integration of the HIV-1 viral genome into regions in the vicinity of host genomic R-loops. On careful revision of our manuscript, we found that the title, abstract, and discussion of the original manuscript include phrases such as “HIV-1 targets R-loops for integration,” which may overstate our findings on the role of R-loops in HIV-1 integration site selection. We have replaced these phrases, for example with “HIV-1 favors vicinity regions of R-loop for the viral genome integration,” in the revised manuscript. We apologize for the confusion caused by the unclear description of our findings.

      Using multiple biochemical experiments, we demonstrated the interaction between cellular R-loops and HIV-1 integrase proteins in cells and in vitro (Fig. 5 of the revised manuscript). However, we could not validate whether the center of the triple-stranded R-loop is the exact site of HIV-1 integration, where the integrase-catalyzed strand transfer reaction occurs. This is because an R-loop can be multiple kilobases in size (1, 2); we therefore displayed a large genomic region (30-kb windows) to present the integration sites surrounding the R-loop centers. Nevertheless, we believe that we validated R-loop-mediated HIV-1 integration in R-loop-forming regions using our pgR-poor and pgR-rich cell line models. When infected with HIV-1, pgR-rich cells, but not pgR-poor cells, showed higher infectivity upon R-loop induction in the designated regions following DOX treatment (Fig. 3C and 3D of the revised manuscript). In addition, we quantified site-specific integration events in R-loop regions and found that more integration events occurred in the designated regions of the pgR-rich cellular genome upon R-loop induction by DOX treatment, but not in pgR-poor cells (Fig. 3E–G of the revised manuscript).

      We agree with the reviewer that an experiment showing the concerted integration of purified PICs into a triple-stranded structure in vitro would greatly strengthen our manuscript. We attempted to purify viral DNA (vDNA)-bound PICs using either Sso7d-tagged HIV-1 integrase proteins or non-tagged HIV-1 integrase proteins (F185K/C280S) obtained from the NIH HIV Reagent Program (HRP-20203), following the method described by Passos et al., Science, 2017; 355 (89-92) (3). Despite multiple attempts, we could not purify the nucleic acid-bound protein complexes for in vitro integration assays. However, we believe that the pgR-poor and pgR-rich cell line models provide a strong advantage in the specificity of our primer readouts. Combined with our in cellulo observations, we believe that our work provides strong evidence for a causative relationship between R-loop formation/R-loop sites and HIV-1 integration.

      Additionally, in the Discussion section of the revised manuscript, we have expanded our discussion of how genomic R-loops contribute to shaping the host genomic environment for HIV-1 integration site selection, and of a potential explanation for how R-loops drive integration over long-range genomic regions. Thank you.

      (1.6) There are serious concerns with the quantitation of integration sites used here, which should be described in detail following line 503 but isn't. In Figure 3, E-G, they are apparently shown as reads per million, while in Figure 4B as “sites (%)” and in 4C as “log10 integration frequency.” Assuming the authors mean what they say, they are using the worst possible method for quantitation. Counting reads from restriction enzyme-digested, PCR-amplified DNA can only mislead. At the numbers provided (MOI 0.6, 10 µg DNA assayed) there would be about 1 million proviruses in the samples assayed, so the probability of any specific site being used more than once is very low, and even lower when one considers that a 10% assay efficiency is typical of integration site assays. Although the authors may obtain millions of reads per experiment, the number of reads per site is an irrelevant value, determined only by technical artefacts in the PCR reactions, most significantly the length of the amplicons, a function of the distance from the integration site to the nearest MstII site, further modified by differences in Tm. It is better to collapse identical reads to 1 per site, as may have been done in Figure 4B; however, the efficiency of integration site detection will still be inversely related to the length of the amplicon. Indeed, if the authors were to plot the read frequency against distance to the nearest MstII site, it is likely that they would get plots much like those in Figure 4B.

      Detailed methods for integration site sequencing data processing are described in the Materials and Methods section of the revised manuscript (Lines 621–631 of the revised manuscript). We primarily followed the HIV-1 integration site sequencing data processing methods previously described by Li et al., mBio, 2020; 11(5) (4).

      While it may be correct that HIV-1 integration cannot occur more than once at a given site, Figs. 3E, 4C, and 4D of the revised manuscript present the number of integration site sequencing reads, expressed in reads per million (RPM) or as log10-normalized values. Based on the number of mapped reads from the integration site sequencing results, we can infer that an integration event occurred at a site, whether it was a single or multiple event.

      We believe that the original y-axis label, “Integration frequency,” may be misleading, as it can be interpreted as the probability of a specific site being used for HIV-1 integration. Therefore, we changed it to “number of mapped read” for clarity (Fig. 3E–G, 4C and 4D, and the corresponding figure legends of the revised manuscript). We apologize for any confusion. Thank you.
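      The distinction discussed in this exchange, between read-based abundance (reads per million, RPM) and counting each integration site once, can be illustrated with a short sketch. It assumes each mapped read has already been reduced to a hypothetical (chromosome, position, strand) integration-site tuple; the helper names are hypothetical, and this is not the Li et al. (2020) pipeline the authors followed.

```python
from collections import Counter


def rpm_per_site(site_reads):
    """Reads per million for each integration site.

    site_reads holds one (chrom, position, strand) tuple per mapped read; each site's
    read count is divided by the total number of mapped reads and scaled to one million.
    """
    counts = Counter(site_reads)
    total = sum(counts.values())
    return {site: n * 1e6 / total for site, n in counts.items()}


def unique_sites_in_region(site_reads, chrom, start, end):
    """Collapse reads to one per site, then count distinct sites with start <= position < end."""
    return sum(1 for (c, pos, _strand) in set(site_reads) if c == chrom and start <= pos < end)


reads = [("chr1", 100, "+")] * 3 + [("chr1", 250, "-"), ("chr2", 50, "+")]
print(rpm_per_site(reads)[("chr1", 100, "+")])        # 600000.0
print(unique_sites_in_region(reads, "chr1", 0, 300))  # 2
```

      In this toy example, the chr1:100 site dominates by RPM only because three reads map to it, while the collapsed count treats it the same as the single-read site, which is the contrast the reviewer raises.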

      Other points:

      (1.7) Overall: There are numerous grammatical and usage errors, especially in agreement of subject and verb, and missing articles, sometimes multiple times in the same sentence. These must be corrected prior to resubmission.

      The revised manuscript was edited by a professional editing service. Thank you.

      (1.8) Line 126-134: A striking result, but it needs more controls, as discussed above, including a dose-response analysis.

      We determined the doses of the NVP and RAL inhibitors in HeLa cells as the minimum doses that sufficiently inhibited HIV-1 infection (Author response image 1). The primary objective of this experiment was to examine R-loop formation while reverse transcription or integration in the HIV-1 life cycle was blocked; therefore, we do not think that a dose-response analysis of the inhibitors is required.

      Author response image 1.

      (A and B) Representative flow cytometry histograms of VSV-G-pseudotyped HIV-1-EGFP-infected HeLa cells at an MOI of 1, harvested at 48 hpi. The cells were treated with DMSO, the indicated doses of nevirapine (NVP) (A) or indicated doses of raltegravir (RAL) (B) for 24 h before infection. 

      (1.9) Line 183: Please tell us what ECFP is and why it was chosen. Is there a reference for its failure to form R-loops?

      Ibid: The human AIRN gene is a very poor target for HIV integration in PBMC.

      A high GC skew value (> 0) is a predisposing factor for R-loop formation at the transcription site. This is because a high GC skew causes the newly synthesized RNA strand to hybridize to the template DNA strand, while the non-template DNA strand remains looped out in a single-stranded conformation (5) (Ref 36 in the revised manuscript). The ECFP sequence has a low GC skew value and has previously been used as a negative-control sequence that does not form R-loops (6) (Ref 17 of the revised manuscript). We have added this description and the corresponding references to Lines 188–192 of the revised manuscript.

      The human AIRN gene (RefSeq DNA sequence: NC_000006.12) sequence possesses a GC skew value of -0.04, in a window centered at base 2186, while the mouse AIRN (mAIRN) sequence is characterized by a GC skew value of 0.213. The ECFP sequence gave a GC skew value of -0.086 in our calculation. We anticipated that the human AIRN gene region does not form a stable R-loop, and in fact, it did not harbor R-loop enrichment upon HIV-1 infection in our DRIPc-seq data analysis of multiple cell types (Author response image 2).

      Author response image 2.

      Genome browser screenshot over the chromosomal regions in 20-kb windows centered on human AIRN showing results from DRIPc-seq in the indicated HIV-1-infected cells (blue, 0 hpi; yellow, 3 hpi; green, 6 hpi; red, 12 hpi).
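      The GC skew values quoted above follow the standard definition (G − C)/(G + C), computed on one DNA strand (for R-loop prediction, typically the non-template strand). A minimal sketch follows; the window and step sizes are left as parameters because the window length behind these values is not stated, and the helper names are hypothetical.

```python
def gc_skew(seq):
    """GC skew = (G - C) / (G + C) on the given strand; 0.0 if the sequence has no G or C."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def windowed_gc_skew(seq, window, step):
    """(window start, skew) pairs for consecutive windows; window and step are illustrative."""
    return [(start, gc_skew(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


print(gc_skew("GGGC"))                      # 0.5
print(windowed_gc_skew("GGCCGG", 2, 2))     # [(0, 1.0), (2, -1.0), (4, 1.0)]
```

      Positive values indicate a G excess on the analysed strand, the situation described above as favoring R-loop formation.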

      (1.10) Line 190: You haven't shown dependence. Associated is a better word.

      Thank you for the suggestion. We have changed “R-loop-dependent site-specific HIV-1 integration events...” to “R-loop-associated site-specific HIV-1 integration events...” (Line 198 of the revised manuscript) according to the reviewer’s suggestion in the revised manuscript. 

      (1.11) Line 239: What happened to P1? What is the relationship of the P and N regions to genes?

      We have added superimpositions of the P1 chromatin region on DRIPc-seq and the HIV-1 integration frequency to Figure 4C of the revised manuscript. We observed a relevant integration event within the P1 R-loop region, but to a lesser extent than in the P2 and P3 R-loop regions, perhaps because the P1 region has relatively less R-loop enrichment than the P2 and P3 regions, as examined by DRIP-qPCR in S3A Fig. of the revised manuscript.

      Genome browser screenshots annotated with the genes located in the P and N regions are shown in S2A–E Fig. of the revised manuscript, and RNA-seq analysis of the relative expression levels of genes in the P1-3 and N1,2 R-loop regions is shown in S4 Table of the revised manuscript. Thank you.

      (1.12) Line 261: But the binding affinity of integrase to the R-loop is somewhat weaker than to double-stranded DNA according to Figure 5A.

      Nucleic acid substrates were loaded at the same molarity. The percentage of the unbound fraction was calculated by dividing the intensity of the unbound band in each lane by the intensity of the unbound band in the lane with 0 nM integrase. The calculated percentages of the unbound fraction from three independent replicate experiments are shown in the right panel of Fig. 5A of the revised manuscript. In our measurements, the integrase proteins showed higher binding affinity for the R-loop and for R-loop-containing nucleic acid structures than for dsDNA in vitro. We hope that this explanation clarifies this point.
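
      The normalization described above amounts to a short calculation. The sketch below assumes that band intensities have already been quantified by densitometry and that the 0 nM integrase lane is the first lane on each gel. The function names and example values are hypothetical.

```python
from statistics import mean, stdev


def percent_unbound(unbound_intensities, zero_index=0):
    """Normalize unbound-band intensities to the 0 nM integrase lane.

    unbound_intensities: densitometry values for the unbound band in each
    lane of one gel, ordered by integrase concentration.
    zero_index: position of the 0 nM integrase lane (assumed first).
    """
    reference = unbound_intensities[zero_index]
    if reference <= 0:
        raise ValueError("0 nM reference lane must have a positive signal")
    return [100.0 * x / reference for x in unbound_intensities]


def summarize_replicates(replicates):
    """Mean and sample SD of percent unbound per lane across replicate gels.

    Requires at least two replicate gels with the same lane order.
    """
    per_gel = [percent_unbound(r) for r in replicates]
    lanes = list(zip(*per_gel))  # regroup values lane by lane
    return [(mean(v), stdev(v)) for v in lanes]


# Hypothetical intensities for one gel: 0 nM, then increasing integrase.
print(percent_unbound([2000.0, 1500.0, 500.0]))  # [100.0, 75.0, 25.0]
```

      Because each gel is normalized to its own 0 nM lane before averaging, loading and exposure differences between gels cancel out of the replicate summary.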

      (1.13) Line 337: "accumulate". This is a not uncommon misinterpretation of the results of studies on the distribution of intact proviruses in elite controllers. The only possible correct interpretation of the finding is that proviruses form everywhere else but cells containing them are eliminated, most likely by the immune system.

      Thank you for the suggestion. We have changed Line 337 of the original manuscript to “... HIV-1 proviruses in heterochromatic regions are not eliminated but selected by immune system,” in Lines 361–363 of the revised manuscript.

      (1.14) Line 371 How many virus particles per cell does this inoculum amount to?

      We determined the amount of GFP reporter virus required to transduce ∼50% of WT Jurkat T cells, which corresponds to an approximate MOI of 0.6. For HIV-1 integration site sequencing library construction, we consistently obtained 30–50% HIV-1-EGFP-positive Jurkat T cells after infection with VSV-G-pseudotyped virus.
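
      As a rough consistency check on the MOI quoted above, a simple Poisson model of infection gives the infected fraction as 1 - e^(-MOI). Under the same model, the mean number of infection events per infected cell is MOI / (1 - e^(-MOI)). This model is an assumption for illustration only; it is not the method the authors used to titrate the virus.

```python
import math


def fraction_infected(moi: float) -> float:
    """P(at least one infection event) under a Poisson model: 1 - exp(-MOI)."""
    return 1.0 - math.exp(-moi)


def mean_events_per_infected_cell(moi: float) -> float:
    """E[k | k >= 1] for k ~ Poisson(MOI), i.e. MOI / (1 - exp(-MOI)).

    Undefined at MOI = 0 (no infected cells).
    """
    return moi / fraction_infected(moi)


def moi_for_fraction(fraction: float) -> float:
    """Invert the Poisson model: the MOI giving the requested infected fraction."""
    return -math.log(1.0 - fraction)


print(round(fraction_infected(0.6), 3))              # 0.451
print(round(mean_events_per_infected_cell(0.6), 2))  # 1.33
print(round(moi_for_fraction(0.5), 2))               # 0.69
```

      Under this model, an MOI of 0.6 predicts about 45% infected cells, close to the ∼50% stated above, with about 1.33 infection events per infected cell on average. A 50% infected fraction corresponds to an MOI of ln 2 ≈ 0.69.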

      (1.15) Line 503 and Figures 3 and 4: There must be a clear description of how integration events are quantitated.

      Detailed methods for integration site sequencing data processing are described in the Materials and Methods section of the revised manuscript (Lines 621–631). We primarily followed the HIV-1 integration site sequencing data processing methods previously described in Li et al., mBio, 2020; 11(5) (4).

      Reviewer #2 (Public Review):

      Retroviral integration in general, and HIV integration in particular, takes place in dsDNA, not in R-loops. Although HIV integration can occur in vitro on naked dsDNA, there is good evidence that, in an infected cell, integration occurs on DNA that is associated with nucleosomes. This review will be presented in two parts. First, a summary will be provided giving some of the reasons to be confident that integration occurs on dsDNA on nucleosomes. The second part will point out some of the obvious problems with the experimental data that are presented in the manuscript.

      We appreciate your comments. We have carefully addressed the concerns expressed as follows (your comments are in italics):  

      (2.1) The 2017 Dos Passos et al. Science paper describes the structure of the HIV intasome. The structure makes it clear that the target for integration is dsDNA, not an R-loop, and there are very good reasons to think that structure is physiologically relevant. For example, there is data from the Cherepanov, Engelman, and Lyumkis labs to show that the HIV intasome is quite similar in its overall structure and organization to the structures of the intasomes of other retroviruses. Importantly, these structures explain the way integration creates a small duplication of the host sequences at the integration site. How do the authors propose that an R-loop can replace the dsDNA that was seen in these intasome structures?

      We appreciate the current understanding of the HIV-1 integration site selection mechanism and the known structure of the dsDNA-bound intasome. Our study proposes the R-loop as an additional contributor to HIV-1 integration site selection. Our work was motivated by recent studies that offer new perspectives on HIV-1 integration site targeting. For instance, Ajoge et al., 2022 (7) indicated that a guanine-quadruplex (G4) structure formed in the non-template DNA strand of the R-loop influences HIV-1 integration site targeting. Additionally, I. K. Jozwik et al., 2022 (8) reported a structure of a retroviral integrase bound to target DNA undergoing a B-to-A transition. R-loops are a prevalent class of non-B DNA structures (9). In the Discussion section of our manuscript, we acknowledge the current understanding of HIV-1 integration site selection and discuss how R-loops may add to it.

      Based primarily on our current data, we believe that HIV-1 integrase proteins bind R-loop structures and promote integration of the HIV-1 genome in the vicinity of host genomic R-loops. However, we do not claim that R-loops completely replace dsDNA as the target for HIV-1 integration. An R-loop can span multiple kilobases, and R-loop peak length varies widely depending on the immunoprecipitation and library construction methods (1, 2). We therefore could not determine whether the center of a triple-stranded R-loop is the exact site of HIV-1 integration, where integrase carries out the strand transfer reaction. Some phrases in the original manuscript, such as “HIV-1 targets R-loops for integration,” may overstate our finding on the role of R-loops in HIV-1 integration site selection. In the revised manuscript, we replaced them with phrases such as “HIV-1 favors vicinity regions of R-loop for the viral genome integration.” We apologize for the lack of clarity in our original description. Nevertheless, we believe that we validated R-loop-mediated HIV-1 integration in R-loop-forming regions using our pgR-poor and pgR-rich cell line models. We quantified site-specific integration events in the R-loop regions. Upon R-loop induction by DOX treatment, more integration events occurred in the designated regions of the pgR-rich cellular genome, but not in pgR-poor cells (Fig. 3E–G of the revised manuscript).

      dsDNA may have been the only intasome target demonstrated in vitro because only dsDNA has been considered as a substrate for in vitro intasome assembly. We hope that our work will prompt future investigations of target-bound intasome structures that consider R-loops as potential new targets for integrase proteins and intasomes.

      (2.2) As noted above, concerted (two-ended) integration can occur in vitro on a naked dsDNA substrate. However, there is compelling evidence that, in cells, integration preferentially occurs on nucleosomes. Nucleosomes are not found in R loops. In an infected cell, the viral RNA genome of HIV is converted into DNA within the capsid/core which transits the nuclear pore before reverse transcription has been completed. Integration requires the uncoating of the capsid/core, which is linked to the completion of viral DNA synthesis in the nucleus. Two host factors are known to strongly influence integration site selection, CPSF6 and LEDGF. CPSF6 is involved in helping the capsid/core transit the nuclear pore and associate with nuclear speckles. LEDGF is involved in helping the preintegration complex (PIC) find an integration site after it has been released from the capsid/core, most commonly in the bodies of highly expressed genes. In the absence of an interaction of CPSF6 with the core, integration occurs primarily in the lamin-associated domains (LADs). Genes in LADs are usually not expressed or are expressed at low levels. Depending on the cell type, integration in the absence of CPSF6 can be less efficient than normal integration, but that could well be due to a lack of LEDGF (which is associated with expressed genes) in the LADs. In the absence of an interaction of IN with LEDGF (and in cells with low levels of HRP2) integration is less efficient and the obvious preference for integration in highly expressed genes is reduced. Importantly, LEDGF is known to bind histone marks, and will therefore be preferentially associated with nucleosomes, not R-loops. LEDGF fusions, in which the chromatin binding portion of the protein is replaced, can be used to redirect where HIV integrates, and that technique has been used to map the locations of proteins on chromatin. 

      Importantly, LEDGF fusions in which the chromatin binding component of LEDGF is replaced with a module that recognizes specific histone marks direct integration to those marks, confirming integration occurs efficiently on nucleosomes in cells. It is worth noting that it is possible to redirect integration to portions of the host genome that are poorly expressed, which, when taken with the data on integration into LADs (integration in the absence of a CPSF6 interaction) shows that there are circumstances in which there is reasonably efficient integration of HIV DNA in portions of the genome in which there are few if any R-loops.

      Although R-loops may not wrap around nucleosomes, long and stable R-loops likely cover stretches of DNA corresponding to multiple nucleosomes (10). Moreover, R-loops are associated with high levels of histone marks, such as H3K36me3, which LEDGF recognizes (2, 11). R-loops also dynamically regulate chromatin architecture. Possibly by altering nucleosome occupancy, positioning, or turnover, R-loop structures relieve superhelical stress, and they are often associated with open chromatin marks and active enhancers (2, 10). These features are also found at HIV-1 integration sites (12). In the Discussion section of the revised manuscript, we discuss how R-loops may shape the host genomic environment for HIV-1 integration site selection. We also discuss how R-loops may act together with LEDGF/p75 and CPSF6 in governing integration site selection.

      In revising our original manuscript in light of the reviewer's comment, we recognized the need to tone down our statements. The title, abstract, and discussion of our original manuscript included phrases such as “HIV-1 targets R-loops for integration,” which may overstate our finding on the role of R-loops in HIV-1 integration site selection. We replaced these phrases. For example, the revised manuscript now uses phrases such as “HIV-1 favors vicinity regions of R-loop for the viral genome integration.” We apologize for the lack of clarity in our original description.

      (2.3) Given that HIV DNA is known to preferentially integrate into expressed genes and that R-loops must necessarily involve expressed RNA, it is not surprising that there is a correlation between HIV integration and regions of the genome to which R loops have been mapped. However, it is important to remember that correlation does not necessarily imply causation.

      We understand the reviewer's concern regarding the possibility of a coincidental correlation between the R-loop regions and HIV-1 integration sites, particularly when the interpretation of this correlation is primarily based on a global analysis. 

      Therefore, we designed pgR-poor and pgR-rich cell lines, which we believe are suitable models for distinguishing integration events driven by transcription from those driven by the presence of R-loops. Upon DOX treatment, TRE promoter activation produced comparable levels of transcription at the designated region in the two cell lines (Fig. 3B of the revised manuscript). However, only pgR-rich cells formed R-loops at the designated regions (Fig. 3C of the revised manuscript). When infected with HIV-1, pgR-rich cells, but not pgR-poor cells, showed higher infectivity after DOX treatment (Fig. 3D of the revised manuscript). Moreover, we quantified site-specific integration events in the R-loop regions. Upon R-loop induction by DOX treatment, more integration events occurred in the designated regions of the pgR-rich cellular genome, but not in pgR-poor cells (Fig. 3E of the revised manuscript). We therefore concluded that transcriptional activation without an R-loop (in pgR-poor cells) may not be sufficient to drive HIV-1 integration. We believe that our work provides strong evidence for a causal relationship between R-loop formation/R-loop sites and HIV-1 integration. We hope that our explanation addresses your concerns. Thank you.

      If we consider some of the problems in the experiments that are described in the manuscript:

      (2.4) In an infected individual, cells are almost always infected by a single virion and the infecting virion is not accompanied by large numbers of damaged or defective virions. This is a key consideration: the claim that infection by HIV affects R-loop formation in cells was done with a VSVg vector in experiments in which there appears to have been about 6000 virions per cell. Although most of the virions prepared in vitro are defective in some way, that does not mean that a large fraction of the defective virions cannot fuse with cells. In normal in vivo infections, HIV has evolved in ways that avoid signaling its presence to the infected cell. To cite an example, carrying out reverse transcription in the capsid/core prevents the host cell from detecting (free) viral DNA in the cytoplasm. The fact that the large effect on R-loop formation which the authors report still occurs in infections done in the absence of reverse transcription strengthens the probability that the effects are due to the massive amounts of virions present, and perhaps to the presence of VSVg, which is quite toxic. To have physiological relevance, the infections would need to be carried out with virions that contain HIV even under circumstances in which there is at most one virion per cell.

      Our virus production and our in vitro and ex vivo HIV-1 infection conditions were designed for infecting cell types such as HeLa cells and primary CD4+ T cells with VSV-G-pseudotyped HIV-1, and they were based on a comprehensive review of numerous references. At the very beginning of this study, we tested whether host genomic R-loop induction was specific to HIV-1. We used empty virion particles (virus-like particles, VLP) or other types of viruses (non-retrovirus, SeV; retroviruses, FMLV and FIV), all pseudotyped with the VSV-G protein. We could not include a control omitting the VSV-G protein or using the natural HIV-1 envelope protein to prevent viral spread in culture. Although all virus stocks were prepared using VSV-G, only cells infected with HIV-1 showed R-loop signal enrichment (Author response image 3). Therefore, we omitted the VSV-G control in subsequent analyses, such as DRIPc-seq. We have also revised our manuscript to describe the experimental conditions more clearly. In particular, the abstract, results, and discussion sections of the revised manuscript now state clearly that we used VSV-G-pseudotyped HIV-1. Thank you.

      Author response image 3.

      (A) Dot blot analysis of the R-loop in gDNA extracts from HIV-1 infected U2OS cells with MOI of 0.6 harvested at 6 hpi. The gDNA extracts were incubated with or without RNase H in vitro before membrane loading (anti-S9.6 signal). (B) Dot blot analysis of the R-loop in gDNA extracts from HeLa cells infected with 0.3 MOI of indicated viruses. The infected cells were harvested at 6 hpi. The gDNA extracts were incubated with or without RNase H in vitro before membrane loading (anti-S9.6 signal).

      HIV-1 co-infection may also be expected in cell-free HIV-1 infections. However, a previous mathematical model estimating the frequency of multiple infections with the same virus suggested that the average number of infection events ranges from 1.02 to 1.65 (Figure 4c of Ito et al., Sci. Rep, 2017; 6559) (13).

      (2.5) Using the Sso7d version of HIV IN in the in vitro binding assays raises some questions, but that is not the real question/problem. The real problem is that the important question is not what/how HIV IN protein binds to, but where/how an intasome binds. An intasome is formed from a combination of IN bound to the ends of viral DNA. In the absence of viral DNA ends, IN does not have the same structure/organization as it has in an intasome. Moreover, HIV IN (even Sso7d, which was modified to improve its behavior) is notoriously sticky and hard to work with. If viral DNA had been included in the experiment, intasomes would need to be prepared and purified for a proper binding experiment. To make matters worse, there are multiple forms of multimeric HIV IN and it is not clear how many HIV INs are present in the PICs that actually carry out integration in an infected cell.

      As the reviewer has noted, HIV IN is difficult to work with, even with Sso7d tagging. We attempted to purify viral DNA (vDNA)-bound PICs following the method described by Passos et al., Science, 2017; 355 (89-92) (3). For this, we used either Sso7d-tagged HIV-1 integrase proteins or non-tagged HIV-1 integrase proteins (F185K/C280S) procured from the NIH HIV Reagent Program (HRP-20203). Despite multiple attempts, we were unable to purify vDNA-bound IN protein complexes for in vitro assays. However, we believe that multiple biochemical experiments demonstrate the interaction between cellular R-loops and HIV-1 integrase proteins both in cells and in vitro (Fig. 5A–F of the revised manuscript). We also observed a close association between integrase proteins and host cellular R-loops in HIV-1-infected cells. For this, we used a fluorescent recombinant virus (HIV-IN-EGFP) with intact IN-EGFP PICs (Fig. 5G of the revised manuscript).

      (2.6) As an extension of comment 2, the proper association of an HIV intasome/PIC with the host genome requires LEDGF and the appropriate nucleic acid targets need to be chromatinized.

      We used reciprocal immunoprecipitation assays to examine the interaction between cellular R-loops and HIV-1 integrase proteins in HeLa cells, which endogenously express LEDGF/p75 (Fig. 5C–F, S6B Fig., and S6D Fig. of the revised manuscript). In addition, we used PLA to observe a close association between host cellular R-loops and HIV-1 integrase proteins in HIV-1-infected HeLa cells, as discussed in more detail in our response to comment 2.8.

      (2.7) Expressing any form of IN, by itself, in cells to look for what IN associates with is not a valid experiment. A major factor that helps to determine both where integration takes place and the sites chosen for integration is the transport of the viral DNA and IN into the nucleus in the capsid core. However, even if we ignore that important part of the problem, the IN that the authors expressed in HeLa cells won't be bound to the viral DNA ends (see comment 2), even if the fusion protein would be able to form an intasome. As such, the IN that is expressed free in cells will not form a proper intasome/PIC and cannot be expected to bind where/how an intasome/PIC would bind.

      As discussed in more detail in our response to comment 2.8, we believe that our PLA experiment demonstrated a close association between host cellular R-loops and HIV-1 integrase proteins in HIV-1-infected cells. This experiment used the pVpr-IN-EGFP virus, for which virion integrity and intact IN-EGFP PICs have been examined previously (14).

      (2.8) As in comment 1, for the PLA experiments presented in Figure 5 to work, the number of virions used per cell (which differs from the MOI measured by the number of cells that express a viral marker) must have been high, which is likely to have affected the cells and the results of the experiment. However, there is the additional question of whether the IN-GFP fusion is functional. The fact that the functional intasome is a complex multimer suggests that this could be a problem. There is an additional problem, even if IN-GFP is fully functional. During a normal infection, the capsid core will have delivered copies of IN (and, in the experiments reported here, the IN-GFP fusion) into the nucleus that is not part of the intasome. These "free" copies of IN (here IN-GFP) are not likely to go to the same sites as an intasome, making this experiment problematic (comment 4).

      The HIV-IN-EGFP virus stock was produced by polyethylenimine-mediated transfection of HEK293T cells with 6 µg of pVpr-IN-EGFP, 6 µg of the HIV-1 NL4-3 noninfectious molecular clone (pD64E; NIH AIDS Reagent Program 10180), and 1 µg of pVSV-G. This procedure was previously described (14) and is described in the Materials and Methods section of our manuscript. The Anna Cereseto group provided the pVpr-IN-EGFP vector used to produce the HIV-1-IN-EGFP virus stock (Albanese et al., PLOS ONE, 2008; 6(6); Ref 34 of the revised manuscript). HIV-1-IN-EGFP virions produced by IN-EGFP trans-incorporation through Vpr were previously reported to be intact, infectious viral particles (Figure 1 of Albanese et al., PLOS ONE, 2008; 6(6)). Therefore, we believe that the HIV-IN-EGFP used in our PLA experiments was functional.

      Additionally, Albanese et al. showed that the EGFP signal of HIV-IN-EGFP virions colocalizes with the viral matrix (p17MA) and capsid (p24CA) proteins. By labeling and visualizing newly synthesized cDNA produced by reverse transcriptase, they also showed that the EGFP signal colocalizes with this cDNA (14). In addition, the fluorescent recombinant virus (HIV-IN-EGFP) was structurally intact at the nuclear level (Figure 6 of Albanese et al., PLOS ONE, 2008; 6(6)). Given the integrity of both the HIV-IN-EGFP virions and the IN-EGFP PICs, we believe that our PLA results are unlikely to be misleading in the way the reviewer is concerned about.

      Furthermore, we carefully determined the in vitro HIV-1 infection conditions for our PLA experiments based on multiple studies that performed image-based assays on HIV-1-infected cells. For instance, Albanese et al. infected 4 × 10⁴ cells with viral loads equivalent to 1.5 or 3 µg of HIV-1 p24 for their immunofluorescence analysis (14). We titrated the fluorescent HIV-1 virus stocks by determining both the multiplicity of infection (MOI) and the HIV-1 p24 antigen content (Author response image 4). For our PLA experiments, we infected 5 × 10⁴ HeLa cells with viral loads equivalent to 1.3 µg of HIV-1 p24. By our calculation, this corresponds to the MOI of 2 indicated in Fig. 5G of our manuscript.
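
      To make the comparison above explicit, the p24 input can be normalized per cell. This sketch only restates the numbers given above: 1.5 or 3 µg per 4 × 10⁴ cells, versus 1.3 µg per 5 × 10⁴ cells. Note that p24 mass is not a particle count. Converting it to virions per cell would require an assumed p24 content per virion, which is not given here.

```python
PG_PER_UG = 1_000_000  # picograms per microgram


def p24_pg_per_cell(p24_ug: float, n_cells: float) -> float:
    """p24 antigen input per cell, in picograms."""
    return p24_ug * PG_PER_UG / n_cells


# (p24 in µg, number of cells infected), taken from the text above.
conditions = {
    "Albanese et al., low": (1.5, 4e4),
    "Albanese et al., high": (3.0, 4e4),
    "this study, PLA (Fig. 5G)": (1.3, 5e4),
}
for label, (ug, cells) in conditions.items():
    print(f"{label}: {p24_pg_per_cell(ug, cells):.1f} pg p24/cell")
```

      This prints 37.5, 75.0, and 26.0 pg p24 per cell, respectively. By this measure, the PLA input is lower per cell than either condition used by Albanese et al.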

      Image-based assays often require enhanced signal for statistical robustness. For example, Achuthan et al. infected cells with VSV-G-pseudotyped HIV-1 at an approximate MOI of 350 for vDNA and PIC visualization (15). We carefully designed our PLA conditions based on these frequently cited previous studies, and we therefore believe that they are reasonable. We hope that this discussion sufficiently addresses the reviewer’s concern.

      Author response image 4.

      Gating strategy used to determine HIV-1 infectivity in HeLa cells at 48 hpi. Cells were infected with VSV-G-pseudotyped HIV-1-EGFP virus stock of known p24 antigen content. The percentages of GFP-positive cells are indicated.

      (2.9) In the Introduction, the authors state that the site of integration affects the probability that the resulting provirus will be expressed. Although this idea is widely believed in the field, the actual data supporting it are, at best, weak. See, for example, the data from the Bushman lab showing that the distribution of integration sites is the same in cells in which the integrated proviruses are, and are not, expressed. However, given what the authors claim in the introduction, they should be more careful in interpreting enzyme expression levels (luciferase) as a measure of integration efficiency in experiments in which they claim proviruses are integrated in different places.

      We thank the reviewer for the constructive comment. We have changed the statement in Lines 41–42 in the Introduction section of our original manuscript to “The chromosomal landscape of HIV-1 integration influences proviral gene expression, persistence of integrated proviruses, and prognosis of antiretroviral therapy.” (Lines 39–41 of the revised manuscript). We believe that this change tones down the implied link between the site of integration and the provirus expression level.

      The piggyBac transposase randomly inserts its “cargo” (transposon) into TTAA chromosomal sites of the target genome, generating efficient insertions at different genomic loci (16, 17). We believe that this random, piggyBac-mediated insertion of the pgR-poor/rich vectors prevents genomic locus bias in vector insertion from confounding our analysis of R-loop-mediated HIV-1 integration sites. Accordingly, Figure 3 in our manuscript does not claim any relationship between the site of integration and the resulting provirus expression level. Instead, as noted in Line 214 of the revised manuscript, we used the luciferase reporter HIV-1 virus to examine HIV-1 infection in cells with an “extra number of R-loops” in the host cellular genome. Upon DOX treatment, pgR-rich cells showed higher luciferase activity than pgR-poor cells (Fig. 3D of the revised manuscript). We believe that this is because more HIV-1 integration events may occur in pgR-rich cells, where DOX-inducible de novo R-loop regions are introduced. We examined this further in Fig. 3E–G of the revised manuscript. We hope this explanation clarifies Figure 3. Thank you.

      (2.10) Using restriction enzymes to create an integration site library introduces biases that derive from the uneven distribution of the recognition sites for the restriction enzymes.

      As described in the Materials and Methods section, we constructed sequencing libraries using a previously established protocol (18, 19). We recognize the advantages of DNA fragmentation by sonication. However, in in vitro or ex vivo HIV-1 infection settings, the multiplicity of infection is carefully determined based on multiple references, and more copies of integrated viral sequences are expected than in samples from infected patients (18). In these settings, restriction enzyme-based DNA fragmentation followed by ligation-mediated PCR and sequencing is a well-established method that has provided substantial data for HIV-1 integration site sequencing (15, 20-22). Furthermore, our data showing the proportion of integration sites over R-loop regions (Fig. 4B of the revised manuscript) are presented alongside the respective random controls:

      - the proportion of integration sites within 30-kb windows centered on randomized DRIPc-seq peaks (gray dotted lines);
      - randomized integration sites with DRIPc-seq peaks (black dotted lines); and
      - randomized integration sites with randomized DRIPc-seq peaks (gray solid lines).

      None of these controls shows such a correlation between HIV-1 integration sites and the areas near R-loop regions. Therefore, we believe that our results from the integration site sequencing data analysis are unlikely to be biased.
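
      The random-control comparison described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It counts integration sites within 30-kb windows centered on peak centers and compares the observed fraction with a null built by placing peaks uniformly at random on the same chromosome. The uniform null is an assumption: it ignores mappability, assembly gaps, and the genomic distribution of restriction sites. A stricter control for the restriction-site bias raised by the reviewer would need to match all of these. The function names and example positions are hypothetical.

```python
import bisect
import random
from typing import Dict, List


def fraction_near_peaks(sites: Dict[str, List[int]],
                        peak_centers: Dict[str, List[int]],
                        window: int = 30_000) -> float:
    """Fraction of integration sites within a window centered on any peak center.

    sites, peak_centers: chromosome -> list of 0-based positions.
    A site counts if |site - center| <= window // 2 for some peak center
    on the same chromosome.
    """
    half = window // 2
    total = hits = 0
    for chrom, positions in sites.items():
        centers = sorted(peak_centers.get(chrom, []))
        for pos in positions:
            total += 1
            # First center >= pos - half; a hit if it also lies <= pos + half.
            i = bisect.bisect_left(centers, pos - half)
            if i < len(centers) and centers[i] <= pos + half:
                hits += 1
    return hits / total if total else 0.0


def randomize_positions(positions: Dict[str, List[int]],
                        chrom_sizes: Dict[str, int],
                        rng: random.Random) -> Dict[str, List[int]]:
    """Uniformly reposition each feature on its own chromosome (simple null)."""
    return {chrom: [rng.randrange(chrom_sizes[chrom]) for _ in pos]
            for chrom, pos in positions.items()}


if __name__ == "__main__":
    rng = random.Random(1)
    sizes = {"chr1": 1_000_000}
    sites = {"chr1": [10_500, 11_000, 400_000]}
    peaks = {"chr1": [10_000, 700_000]}
    observed = fraction_near_peaks(sites, peaks)
    null = [fraction_near_peaks(sites, randomize_positions(peaks, sizes, rng))
            for _ in range(1000)]
    print("observed:", observed, "null mean:", sum(null) / len(null))
```

      The same functions can randomize integration sites instead of peaks, or both, which mirrors the three kinds of control comparison listed above.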

      Reviewer #3 (Public Review):

      In this manuscript, Park and colleagues describe a series of experiments that investigate the role of R-loops in HIV-1 genome integration. The authors show that during HIV-1 infection, R-loop levels on the host genome accumulate. Using a synthetic R-loop prone gene construct, they show that HIV-1 integration sites target sites with high R-loop levels. They further show that integration sites on the endogenous host genome are correlated with sites prone to R-loops. Using biochemical approaches, as well as in vivo co-IP and proximity ligation experiments, the authors show that HIV-1 integrase physically interacts with R-loop structures.

      My primary concern with the paper is with the interpretations the authors make about their genome-wide analyses. I think that including some additional analyses of the genome-wide data, as well as some textual changes can help make these interpretations more congruent with what the data demonstrate. Here are a few specific comments and questions:

      We are grateful for the time and effort the reviewer spent on our behalf. We also thank the reviewer for appreciating the novelty of our work, in particular R-loop induction by HIV-1 infection and the correlation between host R-loops and the genomic sites of HIV-1 integration. In the following sections, we provide our responses to your comments and suggestions. Your comments are in italics. We have carefully addressed the following issues.

      (3.1) I think Figure 1 makes a good case for the conclusion that R-loops are more easily detected in HIV-1-infected cells by multiple approaches (all using the S9.6 antibody). The authors show that their signals are RNase H sensitive, which is a critical control. For the DRIPc-Seq, I think including an analysis of biological replicates would greatly strengthen the manuscript. The authors state in the methods that the DRIPc pulldown experiments were done in biological replicates for each condition. Are the increases in DRIPc peaks similar across biological replicates? Are genomic locations of HIV-1-dependent peaks similar across biological replicates? Measuring and reporting the biological variation between replicate experiments is crucial for making conclusions about increases in R-loop peak frequency. This is partially alleviated by the locus-specific data in Figure S3A. However, a better understanding of how the genome-wide data varies across biological replicates will greatly enhance the quality of Figure 1.

      DRIPc-seq experiments were conducted with two biological replicates. To define consensus DRIPc-seq peaks using these two replicates, we used two methods applicable to ChIP-seq analysis: the irreproducible discovery rate (IDR) method and sequencing data pooling. We found that the sequencing data pooling method yielded significantly more DRIPc-seq peaks than consensus peak identification through IDR, and we decided to utilize R-loop peaks from pooled sequencing data for our downstream analyses, as described in the figure legends and Materials and Methods of the revised manuscript. 

      As the reviewer noted, it is important to verify that the increase in the number of R-loop peaks and the genomic locations of HIV-1-dependent R-loops were consistent across the two biological replicates. Therefore, we performed R-loop peak calling independently on each replicate of the sequencing data from primary CD4+ T cells of two individual donors, and we verified that the increase in R-loop numbers was consistent (Author response image 5). Additionally, the overlap of R-loop peaks between the two replicates was statistically significant across the genome (Author response table 1). Thank you.
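
      The response reports a chi-square p-value for replicate overlap (Author response table 1) but does not specify the test layout. One plausible setup, assumed here for illustration, divides the genome into bins and classifies each bin by whether it carries a peak in replicate A only, replicate B only, both, or neither. A Pearson chi-square test of independence on that 2 × 2 table then asks whether the two replicates overlap more than expected by chance. The p-value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal, so P(X > x) = erfc(sqrt(x/2)). The function name and bin counts are hypothetical.

```python
import math


def overlap_chi_square(a_only: int, b_only: int, both: int, neither: int):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 table
    of genomic bins classified by peak presence in replicates A and B.

    Rows: A present / A absent. Columns: B present / B absent.
    Returns (chi2, p_value).
    """
    table = [[both, a_only], [b_only, neither]]
    n = both + a_only + b_only + neither
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected == 0:
                raise ValueError("an expected count is zero; test undefined")
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Hypothetical bin counts: strong agreement between replicates.
chi2, p = overlap_chi_square(a_only=10, b_only=10, both=90, neither=890)
print(round(chi2, 1), p < 1e-10)  # 790.1 True
```

      For real genome-wide data, scipy.stats.chi2_contingency offers the same test with an optional continuity correction. Because peak calls in adjacent bins are not independent, permutation-based overlap tests are often preferred.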

      Author response image 5.

      Bar graph indicating DRIPc-seq peak counts for HIV-1-infected primary CD4+ T cells harvested at the indicated hours post infection (hpi). Pre-immunoprecipitated samples were untreated (−) or treated (+) with RNase H, as indicated. Each dot corresponds to an individual data set from two biologically independent experiments.

      Author response table 1.

      DRIPc-seq peak length and chi-square p-value in CD4+ T cells from individual donors 1 and 2

      (3.2) I think that the conclusion that R-loops "accumulate" in infected cells is acceptable, given the data presented. However, in line 134 the authors state that "HIV-1 infection induced host genomic R-loop formation". I suggest being very specific about the observation. Accumulation can happen by (a) inducing a higher frequency of the occurrence of individual R-loops and/or (b) stabilizing existing R-loops. I'm not convinced the authors present enough evidence to claim one over the other. It is altogether possible that HIV-1 infection stabilizes R-loops such that they are more persistent (perhaps by interactions with integrase?), and therefore more easily detected. I think rephrasing the conclusions to include this possibility would alleviate my concerns.

      We thank the reviewer for this thoughtful discussion of our manuscript. Following the reviewer's suggestion, we have changed Line 134 to “HIV-1 infection induces host genomic R-loop enrichment” (Lines 132–133 of the revised manuscript) and added a concluding sentence that suggests a possible explanation for the enrichment of the R-loop signal upon HIV-1 infection (Lines 133–135 of the revised manuscript).

      (3.3) A technical problem with using the S9.6 antibody for the detection of R-loops via microscopy is that it cross-reacts with double-stranded RNA. This has been addressed by the work of Chedin and colleagues (as well as others). It is absolutely essential to treat these samples with an RNA:RNA hybrid-specific RNase, which the authors did not include, as far as their methods section states. Therefore, it is difficult to interpret all of the immunofluorescence experiments that depend on S9.6 binding.

      We understand the reviewer's concern regarding the cross-reactivity of the S9.6 antibody with the more abundant dsRNA, particularly in imaging applications. We carefully designed the experimental and analytical methods for R-loop detection by microscopy. For example, we pre-extracted the cytoplasmic fraction before staining with the S9.6 antibody, and we quantified the R-loop signal by subtracting the nucleolar signal. Both steps were taken to avoid misattributing the prominent cytoplasmic and nucleolar S9.6 signals, which originate primarily from ribosomal RNA, to R-loops. In addition, we included R-loop-negative control samples in our microscopy analysis that were subjected to extensive RNase H treatment (60 U/mL RNase H for 36 h), and we observed a significant reduction in the S9.6 signal (Figure 1E of the revised manuscript). RNase H-treated samples are essential and widely accepted negative controls for R-loop detection.
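      The nucleolar-subtraction step can be sketched as below. This is only an illustration of the idea, not the authors' analysis pipeline. It assumes that the per-cell signal is the integrated S9.6 intensity over a nuclear mask minus the integrated intensity over a nucleolar mask lying inside it. Whether sums or means were used, and how the masks were segmented, are assumptions. The function name is hypothetical.

```python
def nuclear_rloop_signal(image, nucleus_mask, nucleolus_mask):
    """Integrated S9.6 intensity in the nucleus minus that in its nucleoli.

    Hypothetical helper: `image` is a 2D list of pixel intensities, and the two
    masks are 2D lists of booleans of the same shape. Nucleolar pixels are only
    counted when they also fall inside the nuclear mask.
    """
    nuclear = 0.0
    nucleolar = 0.0
    for img_row, nuc_row, nol_row in zip(image, nucleus_mask, nucleolus_mask):
        for value, in_nucleus, in_nucleolus in zip(img_row, nuc_row, nol_row):
            if in_nucleus:
                nuclear += value
                if in_nucleolus:
                    nucleolar += value
    # Removing the nucleolar contribution discards the rRNA-dominated S9.6 signal.
    return nuclear - nucleolar
```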

      We would like to point out that recent studies have reported a strong intrinsic specificity of the S9.6 antibody for DNA:RNA hybrid duplexes over dsDNA and dsRNA, along with structural elucidation of how S9.6 recognizes hybrids (23, 24). Therefore, our interpretation of host cellular R-loop enrichment after HIV-1 infection, which is based on S9.6 antibodies in multiple biochemical approaches, is well supported. Nevertheless, we agree with the reviewer that additional negative controls for microscopy-based R-loop detection, such as RNase T1- and RNase III-treated samples, could improve the robustness and accuracy of the R-loop imaging data (25).

      (3.4) Given that there is no clear correlation between expression levels and R-loop peak detection, combined with the data that show increased detection of R-loop frequency in non-genic regions, I think it will be important to show that the R-loop forming regions are indeed transcribed above background levels. This will help alleviate possible concerns that there are technical errors in R-loop peak detection.

      Figures S5D and S5E of the revised manuscript show the relative expression levels of the R-loop-forming positive regions (P1–3) and of the reference R-loop-positive loci (RPL13A and CALM3). The expression levels of these R-loop-forming regions were significantly higher than those of the ECFP or mAIRN genes without DOX treatment, which can be considered background levels of transcription in these cells. Thank you.

      (3.5) In Figures 4C and D the hashed lines are not defined. It is also interesting that the integration sites do not line up with R-loop peaks. This does not necessarily directly refute the conclusions (especially given the scale of the genomic region displayed), but should be addressed in the manuscript. Additionally, it would greatly improve Figure 4 to have some idea about the biological variation across replicates of the data presented 4A.

      We thank the reviewer for this thoughtful comment on our study. First, we added an annotation for the dashed lines to the legends of Figures 4C and 4D in the revised manuscript.

      We agree with the reviewer's interpretation of the relationship between the integration sites and the R-loop peaks. Based primarily on our current data, we believe that R-loop structures are bound by HIV-1 integrase proteins and direct integration of the HIV-1 genome into regions in the “vicinity” of host genomic R-loops. We displayed a large genomic region (30-kb windows) to present integration sites surrounding R-loop centers because an R-loop can span multiple kilobases (1, 2). Depending on the immunoprecipitation and library construction methods, R-loop peaks vary in size, and peak length shows a wide distribution (Figure 3B of Malig et al., 2020; Figure 1B of Sanz et al., 2016; and Figure 2A of the revised manuscript). Therefore, presenting integration site events within a wide window around R-loop peaks may be more informative and better reflect the current understanding of R-loop biology.

      R-loop formation recruits diverse chromatin-binding protein factors, such as p300, CTCF, RAD21, and ZNF143, and is associated with chromatin marks such as H3K4me1 (Figure 6A and 6B of Sanz et al., 2016) (26). These features allow R-loops to exhibit enhancer and insulator chromatin states and to act as distal regulatory elements (26, 27). We have demonstrated physical interactions between host cellular R-loops and HIV-1 integrase proteins (Figure 5 of the revised manuscript). Therefore, we believe that this distal regulatory element-like feature of R-loops may explain how R-loops drive integration over long-range genomic regions.

      Following your suggestion, we added this explanation, together with the relevant literature, to the Discussion section of the revised manuscript.

      Author response image 6 shows the biological variation across replicates of the data shown in Figure 4A. The integration site sequencing data for Jurkat cells were obtained from SRR12322252 (4), which contains integration site sequencing data from HIV-1-infected wild-type Jurkat cells with a single biological replicate. We hope that our explanations and discussion have addressed your concerns. Thank you.

      Author response image 6.

      Bar graphs showing the number of HIV-1 integration sites per megabase (Mb) within the total region covered by 30-kb windows centered on DRIPc-seq peaks from HIV-1-infected HeLa cells and primary CD4+ T cells (magenta) or within non-R-loop regions of the cellular genome (gray). Each dot corresponds to an individual data set from two biologically independent experiments.
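      The density plotted in Author response image 6 can be sketched as the number of integration sites that fall within 30-kb windows centered on peak centers, divided by the total window length in Mb. This is a minimal sketch with a hypothetical function name. It assumes that the windows do not overlap, since overlapping windows would double-count both sites and length. It is not the authors' actual script.

```python
from collections import defaultdict


def sites_per_mb(peak_centers, integration_sites, window=30_000):
    """Integration sites per Mb inside windows centered on R-loop peak centers.

    Hypothetical helper: `peak_centers` and `integration_sites` are lists of
    (chrom, position) tuples. Windows are assumed not to overlap.
    """
    half = window // 2
    windows = defaultdict(list)
    for chrom, center in peak_centers:
        windows[chrom].append((center - half, center + half))

    # Count each integration site once if it lies in any window on its chromosome.
    n_sites = sum(
        1
        for chrom, pos in integration_sites
        if any(lo <= pos < hi for lo, hi in windows.get(chrom, ()))
    )
    total_mb = window * len(peak_centers) / 1e6
    return n_sites / total_mb
```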

      (3.6) The authors do not adequately describe the Integrase mutant that they use in their biochemical experiments in Figure 5A. Could this impact the activity of the protein in such a way that interferes with the interpretation of the experiment? The mutant is not used in subsequent experiments for Figure 5 and so even though the data are consistent with each other (and the conclusion that Integrase interacts with R-loops) a more thorough explanation of why that mutant was used and how it impacts the biochemical activity of the protein will help the interpretation of the data presented in Figure 5.

      We appreciate the reviewer’s suggestions. In our EMSA analysis, we purified and used Sso7d-tagged HIV-1 integrase proteins carrying an active-site amino acid substitution, E152Q. We used the Sso7d-tagged HIV-1 integrase because previous studies have suggested that fusion of small domains, such as the Sso7d DNA-binding domain, can substantially improve the solubility of HIV-1 integrase without affecting its ability to assemble with substrate nucleic acids or its enzymatic activity (Figure 1B of Li et al., PLoS One, 2014) (28, 29). We used the E152Q active-site mutant in our mobility shift assay because the primary goal of this experiment was to examine the ability of the protein to bind or form complexes with different nucleic acid substrates. We reasoned that abolishing the enzymatic activities of the integrase, such as 3'-processing, which cleaves DNA substrates, would be more appropriate for this objective. This Sso7d-tagged HIV-1 integrase carrying the E152Q mutation has also been used to determine the structural model of the integrase complex with a nucleic acid substrate by cryo-EM (3), and the mutation has been shown not to disturb substrate binding. Based on the reviewer’s comments, we have added a description of the E152Q mutant integrase in Lines 268–270 of the revised manuscript. Thank you.

      Reviewer #3 (Recommendations For The Authors):

      The paper suffers from many grammatical errors, which sometimes interfere with the interpretations of the experiments. In the view of this reviewer, the manuscript must be carefully revised prior to publication. For example, lines 247-248 "Intasomes consist of HIV-1 viral cDNA and HIV-1 coding protein, integrases." It is unclear from this sentence whether there are multiple integrases or multiple proteins that interact with the viral genome to facilitate integration. This makes the subsequent experiments in Figure 5 difficult to interpret. There are many other examples, too numerous to point out individually.

      We carefully revised the original manuscript and made every effort to describe our findings more clearly. We have made substantial changes to the manuscript, including to Lines 247–248 of the original manuscript, which the reviewer noted. Furthermore, the revised manuscript was edited by a professional editing service. Thank you.

      (1) M. Malig, S. R. Hartono, J. M. Giafaglione, L. A. Sanz, F. Chedin, Ultra-deep Coverage Single-molecule R-loop Footprinting Reveals Principles of R-loop Formation. J Mol Biol 432, 2271-2288 (2020).

      (2) L. A. Sanz et al., Prevalent, Dynamic, and Conserved R-Loop Structures Associate with Specific Epigenomic Signatures in Mammals. Mol Cell 63, 167-178 (2016).

      (3) D. O. Passos et al., Cryo-EM structures and atomic model of the HIV-1 strand transfer complex intasome. Science 355, 89-92 (2017).

      (4) W. Li et al., CPSF6-Dependent Targeting of Speckle-Associated Domains Distinguishes Primate from Nonprimate Lentiviral Integration. mBio 11,  (2020).

      (5) P. A. Ginno, Y. W. Lim, P. L. Lott, I. Korf, F. Chedin, GC skew at the 5' and 3' ends of human genes links R-loop formation to epigenetic regulation and transcription termination. Genome Res 23, 1590-1600 (2013).

      (6) S. Hamperl, M. J. Bocek, J. C. Saldivar, T. Swigut, K. A. Cimprich, Transcription-Replication Conflict Orientation Modulates R-Loop Levels and Activates Distinct DNA Damage Responses. Cell 170, 774-786 e719 (2017).

      (7) H. O. Ajoge et al., G-Quadruplex DNA and Other Non-Canonical B-Form DNA Motifs Influence Productive and Latent HIV-1 Integration and Reactivation Potential. Viruses 14,  (2022).

      (8) I. K. Jozwik et al., B-to-A transition in target DNA during retroviral integration. Nucleic Acids Res 50, 8898-8918 (2022).

      (9) F. Chedin, C. J. Benham, Emerging roles for R-loop structures in the management of topological stress. J Biol Chem 295, 4684-4695 (2020).

      (10) F. Chedin, Nascent Connections: R-Loops and Chromatin Patterning. Trends Genet 32, 828-838 (2016).

      (11) P. B. Chen, H. V. Chen, D. Acharya, O. J. Rando, T. G. Fazzio, R loops regulate promoter-proximal chromatin architecture and cellular differentiation. Nat Struct Mol Biol 22, 999-1007 (2015).

      (12) A. R. Schroder et al., HIV-1 integration in the human genome favors active genes and local hotspots. Cell 110, 521-529 (2002).

      (13) Y. Ito et al., Number of infection events per cell during HIV-1 cell-free infection. Sci Rep 7, 6559 (2017).

      (14) A. Albanese, D. Arosio, M. Terreni, A. Cereseto, HIV-1 pre-integration complexes selectively target decondensed chromatin in the nuclear periphery. PLoS One 3, e2413 (2008).

      (15) V. Achuthan et al., Capsid-CPSF6 Interaction Licenses Nuclear HIV-1 Trafficking to Sites of Viral DNA Integration. Cell Host Microbe 24, 392-404 e398 (2018).

      (16) X. Li et al., piggyBac transposase tools for genome engineering. Proc Natl Acad Sci U S A 110, E2279-2287 (2013).

      (17) Y. Cao et al., Identification of piggyBac-mediated insertions in Plasmodium berghei by next generation sequencing. Malar J 12, 287 (2013).

      (18) E. Serrao, P. Cherepanov, A. N. Engelman, Amplification, Next-generation Sequencing, and Genomic DNA Mapping of Retroviral Integration Sites. J Vis Exp,  (2016).

      (19) K. A. Matreyek et al., Host and viral determinants for MxB restriction of HIV-1 infection. Retrovirology 11, 90 (2014).

      (20) G. A. Sowd et al., A critical role for alternative polyadenylation factor CPSF6 in targeting HIV-1 integration to transcriptionally active chromatin. Proc Natl Acad Sci U S A 113, E1054-E1063 (2016).

      (21) B. Lucic et al., Spatially clustered loci with multiple enhancers are frequent targets of HIV-1 integration. Nat Commun 10, 4059 (2019).

      (22) P. K. Singh, G. J. Bedwell, A. N. Engelman, Spatial and Genomic Correlates of HIV-1 Integration Site Targeting. Cells 11,  (2022).

      (23) C. Bou-Nader, A. Bothra, D. N. Garboczi, S. H. Leppla, J. Zhang, Structural basis of R-loop recognition by the S9.6 monoclonal antibody. Nat Commun 13, 1641 (2022).

      (24) Q. Li et al., Cryo-EM structure of R-loop monoclonal antibody S9.6 in recognizing RNA:DNA hybrids. J Genet Genomics 49, 677-680 (2022).

      (25) J. A. Smolka, L. A. Sanz, S. R. Hartono, F. Chedin, Recognition of RNA by the S9.6 antibody creates pervasive artifacts when imaging RNA:DNA hybrids. J Cell Biol 220,  (2021).

      (26) L. A. Sanz, F. Chedin, High-resolution, strand-specific R-loop mapping via S9.6-based DNA:RNA immunoprecipitation and high-throughput sequencing. Nat Protoc 14, 1734-1755 (2019).

      (27) M. Merkenschlager, D. T. Odom, CTCF and cohesin: linking gene regulatory elements with their targets. Cell 152, 1285-1297 (2013).

      (28) M. Li, K. A. Jurado, S. Lin, A. Engelman, R. Craigie, Engineered hyperactive integrase for concerted HIV-1 DNA integration. PLoS One 9, e105078 (2014).

      (29) M. Li et al., A Peptide Derived from Lens Epithelium-Derived Growth Factor Stimulates HIV-1 DNA Integration and Facilitates Intasome Structural Studies. J Mol Biol 432, 2055-2066 (2020).

    1. Author Response

      The following is the authors’ response to the original reviews.

      General remarks for the Editor and the Reviewers

      We would like to thank the Editor and the Reviewers for their feedback. Below we address their comments and present our point-by-point responses as well as the related changes in the manuscript.

      In addition to these changes, in a few cases we found it necessary to move some text and provide additional explanations within the manuscript. We emphasize that these amendments were made for technical reasons only. They do not alter the results and conclusions of the paper, but they may make the text more coherent and understandable to readers with little knowledge of the subject.

      These minor corrections are:

      • We extended the Introduction section by a sentence (lines 40-42) that places the proposed template-directed, non-enzymatic replication mechanism in a more general prebiotic evolutionary context, thus emphasizing its biological relevance. This sentence includes an additional reference (Rosenberger et al., 2021).

      • Two very methodologically oriented and repeated descriptions of random sequence generation have been moved to the Methods section (lines 178-185) from the Results section (lines 336-339 and lines 351-354).

      • We complemented the Data availability statement with licensing information (lines 684-685).

      • Further minor changes (also indicated in red text) have been made to remedy logical and grammatical glitches.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Szathmáry and colleagues explore the parabolic growth regime of replicator evolution. Parabolic growth occurs when nucleic acid strand separation is the rate-limiting step of the replication process, which would have been the case for non-enzymatic replication of short oligonucleotides that could have preceded the emergence of ribozyme polymerases and helicases. The key result is that parabolic replication is conducive to the maintenance of genetic diversity, that is, the coexistence of numerous master sequences (the Gause principle does not apply). Another important finding is that there is no error threshold for parabolic replication except for the extreme case of zero fidelity.

      Strengths:

      I find both the analytic and the numerical results to be quite convincing and well-described. The results of this work are potentially important because they reveal aspects of a realistic evolutionary scenario for the origin of replicators.

      Weaknesses:

      There are no obvious technical weaknesses. It can be argued that the results represent an incremental advance because many aspects of parabolic replication have been explored previously (the relevant publications are properly cited). Obviously, the work is purely theoretical; an experimental study of parabolic replication is due. In the opinion of this reviewer, though, these are understandable limitations that do not actually detract from the value of this work.

      We are grateful that this Reviewer appreciates our work. We completely agree that the ultimate validation must come from experiments. It is important to stress that in this field theory often preceded experimental work by decades, and the former often guided the latter. We hope that for the topic of the present paper experiments will follow considerably faster.

      Reviewer #2 (Public Review):

      Summary:

      A dominant hypothesis concerning the origin of life is that, before the appearance of the first enzymes, RNA replicated non-enzymatically by templating. However, this replication was probably not very efficient, due to the propensity of single strands to bind to each other, thus inhibiting template replication. This phenomenon, known as product inhibition, has been shown to lead to parabolic growth instead of exponential growth. Previous works have shown that this situation limits competition between alternative replicators and therefore promotes RNA population diversity. The present work examines this scenario in a model of RNA replication, taking into account finite population size, mutations, and differences in GC content. The main results are (1) confirmation that parabolic growth promotes diversity, but that when the population size is small enough, sequences least efficient at replicating may nevertheless go extinct; (2) the observation that fitness is not only controlled by the replicability of sequences, but also by their GC content; (3) the observation that parabolic growth attenuates the impact of mutations and, in particular, that the error threshold to which exponentially growing sequences are subject can be exceeded, enabling sequence identity to be maintained at higher mutation rates.

      Strengths:

      The analyses are sound and the observations are intriguing. Although it has been noted previously that parabolic growth promotes coexistence, its role in mitigating the error threshold catastrophe, which is often presented as a major obstacle to our understanding of the origin of life, had not been examined before.

      Weaknesses:

      Although all the conclusions are interesting, most are not very surprising for people familiar with the literature. As the authors point out, parabolic growth is well known to promote diversity (Szathmáry & Gladkih, 1989), and it has also been noted previously that a form of Darwinian selection can be found at small population sizes (Davis, 2000).

      Given that under parabolic growth, no sequence is ever excluded for infinite populations, it is also not surprising to find that mutations have a less dramatic exclusionary impact.

      In the two articles cited (Szathmáry & Gladkih, 1989; Davis, 2000), the subexponentiality of the system was implemented in a mechanistic way, by introducing an exponent 0 < 𝑝 < 1. Although the behaviour of these models is more or less consistent with experimental findings (von Kiedrowski, 1986; Zielinski and Orgel, 1987), the divergence of the per capita growth rate (𝑥̇/𝑥) at very low concentrations, which guarantees the ability to maintain unlimited diversity in infinite populations, makes this formal approach partly unrealistic.
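      The divergence mentioned above follows directly from the phenomenological growth law 𝑥̇ = 𝑘𝑥^𝑝 used in those models. The per capita rate is 𝑥̇/𝑥 = 𝑘𝑥^(𝑝−1), which grows without bound as 𝑥 → 0 whenever 𝑝 < 1. The short sketch below illustrates this. The rate constant `k` and the default `p = 0.5` are illustrative choices, not values taken from the cited papers.

```python
def per_capita_growth(x, k=1.0, p=0.5):
    """Per capita growth rate (dx/dt) / x for the growth law dx/dt = k * x**p.

    For p < 1, this equals k * x**(p - 1), which diverges as x -> 0.
    For p = 1 (exponential growth), it is the constant k.
    """
    return k * x ** (p - 1)


for x in (1e-2, 1e-4, 1e-6):
    print(x, round(per_capita_growth(x), 6))  # ≈ 10, 100, 1000
```

      Because a rare type's per capita rate keeps rising as its concentration falls, no type can be driven to extinction in the deterministic, infinite-population limit. This is the property that an individual-based model with a finite population does not inherit.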

      To avoid the possible artefacts of this mechanistic approach, and as there are no previous studies analysing the diversity maintaining ability of finite populations of parabolic replicators in an individual-based model context, we implemented a simplified template replication mechanism leading to parabolic growth and analysed the dynamics in an individual-based stochastic model context. The key point of our investigation is that considerable diversity can be maintained in the system even when the population size is quite small.

      Regarding the Reviewer’s comment on selection: Darwinian selection can only occur in simple subexponential dynamics if the ratio of replicabilities diverges, cf. Eq. (8) and the preceding paragraph in Davis, 2000.

      Our results also show (Figs. 4B and 4C) that high mutation rates and the error threshold problem can still be considered as a major limiting factor for parabolically replicating systems in terms of their diversity-maintaining ability. In the light of the above, potential mechanisms to relax the error threshold in such systems, one of which is demonstrated in the present study, seem to be important steps to account for the sequence diversification and increase in molecular complexity during the early evolution of RNA replicators.

      A general weakness is the presentation of models and parameters, whose choices often appear arbitrary. Modeling choices that would deserve to be further discussed include the association of the monomers with the strands and the ensuing polymerization, which are combined into a single association/polymerization reaction (see also below), or the choice to restrict to oligomers of length L = 10. Other models, similar to the one employed here, have been proposed that do not make these assumptions, e.g. Rosenberger et al. Self-Assembly of Informational Polymers by Templated Ligation, PRX 2021. To understand how such assumptions affect the results, it would be helpful to present the model from the perspective of existing models.

      The assumption of one-step polymerization reactions used here is a common technique for modelling template replication of sequence-represented replicators [see, e.g., Fontana and Schuster, 1998 (10.1126/science.280.5368.1451), Könnyű et al., 2008 (10.1186/1471-2148-8-267), Vig-Milkovics et al., 2019 (10.1016/j.jtbi.2018.11.020) or Szilágyi et al., 2020 (10.1371/journal.pgen.1009155)]. This is because base-by-base polymerization of the copy would lead to a very large number of different intermediates, which a Gillespie-type stochastic simulation algorithm could not handle in reasonable computation times, even for relatively short sequences. For comparison, in our model, where polymerization is one-step, the characteristic running time of a simulation with 𝐿 = 10, 𝑁 = 10⁵ and 𝛿 = 0.01 was 552 hours.
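      To illustrate why one-step polymerization keeps the state space small, here is a Gillespie direct-method sketch of a deliberately reduced scheme. The scheme is a toy and is not the authors' model: it has no sequences, monomers, mutations, or binding energies. It tracks only two species, free single strands and template-copy duplexes, and three reactions. In one-step replication, a single strand becomes a duplex with its new copy. A duplex can dissociate into two single strands, and two single strands can associate into a duplex. The rate names `r`, `d` and `a` are hypothetical. Association is the product-inhibition step that pushes growth toward the parabolic regime.

```python
import math
import random


def gillespie_template_replication(n_single, n_duplex, r, d, a, t_max, rng):
    """Direct-method SSA for a toy one-step template replication scheme.

    Reactions (hypothetical rate constants):
      replication:  S -> D       propensity r * S
      dissociation: D -> S + S   propensity d * D
      association:  S + S -> D   propensity a * S * (S - 1) / 2
    Returns (time of last event, single strands, duplexes).
    """
    t = 0.0
    while True:
        props = (
            r * n_single,
            d * n_duplex,
            a * n_single * (n_single - 1) / 2,
        )
        total = sum(props)
        if total == 0.0:
            break  # no reaction can fire
        # Waiting time to the next event is exponential with rate `total`.
        tau = -math.log(1.0 - rng.random()) / total
        if t + tau > t_max:
            break
        t += tau
        # Pick reaction j with probability props[j] / total.
        u = rng.random() * total
        if u < props[0]:
            n_single -= 1  # template plus its new copy form a duplex
            n_duplex += 1
        elif u < props[0] + props[1]:
            n_duplex -= 1
            n_single += 2
        else:
            n_single -= 2
            n_duplex += 1
    return t, n_single, n_duplex


rng = random.Random(1)
print(gillespie_template_replication(10, 0, r=1.0, d=0.1, a=0.01, t_max=2.0, rng=rng))
```

      Each extra intermediate, such as a partially extended copy, would add species and reactions to the propensity list. With sequence-resolved replicators, this is the combinatorial growth that makes base-by-base schemes too slow.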

      Note that in Rosenberger et al. (PRX 2021), in contrast to a pioneering work [Fernando et al., 2007 (10.1007/s00239-006-0218-4)], the sequences of replicators are not represented, which makes this approach inapplicable to our case, in which sequence defines fitness. In sum, we suggest that this valid criticism points to possible future work.

      The values of the (many) parameters, often very specific, also very often lack justifications. For example, why is the "predefined error factor" ε = 0.2 and not lower or higher? How would that affect the results?

      A general remark: for the more important parameters, several values were used to test the behaviour of the model (see Table 1), but owing to the considerable number of parameters, it is impossible to examine all possible combinations. 𝑐+ = 1 fixes the timescale, and 𝐿 is set to 10 to obtain reasonable running times (see above).

      𝜀 characterizes how replicability decreases as the number of mutations increases. In the manuscript we used the default vector 𝜀 = (0.05, 0.2, 1), in which the third element corresponds to the mutation-free sequence, so it must be 1. The first element determines the baseline replicability (see Methods). We preferred not to change it because doing so would fundamentally alter the ratio of replication propensities to association and dissociation propensities, since a substantial fraction of the sequences complementary to the master sequences have baseline replicability. It would therefore alter the reaction kinetics so much that the results would no longer be comparable with the original ones. Therefore, only the second element can be adjusted. Accordingly, we analysed the behaviour of the model with a steeper and a more gradual loss of replicability, using the vectors 𝜀′ = (0.05, 0.05, 1) and 𝜀″ = (0.05, 0.5, 1), respectively. The choice of 𝜀′ is chemically more plausible, since for very short oligomers the loss of chemical activity and replicability as a function of the number of mutations can be very sharp. We performed a series of simulations with all combinations of 𝛿 = 0.001, 0.005, 0.01 and 𝑁 = 10³, 10⁴, 10⁵ for 𝜀′ and 𝜀″ in both the constant population and the chemostat model contexts (36 different runs). For the other parameters, we used the default values (see Table 1). These values also correspond to the parameters used in Figures 2 and 6. The results show that the steeper loss of replicability (𝜀′) slightly increases the diversity-maintaining ability of the system, whereas the more gradual loss (𝜀″) moderately decreases it. These shifts are more pronounced in the constant population size model (Author response image 1) than in the chemostat model (Author response image 2). Altogether, these results confirm that the qualitative outcome of the model is robust across a wide range of loss-of-replicability (𝜀 vector) values.
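      The 36-run design is a full factorial grid. The sketch below makes the count explicit. It assumes 𝛿 ∈ {0.001, 0.005, 0.01} (the values used for Fig. 6) and 𝑁 ∈ {10³, 10⁴, 10⁵}. The two 𝜀 vectors are taken from the text. The dictionary keys and model labels are hypothetical names chosen for illustration.

```python
from itertools import product

DELTAS = (0.001, 0.005, 0.01)          # replicability distances (assumed, as for Fig. 6)
POP_SIZES = (10**3, 10**4, 10**5)      # population sizes N
EPSILON_VECTORS = {
    "eps_prime": (0.05, 0.05, 1.0),         # steeper loss of replicability
    "eps_double_prime": (0.05, 0.5, 1.0),   # more gradual loss of replicability
}
MODELS = ("constant_population", "chemostat")

# One run per combination: 2 models x 3 deltas x 3 sizes x 2 epsilon vectors.
runs = [
    {"model": m, "delta": d, "N": n, "epsilon_name": e, "epsilon": EPSILON_VECTORS[e]}
    for m, d, n, e in product(MODELS, DELTAS, POP_SIZES, EPSILON_VECTORS)
]
print(len(runs))  # 36
```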

      Author response image 1.

      Replicator coexistence in the constant population model with different loss-of-replicability (𝜀 vector) values. Within a given combination of 𝛿 and 𝑁 parameter values, the upper panel corresponds to the steeper loss of replicability (𝜀′), the middle panel to the default 𝜀 vector (Figure 2A), and the bottom panel to the more gradual loss of replicability (𝜀″). Within each 𝛿; 𝑁 parameter combination, the same master sequence set was used with the three different 𝜀 vectors for comparability.

      Author response image 2.

      Replicator coexistence in the chemostat model with different loss-of-replicability (𝜀 vector) values. Within a given combination of 𝛿 and 𝑁 parameter values, the upper panel corresponds to the steeper loss of replicability (𝜀′), the middle panel to the default 𝜀 vector (Figure 6A), and the bottom panel to the more gradual loss of replicability (𝜀″). Within each 𝛿; 𝑁 parameter combination, the same master sequence set was used with the three different 𝜀 vectors for comparability.

      Similarly, in equation (11), where does the factor 0.8 come from?

      This factor scales the decay rate of duplex sequences as a function of the binding energy (𝐸b). The value of 0.8 is an arbitrary choice. The value should lie in the interval (0, 1), and it is only relevant in the chemostat model. It is expected to affect the dynamics similarly to the duplex decay factor parameter 𝑓, which we investigated over a wide range of values (cf. Table 1, Fig. 6), although 𝑓 is independent of the binding energy (𝐸b). Increasing (decreasing) the 0.8 factor is expected to decrease (increase) the average total population size. We investigated the diversity-maintaining ability of the system with smaller (0.6) and larger (0.9) factor values at different population sizes (𝑁 ≈ 10³, 10⁴ and 10⁵) and different replicability distances (𝛿 = 0.001, 0.005 and 0.01), as shown in Fig. 6. We found that the number of coexisting master types changes very little in response to changes in this factor. Only two shifts could be detected (underlined). The factor 0.9 combined with 𝑁 ≈ 10⁴ and 𝛿 = 0.001 decreased the number of surviving master types by one, while the factor 0.9 combined with 𝑁 ≈ 10³ and 𝛿 = 0.01 increased it by one (Author response table 1). The factor 0.6 produced the same number of surviving types as the default (Author response table 1). In summary, the model is markedly robust to changes in the value of this parameter.

      Author response table 1.

      Number of coexisting master types in the chemostat model with different binding energy dependent duplex decay rates. Within each 𝛿; 𝑁 parameter combination, the same master sequence set was used with the three different factor values: 0.6, 0.8 (the original) and 0.9 for comparability.

      Why is the kinetic constant for the duplex decay reaction 1.15 × 10⁻⁸?

      Note that this value is the minimum of the duplex decay rate. Table 1 correctly gives the interval of this kinetic constant as [1.15 ⋅ 10⁻⁸, 6.4 ⋅ 10⁻⁵]. Both bounds are derived from the basic parameters of the system and can be computed with Eq. (11) from the parameter sets that correspond to the minimum and the maximum, respectively.

      Are those values related to experiments, or are they chosen because specific behaviors can happen only then?

      See above.

      The choice of the model and parameters potentially impact the two main results, the attenuation of the error threshold and the role of GC content:

      Regarding the error threshold, it is also noted (lines 379-385) that it disappears when back mutations are taken into account. This suggests that overcoming the error threshold might not be as difficult as suggested, and can be achieved in several ways, which calls into question the importance of the particular role of parabolic growth. Besides, when the concentration of replicators is low, product inhibition may be negligible, such that a "parabolic replicator" is effectively growing exponentially and an error catastrophe may occur. Do the authors think that this consideration could affect their conclusion? Can simulations be performed?

      The assumption of back mutation provides only a theoretical solution to the error threshold problem. Back mutation guarantees a positive (non-zero) concentration of a master type, but because the probability of back mutation is generally very low, this equilibrium concentration may be extremely low, or negligible for typical system sizes. Consequently, back mutation alone does not solve the problem of the error catastrophe. Back mutation is present in our system (the probability that a sequence with 𝑘 errors mutates back to a master sequence is 𝜇ᵏ(1 − 𝜇)ᴸ⁻ᵏ), and yet its diversity-maintaining ability is limited. The effect of back mutation decreases exponentially with increasing sequence length.
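      To see how small the back-mutation term is, the formula 𝜇ᵏ(1 − 𝜇)ᴸ⁻ᵏ quoted above can be evaluated directly. The sketch below does only that. The function name and the example values 𝐿 = 10 and 𝜇 = 0.01 are illustrative choices.

```python
def back_mutation_probability(k, L, mu):
    """Probability mu**k * (1 - mu)**(L - k) that a sequence of length L carrying
    k errors reverts to its master sequence in a single copying event."""
    return mu ** k * (1.0 - mu) ** (L - k)


for k in (1, 2, 3):
    # Each additional error multiplies the probability by roughly mu / (1 - mu).
    print(k, back_mutation_probability(k, L=10, mu=0.01))
```

      For 𝜇 = 0.01, each additional error lowers the reversion probability by about two orders of magnitude (roughly 9 × 10⁻³, 9 × 10⁻⁵ and 9 × 10⁻⁷ for 𝑘 = 1, 2, 3). Lengthening the sequence at fixed 𝑘 shrinks the (1 − 𝜇)ᴸ⁻ᵏ factor further.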

      Regarding the role of the GC content, GC-rich oligomers are found to perform the worst but no rationale is provided.

      For GC-rich oligonucleotides, the dissociation probability of a template-copy complex is relatively low (cf. Eqs. (9, 10)), so they have relatively few offspring, cf. lines 557-561: “a relatively high dissociation probability and the consequential higher propensity of being in a simple stranded form provides an advantage for sequences with relatively low GC content in terms of their replication affinity, that is, the expected number of offspring in case of such variants will be relatively high.” Note that the simulation results shown in Fig. 3A demonstrate this effect with prepared sequences (along a GC content gradient).

      One may assume that this happens because GC-rich sequences take comparatively longer to release the product. However, it is also conceivable that a higher GC content may help the polymerization of the monomers, as the monomers remain attached to the template for longer (as described in Eq. (9)). This is an instance where the choice to combine the association and polymerization reactions into a single step, independent of GC content, may be critical.

      It would be important to show that the result arises from the actual physics and not from this modeling choice.

      Some more specific points that would deserve to be addressed:

      • Line 53: it is said that p "reflects how easily the template-reaction product complex dissociates". This statement is not correct. A reaction order p<1 reflects product inhibition, the propensity of templates to bind to each other, not slow product release. Product release can be limiting, yet a reaction order of 1 can be achieved if substrate concentrations are sufficiently high relative to oligomer concentrations (von Kiedrowski et al., 1991).

      We think the key reference is Von Kiedrowski (1993) in this case. Other things being equal, his Table 1 on p. 134 shows that a sufficient increase in 𝐾4, i.e., the stability of the duplex (template and copy) (association rate divided by dissociation rate) throws the system into the parabolic regime. This is what we had in mind. In order to clarify this, we modified the quoted sentence thus: “In this kinetics, the growth order is equal or close to 0.5 (i.e., the dynamics is sub-exponential) because increased stability of the template-copy complex (rate of association divided by dissociation) promotes parabolic growth (von Kiedrowski et al., 1991; von Kiedrowski & Szathmáry, 2001).”

      • Population size is a key parameter, and a comparison is made between small (10^3) and large (10^5) populations, but without explaining what determines the scale (small/large relative to what?).

      The “small” value (10^3) corresponds to the smallest meaningful population size: significantly smaller population sizes (e.g. 10^2) cannot maintain the 10 master types (or any subset of them) and are chemically unrealistic. The “large” value (10^5) is the largest population size for which simulation times are still acceptable; for 10^6, runtimes are on the order of months.

      • In the same vein, we might expect size not to be the only important parameter, but also concentration.

      At constant volume, population size and concentration are strictly coupled.

      • Lines 543-546: if I understand correctly, the quantitative result is that the error threshold rises from 0.1 in the exponential case to 0.196 in the parabolic case. Are the authors suggesting that a factor of 2 is a significant difference?

      In this paragraph we compared the empirical error threshold of our system (which is close to 𝑝mut = 0.15) with the error threshold of the well-known single-peak fitness landscape (which can be approximated by ) as a reference case. To make the message even clearer we have extended the last sentence (lines 596-597) as follows: “but note that applying this approach to our system is a serious oversimplification”. The 0.196 is simply the probability of error-free replication of a sequence when , but we have removed this sentence (“corresponding to the replication accuracy of a master sequence”) from the manuscript as it seems to be confusing.

      • Figure 3C: this figure shows no statistically significant effect?

      Thank you for pointing this out. We statistically tested the hypothesis that the GC content differs between the surviving and the extinct master subsets. This analysis revealed that the differences between these two groups are statistically significant, which we have now included in the manuscript at lines 380-390: “A direct investigation of whether the sequence composition of the master types is associated with their survival outcome was conducted using the data from the constant population model simulation results (Figure 2). In these data, the average GC content was measured to be lower in the surviving master subpopulations than in the extinct subpopulations (Figure 3C). To determine whether this difference was statistically significant, nonparametric, two-sample Wilcoxon rank-sum tests (Hollander & Wolfe, 1999) were performed on the GC content of the extinct-surviving master subsets. The GC content was significantly different between these two groups in all nine investigated parameter combinations of population size (N) and replicability distance (δ) at p<0.05 level, indicating a selective advantage for a lower GC content in the constant population model context. The exact p values obtained from this analysis are shown in Figure 3C.”
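      For readers unfamiliar with the test, the two-sample Wilcoxon rank-sum comparison described above can be sketched in standard-library Python. This is a minimal sketch, not the authors' analysis code: it uses the large-sample normal approximation without tie or continuity corrections (comparable in spirit to scipy.stats.ranksums), and the GC-content values in the example are made up purely for illustration.

```python
import math
from typing import Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie or
    continuity correction). Returns (z, p); z < 0 means x tends to be smaller."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Pool the samples, remembering which group each value came from (0 = x, 1 = y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1 for tied values
        for t in range(i, j + 1):
            ranks[t] = avg_rank
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)  # rank sum of x
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the standard normal
    return z, p


if __name__ == "__main__":
    # Hypothetical GC fractions of surviving vs. extinct master types
    surviving = [0.30, 0.35, 0.40, 0.38, 0.32]
    extinct = [0.55, 0.60, 0.50, 0.62, 0.58]
    print(rank_sum_test(surviving, extinct))
```

      With samples as small as these the normal approximation is rough; an exact permutation distribution is usually preferred for very small groups.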

      • line 542: "phase transition-like species extension (Figure 4B)": such a clear threshold is not apparent.

      Thank you for pointing out the incorrect phrasing. As there is no clear threshold in the number of coexisting types as a function of the mutation rate, we removed the “phase transition-like” expression: “However, when finite population sizes and stochastic effects are taken into account, at the largest investigated per-base mutation rate (𝑝mut = 0.15), the summed relative steady-state master frequencies approach zero (Figure 4C) with accelerating species extinction (Figure 4B), indicating that this value is close to the system's empirical error threshold.” (lines 589-594).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      On the whole, the work is well done and presented, there are no major recommendations. It seems a good idea to cite and briefly discuss this recent paper: https://pubmed.ncbi.nlm.nih.gov/36996101/ which develops a symbiotic scenario of the coevolution of primordial replicators and reproducers that appears to be fully compatible with the results of the current work.

      Thank you for bringing this article to our attention. We have inserted the following sentence at lines 621-624: “The demonstrated diversity-maintaining mechanism of finite parabolic populations can be used as a plug-in model to investigate the coevolution of naked and encapsulated molecular replicators (e.g., Babajanyan et al., 2023).”

      The manuscript is well written, but there are some minor glitches that merit attention. For example:

      l. 5 "carriers presents a problem, because product formation and mutual hybridization" - "mutual" is superfluous here, delete

      l. 13 "amplification. In addition, sequence effects (GC content) and the strength of resource" - hardly "effects" - should be 'features' or 'properties'

      l. 41 "If enzyme-free replication of oligomer modules with a high degree of sequence" - "modules" here is only confusing - simply, "oligomers"

      l. 44 "under ecological competition conditions with which distinct replicator types with different" - delete "with" etc, there are many such minor glitches that are best corrected.

      Thank you for pointing these out; we have corrected them. Other drafting errors, glitches, and superfluous sentences have also been corrected.

      Reviewer #2 (Recommendations For The Authors):

      None

      Editor (Recommendations For The Authors):

      In the manuscript, it appears that coexistence is assessed at a given point in time, while figures seem to show that it remains time-dependent. It would be great if the authors could clarify this and/or discuss this.

      We appreciate you bringing this to our attention, as we had indeed failed to elaborate on this important point. The steady-state character of the coexistence is assessed in our model in the following way: the relative frequency of each master sequence is tested for the condition of ≥ 100- (cut-off relative frequency for survival) at every 2,000th replication step in the interval between 10,000 replication steps before termination and actual termination (10= replication steps). If the above condition is true more than once, we consider the master type in question to have survived (we have included this explanation in the Methods section: lines 258-268). Although this relatively narrow time interval can still be regarded as a snapshot of the state of the system, in our numerical experience the resulting measure is a reliable quantitative indicator of the apparent stability of species coexistence in the parabolic dynamics.
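      The survival criterion described above can be written as a small function. This sketch is illustrative and rests on stated assumptions: the frequency record is a hypothetical mapping from replication step to relative master frequency, the checkpoints are taken to include both ends of the 10,000-step window, and the cut-off frequency and termination step are left as parameters because their exact values are not recoverable here.

```python
from typing import Mapping


def has_survived(
    freq_by_step: Mapping[int, float],
    t_end: int,
    cutoff: float,
    window: int = 10_000,
    step: int = 2_000,
) -> bool:
    """Return True if a master type passes the (assumed) survival test.

    The relative frequency is checked at every `step`-th replication step from
    `t_end - window` up to and including `t_end`; the type counts as survived if
    the frequency reaches `cutoff` at more than one checkpoint. Missing
    checkpoints are treated as frequency 0.
    """
    checkpoints = range(t_end - window, t_end + 1, step)
    hits = sum(1 for t in checkpoints if freq_by_step.get(t, 0.0) >= cutoff)
    return hits > 1
```

      Requiring more than one hit, rather than a single one, guards against counting a type that only transiently crosses the cut-off at one checkpoint.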

    1. Author Response

      The following is the authors’ response to the original reviews.

      Responses to reviewers’ comments

      (1) The rationale of selecting tNOX/ENOX2 as a potential target of 4-dmH, but not heliomycin, is unclear by taking a biased approach. Thus, there is high possibility that 4-dmH binds to other proteins involved in apoptosis inhibition. An unbiased screen to identify 4-dmH-binding proteins would be a better approach unless there is a clear and logical rationale.

      We apologize for this oversight. In response to this comment, we rewrote the abstract, reorganized the results, and added more references to better introduce tNOX/ENOX2.

      A) Under the “4-dmH, but not heliomycin, targets intracellular tNOX, an upstream regulator of SIRT1” result section:

      We next addressed the molecular mechanisms underlying SIRT1 inhibition and concurrent cell death caused by these two compounds in oral cancer cells. As an NAD+-dependent protein deacetylase, SIRT1 is primarily governed by the NAD+/NADH ratio, and its activity correlates positively with this ratio [1-9]. We then asked whether these two compounds inhibit SIRT1 by affecting intracellular NAD+/NADH levels, and were surprised to find that 4-dmH, but not heliomycin, caused a prominent reduction in the intracellular NAD+/NADH ratio (revised Fig. 7a). The discrepancy in their ability to reduce NAD+ generation led us to explore the role of a tumor-associated NADH oxidase (tNOX, ENOX2) in 4-dmH-mediated SIRT1 suppression and apoptosis induction. We have previously reported that tNOX inhibition reduced the intracellular NAD+/NADH ratio and SIRT1 deacetylase activity, increasing p53 acetylation and apoptosis [10-13]. In light of this information, we assessed the effect of the compounds on tNOX expression and found that 4-dmH, but not heliomycin, considerably diminished tNOX protein expression in a concentration-dependent manner (Fig. 7b).

      B) To demonstrate that our results from the ligand-binding assays (CETSA) were specific to tNOX, we conducted additional CETSA experiments to exclude PARP and NOX4 as targets of 4-dmH. PARP acts as a DNA damage sensor and also as an NAD+-consuming enzyme, affecting many cellular functions [14]. NOX4 belongs to the NOX family of NADPH oxidases that mediate electron transport through intracellular membranes and has also been shown to be involved in tumorigenesis [15, 16]. 4-dmH treatment did not appear to increase the melting temperature of PARP or NOX4, excluding these two proteins as potential targets of 4-dmH (revised Fig. 8c).
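      The CETSA readout referred to here is a shift in a protein's apparent melting temperature upon ligand treatment. As a simplified illustration, not the authors' analysis pipeline (CETSA curves are often fitted with a sigmoidal model instead), the sketch below estimates an apparent melting temperature by linear interpolation at the point where the normalized soluble fraction falls to 50% and reports the shift between treated and control curves. The function names and the temperatures and fractions in the usage note are hypothetical.

```python
from typing import Sequence


def apparent_tm(temps: Sequence[float], soluble: Sequence[float], level: float = 0.5) -> float:
    """Temperature at which the normalized soluble fraction first drops to `level`,
    found by linear interpolation between neighbouring measurements."""
    if len(temps) != len(soluble) or len(temps) < 2:
        raise ValueError("need matching temperature and fraction lists")
    points = list(zip(temps, soluble))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 >= level >= f1:
            if f0 == f1:  # flat segment sitting exactly at the level
                return t0
            return t0 + (f0 - level) * (t1 - t0) / (f0 - f1)
    raise ValueError("curve never crosses the requested level")


def tm_shift(temps: Sequence[float], control: Sequence[float],
             treated: Sequence[float], level: float = 0.5) -> float:
    """Positive values mean the treated protein remains soluble to higher temperatures."""
    return apparent_tm(temps, treated, level) - apparent_tm(temps, control, level)
```

      For example, with temperatures [40, 45, 50, 55, 60] °C, a control curve [1.0, 0.9, 0.6, 0.2, 0.05] and a treated curve [1.0, 0.95, 0.8, 0.4, 0.1], the apparent melting temperature moves from 51.25 to 53.75 °C, a shift of 2.5 °C; a target-engaging compound would be expected to produce a positive shift of this kind, whereas no shift was reported for PARP or NOX4.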

      Author response image 1.

      (2) The authors should show whether heliomycin indeed does not induce apoptosis, while 4-dmH cannot induce autophagy.

      We have reorganized and revised our manuscript and figures (Fig. 5 and Fig. 6) to better demonstrate the different cell death pathways associated with heliomycin and 4-dmH. Using flow cytometry, we show that heliomycin, but not 4-dmH, induced autophagy in two lines of oral cancer cells (Fig. 5a). In the revision, we moved the analysis of apoptosis by JC-1 staining up to Figure 5 (revised Fig. 5b). We also reorganized the protein analysis to better demonstrate the downregulation of pro-apoptotic Bak and Puma and the lack of caspase 3-directed PARP cleavage, indicating ineffective apoptosis by heliomycin (revised Fig. 5c). Similarly, the absence of upregulation of ULK1, Atg5, Atg7, and cleaved LC3-II provided evidence of inadequate autophagy by 4-dmH (revised Fig. 5d). Please see the revised Figure 5 attached.

      Author response image 2.

      (3) They should demonstrate whether genetic knockdown of tNOX, SirT1, or both tNOX and SirT1 induces apoptosis or autophagy and also reduces malignant properties of oral cancer cells.

      A) In the revision, we conducted additional experiments using RNAi knockdown to understand the role of tNOX in the regulation of apoptosis or autophagy. Our results indicate that tNOX depletion effectively provoked spontaneous apoptosis and autophagy in SAS cells (revised Fig. 7e). However, given that SIRT1 per se is not the focus of the present study and SIRT1 knockdown has been shown by other groups to increase the apoptotic population [17, 18], we decided not to pursue it further.

      Author response image 3.

      B) In our earlier studies, we have adequately demonstrated that tNOX confers a survival advantage on cancer cells. For example, tNOX deficiency induced by RNA interference in cancer cells abolishes cancer phenotypes, reducing NAD+ production, proliferation, and migration/invasion while increasing apoptosis [19-22]. Conversely, tNOX overexpression in non-cancerous cells stimulates cell growth, decreases doubling time, and enhances cell migration [23-26].

      (4) The authors should examine whether overexpression of SirT1 or tNOX in cells treated with heliomycin or 4-dmH could nullify heliomycin-induced autophagy and 4-dmH-induced apoptosis. Also, instead of overexpressing tNOX, they can supplement NAD into cells treated with 4-dmH.

      A) The use of tNOX overexpression has been reported in several previous studies, demonstrating that tNOX overexpression in non-cancerous cells stimulates cell growth, decreases doubling time, and enhances cell migration [23-26]. However, in our experience, the effect of tNOX overexpression in cancer cells is much less apparent than in non-cancerous cells. Thus, we decided not to study it further, given that our tNOX knockdown results clearly indicate the role of tNOX in the regulation of apoptosis and autophagy.

      B) Since SIRT1 is not the major focus of the present study and SIRT1 overexpression has been shown by other groups to reduce stress-mediated apoptosis [27, 28], we decided not to pursue it further.

      C) The systemic deterioration in NAD+ level has been correlated with many diseases and aging [29-31]. In this regard, NAD+ administration was reported to attenuate doxorubicin-induced apoptosis in the liver of mice, suggesting a protective effect [32]. The administration of nicotinamide riboside (NR), a precursor of NAD+, was also demonstrated to prevent ROS generation and apoptosis in the mouse sepsis models [33]. With data from these animal studies already demonstrating the benefits of NAD+ supplements, we decided not to conduct similar experiments in a cell-based setting.

      (5) Related to Fig. 5C and 6a, the authors should examine the effects of heliomycin and 4-dmH on the cell cycle profiles, Annexin V positivity, and colony formation.

      We added the results from colony-forming assays, which revealed that both compounds exhibited strong growth-suppressive activity against oral cancer cells (revised Fig. 6c). Nevertheless, the reduction in growth caused by the compounds was unlikely to arise from cell cycle arrest (revised Fig. 6d). Because of possible interference from the fluorescence of heliomycin and its derivative, we examined JC-1 staining rather than Annexin V positivity. The apoptotic effect of the compounds is shown in revised Fig. 5b.

      Author response image 4.

      (6) They should also examine whether either or both heliomycin and 4-dmH induce reactive oxygen species (ROS).

      In our previous report, we examined the effects of heliomycin and 4-dmH on oxidative stress using H2DCFDA [34], a dye that fluoresces in the presence of intracellularly generated reactive oxygen species (ROS). We showed that 4-dmH significantly induced ROS generation, whereas no marked ROS generation was observed in cells exposed to heliomycin.

      (7) Related to Fig. 9d, they should mutate amino acid residue(s) in tNOX that are crucial for the 4-dmH-tNOX binding, including Ile 90, Lys98, Pro111, Pro113, Leu115, Pro117, and Pro118, to examine whether these mutants lose the binding to 4-dmH and fail to rescue 4-dmH-induced apoptosis, unlike wild-type tNOX.

      To further evaluate the importance of the interaction residues that were consistent across the three docked compound-tNOX complexes, we substituted the seven interaction residues on tNOX with alanine or glycine and then simulated the resulting protein structures. The simulated structures appear slightly different from the original tNOX structure: the root-mean-square deviation (RMSD) between the original tNOX structure and the alanine- or glycine-substituted structures was estimated at 3.339 or 4.024 ångströms (Å), respectively (Fig. S1a). The simulated structures were also used in a docking analysis of 4-dmH, which revealed that 4-dmH could bind within the same pocket of the different tNOX structures but in varying orientations (Fig. S1b). This observation also suggests that replacing the key residues with alanine or glycine could reduce the binding affinity of 4-dmH for tNOX, with values of -8.2 and -7.6 kcal/mol, respectively. Moreover, substitution of the key residues with alanine or glycine also reduces the number of original interacting residues and interaction forces in core moieties of the 4-dmH-tNOX complexes (Fig. S1c and S1d). Together, our experimental results and molecular docking simulations are consistent with the notion that 4-dmH has a higher affinity for tNOX than for SIRT1.
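      The RMSD values quoted above summarize how far the substituted structures deviate from the original one. A minimal sketch of the calculation is shown below, assuming the structures have already been superimposed and that the coordinate lists contain matched atoms in the same order (for example, Cα atoms); the superposition step itself, which structural-biology tools normally perform first, is deliberately omitted, and the function name is illustrative.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation (in Å) between two matched, pre-aligned
    coordinate sets: sqrt(mean of squared per-atom distances)."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))
```

      Because no optimal superposition is computed here, the value only matches a tool-reported RMSD when both structures are already in the same frame.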

      Author response image 5.

      The simulated tNOX structures (a, b) and the binding modes of 4-dmH after docking study (c, d). (a) Superimposition of three types of tNOX structures, including the original tNOX structure (orange) and the critical residues in tNOX protein substituted with alanine (magenta) or glycine (cyan). The substituted residues were shown as sticks. (b) Superimposition of the docked 4-dmH (blue). (c) Schematic presentations of possible interactions between 4-dmH and the interacted residues in tNOX protein substituted with alanine. (d) Schematic presentations of possible interactions between 4-dmH and the interacted residues in tNOX protein substituted with glycine. The key residues were identified based on the best docking pose of 4-dmH. The red circles and ellipses indicate the identical residues that interacted with different types of tNOX structures.

      (8) Related to Fig. 10a, heliomycin appears to also reduce tNOX levels (although the extent is not as robust as 4-dmH), which is not expected since heliomycin does not bind to tNOX. They should compare the effects of heliomycin and 4-dmH on reducing the protein levels of tNOX. If heliomycin does not change the tNOX protein levels, then they need to discuss why heliomycin reduces tNOX levels in vivo.

      In our previous studies, we have shown that tNOX knockdown partially attenuates SIRT1 expression and represses growth in various cancer cell types, such as lung [22], bladder [20], and stomach [13]. We also observed that tNOX is acetylated/ubiquitinated under certain stresses and SIRT1 depletion affects tNOX expression (data not shown). It is speculated that SIRT1 deacetylates tNOX and modulates its protein stability. Thus, there is a reciprocal regulation between tNOX and SIRT1. Although heliomycin does not bind to tNOX, its inhibition of SIRT1 activity/expression might also have an impact on tNOX expression.

      (9) Related to Fig. 10F, if tNOX is an upstream regulator of SirT1 and both heliomycin and 4-dmH ultimately target SirT1, it is unclear why heliomycin and 4-dmH cause different biological outcomes. One explanation is that tNOX has apoptosis-inhibiting function other than supporting (or independent of) SirT1 and hence 4-dmH-mediated tNOX inhibition causes apoptosis rather than autophagy. They should explain and discuss more about whether tNOX-inhibiting/binding function of 4-dmH is sufficient to explain the different biological outcomes from heliomycin.

      Thank you for this valuable suggestion. Indeed, in our earlier studies, we have adequately demonstrated that tNOX deficiency induced by RNA interference in cancer cells abolishes cancer phenotypes, reducing NAD+ production, proliferation, and migration/invasion while increasing apoptosis; thus, tNOX confers a survival advantage on cancer cells [19-22]. Conversely, tNOX overexpression in non-cancerous cells stimulates cell growth, decreases doubling time, and enhances cell migration [23-26]. With these lines of evidence, we believe that tNOX not only supports but also exerts functions independent of SIRT1. Thus, the tNOX- and SIRT1-inhibiting functions of 4-dmH result in biological outcomes that differ from those of the SIRT1-binding heliomycin.

      (10) They should examine the effects of heliomycin and 4-dmH on cell viability of non-tumor cells to examine their toxicities.

      Using cell impedance measurements, we also examined the effects of heliomycin and 4-dmH on the proliferation of human non-cancerous BEAS-2B cells. Our results demonstrated that heliomycin did not exhibit cytotoxicity toward BEAS-2B cells (revised Fig. 6a). Furthermore, the water-soluble 4-dmH effectively diminished proliferation of oral cancer cells in a dose-dependent manner, but its effect was much less apparent in BEAS-2B cells (revised Fig. 6b). Similar results were reported in our previous study, in which 4-dmH displayed much higher IC50 values against non-cancerous human dermal microvascular endothelial HMEC-1 cells than against tumor cells [34].

      Author response image 6.

      (11) They should consistently use either tNOX or ENOX2 to avoid confusion.

      Thank you for the suggestion. We have now consistently used tNOX throughout the manuscript. However, for the revised Figure 7d, the commercially available antibody to ENOX2 (from Proteintech, Rosemont, IL, USA) is different from the one to tNOX (produced in our laboratory) and this is the only place we have used ENOX2 rather than tNOX.

      References

      1) Mouchiroud L, Houtkooper RH, Moullan N, Katsyuba E, Ryu D, Canto C, Mottis A, Jo YS, Viswanathan M, Schoonjans K et al: The NAD(+)/Sirtuin Pathway Modulates Longevity through Activation of Mitochondrial UPR and FOXO Signaling. Cell 2013, 154(2):430-441.

      2) He S, Gao Q, Wu X, Shi J, Zhang Y, Yang J, Li X, Du S, Zhang Y, Yu J: NAD(+) ameliorates endotoxin-induced acute kidney injury in a sirtuin1-dependent manner via GSK-3beta/Nrf2 signalling pathway. J Cell Mol Med 2022, 26(7):1979-1993.

      3) Donmez G: The neurobiology of sirtuins and their role in neurodegeneration. Trends Pharmacol Sci 2012, 33(9):494-501.

      4) Teertam SK, Phanithi PB: Up-regulation of Sirtuin-1/autophagy signaling in human cerebral ischemia: possible role in caspase-3 mediated apoptosis. Heliyon 2022, 8(12):e12278.

      5) Li BY, Peng WQ, Liu Y, Guo L, Tang QQ: HIGD1A links SIRT1 activity to adipose browning by inhibiting the ROS/DNA damage pathway. Cell reports 2023, 42(7):112731.

      6) Bai P, Canto C, Oudart H, Brunyanszki A, Cen Y, Thomas C, Yamamoto H, Huber A, Kiss B, Houtkooper RH et al: PARP-1 inhibition increases mitochondrial metabolism through SIRT1 activation. Cell Metab 2011, 13(4):461-468.

      7) Ma Y, Nie H, Chen H, Li J, Hong Y, Wang B, Wang C, Zhang J, Cao W, Zhang M et al: NAD(+)/NADH metabolism and NAD(+)-dependent enzymes in cell death and ischemic brain injury: current advances and therapeutic implications. Curr Med Chem 2015, 22(10):1239-1247.

      8) Fulco M, Schiltz RL, Iezzi S, King MT, Zhao P, Kashiwaya Y, Hoffman E, Veech RL, Sartorelli V: Sir2 regulates skeletal muscle differentiation as a potential sensor of the redox state. Mol Cell 2003, 12(1):51-62.

      9) Yang Y, Liu Y, Wang Y, Chao Y, Zhang J, Jia Y, Tie J, Hu D: Regulation of SIRT1 and Its Roles in Inflammation. Front Immunol 2022, 13:831168.

      10) Tikhomirov AS, Shchekotikhin AE, Lee YH, Chen YA, Yeh CA, Tatarskiy VV, Jr., Dezhenkova LG, Glazunova VA, Balzarini J, Shtil AA et al: Synthesis and Characterization of 4,11-Diaminoanthra[2,3-b]furan-5,10-diones: Tumor Cell Apoptosis through tNOX-Modulated NAD(+)/NADH Ratio and SIRT1. Journal of medicinal chemistry 2015, 58(24):9522-9534.

      11) Chang CF, Islam A, Liu PF, Zhan JH, Chueh PJ: Capsaicin acts through tNOX (ENOX2) to induce autophagic apoptosis in p53-mutated HSC-3 cells but autophagy in p53-functional SAS oral cancer cells. Am J Cancer Res 2020, 10(10):3230-3247.

      12) Lin CY, Islam A, Su CJ, Tikhomirov AS, Shchekotikhin AE, Chuang SM, Chueh PJ, Chen YL: Engagement with tNOX (ENOX2) to Inhibit SIRT1 and Activate p53-Dependent and -Independent Apoptotic Pathways by Novel 4,11-Diaminoanthra[2,3-b]furan-5,10-diones in Hepatocellular Carcinoma Cells. Cancers (Basel) 2019, 11(3).

      13) Chen HY, Cheng HL, Lee YH, Yuan TM, Chen SW, Lin YY, Chueh PJ: Tumor-associated NADH oxidase (tNOX)-NAD+-sirtuin 1 axis contributes to oxaliplatin-induced apoptosis of gastric cancer cells. Oncotarget 2017, 8(9):15338-15348.

      14) Xu Q, Liu X, Mohseni G, Hao X, Ren Y, Xu Y, Gao H, Wang Q, Wang Y: Mechanism research and treatment progress of NAD pathway related molecules in tumor immune microenvironment. Cancer Cell Int 2022, 22(1):242.

      15) Brandes RP, Weissmann N, Schroder K: Nox family NADPH oxidases: Molecular mechanisms of activation. Free Radic Biol Med 2014, 76:208-226.

      16) Gong S, Wang S, Shao M: NADPH Oxidase 4: A Potential Therapeutic Target of Malignancy. Front Cell Dev Biol 2022, 10:884412.

      17) Wang Y, Sui Y, Niu Y, Liu D, Xu Q, Liu F, Zuo K, Liu M, Sun W, Wang Z et al: PBX1-SIRT1 Positive Feedback Loop Attenuates ROS-Mediated HF-MSC Senescence and Apoptosis. Stem Cell Rev Rep 2023, 19(2):443-454.

      18) Wang X, Lu Y, Tuo Z, Zhou H, Zhang Y, Cao Z, Peng L, Yu D, Bi L: Role of SIRT1/AMPK signaling in the proliferation, migration and invasion of renal cell carcinoma cells. Oncol Rep 2021, 45(6).

      19) Liu SC, Yang JJ, Shao KN, Chueh PJ: RNA interference targeting tNOX attenuates cell migration via a mechanism that involves membrane association of Rac. Biochem Biophys Res Commun 2008, 365(4):672-677.

      20) Lin MH, Lee YH, Cheng HL, Chen HY, Jhuang FH, Chueh PJ: Capsaicin Inhibits Multiple Bladder Cancer Cell Phenotypes by Inhibiting Tumor-Associated NADH Oxidase (tNOX) and Sirtuin1 (SIRT1). Molecules 2016, 21(7).

      21) Cheng HL, Lee YH, Yuan TM, Chen SW, Chueh PJ: Update on a tumor-associated NADH oxidase in gastric cancer cell growth. World J Gastroenterol 2016, 22(10):2900-2905.

      22) Lee YH, Chen HY, Su LJ, Chueh PJ: Sirtuin 1 (SIRT1) Deacetylase Activity and NAD(+)/NADH Ratio Are Imperative for Capsaicin-Mediated Programmed Cell Death. J Agric Food Chem 2015, 63(33):7361-7370.

      23) Islam A, Su AJ, Zeng ZM, Chueh PJ, Lin MH: Capsaicin Targets tNOX (ENOX2) to Inhibit G1 Cyclin/CDK Complex, as Assessed by the Cellular Thermal Shift Assay (CETSA). Cells 2019, 8(10).

      24) Su YC, Lin YH, Zeng ZM, Shao KN, Chueh PJ: Chemotherapeutic agents enhance cell migration and epithelial-to-mesenchymal transition through transient up-regulation of tNOX (ENOX2) protein. Biochim Biophys Acta 2012, 1820(11):1744-1752.

      25) Zeng ZM, Chuang SM, Chang TC, Hong CW, Chou JC, Yang JJ, Chueh PJ: Phosphorylation of serine-504 of tNOX (ENOX2) modulates cell proliferation and migration in cancer cells. Experimental cell research 2012, 318(14):1759-1766.

      26) Chueh PJ, Wu LY, Morre DM, Morre DJ: tNOX is both necessary and sufficient as a cellular target for the anticancer actions of capsaicin and the green tea catechin (-)-epigallocatechin-3-gallate. Biofactors 2004, 20(4):235-249.

      27) Ran D, Zhou D, Liu G, Ma Y, Ali W, Yu R, Wang Q, Zhao H, Zhu J, Zou H et al: Reactive Oxygen Species Control Osteoblast Apoptosis through SIRT1/PGC-1alpha/P53(Lys382) Signaling, Mediating the Onset of Cd-Induced Osteoporosis. J Agric Food Chem 2023.

      28) Zhang Z, Chen X, Liu S: Role of Sirtuin-1 in Neonatal Hypoxic-Ischemic Encephalopathy and Its Underlying Mechanism. Med Sci Monit 2020, 26:e924544.

      29) McReynolds MR, Chellappa K, Baur JA: Age-related NAD(+) decline. Exp Gerontol 2020, 134:110888.

      30) Xie N, Zhang L, Gao W, Huang C, Huber PE, Zhou X, Li C, Shen G, Zou B: NAD(+) metabolism: pathophysiologic mechanisms and therapeutic potential. Signal Transduct Target Ther 2020, 5(1):227.

      31) Zapata-Perez R, Wanders RJA, van Karnebeek CDM, Houtkooper RH: NAD(+) homeostasis in human health and disease. EMBO Mol Med 2021, 13(7):e13943.

      32) Wang B, Ma Y, Kong X, Ding X, Gu H, Chu T, Ying W: NAD(+) administration decreases doxorubicin-induced liver damage of mice by enhancing antioxidation capacity and decreasing DNA damage. Chem Biol Interact 2014, 212:65-71.

      33) Hong G, Zheng D, Zhang L, Ni R, Wang G, Fan GC, Lu Z, Peng T: Administration of nicotinamide riboside prevents oxidative stress and organ injury in sepsis. Free Radic Biol Med 2018, 123:125-137.

      34) Nadysev GY, Tikhomirov AS, Lin MH, Yang YT, Dezhenkova LG, Chen HY, Kaluzhny DN, Schols D, Shtil AA, Shchekotikhin AE et al: Aminomethylation of heliomycin: Preparation and anticancer characterization of the first series of semi-synthetic derivatives. European journal of medicinal chemistry 2018, 143:1553-1562.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1:

      (1) General comment: The evidence for these highly novel, potentially interesting roles (of the exocyst) would need to be more compelling to support direct involvement.

      We wish to thank the reviewer for his/her comments, and for considering that the proposed functions are highly novel and potentially interesting. To strengthen the evidence supporting the new roles of the exocyst, we have performed a number of additional experiments that are depicted in novel figures or figure panels of the new version of the manuscript. Particularly, we aimed at providing further support of the direct involvement of the exocyst in different steps of the regulated secretory pathway. Please see the details below.

      (2) For instance, the localization of exocyst to Golgi or to granule-granule contact sites does not seem substantial.

      We have performed quantitative colocalization studies, as suggested by the reviewer, to further substantiate our initial findings. We have carefully analysed GFP-Sec15 distribution in relation to the Golgi complex and secretory Glue granules at relevant time points of salivary gland development. Overall, we found that GFP-Sec15 distribution is dynamic during salivary gland development. Before Glue synthesis (72 h AEL), Sec15 was observed in close association (defined as a distance equal to or less than 0.6 µm) with the Golgi complex (please see Author response image 1 below). This association was lost once Glue granules had begun to form (96 h AEL). Importantly, we do not see a relevant association between GFP-Sec15 and the ER (please see Author response image 2). These observations support our conclusion that the exocyst plays a role at the Golgi complex. New images supporting these conclusions, as well as quantitative data, have been included in Figure 5 of the new version of the manuscript. In addition, real-time imaging and 3D reconstruction analyses confirming the close association between Sec15 and Golgi cisternae are now included in the manuscript. Please see Supplementary Videos 1-3. These new data are described in lines 200-210 of the Results section and lines 359-368 of the Discussion section.

      Interestingly, at the time when the Sec15-Golgi association is lost (96 h AEL), Sec15 foci instead associate with newly formed secretory granules (< 1 µm diameter). This association persists during secretory granule maturation (100-116 h AEL), when Sec15 foci localize specifically between neighbouring immature secretory granules. When maturation has ended and Glue granule exocytosis begins (116-120 h AEL), this localization between granules is lost. These observations are consistent with a role of the exocyst in homotypic fusion during SG maturation. We have included new images showing that the association between Sec15 and secretory granules is dynamic and depends on the developmental stage. We have quantified this association both during maturation and at a stage when SGs are already mature, and we have performed a 3D reconstruction analysis of these images to confirm the close association between Sec15 and immature SGs. These new data are now depicted in Figure 7B-C and Supplementary Videos 4-5, and described in lines 216-221 of the Results section. In addition, a lower-magnification image is provided below in this letter (Author response image 3), quantifying the proportion of Sec15 foci localized between SGs (yellow arrows) relative to the total number of Sec15 foci (yellow arrows + green arrowheads).

      Author response image 1.

      Criteria used to define Sec15 foci that were “associated” or “not associated” with the trans-Golgi network in the experiments of Figure 5C-E of the manuscript. When the distance between the intensity maxima of the GFP-Sec15 and Golgi-RFP signals was equal to or less than 0.6 µm, the signals were considered “associated” (upper panels). When the distance was more than 0.6 µm, the signals were considered “not associated” (lower panels).

      Author response image 2.

      Criteria used to define Sec15 foci that were “associated” or “not associated” with the ER in the experiments of Figure 5A-B of the manuscript. When the distance between the intensity maxima of the GFP-Sec15 and KDEL-RFP signals was equal to or less than 0.6 µm, the signals were considered “associated”. When the distance was more than 0.6 µm, the signals were considered “not associated”.
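      The distance criterion used in Author response images 1 and 2 can be expressed as a short sketch. It assumes that the positions of the intensity maxima have already been extracted from each image and converted to micrometres; the function names and the example coordinates are hypothetical and are not meant to reproduce the image-analysis pipeline used for Figure 5.

```python
import math

# Threshold stated in the figure legends: maxima within 0.6 µm are "associated".
ASSOCIATION_THRESHOLD_UM = 0.6


def is_associated(sec15_max_um, marker_max_um, threshold_um=ASSOCIATION_THRESHOLD_UM):
    """Return True if two intensity maxima lie within threshold_um of each other.

    Both arguments are (x, y) or (x, y, z) positions in micrometres
    (hypothetical input format; both points must have the same dimension).
    """
    return math.dist(sec15_max_um, marker_max_um) <= threshold_um


def fraction_associated(pairs, threshold_um=ASSOCIATION_THRESHOLD_UM):
    """Fraction of (Sec15 maximum, nearest marker maximum) pairs scored as associated."""
    if not pairs:
        return 0.0
    hits = sum(is_associated(a, b, threshold_um) for a, b in pairs)
    return hits / len(pairs)


# Hypothetical coordinates in µm, not data from the study.
pairs = [((1.0, 1.0), (1.3, 1.4)),  # distance 0.5 µm -> associated
         ((2.0, 2.0), (2.0, 2.9))]  # distance 0.9 µm -> not associated
print(fraction_associated(pairs))  # 0.5
```

      Scoring each Sec15 focus against its nearest marker maximum keeps the rule binary, as in the legends; the percentage reported per condition is then simply this fraction multiplied by 100.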

      Author response image 3.

      (A, B) GFP-Sec15 foci (cyan) and SGs (red) in cells bearing (A) immature SGs or (B) mature SGs. Yellow arrows indicate GFP-Sec15 foci localized between SGs; green arrowheads indicate GFP-Sec15 foci that are not between SGs. (C) Quantification of the percentage (%) of Sec15 foci localized between SGs relative to the total number of Sec15 foci in cells filled with immature SGs (ISG) vs cells with mature SGs (MSG).

      It is worth mentioning that previous evidence from mammalian cultured cells (Yeaman et al, 2001) showed that the exocyst localizes both at the trans-Golgi network and at the plasma membrane, supporting our claim that the exocyst is required at various steps of the exocytic pathway. Thus, the exocyst may play multiple roles in the secretory pathway in other biological models as well. This concept has now been included in the Discussion section of the revised version of the manuscript (lines 359-368).

      To make the conclusions of our work clearer, in the revised version of the manuscript, we have now included a graphical abstract, summarizing the dynamic localization of the exocyst in relation to the processes of SG biogenesis, maturation and exocytosis reported in our work. 

      (3) Instead, it is possible that defects in Golgi traffic and granule homotypic fusion are not due to direct involvement of the exocyst in these processes, but secondary to a defect in canonical exocyst roles at the plasma membrane. A block in the last step of glue exocytosis could perhaps propagate backward in the secretory pathway to disrupt Golgi complexes or cause poor cellular health due to loss of cell polarity or autophagy.

      We thank the reviewer for these thoughtful comments. We have performed a number of additional experiments to assess “cellular health” or to identify possible defects in cell polarity after knock-down of exocyst subunits. These new data have been included in new supplementary figures 5 and 6 of the revised version of the manuscript (please see below). 

      In our view, the precise localization of GFP-Sec15 at the Golgi complex (Figure 5C-E), as well as in between immature secretory granules (Figure 7B-D), argues in favour of a direct involvement of the exocyst in SG biogenesis and homofusion respectively. 

      We truly appreciate the reviewer's comment raising the possibility that the defects we observe at early steps of the pathway (SG biogenesis and SG maturation) may actually stem from a backward effect of the role of the exocyst in SG-plasma membrane tethering. We wish to respectfully point out that, in the Drosophila larval salivary gland in vivo, the processes of SG biogenesis, maturation and plasma membrane tethering/fusion do not occur simultaneously, as they do in other secretory model systems (e.g. cell culture). In this regard, the experimental model is unique in terms of synchronization. In each cell of the salivary gland, the three processes (biogenesis, maturation and exocytosis) occur sequentially and are controlled by developmental cues. By the developmental stage at which SGs fuse with the plasma membrane, SG biogenesis ceased many hours earlier: SG biogenesis occurs at 96-100 hours after egg lay (AEL), SG maturation takes place at 100-112 hours AEL, and SG-plasma membrane fusion happens only when all SGs have undergone maturation and are ready to fuse with the plasma membrane, at 116-120 h AEL. Thus, in our view it is not conceivable that a defect in SG-plasma membrane tethering/fusion (116-120 h AEL) could act backwards on the processes of SG biogenesis or SG maturation, which occurred earlier in development (96-112 h AEL).

      As suggested by the reviewer, we have analysed several markers of cellular health and cell polarity, comparing conditions of exocyst subunit silencing (exo70RNAi, sec3RNAi or exo84RNAi) with wild type controls (whiteRNAi). These new data are depicted in Supplementary Figures 5 and 6, and described in lines 172-179 of the Results section of the revised version of the manuscript. Notably, for these experiments we applied silencing conditions that block secretory granule maturation, giving rise mostly to immature SGs. Our analyses included the subcellular distribution of 1) PI(4,5)P2, 2) the tetraspanin CD63, 3) Rab11, 4) filamentous actin and 5) CD8. We also compared 6) nuclear size and general nuclear morphology, 7) the number and distribution of mitochondria, and the morphology and subcellular distribution of the 8) cis- and 9) trans-Golgi networks. Finally, 10) we compared basal autophagy in salivary gland cells with or without knock-down of exocyst subunits. The markers we analysed behaved similarly to those of control salivary glands, suggesting that the observed defects in regulated exocytosis indeed reflect different roles of the exocyst in the secretory pathway, rather than poor cellular health or impaired cell polarity.

      Our conclusions are in line with previous studies in which apico-basal polarity, Golgi complex morphology and distribution, as well as apical membrane trafficking were also evaluated in exocyst mutant backgrounds, finding no anomalies (Jafar-Nejad et al, 2005). 

      Conversely, in studies in which apical polarity was disturbed by interfering with Crumbs levels, SG biogenesis, maturation and exocytosis were not affected (Lattner et al, 2019), indicating that these processes do not necessarily interfere with one another.

      (4) Final recommendation: In the absence of stronger evidence for these other exocyst roles, I would suggest focusing the study on the canonical role (interesting, as it was previously reported that Drosophila exocyst had no function in the salivary gland and limited function elsewhere [DOI: 10.1034/j.1600-0854.2002.31206.x]), and leave the alternative roles for discussion and deeper study in the future.  

      We appreciate the reviewer's recommendation. However, we believe that the major strength of our work is the discovery of non-canonical roles of the exocyst complex, unrelated to its function as a tethering complex for vesicle-plasma membrane fusion. We believe that in the new version of our manuscript, we provide stronger evidence supporting the two novel roles of the exocyst:

      a) Its participation in maintaining the normal structure of the Golgi complex, and b) Its function in secretory granule maturation.

      Reviewer 2:

      (5) General comment: A key strength is the breadth of the assays and study of all 8 exocyst subunits in a powerful model system (fly larvae). Many of the assays are quantitated and roles of the exocyst in early phases of granule biogenesis have not been ascribed. 

      We are grateful that the reviewer appreciates the novelty of our contribution.

      (6) However, there are several weaknesses, both in terms of experimental controls, concrete statements about the granules (better resolution), and making a clear conceptual framework. Namely, why do KD of different exocyst subunits have different effects on presumed granule formation?

      The reviewer has raised a point that is central to the interpretation of all our data throughout the manuscript. The short answer is that the extent of RNAi-dependent silencing of exocyst subunits determines the phenotype: 

      1) Maximum silencing affects Golgi complex morphology and prevents SG biogenesis. 2) Intermediate silencing blocks SG maturation, without affecting Golgi complex morphology and SG biogenesis. 3) Weak silencing blocks SG tethering and fusion with the plasma membrane, without affecting Golgi complex morphology, SG biogenesis or SG maturation. 

      In other words, 1) Low levels of exocyst subunits are sufficient for normal Golgi complex morphology and SG biogenesis. 2) Intermediate levels of exocyst subunits are sufficient for SG maturation (and also sufficient for SG biogenesis). 3) High levels of exocyst subunits are required for SG tethering and subsequent fusion with the plasma membrane. 

      Based on the above notion, we have exploited the fact that temperature can fine-tune the level of Gal4/UAS-dependent transcription, thereby achieving different levels of silencing, as shown by Norbert Perrimon et al in their seminal paper “the level of RNAi knockdown can also be altered by using Gal4 lines of various strengths, rearing flies at different temperatures, or via coexpression of UAS-Dicer2” (Perkins et al, 2015). 

      We found in our system that indeed, by applying appropriate silencing conditions (RNAi line and temperature) to any of the eight subunits of the exocyst, we have been able to obtain one of the three alternative phenotypes: Impaired SG biogenesis, or impaired SG maturation, or impaired SG tethering/fusion with the plasma membrane.

      These concepts are summarized below in Author response image 4. Please see also at point 26, the general comment of Reviewer #3. 

      We have conducted qRT-PCR assays to provide experimental support to the notions summarized above in Author response image 4. We measured the remaining levels of mRNAs of some of the exocyst subunits, after inducing RNAi-mediated silencing at different temperatures, or with different RNAi transgenic lines. The remaining RNA levels after silencing correlate well with the observed phenotypes, following the predictions of Author response image 4 and summarized in Author response image 5. These new data are now shown in Supplementary Figure 2 of the revised version of the manuscript, and described in lines 153-159 at the Results section.

      (7) Why does just overexpression of a single subunit (Sec15) induce granule fusion?

      The reviewer raises a very important point. Based on available data from the literature, Sec15 behaves as a seed for assembly of the holocomplex and it also mediates the recruitment of the holocomplex to SGs through its interaction with Rab11 (Escrevente et al, 2021; Bhuin and Roy, 2019; Wu et al, 2005; Zhang et al, 2004; Guo et al, 1999). Thus, overexpression of Sec15 is expected to enhance exocyst assembly, thereby potentiating the activities carried out by the complex in the cell, including SG homofusion. In the revised version of the manuscript we have also performed the overexpression of Sec8, finding that, unlike Sec15, Sec8 fails to induce homotypic fusion. These results were expected, as they confirm that Sec8 does not behave as a seed for mounting the whole complex. These new data have been included in Figure 7E-H, and are described in text lines 221-229 of the Results section. 

      Author response image 4.

      Conceptual model of RNAi expression at different temperatures, the remaining mRNA/protein levels, and the phenotypes obtained at each temperature.

      Author response image 5.

      qRT-PCR assays presented in Supplementary Figure 2 are shown in combination with the phenotypes observed at each of the conditions analyzed. Note the correlation between phenotypes and the extent of mRNA downregulation.

      (8) While the paper is fascinating, the major comments need to be addressed to really be able to make better sense of this work, which at present is hard to disentangle direct vs. secondary effects, especially as much of the TGN seems to be altered in the KDs.  

      We hope that our response to point 6) has helped to clarify this important point raised by the Reviewer. After applying silencing conditions where normal structure of the trans-Golgi network is impaired, SG biogenesis does not occur. Thus, since SGs do not form, it is not conceivable to detect defects in SG maturation or SG fusion with the plasma membrane in the same cell.

      (9) The authors conveniently ascribe many of the results to the holocomplex, but their own data (Fig. 4 and Fig. 6) are at odds with this.

      This is another central point of our work, so we thank the reviewer for his/her comment. In Figures 4A, 7A and 9A of the revised version of the manuscript, we show that, by inducing appropriate levels of silencing of any of the 8 subunits of the exocyst, each of the three alternative phenotypic manifestations can occur. In our opinion, this argues in favour of a function for the whole exocyst complex in each of the three specific activities proposed in our study: 1) SG biogenesis, 2) SG maturation, and 3) SG tethering/fusion with the plasma membrane. For the detailed characterization of these three phenotypes throughout the study, we induced silencing of only two or three of the exocyst subunits, assuming that the whole complex accounts for the mechanisms involved.

      Major comments

      (10) Resolution not sufficient. Identification of "mature secretory granules" (MSG) in Fig. 3 is based on low-resolution images in which the MSG are not clearly seen (see control in Fig. 3A) and rather appear as a diffuse haze, and not as clear granules. There may be granules here, but as shown it is not clear. Thus it would be helpful to acquire images at higher resolution (at the diffraction limit, or higher) to see and count the MSG.

      We thank the reviewer for raising this point, as it may not be straightforward for the reader to identify the SGs throughout the figures of our study. To make this clearer, in Figure 3A (magnified insets on the right) we have delineated individual SGs with a green dotted line and included diagrams (far right), which we hope will help identify the SGs. In Figure 3B, we show that after silencing exo84, a mosaic phenotype was observed: in some cells, SGs fail to undergo maturation and remain smaller than normal. In other cells of this mosaic phenotype, SG biogenesis was impaired and the fluorescent cargo remained trapped in a mesh-like structure (which we later show corresponds to the ER). The dotted line marks individual SGs, and the diagrams on the right are intended to help interpret the phenotype. The mesh-like structures in which Sgs3-GFP was retained are also marked with a dotted line and schematized on the right. These new schemes are described in the Figure 3 caption of the revised version of the manuscript.

      We wish to mention that all the confocal images depicted in this figure and throughout the manuscript have been captured at high resolution, with a theoretical resolution limit of 168-177 nm (d = λ/2NA). Given that secretory granules range from 0.8 to 7 µm in diameter, the resolution is more than sufficient to clearly resolve these structures.
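      As a worked check of this figure, take the diffraction limit in its standard form, d = λ/(2NA), with NA = 1.4 for the 63X oil objective mentioned in point 11. The wavelengths below are back-calculated from the quoted 168-177 nm range under that assumption; they are not the acquisition settings themselves.

```latex
d = \frac{\lambda}{2\,\mathrm{NA}}, \qquad \mathrm{NA} = 1.4
\;\Rightarrow\; \lambda = 2.8\,d

d = 168\text{--}177~\mathrm{nm}
\;\Rightarrow\; \lambda \approx 470\text{--}496~\mathrm{nm}

\frac{0.8~\mu\mathrm{m}}{177~\mathrm{nm}} = \frac{800~\mathrm{nm}}{177~\mathrm{nm}} \approx 4.5
```

      Under these assumptions, even the smallest granules considered (0.8 µm) span roughly four to five resolution limits, and the largest (7 µm) around forty.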

      (11) Note: the authors are not clear on which objective was used. Maybe the air objective as the resolution appears poor).  

      In this particular figure, we have utilized a Plan-Apochromat 63X/1.4NA oil objective of the inverted Carl Zeiss LSM 880 confocal microscope (mentioned in materials and methods).

      (12) They need to prove that the diffuse Sgs3-GFP haze is indeed due to MSG.  

      If we correctly interpret the reviewer's concern, what he/she calls a “diffuse haze” is actually the distribution of Sgs3-GFP within individual SGs, which, as previously reported by other authors, is not homogeneous at this stage (Syed et al, 2022). We hope that the diagrams we have included in Figure 3A, B (point 10) will help readers interpret the images.

      (13) Related it is unclear what are the granule structures that correspond to Immature secretory granules (ISG) and cells with mesh-like structures (MLS)?

      We are confident that the diagrams now included in Figure 3A and B will help the interpretation, and particularly to identify immature granules and the mesh-like structure generated after silencing of exocyst subunits.

      (14) Similarly, Sgs3 images of KD of 8 exocyst subunits were interpreted to be identical, in Fig. 4, but the resolution is poor.

      We hope that the issue related to the resolution of our images has been properly addressed in the response to point 10) of this letter. In Figure 4A, we show that after silencing any of the 8 subunits (under the appropriate conditions), SG biogenesis was impaired in all cases, and Sgs3-GFP was instead retained in a mesh-like structure. Images obtained after silencing different exocyst subunits are of course not identical, but in all cases a mesh-like structure has replaced the formation of SGs (Figure 4A). Hopefully, the diagrams now included in Figure 3A and B will help the correct interpretation of the phenotypes throughout the study.

      To demonstrate that the structure in which Sgs3-GFP was retained upon exocyst complex knock-down corresponds to the ER, we performed a colocalization analysis between Sgs3-GFP and the ER markers GFP-KDEL or Bip-sfGFP-HDEL, and calculated Pearson's coefficient, which indicated substantial colocalization (Figure 4B-G and Supplementary Figures 7 and 8). These new data are described in lines 196-199 of the revised version of the manuscript. To facilitate the visualization of the results, in the revised version of the manuscript we have included magnified cropped areas of the images shown in Figure 4A.

      (15) What is remarkable is a highly variable effect of different subunit KD on the percentage of cells with MLS (Fig. 4C). Controls = 100 %, Exo70=~75% (at 19 deg), Sec3 = ~30%, Sec10 = 0%, Exo84 = 100% ... This is interesting for the functional exocyst is an octameric holocomples, thus why the huge subunit variability in the phenotypes? The trivial explanation is either: i) variable exocyst subunit KD (not shown) or ii) variability between experiments (no error bars are shown). Both should be addressed by quantification of the KD of different proteins and secondly by replicating the experiments.

      We agree with the reviewer's statement. We believe that both variability of KD efficiency (i) and variability between experiments (ii) contribute to the variable effect observed after knocking down the different subunits. As detailed in the response to point 6), we have performed qRT-PCR determinations to confirm that the severity of the phenotype depends on the efficiency of RNAi-mediated silencing. We chose to analyse in detail the subunits exo70 and sec3, which showed the greatest phenotypic differences among the three silencing temperatures used. As expected, we found that the levels of silencing were temperature-dependent, being higher at 29 °C and lower at 19 °C. These data are included in Supplementary Figure 2, described in lines 153-159 of the Results section, and summarized in Author response images 4 and 5 of this rebuttal letter.

      We thank the reviewer for his/her comment on the replication of experiments and statistics. We failed to include detailed numerical information in the original submission, such as the number of replicas and standard deviations of the data depicted in Figure 3C and Supplementary Figure 1, so we apologize for this omission. In the revised version of the manuscript, we have included a table (Supplementary Table 3) in which all the raw data of Figure 3C and Supplementary Figure 1, including standard deviations, are now depicted.

      (16) If their data holds up then the underlying mechanism here needs to be considered.

      (Note: there is some precedent from the autophagy field of differential exocyst effects)

      Our proposed mechanism is essentially that the holocomplex is required for multiple processes along the secretory pathway. Each of these actions (Golgi structure maintenance, SG maturation and SG tethering/fusion with the plasma membrane) requires a different amount of holocomplex activity, which is why each phenotype manifests at a different level of RNAi-mediated silencing (Author response image 4 of this letter). The model predicts that Golgi structure maintenance requires minimal levels of complex activity, which is why strong knock-down of exocyst subunits is required to obtain this phenotype. In line with our results, it has been reported that other tethering complexes of the CATCHR family are also required to keep Golgi cisternae stuck together (D'Souza et al, 2020; Khakurel and Lupashin, 2023; Liu et al, 2019). One possibility is that the exocyst plays a redundant role in maintaining the normal structure of the Golgi complex, along with other CATCHR complexes. This potential redundancy could explain why severe exocyst knock-down is required to observe structural anomalies in this organelle. At the other end of the spectrum, we propose that tethering/fusion with the plasma membrane is very susceptible to even a slight reduction of complex activity, so that mild RNAi-mediated silencing is sufficient to cause defects in this process. This proposed model is depicted in Author response image 4 and discussed in lines 395-405 of the Discussion section.

      (17) In the salivary glands the authors state that the exocyst is needed for Sgs3-GFP exit from the ER. First, Pearson's coefficient should be shown so as to quantitate the degree of ER localizations of all KDs.

      We thank the reviewer for this comment, which helped us strengthen the observation that when SG biogenesis is impaired, Sgs3-GFP remains trapped in the ER. In the revised version of the manuscript, we have calculated Pearson's coefficient to assess colocalization between the ER markers (GFP-KDEL or Bip-sfGFP-HDEL) and Sgs3-GFP in salivary gland cells expressing sec15RNAi. Pearson's coefficient was around 0.6 for both ER markers, indicating substantial colocalization with Sgs3-GFP (Supplementary Figure 8, lines 196-199 of the Results section).
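      For readers less familiar with this metric, Pearson's coefficient for two fluorescence channels can be sketched as below. The sketch assumes the two channels have already been reduced to matched lists of pixel intensities from the same region of interest (for example, after background subtraction); the function name is hypothetical and the code is not intended to reproduce the software used for Supplementary Figure 8.

```python
import math


def pearson_coefficient(channel_a, channel_b):
    """Pearson's correlation between two equal-length lists of pixel intensities.

    Returns a value in [-1, 1]: 1 for perfectly co-varying channels,
    0 for no linear relationship, -1 for perfectly anti-correlated channels.
    """
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have the same length (at least 2 pixels)")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    # Covariance and variances, computed on mean-centred intensities.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has undefined correlation")
    return cov / math.sqrt(var_a * var_b)
```

      Because the coefficient is computed on mean-centred intensities, it is insensitive to differences in overall brightness between the ER marker and Sgs3-GFP channels, which is one reason it is a common choice for this type of comparison.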

      (18) Second, there should be some rescue performed (if possible) to support specificity. 

      As suggested by the reviewer, we have performed a rescue experiment of the phenotype provoked by the expression of sec15 RNAi, which consisted of the retention of Sgs3-GFP in the endoplasmic reticulum: expression of Sec15-GFP substantially reverted the ER retention phenotype, rescuing SG biogenesis and also SG maturation in most cells (over 60% of the cells). These new data are now shown in Supplementary Figure 4, and described in lines 168-171 of the Results section.

      (19) Third, importantly other proteins that should traffic to the PM need to be shown to traffic normally so as to rule out a non-specific effect.

      We have addressed this issue (also raised by Reviewer #1) by analyzing the localization of a number of polarization markers, finding that the overall polarization of the cell was not affected by loss of function of exocyst subunits. Please see our response to point 3) raised by Reviewer #1. The new data showing cell polarization markers are shown in Supplementary Figure 6 of the revised version of the manuscript, and described in lines 172-179 of the Results section.

      (20) It is unclear from their model (Fig. 5) why after exocyst KD of Sec15 the cis-Golgi is more preserved than the TGN, which appears as large vacuoles. This is not quantitated and not shown for the 8 subunits.

      We thank the reviewer for this relevant comment. We agree that the phenotype of either sec15 or sec3 loss-of-function cells manifests differently with cis-Golgi and trans-Golgi markers. While the cis-Golgi marker looked fragmented and aggregated, the trans-Golgi marker adopted a swollen appearance. However, in our view, the different appearance of the two markers does not necessarily imply that one compartment is more preserved than the other. In the revised version of the manuscript, we have quantified the penetrance of the phenotypes provoked by sec15 or sec3 silencing, using both cis-Golgi and trans-Golgi markers. In both cases the penetrance was high, and even higher with the trans-Golgi marker. These new data are now depicted in Supplementary Figure 9 of the revised version of the manuscript.

      It is worth mentioning that in HeLa cells, as well as in the retinal epithelial cell line hTERT, Golgi phenotypes similar to those described here have been reported after loss of function of other tethering complexes that were shown to keep the Golgi cisternae stuck together, including the COG and GARP complexes (D'Souza et al, 2020; Khakurel and Lupashin, 2023; Liu et al, 2019). As elsewhere in our work, not every analysis included silencing of all eight subunits; in this case, we chose to silence sec3 and sec15. Please note that we have modified the model depicted in Figure 6E-F to highlight the cis- and trans-Golgi phenotypes upon exocyst knock-down, as well as the localization of the exocyst at cisternae of the Golgi complex.

      (21) Acute/Chronic control: It would be nice to acutely block the exocyst so as to better distinguish if the effects observed are primary or secondary effects (e.g. on a recycling pathway).

      We thank the reviewer for raising this important issue. To address this point, and to be able to induce silencing of exocyst subunits at specific time intervals of larval development, we used a strategy based on a thermosensitive variant of the Gal4 inhibitor Gal80 (Gal80ts) (Lee and Luo, 1999). We blocked Gal4 activity (and therefore RNAi expression) by maintaining the larvae at 18 °C during the 1st and 2nd instars (until 120 hours after egg lay), and then induced Gal4 activity specifically at the 3rd larval instar by raising the temperature to 29 °C, a condition in which Gal80ts becomes inactive. After silencing the expression of sec3 or sec15 at the 3rd larval instar only, the phenotype was very similar to that observed after chronic silencing of exocyst subunits (larvae maintained at 29 °C throughout development, where Gal4 was never inhibited). These observations suggest that the defects observed in the secretory pathway after knock-down of exocyst subunits reflect genuine functions of the exocyst in this pathway, rather than a secondary effect derived from impaired development of the salivary glands at early larval stages. These new results are now shown in Supplementary Figure 3, and described in lines 160-171 of the Results section.

      (22) Granule homotypic fusion. Strangely over-expression of just one subunit, Sec15-GFP, made giant secretory granules (SG) that were over 8 microns big! Why is that, especially if normally the exocyst is normally a holocomplex. Was this an effect that was specific to Sec15 or all exocyst subunits? Is the Sec15 level rate limiting in these cells? It may be that a subcomplex of Sec15/10 plays earlier roles, but in any case this needs to be addressed across all (or many) of the exocyst subcomplex members.

      Please see our response to point 7) of this letter. Sec15 is believed to act as a seed for the formation of the whole complex.

      (23) In summary, there are clearly striking effects on secretory granule biogenesis by dysfunction of the exocyst, however right now it is hard to disentangle effects on ER-Golgi traffic, loss of the TGN, and a problem in maturation or fusion of granules.

      As discussed in detail in our response to point 3 raised by Reviewer #1, the secretory pathway is highly synchronized in each cell of the Drosophila salivary gland. SG biogenesis, SG maturation and SG fusion with the plasma membrane never occur simultaneously in the same cell. Thus, in a cell in which ER-Golgi traffic is impaired (and SG biogenesis does not occur), SGs do not exist and therefore cannot exhibit defects in maturation or fusion with the plasma membrane. In summary, we believe that our work has shown that in Drosophila larval salivary glands the exocyst holocomplex is required for (at least) three functions along the secretory pathway: 1) maintaining the appropriate Golgi complex architecture, thus enabling ER-Golgi transport; 2) secretory granule maturation, including both homotypic fusion and acquisition of maturation factors; and 3) secretory granule exocytosis, namely secretory granule tethering to enable subsequent fusion with the plasma membrane. As mentioned above (point 6 of this letter), these three functions require different amounts of the holocomplex and can therefore be revealed by inducing different levels of silencing.

      (24) It is also confusing if the entire exocyst holocomplex or subcomplex plays a key role 

      The fact that, by silencing any of the subunits (under the appropriate conditions), it is possible to obtain any of the three phenotypes (impaired SG biogenesis, impaired SG maturation or impaired SG fusion with the plasma membrane) argues in favour of a function of the complex as a whole in each of these three processes.

      Reviewer 3:

      (25) General comment: Freire and co-authors examine the role of the exocyst complex during the formation and secretion of mucins from secretory granules in the larval salivary gland of Drosophila melanogaster. Using transgenic lines with a tagged Sgs3 mucin the authors KD expression of exocyst subunit members and observe a defect in secretory granules with a heterogeneity of phenotypes. By carefully controlling RNAi expression using a Gal4-based system the authors can KD exocyst subunit expression to varying degrees. The authors find that the stronger the inhibition of expression of exocyst the earlier in the secretory pathway the defect. The manuscript is well written, the model system is physiological, and the techniques are innovative.

      We appreciate the reviewer's assessment of our work.

      (26) My major concern is that the evidence underlying the fundamental claim of the manuscript that "the exocyst complex participates" in multiple secretory processes lacks direct evidence.

      We thank the reviewer for raising this important issue. We believe that the analysis of Sec15 subcellular localization during salivary gland development (Figures 5, 7B-D and 9E-F), in combination with the detailed analysis of the phenotypes provoked by loss of function of each of the exocyst subunits, provides evidence supporting multiple functions of the exocyst in the secretory pathway. We have also included 3D reconstructions and videos of GFP-Sec15 colocalization with Golgi and SG markers to support exocyst localization associated with these structures (Supplementary Videos 1-7; text lines 200-210, 216-221 and 303-305).

      (27) It is clear from multiple lines of evidence, which are discussed by the authors, that exocyst is essential for an array of exocytic events. The fundamental concern is that loss of homeostasis on the plasma membrane proteome and lipidome might have severe pleiotropic effects on the cell.

      We agree with the reviewer that this is an important point that needed to be addressed. As discussed in detail in our response to point 3 raised by Reviewer #1, we have analyzed several plasma membrane markers (including a PI(4,5)P2 lipid reporter) and found that, overall, plasma membrane integrity and polarity were not substantially affected (Supplementary Figure 6). In addition, we have analyzed several markers of general cellular “health”, which indicate that salivary gland cells do not seem to be distressed by the reduction of exocyst complex activity (Supplementary Figure 5). These new data are described in lines 172-179 of the Results section.

      (28) Perhaps the authors have more evidence that exocyst is important for homeotypic fusion of the SGs, as supported by the localisation of Sec15 on the fusion sites.

      We believe that the observation of immature, smaller-than-normal granules upon silencing any of the exocyst subunits (under the appropriate conditions) argues that the exocyst as a whole participates in SG homotypic fusion (Figure 7A). In addition, we have included more images, quantifications, 3D reconstructions and videos of GFP-Sec15 localized at the contact sites between immature SGs. We quantified and compared GFP-Sec15 localization at immature versus mature SGs and found that it localizes preferentially at immature SGs, supporting a role of the exocyst as a tethering complex during homotypic fusion (shown in Figure 7B-C and Supplementary Videos 4-6, and described in lines 216-221 of the Results section). Please see also our response to point 2 raised by Reviewer #1 in this rebuttal letter, and Author response image 3 above.

      (29) The second question that I think is important to address is what exactly the varying RNAi levels correspond to experimentally, and whether these have been validated. Because the fundamental claim is that the severity of the phenotype correlates with the level of KD, I think validation of this model is absolutely essential.  

      We thank the Reviewer for raising this important point, and agree it was lacking in the original version of our manuscript. As discussed in our response to point 6 raised by Reviewer #2, we have performed qRT-PCR measurements of exo70 and sec3 mRNA levels after inducing silencing of these subunits at different temperatures or with different RNAi transgenic lines. The remaining mRNA levels correlate well with the observed phenotypes. Please see Supplementary Figure 2 of the revised manuscript and Author response image 5 of this rebuttal letter; these data are described in lines 155-159 of the Results section. 
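
      The response does not state how the remaining mRNA levels were calculated. A common approach is the 2^-ΔΔCt (Livak) method, which expresses target-gene expression in knockdown samples relative to controls after normalization to a reference gene. The sketch below assumes that method, roughly 100% primer efficiency and a single reference gene; the function name `relative_expression` is hypothetical, and this is not presented as the authors' actual pipeline.

```python
def relative_expression(ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl):
    """Remaining target mRNA in a knockdown sample, as a fraction of control.

    Assumes the 2^-ddCt (Livak) method: both primer pairs amplify with
    ~100% efficiency, so one Ct unit corresponds to a two-fold difference.
    """
    # Normalize each sample to its reference gene.
    delta_kd = ct_target_kd - ct_ref_kd
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    # Compare the knockdown to the control.
    delta_delta = delta_kd - delta_ctrl
    return 2.0 ** (-delta_delta)
```

      A value of 1.0 means no knockdown, and smaller values mean stronger silencing; a stronger-silencing condition (for example, a higher rearing temperature with a Gal4-driven RNAi) would be expected to give a smaller value under this model.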

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      -  The authors assert in the discussion that exocyst involvement in constitutive secretion is well documented. This is based on a very recent study in mammalian culture cells. Therefore, I would not dismiss the issue as completely settled. Furthermore, a previous study of Drosophila sec10 reported no roles outside the ring gland (DOI: 10.1034/j.1600-0854.2002.31206.x).

      We have included these observations in the Discussion section. Lines 326-329.

      -  A salivary gland screening by Julie Brill's lab reported exocyst components as hits (DOI: 10.1083/jcb.201808017).

      We have referred to this paper in the Discussion section. Lines 326-329.

      -  It should be explained in more detail what is measured in graphs 7C, F, and others quantifying fluorescence around secretory granules. Looking at the images, the decrease in Rab1 and Rab11 seems less convincing.

      We have clarified how fluorescence intensity was measured in the Methods section (lines 558-561). We have also uploaded a source data file containing the raw data from each experiment used for quantification. 

      Please note that the data indicate that Rab11 levels are higher in sec5 (Figure 8J-L) and sec3 (Supplementary Figure 11M-R).

      Reviewer #2 (Recommendations For The Authors):

      No major issues.

      Writing - The authors should better frame their interpretations of other studies of the exocyst that include the role in autophagy, Palade body trafficking, and differential roles of the subunits.

      We have discussed these specific points in the Discussion section, lines 348-355 and 409-410.

      Minor - Fig. 6A: Why are variable temperatures (19-29 °C) used for the 8 KD experiments?

      Please show it all at the same temperature (control too).

      The need for the usage of specific temperatures to obtain specific phenotypes with each of the RNAi lines used was explained in point 6 of this letter.

      Reviewer #3 (Recommendations For The Authors):

      In the abstract, the authors refer to the exocytic process and go on to describe secretory granule biogenesis and exocytosis. However, there are many exocytic processes aside from secretory granule biogenesis, and I think the authors should clarify this.

      Corrected in the Abstract (lines 19-21).

      Page 17: there is a glitch with the Thomas, 2021 reference.

      Thanks for noticing. Fixed.

      References

      Bhuin T, Roy JK. Developmental expression, co-localization and genetic interaction of exocyst component Sec15 with Rab11 during Drosophila development. Exp Cell Res. 2019 Aug 1;381(1):94-104. doi: 10.1016/j.yexcr.2019.04.038. Epub 2019 May 7. PMID: 31071318.

      D'Souza Z, Taher FS, Lupashin VV. Golgi inCOGnito: From vesicle tethering to human disease. Biochim Biophys Acta Gen Subj. 2020 Nov;1864(11):129694. doi: 10.1016/j.bbagen.2020.129694. Epub 2020 Jul 27. PMID: 32730773; PMCID: PMC7384418.

      Escrevente C, Bento-Lopes L, Ramalho JS, Barral DC. Rab11 is required for lysosome exocytosis through the interaction with Rab3a, Sec15 and GRAB. J Cell Sci. 2021 Jun 1;134(11):jcs246694. doi: 10.1242/jcs.246694. Epub 2021 Jun 8. PMID: 34100549; PMCID: PMC8214760.

      Guo W, Roth D, Walch-Solimena C, Novick P. The exocyst is an effector for Sec4p, targeting secretory vesicles to sites of exocytosis. EMBO J. 1999 Feb 15;18(4):1071-80. doi: 10.1093/emboj/18.4.1071. PMID: 10022848; PMCID: PMC1171198.

      Jafar-Nejad H, Andrews HK, Acar M, Bayat V, Wirtz-Peitz F, Mehta SQ, Knoblich JA, Bellen HJ. Sec15, a component of the exocyst, promotes notch signaling during the asymmetric division of Drosophila sensory organ precursors. Dev Cell. 2005 Sep;9(3):351-63. doi: 10.1016/j.devcel.2005.06.010. PMID: 16137928.

      Khakurel A, Lupashin VV. Role of GARP Vesicle Tethering Complex in Golgi Physiology. Int J Mol Sci. 2023 Mar 23;24(7):6069. doi: 10.3390/ijms24076069. PMID: 37047041; PMCID: PMC10094427.

      Lattner J, Leng W, Knust E, Brankatschk M, Flores-Benitez D. Crumbs organizes the transport machinery by regulating apical levels of PI(4,5)P2 in Drosophila. Elife. 2019 Nov 7;8:e50900. doi: 10.7554/eLife.50900. PMID: 31697234; PMCID: PMC6881148.

      Lee T, Luo L. Mosaic analysis with a repressible cell marker for studies of gene function in neuronal morphogenesis. Neuron. 1999 Mar;22(3):451-61. doi: 10.1016/s0896-6273(00)80701-1. PMID: 10197526.

      Liu S, Majeed W, Grigaitis P, Betts MJ, Climer LK, Starkuviene V, Storrie B. Epistatic Analysis of the Contribution of Rabs and Kifs to CATCHR Family Dependent Golgi Organization. Front Cell Dev Biol. 2019 Aug 2;7:126. doi: 10.3389/fcell.2019.00126. PMID: 31428608; PMCID: PMC6687757.

      Perkins LA, Holderbaum L, Tao R, Hu Y, Sopko R, McCall K, Yang-Zhou D, Flockhart I, Binari R, Shim HS, Miller A, Housden A, Foos M, Randkelv S, Kelley C, Namgyal P, Villalta C, Liu LP, Jiang X, Huan-Huan Q, Wang X, Fujiyama A, Toyoda A, Ayers K, Blum A, Czech B, Neumuller R, Yan D, Cavallaro A, Hibbard K, Hall D, Cooley L, Hannon GJ, Lehmann R, Parks A, Mohr SE, Ueda R, Kondo S, Ni JQ, Perrimon N. The Transgenic RNAi Project at Harvard Medical School: Resources and Validation. Genetics. 2015 Nov;201(3):843-52. doi: 10.1534/genetics.115.180208. Epub 2015 Aug 28. PMID: 26320097; PMCID: PMC4649654.

      Wu S, Mehta SQ, Pichaud F, Bellen HJ, Quiocho FA. Sec15 interacts with Rab11 via a novel domain and affects Rab11 localization in vivo. Nat Struct Mol Biol. 2005 Oct;12(10):879-85. doi: 10.1038/nsmb987. Epub 2005 Sep 11. PMID: 16155582.

      Yeaman C, Grindstaff KK, Wright JR, Nelson WJ. Sec6/8 complexes on trans-Golgi network and plasma membrane regulate late stages of exocytosis in mammalian cells. J Cell Biol. 2001 Nov 12;155(4):593-604. doi: 10.1083/jcb.200107088. Epub 2001 Nov 5. PMID: 11696560; PMCID: PMC2198873.

      Zhang XM, Ellis S, Sriratana A, Mitchell CA, Rowe T. Sec15 is an effector for the Rab11 GTPase in mammalian cells. J Biol Chem. 2004 Oct 8;279(41):43027-34. doi: 10.1074/jbc.M402264200. Epub 2004 Jul 29. PMID: 15292201.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors present a potentially useful model involving Ca2+ signaling in inflammasome activation. As it stands, it was felt that the data were not sufficient to support the model and the claims of the study are inadequately presented.

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript proposes a complex and unclear model involving Ca2+ signaling in inflammasome activation. The experimental approaches used to study calcium dynamics are problematic, and the results shown are of inadequate quality. The major claims of this manuscript are not adequately substantiated.

      Major concerns:

      (1) The analysis of lysosomal Ca2+ release is carried out after many hours of treatment. Such evidence cannot support the claim that PA activates Ca2+ efflux from lysosomes, and even if this phenomenon were robust, it is doubtful that such kinetics are meaningful for the regulation of inflammasome activation. Furthermore, the evidence for lysosomal Ca2+ release is indirect and relies on a convoluted process that doesn't make any conceptual sense to me. In addition to these major shortcomings, the indirect evidence of perilysosomal Ca2+ elevation is also of very poor quality and, from the standpoint of my expertise in calcium signaling, the data are not credible. The use of GCaMP3-ML1 transiently transfected into BMDMs is highly problematic. The efficiency of transfection in BMDMs is always extremely low, and overexpression of the sensor in a few rare cells can lead to erroneous observations. The overexpression also results in gross mislocalization of such membrane-bound sensors. The accumulation of GCaMP3-ML1 in the ER of these cells would prevent any credible measurements of perilysosomal Ca2+ signals. A meaningful investigation of this process in primary macrophages requires the generation of a mouse line wherein the sensor is expressed at low levels in myeloid cells and shown to be localized almost exclusively in the lysosomal membrane. The mechanistic framework built around these major conceptual and technical flaws is not especially meaningful, and since these are foundational results, I cannot take the main claims of this study seriously.

      Ans) We agree with the reviewer’s concern that transfection efficiency could be low in BMDMs and that GCaMP3-ML1 could be mislocalized. However, in our experiments, transfection of BMDMs with test plasmids resulted in good expression of the test proteins. Below, we present data showing good transfection efficiency in BMDMs, although a different plasmid was used.

      Author response image 1.

      (2) The cytosolic Ca2+ imaging shown in Figure 1C doesn't make any sense. It looks like a snapshot of basal Ca2+ many hours after PA treatment, whereas calcium elevations are highly dynamic. Snapshot measurements are not helpful, and analysis of Ca2+ dynamics requires recording over a certain timespan. Unfortunately, this technical approach has been used throughout the manuscript. Also, BAPTA-AM abrogates IL-1b secretion because IL-1b transcription is Ca2+ dependent; the result shown in Figure 1D does not shed light on anything to do with inflammasome activation, and it is misleading to suggest that.

      Ans) We agree with the reviewer’s concern that snapshot measurements could lead to false conclusions. We did not trace cytosolic Ca2+ after treatment with LPS + PA; however, we traced lysosomal and ER Ca2+ for more than 15 min, as presented in Figure 4B. We also agree that BAPTA-AM might affect transcription of pro-IL-1β. We therefore performed immunoblot analysis after treatment with LPS+PA in the presence of BAPTA-AM. The pro-IL-1β protein band was not affected by BAPTA-AM treatment, suggesting that BAPTA-AM does not affect transcription or translation of pro-IL-1β. These data have been added to Figure 1D, as suggested.

      (3) Trpm2-/- macrophages are known to be hyporesponsive to inflammatory stimuli - the reduced secretion of IL-1b by these macrophages is not novel. From a mechanistic perspective, this study does not add much to that observation and the proposed role of TRPM2 as a lysosomal Ca2+ release channel is not substantiated by good quality Ca2+ imaging data (see point 3 above). Furthermore, the study assumes that TRPM2 is a lysosomal ion channel. One paper reported TRPM2 in the lysosomes but this is a controversial claim, with no replication or further development in the last 14 years. This core assumption can be highly misleading to readers unfamiliar with TRPM2 biology and it is necessary to present credible evidence that TRPM2 is functional in the lysosomal membrane of macrophages. Ideally, this line of investigation should rest on robust demonstration of TRPM2 currents in patch-clamp electrophysiology of lysosomes. If this is not technically feasible for the authors, they should at least investigate TRPM2 localization on lysosomal membranes of macrophages.

      Ans) We agree with the reviewer’s comment regarding TRPM2. However, we have shown that TRPM2 current was not activated in the plasma membrane of BMDMs after treatment with LPS+PA. We also agree that reduced inflammatory cytokine release from TRPM2 KO cells, and reduced inflammasome responses of TRPM2 KO macrophages to ROS or nanoparticles, have been reported; however, the role of TRPM2 in metabolic inflammation or in inflammasome activation by lipid stimulators has not been shown, as discussed in the new lines 9-10 from the bottom of page 18. Regarding the role of lysosomal TRPM2 in inflammation, we have shown that bafilomycin A1 treatment abrogated the increase in cytosolic Ca2+ induced by LPS+PA (Figure 3-figure supplement 1D), supporting a role of lysosomes and lysosomal Ca2+ in inflammasome activation by LPS+PA.

      We agree with the reviewer’s comment that TRPM2 expression on lysosomes needs to be tested. We performed confocal microscopy after immunofluorescence staining with anti-TRPM2 and anti-LAMP2 antibodies, which showed that a portion of TRPM2 colocalized with LAMP2. This result, which supports TRPM2 expression on macrophage lysosomes, has been incorporated as Figure 2-figure supplement 1A.

      (4) Apigenin and Quercetin are highly non-specific, and their effects cannot be attributed to CD38 inhibition alone. Such conclusions need strong loss-of-function studies using genetic knockouts of CD38, or at least siRNA knockdown. Importantly, if indeed TRPM2 is being activated downstream of CD38, this should be easily evident in whole-cell patch clamp electrophysiology. TRPM2 currents can be resolved using this technique, and the authors have Trpm2-/- cells for proper controls. The authors attempted these experiments, but the results are of very poor quality. If the TRPM2 current is being activated through ADPR generated by CD38 (in response to PA stimulation), then it is very odd that the authors need to include 200 uM cADPR to see TRPM2 current (Fig. 3A). Oddly, even these data cast great doubt on the technical quality of the electrophysiology experiments. Even with such high concentrations of cADPR, the TRPM2 current is tiny and Trpm2-/- controls are missing. The current-voltage relationship is not shown, and I feel that the results are merely reporting leak currents seen in measurements with substandard seals. Also, 20 uM ACA is not a selective inhibitor of TRPM2; relying on ACA as the conclusive diagnostic is problematic.

      Ans) We agree with the reviewer’s comment that the effects of apigenin and quercetin could be due to mechanisms other than inhibition of CD38-mediated inflammasome activation. Indeed, that is why we used TRPM2 KO mice and cells. The small TRPM2 current observed after treatment with high concentrations of cADPR might suggest a minor role of plasma-membrane TRPM2 in macrophages. Regarding the concern about ACA, we added data showing that ACA inhibits IL-1β release in response to LPS+PA as a new Figure 3-figure supplement 1A.

      (5) TRPM2 is expressed in many different cell lines. The broad metabolic differences observed by the authors in the Trpm2-/- mice cannot be attributed to macrophage-mediated inflammation. Such a conclusion requires the study of mice wherein Trpm2 is deleted selectively in macrophages or at least in the cells of the myeloid lineage.

      Ans) We agree with the reviewer’s comment that TRPM2 in cells other than macrophages might have affected the results. We therefore stimulated TRPM2-KO primary peritoneal macrophages with LPS+PA in vitro. IL-1β release from TRPM2-KO macrophages in response to in vitro treatment with LPS+PA was significantly lower than that from wild-type macrophages (Figure 2C & D), showing a role of macrophage TRPM2 in inflammasome activation by LPS+PA that could be independent of TRPM2 in tissues or cells other than macrophages.

      (6) The ER-Lysosome Ca2+ refilling experiments rely on transient transfection of organelle-targeted sensors into BMDMs. See point #1 to understand why I find this approach to be highly problematic. Furthermore, the data procured are also not convincing and lack critical controls (localization of sensors has not been demonstrated and their response to acute mobilization of Ca2+ has not been shown to inspire any confidence in these results).

      Ans) We agree with the reviewer’s comment that transfection of an ER-targeted Ca2+ sensor could have artifactual effects. However, we studied ER-lysosome Ca2+ refilling using not only GEM-CEPIAer but also D1ER, a FRET-based ER Ca2+ sensor whose readout depends on close-range molecular interaction. Thus, we believe that the changes in ER Ca2+ after treatment with LPS+PA are not due to an artifact. Multiple contacts between VAPA and ORP1L (Figure 4E) also support ER-lysosome contact, likely facilitating ER-lysosome Ca2+ flux.

      (7) The authors claim that SOCE is coupled to K+ efflux. However, there is no credible evidence that SOCE is activated in PA-stimulated macrophages. The data shown in Fig 4 supp 1 do not investigate SOCE in a reliable manner; the conclusion is again based on snapshot measurements and crude non-selective inhibitors. The correct way to evaluate SOCE is to record cytosolic Ca2+ elevations over a period of time in the absence and presence of extracellular Ca2+. However, even such recordings can be unreliable since the phenomenon is being investigated hours after PA stimulation. So, the only definitive way to demonstrate that Orai channels are indeed active during this process is through patch clamp electrophysiology of PA-stimulated cells.

      Ans) We agree with the reviewer’s comment that the definitive proof of SOCE activation is Orai channel activity demonstrated by electrophysiology. However, we have shown STIM1 aggregation colocalized with Orai1, which is another strong line of evidence for SOCE channel activation (Vaca L. Cell Calcium 47:199, 2010). This paper, showing the role of STIM1 aggregation in SOCE activation, has been cited in the text (line 4 from the bottom of page 10) and References.

      Reviewer #2 (Public Review):

      In this manuscript by Kang et al., the authors investigated the mechanisms of K+-efflux-coupled SOCE in NLRP3 inflammasome activation by LP (LPS+PA), and identified an essential role of TRPM2-mediated lysosomal Ca2+ release and subsequent IP3R-mediated ER Ca2+ release and store depletion in the process. K+ efflux is shown to be mediated by a Ca2+-activated K+ channel (KCa3.1). LP-induced cytosolic Ca2+ elevation also induced a delayed activation of ASK1 and JNK, leading to ASC oligomerization and NLRP3 inflammasome activation. Overall, this is an interesting and comprehensive study that has identified several novel molecular players in metabolic inflammation. The manuscript would benefit if the following concerns were addressed:

      (1) The expression of TRPM2 in the lysosomes of macrophages needs to be more definitively established. For instance, the cADPR-induced TRPM2 currents should be abolished in the TRPM2 KO macrophages. Can you show the lysosomal expression of TRPM2, either with an antibody if available or with a fluorescently-tagged TRPM2 overexpression construct?

      Ans) We agree with the reviewer’s comment that TRPM2 expression on lysosomes needs to be tested. We performed confocal microscopy after immunofluorescent staining with anti-TRPM2 and anti-LAMP2 antibodies, which showed that a portion of TRPM2 colocalized with LAMP2. This result has been incorporated as Figure 2-figure supplement 1A.

      (2) Can you use your TRPM2 inhibitor ACA to pharmacologically phenocopy some results from the TRPM2 knockout, e.g., regarding [Ca2+]ER, [Ca2+]LY and [Ca2+]i?

      Ans) We agree with the reviewer’s comment that the effect of ACA on other experimental readouts should be shown. We did not study the effect of ACA on Ca2+ flux; however, we observed that ACA inhibited IL-1β release in response to LPS+PA. These data have been incorporated as Figure 3-figure supplement 1A.

      Author response image 2.

      (3) In Fig. S4A, bathing the cells in zero Ca2+ for three hours might not be ideal. Can you use a SOCE inhibitor, e.g, YM-58483, to make the point?

      Ans) We agree with the reviewer’s comment that a SOCE inhibitor experiment is needed in addition to the zero-Ca2+ experiment. In fact, we had already used two SOCE inhibitors, 2-APB and BTP2 (Figure 4-figure supplement 1B-D). In particular, the BTP2 experiment could exclude a possible contribution of inhibited ER Ca2+ release, which might occur when 2-APB is used.

      (4) In Fig. 1A, you need a positive control, e.g., ionomycin, to show that the GPN response was selectively reduced upon LP treatment.

      Ans) We did not use ionomycin as a control in this study. In our previous studies using other agents that induce lysosomal Ca2+ efflux, we observed lysosomal Ca2+ efflux with an intact subsequent ionomycin response. Although ionomycin was not included in the current study, we are confident that the ionomycin response would be preserved.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      See Public Review.

      Reviewer #2 (Recommendations For The Authors):

      (5) In Fig. 4B, the red label should read "BAPTA-1 Dextran", but not "GAPTA-1 Dextran".

      (6) Writing should be improved in many sections.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to the Referee Comments

      We would like to express our appreciation to the editor and the reviewers for their thoughtful comments and constructive suggestions on the manuscript. We agree with most of the comments and have carefully revised the manuscript accordingly. The revisions are highlighted in red font in the revised manuscript. Below are point-by-point responses to the referees’ comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      Microglia are increasingly recognized as playing an important role in shaping the synaptic circuit and regulating neural dynamics in response to changes in their surrounding environment and in brain states. While numerous studies have suggested that microglia contribute to sleep regulation and are modulated by sleep, there has been little direct evidence that the morphological dynamics of microglia are modulated by the sleep/wake cycle. In this work, Gu et al. applied a recently developed miniature two-photon microscope in conjunction with EEG and EMG recording to monitor microglia surveillance in freely-moving mice over extended periods of time. They found that microglia surveillance depends on the brain state in the sleep/wake cycle (wake, non-REM, or REM sleep). Furthermore, they subjected the mouse to acute sleep deprivation, and found that microglia gradually assume an active state in response. Finally, they showed that the state-dependent morphological changes depend on norepinephrine (NE), as chemically ablating noradrenergic inputs from locus coeruleus abolished such changes; this is in agreement with previous publications. The authors also showed that the effect of NE is partially mediated by β2-adrenergic receptors, as shown with β2-adrenergic receptor knock-out mice. Overall, this study is a technical tour de force, and its data add valuable direct evidence to the ongoing investigations of microglial morphological dynamics and its relationship with sleep. However, there are a number of details that need to be clarified, and some conclusions need to be corroborated by more control experiments or more rigorous statistical analysis. Specifically:

      1. The number of branch points per microglia shown here (e.g., Fig. 2g) is much lower than the values of branch points in the literature, e.g., Liu T et al., Neurobiol. Stress 15: 100342, 2021 (mouse dmPFC, IHC); Liu YU et al., Nat. Neurosci. 22: 1771-81, 2019 (mouse S1, in vivo 2P imaging). The authors need to discuss the possible source of such discrepancy.

      Thank you for raising this important point. Two reasons may account for this difference. First, the software packages define branch points differently. Liu YU et al. used Sholl analysis in ImageJ to quantify microglial branch points; Sholl analysis defines the number of branch points as the number of crossings between branches and concentric circles of increasing radii. We reconstructed microglial morphology using Imaris, which defines branch points as bifurcation points, so the number of bifurcations calculated represents the number of microglial branch points. Second, this and previous studies found that microglia have more branch points under anesthesia. The morphological characteristics of microglia in head-fixed mice under anesthesia were reported by Liu T et al., and the microglial reconstructions presented by those authors are indeed more complex than ours. In short, we have been paying attention to this issue, and the main reasons for the difference may lie in the definition of branch points, the analysis methods and the related choice of thresholds. True differences in brain states and the heterogeneity of microglia across brain regions may also contribute to the apparent discrepancy.
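
      The difference between the two branch-point definitions can be illustrated with a toy reconstruction. The sketch below is a simplified model, not the ImageJ Sholl plugin or the Imaris algorithm: a cell is a hypothetical tree of 2D nodes (`parent` maps each node to its parent), bifurcation-style branch points are counted as nodes with two or more children, and Sholl-style values are counted as segments crossing concentric circles around the soma. The function names and the toy tree are illustrative assumptions.

```python
import math


def bifurcations(parent):
    """Count nodes with two or more children (bifurcation-style branch points)."""
    children = {}
    for node, par in parent.items():
        if par is not None:
            children[par] = children.get(par, 0) + 1
    return sum(1 for n in children.values() if n >= 2)


def sholl_intersections(coords, parent, soma, radii):
    """Per-radius count of straight segments whose endpoints straddle each circle.

    Simplification: a segment that enters and leaves the same circle
    (both endpoints outside it) is not counted by this sign test.
    """
    def dist(p):
        return math.hypot(p[0] - soma[0], p[1] - soma[1])

    counts = []
    for r in radii:
        n = 0
        for node, par in parent.items():
            if par is None:
                continue  # the soma has no incoming segment
            d1, d2 = dist(coords[node]), dist(coords[par])
            if (d1 - r) * (d2 - r) < 0:
                n += 1
        counts.append(n)
    return counts


# Hypothetical toy cell: one primary process that branches twice.
toy_parent = {"soma": None, "n1": "soma", "n2": "n1", "n3": "n1", "n4": "n2", "n5": "n2"}
toy_coords = {"soma": (0.0, 0.0), "n1": (10.0, 0.0), "n2": (20.0, 5.0),
              "n3": (20.0, -5.0), "n4": (30.0, 10.0), "n5": (30.0, 0.0)}

print(bifurcations(toy_parent))  # 2
print(sholl_intersections(toy_coords, toy_parent, (0.0, 0.0), [5.0, 15.0, 25.0]))  # [1, 2, 2]
```

      On this toy tree, summing the Sholl counts over radii gives 5 and their maximum is 2, while the bifurcation count is 2; the Sholl values also change with the radius step. This is one way the definitional difference alone could shift reported branch-point numbers between studies.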

      2. Microglia process end-point speed (Fig. 2h, o): here the authors show that the speed is highest in the wake state and lowest in NREM, which agrees with the measurement on microglia motility during wakefulness vs NREM in a recent publication (Hristovska I et al., Nat. Commun. 13: 6273, 2022). However, Hristovska et al. also reported lower microglia complexity in NREM vs wake state, which seems to be the opposite of the finding in this paper. The authors need to discuss the possible source of such differences.

      This is also an important point. Hristovska et al. reported the morphodynamic characteristics of microglia during wakefulness and NREM sleep. It is worth noting that sleep in their experiments was unnatural because of head fixation and body restraint, and NREM sleep durations (sleep stability) were quite different from those of NREM sleep analyzed under natural conditions. Hristovska et al. also discuss the limitations of this approach: “Even though sleep episodes were, as anticipated, shorter than those observed in freely moving animals, changes in neuronal activity characteristic of NREM sleep were monitored by EEG recordings, and changes in morphodynamics were observed during single episodes. Several episodes of REM sleep were detected, but they were too short and rare to be analyzed reliably.” The unnatural sleep state could increase microarousals and ultimately alter sleep architecture, which may be the main reason why microglial behavior in their study differed from that observed during natural sleep in ours. We have discussed this in the revised manuscript (lines 292-298).

      3. Fig. 3: the authors used single-plane images to analyze the morphological changes over 3 or 6 hours of SD, which raises the concern that the processes imaged at the baseline may drift out of focus, leading to the dramatic reduction in process lengths, surveillance area, and number of branch points. In fact, a previous study (Bellesi M et al., J. Neurosci. 37(21): 5263-73, 2017) shows that after 8 h SD, the number of microglia process endpoints per cell and the summed process length per cell do not change significantly (although there is a trend to decline). The authors may confirm their findings by either 3D imaging in vivo, or 3D imaging in fixed tissue.

      Three lines of evidence indicate that the microglial morphology changes in Fig. 3 are due to SD rather than to variations in the focal plane. First, our single-plane images were quite stable over 3 or 6 hours of SD, although occasional reversible drifts could occur due to sudden movements. Second, per your suggestion, we performed additional 3D imaging experiments and analyses to monitor microglial dynamics during sleep deprivation. The new results, shown in revised Fig. S3C-D, show that microglial branch length and the number of branch points were significantly reduced after SD, in agreement with the single-plane imaging results. Third, we detected no significant difference in microglial branching characteristics during 6 h of sleep deprivation in β2AR KO mice (Fig. S4), which indirectly confirms that single-plane imaging is stable enough to detect true changes in branching during SD.

      4. Fig. 4b: the EEG and EMG signals look significantly different from the example given in Fig. 2a. In particular, the EMG signal appears completely flat except for the first segment of wake state; the EEG power spectrum for REM appears dark; and the wake state corresponds to stronger low frequency components (below ~ 4 Hz) compared to NREM, which is the opposite of Fig. 2a. This raises the concern whether the classification of sleep stage is correct here.

      Thank you for the insightful comment. On carefully re-examining the behavioral video for Fig. 4b, we found occasional microarousal events, indicated by slow head rotation during NREM sleep, while the accompanying EMG signal remained completely flat, which is atypical of the sleep-wake cycle. Because these microarousals had not been excluded from sleep, this dataset was unrepresentative and inconsistent with the example in Fig. 2a. In the revised manuscript, we replaced it with more representative data in which the EEG and EMG clearly and consistently distinguish the different brain states. Please see revised Fig. 2a, page 34; revised Fig. 4b, page 37.

      1. Fig. 4 NE dynamics.
      • How long is a single continuous imaging session for NE?
      • When monitoring microglia surveillance, the authors were able to identify wake or NREM states longer than 15 min, and REM states longer than 5 min. Here the authors selected wake/NREM states longer than 1 min and REM states longer than 30 s. What makes such a big difference in the time duration selected for analysis?
      • Also, the definition of F0 is a bit unclear. Is the same F0 used throughout the entire imaging session, or is it defined with a moving window?

      A single continuous NE imaging session usually lasted about 1 hour. Subsequent analysis was performed on imaging data from recordings that included wake, NREM sleep, and REM sleep. Because microglial morphological dynamics (relatively slow) and NE signals (fast) operate on different time scales, we used different time windows for the two analyses in the previous version of the manuscript.

      Per your suggestion, we now apply the same epoch-selection criteria to both the microglial morphology and the NE dynamics analyses: wake and NREM sleep episodes longer than 1 minute, and REM sleep episodes longer than 30 seconds. We updated the Methods and all statistics in the related figures; please see lines 151-154, 481-485, 490-492, and Fig. 2e-g and 2l-n, page 34. The definition of F0 is now given in the Methods section; please see lines 521-522.
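
      The shared selection rule can be sketched as a duration filter over a scored hypnogram. This is a minimal illustration, not the authors' analysis code; the `Episode` class, the state labels, and the representation of the hypnogram as a list of scored episodes are assumptions.

```python
from dataclasses import dataclass

# Minimum durations stated in the response: wake and NREM > 1 min, REM > 30 s.
MIN_DURATION_S = {"wake": 60.0, "nrem": 60.0, "rem": 30.0}


@dataclass
class Episode:
    """One scored brain-state episode (hypothetical representation)."""
    state: str      # "wake", "nrem" or "rem"
    start_s: float  # onset, seconds from start of recording
    end_s: float    # offset, seconds from start of recording

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def select_episodes(episodes):
    """Keep episodes strictly longer than the state-specific minimum duration."""
    return [ep for ep in episodes if ep.duration_s > MIN_DURATION_S[ep.state]]


if __name__ == "__main__":
    hypnogram = [
        Episode("wake", 0, 90),     # kept: 90 s > 60 s
        Episode("nrem", 90, 140),   # dropped: 50 s
        Episode("rem", 140, 185),   # kept: 45 s > 30 s
        Episode("wake", 185, 200),  # dropped: 15 s
    ]
    for ep in select_episodes(hypnogram):
        print(ep.state, ep.duration_s)
```

      Applying one filter to both the imaging and NE recordings is what makes the two analyses directly comparable.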

      1. Fig. 5b: how does the microglia morphology in LC axon ablation mice compare with wild type mice under the wake state? The text mentioned "more contracted" morphology but didn't give any quantification. Also, the morphology of microglia in the wake state (Fig. 5b) appears very different from that shown in Fig. S3C1 (baseline). What is the reason?

      The morphology of microglia is indeed heterogeneous and variable, being affected by factors including brain state, brain region, microenvironmental changes, and animal-to-animal differences. We did not compare microglial morphology between LC axon ablation mice and wild-type mice and, in view of this, we removed the description of “more contracted morphology” from the main text. It should also be noted that, because we primarily focused on how individual microglia change across states over time, using each cell as its own control, we minimized possible effects of heterogeneity in microglial morphology on our conclusions.

      1. The relationship between NE level and microglia dynamics. Fig. 4C shows that the extracellular NE level is the highest in the wake state and the lowest in REM. Previous studies (Liu YU et al., Nat. Neurosci. 22(11):1771-1781, 2019; Stowell RD et al., Nat. Neurosci. 22(11): 1782-1792, 2019) suggest that high NE tone corresponds to reduced microglia complexity and surveillance. Hence, one would expect microglia process length, branch point number, and area/volume to be higher in REM than in NREM. However, Fig. 2l-n show the opposite. How should we understand this?

      Your point is well taken. On the one hand, our data clearly show that NE is critically involved in brain-state-dependent microglial surveillance, with evidence from ablation of the LC-NE projection and from the β2AR knockout model.

      On the other hand, NE is not the sole determinant, so microglial complexity and surveillance need not map one-to-one onto the NE level.

      In this regard, other potential modulators also fluctuate across the sleep-wake cycle and may participate in regulating microglial surveillance. Previous studies (Liu YU et al., 2019; Stowell RD et al., 2019) have shown that during wakefulness microglia are jointly affected by surrounding neuronal activity and the NE level. LC neurons have been reported to stop firing during REM sleep (Aston-Jones et al., 1981; Rasmussen et al., 1986), whereas inhibitory neurons, such as PV and VIP neurons, become relatively active (Brécier et al., 2022). The ATP level in the basal forebrain has been shown to be higher in REM than in NREM sleep (Peng et al., 2023). In addition, our own preliminary result (Author response image 1) showed a higher adenosine level in REM than in NREM sleep in the somatosensory cortex. Finally, we found that β2AR knockout did not completely abolish microglial responses to sleep-state switching and SD stress.

      In brief, microglia are highly sensitive to diverse changes in their surroundings, and multiple modulators may contribute to microglial dynamics across sleep states. This may underlie the difference in microglial complexity between REM and NREM sleep. Future investigations are warranted to delineate the signal-integrative role of microglia in physiology and under stress. We have discussed these points in the revised manuscript. Please see line 343-354.

      Author response image 1.

      Extracellular adenosine levels in somatosensory cortex in different brain states. AAV2/9-hSyn-GRABAdo1.0 (Peng W. et al., Science. 2020) was injected into the somatosensory cortex (A/P, -1 mm; M/L, +2 mm; D/V, -0.3 mm). Data from the same recording are connected by lines. n = 9 from 3 mice.

      Reviewer #2 (Public Review):

      The manuscript describes an approach to monitor microglial structural dynamics and correlate them with ongoing changes in brain state during sleep-wake cycles. The main novelty here is the use of miniaturized 2p microscopy, which allows microglial surveillance to be tracked over periods of hours while the mice behave freely. Accordingly, this experimental setup permits exploration of long-lasting changes in microglia in a more naturalistic environment, which could not previously be identified. The findings could provide key advances to the research of microglia during natural sleep and wakefulness, as opposed to anesthesia. The main findings of the paper are that microglia increase their process motility and surveillance during REM and NREM sleep as compared to the awake state. The authors further show that sleep deprivation induces opposite changes in microglia dynamics, limiting their surveillance and size. The authors then demonstrate a potential causal role for norepinephrine secretion from the locus coeruleus (LC), mediated by beta 2 adrenergic receptors (b2AR) on microglia. However, there are several methodological and experimental concerns which should be addressed.

      The major comments are summarized below:

      1. The main technological advantage of the 2p miniaturized microscope is the ability to track single cells over sleep cycles. A main question that is unclear from the analysis and the way the data is presented is: are the structural changes in microglia reversible? Meaning, could the authors provide evidence that the same cell can dynamically change in sleep state and then return to similar size in wakefulness? The same question arises again with the data which is presented for anesthesia, is this change reversible?

      As revealed by long-term mTPM imaging in freely behaving mice, the brain-state-dependent morphological changes in microglia were reproducible and reversible. Author response image 2 shows that microglia displayed reversible dynamic changes during multiple rounds of sleep-wake transitions. Author response image 3 shows that microglial dynamics induced by anesthesia were also reversible.

      Author response image 2.

      Long-term tracking of microglial process area in different brain states. Data from 8 cells were analyzed. A total of 31 time points were selected from the in vivo imaging data to characterize the morphological changes of microglia over a continuous 7-hour period.

      Author response image 3.

      Reversible changes in microglial process length, area, and number of branch points under anesthesia. Wake group: 30-min accommodation to a new environment; Isoflurane group: 1.5% in air applied at a flow rate of 0.4 L/min for 30 min; Recovery group: 30 min after recovery from anesthesia. n = 9 cells from 3 mice for each group.

      1. The binary comparison between brain states is misleading; shouldn't the changes in structural dynamics be compared to the baseline at state onset? The authors' Methods describe analysis of the last 5 minutes of each sleep/wake state. However, these transitions are directional; for instance, REM usually follows NREM, so the description of a decrease in length during REM sleep could be inaccurate.

      As you noted, microglial morphological dynamics are relatively slow, so we analyzed the last part of each state (30 s in the revised manuscript) rather than the state onset, allowing time for the microglial response to the inter-state transition to stabilize.

      Furthermore, to exclude a directional effect of state transitions, we compared microglial dynamics between two groups of NREM episodes followed by different states: group 1 (NREM to REM) vs. group 2 (NREM to wake). There was no difference in microglial length, area, or number of branch points between the two NREM groups (Author response image 4), indicating that the last 30 s of each NREM episode was not affected by the subsequent state and that a binary comparison is reasonable.

      Author response image 4.

      Microglial morphological length, area change, and number of branch points of the last 30s of NREM sleep followed by REM or Wake. n = 9 cells from 3 mice for each group.
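
      The control analysis above (taking the final 30 s of each NREM episode and grouping episodes by the state that follows) can be sketched as below. This is an illustrative sketch, not the authors' code; the hypnogram format, a temporally ordered list of `(state, start_s, end_s)` tuples, and the function names are hypothetical.

```python
def last_window(start_s, end_s, window_s=30.0):
    """Return the (t0, t1) interval covering the final `window_s` seconds of an episode."""
    if end_s - start_s < window_s:
        raise ValueError("episode is shorter than the analysis window")
    return (end_s - window_s, end_s)


def nrem_windows_by_next_state(hypnogram, window_s=30.0):
    """Group the final window of each NREM episode by the state that follows it.

    `hypnogram` is a temporally ordered list of (state, start_s, end_s) tuples.
    Returns {"rem": [...], "wake": [...]} of (t0, t1) intervals.
    """
    groups = {"rem": [], "wake": []}
    # Pair each episode with its successor; the last episode has no successor.
    for (state, start, end), (next_state, _, _) in zip(hypnogram, hypnogram[1:]):
        if state == "nrem" and next_state in groups:
            groups[next_state].append(last_window(start, end, window_s))
    return groups


if __name__ == "__main__":
    scored = [("wake", 0, 120), ("nrem", 120, 400), ("rem", 400, 460),
              ("wake", 460, 600), ("nrem", 600, 900), ("wake", 900, 1000)]
    print(nrem_windows_by_next_state(scored))
```

      Morphology measured inside the two groups of windows can then be compared directly; no difference between them is what supports treating the state comparison as binary.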

      1. Sleep deprivation- again, it is unclear whether these structural changes are reversible. This point is straightforward to address using this methodology by measuring sleep following SD. In addition, the authors chose a method to induce sleep deprivation that is rather harsh. It is unclear if the effect shown is the result of stress or perhaps an excess of motor activity.

      We adopted forced exercise because it is a commonly used method of sleep deprivation (Pandi-Perumal et al., 2007; Nollet M et al., 2020), although it carries the potential confound of excess motor activity.

      In light of your comments, we now present new data showing that sleep duration, mostly NREM sleep, increased compensatorily (ZT9-10) after the 6-hour sleep deprivation (ZT2-8) (revised Fig. S3B). This shows that sleep deprivation indeed increased sleep pressure in the mice. As sleep pressure eased during recovery sleep, the morphological changes in microglia reversed over a timescale of several hours (revised Fig. S3E-J).

      1. The authors perform measurements of norepinephrine with a recently developed GRAB sensor. These experiments are performed to causally link microglia surveillance during sleep to norepinephrine secretion. They perform 2p imaging and collect data points which are single neurons, and it is unclear why the normalization and analysis is performed for bulk fluorescence similar to data obtained with photometry.

      We did not perform single-neuron analysis for two reasons. First, our experimental conditions, e.g., the expression of the NE indicator and the control of imaging laser intensity, did not yield sufficient signal-to-noise to clearly discriminate individual neurons with two-photon imaging. Second, NE signal may play a modulatory role, and fluorescence changes appeared to be global, rather than local or cell-specific. Therefore, we analyzed fluorescence changes in different brain states over the whole field-of-view in Fig. 4, rather than at the subregional or single-cell level.
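
      A whole-field-of-view analysis of this kind can be sketched as: average every frame over all pixels, convert the trace to ΔF/F0, then average per brain state. This is a minimal sketch under stated assumptions. The manuscript's own F0 definition is given in its Methods and is not reproduced here; the default F0 below (the session median) is a hypothetical placeholder, and frames are represented as flat lists of pixel intensities.

```python
from statistics import mean, median


def fov_trace(frames):
    """Whole-field-of-view signal: mean pixel value of each frame.
    `frames` is a list of frames, each a flat list of pixel intensities."""
    return [mean(frame) for frame in frames]


def delta_f_over_f(trace, f0=None):
    """(F - F0) / F0 for every frame. If no F0 is given, the session median is
    used (an illustrative choice, not the definition used in the manuscript)."""
    if f0 is None:
        f0 = median(trace)
    return [(f - f0) / f0 for f in trace]


def mean_by_state(values, labels):
    """Average a per-frame quantity over all frames that share a brain-state label."""
    return {state: mean(v for v, s in zip(values, labels) if s == state)
            for state in set(labels)}


if __name__ == "__main__":
    frames = [[1, 1], [2, 2], [3, 3]]
    dff = delta_f_over_f(fov_trace(frames))
    print(mean_by_state(dff, ["rem", "nrem", "wake"]))
```

      Because every pixel contributes to each frame's value, no cell segmentation is needed, which matches the global (rather than cell-specific) character of the NE signal described above.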

      1. The experiments involving b2AR KO mice are difficult to interpret and do not provide substantial mechanistic insight. Since b2AR are expressed throughout numerous cell types in the brain and in the periphery, it is entirely not clear whether the effects on microglia dynamics are direct. The conclusion and the statement regarding the expression of b2AR in microglia is not supported by the references the authors present, which simply demonstrate the existence and function of b2AR in microglia. In addition, these mice show significant changes in sleep pattern and increased REM sleep. This could account for reasons for the changes in microglia structure rather than the interpretation that these are direct effects.

      To summarize, the main conclusions of the paper require further support with analysis of existing data and experimental validation.

      Previous studies have revealed that norepinephrine (NE) modulates microglial dynamics through the β2AR pathway (Stowell RD et al., 2019; Liu YU et al., 2019). Using in vivo two-photon imaging, Stowell et al. and Liu et al. demonstrated that microglial dynamics differ between awake and anesthetized mice and highlighted the roles of NE and β2AR in these states (Gyoneva S et al., 2013; Stowell RD et al., 2019; Liu YU et al., 2019). To evaluate the direct effect of β2AR on microglial dynamics, Stowell et al. administered the β2AR agonist clenbuterol to anesthetized mice and found that it decreased the motility, arbor complexity, and process coverage of microglia in the parenchyma (Stowell RD et al., 2019). Conversely, inhibiting β2AR with the antagonist ICI-118,551 in awake mice recapitulated the effects of anesthesia, enhancing microglial arborization and surveillance (Stowell RD et al., 2019). In addition, microglia have been shown to express β2AR at higher levels than any other cell type in the brain (Zhang et al., 2014).

      Our current work provides new evidence supporting the involvement of the LC-NE-β2AR axis in modulating microglial dynamics, both during the natural sleep-wake cycle and under SD stress. Although we are aware that the pan-tissue β2AR knockout model precludes us from pinpointing the role of microglial β2AR specifically, the present and previous data support the conclusion that β2-adrenergic receptor signaling plays a significant role in sleep-state-dependent microglial surveillance.

      We have discussed this in the revised manuscript. Please see line 324-354. As you suggested, we added references to support the statement regarding the expression of β2AR in microglia (please see line 333).

      Recommendations for the authors: please note that you control which revisions, if any, to undertake

      Reviewer #1 (Recommendations For The Authors):

      Some technical details need to be clarified. Also, please double-check for typos.

      1. In vivo imaging preparation: how long is the recovery time between window/EEG implantation surgery and imaging/recording?

      Imaging data were collected one month after the surgery. We have added descriptions to the methods section of the revised manuscript. Please see line 419.

      1. Statistical analysis: the authors used t-test or ANOVA without first checking whether the data pass the normality test. If the data does not follow a normal distribution, nonparametric tests would be more appropriate.

      Per your suggestion, we tested statistical significance using parametric tests (ANOVA) when the data passed the normality test, and non-parametric tests (Friedman) otherwise. Please see line 533-535.
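
      The response does not state which normality test or statistics software was used, so the sketch below covers only the non-parametric branch. It assumes the repeated-measures layout implied by the state comparisons: each cell measured once in each of k brain states. It computes the Friedman statistic Q = 12/(n k (k+1)) Σ R_j² − 3n(k+1), where R_j is the rank sum of state j, using average ranks for ties and omitting the tie correction. For k = 3 states (wake, NREM, REM) the chi-square approximation with 2 degrees of freedom has the closed form p = exp(−Q/2). This is a pure-Python illustration, not the authors' pipeline.

```python
import math


def average_ranks(values):
    """Ranks 1..k within one subject; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values (ties are contiguous after sorting).
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for idx in order[i:j + 1]:
            ranks[idx] = shared
        i = j + 1
    return ranks


def friedman_q(data):
    """Friedman statistic (no tie correction) for data[n][k]:
    n subjects (e.g. cells), each measured under the same k conditions (e.g. states)."""
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def chi2_sf_df2(q):
    """Upper-tail chi-square probability with 2 degrees of freedom (k = 3 conditions)."""
    return math.exp(-q / 2.0)
```

      For example, four cells whose values rank identically across the three states give Q = 8 and p = exp(−4) ≈ 0.018. For small n the chi-square approximation is rough, and exact tables or a library routine would be preferable.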

      1. Fig. 1b needs a minor change. In the figure, the EMG electrodes appear to be connected to the brain as well.

      We have corrected this oversight. Thank you.

      1. Fig. 1c: it would be helpful to give examples of raw EEG and EMG traces for REM and NREM separately.

      Raw traces are now shown as suggested. Please see Fig. 1c, page 32.

      1. Fig. 1h: is each data point one microglia or one end-point?

      In Fig. 1h, each data point represents the average speed of all branches of one microglial cell, not one end-point.

      1. Sleep deprivation starts at 9 am. What time corresponds to Zeitgeber Time 0 (ZT0, the beginning of the light phase)?

      We have now clarified that 9 am corresponds to Zeitgeber time 2 (ZT2). Please see line 196.
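
      Because 9 am corresponds to ZT2, lights-on (ZT0) falls at 7 am. The 6-hour SD window ZT2-8 therefore runs from 9 am to 3 pm, and the ZT9-10 recovery period falls at 4-5 pm. A small conversion helper makes this explicit. The 7 am lights-on constant is inferred from the statement above, and the function names are illustrative.

```python
LIGHTS_ON_HOUR = 7.0  # ZT0; inferred from "9 am corresponds to ZT2"


def clock_to_zt(hour):
    """Convert clock time (decimal hours, 0-24) to Zeitgeber time, wrapping at 24 h."""
    return (hour - LIGHTS_ON_HOUR) % 24.0


def zt_to_clock(zt):
    """Inverse conversion: Zeitgeber time to clock time (decimal hours)."""
    return (zt + LIGHTS_ON_HOUR) % 24.0


if __name__ == "__main__":
    print(clock_to_zt(9), clock_to_zt(15), zt_to_clock(9))
```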

      1. Line 61: the authors referred to Ramon y Cajal's original suggestion that microglia dynamics are coupled to the sleep-wake cycle. However, the cited paper only indicates that Cajal suggested a role of astrocytes in the sleep-wake cycle, not microglia. In addition, there is a typo in the line: there should be a space between "Ramon" and "y" in Cajal's name.

      We have revised the statement and the cited literature to refer specifically to microglial involvement in the sleep-wake cycle, and corrected the typo. Please see line 64-65.

      1. Fig. S3B: As each group has only 3 mice, it is unclear how t-test can yield p < 0.01 or even 0.001.

      We re-checked the original data and confirmed that they are correct. These small p-values likely reflect the small within-group variance of the control group.
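
      To show how three animals per group can still give a very small p-value, the sketch runs a two-sided pooled-variance Student's t-test on two groups of three (df = 4). With df = 4 the t distribution's CDF has a closed form. The numbers are hypothetical and are not the Fig. S3B data, and the response does not state whether a paired or unpaired test was used. Both groups differ in mean by the same amount; only the within-group spread changes.

```python
import math
from statistics import mean, variance


def t_cdf_df4(t):
    """Closed-form CDF of Student's t distribution with 4 degrees of freedom."""
    u = 1.0 + t * t / 4.0
    return 0.5 + 0.375 * (t / math.sqrt(u)) * (1.0 - (t * t) / (12.0 * u))


def pooled_t_test_3v3(a, b):
    """Two-sided Student's t-test (equal variances) for two groups of three; df = 4."""
    assert len(a) == len(b) == 3
    sp2 = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (len(a) + len(b) - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / len(a) + 1 / len(b)))
    p = 2.0 * (1.0 - t_cdf_df4(abs(t)))
    return t, p


if __name__ == "__main__":
    # Hypothetical values: same 3-unit mean difference, different within-group spread.
    print(pooled_t_test_3v3([10.0, 10.1, 9.9], [7.0, 7.1, 6.9]))  # tight groups
    print(pooled_t_test_3v3([10.0, 12.0, 8.0], [7.0, 9.0, 5.0]))  # spread-out groups
```

      With the tight groups, t is about 37 and p falls far below 0.001. With the spread-out groups, p is roughly 0.14, so the low variance alone is what drives the small p-value.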

      1. Line 251-253, "Figure 4h-n" should be "Figure 5h-n"?

      We have revised it. Please see line 265-266.

      1. Fig. 5h: the receptor should be "adrenergic receptor", not "adrenal receptor".

      We changed the term to “adrenergic receptor”. Please see Fig 5h.

      1. Fig. 5g, n: the number of data points is apparently less than the sample size given in the figure legend. Perhaps some data points have exactly the same value so they overlap? The authors may consider plotting identical values with a slight shift so that the number of data points shown matches the actual sample size, to avoid confusion.

      Yes, we have added small horizontal jitter so that overlapping data points can be distinguished. Please see Fig. 5n.
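
      Horizontal jitter of this kind can be sketched as spreading each group's x-positions uniformly around the group center while leaving the plotted values untouched. The response does not say which plotting software or jitter width was used. The width of 0.15, the fixed seed, and the use of Python's `random` module are assumptions.

```python
import random


def jittered_x(center, n, width=0.15, seed=0):
    """Return n x-positions spread uniformly within center ± width/2 so that
    points with identical y-values do not overlap. Only x is jittered."""
    rng = random.Random(seed)  # fixed seed keeps the figure reproducible
    return [center + rng.uniform(-width / 2, width / 2) for _ in range(n)]


if __name__ == "__main__":
    print(jittered_x(1.0, 5))
```

      Jittering only the categorical x-axis preserves the measured values, so the plotted data remain exact while every point in the sample becomes visible.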

      1. There are some typos (e.g., Line 217, "he" should be "the") and some incomplete references (e.g., [13], [22], [34], [35] lack volume and page number, [15] and [39] lack publisher information). Some references have inconsistent formats (e.g., "Journal of Neuroscience" is sometimes abbreviated and sometimes not). Please correct these.

      We have corrected these oversights. Please see references, page 27.

      Reviewer #2 (Recommendations For The Authors):

      Major issues:

      1. Re-analyze the data in a manner that allows one to follow and compare the same cells over different state transitions. This is necessary to evaluate the reversibility of microglia structure. In addition, consider analysis of the change from the beginning to the end of each state.

      As shown in Author response image 2, microglial dynamics were reversible during multiple rounds of sleep-wake transitions.

      1. It would be nice to see the raw data obtained over time, at least for Figure 1, before offline correction of movement to evaluate the imaging quality and level of drift during imaging.

      Thank you for this suggestion. Please see the supplementary video.

      1. It would be helpful to add an analysis of the percent time spent in each state for the 10 hour recordings.

      We have added this analysis. Please see revised Fig. S4C.

      1. In Figure 2 the results are from 15 cells from several animals. How much do the results vary between mice? It will be helpful to show if this varies between different mice by labeling cells from each mouse differently.

      In Author response image 5, data points from each of the seven mice are labeled separately. Data from different animals were intermixed within each brain state, with no clear animal-to-animal difference.

      Author response image 5.

      Quantitative analysis of microglial length based on multi-plane microglial imaging. n = 17 cells from 7 mice for each group. In right panel, each color codes data from the same animal.

      1. SD- please add some quantification for sleep and EEG to show that the manipulation really caused sleep deprivation. To address the confound of forced movement and stress, it might be helpful to add quantification of movement compared to an undisturbed wakefulness.

      We have added related data (revised Fig. S3B), as suggested. Please see line 196-197.

      1. The DSP4 application should also be performed with NE measurements to verify the specificity of the measured NE signal as well as of the DSP4 toxin.

      Following your suggestion, we have added DSP4 data in revised Fig. S4B.

      1. Some suggested refined experiments for the b2AR KO are: (a) a conditional b2AR KO in microglia, as cited in the work; (b) local application of a b2 blocker during SD; (c) imaging of NE dynamics in the b2AR KO animals. If NE dynamics during the natural sleep cycle are perturbed, this would suggest upstream mechanisms rather than direct microglia effects as suggested by the authors.

      We agree that the current study cannot pinpoint a direct effect of microglia harbored β2AR. We have discussed this limitation in the revised manuscript.

      Please see line 324-354.

      Minor:

      1. Typo on page 4 (microcopy instead of microscopy).

      It was corrected. Please see line 87.

      1. Typo page 11- 'and he largest changes in NE' - supposed to be 'the'.

      We have corrected these mistakes. Please see line 228.

      1. Fig. 4: there are several units missing in panel b. The top is Hz, but what does the color bar indicate exactly? 2 what? This applies both to theta/delta and to NE.

      We have modified this figure and its legend for clarity. Please see Fig. 4, page 37.

      2. Bottom of page 12- referring to figure 4 but talking about figure 5.

      The typo was corrected. Please see line 265-266.

      Reference

      1. Aston-Jones G, Bloom FE. Activity of norepinephrine-containing locus coeruleus neurons in behaving rats anticipates fluctuations in the sleep-waking cycle. J Neurosci. 1, 876–886 (1981).

      2. Bellesi M, de Vivo L, Chini M, Gilli F, Tononi G, Cirelli C. Sleep loss promotes astrocytic phagocytosis and microglial activation in mouse cerebral cortex. J Neurosci. 37, 5263–5273 (2017).

      3. Brécier A, Borel M, Urbain N, Gentet LJ. Vigilance and behavioral state-dependent modulation of cortical neuronal activity throughout the sleep/wake cycle. J Neurosci. 42, 4852–66 (2022).

      4. Dworak M, McCarley RW, Kim T, Kalinchuk AV, Basheer R. Sleep and brain energy levels: ATP changes during sleep. J Neurosci. 30, 9007-16 (2010).

      5. Gyoneva S, Traynelis SF. Norepinephrine modulates the motility of resting and activated microglia via different adrenergic receptors. J Biol Chem. 288, 15291–15302 (2013).

      6. Kjaerby C, Andersen M, Hauglund N, Untiet V, Dall C, Sigurdsson B, Ding F, Feng J, Li Y, Weikop P, Hirase H, Nedergaard M. Memory-enhancing properties of sleep depend on the oscillatory amplitude of norepinephrine. Nat Neurosci. 25, 1059–1070 (2022).

      7. Liu T, Lu J, Lukasiewicz K, Pan B, Zuo Y. Stress induces microglia-associated synaptic circuit alterations in the dorsomedial prefrontal cortex. Neurobiol Stress. 15, 100342 (2021).

      8. Liu YU, Ying Y, Li Y, Eyo UB, Chen T, Zheng J, Umpierre AD, Zhu J, Bosco DB, Dong H, Wu LJ. Neuronal network activity controls microglial process surveillance in awake mice via norepinephrine signaling. Nat Neurosci. 22, 1771–1781 (2019).

      9. Nollet M, Wisden W, Franks NP. Sleep deprivation and stress: a reciprocal relationship. Interface Focus. 10, 20190092 (2020).

      10. Pandi-Perumal SR, Cardinali DP, Chrousos GP. Neuroimmunology of Sleep. Springer, New York, NY (2007).

      11. Peng W, Liu X, Ma G, Wu Z, Wang Z, Fei X, Qin M, Wang L, Li Y, Zhang S, Xu M. Adenosine-independent regulation of the sleep-wake cycle by astrocyte activity. Cell Discov. 9, 16 (2023).

      12. Peng W, Wu Z, Song K, Zhang S, Li Y, Xu M. Regulation of sleep homeostasis mediator adenosine by basal forebrain glutamatergic neurons. Science. 369, 6508 (2020).

      13. Rasmussen K, Morilak DA, Jacobs BL. Single unit activity of locus coeruleus neurons in the freely moving cat: I. During naturalistic behaviors and in response to simple and complex stimuli. Brain Res. 371, 324–334 (1986).

      14. Stowell RD, Sipe GO, Dawes RP, Batchelor HN, Lordy KA, Whitelaw BS, Stoessel MB, Bidlack JM, Brown E, Sur M, Majewska AK. Noradrenergic signaling in the wakeful state inhibits microglial surveillance and synaptic plasticity in the mouse visual cortex. Nat Neurosci. 22, 1782-1792 (2019).

      15. Umpierre AD, Bystrom LL, Ying Y, Liu YU, Worrell G, Wu LJ. Microglial calcium signaling is attuned to neuronal activity in awake mice. eLife. 9, e56502 (2020).

      16. Wang Z, Fei X, Liu X, Wang Y, Hu Y, Peng W, Wang YW, Zhang S, Xu M. REM sleep is associated with distinct global cortical dynamics and controlled by occipital cortex. Nat Commun. 13, 6896 (2022).

      17. Zhang Y, Chen K, Sloan SA, Bennett ML, Scholze AR, O’Keeffe S, Phatnani HP, Guarnieri P, Caneda C, Ruderisch N, Deng S, Liddelow SA, Zhang C, Daneman R, Maniatis T, Barres BA, Wu JQ. An RNA-sequencing transcriptome and splicing database of glia, neurons, and vascular cells of the cerebral cortex. J Neurosci. 34, 11929–11947 (2014).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The paper proposes an interesting perspective on the spatio-temporal relationship between FC in fMRI and electrophysiology. The study found that while similar network configurations are found in both modalities, there is a tendency for the networks to spatially converge more commonly at synchronous than asynchronous time points. However, my confidence in the findings and their interpretation is undermined by an apparent lack of justification for the expected outcomes for each of the proposed scenarios, and in the analysis pipeline itself.

      Main Concerns

      (1) Figure 1 makes sense to me conceptually, including the schematics of the trajectories, i.e.

      Scenario 1: Temporally convergent, same trajectories through connectome state space

      Scenario 2: Temporally divergent, different trajectories through connectome state space

      However, based on my understanding I am concerned that these scenarios do not necessarily translate into the schematic CRP plots shown in Figure 2C, or the statements in the main text:

      For Scenario 1: "epochs of cross-modal spatial similarity should occur more frequently at on-diagonal (synchronous) than off-diagonal (asynchronous) entries, resulting in an on-/off-diagonal ratio larger than unity"

      For Scenario 2: "epochs of spatial similarity could occur equally likely at on-diagonal and off-diagonal entries (ratio≈1)"

      Where do the authors get these statements and the schematics in Figure 2C from? Are they based on previous literature, theory, or simulations?

      I am not convinced based on the evidence currently in the paper, that the ratio of off- to on-diagonal entries (and under what assumptions) is a definitive way to discriminate between scenarios 1 and 2.

      For example, what about the case where the same network configuration reoccurs in both modalities at multiple time points? It seems to me that one would get a CRP with entries occurring equally on the on-diagonal as on the off-diagonal, regardless of whether the dynamics are matched between the two modalities or not (i.e. regardless of scenario 1 or 2 being true).

      This thought experiment example might have a flaw in it, and the authors might ultimately be correct, but nonetheless, a systematic justification needs to be provided for using the ratio of off- to on-diagonal entries to discriminate between scenarios 1 and 2 (and under what assumptions it is valid).

      In the absence of theory, a couple of ways I can think of to gain insight into this key aspect are:

      (1) Use surrogate data for scenarios 1 and 2:

      a. For scenario 1: Run the CRP using a single modality. E.g. feed in the EEG into the analysis as both modality 1 AND modality 2. This should provide at least one example of CRP under scenario 1 (although it does not ensure that all CRPs under this scenario will look like this, it is at least a useful sanity check).

      b. For scenario 2: Run the CRP using a single modality plus a shuffled version. E.g. feed in the EEG into the analysis as both modality 1 AND a temporally shuffled version of the EEG as modality 2. The temporal shuffling of the EEG could be done by simply splitting the data into blocks of say ~10s and then shuffling them into a new order. This should provide a version of the CRP under scenario 2 (although it does not ensure that all CRPs under this scenario will look like this, it is at least a useful sanity check).

      (2) Do simulations, with clearly specified assumptions, for scenarios 1 and 2. One way of doing this is to use a simplified (state-space) setup and randomly simulate N spatially fixed networks that are independently switching on and off over time (i.e. "activation" is 0 or 1). Note that this would result in a N-dimensional connectome state space.

      The authors would only need to worry about simulating the network activation time courses, i.e. they would not need to bother with specifying the spatial configuration of each network, instead, they would make the implied assumption that each of these networks has the same spatial configuration in modality 1 and modality 2.

      With that assumption, the CRP calculation should simply correspond to calculating, at each time i in modality 1 and time j in modality 2, the number of networks that are activating in both modality 1 and modality 2, by using their activation time courses. Using this, one can simulate and compute the CRPs for the two scenarios:

      a. Scenario 1: where the simulated activation timecourses are set to be the same between both modalities

      b. Scenario 2: where the simulated activation timecourses are simulated separately for each of the modalities
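
      The reviewer's sanity checks can be sketched in a few lines: a block-shuffle surrogate (suggestion 1b) and a binary-network simulation (suggestion 2). In the simulation, a CRP entry counts the networks active at time i in modality 1 and at time j in modality 2. This is an illustrative sketch, not the authors' analysis. The Markov on/off switching, the switch probability, the seeds, and the simple mean on/off-diagonal summary are assumptions; the reviewer specified only independent on/off networks with shared spatial maps.

```python
import random


def block_shuffle(series, block_len, seed=0):
    """Suggestion 1b: cut a series into consecutive blocks of `block_len` samples
    and concatenate the blocks in random order (order within each block is kept)."""
    blocks = [series[i:i + block_len] for i in range(0, len(series), block_len)]
    random.Random(seed).shuffle(blocks)
    return [x for block in blocks for x in block]


def simulate_activations(n_networks, n_time, p_switch=0.1, seed=0):
    """Suggestion 2: binary on/off time courses act[t][k] for N fixed networks.
    Each network independently flips state with probability p_switch per step."""
    rng = random.Random(seed)
    state = [rng.random() < 0.5 for _ in range(n_networks)]
    out = []
    for _ in range(n_time):
        out.append([int(s) for s in state])
        state = [(not s) if rng.random() < p_switch else s for s in state]
    return out


def crp_counts(act1, act2):
    """CRP[i][j] = number of networks active at time i in modality 1 AND time j in modality 2."""
    return [[sum(a & b for a, b in zip(r1, r2)) for r2 in act2] for r1 in act1]


def mean_on_off_ratio(crp):
    """Mean on-diagonal entry divided by mean off-diagonal entry of a square CRP."""
    n = len(crp)
    on = sum(crp[i][i] for i in range(n)) / n
    off = sum(crp[i][j] for i in range(n) for j in range(n) if i != j) / (n * n - n)
    return on / off


if __name__ == "__main__":
    a1 = simulate_activations(10, 200, seed=1)
    same = mean_on_off_ratio(crp_counts(a1, a1))  # scenario 1: shared time courses
    indep = mean_on_off_ratio(crp_counts(a1, simulate_activations(10, 200, seed=2)))  # scenario 2
    print(f"scenario 1: {same:.2f}, scenario 2: {indep:.2f}")
```

      For ~10 s blocks, `block_len` would be 10 s multiplied by the sampling rate of the connectome time series. In this toy model, shared time courses push the ratio well above 1, while independent time courses leave it near 1. That is the contrast the on-/off-diagonal ratio is meant to detect.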

      We thank the reviewer for raising this important matter, as it directly relates to our study hypothesis. To address this point, we chose to focus on the first of the reviewer's two alternative suggestions, as it provides evidence based on empirical data. In line with the reviewer's suggestion 1, recurrence plots have indeed been applied previously to connectome dynamics from a single modality [Hansen et al., NeuroImage 2015; Fig. 2B]. In that study, where the recurrence plot was estimated within fMRI connectome dynamics, on-diagonal entries had noticeably larger correlation values than off-diagonal entries. As the authors state, this contrast reflects the autocorrelation of connectome dynamics in their single-modality recurrence plot. Extending these findings to our cross-modal recurrence plots, greater synchronicity of connectome dynamics between fMRI and EEG should, in principle, translate into stronger correlation values along the diagonal, which represents simultaneous or neighboring time points in the two modalities. Conversely, weaker cross-modal synchronicity translates into a lack of such prevalence along the diagonal.

      Complementing these arguments with empirical data, Author response image 1 shows the fMRI-to-iEEG and fMRI-to-fMRI CRPs side by side, as suggested by the reviewer. For simplicity, we thresholded each CRP at the top 5% of entries and calculated the corresponding on-/off-diagonal ratios. The on-/off-diagonal ratio for the fMRI-to-fMRI CRP was 4.32 ± 6.26 across lags of -5 to +5 TRs (with a maximum of 16.56 at a lag of one TR), whereas it was 1.00 ± 0.31 for the fMRI-to-iEEG CRP. Thus, synchronicity of connectome dynamics translates directly into the on-/off-diagonal ratio of the CRP.

      Author response image 1.

      Sample CRPs from one subject comparing two cases: fMRI-to-iEEG (left) and fMRI-to-fMRI (right). The comparison shows that in the presence of genuinely synchronous connectome dynamics, as expected for the within-modality case (right panel), the on-/off-diagonal ratio shows noticeably higher values. This figure illustrates the link between our proposed on-/off-diagonal ratio metric and the extent of synchronicity of connectome dynamics.

      Author response image 2.

      On-/off-diagonal ratio in the fMRI-to-fMRI recurrence plot is considerably higher than in the cross-modal fMRI-to-iEEG case. The horizontal axis shows the lag at which the metric was calculated in the CRP. The bars reflect the group-average metric, while the whiskers show the standard deviation. Note that for the within-modality case, the ratio is not defined at lag zero because the connectome frames are identical.
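      To make the on-/off-diagonal metric concrete, a minimal Python sketch follows. It assumes that each connectome frame is a symmetric region-by-region matrix, that CRP entry (i, j) is the Pearson correlation between the vectorized upper triangles of frame i in one modality and frame j in the other, and that the ratio is the density of top-5% CRP entries on the diagonal at a given lag divided by their density everywhere else. The exact definition in the manuscript may differ in detail; the function names and synthetic data are illustrative only.

```python
import numpy as np


def vectorize_fc(fc_frames):
    """Flatten each symmetric FC matrix (frames x n x n) to its upper triangle."""
    n = fc_frames.shape[1]
    iu = np.triu_indices(n, k=1)
    return fc_frames[:, iu[0], iu[1]]


def cross_recurrence(fc_a, fc_b):
    """CRP[i, j] = Pearson correlation between frame i of modality A and frame j of modality B."""
    a = vectorize_fc(fc_a)
    b = vectorize_fc(fc_b)
    # Z-score each frame across edges so that a scaled dot product equals Pearson r.
    a = (a - a.mean(axis=1, keepdims=True)) / a.std(axis=1, keepdims=True)
    b = (b - b.mean(axis=1, keepdims=True)) / b.std(axis=1, keepdims=True)
    return a @ b.T / a.shape[1]


def on_off_diagonal_ratio(crp, lag, top_fraction=0.05):
    """Density of suprathreshold (top 5%) entries on the lag diagonal vs. all other entries."""
    supra = crp >= np.quantile(crp, 1 - top_fraction)
    on_mask = np.eye(*crp.shape, k=lag, dtype=bool)
    return supra[on_mask].mean() / supra[~on_mask].mean()
```

      With synthetic frames, a near-copy of a frame sequence gives a ratio far above 1, whereas two independent sequences give a ratio close to 1, mirroring the fMRI-to-fMRI versus fMRI-to-iEEG contrast.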

      (2) Choices in the analysis pipeline leading up to the computation of FC in fMRI or EEG will affect the quality of information available in the FC. For example, but not only, the choice of parcellation (in the study, the number of parcels is very high given the number of EEG sensors). I think it is important that we see the impact of the chosen pipeline on the time-averaged connectomes, an output that the field has some idea about what is sensible. This would give confidence that the information being used in the main analyses in the paper is based on a sensible footing and relates to what the field is used to thinking about in terms of FC. This should be trivial to compute, as it is just a case of averaging the time-varying FCs being used for the CRP over all time points. Admittedly, this approach is less useful for the intracranial EEG.

      We agree with the reviewer on the importance of ensuring that the time-averaged FC aligns with the expectations of the field and with prior work. For this reason, our supplementary material already included an analysis that replicates the well-established (albeit modest) spatial similarity between fMRI static connectomes and EEG/iEEG static connectomes:

      “In scalp EEG-fMRI data, cross-modal spatial (2D) Pearson correlation of group-level time-averaged connectomes between fMRI and EEG-FCAmp or fMRI and EEG-FCPhase were calculated across all frequency bands. The average spatial correlation values across frequency bands were r = 0.28 and r = 0.28 for EEG-FCAmp and EEG-FCPhase, respectively. The spatial correlation values across all frequency bands and connectivity measures were significantly higher than the corresponding null distributions generated by phase-permuted group-level fMRI-FC spatial organization (p<0.005; 200 repetitions; FDR-corrected at q<0.05 for the number of frequency bands). …. Of note, the small effect sizes are strongly in line with prior literature (Hipp and Siegel, 2015; Wirsich et al., 2017; Betzel et al., 2019) and may point to possible divergence in the dynamic domain as investigated in the main manuscript.”

      This replication directly confirms the validity of our selected atlas for further investigations into the connectome dynamics. We acknowledge that with 64 EEG channels, one can only estimate a relatively coarse connectome. Among the well-known coarse atlases, we chose the Desikan-Killiany atlas as it is based on anatomical features, eliminating possible biases towards a particular functional data modality. Moreover, this atlas has been commonly used for multimodal functional connectivity studies, facilitating the confirmation of prior findings in the time-averaged domain [Deligianni et al. Front. Neurosci 2014, Wirsich et al. NeuroImage, 2020, Wirsich et al., NeuroImage 2021].
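      For readers who wish to run this kind of sanity check, the sketch below computes a time-averaged connectome from time-varying FC frames (as the reviewer suggests) and the spatial (upper-triangle) Pearson correlation between two such connectomes. For simplicity, its significance test permutes node labels; this is a swapped-in null and not the phase-permutation procedure described in the quoted supplementary text. Function names and the synthetic data are hypothetical.

```python
import numpy as np


def time_averaged_fc(fc_frames):
    """Average time-varying FC frames (frames x n x n) into one static connectome."""
    return fc_frames.mean(axis=0)


def spatial_correlation(fc_a, fc_b):
    """2D Pearson correlation over the unique (upper-triangle) connections."""
    iu = np.triu_indices(fc_a.shape[0], k=1)
    return np.corrcoef(fc_a[iu], fc_b[iu])[0, 1]


def node_permutation_p(fc_a, fc_b, n_perm=200, seed=0):
    """Observed spatial r and a one-sided p-value against a node-label permutation null."""
    rng = np.random.default_rng(seed)
    observed = spatial_correlation(fc_a, fc_b)
    null = np.empty(n_perm)
    for k in range(n_perm):
        perm = rng.permutation(fc_b.shape[0])
        # Relabel rows and columns together so the permuted matrix stays symmetric.
        null[k] = spatial_correlation(fc_a, fc_b[np.ix_(perm, perm)])
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p
```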

      (3) Leakage correction. The paper states: "To mitigate this issue, we provide results from source-localized data both with and without leakage correction (supplementary and main text, respectively)." Given that FC in EEG is dominated by spatial leakage (see Hipp paper), then I cannot see how it can be justified to look at non-spatial leakage correction results at all, let alone put them up front as the main results. All main results/figures for the scalp EEG should be done using spatial leakage-corrected EEG data.

      We agree that relying on leakage-uncorrected scalp EEG alone would be problematic. It is for this reason that the intracranial data form the core of our results, demonstrating that the observed multiplex architecture of connectomes is indeed present in the absence of source leakage. Only once this finding is established in the intracranial EEG do we provide the scalp EEG data as a generalization to whole-cortex connectomes of healthy subjects. Moreover, existing source-leakage correction algorithms are known to inadvertently remove some of the genuine zero-lag connectivity. For instance, Finger and colleagues have shown that the similarity of functional connectivity to structural connectivity diminishes after correction for source leakage (Finger et al., PLOS Comp. Biol. 2016). Therefore, we deliberately chose to include our generalization findings both before source-leakage correction (main text) and after source-leakage correction, reflecting a more stringent approach (supplementary analysis). Importantly, our conclusions hold both before and after source-leakage correction.
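      The concern that leakage correction can remove genuine zero-lag coupling can be illustrated with a toy example. The sketch assumes a simple pairwise linear-regression orthogonalization of real-valued signals, which is a simplified stand-in for orthogonalization-based leakage correction and not necessarily the exact procedure used in the supplementary analysis. A signal that is genuinely coupled to another at zero lag, with no leakage involved, loses all of its zero-lag correlation after orthogonalization.

```python
import numpy as np


def orthogonalize(y, x):
    """Remove the component of y that is linearly (zero-lag) predictable from x."""
    x = x - x.mean()
    y = y - y.mean()
    beta = np.dot(y, x) / np.dot(x, x)
    return y - beta * x


# Genuine zero-lag coupling: y shares a real signal component with x (no leakage model).
rng = np.random.default_rng(2)
x = rng.standard_normal(5000)
y = 0.8 * x + 0.6 * rng.standard_normal(5000)
corr_before = np.corrcoef(x, y)[0, 1]                   # close to 0.8
corr_after = np.corrcoef(x, orthogonalize(y, x))[0, 1]  # zero up to rounding error
```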

      Reviewer #2 (Public Review):

      Summary:

      The study investigates the brain's functional connectivity (FC) dynamics across different timescales using simultaneous recordings of intracranial EEG/source-localized EEG and fMRI. The primary research goal was to determine which of three convergence/divergence scenarios is the most likely to occur.

      The results indicate that despite similar FC patterns found in different data modalities, the time points were not aligned, indicating spatial convergence but temporal divergence.

      The researchers also found that FC patterns in different frequencies do not overlap significantly, emphasizing the multi-frequency nature of brain connectivity. Such asynchronous activity across frequency bands supports the idea of multiple connectivity states that operate independently and are organized into a multiplex system.

      Strengths:

      The data supporting the authors' claims are convincing and come from simultaneous recordings of fMRI and iEEG/EEG, which has been recently developed and adapted.

      The analysis methods are solid and involve a novel approach to analyzing the co-occurrence of FC patterns across modalities (cross-modal recurrence plot, CRP) and robust statistics, including replication of the main results using multiple operationalizations of the functional connectome (e.g., amplitude, orthogonalized, and phase-based coupling).

      In addition, the authors provided a detailed interpretation of the results, placing them in the context of recent advances and understanding of the relationships between functional connectivity and cognitive states.

      Weaknesses:

      Despite the impressive work, the paper still lacks some analyses to make it complete.

      Firstly, the effect of the window size is unclear, especially in the case of different frequencies where the number of cycles that fall in a window will vary drastically. A typical oscillation lasts just a few cycles (see Myrov et al., 2024), and brain states are usually short-lived because of meta-stability (see Roberts et al., 2019).

      We now replicate our results with an additional window size. Please see section “Recommendations for the authors”.

      Secondly, the authors didn't examine frequencies lower than 1Hz despite similarities between fMRI and infra-slow oscillations found in prior literature (see Palva et al., 2014; Zhang et al., 2023).

      We address this issue below. Please see section “Recommendations for the authors”.

      On a minor note, the phase-locking value (PLV) is positively biased for EEG data (see Palva et al., 2018) and a different metric for phase coupling could be a more appropriate choice (e.g., iPLV/wPLI, see Vinck et al., 2011).

      While iPLV and wPLI are not positively biased, they may reduce genuine zero-phase connectivity as they were initially designed to address spurious zero-phase connectivity from source leakage in scalp EEG. Indeed, PLV connectivity is shown to be more strongly correlated with structural connectivity than wPLI and other phase coupling methods [Finger et al., PLOS Comp. Biol. 2016], emphasizing that it contains genuine connectivity that may be lacking when zero-phase connectivity is removed. We chose PLV because it is a widely used functional connectivity metric, particularly in intracranial data where source leakage is not a critical concern. Thus, using PLV facilitates cross-study comparisons including to our prior work [e.g. Mostame et al. NeuroImage 2020, Mostame et al. J Neurosci 2021].
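      The trade-off between PLV and imaginary-part-based metrics can be illustrated with synthetic signals. The sketch below assumes an FFT-based analytic signal and a time-resolved variant of wPLI that averages the imaginary part of the analytic cross-signal over samples (Vinck et al., 2011 define wPLI over trials of cross-spectra); the frequency, noise level, and function names are arbitrary choices for illustration. Two noisy 10 Hz signals coupled at zero phase lag yield a high PLV but a wPLI near zero, whereas a quarter-cycle lag yields high values for both.

```python
import numpy as np


def analytic_signal(x):
    """FFT-based analytic signal (equivalent to applying a Hilbert transform)."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def plv(x, y):
    """Phase-locking value: length of the mean unit phasor of the phase difference."""
    dphi = np.angle(analytic_signal(x)) - np.angle(analytic_signal(y))
    return np.abs(np.mean(np.exp(1j * dphi)))


def wpli(x, y):
    """Weighted phase-lag index computed over time samples of the analytic cross-signal."""
    im = np.imag(analytic_signal(x) * np.conj(analytic_signal(y)))
    return np.abs(np.mean(im)) / np.mean(np.abs(im))


rng = np.random.default_rng(3)
fs = 250.0
t = np.arange(2500) / fs
x = np.sin(2 * np.pi * 10 * t) + 0.3 * rng.standard_normal(t.size)
y_zero = np.sin(2 * np.pi * 10 * t) + 0.3 * rng.standard_normal(t.size)
y_quarter = np.sin(2 * np.pi * 10 * t - np.pi / 2) + 0.3 * rng.standard_normal(t.size)
```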

      The repository with the code is also unavailable.

      Thank you for bringing this to our attention. We have now made our repository publicly accessible at: https://github.com/connectlab/Mostame2024_Multiplex_iEEG_fMRI.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The window widths used to compute FC as a function of time are an important aspect, so I feel that this should be briefly described up-front in the main Results text.

      Methods. "Finally, to compensate for the time lag between hemodynamic and neural responses of the brain (Logothetis et al., 2001), we shifted the fMRI-FC time course 6 seconds backwards in time." What about the effects of temporal blurring from the HRF? Do we need to care about that?

      We agree that it is important to investigate the effect of temporal blurring by the HRF. The main text already included a replication of our findings using CRPs generated from fMRI data and EEG amplitude signals convolved with the canonical HRF. This method serves as an alternative to the 6-second shift, and both approaches produced similar results.
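      The two ways of handling the hemodynamic delay can be sketched as follows. The HRF here is an SPM-style double-gamma function (gamma densities with shape parameters 6 and 16 and an undershoot ratio of 1/6), which peaks about 5 s after an impulse; the exact HRF used in the manuscript may differ. The alignment function assumes a TR of 3 s, so a 6 s backward shift corresponds to dropping the first two fMRI samples. All function names are illustrative.

```python
import math

import numpy as np


def canonical_hrf(dt, duration=32.0):
    """SPM-style double-gamma HRF sampled every dt seconds, normalized to unit sum."""
    t = np.arange(0.0, duration, dt)
    peak = t**5 * np.exp(-t) / math.gamma(6)
    undershoot = t**15 * np.exp(-t) / math.gamma(16)
    hrf = peak - undershoot / 6.0
    return hrf / hrf.sum()


def convolve_with_hrf(signal, dt):
    """Causal convolution of an EEG-derived time course with the HRF, truncated to its length."""
    return np.convolve(signal, canonical_hrf(dt))[: len(signal)]


def align_with_shift(eeg_series, fmri_series, shift_s, tr):
    """Pair EEG sample i with fMRI sample i + shift (fMRI shifted back by shift_s seconds)."""
    k = int(round(shift_s / tr))
    n = min(len(eeg_series), len(fmri_series) - k)
    return eeg_series[:n], fmri_series[k:k + n]
```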

      Methods. In fMRI connectome computation it is common to look at partial correlation rather than full correlation. Partial correlation focuses more on direct connections. It would be good if the paper acknowledged and justified why it is OK to use full correlation.

      We have now added a brief explanation in this regard in the main text (Methods section) as follows:

      “In fMRI connectome computation, some prior work has used partial correlation instead of full correlation. Partial correlation emphasizes direct connections by calculating correlation between any pair of brain regions after regressing out the timeseries of all other regions. However, we have opted to use full correlation because this permits interpretation of our outcomes in the context of the vast existing literature that uses full correlations in fMRI including the majority of bimodal (EEG-fMRI) connectome studies (e.g. Tagliazucchi et al., 2012; Deligianni et al., 2014; Wirsich et al., 2017b, 2020, 2021; Allen et al., 2018).”
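      The difference between full and partial correlation can be illustrated with a synthetic three-region chain in which region A drives B and B drives C. The sketch below computes partial correlation from the inverse covariance (precision) matrix; the function name and data are illustrative. The indirect A-C connection appears in the full correlation (about 0.58 in expectation) but largely vanishes in the partial correlation.

```python
import numpy as np


def full_and_partial_correlation(ts):
    """ts: samples x regions. Returns (full, partial) correlation matrices."""
    full = np.corrcoef(ts, rowvar=False)
    precision = np.linalg.inv(np.cov(ts, rowvar=False))
    d = np.sqrt(np.diag(precision))
    # Partial correlation of i and j given all other regions.
    partial = -precision / np.outer(d, d)
    np.fill_diagonal(partial, 1.0)
    return full, partial


rng = np.random.default_rng(4)
a = rng.standard_normal(5000)
b = a + rng.standard_normal(5000)  # A -> B
c = b + rng.standard_normal(5000)  # B -> C (no direct A -> C link)
full, partial = full_and_partial_correlation(np.column_stack([a, b, c]))
```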

      The paper should relate the results to findings showing clear links between simultaneously recorded EEG and fMRI beyond FC. E.g. Mantini (PNAS) 2007 and Van De Ville (PNAS) 2010 to name two.

      In line with this important point, we have extended the existing discussion section that compares our outcomes to EEG-fMRI beyond functional connectivity:

      “Prior multi-modal studies of neural dynamics have predominantly aimed at methodologically cross-validating hemodynamic and electrophysiological observations, thus focusing on their convergence. These important foundational studies include e.g., the cross-modal comparison of region-wise (Mukamel et al., 2005; Nir et al., 2007) or ICN-wise (Mantini et al., 2007) activity fluctuations, instantaneous activity maps (Hunyadi et al., 2019; Zhang et al., 2020) or EEG microstates (Van de Ville et al., 2010), infraslow connectome states (Abreu et al., 2020), or connection-wise FC including studies in the iEEG-fMRI and scalp EEG-fMRI data used in the current study (Ridley et al., 2017; and Wirsich et al., 2020, respectively). In contrast to this prior work, the current study investigated the highly time-resolved cross-modal temporal relationship at the level of FC patterns distributed over all available pairwise connections, and found a connectome-level temporal divergence. The discrepancy between temporal divergence in our study and convergence in prior studies implies that infraslow fluctuations of activity in individual regions or of FC in individual region-pairs observable in both modalities (prior studies) are neurally distinct from connectome-wide FC dynamics observable separately in each modality (current study). Indeed, we confirmed the existence of infraslow electrophysiological FC dynamics driving cross-modal temporal associations at the level of individual connections (Fig. S3) …”

      Reviewer #2 (Recommendations For The Authors):

      (1) Check different window sizes and stability of the FC patterns as a function of it.

      We thank the reviewer for the helpful feedback. We agree that the window size could affect the estimation of individual connectome frames, particularly given that neural processes unfold on timescales of hundreds of milliseconds rather than seconds. However, we expect that the cross-modal asynchrony observed in our data would remain intact regardless of the specific window length used for FC calculations. To confirm this, we replicated some of our main analyses in the iEEG-fMRI data with a window length of 500ms (as opposed to 3s, equivalent to one TR) as follows:

      First, we showed that changing the window length does not substantially impact the overall architecture of the connectomes (Author response image 3). Particularly, the time-averaged connectome patterns across different frequency bands were all strongly correlated between the two analyses (500ms and 3s window lengths).

      Author response image 3.

      Time-averaged connectome patterns are highly replicable when calculated using 3s or 500ms window lengths. Horizontal axis represents frequency bands, while each dot represents a subject. Vertical axis shows 2D Pearson correlation of the two connectomes. The group average within each frequency band is marked by a horizontal line.

      Second, we replicated our major findings of CRP and its on-/off-diagonal ratio in the iEEG-fMRI dataset using a window length of 500ms for FC calculations. Indeed, the data do not show a substantial difference in the on-/off-diagonal ratios of the CRP entries between the 3s and 500ms window lengths. Specifically, the ratio was 1.02 ± 0.07 for the 500ms window length, indicating the absence of significant temporal convergence of the connectome dynamics (see Author response image 4). A paired t-test between group-averaged ratios across different lags revealed no significant difference between the two analyses (p = 0.50). This finding further supports the genuinely asynchronous nature of connectome dynamics across the neural timescales measured in fMRI and electrophysiology. We have added this analysis to the supplementary data.

      Author response image 4.

      On-/off-diagonal ratio is shown across lags for both analyses: 3s window length (blue) and 500ms window length (red). Each bar shows the mean across subjects, while the whiskers show the corresponding standard deviations.
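      A minimal sketch of the window-length comparison for time-averaged connectomes is given below. It assumes non-overlapping windows, Pearson-correlation FC within each window, and a hypothetical sampling rate of 100 Hz, so that 500 ms and 3 s correspond to 50 and 300 samples. Synthetic data with a fixed covariance structure stand in for band-limited iEEG signals; the function names are illustrative.

```python
import numpy as np


def windowed_fc(ts, win):
    """Non-overlapping windows of `win` samples; returns frames x regions x regions."""
    n_win = ts.shape[0] // win
    return np.stack(
        [np.corrcoef(ts[k * win:(k + 1) * win], rowvar=False) for k in range(n_win)]
    )


def time_averaged_similarity(ts, win_a, win_b):
    """2D Pearson correlation between time-averaged connectomes from two window lengths."""
    iu = np.triu_indices(ts.shape[1], k=1)
    avg_a = windowed_fc(ts, win_a).mean(axis=0)[iu]
    avg_b = windowed_fc(ts, win_b).mean(axis=0)[iu]
    return np.corrcoef(avg_a, avg_b)[0, 1]


rng = np.random.default_rng(5)
mixing = rng.standard_normal((10, 10))                   # imposes a fixed FC structure
ts = rng.standard_normal((30000, 10)) @ mixing.T         # 300 s at a hypothetical 100 Hz
similarity = time_averaged_similarity(ts, 50, 300)       # 500 ms vs. 3 s windows
```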

      (2) Try to decrease the lowest frequency of the analysis below 1Hz or just compute it for multiple log-spaced frequencies from infra-slow delta to high-gamma band.

      Thank you for pointing out this matter. We do not expect considerable signal in the frequency range below the current lower bound of delta (1 Hz) because, as in most other EEG recordings, the EEG was not recorded in a DC setting and was subject to a hardware high-pass filter of 0.1 Hz. Nonetheless, we investigated the power spectral density of our iEEG-fMRI data and found that there is indeed little signal power left in the available infraslow range [0.5 – 1 Hz] after the preprocessing steps (Author response image 5).

      Author response image 5.

      Power spectral density of all subjects in the fMRI-iEEG dataset shows a lack of sufficient power in the infraslow range. Infraslow signals are almost always filtered out during recording unless the recording setup includes a DC amplifier. The infraslow EEG signal that the literature often reports as correlated with fMRI is most commonly extracted from the slowly changing envelope of band-limited signals, such as the envelope of gamma oscillations.

      Accordingly, when the iEEG signals are filtered within the range of [0.5, 1] Hz, little variation is observed in the signal timeseries, in contrast to the adjacent delta band signal (Author response image 6). Importantly, the power envelope of the delta band (and of all other canonical bands, not shown here) contains major fluctuations in the infraslow range, as expected. We would like to emphasize that existing studies addressing infraslow EEG signal dynamics typically consider the infraslow envelope fluctuations of band-limited signals in traditional frequency bands [e.g. Nir et al., Nat Neurosci 2008] rather than direct recordings in the infraslow frequency range. Investigating HRF-convolved EEG signals similarly captures the infraslow characteristics of the timeseries [e.g. Mantini et al. PNAS 2007, Sadaghiani et al., J Neurosci 2010] (note that HRF-convolved analyses are included as a supplementary investigation in the current study). To the best of our knowledge, very few studies have investigated direct infraslow EEG signals using DC EEG, and we are aware of only two DC-EEG studies with concurrent fMRI [Hiltunen et al., J Neurosci 2014, Grooms et al., Brain Connectivity 2017]. The infraslow correlates of fMRI in electrophysiological signals reported in prior work therefore reflect slow changes in the faster activity or connectivity of traditional frequency bands, which are already included in the current study.

      Author response image 6.

      Sample timeseries of the iEEG signal of the nine subjects (nine rows) for a 400 second interval. Blue signals show the bandlimited delta with its envelope shown as darker blue. The red signal represents the infraslow signal component left in the data, which is much lower in power.
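      The distinction between a direct infraslow signal and the infraslow envelope of a band-limited signal can be illustrated with a synthetic 2 Hz delta-band carrier whose amplitude is modulated at 0.05 Hz. In the sketch below (NumPy only; the 0.1 Hz cutoff, sampling rate, modulation depth, and function names are arbitrary assumptions), the band-limited signal itself carries essentially no power below 0.1 Hz, whereas its amplitude envelope, obtained from the analytic signal, does.

```python
import numpy as np


def bandpass_fft(x, fs, lo, hi):
    """Zero-phase band-pass by zeroing FFT bins outside [lo, hi] Hz."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def envelope(x):
    """Amplitude envelope: magnitude of the FFT-based analytic signal."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))


def infraslow_power_fraction(x, fs, cutoff=0.1):
    """Fraction of non-DC power below `cutoff` Hz."""
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    return power[(freqs > 0) & (freqs < cutoff)].sum() / power[freqs > 0].sum()


fs = 100.0
t = np.arange(20000) / fs  # 200 s
x = (1 + 0.8 * np.sin(2 * np.pi * 0.05 * t)) * np.sin(2 * np.pi * 2.0 * t)
delta = bandpass_fft(x, fs, 1.0, 4.0)
```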

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Ritvo and colleagues present an impressive suite of simulations that can account for three findings of differentiation in the literature. This is important because differentiation (in which items that have some features in common, or share a common associate, are less similar to one another than are unrelated items) is difficult to explain with classic supervised learning models, as these predict the opposite (i.e., an increase in similarity). A few of their key findings are that differentiation requires a high learning rate and low inhibitory oscillations, and is virtually always asymmetric in nature.

      This paper was very clear and thoughtful: an absolute joy to read. The model is simple and elegant, and powerful enough to re-create many aspects of existing differentiation findings. The interrogation of the model and presentation of the findings were both extremely thorough. The potential for this model to be used to drive future work is huge. I have only a few comments for the authors, all of which are relatively minor.

      (1) I was struck by the fact that the "zone" of repulsion is quite narrow, compared with the zone of attraction. This was most notable in the modeling of Chanales et al. (i.e., just one of the six similarity levels yielded differentiation). Do the authors think this is a generalizable property of the model or phenomenon, or something idiosyncratic to do with the current investigation? It seems curious that differentiation findings (e.g., in hippocampus) are so robustly observed in the literature despite the mechanism seemingly requiring a very particular set of circumstances. I wonder if the authors could speculate on this point a bit. For example, might the differentiation zone be wider when competitor "pop up" is low (i.e., low inhibitory oscillations), which could help explain why it's often observed in hippocampus? This seems related a bit to the question about what makes something "moderately" active, or how one could ensure "moderate" activation if one were, say, designing an experiment looking at differentiation.

      We thank the reviewer for this comment. In the previous version of the manuscript, in the section entitled “Differentiation Requires a High Learning Rate and Is Sensitive to Activation Dynamics”, we discussed some reasons why differentiation may be more likely to be found in the hippocampus – namely, the high learning rate of the hippocampus and the sparsity of hippocampal activation patterns (pp. 27-28):

      “These results have implications for where to look for differentiation in the brain. Our finding that differentiation requires a high learning rate suggests that differentiation will be more evident in the hippocampus than in neocortex, insofar as hippocampus is thought to have a higher learning rate than neocortex (McClelland et al., 1995). In keeping with this prediction, numerous studies have found differentiation effects in hippocampus but not in neocortical regions involved in sensory processing (e.g., Chanales et al., 2017; Favila et al., 2016; Zeithamova et al., 2018). At the same time, some studies have found differentiation effects in neocortex (e.g., Schlichting et al., 2015; Wammes et al., 2022). One possible explanation of these neocortical differentiation effects is that they are being “propped up” by top-down feedback from differentiated representations in the hippocampus. This explanation implies that disruptions of hippocampal processing (e.g., lesions, stimulation) will eliminate these neocortical differentiation effects; we plan to test this prediction in future work.

      Additionally, the simulations where we adjusted the oscillation amount (using our model of Schlichting et al., 2015) imply that differentiation will be most evident in brain regions where it is relatively hard to activate competitors. Given the U shape of the NMPH learning rule, limiting competitor activity makes it less likely that plasticity will “cross over” from weakening (and differentiation) to strengthening (and integration). Thus, within the hippocampus, subregions with sparser activity (e.g., dentate gyrus, and to a lesser extent, CA3; Barnes et al., 1990, GoodSmith et al., 2017; West et al., 1991) will be more prone to differentiation. There is strong empirical support for this prediction. For example, Wammes et al. (2022) manipulated the similarity of stimuli in a statistical learning experiment and found that moderate levels of visual similarity were associated with significant differentiation in the dentate gyrus but not other subregions. Also, numerous studies have found greater differentiation in dentate gyrus / CA3 than in CA1 (e.g., Dimsdale-Zucker et al., 2018; Wanjia et al., 2021; Molitor et al., 2021; Kim et al., 2017; but see Zheng et al., 2021).”
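      The U-shaped NMPH curve can be sketched as a piecewise-linear function of coactivity. The thresholds and magnitudes below are hypothetical placeholders, not the parameters of the published model; the sketch only reproduces the qualitative shape: no change for inactive units, weakening for moderate coactivity, and strengthening for high coactivity.

```python
def nmph_weight_change(coactivity, low=0.2, dip=0.5, high=0.7,
                       max_weaken=-0.5, max_strengthen=1.0):
    """Hypothetical piecewise-linear U-shaped (NMPH-style) weight change for coactivity in [0, 1]."""
    if coactivity <= low:
        return 0.0  # inactive: connection left unchanged
    if coactivity <= dip:
        # Moderate activity: weakening grows toward its maximum at `dip`.
        return max_weaken * (coactivity - low) / (dip - low)
    if coactivity <= high:
        # Weakening shrinks back to zero as coactivity approaches `high`.
        return max_weaken * (high - coactivity) / (high - dip)
    # High activity: strengthening (integration).
    return max_strengthen * (coactivity - high) / (1.0 - high)
```

      Under this sketch, raising a competitor's coactivity from the bottom of the dip (0.5) to near the top of the range (0.9) flips the sign of the weight change, which corresponds to the “cross over” from weakening to strengthening described in the quoted passage.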

      In the revised draft we have supplemented this discussion with a new section entitled “Reconciling the Prevalence of Differentiation in the Model and in the Data” (pp. 30-31):

      “A key lesson from our model is that, from a computational perspective, it is challenging to obtain differentiation effects: The region of parameter space that gives rise to differentiation is much smaller than the one that gives rise to integration (for further discussion of this issue, see the section in Methods on Practical Advice for Getting the Model to Show Differentiation). However, the fact that integration is more prevalent in our simulations across parameter configurations does not mean that integration will be more prevalent than differentiation in real-life circumstances. What really matters in predicting the prevalence of differentiation in real life is how the parameters of the brain map on to parameters of the model: If the parameters of the brain align with regions of model parameter space that give rise to differentiation (even if these regions are small), this would explain why differentiation has been so robustly observed in extant studies. Indeed, this is exactly the case that we sought to make above about the hippocampus – i.e., that its use of especially sparse coding and a high learning rate will give rise to the kinds of neural dynamics that cause differentiation (as opposed to integration). As another example, while it is true that half of the overlap conditions in our simulation of Chanales et al. (2021) give rise to integration, this does not imply that integration will occur half of the time in the Chanales et al. (2021) study; it may be that the levels of overlap that are actually observed in the brain in Chanales et al. (2021) are more in line with the levels of overlap that give rise to differentiation in our model.”

      (2) With real fMRI data we know that the actual correlation value doesn't matter all that much, and anti-correlations can be induced by things like preprocessing decisions. I am wondering if the important criterion in the model is that the correlations (e.g., as shown in Figure 6) go down from pre to post, versus that they are negative in sign during the post learning period. I would think that here, similar to in neural data, a decrease in correlation would be sufficient to conclude differentiation, but would love the authors' thoughts on that.

      We thank the reviewer for bringing this up. In the paper, we define differentiation as the moving apart of representations – so we agree with the reviewer that it would be appropriate to conclude that differentiation is taking place when correlations go down from pre to post.

      In addition to the definitional question (“what counts as differentiation”), one can also ask the mechanistic question of what is happening in the model at the (simulated) neuronal level in conditions where differentiation (i.e., an average decrease in similarity from pre to post) occurs. Here, the model’s answer is clear: When the similarity of two pairmates decreases, it is because the pairmates have acquired anticorrelated representations at the (simulated) neuronal level. When similarity decreases on average from pre to post, but the average “post” similarity value is not negative, this is because there is a mix of outcomes across runs of the model (due to variance in the initial, random model weights and also variance in the order in which items are presented across training epochs) – some runs lead to differentiation (manifested as anticorrelated pairmate representations) whereas others lead to no change or integration. The average pre-to-post change depends on the relative frequencies with which these different outcomes occur.

      We have made several edits to the paper to clarify this point.

      We added a new section under “Results” in our simulation of Chanales et al. (2021) entitled, “Pairs of Items that Differentiate Show Anticorrelated Representations” (p. 15):

      “Figure 6B also highlights that, for learning rates where robust differentiation effects occur in aggregate (i.e., there is a reduction in mean pattern similarity, averaging across model runs), these aggregate effects involve a bimodal distribution across model runs: For some model runs, learning processes give rise to anticorrelated representations, and for other model runs the model shows integration; this variance across model runs is attributable to random differences in the initial weight configuration of the model. The aggregate differentiation effect is therefore a function of the proportion of model runs showing differentiation (here, anticorrelation) and the proportion of model runs showing integration. The fact that differentiation shows up as anticorrelation in the model's hidden layer relates to the learning effects discussed earlier:

      Unique competitor units are sheared away from (formerly) shared units, so the competitor ends up not having any overlap with the target representation (i.e., the level of overlap is less than you would expect due to chance, which mathematically translates into anticorrelation). We return to this point and discuss how to test for anticorrelation in the Discussion section.”
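      The link between below-chance overlap and anticorrelation can be checked with binary activity patterns. For two patterns over n units with k active units each and c shared active units, the Pearson correlation is (n·c - k²) / (k·(n - k)), so it is zero at chance overlap (c = k²/n) and negative below it. The sketch below uses hypothetical sizes (100 hidden units, 10 active per item): disjoint patterns give r = -1/9, while one shared unit (the chance-expected overlap) gives r = 0.

```python
import numpy as np


def binary_pattern(n_units, active_idx):
    """Binary activity vector with ones at the given unit indices."""
    v = np.zeros(n_units)
    v[list(active_idx)] = 1.0
    return v


def pattern_correlation(a, b):
    """Pearson correlation between two activity patterns."""
    return np.corrcoef(a, b)[0, 1]


target = binary_pattern(100, range(0, 10))
competitor_disjoint = binary_pattern(100, range(10, 20))  # no shared units (below chance)
competitor_chance = binary_pattern(100, range(9, 19))     # one shared unit (chance level)
r_disjoint = pattern_correlation(target, competitor_disjoint)  # -1/9
r_chance = pattern_correlation(target, competitor_chance)      # 0
```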

      We added new text to the “Take-Home Lessons” section in the Chanales et al. (2021) simulation (p. 17):

      “In particular, the simulations expose some important boundary conditions for when representational change can occur according to the NMPH (e.g., that differentiation depends on a large learning rate, but integration does not), and the simulations provide a more nuanced account of exactly how representations change (e.g., that differentiation driven by the NMPH is always asymmetric, whereas integration is sometimes asymmetric and sometimes symmetric; and that, when differentiation occurs on a particular model run, it tends to give rise to anticorrelated representations in the model's hidden layer).”

      We added new text to the “Nature of Representational Change” section in the Favila et al. (2016) simulation (p. 21):

      “Figure 8 - Supplement 1 also indicates that, as in our simulation of Chanales et al. (2021), individual model runs where differentiation occurs show anticorrelation between the pairmate representations, and gradations in the aggregate level of differentiation that is observed across conditions reflect differences in the proportion of trials showing this anticorrelation effect.”

      We added new text to the “Take-Home Lessons” section in the Favila et al. (2016) simulation (p.21):

      “As in our simulation of Chanales et al. (2021), we found that the NMPH-mediated differentiation was asymmetric, manifested as anticorrelation between pairmate representations on individual model runs, and required a high learning rate, leading to abrupt representational change.”

      We added new text to the “Nature of Representational Change” section in the Schlichting et al. (2015) simulation (p. 26):

      “Also, as in our other simulations, when differentiation occurs on a particular model run it tends to give rise to anticorrelated representations (results not shown).”

      We added new text to the “Take-Home Lessons” section in the Schlichting et al. (2015) simulation (pp. 26-27):

      “As in the other versions of our model, differentiation requires a high learning rate, and – on model runs when it occurs – it is asymmetric and gives rise to anticorrelated representations.”

      We added new text at the start of the Discussion (p. 27):

      “In addition to qualitatively replicating the results from the studies we simulated, our model gives rise to several novel predictions – most notably, that differentiation driven by the NMPH requires a rapid learning rate and, when it occurs for a particular pair of items, it is asymmetric and gives rise to anticorrelated representations.”

      We also added a new section in the Discussion entitled “Testing the Model's Prediction about Anticorrelation”, which (among other things) highlights the reviewer’s point that fMRI pattern similarity values can be affected by preprocessing choices (p. 30):

      “Even though we operationally define differentiation as a reduction in similarity with learning, the way that it actually shows up on individual model runs is as anticorrelation between pairmates; in the model, the size of the aggregate differentiation effect is determined by the proportion of model runs that show this anticorrelation effect (vs. no change or integration). This implies that, if we could get a clean measurement of the similarity of pairmates in an experiment, we might see a multimodal distribution, with some pairmates showing anticorrelation, and others showing increased correlation (integration) or no change in similarity. This kind of clean readout of the similarity of individual pairs might be difficult to obtain with fMRI; it is more feasible that this could be obtained with electrophysiology. Another challenge with using fMRI to test this prediction is that anticorrelation at the individual-neuron level might not scale up to yield anticorrelation at the level of the BOLD response; also, fMRI pattern similarity values can be strongly affected by preprocessing choices – so a negative pattern similarity value does not necessarily reflect anticorrelation at the individual-neuron level. A final caveat is that, while we predict that differentiation will show up as anticorrelation in the brain region that gives rise to the differentiation effect, this might not translate into anticorrelation in areas that are downstream of this region (e.g., if the hippocampus is the source of the differentiation effect, we would expect anticorrelation there, but not necessarily in neocortical regions that receive input from the hippocampus; we revisit this point later in the discussion, when we address limitations and open questions).”

      We added new text in the Discussion, under “Limitations and Open Questions” (p. 31):

      “Importantly, while hippocampus can boost the representation of unique features in neocortex, we expect that neocortex will continue to represent shared perceptual features (e.g., in Favila et al., 2016, the fact that both pairmates are photos of barns). For this reason, in paradigms like the one used by Favila et al. (2016), the predicted effect of hippocampal differentiation on neocortical representations will be a reduction in pattern similarity (due to upregulation in the representation of unique pairmate features) but neocortex should not cross over into anticorrelation in these paradigms (due to its continued representation of shared perceptual features). Indeed, this is exactly the pattern that Wanjia et al. (2021) observed in their study, which used similar stimuli to those used in Favila et al. (2016).”

      Lastly, we updated the Abstract (p. 1):

      “What determines when neural representations of memories move together (integrate) or apart (differentiate)? Classic supervised learning models posit that, when two stimuli predict similar outcomes, their representations should integrate. However, these models have recently been challenged by studies showing that pairing two stimuli with a shared associate can sometimes cause differentiation, depending on the parameters of the study and the brain region being examined. Here, we provide a purely unsupervised neural network model that can explain these and other related findings. The model can exhibit integration or differentiation depending on the amount of activity allowed to spread to competitors – inactive memories are not modified, connections to moderately active competitors are weakened (leading to differentiation), and connections to highly active competitors are strengthened (leading to integration). The model also makes several novel predictions – most importantly, that when differentiation occurs as a result of this unsupervised learning mechanism, it will be rapid and asymmetric, and it will give rise to anticorrelated representations in the region of the brain that is the source of the differentiation. Overall, these modeling results provide a computational explanation for a diverse set of seemingly contradictory empirical findings in the memory literature, as well as new insights into the dynamics at play during learning.”

      (3) For the modeling of the Favila et al. study, the authors state that a high learning rate is required for differentiation of the same-face pairs. This made me wonder what happens in the low learning rate simulations. Does integration occur?

      For the same-face condition of the Favila simulation, lowering the learning rate does not result in an overall integration effect:

      Author response image 1.

      In other cases, we do see integration emerge at lower learning rates – e.g., in the Schlichting interleaved condition we see a small integration effect emerge for a learning rate value of 0.3:

      Author response image 2.

      Our view is that, while integration can emerge at low learning rates, it is not a reliable property of the model – in some cases, there is a “window” of learning rates where there is enough learning to drive integration but not enough to drive differentiation, and in other cases there is not. Given this lack of reliability across simulations, we would prefer not to discuss this in the paper.

      This paradigm has a lot of overlap with acquired equivalence, and so I am thinking about whether these are the sorts of small differences (e.g., same-category scenes and perhaps a high learning rate) that bias the system to differentiate instead of integrate.

      We agree that it would be very interesting to use the model to explore acquired equivalence and related phenomena, but we think it is out of scope of the current paper. We have added some text to the Discussion under “Limitations and Open Questions” (p. 32):

      “Another important future direction is to apply the model to a wider range of learning phenomena involving representational change – for example, acquired equivalence, which (like some of the studies modeled here) involves linking distinct stimuli to a shared associate (see, e.g., Honey and Hall, 1989; Shohamy and Wagner, 2008; Myers et al., 2003; Meeter et al., 2009; de Araujo Sanchez and Zeithamova, 2023). It is possible that some of these phenomena might be better explained by supervised learning, or a mixture of unsupervised and supervised learning, than by unsupervised learning alone.”

      (4) For the simulations of the Schlichting et al. study, the A and B items appear to have overlap in the hidden layer based on Figure 9, despite there being no similarity between the A and B items in the study (in contrast to Favila et al., in which they were similar kinds of scenes, and Chanales et al., in which they were similar colors). Why was this decision made? Do the effects depend on some overlap within the hidden layer? (This doesn't seem to be explained in the paper that I saw, though, so maybe it's just a visualization error?)

      Overlap in the pretrained hidden representations of A and B is not strictly necessary for these effects – it would be possible to reconfigure other parameters to get high levels of competition even if there were no overlap (e.g., by upregulating the strengths of connections from shared input features). Having said that, it is definitely true that overlap between the pretrained hidden representations boosts competition, and we think it is justified to posit this in the Schlichting simulation. We have now added an explanation for this in the paper (p. 23):

      New text in the Schlichting simulation, under “Knowledge Built into the Network”:

      “Matching the previous two simulations, we pretrained the weights so the hidden representations of the stimuli initially had 2/6 units in common. Even though the A and B stimuli used in the actual experiment did not have obvious feature overlap (they were randomly selected novel objects), it is important to note that the hidden layer is not simply a representation of the sensory features of the A and B stimuli; the hidden layer also receives input from the output layer, which represents the shared associate of A and B (X). We think that the presence of this shared associate justifies our use of initially-overlapping hidden representations.”

      (5) It seems as though there were no conditions under which the simulations produced differentiation in both the blocked and intermixed conditions, which Schlichting et al. observed in many regions (as the present authors note). Is there any way to reconcile this difference?

      We thank the reviewer for bringing this up. If we set the connection strength between X (in the output layer) and A (in the hidden layer) in the blocked condition to .9 instead of .999 (keeping this connection strength at .8 for the interleaved condition) and we set Osc to .0615, we observe differentiation in both conditions.

      Rather than replacing the original results in the paper, which would entail re-making the associated videos, etc., we have added a supplementary figure (Figure 10 - Supplement 1), which is included on p. 46.

      We also added the following to the Results section of the Schlichting simulation in the main text (p. 26):

      “Figure 10 - Supplement 1 shows results from an alternative parameterization where, in the low-oscillation-amplitude condition, differentiation is observed in both the blocked and interleaved conditions (mirroring results from Schlichting et al., 2015, who found differentiation in both conditions in several regions of interest, including parts of the hippocampus and medial prefrontal cortex).”

      (6) A general question about differentiation/repulsion and how it affects the hidden layer representation in the model: Is it the case that the representation is actually "shifted" or repelled over so it is no longer overlapping? Or do the shared connections just get pruned, such that the item that has more "movement" in representational space is represented by fewer units on the hidden layer (i.e., is reduced in size)? I think, if I understand correctly, that whether it gets shifted vs. reduced would depend on the strength of connections along the hidden layer, which would in turn depend on whether it represents some meaningful continuous dimension (like color) or not. But, if the connections within the hidden layer are relatively weak and it is the case that representations become reduced in size, would there be any anticipated consequences of this (e.g., cognitively/behaviorally)?

      The representations are shifted – this is discussed in the Chanales results section:

      “Because the activity ‘set point’ for the hidden layer (determined by the kWTA algorithm) involves having 6 units active, and the unique parts of the competitor only take up 4 of these 6 units, this leaves room for activity to spread to additional units. Given the topographic projections in the output layer, the model is biased to ‘pick up’ units that are adjacent in color space to the currently active units; because activity cannot flow easily from the competitor back to the target (as a result of the aforementioned severing of connections), it flows instead away from the target, activating two additional units, which are then incorporated into the competitor representation. This sequence of events (first a severing of the shared units, then a shift away from the target) completes the process of neural differentiation, and is what leads to the behavioral repulsion effect in color recall (because the center-of-mass of the color representation has now shifted away from the target).”
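      The severing-then-shifting sequence in this passage can be illustrated with a toy k-winners-take-all (kWTA) step in Python. This is a simplified sketch, not the model's implementation: the 14-unit layer, the halving of spillover with distance (a stand-in for the topographic output-layer projections), and the helper names `kwta` and `center_of_mass` are all hypothetical. Only the set point of 6 active units and the 2/6 overlap come from the text.

```python
def kwta(net_input, k):
    """Indices of the k units with the highest net input (ties go to the lower index)."""
    order = sorted(range(len(net_input)), key=lambda i: (-net_input[i], i))
    return sorted(order[:k])


def center_of_mass(units):
    """Mean position of the active units along the (color) dimension."""
    return sum(units) / len(units)


N_UNITS = 14                      # hypothetical hidden units laid out along color space
K = 6                             # kWTA set point: 6 active units, as in the text
target = list(range(0, 6))        # units 0-5
competitor = list(range(4, 10))   # units 4-9; units 4 and 5 are shared (2/6 overlap)

# After weakening, the competitor's links to the shared units are severed,
# leaving only its 4 unique units.
unique = [u for u in competitor if u not in target]

# Toy net input: unique units are fully driven; target units get no support
# (those connections were cut); other units get spillover that halves with
# each step of distance from the nearest unique unit.
net = []
for u in range(N_UNITS):
    if u in unique:
        net.append(1.0)
    elif u in target:
        net.append(0.0)
    else:
        dist = min(abs(u - v) for v in unique)
        net.append(0.5 ** dist)

new_competitor = kwta(net, K)
print("new competitor units:", new_competitor)
print("center of mass:", center_of_mass(target), center_of_mass(competitor),
      center_of_mass(new_competitor))
```

      With the target's units receiving no support, the two freed slots go to units 10 and 11 on the far side of the competitor, so its center of mass moves from 6.5 to 8.5 while the target stays at 2.5.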

      Reviewer #2 (Public Review):

      This paper addresses an important computational problem in learning and memory. Why do related memory representations sometimes become more similar to each other (integration) and sometimes more distinct (differentiation)? Classic supervised learning models predict that shared associations should cause memories to integrate, but these models have recently been challenged by empirical data showing that shared associations can sometimes cause differentiation. The authors have previously proposed that unsupervised learning may account for these unintuitive data. Here, they follow up on this idea by actually implementing an unsupervised neural network model that updates the connections between memories based on the amount of coactivity between them. The goal of the authors' paper is to assess whether such a model can account for recent empirical data at odds with supervised learning accounts. For each empirical finding they wish to explain, the authors built a neural network model with a very simple architecture (two input layers, one hidden layer, and one output layer) and with prewired stimulus representations and associations. On each trial, a stimulus is presented to the model, and inhibitory oscillations allow competing memories to pop up. Pre-specified u-shaped learning rules are used to update the weights in the model, such that low coactivity leaves model connections unchanged, moderate coactivity weakens connections, and high coactivity strengthens connections. In each of the three models, the authors manipulate stimulus similarity (following Chanales et al), shared vs distinct associations (following Favila et al), or learning strength (a stand-in for blocked versus interleaved learning schedule; following Schlichting et al) and evaluate how the model representations evolve over trials.

      As a proof of principle, the authors succeed in demonstrating that unsupervised learning with a simple u-shaped rule can produce qualitative results in line with the empirical reports. For instance, they show that pairing two stimuli with a common associate (as in Favila et al) can lead to *differentiation* of the model representations. Demonstrating these effects isn't trivial and a formal modeling framework for doing so is a valuable contribution. Overall, the authors do a good job of both formally describing their model and giving readers a high level sense of how their critical model components work, though there are some places where the robustness of the model to different parameter choices is unclear. In some cases, the authors are very clear about this (e.g. the fast learning rate required to observe differentiation). However, in other instances, the paper would be strengthened by a clearer reporting of the critical parameter ranges.

      We thank the reviewer for raising this point. The interdependence of parameters in our model makes it infeasible to identify critical parameter ranges. We have added a paragraph to the “Approach to Parameterization and Data Fitting” section in the Methods to address this point (p. 33):

      “The overall goal of this modeling work is to account for key empirical regularities regarding differentiation and integration and to establish boundary conditions on these regularities. As such, the modeling work described below focuses more on qualitative fits to general properties of the data space than on quantitative fits to results from specific studies. Automatic parameter optimization is not feasible for this kind of model, given the large number of model parameters and the highly interactive, nonlinear nature of competitive dynamics in the model; consequently, model fitting was done by hand.

      These complex interactions between parameters also make it infeasible to list “critical parameter ranges” for generating particular model outcomes. Our experience in working with the model has been that activation dynamics are what matter most for learning, and that disparate parameter sets can give rise to the same activation dynamics and – through this – the same learning effects; likewise, similar parameter sets can give rise to different activation dynamics and different learning outcomes. Consequently, in this paper we have focused on characterizing the dynamics that give rise to different learning effects (and how they can be affected by local parameter perturbations, e.g., relating to learning rate and oscillation size), rather than the – impossible, we believe – task of enumerating the full set of parameter configurations that give rise to a particular result.”

      For instance, it's clear from the manipulation of oscillation strength in the model of Schlichting et al that this parameter can dramatically change the direction of the results. The authors do report the oscillation strength parameter values that they used in the other two models, but it is not clear how sensitive these models are to small changes in this value.

      In some cases, the effects of oscillation strength are relatively smooth. For example, in the Favila simulation, increasing the oscillation amplitude Osc effectively recapitulates the U-shaped curve (i.e., higher levels of Osc lead to more competitor activation, which initially leads to weakening / differentiation but then gives way to strengthening / integration), as is shown for the Favila Different Face condition in this plot:

      Author response image 3.
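      The link between oscillation amplitude and competitor activation can be sketched as follows. The sketch assumes a sinusoidal oscillation of inhibition around a fixed baseline, a rectified-linear activation capped at 1, and arbitrary values for the competitor's net input, baseline inhibition, and gain; none of these are the model's actual settings, and the function names are hypothetical.

```python
import math


def competitor_activation(net_input, inhibition, gain=4.0):
    """Hypothetical rectified-linear activation, capped at 1."""
    return min(1.0, max(0.0, gain * (net_input - inhibition)))


def peak_competitor_activation(osc, net_input=0.45, base_inhibition=0.5, steps=100):
    """Peak competitor activation over one cycle of an inhibitory oscillation.

    Inhibition swings above and below its baseline by `osc`; the competitor,
    whose net input sits just below baseline inhibition, can only become
    active during the low-inhibition phase of the cycle.
    """
    peak = 0.0
    for t in range(steps):
        inhibition = base_inhibition + osc * math.sin(2 * math.pi * t / steps)
        peak = max(peak, competitor_activation(net_input, inhibition))
    return peak


for osc in (0.0, 0.05, 0.1, 0.2, 0.4):
    print(f"Osc={osc:.2f}: peak competitor activation={peak_competitor_activation(osc):.2f}")
```

      With these values the peak rises from 0 (Osc of 0 or 0.05) through 0.2 and 0.6 to saturation at 1.0 (Osc of 0.4). Read against the U-shaped rule, and assuming the target stays strongly active, zero peaks leave weights unchanged, intermediate peaks fall in the weakening range, and the highest peaks reach the strengthening range, which is the progression described above for the Favila Different Face condition.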

      In the Chanales 2/6 overlap condition, the effects of varying Osc are more nonlinear:

      Author response image 4.

      We think this is attributable to the increased “all-or-none” recurrent dynamics in this simulation (due to the recurrent projections within the output layer), which make it more difficult to evoke moderate (vs. high) levels of activation. This difficulty in reliably obtaining graded activation dynamics is likely a consequence of the small-scale (“toy”) nature of the model and the simple inhibitory mechanisms employed here, as opposed to being a generalizable property of the brain – presumably, the actual brain employs more nuanced and effective means of controlling activation. Furthermore, we don’t think that the high prevalence of integration in the model’s parameter space necessarily translates into a prediction that integration should be more prevalent overall – see the new “Reconciling the Prevalence of Differentiation in the Model and in the Data” section described in response to one of the reviewer’s other points below. Because the paper is already quite long, we have opted not to include the above plots / discussion in the paper.

      Similarly, it's not clear whether the 2/6 hidden layer overlap (only explicitly manipulated in the model of Chanales et al) is required for the other two models to work.

      When we were parameterizing the model, we opted to keep the 2/6 level of overlap for all of the simulations and we adjusted other parameters to fit the data; in part, this was because overlap can only be adjusted in discrete jumps, whereas other influential parameters in the model can be adjusted in a more graded, real-valued way. Our use of 2/6 overlap (as opposed to, say, 1/6 or 3/6 overlap) for the Favila and Schlichting models was done out of convenience, and should not be interpreted as a strong statement that this particular level of overlap is necessary for obtaining differentiation; we could easily get the model to show differentiation given other overlap levels by adjusting other parameters.

      Finally, though the u-shaped learning rule is essential to this framework, the paper does little formal investigation of this learning rule. It seems obvious that allowing the u-shape to collapse too much toward a horizontal line would reduce the model's ability to account for empirical results, but there may be other more interesting features of the learning rule parameterization that are essential for the model to function properly.

      Given that the paper is already quite long, we have opted not to include further exploration of the parameters of the U-shaped learning rule in the paper. However, for the reviewer’s information, we report the effects of a few illustrative manipulations of these parameters below. As a general principle, the effects of these manipulations make sense in light of the theoretical framework described in the paper.

      For example, the parameter “DRevMag” controls the size of the negative “dip” in the U-shaped curve (more negative values = a larger dip). Given that this negative dip is essential for severing weights to competitors and causing differentiation, shifting DRevMag upwards towards zero should shift the balance of the model away from differentiation and towards integration. This is indeed what we observe, as shown in this parameter sweep from the Chanales simulation:

      Author response image 5.

      As another example: The “DRev” parameter controls where the U-shaped curve transitions from negative weight change to positive weight change. Lower values of DRev mean that the region of coactivity values leading to negative weight change will be smaller, and the region of coactivity values leading to positive weight change will be larger. As such, we would expect that lower values of DRev would bias the model toward integration. That is indeed the case, as shown in this parameter sweep from the Schlichting Blocked simulation:

      Author response image 6.
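      One way to picture how DRevMag and DRev reshape the learning rule is a piecewise-linear U-shaped function, sketched below in Python. The piecewise-linear form, the extra breakpoints `d_thr` and `d_dip`, the positive ceiling `d_max`, and every default value are assumptions made for illustration; only the roles of DRev (where weight change crosses from negative to positive) and DRevMag (how deep the negative dip goes) come from the text, and the paper's Methods give the actual specification.

```python
def nmph(coactivity, d_thr=0.2, d_dip=0.4, d_rev=0.6, d_rev_mag=-0.5, d_max=1.0):
    """Piecewise-linear U-shaped (nonmonotonic) weight-change function.

    Only d_rev (zero crossing, cf. DRev) and d_rev_mag (depth of the dip,
    cf. DRevMag) mirror parameters named in the text; d_thr, d_dip, d_max,
    and all default values are hypothetical.
    """
    if not 0.0 <= d_thr < d_dip < d_rev < 1.0:
        raise ValueError("breakpoints must satisfy 0 <= d_thr < d_dip < d_rev < 1")
    c = coactivity
    if c <= d_thr:                                  # low coactivity: no change
        return 0.0
    if c <= d_dip:                                  # falling limb: down to the dip
        return d_rev_mag * (c - d_thr) / (d_dip - d_thr)
    if c <= d_rev:                                  # rising limb: back to zero at DRev
        return d_rev_mag * (d_rev - c) / (d_rev - d_dip)
    return d_max * (c - d_rev) / (1.0 - d_rev)      # high coactivity: strengthening


# Moderate coactivity under the default (hypothetical) settings falls in the dip.
print(nmph(0.5))                   # negative: weakening
print(nmph(0.5, d_rev_mag=-0.1))   # shallower dip (DRevMag nearer zero): less weakening
print(nmph(0.5, d_rev=0.45))       # lower DRev: the same coactivity now strengthens
```

      Moving DRevMag toward zero shrinks the weakening that moderate coactivity produces, and lowering DRev converts part of the weakening range into strengthening; both changes push the balance from differentiation toward integration, in line with the two sweeps described above.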

      There are a few other points that may limit the model's ability to clearly map onto or make predictions about empirical data. The model(s) seem very keen to integrate, and to do so more completely than the available empirical data suggest. For instance, there is a complete collapse of representations in half of the simulations in the Chanales et al model, and the blocked simulation in the Schlichting et al model also seems to produce nearly complete integration. Even if the Chanales et al paper had observed some modest behavioral attraction effects, this model would seem to over-predict integration. The authors somewhat implicitly acknowledge this when they discuss the difficulty of producing differentiation ("Practical Advice for Getting the Model to Show Differentiation") and not of producing integration, but don't address it head on.

      We thank the reviewer for this comment – R1 had a similar comment. We have added a new section to the Discussion to address this point (p. 30):

      “Reconciling the Prevalence of Differentiation in the Model and in the Data.

      A key lesson from our model is that, from a computational perspective, it is challenging to obtain differentiation effects: The region of parameter space that gives rise to differentiation is much smaller than the one that gives rise to integration (for further discussion of this issue, see the section in Methods on Practical Advice for Getting the Model to Show Differentiation). However, the fact that integration is more prevalent in our simulations across parameter configurations does not mean that integration will be more prevalent than differentiation in real-life circumstances. What really matters in predicting the prevalence of differentiation in real life is how the parameters of the brain map on to parameters of the model: If the parameters of the brain align with regions of model parameter space that give rise to differentiation (even if these regions are small), this would explain why differentiation has been so robustly observed in extant studies. Indeed, this is exactly the case that we sought to make above about the hippocampus – i.e., that its use of especially sparse coding and a high learning rate will give rise to the kinds of neural dynamics that cause differentiation (as opposed to integration). As another example, while it is true that half of the overlap conditions in our simulation of Chanales et al. (2021) give rise to integration, this does not imply that integration will occur half of the time in the Chanales et al. (2021) study; it may be that the levels of overlap that are actually observed in the brain in Chanales et al. (2021) are more in line with the levels of overlap that give rise to differentiation in our model.”

      Second, the authors' choice of strongly prewiring associations in the Chanales and Favila models makes it difficult to think about how their model maps onto experimental contexts where competition is presumably occurring while associations are only weakly learned. In the Chanales et al paper, for example, the object-face associations are not well learned in initial rounds of the color memory test. While the authors do justify their modeling choice and their reasons have merit, the manipulation of AX association strength in the Schlichting et al model also makes it clear that the association strength has a substantial effect on the model output. Given the effect of this manipulation, more clarity around this assumption for the other two models is needed.

      We thank the reviewer for bringing this up. We have edited the section entitled “A Note on Prewiring Representations” in the Methods to further justify our choice to prewire associations in the Chanales and Favila models (p. 37):

      “In our model, our practice of ‘prewiring’ memory representations for the A and B pairmates serves two functions. In some cases, it is meant to stand in for actual training (as in the blocked / interleaved manipulation; the connections supporting the AX association are prewired to be stronger in the blocked condition than in the interleaved condition). However, the other, more fundamental role of prewiring is to ensure that the A and B input patterns evoke sparse distributed representations in the hidden layer (i.e., where some units are strongly active but most other units are inactive). In the real brain, this happens automatically because the weight landscape has been extensively sculpted by both experience and evolution. For example, in the real hippocampus, when the second pairmate is presented for the first time, it will evoke a sparse distributed representation in the CA3 subfield (potentially overlapping with the first pairmate’s CA3 representation) even before any learning of the second pairmate has occurred, due to the strong, sparse mossy fiber projections that connect the dentate gyrus to CA3 (McNaughton & Morris, 1987). As discussed above, we hypothesize that this initial, partial overlap between the second pairmate’s representation and the first pairmate’s representation can lead to pop-up of the unique features of the first pairmate’s representation, triggering learning that leads to differentiation or integration. In our small-scale model, we are effectively starting with a ‘blank brain’; in the absence of prewiring, the A and B inputs would activate overly diffuse representations that do not support these kinds of competitive dynamics. As such, prewiring in our model is necessary for proper functioning.

      The presence of prewired A and B representations should therefore not be interpreted as reflecting a particular training history (except in the blocked / interleaved case above); rather, these prewired representations constitute the minimum step we would take to ensure well-defined competitive dynamics in our small-scale model.

      The fact that connection strengths serve this dual function – sometimes reflecting effects of training (as in our simulation of Schlichting et al., 2015) and in other cases reflecting necessary prewiring – complicates the interpretation of these strength values in the model. Our view is that this is a necessary limitation of our simplified modeling approach – one that can eventually be surmounted through the use of more biologically-detailed architectures (see Limitations and Open Questions in the Discussion).”

      Overall, this is strong and clearly described work that is likely to have a positive impact on computational and empirical work in learning and memory. While the authors have written about some of the ideas discussed in this paper previously, a fully implemented and openly available model is a clear advance that will benefit the field. It is not easy to translate a high-level description of a learning rule into a model that actually runs and behaves as expected. The fact that the authors have made all their code available makes it likely that other researchers will extend the model in numerous interesting ways, many of which the authors have discussed and highlighted in their paper.

      Reviewer #3 (Public Review):

      This paper proposes a computational account for the phenomenon of pattern differentiation (i.e., items having distinct neural representations when they are similar). The computational model relies on a learning mechanism of the nonmonotonic plasticity hypothesis, fast learning rate and inhibitory oscillations. The relatively simple architecture of the model makes its dynamics accessible to the human mind. Furthermore, using similar model parameters, this model produces simulated data consistent with empirical data of pattern differentiation. The authors also provide insightful discussion on the factors contributing to differentiation as opposed to integration. The authors may consider the following to further strengthen this paper:

      The model compares different levels of overlap at the hidden layer and reveals that partial overlap seems necessary to lead to differentiation. While I understand this approach from the perspective of modeling, I have concerns about whether this is how the human brain achieves differentiation. Specifically, if we view the hidden layer activation as a conjunctive representation of a pair that is the outcome of encoding, differentiation should precede the formation of the hidden layer activation pattern of the second pairmate. Instead, the model assumes such a pattern already exists before differentiation. Maybe the authors indeed argue that mechanistically differentiation follows initial encoding that does not consider similarity with other memory traces?

      Related to the point above, because the simulation setup is different from how differentiation actually occurs, I wonder how valid the prediction of asymmetric reconfiguration of hidden layer connectivity pattern is.

      We thank the reviewer for this comment. In the revised manuscript, we have edited the “Note on Prewiring Representations” in the Methods to clarify how our assumptions about prewiring relate to what we really think is happening in the brain (p. 37):

      “In our model, our practice of ‘prewiring’ memory representations for the A and B pairmates serves two functions. In some cases, it is meant to stand in for actual training (as in the blocked / interleaved manipulation; the connections supporting the AX association are prewired to be stronger in the blocked condition than in the interleaved condition). However, the other, more fundamental role of prewiring is to ensure that the A and B input patterns evoke sparse distributed representations in the hidden layer (i.e., where some units are strongly active but most other units are inactive). In the real brain, this happens automatically because the weight landscape has been extensively sculpted by both experience and evolution. For example, in the real hippocampus, when the second pairmate is presented for the first time, it will evoke a sparse distributed representation in the CA3 subfield (potentially overlapping with the first pairmate’s CA3 representation) even before any learning of the second pairmate has occurred, due to the strong, sparse mossy fiber projections that connect the dentate gyrus to CA3 (McNaughton & Morris, 1987). As discussed above, we hypothesize that this initial, partial overlap between the second pairmate’s representation and the first pairmate’s representation can lead to pop-up of the unique features of the first pairmate’s representation, triggering learning that leads to differentiation or integration. In our small-scale model, we are effectively starting with a ‘blank brain’; in the absence of prewiring, the A and B inputs would activate overly diffuse representations that do not support these kinds of competitive dynamics. As such, prewiring in our model is necessary for proper functioning.

      The presence of prewired A and B representations should therefore not be interpreted as reflecting a particular training history (except in the blocked / interleaved case above); rather, these prewired representations constitute the minimum step we would take to ensure well-defined competitive dynamics in our small-scale model.

      The fact that connection strengths serve this dual function – sometimes reflecting effects of training (as in our simulation of Schlichting et al., 2015) and in other cases reflecting necessary prewiring – complicates the interpretation of these strength values in the model. Our view is that this is a necessary limitation of our simplified modeling approach – one that can eventually be surmounted through the use of more biologically-detailed architectures (see Limitations and Open Questions in the Discussion).”

      Although, as the authors mentioned, there haven't been formal empirical tests of the relationship between learning speed and differentiation/integration, I am also wondering to what degree the prediction of fast learning being necessary for differentiation is consistent with current data. According to Figure 6, the learning rates that lead to differentiation in the 2/6 condition achieved differentiation after just one shot most of the time. By contrast, Guo et al (2021) showed that humans may need a few blocks of training and test to start showing differentiation.

      We thank the reviewer for mentioning this. We have added a paragraph to the “Differentiation Requires a High Learning Rate and Is Sensitive to Activity Dynamics” section of the Discussion that addresses this point (pp. 28-29):

      “Although the results from Wanjia et al. (2021) provide strong support for the model's prediction that differentiation will be abrupt, they raise another question: What explains variance across items in when this abrupt change takes place? The answer to this question remains to be seen, but one possibility is encoding variability: If we assume that participants stochastically sample (i.e., attend to) the features of the scene pairmates, it is possible that participants might initially fail to sample the features that distinguish the scene pairmates, which can be quite subtle – and if the distinguishing features of the pairmates are not represented in high-level visual regions (i.e., the pairmates are represented in these regions as having the same features), this could delay the onset of differentiation until the point at which the distinguishing features happen (by chance) to be sampled.”
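      The encoding-variability idea can be expressed as a small simulation. Assume, purely for illustration, that on each trial a participant attends to the subtle distinguishing feature with a fixed, independent probability, and that differentiation switches on abruptly on the first trial in which that feature is sampled. Onset then follows a geometric distribution, so each item changes abruptly but at a different point. The probability 0.3, the item count, and the function name `onset_trial` are hypothetical.

```python
import random


def onset_trial(p_sample, rng, max_trials=1000):
    """First trial on which the distinguishing feature is sampled (hypothetical model)."""
    for trial in range(1, max_trials + 1):
        if rng.random() < p_sample:
            return trial
    return None  # feature never sampled within max_trials


rng = random.Random(0)   # seeded for reproducibility
P_SAMPLE = 0.3           # hypothetical per-trial probability of attending to the subtle feature
N_ITEMS = 20000          # hypothetical number of simulated pairmate items

onsets = [onset_trial(P_SAMPLE, rng) for _ in range(N_ITEMS)]
mean_onset = sum(onsets) / len(onsets)
frac_first = sum(o == 1 for o in onsets) / len(onsets)
print(f"mean onset: {mean_onset:.2f} trials (geometric expectation 1/p = {1 / P_SAMPLE:.2f})")
print(f"fraction of items differentiating on trial 1: {frac_first:.3f}")
```

      Under these assumptions the mean onset is close to 1/p (about 3.3 trials), and roughly 30% of items differentiate on the first trial while the rest are spread over later trials.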

      Related to the point above, the high-learning-rate prediction also seems to be at odds with the finding in Wammes et al. (2022) that the cortex, which learns slowly (according to the theory of complementary learning systems), also shows differentiation.

      We now address this point in the section of the Discussion entitled “Differentiation Requires a High Learning Rate and Is Sensitive to Activity Dynamics” (p. 27):

      “Our finding that differentiation requires a high learning rate suggests that differentiation will be more evident in the hippocampus than in neocortex, insofar as hippocampus is thought to have a higher learning rate than neocortex (McClelland et al., 1995). In keeping with this prediction, numerous studies have found differentiation effects in hippocampus but not in neocortical regions involved in sensory processing (e.g., Chanales et al., 2017; Favila et al., 2016; Zeithamova et al., 2018). At the same time, some studies have found differentiation effects in neocortex (e.g., Schlichting et al., 2015; Wammes et al., 2022). One possible explanation of these neocortical differentiation effects is that they are being “propped up” by top-down feedback from differentiated representations in the hippocampus.”

      More details about the learning dynamics would be helpful. For example, equation(s) showing how activation, learning rate and the NMPH function work together to change the weight of connections may be added. Without the information, it is unclear how each connection changes its value after each time point.

      We thank the reviewer for this comment. We have made two major changes to address this concern. First, we have edited the “Learning” section within “Basic Network Properties” in the main text (pp. 6-7):

      “Connection strengths in the model between pairs of connected units x and y were adjusted at the end of each trial (i.e., after each stimulus presentation) as a U-shaped function of the coactivity of x and y, defined as the product of their activations on that trial. The parameters of the U-shaped learning function relating coactivity to change in connection strength (i.e., weakening / strengthening) were specified differently for each projection where learning occurs (bidirectionally between the input and hidden layers, the hidden layer to itself, and the hidden to output layer). Once the U-shaped learning function for each projection in each version of the model was specified, we did not change it for any of the various conditions. Details of how we computed coactivity and how we specified the U-shaped function can be found in the Methods section.”
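The learning rule described in this passage can be illustrated with a short sketch. It is a hypothetical approximation, not the authors' implementation: the breakpoints (`thresh_low`, `thresh_mid`, `thresh_cross`, `thresh_high`), the peak weakening and strengthening magnitudes, the learning rate, and the [0, 1] weight bounds are placeholder values chosen only to give a U shape (no change at low coactivity, weakening at moderate coactivity, strengthening at high coactivity). The actual per-projection parameters are specified in the Methods.

```python
def coactivity(act_x: float, act_y: float) -> float:
    """Coactivity of two connected units: the product of their activations on a trial."""
    return act_x * act_y


def nmph_dw(co: float,
            thresh_low: float = 0.2,      # hypothetical: below this, no change
            thresh_mid: float = 0.4,      # hypothetical: point of maximal weakening
            thresh_cross: float = 0.6,    # hypothetical: crossover from weakening to strengthening
            thresh_high: float = 0.8,     # hypothetical: strengthening saturates here
            max_weaken: float = -0.1,     # hypothetical peak weakening
            max_strengthen: float = 0.1   # hypothetical peak strengthening
            ) -> float:
    """Piecewise-linear U-shaped (nonmonotonic) mapping from coactivity to weight change."""
    if co <= thresh_low:
        return 0.0
    if co <= thresh_mid:
        # Descend linearly from 0 to max_weaken.
        return max_weaken * (co - thresh_low) / (thresh_mid - thresh_low)
    if co <= thresh_cross:
        # Climb back linearly from max_weaken to 0.
        return max_weaken * (thresh_cross - co) / (thresh_cross - thresh_mid)
    if co <= thresh_high:
        # Rise linearly from 0 to max_strengthen.
        return max_strengthen * (co - thresh_cross) / (thresh_high - thresh_cross)
    return max_strengthen


def update_weight(w: float, act_x: float, act_y: float,
                  lrate: float = 1.0, w_min: float = 0.0, w_max: float = 1.0) -> float:
    """End-of-trial update: scale the NMPH change by a learning rate and clip to bounds."""
    dw = lrate * nmph_dw(coactivity(act_x, act_y))
    return min(w_max, max(w_min, w + dw))


if __name__ == "__main__":
    for a in (0.3, 0.6, 0.95):
        co = coactivity(a, a)
        print(f"activations {a}: coactivity {co:.2f}, dw {nmph_dw(co):+.3f}")
```

Under these placeholder values, two units at activation 0.3 leave their connection unchanged, two at 0.6 weaken it, and two at 0.95 strengthen it; raising `lrate` scales each change, which is how a higher learning rate makes a single trial more consequential.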

      Second, we have added the requested equations to the “Learning” part of the Methods (pp. 37-38):

      On the right side of the function, strong activation leads to strengthening of the connection, which I assume will lead to stronger activation at the next time point. The model has an upper limit on connection strength to prevent connections from strengthening too much. The same idea could be applied to the left side of the function: instead of having two turning points, it could be a linear function such that low activation keeps weakening the connection until the lower limit is reached. This way the NMPH function could take a simpler form (e.g., two line segments, if you think weakening and strengthening occur at different rates) and might still simulate the data.
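To make the reviewer's suggested alternative concrete, here is a hedged sketch of a two-segment rule: coactivity below a crossover point weakens the connection at one rate, coactivity above it strengthens the connection at another rate, and weights are clipped to fixed bounds. The crossover value, both slopes, and the [0, 1] bounds are hypothetical. The behavioral difference from the U-shaped rule is that there is no low-coactivity region that leaves a connection unchanged, so connections between units that are only weakly coactive keep drifting toward the lower bound.

```python
def two_segment_dw(co: float, cross: float = 0.5,
                   weaken_slope: float = 0.2, strengthen_slope: float = 0.4) -> float:
    """Two line segments meeting at `cross`: negative change below it, positive above.

    All parameter values are hypothetical placeholders.
    """
    slope = weaken_slope if co < cross else strengthen_slope
    return slope * (co - cross)


def run_trials(w: float, co: float, n_trials: int,
               w_min: float = 0.0, w_max: float = 1.0) -> float:
    """Apply the same coactivity on every trial, clipping the weight to [w_min, w_max]."""
    for _ in range(n_trials):
        w = min(w_max, max(w_min, w + two_segment_dw(co)))
    return w


if __name__ == "__main__":
    # Even very weak coactivity (0.05) keeps weakening the connection under this rule,
    # whereas a U-shaped rule with a no-change region would leave it untouched.
    print(run_trials(0.5, co=0.05, n_trials=20))
    print(run_trials(0.5, co=0.9, n_trials=20))
```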

      We thank the reviewer for mentioning this. We have added a new paragraph in the “Learning” section of the Methods to justify the particular shape of the learning curve (pp. 38-39):

      “Evidence for the U-shaped plasticity function used here (where low activation leads to no change, moderate activation leads to weakening, and higher levels of activation lead to strengthening) was previously reviewed in Ritvo et al. (2019). In brief, there are three lines of work that support the U shape: First, multiple neurophysiological studies have found that moderate postsynaptic depolarization leads to synaptic weakening and higher levels of depolarization lead to synaptic strengthening (e.g., Artola et al., 1990; Hansel et al., 1996). Second, human neuroscience studies have used pattern classifiers, applied to fMRI and EEG data, to measure memory activation, and have related this measure to subsequent memory accessibility; several studies using this approach have found that low levels of activation lead to no change in memory strength, moderate levels of activation lead to impaired subsequent memory, and higher levels of activation lead to increased subsequent memory (e.g., Newman and Norman, 2010; Detre et al., 2013; Kim et al., 2014; for related findings, see Lewis-Peacock and Norman, 2014; Wang et al., 2019). Third, a recent human fMRI study by Wammes et al. (2022) manipulated memory activation by varying the visual similarity of pairmates and observed a U-shaped function relating visual similarity to representational change in the hippocampus, whereby low levels of pairmate similarity were associated with no change, moderate levels of similarity were associated with differentiation, and the differentiation effect went away at higher levels of similarity.”

      We have also included a pointer to this new paragraph in the “Nonmonotonic Plasticity Hypothesis” section of Introduction (p. 2):

      “(for further discussion of the empirical justification for the NMPH, see the Learning subsection in the Methods)”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      A few additional minor things about data presentation and the like:

      (1) Figure 1 legend - a more general description of how to interpret the figure might be helpful for more naive readers (e.g., explaining how one can see in the schematic that there is overlap in the hidden layer between A and B). Also, from the Figure 1 depiction, it's not clear what is different about the setup in the initial left-hand panels of A, B, and C that makes activity spread strongly to A in panel A, weakly in panel B, and not at all in panel C, since the weights are the same. Is there a way to incorporate this into the graphic, or describe it in words?

      To address this point, we have added the following text to the Figure 1 caption (p. 3):

      “Note that the figure illustrates the consequences of differences in competitor activation for learning, without explaining why these differences would arise. For discussion of circumstances that could lead to varying levels of competitor activation, see the simulations described in the text.”

      (2) I believe not all of the papers cited on lines 193-195 actually have similarity manipulations in them. I'd recommend double-checking this list and removing those less relevant to the statement.

      Thank you for pointing this out; we have removed the Ballard reference and we have clarified what we mean by similarity reversal (p. 7):

      “The study was inspired by recent neuroimaging studies showing “similarity reversals”, wherein stimuli that have more features in common (or share a common associate) show less hippocampal pattern similarity (Favila et al., 2016; Schlichting et al., 2015; Molitor et al., 2021; Chanales et al., 2017; Dimsdale-Zucker et al., 2018; Wanjia et al., 2021; Zeithamova et al., 2018; Jiang et al., 2020; Wammes et al., 2022).”

      (3) I wanted a bit more detail about how the parameters were set in the main paper, not just in the methods. Even something as brief as noting that model fitting was done by hand by tweaking parameters to re-create the empirical patterns (if I'm understanding correctly) would have been helpful for me.

      To address this point, we have added the following text under “Basic Network Properties” (p. 4):

      “Our goal was to qualitatively fit key patterns of results from each of the aforementioned studies. We fit the parameters of the model by hand as they are highly interdependent (see the Methods section for more details).”

      (4) In Figure 4E, it would be helpful to describe the x and y axes of the MDS plots in the legend.

      To address this point, we have added the following new text to the Figure 4 caption that clarifies how the MDS plots were generated (p. 11):

      “MDS plots were rotated, shifted, and scaled such that pairmate 1 (before) is located at (0,0), pairmate 2 (before) is located directly to the right of pairmate 1 (before), and the distance between pairmate 1 (before) and pairmate 2 (before) is proportional to the baseline distance between the pairmates.”
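The alignment described in this caption can be sketched with complex arithmetic, which combines the shift, rotation, and uniform scaling into a single operation. This is an illustrative reconstruction, not the authors' plotting code: the function name `align_mds`, the point labels, and the choice to pass the target baseline distance explicitly (any constant of proportionality can be folded into it) are assumptions, and reflections, which MDS solutions can also differ by, are not handled.

```python
def align_mds(points, p1_before, p2_before, baseline_distance):
    """Place p1_before at (0, 0) and p2_before on the positive x-axis at baseline_distance.

    `points` maps names to (x, y) MDS coordinates; both anchor names must be keys.
    The same shift, rotation, and uniform scaling are applied to every point, so the
    relative geometry of the configuration is preserved (no reflection is applied).
    """
    z = {name: complex(x, y) for name, (x, y) in points.items()}
    origin = z[p1_before]
    direction = z[p2_before] - origin
    if direction == 0:
        raise ValueError("anchor points coincide; alignment is undefined")
    # Dividing by `direction` rotates p2_before onto the positive real axis at unit
    # distance; multiplying by baseline_distance then sets the anchor separation.
    factor = baseline_distance / direction
    aligned = {name: (v - origin) * factor for name, v in z.items()}
    return {name: (v.real, v.imag) for name, v in aligned.items()}


if __name__ == "__main__":
    coords = {"p1_before": (2.0, 1.0), "p2_before": (2.0, 3.0), "p1_after": (1.0, 2.0)}
    print(align_mds(coords, "p1_before", "p2_before", baseline_distance=0.5))
```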

      (5) Figure 6 - at first I thought the thicker line was some sort of baseline, but I think it is just many traces on top of one another. If other readers may be similarly confused, perhaps this could be stated.

      Thanks for this comment. We have updated Figure 6 (p. 16).

      We have also updated the caption.

      (6) I am having a lot of difficulty understanding the terms "competitor-to-competitor," "competitor-to-target/shared," and "target/shared-to-target/shared," and therefore I don't fully get Figure 5. I think it might be helpful to expand the description of these terms where they are first introduced in the paper (p. 13?). I think I am missing something crucial here, and I am not quite sure what that is, which I know is not very helpful! But, to narrate my confusion a bit, I thought that these terms would somehow relate to connections between different parts of the network. For example, is competitor-to-competitor within the hidden layer? Or is this somehow combining across relevant connections that might span different pairs of layers in the model? And I really have no idea why it is "target/shared."

      Thank you for these comments. We have updated Figure 5 and we have also made several changes to the main text and the figure caption to address these points.

      Changes to the main text (p. 13):

      “Whether symmetric or asymmetric integration occurs depends on the relative strengths of connections between pairs of unique competitor units (competitor-competitor connections) compared to connections between unique competitor units and shared units (competitor-shared connections) after the first trial (Figure 5; note that the figure focuses on connections between hidden units, but the principle also applies to connections that span across layers). Generally, coactivity between unique competitor units (competitor-competitor coactivity) is less than coactivity between unique competitor units and shared units (competitor-shared coactivity), which is less than coactivity between unique target units and shared units (target-shared coactivity).”
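A toy calculation shows why this ordering arises when coactivity is defined as the product of two units' activations. The activation values below are hypothetical, chosen only to mirror the qualitative situation described in the text (target and shared units strongly active, unique competitor units only moderately active); they are not output from the simulations. A pair of two moderately active units gives the smallest product, a pair with one moderately active unit gives an intermediate product, and a pair of two strongly active units gives the largest.

```python
from itertools import combinations, product

# Hypothetical first-trial activations (not model output): target and shared units
# are strongly active; unique competitor units are only moderately active.
acts = {
    "target": [0.95, 0.90],
    "shared": [0.90, 0.85],
    "competitor": [0.45, 0.35],
}


def mean(xs):
    return sum(xs) / len(xs)


def mean_coactivity(group_a, group_b):
    """Mean product of activations over all distinct unit pairs drawn from the two groups."""
    if group_a == group_b:
        pairs = combinations(acts[group_a], 2)
    else:
        pairs = product(acts[group_a], acts[group_b])
    return mean([x * y for x, y in pairs])


comp_comp = mean_coactivity("competitor", "competitor")
comp_shared = mean_coactivity("competitor", "shared")
target_shared = mean_coactivity("target", "shared")
print(f"competitor-competitor {comp_comp:.3f} < competitor-shared {comp_shared:.3f} "
      f"< target-shared {target_shared:.3f}")
```

With a U-shaped learning rule, these three bands of coactivity can fall in different regions of the curve, which is why the relative strengths of competitor-competitor and competitor-shared connections after the first trial matter for the outcome.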

      (7) Relatedly, in Figure 13, I understand how some competitor-to-target/shared connections could be spared in the bottom instance given panel B. However, I'm struggling to understand how that relates to the values in the corresponding chart in panel A. What about panel A, bottom (vs. the top) means lower coactivities for some competitor-to-target/shared pairs? Is it because, if the noise level is higher, the "true" activation of competitor-to-target/shared connections is weaker? I think, again, I'm missing something critical here, and I wonder if other readers may be in the same situation. (I know the authors described this also on p. 36, but I'm still confused!)

      We have updated Figure 13 to clarify these points.

      (8) In Figure 9, I believe there is no caption for panel D. Also, it looks as though the item unit active for A and B is the same. I wonder if this is an error?

      Thank you for catching these errors! They have both been fixed.

      Reviewer #2 (Recommendations For The Authors):

      -Perhaps I missed it, but I think defining coactivity (how it is computed) in the main text would be useful for readers, as this is critical for understanding the model. I did find it in the methods.

      We thank the reviewer for this suggestion. We have updated the “Learning” section within “Basic Network Properties” in the main text to address this point (pp. 6-7):

      “Connection strengths in the model between pairs of connected units x and y were adjusted at the end of each trial (i.e., after each stimulus presentation) as a U-shaped function of the coactivity of x and y, defined as the product of their activations on that trial. The parameters of the U-shaped learning function relating coactivity to change in connection strength (i.e., weakening / strengthening) were specified differently for each projection where learning occurs (bidirectionally between the input and hidden layers, the hidden layer to itself, and the hidden to output layer). Once the U-shaped learning function for each projection in each version of the model was specified, we did not change it for any of the various conditions. Details of how we computed coactivity and how we specified the U-shaped function can be found in the Methods section.”

      -The modeling results in the different-face condition are at odds with the data for the Favila et al. model (they observe some differentiation in the paper, and the model predicts no change). This could be due to a number of unmodeled factors, but it is perhaps worth noting.

      Thank you for pointing this out. It is possible to better capture the pattern of results observed by Favila et al. in their paper (with some differentiation in the different-face condition and even more differentiation in the same-face condition) by slightly adjusting the model parameters (specifically, by setting the oscillation amplitude Osc for the hidden layer to .1 instead of .067).

      Rather than replacing the old (Osc = .067) results in the paper, which would entail re-making the associated videos, etc., we have added a supplementary figure (Figure 8 - Supplement 1; see p. 45):

      We also added new text to the Favila Results, under “Differentiation and Integration” (p. 20):

      “Note also that the exact levels of differentiation that are observed in the different-face and same-face conditions are parameter dependent; for an alternative set of results showing some differentiation in the different-face condition (but still less than is observed in the same-face condition), see Figure 8 - Supplement 1.”

      -Related to my comment in the public review about pre-wiring associations, in the caption for Figure 9 (Schlichting model), the authors report "In both conditions, the pre-wired connection linking the "item B" hidden units to the "item X" output unit is set to .7. In the interleaved condition, the connection linking the "item A" hidden units to the "item X" output unit is set to .8, to reflect some amount of initial AX learning. In the blocked condition, the connection linking the "item A" hidden units to the "item X" output unit is set a higher value (.999), to reflect extra AX learning." What are the equivalent values for the other models, especially the Favila model since the structure is the same as Schlichting? I understood all the "strong" connections to be .99 unless otherwise stated. If that's the case, I don't understand why the blocked Schlichting model and the Favila model produce opposite effects. More clarity would be useful here.

      We have added a new paragraph to the results section for the Schlichting model (under “Differentiation and Integration”) to clarify why the blocked Schlichting model and the Favila model show different results (p. 24):

      “Note that the key feature driving integration in the blocked condition of this simulation is not the high strength of the connection from X to A on its own – rather, it is the asymmetry in the pretrained connection strengths from X to A (.999) and from X to B (.7). This asymmetry, which is meant to reflect the extensive training on A-X that occurred before the initial presentation of B-X, results in the A-X hidden representation decisively winning the competition during B-X presentation, which then leads to the B input also being linked to this representation (i.e., integration). It is instructive to compare this to the same-face condition from our simulation of Favila et al. (2016): In that simulation, the two pairmates are also linked strongly (.99 initial connection strength) to a shared associate, but in that case the connections are equally strong, so there is more balanced competition; here, the competitor representation only comes to mind moderately (instead of displacing the target representation), so the result is differentiation instead of integration.”

      -The meaning of the different colored dots in Figure 5 is a bit hard to keep track of, even given the legend labels. The figure might benefit from a model sketch highlighting each of the different coactivity types. The left side of Fig 13 was useful, but again, somehow mapping on the colors would help further. Another note on these figures: what does having two dots of each color mean? Is it just an illustration of the variance? There would be more dots if there was one dot per coactivity value.

      We have updated Figure 5 and Figure 13 to clarify these points (including a clarification that the dots only represent a subset of the possible pairings between units).

      -While I appreciate the goal of the paper is to account for these three studies, readers who aren't familiar with or specifically interested in these studies may appreciate a small amount of intuition on why formalizing unsupervised learning models may be broadly important for computational investigations of learning/memory/cognition.

      We have added the following text under “Basic Network Properties” in the Introduction to address this point (p. 4):

      “Achieving a better understanding of unsupervised learning is an important goal for computational neuroscience, given that learning agents have vastly more opportunities to learn in an unsupervised fashion than from direct supervision (for additional discussion of this point, see, e.g., Zhuang et al., 2021).”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper presents a compelling and comprehensive study of decision-making under uncertainty. It addresses a fundamental distinction between belief-based (cognitive neuroscience) formulations of choice behaviour and reward-based (behavioural psychology) accounts. Specifically, it asks whether active inference provides a better account of planning and decision-making, relative to reinforcement learning. To do this, the authors use a simple but elegant paradigm that includes choices about whether to seek both information and rewards. They then assess the evidence for active inference and reinforcement learning models of choice behaviour, respectively. After demonstrating that active inference provides a better explanation of behavioural responses, the neuronal correlates of epistemic and instrumental value (under an optimised active inference model) are characterised using EEG. Significant neuronal correlates of both kinds of value were found in sensor and source space. The source space correlates are then discussed sensibly, in relation to the existing literature on the functional anatomy of perceptual and instrumental decision-making under uncertainty.

      Strengths:

      The strengths of this work rest upon the theoretical underpinnings and careful deconstruction of the various determinants of choice behaviour using active inference. A particular strength here is that the experimental paradigm is designed carefully to elicit both information-seeking and reward-seeking behaviour; where the information-seeking is itself separated into resolving uncertainty about the context (i.e., latent states) and the contingencies (i.e., latent parameters), under which choices are made. In other words, the paradigm - and its subsequent modelling - addresses both inference and learning as necessary belief and knowledge-updating processes that underwrite decisions.

      The authors were then able to model belief updating using active inference and then look for the neuronal correlates of the implicit planning or policy selection. This speaks to a further strength of this study; it provides some construct validity for the modelling of belief updating and decision-making; in terms of the functional anatomy as revealed by EEG. Empirically, the source space analysis of the neuronal correlates licences some discussion of functional specialisation and integration at various stages in the choices and decision-making.

      In short, the strengths of this work rest upon a (first) principles account of decision-making under uncertainty in terms of belief updating that allows them to model or fit choice behaviour in terms of Bayesian belief updating - and then use relatively state-of-the-art source reconstruction to examine the neuronal correlates of the implicit cognitive processing.

      Response: We are deeply grateful for your careful review of our work and for the thoughtful feedback you have provided. Your dedication to ensuring the quality and clarity of the work is truly admirable. Your comments have been invaluable in guiding us towards improving the paper, and we appreciate your time and effort in not just offering suggestions but also providing specific revisions that we can implement. Your insights have helped us identify areas where we can strengthen the arguments and clarify the methodology.

      Comment 1:

      The main weaknesses of this report lie in the communication of the ideas and procedures. Although the language is generally excellent, there are some grammatical lapses that make the text difficult to read. More importantly, the authors are not consistent in their use of some terms; for example, uncertainty and information gain are sometimes conflated in a way that might confuse readers. Furthermore, the descriptions of the modelling and data analysis are incomplete. These shortcomings could be addressed in the following way.

      First, it would be useful to unpack the various interpretations of information and goal-seeking offered in the (active inference) framework examined in this study. For example, it will be good to include the following paragraph:

      "In contrast to behaviourist approaches to planning and decision-making, active inference formulates the requisite cognitive processing in terms of belief updating in which choices are made based upon their expected free energy. Expected free energy can be regarded as a universal objective function, specifying the relative likelihood of alternative choices. In brief, expected free energy can be regarded as the surprise expected following some action, where the expected surprise comes in two flavours. First, the expected surprise is uncertainty, which means that policies with a low expected free energy resolve uncertainty and promote information seeking. However, one can also minimise expected surprise by avoiding surprising, aversive outcomes. This leads to goal-seeking behaviour, where the goals can be regarded as prior preferences or rewarding outcomes.

      Technically, expected free energy can be expressed in terms of risk plus ambiguity - or rearranged to be expressed in terms of expected information gain plus expected value, where value corresponds to (log) prior preferences. We will refer to both decompositions in what follows; noting that both decompositions accommodate information and goal-seeking imperatives. That is, resolving ambiguity and maximising information gain have epistemic value, while minimising risk or maximising expected value have pragmatic or instrumental value. These two kinds of values are sometimes referred to in terms of intrinsic and extrinsic value, respectively [1-4]."
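For reference, the two decompositions mentioned in this paragraph can be written out in one standard discrete-state formulation of active inference. This is a textbook sketch rather than the manuscript's own equation (whose terms and weighting coefficients may differ): Q denotes beliefs predicted under policy π, P(o) encodes (log) prior preferences over outcomes, and the second equality assumes Q(o | π) = E_{Q(s | π)}[P(o | s)], with Q(s | o, π) the corresponding posterior over states.

```latex
\begin{aligned}
G(\pi)
&= \underbrace{D_{\mathrm{KL}}\big[\,Q(o \mid \pi)\,\big\|\,P(o)\,\big]}_{\text{risk}}
 + \underbrace{\mathbb{E}_{Q(s \mid \pi)}\big[\,\mathrm{H}[P(o \mid s)]\,\big]}_{\text{ambiguity}} \\
&= -\underbrace{\mathbb{E}_{Q(o \mid \pi)}\Big[D_{\mathrm{KL}}\big[\,Q(s \mid o, \pi)\,\big\|\,Q(s \mid \pi)\,\big]\Big]}_{\text{expected information gain (epistemic value)}}
 - \underbrace{\mathbb{E}_{Q(o \mid \pi)}\big[\ln P(o)\big]}_{\text{expected value (extrinsic value)}}
\end{aligned}
```

Because policies with lower G are preferred, the minus signs mean that both expected information gain and expected value make a policy more attractive. When beliefs about model parameters are also learned, a further novelty term of the same form (an expected KL divergence between posterior and prior beliefs about the parameters) is usually subtracted; this corresponds to resolving uncertainty about contingencies, as distinct from uncertainty about latent states.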

      Response 1: We deeply thank you for your comments and corresponding suggestions about our interpretations of active inference. In response to your identified weaknesses and suggestions, we have added corresponding paragraphs in the Methods section (The free energy principle and active inference, line 95-106):

      “Active inference formulates the necessary cognitive processing as a process of belief updating, where choices depend on agents' expected free energy. Expected free energy serves as a universal objective function, guiding both perception and action. In brief, expected free energy can be seen as the expected surprise following some policies. Expected surprise can be reduced by resolving uncertainty, so selecting policies with lower expected free energy encourages information seeking. Additionally, one can minimize expected surprise by avoiding surprising or aversive outcomes (Oudeyer and Kaplan, 2007; Schmidhuber, 2010). This leads to goal-seeking behavior, where goals can be viewed as prior preferences or rewarding outcomes.

      Technically, expected free energy can be expressed as risk plus ambiguity or, equivalently, as expected information gain plus expected value, where value corresponds to (log) prior preferences. We will refer to both formulations in what follows. Resolving ambiguity, minimizing risk, and maximizing information gain have epistemic value, while maximizing expected value has pragmatic or instrumental value. These two types of values can be referred to in terms of intrinsic and extrinsic value, respectively (Barto et al., 2013; Schwartenbeck et al., 2019).”

      Oudeyer, P. Y., & Kaplan, F. (2007). What is intrinsic motivation? A typology of computational approaches. Frontiers in Neurorobotics, 1, 108.

      Schmidhuber, J. (2010). Formal theory of creativity, fun, and intrinsic motivation (1990–2010). IEEE Transactions on Autonomous Mental Development, 2(3), 230-247.

      Barto, A., Mirolli, M., & Baldassarre, G. (2013). Novelty or surprise? Frontiers in Psychology, 4, 61898.

      Schwartenbeck, P., Passecker, J., Hauser, T. U., FitzGerald, T. H., Kronbichler, M., & Friston, K. J. (2019). Computational mechanisms of curiosity and goal-directed exploration. eLife, 8, e41703.

      Comment 2:

      The description of the modelling of choice behaviour needs to be unpacked and motivated more carefully. Perhaps along the following lines:

      "To assess the evidence for active inference over reinforcement learning, we fit active inference and reinforcement learning models to the choice behaviour of each subject. Effectively, this involved optimising the free parameters of active inference and reinforcement learning models to maximise the likelihood of empirical choices. The resulting (marginal) likelihood was then used as the evidence for each model. The free parameters for the active inference model scaled the contribution of the three terms that constitute the expected free energy (in Equation 6). These coefficients can be regarded as precisions that characterise each subject's prior beliefs about contingencies and rewards. For example, increasing the precision or the epistemic value associated with model parameters means the subject would update her beliefs about reward contingencies more quickly than a subject who has precise prior beliefs about reward distributions. Similarly, subjects with a high precision over prior preferences or extrinsic value can be read as having more precise beliefs that she will be rewarded. The free parameters for the reinforcement learning model included..."

      Response 2: We deeply thank you for your comments and corresponding suggestions about our description of the behavioral modelling. In response to your identified weaknesses and suggestions, we have added corresponding content in the Results section (Behavioral results, line 279-293):

      “To assess the evidence for active inference over reinforcement learning, we fit active inference (Eq. 9), model-free reinforcement learning, and model-based reinforcement learning models to the behavioral data of each participant. This involved optimizing the free parameters of active inference and reinforcement learning models. The resulting likelihood was used to calculate the Bayesian Information Criterion (BIC) (Vrieze, 2012) as the evidence for each model. The free parameters for the active inference model (AL, AI, EX, prior, and α) scaled the contribution of the three terms that constitute the expected free energy in Eq. 9. These coefficients can be regarded as precisions that characterize each participant's prior beliefs about contingencies and rewards. For example, increasing α means participants would update their beliefs about reward contingencies more quickly, increasing AL means participants would like to reduce ambiguity more, and increasing AI means participants would like to learn the hidden state of the environment and avoid risk more. The free parameters for the model-free reinforcement learning model are the learning rate α and the temperature parameter γ, and the free parameters for the model-based model are the learning rate α, the temperature parameter γ, and prior (details of the model-free reinforcement learning model are given in Eq. S1-11 and details of the model-based reinforcement learning model in Eq. S12-23 of the Supplementary Method). The parameter fitting for these three models was conducted using the ‘BayesianOptimization’ package in Python (Frazier, 2018), first randomly sampling 1000 times and then iterating for an additional 1000 times.”
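To make the model-comparison step concrete, here is a minimal sketch of scoring one participant's choices under a model-free reinforcement learning model and converting the fitted likelihood into a BIC, computed as k ln n − 2 ln L̂ (k free parameters, n trials, L̂ the maximized likelihood; lower is better). Several pieces are assumptions rather than the authors' specification: the two-option task, the Rescorla-Wagner update, the treatment of γ as an inverse temperature in a softmax, the toy data, and the plain random search used here in place of Bayesian optimization. The authors' model-free model is defined in Eq. S1-11 and may differ.

```python
import math
import random


def rl_log_likelihood(choices, rewards, alpha, gamma, n_options=2):
    """Log-likelihood of observed choices under a Rescorla-Wagner learner with softmax choice.

    alpha: learning rate; gamma: treated here as an inverse temperature (an assumption).
    """
    q = [0.0] * n_options
    log_lik = 0.0
    for choice, reward in zip(choices, rewards):
        # Softmax choice probabilities (subtract the max for numerical stability).
        m = max(gamma * v for v in q)
        exps = [math.exp(gamma * v - m) for v in q]
        log_lik += math.log(exps[choice] / sum(exps))
        # Prediction-error update for the chosen option only.
        q[choice] += alpha * (reward - q[choice])
    return log_lik


def bic(log_lik, n_params, n_trials):
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_trials) - 2.0 * log_lik


def fit_random_search(choices, rewards, n_samples=1000, seed=0):
    """Crude stand-in for Bayesian optimization: sample parameters uniformly, keep the best."""
    rng = random.Random(seed)
    best = (-math.inf, None)
    for _ in range(n_samples):
        alpha = rng.uniform(0.0, 1.0)
        gamma = rng.uniform(0.0, 20.0)  # hypothetical search range
        ll = rl_log_likelihood(choices, rewards, alpha, gamma)
        if ll > best[0]:
            best = (ll, (alpha, gamma))
    return best


if __name__ == "__main__":
    # Toy data (hypothetical): option 1 pays off more often, and the choices favour it.
    choices = [0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1]
    rewards = [0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1]
    ll, (alpha, gamma) = fit_random_search(choices, rewards)
    print(f"alpha={alpha:.2f}, gamma={gamma:.2f}, logL={ll:.3f}, "
          f"BIC={bic(ll, n_params=2, n_trials=len(choices)):.3f}")
```

With the `bayes_opt` package, the search would instead pass a wrapper of `rl_log_likelihood` as `f` to `BayesianOptimization(f=..., pbounds=...)` and call `maximize(init_points=..., n_iter=...)`; the "1000 random samples, then 1000 iterations" described above presumably corresponds to `init_points=1000, n_iter=1000`, though that mapping is our reading rather than something the text states.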

      Vrieze, S. I. (2012). Model selection and psychological theory: a discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Psychological Methods, 17(2), 228.

      Frazier, P. I. (2018). A tutorial on Bayesian optimization. arXiv preprint arXiv:1807.02811.

      Comment 3:

      In terms of the time-dependent correlations with expected free energy - and its constituent terms - I think the report would benefit from overviewing these analyses with something like the following:

      "In the final analysis of the neuronal correlates of belief updating - as quantified by the epistemic and extrinsic values of expected free energy - we present a series of analyses in source space. These analyses tested for correlations between constituent terms in expected free energy and neuronal responses in source space. These correlations were over trials (and subjects). Because we were dealing with two-second time series, we were able to identify the periods of time during decision-making when the correlates were expressed.

      In these analyses, we focused on the induced power of neuronal activity at each point in time, at each brain source. To illustrate the functional specialisation of these neuronal correlates, we present whole-brain maps of correlation coefficients and pick out the most significant correlation for reporting fluctuations in selected correlations over two-second periods. These analyses are presented in a descriptive fashion to highlight the nature and variety of the neuronal correlates, which we unpack in relation to the existing EEG literature in the discussion. Note that we did not attempt to correct for multiple comparisons; largely, because the correlations observed were sustained over considerable time periods, which would be almost impossible under the null hypothesis of no correlations."

      Response 3: We deeply thank you for your comments and corresponding suggestions about our description of the regression analysis in the source space. In response to your suggestions, we have added corresponding content in the Results section (EEG results at source level, line 331-347):

      “In the final analysis of the neural correlates of the decision-making process, as quantified by the epistemic and extrinsic values of expected free energy, we presented a series of linear regressions in source space. These analyses tested for correlations over trials between constituent terms in expected free energy (the value of avoiding risk, the value of reducing ambiguity, extrinsic value, and expected free energy itself) and neural responses in source space. We also investigated the neural correlates of (the degree of) risk, (the degree of) ambiguity, and prediction error. Because we were dealing with a two-second time series, we were able to identify the periods of time during decision-making when the correlates were expressed. The linear regression was run using the mne.stats.linear_regression function in the MNE package (Activity ~ Regressor + Intercept), where Activity is the amplitude of the EEG signal in source space and Regressor is one of the regressors mentioned above (e.g., expected free energy, the value of reducing ambiguity, etc.).

      In these analyses, we focused on the induced power of neural activity at each time point, in the brain source space. To illustrate the functional specialization of these neural correlates, we presented whole-brain maps of correlation coefficients and picked out the brain region with the most significant correlation for reporting fluctuations in selected correlations over two-second periods. These analyses were presented in a descriptive fashion to highlight the nature and variety of the neural correlates, which we unpacked in relation to the existing EEG literature in the discussion. Note that we did not attempt to correct for multiple comparisons; largely, because the correlations observed were sustained over considerable time periods, which would be almost impossible under the null hypothesis of no correlations.”
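      The regression model described above (Activity ~ Regressor + Intercept, fitted separately at every source and time point) can be illustrated with a minimal NumPy sketch. This is not MNE's implementation; the function name mass_univariate_regression, the trials × sources × times array layout and the synthetic data are illustrative assumptions. The sketch returns both the slope and the Pearson correlation coefficient at each source/time point, i.e. the kind of whole-brain correlation map described in the text.

```python
import numpy as np


def mass_univariate_regression(activity, regressor):
    """Fit Activity ~ Regressor + Intercept separately at every source and time point.

    activity:  array of shape (n_trials, n_sources, n_times), e.g. source-space power
    regressor: array of shape (n_trials,), one model-derived value per trial
    Returns (beta, r): the regressor slope and the Pearson correlation,
    each of shape (n_sources, n_times).
    """
    activity = np.asarray(activity, dtype=float)
    regressor = np.asarray(regressor, dtype=float)
    n_trials = activity.shape[0]
    spatial_shape = activity.shape[1:]

    # Design matrix: a column of ones (intercept) plus the single regressor
    design = np.column_stack([np.ones(n_trials), regressor])
    y = activity.reshape(n_trials, -1)  # (n_trials, n_sources * n_times)

    # One least-squares solve covers every source/time column at once
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    beta = coef[1].reshape(spatial_shape)

    # Pearson correlation between the regressor and activity in each column
    x = regressor - regressor.mean()
    yc = y - y.mean(axis=0)
    r = (x @ yc) / (np.sqrt(x @ x) * np.sqrt((yc ** 2).sum(axis=0)))
    return beta, r.reshape(spatial_shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reg = rng.normal(size=120)              # synthetic per-trial regressor
    act = rng.normal(size=(120, 4, 50))     # synthetic: 120 trials, 4 sources, 50 time points
    act[:, 0, :] += 2.0 * reg[:, None]      # only source 0 tracks the regressor
    beta, r = mass_univariate_regression(act, reg)
    print(beta[0].mean().round(2), r[0].min().round(2), np.abs(r[1:]).max().round(2))
```

Because each fit contains a single regressor, every source/time column is an independent simple regression, and the correlation map can be read directly as a descriptive map of where and when the regressor is expressed.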

      Comment 4:

      There was a slight misdirection in the discussion of priors in the active inference framework. The notion that active inference requires a pre-specification of priors is a common misconception. Furthermore, it misses the point that the utility of Bayesian modelling is to identify the priors that each subject brings to the table. This could be easily addressed with something like the following in the discussion:

      "It is a common misconception that Bayesian approaches to choice behaviour (including active inference) are limited by a particular choice of priors. As illustrated in our fitting of choice behaviour above, priors are a strength of Bayesian approaches in the following sense: under the complete class theorem [5, 6], any pair of choice behaviours and reward functions can be described in terms of ideal Bayesian decision-making with particular priors. In other words, there always exists a description of choice behaviour in terms of some priors. This means that one can, in principle, characterise any given behaviour in terms of the priors that explain that behaviour. In our example, these were effectively priors over the precision of various preferences or beliefs about contingencies that underwrite expected free energy."

      Response 4: We deeply thank you for your comments and corresponding suggestions about the prior of Bayesian methods. In response to your suggestions, we have added corresponding content in the Discussion section (The strength of the active inference framework in decision-making, lines 447-453):

      “However, it may be the opposite. As illustrated in our fitting results, priors can be a strength of Bayesian approaches. Under the complete class theorem (Wald 1947; Brown 1981), any pair of behavioral data and reward functions can be described in terms of ideal Bayesian decision-making with particular priors. In other words, there always exists a description of behavioral data in terms of some priors. This means that one can, in principle, characterize any given behavioral data in terms of the priors that explain that behavior. In our example, these were effectively priors over the precision of various preferences or beliefs about contingencies that underwrite expected free energy.”

      Wald, A. (1947). An essentially complete class of admissible decision functions. The Annals of Mathematical Statistics, 549-555.

      Brown, L. D. (1981). A complete class theorem for statistical problems with finite sample spaces. The Annals of Statistics, 1289-1300.

      Reviewer #2 (Public Review):

      Summary:

      Zhang and colleagues use a combination of behavioral, neural, and computational analyses to test an active inference model of exploration in a novel reinforcement learning task.

      Strengths:

      The paper addresses an important question (validation of active inference models of exploration). The combination of behavior, neuroimaging, and modeling is potentially powerful for answering this question.

      Response: We want to express our sincere gratitude for your thorough review of our work and for the valuable comments you have provided. Your attention to detail and dedication to improving the quality of the work are truly commendable. Your feedback has been invaluable in guiding us towards revisions that will strengthen the work. We have made targeted modifications based on most of the comments. However, due to factors such as time and energy constraints, we have not added corresponding analyses for several comments.

      Comment 1:

      The paper does not discuss relevant work on contextual bandits by Schulz, Collins, and others. It also does not mention the neuroimaging study of Tomov et al. (2020) using a risky/safe bandit task.

      Response 1:

      We deeply thank you for your suggestions about the relevant work. We now discuss and cite these representative papers in the Introduction section (lines 42-55):

      “The decision-making process frequently involves grappling with varying forms of uncertainty, such as ambiguity - the kind of uncertainty that can be reduced through sampling, and risk - the inherent uncertainty (variance) presented by a stable environment. Studies have investigated these different forms of uncertainty in decision-making, focusing on their neural correlates (Daw et al., 2006; Badre et al., 2012; Cavanagh et al., 2012).

      These studies utilized different forms of multi-armed bandit tasks, e.g., restless multi-armed bandit tasks (Daw et al., 2006; Guha et al., 2010), risky/safe bandit tasks (Tomov et al., 2020; Fan et al., 2023; Payzan-LeNestour et al., 2013), and contextual multi-armed bandit tasks (Schulz et al., 2015a; Schulz et al., 2015b; Molinaro & Collins, 2023). However, these tasks either separate risk from ambiguity in uncertainty, or separate action from state (perception). In our work, we develop a contextual multi-armed bandit task to enable participants to actively reduce ambiguity, avoid risk, and maximize rewards using various policies (see Section 2.2 and Figure 4(a)). Our task makes it possible to study whether the brain represents these different types of uncertainty distinctly (Levy et al., 2010) and whether the brain represents both the value of reducing uncertainty and the degree of uncertainty. The active inference framework presents a theoretical approach to investigate these questions. Within this framework, uncertainties can be reduced to ambiguity and risk. Ambiguity is represented by the uncertainty about model parameters associated with choosing a particular action, while risk is signified by the variance of the environment's hidden states. The value of reducing ambiguity, the value of avoiding risk, and extrinsic value together constitute expected free energy (see Section 2.1).”

      Daw, N. D., O'Doherty, J. P., Dayan, P., Seymour, B., & Dolan, R. J. (2006). Cortical substrates for exploratory decisions in humans. Nature, 441(7095), 876-879.

      Badre, D., Doll, B. B., Long, N. M., & Frank, M. J. (2012). Rostrolateral prefrontal cortex and individual differences in uncertainty-driven exploration. Neuron, 73(3), 595-607.

      Cavanagh, J. F., Figueroa, C. M., Cohen, M. X., & Frank, M. J. (2012). Frontal theta reflects uncertainty and unexpectedness during exploration and exploitation. Cerebral Cortex, 22(11), 2575-2586.

      Guha, S., Munagala, K., & Shi, P. (2010). Approximation algorithms for restless bandit problems. Journal of the ACM (JACM), 58(1), 1-50.

      Tomov, M. S., Truong, V. Q., Hundia, R. A., & Gershman, S. J. (2020). Dissociable neural correlates of uncertainty underlie different exploration strategies. Nature Communications, 11(1), 2371.

      Fan, H., Gershman, S. J., & Phelps, E. A. (2023). Trait somatic anxiety is associated with reduced directed exploration and underestimation of uncertainty. Nature Human Behaviour, 7(1), 102-113.

      Payzan-LeNestour, E., Dunne, S., Bossaerts, P., & O’Doherty, J. P. (2013). The neural representation of unexpected uncertainty during value-based decision making. Neuron, 79(1), 191-201.

      Schulz, E., Konstantinidis, E., & Speekenbrink, M. (2015a). Exploration-exploitation in a contextual multi-armed bandit task. In International Conference on Cognitive Modeling (pp. 118-123).

      Schulz, E., Konstantinidis, E., & Speekenbrink, M. (2015b). Learning and decisions in contextual multi-armed bandit tasks. In CogSci.

      Molinaro, G., & Collins, A. G. (2023). Intrinsic rewards explain context-sensitive valuation in reinforcement learning. PLoS Biology, 21(7), e3002201.

      Levy, I., Snell, J., Nelson, A. J., Rustichini, A., & Glimcher, P. W. (2010). Neural representation of subjective value under risk and ambiguity. Journal of Neurophysiology, 103(2), 1036-1047.

      Comment 2:

      The statistical reporting is inadequate. In most cases, only p-values are reported, not the relevant statistics, degrees of freedom, etc. It was also not clear if any corrections for multiple comparisons were applied. Many of the EEG results are described as "strong" or "robust" with significance levels of p<0.05; I am skeptical in the absence of more details, particularly given the fact that the corresponding plots do not seem particularly strong to me.

      Response 2: We deeply thank you for your comments about our statistical reporting. We have optimized the fitting model and rerun all the statistical analyses. As shown in Figures 6, 7, 8, S3, S4, and S5, the new regression results are clearly improved compared with the previous ones. Owing to space limitations, we have placed the remaining statistics, including t-values and standard errors, in our GitHub repository (https://github.com/andlab-um/FreeEnergyEEG). We have not applied corrections for multiple comparisons, in line with Reviewer 1's Comment 3: “Note that we did not attempt to correct for multiple comparisons; largely, because the correlations observed were sustained over considerable time periods, which would be almost impossible under the null hypothesis of no correlations”.

      Author response image 1.

      Comment 3:

      The authors compare their active inference model to a "model-free RL" model. This model is not described anywhere, as far as I can tell. Thus, I have no idea how it was fit, how many parameters it has, etc. The active inference model fitting is also not described anywhere. Moreover, you cannot compare models based on log-likelihood, unless you are talking about held-out data. You need to penalize for model complexity. Finally, even if active inference outperforms a model-free RL model (doubtful given the error bars in Fig. 4c), I don't see how this is strong evidence for active inference per se. I would want to see a much more extensive model comparison, including model-based RL algorithms which are not based on active inference, as well as model recovery analyses confirming that the models can actually be distinguished on the basis of the experimental data.

      Response 3: We deeply thank you for your comments about the model comparison details. Because classical reinforcement learning is not the focus of our work, we had previously placed the details of the comparison model in the supplementary materials and omitted some of them from the main text. We have now added the relevant information regarding the model comparison to the main text, in the Results section (Behavioral results, lines 279-293; highlighted in yellow):

      “To assess the evidence for active inference over reinforcement learning, we fit active inference (Eq.9), model-free reinforcement learning, and model-based reinforcement learning models to the behavioral data of each participant. This involved optimizing the free parameters of the active inference and reinforcement learning models. The resulting likelihood was used to calculate the Bayesian Information Criterion (BIC) as the evidence for each model. The free parameters of the active inference model are AL, AI, EX, prior, and α; AL, AI, and EX scale the contributions of the three terms that constitute the expected free energy in Eq.9. These coefficients can be regarded as precisions that characterize each participant's prior beliefs about contingencies and rewards. For example, increasing α means participants would update their beliefs about reward contingencies more quickly, increasing AL means participants would like to reduce ambiguity more, and increasing AI means participants would like to learn the hidden state of the environment and avoid risk more. The free parameters of the model-free reinforcement learning model are the learning rate α and the temperature parameter γ, and the free parameters of the model-based reinforcement learning model are the learning rate α, the temperature parameter γ, and prior (details of the model-free reinforcement learning model are given in Eq.S1-11 and details of the model-based reinforcement learning model in Eq.S12-23 in the Supplementary Method). The parameters of all three models were fitted using the 'BayesianOptimization' package in Python, with 1000 initial random samples followed by an additional 1000 optimization iterations.”
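      The BIC comparison described above can be sketched as follows. The BIC formula (k ln n − 2 ln L) is standard, and the parameter counts follow the text (5, 2, and 3 free parameters). The number of observations per participant (240, i.e. two choices in each of 120 trials) and the log-likelihood values are hypothetical placeholders, not our fitted results.

```python
import math

# Free-parameter counts for each model, as listed in the text
N_PARAMS = {
    "active_inference": 5,  # AL, AI, EX, prior, alpha
    "model_free_rl": 2,     # learning rate alpha, temperature gamma
    "model_based_rl": 3,    # learning rate alpha, temperature gamma, prior
}


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion, k * ln(n) - 2 * ln(L); lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def best_model_by_bic(log_likelihoods: dict, n_obs: int):
    """Return the model with the lowest BIC and the BIC of every model."""
    scores = {name: bic(ll, N_PARAMS[name], n_obs) for name, ll in log_likelihoods.items()}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical maximized log-likelihoods for one participant (not real data)
    lls = {"active_inference": -150.0, "model_free_rl": -160.0, "model_based_rl": -158.0}
    winner, scores = best_model_by_bic(lls, n_obs=240)  # n_obs is an assumption
    print(winner, {k: round(v, 1) for k, v in scores.items()})
```

The k ln n term is the complexity penalty that a raw log-likelihood comparison lacks. With these numbers, an active inference fit at −159 would lose to a model-free fit at −160 despite its higher likelihood.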

      We have now incorporated model-based reinforcement learning into our comparison models and placed the descriptions of both the model-free and model-based reinforcement learning algorithms in the supplementary materials. We have also changed the model comparison criterion to the Bayesian Information Criterion. The results indicate that the active inference model significantly outperforms both comparison models.
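      For readers unfamiliar with the comparison models, a generic model-free learner with a learning rate α and a softmax temperature γ can be sketched as below. This is a textbook delta-rule learner, not a transcription of Eq.S1-11. The function names, the starting value q0, and the convention that γ divides the values (higher γ means more random choices) are assumptions.

```python
import math


def softmax(values, temperature):
    """Choice probabilities; higher temperature gives more random choices."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def model_free_log_likelihood(choices, rewards, alpha, gamma, n_options=2, q0=0.0):
    """Log-likelihood of a choice sequence under a delta-rule (Rescorla-Wagner) learner.

    choices: chosen option index on each trial (e.g. 0 = safe, 1 = risky)
    rewards: reward obtained on each trial
    """
    q = [q0] * n_options
    ll = 0.0
    for c, r in zip(choices, rewards):
        p = softmax(q, gamma)
        ll += math.log(p[c])
        q[c] += alpha * (r - q[c])  # prediction-error update of the chosen option only
    return ll


if __name__ == "__main__":
    # Three safe choices, each yielding 6 apples
    print(model_free_log_likelihood([0, 0, 0], [6, 6, 6], alpha=0.5, gamma=1.0))
```

Maximizing this log-likelihood over (α, γ) gives the ln L that enters the BIC. A model-based variant would additionally track beliefs about the context, which is where its extra prior parameter comes in.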

      We had not performed model recovery previously; we have now added the results to the supplementary materials. As the results show, each model best fits the simulated data it generated:

      “To assess the reliability of our models (the active inference model, the model-free reinforcement learning model, and the model-based reinforcement learning model), we ran simulation experiments for model recovery. We used each of the three models, with its own fitted parameters, to generate simulated data, and then fitted all three simulated datasets with all three models.

      The model recovery results are shown in Fig.S6, which presents the model confusion matrix: each cell gives the percentage of subjects simulated by a given model that are best fitted by a given model. Goodness-of-fit was compared using the Bayesian Information Criterion. Model recovery was good: the data simulated by each model were best explained by that same model.”
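      The confusion matrix described above can be computed from a table of BIC values as in the sketch below. The nested-dictionary layout, the function name, and the small BIC table are hypothetical; only the rule (for each simulated subject, the fitting model with the lowest BIC wins) follows the text.

```python
def recovery_confusion(bic_table, models):
    """Model-recovery confusion matrix from BIC values.

    bic_table[gen] is a list with one dict per subject simulated from model `gen`;
    each dict maps a fitting model to its BIC on that subject's data.
    Returns matrix[gen][fit] = % of subjects simulated by `gen` best fitted by `fit`.
    """
    matrix = {}
    for gen in models:
        subjects = bic_table[gen]
        counts = {fit: 0 for fit in models}
        for bics in subjects:
            winner = min(models, key=lambda m: bics[m])  # lowest BIC wins
            counts[winner] += 1
        matrix[gen] = {fit: 100.0 * counts[fit] / len(subjects) for fit in models}
    return matrix


if __name__ == "__main__":
    models = ["AIF", "MF", "MB"]
    table = {  # hypothetical BIC values for two simulated subjects per model
        "AIF": [{"AIF": 100, "MF": 110, "MB": 112}, {"AIF": 90, "MF": 95, "MB": 99}],
        "MF": [{"AIF": 120, "MF": 100, "MB": 105}, {"AIF": 101, "MF": 103, "MB": 104}],
        "MB": [{"AIF": 130, "MF": 125, "MB": 110}, {"AIF": 140, "MF": 135, "MB": 120}],
    }
    for gen, row in recovery_confusion(table, models).items():
        print(gen, row)
```

Good recovery shows up as large values on the diagonal (generating model equals winning model) and small values off the diagonal.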

      Author response image 2.

      Comment 4:

      Another aspect of the behavioral modeling that's missing is a direct descriptive comparison between model and human behavior, beyond just plotting log-likelihoods (which are a very impoverished measure of what's going on).

      Response 4: We deeply thank you for your comments about the comparison between the model and human behavior. Because our simulation experiments differ slightly from the real behavioral experiments (the "you can ask" stage), we cannot directly compare the behavior of the model with that of the participants. However, in the simulation experiment in the main text (Figure 3), the active inference agent's behavior is highly consistent with that of the participants (Figure 4), exhibiting an effective exploration strategy and a desire to reduce uncertainty. Moreover, we have included two additional simulation experiments in the supplementary materials, which demonstrate that active inference may potentially fit a wide range of participants' behavioral strategies.

      Author response image 3.

      (An active inference agent with AL=AI=EX=0. It can accomplish tasks efficiently like a human being, reducing the uncertainty of the environment and maximizing the reward.)

      Author response image 4.

      (An active inference agent with AL=AI=0, EX=10. It will only pursue immediate rewards (not choosing the "Cue" option due to additional costs), but it can also gradually optimize its strategy due to random effects.)

      Author response image 5.

      (An active inference agent with EX=0, AI=AL=10. It will only pursue environmental information to reduce the uncertainty of the environment. Even in "Context 2", where immediate rewards are scarce, it will continue to explore.) (a) shows the decision-making of active inference agents in the Stay-Cue choice. Blue corresponds to agents choosing the "Cue" option and acquiring "Context 1"; orange corresponds to agents choosing the "Cue" option and acquiring "Context 2"; purple corresponds to agents choosing the "Stay" option and not knowing the information about the hidden state of the environment. The shaded areas below correspond to the probability of the agents making the respective choices. (b) shows the decision-making of active inference agents in the Safe-Risky choice. The shaded areas below correspond to the probability of the agents making the respective choices. (c) shows the rewards obtained by active inference agents. (d) shows the reward prediction errors of active inference agents. (e) shows the reward predictions of active inference agents for the "Risky" path in "Context 1" and "Context 2".

      Comment 5:

      The EEG results are intriguing, but it wasn't clear that these provide strong evidence specifically for the active inference model. No alternative models of the EEG data are evaluated.

      Overall, the central claim in the Discussion ("we demonstrated that the active inference model framework effectively describes real-world decision-making") remains unvalidated in my opinion.

      Response 5: We deeply thank you for your comments. We applied the active inference model to the EEG analysis because, among all our models (including the newly added ones), it best fit the participants' behavioral data. Our EEG results serve only to verify that the active inference model can be used to analyze the neural mechanisms of decision-making in uncertain environments (in principle, a reinforcement learning model with a similar exploration strategy could also be designed for this purpose). We aim to emphasize the consistency between active inference and human decision-making in uncertain environments, as we have discussed in the article. Active inference emphasizes both perception and action, which is also what we wish to highlight: during the decision-making process, participants not only passively receive information, but also actively adopt different strategies to reduce uncertainty and maximize rewards.

      Reviewer #3 (Public Review):

      Summary:

      This paper aims to investigate how the human brain represents different forms of value and uncertainty that participate in active inference within a free-energy framework, in a two-stage decision task involving contextual information sampling, and choices between safe and risky rewards, which promotes a shift from exploration to exploitation. They examine neural correlates by recording EEG and comparing activity in the first vs second half of trials and between trials in which subjects did and did not sample contextual information, and perform a regression with free-energy-related regressors against data "mapped to source space." Their results show effects in various regions, which they take to indicate that the brain does perform this task through the theorised active inference scheme.

      Strengths:

      This is an interesting two-stage paradigm that incorporates several interesting processes of learning, exploration/exploitation, and information sampling. Although scalp/brain regions showing sensitivity to the active-inference-related quantities do not necessarily suggest what role they play, it can be illuminating and useful to search for such effects as candidates for further investigation. The aims are ambitious, and methodologically it is impressive to include extensive free-energy theory, behavioural modelling, and EEG source-level analysis in one paper.

      Response: We would like to express our heartfelt thanks to you for carefully reviewing our work and offering insightful feedback. Your attention to detail and commitment to enhancing the overall quality of our work are deeply admirable. Your input has been extremely helpful in guiding us through the necessary revisions to enhance the work. We have implemented focused changes based on a majority of your comments. Nevertheless, owing to limitations such as time and resources, we have not included corresponding analyses for a few comments.

      Comment 1:

      Though I could surmise the above general aims, I could not follow the important details of what quantities were being distinguished and sought in the EEG and why. Some of this is down to theoretical complexity - the dizzying array of constructs and terms with complex interrelationships, which may simply be part and parcel of free-energy-based theories of active inference - but much of it is down to missing or ambiguous details.

      Response 1: We deeply thank you for your comments about our work’s readability. We have significantly revised the descriptions of active inference, models, research questions, etc. Focusing on active inference and the free energy principle, we have added relevant basic descriptions and unified the terminology. We have added information related to model comparison in the main text and supplementary materials. We have also presented our regression results in clearer language. Our research focused on the brain's representation of decision-making in uncertain environments, including expected free energy, the value of reducing ambiguity, the value of avoiding risk, extrinsic value, ambiguity, and risk.

      Comment 2:

      In general, an insufficient effort has been made to make the paper accessible to readers not steeped in the free energy principle and active inference. There are critical inconsistencies in key terminology; for example, the introduction states that aim 1 is to distinguish the EEG correlates of three different types of uncertainty: ambiguity, risk, and unexpected uncertainty. But the abstract instead highlights distinctions in EEG correlates between "uncertainty... and... risk" and between "expected free energy .. and ... uncertainty." There are also inconsistencies in mathematical labelling (e.g. in one place 'p(s|o)' and 'q(s)' swap their meanings from one sentence to the very next).

      Response 2: We deeply thank you for your comments about the problem of inconsistent terminology. First, we have unified the symbols and letters (P, Q, s, o, etc.) that appeared in the article and described their respective meanings more clearly. We have also revised the relevant expressions of "uncertainty" throughout the text. In our work, uncertainty refers to ambiguity and risk. Ambiguity can be reduced through continuous sampling and is referred to as uncertainty about model parameters in our work. Risk, on the other hand, is the inherent variance of the environment and cannot be reduced through sampling, which is referred to as uncertainty about hidden states in our work. In the analysis of the results, we focused on how the brain encodes the value of reducing ambiguity (Figure 8), the value of avoiding risk (Figure 6), and (the degree of) ambiguity (Figure S5) during action selection. We also analyzed how the brain encodes reducing ambiguity and avoiding risk during belief update (Figure 7).

      Comment 3:

      Some basic but important task information is missing, and makes a huge difference to how decision quantities can be decoded from EEG. For example:

      - How do the subjects press the left/right buttons - with different hands or different fingers on the same hand?

      Response 3: We deeply thank you for your comments about the missing task information. We have added the relevant content in the Methods section (Contextual two-armed bandit task and Data collection, lines 251-253):

      “Each stage was separated by a jitter ranging from 0.6 to 1.0 seconds. The entire experiment consists of a single block with a total of 120 trials. The participants are required to use any two fingers of one hand to press the buttons (left arrow and right arrow on the keyboard).”

      Comment 4:

      - Was the presentation of the Stay/cue and safe/risky options on the left/right sides counterbalanced? If not, decisions can be formed well in advance especially once a policy is in place.

      Response 4: The presentation of the Stay/cue and safe/risky options on the left/right sides was not counterbalanced. It is true that participants may have made decisions ahead of time. However, to better study the state of participants during decision-making, our choice stages consist of two parts. In the first two seconds, we ask participants to consider which option they would choose, and after these two seconds, participants are allowed to make their choice (by pressing the button).

      We have also updated the figure of the experimental procedure, shown below (we circled the time that the participants spent on making decisions).

      Author response image 6.

      Comment 5:

      - What were the actual reward distributions ("magnitude X with probability p, magnitude y with probability 1-p") in the risky option?

      Response 5: We deeply thank you for your comments about the missing task information. We have placed the relevant content in the Methods section (Contextual two-armed bandit task and Data collection, lines 188-191):

      “The actual reward distribution of the risky path in "Context 1" was [+12 (55%), +9 (25%), +6 (10%), +3 (5%), +0 (5%)] and the actual reward distribution of the risky path in "Context 2" was [+12 (5%), +9 (5%), +6 (10%), +3 (25%), +0 (55%)].”
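      A short exact calculation with these distributions shows what is at stake in each choice. It assumes, which the text does not state, that both contexts are equally likely on every trial. The safe reward (6 apples) and the cue cost (1 apple) are taken from the task instructions. Under that assumption, the risky path's context-averaged mean equals the safe path's 6 apples, both contexts have the same variance, and asking raises the expected net yield to 6.8 apples.

```python
from fractions import Fraction as F

SAFE_REWARD = 6  # the safe (left) path yields a fixed 6 apples
CUE_COST = 1     # asking the ranger costs one apple

# Reward distributions of the risky path, exactly as stated above
RISKY = {
    "Context 1": {12: F(55, 100), 9: F(25, 100), 6: F(10, 100), 3: F(5, 100), 0: F(5, 100)},
    "Context 2": {12: F(5, 100), 9: F(5, 100), 6: F(10, 100), 3: F(25, 100), 0: F(55, 100)},
}


def mean(dist):
    return sum(r * p for r, p in dist.items())


def variance(dist):
    mu = mean(dist)
    return sum(p * (r - mu) ** 2 for r, p in dist.items())


# Assumption: each context occurs with probability 1/2 on every trial
P_CTX = F(1, 2)

# Without the cue: the risky path averaged over the unknown context
risky_uninformed = sum(P_CTX * mean(d) for d in RISKY.values())

# With the cue: pick the better option in each context, then pay the cue cost
with_cue = sum(P_CTX * max(mean(d), SAFE_REWARD) for d in RISKY.values()) - CUE_COST

if __name__ == "__main__":
    for name, d in RISKY.items():
        print(name, "mean", float(mean(d)), "variance", float(variance(d)))
    print("risky without cue:", float(risky_uninformed), "| with cue (net):", float(with_cue))
```

Context 1 has a mean of 9.6 apples and Context 2 a mean of 2.4, each with variance 11.34. Under the equal-probability assumption, the value of asking comes entirely from learning the context.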

      Comment 6:

      The EEG analysis is not sufficiently detailed and motivated.

      For example,

      - why the high lower-filter cutoff of 1 Hz, and shouldn't it be acknowledged that this removes from the EEG any sustained, iteratively updated representation that evolves with learning across trials?

      Response 6: We deeply thank you for your comments about our EEG analysis. The 1 Hz high-pass filter may indeed remove some useful information. We chose a 1 Hz high-pass filter to remove most low-frequency noise and to prevent that noise from affecting our analyses. In addition, many decision-related EEG studies have applied 1 Hz high-pass filtering during preprocessing (Yau et al., 2021; Cortes et al., 2021; Wischnewski et al., 2022; Schutte et al., 2017; Mennella et al., 2020; Giustiniani et al., 2020).

      Yau, Y., Hinault, T., Taylor, M., Cisek, P., Fellows, L. K., & Dagher, A. (2021). Evidence and urgency related EEG signals during dynamic decision-making in humans. Journal of Neuroscience, 41(26), 5711-5722.

      Cortes, P. M., García-Hernández, J. P., Iribe-Burgos, F. A., Hernández-González, M., Sotelo-Tapia, C., & Guevara, M. A. (2021). Temporal division of the decision-making process: An EEG study. Brain Research, 1769, 147592.

      Wischnewski, M., & Compen, B. (2022). Effects of theta transcranial alternating current stimulation (tACS) on exploration and exploitation during uncertain decision-making. Behavioural Brain Research, 426, 113840.

      Schutte, I., Kenemans, J. L., & Schutter, D. J. (2017). Resting-state theta/beta EEG ratio is associated with reward-and punishment-related reversal learning. Cognitive, Affective, & Behavioral Neuroscience, 17, 754-763.

      Mennella, R., Vilarem, E., & Grèzes, J. (2020). Rapid approach-avoidance responses to emotional displays reflect value-based decisions: Neural evidence from an EEG study. NeuroImage, 222, 117253.

      Giustiniani, J., Nicolier, M., Teti Mayer, J., Chabin, T., Masse, C., Galmès, N., ... & Gabriel, D. (2020). Behavioral and neural arguments of motivational influence on decision making during uncertainty. Frontiers in Neuroscience, 14, 583.
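      The trade-off raised in Comment 6 can be illustrated with a deliberately simple filter. The sketch applies a first-order (RC) high-pass filter with a 1 Hz cutoff at an assumed sampling rate of 250 Hz. It then measures how much of a slow 0.1 Hz component, standing in for a slowly evolving signal, and of a 6 Hz theta-band component survives. This filter is chosen only for illustration and is not the filter used in the preprocessing described here. Steeper filters attenuate sub-1 Hz activity even more strongly.

```python
import math


def first_order_highpass(x, cutoff_hz, fs_hz):
    """Discrete first-order (RC) high-pass filter: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / fs_hz
    a = rc / (rc + dt)
    y = [0.0] * len(x)
    for n in range(1, len(x)):
        y[n] = a * (y[n - 1] + x[n] - x[n - 1])
    return y


def steady_state_gain(freq_hz, cutoff_hz=1.0, fs_hz=250.0, seconds=60.0):
    """Peak output amplitude for a unit sine, measured over the last 20 s (after transients)."""
    n = int(seconds * fs_hz)
    x = [math.sin(2.0 * math.pi * freq_hz * i / fs_hz) for i in range(n)]
    y = first_order_highpass(x, cutoff_hz, fs_hz)
    tail = y[-int(20 * fs_hz):]
    return max(abs(v) for v in tail)


if __name__ == "__main__":
    for f in (0.1, 6.0):
        print(f"{f:>4} Hz component keeps about {steady_state_gain(f):.2f} of its amplitude")
```

Even this gentle filter keeps only about 10% of the 0.1 Hz amplitude while passing theta-band activity almost unchanged. This is why activity that drifts slowly across trials is largely inaccessible after a 1 Hz high-pass.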

      Comment 7:

      - Since the EEG analysis was done using an array of free-energy-related variables in a regression, was multicollinearity checked between these variables?

      Response 7: We deeply thank you for your comments about our regression. Indeed, we had not specified our regression formula in the main text. Each regression included only a single regressor, so no multicollinearity check was needed. We have now added the relevant content in the Results section (“EEG results at source level” section, lines 337-340):

      “The linear regression was run using the "mne.stats.linear_regression" function in the MNE package (Activity ~ Regressor + Intercept). Activity is the activity amplitude of the EEG signal in source space, and Regressor is one of the regressors listed above (e.g., expected free energy, the value of reducing ambiguity, etc.).”

      Comment 8:

      - In the initial comparison of the first/second half, why just 5 clusters of electrodes, and why these particular clusters?

      Response 8: We deeply thank you for your comments about our sensor-level analysis. These five clusters (left frontal, right frontal, central, left parietal, and right parietal) are commonly analyzed scalp EEG regions, and we followed previous work that analyzed these five electrode clusters (Laufs et al., 2006; Ray & Cole, 1985; Cole & Ray, 1985). In addition, our work focuses mainly on the analysis in source space, exploring the functions of specific brain regions based on active inference models.

      Laufs, H., Holt, J. L., Elfont, R., Krams, M., Paul, J. S., Krakow, K., & Kleinschmidt, A. (2006). Where the BOLD signal goes when alpha EEG leaves. NeuroImage, 31(4), 1408-1418.

      Ray, W. J., & Cole, H. W. (1985). EEG activity during cognitive processing: influence of attentional factors. International Journal of Psychophysiology, 3(1), 43-48.

      Cole, H. W., & Ray, W. J. (1985). EEG correlates of emotional tasks related to attentional demands. International Journal of Psychophysiology, 3(1), 33-41.

      Comment 9:

      How many different variables are systematically different in the first vs second half, and how do you rule out less interesting time-on-task effects such as engagement or alertness? In what time windows are these amplitudes being measured?

      Response 9 (and the Response to Weakness 11): There were no systematic differences between the first and second halves of the trials; the only difference was the participants' experience. In the second half, participants had a better understanding of the task's reward distribution (less ambiguity). The simulation results illustrate this.

      Author response image 7.

      As shown in Figure (a), agents can only learn about the hidden state of the environment ("Context 1" (green) or "Context 2" (orange)) by choosing the "Cue" option. If agents choose the "Stay" option, they will not be able to know the hidden state of the environment (purple). The risk of agents is only related to whether they choose the "Cue" option, not to the number of rounds. Figure (b) shows the Safe-Risky choices of agents, and Figure (e) shows the agents' reward predictions for the "Risky" path in "Context 1" and "Context 2". We can see that agents update the expected reward and reduce ambiguity by sampling the "Risky" path. The ambiguity of agents is not related to the "Cue" option, but to the number of times they sample the "Risky" path (rounds).
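      The distinction drawn above can be sketched under an assumption common in active inference models: ambiguity shrinks with the number of "Risky" samples, while the spread of outcomes within a context does not. The sketch assumes that beliefs about a context's reward probabilities follow a Dirichlet distribution whose concentration counts accumulate with each sample. The function names, the flat prior of one count per outcome, and the use of expected counts (n × p) in place of sampled outcomes are illustrative assumptions, not the exact update used in our model. For a Dirichlet with total count α0, the variance of the expected reward equals the outcome variance divided by (α0 + 1). It therefore falls steadily with sampling, while the outcome variance settles near its true value (11.34 for Context 1).

```python
REWARDS = [12, 9, 6, 3, 0]                    # possible apples on the risky path
P_CONTEXT_1 = [0.55, 0.25, 0.10, 0.05, 0.05]  # reward distribution of Context 1


def dirichlet_ambiguity(counts, rewards=REWARDS):
    """Return (ambiguity, outcome_variance) for Dirichlet(counts) beliefs.

    outcome_variance: variance of the reward under the mean probabilities
                      (spread that further sampling cannot remove)
    ambiguity:        variance of the expected reward itself, i.e. uncertainty
                      about the distribution's parameters; for a Dirichlet this
                      equals outcome_variance / (alpha0 + 1)
    """
    alpha0 = sum(counts)
    mean_p = [c / alpha0 for c in counts]
    mu = sum(r * p for r, p in zip(rewards, mean_p))
    outcome_variance = sum(p * (r - mu) ** 2 for r, p in zip(rewards, mean_p))
    return outcome_variance / (alpha0 + 1), outcome_variance


def expected_counts(n_samples, probs=P_CONTEXT_1, prior=1.0):
    """Hypothetical deterministic update: prior count plus n_samples * p per outcome."""
    return [prior + n_samples * p for p in probs]


if __name__ == "__main__":
    for n in (0, 10, 50, 200):
        amb, var = dirichlet_ambiguity(expected_counts(n))
        print(f"samples={n:4d}  ambiguity={amb:.3f}  outcome variance={var:.2f}")
```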

      In our choice stages, participants were required to think about their choices for the first two seconds (during which they could not press buttons) and then to make their choices (press buttons) within the next two seconds. This setup effectively kept participants' attention focused on the task. For the sensor-level analysis, we measured the two-second window of the “Second choice” stage during which participants decided which option to choose (and could not yet press buttons).

      Comment 10:

      In the comparison of asked and not-asked trials, what trial stage and time window is being measured?

      Response 10: We have added the relevant descriptions to the main text. The sensor-level analysis uses the two-second window of the “Second choice” stage, during which participants decide which option to choose but cannot yet press a button.

      Author response image 8.

      Comment 11:

      Again, how many different variables, of the many estimated per trial in the active inference model, are different in the asked and not-asked trials, and how can you know which of these differences is the one reflected in the EEG effects?

      Response 11: The difference between asked and not-asked trials lies only in whether participants know the specific context of the risky path (i.e., the level of risk they face). A simple comparison indeed cannot tell us which of these differences is reflected in the EEG effects; we therefore conducted a model-based regression analysis in source space.

      Comment 12:

      The authors choose to interpret that on not-asked trials the subjects are more uncertain because the cue doesn't give them the context, but you could equally argue that they don't ask because they are more certain of the possible hidden states.

      Response 12: In our task design, the context of the risky path varies randomly from trial to trial, and participants can learn the context only by choosing to ask. Over trials, participants can become increasingly certain about the reward distribution of each context of the risky path, but without asking they cannot determine which context applies on a given trial. The instructions given to participants are as follows (lines 226-231):

      "You are on a quest for apples in a forest, beginning with 5 apples. You encounter two paths: 1) The left path offers a fixed yield of 6 apples per excursion. 2) The right path offers a probabilistic reward of 0/3/6/9/12 apples, and it has two distinct contexts, labeled "Context 1" and "Context 2," each with a different reward distribution. Note that the context associated with the right path will randomly change in each trial. Before selecting a path, a ranger will provide information about the context of the right path ("Context 1" or "Context 2") in exchange for an apple. The more apples you collect, the greater your monetary reward will be."

      Comment 13:

      - The EEG regressors are not fully explained. For example, an "active learning" regressor is listed as one of the 4 at the beginning of section 3.3, but it is the first mention of this term in the paper and the term does not arise once in the methods.

      Response 13: We have revised the relevant content in the main text accordingly (see Eq. 8). Our regressors now include expected free energy, the value of reducing ambiguity, the value of avoiding risk, extrinsic value, prediction error, (the degree of) ambiguity, reducing ambiguity, and avoiding risk.

      Comment 14:

      - In general, it is not clear how one can know that the EEG results reflect that the brain is purposefully encoding these very parameters while implementing this very mechanism, and not other, possibly simpler, factors that correlate with them since there is no engagement with such potential confounds or alternative models. For example, a model-free reinforcement learning model is fit to behaviour for comparison. Why not the EEG?

      Response 14: We thank you for this comment. Because of time constraints, and because the active inference model fit the participants' behavioral data best, we did not use other models to analyze the EEG data. At both the sensor and source levels, we observed EEG signals and brain regions that encode different forms of uncertainty (risk and ambiguity). This uncertainty-driven exploration mechanism cannot be explained solely by a simple model-free reinforcement learning approach.

      Recommendations for the authors:

      Response: We have made point-by-point revisions according to the reviewers' recommendations. Because most of these revisions are minor, we respond here only to the longer recommendations.

      Reviewer #1 (Recommendations For The Authors)

      I enjoyed reading this sophisticated study of decision-making. I thought your implementation of active inference and the subsequent fitting to choice behaviour, and the study of its neuronal (EEG) correlates, were impressive. As noted in my comments on strengths and weaknesses, some parts of your manuscript were difficult to read because of slight lapses in grammar and an inconsistent use of terms when referring to the mathematical quantities. In addition to the paragraphs I have suggested, I would recommend the following minor revisions to your text. You will also have to fill in some of the details that are missing from the current version of the manuscript. For example:

      Recommendation 1:

      Which RL model did you use to fit the behavioural data? What were its free parameters?

      Response 1: We have now added information on the comparison models to the behavioral results and the supplementary materials. We fitted both a simple model-free and a model-based reinforcement learning model. The free parameters of the model-free model are the learning rate α and the temperature parameter γ, while the free parameters of the model-based model are the learning rate α, the temperature parameter γ, and the prior.
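      For readers who want the comparison model spelled out, below is a minimal sketch of a standard model-free learner with the two free parameters named above: a delta-rule update with learning rate α and a softmax choice rule. Treating γ as an inverse temperature (larger γ, more deterministic choices) is our assumption, as are the function names; the model-based variant with a prior is not shown.

```python
import numpy as np


def softmax(values, gamma):
    """Choice probabilities; gamma is treated as an inverse temperature (assumption)."""
    z = gamma * np.asarray(values, dtype=float)
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def delta_rule(q, action, reward, alpha):
    """Model-free update: move Q(action) toward the obtained reward by a fraction alpha."""
    q = np.array(q, dtype=float)
    q[action] += alpha * (reward - q[action])
    return q


def negative_log_likelihood(alpha, gamma, choices, rewards, n_actions=2):
    """Summed -log P(choice) over trials; minimise this to fit alpha and gamma."""
    q = np.zeros(n_actions)
    nll = 0.0
    for action, reward in zip(choices, rewards):
        p = softmax(q, gamma)
        nll -= np.log(p[action] + 1e-12)  # small constant guards against log(0)
        q = delta_rule(q, action, reward, alpha)
    return nll
```

      The negative log-likelihood could be minimised over (α, γ) with a generic optimiser such as `scipy.optimize.minimize`, with α bounded to [0, 1] and γ kept positive.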

      Recommendation 2:

      When you talk about neuronal activity in the final analyses (of time-dependent correlations) what was used to measure the neuronal activity? Was this global power over frequencies? Was it at a particular frequency band? Was it the maximum amplitude within some small window et cetera? In other words, you need to provide the details of your analysis that would enable somebody to reproduce your study at a certain level of detail.

      Response 2: In the final analyses, we used the activity amplitude at each point in source space. We had already planned to make our data and models available on GitHub to make replication easier.

      Reviewer #3 (Recommendations For The Authors)

      Recommendation 1:

      It might help to explain the complex concepts up front using the concrete example of the task itself; presumably, it was designed so that the crucial elements of the active inference framework come to the fore. One could use hypothetical choice patterns in this task to exemplify different factors such as expected free energy and unexpected uncertainty at work. It would also be illuminating to explain why behaviour on this task is fit better by the active inference model than by a model-free reinforcement learning model.

      Response 1: Thank you for your suggestions. We have given clearer explanations of the three terms in the active inference formula (Eq. 8): the value of reducing ambiguity, the value of avoiding risk, and the extrinsic value. This makes active inference easier for readers to understand.
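      To make these terms more concrete, the sketch below computes the textbook risk-plus-ambiguity decomposition of expected free energy for a single time step in a discrete-state model (Friston et al., 2017). Mapping its outputs onto the paper's value of reducing ambiguity, value of avoiding risk and extrinsic value (Eq. 8) is an assumption: the manuscript may use opposite signs (values rather than costs), group the terms differently, or add a parameter-learning term. The function and argument names are hypothetical.

```python
import numpy as np

EPS = 1e-16  # avoids log(0)


def expected_free_energy(qs, A, log_C):
    """Risk + ambiguity decomposition of expected free energy for one time step.

    qs    : Q(s | pi), predicted hidden-state distribution, shape (n_states,)
    A     : P(o | s), likelihood matrix, shape (n_obs, n_states), columns sum to 1
    log_C : ln P(o), normalised log prior preferences over outcomes, shape (n_obs,)
    """
    qs = np.asarray(qs, dtype=float)
    A = np.asarray(A, dtype=float)
    log_C = np.asarray(log_C, dtype=float)
    qo = A @ qs                                      # predicted outcomes Q(o | pi)
    extrinsic = float(qo @ log_C)                    # expected log preference
    risk = float(qo @ np.log(qo + EPS)) - extrinsic  # KL[Q(o | pi) || P(o)]
    entropy_per_state = -np.sum(A * np.log(A + EPS), axis=0)  # H[P(o | s)]
    ambiguity = float(entropy_per_state @ qs)        # expected outcome entropy
    return {"G": risk + ambiguity, "risk": risk,
            "ambiguity": ambiguity, "extrinsic": extrinsic}
```

      With an identity likelihood (outcomes reveal the state exactly) the ambiguity term is zero, and with a uniform likelihood it equals the log of the number of outcomes, which is the intuition behind the "Cue" option.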

      In addition, readers can simply view active inference as a computational model similar to model-based reinforcement learning, in which the expected free energy represents a subjective value, without needing to understand its underlying computational principles or neurobiological background. In the Discussion, we argue that the active inference model fits the participants' behavior better than our reinforcement learning models because it has an inherent exploration mechanism consistent with human behavior: people instinctively seek to reduce environmental uncertainty (lines 435-442):

      “Active inference offers a superior exploration mechanism compared with basic model-free reinforcement learning (Figure 4 (c)). Since traditional reinforcement learning models determine their policies solely from the state, this setting makes it difficult to extract temporal information (Laskin et al., 2020) and increases the likelihood of becoming trapped in local minima. In contrast, the policies in active inference are determined by both time and state. This dependence on time (Wang et al., 2016) enables policies to adapt efficiently, such as emphasizing exploration in the initial stages and exploitation later on. Moreover, this mechanism prompts more exploratory behavior in instances of state ambiguity. A further advantage of active inference lies in its adaptability to different task environments (Friston et al., 2017). It can configure different generative models to address distinct tasks, and compute varied forms of free energy and expected free energy.”

      Laskin, M., Lee, K., Stooke, A., Pinto, L., Abbeel, P., & Srinivas, A. (2020). Reinforcement learning with augmented data. Advances in Neural Information Processing Systems, 33, 19884-19895.

      Wang, J. X., Kurth-Nelson, Z., Tirumala, D., Soyer, H., Leibo, J. Z., Munos, R., ... & Botvinick, M. (2016). Learning to reinforcement learn. arXiv preprint arXiv:1611.05763.

      Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., & Pezzulo, G. (2017). Active inference: a process theory. Neural computation, 29(1), 1-49.

      Recommendation 2:

      Figure 1A provides a key example of the lack of effort to help the reader understand. It suggests the possibility of a concrete example but falls short of providing one. From the caption and text, applied to the figure, I gather that by choosing either to run or to raise one's arms, one can control whether it is daytime or nighttime. This is clearly wrong but it is what I am led to think by the paper.

      Response 2: Thank you for this suggestion, which we had not considered before. In this figure, we aim to illustrate that "the agent receives observations and optimizes his cognitive model by minimizing variational free energy → the agent makes the optimal action by minimizing expected free energy → the action changes the environment → the environment generates new observations for the agent." We have now simplified the image to prevent confusion. Specifically, we removed the figure of a person raising their hand and the shadowed house from Figure 1A.

      Author response image 9.

      Recommendation 3:

      I recommend an overhaul in the labelling and methodological explanations for consistency and full reporting. For example, line 73 says sensory input is 's' and the cognitive model is 'q(s),' and the cause of the sensory input is 'p(s|o)' but on the very next line, the cognitive model is 'p(s|o)' and the causes of sensory input are 'q(s).' How this sensory input s relates to 'observations' or 'o' is unclear, and meanwhile, capital S is the set of environmental states. P seems to refer to the generative distribution, but it also means probability.

      Response 3: Thank you for your advice. We have now revised the labeling and methodological explanations throughout our work to make them consistent. However, we are unsure how best to modify P here: in many works, P denotes either a probability distribution or specific probabilities.

      Recommendation 4:

      Even the conception of a "policy" is unclear (Figure 2B). They list 4 possible policies, which are simply the 4 possible sequences of steps, stay-safe, cue-risky, etc., but with no contingencies in them. Surely a complete policy that lists 'cue' as the first step would entail a specification of how they would choose the safe or risky option BASED on the information in that cue.

      Response 4: Thank you for your suggestion. In active inference, a policy is a sequence of actions. A contingent plan such as "first choose 'Cue', then make the next decision based on the information received" therefore differs from the meaning of a policy in active inference.

      Recommendation 5:

      I assume that the heavy high-pass filtering of the EEG (1 Hz) is to avoid having to baseline-correct the epochs (of which there is no mention), but the authors should directly acknowledge that this eradicates any component of decision formation that evolves gradually within or across the stages of the trial. To take an extreme example, as Figure 3E shows, the expected rewards for the risky path evolve slowly over the course of 60 trials. The filter would eliminate this.

      Response 5: Thank you for your suggestion. The 1 Hz high-pass filter was applied to minimize noise in the EEG data as much as possible.
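      The reviewer's point about slow components can be illustrated with an ideal (brick-wall) high-pass filter applied to a synthetic segment: everything below the 1 Hz cutoff, including a constant offset and a 0.25 Hz drift, is removed, while 10 Hz activity passes unchanged. The sampling rate, segment length and signal components are hypothetical, and real pipelines such as MNE's `Raw.filter(l_freq=1.0, h_freq=None)` use FIR or IIR designs with a transition band rather than this idealised filter.

```python
import numpy as np


def ideal_highpass(signal, sfreq, cutoff_hz=1.0):
    """Zero every Fourier component below cutoff_hz (brick-wall filter, illustration only)."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sfreq)
    spectrum[freqs < cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=signal.size)


sfreq = 250.0                             # hypothetical sampling rate (Hz)
t = np.arange(int(4 * sfreq)) / sfreq     # a 4 s segment, 1000 samples
slow = 2.0 + np.sin(2 * np.pi * 0.25 * t) # offset plus a 0.25 Hz drift
fast = 0.5 * np.sin(2 * np.pi * 10.0 * t) # 10 Hz activity
filtered = ideal_highpass(slow + fast, sfreq)
```

      Both test frequencies fall exactly on Fourier bins of the 4 s segment, so `filtered` matches `fast` to floating-point precision; any trial-to-trial trend far slower than 1 Hz is removed in the same way.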

      Recommendation 6:

      There is no mention of the regression itself in the Methods section - the section is incomplete.

      Response 6: Thank you for your suggestion. We have now added the relevant content to the Results section (EEG results at the source level, lines 337-340):

      “The linear regression was run with the "mne.stats.linear_regression" function in the MNE package (Activity ∼ Regressor + Intercept, where Activity is the amplitude of the EEG signal in source space and Regressor is one of the regressors mentioned above).”
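      The model Activity ∼ Regressor + Intercept can be sketched in plain NumPy as an ordinary least-squares fit repeated independently at every source point and time sample. `mne.stats.linear_regression` itself takes Epochs or an iterable of SourceEstimate objects together with a design matrix that includes an explicit intercept column; the function below is a hypothetical, MNE-free equivalent for an array of source amplitudes, with array shapes chosen for illustration.

```python
import numpy as np


def mass_univariate_ols(activity, regressor):
    """Fit Activity ~ Regressor + Intercept at every source point and time sample.

    activity  : array (n_trials, n_sources, n_times) of source amplitudes
    regressor : array (n_trials,), one model-derived value per trial
    Returns (intercept, slope, t_slope), each shaped (n_sources, n_times).
    """
    activity = np.asarray(activity, dtype=float)
    n_trials = activity.shape[0]
    X = np.column_stack([np.ones(n_trials), np.asarray(regressor, dtype=float)])
    Y = activity.reshape(n_trials, -1)               # trials x (sources * times)
    beta, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)  # shape (2, sources * times)
    resid = Y - X @ beta
    dof = n_trials - X.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / dof          # residual variance per point
    xtx_inv = np.linalg.inv(X.T @ X)
    se_slope = np.sqrt(sigma2 * xtx_inv[1, 1])       # standard error of the slope
    shape = activity.shape[1:]
    return (beta[0].reshape(shape), beta[1].reshape(shape),
            (beta[1] / se_slope).reshape(shape))
```

      Fitting all points in one `lstsq` call is equivalent to looping over them, because the design matrix is shared across source points and time samples.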

      Recommendation 7:

      On Lines 260-270 the same results are given twice.

      Response 7: Thank you for your suggestion. We have now deleted redundant content.

      Recommendation 8:

      Frequency bands are displayed in Figure 5, but there is no mention of them in the Methods. In Figure 5b, Theta in the 2nd half is compared to Delta in the 1st half. Is this an error?

      Response 8: Thank you for your suggestion. This was indeed an error (all panels should show Theta), and we have now corrected it.

      Author response image 10.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Major points:

      R1C1: I appreciate that the data are aligned, in some points, with related studies of this niche. However, it would help the reader to have this alignment explored more extensively in the Discussion as well.

      Answer: We acknowledge that the discussion would benefit from additional comparisons with the available datasets. We have thus added the following comment after the first paragraph of the Discussion: “Previous studies of the different sub-populations of SVZ progenitors were carried out using transcriptomic approaches based on the expression of various more or less specific markers. These approaches have made it possible to identify quiescent and activated neural stem cells as well as mature neuroblasts, but have been confounded by the strong influence of the cell cycle on cell clustering. Indeed, cycling neural progenitors in these studies have been grouped into either “mitotic” clusters (Llorens et al. 2015, Zywitza et al. 2018, Cebrian et al. 2021) or “neural progenitor cells” clusters (Dulken et al. 2017), which had no clear biological significance and hindered the identification of subtypes of SVZ cycling progenitors. Our study, combining for the first time the characterization of FACS-isolated cells and an irradiation-based model of sequential regeneration, allowed us to clearly distinguish the molecular profiles of TAP and iNB among cycling progenitors, reflecting differences in their respective in vitro and in vivo potentials”.

      R1C2: The data on multilineage differentiation, both in culture and upon engraftment, would be greatly strengthened by quantification. What is the relative yield of TUJ1/DCX-positive cells versus the other marker combinations? Specifically regarding the multilineage differentiation in vitro - because different media conditions are used to generate each lineage, it may be difficult to determine relative yield. Could a differentiation system that allows production of all 3 lineages be used instead?

      If the fraction of non-DCX/TUJ1-labeled progeny is low, particularly in vivo, this might suggest that while multilineage differentiation is possible, it is a much less likely cellular state outcome than production of mature neuroblasts. Some suggested references with examples of the culture conditions, experimental conditions, and discussions highlighted in the public review: Culture conditions that allow simultaneous trilineage differentiation. PMID: 17615304 Influence of culture conditions on potency: similar to issues covered in PMID: 21549325.

      Answer: We agree with the reviewer that quantification of multilineage differentiation in vitro would improve the characterization of the relative potencies of the different SVZ progenitors.

      According to PMID: 17615304 and PMID: 21549325, and in agreement with our own experience, the only culture condition that allows neurosphere-derived neural progenitors to differentiate in vitro into the three lineages is the removal of mitogens from the culture medium. However, this does not work on freshly isolated SVZ cells, which remain in an undifferentiated state in this condition.

      This is why we chose to use specific differentiation media for each of the 3 lineages, as in Figure 1C. It is also for this reason that we performed as many experiments as possible in vivo rather than in vitro, as in Figure S2. In the revised version, we have added a quantitative analysis of GFAP, CNPase, and DCX immunostaining of GFP-positive cells persisting at the IS, where a high number of grafted cells was found (Figure S2B). We used the NIS software to measure eGFP-, GFAP-, CNPase-, and DCX-positive areas, and then expressed the intersection between each marker area and the eGFP area as a percentage of staining (Figure S2C). The results showed that approximately one third of GFP+ cells expressed GFAP or DCX. The quantitative analysis of CNPase expression was complicated by CNPase-positive host cells, but the stronger CNPase staining in eGFP-positive areas clearly revealed the expression of CNPase by a significant proportion of eGFP-positive cells.
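      The overlap measure described above (marker-positive area within the eGFP-positive area, as a percentage) can be sketched with boolean pixel masks. The threshold-based segmentation and the function name are assumptions; the NIS software settings used by the authors are not reported here. This is a pixel-area fraction, which approximates but is not identical to a fraction of cells.

```python
import numpy as np


def percent_marker_in_gfp(gfp_image, marker_image, gfp_threshold, marker_threshold):
    """Percentage of the eGFP-positive area that is also marker-positive.

    Thresholds are hypothetical; the segmentation settings used in the study are not given.
    """
    gfp = np.asarray(gfp_image) > gfp_threshold
    marker = np.asarray(marker_image) > marker_threshold
    gfp_area = gfp.sum()
    if gfp_area == 0:
        return float("nan")  # no eGFP-positive pixels: percentage undefined
    return 100.0 * np.logical_and(gfp, marker).sum() / gfp_area
```

      Host cells that express the marker outside the eGFP mask do not enter the numerator, but marker-positive host tissue overlapping eGFP-positive areas would inflate it, which is the difficulty noted for CNPase.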

      R1C3: Additionally, for claims similar to what is currently made in the text, it would be extremely valuable to confirm the purity of the sort for each population - for example by fixing and staining the sorted fraction with additional antibodies that confirm cell identity.

      Answer: We have previously shown (Daynac et al. 2013) that s-iNB express the neuroblast markers CD24 and DCX, but also markers of neural progenitors such as Mash1, a basic helix-loop-helix transcription factor. As suggested by the reviewer, we have further investigated the expression of other neural progenitor markers by sorted cells. The proportion of cells positive for DLX2, a marker of proliferating progenitors (Doetsch et al. 2002), was very high in aNSC/TAP (98%) and progressively decreased in iNB (82%) and mNB (25%). Similarly, SOX2, a transcription factor that plays an essential role in the maintenance of neural progenitors (PMID: 25126380), was expressed by 78% of aNSC/TAP, 70% of iNB, and 17% of mNB.

      Altogether, these new data confirm the identity of the different cell populations, particularly that of iNB. They are described at the beginning of the Results and shown in Figure S1.

      R1C4: Line 125: GFAP alone doesn't necessarily indicate a "conversion to NSCs" - this conclusion could be greatly strengthened by inclusion of more markers, particularly at the protein level, or cyto-architectural studies.

      Answer: We agree with the reviewer that GFAP expression alone is not sufficient to demonstrate the presence of NSC in the SVZ. We have thus modified the text accordingly: “Importantly, eGFP+ cells were present in the SVZ of all the animals transplanted with eGFP+s-iNB and eGFP+s-NSC/TAP (Fig. 1Db, Fig. 1Dc), some of them expressing GFAP indicating the generation of astrocytes, and therefore possibly NSC”.

      R1C5: Could these cellular states be reflective of preferential translation of DCX? It would be very helpful to see the flow cytometry sort data for iNBs / mNBs used in Figure 6, particularly if these cells were also fixed and stained directly for DCX protein.

      Answer: As suggested by the reviewer, freshly FAC-sorted iNB and mNB were fixed, permeabilized, and labelled with an anti-DCX monoclonal antibody. As shown in the figure below, DCX expression was higher in mNB than in iNB. This result therefore suggests that proliferative capacity is related to the level of DCX expression. However, because of the relatively minor importance of this result, we decided not to include it in the manuscript.

      Author response image 1.

      Modal histogram representation of DCX expression level in unstained, iNB and mNB cells determined by flow cytometry (FlowJo).

      R1C6: Figure S8 is all zeroes, showing the GFP+Dcxhigh NBs do not retain proliferative capacity. But we don't get a direct experimental comparison to EGFPnegative/lowDcxlow iNB engraftment, which would strengthen the conclusions of the paper.

      Answer: Unfortunately, no method is available to analyse the engraftment of eGFPnegative/lowDcxlow iNB: by definition, these cells do not express eGFP, and a cell tracker is not suitable over the long periods after engraftment, which involve a high number of cell divisions. However, we do not think this control is needed to conclude that GFP+Dcxhigh iNB have no (or at least a lower) stem cell potential in vivo, since we have shown in Figure 1 and Table 1 that the whole iNB population is able to generate the different types of neural cells.

      R1C7: Transplant data in Table 1 - a relatively small proportion of transplant derived cells are in OB, etc. Given that A cells are thought to cycle at least once in vivo, is this expected?

      Answer: The reviewer is right that a relatively small proportion of transplant-derived cells was found in the OB. However, we used immunocompetent mice as recipients, which could have significantly reduced the engraftment efficiency and the migration of engrafted cells away from the injection site.

      R1C8: A caveat is that there is not much functional testing of the proposed model, especially for the interconversion of iNB states suggested by the diagram in Figure 7. The text is relatively restrained in proposing this model, so it is reasonable to keep - but perhaps should be noted that this part of the model will need additional testing.

      Answer: Data presented in Figure 6 clearly suggest that Dcxhigh iNB have in vitro potential similar to that of Dcxlow iNB, whereas they do not have such potential in vivo (Figure S10). This suggests that, provided they are in appropriate conditions, Dcxhigh iNB could reacquire stem/progenitor properties. However, we agree that this hypothesis requires further investigation. Therefore, as suggested by the reviewer, we have added to the Figure 7 legend: “Possible interconversion of iNB states would require further experimental confirmation.”

      Additional minor points:

      R1C9: Introduction: the SVZ is described as "the lateral wall"; however, several works in the mouse have also examined the medial wall and callosal roof, as cited later in the introduction. I suggest rephrasing the second sentence (line 48) and a later sentence (line 66) to clarify that "the SVZ" encompasses all of these subregions, which are not necessarily separate niches.

      Answer: As indicated by the reviewer, the SVZ encompasses distinct subdomains, with NSCs having a regional identity based on their location in the lateral or septal wall of the ventricle and generating different types of neuronal and glial progeny (PMID: 34259628). To address the reviewer's concern about possible confusion and to indicate clearly that the SVZ encompasses several subdomains, we have modified the sentence on line 66 as follows: “Since then, single-cell RNA sequencing has revolutionized the field and has made it possible to precisely elucidate the transcriptome of SVZ cells present in the LW and in the septal wall, which also harbors NSC niches”.

      However, we did not modify line 48, since this sentence only states that the largest neurogenic niche in the adult brain resides in the LW of the SVZ.

      R1C10: Line 77: "exposure" not "exposition"

      Answer: The error has been corrected in the revised manuscript.

      R1C11: As noted in the Public Review - the use of the term "D1/D2" cells seems likely to confuse readers who are also versed in dentate gyrus neurogenesis. Recommend removing this term from the manuscript.

      Answer: We agree that the D1/D2 terminology could cause confusion, since D cells also refer to tanycytes in the hypothalamus. In the revised manuscript, we now use iNB1 for DcxLow iNB and iNB2 for DcxHigh iNB.

      Reviewer 2

      Major comments:

      Lack of rigor

      R2C1: There is a lack of appropriate normalization controls for the microarray data. As there is a decreased level of transcription in quiescent NSCs, there needs to be a cell number control (spike-ins based on cell numbers). Without this normalization, the readout can be greatly skewed.

      Answer: We agree that qNSC are characterized by a lower level of transcription due to quiescence. To address this in the Clariom assays, we calibrated each population with a fixed amount of cRNA and cDNA, using HeLa cells as an internal control. We agree that this method is not optimal, but it appears to be effective in practice. Notably, the same approach, with the same level of rigor, has been used in other microarray studies in the field (PMID: 24811379) and on skeletal muscle cells (PMID: 29273087). Moreover, the transcriptomic signature of qNSC closely matches those reported in other studies, particularly those of the related clusters in single-cell experiments (including ours, Figure S5). This is probably because, more than the number of cells, the main characteristic of these cells is the lack of expression of genes involved in cell proliferation and metabolism. In any case, these data, which confirm previously published results, are not the main point of our manuscript, which is mainly dedicated to the characterization of proliferating cells; this characterization is not affected by our choice of normalization.

      R2C2: The absolute segregation of clusters in the single-cell analysis is currently entirely in agreement with the cell cycle stage. This suggests that in the author's analysis, the clustering in 3F is entirely shaped by the cell cycle, making that the defining characteristic of the author's definitions for their cell types. Has an analysis been done that regresses out cell cycle-associated genes to see if there are clusters for different cell states/types that are identified in the absence of cell cycle stage being the defining factor? (Barron and Li, 2016). For example, just as you would see a difference in cluster if you are a quiescent or activated NSC as compared to a neuroblast for example, even without the contribution of cell cycle. These are different cell types.

      Answer: We agree that cell cycle regression would theoretically allow further discrimination between cycling cells along successive neurogenic stages. We have already tried several regression methods, including regression on S- and G2/M-phase scores as described in the Seurat workflow, removal of cell cycle-related principal components before UMAP computation as in the Cebrian-Sylla study, and the use of alternative gene sets such as those provided by the tricycle method (PMID: 35101061). We applied all of these methods to our datasets, to the original Cebrian-Sylla datasets, and to a combination of both to increase cell number and clustering resolution. However, none of them changed the clustering of cycling cells.
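      In the Seurat workflow mentioned above, cell-cycle regression is typically done by computing S and G2/M scores with `CellCycleScoring` and passing them to `ScaleData(vars.to.regress = c("S.Score", "G2M.Score"))`; scanpy offers the analogous `sc.pp.regress_out`. The core operation, fitting each gene on the scores and keeping the residuals, can be sketched in NumPy as follows; the function name and input layout are hypothetical, and Seurat additionally scales the residuals.

```python
import numpy as np


def regress_out(expr, covariates):
    """Residuals of each gene after a linear fit on cell-cycle scores.

    expr       : (n_cells, n_genes) normalised (e.g. log) expression
    covariates : (n_cells, k), e.g. columns for S and G2/M scores
    """
    expr = np.asarray(expr, dtype=float)
    cov = np.asarray(covariates, dtype=float).reshape(expr.shape[0], -1)
    X = np.column_stack([np.ones(expr.shape[0]), cov])  # intercept + scores
    beta, _, _, _ = np.linalg.lstsq(X, expr, rcond=None)
    return expr - X @ beta  # variation not linearly explained by the scores
```

      Genes driven entirely by the scores end up near zero, while variation orthogonal to the scores is left untouched; if cluster identity itself co-varies with cycling, as argued in the answer, such regression cannot separate the two.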

      In fact, the strong influence of the cell cycle over clustering highlights the relevance of our depletion/replenishment approaches to decipher the molecular changes masked by the cell cycle, as discussed below.

      R2C3: The use of the DCX-CreERT2 line is a lineage tracing line. Once DCX is expressed, Cre recombines the DNA to allow for fluorescence. It is binary, on or off associated with DCX expression. And once on, it is always on, whether the cell is currently expressing DCX or not. As the authors had previously described a DCXlow condition, the eGFP- cells would not reflect DCXlow, but no DCX at all. And the eGFP+ cells may not be currently expressing DCX anymore. The authors should have used a system where the DCX promoter itself drives fluorescence.

      Answer: We took advantage of the DCX-CreERT2 line to demonstrate that some neural cells that have recently acquired DCX expression (i.e. eGFP+ iNB) could keep (or recover) the potential of neural progenitors in vitro. Of course, some of these GFP+ cells could have stopped expressing DCX. This is probably the case when they differentiate into astrocytes and oligodendrocytes in vitro, as shown in Figure 6.

      In any case, using the Dcx promoter to drive eGFP fluorescence directly would have prevented us from demonstrating such changes in cell fate in vivo, because oligodendrocytes or astrocytes derived from iNB lose Dcx expression and could therefore not have been tracked.

      R2C4: The lack of analysis of images (differentiation, for example) limits the conclusions of the in-vitro data, and the images with unclear staining, limit the conclusions of the in-vivo experiments.

      Answer: This comment is similar to that of R1C2. We have now added a quantification in Figure S2.

      R2C5: The cited splicing differences between cell types were interesting (though they did not show up in the transcriptome enrichment analyses, Fig S2) and would be worth pursuing further; however, this was a very limited analysis. There was no further study of these splicing mediators beyond the single-cell data.

      Answer: We now show enrichments of GO terms corresponding to mRNA splicing isoforms in the different types of sorted SVZ cells (Figure S4). This analysis clearly revealed that spliced genes in SVZ cells are mainly involved in neuron development and neurogenesis. Interestingly, it also showed that qNSC differed from the other cell types in the splicing of genes involved in mitosis and the cell cycle, consistent with their quiescent state. More importantly, GO annotations of differentially spliced isoforms further confirmed that s-TAP and s-iNB have distinct features. We agree with the reviewer that further analysis of splicing mediators would be very important for understanding the molecular changes involved in neurogenesis. However, we think that it is largely beyond the scope of this study.

      R2C6: Fig 1C - Show values, not just pictures. You may need to shift your current differentiation paradigm to do so by removing growth factors instead of unique differentiation conditions.

      Answer: See the answer to R1C2.

      R2C7: Fig S1A - Stainings for GFAP and DCX are not clear. It is very hard to distinguish which cells are associated with these signals.

      Answer: This figure (now Figure S2A) shows an eGFP+iNB cell (white arrow) that has reached the rostral migratory stream and expressed DCX (inset a3), but not GFAP (inset a2). This is now indicated in the figure legend. We have also moved the arrow for more clarity.

      R2C8: Fig S1B2 - There is red staining everywhere, so it is very hard to see a specific CNPase signal.

      Answer: We have added a new figure (Fig S2B) distinguishing eGFP+CNPase+ cells (yellow arrows) from eGFP+CNPase- cells (white arrow).

      R2C9: Line 174 - It is the mRNA that you are detecting as downregulated; be more specific, as you are not showing protein downregulation.

      Answer: We added "encoding" to the text at Line 174 to make clear that we refer to the mRNA: "Interestingly, Ptbp1, encoding a major splicing repressor".

      R2C10: Line 189 - The text in this line mentions clusters that are not shown in the figure (clusters 6 and 15, DCX+ Ki67+ neuroblasts), which would be important to visualize. As currently shown, the authors only demonstrate that iNBs are similar to mitotic TAPs.

      Answer: Clusters 6 and 15 have been added to Figure S5.

      R2C11: Fig 3D-E - Why is cluster 17 called aNSCs (3E) when it has the highest GFAP expression (Fig 3D)? Typically, the cells with the highest GFAP are qNSCs or astrocytes, not aNSCs.

      Answer: We previously reported that the level of gfap mRNA expression in neural stem cells (quiescent and activated) did not exactly reflect the amount of protein in these cells. This is the reason why we also used the Slc1a3 marker (Glast), which is highly expressed both at the RNA and protein levels in quiescent NSCs (Daynac et al. 2013).

      R2C12: Line 216 - You state in line 216 that cluster 13 consists of astrocytes, then in line 227 that cluster 13 is s-qNSC. Which is it?

      Answer: This is because we performed two distinct analyses.

      In the first one (line 216), cells were scored based on datasets provided by Cebrian et al., one containing genes enriched in astrocytes and another containing genes enriched in quiescent B-cells. Cluster 13 was thereby shown to contain 73% cells expressing astrocyte markers, whereas cluster 4 gathered cells expressing both qNSC (B-cells, 48%) and astrocyte (52%) genes.

      In the second one (line 227), cells were scored using our transcriptomic signatures of FAC-sorted SVZ cells, which do not include differentiated astrocytes. We demonstrated that the cluster 13 cells only expressed s-qNSC genes.

      R2C13: Line 214 - While all the other clusters further discussed in lines 227-230 were named in lines 214-221, clusters 15 and 19 were not. You associate both of those clusters with s-iNB; what were they associated with in the section above?

      Answer: Lines 219-221 have been reworded as follows: "Clusters 10, 5, 15, 12, and 8 were defined as cycling progenitors based on the expression of proliferative markers such as Top2a, Mki67 and Ascl1. Clusters 1, 3, 7 and 9 were identified as mNB due to the loss of Mki67, Top2a and Ascl1 expression and the expression of Robo2 and Dcx. Cluster 19, which has lost Ascl1 but still expresses Top2a and Mki67 together with Robo2 and Dcx, appears at the transition between iNB and mNB."

      R2C14: Fig 3I-J - 5 days after irradiation, I would like to see from tissue slices how many cells are dividing compared to 1 day post-irradiation and controls. In other paradigms, such as temozolomide experiments (Kalamakis et al), by 5 days we should see fewer cells in quiescence and more of those quiescent cells exiting quiescence into the cell cycle. Why would there be more cells in quiescence in the irradiated brain? Even if they are radiation resistant, the base number should be comparable between controls and irradiated, which is not what you show in Fig 3I-J.

      Lines 234-235 - The text says the data are normalized to the number of qNSCs, which is supposed to be the same (and I agree it should be). However, your graphs in 3I and J show more qNSCs in irradiated conditions, which would strongly influence the results and is currently hard to interpret.

      Answer: As stated by the reviewer, there is no increase in the absolute number of quiescent cells in the irradiated SVZ. The reconstitution of SVZ cell populations after 4 Gy irradiation has already been studied by our group (Daynac et al. 2013, see Fig. 3F), showing that s-iNB and s-mNB are still under-represented after 5 days, while qNSC are present in similar numbers as in the unirradiated SVZ. This leads to a relative over-representation of quiescent cells and early SVZ progenitors in Figure 3J compared with Figure 3I.

      R2C15: Fig 6A - The authors show a significant difference in neurospheres between eGFP- (DCX-) and eGFP+ (DCX+) iNBs, as would be expected since DCX suggests a further commitment towards neurogenic fates, yet the population doublings are the same.

      Answer: To determine the population doublings, the medium was changed and cells were counted every 7 days. This condition masked the differences between two cell populations reaching the plateau phase at different times, explaining why eGFP-iNB and eGFP+iNB could not be clearly distinguished by this technique.

      R2C16: Fig 6C - Differentiation data (in-vitro) should be quantified in 6C, just as was mentioned for 1C. These values should be done for both of the populations (eGFP-iNB, and eGFP+iNB) and not just compared to the previous pictures which were on total iNB. Again, numbers are required, not just picture examples.

      Answer: Quantitative data are given in Figure 6D, showing that approximately 60-80% of eGFP+ iNB cells are able to differentiate into neurons, oligodendrocytes or astrocytes. We did not analyze the differentiation of eGFP-iNB since it would not add any supplementary information.

      R2C17: Fig S8 - The authors did not show whether the lack of engraftment of eGFP+ cells is due to the transplant (previously you showed that only 2/3 worked in a similar paradigm). It would be helpful if the authors had some means to visualize the DCX-low cells to confirm that the transplantation worked as before (another color? Another type of mouse (Thy1 antigen differences)?)

      Answer: Unfortunately, the Thy1 antigen has not been documented in mouse subventricular zone progenitors, but only in neurons (PMID: 10813783). Thy1 antigen has also been described in bipotent glial progenitor cells (GPC) from the developing human brain giving rise to oligodendrocytes (PMID: 36931245).

      As shown in Figure S10, we performed 5 grafts with s-iNB eGFP+ cells, 2 alone and 3 mixed with eGFP- cells, and never found any eGFP+ cells 5 weeks after grafting. Moreover, we did not find any eGFP+ cells in the brains of 3 other animals 2 weeks after grafting with s-iNB eGFP+ cells (these data have been added to Figure S10). Compared with the results described in Figure 1, this clearly shows that DCXhigh iNB, like mNB, are not able to generate persistent cells in the grafted brains.

      R2C18: Fig S8 - Why were there no eGFP cells even at the injection site? DCX expression promotes migration; indeed, DCX expression becomes very high in cells in the SVZ as they begin to exit towards the migratory stream. Even without migration, one would still expect the cells to survive. Currently, the authors show no cells at 5 weeks; however, they would need to show earlier timepoints as well to determine what is happening with these cells. It is possible that these GFP+ cells are not even expressing DCX anymore (see above).

      Answer: As stated above, we did not find any GFP+ cells in the brains of 3 other animals 2 weeks after grafting with s-iNB eGFP+ cells (see Figure S10).

      R2C19: Line 320 - the authors suggest a subpopulation of NEURONS continues to divide and cite 2 works from the 1990s showing proliferating SVZ cells can differentiate. Our knowledge of this system has come dramatically forward since the 1990s as well as technologically, and to date, neurons have not been shown to divide.

      Answer: We apologize for this lack of clarity. We agree that neurons correspond to differentiated, non-cycling cells, but we had used the terminology of these articles. The incorrect part of the sentence at Line 320 has therefore been deleted from the text.

      R2C20: Fig 7 - The whole figure is based on changing levels of RSR genes which were not confirmed in any way to be involved in any of these stages, only descriptively in single-cell analyses.

      Answer: As stated above, in our opinion, further characterization of the involvement of RSR genes in neurogenesis is largely beyond the scope of our manuscript. Nevertheless, we think that the role of RSR genes in neurogenesis is an important question that should be addressed in further studies.

      Overstatement of findings

      R2C21: Fig 1 - Authors did not compare all cell types in each condition but made overstatements about their relationships to each other between graphs. There should also be separate graphs showing all cell types at 4% and a separate one at 20%.

      Answer: In the revised version, Figure 1 shows a graph comparing all cell types at 4% O2 and a separate one at 20%, as requested by the reviewer. The graphs clearly show that 4% O2 promotes iNB proliferation compared with the 20% condition.

      R2C22: Fig 1D-b2 - Why does DCX look nuclear? One cannot say that cells are only NSCs if they are GFAP+, as astrocytes also express GFAP. The authors would need another marker to separate those populations. In the text, the authors say that expressing GFAP (line 124) means NSC, but then in line 127 expressing GFAP means astrocytes, which further shows that additional markers are needed to validate these two different cell types.

      Answer: DCX nuclear translocation has been shown to improve cellular proliferation (PMID:32050972).

      As indicated in R1C4, the text has been modified as follows: "Importantly, eGFP+ cells were present in the SVZ of all the animals transplanted with s-iNB eGFP+ and s-NSC/TAP eGFP+ (Fig. 1Db, 1Dc), some of them expressing GFAP, indicating the generation of astrocytes, and therefore possibly NSC".

      R2C23: Fig S2 - The transcriptome signature for s-iNBs is very similar to s-TAP, basically suggesting the iNBs are further along in cell cycle.

      Answer: This is now Figure S3. Functional enrichment analysis of individual transcriptome signatures revealed that both s-TAP and s-iNB are enriched in genes related to the cell cycle, although with different GO term enrichments. Indeed, s-TAP are enriched in genes related to the G1, G1/S and S phases (but with low -log10 adjusted p-values), and s-iNB in genes related to mitosis and M phase (with high -log10 adjusted p-values).

      We have previously shown that around 33% of s-iNB have a DNA content >2N, versus around 26% of s-TAP and s-aNSC (Daynac et al. 2013), which is in accordance with the GO term enrichments. However, these data also showed that most s-iNB and s-TAP are in G1, indicating that s-iNB are not simply further along in mitosis than TAP.

      Moreover, our transcriptomic data clearly show that s-iNB are distinct from s-TAP: 1) according to principal component analyses (Figure 2B and C), the whole transcriptome of s-TAP is closer to that of s-aNSC than to that of s-iNB (10% variation on PC2); 2) the heatmap in Figure 2D shows that they have different RSR gene expression profiles; 3) the new Figure S4 shows that GO annotations of differentially spliced isoforms further confirm that s-TAP and s-iNB have distinct features; and 4) Figure S5 shows that s-iNB express genes associated with either TAP or NB that have been described in previous studies, whereas s-TAP do not express genes associated with NB but appear closer to aNSC. Finally, the scRNA-seq cell clusters related to s-iNB are distinct from the cluster related to s-TAP, as shown 1) in Figure 3D and 2) in Figure 4.

      R2C24: Fig 3 - The lack of information about timepoint 0 after irradiation, and when proliferation and cell cycle entry begins again following irradiation, limits our interpretation of the single-cell irradiated data.

      Answer: We have previously reported the relative abundance of each SVZ neural progenitor population in the young adult mouse brain in several papers. In particular, we based our interpretation on our SVZ irradiation model (Daynac et al. 2013), which demonstrated that radioresistant qNSC re-enter the cell cycle as early as 2 days after 4 Gy irradiation and successively regenerate aNSC, TAP, then iNB and mNB.

      R2C25: Fig S3 - These results effectively show that the s-aNSCs and s-TAPs are actually less specific when compared to that same identity in other studies, and that the iNBs are most similar to mitotic TAPs. This supports what was mentioned above, which is that the transcriptional signatures are very similar between the s-TAPs and i-NBs, showing these are not a unique cell state, but just a bit further along mitosis within the TAP cell state.

      Answer: This is now Figure S5. In this figure, we show that s-iNB express genes associated with either TAP or NB that have been described in previous studies, whereas s-TAP do not express genes associated with NB but appear closer to aNSC. As indicated above in R2C23, s-iNB are not just a bit further along mitosis within the TAP cell state. Indeed, we provide several lines of evidence showing that s-iNB and s-TAP have different transcriptomic profiles.

      R2C26: Fig 4B - The focus on Ptbp1 as being associated with the iNB cluster border to mNB is expected as all previous studies of Ptbp1 have focused on its role in the progression of other cell types through the cell cycle, its control of cell cycle regulators, and a cell cycle mRNA regulon (Monzon-Casanova et al, 2018, 2019, 2020). This further supports these analyses are specifically defined by cell cycle stages.

      Answer: We fully agree that Ptbp1 expression distinguishes cycling cells from postmitotic neuroblasts, in accordance with previously published papers, and that based on this single gene we cannot find any differences between cycling cells, i.e., aNSC, TAP and iNB. However, as shown in the manuscript and stated above (R2C23 and R2C25), these cells can be distinguished by their respective expression of many other genes, including other RSR genes.

      R2C27: Lines 281-282 contain an overstatement: the authors suggest that this is a new type of cycling neural progenitor, when all studies point to it being end-of-mitosis TAPs on their way to becoming mNBs. This clearly shows a trajectory and not a defined, binary cell type.

      Answer: We agree with this statement that the use of the word "type" was misleading, and changed it to "stage" to better reflect that s-iNB are a distinct stage along the differentiation process according to our pseudotime cell-trajectory analysis.

      Author response image 2.

      Pseudotime analysis using Monocle 3 (excluding the cluster 13 corresponding to astrocytes and starting from s-qNSC) revealed two branches starting from s-TAP, one towards cell cycle the other towards neuronal differentiation.

      Minor comments:

      R2C28: Fig 3D - For ease, please define what you called the clusters in 3D - not just cluster numbers

      Answer: We chose not to name the clusters in 3D because their identification (group names) is based on data presented later, in Figures 3E, F and G.

      R2C29: Fig 3E-F - Show astrocytes by text in 3E and F

      Answer: As discussed above, astrocytes cannot be shown in these figures because they are based on our signatures, which did not include an astrocyte signature.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study from Belato, Knight, and co-workers, the authors investigated the Rec domain of a thermophilic Cas9 from Geobacillus stearothermophilus (GeoCas9). The authors investigated three constructs: two individual subdomains of Rec (Rec1 and Rec2) and the full Rec domain. This domain is involved in binding to the guide RNA of Cas9, as well as the RNA-DNA duplex that is formed upon target binding. The authors performed RNA binding and NMR relaxation experiments for the wild-type domain as well as two point mutants. They observed differences in RNA binding activities as well as in the flexibility of the domain. The authors also performed experiments on full-length GeoCas9 to determine whether these biophysical differences affect RNA binding or cleavage activity. Although the authors observed some changes in the thermal stability of the mutant GeoCas9-gRNA complex, they did not observe substantial differences in the cleavage activities of the mutant GeoCas9 variants.

      Overall, this manuscript provides a detailed biophysical analysis of the GeoCas9 Rec domain. The NMR assignments for this construct should prove very useful, and the results may provide the grounds for future engineering of higher-fidelity variants of GeoCas9. While the NMR results are generally well presented, it is unclear how the results on the isolated Rec domain relate to the overall function of full-length GeoCas9. In addition, some conclusions are overstated and not fully supported by the evidence provided. The following major points should be addressed by the authors.

      (1) Many of the results rely on the backbone resonance assignments of the three constructs that were used, and the authors have done an excellent job of assigning the Rec1 and Rec2 constructs. However, it is unclear from the descriptions in the text how the full-length Rec construct was assigned. Were these assignments made based on assignments for the individual domains? The authors state that the spectra of individual domains and RecFL overlay very well, but there appear to be many resonances that have chemical shift differences or are only present in one construct. As it stands, it is unclear how the resonances were assigned for residues whose chemical shifts were perturbed, making it difficult to interpret many of the results.

      The Reviewer raises an important oversight. In Lines 491-493, we clarify that we were able to transfer the assignments using spectral overlays of the individual domains with GeoRec (i.e. careful analysis of the data in Figure S3). We also cite two new references where a similar approach was applied to Cas9.

      (2) The minimal gRNA that was used for the Rec-gRNA binding experiments is unlikely to be a good mimic for the full-length gRNA, as it lacks any of the secondary structure that is most specifically recognized by the REC lobe and the rest of the Cas9 protein. The majority of this RNA is a "spacer" sequence, but spacers are variable, so this sequence is arbitrary. Thus, the interactions that the authors are observing most likely represent non-specific interactions between the Rec domains and RNA. The authors also map chemical shift perturbations and line broadening on structural models with an RNA-DNA duplex bound, but this is not an accurate model for how the Rec domain binds to a single-stranded RNA (for which there is no structural model). Thus, many of the conclusions regarding the RNA binding interface are overstated.

      The Reviewer again raises an important point. We have added a section of text explaining the rationale for truncating the gRNA for binding experiments with NMR (Lines 223-235). We chose the 5’ end of the gRNA containing the spacer sequence based on crystal structures of NmeCas9 and SpCas9 that show the Rec lobe interacting with this section of nucleic acid. The newly published GeoCas9 cryo-EM structure bound to gRNA, which overlaid well with the NmeCas9 structure, also suggested that this portion of the gRNA could interact with Rec.

      Figures S11 and S12 show our gradual truncation of the gRNA and Rec construct to achieve useful atomic detail. Ultimately, a 39nt gRNA containing a 23 base pair spacer sequence was chosen for this study to retain the NMR signal of the complex and because several structures suggested this 39nt sequence would be long enough to interact with the entire Rec lobe.

      To investigate the effect of the spacer sequence, we have now measured binding affinities via MST between GeoRec and a 39nt Tnnt2 gRNA and a 39nt gRNA from PDB: 8UZA, containing a different spacer sequence used in the very recent GeoCas9 cryo-EM structure. The observed trends for each gRNA are consistent across the samples. We also measured WT, K267E, and R332A GeoCas9 affinity for the full-length Tnnt2 and PDB:8UZA gRNAs.

      Lastly, we used a new cryo-EM structure of GeoCas9 bound to gRNA (PDB: 8JTR) to better define the interface for NMR CSPs and line broadening and have adjusted the language in this section.

      (3) The authors include microscale thermophoresis (MST) data for the Rec constructs binding to the minimal gRNA. These data suggest that all three Rec variants have very similar Kd's for the RNA. Given these similarities, it is unclear why the RNA titration experiments by NMR yielded such different results. Moreover, in the Discussion, the authors state that the NMR titration data are consistent with the MST-derived Kd values. This conclusion appears to be overstated given the very small differences in affinities measured by MST.

      MST and NMR experiments describing the weakened binding affinity of GeoRec and GeoRec2 for the Tnnt2 gRNA agree with each other (Figure 5). However, additional MST experiments with a different gRNA sequence (from PDB: 8UZA) and with full-length GeoCas9 (new Figure 7) have provided new insight, leading us to soften and reframe the Discussion to avoid overstatement. See Lines 263-282 and 375-385.

      (4) While the authors have performed some experiments to help place their findings on the isolated Rec domain in the context of the full-length protein, these experiments do not fully support the conclusions that the authors draw about the meaning of their NMR results. The two Cas9 variants that were explored via NMR have no effect on Cas9 cleavage activity, and it is unclear from the data provided whether they have any effect on GeoCas9 binding to the full sgRNA. This makes it difficult to determine whether the observed differences in RNA binding and dynamics of the isolated Rec domain have any consequence in the context of the full protein.

      We have since measured the binding affinities of full-length GeoCas9 to full-length gRNA (new Figure 7). We have also added a comment in the Discussion section describing how both GeoRec and GeoRec2 domain variants bind the truncated RNA with weaker affinity than the WT, but this biophysical effect does not translate to GeoCas9 with its full-length gRNA. We describe this finding as an explanation for why the single-point mutants have minimal effect on GeoCas9 cleavage activity. See Lines 375-385.

      (5) The authors state in multiple places that the K267E/R332A mutant enhanced GeoCas9 specificity. Improved specificity refers to a situation in which the efficiency of cleavage of a perfectly matched target improves in comparison to a mismatched target. This is not what the authors observed for the double mutant. Instead, the cleavage of the perfect target was drastically reduced, in some cases to a larger degree than for the mismatched target. The double mutant does not appear to have improved specificity, it has simply decreased cleavage efficiency of the enzyme.

      The conclusion has been reframed to suggest that the K267E/R332A double mutant has decreased cleavage efficiency of the enzyme but does not enhance GeoCas9 specificity. We discuss an interesting contrast, namely that mutations in the SpCas9 Rec lobe alter its specificity, which is at times accompanied by a loss of overall activity. We also speculate on why this may not be the case in GeoCas9, considering some very recent (unpublished at the time of initial submission) structural and biochemical data. See Lines 414-418.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript from Belato et al. used advanced NMR approaches and a mutagenesis campaign to probe the conformational dynamics of the recognition lobe (Rec) of the CRISPR Cas9 enzyme from G. stearothermophilus (GeoCas9). Using truncated and full-length constructs, they assess the impact that two different point mutations have on the redistribution and timescale of these motions, and assess gRNA recognition and specificity. Single point mutations in the Rec domain in a Cas9 from a related species had profound impacts on on- and off-target DNA editing; therefore, the authors reasoned that analogous mutations in GeoCas9 would have similar effects. However, despite a redistribution of local motions and changes in global stability, their chosen mutations had little impact on DNA editing in the context of the full-length enzyme. Their studies highlight the species-specific complexity of interdomain communication and allosteric mechanisms used by these multi-domain endonucleases. Despite these negative results, their study is highly rigorous, and their approach will broadly support understanding how the activity and specificity of these enzymes can be engineered to tune activity and limit off-target cleavage.

      Strengths:

      (1) Atomistic investigation of the conformational dynamics of the GeoCas9 gRNA recognition lobe (GeoRec), probing dynamics on a broad range of timescales from ps to ms using advanced NMR approaches will be broadly interesting to both the structural biology and CRISPR engineering communities.

      (2) Highly rigorous biophysical studies that push the boundaries of current techniques provide insight into the local dynamics of the GeoRec domain that serve to propagate allosteric information and potentially regulate enzymatic activity.

      (3) The study highlights the complexities of understanding interdomain communication in Cas9 enzymes since analogous mutations in different species have different effects on target recognition and cleavage.

      (4) The type of structural and dynamic insights derived from this study design could serve as foundational information to guide a rational design strategy aimed at improving the selectivity and reducing the off-target effects of Cas9 enzymes.

      Weaknesses:

      (1) Despite the rigor of the experiments, the mutations chosen by the authors do not have a profound effect on the overall substrate affinity or activity of GeoCas9 rendering little mechanistic insight into allosteric communication in this particular Cas9. However, the double mutant K267E/R332A has a more pronounced effect on the cleavage of WT and mismatched (at nucleotides 19 and 20) DNA substrates while minimally affecting the cleavage of mismatched (at nucleotides 5 and 6), suggesting more could be learned about the allosteric mechanism from the detailed characterization of this mutant.

      We thank the Reviewer for this comment. While we have included new binding experiments with full-length GeoCas9 and gRNAs (new Figure 7), the new MD simulations (new Figure 6) better address this point. The MD simulations examined our single and double mutants, as well as the recently published high-specificity iGeoCas9, and reported the degree of conformational sampling, nucleic acid contacts and binding energies.

      The simulations show that our mutations induce some, but not the full extent, of the effect of iGeoCas9 (which carries one mutation in GeoRec and many others in the adjacent WED domain), implying that further engineering of GeoRec to mimic iGeoCas9’s properties could have profound functional outcomes. Future efforts to mutate GeoRec will leverage this strategy. See Lines 309-342.

      (2) Follow-up experiments with other residues that were identified as being highly dynamic might affect substrate recognition and cleavage activity in different ways providing additional insight.

      The Reviewer is correct. While this is beyond the scope of this initial study, the new MD simulations (see the response directly above) and the NMR resonances distally affected by gRNA (via CSPs or relaxation dispersion) will be used to identify the primary targets for this analysis.

      (3) Details regarding the authors' experimental approach are incomplete such as a description of the model used to fit the CD data, a detailed explanation of the global fitting of the relaxation dispersion data describing how the best-fit model was selected, and the description of the ModelFree fitting of fast timescale dynamics is incomplete.

      We thank the Reviewer for pointing out these oversights. We have now included the fitting equation in the CD Methods section.

      We included new Figures S8-S10 with the individual relaxation dispersion curves and note in the Methods that global fits were deemed superior based on the Akaike Information Criterion. For WT, the AIC showed the global fit to be ~10-fold better. For K267E, the global model was 4-fold better, and for R332A, the global model was 6-fold better.
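      For readers less familiar with AIC-based model selection, a statement such as "the global fit was ~10-fold better" is commonly expressed as a relative likelihood, exp((AIC_other - AIC_best)/2). The sketch below assumes that the fold-differences quoted above were computed this way (the response does not say so explicitly) and uses the chi-square form of AIC often applied to fits with known measurement errors; the function names and the chi-square and parameter counts are hypothetical placeholders, not the authors' fitted values.

```python
import math


def aic_from_chi2(chi2: float, n_params: int) -> float:
    """AIC for a weighted least-squares fit with known errors: chi^2 + 2k."""
    return chi2 + 2 * n_params


def relative_likelihood(aic_best: float, aic_other: float) -> float:
    """How many times more probable the lower-AIC model is than the other one."""
    return math.exp((aic_other - aic_best) / 2.0)


# Hypothetical numbers: a global CPMG fit (shared exchange parameters, fewer
# parameters) versus residue-specific fits (more parameters, slightly lower chi^2).
aic_global = aic_from_chi2(chi2=120.0, n_params=12)  # 144.0
aic_local = aic_from_chi2(chi2=110.0, n_params=20)   # 150.0

fold = relative_likelihood(aic_global, aic_local)
print(round(fold, 1))  # 20.1, i.e. exp(3)
```

      The global model is favoured here despite its higher chi-square because AIC penalizes the extra parameters of the residue-by-residue fits.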

      We have included a more detailed description of CPMG and Model-free fitting. See Lines 520-526.

      Reviewer #3 (Public Review):

      The authors explore the role of Rec domains in a thermophilic Cas9 enzyme. They report on the crystal structure of part of the recognition lobe, its dynamics from NMR spin relaxation and relaxation-dispersion data, its interaction mode with guide RNA, and the effect of two single-point mutations hypothesised to enhance specificity. They find that mutations have small effects on Rec domain structure and stability but lead to significant rearrangement of micro- to milli-second dynamics which does not translate into major changes in guide RNA affinity or DNA cleavage specificity, illustrating the inherent tolerance of GeoCas9. The work can be considered as a first step towards understanding motions in GeoCas9 recognition lobe, although no clear hotspots were discovered with potential for future rational design of enhanced Cas9 variants.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data, or analyses

      (1) Please update the sentences on lines 100-105 and the Methods to clarify how the RecFL assignments were obtained. If RecFL was assigned based on the assignments for Rec1 and Rec2, please describe in the Methods how the shifted resonances were handled. Please also provide chemical shift perturbation profiles for the truncated constructs versus the full-length Rec construct.

      We have now added text (Lines 491-493) and two new references explaining the GeoRec (full-length) assignment.

      We appreciate this point. We have now provided a new Figure S9 with analysis of CSPs and line broadening in truncated constructs (GeoRec2 only). See also Lines 263-282. We also show a similar structural response to mutation in full-length GeoRec and GeoRec2 NMR CSPs (Figure 2 and Figure S5).

      We have provided the CSPs for each construct, relative to the full-length GeoRec domain, in Author response image 1. In most cases, the largest CSPs occur at resonances on the periphery of the spectra, so these resonances can still be assigned unambiguously.
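      As an illustration of how such a CSP profile between two constructs is typically built, the combined amide perturbation is often computed as Δδ = sqrt(ΔδH² + (α·ΔδN)²), where α down-weights the wider 15N shift range. The response does not state which weighting was used; the sketch below assumes α = 0.14 (a frequently used value), and the residue labels and shift values are hypothetical.

```python
import math


def combined_csp(d_h: float, d_n: float, alpha: float = 0.14) -> float:
    """Combined 1H/15N chemical shift perturbation in ppm (alpha weights 15N)."""
    return math.sqrt(d_h ** 2 + (alpha * d_n) ** 2)


def csp_profile(ref: dict, other: dict, alpha: float = 0.14) -> dict:
    """CSP for every residue assigned in both constructs.

    Shifts are (H, N) tuples in ppm keyed by residue label; residues present in
    only one spectrum (e.g. broadened or unassigned) are skipped.
    """
    profile = {}
    for residue, (h_ref, n_ref) in ref.items():
        if residue in other:
            h_other, n_other = other[residue]
            profile[residue] = combined_csp(h_other - h_ref, n_other - n_ref, alpha)
    return profile


# Hypothetical shifts: full-length GeoRec as the reference, a truncated construct compared to it
georec = {"G101": (8.30, 110.0), "K102": (7.90, 120.0), "L103": (8.10, 122.0)}
truncated = {"G101": (8.33, 110.4), "K102": (7.90, 120.0)}

for res, value in csp_profile(georec, truncated).items():
    print(res, round(value, 3))
```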

      Author response image 1.

       

      (2) It is unclear whether the differences in Kd's for the Rec-gRNA interactions are statistically significant, given the errors associated with the values. Can the authors further analyze these data to determine statistical significance? If they are not found to be significantly different, the authors should soften all conclusions related to the observed differences.

      Statistical significance was calculated for all MST data, and Figures 5 and 7 have been updated to reflect this.

      (3) As mentioned above, it seems likely that the Rec-RNA binding that is observed is non-specific. Have the authors tried MST with another 39 nt RNA? Are there differences in affinities for the Rec constructs?

      We have done MST with another 39nt RNA. The affinity for each gRNA (Tnnt2 vs 8UZA) is similar for WT and K267E, and a factor of ~4 weaker for R332A with 8UZA gRNA. The trend is the same, that WT Rec has a (statistically significant) stronger affinity for the gRNA compared to the mutants.

      (4) Have the authors tried MST with full-length GeoCas9 and the sgRNA? The current data on the thermal stability of the RNP's is interesting, but a more direct measurement of the affinity of the Cas9-sgRNA complexes would provide stronger evidence of the effects of the mutations.

      The Reviewer makes an excellent suggestion. We have now generated Cy5-labeled full-length gRNAs and conducted MST with full-length GeoCas9 (new Figure 7). The binding affinities to multiple guides do not vary significantly. We have discussed this, and its implications, in Lines 376-385.

      (5) One potential issue with not observing differences between the three Cas9 variants' cleavage activity is that the activity of these purified proteins appears to be very low in comparison to previous studies of GeoCas9. There are significant differences in the expression protocol used by the authors of the current study and previous studies. Have the authors attempted to replicate the expression and purification protocol of previous reports? This may improve the enzymatic activity and allow for a more detailed investigation of cleavage between the three variants (e.g. by performing time-course cleavage assays).

      The expression protocol of GeoCas9 is identical to those of previous studies. This was a writing error on our part, which has now been corrected in the Methods section. We apologize for this oversight.

      Recommendations for improving the writing and presentation

      The introduction of the manuscript is reasonable for specialists who are very familiar with Cas9 function, but it does not contain important details that may be unknown to most readers. The authors do not introduce the domains of Cas9 in the Introduction section. A brief description of the domains that are important to this work should be provided. For example, what is the role of the Rec lobe? This is not introduced until lines 110-111, after some discussion of the authors' initial work on these domains. For a broad audience, it would also be helpful to define the two catalytic domains of the protein. A paragraph describing the general architecture of Cas9 and the overall mechanism of Cas9, including allostery and domain movement, would be very helpful to a general audience. There are elements of this throughout the manuscript, but it would be better to have everything described in a single location at the beginning of the Introduction.

      The Reviewer makes an excellent point. We have added significant clarifying text to the Introduction (Lines 42-47, 52-58, and 61-66). We have also amended Figure 1 to highlight the domain arrangement of GeoCas9 and construct domain boundaries.

      Minor corrections to the text

      (1) Lines 37-38: The statement about GeoCas9 activity should include a citation.

      We have added two references here.

      (2) Line 39-40: "The widely-studied SpCas9, as well as GeoCas9, are Type-II CRISPR systems". Cas9 is only a single component of a larger system that contains other proteins and DNA elements, so it would be more appropriate to say "are effectors of type II CRISPR systems" or "are signature proteins of type II CRISPR systems". Also, please define the organism from which SpCas9 is derived. It may be more appropriate to use the three-letter abbreviation "SpyCas9" to be consistent with the abbreviation used for GeoCas9.

      We have revised the text as suggested and specified the source organisms. We have, however, chosen to keep “SpCas9” for consistency with our prior work and the work of many others, including Doudna et al and Zhang et al.

      (3) Lines 39-42: "only the Type II-C class to which GeoCas9 belongs has been rigorously validated for mammalian genome editing". SpCas9 is from a type II-A system and is by far the most commonly used ortholog for genome editing, including in ongoing clinical trials. It is unlikely that any of the type II-C Cas9 orthologs have been more rigorously validated than SpCas9. The reference cited in this sentence also does not support this statement and is a review written in 2017, so would be unlikely to reflect the current state of the art. Please revise this sentence.

      We have softened and revised this text (Lines 42-47).

      (4) Lines 48-52: It would be helpful to describe the dynamic movement of the HNH domain (and cite appropriate references) prior to describing the authors' previous work. As it stands, it is unclear how this sentence would be understood by a non-specialist.

      We have added text in Lines 61-68.

      (5) Lines 44-45: The wording is a little unclear, as it sounds like the guide RNA, rather than the nuclease domains, is responsible for dsDNA cleavage. The sentence could be adjusted to remove "and cleave". Cleavage by the HNH and RuvC domains could be described in a separate sentence.

      We have revised this text. See Lines 49-50.

      (6) Lines 46-48: This segment of the sentence suggests that PAM recognition triggers the allosteric events that result in the movement of the nuclease domain (HNH). This is misleading, as HNH movement is triggered by the complete formation of an R-loop, rather than initial PAM recognition. Please revise this sentence.

      We have revised the text in Lines 52-58.

      (7) Lines 62-65: The first sentence is unclear. The specificity of many protein-nucleic acid complexes is well understood and is also readily quantified by several well-established methods. Are the authors specifically referring to the structural basis for Cas9 specificity? Although Cas9 specificity is highly complex, it has been studied structurally in great detail and should not be described as "poorly understood" without some discussion of what is already known. These sentences also elide the fact that Cas9 specificity has been successfully altered via rational design, based on our general framework for understanding protein-nucleic acid interactions. Please clarify these statements.

      The Reviewer makes an important point. We have softened this statement (Lines 80-81). We have clarified that we intended to refer to structural characterization of large, multidomain proteins and nucleic acid complexes via NMR. We agree that many critical structural studies comment on Cas9 dynamics and specificity in great detail, including at the domain level.

      (8) Lines 62-68: It seems like the citations do not match up with the references in this section. The references for citations 8-10 are not about DNA repair complexes, references 11-14 are not papers about the directed evolution of Cas9 (should these be 16-17?), and the references for the HNH domain movements should be for citations 18-21.

      We apologize for the confusion, and the references have been updated.

      (9) Lines 116-119: The description of the RNAs used is unclear, as the segments that are described add up to 141 not 101. Also, what is meant by "110-nt guide sequence intrinsic to GeoCas9"? Is this referring to the tracrRNA segment? It may be helpful if the RNA sequences shown in the accompanying figures were replaced with cartoons of the RNAs that were used, with the different segments labeled.

      We now describe the gRNA sequences in detail in new Table S4. We have also expanded the description in the text (Lines 224-235).

      (10) Line 121-123: This sentence should contain reference(s).

      We have changed the sentence.

      (11) Line 156-158: Reference 19 did not report or investigate any higher specificity SpCas9 variants, is this citation correct?

      We have removed the reference from this line. Ref. 19 (now Ref 23, Slaymaker et al) should be correct.

      (12) Lines 162-166: Please provide a sequence and structural alignment for SpCas9 and GeoCas9 to support the claim that the amino acid substitutions are equivalent between the two orthologs.

      We have updated Figure 1 to display the similarity in domain arrangement between SpCas9 and GeoCas9 and have noted similarity in structure and sequence of these proteins in Figure S1.

      (13) Lines 234-236: There is insufficient evidence to conclude that the alterations in protein dynamics caused the changes in gRNA interaction. The substitutions are charge swap substitutions, and it is equally (if not more) feasible that these substitutions decrease the potential for favorable electrostatic interactions.

      (14) Lines 261-265: While the RNP stability for R332A is clearly decreased in comparison to WT, the authors' conclusions regarding K267E seem overstated. The difference in Tm for the K267E mutant and WT RNPs is not very large and may be within error, especially given that the CD data are noisy. Similarly, on lines 321-322, only one of the mutations really impacted the stability of the full-length RNP.

      We have softened this text in Lines 303-305.

      (15) Lines 336-338: HiFi-SpCas9 does not contain four mutations, it is a single R691A point mutation, as reported in reference 17. This sentence and subsequent sentences should be updated.

      Here, the “final form” of HiFi SpCas9 contains the R691A and three additional mutations. The Reviewer is correct, though, that the R691A mutation alone was enough to enhance the specificity of WT SpCas9. We have clarified this point on Line 156.

      Minor corrections to the figures

      (16) The cryo-EM structures of GeoCas9 have recently been released on the PDB. The authors may now update figures to include the experimentally determined structure, rather than an AlphaFold model and update the text accordingly.

      We have made this change.

      (17) For Figure S4, please describe what the red dashed lines are in the top three graphs. Are these the Tm values determined for the two individual Rec domains? How do these compare to the inflection points for the two transitions in the full Rec construct (could be determined by plotting the first derivative data)? Please provide information in the Methods on how the temperature-dependent CD spectral data were fit and Tm's were determined.

      We have made these changes in the Figure S4 caption and Methods section.

      (18) The blue box denoting the unassigned region is missing from Figure 2C-D, although it is mentioned in the figure legend.

      We have added the blue box denoting the unassigned linker.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript is well-written and generally clear and concise. The following recommendations will help improve the readability and include details important for interpreting the results.

      (1) In general, the figures are too small and difficult to interpret. It was hard to discern the differences described in the text (e.g. Figure 1A, E, 4A, etc.), the text labels are illegible in several panels (e.g. Figure 4A, S8B, C, etc.), the chosen colors were difficult to interpret in the structures (Figure 4C, S8G, H, etc.), and residues with motion (shown as balls) were difficult to make out due to their size and color. The same applies to the dispersion curves (Fig 3A): the plots are chaotically crowded, and it is impossible to interpret (or see) the underlying data.

      We apologize for these difficulties. We have now revised the Figures in several ways. First, we greatly simplified Figure 1, such that it now includes only the domain arrangement, structure, and initial NMR details for GeoRec (essentially A-B of the old Figure 1).

      Second, we have reformatted Figure 3 to make the structure maps a bit easier to see.

      We certainly appreciate the point made by the Reviewer about the dispersion curves. Our intent here is to illustrate the number of curves that can be fit globally, which increases substantially for K267E and R332A GeoRec2 relative to WT. As a compromise, we have included the individual dispersion curves in the SI for each variant. We have also thinned the line weights for each fit and added NMR order parameters to the main figure to round out the discussion of dynamics.

      Third, we have condensed the gRNA titration data in Figure 4 to focus only on the NMR spectra, moving the CD analysis to the SI and the MST data to new Figure 5, and removing the unclear structure maps.

      Fourth, we have created a new Figure 5 focusing on MST studies of two gRNAs with GeoRec, which now include bar charts of affinities with appropriate statistics.

      Much of the data trimmed from the prior version of the manuscript figures has been moved to Supporting Information. We have also created two new main text Figures (6 & 7) based on MD simulations and MST studies of full-length GeoCas9 and gRNAs to provide additional context for interpreting the results in prior figures.

      (2) Line 39 - this sentence is awkward, could you rephrase it?

      We have rephrased this sentence.

      (3) There is inconsistent labeling, in Figure S2 the full-length construct is referred to as GeoRecFL while in other places in the text and in Figure 1 it is called GeoRec.

      We have changed all references to the intact Rec lobe to “GeoRec.”

      (4) It would be helpful to include a cartoon of the domain organization of GeoCas9 and indicate the truncation mutants that were studied in this manuscript.

      We included the domain organization in Figure 1A and indicated the amino acid boundaries for each construct on the figure and in the Methods section.

      (5) There is significant line broadening that occurs during the titration. Not all line broadening is due to changes in rotational correlation time, and differential line broadening may reveal interactions of residues that are in the intermediate exchange regime; the µM affinities measured by the authors would certainly suggest this. Therefore, a plot of I/I0 might inform on binding sites, and it might be useful to look at differential broadening as a function of titrant added.

      The Reviewer makes a very good point. In addition to the data in Figure 4, which show a clear reduction in gRNA-induced line broadening in larger GeoRec constructs, we included new titration data on smaller GeoRec2 domains (Figure S12). Here, we conducted an I/I0 analysis and added some clarifying language about the possible nature of line broadening in these samples. See new Figure S12 and Lines 268-274.
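      The I/I0 analysis can be illustrated with the hypothetical helpers below; this is a sketch, not the script behind Figure S12. Uniform broadening from a longer rotational correlation time lowers all ratios together, so residues whose I/I0 falls well below the mean are candidates for direct contacts or intermediate exchange. The one-standard-deviation cutoff and the assumption that intensities are already corrected for dilution are illustrative choices.

```python
import statistics


def intensity_ratios(i_bound, i_free):
    """Per-residue I/I0: peak intensity with gRNA divided by intensity without gRNA.

    Residues missing from the bound spectrum are skipped here; fully broadened
    peaks would need separate handling.
    """
    return {res: i_bound[res] / i_free[res] for res in i_free if res in i_bound}


def differentially_broadened(ratios, n_sd=1.0):
    """Residues whose I/I0 lies more than n_sd standard deviations below the mean."""
    values = list(ratios.values())
    cutoff = statistics.fmean(values) - n_sd * statistics.stdev(values)
    return sorted(res for res, r in ratios.items() if r < cutoff)


# Hypothetical, dilution-corrected intensities: one residue broadens far more.
free = {f"R{i}": 1000.0 for i in range(10)}
bound = {res: 600.0 for res in free}
bound["R3"] = 100.0
print(differentially_broadened(intensity_ratios(bound, free)))  # ['R3']
```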

      (6) Line 126 "Importantly, many resonances are also minimally impacted." This statement is unclear since from the plots shown in Figure 1D, it seems that many of the residues are impacted by RNA titration, see the point about differential broadening above, this sort of plot may help pick apart residues that broaden due to RNA contacts (rather than changing rotational correlation).

      We have removed this statement, in addition to our revisions above regarding the line broadening.

      (7) Line 137 - I am not sure that a max chemical shift of 0.15 ppm constitutes "strong chemical shift perturbations"

      The Reviewer makes a good point. We have changed “strong” to “significant” which refers to 1 standard deviation above the 10% trimmed mean of the data. See Line 237.
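      The significance cutoff described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the 15N weighting factor (alpha = 0.14), trimming 10% of the sorted values from each end, and taking the standard deviation over all CSPs are assumptions, and the function names are hypothetical.

```python
import math
import statistics


def combined_csp(delta_h, delta_n, alpha=0.14):
    """Combined 1H-15N chemical shift perturbation (ppm).

    alpha down-weights the 15N shift change; 0.14 is an assumed value.
    """
    return math.sqrt(delta_h ** 2 + (alpha * delta_n) ** 2)


def trimmed_mean(values, proportion=0.10):
    """Mean after dropping `proportion` of the sorted values from each end."""
    data = sorted(values)
    k = int(len(data) * proportion)
    kept = data[k:len(data) - k] if k > 0 else data
    return statistics.fmean(kept)


def significant_residues(csps, proportion=0.10, n_sd=1.0):
    """Residues whose CSP exceeds the trimmed mean plus n_sd standard deviations.

    csps maps residue labels to combined CSPs; the SD is taken over all values (assumed).
    """
    values = list(csps.values())
    cutoff = trimmed_mean(values, proportion) + n_sd * statistics.stdev(values)
    return sorted(res for res, csp in csps.items() if csp > cutoff)


# Hypothetical data: nine small shifts and one large one.
example = {f"A{i}": 0.02 for i in range(9)}
example["X"] = 0.5
print(significant_residues(example))  # ['X']
```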

      (8) Line 144 - change to "...experimentally determined structure...".

      We have added new lines 135-136 to make this point clear. We emphasize that the initial predictions were based on the AlphaFold2 model, since an experimental structure was lacking, but we now discuss the mutations in the context of the new structural data.

      (9) The section from lines 150 - 166, comparison of the effect of different mutations in different Cas9 seems more appropriate for the discussion section.

      We have added additional text on this point in the Discussion section, within several new paragraphs.

      (10) In Figure S6, chemical shifts are observed at the distal site away from the mutations, could the authors discuss?

      The Reviewer makes an important observation. Indeed, the CSPs caused by K267E and R332A extend beyond the mutation site. These shifts are mostly close in 3D space to the mutation, and consistent in Figures 2 and S5. New titrations of gRNA into isolated GeoRec2 also affect some distal sites, and new MD simulations suggest that the mutations disrupt RNA and DNA contacts, where these distal effects may play a role with full-length gRNAs.

      We agree it would be worth mutating distal sites undergoing CSPs to examine their impact on function, but two complicating factors are 1) the lack of substantial gRNA affinity differences in experiments with full-length GeoCas9 and 2) the lack of functional changes in the mutants. In this initial study, it appears difficult to assign an effect to these distal sites in GeoCas9 (beyond speculation). We do have a brief discussion of the distal sites (Lines 293-298) and will follow up this work with more comprehensive mutagenesis studies of these sites.

      (11) It appears that the authors fitted the Tm data to some model although this is not mentioned in the text, figure captions, or methods. In the caption for Figure 4D the authors refer to "Fitted thermal denaturation profiles...".

      We have added the relevant Equation in the Methods and referenced it in Figure S6 and S14 captions.

      (12) Details of the ModelFree fitting are needed, how many residues fit with the minimal models, and how many invoked Rex and other terms? How does the statement in line 191 about the elevated S2 values arising from global tumbling compare with an experimental estimation of rotational correlation eg. from R2/R1 ratios?

      We have included an expanded description of the Model-free protocol (Lines 521-527). The best diffusion tensor was an ellipsoid model. Eighty-one residues required an Rex term, although the Rex contributions were very small. The mean and errors for the fast motion (S<sup>2</sup><sub>f</sub>), slow motion (S<sup>2</sup><sub>z</sub>) and generalized order parameter were 0.97 ± 0.15, 0.84 ± 0.14, and 0.91 ± 0.20, respectively.

      R2/R1 ratios for each of the samples (relaxation was conducted on GeoRec2 in isolation) corresponded to an estimated τc of 16.3 ns for all data sets. This value is a bit larger than would be expected for a compact globular protein of 25 kDa, though our X-ray structure of GeoRec2 shows a somewhat elongated domain.
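      For readers unfamiliar with the R2/R1 estimate, a common approximation, valid for rigid residues without exchange broadening in a slowly tumbling protein, is τc ≈ sqrt(6 R2/R1 - 7) / (4π νN), where νN is the 15N Larmor frequency. The sketch below assumes a 600 MHz spectrometer (νN ≈ 60.8 MHz), which is not stated above, and is not necessarily how the reported 16.3 ns value was obtained.

```python
import math


def tau_c_from_r2_r1(r2, r1, nu_n_hz):
    """Approximate isotropic rotational correlation time (s) from R2/R1.

    tau_c ~ sqrt(6 * R2/R1 - 7) / (4 * pi * nu_N), with nu_N the 15N Larmor
    frequency in Hz. Assumes rigid residues, no exchange, and slow tumbling.
    """
    ratio = r2 / r1
    if 6 * ratio - 7 <= 0:
        raise ValueError("R2/R1 is too small for this approximation")
    return math.sqrt(6 * ratio - 7) / (4 * math.pi * nu_n_hz)


NU_N_600 = 60.8e6  # 15N Larmor frequency at 600 MHz 1H (assumed field)

# Round trip: the R2/R1 ratio implied by a 16.3 ns correlation time at 600 MHz.
tau = 16.3e-9
ratio = ((4 * math.pi * NU_N_600 * tau) ** 2 + 7) / 6
print(tau_c_from_r2_r1(ratio, 1.0, NU_N_600) * 1e9)  # ~16.3 (ns)
```

In practice the per-residue ratios would first be averaged over rigid residues (those without large Rex or low hetNOE) before applying the formula.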

      (13) Line 221 - referring to two different figures at the end of the sentence is confusing, maybe place the figure references immediately after the referral in the sentence.

      This has been resolved by the reorganization of the Figures.

      (14) Line 234 - Fig 4E is mentioned before fig 4D, in fact Fig 4D is not mentioned in the text.

      We have reordered and edited many of the Figures, this is now resolved.

      (15) Line 243 - what is the saturating concentration to which the authors are referring?

      We have amended the Results section to more clearly discuss the effect of gRNA on the GeoRec and (now) GeoRec2 domains. We meant 3-fold excess gRNA-to-protein by “saturating” in the prior version. At that point, CSPs held stable and the degree of line broadening at certain sites had completely obscured the resonance from view.

      (16) Fig 4E caption - mentions error of 1.34 while the figure is labeled 1.1 for the R332A GeoRec mutant.

      This has been resolved through additional MST trials, as well as the editing and reordering of many Figures.

      (17) Line 253 - the authors are discussing regions of allosteric hotspots, how do the motions of these predicted hotspots compare with the relaxation dispersion data? There seems to be some overlap.

      The Reviewer makes a keen observation. Yes, there is overlap in these data. For example, hotspot residue R269 is bracketed by L268 and L270, which show relaxation dispersion. Also, hotspot L279 is surrounded by C275, A276, R277, and D281, which show dispersion in both variants. Further, D403 and E408 reside in a stretch of ms timescale flexibility comprising N404, L406, N412, and L413. We have yet to fully understand the functional significance of this overlap, but have added a note in Line 298 to draw the reader’s attention to it.

      Reviewer #3 (Recommendations For The Authors):

      Although the scope of the manuscript is rather limited due to the minor effects observed for the selected mutations, it is clear that a lot of work was done in spearheading the investigation of dynamic modes in GeoCas9 Rec2. In my view, the data will still be of relevance and interest to the general structural and chemical biology communities.

      However, there are a few technical shortcomings that need to be addressed and some statements that are poorly supported by data, necessitating either more experimental proofs or rephrasing of the conclusions.

      Major points:

      X-ray structure - No PDB ID, structural statistics, or validation report is given for the structure, so it is impossible to judge its quality. Please provide these. Furthermore, it would be commendable to determine the structures of the point mutant Rec2 domains; this would greatly strengthen the claim that mutations affect only dynamics and do not change structure.

      We apologize for this oversight. We absolutely had these data at the time of submission but must have forgotten to upload them. The validation report is now attached.

      Regarding the mutant structures, the Reviewer’s point is well taken. In the absence of these structures, we have adjusted the language to include the possibility of structural change. We have also included new MD simulations (new Figure 6 and associated text) that provide comment on possible structural and dynamic changes due to mutation. We note that NMR spectral changes are quite modest, beyond the site of mutation. Further, the new binding data with full-length GeoCas9 (new Figure 7) shows very little change in gRNA affinity with mutations, implying that a profound structural rearrangement does not take place.

      Translating isolated Rec2 findings to FL GeoCas9 - This is an important point and I do appreciate that the authors discuss this. I agree that working on FL samples for NMR would not be feasible, but I am not convinced by the statement that "GeoRec2 in isolation represents the structure of the subdomain within full-length GeoCas9 very well". The chemical shift perturbations observed between isolated Rec2 and FL Cas9 are relatively sizable. This should be discussed in further detail. Figure 1B should showcase peaks having the highest perturbations. Are they located at termini or interaction interfaces?

      We have provided the combined <sup>1</sup>H-<sup>15</sup>N CSPs for each construct, relative to the full-length GeoRec domain, in Author response image 1. In most cases, the largest CSPs occur at resonances on the periphery of the spectra, so these resonances can still be assigned unambiguously. The largest CSPs do appear to occur at the termini.

      The Rec1 and Rec2 subdomains are connected by a short, but flexible unstructured linker in full-length GeoRec. Thus, the two subdomains do not form a particularly tight non-covalent interface and behave somewhat independently (see Figure S4, for example).

      Regarding the statement of “GeoRec2 in isolation...,” we apologize for this confusion.

      We were referring to our solved crystal structure in relation to the AlphaFold model. With the new cryo-EM structure of GeoCas9 having been recently published, our X-ray structure of GeoRec2 is still in excellent agreement, but we have clarified our intent on Line 111.

      Dynamics and effect of mutations - K267E is more destabilizing and leads to more spread chemical shift perturbations throughout Rec2 and to faster-correlated dynamics but not in significantly lower affinity or cleavage. How do the authors explain this?

      The Reviewer raises an interesting question. Regarding the impact of the K267E mutation, new MD simulations also suggest K267E to be quite disruptive of the GeoCas9 structure and dynamics, modulating contacts with the nucleic acids. However, further MD analysis of the recently published (bona fide high-specificity) iGeoCas9 variant shows that K267E imparts only a portion of the effect of iGeoCas9, suggesting that even further modulation of GeoRec would be required for substantial functional impact. In addition, new MST binding studies with full-length variants and gRNAs show that K267E does not dramatically alter gRNA binding, suggesting that the biophysical changes caused by the mutation are buffered by the surrounding GeoCas9 domains, which would explain the lack of functional impact. We comment on this in the Discussion.

      Moreover, the time regime for the fit of the CPMG curves is surprisingly slow given the profiles. What were the minor state populations? Were the dynamics really correlated? Please provide numbers (also see minor points below). In that regime CEST experiments should work; was that done?

      The minor state populations were very low in the analysis, <1%.

      To examine the correlated dynamics, we compared the global fits to those of the individual fits for each residue and found them to be better for the global fit, based on the Akaike Information Criterion. For WT, the AIC showed the global fit to be ~10-fold better. For K267E, the global model was 4-fold better, and for R332A, the global model was 6-fold better. We have added language clarifying the use of AIC to the Methods section.
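      One way to make the global-versus-individual comparison concrete is sketched below, assuming weighted least-squares fits (AIC = χ² + 2k) and a two-state CPMG model in which the global fit shares pB and kex across residues. The parameter counts and function names are illustrative assumptions. Reading "N-fold better" as the relative likelihood exp(ΔAIC/2) is also an assumption; under that reading, ~10-fold corresponds to ΔAIC ≈ 4.6.

```python
import math


def aic(chi2, n_params):
    """AIC for a weighted least-squares fit: chi-squared plus twice the parameter count."""
    return chi2 + 2 * n_params


def cpmg_param_counts(n_residues, n_fields=1):
    """Parameter counts for two-state CPMG models (assumed parameterization).

    Individual fits: per residue, one R2,0 per field plus dw, pB and kex.
    Global fit: per residue, one R2,0 per field plus dw; pB and kex shared.
    """
    individual = n_residues * (n_fields + 3)
    global_fit = n_residues * (n_fields + 1) + 2
    return global_fit, individual


def compare_models(chi2_global, chi2_individual, n_residues, n_fields=1):
    """Return the preferred model and its relative likelihood exp(|dAIC| / 2)."""
    k_global, k_individual = cpmg_param_counts(n_residues, n_fields)
    aic_global = aic(chi2_global, k_global)
    aic_individual = aic(chi2_individual, k_individual)
    preferred = "global" if aic_global < aic_individual else "individual"
    return preferred, math.exp(abs(aic_global - aic_individual) / 2)


# Hypothetical chi-squared totals for 10 residues at one static field.
print(compare_models(chi2_global=110.0, chi2_individual=100.0, n_residues=10))
```

The global model can win despite a slightly higher χ² because it spends far fewer parameters.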

      We have performed CEST experiments on _Geo_HNH (without clear evidence for a minor state), but we did not perform these experiments on GeoRec. However, we strongly agree that a detailed follow-up study focusing on CEST and new GeoRec variants should investigate this further.

      Since the binding effects with gRNAs differ in the isolated domain and the full-length protein, we have tried not to over-analyze the impact of the relaxation data in this specific context. These data still provide useful information regarding the impact of point mutants on GeoCas9 domain biophysics, and MD simulations support the enhanced dynamics seen in CPMG and other relaxation data. However, the functional implication is clearly more complicated and requires further study.

      Mutations affect gRNA affinity - I am not convinced that affinity itself is significantly affected based on the MST data. This data could be reproduced as technical replicates to reduce the error bars, or another technique with less intrinsic noise (ITC, SPR) could be used to better support this claim. However, a 3-fold difference seen from NMR titrations could indicate a change in binding mode, for instance in koff. It would be interesting to obtain SPR or BLI data quantifying the kinetics of the interactions. Anyhow, this point should be more carefully discussed.

      We agree with the Reviewer on this point. We conducted additional replicates of MST trials, as well as new MST with a different gRNA sequence. Our updated analysis, including statistics, provides a better measure for “significance” in these data, which is now reported. We have also added some text discussing a possible change in binding mode, see Lines 256-259.

      We also carried out MST on full-length GeoCas9 with full-length gRNAs (the same two RNAs used as truncated constructs). We report these data in new Figure 7 and note there is essentially no difference between the gRNAs or the GeoCas9 variants under these conditions.

      Further, MD simulations suggest a change in binding energy associated with the gRNA interaction in the context of full-length GeoCas9. Since experimental studies are not able to parse these differences, collectively, we describe a scenario where the highly stable structure of GeoCas9 resists substantial mutation-induced change seen for analogous perturbations in SpCas9. See Lines 309-342, 414-418, and 448-461.

      Minor points:

      • Please detail how the error on R1 and R2 rates was calculated.

      We have included new text in Lines 514-518.

      • Please detail how hetNOE values were calculated (simply Isat/Iref?) and what values were used for Model Free.

      Yes, the Reviewer is correct. We have added specifically that we used Isat/Iref on Line 518.
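      A minimal sketch of the Isat/Iref calculation, with the uncertainty propagated from the spectral noise of each spectrum. Using the RMS baseplane noise as the intensity error is an assumption here, not a detail stated above.

```python
import math


def het_noe(i_sat, i_ref, noise_sat, noise_ref):
    """Steady-state 1H-15N NOE as I_sat / I_ref with a propagated uncertainty.

    noise_sat and noise_ref are the RMS baseplane noise of the saturated and
    reference spectra, used here as the peak-intensity errors (assumption).
    """
    noe = i_sat / i_ref
    err = abs(noe) * math.sqrt((noise_sat / i_sat) ** 2 + (noise_ref / i_ref) ** 2)
    return noe, err


print(het_noe(500.0, 1000.0, 15.0, 40.0))  # NOE of 0.5 with an error of about 0.025
```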

      • Please elaborate on the Model Free analysis. What tensor was used for tumbling? What was the correlation time? This is needed to judge the trustworthiness of S2 parameters.

      We have included new text on Lines 520-526. The diffusion tensor used was an ellipsoid and the correlation time was 15.4 ns. The correlation time estimated from R2/R1 ratios was 16.3 ns.

      • Figure 1: Please indicate where Rec1 and Rec2 are located on panel A and indicate the residue assignments for each peak showcased in panel B.

      We have indicated the boundary of Rec1 and Rec2 in the new cartoon of Figure 1A. We have also noted the exact amino acids used for each construct in the Methods. We also added resonance labels to the spectral overlays in Figure 1B. We have done the same

      • Line 187: I believe this should refer to Figure S8C rather than Figure 3A.

      We have made this change.

      • Some fits of the CPMG curves look strange, e.g. R343 in Fig. 3B WT definitely does not contain significant μs-ms dynamics and should be excluded from the analysis. Please double-check each profile. Were other models besides CR72 not providing better fits?

      The Reviewer has made a very careful observation. Our intent was to highlight these sites on purpose to show differences in CPMG relaxation dispersion between WT and variant samples. This was provided as some evidence for the redistribution of dynamics between samples, as many different sites found to be “rigid” on the ms timescale in WT GeoRec2 were flexible in GeoRec2 variants. We agree, however, that this Figure panel was confusing and have therefore removed it in favor of simple discussion in the text.

      • To what degree are the CPMG dynamics correlated, can you provide statistical measures for the global fits?

      We compared the global fits to those of the individual fits for each residue and found them to be better for the global fit, based on the Akaike Information Criterion. For WT, the AIC showed the global fit to be ~10-fold better. For K267E, the global model was 4-fold better, and for R332A, the global model was 6-fold better.

      We have added language clarifying the use of AIC to the Methods section.

      • Error measured from replicates and p-values should be reported for DNA cleavage assays.

      We thank the Reviewer for pointing out this omission. We have included error bars on these plots.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This work provides a new Python toolkit for combining generative modeling of neural dynamics and inversion methods to infer likely model parameters that explain empirical neuroimaging data. The authors provided tests to show the toolkit's broad applicability and accuracy; hence, it will be very useful for people interested in using computational approaches to better understand the brain.

      Strengths:

      The work's primary strength is the tool's integrative nature, which seamlessly combines forward modelling with backward inference. This is important as available tools in the literature can only do one and not the other, which limits their accessibility to neuroscientists with limited computational expertise. Another strength of the paper is the demonstration of how the tool can be applied to a broad range of computational models popularly used in the field to interrogate diverse neuroimaging data, ensuring that the methodology is not tailored to only one model. Moreover, through extensive in-silico testing, the work provided evidence that the tool can accurately infer ground-truth parameters, which is important to ensure results from future hypothesis testing are meaningful.

      We are happy to hear the positive feedback on our effort to provide an open-source and widely accessible tool for both fast forward simulations and flexible model inversion, applicable across popular models of large-scale brain dynamics.

      Weaknesses:

      Although the tool itself is the main strength of the work, the paper lacked a thorough analysis of issues concerning robustness and benchmarking relative to existing tools.

      The first issue is the robustness to the choice of features included in the objective function. This choice significantly affects the training and changes the results, as the authors themselves acknowledged multiple times (e.g., Page 17, last sentence of the first paragraph, or Page 19, first sentence of the second paragraph). This raises the question of whether the accurate results found in the various demonstrations are due to a biased selection of features (possibly informed by priors on what worked in previous works). The robustness of the neural estimator and the inference method to noise was also not demonstrated. This is important, as most neuroimaging measurements are inherently noisy to various degrees.

      The second issue concerns benchmarking. Because the tool developed is, in principle, only a combination of existing tools for modeling or Bayesian inference, the work failed to provide a more compelling demonstration of its added value. This could have been demonstrated through appropriate benchmarking relative to existing methodologies, specifically in terms of accuracy and computational efficiency.

      We fully agree with the reviewer that VBI estimation depends heavily on the choice of data features; this is the core of the inference procedure, not its weakness. We have demonstrated different scenarios showing how the informativeness of features commonly used in the literature results in different levels of uncertainty. For instance, using summary statistics of functional connectivity (FC) and functional connectivity dynamics (FCD) matrices to estimate the global coupling parameter leads to fast convergence; however, these statistics are not sufficient to accurately estimate the whole-brain heterogeneous excitability parameter, which requires features such as the statistical moments of the time series. VBI provides a taxonomy of data features that users can employ to test their hypotheses. Importantly, one major advantage of VBI is its ability to perform estimation using a battery of data features, rather than relying on a limited set (such as only FC or FCD), as is often the case in the literature. In the revised version, we will elaborate further by presenting additional scenarios to demonstrate the robustness of the estimation. We will also evaluate the robustness of the neural density estimators to (dynamical/additive) noise.
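      To make the distinction between feature families concrete, here is a minimal sketch, using only NumPy, of two kinds of features mentioned above: per-region statistical moments of the time series and summary statistics of the FC matrix. The function names and the particular summary statistics chosen are illustrative assumptions, not the VBI feature API.

```python
import numpy as np

def moment_features(ts):
    """Per-region mean, variance, skewness and excess kurtosis of a (time, regions) array."""
    mu = ts.mean(axis=0)
    var = ts.var(axis=0)
    z = (ts - mu) / np.sqrt(var)
    skew = (z ** 3).mean(axis=0)
    kurt = (z ** 4).mean(axis=0) - 3.0  # excess kurtosis (0 for a Gaussian)
    return np.concatenate([mu, var, skew, kurt])

def fc_summary_features(ts):
    """Summary statistics of the upper triangle of the FC (Pearson correlation) matrix."""
    fc = np.corrcoef(ts.T)                 # regions x regions
    iu = np.triu_indices_from(fc, k=1)     # exclude the diagonal
    vals = fc[iu]
    return np.array([vals.mean(), vals.std(), np.abs(vals).sum()])

def extract_features(ts):
    """Hypothetical feature vector: moments (4 per region) followed by 3 FC summaries."""
    return np.concatenate([moment_features(ts), fc_summary_features(ts)])

rng = np.random.default_rng(0)
ts = rng.standard_normal((1000, 4))  # toy data: 1000 samples, 4 regions
feats = extract_features(ts)         # 4 regions * 4 moments + 3 FC summaries = 19 values
```

      Moments carry region-specific information (useful for heterogeneous parameters), whereas the FC summaries collapse the network into a few global numbers, which is consistent with the point above that FC/FCD statistics suffice for a global coupling parameter but not for region-wise excitability.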

      More importantly, regarding benchmarking, we would like to draw attention to a key point about existing tools and methods. The literature often uses optimization to fit whole-brain network models, and its limitations for reliable causal hypothesis testing have been pointed out in the Introduction/Discussion. As also noted by the reviewer under strengths, and to the best of our knowledge, there are no existing tools other than VBI that can scale and generalize across whole-brain models for Bayesian model inversion. Previously, we developed Hamiltonian Monte Carlo (HMC) sampling for the Epileptor model in epilepsy (Hashemi et al., 2020; Jha et al., 2022). This phenomenological model is very well behaved in terms of numerical integration, gradient calculation, and dynamical system properties (Jirsa et al., 2014). However, this does not directly generalize to other models, particularly the Montbrió model for resting-state activity, which exhibits bistability with noise driving transitions between states. As shown in Baldy et al., 2024, even at the level of a single neural mass model (i.e., one brain region), gradient-based HMC failed to capture such switching behaviour, particularly when only one state variable (membrane potential) was observed while the other (firing rate) was missing. Our attempts to use other methods (e.g., the second-derivative-based Laplace approximation used in Dynamic Causal Modeling) also failed due to divergence in gradient calculation. Nevertheless, reparameterization techniques (Baldy et al., 2024) and hybrid algorithms (Gabrié et al., 2022) could offer improvements, although this remains an open problem for these classes of computational models.

      In sum, for oscillatory systems, it has been shown previously that the SBI approach used in VBI substantially outperforms both gradient-based and gradient-free alternatives (Gonçalves et al., 2020; Hashemi et al., 2023; Baldy et al., 2024). Importantly, for bistable systems with switching dynamics, gradient-based methods fail to converge, while gradient-free methods do not scale to the whole-brain level (Hashemi et al., 2020). Hence, the generalizability of VBI rests on the fact that neither the model nor the data features need to be differentiable. We will clarify this point in the revised version. Moreover, we will provide better explanations for some terms mentioned by the reviewer in the Recommendations.

      Hashemi, M., Vattikonda, A. N., Sip, V., Guye, M., Bartolomei, F., Woodman, M. M., & Jirsa, V. K. (2020). The Bayesian Virtual Epileptic Patient: A probabilistic framework designed to infer the spatial map of epileptogenicity in a personalized large-scale brain model of epilepsy spread. NeuroImage, 217, 116839.

      Jha, J., Hashemi, M., Vattikonda, A. N., Wang, H., & Jirsa, V. (2022). Fully Bayesian estimation of virtual brain parameters with self-tuning Hamiltonian Monte Carlo. Machine Learning: Science and Technology, 3(3), 035016.

      Jirsa, V. K., Stacey, W. C., Quilichini, P. P., Ivanov, A. I., & Bernard, C. (2014). On the nature of seizure dynamics. Brain, 137(8), 2210-2230.

      Baldy, N., Breyton, M., Woodman, M. M., Jirsa, V. K., & Hashemi, M. (2024). Inference on the macroscopic dynamics of spiking neurons. Neural Computation, 36(10), 2030-2072.

      Baldy, N., Woodman, M., Jirsa, V., & Hashemi, M. (2024). Dynamic Causal Modeling in Probabilistic Programming Languages. bioRxiv, 2024-11.

      Gabrié, M., Rotskoff, G. M., & Vanden-Eijnden, E. (2022). Adaptive Monte Carlo augmented with normalizing flows. Proceedings of the National Academy of Sciences, 119(10), e2109420119.

      Gonçalves, P. J., Lueckmann, J. M., Deistler, M., Nonnenmacher, M., Öcal, K., Bassetto, G., ... & Macke, J. H. (2020). Training deep neural density estimators to identify mechanistic models of neural dynamics. eLife, 9, e56261.

      Hashemi, M., Vattikonda, A. N., Jha, J., Sip, V., Woodman, M. M., Bartolomei, F., & Jirsa, V. K. (2023). Amortized Bayesian inference on generative dynamical network models of epilepsy using deep neural density estimators. Neural Networks, 163, 178-194.

      Reviewer #2 (Public review):

      Summary:

      Whole-brain network modeling is a common type of dynamical systems-based method for creating individualized models of brain activity that incorporate the subject-specific structural connectome inferred from diffusion imaging data. This type of model has often been used to infer biophysical parameters of the individual brain that cannot be directly measured using neuroimaging but may be relevant to specific cognitive functions or diseases. Here, Ziaeemehr et al. introduce a new toolkit, named "Virtual Brain Inference" (VBI), offering a new computational approach for estimating these parameters using Bayesian inference powered by artificial neural networks. The basic idea is to use simulated data, given known parameters, to train artificial neural networks to solve the inverse problem, namely, to infer the posterior distribution over the parameter space given data-derived features. The authors have demonstrated the utility of the toolkit using simulated data from several commonly used whole-brain network models in case studies.

      Strengths:

      (1) Model inversion is an important problem in whole-brain network modeling. The toolkit presents a significant methodological step up from common practices, with the potential to broadly impact how the community infers model parameters.

      (2) Notably, the method allows the estimation of the posterior distribution of parameters instead of a point estimation, which provides information about the uncertainty of the estimation, which is generally lacking in existing methods.

      (3) The case studies were able to demonstrate the detection of degeneracy in the parameters, which is important. Degeneracy is quite common in this type of model and, if not handled mindfully, may lead to spurious or unstable parameter estimation. Thus, the toolkit can potentially be used to improve feature selection or simply to indicate the uncertainty.

      (4) In principle, the posterior distribution can be computed directly for new data without any additional simulation, which could make parameter inference more efficient if the artificial neural network is well trained.

      We thank the reviewer for the careful consideration of important aspects of the VBI tool, such as uncertainty quantification, degeneracy detection, parallelization, and amortization strategy.

      Weaknesses:

      (1) While the posterior estimator was trained with a large quantity of simulated data, the testing/validation is only demonstrated with a single case study (one point in parameter space) per model. This is not sufficient to demonstrate the method's accuracy and reliability, but only its feasibility. Demonstrating the accuracy and reliability of the posterior estimation in large test sets would inspire more confidence.

      (2) The authors have only demonstrated validation of the method using simulated data, but not features derived from actual EEG/MEG or fMRI data. So, it is unclear if the posterior estimator, when applied to real data, would produce results as sensible as using simulated data. Human data can often look quite different from the simulated data, which may be considered out of distribution. Thus, the authors should consider using simulated test data with out-of-distribution parameters to validate the method and using real human data to demonstrate, e.g., the reliability of the method across sessions.

      (3) The z-scores used to measure prediction error are generally between 1 and 3, which seems quite large to me. It would give readers a better sense of the method's utility if accuracy comparisons to simpler methods, such as k-nearest-neighbor methods, were provided.

      (4) Many simulations are required to train the posterior estimator, seemingly many more than in existing approaches. Inferring from Figure S1, at the required order of magnitude of the number of simulations, the simulation time could range from days to years, depending on the hardware. Although parameter inversion for new data will be very fast once the estimator is well trained, it is not clear to me how often such use cases would be encountered. Because the estimator is trained on an individual connectome, it can only be used for parameter inversion in the same subject. Typically, we only have one session of resting-state data from each participant, while longitudinal resting-state data, for which we can assume that the structural connectome remains constant, are rare. Thus, the cost-efficiency and practical utility of training such a posterior estimator remain unclear.

      We agree with the reviewer that it is necessary to show results on larger synthetic test sets, and we will elaborate further by presenting additional scenarios to demonstrate the robustness of the estimation. However, there are some points raised by the reviewer that we need to clarify.

      The validation on empirical data was beyond the scope of this study, as it relates to model validation rather than to the inversion algorithms. We also aimed to avoid repetition, given that we have previously demonstrated model validation on empirical data using these techniques for invasive sEEG (Hashemi et al., 2023), MEG (Sorrentino et al., 2024), EEG (Angiolelli et al., 2025) and fMRI (Lavanga et al., 2023; Rabuffo et al., 2025). Note that if features of the observed data were not included during training, VBI ignores them, as it requires an invertible mapping function between parameters and data features.

      We have used z-scores and posterior shrinkage to measure prediction performance, as these are Bayesian metrics that account for the variance of both the prior and the posterior, rather than only the mean value or the thresholding used to rank predictions in k-NN or confusion-matrix methods. This helps avoid biased accuracy estimates, for instance when the posterior mean is close to the true value but there is no posterior shrinkage. We agree that, although shrinkage is bounded between 0 and 1, z-scores have no upper bound for such diagnostics.
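      For readers unfamiliar with these diagnostics, the sketch below uses the definitions commonly found in the Bayesian workflow literature: posterior z-score = |posterior mean - ground truth| / posterior standard deviation, and posterior shrinkage = 1 - posterior variance / prior variance. The toy prior, the stand-in "posterior" samples and the function names are assumptions for illustration; they are not taken from VBI.

```python
import numpy as np

def posterior_zscore(post_samples, theta_true):
    """Distance of the posterior mean from the ground truth, in posterior standard deviations."""
    return np.abs(post_samples.mean(axis=0) - theta_true) / post_samples.std(axis=0)

def posterior_shrinkage(post_samples, prior_var):
    """1 - posterior variance / prior variance: near 1 = data informative, near 0 = prior barely updated."""
    return 1.0 - post_samples.var(axis=0) / prior_var

# Toy example: uniform prior on [0, 10], whose variance is (10 - 0)**2 / 12,
# stand-in posterior samples drawn from N(5.2, 0.3**2), ground truth 5.0.
rng = np.random.default_rng(0)
prior_var = (10.0 - 0.0) ** 2 / 12.0
post = rng.normal(5.2, 0.3, size=100_000)
z = posterior_zscore(post, 5.0)           # roughly 0.2 / 0.3, i.e. about 0.67
s = posterior_shrinkage(post, prior_var)  # roughly 1 - 0.09 / 8.33, i.e. about 0.99
```

      An ideal estimate combines a small z-score with shrinkage close to 1. A small z-score with shrinkage near 0 is the biased-accuracy case described above: the mean happens to be close, but the data did not narrow the prior.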

      Finally, the number of required simulations depends on the dimensionality of the parameter space and the informativeness of the data features. For instance, estimating a single global scaling parameter requires around 100 simulations, whereas estimating whole-brain heterogeneous parameters requires substantially more simulations. Nevertheless, we have provided fast simulations, and one key advantage of VBI is that simulations can be run in parallel (unlike MCMC sampling, which is more limited in this regard). Hence, with commonly accessible CPUs/GPUs, the fast simulations and parallelization capabilities of the VBI tool allow us to run on the order of 1 million simulations within 2–3 days on desktops, or in less than half a day on supercomputers at cohort level, rather than over several years! It has been previously shown that the SBI method used in VBI provides an order-of-magnitude faster inversion than HMC for whole-brain epilepsy spread (Hashemi et al., 2023). Moreover, after training, the amortized strategy is critical for enabling hypothesis testing within seconds to minutes. We agree that longitudinal resting-state data under the assumption of a constant structural connectome is rare; however, this strategy is essential in brain diseases such as epilepsy, where experimental hypothesis testing is prohibitive.

      We will clarify these points and better explain some terms mentioned by the reviewer in the revised manuscript.

      Hashemi, M., Vattikonda, A. N., Jha, J., Sip, V., Woodman, M. M., Bartolomei, F., & Jirsa, V. K. (2023). Amortized Bayesian inference on generative dynamical network models of epilepsy using deep neural density estimators. Neural Networks, 163, 178-194.

      Sorrentino, P., Pathak, A., Ziaeemehr, A., Lopez, E. T., Cipriano, L., Romano, A., ... & Hashemi, M. (2024). The virtual multiple sclerosis patient. iScience, 27(7).

      Angiolelli, M., Depannemaecker, D., Agouram, H., Regis, J., Carron, R., Woodman, M., ... & Sorrentino, P. (2025). The virtual parkinsonian patient. npj Systems Biology and Applications, 11(1), 40.

      Lavanga, M., Stumme, J., Yalcinkaya, B. H., Fousek, J., Jockwitz, C., Sheheitli, H., ... & Jirsa, V. (2023). The virtual aging brain: Causal inference supports interhemispheric dedifferentiation in healthy aging. NeuroImage, 283, 120403.

      Rabuffo, G., Lokossou, H. A., Li, Z., Ziaee-Mehr, A., Hashemi, M., Quilichini, P. P., ... & Bernard, C. (2025). Mapping global brain reconfigurations following local targeted manipulations. Proceedings of the National Academy of Sciences, 122(16), e2405706122.

      Recommendations for the authors:

      We appreciate the time and effort of the reviewers, and their insightful and constructive comments to improve the paper. We have now addressed the reviewers’ comments in our revised manuscript and provide here below detailed explanations of the changes.

      We have adapted the Wilson-Cowan model to follow the same brain network modeling notation as the other models (Fig. 3 in the main text and Figs. S2–S4 in the supplementary materials). Additionally, we have included multiple figures in the supplementary material presenting extensive in-silico testing to demonstrate the accuracy and reliability of the estimations across different configurations, as well as the sensitivity to both additive and dynamical noise.
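      To illustrate the difference between the two kinds of noise mentioned above, here is a minimal Euler-Maruyama sketch for a single Stuart-Landau node. The function `simulate_sl` and all parameter values are hypothetical choices made for illustration and are not the VBI simulator or the settings used in the supplementary figures.

```python
import numpy as np

def simulate_sl(a=1.0, omega=2 * np.pi, dt=1e-3, n_steps=20_000,
                sigma_dyn=0.0, sigma_obs=0.0, seed=0):
    """Euler-Maruyama integration of one Stuart-Landau node; returns the observed x(t)."""
    rng = np.random.default_rng(seed)
    dw = rng.standard_normal((n_steps, 2)) * np.sqrt(dt)  # Wiener increments for x and y
    x, y = 1.0, 0.0  # start at (1, 0), which lies on the limit cycle when a = 1
    xs = np.empty(n_steps)
    for k in range(n_steps):
        r2 = x * x + y * y
        fx = (a - r2) * x - omega * y
        fy = (a - r2) * y + omega * x
        # dynamical noise enters the state update and propagates through the dynamics
        x, y = x + fx * dt + sigma_dyn * dw[k, 0], y + fy * dt + sigma_dyn * dw[k, 1]
        xs[k] = x
    # additive (observation) noise is added to the output only and never feeds back
    return xs + sigma_obs * rng.standard_normal(n_steps)

clean = simulate_sl()
obs_noisy = simulate_sl(sigma_obs=0.1)  # same trajectory, corrupted at readout
dyn_noisy = simulate_sl(sigma_dyn=0.1)  # different trajectory: noise perturbs the state
```

      Because the Wiener increments are drawn up front, `clean` and `obs_noisy` share the same underlying trajectory and differ only by the readout noise, whereas `dyn_noisy` drifts in phase and amplitude. Passing both kinds of perturbed data through the same trained estimator is one way to probe sensitivity to each noise type.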

      Reviewer #1 (Recommendations for the authors):

      (1) There were some inaccurate statements throughout the text that need to be corrected.

      a) In section 2.1, paragraph 1, the authors mentioned that they would describe network models corresponding to different types of neuroimaging recordings. This is inaccurate. The models were developed to approximate various aspects of the architecture of neural circuits. They were not developed per se to solely describe a specific neuroimaging modality.

      Thank you for pointing this out. We agree that our phrasing in Section 2.1, paragraph 1, did not make clear that the network models were developed to generate neural activity at the source level, and that a projection needs to be established to transform the simulated neural activity into empirically measurable quantities, such as BOLD fMRI, EEG, or MEG. We have revised the wording accordingly.

      b) The use of the term "spatio-temporal data features" is misleading as there are no true spatial features extracted.

      We have clarified this as follows: following Hashemi et al., 2024, we use the term spatio-temporal data features to refer to both statistical and temporal features derived from time series, whereas we refer to the connectivity features extracted from FC/FCD matrices as functional data features. We would like to retain this term, as it is used consistently in the code.

      (2) The authors need to improve the model descriptions in Equations (1)-(10). Several variables/parameters were not explained, limiting the accessibility of the work to those without prior experience in computational modeling.

      Thank you for pointing this out. In the revised manuscript, we have improved the model descriptions and now define all variables and parameters used in these equations.

      (3) Various things need further clarification and/or explanation:

      a) There is a need to highlight that the models section only provides examples of one of the many possible variants of the models. For example, the Wilson-Cowan model described is not your typical and more popular cortico-cortical-based Wilson-Cowan model. This is important to ensure that the work reflects an accurate account of the literature, avoiding future references that the models presented are THE models.

      This is a very important point. We have now highlighted that each model represents one of many possible variants. Moreover, we adapted the Wilson-Cowan model as a whole-brain network modeling approach to harmonize with all other models.

      b) In Figure 1, it is unclear where the empirical data come into play. The neural density estimator also sounds like a black box and needs further explanation (e.g., its architecture).

      Thank you for the careful reading. This is correct. We have now clarified where the empirical data enters as input to the neural density estimator and have added further explanation in section 2.2.

      c) There is also a need to better explain what shrinkage means and what the z-score vs shrinkage implies.

      We have elaborated on the definition of posterior z-score and shrinkage.

      d) It is unclear how the authors decided on the number of training samples to use.

      There is no specific rule for determining the optimal number of simulations required for training. In general, the larger the number of simulations within the available computational budget, the better the posterior estimation is likely to be. For synthetic data, we monitored the z-score and posterior shrinkage to assess the quality and reliability of the inferred parameters. The required number also depends critically on the parameter dimensionality. For instance, when estimating only the global coupling parameter, a maximum of 300 simulations was used, yielding accurate estimation across models and different realizations (Fig. S20), except for the Jansen-Rit model, where coupling did not induce a significant change in the intrinsic frequency of regional activity. We have now pointed this out in the Discussion.

      e) In the Results section, paragraph 1, there is a need to clarify that "ground truth" is available because you simulate data using predefined parameters. In fact, these predefined parameters and how they were chosen to generate the observed data were never described in the text.

      The "ground truth" is often chosen randomly within biologically plausible ranges, typically with some level of heterogeneity, and this has now been highlighted.

      f) Can the authors comment on why the median of the posterior distributions (e.g., in Figure 4E) is actually far off from the ground-truth parameters? This is perhaps understandable in the Jansen-Rit model due to its complexity, but it is not obvious in the very low-dimensional Stuart-Landau oscillator model.

      This can happen due to non-identifiability in high-dimensional settings. Figure 4E represents the posterior estimation using Jansen-Rit model with high-dimensional parameters. An accurate estimation close to the true values can be observed in the low-dimensional Stuart-Landau model, as shown in Figure 5.

      g) In Figure 7, the FC and FCD matrices look weird relative to those typically seen in other works.

      We have updated Figure 7. To do our best, we followed the code and parameters from Kong et al., Nat Commun 12, 6373 (2021), and the accompanying repository https://github.com/ThomasYeoLab/CBIG/blob/master/stable_projects/fMRI_dynamics/Kong2021_pMFM/examples/scripts/CBIG_pMFM_parameter_estimation_example.py

      We ran 300 iterations of the CMA-ES method to optimize the parameters, with a window length of 60 s and TR = 0.72 s, yielding a 1118 × 1118 FCD matrix for each run. Nevertheless, some discrepancies with the FC/FCD shown can arise from the convergence of the optimization process and from other model parameters.
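      A minimal sketch of the sliding-window FCD computation described above, assuming a one-frame window step and a 1200-frame scan. These are assumptions, though they reproduce the 1118 windows quoted: 60 s / 0.72 s rounds to 83 frames, and 1200 - 83 + 1 = 1118. The function names are hypothetical and are not taken from the Kong et al. repository or from VBI. `fcd_fluidity` uses one common definition of fluidity, the variance of FCD entries between non-overlapping windows; the exact definition used in the manuscript may differ.

```python
import numpy as np

def sliding_window_fc(bold, win):
    """bold: (T, N). Returns (T - win + 1, N*(N-1)/2) upper-triangular FC vectors, step = 1 frame."""
    T, N = bold.shape
    iu = np.triu_indices(N, k=1)
    n_win = T - win + 1
    out = np.empty((n_win, len(iu[0])))
    for w in range(n_win):
        fc = np.corrcoef(bold[w:w + win].T)  # FC within window w
        out[w] = fc[iu]
    return out

def fcd_matrix(bold, win):
    """FCD: correlation between the FC vectors of every pair of windows."""
    return np.corrcoef(sliding_window_fc(bold, win))

def fcd_fluidity(fcd, win):
    """Variance of FCD entries for window pairs that do not overlap (offset >= win)."""
    iu = np.triu_indices_from(fcd, k=win)
    return fcd[iu].var()

tr, win_sec = 0.72, 60.0
win = int(round(win_sec / tr))              # 83 frames
rng = np.random.default_rng(1)
bold = rng.standard_normal((1200, 10))      # toy stand-in: 1200 frames, 10 regions
fcd = fcd_matrix(bold, win)                 # shape (1118, 1118)
fluidity = fcd_fluidity(fcd, win)
```

      Because each FCD realization depends on the noise, two runs with identical parameters generally give element-wise different FCD matrices, which is why summary statistics such as fluidity, rather than individual entries, are the quantities compared.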

      h) In Figure 8, results for the J parameter are missing. Also, the BOLD signal time series of some regions in Figure 8B looks very weird, with some having very large deflections.

      We have updated Figure 8. In this figure, the parameter J is not inferred; it is instead presented in the appendix (S18). Please note that the system is in a bistable regime. We have implemented the full Wong-Wang model (Deco, 2014, Journal of Neuroscience) and optimized the external current and global coupling (using CMA-ES optimization) to maximize FCD fluidity, yielding FC/FCD patterns similar to those typically seen in other works:

      Author response image 1.

      i) On page 14, the authors mentioned that they perform a PCA on the FC/FCD matrices. Can the authors explain this step further and what it specifically gives out, as this is something unusual in the generative model fitting literature?

      Indeed, PCA is a widely used dimension reduction method in machine learning. Please note that in SBI, any dimensionality reduction technique, such as PCA, can be used, as long as it preserves information relevant to the target parameters.
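      As an illustration of what such a PCA step produces, the sketch below fits PCA by SVD on simulated FC feature vectors (upper-triangular entries), keeps the leading components as a compact summary, and applies the same projection to the observed features. The toy low-rank data and all names are assumptions; the manuscript's actual preprocessing may differ.

```python
import numpy as np

def fit_pca(X, n_components):
    """X: (n_samples, n_features). Returns the feature mean, top principal axes and explained-variance ratios."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return mean, vt[:n_components], explained

def pca_transform(X, mean, components):
    """Project feature vectors onto the retained principal axes."""
    return (X - mean) @ components.T

rng = np.random.default_rng(2)
n_sims, n_regions = 200, 20
n_feat = n_regions * (n_regions - 1) // 2        # 190 upper-triangular FC entries
latent = rng.standard_normal((n_sims, 3))        # toy: 3 underlying factors
X_train = latent @ rng.standard_normal((3, n_feat)) + 0.01 * rng.standard_normal((n_sims, n_feat))
mean, comps, explained = fit_pca(X_train, n_components=10)
Z_train = pca_transform(X_train, mean, comps)    # (200, 10) reduced training features
x_obs = X_train[:1]                              # stand-in for the observed data's features
z_obs = pca_transform(x_obs, mean, comps)        # same projection applied to observed data
```

      The key design point for SBI is that the projection is learned on the simulations and then reused unchanged on the observed data, so that both live in the same reduced feature space. Whether the retained components preserve the information relevant to the target parameters still has to be checked, for example with the z-score and shrinkage diagnostics.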

      j) On page 3, what does ABC in ABC methods stand for?

      ABC stands for Approximate Bayesian Computation, which is now spelled out in the text.
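      For readers new to the term, the sketch below shows the simplest form of Approximate Bayesian Computation, rejection ABC, on a toy Gaussian simulator: draw parameters from the prior, simulate, and keep the draws whose summary statistic lies within a tolerance of the observed one. This is not the VBI method (VBI trains neural density estimators instead), and the simulator, prior bounds and tolerance are assumptions. It illustrates only that ABC-style methods, like SBI, need forward simulations but no gradients.

```python
import numpy as np

def simulator(theta, rng, n=50):
    """Toy simulator: n noisy observations with unknown mean theta."""
    return rng.normal(theta, 1.0, size=n)

def summary(x):
    """Summary statistic (data feature): the sample mean."""
    return x.mean()

def rejection_abc(s_obs, n_draws, eps, rng):
    """Keep prior draws whose simulated summary lies within eps of the observed summary."""
    thetas = rng.uniform(-5.0, 5.0, size=n_draws)  # draws from a uniform prior
    accepted = [t for t in thetas if abs(summary(simulator(t, rng)) - s_obs) < eps]
    return np.array(accepted)

rng = np.random.default_rng(42)
s_obs = summary(simulator(1.5, rng))   # "observed" data generated with theta = 1.5
post = rejection_abc(s_obs, 20_000, 0.1, rng)
```

      Rejection ABC wastes most simulations (here roughly 2% are accepted) and must be rerun for each new observation, which is the inefficiency that amortized neural density estimators are designed to avoid.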

      Reviewer #2 (Recommendations for the authors):

      Overall, I found the paper well-written. These are basically just minor comments:

      We appreciate your positive feedback.

      (1) P3:

      - Amortization requires more explanation for the neuroscience audience.

      - What does ABC stand for?

      We have elaborated on Amortization. ABC stands for Approximate Bayesian Computation, which is now spelled out in the text.

      (2) Section 2.1:

      Should clarify the parcellation used

      In Section 2.1, we now state that: “The structural connectome was built with TVB-specific reconstruction pipeline using generally available neuroimaging software (Schirner et al., Neuroimage 2015)”.

      (3) P20: The method for sensitivity analysis (Figure 5F) is not clearly described.

      We have now added a subsection in the Methods section to explain the sensitivity analysis.

      (4) P21: statement that 10k simulations took less than 1 min doesn't match info shown in Figure S1. Please clarify.

      This is correct: for the Epileptor model, the total integration time is less than 100 ms. Owing to the model’s stable behavior with a large time step and the use of 10 CPU cores, all simulations were completed in less than a minute. It has previously been reported (Hashemi et al., 2023) that each VEP run simulating 100 s of whole-brain epileptic patterns takes only 0.003 s using a JIT compiler. The other models require more computational cost due to longer integration durations and smaller time steps. We have clarified this point.
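      As a rough consistency check, assuming that the 0.003 s per-run figure quoted above applies to each of the 10,000 runs and that the work scales ideally across the 10 cores (both simplifying assumptions):

```latex
t_{\text{serial}} = 10^{4} \times 0.003\,\text{s} = 30\,\text{s},
\qquad
t_{\text{10 cores}} \approx \frac{30\,\text{s}}{10} = 3\,\text{s} < 1\,\text{min}
```

      Even the serial estimate stays under one minute, so the stated timing is plausible for the Epileptor model without invoking the timings for the slower models in Figure S1.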

      (5) P23-24: the distribution of FCDs also doesn't match well even if we don't consider element-wise correspondence. Please clarify.

      This is correct, as we used summary statistics of the FCD, such as fluidity, and due to noise, each realization of the FCD matrix exhibits different element-wise correspondence. We have already mentioned this point.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Liang et al. have conducted a small-scale pilot study focusing on the feasibility and tolerability of low-dose chemotherapy combined with delayed immunotherapy in the neoadjuvant treatment of non-small cell lung cancer. The design of delayed immunotherapy after chemotherapy is relatively novel, while the reduced chemotherapy, although somewhat lacking in innovation, still serves as an early clue for exploring feasible future strategies. Also, the dynamic ctDNA and TCR profiles could give some important hints about the intrinsic tumor response.

      However, as the authors mentioned in the limitations section, due to the small sample size and lack of a control group, we cannot fully understand the advantages and disadvantages of this approach compared to standard treatment. Compared to standard immunotherapy, the treatment group in this study has three differences: (1) reduced chemotherapy, (2) the use of cisplatin instead of the carboplatin commonly used in neoadjuvant therapy trials, and (3) delayed immunotherapy. Generally, in the exploration of updated treatment strategies, the design should follow the principle of "controlling variables." If there are too many differences at once, it becomes difficult to determine which variable is responsible for the effects, leading to confusion in the interpretation of the results. Moreover, the therapeutic strategy may lack practical clinical operability due to the long treatment duration.

      Thank you for your advice. As you pointed out, incorporating too many variables can obscure research findings. Our study focuses on two primary objectives: (1) to demonstrate that our approach is less toxic than the standard regimen; and (2) to fully activate the immune system in order to achieve better therapeutic outcomes. Based on these two objectives, we reduced the chemotherapy dose to alleviate toxicity and delayed immunotherapy administration to limit the chemotherapy-induced killing of activated immune cells, thereby maximizing the immune response. The two variables of reduced chemotherapy and delayed immunotherapy therefore serve a unified rationale in this study. The reduction of cisplatin to 60 mg/m2 is supported by data from Chinese patients, and a retrospective study conducted at our center found that delayed immunotherapy also has strong therapeutic effects. Considering the hematologic toxicity previously observed with carboplatin and albumin-bound paclitaxel, we replaced carboplatin with cisplatin to alleviate bone marrow suppression. Our patients are usually hospitalized for 4-7 days to receive treatment and to have potential side effects, including nausea, vomiting, diarrhea and bone marrow suppression, observed and managed. Administering immunotherapy on the 5th day is therefore convenient and feasible.

      Furthermore, in the exploration of biomarkers, the authors emphasized the procedure of whole RNA sequencing in tumor tissues in the method section, and this was also noted in the flowchart in Figure 1. However, I didn't find any mention of RNA-related analyses in the Results section, which raises some concerns about the quality of this paper for me. If the authors have inadvertently omitted some results, they should supplement the RNA-related analyses so that I can re-evaluate the paper.

      Thanks for your comment. In this study, we employed a multi-omics approach involving whole-transcriptome, ctDNA, and TCR sequencing to investigate the effects of a neoadjuvant treatment on NSCLC. The sequencing details are described in the Materials and Methods section, and RNA-related analyses are presented in Figure S3. Given that our primary focus is the impact of this modified treatment on immune cells, we estimated immune cell composition from the RNA sequencing results using the xCell and ImmuCellAI algorithms. The estimated immune cell profiles have been added to Supplementary Tables 5 and 6.

      To sum up, this article exhibits a certain degree of innovation. However, due to its intrinsic design defects and data omissions, the quality of the research warrants further improvement.

      Thanks for your comment. We have provided a more detailed explanation of the administration for all patients. Additionally, we have clarified and supplemented the sequencing results to enhance the clarity and overall quality of the article.

      Reviewer #2 (Public review):

      Summary:

      In this single-center, single-arm, open-label, non-randomised study, the authors tested the use of paclitaxel at 180-220 mg/m2 and cisplatin at 60 mg/m2 in patients with squamous NSCLC, and pemetrexed at 500 mg/m2 and cisplatin at 60 mg/m2 in patients with adenocarcinoma of lung origin, in the neoadjuvant setting. The chemotherapy appears to have been given at a relatively standard dose; though the platin dose of 60 mg/m2 is somewhat lower than that used in the CheckMate 816 trial (75 mg/m2/dose), this is a well-established dose for NSCLC.

      Key differences from the currently approved neoadjuvant chemo-ICI treatment are that the anti-PD1 antibody sintilimab (200 mg/dose) was given on day 5 and that only 2 cycles of chemotherapy were given before surgery, then repeated on two occasions after surgery. Between May 2020 and November 2023, 50 patients were screened, 38 went on to receive this treatment schedule, 31 (~82%) went on to have surgery and 27 received the adjuvant treatment. The rate of surgery is entirely consistent with the CheckMate 816 data.

      Question to the authors:

      It would be very helpful to understand why 7 patients (~18% of the population) did not make it to surgery and whether this is related to disease progression, toxicity or other reasons for withdrawal.

      Thank you for your comment. No patients were denied surgery due to disease progression or side effects. Seven patients did not undergo surgery: 3 declined total pneumonectomy, 2 were unable to come to our hospital for treatment because of the COVID-19 pandemic, and 2 were ineligible for radical surgery because of tumor invasion of the arteries.

      The key clinical endpoints were pCR and mPR rates. 2/38 patients are reported to have achieved a radiological pCR, but only 31 patients underwent surgery with histological verification. Supplementary Table 2 suggests that 10/31 patients achieved a pCR, 6/31 additional patients achieved a major pathological response and 13/31 did not achieve a major pathological response.

      It would be really helpful for understanding the clinical outcome to present the histopathological findings in the text in a bit more detail and to relate the outcome to the radiological findings. I note that the denominator for pathological responses is incorrectly given as 38 patients, as only 31 patients underwent surgery and were evaluated histologically.

      Thanks for your comment. The ITT population consisted of 38 patients, of whom 31 underwent surgery. After surgery, 18 patients achieved MPR, including 12 who achieved pCR, and 13 did not achieve MPR. Thus, in the ITT population, the pCR and MPR rates are 12/38 (31.6%) and 18/38 (47.4%), respectively; among patients who completed surgery, both rates are higher, at 12/31 (38.7%) and 18/31 (58.1%), respectively (Results, lines 268-269).
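As a quick check on the two denominators, the rates above can be recomputed from the stated counts (12 pCR and 18 MPR, where the MPR count includes the pCRs; 38 ITT patients and 31 resected patients). This is an illustrative sketch only; the helper name `rate` is hypothetical.

```python
# Counts stated in the reply above; the 18 MPR patients include the 12 pCR patients.
PCR, MPR = 12, 18
ITT_N, SURGERY_N = 38, 31


def rate(events: int, n: int) -> float:
    """Percentage of n, rounded to one decimal place (hypothetical helper)."""
    return round(100 * events / n, 1)


rates = {
    "ITT pCR": rate(PCR, ITT_N),
    "ITT MPR": rate(MPR, ITT_N),
    "surgical pCR": rate(PCR, SURGERY_N),
    "surgical MPR": rate(MPR, SURGERY_N),
}
non_mpr = SURGERY_N - MPR  # resected patients without MPR

for label, value in rates.items():
    print(f"{label}: {value}%")
print(f"non-MPR after surgery: {non_mpr}")
```

Using 38 as the denominator dilutes the pathological rates with the 7 patients who were never assessed histologically, which is why both sets of figures are reported.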

      Author response image 1.

      The treatment was very well tolerated, with only one grade 3 AE reported. The longer-term outcome will need to be assessed over time, as the cohort is very 'young'. It is not clear what the adjuvant chemo-ICI treatment would add and how this extra treatment would be evaluated for benefit: if all the benefit comes from the neoadjuvant treatment, the extra post-operative treatment would only add toxicity.

      Please consider what the two post-operative chemo-ICI cycles might add to the outcome and how the value of these cycles would be assessed. Would there be a case for a randomised assessment in the patients who have NOT achieved an MPR histologically?

      Thanks for your comment. The purpose of postoperative adjuvant therapy is to prevent recurrence and metastasis. Both the KEYNOTE-091 and IMpower010 trials reported positive results. CheckMate 77T used a design of neoadjuvant therapy followed by surgery and adjuvant therapy, and its perioperative regimen resulted in significantly longer event-free survival than chemotherapy in patients with resectable NSCLC. We therefore designed this perioperative regimen, which is now a common approach, aiming to reduce tumor burden and improve the response rate at surgery through neoadjuvant therapy, and to eliminate residual tumor cells and prolong DFS through adjuvant therapy. Regarding DFS, follow-up shows 3 recurrences so far, but the data are not yet mature (updated in Table S1). The safety analysis includes all patients who received neoadjuvant and adjuvant therapy, and the addition of immunotherapy revealed no new safety signals.

      While the clinical dataset identifies that the proposed reduced chemo-ICI therapy has clinical merit and should be assessed in a randomized study, the translational work is less informative.

      Thanks for your comment. As noted in the limitations of the article, our research is preliminary and exploratory, and larger randomized studies will be needed in the future.

      The authors suggest that the treatment has a positive impact on T lymphocytes. Blood sampling was done at day 0 and day 5 of each of the four cycles of chemotherapy, with an additional sample after cycle 4. The authors state that data were analysed at each stage.

      The data in Figure 3B are reported for three sets of pairs: baseline to pre day 5 in cycle 1, day 5 to day 21 in cycle 1, and baseline of cycle 2 to day 5. It remains unclear whether the datasets contain the same top 20 clones, and it would be very helpful to show the kinetic changes of the individual 'top 20 clones' throughout the events in individual patients; as it stands, the 'top 20 clones' may vary widely from timepoint to timepoint. Of note, the figures do not demonstrate that the top 20 TCR clones were 'continuously increased'.

      Thanks for your comment. The data in Fig. 3B do not represent top 20 clones shared across all samples; rather, they illustrate the changes in each patient's own top 20 clones. The changes in the top 20 TCR clones during neoadjuvant treatment for specific samples are shown in Fig. S1. Owing to tumor heterogeneity, both within and between samples, the top 20 clones may differ between patients at the same time point. Additionally, because the top 20 TCR clones can change between stages as a result of antigen exposure over time, the top 20 clones of the same patient may also differ across time points. When analyzing the data, we measured the dynamic changes of the top 20 TCR clones across three stages in cycle 1, and we agree that describing these changes as "continuously increased" is not entirely accurate. We have therefore corrected this to "a phased increase" (Results, line 293).

      Instead, the data suggest that there are fluctuations in the relative distributions over time, but these may simply reflect shifts in T cell populations following chemotherapy rather than immunological effects in the cancer tissue. Consistent with this, the authors conclude (lines 304-305): "No significant difference was observed in the diversity, evenness, and clonality of TCR clones across the whole treatment procedure", and this seems a more persuasive conclusion than the statement 'that a positive effect on T lymphocytes was observed', where it is also not clear what 'positive' means.

      Thanks for your comment. The diversity, evenness, and clonality scores assess changes in the overall TCR repertoire. In our cohort, we did not observe significant changes in these three metrics throughout treatment, indicating that the overall TCR repertoire was stable. Despite this overall stability, we observed a significant increase in the top 20 and large clones, which represent the dynamics of the major TCR clones, during the treatment period. Additionally, integrating the RNA results (Tables S5-S6 and Fig. S3) from baseline and surgical samples, we found an increasing trend in the proportion of T cells following neoadjuvant therapy. We therefore suggest that the treatment has a positive effect on T lymphocytes.
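To illustrate why repertoire-wide indices and dominant-clone measures can tell different stories, the sketch below computes both from a table of clone counts. It assumes common conventions that the authors' pipeline may not share: Shannon (Shannon-Wiener) index H = -Σ p·ln p, Pielou evenness J = H / ln S (S = number of distinct clones), and clonality = 1 - J. The 0.001 cutoff is the frequency the authors use to define "large clones"; the function name and toy repertoires are hypothetical.

```python
import math

# Frequency cutoff the authors use to define "large clones".
LARGE_CLONE_THRESHOLD = 0.001


def repertoire_metrics(clone_counts: dict[str, int], top_n: int = 20) -> dict[str, float]:
    """Summarise one TCR repertoire given as clone ID -> count (hypothetical helper)."""
    total = sum(clone_counts.values())
    freqs = sorted((c / total for c in clone_counts.values() if c > 0), reverse=True)
    richness = len(freqs)
    shannon = -sum(p * math.log(p) for p in freqs)
    # Pielou evenness is undefined for a single clone; treat that case as fully uneven.
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return {
        "shannon": shannon,                 # whole-repertoire diversity
        "pielou_evenness": evenness,        # 1.0 = all clones equally frequent
        "clonality": 1.0 - evenness,        # assumed convention: 1 - evenness
        "top_n_share": sum(freqs[:top_n]),  # summed frequency of the top-n clones
        "large_clone_count": sum(1 for p in freqs if p > LARGE_CLONE_THRESHOLD),
    }


# Hypothetical repertoires: 100 equally sized clones vs. one dominant clone.
uniform = {f"clone{i}": 10 for i in range(100)}
skewed = {"dominant": 9000, **{f"rare{i}": 1 for i in range(1000)}}

for name, repertoire in (("uniform", uniform), ("skewed", skewed)):
    metrics = repertoire_metrics(repertoire)
    print(name, {k: round(v, 4) for k, v in metrics.items()})
```

The evenness and clonality values weigh every clone, whereas the top-20 share and the large-clone count look only at the most abundant part of the distribution, so the two kinds of measure need not move together.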

      The text needs a more balanced representation of the data: only a small subset of four patients appears to have been evaluated to generate the data for Figure 3B, and only three patients (P5, P6, P7) can have contributed to Figure 3C if the sample collection is represented accurately in Figure 3A.

      Thanks for your comment. In Fig. 3B, we utilized TCR data from six patients (P1, P2, P3, P10, P11, P12) for the period from day 1 to day 5 of cycle 1. For the period from day 5 of cycle 1 to day 1 of cycle 2, we used data from six patients (P1, P2, P5, P10, P11, P12). For the period from day 1 of cycle 2 to day 5 of cycle 2, we included data from five patients (P2, P4, P10, P11, P12). In Fig. 3C, we used TCR data from eight patients (P1, P2, P4, P6, P7, P10, P11, P12) to generate the images for cycle 1, and data from two patients (P6, P7) to create the images for cycle 3. Therefore, the sampling illustration in Fig. 3A is accurate.

      The text refers to flow cytometric results in SF3. However, no information is given in M&M on the flow cytometry, the markers, or the gating strategy.

      Thanks for your comment. In this study, we performed tissue sampling and whole transcriptome sequencing at both the baseline and surgical stages. Based on the sequencing results, rather than on flow cytometry, we evaluated T cell populations using two algorithms, xCell and ImmuCellAI, and detailed the analysis procedures in the Methods and Materials section. Additionally, we have included the results from both algorithms in Supplementary Tables 5 and 6.

      Please consider changing the terminology of the 'phases' to something that is easier to understand. One option would be to refer to more standard units (cycles 1-4 of chemotherapy, then d0/d5/d21).

      Thanks for your advice. Since each treatment cycle consists of both chemotherapy and immunotherapy, with chemotherapy administered on day 1 and immunotherapy on day 5 of each cycle, blood samples are collected at these two time points. Following your suggestion, we will use the notation d0/d5 within each treatment cycle to better clarify this process for the readers.

      Please make it explicit in the text that molecular analyses were undertaken for some patients only, and state how many patients contribute to the data in Figures 3B-F. Figure 3A suggests that paired mRNA data were obtained in 2 patients (P2 and P5), but I cannot find the results of these analyses; four individual blood samples to assess TCR changes in PH1/PH2/PH3 and PH4 were only available in four patients (P4, P5, P7, P9). Only three patients seem to have had the right samples collected to allow the analysis for 'C3' in Figure 3C.

      Thanks for your comment. In Fig. 3B and 3D, we used TCR data from six patients (P1, P2, P3, P10, P11, P12) for the period from day 0 to day 5 of cycle 1. For the period from day 5 of cycle 1 to day 0 of cycle 2, data from six patients (P1, P2, P5, P10, P11, P12) were used. For the period from day 0 of cycle 2 to day 5 of cycle 2, we included data from five patients (P2, P4, P10, P11, P12). In Fig. 3C and 3E, TCR data from eight patients (P1, P2, P4, P6, P7, P10, P11, P12) were used to generate the images for cycle 1, while data from two patients (P6, P7) were used to create the images for cycle 3. In Fig. 3F, all patients who underwent sequencing are included in the analysis, with each patient's data represented by dots of different colors.

      For the mRNA data, we sampled and sequenced five patients (P1, P2, P4, P5, P7) before treatment. During the surgical phase, we sampled and sequenced three patients (P2, P5, P6). The T cell assessments and comparisons based on the mRNA sequencing results are presented in Fig. S3 and Tables S5-S6.

      Please display for each of the 'top 20 clones' at any one timepoint how these clones evolve throughout the study; I expect that a clone that is 'top 20' at a given timepoint may not be among the 'top twenty' at all timepoints.

      Thanks for your comment. Yes, due to the heterogeneity of tumors, a variety of different antigens are exposed during the course of cancer treatment. As a result, the formation of TCR dominant clones is a dynamic process, with new dominant clones emerging at each stage. Therefore, the top 20 clones at each time point do not necessarily represent the overall top 20 clones across all time points. However, there is still some overlap in the dominant TCR clones. We have chosen to present the data from P2, which provides the most complete results throughout the entire treatment process.
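The tracking the reviewer asks for (following one time point's top clones across all other time points) can be sketched as follows. The sample layout, function names, and toy data are hypothetical, and `n=2` is used only to keep the example small; the actual analysis would use `n=20` on each patient's clone tables.

```python
def top_clones(clone_counts: dict[str, int], n: int = 20) -> list[str]:
    """IDs of the n most frequent clones; ties broken by clone ID for reproducibility."""
    ranked = sorted(clone_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [clone_id for clone_id, _ in ranked[:n]]


def track_top_clones(samples: dict[str, dict[str, int]], reference: str,
                     n: int = 20) -> dict[str, dict[str, float]]:
    """Frequency at every time point of the top-n clones defined at `reference`."""
    totals = {tp: sum(counts.values()) for tp, counts in samples.items()}
    return {
        clone_id: {tp: counts.get(clone_id, 0) / totals[tp] for tp, counts in samples.items()}
        for clone_id in top_clones(samples[reference], n)
    }


def top_set_overlap(samples: dict[str, dict[str, int]], n: int = 20) -> dict[tuple[str, str], int]:
    """Number of shared top-n clones for every pair of time points."""
    tops = {tp: set(top_clones(counts, n)) for tp, counts in samples.items()}
    tps = list(samples)
    return {(a, b): len(tops[a] & tops[b]) for i, a in enumerate(tps) for b in tps[i + 1:]}


# Hypothetical clone counts for one patient at three time points.
samples = {
    "c1d0": {"A": 50, "B": 30, "C": 20},
    "c1d5": {"A": 40, "C": 40, "D": 20},
    "c2d0": {"D": 60, "A": 25, "B": 15},
}
print(track_top_clones(samples, reference="c1d0", n=2))
print(top_set_overlap(samples, n=2))
```

A clone that drops out of the top set (clone B at c1d5 here) still gets a trajectory, which is what distinguishes this view from recomputing a fresh top set at each time point.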

      Author response image 2.

      Please also assess whether the expanded clonotypes are present (and expanded) in the cancer tissue at resection, to link the effect in blood to the tumour. Given that tissue was collected from 31 patients, it should be possible to add TCR data generated by mRNA sequencing to the blood analyses in the 12 patients in Figure 3A. Without these data, no clear link can be made to events in the cancer.

      Thanks for your comment. Due to limitations in sampling conditions, we were unable to collect samples from all patients at every time point. As shown in Fig. 3A, we performed tissue sampling and RNA sequencing on five patients (P1, P2, P4, P5, P7) before treatment. During the surgical phase, we sampled and conducted RNA sequencing on three patients (P2, P5, P6). This study primarily focuses on TCR analysis in peripheral blood. The relationship between peripheral blood TCR and tissue TCR clones will be addressed in future research.

      Please provide in M&M the missing information on the flow cytometry methodology (instrument, antibody clones, gating strategy) and what markers were used to define T cell subsets (naïve, memory, central memory, effector memory).

      Thanks for your comment. In this study, we evaluated immune cells based on RNA sequencing results rather than flow cytometry. We then compared T cell subsets between the baseline and post-neoadjuvant treatment stages. The steps for RNA sequencing and the evaluation of immune cells using the xCell and ImmuCellAI algorithms are detailed in the Methods and Materials section. The comparison of T cell subsets is presented in Fig. S3. The estimated immune cell data have been added to Tables S5 and S6.

      The authors also describe that ctDNA decreases after chemo-ICI treatment. This is well documented in their data but ultimately irrelevant: if the cancer volume is reduced to the degree of a radiological or pathological response/complete response, then the quantity of circulating DNA from the cancer cells must fall. More interesting would be whether early changes predict clinical outcome and whether recurrent ctDNA elevations herald recurrence.

      Thanks for your comment. If the tumor responds to treatment, its volume will decrease. Over the long term, ctDNA levels in the blood are expected to decline. However, in the short term, as tumor cells are killed, there may be a surge of ctDNA released into the patient's bloodstream, potentially causing a rise in the maxVAF. Based on the current follow-up data, the ctDNA maxVAF for patient P8 has increased compared to baseline levels. However, given the relatively short follow-up period, no recurrence has been observed yet.

      Please probe whether the molecular data identify good radiological or pathological outcomes before cycle 2 is started and whether the ctDNA levels identify patients who will have a poor response and/or who relapse early.

      Thanks for your comment. Before the start of cycle 2, we reviewed the data of all patients who underwent ctDNA sequencing. Among them, patients P1 to P4 were classified as MPR, whereas patients P5 to P9 were classified as non-MPR. Patients P7 and P8 showed an increasing trend in maximum variant allele frequency (maxVAF) in their ctDNA. Thus, 50% (2 out of 4) of the MPR patients could be identified as having potential issues through molecular testing before cycle 2. Additionally, only P3 experienced a recurrence, which was predicted by molecular testing before the start of cycle 2.
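For readers unfamiliar with the metric, maxVAF is usually taken as the highest variant allele frequency (variant-supporting reads divided by total depth) among the somatic variants called in a plasma sample. The sketch below, with hypothetical read counts and function names, computes it per time point and flags a rise between two time points; it applies no noise threshold and is not the authors' pipeline. As the authors note, a rise alone cannot distinguish progression from a transient release of ctDNA by dying tumor cells.

```python
def vaf(alt_reads: int, depth: int) -> float:
    """Variant allele frequency: fraction of reads supporting the variant allele."""
    return alt_reads / depth if depth > 0 else 0.0


def max_vaf(variants: list[tuple[int, int]]) -> float:
    """Highest VAF among the (alt_reads, depth) pairs called in one sample; 0.0 if none."""
    return max((vaf(alt, depth) for alt, depth in variants), default=0.0)


def rose_between(series: dict[str, float], start: str, end: str) -> bool:
    """True if maxVAF is higher at `end` than at `start` (no noise threshold applied)."""
    return series[end] > series[start]


# Hypothetical (alt_reads, depth) calls for one patient.
calls = {
    "c1d0": [(20, 2000), (5, 1000)],
    "c1d5": [(6, 2000)],
    "c2d0": [(30, 2000), (4, 1000)],
}
series = {tp: max_vaf(v) for tp, v in calls.items()}
print(series)
print(rose_between(series, "c1d0", "c2d0"))
```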

      Author response image 3.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I have some detailed comments for the authors:

      (1) Please explain the reason for putting forward the opinion that "cytotoxic drugs with standard doses and anti-PD1 antibody were administrated on the same day (9), which may result in unsatisfactory eradication rates and relatively high incidence of severe treatment-related adverse events (TRAEs)" (Page 3 Line 76), especially "unsatisfactory eradication rates". Is this based on actual evidence, or is it purely theoretical speculation?

      Thanks for your comment. Our team has previously studied the impact of the timing of PD-1/PD-L1 inhibitors relative to chemotherapy on the outcomes of patients with refractory lung cancer. Our findings suggest that administering PD-1/PD-L1 inhibitors 1-10 days (especially 3-5 days) after chemotherapy is superior to administering them before or concurrently with chemotherapy in these patients, although this result needs to be further explored in prospective studies. We therefore infer that administering standard-dose cytotoxic drugs and an anti-PD-1 antibody on the same day may lead to unsatisfactory eradication rates and more side effects.

      Yao W, Zhao X, Gong Y, Zhang M, Zhang L, Wu Q, et al. Impact of the combined timing of PD-1/PD-L1 inhibitors and chemotherapy on the outcomes in patients with refractory lung cancer. ESMO Open. 2021;6(2):100094.

      (2) Due to the lack of a control group, we cannot assess the advantages and disadvantages of this treatment strategy compared to standardized neoadjuvant immuno-chemotherapy. We can refer to historical data. In the current clinical trials on neoadjuvant chemotherapy combined with immunotherapy (CheckMate-816, etc), what is the proportion of patients who had their chemotherapy reduced due to adverse reactions? Is there a difference in their efficacy? This could serve as a good historical reference.

      Thanks for your comment. In CheckMate 816, the rates of discontinuation of neoadjuvant treatment in the treatment and control groups were 5.7% and 6.8%, respectively, and no patients had their chemotherapy dose reduced because of intolerable side effects. Finding a historical reference is an excellent suggestion, and we will check the details of other clinical trials.

      (3) Among the 38 patients, there are 21 cases of SCC and 17 cases of LUAD. From the protocol, it can be seen that SCC patients had both albumin-bound paclitaxel and cisplatin reduced, whereas LUAD patients did not have a reduction in pemetrexed, only in cisplatin. Considering the different pathological subtypes and treatment strategies, I suggest that the authors present the efficacy data for SCC and LUAD separately rather than combining them.

      Thanks for your comment. Of the 31 patients in this cohort who underwent pathological evaluation, 16 had squamous cell carcinoma (SCC) and 15 had lung adenocarcinoma (LUAD). On comparison, no statistically significant difference in treatment efficacy was observed between SCC and LUAD patients.

      Author response table 1.

      (4) In the discussion, the authors mention that during the adjuvant treatment phase, "no significant change was observed in evenness or clonality of TCR" (Page 13, Line 364). However, in Figure 3E, it can be seen that the evenness and clonality of TCR during the adjuvant treatment phase (i.e., C3) are significantly increased (P < 0.05).

      Thanks for your comment. For the TCR repertoire evenness and clonality, we present these metrics in Fig. S2 B-C. Throughout the treatment process of all patients, there were no significant changes in the Pielou index (representing evenness) or clonality. In Fig. 3E, we defined TCR clones with a frequency greater than 0.001 as "large clones" and examined their changes during cycle 1 and cycle 3. Therefore, although there was a significant increase in large clones during cycle 3, the overall TCR evenness and clonality did not show notable changes.

      (5) The authors indicated that low-dose chemotherapy does not inhibit TCR expansion; however, due to the lack of a control group, we cannot conclude that "standard doses would affect TCR expansion." To better explore this possibility, please analyze the differences in TCR expansion between patients with bone marrow suppression and those without.

      We analyzed the incidence of bone marrow suppression in patients who underwent blood TCR testing. The statistical results are shown in the figure below. Patients were grouped based on the presence or absence of bone marrow suppression to compare differences in TCR clonal dynamics between the two groups during neoadjuvant therapy. As shown in the figure below, patients in the non-bone marrow suppression group exhibited higher TCR diversity (SW score) during treatment compared to those in the bone marrow suppression group. During neoadjuvant therapy, the dominant clones in both groups significantly increased from c2d0 to c2d5. However, from c1d0 to c2d0, there was no significant change observed in the non-bone marrow suppression group, possibly due to the limited sample size. Additionally, Patient P11 in the non-bone marrow suppression group showed a downward trend in dominant clones from c1d5 to c2d0, which may have influenced the overall results for this group during this phase.

      Author response table 2.

      Author response image 4.

      (6) In the analysis of ctDNA maxVAF, I noticed that one patient showed a significant drop at T1 (after C1 chemotherapy), followed by a notable rebound at T2 (after C1 delayed immunotherapy), and then a decline again at T3 (after C2 chemotherapy). Theoretically, maxVAF can reflect tumor burden and should change in accordance with treatment response. Could this indicate that the patient has a poor response to the delayed immunotherapy without concurrent chemotherapy? Additionally, please examine this patient's efficacy separately. What is the status of dynamic TCR? Does it show a trend opposite to that of maxVAF?

      Thanks for your comment. For patient P7, the radiological evaluation showed PR, while the pathological assessment was non-MPR. The time points have been renamed as requested: T0, T1, T2, and T3 were changed to c1d0, c1d5, c2d0, and c2d5, respectively. Combining the radiological and pathological evaluations, the patient experienced some tumor shrinkage during neoadjuvant therapy but still retained residual tumor cells. Theoretically, maxVAF reflects the tumor burden in the bloodstream and serves as a real-time indicator of treatment response. For patients with long-term benefit, maxVAF is expected to decrease as tumors are eliminated. However, in the short term, the release of large amounts of clonal ctDNA from destroyed tumor cells may lead to a temporary increase in maxVAF. Therefore, it is not possible to conclude from a single case that this patient had an adverse response to delayed immunotherapy. The increase in maxVAF from c1d5 to c2d0 might result from the extensive release of newly exposed antigens. During this period, the top 20 and large TCR clones did not show significant changes, suggesting that the patient's immune response was insufficient, leading to suboptimal neoadjuvant treatment efficacy and failure to achieve MPR. Additionally, there were no noticeable changes in maxVAF or TCR metrics from c1d0 to c2d0 for this patient, so there is no evidence of an inverse trend between TCR and maxVAF.

      Author response image 5.

      (7) In line with the previous question, another patient's maxVAF shows a significant rebound at T3. Please examine this patient's efficacy as well as the status of dynamic TCR.

      Thanks for your comment. For patient P4, the radiographic assessment showed SD, while the pathological assessment indicated an MPR. Although the reduction in tumor volume in this patient was small, the tumor cell content within the lesion was less than 10%, suggesting a good response to neoadjuvant therapy. From c1d0 to c2d0, this patient's maxVAF showed a downward trend, while the TCR dominant clone indices did not change significantly. From c2d0 to c2d5, both the maxVAF and the TCR dominant clone indices increased significantly. This suggests that this patient mounted a stronger immune response than patient P7.

      Author response image 6.

      Minor Comments:

      (1) Figure 2E shows only OS, but the corresponding description in the text mentions that OS and DFS are not reached.

      Thanks for your comment. Both overall survival (OS) and disease-free survival (DFS) records are available in Table S1. Follow-up data for 31 patients were updated in Supplementary Table 1 as of January 31, 2025. Among them, three patients experienced tumor recurrence, one of whom died, and seven patients were lost to follow-up. As a result, there are not yet enough events to estimate median OS or DFS, and we therefore opted to display only OS in Fig. 2E.

      (2) In the Discussion section, it is mentioned that there is controversy regarding chemotherapy combined with immunotherapy. I disagree with this statement. I believe that chemotherapy combined with immunotherapy is a consensus. The wording should be revised accordingly.

      Thanks for your comment. As you say, the combination of chemotherapy and immunotherapy has become a consensus. What we wanted to express is that how to optimize the timing and dosage of administration is worth further exploration. We will revise the text accordingly (Discussion, lines 328-331).

      (3) The authors mentioned that the study involves multi-omics, but only ctDNA and TCR levels are included, with no RNA-related content observed. Perhaps a different term could be used.

      Thanks for your comment. In this study, we employed a multi-omics approach involving whole transcriptome, ctDNA, and TCR sequencing. RNA-related analyses are presented in Figure S3. Given that our primary focus is the impact of this modified treatment on immune cells, we used the RNA sequencing results to estimate immune cell composition using the xCell and ImmuCellAI algorithms. The estimated immune cell profiles have been added to Supplementary Tables 5 and 6.

      Reviewer #2 (Recommendations for the authors):

      Additional comment to the authors:

      The methods section refers to mRNA sequencing of the tumour tissue to define immune cell populations. Figure 3A also identifies that up to two timepoints were to be sequenced for individual patients. I could not find the results in the document.

      Please review the methods section and remove experimental methods where no data are presented.

      Thanks for your comment. As shown in Fig. 3A, for the mRNA data, we sampled and sequenced five patients (P1, P2, P4, P5, P7) before treatment. During the surgical phase, we sampled and sequenced three patients (P2, P5, P6). We then used the RNA sequencing results to estimate immune cell composition using the xCell and ImmuCellAI algorithms. The estimated immune cell data have been added to Supplementary Tables 5 and 6, and the comparisons of T cell proportions are shown in Fig. S3. Whole transcriptome sequencing and immune cell abundance estimation are described in detail in the Methods section.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Since they used PBMCs without other assays to confirm the cell subtypes, I am not sure whether any of the heterogeneity they detected in the secretion of the six cytokines could be related back to biology.

      We agree with the reviewer that we cannot relate cytokine secretion back to specific cell populations and that part of the observed heterogeneity is due to the various cellular populations and subpopulations. However, we would argue that the results obtained from PBMCs relate to biology rather than to cellular identity, and that they provide useful information on how PBMCs respond to a specific challenge, which is clinically relevant for patient stratification and monitoring. Thus, trends in polyfunctional cytokine secretion can be identified without isolating one specific cellular subpopulation. Nevertheless, we agree that future experiments must identify the polyfunctional cells and decipher the extent of heterogeneity within the population.

      In addition, because the two panels were measured on separate cells, I am not sure it is meaningful to make any comparisons between them.

      Thank you for mentioning this point. If this refers to Figure 3, where we compare the percentage of secreting cells across incubation times, these data points all represent individual cells, which were then pooled. It is true that these could potentially be similar cell types (a cell co-secreting TNFα/IL-6 could also co-secrete IL-8/MIP-1α). Since the cells originate from the same batch and stimulation, and were only divided before encapsulation, we think this is a valid comparison, as it would also be done in ELISpot or similar techniques.

      Reviewer 2

      The conclusions of the study are based on samples from a single donor, which makes the conclusions on secretion patterns difficult to interpret. The choice of cytokines is explained, but the justification of the groupings of the antibodies into the two panels is missing.

      Thank you for highlighting this valid criticism. We chose to use cells from one donor to examine the secretion patterns in one individual, as cells from different individuals might respond differently. The focus of this study was to describe secretion patterns with respect to incubation time and secreted cytokine; including multiple donors would address a different question (i.e., how polyfunctionality differs between individuals). The cytokines were grouped according to expected secretion to observe overlaps between different cell types (to increase the chance of seeing secretion from both panels simultaneously). We have added text discussing the justification of the cytokine grouping to the updated manuscript.

      It would further be helpful to discuss how the single cell incubation might affect the secretion dynamics vs. the influence of co-culture of all cell types during the 24 h activation.

      Thank you for this input. We discussed this potential limitation in detail in a previous publication (Portmann et al., Cell Reports Methods, 2023) and have added sentences addressing it to the discussion.

      The authors compare average secretion rates and levels. However, the right panel in Fig. 6 suggests that there might be two different populations of mono- or polyfunctional cells with two secretion rates. As the authors have single-cell data, I would find the separation into these populations more meaningful than comparing the mean values. In line with this comment, comparing the means of the populations with distinct secretion properties, instead of the overall mean values for these cytokines, might actually show stronger differences than the authors report here.

      Thank you for this addition. This plot focuses on describing the relationship between secretion and incubation time. We agree that the data could be further divided into high and low secretors, each with its own average plot. However, we ultimately decided against this to avoid bias due to small event counts in certain high- and low-secreting polyfunctional populations. We checked whether the dynamics differ between these populations, and the individual averages largely follow the overall trend, although at different plateaus; indeed, high-secreting cells will reach a plateau due to saturation. We have added the plot for IFN-γ here to illustrate this point.

      Author response image 1.

      Is the plateau of the cytokine concentration caused by the fluorescence signal saturating the camera, saturation of the magnetic beads, exhaustion of the fluorescent antibodies, or constant cytokine concentrations?

      Thank you for raising this point. At the individual-cell level, the plateau is caused by assay capacity limitations for high-secreting cell populations, i.e., the capacity of the nanoparticles. For low-secreting populations, the plateau is caused by cessation of secretion, whereas for high-secreting cells, the capacity will be limiting. This has been discussed extensively in Portmann et al., Cell Reports Methods, 2023.

      The high number of non-CSCs and the limited number of droplets decrease the statistical power of the method. The authors discuss their choice to use PBMCs and not solely T cells, but this aspect is missing in the discussion.

      As mentioned above, we chose PBMCs because they are more representative and heterogeneous in clinical settings. Focusing on secreting cell subpopulations would indeed increase both the percentage and the number of CSCs, but we found the method to be sufficiently statistically powerful for our measurements. However, we also agree with the comment raised by Reviewer 1 that a focus on a specific cell population might be interesting for many questions and applications. We have added corresponding text to the discussion section.

      The absolute cell number is missing. This might also answer the question of whether polyfunctional cells turn into monofunctional cells after stimulation for 24 hours or if the monofunctional population expands more.

      We are unsure about this comment. If the reviewer refers to a potential ex vivo expansion over 24 h, we have checked this for different conditions and did not observe cellular expansion within this timeframe: the numbers remained mostly stable, sometimes decreasing and increasing only with CD3/CD28 stimulation. However, an overall change in cell counts does not necessarily relate to the functionalities of individual cells. This observation, combined with our results, hints at a dynamic cellular restriction of polyfunctionality, but it is not direct evidence for such a hypothesis, as individual cells would need to be followed over a much longer time frame.

      Fig. 4: Using a divergent colour scheme would be helpful. Fig. 6: Adding labels with the stimulation next to the plots would be helpful.

      We have changed the figures accordingly.

      A limitation of the approach is that the detection of polyfunctionality relies on how the three cytokines in each panel are selected and comparisons between the two panels are not otherwise helpful. Can the authors discuss how many panels would be needed to fully explore polyfunctionality among the six cytokines?

      Thank you for this comment. We agree that the identification of polyfunctional cells depends on the selection and composition of the panels. We had to select panels, and for this study we based our initial choice on the expected secretion behavior of PBMCs instead of engineering panels specific to one cell type. However, these panels can be adapted to study additional questions. Grouping six cytokines into panels of three allows 20 possible combinations. However, we very rarely see triple-positive polyfunctional cells, and not all combinations would make sense due to cellular restrictions and differences in stimulation.
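      The figure of 20 is the binomial coefficient C(6, 3). As a minimal illustrative sketch, the Python below enumerates every possible three-cytokine panel; the labels C1-C6 are hypothetical placeholders, not the cytokines used in the study, and the enumeration says nothing about which panels are biologically sensible.

```python
from itertools import combinations
from math import comb

# Hypothetical placeholder labels; substitute the six cytokines actually measured.
CYTOKINES = ["C1", "C2", "C3", "C4", "C5", "C6"]

def three_cytokine_panels(cytokines):
    """Return every unordered panel of three distinct cytokines."""
    return list(combinations(cytokines, 3))

panels = three_cytokine_panels(CYTOKINES)
# The enumeration agrees with the closed form C(6, 3) = 6! / (3! * 3!).
print(len(panels), comb(len(CYTOKINES), 3))  # 20 20
```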

      Is there any way to increase the number of cytokines that could be detected in one droplet?

      This can be done at a lower throughput by removing the Cell Trace violet stain, which would allow the current method to measure up to 4 cytokines. Alternatively, adding further fluorophores without spectral overlap could raise the number of measurable cytokines to a maximum of around 6-7, allowing polyfunctionality to be measured in a less biased manner. Other solutions are needed if more than 6-7 cytokines are to be measured. Our experiments with high-throughput cytokine detection systems (Fireplex and Isoplexis, i.e., 17-18 cytokines) showed that cells rarely secreted more than three cytokines at a time.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ma, Yang et al. report a new investigation aimed at elucidating one of the key nutrients S. Typhimurium (STM) utilizes within the nutrient-poor intracellular niche of the macrophage, focusing on the amino acid beta-alanine. From these data, the authors report that beta-alanine plays an important role in mediating STM infection and virulence. The authors employ a multidisciplinary approach that includes some mouse studies and ultimately propose a mechanism by which panD, involved in B-Ala synthesis, mediates the regulation of zinc homeostasis in Salmonella. The impact of this work is questionable. There are already many studies reporting Salmonella-effector interactions, and while this adds to that knowledge, it is not a significant advance over previous studies. While the authors are investigating an interesting question, the work has two important weaknesses; if these are addressed, the conclusions of this work and its broader relevance to bacterial pathogenesis would be enhanced.

      Strengths:

      This reviewer appreciates the multidisciplinary nature of the work. The overall presentation of the figure graphics is clear and organized.

      Weaknesses:

      First, this study is very light on mechanistic investigations, even though a mechanism is proposed. Zinc homeostasis in cells, and its roles in bacterial infections, are complex processes with many players. The authors have not investigated the mechanisms underlying the roles of B-Ala and panD in STM infection thoroughly enough to rule out other factors. Defining the cellular Zn2+ content of STM in vivo would be one such route. Without further mechanistic studies, the possibility cannot be ruled out that the authors have simply deleted two important genes and seen an infection defect that may not relate directly to Zn2+ acquisition.

      Thank you for your patient and thoughtful reading, as well as the constructive comments and advice regarding our manuscript. We have revised the manuscript based on your comments and suggestions.

      You are correct that this work has not thoroughly investigated the mechanisms underlying the roles of β-alanine, panD, and zinc in Salmonella infection. It is challenging to isolate sufficient amounts of Salmonella from infected cells or tissues and then measure the zinc concentration in the bacteria, and we have attempted to do so without success. Therefore, we investigated the zinc content in mouse liver and RAW264.7 cells infected with Salmonella Typhimurium 14028s wild-type (WT) and panD mutant (ΔpanD), which can indirectly reflect zinc acquisition by intracellular Salmonella. We observed that the zinc content in ΔpanD-infected mouse liver macrophages and RAW264.7 cells was increased compared with that in WT-infected mouse liver macrophages and RAW264.7 cells, respectively (Figures 5E and 6A). This implies that the panD gene and β-alanine are important for Salmonella to absorb zinc from host cells. This information has been added to the revised manuscript (lines 325-329, 344-348).

      Meanwhile, we concur that additional, unknown mechanisms are involved in the regulation of virulence by β-alanine in Salmonella. Our findings indicate that the double mutant ΔpanDΔznuA, which can neither synthesize β-alanine nor take up zinc, is more attenuated than the single mutant ΔznuA (Figures 5D and 6B). This suggests that the contribution of β-alanine to Salmonella's virulence is only partially dependent on zinc acquisition. We have revised the related descriptions throughout the manuscript for clarity (lines 31, 304, 341, 1056, 1068).

      Second, the authors hint at their newly described mechanism/pathway being important for disease and possibly a target for therapeutics. This claim is not justified given that they have employed a single STM strain, which was isolated from chickens and is not even a clinical isolate. The authors could enhance the impact of their findings and relevance to human disease by demonstrating it occurs in human clinical isolates and possibly other serovars. Further, the use of mouse macrophage as a model, and mice, have limited translatability to human STM infections.

      We thank you for your comments and advice on our manuscript and are delighted to accept them. Salmonella Typhimurium causes systemic disease in mice, which is similar to the symptoms of typhoid fever in humans and has been widely used to explore the pathogenesis of Salmonella. Based on your comment, we have now performed additional experiments to confirm several key points of our findings in another typical Salmonella serovar, Salmonella enterica serovar Typhi, which is a human-limited serovar and the cause of typhoid fever in humans (PLoS Pathog. 2012, 8(10):e1002933).

      We constructed the panD mutant strain (ΔpanD) in the S. Typhi strain Ty2 and subsequently compared the replication of ΔpanD with that of the Ty2 wild-type strain in the human monocyte-like cell line THP-1 (ATCC TIB-22) using gentamicin protection assays. The results showed that the replication of ΔpanD in THP-1 cells was reduced by 2.6-fold at 20 h post-infection compared with the Ty2 wild-type strain (P < 0.01) (Figure 2_figure Supplement 3), suggesting that panD also facilitates S. Typhi replication in human macrophages and may be involved in the systemic infection of S. Typhi in humans. This result has been included in the revised manuscript (lines 203-210).

      Based on these results, we speculate that PanD may serve as a potential target for treating Salmonella infection.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 28. Latin phrases like de novo should be italicized.

      Thank you for your careful review. We have revised the manuscript thoroughly (Lines 28, 65, 77, 106, 171, 173, 214, 1002, 1023, 1078).

      (2) Line 45. 'survival' typo.

      We have corrected it in the revised manuscript (Line 45).

      (3) Line 57. What evidence or prior work supports the SCV of macrophages in a nutrient-poor environment? Citation needed.

      The relevant reference has now been added (lines 62-63).

      (4) Lines 65-68. If an 'increasing number of studies have focused' on this topic, please cite them here.

      The relevant reference has now been added (lines 72-73).

      (5) Lines 69-71. Citations are needed for these claims.

      The relevant reference has now been added (lines 76-77, 79-80).

      (6) Line 76-77. Citation needed for this claim.

      The relevant reference has now been added (lines 84, 86).

      (7) Line 116-122, and Figure 1C, and Figure 1 legend. An important claim in this work is that the amino acid content of the macrophage cytoplasm is different +/- STM infection. The authors need to explain this result more carefully and define their acronyms. What is VIP, Log2 FC, etc.? What do the colors in Figure 1C mean? They are not defined. If possible, it would be more approachable to list these as molar concentrations, weight/cell, or number of molecules/cell. The authors should calculate an effect size for each of these data to help assess if the differences are meaningful. Without this information, and a clearer explanation of what these data are, it is difficult to evaluate the authors' claim that "8 [amino acids] showed significant differences in abundance."

      Thank you for the comment. The full names of VIP (Variable Importance in the Projection) and FC (fold change) have been included in the revised manuscript. In Figure 1C of the original manuscript, pink represents the content of amino acids that increased following Salmonella infection, whereas blue signifies the content of amino acids that decreased after Salmonella infection.

      Based on your suggestion, we have revised Figure 1C (now Figure 1C, D in the revised manuscript), and the content of amino acids is now expressed as weight per cell (ng/10⁷ cells). The legend has been updated accordingly (lines 9931-997).
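      To make the quantities in this exchange concrete, the sketch below computes a log2 fold change (log2 of the ratio of group means) and one common effect size, Cohen's d with a pooled sample standard deviation, from replicate amino acid contents. This is an illustration of the metrics the reviewer asked about, not the analysis performed in the manuscript; the replicate values are hypothetical, and VIP scores, which come from a multivariate projection model, are not reproduced here.

```python
import math
from statistics import mean, variance

def log2_fold_change(infected, uninfected):
    """log2 of the ratio of group means (infected / uninfected)."""
    return math.log2(mean(infected) / mean(uninfected))

def cohens_d(infected, uninfected):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(infected), len(uninfected)
    pooled_var = ((n1 - 1) * variance(infected) + (n2 - 1) * variance(uninfected)) / (n1 + n2 - 2)
    return (mean(infected) - mean(uninfected)) / math.sqrt(pooled_var)

# Hypothetical replicate contents in ng/10^7 cells, for illustration only.
infected = [2.0, 3.0, 4.0]
uninfected = [1.0, 2.0, 3.0]
print(round(log2_fold_change(infected, uninfected), 3))  # 0.585
print(cohens_d(infected, uninfected))  # 1.0
```

      A fold change alone says nothing about replicate spread, which is why an effect size such as Cohen's d is a useful companion when judging whether a difference is meaningful.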

      (8) Line 134-138. Additional controls are required for this experiment. By adding a nutrient (B-Ala) you have increased the nutrient availability and growth potential of the bacteria. This may not relate to anything special to B-Ala. Perhaps the addition of another amino acid, or sugar, would have a similar impact. Further, this result would be more compelling if the authors demonstrated a dose-dependent effect of B-Ala addition.

      Thank you for the comment. To further confirm that host-derived β-alanine can promote intracellular Salmonella replication, we have added varying concentrations of β-alanine (0.5, 1, 2, and 4 mM) to the culture medium (RPMI) of RAW264.7 cells. Subsequently, we infected these cells with Salmonella to assess the impact of β-alanine supplementation on the bacterium's replication within macrophages. Our observations indicate that the addition of 1, 2, and 4 mM β-alanine significantly (P < 0.001) enhanced Salmonella replication in RAW264.7 cells. Furthermore, the increase in Salmonella intracellular replication was dose-dependent, as illustrated in the revised Figure 1E. These findings suggest that host-derived β-alanine facilitates Salmonella replication inside macrophages. We have included these results in the revised manuscript (lines 141-149).

      (9) Lines 181-184, and Figure 2E. In addition to the fold-change replication data, here and elsewhere the authors should provide raw CFU counts for data transparency.

      Thank you for bringing this to our attention. In this work, we have used “fold intracellular replication (20 h intracellular bacterial CFU/2 h intracellular bacterial CFU)” to illustrate the differences in intracellular replication of different Salmonella strains in macrophages. The term “fold intracellular replication” is commonly employed in recently published reports (e.g., FEMS Microbiol Lett. 2024, 9;371:fnae067; mBio. 2024, 15(7):e0112824; Front Microbiol. 2024, 14:1340143). To ensure data transparency, we have included the raw CFU counts in the source data file.
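      The metric defined above is a ratio of two raw counts, so reporting both counts shows whether a strain difference arises at uptake (2 h) or during replication. A minimal sketch, with hypothetical CFU values that are not data from the study:

```python
def fold_intracellular_replication(cfu_20h, cfu_2h):
    """Fold intracellular replication = CFU recovered at 20 h / CFU recovered at 2 h."""
    if cfu_2h <= 0:
        raise ValueError("2 h CFU must be positive")
    return cfu_20h / cfu_2h

# Hypothetical raw counts (20 h, 2 h) for two strains, for illustration only.
raw_cfu = {"WT": (4.0e5, 2.0e4), "mutant": (1.0e5, 2.0e4)}
for strain, (cfu_20h, cfu_2h) in raw_cfu.items():
    print(strain, fold_intracellular_replication(cfu_20h, cfu_2h))
# WT 20.0
# mutant 5.0
```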

      (10) Line 197. Why employ i.p. injection of STM? As a non-typhoidal serovar, STM infection is enteric, and so i.p. injection seems very artificial if the goal is to understand the role B-Ala synthesis in disease.

      Thank you for the comment. Salmonella can induce gastroenteritis or systemic infection, which are associated with its capacity to invade intestinal epithelial cells and to replicate within macrophages, respectively. In this study, using gentamicin protection assays and immunofluorescence analysis, we demonstrated that β-alanine is crucial for Salmonella replication inside macrophages. Since replication in macrophages is a key determinant of systemic Salmonella infection, we hypothesized that β-alanine also affects systemic Salmonella infection in vivo. Intraperitoneal (i.p.) injection enables Salmonella to disseminate directly to systemic sites via the lymphatic system and bloodstream, bypassing the need for intestinal invasion (Microbiol Res. 2023, 275:127460; Int Immunopharmacol. 2016, 31:233-8). Thus, we conducted the mouse infection assays via i.p. injection to ascertain whether β-alanine affects systemic Salmonella infection. We have included this description in the revised manuscript to enhance clarity (lines 217-221).

      Whether β-alanine influences Salmonella invasion of intestinal epithelial cells and intestinal colonization has not been investigated in this work; this issue will be explored in our future studies.

      (11) Line 207-214 and Figure 3. If the hypothesis is that B-Ala mediates STM survival/virulence through enhancing metabolism in the SCV and intracellular niche, why did the authors not investigate/enumerate STM in this niche in their in vivo studies?

      Thank you for the comment. Through immunofluorescence staining, we have investigated the bacterial count of Salmonella wild-type (WT), panD mutant (ΔpanD), and complemented strain (cpanD) within the macrophages of the mouse liver. The findings indicated that the number of ΔpanD bacteria in each liver macrophage was significantly (P < 0.0001) lower than that of WT, and complementation of ΔpanD restored the bacterial count in each liver macrophage to the WT level (refer to Figure 3E in the revised manuscript). These results have been included in the revised manuscript (lines 234-239).

      (12) Figure 4B - the down genes label is cut off.

      Thank you for your careful review. We have corrected it in the revised Figure 4B.

      (13) Line 260-265. SPI-2 needs to be defined and introduced, as do other terms here, to make the work approachable to non-STM specialists.

      The introduction of SPI-2 has been added to the revised manuscript. (Lines 290-292).

      (14) Line 300-301. Additional experiments are needed to support the claim that "data indicate that β-alanine promotes in vivo virulence of Salmonella, partially by increasing the expression of zinc transporter genes." Gene up- or down-regulation does not necessarily have any meaningful impact on function or activity. The authors here need an assay that confirms that the function of znuA is disrupted, such as examining the cell Zn2+ content in vivo at different levels of B-Ala exposure and/or panD activity. Moreover, more Zn2+ is not necessarily beneficial for STM, at levels too high zinc can exert cell toxicity. So, the authors have a correlation but no data supporting this mechanism explains their observations of virulence and infection. How much Zn2+ is ideal for STM growth?

      Thank you for the comment. It is challenging to isolate sufficient amounts of Salmonella from infected cells or tissues and then measure the zinc concentration in the bacteria, and we have attempted to do so without success. Therefore, we investigated the zinc content in mouse liver and RAW264.7 cells infected with Salmonella Typhimurium 14028s wild-type (WT) and panD mutant (ΔpanD), which can indirectly reflect zinc acquisition by intracellular Salmonella. We observed that the zinc content in ΔpanD-infected mouse liver macrophages and RAW264.7 cells was increased compared with that in WT-infected mouse liver macrophages and RAW264.7 cells, respectively (Figures 5E and 6A). This implies that the panD gene and β-alanine are important for Salmonella to absorb zinc from host cells. This information has been added to the revised manuscript (lines 325-329, 344-348).

      Zinc is essential for bacterial survival and growth, as zinc-binding proteins constitute approximately 5% of the bacterial proteome and play crucial roles in bacterial metabolism and growth (J Proteome Res. 2006, 5(11):3173-8; Future Med Chem. 2017, 9(9):899-910). Regarding Salmonella, zinc is also employed to undermine the antimicrobial host defense mechanisms of macrophages, by inhibiting NF-κB activation and impairing NF-κB-dependent bacterial clearance (J Biol Chem. 2018, 293(39):15316-15329; Infect Immun. 2017, 85(12):e00418-17). Thus, the efficient acquisition of zinc may play a crucial role in the survival and replication of Salmonella within macrophages, where zinc availability is extremely limited (Infect Immun. 2007, 75(12):5867-76; Biochim Biophys Acta. 2016, 1860(3):534-41). It has been reported that Salmonella utilizes the high-affinity ZnuABC zinc transporter to maximize zinc availability within host cells (Infect Immun. 2007, 75(12):5867-76). Here, we discovered that β-alanine can enhance the expression of the zinc transporter genes znuABC, which might serve as a supplementary mechanism for the efficient uptake of zinc by Salmonella within macrophages.

      You are correct that more zinc is not necessarily beneficial for Salmonella, as excessive zinc can inhibit the growth of Salmonella. Considering that zinc availability is limited within macrophages and the znuABC genes are significantly upregulated when Salmonella resides inside macrophages (PLoS Pathog. 2015, 11(11):e1005262; Science. 2018, 362(6419):1156-1160), it is likely that zinc acts as a limiting factor and does not reach very high concentrations during Salmonella's growth within macrophages. We have included a discussion of this matter in the revised manuscript (lines 459-466).

      (15) Figure 6B. Related to the above, these data would be more compelling with higher n and a dose-dependent response demonstrated for Zn2+ addition. This is a central point of the manuscript, and effectively what the authors propose as the underlying mechanism, and it should be more robustly substantiated.

      Thank you for the comment. As stated in the previous response, we were unable to directly assess the bacterial zinc concentration during Salmonella growth within macrophages. Instead, we investigated the zinc content in mouse liver and RAW264.7 cells infected with Salmonella Typhimurium 14028s wild-type (WT) and panD mutant (ΔpanD), which can indirectly reflect zinc acquisition by intracellular Salmonella. We observed that the zinc content in ΔpanD-infected mouse liver macrophages and RAW264.7 cells was increased compared with that in WT-infected mouse liver macrophages and RAW264.7 cells, respectively (Figures 5E and 6A). This implies that the panD gene and β-alanine are important for Salmonella to absorb zinc from host cells. Moreover, considering that zinc availability is limited within macrophages and the znuABC genes are significantly upregulated when Salmonella resides inside macrophages (PLoS Pathog. 2015, 11(11):e1005262; Science. 2018, 362(6419):1156-1160), it is likely that zinc acts as a limiting factor and does not reach very high concentrations during Salmonella's growth within macrophages.

      Reviewer #2 (Public review):

      Summary:

      Salmonella exploits host- and bacteria-derived β-alanine to efficiently replicate in host macrophages and cause systemic disease. β-alanine executes this by increasing the expression of zinc transporter genes and therefore the uptake of zinc by intracellular Salmonella.

      Strengths:

      The experiments designed are thorough and the claims made are directly related to the outcome of the experiments. No overreaching claims were made.

      Weaknesses:

      A little deeper insight was expected, particularly towards the mechanistic aspects. For example, zinc transport was found to be the cause of the b-alanine-mediated effect on Salmonella intracellular replication. It would have been very interesting to see which are the governing factors that may get activated or inhibited due to Zn accumulation that supports such intracellular replication.

      We appreciate your review and advice. To further investigate the mechanisms by which β-alanine, panD, and zinc influence Salmonella infection, we have conducted additional experiments as suggested. For instance, we examined the zinc content in mouse liver and RAW264.7 cells infected with Salmonella Typhimurium 14028s wild-type (WT) and panD mutant (ΔpanD). This approach indirectly reflects zinc acquisition by intracellular Salmonella, as it is challenging to isolate sufficient amounts of the bacteria from infected cells or tissues for zinc concentration measurement. We observed that the zinc content in ΔpanD-infected mouse liver macrophages and RAW264.7 cells was increased compared to that in WT-infected counterparts (Figures 5E and 6A). This suggests that the panD gene and β-alanine are crucial for Salmonella to absorb zinc from host cells. This new information has been included in the revised manuscript (lines 325-329, 344-348).

      Zinc is essential for bacterial survival and growth, as zinc-binding proteins constitute approximately 5% of the bacterial proteome and play crucial roles in bacterial metabolism and growth (J Proteome Res. 2006, 5(11):3173-8; Future Med Chem. 2017, 9(9):899-910). Regarding Salmonella, zinc is also employed to undermine the antimicrobial host defense mechanisms of macrophages, by inhibiting NF-κB activation and impairing NF-κB-dependent bacterial clearance (J Biol Chem. 2018, 293(39):15316-15329; Infect Immun. 2017, 85(12):e00418-17). Thus, efficient zinc uptake could be crucial for Salmonella survival and replication within macrophages, where zinc availability is extremely limited (Infect Immun. 2007, 75(12):5867-76; Biochim Biophys Acta. 2016, 1860(3):534-41). It has been reported that Salmonella exploits the high-affinity ZnuABC zinc transporter to maximize zinc availability in host cells (Infect Immun. 2007, 75(12):5867-76). Here, we discovered that β-alanine can enhance the expression of the zinc transporter genes znuABC, which might serve as a supplementary mechanism for the efficient uptake of zinc by Salmonella within macrophages. We have addressed this issue in the revised manuscript (lines 459-466).

      Reviewer #2 (Recommendations for the authors):

      A few general clarifications and suggested experiments:

      (1) Metabolome analysis: Salmonella can itself produce b-alanine. Given that it is isolated from infected cells where salmonella has scavenged b-alanine from host cytosol as well as produced it, how b-alanine levels went down in metabolome analysis is confusing.

      Thank you for the comment. Targeted metabolic profiling was conducted as outlined in a recently published paper by our group (Nat Commun. 2021, 12(1):879). To prevent delays and changes in metabolite concentrations during the separation of bacterial contents from macrophages, we determined the combined metabolite concentrations directly from infected cells and Salmonella. We observed that each Salmonella cell contained only 0.01%-0.02% of the concentration of each corresponding combined metabolite. Approximately 94% of the infected macrophages contained no more than ten bacteria at 8 hours post-infection, confirming that the combined metabolites were predominantly from the host. We have included an explanation of this issue in the methods section (lines 557-560).
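      The reasoning in this response can be written as a simple upper bound. Assuming (our reading, not stated explicitly) that the 0.01%-0.02% figure is the share of each measured metabolite contributed by a single bacterium, a macrophage carrying at most ten bacteria has a bacterial contribution of at most:

```latex
f_{\mathrm{bacterial}} \le n_{\max} \cdot f_{\mathrm{per\ bacterium}} = 10 \times 0.02\,\% = 0.2\,\%
```

      This bound applies to the roughly 94% of infected macrophages with no more than ten bacteria; the remaining cells could carry a larger bacterial share.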

      (2) What is the basal level of b-alanine produced by macrophages? How was 1 mM conc. chosen?

      According to our results, the content of β-alanine in uninfected RAW264.7 cells is 26-33 μM/10⁷ cells (700-900 ng/10⁷ cells). The 1 mM concentration was chosen based on a published report (Appl Microbiol Biotechnol. 2004, 65(5):576-82).
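      For orientation, a per-cell mass can be converted to an amount of substance by dividing by the molar mass of β-alanine (C3H7NO2, 89.09 g/mol), since ng divided by g/mol gives nmol. Expressing the amount as a concentration (μM) additionally requires the volume it is dissolved in, which is not stated here, so the sketch below stops at nmol per 10⁷ cells.

```python
# Molar mass of beta-alanine (C3H7NO2), g/mol.
BETA_ALANINE_MW = 89.09

def ng_to_nmol(mass_ng, molar_mass_g_per_mol=BETA_ALANINE_MW):
    """Convert a mass in ng to an amount in nmol (ng / (g/mol) = nmol)."""
    return mass_ng / molar_mass_g_per_mol

for mass in (700, 900):
    print(mass, "ng ->", round(ng_to_nmol(mass), 2), "nmol per 10^7 cells")
# 700 ng -> 7.86 nmol per 10^7 cells
# 900 ng -> 10.1 nmol per 10^7 cells
```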

      Additionally, we have supplemented the culture medium (RPMI) of RAW264.7 cells with 0.5, 1, 2, and 4 mM β-alanine and subsequently infected the cells with Salmonella to assess the impact of β-alanine supplementation on the bacterium's replication within macrophages. Our observations revealed that supplementation with 1, 2, and 4 mM β-alanine significantly (P < 0.001) enhanced Salmonella replication in RAW264.7 cells. Furthermore, the addition of β-alanine to the infected cells resulted in a dose-dependent increase in Salmonella intracellular replication, as depicted in Figure 1E. These findings further support the notion that host-derived β-alanine facilitates Salmonella replication within macrophages. These data have been incorporated into the revised manuscript (lines 141-149).

      (3) The antimicrobial activity of macrophages preventing the growth of intracellular Salmonella will primarily be governed by genes such as GBPs, defensins, nitric oxide, etc. The expression of these genes should be tested rather than cytokines which are secreted with little effect on intracellular Salmonella.

      Thank you for the suggestion. We have investigated the levels of ROS (reactive oxygen species) and RNS (reactive nitrogen species) in Salmonella-infected RAW264.7 cells, both in the presence and absence of 1 mM β-alanine. The results indicated that β-alanine did not affect the ROS and RNS levels in RAW264.7 cells (Figure 1_figure Supplement 1), suggesting that β-alanine does not influence the antimicrobial activity of macrophages. We have included these results in the revised manuscript (lines 150-153).

      (4) For animal experiments, how many times was the experiment repeated? Can the animal experiment be done with b-alanine supplementation and panD mutant? Can the liver be stained to detect the bacteria?

      Thank you for the comment.

      i) Mouse infection assays were conducted twice, with at least 2 mice (n ≥ 2) in each injection group. The combined data from the two experiments were used for statistical analysis. This information has been added to the revised manuscript (lines 678-681).

      ii) As suggested, mice infected with the panD mutant (ΔpanD) were administered β-alanine orally on a daily basis (500 mg/kg/day; Behav Brain Res. 2014, 272:131-40; Physiol Behav. 2015, 145:29-37). On the third day post-infection, the bacterial burden in the liver and spleen and the body weight of the infected mice were measured. The results indicated that administering β-alanine to mice affected neither the bacterial burden of ΔpanD in the liver and spleen nor the body weight of the infected mice (please refer to Author response image 1 below). It has been reported that β-alanine is a rate-limiting precursor for the biosynthesis of carnosine in mammals (Med Sci Sports Exerc. 2010, 42(6):1162-73; Neurochem Int. 2010, 57(3):177-88). Following supplementation, β-alanine may be rapidly converted into carnosine in mice, so that free β-alanine, particularly the fraction that enters the macrophages of the liver and spleen, may be limited and insufficient to enhance Salmonella replication.

      Author response image 1.

      iii) Through immunofluorescence staining, we have investigated the bacterial count of Salmonella wild-type (WT), panD mutant (ΔpanD), and complemented strain (cpanD) within the macrophages of the mouse liver. The findings indicate that the number of ΔpanD bacteria in each liver macrophage was significantly (P < 0.0001) lower than that of WT, and complementation of ΔpanD restored the bacterial count in each liver macrophage to the WT level (Figure 3E in the revised manuscript). These results have been included in the revised manuscript (lines 234-239).

      Reviewer #3 (Public review):

      Summary:

      Salmonella is interesting because it lives within a compact compartment, which in the Salmonella field we call the SCV, or Salmonella-containing vacuole. The SCV is a tight-fitting vacuole in which the acquisition of nutrients is a key factor for Salmonella. Among many nutrients, the authors focused on beta-alanine. It is also known from many other studies that Salmonella requires beta-alanine. The authors performed in vitro RAW macrophage infection assays and in vivo mouse infection assays to examine the life of Salmonella in the presence of beta-alanine. They concluded that beta-alanine modulates the expression of many genes, including zinc transporters, which are required for pathogenesis.

      Strengths:

      This study made a couple of knockouts in Salmonella and did a transcriptomic investigation to understand the global gene expression pattern.

      Weaknesses:

      The following questions are unanswered:

      (1) It is not clear how the exogenous beta-alanine is taken up by macrophages.

      We thank the reviewer for the question. It has been reported that β-alanine is transported into eukaryotic cells via the TauT (SLC6A6) and PAT1 (SLC36A1) transporters (Acta Physiol (Oxf). 2015, 213(1):191-212; Am J Physiol Cell Physiol. 2020, 318(4):C777-C786; Biochim Biophys Acta. 1994, 1194(1):44-52).

      (2) It is not clear how the Beta-alanine from the cytosol of the macrophage enters the SCV.

      According to published reports, translocation of SPI2 effector proteins induces the formation of specific tubular membrane compartments that extend from the SCV, known as Salmonella-induced filaments (SIFs) (Traffic. 2001, 2(9):643-53; Traffic. 2007, 8(3):212-25; Traffic. 2008, 9(12):2100-16; Microbiology (Reading). 2012, 158(Pt 5):1147-1161). The membranes and lumens of SIFs and SCVs form a continuous network, allowing vacuolar Salmonella to access various types of endocytosed materials (Front Cell Infect Microbiol. 2021, 11:624650; Cell Host Microbe. 2017, 21(3):390-402). We hypothesize that β-alanine may enter SCVs from the cytoplasm of macrophages via SIFs. This information has been included in the revised manuscript (lines 56-61).

      (3) It is not clear how the beta-alanine from SCV enters the bacterial cytosol.

      Thank you for the question. We have attempted to identify the transporter of β-alanine in Salmonella, but we found that the CycA transporter, which transports β-alanine in Escherichia coli, does not function in the same manner in Salmonella, despite Salmonella being closely related to E. coli.

      BasC is a bacterial LAT (L-Amino acid transporter) with an APC fold (J Gen Physiol. 2019, 151(4):505-517). The basC gene is reported to be present in the genomes of Pseudomonas, Acinetobacter, and Aeromonas, etc. Following your suggestion, we searched the genome of Salmonella Typhimurium at NCBI and did not find any basC gene or genes with a sequence similar to basC. Unfortunately, we have yet to identify the β-alanine transporter in Salmonella, and we will persist in our search in future work.

      (4) There is no clarity on the utilization of exogenous beta-alanine of the host and the de novo synthesis of beta-alanine by panD of Salmonella.

      Thank you for the comment. Our findings indicated that β-alanine levels were reduced in Salmonella-infected RAW264.7 cells. Furthermore, the addition of β-alanine to the culture medium (RPMI) of RAW264.7 cells significantly enhanced Salmonella replication, suggesting that the intracellular Salmonella utilize host-derived β-alanine for their growth. However, to date, we have not identified the transporter responsible for the uptake of exogenous β-alanine into the Salmonella cytosol.

      Moreover, we have discovered that the replication of the Salmonella panD mutant within macrophages and its virulence in mice are significantly reduced compared to the wild type (WT), indicating that the de novo synthesis of β-alanine is crucial for Salmonella's intracellular replication and virulence.

      These results indicate that either acquisition from the host or de novo synthesis of β-alanine is critical for Salmonella replication inside macrophages.

      Reviewer #3 (Recommendations for the authors):

      (1) Cite this paper from 1985, which talks about the role of beta-alanine in Salmonella infection: West TP, Traut TW, Shanley MS, O'Donovan GA. A Salmonella typhimurium strain defective in uracil catabolism and beta-alanine synthesis. J Gen Microbiol. 1985 May;131(5):1083-90. doi: 10.1099/00221287-131-5-1083.

      We have now cited this paper in the revised manuscript (lines 82-83).

      (2) BasC can be important for beta-alanine transport. The CycA transporter was not found to be involved in beta-alanine transport. However, it is important to find out which transporter is required for the uptake of beta-alanine.

      Thank you for pointing this out. We agree that it is important to determine which transporter is necessary for the uptake of β-alanine in Salmonella. BasC is a bacterial LAT (L-amino acid transporter) with an APC fold (J Gen Physiol. 2019, 151(4):505-517). The basC gene has been reported in the genomes of Pseudomonas, Acinetobacter, Aeromonas, and other genera. Following your suggestion, we searched the genome of Salmonella Typhimurium at NCBI and found neither a basC gene nor any gene with a similar sequence. Unfortunately, we have yet to identify the β-alanine transporter in Salmonella, and we will continue the search in future work.

      (3) Because bacteria are quite stringent with their energy resources, it is unlikely that they will use de novo synthesis if host resources are available. Only if the host resources are depleted would they turn on de novo synthesis involving panD. What is the fold replication of the panD mutant when beta-alanine is added exogenously?

      Thank you for the comment. The addition of 1 to 4 mM β-alanine increased the replication of the panD mutant (ΔpanD) in RAW264.7 cells by 1.7- to 3.1-fold. This increase in Salmonella intracellular replication was dose-dependent, as shown in Figure 2H of the revised manuscript, further illustrating that host-derived β-alanine promotes Salmonella replication inside macrophages.

      We agree that bacteria are quite stringent with their energy resources. The results of this work indicate that either acquisition from the host or de novo synthesis of β-alanine is critical for Salmonella replication inside macrophages. We speculate that Salmonella relies on a large amount of β-alanine to replicate efficiently in macrophages, which highlights the importance of β-alanine for Salmonella intracellular growth. We have discussed this issue in the revised manuscript (lines 392-396).

      (4) The 100% survival of animals infected with the panD mutant is a bit of a concern. What happens when beta-alanine is fed to mice infected with the panD mutant?

      Thank you for the comment. As suggested, mice infected with the panD mutant (ΔpanD) were administered β-alanine orally once daily (500 mg/kg/day, as reported in Behav Brain Res. 2014, 272:131-40; Physiol Behav. 2015, 145:29-37). On the third day post-infection, the bacterial load in the liver and spleen and the body weight of the infected mice were measured. Administering β-alanine affected neither the bacterial load of ΔpanD in the liver and spleen nor the body weight of the infected mice (refer to Author response image 1). β-alanine has been reported to be the rate-limiting precursor for carnosine biosynthesis in mammals (Med Sci Sports Exerc. 2010, 42(6):1162-73; Neurochem Int. 2010, 57(3):177-88). Following supplementation, β-alanine may therefore be rapidly converted into carnosine in mice, so that free β-alanine, particularly the fraction that reaches macrophages in the liver and spleen, may be too limited to enhance Salmonella replication.

      (5) How does beta-alanine from the macrophage cytosol enter the SCV?

      Thank you for pointing this out. According to published reports, the translocation of SPI2 effectors triggers the formation of specialized tubular membrane compartments, known as Salmonella-induced filaments (SIFs), which extend from the SCV (Traffic. 2001, 2(9):643-53; Traffic. 2007, 8(3):212-25; Traffic. 2008, 9(12):2100-16; Microbiology. 2012, 158:1147-1161). The membranes and lumens of SIFs and SCVs create a continuous network, allowing vacuolar Salmonella to access various types of endocytosed materials (Front Cell Infect Microbiol. 2021, 11:624650; Cell Host Microbe. 2017, 21(3):390-402). Consequently, it is plausible that β-alanine enters SCVs from the macrophage cytoplasm via SIFs. This information has been included in the revised manuscript (lines 56-61).

      (6) It would be essential to dissect the role of exogenous beta-alanine and the use of de novo synthesized beta-alanine.

      We agree that it is essential to dissect the role of exogenous β-alanine and the use of de novo synthesized β-alanine. Our results indicate that Salmonella-infected macrophages exhibited lower levels of β-alanine than mock-infected macrophages. Furthermore, β-alanine supplementation in the cell medium enhanced Salmonella replication within macrophages in a dose-dependent manner, revealing that Salmonella utilizes host-derived β-alanine to promote intracellular replication. Additionally, a deficiency in β-alanine biosynthesis, resulting from mutation of the rate-limiting gene panD, led to reduced Salmonella replication in macrophages and reduced systemic infection in mice. This suggests that Salmonella also employs bacterially derived β-alanine to enhance intracellular replication and pathogenicity.

      We sought to identify the main transporters responsible for β-alanine uptake in Salmonella. Unfortunately, we have not yet found the transporter. We will address this issue in our future work.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study investigated the factors related to understudied genes in biomedical research. It showed that understudied genes are largely abandoned at the writing stage, and it identified a number of biological and experimental factors that influence which genes are selected for investigation. The study is a valuable contribution to this branch of meta-research, and while the evidence in support of the findings is solid, the interpretation and presentation of the results (especially the figures) needs to be improved.

      We thank the editor and reviewers for their detailed and thoughtful assessment of our work. Below, we present detailed responses to reviewers’ comments and suggestions. We are also submitting a version edited for clarity of presentation and precision of interpretation.

      Following the eLife assessment, we also tried to identify further statements where results could be presented in a more precise way.

      First, in the section “Subsequent reception by other scientists does not penalize studies on understudied genes”, we now state “This result again opposes the hypothesis that less-investigated genes will yield articles with lower impact.”

      Second, in the section “Identification of biological and experimental factors associated with selection of highlighted genes”, we now state:

      “We cautiously hypothesize that this might reflect on many different research groups producing reagents surrounding the genes that they actively study. The most informative continuous factor is the number of research articles about a gene (Figure 1B).”, removing claims of causality.

      Finally, for improved readability, we have moved all supplemental tables into separate .xlsx files.

      Reviewer #1 (Public Review):

      Summary and strengths

      The authors tried to address why only a subset of genes is highlighted in many publications. Is it because these highlighted genes are more important than others? Or is it because there are non-genetic reasons? This is a critical question because in the effort to discover new genes for drug targets and clinical benefit, we need to expand the pool of genes for deep analyses. So I appreciate the authors' efforts in this study, as it is timely and important. They also provided a framework called FMUG (short for Find My Understudied Gene) to evaluate genes for a number of features for subsequent analyses.

      We thank the reviewer for their insightful comments and are pleased that the reviewer shares our appreciation for the gravity of these questions. As the reviewer emphasizes, it is critical to understand whether the choice of genes reflects their importance or non-genetic reasons. Previously we and others demonstrated that this choice does not reflect biological importance, when the latter is assessed through unbiased genome-wide data (e.g.: Haynes et al., 2018; Stoeger et al. 2018). Now we contribute to this critical question by systematically evaluating individual non-genetic reasons. We address the reviewer’s comments below.

      Weaknesses

      Many of the figures are hard to comprehend, and the figure legends do not sufficiently explain them.

      For example, what was plotted in Fig 1b? The number of articles increased from results -> write-ups -> follow-ups in all four categories with different degrees. But it does not seem to match what the authors meant to deliver.

      We apologize for the lack of clarity. We identified two interrelated elements that we have now fixed: i) for each genomics approach, the prior figure legend gave a number of articles, such as “GWAS (n=450 articles)”; ii) the prior y-axis was labelled “Number of articles”.

      Addressing the first element, we have rephrased the legend for clarity:

      “b, We identified articles reporting on genome-wide CRISPR screens (CRISPR, 15 focus articles and 18 citing articles), transcriptomics (T-omics, 148 focus articles and 1,678 citing articles), affinity purification–mass spectrometry (AP-MS, 296 focus articles and 1,320 citing articles), and GWAS (450 focus articles and 3,524 citing articles). Focusing only on protein-coding genes (white box plot), we retrieved data uploaded to repositories describing which genes came up as “hits” in each experiment (first colored box plot). We then retrieved the hits mentioned in the titles and abstracts of those articles (second colored box plot) and hits mentioned in the titles and abstracts of articles citing those articles (third colored box plot). Unique hit genes are only counted once.”

      The number of genes in each box plot is now reported in the x-axis labels for each step. For example, the results for CRISPR were obtained from 15 focus studies (original research) and 18 subsequent studies (papers citing focus articles). Those 15 studies identified 9,268 genes where loss-of-function changed phenotypes but, in their titles and abstracts, mentioned only 18 of those 9,268 genes. While the 9,268 hit genes have received research attention similar to that of the entirety of protein-coding genes, the 18 hit genes mentioned in the title or abstract are significantly better studied. Likewise, the articles citing the focus articles mentioned only 19 hit genes in their titles and abstracts, and these were highly studied.

      Addressing the second element, we updated the axis label to “Number of articles about gene”, to distinguish it from the number of articles mentioned in the legend and to convey that this is the number of articles about each gene that were published independently of the genomics assays we inspect. To further underscore this point, we now label the “20% highest-studied genes” that we mention in the main text, and we reworded the figure caption to better capture where the critical increase occurs: “A shift in focus towards well-studied genes occurs during the summarization and write-up of results and remains in subsequent studies.”

      Fig 4 is also confusing. It appears that the genes were clustered by many features that the authors developed. But does it have any relationship with genes being under- or over-studied?

      We again apologize for the lack of clarity. As described in the main text, while the results of Figs. 1-2 suggest that gene popularity may predict whether a differentially expressed gene is highlighted in the title or abstract, we wanted to conduct a systematic analysis of the factors that correlate with such a decision. We thus built a set of 45 factors that have been discussed as explanations for why some genes receive increased research attention.

      The data in Fig. 4 shows that those 45 factors are not independent but that some are highly correlated. Because of those correlations, we are able to select a smaller number as representative of the full set. Those are the default factors shown to users of FMUG. While users can choose all factors that are significantly correlated with the highlighting in title or abstract, the default of presenting factors representing different clusters of factors enabled us to limit the number of factors that are initially displayed.

      Please note that following the suggestion of Reviewer 3, we have now moved this Figure to the supplemental material, as Figure S11.

      Reviewer #2 (Public Review)

      Summary and strengths

      In this manuscript the authors analyse the trajectory of understudied genes (UGs) from experiment to publication and study the reasons for why UGs remain underrepresented in the scientific literature. They show that UGs are not underrepresented in experimental datasets, but in the titles and abstracts of the manuscripts reporting experimental data as well as subsequent studies referring to those large-scale studies. They also develop an app that allows researchers to find UGs and their annotation state. Overall, this is a timely article that makes an important contribution to the field. It could help to boost the future investigation of understudied genes, a fundamental challenge in the life sciences. It is concise and overall well-written, and I very much enjoyed reading it. However, there are a few points that I think the authors should address.

      We thank the reviewer for their kind assessment.

      Weaknesses

      The authors conclude that many UGs "are lost" from genome-wide assays at the manuscript writing stage. If I understand correctly, this is based on gene names not being reported in the title or abstract of these manuscripts. However, for genome-wide experiments, it would be quite difficult for authors to mention large numbers of understudied genes in the abstract. In contrast, one might highlight the expected behaviour of a well-studied protein simply to show that the genome-wide study provides credible results.

      We agree that it is not reasonable to expect a title or abstract to highlight hundreds or even thousands of differentially expressed genes. We’ve now extended our Study Limitations section to address this:

      “we take a gene being mentioned in the title or abstract of an article as a proxy for a gene receiving attention by the article’s authors. The title and abstract are space-limited and thus cannot accommodate discussion of large numbers of genes.”

      We also agree that highlighting the expected behavior of a well-studied protein may lend credibility to a study and increase confidence in other results. The soundness of such a strategy was quantitatively examined by Uzzi et al. (Science 2013), which we now include in the section on study limitations as:

      “authors beginning manuscripts with something familiar before introducing something new”.

      To convey the practical limitation of abstracts needing to be concise, we added the following sentence to our discussion section, when suggesting controlled trials that add genes to abstracts:

      “This intervention would need to be carefully designed since abstracts are limited in their size.”

      To avoid over-interpretation we have in the discussion also extended the sentence on “lost in a leaky pipeline” to “lost to titles and abstracts of research articles in a leaky pipeline”.

      Our focus on titles and abstracts has been equally motivated by their availability (full text still is often behind paywalls and/or not accessible for bulk-download and text-mining) and by abstracts being the most visible and most read parts of research articles (e.g.: bioRxiv estimates that for the preprint for the present manuscript, the abstract was read ~10 times more frequently than full-text HTML and 4 times more frequently than the pdf).

      Could this bias the authors' conclusions and, if so, how could this be addressed? For example, would it be worth normalising studies based on the total number of genes they cover?

      We previously described that, in line with the reviewer’s expectations, unstudied genes are preferentially added to the title or abstract of articles that feature more genes in the title or abstract (Stoeger et al., Plos Biology, 2022; Fig. 2B). Normalizing by the total number of genes should thus preserve the pronounced division between well-studied and unstudied genes shown in Figure 1B. In line with this prediction, when we randomly select one gene per title/abstract, the effect remains (see new Figure S7).
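      The one-gene-per-abstract control can be illustrated with a minimal sketch. It assumes, hypothetically, that annotations are held as a mapping from an article ID to the genes mentioned in its title/abstract; the function name, data layout, and seeding are illustrative assumptions, not the authors' actual pipeline.

```python
import random


def pick_one_gene_per_article(article_genes, seed=0):
    """Hypothetical helper: keep one randomly chosen gene per article.

    article_genes maps an article ID to the genes named in its title/abstract.
    Articles that mention no gene are dropped. Genes are sorted before sampling
    so the result is reproducible for a given seed even if sets are passed in.
    """
    rng = random.Random(seed)
    picked = {}
    for article_id, genes in article_genes.items():
        if genes:  # skip articles without any gene mention
            picked[article_id] = rng.choice(sorted(genes))
    return picked
```

      Each article then contributes exactly one gene, so articles listing many genes no longer dominate the distribution of gene popularity.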

      Author response image 1.

      Figure 1B is confusing in its present form. I think the plot and/or the legend need revising. For example, what "numbers to the right of each box plot" are the authors referring to? Also, I assume that the filled boxes are understudied genes and the empty/white box is "all genes", but that's not explained in the legend. In the main text, the figure is referred to with the sentence "we found that hit genes that are highlighted in the title or abstract are strongly over-represented among the 20% highest-studied genes in all biomedical literature ". I cannot follow how the figure shows this. My interpretation is that the y-axis is not showing the number of articles, but represents the percentage of articles mentioning a gene in the title/abstract, displayed on a log scale. If so, perhaps better axis labels and legend text could be sufficient. But then one would also need to somehow connect this to the statement in the main text about the 20% highest-studied genes (a dashed line?). Alternatively, the authors could consider other ways of plotting these data, e.g. simply plotting the "% of publications in which a gene appears" from 0-100% or so.

      Reviewer 1 raised a similar point on overall figure clarity. We identified two interrelated elements that contribute to overall confusion and have now fixed them (see response to Reviewer 1 beginning on page 2 of this document).

      We attempted an alternative plotting of Fig 1B according to the reviewer’s suggestion. In the version below, the y-axis instead shows the percent of gene-related articles that are about each gene. We chose to keep the original y-axis (showing number of articles about each gene) as it additionally conveys the absolute scale of scholarship on individual genes.

      Author response image 2.

      Reviewer #3 (Public Review):

      Summary and strengths

      The manuscript investigated the factors related to understudied genes in biomedical research. It showed that understudied genes are largely abandoned at the writing stage and identified biological and experimental factors associated with selection of highlighted genes.

      It is very important for the research community to recognize the systematic bias in research of human genes and take precautions when designing experiments and interpreting results. The authors have tried to profile this issue comprehensively and promoted more awareness and investigation of understudied genes.

      We thank the reviewer for their kind assessment of our work.

      Weaknesses

      Regarding result section 1 "Understudied genes are abandoned at synthesis/writing stage", the figures are not clear and do not convey the messages written in the main text. For example, in Figure 1B and Figures S5 and S6:

      • There is no "numbers to the right of each box plot".

      The “numbers to the right” statement in the caption was an erroneous inclusion from an earlier version of the figure. We apologize for our error and have now removed this statement.

      • Do these box plots only show understudied genes? How many genes are there in each box plot? The definition and numbers of understudied genes are not clear.

      The x-axis describes genes featured in each stage of the publication process (from all protein-coding genes to genes found as hits in genome-wide screen to genes found in the title/abstract to genes found in the title/abstract of citing articles) and the y-axis describes the number of articles annotated to those genes. We have also now added the number of genes in each box plot to the figure. This information is also in Materials and Methods under each technology’s heading (see also response to Reviewer 1 beginning on page 2 of this document).

      Author response image 3.

      • "We found that hit genes that are highlighted in the title or abstract are strongly over-represented among the 20% highest-studied genes in all biomedical literature (Figure 1B)". This is not clear from the figure.

      We have revised Figure 1B and its caption to better communicate the main point of the figure: that genes which make it to the title/abstract of the reporting article tend to be more popular than genes which are hits in genome-wide experiments from those articles. We have added a horizontal line that shows the cutoff for the top 20% most popular genes.

      Regarding result section 2 "Subsequent reception by other scientists does not penalize studies on understudied genes", the authors showed in figure 2 that there is a negative correlation between articles per gene before 2015 and median citations to articles published in 2015. Another explanation could be that for popular genes, there are more low-quality articles that didn't get citations, not necessarily that less popular genes attract more citations.

      We believe that the two explanations for the observed phenomenon are not mutually exclusive. Previously, we focused on the median of citations to articles about a gene to capture the typical effect. In a new analysis, we also find support for the possibility outlined by the reviewer and believe that adding this to our manuscript complements and balances our analysis of citations. Specifically, in the new Figure S8B we find that articles on the most popular genes are slightly more likely to be among the least cited papers (and in Figure S8A that articles on the least studied genes have been much more likely to be among the most cited papers). In the text, we state:

      “Further, since 1990, articles about the least popular genes have at times been 3 to 4 times more likely to be among the most cited articles than articles on the most popular genes, whereas articles on the most popular genes have been slightly less likely to be highly cited than lowly cited (Figure S8)”.

      We thank the reviewer for their suggestion, which strengthens our manuscript. The figure caption reads:

      “Figure S8: Likelihoods of being highly cited (top 5% of citations among all articles about genes, panel a) or lowly cited (bottom 5% of citations among all articles about genes, panel b) for articles about the most popular genes (top 5% accumulated articles) versus articles about the least popular genes (bottom 5% accumulated articles) by year of publication. Only articles with a single gene in the title/abstract are considered. Shaded regions show ±1 standard error of the proportion."
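      For readers reproducing the shaded bands, a short formula sketch follows. It assumes the usual binomial (Wald) standard error of a proportion was used; the caption does not name the estimator, and the symbols below are introduced here for illustration. For a given year and gene-popularity group, let $n$ be the number of single-gene articles and $k$ the number of them falling in the top (or bottom) 5% of citations:

```latex
\hat{p} = \frac{k}{n}, \qquad
\mathrm{SE}(\hat{p}) = \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}, \qquad
\text{band} = \left[\hat{p} - \mathrm{SE}(\hat{p}),\; \hat{p} + \mathrm{SE}(\hat{p})\right]
```

      Because $n$ is small for the least popular genes in early years, their bands are expected to be wider, which matters when comparing the "3 to 4 times more likely" ratios across years.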

      Author response image 4.

      Regarding result section 3 "Identification of biological and experimental factors associated with selection of highlighted genes", in Figure 3 and Table S2, the authors stated that "hits with a compound known to affect gene activity are 5.114 times as likely to be mentioned in the title/abstract in an article using transcriptomics". The number 5.144 comes out of nowhere both in the figure and the table. In addition, Figure 4 is not informative enough to be included as a main figure.

      This is the result of both a typo and imprecise terminology. The number should read 4.262 (the likelihood ratio of being mentioned in the title/abstract between genes with and without a compound), which corresponds to an odds ratio of 4.331. We have clarified this in the table caption, stating:

      “e.g. hits with a compound known to affect gene activity are 4.262 times as likely to be mentioned in the title/abstract in an article using transcriptomics, corresponding to an odds ratio of 4.331".
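      The two reported quantities can be related explicitly. A short derivation follows, assuming (as the caption describes) that the likelihood ratio is the ratio of mention probabilities between hits with and without a known compound; the symbols are introduced here for illustration. Let $p_1$ be the probability that a hit with a compound is mentioned in the title/abstract and $p_0$ the probability for a hit without one:

```latex
\mathrm{LR} = \frac{p_1}{p_0}, \qquad
\mathrm{OR} = \frac{p_1 / (1 - p_1)}{p_0 / (1 - p_0)}
            = \mathrm{LR} \cdot \frac{1 - p_0}{1 - p_1}
```

      When mentions in titles/abstracts are rare ($p_0, p_1 \ll 1$), the correction factor $(1 - p_0)/(1 - p_1)$ is close to 1, so OR is only slightly larger than LR. This is consistent with the reported pair of 4.262 and 4.331.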

      We have removed Figure 4 as a main-text figure and added a version, with a revised color scheme following the comments of Reviewer 1, as Figure S11. We added to the figure caption “Bold indicates FMUG’s default factors, which we selected based on this clustering and based on their strength of association with gene selection (Figure 3, Table S2 and Table S3)."

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      • Fig 2a shows that papers highlighting understudied genes are actually cited more. I wonder why authors only looked at data before 2015. Fig 2b shows an increased correlation since 2015. Please consider redrawing Fig 2a to include data from 2015-2020?

      We highlight data from 2015 since, from our used version of iCite (v32, released July 2022, covering citations made through most of 2021), papers published in 2015 have had about 6 years to accumulate citations. With fewer years to accumulate citations, insufficient signal may cause correlation to converge toward zero. Below, we repeat the analysis in Figure 2 but only considering citations made within a year of an article’s publication, which substantially reduces correlation (although remaining significant).

      Author response image 5.

      We added a note to the figure caption:

      “We forgo depicting more recent years than 2015 to allow for citations to accumulate over multiple years, providing a more sensitive and robust readout of long-term impact.”

      For Figure 2B, we add:

      “For more recent years, where articles have had less time to accumulate citations, insufficient signal may cause correlation to converge toward zero.”

      • Can FMUG be posted on the web for easy access by researchers with non-computational backgrounds?

      Regretfully, we presently do not have the resources to create or maintain a web-based version. We hope that the publication of this manuscript will enable us to attract resources to create and maintain a web-based version.

      Reviewer #2 (Recommendations for the authors):

      • Related to the first weakness in my public review: The observed disparity between CRISPR and GWAS studies in terms of which genes they promote to the abstract is interesting. I wonder if this has to do with the application of these techniques. GWAS studies will often highlight that they retrieve known associations between a gene and a phenotype, to show that a screen is working. I guess often the point is to subsequently identify more genes associated with a particular phenotype, but often it is unclear how to validate/verify newly found associations. In contrast, CRISPR screens might be more focussed on functionally/mechanistically understanding unknown processes, e.g. observing a phenotype that appears/disappears in response to a gene deletion. In such studies, the follow-up of a previously unknown gene could be more straightforward and relevant to the outcome. Does that mean CRISPR screens are better than GWAS studies for addressing the UG problem? Perhaps the authors could briefly discuss this issue.

      The number of studies we included featuring CRISPR screens is relatively small (n = 15 compared to n = 450 for GWAS). Thus, it is not possible to conclude in a statistically sound manner whether authors of CRISPR screens are truly more likely to highlight understudied genes.

      However, the reviewer raises compelling reasons for why this might be the case, and we now embed the broader discussion point that some techniques might be more powerful toward understudied genes.

      The discussion now includes:

      “Further, the observed discrepancy between the popularity of hits highlighted by GWAS versus other technologies suggests that some -omics technologies may be more powerful than others for characterizing understudied genes. This possibility merits further research and researchers participating in unknomics should consider the relative strengths of each technology towards providing tractable results for follow-up.”

      • Affinity capture mass spectrometry (Aff-MS): Perhaps I misunderstood this, but typically this is referred to as affinity purification MS (AP-MS)

      Thank you for the clarification. We have changed ‘Aff-MS’ to ‘AP-MS’ throughout the manuscript.

      • Page 3, line 96. The sentence "The first possibility is that seemingly understudied genes are, in fact, not understudied as they would rarely be identified through experiments.". Would they not still be understudied, just not intentionally?

      We have rephrased this sentence to:

      “The first possibility is that some genes are less studied because they are rarely identified as hits in experiments.”

      • Fig 4 is very interesting, but I also found it a bit confusing. First, the choice of colour scheme, where blue shows the absence and white shows the presence of something, seems counterintuitive, especially on a white background. Second, I find it confusing that only some of the experiments are labelled in the heatmap. Could the authors not simply use Fig S9 as Fig 4? Or alternatively, only include the 8 labelled factors in the simplified figure.

      In line with this feedback and that of Reviewers #1 and #3, we have removed Figure 4 as a main-text figure and instead include it as Supplementary Figure S11. We have reversed the color scheme so that purple indicates one and white indicates zero. We also now label all factors; previously we had only labelled the default factors of FMUG. Finally, we updated the figure legend to convey how the clustering assisted the choice of default factors in FMUG. It reads:

      “Bold indicates FMUG’s default factors, which we selected based on this clustering and based on their strength of association with gene selection (Figure 3, Table S2 and Table S3)”.

      • The FMUG app is fantastic and sounds exactly like something that is required to boost the visibility of understudied genes and overcome the understudied gene bias. However, I did not understand the choice of reporting this in the Discussion section.

      We thank the reviewer for their enthusiasm, and have now moved FMUG into the results section.

      • To further increase usability of the FMUG app, is there a way it could be deployed online? I appreciate this could require a major amount of coding work, which would not be reasonable to demand. So please consider this a suggestion, potentially for a future implementation.

      Regretfully, we presently do not have the resources to create or maintain a web-based version. We hope that the publication of this manuscript will enable us to attract resources to create and maintain a web-based version.

      Reviewer #3 (Recommendations for the authors):

      Tables S2 and S3: p values are indicated by star signs. However, with so many hypothesis tests, the p values should be corrected for multiple testing.

      We have now applied Benjamini-Hochberg multiple hypothesis correction to these tables, correcting p-values within each of the four technologies. We update our significance calling to read:

      “We identified 45 factors that relate to genes and found 33 (12 out of 23 binary factors and 21 out of 22 continuous factors) associated with selection in at least one assay type at Benjamini-Hochberg FDR < 0.001.”
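      To make the correction procedure concrete, here is a minimal sketch of the standard Benjamini-Hochberg step-up adjustment applied separately within each assay technology, as described above. The function names and grouping layout are hypothetical illustrations, not the authors' code; in practice a library routine (e.g. the `fdr_bh` method of statsmodels' `multipletests`) would typically be used.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def adjust_within_groups(pvals_by_group):
    """Apply BH separately within each group (e.g. each of the four technologies)."""
    return {group: benjamini_hochberg(p) for group, p in pvals_by_group.items()}


FDR_THRESHOLD = 0.001  # significance cutoff quoted in the text
```

      A factor would then be called significant for a technology when its adjusted value is below `FDR_THRESHOLD`. Correcting within each technology, rather than pooling all tests, keeps the number of tests per family equal to the number of factors.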

      Figures S1-S4

      These figures contain too many non-informative boxes. In all the figures, only the last three boxes are informative (reports assessed for eligibility, reports excluded, and studies included in review). The remaining boxes convey little information and should be simplified.

      We have simplified these diagrams, removing boxes which contained no information.

      Figure S6: what does it mean by "prior to the publication of the first article represented in this sample"? What is "this sample"?

      “This sample” refers to the collection of 450 GWAS articles, 296 articles using AP-MS, 148 transcriptomics articles, and 15 genome-wide CRISPR screen articles. We have rephrased this sentence to make this clear. It now reads:

      “Variant of Figure 1B only considering articles published in 2002 or before, prior to the publication of any of the articles featuring -omics experiments which we considered for this analysis.”

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife Assessment

      This neuroimaging and electrophysiology study in a small cohort of congenital cataract patients with sight recovery aims to characterize the effects of early visual deprivation on excitatory and inhibitory balance in visual cortex. While contrasting sight-recovery with visually intact controls suggested the existence of persistent alterations in Glx/GABA ratio and aperiodic EEG signals, it provided only incomplete evidence supporting claims about the effects of early deprivation itself. The reported data were considered valuable, given the rare study population. However, the small sample sizes, lack of a specific control cohort and multiple methodological limitations will likely restrict usefulness to scientists working in this particular subfield.

      We thank the reviewing editors for their consideration and updated assessment of our manuscript after its first revision.

      To assess the effects of early deprivation, we included an age-matched, normally sighted control group recruited from the same community and measured in the same scanner and laboratory. This study design is analogous to numerous studies in permanently congenitally blind humans, which typically recruited sighted controls but hardly ever individuals with a different visual history, e.g. late blindness. To improve the specificity of our conclusions, we measured a frontal cortex voxel in addition to a visual cortex voxel (MRS). Analogously, we separately analyzed occipital and frontal electrodes (EEG).

      Moreover, we relate our findings in congenital cataract reversal individuals to findings in the literature on permanent congenital blindness. Note that there are, to the best of our knowledge, neither MRS nor resting-state EEG studies in individuals with permanent late blindness.

      Our participants necessarily have nystagmus and low visual acuity due to their congenital deprivation phase; indeed, the presence of nystagmus is a recruitment criterion used to diagnose congenital cataracts.

      It might be interesting for future studies to investigate individuals with transient late blindness. However, such a study would be ill-motivated had we not found differences between the most “extreme” condition of congenital visual deprivation and normally sighted individuals (analogous to why earlier research on permanent blindness investigated permanently congenitally blind humans first, rather than permanently late blind humans, or both in the same study). Any results of such future work would need to be referenced to our study, and no results in these additional groups would invalidate our findings.

      Since all our congenital cataract reversal individuals by definition had visual impairments, we included an eyes closed condition, both in the MRS and EEG assessment. Any group effect during the eyes closed condition cannot be due to visual acuity deficits changing the bottom-up driven visual activation.

      As we detail in our response to Reviewer 3, our EEG analyses followed the standards in the field.

      Public Reviews:

      Reviewer 1 (Public review):

      Summary

      In this human neuroimaging and electrophysiology study, the authors aimed to characterise the effects of visual deprivation during the sensitive period on the balance of excitation and inhibition in the visual cortex. They attempted to do so by comparing neurochemistry in two conditions ('eyes open', 'eyes closed'), as well as resting-state and visually evoked EEG activity, between ten congenital cataract patients with recovered sight (CC) and ten age-matched control participants (SC) with normal sight.

      First, they used magnetic resonance spectroscopy to measure in vivo neurochemistry from two locations: the primary location of interest in the visual cortex, and a control location in the frontal cortex. Such a control voxel tests the spatial specificity of any effects, because the single-voxel MRS method samples only a single location. Using MR-visible proxies of excitatory and inhibitory neurotransmission, Glx and GABA+ respectively, the authors report no group effects in GABA+ or Glx, and no difference between the functional conditions 'eyes closed' and 'eyes open'. They found an effect of group in the Glx/GABA+ ratio and no similar effect in the control voxel location. They then performed multiple exploratory correlations between MRS measures and visual acuity, and report a weak positive correlation between the 'eyes open' condition and visual acuity in CC participants.

      The same participants then took part in an EEG experiment. The authors selected two electrodes over the visual cortex for analysis and report a group difference in an EEG index of neural activity, the aperiodic intercept, as well as in the aperiodic slope, considered a proxy for cortical inhibition. Control electrodes in the frontal region did not show the same pattern. They report an exploratory correlation between the aperiodic intercept and Glx in one out of three EEG conditions.

      The authors report the difference in E/I ratio, and interpret the lower E/I ratio as representing an adaptation to visual deprivation, which would have initially caused a higher E/I ratio. Although intriguing, the evidence in support of this view is not strong. Amongst the limitations are the low sample size, the lack of a critical control cohort (for example, CC patients without recovered sight) that could provide evidence for a higher E/I ratio, and lower data quality in the control voxel. Nevertheless, the study provides a rare and valuable insight into experience-dependent plasticity in the human brain.

      Strengths of study

      How sensitive period experience shapes the developing brain is an enduring and important question in neuroscience. This question has been particularly difficult to investigate in humans. The authors recruited a small number of sight-recovered participants with bilateral congenital cataracts to investigate the effect of sensitive period deprivation on the balance of excitation and inhibition in the visual brain using measures of brain chemistry and brain electrophysiology. The research is novel, and the paper was interesting and well written.

      Limitations

      Low sample size. Ten for CC and ten for SC, and a further two SC participants were rejected due to lack of frontal control voxel data. The sample size limits the statistical power of the dataset and increases the likelihood of effect inflation.

      In the updated manuscript, the authors have provided justification for their sample size by pointing to prior studies and the inherent difficulties in recruiting individuals with bilateral congenital cataracts. Importantly, this highlights the value the study brings to the field while also acknowledging the need to replicate the effects in a larger cohort.

      Lack of a specific control cohort. The control cohort has normal vision. The control cohort is not specific enough to distinguish between people with sight loss due to different causes and patients with congenital cataracts with co-morbidities. Further data from more specific populations, such as patients whose cataracts have not been removed, patients with developmental cataracts, or congenitally blind participants, would greatly improve the interpretability of the main finding. The lack of a more specific control cohort is a major caveat that limits a conclusive interpretation of the results.

      In the updated version, the authors have indicated that future studies can pursue comparisons between congenital cataract participants and cohorts with later sight loss.

      MRS data quality differences. Data quality in the control voxel appears worse than in the visual cortex voxel. The frontal cortex MRS spectrum shows a far broader linewidth than the visual cortex spectrum (Supplementary Figures). Compared to the visual voxel, the frontal cortex voxel has less defined Glx and GABA+ peaks, lower GABA+ and Glx concentrations, lower NAA SNR values, and lower NAA concentrations. If the data quality is substantially worse in the FC, then small effects may not be detectable.

      In the updated version, the authors have added more information that informs the reader of the MRS quality differences between voxel locations. This increases the transparency of their reporting and enhances the assessment of the results.

      Because of the direction of the difference in E/I, the authors interpret their findings as representing signatures of sight improvement after surgery without further evidence, either within the study or from the literature. However, the literature suggests that plasticity and visual deprivation drive the E/I index up rather than down. Decreasing GABA+ is thought to facilitate experience-dependent remodelling. What evidence is there that cortical inhibition increases in response to a visual cortex that is over-sensitised due to congenital cataracts? Without further experimental or literature support, this interpretation remains very speculative.

      The updated manuscript contains key references from non-human work to justify their interpretation.

      Heterogeneity in patient group. Congenital cataract (CC) patients experienced a variety of duration of visual impairment and were of different ages. They presented with co-morbidities (absorbed lens, strabismus, nystagmus). Strabismus has been associated with abnormalities in GABAergic inhibition in the visual cortex. The possible interactions with residual vision and confounds of co-morbidities are not experimentally controlled for in the correlations, and not discussed.

      The updated document has addressed this caveat.

      Multiple exploratory correlations were performed to relate MRS measures to visual acuity (shown in Supplementary Materials), and only specific ones are shown in the main document. The authors describe the analysis as exploratory in the 'Methods' section. Furthermore, the correlation between visual acuity and the E/I metric is weak and not corrected for multiple comparisons. The results should be presented as preliminary, as no strong conclusions can be drawn from them. They can provide a hypothesis to test in a future study.

      This has now been done throughout the document and increases the transparency of the reporting.

      P.16: Given the correlation of the aperiodic intercept with age ("Age negatively correlated with the aperiodic intercept across CC and SC individuals, that is, a flattening of the intercept was observed with age"), age needs to be controlled for in the correlation between neurochemistry and the aperiodic intercept. Glx has also been shown to correlate negatively with age.

      This caveat has been addressed in the revised manuscript.

      Multiple exploratory correlations were performed to relate MRS to EEG measures (shown in Supplementary Materials), and only specific ones are shown in the main document. Given the multiple measures from the MRS, the correlations with the EEG measures were exploratory, as stated in the text (p.16) and in Fig. 4, yet the introduction said that there was a prior hypothesis: "We further hypothesized that neurotransmitter changes would relate to changes in the slope and intercept of the EEG aperiodic activity in the same subjects." It would be great if the text could be revised for consistency and the analysis described as exploratory.

      This has been done throughout the document and increases the transparency of the reporting.

      The analysis for the EEG needs to take more advantage of the available data. As far as I understand, only two electrodes were used, yet far more were available as seen in their previous study (Ossandon et al., 2023). The spatial specificity is not established. The authors could use the frontal cortex electrode (FP1, FP2) signals as a control for spatial specificity in the group effects, or even better, all available electrodes and correct for multiple comparisons. Furthermore, they could use the aperiodic intercept vs Glx in SC to evaluate the specificity of the correlation to CC.

      This caveat has been addressed. The authors have added frontal electrodes to their analysis, providing an essential regional control for the visual cortex location.

      Comments on the latest version:

      The authors have made reasonable adjustments to their manuscript that addressed most of my comments by adding further justification for their methodology, essential literature support, pointing out exploratory analyses, limitations and adding key control analyses. Their revised manuscript has overall improved, providing valuable information, though the evidence that supports their claims is still incomplete.

      We thank the reviewer for suggesting ways to improve our manuscript and carefully reassessing our revised manuscript.

      Reviewer 2 (Public review):

      Summary:

      The study examined 10 congenitally blind patients who recovered vision through the surgical removal of bilateral dense cataracts, measuring neural activity and neurochemical profiles from the visual cortex. The declared aim is to test whether restoring visual function after years of complete blindness impacts excitation/inhibition balance in the visual cortex.

      Strengths:

      The findings are undoubtedly useful for the community, as they contribute towards characterising the many ways in which this special population differs from normally sighted individuals. The combination of MRS and EEG measures is a promising strategy to estimate a fundamental physiological parameter - the balance between excitation and inhibition in the visual cortex, which animal studies show to be heavily dependent upon early visual experience. Thus, the reported results pave the way for further studies, which may use a similar approach to evaluate more patients and control groups.

      Weaknesses:

      The main methodological limitation is the lack of an appropriate comparison group or condition to delineate the effect of sight recovery (as opposed to the effect of congenital blindness). A few previous studies suggested that the excitation/inhibition (E/I) ratio in the visual cortex is increased in congenitally blind patients; the present study reports that the E/I ratio decreases instead. The authors claim that this implies a change of E/I ratio following sight recovery. However, supporting this claim would require showing a shift of E/I after vs. before the sight-recovery surgery, or at least it would require comparing patients who did and did not undergo the sight-recovery surgery (as is common in the field).

      We thank the reviewer for suggesting ways to improve our manuscript and carefully reassessing our revised manuscript.

      Since we have not been able to acquire longitudinal data with the experimental design of the present study in congenital cataract reversal individuals, we compared the MRS and EEG results of congenital cataract reversal individuals to published work in permanently congenitally blind individuals. We consider this a resource-saving approach. We think that the results of our cross-sectional study now justify the costs and enormous effort (and time for the patients, who often have to travel long distances) associated with longitudinal studies in this rare population.

      There are also more technical limitations related to the correlation analyses, which are partly acknowledged in the manuscript. A bland correlation between GLX/GABA and the visual impairment is reported, but this is specific to the patients group (N=10) and would not hold across groups (the correlation is positive, predicting the lowest GLX/GABA ratio values for the sighted controls - opposite of what is found). There is also a strong correlation between GLX concentrations and the EEG power at the lowest temporal frequencies. Although this relation is intriguing, it only holds for a very specific combination of parameters (of the many tested): only with eyes open, only in the patients group.

      Given the exploratory nature of the correlations, we do not base the majority of our conclusions on this analysis. There are no doubts that the reported correlations need replication; however, replication is only possible after a first report. Thus, we hope to motivate corresponding analyses in further studies.

      Note that in the present study, significance tests for correlations were corrected for multiple comparisons, and that some findings replicate earlier reports (e.g. effects on EEG aperiodic slope, alpha power, and correlations with chronological age).

      Conclusions:

      The main claim of the study is that sight recovery impacts the excitation/inhibition balance in the visual cortex, estimated with MRS or through indirect EEG indices. However, due to the weaknesses outlined above, the study cannot distinguish the effects of sight recovery from those of visual deprivation. Moreover, many aspects of the results are interesting but their validation and interpretation require additional experimental work.

      We interpret the group differences between individuals tested years after congenital visual deprivation and normally sighted individuals as supportive of the E/I ratio being impacted by congenital visual deprivation. In the absence of a sensitive period for the development of the E/I ratio, individuals with a transient phase of congenital blindness might have developed a visual system indistinguishable from that of normally sighted individuals. As we demonstrate, this is not so. Comparing the results of congenital cataract reversal individuals with those of permanently congenitally blind humans (from previous studies) allowed us to identify changes in the E/I ratio, which add to those found for congenital blindness.

      We thank the reviewer for the helpful comments and suggestions related to the first submission and first revision of our manuscript. We are keen to translate some of them into future studies.

      Reviewer 3 (Public review):

      This manuscript examines the impact of congenital visual deprivation on the excitatory/inhibitory (E/I) ratio in the visual cortex using Magnetic Resonance Spectroscopy (MRS) and electroencephalography (EEG) in individuals whose sight was restored. Ten individuals with reversed congenital cataracts were compared to age-matched, normally sighted controls, assessing cortical E/I balance, the interrelationship of its measures, and their relationship to visual acuity. The study reveals that the Glx/GABA ratio in the visual cortex and the intercept and aperiodic signal are significantly altered in those with a history of early visual deprivation, suggesting persistent neurophysiological changes despite visual restoration.

      First of all, I would like to disclose that I am not an expert in congenital visual deprivation, nor in MRS. My expertise is in EEG (particularly in the decomposition of periodic and aperiodic activity) and statistical methods.

      Although the authors addressed some of the concerns of the previous version, major concerns and flaws remain in terms of methodological and statistical approaches along with the (over)interpretation of the results. Specific concerns include:

      (1) 3.1 Response to Variability in Visual Deprivation

      Rather than listing the advantages and disadvantages of visual deprivation, I recommend providing at least a descriptive analysis of how the duration of visual deprivation influenced the measures of interest. This would enhance the depth and relevance of the discussion.

      Although Reviewers 2 and 3 (see below) pointed out problems in interpreting multiple correlational analyses in small samples, we addressed this request by reporting correlations between visual deprivation history and the measured EEG/MRS outcomes.

      Calculating the correlation between duration of visual deprivation and behavioral or brain measures is, in fact, a common suggestion. However, the existence of sensitive periods, which are typically assumed not to follow a gradual, linear decline of neuroplasticity, does not necessarily allow one to predict a correlation with duration of blindness. Daphne Maurer has additionally worked on the concept of “sleeper effects” (Maurer et al., 2007), that is, effects of early deprivation on the brain and behavior which are observed only later in life, when the function or neural circuits mature.

      In accordance with this reasoning, we did not observe a significant correlation between duration of visual deprivation and any of our dependent variables.

      (2) 3.2 Small Sample Size

      The issue of small sample size remains problematic. The justification that previous studies employed similar sample sizes does not adequately address the limitation in the current study. I strongly suggest that the correlation analyses should not feature prominently in the main manuscript or the abstract, especially if the discussion does not substantially rely on these correlations. Please also revisit the recommendations made in the section on statistical concerns.

      In the revised manuscript, we explicitly mention that our sample size is not atypical for the special group investigated, but that a replication of our results in larger samples would foster their impact. We only explicitly mention correlations that survived stringent testing for multiple comparisons in the main manuscript.

      Given the exploratory nature of the correlations, we have not based the majority of our claims on this analysis.

      (3) 3.3 Statistical Concerns

      While I appreciate the effort of conducting an independent statistical check, it merely validates whether the reported statistical parameters, degrees of freedom (df), and p-values are consistent. However, this does not address the appropriateness of the chosen statistical methods.

      We did not intend for the statcheck report to justify the methods used for statistics, which we have done in a separate section with normality and homogeneity testing (Supplementary Material S9), and references to it in the descriptions of the statistical analyses (Methods, Page 13, Lines 326-329 and Page 15, Lines 400-402).

      Several points require clarification or improvement:

      (4) Correlation Methods: The manuscript does not specify whether the reported correlation analyses are based on Pearson or Spearman correlation.

      The depicted correlations are Pearson correlations. We will add this information to the Methods.

      (5) Confidence Intervals: Include confidence intervals for correlations to represent the uncertainty associated with these estimates.

      We will add the confidence intervals to the second revision of our manuscript.

      (6) Permutation Statistics: Given the small sample size, I recommend using permutation statistics, as these are exact tests and more appropriate for small datasets.

      Our study focuses on a rare population, with a sample size limited by the availability of participants. Our findings provide exploratory insights rather than strong inferential claims. To this end, we have ensured that our analysis adheres to key statistical assumptions (Shapiro-Wilk and Levene’s tests, Supplementary Material S9), and we have reported our findings with effect sizes and with appropriate caution and context.

      (7) Adjusted P-Values: Ensure that reported Bonferroni corrected p-values (e.g., p > 0.999) are clearly labeled as adjusted p-values where applicable.

      In the revised manuscript, we will change Figure 4 to read ‘adjusted p’, which is indeed what we reported.
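      The label matters because a Bonferroni-adjusted p-value is the raw p-value multiplied by the number of tests and capped at 1, so a value such as p > 0.999 is only plausible as an adjusted p. A minimal sketch, with hypothetical function names and an assumed reporting threshold of 0.999:

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def format_adjusted(p_adj):
    """Label the value explicitly as adjusted when reporting it."""
    if p_adj > 0.999:
        return "adjusted p > 0.999"
    return f"adjusted p = {p_adj:.3f}"
```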

      (8) Figure 2C

      Figure 2C still lacks crucial information that the correlation between Glx/GABA ratio and visual acuity was computed solely in the control group (as described in the rebuttal letter). Why was this analysis restricted to the control group? Please provide a rationale.

      Figure 2C depicts the correlation between Glx/GABA+ ratio and visual acuity in the congenital cataract reversal group, not the control group. This is mentioned in the Figure 2 legend, as well as in the main text where the figure is referred to (Page 18, Line 475).

      The correlation analyses between visual acuity and MRS/EEG measures were performed only in the congenital cataract reversal group, since the sighted control group comprised individuals with vision in the normal range; thus, such an analysis would not be meaningful. Table 1, which lists the individual visual acuities for all participants, including the normally sighted controls, shows the low variance in the latter group.

      For variables for which no a priori group differences in variance were predicted, we performed the correlation analyses across groups (see Supplementary Material S12, S15).

      We will highlight these motivations more clearly in the Methods of the revised manuscript.

      (9) 3.4 Interpretation of Aperiodic Signal

      Relying on previous studies to interpret the aperiodic slope as a proxy for excitation/inhibition (E/I) does not make the interpretation more robust.

      How to interpret aperiodic EEG activity has been the subject of extensive investigation. We cite studies which provide evidence from multiple species (monkeys, humans) and measurement techniques (EEG, MEG, ECoG), including studies which pharmacologically manipulated E/I balance.

      Whether our findings are robust, in fact, requires a replication study. Importantly, we analyzed the intercept of the aperiodic activity fit as well, and discuss results related to the intercept.

      Quote:

      “3.4 Interpretation of aperiodic signal:

      - Several recent papers demonstrated that the aperiodic signal measured in EEG or ECoG is related to various important aspects such as age, skull thickness, electrode impedance, as well as cognition. Thus, currently, very little is known about the underlying effects which influence the aperiodic intercept and slope. The entire interpretation of the aperiodic slope as a proxy for E/I is based on a computational model and simulation (as described in the Gao et al. paper).

      Response: Apart from the modeling work of Gao et al., multiple papers that we have also cited used ECoG, EEG and MEG and showed concomitant changes in aperiodic activity with pharmacological manipulation of the E/I ratio (Colombo et al., 2019; Molina et al., 2020; Muthukumaraswamy & Liley, 2018). Further, several prior studies have interpreted changes in the aperiodic slope as reflective of changes in the E/I ratio, including studies of developmental groups (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Schaworonkow & Voytek, 2021) as well as patient groups (Molina et al., 2020; Ostlund et al., 2021).

      - The authors further wrote: We used the slope of the aperiodic (1/f) component of the EEG spectrum as an estimate of E/I ratio (Gao et al., 2017; Medel et al., 2020; Muthukumaraswamy & Liley, 2018). This is a highly speculative interpretation with very little empirical evidence. These papers were conducted with ECoG data (mostly in animals) and mostly under anesthesia. Thus, these studies allow only an indirect interpretation of what actually influences the 1/f slope in EEG measurements.

      Response: Note that Muthukumaraswamy et al. (2018) used different types of pharmacological manipulations and analyzed periodic and aperiodic MEG activity in humans, in addition to monkey ECoG (Muthukumaraswamy & Liley, 2018). Further, Medel et al. (now published as Medel et al., 2023) analyzed EEG in addition to ECoG data after propofol administration. Our interpretation is in line with a number of recent EEG studies in developing (Hill et al., 2022; Schaworonkow & Voytek, 2021) and special populations. As mentioned above, several prior studies have used the slope of the 1/f component/aperiodic activity as an indirect measure of the E/I ratio (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Molina et al., 2020; Ostlund et al., 2021; Schaworonkow & Voytek, 2021), including studies using scalp-recorded EEG from humans.

      In the introduction of the revised manuscript, we have made more explicit that this metric is indirect (Page 3, Line 91), (additionally see Discussion, Page 24, Lines 644-645, Page 25, Lines 650-657).

      While a full understanding of aperiodic activity is still lacking, some convergent ideas have emerged. We think that our results contribute to this enterprise, since our study is, to the best of our knowledge, the first to assess both MRS-measured neurotransmitter levels and EEG aperiodic activity.”

      (10) Additionally, the authors state:

      "We cannot think of how any of the exploratory correlations between neurophysiological measures and MRS measures could be accounted for by a difference e.g. in skull thickness."

      (11) This could be addressed directly by including skull thickness as a covariate or visualizing it in scatterplots, for instance, by representing skull thickness as the size of the dots.

      We are not aware of any study that would justify such an analysis.

      Our analyses were based on previous findings in the literature.

      Since to the best of our knowledge, no evidence exists that congenital cataracts go together with changes in skull thickness, and that skull thickness might selectively modulate visual cortex Glx/GABA+ but not NAA measures, we decided against following this suggestion.

      Notably, the neurotransmitter concentration reported here is after tissue segmentation of the voxel region. The tissue fraction was shown to not differ between groups in the MRS voxels (Supplementary Material S4). The EEG electrode impedance was lowered to <10 kOhm in every participant (Methods, Page 13, Line 344), and preparation was identical across groups.

      (12) 3.5 Problems with EEG Preprocessing and Analysis

      Downsampling: The decision to downsample the data to 60 Hz "to match the stimulation rate" is problematic. This choice confounds subsequent spectral analyses due to aliasing, as explained by the Nyquist theorem. While the authors cite prior studies (Schwenk et al., 2020; VanRullen & MacDonald, 2012) to justify this decision, these studies focused on alpha (8-12 Hz), where aliasing is less of a concern than when analyzing the aperiodic signal. Furthermore, the current study analyzes the frequency range from 1-20 Hz, which is too narrow for interpreting the aperiodic signal as E/I. Typically, this analysis should include higher frequencies, spanning at least 1-30 Hz or even 1-45 Hz (not 20-40 Hz).

      As mentioned in the Methods (Page 15, Line 376) and in our previous response, the pop_resample function used by EEGLAB applies an anti-aliasing filter at half the resampling frequency (as per the Nyquist theorem; https://eeglab.org/tutorials/05_Preprocess/resampling.html). The upper cutoff of the low-pass filter set by EEGLAB prior to downsampling (30 Hz) is still far above the frequency range of interest in the current study (1-20 Hz), thus allowing us to derive valid results.
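      The role of the anti-aliasing filter can be illustrated with a small, self-contained simulation. It is a generic sketch, not EEGLAB's pop_resample: the 600 Hz input rate, the 101-tap Hamming-windowed sinc filter and its 25 Hz cutoff (chosen so that the transition band ends below the 30 Hz Nyquist frequency of the 60 Hz output) are all assumptions. A 40 Hz tone lies above that Nyquist frequency, so decimating without filtering folds it to 60 - 40 = 20 Hz, whereas low-pass filtering first removes it.

```python
import math


def lowpass_fir(cutoff_hz, fs_hz, numtaps=101):
    """Hamming-windowed sinc low-pass FIR, normalised to unit DC gain."""
    mid = (numtaps - 1) / 2
    fc = cutoff_hz / fs_hz  # cutoff in cycles per sample
    taps = []
    for n in range(numtaps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (numtaps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def convolve_valid(x, h):
    """Direct 'valid' convolution (no zero-padding at the edges)."""
    m = len(h)
    return [sum(h[j] * x[i + m - 1 - j] for j in range(m)) for i in range(len(x) - m + 1)]


def amplitude_at(x, freq_hz, fs_hz):
    """Amplitude of a sinusoid at freq_hz via a single-bin DFT."""
    re = sum(v * math.cos(2 * math.pi * freq_hz * n / fs_hz) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq_hz * n / fs_hz) for n, v in enumerate(x))
    return 2 * math.hypot(re, im) / len(x)


FS_IN, FS_OUT, FACTOR = 600, 60, 10  # hypothetical rates: 600 / 10 = 60 Hz
NUMTAPS = 101
n_samples = 120 * FACTOR + NUMTAPS - 1  # leaves 120 output samples (2 s) after filtering
signal = [math.sin(2 * math.pi * 10 * n / FS_IN) + math.sin(2 * math.pi * 40 * n / FS_IN)
          for n in range(n_samples)]

naive = signal[::FACTOR][:120]  # decimation without an anti-aliasing filter
filtered = convolve_valid(signal, lowpass_fir(25, FS_IN, NUMTAPS))[::FACTOR][:120]

for name, y in (("naive", naive), ("filtered", filtered)):
    print(name, round(amplitude_at(y, 10, FS_OUT), 3), round(amplitude_at(y, 20, FS_OUT), 3))
```

      Running the script prints the 10 Hz and 20 Hz amplitudes for both versions: the 10 Hz tone survives in both, but only the unfiltered decimation shows a spurious 20 Hz component.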

      Quote:

      “- The authors downsampled the data to 60 Hz "to match the stimulation rate". What is the intention of this? The subsequent spectral analyses are confounded by this choice (see Nyquist theorem).

      Response: These data were collected as part of a study designed to evoke alpha activity with visual white noise, which varied in luminance with equal power at all frequencies from 1-60 Hz, restricted by the refresh rate of the monitor on which stimuli were presented (Pant et al., 2023). This paradigm and method were developed by VanRullen and colleagues (Schwenk et al., 2020; VanRullen & MacDonald, 2012); the analysis requires the same sampling rate for the presented frequencies and the EEG data. The downsampling function used here automatically applies an anti-aliasing filter (EEGLAB 2019).”

      Moreover, the resting-state data were not resampled to 60 Hz. We will make this clearer in the Methods of the revised manuscript.

      The consistency of the group differences across all three EEG conditions thus excludes the possibility that they were driven by aliasing artifacts.

      The expected effect of this anti-aliasing filter can be seen in the attached Figure R1 (Author response image 1), which shows an example participant's spectrum in the 1-30 Hz range (as opposed to the 1-20 Hz range plotted in the manuscript) with a clear 30-40 dB drop at 30 Hz. Any aliasing, for example due to residual line noise, would additionally appear as a peak in this figure (as well as in Figure 3).

      Author response image 1.

      Power spectral density of one congenital cataract-reversal (CC) participant in the visual stimulation condition across all channels. The reduced power at 30 Hz shows the effects of the anti-aliasing filter applied by EEGLAB’s pop_resample function.

      As we stated in the manuscript and in previous review rounds, there is so far no consensus on the exact frequency range for measuring aperiodic activity. We made a principled decision to use 1-20 Hz as the fit range for this dataset, based on the literature (showing a knee in aperiodic fits of this dataset at 20 Hz; Medel et al., 2023; Ossandón et al., 2023), data quality (possible contamination by line noise at higher frequencies), and the purpose of the visual stimulation experiment (stimulating up to 60 Hz to probe the lower frequency range, which limited us to quantifying activity below 30 Hz).

      Quote:

      “(3) What's the underlying idea of analyzing two separate aperiodic slopes (20-40 Hz and 1-19 Hz)? It is very unusual to compute the slope between 20-40 Hz, where the SNR is rather low.

      "Ossandón et al. (2023), however, observed that in addition to the flatter slope of the aperiodic power spectrum in the high frequency range (20-40 Hz), the slope of the low frequency range (1-19 Hz) was steeper in both congenital cataract-reversal individuals and permanently congenitally blind humans."

      Response: The present manuscript computed the slope between 1 and 20 Hz. Ossandón et al. (2023) as well as Medel et al. (2023) found a “knee” of the 1/f distribution at 20 Hz and further describe the motivations for computing both slope ranges. For example, Ossandón et al. used a data-driven approach, compared single vs. dual fits, and found that the latter fitted the data better. Additionally, they obtained the best fit with a knee at 20 Hz. We would like to point out that no standard range exists for fitting the 1/f component across the literature and, in fact, very different ranges have been used (Gao et al., 2017; Medel et al., 2023; Muthukumaraswamy & Liley, 2018).”
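      The single- versus dual-range comparison can be illustrated with a minimal Python sketch on a synthetic spectrum. The knee at 20 Hz, the slopes (-2 below and -1 above the knee), the frequency resolution, and the noise level are hypothetical values chosen for illustration. The BIC comparison is one generic way to penalize the extra parameters of the dual fit; it is not necessarily the procedure used by Ossandón et al. (2023).

```python
import numpy as np


def fit_line(log_f, log_p):
    """Least-squares line in log-log space; returns slope, intercept, residual sum of squares."""
    slope, intercept = np.polyfit(log_f, log_p, deg=1)
    residual = log_p - (slope * log_f + intercept)
    return slope, intercept, float(np.sum(residual ** 2))


def bic(rss, n, k):
    """Bayesian information criterion for a Gaussian least-squares fit with k parameters."""
    return n * np.log(rss / n) + k * np.log(n)


freqs = np.arange(1.0, 40.5, 0.5)     # 1-40 Hz in 0.5 Hz steps (hypothetical resolution)
log_f = np.log10(freqs)
knee = 20.0                           # assumed knee frequency (Hz)

# Hypothetical spectrum: slope -2 below the knee, -1 above it, continuous at the knee.
log_p = np.where(
    freqs < knee,
    2.0 - 2.0 * log_f,
    2.0 - 2.0 * np.log10(knee) - 1.0 * (log_f - np.log10(knee)),
)
rng = np.random.default_rng(0)
log_p = log_p + rng.normal(0.0, 0.02, size=freqs.size)   # small measurement noise

# Single fit across 1-40 Hz (2 parameters).
s_single, _, rss_single = fit_line(log_f, log_p)

# Dual fit: separate lines below and above the knee (4 parameters).
low = freqs < knee
s_low, _, rss_low = fit_line(log_f[low], log_p[low])
s_high, _, rss_high = fit_line(log_f[~low], log_p[~low])

n = freqs.size
bic_single = bic(rss_single, n, k=2)
bic_dual = bic(rss_low + rss_high, n, k=4)
print(f"single slope {s_single:.2f}; dual slopes {s_low:.2f} / {s_high:.2f}")
print(f"BIC single {bic_single:.1f}, BIC dual {bic_dual:.1f} (lower is better)")
```

      When the spectrum genuinely bends at the knee, the dual fit recovers both slopes and wins despite its two extra parameters; a single line averages the two regimes and fits neither well.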

      (13) Baseline Removal: Subtracting the mean activity across an epoch as a baseline removal step is inappropriate for resting-state EEG data. This preprocessing step undermines the validity of the analysis. The EEG dataset has fundamental flaws, many of which were pointed out in the previous review round but remain unaddressed. In its current form, the manuscript falls short of standards for robust EEG analysis. If I were reviewing for another journal, I would recommend rejection based on these flaws.

      The baseline removal step from each epoch serves to remove the DC component of the recording and detrend the data. This is a standard preprocessing step (included as an option in preprocessing pipelines recommended by the EEGLAB toolbox, FieldTrip toolbox and MNE toolbox), additionally necessary to improve the efficacy of ICA decomposition (Groppe et al., 2009).

      In the previous review round, a clarification of the baseline timing was requested, which we added. Beyond this request, there was no mention of the appropriateness of the baseline removal, nor a request to explain why it would not undermine the validity of the analysis.

      Quote:

      “- "Subsequently, baseline removal was conducted by subtracting the mean activity across the length of an epoch from every data point." The actual baseline time segment should be specified.

      Response: The time segment was the length of the epoch, that is, 1 second for the resting state conditions and 6.25 seconds for the visual stimulation conditions. This has been explicitly stated in the revised manuscript (Page 13, Line 354).”

      Prior work on event-related potential (ERP) analysis in the time (not frequency) domain has suggested that the baselining step might cause spurious effects (Delorme, 2023; but see Tanner et al., 2016). We did not perform ERP analysis at any stage. One recent study suggests that spurious group differences in the 1/f signal might be driven by an inappropriate decibel (dB) division baselining method (Gyurkovics et al., 2021), which we did not use.

      Any effect of our baselining procedure on the FFT spectrum would be confined to frequencies below 1 Hz, which we did not analyze.

      Each of the preprocessing steps in the manuscript matches pipelines described and published in extensive prior work. We document how multiple aspects of our EEG results replicate prior findings (Supplementary Material S15, S18, S19) reported by other experimenters and groups at other locations, validating that our results are robust.
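      The claim that subtracting the epoch mean leaves the spectrum above 0 Hz untouched can be checked numerically. For an unwindowed DFT over the whole epoch, subtracting a constant changes only the 0 Hz bin, because the complex exponentials of every other bin sum to zero over the epoch. The sketch below assumes a hypothetical 500 Hz sampling rate and random data. With a tapered estimate (for example, a Hann window or Welch's method), the removed offset would also leak into the bins adjacent to 0 Hz.

```python
import numpy as np

fs = 500                                   # hypothetical sampling rate (Hz)
rng = np.random.default_rng(1)
epoch = rng.normal(size=fs) + 35.0         # one 1-s epoch with an arbitrary DC offset

# Baseline removal as described: subtract the mean across the whole epoch.
demeaned = epoch - epoch.mean()

spec_raw = np.fft.rfft(epoch)
spec_demeaned = np.fft.rfft(demeaned)
freqs = np.fft.rfftfreq(fs, d=1.0 / fs)    # 1 Hz resolution for a 1-s epoch

# Bins whose complex Fourier coefficients changed after baseline removal.
changed = ~np.isclose(spec_raw, spec_demeaned)
print("changed bins (Hz):", freqs[changed])
```

      Only the 0 Hz bin is flagged; every coefficient from 1 Hz upward is identical up to floating-point error.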

      We therefore reject the claim of methodological flaws in our EEG analyses in the strongest possible terms.

      Quote:

      “3.5 Problems with EEG preprocessing and analysis:

      - It seems that the authors did not identify bad channels nor address the line noise issue (even a problem if a low pass filter of below-the-line noise was applied).

      Response: As pointed out in the Methods and Figure 1, we analyzed data from only two occipital channels, O1 and O2, neither of which was rejected for any participant. Channel rejection was performed for the larger dataset, published elsewhere (Ossandón et al., 2023; Pant et al., 2023). As control sites, we added the frontal channels Fp1 and Fp2 (see Supplementary Material S14).

      Neither Ossandón et al. (2023) nor Pant et al. (2023) considered frequency ranges above 40 Hz to avoid any possible contamination with line noise. Here, we focused on activity between 0 and 20 Hz, definitely excluding line noise contaminations (Methods, Page 14, Lines 365-367). The low pass filter (FIR, 1-45 Hz) guaranteed that any spill-over effects of line noise would be restricted to frequencies just below the upper cutoff frequency.

      Additionally, a prior version of the analysis used spectrum interpolation to remove line noise; the group differences remained stable (Ossandón et al., 2023). We have reported this analysis in the revised manuscript (Page 14, Lines 364-357).

      Further, both groups were measured in the same lab, making line noise (~50 Hz) highly unlikely to account for the observed group effects in the 1-20 Hz frequency range. Finally, the exploratory MRS-EEG correlations would be hard to explain if the EEG parameters were contaminated with line noise.

      - What percentage of segments had to be rejected due to the 120 μV criterion? This should be reported separately for EO and EC, and for controls and patients.

      Response: The mean percentage of rejected 1-s segments for each resting-state condition, and the percentage of rejected 6.25-s segments in each group for the visual stimulation condition, have been added to the revised manuscript (Supplementary Material S10) and are referred to in the Methods (Page 14, Lines 372-373).

      - The authors downsampled the data to 60 Hz "to match the stimulation rate". What is the intention of this? The subsequent spectral analyses are confounded by this choice (see Nyquist theorem).

      Response: These data were collected as part of a study designed to evoke alpha activity with visual white noise, which changed in luminance with equal power at all frequencies from 1-60 Hz, restricted by the refresh rate of the monitor on which stimuli were presented (Pant et al., 2023). This paradigm and method were developed by VanRullen and colleagues (Schwenk et al., 2020; VanRullen & MacDonald, 2012); the analysis requires the same sampling rate for the presented frequencies and the EEG data. The downsampling function used here automatically applies an anti-aliasing filter (EEGLAB 2019).

      - "Subsequently, baseline removal was conducted by subtracting the mean activity across the length of an epoch from every data point." The actual baseline time segment should be specified.

      Response: The time segment was the length of the epoch, that is, 1 second for the resting state conditions and 6.25 seconds for the visual stimulation conditions. This has now been explicitly stated in the revised manuscript (Page 14, Lines 379-380).

      - "We excluded the alpha range (8-14 Hz) for this fit to avoid biasing the results due to documented differences in alpha activity between CC and SC individuals (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023)." This does not really make sense, as the FOOOF algorithm first fits the 1/f slope, for which the alpha activity is not relevant.

      Response: We did not use the FOOOF algorithm/toolbox in this manuscript. As stated in the Methods, we used a 1/f fit to the 1-20 Hz spectrum in the log-log space, and subtracted this fit from the original spectrum to obtain the corrected spectrum. Given the pronounced difference in alpha power between groups (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023), we were concerned it might drive differences in the exponent values. Our analysis pipeline had been adapted from previous publications of our group and other labs (Ossandón et al., 2023; Voytek et al., 2015; Waschke et al., 2017).

      We have conducted the analysis with and without the exclusion of the alpha range, as well as using the FOOOF toolbox both in the 1-20 Hz and 20-40 Hz ranges (Ossandón et al., 2023). The findings of a steeper slope in the 1-20 Hz range as well as lower alpha power in CC vs SC individuals remained stable. In Ossandón et al., the comparison between the piecewise fits and FOOOF fits led the authors to use the former, as it outperformed the FOOOF algorithm for their data.

      - The model fits of the 1/f fitting for EO, EC, and both participant groups should be reported.

      Response: In Figure 3 of the manuscript, we depicted the mean spectra and 1/f fits for each group.

      In the revised manuscript, we added the fit quality metrics (average R² values > 0.91 for each group and condition) (Methods, Page 15, Lines 395-396; Supplementary Material S11) and additionally show individual subjects' fits (Supplementary Material S11).”
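      The aperiodic fit described in the quoted response (a straight-line fit to the 1-20 Hz spectrum in log-log space with 8-14 Hz excluded, subtraction of that fit to obtain a corrected spectrum, and R² as a fit-quality metric) can be sketched in Python as follows. This is a minimal illustration assuming ordinary least squares via numpy.polyfit on log10-transformed values; it is not the authors' code, and the synthetic spectrum (exponent 1.5 plus a Gaussian alpha peak at 10 Hz) is hypothetical.

```python
import numpy as np


def fit_aperiodic(freqs, psd, fit_range=(1.0, 20.0), exclude=(8.0, 14.0)):
    """Fit log10(power) = intercept + slope * log10(freq) within fit_range, excluding `exclude`.

    Returns slope, intercept, R^2 on the fitted bins, the frequencies of the fit range,
    and the corrected spectrum (log10 power minus the fitted aperiodic line).
    """
    in_range = (freqs >= fit_range[0]) & (freqs <= fit_range[1])
    in_alpha = (freqs >= exclude[0]) & (freqs <= exclude[1])
    mask = in_range & ~in_alpha                       # bins used for the fit

    log_f = np.log10(freqs[mask])
    log_p = np.log10(psd[mask])
    slope, intercept = np.polyfit(log_f, log_p, deg=1)

    predicted = slope * log_f + intercept
    r2 = 1.0 - np.sum((log_p - predicted) ** 2) / np.sum((log_p - log_p.mean()) ** 2)

    # Corrected spectrum over the whole fit range, alpha bins included.
    f_out = freqs[in_range]
    corrected = np.log10(psd[in_range]) - (slope * np.log10(f_out) + intercept)
    return slope, intercept, r2, f_out, corrected


# Hypothetical spectrum: 1/f^1.5 aperiodic component plus a Gaussian alpha peak at 10 Hz.
freqs = np.arange(0.5, 30.5, 0.5)
psd = 100.0 * freqs ** -1.5 + 5.0 * np.exp(-((freqs - 10.0) ** 2) / (2 * 0.5 ** 2))

slope, intercept, r2, f_out, corrected = fit_aperiodic(freqs, psd)
print(f"slope {slope:.3f} (exponent {-slope:.3f}), intercept {intercept:.3f}, R^2 {r2:.4f}")
```

      Because the alpha bins are excluded, the fit recovers a slope close to -1.5 and an intercept close to 2 despite the alpha peak, which then stands out in the corrected spectrum. Whether R² is computed on the fitted bins only (as here) or over the full range is a design choice that affects the reported value.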

      (14) The authors mention:

      "The EEG data sets reported here were part of data published earlier (Ossandón et al., 2023; Pant et al., 2023)." Thus, the statement "The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38) " is a circular argument and should be avoided."

      The authors addressed this comment and adjusted the statement. However, I do not understand why the full sample published earlier (Ossandón et al., 2023) was not used in the current study.

      Recording of EEG resting-state data started in 2013, while MRS testing could only be set up by the end of 2019. Moreover, not all participants who qualify for EEG recording qualify for MRI scanning (e.g., due to MRI safety concerns or claustrophobia).

      References

      Bottari, D., Troje, N. F., Ley, P., Hense, M., Kekunnaya, R., & Röder, B. (2016). Sight restoration after congenital blindness does not reinstate alpha oscillatory activity in humans. Scientific Reports. https://doi.org/10.1038/srep24683

      Colombo, M. A., Napolitani, M., Boly, M., Gosseries, O., Casarotto, S., Rosanova, M., Brichant, J. F., Boveroux, P., Rex, S., Laureys, S., Massimini, M., Chieregato, A., & Sarasso, S. (2019). The spectral exponent of the resting EEG indexes the presence of consciousness during unresponsiveness induced by propofol, xenon, and ketamine. NeuroImage, 189, 631–644. https://doi.org/10.1016/j.neuroimage.2019.01.024

      Delorme, A. (2023). EEG is better left alone. Scientific Reports, 13(1), 2372. https://doi.org/10.1038/s41598-023-27528-0

      Favaro, J., Colombo, M. A., Mikulan, E., Sartori, S., Nosadini, M., Pelizza, M. F., Rosanova, M., Sarasso, S., Massimini, M., & Toldo, I. (2023). The maturation of aperiodic EEG activity across development reveals a progressive differentiation of wakefulness from sleep. NeuroImage, 277. https://doi.org/10.1016/J.NEUROIMAGE.2023.120264

      Gao, R., Peterson, E. J., & Voytek, B. (2017). Inferring synaptic excitation/inhibition balance from field potentials. NeuroImage, 158, 70–78. https://doi.org/10.1016/j.neuroimage.2017.06.078

      Groppe, D. M., Makeig, S., & Kutas, M. (2009). Identifying reliable independent components via split-half comparisons. NeuroImage, 45(4), 1199–1211. https://doi.org/10.1016/j.neuroimage.2008.12.038

      Gyurkovics, M., Clements, G. M., Low, K. A., Fabiani, M., & Gratton, G. (2021). The impact of 1/f activity and baseline correction on the results and interpretation of time-frequency analyses of EEG/MEG data: A cautionary tale. NeuroImage, 237. https://doi.org/10.1016/j.neuroimage.2021.118192

      Hill, A. T., Clark, G. M., Bigelow, F. J., Lum, J. A. G., & Enticott, P. G. (2022). Periodic and aperiodic neural activity displays age-dependent changes across early-to-middle childhood. Developmental Cognitive Neuroscience, 54, 101076. https://doi.org/10.1016/J.DCN.2022.101076

      Maurer, D., Mondloch, C. J., & Lewis, T. L. (2007). Sleeper effects. In Developmental Science. https://doi.org/10.1111/j.1467-7687.2007.00562.x

      McSweeney, M., Morales, S., Valadez, E. A., Buzzell, G. A., Yoder, L., Fifer, W. P., Pini, N., Shuffrey, L. C., Elliott, A. J., Isler, J. R., & Fox, N. A. (2023). Age-related trends in aperiodic EEG activity and alpha oscillations during early- to middle-childhood. NeuroImage, 269, 119925. https://doi.org/10.1016/j.neuroimage.2023.119925

      Medel, V., Irani, M., Crossley, N., Ossandón, T., & Boncompte, G. (2023). Complexity and 1/f slope jointly reflect brain states. Scientific Reports, 13(1), 21700. https://doi.org/10.1038/s41598-023-47316-0

      Molina, J. L., Voytek, B., Thomas, M. L., Joshi, Y. B., Bhakta, S. G., Talledo, J. A., Swerdlow, N. R., & Light, G. A. (2020). Memantine Effects on Electroencephalographic Measures of Putative Excitatory/Inhibitory Balance in Schizophrenia. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 5(6), 562–568. https://doi.org/10.1016/j.bpsc.2020.02.004

      Muthukumaraswamy, S. D., & Liley, D. T. (2018). 1/F electrophysiological spectra in resting and drug-induced states can be explained by the dynamics of multiple oscillatory relaxation processes. NeuroImage, 179, 582–595. https://doi.org/10.1016/j.neuroimage.2018.06.068

      Ossandón, J. P., Stange, L., Gudi-Mindermann, H., Rimmele, J. M., Sourav, S., Bottari, D., Kekunnaya, R., & Röder, B. (2023). The development of oscillatory and aperiodic resting state activity is linked to a sensitive period in humans. NeuroImage, 275, 120171. https://doi.org/10.1016/J.NEUROIMAGE.2023.120171

      Ostlund, B. D., Alperin, B. R., Drew, T., & Karalunas, S. L. (2021). Behavioral and cognitive correlates of the aperiodic (1/f-like) exponent of the EEG power spectrum in adolescents with and without ADHD. Developmental Cognitive Neuroscience, 48, 100931. https://doi.org/10.1016/j.dcn.2021.100931

      Pant, R., Ossandón, J., Stange, L., Shareef, I., Kekunnaya, R., & Röder, B. (2023). Stimulus-evoked and resting-state alpha oscillations show a linked dependence on patterned visual experience for development. NeuroImage: Clinical, 103375. https://doi.org/10.1016/J.NICL.2023.103375

      Schaworonkow, N., & Voytek, B. (2021). Longitudinal changes in aperiodic and periodic activity in electrophysiological recordings in the first seven months of life. Developmental Cognitive Neuroscience, 47. https://doi.org/10.1016/j.dcn.2020.100895

      Schwenk, J. C. B., VanRullen, R., & Bremmer, F. (2020). Dynamics of Visual Perceptual Echoes Following Short-Term Visual Deprivation. Cerebral Cortex Communications, 1(1). https://doi.org/10.1093/TEXCOM/TGAA012

      Tanner, D., Norton, J. J. S., Morgan-Short, K., & Luck, S. J. (2016). On high-pass filter artifacts (they’re real) and baseline correction (it’s a good idea) in ERP/ERMF analysis. Journal of Neuroscience Methods, 266, 166–170. https://doi.org/10.1016/j.jneumeth.2016.01.002

      VanRullen, R., & MacDonald, J. S. P. (2012). Perceptual echoes at 10 Hz in the human brain. Current Biology. https://doi.org/10.1016/j.cub.2012.03.050

      Voytek, B., Kramer, M. A., Case, J., Lepage, K. Q., Tempesta, Z. R., Knight, R. T., & Gazzaley, A. (2015). Age-related changes in 1/f neural electrophysiological noise. Journal of Neuroscience, 35(38). https://doi.org/10.1523/JNEUROSCI.2332-14.2015

      Waschke, L., Wöstmann, M., & Obleser, J. (2017). States and traits of neural irregularity in the age-varying human brain. Scientific Reports, 7(1), 1–12. https://doi.org/10.1038/s41598-017-17766-4


      The following is the authors’ response to the original reviews.

      eLife Assessment

      This potentially useful study involves neuro-imaging and electrophysiology in a small cohort of congenital cataract patients after sight recovery and age-matched control participants with normal sight. It aims to characterize the effects of early visual deprivation on excitatory and inhibitory balance in the visual cortex. While the findings are taken to suggest the existence of persistent alterations in Glx/GABA ratio and aperiodic EEG signals, the evidence supporting these claims is incomplete. Specifically, small sample sizes, lack of a specific control cohort, and other methodological limitations will likely restrict the usefulness of the work, with relevance limited to scientists working in this particular subfield.

      As pointed out in the public reviews, there are very few human models which allow for assessing the role of early experience on neural circuit development. While the prevalent research in permanent congenital blindness reveals the response and adaptation of the developing brain to an atypical situation (blindness), research in sight restoration addresses the question of whether and how atypical development can be remediated if typical experience (vision) is restored. The literature on the role of visual experience in the development of E/I balance in humans, assessed via Magnetic Resonance Spectroscopy (MRS), has been limited to a few studies on congenital permanent blindness. Thus, we assessed sight recovery individuals with a history of congenital blindness, as limited evidence from other researchers indicated that the visual cortex E/I ratio might differ compared to normally sighted controls.

      Individuals with total bilateral congenital cataracts who remained untreated until later in life are extremely rare, particularly if only carefully diagnosed patients are included in a study sample. A sample size of 10 patients is, at the very least, typical of past studies in this population, even for exclusively behavioral assessments. In the present study, in addition to behavioral assessment as an indirect measure of sensitive periods, we investigated participants with two neuroimaging methods (Magnetic Resonance Spectroscopy and electroencephalography) to directly assess the neural correlates of sensitive periods in humans. The electroencephalography data allowed us to link the results of our small sample to findings documented in large cohorts of both sight recovery individuals and permanently congenitally blind individuals. As pointed out in a recent editorial recommending an “exploration-then-estimation procedure” (“Consideration of Sample Size in Neuroscience Studies,” 2020), exploratory studies like ours provide crucial direction and specific hypotheses for future work.

      We included an age-matched sighted control group recruited from the same community, measured in the same scanner and laboratory, to assess whether early experience is necessary for a typical excitatory/inhibitory (E/I) ratio to emerge in adulthood. The present findings indicate that this is indeed the case. Based on these results, a possible question for future work with individuals who had developmental cataracts is whether later visual deprivation causes similar effects. Note that even if visual deprivation at a later stage in life caused similar effects, the current results would not be invalidated; on the contrary, they are essential for understanding future work on late (permanent or transient) blindness.

      Thus, we think that the present manuscript has far reaching implications for our understanding of the conditions under which E/I balance, a crucial characteristic of brain functioning, emerges in humans.

      Finally, our manuscript is one of the first few studies that relate MRS neurotransmitter concentrations to parameters of EEG aperiodic activity. Since present research has been using aperiodic activity as a correlate of the E/I ratio, and partially of higher cognitive functions, we think that our manuscript additionally contributes to a better understanding of what might be measured with aperiodic neurophysiological activity.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this human neuroimaging and electrophysiology study, the authors aimed to characterize the effects of a period of visual deprivation in the sensitive period on excitatory and inhibitory balance in the visual cortex. They attempted to do so by comparing neurochemistry conditions ('eyes open', 'eyes closed') and resting state, and visually evoked EEG activity between ten congenital cataract patients with recovered sight (CC), and ten age-matched control participants (SC) with normal sight.

      First, they used magnetic resonance spectroscopy to measure in vivo neurochemistry from two locations, the primary location of interest in the visual cortex, and a control location in the frontal cortex. Such voxels are used to provide a control for the spatial specificity of any effects because the single-voxel MRS method provides a single sampling location. Using MR-visible proxies of excitatory and inhibitory neurotransmission, Glx and GABA+ respectively, the authors report no group effects in GABA+ or Glx, no difference in the functional conditions 'eyes closed' and 'eyes open'. They found an effect of the group in the ratio of Glx/GABA+ and no similar effect in the control voxel location. They then performed multiple exploratory correlations between MRS measures and visual acuity, and reported a weak positive correlation between the 'eyes open' condition and visual acuity in CC participants.

      The same participants then took part in an EEG experiment. The authors selected only two electrodes placed in the visual cortex for analysis and reported a group difference in an EEG index of neural activity, the aperiodic intercept, as well as the aperiodic slope, considered a proxy for cortical inhibition. They report an exploratory correlation between the aperiodic intercept and Glx in one out of three EEG conditions.

      The authors report the difference in E/I ratio, and interpret the lower E/I ratio as representing an adaptation to visual deprivation, which would have initially caused a higher E/I ratio. Although intriguing, the evidence in support of this view is not strong. Amongst the limitations are the low sample size, the lack of a critical control cohort (one that could provide evidence for a higher E/I ratio in CC patients without recovered sight, for example), and lower data quality in the control voxel.

      Strengths of study:

      How sensitive period experience shapes the developing brain is an enduring and important question in neuroscience. This question has been particularly difficult to investigate in humans. The authors recruited a small number of sight-recovered participants with bilateral congenital cataracts to investigate the effect of sensitive period deprivation on the balance of excitation and inhibition in the visual brain using measures of brain chemistry and brain electrophysiology. The research is novel, and the paper was interesting and well-written.

      Limitations:

      (1.1) Low sample size. Ten for CC and ten for SC, and a further two SC participants were rejected due to a lack of frontal control voxel data. The sample size limits the statistical power of the dataset and increases the likelihood of effect inflation.

      Applying strict criteria, we only included individuals who were born with no patterned vision in the CC group. The population of individuals who have remained untreated past infancy is small in India, despite a higher prevalence of childhood cataract than in Germany. Indeed, of the 11 CC and 11 SC participants originally tested, one participant each from the CC and SC groups had to be excluded because their data had been corrupted, resulting in 10 participants per group.

      It was a challenge to recruit participants from this rare group with no history of neurological diagnosis/intake of neuromodulatory medications, who were able and willing to undergo both MRS and EEG. For this study, data collection took more than 2.5 years.

      We safeguarded the validity of our results in two ways. First, we assessed not just MRS but also EEG measures of the E/I ratio. The latter allowed us to link our results to a larger population of CC individuals; that is, we replicated in our subgroup the results of a larger group of 28 additional individuals (Ossandón et al., 2023).

      Second, we included a control voxel. As predicted, all group effects were restricted to the occipital voxel.

      (1.2) Lack of specific control cohort. The control cohort has normal vision. The control cohort is not specific enough to distinguish between people with sight loss due to different causes and patients with congenital cataracts with co-morbidities. Further data from more specific populations, such as patients whose cataracts have not been removed, with developmental cataracts, or congenitally blind participants, would greatly improve the interpretability of the main finding. The lack of a more specific control cohort is a major caveat that limits a conclusive interpretation of the results.

      The existing work on visual deprivation and neurochemical changes, as assessed with MRS, has been limited to permanent congenital blindness. In fact, most of the studies on permanent blindness included only congenitally blind or early blind humans (Coullon et al., 2015; Weaver et al., 2013), or, in separate studies, only late-blind individuals (Bernabeu et al., 2009). Accordingly, we started with the most “extreme” visual deprivation model, sight recovery after congenital blindness. If we had not observed any group difference compared to normally sighted controls, investigating other groups might have been trivial. Based on our results, subsequent studies in late-blind individuals, and then in individuals with developmental cataracts, can be planned with clear hypotheses.

      (1.3) MRS data quality differences. Data quality in the control voxel appears worse than in the visual cortex voxel. The frontal cortex MRS spectrum shows far broader linewidth than the visual cortex (Supplementary Figures). Compared to the visual voxel, the frontal cortex voxel has less defined Glx and GABA+ peaks; lower GABA+ and Glx concentrations, lower NAA SNR values; lower NAA concentrations. If the data quality is a lot worse in the FC, then small effects may not be detectable.

      Worse data quality in the frontal than in the visual cortex has been repeatedly observed in the MRS literature, attributable to magnetic field distortions (Juchem & Graaf, 2017) resulting from the proximity of the region to the sinuses (for a recent example, see Rideaux et al., 2022). Nevertheless, we chose the frontal control region rather than a parietal voxel, given the potential neurochemical changes in multisensory regions of the parietal cortex due to blindness. Such reorganization would be less likely in frontal areas associated with higher cognitive functions. Further, prior MRS studies of the visual cortex have used the frontal cortex as a control region as well (Pitchaimuthu et al., 2017; Rideaux et al., 2022). In the revised manuscript, we more explicitly inform the reader about this data quality difference between regions in the Methods (Pages 11-12, MRS Data Quality/Table 2) and Discussion (Page 25, Lines 644-647).

      Importantly, while in the present study data quality differed between the frontal and visual cortex voxel, it did not differ between groups (Supplementary Material S6).  

      Further, we checked that the frontal cortex datasets for Glx and GABA+ concentrations were of sufficient quality: the fit error was below 8.31% in both groups (Supplementary Material S3). For reference, Mikkelsen et al. reported a mean GABA+ fit error of 6.24 ± 1.95% from a posterior cingulate cortex voxel across 8 GE scanners, using the Gannet pipeline. No absolute cutoffs have been proposed for fit errors. However, MRS studies in special populations (the I/E ratio in narcolepsy, Gao et al., 2024; GABA concentration in autism spectrum disorder, Maier et al., 2022) have used frontal cortex data with a fit error of <10% to identify differences between cohorts (Gao et al., 2024; Pitchaimuthu et al., 2017). Based on the literature, MRS data from the frontal voxel of the present study would have been of sufficient quality to uncover group differences.

      In the revised manuscript, we added the recently published MRS quality assessment form to the supplementary materials (Supplementary Excel File S1). Additionally, we would like to point to our a priori prediction of group differences for the visual cortex voxel, but not for the frontal cortex voxel. Finally, EEG data quality did not differ between frontal and occipital electrodes; therefore, lower sensitivity of frontal measures cannot easily explain the lack of group differences for frontal measures.
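      To make the fit-error criterion concrete, the sketch below fits a single Gaussian to a synthetic peak near 3 ppm and expresses the standard deviation of the fit residual as a percentage of the fitted peak amplitude. This definition is our assumption of how Gannet-style fit errors are computed, and the Gaussian-plus-baseline model, peak parameters, and noise level are hypothetical and much simpler than a real GABA+ or Glx fit.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(ppm, amplitude, center, width, baseline):
    """Single Gaussian peak on a constant baseline (hypothetical simplified model)."""
    return amplitude * np.exp(-((ppm - center) ** 2) / (2 * width ** 2)) + baseline


ppm = np.linspace(2.7, 3.3, 200)                   # chemical-shift axis around 3 ppm
rng = np.random.default_rng(2)
spectrum = gaussian(ppm, 1.0, 3.0, 0.04, 0.1) + rng.normal(0.0, 0.03, ppm.size)

params, _ = curve_fit(gaussian, ppm, spectrum, p0=[0.8, 3.0, 0.05, 0.0])
residual = spectrum - gaussian(ppm, *params)

# Fit error: residual standard deviation as a percentage of the fitted peak amplitude.
fit_error_pct = 100.0 * np.std(residual) / params[0]
print(f"fit error: {fit_error_pct:.2f}%")
```

      With noise at 3% of the peak amplitude, the fit error comes out close to 3%; broader or noisier peaks raise it, which is why linewidth and SNR differences between voxels show up in this metric.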

      (1.4) Because of the direction of the difference in E/I, the authors interpret their findings as representing signatures of sight improvement after surgery without further evidence, either within the study or from the literature. However, the literature suggests that plasticity and visual deprivation drive the E/I index up rather than down. Decreasing GABA+ is thought to facilitate experience-dependent remodelling. What evidence is there that cortical inhibition increases in response to a visual cortex that is over-sensitised due to congenital cataracts? Without further experimental or literature support this interpretation remains very speculative.

      Indeed, higher inhibition was not predicted, which we attempt to reconcile in our discussion section. We base our discussion mainly on the non-human animal literature, which has shown evidence of homeostatic changes after prolonged visual deprivation in the adult brain (Barnes et al., 2015). It is also interesting to note that after monocular deprivation in adult humans, resting GABA+ levels decreased in the visual cortex (Lunghi et al., 2015). Assuming that after delayed sight restoration, adult neuroplasticity mechanisms must be employed, these studies would predict a “balancing” of the increased excitatory drive following sight restoration by a commensurate increase in inhibition (Keck et al., 2017). Additionally, the EEG results of the present study allowed for speculation regarding the underlying neural mechanisms of an altered E/I ratio. The aperiodic EEG activity suggested higher spontaneous spiking (increased intercept) and increased inhibition (steeper aperiodic slope between 1-20 Hz) in CC vs SC individuals (Ossandón et al., 2023).

      In the revised manuscript, we have more clearly indicated that these speculations are based primarily on non-human animal work, due to the lack of human studies on the subject (Page 23, Lines 609-613).
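      The two aperiodic parameters referred to above can be illustrated with a deliberately simplified sketch (the function name is hypothetical): a straight line fitted by least squares to log10(power) against log10(frequency) within 1-20 Hz, whose offset is the intercept and whose gradient is the slope, so that a more negative slope means a steeper fall-off of power with frequency. The spectrum is synthetic and noiseless, and this is not the pipeline used in the study; dedicated spectral-parameterization tools (e.g., FOOOF/specparam) additionally model oscillatory peaks, which this plain line fit ignores.

```python
import math

def fit_aperiodic(freqs, power, f_lo=1.0, f_hi=20.0):
    """Hypothetical helper: least-squares line log10(power) = intercept + slope * log10(freq),
    using only frequencies within [f_lo, f_hi]. Oscillatory peaks are NOT modelled."""
    pts = [(math.log10(f), math.log10(p)) for f, p in zip(freqs, power) if f_lo <= f <= f_hi]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx                     # negative for a 1/f-like spectrum
    intercept = mean_y - slope * mean_x   # log10 power extrapolated to 1 Hz
    return intercept, slope

# Synthetic, noiseless 1/f spectrum (hypothetical values): power = 10**1.5 * f**-2
freqs = [0.5 * k for k in range(1, 81)]   # 0.5-40 Hz in 0.5-Hz steps
power = [10 ** 1.5 * f ** -2.0 for f in freqs]
intercept, slope = fit_aperiodic(freqs, power)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")  # intercept=1.500, slope=-2.000
```

      Under the interpretation discussed above, a higher intercept would correspond to more broadband power, and a more negative 1-20 Hz slope to relatively stronger inhibition; both readings are indirect.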

      (1.5) Heterogeneity in the patient group. Congenital cataract (CC) patients experienced visual impairment of varying duration and were of different ages. They presented with co-morbidities (absorbed lens, strabismus, nystagmus). Strabismus has been associated with abnormalities in GABAergic inhibition in the visual cortex. The possible interactions with residual vision and confounds of co-morbidities are not experimentally controlled for in the correlations, and not discussed.

      The goal of the present study was to assess whether we would observe changes in E/I ratio after restoring vision at all. We would not have included patients without nystagmus in the CC group of the present study, since it would have been unlikely that they experienced congenital patterned visual deprivation. Amongst diagnosticians, nystagmus or strabismus might not be considered genuine “comorbidities” that emerge in people with congenital cataracts. Rather, these are consequences of congenital visual deprivation, which we employed as diagnostic criteria. Similarly, absorbed lenses are clear signs that cataracts were congenital. As in other models of experience-dependent brain development (e.g., the extant literature on congenital permanent blindness, including anophthalmic individuals; Coullon et al., 2015; Weaver et al., 2013), some uncertainty remains regarding whether the (in our case, remaining) abnormalities of the eye, or the blindness they caused, are the factors driving neural changes. In the case of people with reversed congenital cataracts, at least the retina is considered to be intact, as they would otherwise not have received cataract removal surgery.

      However, we consider it unlikely that strabismus caused the group differences, because the present study shows group differences in the Glx/GABA+ ratio at rest, regardless of eye opening or eye closure, for which strabismus would have caused distinct effects. By contrast, the link between GABA concentration and, for example, interocular suppression in strabismus has so far been documented during visual stimulation (Mukerji et al., 2022; Sengpiel et al., 2006), and differed in direction depending on the amblyopic vs. non-amblyopic eye. Further, one MRS study did not find group differences in GABA concentration between the visual cortices of 16 amblyopic individuals and sighted controls (Mukerji et al., 2022), supporting the view that the differences in Glx/GABA+ concentration which we observed were driven by congenital deprivation, and not by amblyopia-associated differences in visual acuity or eye movements.

      In the revised manuscript, we discussed the inclusion criteria in more detail, and the aforementioned reasons why our data remains interpretable (Page 5, Lines 143 – 145, Lines 147-149). 

      (1.6) Multiple exploratory correlations were performed to relate MRS measures to visual acuity (shown in Supplementary Materials), and only specific ones were shown in the main document. The authors describe the analysis as exploratory in the 'Methods' section. Furthermore, the correlation between visual acuity and E/I metric is weak, and not corrected for multiple comparisons. The results should be presented as preliminary, as no strong conclusions can be made from them. They can provide a hypothesis to test in a future study.

      In the revised manuscript, we have clearly indicated that the exploratory correlation analyses are reported to put forth hypotheses for future studies (Page 4, Lines 118-128; Page 5, Lines 132-134; Page 25, Lines 644-647).

      (1.7) P.16 Given the correlation of the aperiodic intercept with age ("Age negatively correlated with the aperiodic intercept across CC and SC individuals, that is, a flattening of the intercept was observed with age"), age needs to be controlled for in the correlation between neurochemistry and the aperiodic intercept. Glx has also been shown to negatively correlate with age.

      The correlation between chronological age and aperiodic intercept was observed across groups, but the correlation between Glx and the intercept of the aperiodic EEG activity was seen only in the CC group, even though the SC group was matched for age. Thus, such a correlation was very unlikely to be predominantly driven by an effect of chronological age.

      In the revised manuscript, we added the linear regressions with age as a covariate (Supplementary Material S16, referred to in the main Results, Page 21, Lines 534-537), demonstrating the significant relationship between aperiodic intercept and Glx concentration in the CC group. 
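      As a sketch of what such a covariate-adjusted model does, the snippet below regresses a hypothetical aperiodic intercept on Glx and age with ordinary least squares, so that the Glx coefficient is the association with age held constant. All values are invented for illustration (noiseless, so the fit recovers the generating coefficients), the helper `ols` is hypothetical, and the exact model specification in Supplementary Material S16 may differ.

```python
def ols(X, y):
    """Hypothetical helper: ordinary least squares, solving (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented normal-equation matrix [X'X | X'y]
    a = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, k + 1):
                    a[r][c] -= factor * a[col][c]
    return [a[i][k] / a[i][i] for i in range(k)]

# Hypothetical data (NOT study data): Glx in arbitrary units, age in years
glx = [6.5, 7.2, 8.1, 6.9, 7.8, 8.4, 7.0, 6.6, 7.9, 8.8]
age = [8, 12, 15, 19, 23, 27, 30, 34, 38, 41]
# Noiseless outcome generated as 2.0 + 0.3 * Glx - 0.02 * age
aperiodic_intercept = [2.0 + 0.3 * g - 0.02 * yrs for g, yrs in zip(glx, age)]

design = [[1.0, g, yrs] for g, yrs in zip(glx, age)]  # columns: constant, Glx, age
b0, b_glx, b_age = ols(design, aperiodic_intercept)
print(f"b0={b0:.3f}, b_glx={b_glx:.3f}, b_age={b_age:.3f}")  # b0=2.000, b_glx=0.300, b_age=-0.020
```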

      (1.8) Multiple exploratory correlations were performed to relate MRS to EEG measures (shown in Supplementary Materials), and only specific ones were shown in the main document. Given the multiple measures from the MRS, the correlations with the EEG measures were exploratory, as stated in the text, p.16, and in Figure 4. Yet the introduction said that there was a prior hypothesis "We further hypothesized that neurotransmitter changes would relate to changes in the slope and intercept of the EEG aperiodic activity in the same subjects." It would be great if the text could be revised for consistency and the analysis described as exploratory.

      In the revised manuscript, we improved the phrasing (Page 5, Lines 130-132) and consistently reported the correlations as exploratory in the Methods and Discussion. We consider the correlation analyses as exploratory due to our sample size and the absence of prior work. However, we did hypothesize that both MRS and EEG markers would concurrently be altered in CC vs SC individuals.

      (1.9) The analysis for the EEG needs to take more advantage of the available data. As far as I understand, only two electrodes were used, yet far more were available as seen in their previous study (Ossandón et al., 2023). The spatial specificity is not established. The authors could use the frontal cortex electrode (FP1, FP2) signals as a control for spatial specificity in the group effects, or even better, all available electrodes and correct for multiple comparisons. Furthermore, they could use the aperiodic intercept vs Glx in SC to evaluate the specificity of the correlation to CC.

      The aperiodic intercept and slope did not differ between CC and SC individuals for Fp1 and Fp2, suggesting the spatial specificity of the results. In the revised manuscript, we added this analysis to the Supplementary Material (Supplementary Material S14) and referred to it in our Results (Page 20, Lines 513-514).

      Further, Glx concentration in the visual cortex did not correlate with the aperiodic intercept in the SC group (Figure 4), suggesting that this relationship was indeed specific to the CC group.

      Data from all electrodes have also been analyzed and published in other studies (Pant et al., 2023; Ossandón et al., 2023).

      Reviewer #2 (Public Review):

      Summary:

      The manuscript reports non-invasive measures of activity and neurochemical profiles of the visual cortex in congenitally blind patients who recovered vision through the surgical removal of bilateral dense cataracts. The declared aim of the study is to find out how restoring visual function after several months or years of complete blindness impacts the balance between excitation and inhibition in the visual cortex.

      Strengths:

      The findings are undoubtedly useful for the community, as they contribute towards characterising the many ways this special population differs from normally sighted individuals. The combination of MRS and EEG measures is a promising strategy to estimate a fundamental physiological parameter - the balance between excitation and inhibition in the visual cortex, which animal studies show to be heavily dependent upon early visual experience. Thus, the reported results pave the way for further studies, which may use a similar approach to evaluate more patients and control groups.

      Weaknesses:

      (2.1) The main issue is the lack of an appropriate comparison group or condition to delineate the effect of sight recovery (as opposed to the effect of congenital blindness). A few previous studies suggested an increased excitation/inhibition ratio in the visual cortex of congenitally blind patients; the present study reports a decreased E/I ratio instead. The authors claim that this implies a change of E/I ratio following sight recovery. However, supporting this claim would require showing a shift of E/I after vs. before the sight-recovery surgery, or at least it would require comparing patients who did and did not undergo the sight-recovery surgery (as common in the field).

      Longitudinal studies would indeed be the best way to test the hypothesis that the lower E/I ratio in the CC group observed by the present study is a consequence of sight restoration.

      We have now explicitly stated this in the Limitations section (Page 25, Lines 654-655).

      However, longitudinal studies involving neuroimaging are challenging, particularly in research conducted outside of major developed countries and dedicated neuroimaging research facilities. Crucially, had CC and SC individuals, as well as permanently congenitally blind vs SC individuals (Coullon et al., 2015; Weaver et al., 2013), not differed on any neurochemical markers, such a longitudinal study might have been of little interest. Thus, cross-sectional studies are an initial step toward justifying and better tailoring longitudinal studies.

      (2.2) MR Spectroscopy shows a reduced GLX/GABA ratio in patients vs. sighted controls; however, this finding remains rather isolated, not corroborated by other observations. The difference between patients and controls only emerges for the GLX/GABA ratio, but there is no accompanying difference in either the GLX or the GABA concentrations. There is an attempt to relate the MRS data with acuity measurements and electrophysiological indices, but the explorative correlational analyses do not help to build a coherent picture. A bland correlation between GLX/GABA and visual impairment is reported, but this is specific to the patients' group (N=10) and would not hold across groups (the correlation is positive, predicting the lowest GLX/GABA ratio values for the sighted controls - the opposite of what is found). There is also a strong correlation between GLX concentrations and the EEG power at the lowest temporal frequencies. Although this relation is intriguing, it only holds for a very specific combination of parameters (of the many tested): only with eyes open, only in the patient group.

      We interpret these findings differently, that is, in the context of experiments from non-human animals and the larger MRS literature (Page 23, Lines 609-611).

      Homeostatic control of E/I balance assumes that the ratio of excitation (reflected here by Glx) and inhibition (reflected here by GABA+) is regulated. Like prior work (Gao et al., 2024, 2024; Narayan et al., 2022; Perica et al., 2022; Steel et al., 2020; Takado et al., 2022; Takei et al., 2016), we assumed that the ratio of Glx/GABA+ is indicative of E/I balance rather than solely the individual neurotransmitter levels. One of the motivations for assessing the ratio vs the absolute concentration is that as per the underlying E/I balance hypothesis, a change in excitation would cause a concomitant change in inhibition, and vice versa, which has been shown in non-human animal work (Fang et al., 2021; Haider et al., 2006; Tao & Poo, 2005) and modeling research (Vreeswijk & Sompolinsky, 1996; Wu et al., 2022). Importantly, our interpretation of the lower E/I ratio is not just from the Glx/GABA+ ratio, but additionally, based on the steeper EEG aperiodic slope (1-20 Hz). 

      As stated in the Discussion section and Response 1.4, we did not expect to see a lower Glx/GABA+ ratio in CC individuals. We discuss the possible reasons for the direction of the correlation with visual acuity and aperiodic offset during passive visual stimulation, and offer interpretations and (testable) hypotheses.

      We interpret the direction of the Glx/GABA+ correlation with visual acuity to imply that the patients who most strongly (compensatorily) balance the consequences of congenital blindness (hyperexcitation) in response to visual stimulation are those who recover best. Note that the sighted control group was selected based on their “normal” vision. Thus, clinical visual acuity measures are not expected to vary sufficiently, nor to have the resolution, to show strong correlations with neurophysiological measures. By contrast, the CC group comprised patients with highly variable visual outcomes, and was thus well suited to investigate such correlations.

      The same holds for the correlation between Glx and the aperiodic intercept. Previous work has suggested that the intercept of the aperiodic activity is associated with broadband spiking activity in neural circuits (Manning et al., 2009). Thus, an atypical increase of spiking activity during visual stimulation, as indirectly suggested by “old” non-human primate work on visual deprivation (Hyvärinen et al., 1981), might drive a correlation not observed in healthy populations.

      In the revised manuscript, we have more clearly indicated in the Discussion that these are possible post-hoc interpretations (Page 23, Lines 584-586; Page 24, Lines 609-620; Page 24, Lines 644-647; Page 25, Lines 650-657). We argue that given the lack of such studies in humans, it is all the more important that extant data be presented completely, even if the direction of the effects is not as expected.

      (2.3) For these reasons, the reported findings do not allow us to draw firm conclusions on the relation between EEG parameters and E/I ratio or on the impact of early (vs. late) visual experience on the excitation/inhibition ratio of the human visual cortex.

      Indeed, the correlations we have tested between the E/I ratio and EEG parameters were exploratory, and have been reported as such.

      We have now made this clear in all the relevant parts of the manuscript (Introduction, Page 5, Lines 132-135; Methods, Page 16, Line 415; Results, Page 21, Figure 4; Discussion, Page 22, Line 568, Page 25, Lines 644-645, Page 25, Lines 650-657).

      The goal of our study was not to compare the effects of early vs. late visual experience. The goal was to study whether early visual experience is necessary for a typical E/I ratio in visual neural circuits. We provided clear evidence in favor of this hypothesis. Thus, the present results suggest the necessity of investigating the effects of late visual deprivation. In fact, such research is missing in permanent blindness as well.

      Reviewer #3 (Public Review):

      This manuscript examines the impact of congenital visual deprivation on the excitatory/inhibitory (E/I) ratio in the visual cortex using Magnetic Resonance Spectroscopy (MRS) and electroencephalography (EEG) in individuals whose sight was restored. Ten individuals with reversed congenital cataracts were compared to age-matched, normally sighted controls, assessing the cortical E/I balance and its relationship with visual acuity. The study reveals that the Glx/GABA ratio in the visual cortex and the intercept and aperiodic signal are significantly altered in those with a history of early visual deprivation, suggesting persistent neurophysiological changes despite visual restoration.

      My expertise is in EEG (particularly in the decomposition of periodic and aperiodic activity) and statistical methods. I have several major concerns in terms of methodological and statistical approaches along with the (over)interpretation of the results. These major concerns are detailed below.

      (3.1) Variability in visual deprivation:

      - The document states a large variability in the duration of visual deprivation (probably also the age at restoration), with significant implications for the sensitivity period's impact on visual circuit development. The variability and its potential effects on the outcomes need thorough exploration and discussion.

      We work with a rare, unique patient population, which makes it difficult to systematically assess the effects of different visual histories while maintaining stringent inclusion criteria such as complete patterned visual deprivation at birth. Regardless, we considered the large variance in age at surgery and time since surgery as supportive of our interpretation: group differences were found despite the large variance in duration of visual deprivation. Moreover, the existing variance was used to explore possible associations between behavior and neural measures, as well as neurochemical and EEG measures.

      In the revised manuscript, we have detailed the advantages (Methods, Page 5, Lines 143 – 145, Lines 147-149; Discussion, Page 26, Lines 677-678) and disadvantages (Discussion, Page 25, Lines 650-657) of our CC sample, with respect to duration of congenital visual deprivation.

      (3.2) Sample size:

      - The small sample size is a major concern as it may not provide sufficient power to detect subtle effects and/or may overestimate significant effects, which then tend not to generalize to new data. This is one of the biggest drivers of the replication crisis in neuroscience.

      We address the small sample size in our Discussion, and make clear that small sample sizes were due to the nature of investigations in special populations. In the revised manuscript, we added the sample sizes of previous studies using MRS in permanently blind individuals (Page 4, Lines 108-109). It is worth noting that our EEG results fully align with those from larger samples of individuals with reversed congenital cataracts (Page 25, Lines 666-676, Supplementary Material S18, S19) (Ossandón et al., 2023), giving us confidence in their validity and reproducibility. Moreover, our MRS results and correlations of those with EEG parameters were spatially specific to occipital cortex measures.

      The main problem with the correlation analyses between MRS and EEG measures is that the sample size is simply too small to conduct such an analysis. Moreover, it is unclear from the methods section that this analysis was only conducted in the patient group (which the reviewer assumed from the plots), and not explained why this was done only in the patient group. I would highly recommend removing these correlation analyses.

      In the revised manuscript, we have more clearly marked the correlation analyses as exploratory (Introduction, Page 4, Lines 118-128 and Page 5, Lines 132-134; Methods Page 16, Line 415; Discussion Page 22, Line 568, Page 24, Lines 644-645, Page 25, Lines 650-657); note that we do not base most of our discussion on the results of these analyses.

      As indicated by Reviewer 1, reporting them allows for deriving more precise hypotheses for future studies. It has to be noted that we investigate an extremely rare population, tested outside of major developed economies and dedicated neuroimaging research facilities. In addition to being a rare patient group, these individuals come from poor communities. Therefore, we consider it justified to report these correlations as exploratory, providing direction for future research.

      (3.3) Statistical concerns:

      - The statistical analyses, particularly the correlations drawn from a small sample, may not provide reliable estimates (see https://www.sciencedirect.com/science/article/pii/S0092656613000858, which clearly describes this problem).

      It would undoubtedly be better to have a larger sample size. We nonetheless think it is of value to the research community to publish this dataset, since 10 multimodal datasets from a carefully diagnosed, rare population, representing a human model for the effects of early experience on brain development, constitute a substantial sample. Sample sizes in prior neuroimaging studies in transient blindness have most often ranged from n = 1 to n = 10. These studies nevertheless provided valuable direction for future research, and integrating results across multiple studies provides scientific insights.

      Identifying possible group differences was the goal of our study, with the correlations being an exploratory analysis, which we have clearly indicated in the methods, results and discussion.
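      The concern about small-sample correlations can be made concrete with the Fisher z-transform, which gives an approximate confidence interval for a Pearson correlation under bivariate normality. The sketch below uses a hypothetical r = 0.5 (not a value from the study) and a hypothetical helper name, to compare the interval width at n = 10 and n = 100.

```python
import math

def pearson_r_ci(r, n, z_crit=1.959964):
    """Hypothetical helper: approximate 95% CI for Pearson's r via Fisher's z
    (assumes bivariate normality)."""
    z = math.atanh(r)             # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)   # standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

for n in (10, 100):
    lo, hi = pearson_r_ci(0.5, n)  # hypothetical r = 0.5
    print(f"n={n}: [{lo:.2f}, {hi:.2f}]")
# n=10: [-0.19, 0.86]
# n=100: [0.34, 0.63]
```

      With n = 10 the interval includes zero and most of the positive range, which is consistent with treating correlations from samples of this size as hypothesis-generating only.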

      - Statistical analyses for the MRS: The authors should consider some additional permutation statistics, which are more suitable for small sample sizes. The current statistical model (a 2x2 ANOVA design) is not ideal for such small sample sizes. Moreover, it is unclear why the condition (EO & EC) was chosen as a predictor and not the brain region (visual & frontal) or neurochemicals. Further, the authors did not provide any information on the alpha level nor on correction for multiple comparisons (in the methods section). Finally, even though the groups are age-matched, age, the time between surgery and measurement, and the duration of visual deprivation (and sex?) should be included as covariates, as it has been shown that these are highly related to the measurements of interest (especially for the EEG measurements) and the age range of the current study is large.

      In our ANOVA models, the neurochemicals were the outcome variables, and the conditions were chosen as predictors based on prior work suggesting that Glx/GABA+ might vary with eye closure (Kurcyus et al., 2018). The study was designed based on a hypothesis of group differences localized to the occipital cortex, due to visual deprivation. The frontal cortex voxel was chosen to indicate whether these differences were spatially specific. Therefore, we conducted separate ANOVAs based on this study design.

      We have now clarified the motivation for these conditions in the Introduction (Page 4, Lines 122-125) and the Methods (Page 9, Lines 219-224).

      In the revised manuscript, we added the rationale for parametric analyses for our outcomes (Shapiro-Wilk as well as Levene’s tests, Supplementary Material S9). Note that in the Supplementary Materials (S12, S14), we have reported the correlations between visual history metrics and MRS/EEG outcomes, thereby investigating whether the variance in visual history might have driven these results. Specifically, we found a (negative) correlation between visual cortex Glx/GABA+ concentration during eye closure and the visual acuity in the CC group (Figure 2c). None of the other exploratory correlations between MRS/EEG outcomes vs time since surgery, duration of blindness or visual acuity were significant in the CC group (Supplementary Material S12, S15).  
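      For completeness, the permutation approach suggested by the reviewer can be sketched as an exact two-sample test on the difference in group means: every reassignment of the pooled values to two groups of the original sizes is enumerated, and the p-value is the share of reassignments whose absolute mean difference is at least as large as the observed one. This is an illustration assuming exchangeable observations, not an analysis reported in the manuscript; the function name and values are hypothetical. With 10 participants per group there are 184,756 reassignments, which is still feasible to enumerate; for larger samples a random subset of permutations is typically used instead.

```python
from itertools import combinations

def exact_permutation_test(group_a, group_b):
    """Hypothetical helper: two-sided exact permutation p-value for the difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    extreme = 0
    n_perm = 0
    # Enumerate every way of choosing which pooled values form "group A"
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
        n_perm += 1
    return extreme / n_perm

# Hypothetical values for two tiny groups (NOT study data)
p = exact_permutation_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(p)  # 0.1
```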

      The alpha level for the ANOVA models, specified in the Methods section, was 0.05. The alpha level for the exploratory analyses reported in the main manuscript was 0.008, after Bonferroni correction for six multiple comparisons, as also specified in the Methods. Note that the corrected p-values are reported multiplied by 6 (capped at 1), because most readers assume an alpha level of 0.05 (see response regarding large p-values).
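      The two reporting conventions described above lead to the same decisions, as the short sketch below illustrates: comparing a raw p-value with alpha/m is equivalent to comparing min(1, m·p) with alpha. Because adjusted values are capped at 1, they can only be reported as > 0.999 once m·p exceeds 1. The raw p-values and the helper name are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Hypothetical helper: Bonferroni-adjusted p-values (capped at 1) and the per-test alpha."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return adjusted, alpha / m

raw = [0.001, 0.004, 0.03, 0.2, 0.5, 0.9]  # hypothetical raw p-values, m = 6
adjusted, per_test_alpha = bonferroni(raw)
print(round(per_test_alpha, 4))  # 0.0083
for p, p_adj in zip(raw, adjusted):
    # Both rules flag exactly the same tests as significant
    assert (p < per_test_alpha) == (p_adj < 0.05)
    print(f"raw={p:.3f}  adjusted={p_adj:.3f}  significant={p_adj < 0.05}")
```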

      We used a control group matched for age, recruited and tested in the same institutes, using the same setup. We feel that we followed the gold standards for recruiting a healthy control group for a patient group.

      - EEG statistical analyses: The same critique as for the MRS statistical analyses applies to the EEG analysis. In addition: was the 2x3 ANOVA conducted for EO and EC independently? This seems to be inconsistent with the approach in the MRS analyses, in which the authors chose EO & EC as predictors in their 2x2 ANOVA.

      The 2x3 ANOVA was not conducted independently for the eyes open/eyes closed conditions. The ANOVA conducted on the EEG metrics was 2x3 because it had two groups (CC, SC) and three conditions (eyes open (EO), eyes closed (EC) and visual stimulation (LU)) as predictors.

      - Figure 4: The authors report a p-value of >0.999 with a correlation coefficient of -0.42 with a sample size of 10 subjects. This can't be correct (it should be around: p = 0.22). All statistical analyses should be checked.

      As specified in the Methods and Figure legend, the reported p values in Figure 4 have been corrected using the Bonferroni correction, and therefore multiplied by the number of comparisons, leading to the seemingly large values.
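      The arithmetic behind this can be checked directly. A Pearson r from n pairs converts to t = r·sqrt(n - 2)/sqrt(1 - r²) with n - 2 degrees of freedom; for r = -0.42 and n = 10 this gives an uncorrected two-tailed p of about 0.23, close to the reviewer's estimate, and multiplying by the six comparisons exceeds 1, so the corrected value is capped at 1 and reported as > 0.999. The sketch below uses the standard closed-form series for the Student t distribution with integer degrees of freedom, so it needs only the standard library; the helper names are hypothetical. Where SciPy is available, 2 * scipy.stats.t.sf(abs(t), df) should give the same value.

```python
import math

def t_two_tailed_p(t, df):
    """Hypothetical helper: two-tailed p of Student's t for integer df (closed-form series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c2 = math.sin(theta), math.cos(theta) ** 2
    if df % 2 == 0:
        # A = sin(theta) * [1 + c2/2 + (1*3)/(2*4) c2^2 + ...], terms up to cos^(df-2)
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (2 * k - 1) / (2 * k) * c2
            total += term
        a = s * total
    elif df == 1:
        a = 2 * theta / math.pi
    else:
        # A = (2/pi) * (theta + sin*cos * [1 + (2/3) c2 + ...]), terms up to cos^(df-3)
        term = total = 1.0
        for k in range(1, (df - 1) // 2):
            term *= (2 * k) / (2 * k + 1) * c2
            total += term
        a = 2 / math.pi * (theta + s * math.cos(theta) * total)
    return 1.0 - a  # A is the probability that |T| < |t|

def pearson_p(r, n):
    """Two-tailed p-value for a Pearson correlation r computed from n pairs."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return t_two_tailed_p(t, n - 2)

p_raw = pearson_p(-0.42, 10)
p_bonf = min(1.0, 6 * p_raw)  # Bonferroni correction for 6 comparisons
print(f"uncorrected p = {p_raw:.3f}")            # uncorrected p = 0.227
print(f"Bonferroni-corrected p = {p_bonf:.3f}")  # Bonferroni-corrected p = 1.000
```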

      Additionally, to check all statistical analyses, we ran the manuscript through the independent consistency checker statcheck (Nuijten & Polanin, 2020) (https://michelenuijten.shinyapps.io/statcheck-web/) and have uploaded the consistency report with the revised Supplementary Material (Supplementary Report 1).

      - Figure 2c. Eyes closed condition: The highest score of the Glx/GABA ratio seems to be ~3.6. In subplot 2a, there seem to be 3 subjects that show a Glx/GABA ratio score > 3.6. How can this be explained? There is also a discrepancy for the eyes-closed condition.

      The three subjects that show the Glx/GABA+ ratio > 3.6 in subplot 2a are in the SC group, whereas the correlations plotted in figure 2c are only for the CC group, where the highest score is indeed ~3.6.

      (3.4) Interpretation of aperiodic signal:

      - Several recent papers demonstrated that the aperiodic signal measured in EEG or ECoG is related to various important aspects such as age, skull thickness, electrode impedance, as well as cognition. Thus, currently, very little is known about the underlying effects which influence the aperiodic intercept and slope. The entire interpretation of the aperiodic slope as a proxy for E/I is based on a computational model and simulation (as described in the Gao et al. paper).

      Apart from the modeling work of Gao et al., multiple papers that we cited used ECoG, EEG and MEG and showed concomitant changes in aperiodic activity with pharmacological manipulation of the E/I ratio (Colombo et al., 2019; Molina et al., 2020; Muthukumaraswamy & Liley, 2018). Further, several prior studies have interpreted changes in the aperiodic slope as reflective of changes in the E/I ratio, including studies of developmental groups (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Schaworonkow & Voytek, 2021) as well as patient groups (Molina et al., 2020; Ostlund et al., 2021).

      In the revised manuscript, we have cited those studies not already included in the Introduction (Page 3, Lines 92-94).

      - Especially the aperiodic intercept is a very sensitive measure to many influences (e.g. skull thickness, electrode impedance...). As crucial results (correlation aperiodic intercept and MRS measures) are facing this problem, this needs to be reevaluated. It is safer to make statements on the aperiodic slope than intercept. In theory, some of the potentially confounding measures are available to the authors (e.g. skull thickness can be computed from T1w images; electrode impedances are usually acquired alongside the EEG data) and could be therefore controlled.

      All electrophysiological measures indeed depend on parameters such as skull thickness and electrode impedance. As in the extant literature using neurophysiological measures to compare brain function between patient and control groups, we used a control group matched in age/sex, recruited in the same region, tested with the same devices, and analyzed with the same analysis pipeline. For example, impedance was kept below 10 kOhm for all subjects.

      This is now mentioned in the Methods, Page 13, Line 344.

      There is no evidence available suggesting that congenital cataracts are associated with changes in skull thickness that would cause the observed pattern of group results. Moreover, we cannot think of how any of the exploratory correlations between neurophysiological measures and MRS measures could be accounted for by a difference e.g. in skull thickness.

      - The authors wrote: "Higher frequencies (such as 20-40 Hz) have been predominantly associated with local circuit activity and feedforward signaling (Bastos et al., 2018; Van Kerkoerle et al., 2014); the increased 20-40 Hz slope may therefore signal increased spontaneous spiking activity in local networks. We speculate that the steeper slope of the aperiodic activity for the lower frequency range (1-20 Hz) in CC individuals reflects the concomitant increase in inhibition." The authors confuse the interpretation of periodic and aperiodic signals. This section refers to the interpretation of the periodic signal (higher frequencies). This interpretation cannot simply be translated to the aperiodic signal (slope).

      Prior work has not always separated the aperiodic and periodic components, making it unclear what might have driven these effects in our data. The interpretation of the higher frequency range was intended to contrast with the interpretations of lower frequency range, in order to speculate as to why the two aperiodic fits might go in differing directions. Note that Ossandón et al. reported highly similar results (group differences for CC individuals and for permanently congenitally blind humans) for the aperiodic activity between 20-40 Hz and oscillatory activity in the gamma range.

      In the revised Discussion, we removed this section. We now primarily interpret the increased offset, together with prior fMRI-BOLD findings (Raczy et al., 2023), as an increase in broadband neuronal firing.

      - The authors further wrote: We used the slope of the aperiodic (1/f) component of the EEG spectrum as an estimate of E/I ratio (Gao et al., 2017; Medel et al., 2020; Muthukumaraswamy & Liley, 2018). This is a highly speculative interpretation with very little empirical evidence. These papers were conducted with ECoG data (mostly in animals) and mostly under anesthesia. Thus, these studies only allow an indirect interpretation of what actually influences the 1/f slope in EEG measurements.

      Note that Muthukumaraswamy and Liley (2018) used different types of pharmacological manipulations and analyzed periodic and aperiodic MEG activity in humans, in addition to monkey ECoG. Further, Medel et al. (now published as Medel et al., 2023) compared EEG activity in addition to ECoG data after propofol administration. The interpretation of our results is in line with a number of recent EEG studies in developing (Hill et al., 2022; Schaworonkow & Voytek, 2021) and special populations. As mentioned above, several prior studies have used the slope of the 1/f component/aperiodic activity as an indirect measure of the E/I ratio (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Molina et al., 2020; Ostlund et al., 2021; Schaworonkow & Voytek, 2021), including studies using scalp-recorded EEG from humans.

      In the Introduction of the revised manuscript, we have made more explicit that this metric is indirect (Page 3, Line 91; see also Discussion, Page 24, Lines 644-645, and Page 25, Lines 650-657).

      While a full understanding of aperiodic activity is still lacking, some convergent ideas have emerged. We think that our results contribute to this enterprise, since our study is, to the best of our knowledge, the first to assess both MRS-measured neurotransmitter levels and EEG aperiodic activity.

      (3.5) Problems with EEG preprocessing and analysis:

      - It seems that the authors did not identify bad channels or address the line-noise issue (which remains a problem even if a low-pass filter below the line-noise frequency was applied).

      As pointed out in the Methods and Figure 1, we analyzed data from only two occipital channels, O1 and O2, neither of which was rejected for any participant. Channel rejection was performed for the larger dataset, published elsewhere (Ossandón et al., 2023; Pant et al., 2023). As control sites, we added the frontal channels Fp1 and Fp2 (see Supplementary Material S14).

      Neither Ossandón et al. (2023) nor Pant et al. (2023) considered frequency ranges above 40 Hz, to avoid any possible contamination with line noise. Here, we focused on activity between 0 and 20 Hz, which excludes line-noise contamination (Methods, Page 14, Lines 365-367). The low-pass filter (FIR, 1-45 Hz) ensured that any spill-over of line noise would be restricted to frequencies just below the upper cutoff frequency.
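      As a rough illustration of why a 1-45 Hz FIR band-pass keeps 50 Hz line noise out of the 1-20 Hz range, the sketch below applies a zero-phase, Hamming-windowed FIR band-pass (SciPy's `firwin` and `filtfilt`) to a synthetic 10 Hz plus 50 Hz signal. The 1000 Hz sampling rate and the 1001-tap length are assumptions chosen for the example; they are not the parameters of the EEGLAB filter used in the study.

```python
import numpy as np
from scipy.signal import firwin, filtfilt


def bandpass_fir(x, fs, low=1.0, high=45.0, numtaps=1001):
    """Zero-phase band-pass FIR filter (window method, Hamming window by default).

    numtaps must be odd for a band-pass design; 1001 is an illustrative choice.
    """
    taps = firwin(numtaps, [low, high], pass_zero=False, fs=fs)
    # filtfilt runs the filter forward and backward, so the net phase shift is zero
    return filtfilt(taps, [1.0], x)


fs = 1000.0                        # hypothetical sampling rate (Hz)
t = np.arange(10000) / fs          # 10 s of synthetic data
alpha = np.sin(2 * np.pi * 10 * t)
line_noise = np.sin(2 * np.pi * 50 * t)
filtered = bandpass_fir(alpha + line_noise, fs)

# Compare with the clean 10 Hz component, ignoring filter edge effects
core = slice(2000, 8000)
residual_rms = np.sqrt(np.mean((filtered[core] - alpha[core]) ** 2))
print(f"RMS residual after filtering: {residual_rms:.5f}")
```

      Because 50 Hz lies in the stopband of a 45 Hz cutoff with a few hertz of transition width, the remaining line-noise residual in this toy example is negligible; a shorter filter would widen the transition band and let more energy leak just below the cutoff.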

      Additionally, a prior version of the analysis used spectrum interpolation to remove line noise; the group differences remained stable (Ossandón et al., 2023). We have reported this analysis in the revised manuscript (Page 14, Lines 364-357).
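      Spectrum interpolation, as commonly described for EEG/MEG data, replaces the Fourier amplitude around the line-noise frequency with the mean amplitude of neighboring frequencies while keeping the original phases, then transforms back to the time domain. The NumPy sketch below illustrates the idea; the 50 Hz target, the ±1 Hz replaced band, and the 2 Hz neighbor band are illustrative assumptions, not the settings of the earlier analysis.

```python
import numpy as np


def spectrum_interpolate(x, fs, line_freq=50.0, half_width=1.0, neighbor_width=2.0):
    """Replace the amplitude spectrum within +/- half_width of line_freq by the
    mean amplitude of the adjacent neighbor bands, keeping the phases."""
    n = x.size
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)

    distance = np.abs(freqs - line_freq)
    target = distance <= half_width
    neighbors = (distance > half_width) & (distance <= half_width + neighbor_width)
    amplitude[target] = amplitude[neighbors].mean()

    # Rebuild the complex spectrum from the modified amplitude and original phase
    return np.fft.irfft(amplitude * np.exp(1j * phase), n=n)


fs = 1000.0                       # hypothetical sampling rate (Hz)
t = np.arange(10000) / fs         # 10 s, so 10 Hz and 50 Hz fall exactly on FFT bins
clean = np.sin(2 * np.pi * 10 * t)
cleaned = spectrum_interpolate(clean + 0.5 * np.sin(2 * np.pi * 50 * t), fs)
max_error = np.max(np.abs(cleaned - clean))
print(f"Maximum deviation from the clean signal: {max_error:.2e}")
```

      Unlike a notch filter, this approach only touches a narrow band around the line frequency and leaves the 1-20 Hz range of the spectrum untouched.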

      Further, both groups were measured in the same lab, making line noise (~50 Hz) a highly unlikely explanation for the observed group effects in the 1-20 Hz frequency range. Finally, any of the exploratory MRS-EEG correlations would be hard to explain if the EEG parameters were contaminated with line noise.

      - What was the percentage of segments that needed to be rejected due to the 120μV criteria? This should be reported specifically for EO & EC and controls and patients.

      The mean percentage of 1 s segments rejected for each resting-state condition, and the percentage of 6.25 s segments rejected in each group for the visual stimulation condition, have been added to the revised manuscript (Supplementary Material S10) and are referred to in the Methods (Page 14, Lines 372-373).
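      The rejection percentages can be computed along the following lines. This sketch assumes that the 120 μV criterion is an absolute-amplitude threshold applied to every channel and sample of an epoch; if a peak-to-peak criterion was used instead, the comparison would change accordingly. The function name and the (epochs × channels × samples) layout are hypothetical.

```python
import numpy as np


def rejection_rate(epochs_uv, threshold_uv=120.0):
    """Percentage of epochs in which any channel exceeds +/- threshold_uv.

    epochs_uv: array of shape (n_epochs, n_channels, n_samples), in microvolts.
    Returns the percentage rejected and a boolean mask of rejected epochs.
    """
    rejected = np.any(np.abs(epochs_uv) > threshold_uv, axis=(1, 2))
    return 100.0 * rejected.mean(), rejected


# Hypothetical example: four 1 s epochs, two channels (e.g., O1/O2), 60 samples each
epochs = np.zeros((4, 2, 60))
epochs[1, 0, 10] = 150.0    # positive artifact in the second epoch
epochs[3, 1, 5] = -130.0    # negative deflection in the fourth epoch also exceeds the threshold
percent_rejected, mask = rejection_rate(epochs)
print(percent_rejected)
```

      Computed separately per participant, condition (eyes open, eyes closed, visual stimulation), and group, these values can then be averaged for reporting.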

      - The authors downsampled the data to 60 Hz "to match the stimulation rate". What is the intention of this? The subsequent spectral analyses are affected by this choice (see the Nyquist theorem).

      These data were collected as part of a study designed to evoke alpha activity with visual white noise, which changed in luminance with equal power at all frequencies from 1 to 60 Hz, restricted by the refresh rate of the monitor on which stimuli were presented (Pant et al., 2023). This paradigm and method were developed by VanRullen and colleagues (Schwenk et al., 2020; VanRullen & MacDonald, 2012); the analysis requires the stimulus sequence and the EEG data to have the same sampling rate. The downsampling function used here automatically applies an anti-aliasing filter (EEGLAB 2019).
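      After downsampling to 60 Hz, the Nyquist frequency is 30 Hz, which still lies above the 1-20 Hz range analyzed here, provided that content above 30 Hz is removed before decimation. The sketch below illustrates this with SciPy's polyphase resampler `resample_poly`, which applies a Kaiser-windowed FIR anti-aliasing filter; it is not EEGLAB's resampling function, and the 1000 Hz original sampling rate is a hypothetical value.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

fs_in = 1000   # hypothetical original sampling rate (Hz)
fs_out = 60    # target rate matching the 60 Hz stimulation/refresh rate
g = gcd(fs_in, fs_out)
up, down = fs_out // g, fs_in // g   # 3 and 50

t = np.arange(10 * fs_in) / fs_in    # 10 s of synthetic data
# 10 Hz lies inside the analyzed band; 45 Hz lies above the new 30 Hz Nyquist limit
x = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 45 * t)

# resample_poly low-pass filters before decimating, so the 45 Hz component is
# suppressed instead of being folded back (aliased) to 60 - 45 = 15 Hz
y = resample_poly(x, up, down)

amp = 2 * np.abs(np.fft.rfft(y)) / y.size
freqs = np.fft.rfftfreq(y.size, d=1 / fs_out)
amp_10 = amp[np.argmin(np.abs(freqs - 10))]
amp_15 = amp[np.argmin(np.abs(freqs - 15))]
print(f"10 Hz amplitude: {amp_10:.3f}, 15 Hz (alias) amplitude: {amp_15:.3f}")
```

      Without the anti-aliasing step, the 45 Hz component would reappear at 15 Hz, inside the analyzed range, with roughly its original amplitude.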

      - "Subsequently, baseline removal was conducted by subtracting the mean activity across the length of an epoch from every data point." The actual baseline time segment should be specified.

      The time segment was the length of the epoch, that is, 1 second for the resting state conditions and 6.25 seconds for the visual stimulation conditions. This has now been explicitly stated in the revised manuscript (Page 14, Lines 379-380).
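      Because the baseline window spans the whole epoch, this step reduces to subtracting each channel's mean within each epoch. A minimal NumPy sketch, assuming epochs are stored as (epochs × channels × samples) at 60 Hz; the random arrays are placeholders rather than study data.

```python
import numpy as np


def demean_epochs(epochs):
    """Subtract each epoch's mean (per channel) computed over the full epoch.

    epochs: array of shape (n_epochs, n_channels, n_samples).
    """
    return epochs - epochs.mean(axis=-1, keepdims=True)


fs = 60                                                   # sampling rate after downsampling (Hz)
rng = np.random.default_rng(0)
rest = rng.normal(5.0, 1.0, size=(20, 2, fs))             # 1 s resting-state epochs
stim = rng.normal(5.0, 1.0, size=(8, 2, int(6.25 * fs)))  # 6.25 s stimulation epochs
rest_bc = demean_epochs(rest)
stim_bc = demean_epochs(stim)
```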

      - "We excluded the alpha range (8-14 Hz) for this fit to avoid biasing the results due to documented differences in alpha activity between CC and SC individuals (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023)." This does not really make sense, as the FOOOF algorithm first fits the 1/f slope, for which the alpha activity is not relevant.

      We did not use the FOOOF algorithm/toolbox in this manuscript. As stated in the Methods, we used a 1/f fit to the 1-20 Hz spectrum in the log-log space, and subtracted this fit from the original spectrum to obtain the corrected spectrum. Given the pronounced difference in alpha power between groups (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023), we were concerned it might drive differences in the exponent values. Our analysis pipeline had been adapted from previous publications of our group and other labs (Ossandón et al., 2023; Voytek et al., 2015; Waschke et al., 2017).
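      The procedure described above can be sketched as a straight-line fit in log-log coordinates, with the alpha band left out of the fit and the fitted line then subtracted from the full 1-20 Hz spectrum. This is an illustrative reimplementation under stated assumptions (base-10 logarithms, 1-20 Hz fit range, 8-14 Hz excluded, `np.polyfit` as the fitting routine), not the authors' code; the synthetic spectrum is hypothetical. The fitted slope is the negative of the 1/f exponent.

```python
import numpy as np


def fit_aperiodic(freqs, power, fit_range=(1.0, 20.0), exclude=(8.0, 14.0)):
    """Fit log10(power) = intercept + slope * log10(freq) over fit_range,
    leaving the `exclude` band (here the alpha range) out of the fit.

    Returns slope, intercept, and the corrected (1/f-removed) log spectrum over
    the full fit_range, including the excluded band.
    """
    in_range = (freqs >= fit_range[0]) & (freqs <= fit_range[1])
    in_alpha = (freqs >= exclude[0]) & (freqs <= exclude[1])
    used = in_range & ~in_alpha

    # np.polyfit returns coefficients from the highest degree down: [slope, intercept]
    slope, intercept = np.polyfit(np.log10(freqs[used]), np.log10(power[used]), deg=1)

    fitted = intercept + slope * np.log10(freqs[in_range])
    corrected = np.log10(power[in_range]) - fitted
    return slope, intercept, corrected


# Synthetic spectrum: 1/f^1.2 background plus a narrow alpha peak at 10 Hz
freqs = np.arange(1.0, 40.5, 0.5)
background = 10 ** 1.5 * freqs ** -1.2
alpha_peak = 1 + 3 * np.exp(-((freqs - 10) ** 2) / (2 * 0.8 ** 2))
slope, intercept, corrected = fit_aperiodic(freqs, background * alpha_peak)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

      Excluding the alpha band keeps a large group difference in alpha power from pulling the fitted line upward around 10 Hz, which would otherwise bias the slope and intercept.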

      We have conducted the analysis with and without the exclusion of the alpha range, as well as using the FOOOF toolbox both in the 1-20 Hz and 20-40 Hz ranges (Ossandón et al., 2023). The findings of a steeper slope in the 1-20 Hz range as well as lower alpha power in CC vs SC individuals remained stable. In Ossandón et al., the comparison between the piecewise fits and FOOOF fits led the authors to use the former, as it outperformed the FOOOF algorithm for their data.

      - The model fits of the 1/f fitting for EO, EC, and both participant groups should be reported.

      In Figure 3 of the manuscript, we depicted the mean spectra and 1/f fits for each group.

      In the revised manuscript, we added the fit quality metrics (average R<sup>2</sup> values > 0.91 for each group and condition) (Methods Page 15, Lines 395-396; Supplementary Material S11) and additionally show individual subjects’ fits (Supplementary Material S11).
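      Fit quality of a log-log 1/f fit is typically summarized by the coefficient of determination computed on the log-power values entering the fit. A minimal sketch with a hypothetical noisy single-subject spectrum; the noise level and random seed are arbitrary assumptions.

```python
import numpy as np


def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    ss_res = np.sum((observed - fitted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical single-subject spectrum in log-log space (1-20 Hz) with mild noise
rng = np.random.default_rng(1)
log_f = np.log10(np.arange(1.0, 20.5, 0.5))
log_p = 1.5 - 1.2 * log_f + rng.normal(0.0, 0.03, log_f.size)

slope, intercept = np.polyfit(log_f, log_p, deg=1)
r2 = r_squared(log_p, intercept + slope * log_f)
print(f"R^2 = {r2:.3f}")
```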

      (3.6) Validity of GABA measurements and results:

      - According to a newer study by the authors of the Gannet toolbox (https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/abs/10.1002/nbm.5076), the reliability and reproducibility of gamma-aminobutyric acid (GABA) measurements can vary significantly depending on acquisition and modeling parameters. Did the authors address these challenges?

      We ensured MRS data quality during acquisition by verifying appropriate voxel placement and linewidth prior to scanning (Page 9, Lines 229-237). We now address this explicitly in the “MRS Data Quality” section of the Methods. Acquisition and modeling parameters were constant for both groups, so they cannot have driven group differences.

      The linked article compares the reproducibility of GABA measurement using Osprey (Oeltzschner et al., 2020), which was released in 2020 and uses linear combination modeling to fit the peak, as opposed to Gannet’s simple peak fitting (Hupfeld et al., 2024). The study finds better test-retest reliability for Osprey compared to Gannet’s method.
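      The difference between the two approaches can be illustrated with a toy example. Simple peak fitting models each peak on its own, whereas linear combination modeling (LCM) describes the whole spectrum as a weighted sum of metabolite basis spectra plus a baseline and estimates all weights jointly. The sketch below uses hypothetical Gaussian "basis spectra" and ordinary least squares (`np.linalg.lstsq`); real LCM implementations, including Osprey, use simulated metabolite basis sets and more elaborate baseline and lineshape modeling, so this only conveys the principle.

```python
import numpy as np


def gaussian(ppm, center, sd=0.04):
    """Hypothetical Gaussian lineshape standing in for a metabolite basis spectrum."""
    return np.exp(-((ppm - center) ** 2) / (2 * sd ** 2))


ppm = np.linspace(1.5, 4.2, 541)
basis = {
    "NAA": gaussian(ppm, 2.01),
    "GABA+": gaussian(ppm, 3.01),
    "Glx": gaussian(ppm, 3.75),
}
# Design matrix: metabolite basis functions plus a linear baseline (offset + slope)
design = np.column_stack(list(basis.values()) + [np.ones_like(ppm), ppm])

# Synthetic "measured" spectrum with known amplitudes, baseline, and noise
true_amps = np.array([4.0, 1.0, 1.5])
rng = np.random.default_rng(2)
observed = design[:, :3] @ true_amps + 0.2 + 0.05 * ppm + rng.normal(0.0, 0.01, ppm.size)

# Joint least-squares estimate of all basis weights and baseline terms
coef, *_ = np.linalg.lstsq(design, observed, rcond=None)
amps = dict(zip(basis, coef[:3]))
glx_gaba_ratio = amps["Glx"] / amps["GABA+"]
print({name: round(float(value), 3) for name, value in amps.items()})
```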

      As the present work was conceptualized in 2018, we used Gannet 3.0, which was the state-of-the-art edited-spectrum analysis toolbox at the time and is still widely used.

      In the revised manuscript, we re-analyzed the data using linear combination modeling with Osprey (Oeltzschner et al., 2020), and reported that the main findings remained the same, i.e. the Glx/GABA+ concentration ratio was lower in the visual cortex of congenital cataract reversal individuals compared to normally sighted controls, regardless of whether participants were scanned with eyes open or with eyes closed. Further, NAA concentration did not differ between groups (Supplementary Material S3). Thus, we demonstrate that our findings were robust to analysis pipelines, and state this in the Methods (Page 9, Lines 242-246) and Results (Page 19, Lines 464-467).

      - Furthermore, the authors wrote: "We confirmed the within-subject stability of metabolite quantification by testing a subset of the sighted controls (n=6) 2-4 weeks apart." Looking at Supplementary Figure 5 (which would rather be plotted as ICC or Bland-Altman plots), the within-subject stability compared to between-subject variability does not seem to be great. Furthermore, I don't think such a small sample size qualifies for a rigorous assessment of stability.

      Indeed, we did not intend to provide a rigorous assessment of within-subject stability. Rather, we aimed to confirm that data quality/concentration ratios did not systematically differ between the same subjects tested longitudinally; driven, for example, by scanner heating or time of day. As with the phantom testing, we attempted to give readers an idea of the quality of the data, as they were collected from a primarily clinical rather than a research site.

      In the revised manuscript, we have removed the statement regarding stability and the associated section.

      - "Why might an enhanced inhibitory drive, as indicated by the lower Glx/GABA ratio" Is this interpretation really warranted, as the results of the group differences in the Glx/GABA ratio seem to be rather driven by a decreased Glx concentration in CC rather than an increased GABA (see Figure 2).

      We used the Glx/GABA+ ratio as a measure, rather than individual Glx or GABA+ concentration, which did not significantly differ between groups. As detailed in Response 2.2, we think this metric aligns better with an underlying E/I balance hypothesis and has been used in many previous studies (Gao et al., 2024; Liu et al., 2015; Narayan et al., 2022; Perica et al., 2022).

      Our interpretation of an enhanced inhibitory drive additionally comes from the combination of aperiodic EEG (1-20 Hz) and MRS measures, which, when considered together, are consistent with a decreased E/I ratio.

      In the revised manuscript, we have rewritten the Discussion and removed this section.   

      - Glx concentration predicted the aperiodic intercept in CC individuals' visual cortices during ambient and flickering visual stimulation. Why specifically investigate the Glx concentration, when the paper is about E/I ratio?

      As stated in the methods, we exploratorily assessed the relationship between all MRS parameters (Glx, GABA+ and Glx/GABA+ ratio) with the aperiodic parameters (slope, offset), and corrected for multiple comparisons accordingly. We think this is a worthwhile analysis considering the rarity of the dataset/population (see 1.2, 1.6, 2.1 and Reviewer 1’s comments about future hypotheses). We only report the Glx – aperiodic intercept correlation in the main manuscript as it survived correction for multiple comparisons.
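      The exploratory procedure can be sketched as follows: correlate every MRS parameter with every aperiodic parameter and multiply each p-value by the number of tests (Bonferroni). Pearson correlation via `scipy.stats.pearsonr` is an assumption here, as the correlation type is not stated in this paragraph, and the data are hypothetical placeholders. With three MRS and two aperiodic parameters the family has six tests, so the corrected threshold is 0.05/6 ≈ 0.0083; if tests were pooled across recording conditions, the family in the study may have been larger.

```python
import itertools

import numpy as np
from scipy.stats import pearsonr


def bonferroni_correlations(mrs, eeg, alpha=0.05):
    """Correlate every MRS parameter with every EEG parameter and apply a
    Bonferroni correction over the full family of tests."""
    pairs = list(itertools.product(mrs, eeg))
    m = len(pairs)
    results = []
    for mrs_name, eeg_name in pairs:
        r, p = pearsonr(mrs[mrs_name], eeg[eeg_name])
        p_bonf = min(p * m, 1.0)
        results.append((mrs_name, eeg_name, r, p, p_bonf, p_bonf < alpha))
    return results


# Hypothetical data for 10 participants (placeholders, not study data)
rng = np.random.default_rng(3)
n = 10
glx = rng.normal(10.0, 1.0, n)
gaba = rng.normal(3.0, 0.3, n)
mrs = {"Glx": glx, "GABA+": gaba, "Glx/GABA+": glx / gaba}
eeg = {"slope": rng.normal(-1.2, 0.2, n), "offset": 2.0 * glx + rng.normal(0.0, 0.05, n)}

results = bonferroni_correlations(mrs, eeg)
survivors = [(a, b) for a, b, *_, keep in results if keep]
print(survivors)
```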

      (3.7) Interpretation of the correlation between MRS measurements and EEG aperiodic signal:

      - The authors wrote: "The intercept of the aperiodic activity was highly correlated with the Glx concentration during rest with eyes open and during flickering stimulation (also see Supplementary Material S11). Based on the assumption that the aperiodic intercept reflects broadband firing (Manning et al., 2009; Winawer et al., 2013), this suggests that the Glx concentration might be related to broadband firing in CC individuals during active and passive visual stimulation." These results should not be interpreted, or only with great caution, for several reasons (see also the problems with influences on the aperiodic intercept and the small sample size). This is a result of the exploratory analyses correlating every EEG parameter with every MRS parameter, which requires well-powered replication before any interpretation can be provided. Furthermore, and importantly: why should this be found only in CC patients, but not in the SC control group?

      We have indicated clearly in all parts of the manuscript that these correlations are presented as exploratory. Further, we interpret the Glx-aperiodic offset correlation, and none of the others, as it survived the Bonferroni correction for multiple comparisons. We offer a hypothesis in the Discussion as to why such a correlation might exist in the CC but not the SC group (see response 2.2), and do not speculate further.

      (3.8) Language and presentation:

      - The manuscript requires language improvements and correction of numerous typos. Over-simplifications and unclear statements are present, which could mislead or confuse readers (see also interpretation of aperiodic signal).

      In the revised manuscript, we have checked that speculations are clearly marked, and typos are removed.

      - The authors state that "Together, the present results provide strong evidence for experience-dependent development of the E/I ratio in the human visual cortex, with consequences for behavior." The results of the study do not provide any strong evidence, because of the small sample size and exploratory analyses approach and not accounting for possible confounding factors.

      We disagree with this statement and point to the convergent evidence from both MRS and neurophysiological measures. The latter correspond to results observed in a larger sample of CC individuals (Ossandón et al., 2023). In the revised manuscript, we have rephrased the statement as “to provide initial evidence” (Page 22, Line 676).

      - "Our results imply a change in neurotransmitter concentrations as a consequence of *restoring* vision following congenital blindness." This is a speculative statement to infer a causal relationship on cross-sectional data.

      As mentioned under 2.1, we conducted a cross-sectional study, which might justify future longitudinal work. To advance the field, we put forward new testable hypotheses at the end of the manuscript.

      In the revised manuscript, we rephrased the sentence and added “might imply” to better indicate the hypothetical character of this idea (Page 22, Lines 586-587).

      - In the limitation section, the authors wrote: "The sample size of the present study is relatively high for the rare population, but undoubtedly, overall, rather small." This sentence should be rewritten, as the study is plainly underpowered. The further justification "We nevertheless think that our results are valid. Our findings [are] neurochemically (Glx and GABA+ concentration), and anatomically (visual cortex) specific. The MRS parameters varied with parameters of the aperiodic EEG activity and visual acuity. The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38) (Ossandón et al., 2023), and effects of chronological age were as expected from the literature." does not provide any validation or justification of the small sample. Furthermore, the current data set is a subset of an earlier published paper by the same authors: "The EEG data sets reported here were part of data published earlier (Ossandón et al., 2023; Pant et al., 2023)." Thus, the statement "The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38)" is a circular argument and should be avoided.

      Our intention was not to justify having a small sample, but to justify why we think the results might be valid as they align with/replicate existing literature.

      In the revised manuscript, we added a figure showing that the EEG results of the 10 subjects considered here correspond to those of the 28 other subjects of Ossandón et al (Supplementary Material S18). We adapted the text accordingly, clearly stating that the pattern of EEG results of the ten subjects reported here replicate those of the 28 additional subjects of Ossandón et al. (2023) (Page 25, Lines 671-672).

      References (Public Review)

      Barnes, S. J., Sammons, R. P., Jacobsen, R. I., Mackie, J., Keller, G. B., & Keck, T. (2015). Subnetwork-specific homeostatic plasticity in mouse visual cortex in vivo. Neuron, 86(5), 1290–1303. https://doi.org/10.1016/J.NEURON.2015.05.010

      Bernabeu, A., Alfaro, A., García, M., & Fernández, E. (2009). Proton magnetic resonance spectroscopy (1H-MRS) reveals the presence of elevated myo-inositol in the occipital cortex of blind subjects. NeuroImage, 47(4), 1172–1176. https://doi.org/10.1016/j.neuroimage.2009.04.080

      Bottari, D., Troje, N. F., Ley, P., Hense, M., Kekunnaya, R., & Röder, B. (2016). Sight restoration after congenital blindness does not reinstate alpha oscillatory activity in humans. Scientific Reports. https://doi.org/10.1038/srep24683

      Colombo, M. A., Napolitani, M., Boly, M., Gosseries, O., Casarotto, S., Rosanova, M., Brichant, J. F., Boveroux, P., Rex, S., Laureys, S., Massimini, M., Chieregato, A., & Sarasso, S. (2019). The spectral exponent of the resting EEG indexes the presence of consciousness during unresponsiveness induced by propofol, xenon, and ketamine. NeuroImage, 189, 631–644. https://doi.org/10.1016/j.neuroimage.2019.01.024

      Consideration of Sample Size in Neuroscience Studies. (2020). Journal of Neuroscience, 40(21), 4076–4077. https://doi.org/10.1523/JNEUROSCI.0866-20.2020

      Coullon, G. S. L., Emir, U. E., Fine, I., Watkins, K. E., & Bridge, H. (2015). Neurochemical changes in the pericalcarine cortex in congenital blindness attributable to bilateral anophthalmia. Journal of Neurophysiology. https://doi.org/10.1152/jn.00567.2015

      Fang, Q., Li, Y. T., Peng, B., Li, Z., Zhang, L. I., & Tao, H. W. (2021). Balanced enhancements of synaptic excitation and inhibition underlie developmental maturation of receptive fields in the mouse visual cortex. Journal of Neuroscience, 41(49), 10065–10079. https://doi.org/10.1523/JNEUROSCI.0442-21.2021

      Favaro, J., Colombo, M. A., Mikulan, E., Sartori, S., Nosadini, M., Pelizza, M. F., Rosanova, M., Sarasso, S., Massimini, M., & Toldo, I. (2023). The maturation of aperiodic EEG activity across development reveals a progressive differentiation of wakefulness from sleep. NeuroImage, 277. https://doi.org/10.1016/J.NEUROIMAGE.2023.120264

      Gao, Y., Liu, Y., Zhao, S., Liu, Y., Zhang, C., Hui, S., Mikkelsen, M., Edden, R. A. E., Meng, X., Yu, B., & Xiao, L. (2024). MRS study on the correlation between frontal GABA+/Glx ratio and abnormal cognitive function in medication-naive patients with narcolepsy. Sleep Medicine, 119, 1–8. https://doi.org/10.1016/j.sleep.2024.04.004

      Haider, B., Duque, A., Hasenstaub, A. R., & McCormick, D. A. (2006). Neocortical network activity in vivo is generated through a dynamic balance of excitation and inhibition. Journal of Neuroscience. https://doi.org/10.1523/JNEUROSCI.5297-05.2006

      Hill, A. T., Clark, G. M., Bigelow, F. J., Lum, J. A. G., & Enticott, P. G. (2022). Periodic and aperiodic neural activity displays age-dependent changes across early-to-middle childhood. Developmental Cognitive Neuroscience, 54, 101076. https://doi.org/10.1016/J.DCN.2022.101076

      Hupfeld, K. E., Zöllner, H. J., Hui, S. C. N., Song, Y., Murali-Manohar, S., Yedavalli, V., Oeltzschner, G., Prisciandaro, J. J., & Edden, R. A. E. (2024). Impact of acquisition and modeling parameters on the test–retest reproducibility of edited GABA+. NMR in Biomedicine, 37(4), e5076. https://doi.org/10.1002/nbm.5076

      Hyvärinen, J., Carlson, S., & Hyvärinen, L. (1981). Early visual deprivation alters modality of neuronal responses in area 19 of monkey cortex. Neuroscience Letters, 26(3), 239–243. https://doi.org/10.1016/0304-3940(81)90139-7

      Juchem, C., & Graaf, R. A. de. (2017). B0 magnetic field homogeneity and shimming for in vivo magnetic resonance spectroscopy. Analytical Biochemistry, 529, 17–29. https://doi.org/10.1016/j.ab.2016.06.003

      Keck, T., Hübener, M., & Bonhoeffer, T. (2017). Interactions between synaptic homeostatic mechanisms: An attempt to reconcile BCM theory, synaptic scaling, and changing excitation/inhibition balance. Current Opinion in Neurobiology, 43, 87–93. https://doi.org/10.1016/J.CONB.2017.02.003

      Kurcyus, K., Annac, E., Hanning, N. M., Harris, A. D., Oeltzschner, G., Edden, R., & Riedl, V. (2018). Opposite Dynamics of GABA and Glutamate Levels in the Occipital Cortex during Visual Processing. Journal of Neuroscience, 38(46), 9967–9976. https://doi.org/10.1523/JNEUROSCI.1214-18.2018

      Liu, B., Wang, G., Gao, D., Gao, F., Zhao, B., Qiao, M., Yang, H., Yu, Y., Ren, F., Yang, P., Chen, W., & Rae, C. D. (2015). Alterations of GABA and glutamate-glutamine levels in premenstrual dysphoric disorder: A 3T proton magnetic resonance spectroscopy study. Psychiatry Research - Neuroimaging, 231(1), 64–70. https://doi.org/10.1016/J.PSCYCHRESNS.2014.10.020

      Lunghi, C., Berchicci, M., Morrone, M. C., & Russo, F. D. (2015). Short‐term monocular deprivation alters early components of visual evoked potentials. The Journal of Physiology, 593(19), 4361. https://doi.org/10.1113/JP270950

      Maier, S., Düppers, A. L., Runge, K., Dacko, M., Lange, T., Fangmeier, T., Riedel, A., Ebert, D., Endres, D., Domschke, K., Perlov, E., Nickel, K., & Tebartz van Elst, L. (2022). Increased prefrontal GABA concentrations in adults with autism spectrum disorders. Autism Research, 15(7), 1222–1236. https://doi.org/10.1002/aur.2740

      Manning, J. R., Jacobs, J., Fried, I., & Kahana, M. J. (2009). Broadband shifts in local field potential power spectra are correlated with single-neuron spiking in humans. Journal of Neuroscience, 29(43), 13613–13620. https://doi.org/10.1523/JNEUROSCI.2041-09.2009

      McSweeney, M., Morales, S., Valadez, E. A., Buzzell, G. A., Yoder, L., Fifer, W. P., Pini, N., Shuffrey, L. C., Elliott, A. J., Isler, J. R., & Fox, N. A. (2023). Age-related trends in aperiodic EEG activity and alpha oscillations during early- to middle-childhood. NeuroImage, 269, 119925. https://doi.org/10.1016/j.neuroimage.2023.119925

      Medel, V., Irani, M., Crossley, N., Ossandón, T., & Boncompte, G. (2023). Complexity and 1/f slope jointly reflect brain states. Scientific Reports, 13(1), 21700. https://doi.org/10.1038/s41598-023-47316-0

      Molina, J. L., Voytek, B., Thomas, M. L., Joshi, Y. B., Bhakta, S. G., Talledo, J. A., Swerdlow, N. R., & Light, G. A. (2020). Memantine Effects on Electroencephalographic Measures of Putative Excitatory/Inhibitory Balance in Schizophrenia. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 5(6), 562–568. https://doi.org/10.1016/j.bpsc.2020.02.004

      Mukerji, A., Byrne, K. N., Yang, E., Levi, D. M., & Silver, M. A. (2022). Visual cortical γ−aminobutyric acid and perceptual suppression in amblyopia. Frontiers in Human Neuroscience, 16. https://doi.org/10.3389/fnhum.2022.949395

      Muthukumaraswamy, S. D., & Liley, D. T. (2018). 1/F electrophysiological spectra in resting and drug-induced states can be explained by the dynamics of multiple oscillatory relaxation processes. NeuroImage, 179, 582–595. https://doi.org/10.1016/j.neuroimage.2018.06.068

      Narayan, G. A., Hill, K. R., Wengler, K., He, X., Wang, J., Yang, J., Parsey, R. V., & DeLorenzo, C. (2022). Does the change in glutamate to GABA ratio correlate with change in depression severity? A randomized, double-blind clinical trial. Molecular Psychiatry, 27(9), 3833–3841. https://doi.org/10.1038/s41380-022-01730-4

      Nuijten, M. B., & Polanin, J. R. (2020). “statcheck”: Automatically detect statistical reporting inconsistencies to increase reproducibility of meta-analyses. Research Synthesis Methods, 11(5), 574–579. https://doi.org/10.1002/jrsm.1408

      Oeltzschner, G., Zöllner, H. J., Hui, S. C. N., Mikkelsen, M., Saleh, M. G., Tapper, S., & Edden, R. A. E. (2020). Osprey: Open-source processing, reconstruction & estimation of magnetic resonance spectroscopy data. Journal of Neuroscience Methods, 343, 108827. https://doi.org/10.1016/j.jneumeth.2020.108827

      Ossandón, J. P., Stange, L., Gudi-Mindermann, H., Rimmele, J. M., Sourav, S., Bottari, D., Kekunnaya, R., & Röder, B. (2023). The development of oscillatory and aperiodic resting state activity is linked to a sensitive period in humans. NeuroImage, 275, 120171. https://doi.org/10.1016/J.NEUROIMAGE.2023.120171

      Ostlund, B. D., Alperin, B. R., Drew, T., & Karalunas, S. L. (2021). Behavioral and cognitive correlates of the aperiodic (1/f-like) exponent of the EEG power spectrum in adolescents with and without ADHD. Developmental Cognitive Neuroscience, 48, 100931. https://doi.org/10.1016/j.dcn.2021.100931

      Pant, R., Ossandón, J., Stange, L., Shareef, I., Kekunnaya, R., & Röder, B. (2023). Stimulus-evoked and resting-state alpha oscillations show a linked dependence on patterned visual experience for development. NeuroImage: Clinical, 103375. https://doi.org/10.1016/J.NICL.2023.103375

      Perica, M. I., Calabro, F. J., Larsen, B., Foran, W., Yushmanov, V. E., Hetherington, H., Tervo-Clemmens, B., Moon, C.-H., & Luna, B. (2022). Development of frontal GABA and glutamate supports excitation/inhibition balance from adolescence into adulthood. Progress in Neurobiology, 219, 102370. https://doi.org/10.1016/j.pneurobio.2022.102370

      Pitchaimuthu, K., Wu, Q. Z., Carter, O., Nguyen, B. N., Ahn, S., Egan, G. F., & McKendrick, A. M. (2017). Occipital GABA levels in older adults and their relationship to visual perceptual suppression. Scientific Reports, 7(1). https://doi.org/10.1038/S41598-017-14577-5

      Rideaux, R., Ehrhardt, S. E., Wards, Y., Filmer, H. L., Jin, J., Deelchand, D. K., Marjańska, M., Mattingley, J. B., & Dux, P. E. (2022). On the relationship between GABA+ and glutamate across the brain. NeuroImage, 257, 119273. https://doi.org/10.1016/J.NEUROIMAGE.2022.119273

      Schaworonkow, N., & Voytek, B. (2021). Longitudinal changes in aperiodic and periodic activity in electrophysiological recordings in the first seven months of life. Developmental Cognitive Neuroscience, 47. https://doi.org/10.1016/j.dcn.2020.100895

      Schwenk, J. C. B., VanRullen, R., & Bremmer, F. (2020). Dynamics of Visual Perceptual Echoes Following Short-Term Visual Deprivation. Cerebral Cortex Communications, 1(1). https://doi.org/10.1093/TEXCOM/TGAA012

      Sengpiel, F., Jirmann, K.-U., Vorobyov, V., & Eysel, U. T. (2006). Strabismic Suppression Is Mediated by Inhibitory Interactions in the Primary Visual Cortex. Cerebral Cortex, 16(12), 1750–1758. https://doi.org/10.1093/cercor/bhj110

      Steel, A., Mikkelsen, M., Edden, R. A. E., & Robertson, C. E. (2020). Regional balance between glutamate+glutamine and GABA+ in the resting human brain. NeuroImage, 220. https://doi.org/10.1016/J.NEUROIMAGE.2020.117112

      Takado, Y., Takuwa, H., Sampei, K., Urushihata, T., Takahashi, M., Shimojo, M., Uchida, S., Nitta, N., Shibata, S., Nagashima, K., Ochi, Y., Ono, M., Maeda, J., Tomita, Y., Sahara, N., Near, J., Aoki, I., Shibata, K., & Higuchi, M. (2022). MRS-measured glutamate versus GABA reflects excitatory versus inhibitory neural activities in awake mice. Journal of Cerebral Blood Flow & Metabolism, 42(1), 197. https://doi.org/10.1177/0271678X211045449

      Takei, Y., Fujihara, K., Tagawa, M., Hironaga, N., Near, J., Kasagi, M., Takahashi, Y., Motegi, T., Suzuki, Y., Aoyama, Y., Sakurai, N., Yamaguchi, M., Tobimatsu, S., Ujita, K., Tsushima, Y., Narita, K., & Fukuda, M. (2016). The inhibition/excitation ratio related to task-induced oscillatory modulations during a working memory task: A multimodal-imaging study using MEG and MRS. NeuroImage, 128, 302–315. https://doi.org/10.1016/J.NEUROIMAGE.2015.12.057

      Tao, H. W., & Poo, M. M. (2005). Activity-dependent matching of excitatory and inhibitory inputs during refinement of visual receptive fields. Neuron, 45(6), 829–836. https://doi.org/10.1016/J.NEURON.2005.01.046

      VanRullen, R., & MacDonald, J. S. P. (2012). Perceptual echoes at 10 Hz in the human brain. Current Biology. https://doi.org/10.1016/j.cub.2012.03.050

      Voytek, B., Kramer, M. A., Case, J., Lepage, K. Q., Tempesta, Z. R., Knight, R. T., & Gazzaley, A. (2015). Age-related changes in 1/f neural electrophysiological noise. Journal of Neuroscience, 35(38). https://doi.org/10.1523/JNEUROSCI.2332-14.2015

      van Vreeswijk, C., & Sompolinsky, H. (1996). Chaos in neuronal networks with balanced excitatory and inhibitory activity. Science, 274(5293), 1724–1726. https://doi.org/10.1126/SCIENCE.274.5293.1724

      Waschke, L., Wöstmann, M., & Obleser, J. (2017). States and traits of neural irregularity in the age-varying human brain. Scientific Reports, 7(1), 1–12. https://doi.org/10.1038/s41598-017-17766-4

      Weaver, K. E., Richards, T. L., Saenz, M., Petropoulos, H., & Fine, I. (2013). Neurochemical changes within human early blind occipital cortex. Neuroscience. https://doi.org/10.1016/j.neuroscience.2013.08.004

      Wu, Y. K., Miehl, C., & Gjorgjieva, J. (2022). Regulation of circuit organization and function through inhibitory synaptic plasticity. Trends in Neurosciences, 45(12), 884–898. https://doi.org/10.1016/J.TINS.2022.10.006

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for The Authors):

      Thank you for the interesting submission. I have inserted my comments to the authors here. Some of them will be more granular comments related to the concerns raised in the public review.

      (1) Introduction:

      Could you please justify the rationale for using eyes open and eyes closed in the MRS condition, and the use of the three different conditions in the EEG experiment? If these resulted in negative findings, then the implications should be discussed.

      Previous work with MRS in sighted individuals has suggested that eye opening in darkness results in a decrease of visual cortex GABA+ concentration, while visual stimulation results in an increase of Glx concentration, compared to a baseline concentration at eye closure (Kurcyus et al., 2018). Moreover, visual stimulation/eye opening is known to result in alpha desynchronization (Adrian & Matthews, 1934).

      While previous work from our group has shown significantly reduced alpha oscillatory activity in congenital cataract-reversal individuals, desynchronization following eye opening was indistinguishable from that of normally sighted controls (Ossandón et al., 2023; Pant et al., 2023).

      Thus, we decided to include both conditions to test whether a similar pattern of results would emerge for GABA+/Glx concentration.

      We added our motivation to the Introduction of the revised manuscript (Page 4, Lines 122-125) along with the Methods (Page 9, Lines 219-223).

      It does not become clear from the introduction why a higher intercept is predicted in the EEG measure. The rationale for this hypothesis needs to be explained better.

      Given the prior findings suggesting an increased E/I ratio in CC individuals and the proposed link between neuronal firing (Manning et al., 2009) and the aperiodic intercept, we expected a higher intercept for the CC compared to the SC group.

      We have now added this explanation to the Introduction (Page 4, Lines 126-128).

      (2) Participants

      Were participants screened for common MRS exclusion criteria such as history of psychiatric conditions or antidepressant medication, which could alter neurochemistry? If not, then this needs to be pointed out.

      All participants were clinically screened at the LV Prasad Eye Institute and additionally self-reported no neurological or psychiatric conditions or medications. Moreover, all participants were screened for scanning exclusion criteria using the standard questionnaire of the radiology center.

      We have now made this clear in the Methods (Page 7, Lines 168-171).

      Table 1 needs to show the age of the participant, which can only be derived by adding the columns 'duration of deprivation' and 'time since surgery'. Table 1 also needs to include the controls.

      We have accordingly modified Table 1 in the revised manuscript and added age for the patients as well as the controls (Table 1, Pages 6-7).

      The control cohort is not specific enough to exclude reduced visual acuity, or co-morbidities, as the primary driver of the differences between groups. Ideally, a cohort with developmental cataracts is recruited. Normally sighted participants as a control cohort cannot distinguish between different types of sight loss, or stages of plasticity.

      The goal of this study was not to distinguish between different types of sight loss or stages of plasticity. We aimed to assess whether the most extreme form of visual deprivation (i.e. congenital and total patterned vision loss) affects the E/I ratio. Low visual acuity and nystagmus are genuine diagnostic criteria (Methods, Page 5, Lines 142-145). Visual acuity alone cannot explain the current findings, since the MRS data were acquired either with eyes closed or during diffuse visual stimulation in a dimly lit room, without any visual task.

      With the awareness of the present results, we consider it worthwhile for the future to investigate additional groups such as developmental cataract-reversal individuals, to narrow down the contribution of the age of onset and degree of visual deprivation to the observed group differences.

      (3) Data collection and analysis

      - More detail is needed: how long were the sessions, how long was each part?

      We have added this information on Page 7, Lines 178-181 of the Methods. MRS scanning took between 45 and 60 minutes, EEG testing took 20 minutes excluding the time for capping, and visual acuity testing took 3-5 minutes.

      - It should be mentioned here that the EEG data is a reanalysis of a subset of legacy data, published previously in Ossandón et al., 2023; Pant et al., 2023.

      In the revised manuscript, we explicitly state at the beginning of the “Electrophysiology recordings” section of the Methods (Page 13, Lines 331-334) that the EEG datasets were a subset of previously published data.

      (4) MRS Spectroscopy

      - Please fill out the minimum reporting standards form (Lin et al., 2021), or report all the requested measures in the main document https://pubmed.ncbi.nlm.nih.gov/33559967/

      We have now filled out this form and added it as Supplementary Material (Supplementary Excel File 1). Additionally, all the requested information has been moved to the Methods section of the main document (MRS Data Quality, Pages 10-12).

      - Information on how the voxels were placed is missing. The visual cortex voxel is not angled parallel to the calcarine, as is a common way to capture processing in the early visual cortex. Describe in the paper what the criteria for successful placement were, and how was it ensured that non-brain tissue was avoided in a voxel of this size.

      Voxel placement was optimized in each subject to avoid the meninges, ventricles, skull and subcortical structures; this was ensured by examining the voxel region across slices of each subject's acquired T1 volume. During MRS acquisition, saturation bands were placed at the anterior (frontal) and posterior (visual) edge of the voxel in every subject to null the skull signal. Due to limitations of the clinical scanner, rotated or skewed voxels were not possible; thus, voxels were not always oriented precisely parallel to the calcarine sulcus.

      We have added this information to Page 9 (Lines 229-237) of the revised manuscript.

      - Figure 1. shows voxels that are very close to the edge of the brain (frontal cortex) or to the tentorium (visual cortex). Could the authors please calculate the percentage overlap between the visual cortex MRS voxel and the visual cortex, and compare them across groups to ensure that there is no between-group bias from voxel placement?

      We have now added the requested analysis to Supplementary Material S2 and referred to it in the main manuscript on Page 9, Lines 236-237.

      Briefly, the percentage overlap with areas V1-V6 was 60% or more in every individual subject's visual cortex voxel; the mean overlap was 67% in the CC group and 70% in the SC group. The percentage overlap did not differ between groups (t-test: t(18) = -1.14, p = 0.269).
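      For illustration only, the overlap measure and the group comparison can be sketched in Python as below. This assumes the voxel and the V1-V6 region are available as binary masks on a common, co-registered grid; the flat 0/1 list representation and the function names are hypothetical and are not the pipeline behind Supplementary Material S2. With two groups of 10 participants, the pooled-variance (Student) t-test has 18 degrees of freedom, consistent with the reported t(18).

```python
import math
from statistics import mean, variance


def percent_overlap(voxel_mask, roi_mask):
    """Percentage of MRS-voxel elements that also lie inside the ROI mask.

    Both masks are flat sequences of 0/1 values on the same (hypothetical)
    grid, e.g. an MRS voxel mask and a V1-V6 atlas mask after co-registration.
    """
    in_voxel = sum(1 for v in voxel_mask if v)
    shared = sum(1 for v, r in zip(voxel_mask, roi_mask) if v and r)
    return 100.0 * shared / in_voxel


def student_t(a, b):
    """Pooled-variance (Student) two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    # statistics.variance is the sample (n - 1) variance.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Toy example: 4 grid elements belong to the voxel, 3 of them also to the ROI.
print(percent_overlap([1, 1, 1, 1, 0], [1, 1, 1, 0, 1]))  # 75.0
```

      The p-value would then be read from the t distribution, for example with `scipy.stats.ttest_ind`, whose default `equal_var=True` corresponds to the same pooled test.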

      - Figure 1. I would recommend displaying data on a skull-stripped image to avoid identifying information from the participant's T1 profile.

      We have now replaced the images in Figure 1 with skull-stripped images. Note that images from SPM12 were used instead of GannetCoregister, as GannetCoregister only displays images with the skull.

      - Please show more rigor with the MRS quality measures. Several examples of inconsistency and omissions are below.

      • SNR was quantified and shows a difference in SNR between voxel positions, with lower SNR in the frontal cortex. No explanation or discussion of the difference was provided.

      • Looking at S1, the linewidth of NAA seems to be a lot broader in the frontal cortex than in the visual cortex. The figures suggest that acquisition quality was very different between voxel locations, making the comparison difficult.

      • Linewidth of NAA is a generally agreed measure of shim quality in megapress acquisitions (Craven et al., 2022).

      The data quality difference between the frontal and visual cortices has been observed in the literature (Juchem & Graaf, 2017; Rideaux et al., 2022). We nevertheless chose a frontal cortex voxel as the control site instead of the often-chosen sensorimotor cortex. The main motivation was to avoid any cortical region linked to sensory processing, since crossmodal compensation as a consequence of visual deprivation is a well-documented phenomenon.

      We now make this clearer in the Methods (Page 11, Lines 284-299) and in the Discussion/Limitations (Page 25, Lines 662-665).

      - To get a handle on the data quality, I would recommend that the authors display their MRS quality measures in a separate section 'MRS quality measure', including NAA linewidth, NAA SNR, GABA+ CRLB, Glx CRLB, and test for the main effects and interaction of voxel location (VC, FC) and group (SC, CC) and discuss any discrepancies.

      We have moved all the quality metric values for GABA+, Glx and NAA from the supplement to the Methods section (see Table 2), and added the requested section titled “MRS Data quality.”

      We have conducted the requested analyses and reported them in Supplementary Material S6: there was a strong effect of region confirming that data quality was better in the visual than frontal region. We have referred to this in the main manuscript on Page 11, Line 299.

      In the revised manuscript, we discuss the data quality in the frontal cortex and how we ensured it was comparable to prior work. Moreover, there were no significant group effects or group-by-region interactions, suggesting that the group differences observed for the visual cortex voxel cannot be accounted for by differences in data quality. We have now included a section on data quality both in the Methods (Page 11, Lines 284-299) and in the limitations section of the Discussion (Page 25, Lines 662-665).

      Please clarify the MRS acquisition: "Each MEGA-PRESS scan lasted for 8 minutes and was acquired with the following specifications: TR = 2000 ms, TE = 68 ms, Voxel size = 40 mm x 30 mm x 25 mm, 192 averages (each consists of two TRs)." 192 averages x 2 TRs x 2 s TR = 12.8 min, not 8 min; apologies if I have misunderstood these details.

      We have corrected this error in the revised manuscript and stated the parameters more clearly: there were a total of 256 averages, resulting in an 8.5-minute scan (256 repetitions x 1 TR x 2 s = 512 s, i.e. about 8.5 min; Page 8, Lines 212-213).
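      The arithmetic behind the corrected duration, and behind the reviewer's reading of the original text, is simply averages x TRs per average x TR. A minimal Python check (the helper name is illustrative):

```python
def scan_duration_min(n_averages, trs_per_average, tr_s):
    """Acquisition time in minutes: averages x TRs per average x TR (s) / 60."""
    return n_averages * trs_per_average * tr_s / 60.0


corrected = scan_duration_min(256, 1, 2.0)         # 512 s
reviewer_reading = scan_duration_min(192, 2, 2.0)  # 768 s
print(f"{corrected:.2f} min vs {reviewer_reading:.2f} min")  # 8.53 min vs 12.80 min
```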

      - What was presented to participants in the eyes open MRS? Was it just normal room illumination or was it completely dark? Please add details to your methods.

      The scans were conducted in regular room illumination, with no visual stimulation.

      We have now clarified this on Page 9 (Lines 223-224) of the Methods.

      (5) MRS analysis

      How was the tissue fraction correction performed? Please add or refer to the exact equation from Harris et al., 2015.

      We have clarified that the reported GABA+/Glx values are water-normalized alpha corrected values (Page 10, Line 249), and cited Harris et al., 2015 on Page 10 (Line 251) of the Methods.

      (6) Statistical approach

      How was the sample size determined? Please add your justification for the sample size.

      We collected as many qualifying patients as we were able to recruit for this study within 2.5 years of data collection (commencing August 2019, ending February 2022), given the constraints of the patient population and the pandemic. We have now made this clear in the Discussion (Page 25, Lines 650-652).

      Please report the tests for normality.

      We have now reported the Shapiro-Wilk test results for normality as well as Levene’s test for homogeneity of variance between groups for every dependent variable in our dataset in Supplementary Material S9, and added references to it in the descriptions of the statistical analyses (Methods, Page 13, Lines 326-329 and Page 15, Lines 400-402).
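      As a sketch of what Levene's test computes: each observation is replaced by its absolute deviation from its group centre, and a one-way ANOVA F statistic is formed on those deviations. The Python function below is illustrative; which centring variant (mean or median) was used for Supplementary Material S9 is not stated here, so both are supported. In practice, `scipy.stats.levene` (default `center='median'`, the Brown-Forsythe variant) and `scipy.stats.shapiro` provide these tests.

```python
from statistics import mean, median


def levene_w(*groups, center=median):
    """Levene's W statistic for equality of variances across k groups.

    center=median gives the Brown-Forsythe variant; center=mean gives
    Levene's original test. Compare W against F(k - 1, N - k).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    centres = [center(g) for g in groups]
    # Absolute deviations of each observation from its group centre.
    z = [[abs(y - c) for y in g] for g, c in zip(groups, centres)]
    z_group_means = [mean(zg) for zg in z]
    z_grand_mean = sum(sum(zg) for zg in z) / n_total
    between = sum(len(zg) * (zm - z_grand_mean) ** 2
                  for zg, zm in zip(z, z_group_means))
    within = sum((zij - zm) ** 2
                 for zg, zm in zip(z, z_group_means) for zij in zg)
    return (n_total - k) / (k - 1) * between / within


print(levene_w([1, 2, 3], [2, 4, 6]))  # approximately 0.8
```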

      Calculate the Bayes Factor where possible.

      As our analyses are all frequentist, instead of re-analyzing the data within a Bayesian framework, we added partial eta squared values (ηp²) for all reported ANOVAs (Results) so that readers can gauge the effect sizes.
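      For reference, ηp² is the proportion of effect-plus-error variance attributable to an effect, SS_effect / (SS_effect + SS_error), and it can equivalently be recovered from a reported F(df_effect, df_error). A minimal Python sketch; the numbers are hypothetical and are not values from this study:

```python
def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared from sums of squares: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)


def partial_eta_squared_from_f(f_value, df_effect, df_error):
    """Equivalent form from a reported F(df_effect, df_error)."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# Hypothetical: SS_effect = 4 and SS_error = 18 with df = (1, 18) give F = 4.0.
print(round(partial_eta_squared(4.0, 18.0), 3))          # 0.182
print(round(partial_eta_squared_from_f(4.0, 1, 18), 3))  # 0.182
```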

      I recommend partial correlations to control for the influence of age, duration, and time of surgery, rather than separate correlations.

      Given the combination of a small sample size and the expected multicollinearity among our variables (duration of blindness, for example, would be expected to correlate with age as well as with post-surgery visual acuity), partial correlations could not be meaningfully calculated for these data.

      We are aware of the limits of correlational analyses. Given the unique data set from a rare population, we had planned exploratory analyses relating behavioral, EEG and MRS parameters via correlations. Since no similar data existed when we started (and, to the best of our knowledge, our data set is still unique), these correlation analyses were exploratory, but they were the most transparent analyses to run.

      We have now clearly outlined these limitations in our Introduction (Page 5, Lines 133-135), Methods (Page 15, Lines 408-410) and Discussion section (Page 24, Line 634, Page 25, Lines 652-65) to ensure that the results are interpreted with appropriate caution.

      (7) Visual acuity

      Is the VA monocular average, from the dominant eye, or bilateral?

      We have now clarified that the VA reported here is bilateral (Methods, Page 7, Line 165 and Page 15, Line 405). Bilateral visual acuity in congenital cataract-reversal individuals typically corresponds to the visual acuity of the better eye.

      It is mentioned here that correlations with VA are exploratory, please be consistent as the introduction mentions that there was a hypothesis that you sought to test.

      We have now accordingly modified the Introduction (Page 5, Lines 133-135) and added the appropriate caveats in the discussion with regards to interpretations (Page 25, Lines 652-665).

      (8) Correlation analyses between MRS and EEG

      It is mentioned here that correlations between EEG and MRS are exploratory, please consistently point out the exploratory nature, as these results are preliminary and should not be overinterpreted ("We did not have prior hypotheses as to the best of our knowledge no extant literature has tested the correlation between aperiodic EEG activity and MRS measures of GABA+,Glx and Glx/GABA+." ).

      In the revised manuscript, we explicitly state that the reported associations between EEG (aperiodic component) and MRS parameters serve to put forward directed, more specific hypotheses for future studies (Introduction, Page 5, Lines 133-135; Methods, Page 15, Line 415; Discussion, Page 25, Lines 644-645 and Lines 652-665).

      (9) Results

      Figure 2 uses the same y-axis for the visual cortex and frontal cortex to facilitate a comparison between the two locations. Comparing Figure 2 a with b demonstrates poorer spectral peaks and reduced amplitudes. Lower spectral quality in the frontal cortex voxel could contribute to the absence of a group effect in the control voxel location. The major caveat that spectral quality differs between voxels needs to be pointed out and the limitations thereof discussed.

      We have now explicitly pointed out this issue in the Methods (MRS Data Quality, Supplementary Material S6) and Discussion in the Limitations section (Page 25, Lines 662-665). While data quality was lower for the frontal compared to the visual cortex voxels, as has been observed previously (Juchem & Graaf, 2017; Rideaux et al., 2022), this was not an issue for the EEG recordings. Thus, lower sensitivity of frontal measures cannot easily explain the lack of group differences for frontal measures. Crucially, data quality did not differ between groups.

      The results in 2c are the result of multiple correlations with metabolite values ("As in previous studies, we ran a number of exploratory correlation analyses between GABA+, Glx, and Glx/GABA+ concentrations, and visual acuity at the date of testing, duration of visual deprivation, and time since surgery respectively in the CC group"), it seems at least six for the visual acuity measure (VA vs Glx, VA vs GABA+, VA vs Glx/GABA+ x 2 conditions). While the trends are interesting, they should be interpreted with caution because of the exploratory nature, small sample size, the lack of multiple comparison correction, and the influence of two extreme data points. The authors should not overinterpret these results and should point out the need for replication.

      See response to (6) last section, which we copy here for convenience:

      We are aware of the limits of correlational analyses. Given the unique data set from a rare population, we related behavioral, EEG and MRS parameters in exploratory correlation analyses. Since no similar data existed when we started (and, to the best of our knowledge, our data set is still unique), these correlation analyses were exploratory, but they were the most transparent analyses to run.

      We have now clearly outlined these limitations in our Discussion section to ensure that the results are interpreted with appropriate caution (Discussion, Page 25, Lines 644-645 and Lines 652-665).
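      To make the multiple-comparison concern concrete, the sketch below computes the family-wise error rate for the reviewer's count of six tests at α = 0.05, assuming the tests are independent, together with the Bonferroni per-test threshold. Bonferroni is shown only as the simplest example; the correction applied in the manuscript is not specified here. Because Glx/GABA+ is derived from Glx and GABA+, the tests are correlated in practice, which typically makes the true family-wise rate lower than the independent-test value.

```python
def familywise_error_rate(alpha, m):
    """P(at least one false positive) across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni_alpha(alpha, m):
    """Per-test threshold that bounds the family-wise error rate at alpha."""
    return alpha / m


m = 6  # reviewer's count: VA vs Glx, GABA+, Glx/GABA+ in two conditions
print(round(familywise_error_rate(0.05, m), 3))  # 0.265
print(round(bonferroni_alpha(0.05, m), 4))       # 0.0083
```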

      (10) Discussion:

      Please explain the decrease in E/I balance from MRS in view of recent findings on an increase in E/I balance in CC using RSN-fMRI (Raczy et al., 2022) and EEG (Ossandon et al. 2023).

      We have edited our Abstract (Pages 1-2, Lines 31-35) and Discussion (Page 23, Lines 584-590; Page 24, Lines 613-620). In brief, we think our results reflect a homeostatic regulation of E/I balance, that is, an increase in inhibition in response to increased stimulus-driven excitation following sight restoration.

      Names limitations but does nothing to mitigate concerns about spatial specificity. The limitations need to be rewritten to include differences in SNR between the visual cortex and frontal lobe. Needs to include caveats of small samples, including effect inflation.

      We have now discussed the data quality differences between the visual and frontal cortex voxels, which we find irrespective of group (MRS Data Quality, Supplementary Material S6). We also reiterate why these differences are unlikely to explain our results: data quality was comparable to prior studies that have found group differences in the frontal cortex (Methods, Page 11, Lines 284-299), and data quality did not differ between groups. Further, EEG data quality did not differ between frontal and occipital regions, yet group differences in the EEG datasets were localized to the occipital cortex.

      Reviewer #2 (Recommendations for The Authors):

      To address the main weakness, the authors could consider including data from a third group, of congenitally blind individuals. Including this would go a very long way towards making the findings interpretable and relating them to the rest of the literature.

      Unfortunately, recruitment of these groups was not possible due to the pandemic. Indeed, we would consider a pre- vs post- surgery approach the most suitable design in the future, which, however, will require several years to be completed. Such time and resource intensive longitudinal studies are justified by the present cross-sectional results.

      We have explicitly stated our contribution and need for future studies in the Limitations section of the Discussion (Page 25, Lines 650-657).

      Analysing the amplitude of alpha rhythms, as well as the other "aperiodic" components, would be useful to relate the profile of the tested patients with previous studies. Visual inspection of Figure 3 suggests that alpha power with eyes closed is not reduced in the patients' group compared to the controls. This would be inconsistent with previous studies (including research from the same group) and it could suggest that the small selected sample is not really representative of the sight-recovery population - certainly one of the most heterogeneous study populations. This further highlights the difficulty of drawing conclusions on the effects of visual experience merely based on this N=10 set of patients.

      Alpha power was indeed reduced in the present subsample of 10 CC individuals (Supplementary Material S19). The apparent similarity of the CC and SC spectra in the EC condition in Figure 3 likely arises because the spectra are shown before removal of the aperiodic components, and on scales chosen to accommodate very different alpha power values. As documented in Supplementary Material S18 and S19, alpha power and the aperiodic intercept/slope results of the resting-state data in the present 10 CC individuals correspond to the results from a larger sample of CC individuals (n = 28) in Ossandón et al., 2023. We explicitly highlight this “replication” in the main manuscript (Pages 25-26, Lines 671-676). Thus, the present subsample of CC individuals is representative of its population.

      To further characterise the MRS results, the authors may consider an alternative normalisation scheme. It is not clear whether the lack of significant GABA and GLX differences in the face of a significant group difference in the GLX/GABA ratio is due to the former measures being noisier since taking the ratio between two metabolites often helps reduce inter-individual variability and thereby helps revealing group differences. It remains an open question whether the GABA or GLX concentrations would show significant group differences after appropriate normalisation (e.g. NAA?).

      We repeated the analysis with Creatine-normalized values of GABA+ and Glx, and the main results i.e. reduced Glx/GABA+ concentration in the visual cortex of CC vs SC individuals, and no such difference in the frontal cortex, remained the same (Supplementary Material S5).

      Further, we re-analyzed the data using Osprey, an open-source toolbox that uses linear combination modeling, and found once more that our results did not change (Supplementary Material S3). We refer to these findings in the Methods (Page 10, Lines 272-275) and Results (Page 10, Lines 467-471) of the main manuscript.

      In fact, the Glx concentration in the visual cortex of CC vs SC individuals was significantly decreased when Cr-normalized values were used (which was not significant in the original analysis). However, we do not interpret this result as it was not replicated with the water-normalized values from Gannet or Osprey.

      I suggest revising the discussion to present a more balanced picture of the existent evidence of the relation between E/I and EEG indices. Although there is evidence that the 1/f slope changes across development, in a way that could be consistent with a higher slope reflecting more immature and excitable tissue, the link with cortical E/I is far from established, especially when referring to specific EEG indices (intercept vs. slope, measured in lower vs. higher frequency ranges).

      We have revised the Introduction (Page 4, Line 91, Lines 101-102) and Discussion (Page 22, Lines 568-569, Page 24, Lines 645-647 and Lines 654-657) in the manuscript accordingly; we allude to the fact that the links between cortical E/I and aperiodic EEG indices have not yet been unequivocally established in the literature.

      Minor:

      - The authors estimated NAA concentration with different software than the one used to estimate GLX and GABA; this examined the OFF spectra only; I suggest that the authors consider running their analysis with LCModel, which would allow a straightforward approach to estimate concentrations of all three metabolites from the same edited spectrum and automatically return normalised concentrations as well as water-related ones.

      We re-analyzed all of the MRS datasets using Osprey, which uses linear combination modelling and has shown quantification results similar to LCModel for NAA (Oeltzschner et al., 2020). The results of a lower Glx/GABA+ concentration in the visual cortex of CC vs SC individuals, and no difference in NAA concentration, were replicated using this pipeline.

      We have now added these analyses to the Supplementary Material S3 and referred to them in the Methods (Page 9, Lines 242-246) and Results (Page 18, Lines 464-467).

      - Of course the normalisation used to estimate GABA and GLX values is completely irrelevant when the two values are expressed as ratio GLX/GABA - this may be reflected in the text ("water normalised GLX/GABA concentration" should read "GLX/GABA concentration" instead).

      We have adapted the text on Page 16 (Line 431) and have ensured that throughout the manuscript the use of “water-normalized” is in reference to Glx or GABA+ concentration, and not the ratio.
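      A minimal Python illustration of the reviewer's point: if Glx and GABA+ are divided by the same reference signal (water, creatine or NAA), the reference cancels in the ratio. The signal values below are hypothetical. The cancellation holds only when both metabolites are scaled by the same factor; a metabolite-specific correction term would not cancel.

```python
import math


def glx_gaba_ratio(glx_signal, gaba_signal, reference_signal):
    """Ratio of reference-normalized Glx and GABA+ (the shared reference cancels)."""
    glx_norm = glx_signal / reference_signal
    gaba_norm = gaba_signal / reference_signal
    return glx_norm / gaba_norm


# Hypothetical signal integrals in arbitrary units.
glx, gaba = 8.0, 2.0
for ref in (1.0, 35.0, 4000.0):  # hypothetical reference signals
    assert math.isclose(glx_gaba_ratio(glx, gaba, ref), glx / gaba)
```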

      - Please specify which equation was used for tissue correction - is it alpha-correction?

      We have clarified that the reported GABA+/Glx values are water-normalized alpha corrected values (Page 10, Line 249), and cited Harris et al., 2015 on Page 10 (Line 251) of the Methods.

      - Since ANOVA was used, the assumption is that values are normally distributed. Please report evidence supporting this assumption.

      We have now reported the Shapiro-Wilk test results for normality as well as Levene’s test for homogeneity of variance between groups for every dependent variable in our dataset in Supplementary Material S9, and added references to it in the Methods (Page 13, Lines 326-329 and Page 15, Lines 400-402).

      Reviewer #3 (Recommendations for The Authors):

      In addition to addressing major comments listed in my Public Review, I have the following, more granular comments, which should also be addressed:

      (1) The paper's structure could be improved by presenting visual acuity data before diving into MRS and EEG results to better contextualize the findings.

      We now explicitly state in the Methods (Page 5, Line 155) that lower visual acuity is expected in a cohort of CC individuals with long lasting congenital visual deprivation.

      We have additionally included a plot of visual acuities of the two groups (Supplementary Material S1).

      (2) The paper should better explain the differences between CC individuals whose sight was restored and congenitally blind patients. The authors write in the introduction that there are sensitive periods/epochs during the lifespan for the development of local inhibitory neural circuits, and that "Human neuroimaging studies have similarly demonstrated that visual experience during the first weeks and months of life is crucial for the development of visual circuits. If human infants born with dense bilateral cataracts are treated later than a few weeks from birth, they suffer from a permanent reduction of not only visual acuity (Birch et al., 1998; Khanna et al., 2013) and stereovision (Birch et al., 1993; Tytla et al., 1993) but additionally from impairments in higher-level visual functions, such as face perception (Le Grand et al., 2001; Putzar et al., 2010; Röder et al., 2013)...".

      Thus, it seems that the current participants (sight restored after a sensitive period) are affected in the development of local inhibitory circuits similarly to congenitally blind individuals. To assess the effect of plasticity and sight restoration, longitudinal data would be necessary.

      In the Introduction (Page 2, Lines 59-64; Page 3, Lines 111-114), we added that sight-recovery individuals need to be investigated in order to identify sensitive periods, e.g. for the elaboration of visual neural circuits. The study of permanently blind individuals allows for investigating the role of experience (whether sight is necessary to induce the maturation of visual neural circuits), but not whether visual input needs to be available at early epochs in life (i.e. whether sight restoration following congenital blindness could nevertheless lead to the development of visual circuits).

      This is indeed the conclusion we make in the Discussion section. We have now highlighted the need for longitudinal assessments in the Discussion (Page 25, Lines 654-656).

      (3) What is the underlying idea of analyzing two separate aperiodic slopes (20-40 Hz and 1-19 Hz)? It is very unusual to compute the slope between 20 and 40 Hz, where the SNR is rather low.

      "Ossandón et al. (2023), however, observed that in addition to the flatter slope of the aperiodic power spectrum in the high frequency range (20-40 Hz), the slope of the low frequency range (1-19 Hz) was steeper in both, congenital cataract-reversal individuals, as well as in permanently congenitally blind humans."

      The present manuscript computed the slope in the 1-20 Hz range. Ossandón et al. as well as Medel et al. (2023) found a “knee” of the 1/f distribution at 20 Hz and further describe the motivation for computing both slope ranges. For example, Ossandón et al. used a data-driven approach, compared single vs. dual fits, and found that the latter fitted the data better. Additionally, they obtained the best fit with a knee at 20 Hz. We would like to point out that no standard range exists for fitting the 1/f component and that, in fact, very different ranges have been used across the literature (Gao et al., 2017; Medel et al., 2023; Muthukumaraswamy & Liley, 2018).
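      To illustrate why a knee motivates separate fits, the sketch below fits a straight line in log10-log10 space (log10 P = intercept + slope x log10 f) separately over 1-19 Hz and 20-40 Hz. This is a deliberately simplified stand-in and not the fitting procedure of Ossandón et al. or of this manuscript: plain regression is biased by oscillatory peaks such as alpha, which is why tools like FOOOF/specparam model peaks separately. The synthetic spectrum is peak-free and hypothetical, with exponent 1 below and exponent 2 above a knee at 20 Hz; a single line over 1-40 Hz could capture neither regime.

```python
import math


def fit_aperiodic(freqs, power, f_lo, f_hi):
    """Least-squares line in log10-log10 space over f_lo <= f <= f_hi.

    Returns (intercept, slope) of log10(P) = intercept + slope * log10(f);
    the slope is negative for 1/f-like spectra.
    """
    pts = [(math.log10(f), math.log10(p))
           for f, p in zip(freqs, power) if f_lo <= f <= f_hi]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    slope = sxy / sxx
    return my - slope * mx, slope


# Synthetic, peak-free spectrum with a knee at 20 Hz:
# P = 10 * f^-1 below 20 Hz and 200 * f^-2 from 20 Hz (continuous at 20 Hz).
freqs = list(range(1, 41))
power = [10 * f ** -1 if f < 20 else 200 * f ** -2 for f in freqs]

low = fit_aperiodic(freqs, power, 1, 19)    # close to (1.0, -1.0)
high = fit_aperiodic(freqs, power, 20, 40)  # close to (2.301, -2.0)
print(low, high)
```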

      (4) "For this scan, participants were instructed to keep their eyes closed and stay as still as possible." Why should it be important to have the eyes closed during a T1w data acquisition? This statement at this location does not make sense.

      To avoid misunderstandings, we removed this statement in this context.

      (5) "Two SC subjects did not complete the frontal cortex scan for the EO condition and were excluded from the statistical comparisons of frontal cortex neurotransmitter concentrations." Why did the authors not conduct whole-brain MRS, which seems to have been available for quite some time (e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3590062/)?

      Similar to previous work (Coullon et al., 2015; Weaver et al., 2013) our hypothesis was related to the visual cortex, and we chose the frontal cortex voxel as a control. This has now been clarified in the Introduction (Page 4, Lines 103-114), Methods (Page 9, Lines 225-227) and Discussion (Page 25, Lines 662-665).

      (6) In "....during visual stimulation with stimuli that changed in luminance (LU) (Pant et al., 2023)." the authors should provide a link to the description of the visual stimulation, which is given further below.

      In the revised manuscript, we have moved up the description of the visual stimulation (Page 13, Line 336).

      (7) "During the EO condition, participants were asked to fixate on a blank screen." This is not really possible. Typically, resting-state EO conditions include a fixation cross, as participants would not be able to fixate on a blank screen and would move their eyes, which would affect the recordings.

      We have now rephrased this as “look towards” with the goal of avoiding eye movements (Page 14, Line 347).

      (8) "Components corresponding to horizontal or vertical eye movements were identified via visual inspection and removed (Plöchl et al., 2012)." It is unclear what the Plöchl reference should serve for. Is the intention of the authors to state that manual (and subjective) visual inspection of the ICA components is adequate? I would recommend removing this reference.

      The intention was to provide the basis for classification during the visual inspection, as opposed to an automated method such as ICLabel.

      We have stated this clearly in the revised manuscript (Page 14, Lines 368-370).

      (9) "The datasets were divided into 6.25 s long epochs corresponding to each trial." This is a bit inaccurate, as the trial also included some motor response task. Thus, I assume the 6.25 s are related to the visual stimulation.

      We have modified the sentence accordingly (Page 15, Line 378).

      (10) Figure 2. a & b. Just an esthetic suggestion: I would recommend removing the lines between the EC and EO conditions, as they suggest some longitudinal changes. Unless it is important to highlight the changes between EC and EO within each subject.

      In fact, EC vs. EO was a within-subject factor with expected changes in the EEG and possible changes in the MRS parameters. To allow readers to track EC vs. EO changes in individual subjects (rather than only the change in the mean scores), we retained the connecting lines.

      (11) Figure 3A: I would plot the same y-axis range for both groups to make it more comparable.

      We have changed Figure 3A accordingly.

      (12) "flattening of the intercept": replace "flattening", as it is too closely associated with the slope.

      We have replaced “flattening” with “reduction” (Page 20, Line 517).

      (13) The plotting of only the significant correlation between MRS measures and EEG measures seems to be rather selective reporting. For this type of exploratory analysis, I would recommend plotting all of the scatter plots and moving the entire exploratory analysis to the supplementary (as this provides the smallest evidence of the results).

      We have made clear in the Methods (Page 16, Lines 415-426), Results and Discussion (page 24, Lines 644-645), as well as in the Supplementary material, that the reason for only reporting the significant correlation was that this correlation survived correction for multiple comparisons, while all other correlations did not. We additionally explicitly allude to the Supplementary Material where the plots for all correlations are shown (Results, Page 21, Lines 546-552).

      (14) "Here, we speculate that due to limited structural plasticity after a phase of congenital blindness, the neural circuits of CC individuals, which had adapted to blindness after birth, employ available, likely predominantly physiological plasticity mechanisms (Knudsen, 1998; Mower et al., 1985; Röder et al., 2021), in order to re-adapt to the newly available visual excitation following sight restoration."

      I don't understand the logic here. The CC individuals are congenitally blind, thus why should there be any physiological plasticity mechanism to adapt to blindness, if they were blind at birth?

      With “adapt to blindness” we mean adaptation of a brain to an atypical or unexpected condition when taking an evolutionary perspective (i.e. the lack of vision). We have made this clear in the revised manuscript (Introduction, Page 4, Lines 111-114; Discussion, Page 23, Lines 584-591).

      (15) "An overall reduction in Glx/GABA ratio would counteract the aforementioned adaptations to congenital blindness, e.g. a lower threshold for excitation, which might come with the risk of runaway excitation in the presence of restored visually-elicited excitation."

      This could be tested directly by measuring visually elicited excitation in visual stimulation studies.

      The visual stimulation condition in the EEG experiment of the present study revealed a higher aperiodic intercept in CC than in SC individuals. Given the proposed link between the intercept and spontaneous neural firing (Manning et al., 2009), we interpreted the higher intercept in CC individuals as increased broadband neural firing during visual stimulation (Results, Figure 3; Discussion, Page 24, Lines 635-640). This idea is compatible with the enhanced BOLD responses observed in CC individuals during an eyes-open (EO) condition (Raczy et al., 2022). Future work should systematically manipulate visual stimulation to test this idea.
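      For readers less familiar with the aperiodic intercept: in a common simplified parameterization (no "knee"), the aperiodic part of the power spectrum is a straight line in log-log space, log10 P(f) ≈ b − χ·log10 f, where b is the intercept (a broadband offset) and χ is the exponent. A higher intercept therefore corresponds to an upward shift of power across all frequencies. The minimal Python sketch below estimates b and χ by ordinary least squares on a synthetic spectrum. It is an illustrative assumption, not the fitting procedure used in the study, which may have relied on dedicated spectral parameterization tools and excluded oscillatory peaks before fitting; the function name `fit_aperiodic` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_aperiodic(freqs: Sequence[float], power: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log10(power) = b - chi * log10(freq).

    Returns (intercept b, exponent chi). Assumes no knee and no
    oscillatory peaks in the fitted frequency range (illustrative only).
    """
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in power]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The aperiodic exponent chi is the negated log-log slope.
    return intercept, -slope


if __name__ == "__main__":
    # Synthetic 1/f^chi spectrum with b = 1.5 and chi = 2.0 (illustrative values only).
    freqs = [float(f) for f in range(2, 41)]
    power = [10 ** 1.5 * f ** -2.0 for f in freqs]
    b, chi = fit_aperiodic(freqs, power)
    print(f"intercept={b:.3f}, exponent={chi:.3f}")  # intercept=1.500, exponent=2.000
```

      Fitting in log-log space keeps the example to a simple linear regression; real EEG spectra usually need peak removal or a joint peak-plus-aperiodic model before the intercept can be interpreted.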

      (16) As the authors also collected T1w images, was the hypothesis of increased visual cortex thickness in CC individuals investigated?

      This hypothesis was investigated in a separate publication that included this subset of participants (Hölig et al., 2023), which found increased visual cortical thickness in the CC group. We refer to this publication and related work (Feng et al., 2021) in the present manuscript.

      (17) The entire discussion of age should be omitted, as the current data set is too small to assess age effects.

      We have removed this section and only mention that we replicated typical age trends, to underline the validity of the present data (Page 26, Lines 675-676).

      (18) Table 1 should include age and age at the time of surgery.

      We added age to the revised Table 1. We clarified that in CC individuals, duration of blindness is the same as age at the time of surgery (Page 6, Line 163).

      (19) Why are no group comparisons of visual acuity reported?

      Lower visual acuity in CC than SC individuals is a well-documented fact.

      We have now added the visual acuity plots for readers (Supplementary Material S1, referred to in the Methods, Page 5, Line 155) which highlight this common finding.

      References (Recommendations to the Authors)

      Adrian, E. D., & Matthews, B. H. C. (1934). The Berger rhythm: Potential changes from the occipital lobes in man. Brain. https://doi.org/10.1093/brain/57.4.355

      Coullon, G. S. L., Emir, U. E., Fine, I., Watkins, K. E., & Bridge, H. (2015). Neurochemical changes in the pericalcarine cortex in congenital blindness attributable to bilateral anophthalmia. Journal of Neurophysiology. https://doi.org/10.1152/jn.00567.2015

      Feng, Y., Collignon, O., Maurer, D., Yao, K., & Gao, X. (2021). Brief postnatal visual deprivation triggers long-lasting interactive structural and functional reorganization of the human cortex. Frontiers in Medicine, 8, 752021. https://doi.org/10.3389/fmed.2021.752021

      Gao, R., Peterson, E. J., & Voytek, B. (2017). Inferring synaptic excitation/inhibition balance from field potentials. NeuroImage, 158, 70–78. https://doi.org/10.1016/j.neuroimage.2017.06.078

      Hölig, C., Guerreiro, M. J. S., Lingareddy, S., Kekunnaya, R., & Röder, B. (2023). Sight restoration in congenitally blind humans does not restore visual brain structure. Cerebral Cortex, 33(5), 2152–2161. https://doi.org/10.1093/CERCOR/BHAC197

      Juchem, C., & de Graaf, R. A. (2017). B0 magnetic field homogeneity and shimming for in vivo magnetic resonance spectroscopy. Analytical Biochemistry, 529, 17–29. https://doi.org/10.1016/j.ab.2016.06.003

      Kurcyus, K., Annac, E., Hanning, N. M., Harris, A. D., Oeltzschner, G., Edden, R., & Riedl, V. (2018). Opposite dynamics of GABA and glutamate levels in the occipital cortex during visual processing. Journal of Neuroscience, 38(46), 9967–9976. https://doi.org/10.1523/JNEUROSCI.1214-18.2018

      Manning, J. R., Jacobs, J., Fried, I., & Kahana, M. J. (2009). Broadband shifts in local field potential power spectra are correlated with single-neuron spiking in humans. Journal of Neuroscience, 29(43), 13613–13620. https://doi.org/10.1523/JNEUROSCI.2041-09.2009

      Medel, V., Irani, M., Crossley, N., Ossandón, T., & Boncompte, G. (2023). Complexity and 1/f slope jointly reflect brain states. Scientific Reports, 13(1), 21700. https://doi.org/10.1038/s41598-023-47316-0

      Muthukumaraswamy, S. D., & Liley, D. T. (2018). 1/F electrophysiological spectra in resting and drug-induced states can be explained by the dynamics of multiple oscillatory relaxation processes. NeuroImage, 179, 582–595. https://doi.org/10.1016/j.neuroimage.2018.06.068

      Oeltzschner, G., Zöllner, H. J., Hui, S. C. N., Mikkelsen, M., Saleh, M. G., Tapper, S., & Edden, R. A. E. (2020). Osprey: Open-source processing, reconstruction & estimation of magnetic resonance spectroscopy data. Journal of Neuroscience Methods, 343, 108827. https://doi.org/10.1016/j.jneumeth.2020.108827

      Ossandón, J. P., Stange, L., Gudi-Mindermann, H., Rimmele, J. M., Sourav, S., Bottari, D., Kekunnaya, R., & Röder, B. (2023). The development of oscillatory and aperiodic resting state activity is linked to a sensitive period in humans. NeuroImage, 275, 120171. https://doi.org/10.1016/J.NEUROIMAGE.2023.120171

      Pant, R., Ossandón, J., Stange, L., Shareef, I., Kekunnaya, R., & Röder, B. (2023). Stimulus-evoked and resting-state alpha oscillations show a linked dependence on patterned visual experience for development. NeuroImage: Clinical, 103375. https://doi.org/10.1016/J.NICL.2023.103375

      Raczy, K., Hölig, C., Guerreiro, M. J. S., Lingareddy, S., Kekunnaya, R., & Röder, B. (2022). Typical resting-state activity of the brain requires visual input during an early sensitive period. Brain Communications, 4(4). https://doi.org/10.1093/BRAINCOMMS/FCAC146

      Rideaux, R., Ehrhardt, S. E., Wards, Y., Filmer, H. L., Jin, J., Deelchand, D. K., Marjańska, M., Mattingley, J. B., & Dux, P. E. (2022). On the relationship between GABA+ and glutamate across the brain. NeuroImage, 257, 119273. https://doi.org/10.1016/J.NEUROIMAGE.2022.119273

      Weaver, K. E., Richards, T. L., Saenz, M., Petropoulos, H., & Fine, I. (2013). Neurochemical changes within human early blind occipital cortex. Neuroscience. https://doi.org/10.1016/j.neuroscience.2013.08.004

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This manuscript by Neininger-Castro and colleagues presents a novel automatic image analysis method for assessing sarcomeres, the basic units of myofibrils, and validates this tool in a couple of experimental approaches that interfere with sarcomere assembly in iPSC-cardiomyocytes (iPSC-CM).

      Automatic quantification of sarcomeres is definitely something that is useful to the field. I am surprised that there is no reference in the manuscript to SarcTrack, published by Toepfer and colleagues in 2019 (PMID 30700234), which has exactly the same purpose. The advantage of the image analysis software presented in the current manuscript appears to me to be that it can cover both mature sarcomeres and nascent sarcomeres in premyofibrils effectively.

      We wholeheartedly disagree that SarcTrack has the exact same purpose as sarcApp. sarcApp measures more than the spatial frequency of actinin2 signal in images: it provides real-space quantifications of actinin2, myomesin, and titin, which has not been done before in this way. However, SarcTrack is an interesting method that we hope many researchers find helpful in their research. SarcTrack is a particle tracker that outputs the dimensions of the objects found, but it does not distinguish between Z-Lines and other actinin2-positive structures (Z-Bodies, adhesions). It also does not group these structures into higher-order structures such as myofibrils and muscle stress fibers.

      When going through the manuscript there were a few issues that should be addressed in a revised version of the manuscript:

      1) I am a bit puzzled that they took 1.4 µm as the cutoff length for a mature A-band in their quantifications, since the consensus in the field for thick filament length seems to be 1.6 µm.

      We use 1.4 µm as the cutoff for the length of a Z-Line rather than of the A-Band. We believe the reviewer is referring to the width of the A-Band perpendicular to the Z-lines, which is indeed 1.6 µm. However, we are referring to the length of the Z-Lines, which can span anywhere from 1.4 µm to 10 µm or more. Thank you for allowing us to make this clarification.
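      To illustrate how a single length cutoff such as the 1.4 µm value above separates Z-Lines from shorter actinin2-positive structures, here is a minimal Python sketch. It assumes each binarized object has already been reduced to one length in µm, that objects at or above the cutoff count as Z-Lines while shorter ones count as Z-Bodies, and that the cutoff is user-adjustable. The function `classify_actinin_objects` is hypothetical and is not sarcApp's code, which may apply additional criteria beyond length.

```python
from typing import Dict, List

# Hypothetical default taken from the 1.4 µm Z-Line length cutoff described above.
Z_LINE_MIN_LENGTH_UM = 1.4


def classify_actinin_objects(
    lengths_um: List[float], min_z_line_um: float = Z_LINE_MIN_LENGTH_UM
) -> Dict[str, List[float]]:
    """Split object lengths into Z-Lines and Z-Bodies by one length cutoff.

    Assumption: objects at or above the cutoff are Z-Lines and shorter
    objects are Z-Bodies; real software may use more criteria.
    """
    result: Dict[str, List[float]] = {"z_lines": [], "z_bodies": []}
    for length in lengths_um:
        key = "z_lines" if length >= min_z_line_um else "z_bodies"
        result[key].append(length)
    return result


if __name__ == "__main__":
    measured = [0.4, 0.9, 1.4, 2.2, 10.5]  # illustrative lengths in µm
    print(classify_actinin_objects(measured))
    # {'z_lines': [1.4, 2.2, 10.5], 'z_bodies': [0.4, 0.9]}
```

      Passing a different `min_z_line_um` shows how a user-chosen threshold would shift objects between the two classes without changing anything else.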

      2) When doing the knockdown for alpha- and beta-myosin heavy chain, respectively, why did they not also do a Western blot for the "other" isoform (Figure 7)? We know that iPSC-CMs express a mixture, so the relatively mild phenotype that they observe in single knockdown experiments may well be due to concomitant upregulation of the other isoform. In my view, this should be checked.

      It is likely that in the single knockdown experiments the other isoform is upregulated, which is why we were careful to state only that neither muscle myosin alone is required for sarcomere formation. We agree that this would be an interesting experiment, although it is beyond the scope of this manuscript.

      3) There seems to be a disconnect between the images for myomesin knockdown shown in Figure 8H and the quantification shown in Figure 8I, which makes me wonder whether the image shown in the middle of H (MYOM1 (1) KD), where the beta-myosin doublets do not seem to be much affected, is really representative.

      The image shown in the middle of H is representative of the mean length of beta-myosin doublets in MYOM1 (1) KD hiCMs. While the beta-myosin doublets are still present and organized, they are significantly shorter. In the zoomed-out image, you can appreciate much shorter arrays of beta-myosin doublets that, while extending across the entire cell, are thinner than those in control cells.

      Reviewer #2 (Public Review):

      Neininger-Castro et al. report on their original study entitled "Independent regulation of Z-lines and M-lines during sarcomere assembly in cardiac myocytes revealed by the automatic image analysis software sarcApp". In this study, the research team developed two software tools, yoU-Net and sarcApp, that provide new binarization and sarcomere quantification methods. The authors further used human induced pluripotent stem cell-derived cardiomyocytes (hiCMs) as their model to verify their software by staining multiple sarcomeric components with and without treatment with Blebbistatin, a known myosin II activity inhibitor. With different Blebbistatin concentrations, the morphology of sarcomeric proteins was disturbed. These disrupted sarcomeric structures were further quantified using sarcApp, and the quantification data supported the phenotype. The authors further investigated the roles of muscle myosins in sarcomere assembly by knocking down MYH6, MYH7, or MYOM in hiCMs. Knockdown of these genes did not affect Z-line assembly, yet knockdown of MYOM affected M-line assembly. The authors demonstrated that different muscle myosins participate in sarcomere assembly in different manners.

      Reviewer #3 (Public Review):

      Neininger-Castro and colleagues developed software tools for the quantification of sarcomeres and sarcomere-precursor features in immunostained human induced pluripotent stem cell-derived cardiac myocytes (hiCMs). In the first part, they used a deep-learning-based model called a U-Net to construct and train a network for binarization of immunostained cardiomyocyte images. They also wrote graphical user interface (GUI) software that will assist other labs in using this approach and made it publicly available. They did not compare their approach to existing ones, but an example from one image suggests their binarization tool outperforms Otsu thresholding binarization.

      In the second part, they developed a software tool called sarcApp that classifies sarcomere structures in the binarized image as a Z-Line or Z-Body and assigns each to either a myofibril or a stress fiber. The tools can then automatically count and measure multiple features (33 per cell and 24 per myofibril) and report them on a per-cell, per-myofibril, and per-stress-fiber basis.

      To test the tools, they used Blebbistatin to inhibit sarcomere assembly and showed that the sarcApp tool could capture changes in multiple features, such as fewer myofibrils, fewer Z-Lines, decreased myofibril persistence, decreased Z-Line length and altered myofibril orientation in the Blebbistatin-treated cells. With some modifications, the tool was also shown to quantify sarcomeres in titin- and myomesin-stained cardiomyocytes.

      Finally, they used sarcApp to quantify the changes in sarcomere assembly after siRNA-mediated knockdown of MYH6, MYH7, or MYOM. The analysis indicates that neither MYH6 nor MYH7 knockdown perturbed the assembly of Z- or M-lines, and that knockdown of MYOM perturbed A-band/M-Line but not Z-Line assembly, according to the features captured by the sarcApp tool.

      Overall, the authors developed and made publicly available an excellent software tool that will be very useful for labs interested in studying sarcomere assembly. Multiple features that are difficult to measure or count manually can be measured automatically by the software, quickly and accurately.

      There are however some remaining questions about these tools:

      1) The binarization tool which is tailored to sarcomere image binarization appears promising but was not systematically compared with existing approaches.

      We compared it with the existing approach we used previously in the lab, which was Otsu’s method for binarization. We are not aware of other binarization approaches to compare against, apart from other machine learning techniques that are less advanced than a U-Net, the current standard for image-to-image translation.

      2) How robust is the tool? The tool was tested on images from one type of cardiomyocytes (hiCMs) taken from one lab using Nikon Spinning Disk confocal microscope equipped with Apo TIRF Oil 100X 1.49 NA objective or instant Structured Illumination Microscopy (iSIM), using deconvolution (Microvolution software) and in a specific magnification. It remains to be seen whether the tool would be equally effective with images taken with other microscopy systems, with other cardiomyocytes (chick or neonatal rat), with different magnifications, live imaging, etc.

      We tested the software with several magnifications, with live imaging, and with other tissues. We did not include this information in the manuscript because the test data belong to future manuscripts studying different aspects of sarcomere formation and maintenance. sarcApp reliably identifies Z-Lines and sarcomeres in deconvolved widefield fluorescence images of hiCMs and frozen human tissue, and we are currently using it to measure zebrafish data for another study. Further, it works for live imaging with an actinin2-GFP (or similar) label. For the titin quantification, we recommend using only 60-100X magnification, as the titin structures (doublets and rings) are not resolvable at lower magnifications.

      3) The tool was developed for evaluation of sarcomere assembly. The authors show that for this application it can detect the perturbation by Blebbistatin, or knockdown of sarcomeric genes. It remains to be seen if this tool is also useful for assessment of sarcomere structure for other questions beside sarcomere assembly and in other sarcomere pathologies.

      While this is beyond the scope of this specific methods paper, we welcome other researchers to use our software for other questions in other pathologies. We are currently doing the same for other manuscripts from our lab.

      Reviewer #1 (Recommendations For The Authors):

      1) "alpha-actinin..., which border the sarcomeric contractile machinery (thin and thick filaments)": Z-lines do NOT border thick filaments in a relaxed sarcomere.

      We have removed “(thin and thick filaments)” from the text.

      2) Myomesin-targeting siRNAs (gene name MYOM): there are actually three genes encoding myomesin family members; please specify which one was targeted (I am assuming MYOM1).

      Thank you for pointing this out: we do target MYOM1.

      3) I am not surprised that they found not many mature Z-lines in the absence of both sarcomeric myosins; a similar codependence of assembly of mature Z-discs and the presence of functional thick filaments was previously shown by Geach and colleagues in 2015 (PMID 25845369)

      Thank you for sharing this manuscript: we have added a reference to it in our study.

      Reviewer #2 (Recommendations For The Authors):

      This work offers the possibility to gain more insights into the process of sarcomere assembly through the advancement in sarcomeric or myofibril structure analyses. However, some clarifications are needed from the authors, please see below for the comments.

      1) It is recommended that the authors include the time points for replating and harvesting hiCMs. After replating, the cardiomyocytes require at least three to four days for sarcomeric structures to reform. If the hiCMs were fixed before sarcomere assembly had completed, the staining of sarcomeric proteins including ACTN2 and titin could be compromised and it is difficult to tell if the phenotypes observed were consequences of drug treatments or knockdown of sarcomeric genes or simply because the replating hiCMs were fixed before their sarcomeric structures had fully regrown. It is also recommended that the authors replate hiCMs at a fixed time point to avoid discrepancies in the data.

      Cardiomyocytes do not require three to four days for sarcomeric structures to re-form; they require only 24 hours, with the first sarcomeres typically appearing at ~6 hours. We and others have published several studies demonstrating this (Fenix et al., eLife 2018; Taneja, Neininger and Burnette, MBoC 2020; Chen et al., Nature Methods 2022). While sarcomeres continue to develop and turn over after this time, our lab is interested in the first steps of sarcomerogenesis rather than the turnover of mature structures.

      2) sarcApp automatically identifies Z-lines and Z-bodies; however, is there an option for users to set their own thresholds? Some users may select different criteria when quantifying sarcomeres. Moreover, the Z-lines and Z-bodies identified by the software are not always accurate. Can users modify the list manually in an unbiased way? If this function is not available, the authors may consider adding it to their software. sarcApp measures Z-line and Z-body length but not Z-line and Z-body width, which is sometimes also necessary to measure.

      Absolutely, users can modify the thresholds used to identify Z-Lines and Z-Bodies. There is no way for users to modify the list in an unbiased way per se, as editing the list of Z-Lines and Z-Bodies based on non-mathematical measurements is inherently biased, but users are free to add other Z-Lines and Z-Bodies as they wish. In this context, “manually” and “unbiased” are mutually exclusive.

      3) It is recommended that the authors include the original images beside the sarcomeric structures identified by sarcApp (Figure 2A, 2C, 4C-F and more). It would be easier to compare the original Z-lines and Z-bodies with those identified by the software.

      We have added these in Author response image 1.

      Author response image 1.

      Uncropped images and merges from Figures 2, 4 and 6, respectively.

      4) The M-line length quantification data in Figure 3G, 5F, and 6H showed different colored-dots labeling n1 to n3, but the authors did not discuss the significance of these symbols.

      We are not sure what the reviewer means by this statement: there is no significance of the different colored dots other than to mark the biological replicate shown. These graphs were created using SuperPlots, which was not stated in the original methods. It has now been added to the Statistical Analysis section.
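      In a SuperPlot, each cell is drawn as a dot colored by its biological replicate (here n1 to n3), and a summary for each replicate is overlaid; statistical comparisons are then typically made on the replicate-level summaries rather than on pooled cells. The minimal Python sketch below computes those per-replicate means from (replicate, value) pairs. The function name `replicate_means` and the example values are hypothetical and are not data from the manuscript.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def replicate_means(measurements: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average per-cell values within each biological replicate.

    Each replicate mean is the summary a SuperPlot overlays on top of the
    per-cell dots, which are colored by replicate.
    """
    by_replicate: Dict[str, List[float]] = defaultdict(list)
    for replicate, value in measurements:
        by_replicate[replicate].append(value)
    return {rep: mean(vals) for rep, vals in by_replicate.items()}


if __name__ == "__main__":
    # Illustrative M-line lengths in µm (made-up values, two cells per replicate).
    cells = [("n1", 2.0), ("n1", 2.4), ("n2", 2.2), ("n2", 2.6), ("n3", 2.1), ("n3", 2.3)]
    per_rep = replicate_means(cells)
    print(per_rep)
    # A statistical test would then use these three replicate means, not the six cells.
    print(round(mean(per_rep.values()), 2))
```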

      5) Can the authors elaborate on why they used Blebbistatin at concentrations of 50 µM and 100 µM? Previous studies showed that 25 µM Blebbistatin was sufficient to delay the transformation of cardiomyocytes (PMID 27072942). Can the authors also comment on why they selected 6 hours, 12 hours, and 24 hours post replating for drug treatment? Moreover, the drug treatment at different time points was only done for ACTN2, not for titin or myomesin.

      We selected 6, 12, and 24 hours for actinin2 to show the time course of sarcomere formation and to show that sarcomeres are developed by 24 hours, as also mentioned above. We are interested in studying the time course of titin and myomesin assembly in the future and are working on this in the lab.

      We chose 50 and 100 µM Blebbistatin because these concentrations completely blocked sarcomere assembly, whereas treatment with 25 µM did not. This manuscript is a methods paper that aims to validate sarcApp and show how it could be used. We did not intend it to be a comprehensive study of how different concentrations of Blebbistatin affect sarcomere assembly.

      We are also unsure what the reviewer means by “transformation of cardiomyocytes”. The manuscript with PMID 27072942 does not address this issue; its aim is to “review and analyze readmission data for patients who received a continuous flow left ventricular assist device (LVAD)”. We assume the reviewer is referring to differentiation. The model system we developed and published in eLife in 2018 does not use differentiating iPSC cardiac myocytes. The hiCMs we use are terminally differentiated but still immature, as they are transcriptionally more similar to primary fetal myocytes. As such, they do not maintain their sarcomeres when they are removed from the 96-well plate and plated onto a glass coverslip for high-resolution microscopy. These cells assemble sarcomeres within 24 hours, with the sarcomeres forming close to the dorsal membrane and then rearranging over time (e.g., moving from the top of the cell to the bottom) (Fenix et al., eLife 2018). With that said, we agree with the reviewer that a study of sarcomere assembly in the context of cardiac myocyte differentiation would be a fascinating direction for future studies, and we think sarcApp could facilitate such studies.

      6) The authors mentioned that the myofibrils of Z-line, titin, and M-line were randomly oriented after Blebbistatin treatment. The myofibrils were indeed randomly oriented for titin and M-line. However, the orientation of Z-lines after 50 µM Blebbistatin treatment was not necessarily random; only the orientation after 100 µM Blebbistatin treatment was randomized. The authors might consider replacing the bar graph with another type of chart if the orientation was really randomized after quantification.

      We find that the bar chart is the most informative to us, but users can consider other types of charts in their analyses.

      7) It is recommended that the authors include images staining ACTN2 at lower magnifications (Figure 1A, 1C). With current images, it is true that yoU-Net can separate Z-lines from Z-bodies yet it is difficult to tell if yoU-Net can still distinguish Z-lines from Z-bodies with larger images or it only applies to a small portion of the image.

      The yoU-Net can distinguish Z-Lines from Z-Bodies with images of any size, as image size (height vs. width in pixels) does not affect how binarization occurs. During binarization, the only pixel requirement is that the width and height are divisible by 8 (for downsampling purposes). Usually this is not the case with raw images, so the image borders are slightly cropped to make them usable. In terms of resolution, we recommend using 60X-100X objectives on confocal or superresolution data for the clearest results. We have, however, successfully binarized deconvolved widefield images at 100X as well.
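      The divisibility requirement can be read as a consequence of downsampling: if the network halves the image three times (an assumption consistent with the factor of 8 stated above, since 2^3 = 8), each dimension must be a multiple of 8 for the encoder and decoder feature maps to line up. The minimal Python sketch below computes the largest crop that satisfies this. Whether yoU-Net trims the borders symmetrically or from one corner is not stated, so the centered crop here is an assumption, and `crop_box` and `crop_to_multiple` are hypothetical helpers rather than part of yoU-Net.

```python
from typing import List, Sequence, Tuple


def crop_box(height: int, width: int, factor: int = 8) -> Tuple[int, int, int, int]:
    """Return (top, bottom, left, right) bounds of a centered crop whose
    height and width are both the largest multiples of `factor` that fit."""
    new_h = (height // factor) * factor
    new_w = (width // factor) * factor
    top = (height - new_h) // 2
    left = (width - new_w) // 2
    return top, top + new_h, left, left + new_w


def crop_to_multiple(image: Sequence[Sequence[float]], factor: int = 8) -> List[List[float]]:
    """Trim a 2D image (a list of pixel rows) so both dimensions divide by `factor`."""
    height, width = len(image), len(image[0])
    top, bottom, left, right = crop_box(height, width, factor)
    return [list(row[left:right]) for row in image[top:bottom]]


if __name__ == "__main__":
    # A 1022 x 1030 image loses a few border pixels and becomes 1016 x 1024.
    print(crop_box(1022, 1030))  # (3, 1019, 3, 1027)
```

      At most `factor - 1` pixels are removed along each axis, which matches the description of the borders being only slightly cropped.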

      8) The authors mentioned that the knockdown of MYH7 did not affect Z-lines and M-lines; however, the structures of ACTN2, myomesin, and titin appeared more organized as compared to those in control.

      We agree that the sarcomeres and myofibrils look slightly more organized. We meant to state that the knockdown did not negatively affect Z-Lines and M-Lines, and we have updated the manuscript to be more accurate.

      9) Please provide the merge images for Fig. 4D, 4E, 6B

      The merge images for Fig. 4D, 4E, and 6B are included with the original images requested above (point 3).

      10) In the text, they described: "antibodies to the titin I-band localize to both MSFs and sarcomeres in hiCMs (Figure 4A). Titin forms ring-like structures around the Z-Bodies of MSFs that are closer to the apparent sarcomere transition point (Figure 4A)". However, the antibody information they provided does not state explicitly whether the antibody recognizes the N- or C-terminus of titin. Please provide TTN N-terminus or TTN C-terminus co-staining with the ACTN2 antibody to show which part of TTN, together with ACTN2, forms a Z-Body.

      The TTN antibody is an N-terminal antibody localizing to the I-Band region of sarcomeres. We agree with the reviewer that a more thorough study of titin will be of interest and we are currently undertaking such a study. However, this is a methods paper presenting a tool. While some of the data we present does point to mechanistic hypotheses, it is beyond the scope of this study to fully characterize titin during sarcomere assembly.

      11) Titin doublets were used to indicate a sarcomere in Fig. 4C-D, whereas another combination (myomesin and F-actin) was used to label a sarcomere in Fig. 6D. Can they compare these two methods (titin doublets vs. myomesin and F-actin)? What is the average sarcomere length measured with each, and is it the same?

      We noted in the manuscript that, due to the organization of titin doublets (wrapping around the ends of Z-Lines), the average titin doublet will be approximately 0.3 µm longer than the Z-Line. We did not expect to see a difference in length between myomesin M-Lines and mature actinin2 Z-Lines, and indeed we do not see major differences in the average lengths (between 2.0 and 2.5 µm in 24-hour control cells).

      12) They used an siRNA approach to knock down MYH6, MYH7 and MYOM and concluded that knockdown of these genes did not affect Z-line assembly. Even though they showed very good knockdown efficiency for these proteins, they should co-stain (1) MYH6/TITIN/actinin2 and MYH6/myomesin/actinin2 for Fig. 7C; (2) MYH7/TITIN/actinin2 and MYH7/myomesin/actinin2 for Fig. 7I; (3) MYOM1/TITIN/actinin2 and MYOM2/TITIN/actinin2 for Fig. 8A; and (4) MYH7/MYOM1 and MYH7/MYOM2 for Fig. 8H, to make sure the cells they measured were truly knockdown-positive.

      The antibodies for alpha and beta myosin are not very efficient for immunofluorescence, and work best for western blots. We decided also to choose a random subset of the cells on the dish to be sure to eliminate any risk of cherry-picking. While imaging cells on the dish, we looked only at the DAPI nuclear channel and selected 50 cells minimum per dish with only this channel, then imaged the other channels.

      Minor comments:

      1) Sarcomere structure is well organized in DMSO-treated cells in Fig. 5A and Fig. 6A, but in disarray in Fig. S3M. Why?

      Figure S3 shows hiCMs that have only been allowed to spread for 6 hours, which have not formed mature sarcomeres yet, hence the disarray.

      2) Fig 1A, Fig2B: please label the name of the antibody, not the actin filament

      We used phalloidin labelling here, which marks actin filaments. We have updated the figure legends to make this clearer. Thank you!

      3) Fig. 7I: actinin2 instead of actinin

      Thank you for catching this! We have fixed it.

      Reviewer #3 (Recommendations For The Authors):

      Testing the app using images acquired with other microscopy systems, at other magnifications, and from cardiomyocytes of other species, as noted in the public review above, should make the app even more widely useful.

      A more formal head-to-head comparison with other approaches would be more convincing in showing that the new tool is superior.

      I also think that a more detailed protocol for using the app will help other investigators.

      The app counts and measures many features, but it is not always clear how and using what algorithm these are measured. Including these details in a protocol or even as comments in the code will be very helpful for others.

      The protocol found on the public GitHub for the app will help other investigators to download, use, and understand the application. We have received contact from researchers who have been able to use the application without assistance from us, which is a good sign that the application is user-friendly and that the online protocol is sufficient.

    1. Author response:

      The following is the authors’ response to the original reviews.

      The reviewers praised multiple aspects of our study. Reviewer 1 noted that “the work aligns well with current research trends and will greatly interest researchers in the field.” Reviewer 2 highlighted the unique capability of our imaging approach, which “allows for investigation of the heterogeneity of response across individual dopamine axons, unlike other common approaches such as fiber photometry.” Reviewer 3 commented that “the experiments are beautifully executed” and “are revealing novel information about how aversive and rewarding stimuli is encoded at the level of individual axons, in a way that has not been done before.”

      In addition to the positive feedback, the reviewers also provided useful criticisms and suggestions, some of which may not be fully addressed in a single study. For instance, questions regarding whether dopamine axons encode the valence or specific identity of the stimuli, or the most salient aspects of the environment, remain open. At the same time, as all the reviewers agreed, our report on the diversity of dopamine axonal responses using a novel imaging design introduces significant new insights to the neuroscience community. Following the reviewers’ recommendations, we have refrained from making interpretations that could be perceived as overinterpretation, such as concluding that “dopamine axons are involved in aversive processing.” This has necessitated extensive revisions, including modifying the title of our manuscript to make clear that the novelty of our work is revealing ‘functional diversity’ using our new imaging approach.

      Below, we respond to the reviewers’ comments point by point.

      eLife assessment

      This valuable study shows that distinct midbrain dopaminergic axons in the medial prefrontal cortex respond to aversive and rewarding stimuli and suggests that they are biased toward aversive processing. The use of innovative microprism-based two-photon calcium imaging to study single-axon heterogeneity is solid, although the experimental design could be optimized to distinguish aversive valence from stimulus salience and identity in this dopamine projection. This work will be of interest to neuroscientists working on neuromodulatory systems, cortical function and decision making.

      Reviewer #1

      Summary:

      In this manuscript, Abe and colleagues employ in vivo 2-photon calcium imaging of dopaminergic axons in the mPFC. The study reveals that these axons primarily respond to unconditioned aversive stimuli (US) and enhance their responses to initially-neutral stimuli after classical association learning. The manuscript is well-structured and presents results clearly. The utilization of a refined prism-based imaging technique, though not entirely novel, is well-implemented. The study's significance lies in its contribution to the existing literature by offering single-axon resolution functional insights, supplementing prior bulk measurements of calcium or dopamine release. Given the current focus on neuromodulator neuron heterogeneity, the work aligns well with current research trends and will greatly interest researchers in the field.

      However, I would like to highlight that the authors could further enhance their manuscript by addressing study limitations more comprehensively and by providing essential details to ensure the reproducibility of their research. In light of this, I have a number of comments and suggestions that, if incorporated, would significantly contribute to the manuscript's value to the field.

      Strengths:

      • Descriptive.

      • Utilization of a well-optimized prism-based imaging method.

      • Provides valuable single-axon resolution functional observations, filling a gap in existing literature.

      • Timely contribution to the study of neuromodulator neuron heterogeneity.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      (1) It is important to fully discuss the fact that the measurements were carried out only in superficial layers (30–100 µm), while major dopamine projections target the deep layers of the mPFC, as discussed in the cited literature (Vander Weele et al., 2018) and as illustrated in Fig. S1B,C. This limitation should be explicitly acknowledged and discussed in the manuscript, especially given the potential functional heterogeneity among dopamine axons targeting different layers. Such across-layer heterogeneity could also explain discrepancies among past recording studies that used different measurement modalities. Mentioning technical limitations would also be informative. For example, how deep can the authors perform two-photon imaging through the prism? Was 30–100 µm the maximum depth the authors could reach?

      Thank you for pointing out this important issue about layer differences.

      It is possible that the mesocortical pathway has layer-specific channels, with some neurons targeting supragranular layers and others targeting infragranular ones. Alternatively, it is also plausible that the axons of the same neurons branch into both superficial and deep layers. This is a critical issue that has not been investigated in anatomical studies and will require single-cell labeling of dopamine neurons (Matsuda et al., 2009; Aransay et al., 2015). We now discuss this issue in the Discussion.

      As for the imaging depth of 30–100 µm, we were unable to visualize deeper axons in live-view mode. Our imaging system has already been optimized to detect weak signals (e.g., we employed an excitation wavelength of 980 nm, dispersion compensation, and a hybrid photodetector). Future studies using improved imaging approaches may be able to visualize deeper layers. Importantly, the sparse axons in the supragranular layers are advantageous for detecting weak signals, because dense labeling of axons would increase the background fluorescence relative to the signals. We now address this layer issue in the Results and Discussion sections.

      (2) In the introduction, it seems that the authors intended to refer to Poulin et al. 2018 regarding molecular/anatomical heterogeneity of dopamine neurons, but they inadvertently cited Poulin et al. 2016 (a general review on scRNAseq). Additionally, the statement that "dopamine neurons that project to the PFC show unique genetic profiles (line 85)" requires clarification, as Poulin et al. 2018 did not specifically establish this point. Instead, they found at least the Vglut2/Cck+ population projects into mPFC, and they did not reject the possibility of other subclasses projecting to mPFC. Rather, they observed denser innervation with DAT-cre, suggesting that non-Vglut2/Cck populations would also project to mPFC. Discuss the potential molecular heterogeneity among mPFC dopamine axons in light of the sampling limitation mentioned earlier.

      We thank the reviewer for pointing this out. Genetic profiles of PFC-projecting DA neurons are still being investigated, so describing them as “unique” was misleading. We have edited the Introduction accordingly, and now discuss this issue in detail in the Discussion.

      (3) I find the data presented in Figure 2 to be odd. First, the latency of the shock responses in the representative axons (right panels of G, H) is consistently very long, nearly 500 ms. This raises the question of whether this is a biological phenomenon or a technical artifact, possibly arising from a synchronization issue between the two-photon imaging and stimulus presentation. My reservations are compounded by the notable absence of comprehensive information on the synchronization of the experimental system in the Methods section.

      The synchronization of the stimulus and data acquisition is accomplished at a sub-millisecond resolution. We use a custom-made MATLAB program that sends TTL commands to standard imaging software (ThorImage or ScanImage) and a stimulator for electrical shocks. All events are recorded as analogue inputs to a different DAQ to ensure synchronization. We have provided additional details regarding the configuration in the Methods section.

      We consider that the long latency of shock response is biological. For instance, a similar long latency was found after electrical shock in a photometry imaging study (Kim, …, Deisseroth, 2016).

      Secondly, there appear to be irregularities in Panel J. While the authors indicate that "Significant axons were classified as either reward-preferring (cyan) or aversive-preferring (magenta), based on whether the axons are above or below the unity line of the reward/aversive scatter plot (Line 566)," a cyan dot slightly but clearly deviates above the unity line (around coordinates (x, y) = (20, 21)). This needs clarification. Lastly, when categorizing axons for analysis of conditioning data in Fig3 (not Fig2), the authors stated "The color-coded classification (cyan/magenta) was based on k-means clustering, using the responses before classical conditioning (Figure 2J)". I do not understand why the authors used different classification methods for two almost identical datasets.

      We thank the reviewer for pointing out these insufficient descriptions. We classified the axons using k-means clustering, and the separation of the two clusters happened to roughly coincide with the unity line of the reward/aversive scatter plot in Fig 2J. In other words, we did not use the unity line to classify the data points (which is why the color separation of the histogram is not at 45 degrees). We have clarified this point in the Methods section.
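
      A minimal Python sketch of a two-cluster k-means classification of this kind, assuming each axon is summarized by a (reward response, aversive response) pair. The function name `kmeans_two_clusters`, the angle-based initialization, and the toy values are hypothetical illustrations, not the authors' analysis code.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (reward response, aversive response) of one axon


def _polar_angle(p: Point) -> float:
    """Angle of a point from the reward axis, in radians."""
    return math.atan2(p[1], p[0])


def kmeans_two_clusters(points: List[Point], n_iter: int = 100) -> List[int]:
    """Lloyd's k-means with k = 2; returns a label (0 or 1) for each point."""
    # Hypothetical deterministic initialization: the most reward-leaning and
    # the most aversive-leaning points, judged by polar angle.
    centroids = [min(points, key=_polar_angle), max(points, key=_polar_angle)]
    labels = [-1] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(2), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(2):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels


# Toy example (not real data): three aversive-leaning, two reward-leaning axons.
axons = [(0.1, 1.2), (0.2, 0.9), (0.0, 1.5), (0.8, 0.2), (1.1, 0.3)]
print(kmeans_two_clusters(axons))  # [1, 1, 1, 0, 0]
```

      Because the boundary between two k-means clusters is the perpendicular bisector of the two centroids, it need not coincide with the unity line; a point slightly above the unity line can therefore still fall in the reward-preferring cluster.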

      (4) In connection with Point 3, conducting separate statistical analyses for aversive and rewarding stimuli would offer a fairer approach. This could reveal a subset of axons that respond to both aversive and appetitive stimuli, which would more accurately reflect the true underlying dynamics. Moreover, characterizing Figure 2J as a bimodal distribution while disregarding the axons responsive to both aversive and appetitive cues seems somewhat arbitrary and circular. A more inclusive consideration of this dual-responsive population could contribute to a more comprehensive interpretation.

      We also attempted k-means clustering with additional dimensions (e.g., temporal domains as shown in Fig. 3I, J), but no additional clusters were evident. We note that the lack of other clusters does not exclude the possibility of their existence, which may only become apparent with a substantial increase in the number of samples. In the current report, we present the clusters that were the easiest/simplest for us to identify.

      Additionally, we have revised our manuscript to reflect that many axons respond to both reward and aversive stimuli, and that aversive-preferring axons do not exclusively respond to the aversive stimulus.

      (5) The contrast in initialization to novel cues between aversive and appetitive axons mirrors findings in other areas, such as the tail-of-striatum (TS) and ventral striatum (VS) projecting dopamine neurons (Menegas et al., 2017, not 2018). You might consider citing this very relevant study and discussing potential collateral projections between mPFC and TS or VS.

      Thank you for pointing this out. We have now included Menegas et al., 2017, and also discuss the possibility of collaterals to these areas. In addition, we also referred to Azcorra et al., 2023 - this was published after our initial submission.

      (6) The use of correlation values (here >0.65) to group ROIs into axons is common but should be justified based on axon density in the FOV and imaging quality. It's important to present the distribution of correlation values and demonstrate the consistency of results with varying cut-off values. Also, provide insights into the reliability of aversive/appetitive classifications for individual ROIs with high correlations. Importantly, if you do the statistical testing and aversive/appetitive classifications for individual ROIs with above-threshold high correlation (to be grouped into the same axon), do they always fall into the same category? How many false positives/false negatives are observed?

      Stating that "Our results remained similar for different correlation threshold values (Line 556)" while noting "(data not shown)" is outdated practice; these data should be presented.

      We have conducted additional analyses using correlation thresholds of 0.5 and 0.3, which resulted in a smaller number of axon terminals. In essence, the relationship between reward responses and aversive responses remained very similar to that in Fig. 2J, K.
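
      A minimal Python sketch of grouping ROIs into putative axons with a correlation threshold. It assumes Pearson correlation between ROI fluorescence traces and transitive (single-linkage) merging via union-find; both choices and the helper names `pearson` and `group_rois` are hypothetical. Under this scheme, lowering the threshold can only merge more ROIs, so it yields the same or a smaller number of units.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length traces (0.0 if one is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant trace has undefined correlation; never merge it
    return cov / (sx * sy)


def group_rois(traces: List[Sequence[float]], threshold: float = 0.65) -> List[int]:
    """Give each ROI a group id; ROIs linked by r > threshold share an id."""
    parent = list(range(len(traces)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(traces)):
        for j in range(i + 1, len(traces)):
            if pearson(traces[i], traces[j]) > threshold:
                parent[find(i)] = find(j)  # merge the two groups
    roots: dict = {}
    # Relabel group roots as consecutive ids 0, 1, 2, ... in ROI order.
    return [roots.setdefault(find(i), len(roots)) for i in range(len(traces))]


# Toy traces (not real data).
traces = [
    [0.0, 1.0, 0.0, 2.0, 0.0],  # ROI A
    [0.0, 2.0, 0.0, 4.0, 0.0],  # ROI B: scaled copy of A (r = 1)
    [1.0, 0.0, 1.0, 0.0, 1.0],  # ROI C: anticorrelated with A
    [0.0, 1.0, 0.0, 1.0, 1.0],  # ROI D: r of about 0.61 with A
]
print(group_rois(traces, 0.65))  # [0, 0, 1, 2]
print(group_rois(traces, 0.5))   # [0, 0, 1, 0]
```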

      Author response image 1.

      Reviewer #2 (Public Review):

      Summary:

      This study aims to address existing discrepancies in the literature regarding the extent of reward versus aversive dopamine signaling in the prefrontal cortex. To do so, the authors presented mice with both a reward and an aversive stimulus in different trials each day. The authors used high-spatial-resolution two-photon calcium imaging of individual dopaminergic axons in the medial PFC to characterize the responses of these axons and determine the selectivity of individual axons. They also paired the reward (water) and the aversive stimulus (tail shock) with auditory tones and recorded across 12 days of associative learning.

      The authors find that some axons respond to both reward and aversive unconditioned stimuli, but overall, there is a strong preference for aversive stimuli, consistent with expectations from prior studies that used other recording methods. The authors find that both auditory stimuli initially drive responses in axons, but that with training, axons develop more selective responses to the shock-associated tone, indicating that associative learning changed these axons' responses. Finally, the authors use anticipatory behaviors during the conditioned stimuli and facial expressions to assess stimulus discrimination and relate dopamine axon signals to this behavioral evidence of discrimination. This study takes advantage of cutting-edge imaging approaches to resolve the extent to which dopamine axons in the PFC respond to appetitive or aversive stimuli. The authors conclude that most axons show a strong bias toward the aversive tail shock, whereas water reward is represented more weakly and sparsely.

      Strengths:

      The strength of this study is the imaging approach that allows for investigation of the heterogeneity of response across individual dopamine axons, unlike other common approaches such as fiber photometry which provide a measure of the average population activity. The use of appetitive and aversive stimuli to probe responses across individual axons is another strength.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      A weakness of this study is the design of the associative conditioning paradigm. The use of only a single reward and a single aversive stimulus makes it difficult to know whether these results are specific to the valence of the stimuli or to their specific identity. Further, reward presentations are more numerous than aversive trials, making it unclear how much novelty and habituation account for the results. Moreover, the training seems limited by the low number of trials and did not result in strong associative conditioning. The lack of reported omission responses may reflect weak associative conditioning. Finally, the study provides a small advance in our understanding of dopamine signaling in the PFC and lacks evidence on whether, and how, these axonal responses affect PFC dopamine concentrations and PFC neuron activity.

      We thank the reviewer for the suggestions.

      We agree that interpreting the response change during classical conditioning is not straightforward. Although the reward and aversive stimuli we employed are commonly used in the field, future studies with more sophisticated paradigms will be necessary to address whether dopamine axons encode the valence of the stimuli, the specific identity of the stimuli, or novelty and habituation. In our current manuscript, we refrain from concluding that distinct groups of neurons encode different valences. In fact, many axons respond to both stimuli, at different ratios. We have removed descriptions that may suggest exclusive coding of reward or aversive processing. Additionally, we have extensively discussed possible interpretations.

      In terms of the strength of the conditioned association, the behavioral results indicated that learning plateaued: anticipatory behaviors did not increase during the last two phases when the conditioning period was divided into six phases (Figure 3–figure supplement 1).

      Our goal in the current manuscript is to provide new insight into the functional diversity of dopamine axons in the mPFC. Investigating the impact of dopamine axons on local dopamine concentration and neural activity in the mPFC is important but falls beyond the scope of our current study. In particular, given the functional diversity of dopamine axons, interpreting bulk optogenetic or chemogenetic axonal manipulation experiments would not be straightforward. As suggested, measuring the dopamine concentration through two-photon imaging of dopamine sensors and monitoring the activity of dopamine recipient neurons (e.g., D1R- or D2R-expressing neurons) is a promising approach that we plan to undertake in the near future.

      Reviewer #3 (Public Review):

      Summary:

      The authors image dopamine axons in medial prefrontal cortex (mPFC) using microprism-mediated two-photon calcium imaging. They image these axons as mice learn that two auditory cues predict two distinct outcomes, tail shock or water delivery. They find that some axons show a preference for encoding of the shock and some show a preference for encoding of water. The authors report a greater number of dopamine axons in mPFC that respond to shock. Across time, the shock-preferring axons begin to respond preferentially to the cue predicting shock, while there is a less pronounced increase in the water-responsive axons that acquire a response to the water-predictive cue (these axons also increase non-significantly to the shock-predictive cue). These data lead the authors to argue that dopamine axons in mPFC preferentially encode aversive stimuli.

      Strengths:

      The experiments are beautifully executed and the authors have mastered an impressively complex technique. Specifically, they are able to image and track individual dopamine axons in mPFC across days of learning. This technique is used the way it should be: the authors isolate distinct dopamine axons in mPFC and characterize their encoding preferences and how these evolve across learning of cue-shock and cue-water contingencies. Thus, these experiments reveal novel information about how aversive and rewarding stimuli are encoded at the level of individual axons, in a way that has not been done before. This is timely and important.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      The overarching conclusion of the paper is that dopamine axons preferentially encode aversive stimuli. This is prevalent in the title, abstract, and throughout the manuscript. This is fundamentally confounded. As the authors point out themselves, the axonal response to stimuli is sensitive to outcome magnitude (Supp Fig 3). That is, if you increase the magnitude of water or shock that is delivered, you increase the change in fluorescence that is seen in the axons. Unsurprisingly, the change in fluorescence that is seen to shock is considerably higher than water reward.

      We agree that the interpretation of our results is not straightforward. Our current manuscript now focuses on our strength, which is reporting the functional diversity of dopamine axons. Therefore, we avoid using the word ‘encode’ when describing the response.

      We believe that our results could reconcile the apparent discrepancy as to why some previous studies reported only aversive responses while others reported reward responses. In particular, if the reward volume were very small, the reward response could go undetected.

      Further, when the mice are first given unexpected water delivery and have not yet experienced the aversive stimuli, over 40% of the axons respond [yet just a few lines below the authors write: "Previous studies have demonstrated that the overall dopamine release at the mPFC or the summed activity of mPFC dopamine axons exhibits a strong response to aversive stimuli (e.g., tail shock), but little to rewards", which seems inconsistent with their own data].

      We always recorded the reward and aversive response together, which might have confused the reviewer. Therefore, there is no inconsistency in our data. We have clarified our methods and reasoning accordingly.

      Given these aspects of the data, it could be the case that the dopamine axons in mPFC encode different types of information and delegate preferential processing to the most salient outcome across time.

      This is certainly an exciting interpretation, so we have included it in our discussion. Meanwhile, ‘the most salient outcome’ alone cannot fully capture the diverse response patterns of the dopaminergic axons, particularly reward-preferring axons. We discuss our findings in more detail in the revised manuscript.

      The use of two similar-sounding tones (9 kHz and 12 kHz) for the reward- and aversive-predicting cues is likely to enhance this, as it requires a fine-grained distinction between the two cues in order to learn effectively. There is considerable literature on mPFC function across species that would support such a view. Specifically, theories of mPFC function (in particular of the prelimbic cortex, where most of the axon images were taken) generally center on resolving conflict over what to respond to, learn about, and attend to. That is, the mPFC is important for devoting the most resources (learning, behavior) to the most relevant outcomes in the environment. These data, then, provide a mechanism for this to occur in mPFC: dopamine axons signal to the mPFC the most salient aspects of the environment, which should be preferentially learned about and responded to. This is also consistent with the absence of a negative prediction error during omission: the dopamine axons show increased responses to unexpected outcomes but do not encode negative errors. This supports a role for this projection in helping to allocate resources to the most salient outcomes and their predictors, rather than in learning per se. Below are just a few references from the rich literature on mPFC function (some consider rodent mPFC analogous to DLPFC, others to mPFC), which advocate a role for this region in allocating attention and cognitive resources to the most relevant stimuli and do not indicate preferential processing of aversive stimuli.

      Distinguishing between 9 kHz and 12 kHz sound tones may not be that difficult, considering anticipatory licking and running are differentially manifested. In addition, previous studies have shown that mice can distinguish between two sound tones when they are separated by 7% (de Hoz and Nelken 2014). Nonetheless, we agree with the attractive interpretation that “the mPFC devotes the most resources (learning, behavior) to the most relevant outcomes in the environment” and that dopamine is a mechanism for this. Therefore, we discuss this interpretation in the revised text.

      References:

      (1) Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annual Review of Neuroscience, 24(1), 167-202.

      (2) Bissonette, G. B., Powell, E. M., & Roesch, M. R. (2013). Neural structures underlying set-shifting: roles of medial prefrontal cortex and anterior cingulate cortex. Behavioural Brain Research, 250, 91-101.

      (3) Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. Annual Review of Neuroscience, 18(1), 193-222.

      (4) Sharpe, M. J., Stalnaker, T., Schuck, N. W., Killcross, S., Schoenbaum, G., & Niv, Y. (2019). An integrated model of action selection: distinct modes of cortical control of striatal decision making. Annual Review of Psychology, 70, 53-76.

      (5) Ridderinkhof, K. R., Ullsperger, M., Crone, E. A., & Nieuwenhuis, S. (2004). The role of the medial frontal cortex in cognitive control. Science, 306(5695), 443-447.

      (6) Nee, D. E., Kastner, S., & Brown, J. W. (2011). Functional heterogeneity of conflict, error, task-switching, and unexpectedness effects within medial prefrontal cortex. NeuroImage, 54(1), 528-540.

      (7) Isoda, M., & Hikosaka, O. (2007). Switching from automatic to controlled action by monkey medial frontal cortex. Nature Neuroscience, 10(2), 240-248.

      Reviewer #1 (Recommendations For The Authors):

      Specific Suggestions and Questions on the Methods Section:

      In general, the methods part is not well documented and sometimes confusing. Thus, as it stands, it hinders reproducible research. Specific suggestions/questions are listed in the following section.

      (1) Broussard et al. 2018 introduced axon-GCaMP6 instead of axon-jGCaMP8m. The authors should provide details about the source of this material. If it was custom-made, a description of the subcloning process would be appreciated. Additionally, consider depositing sequence information or preferably the plasmid itself. Furthermore, the introduction of the jGCaMP8 series by Zhang, Rozsa, et al. 2023 should be acknowledged and referenced in your manuscript.

      We thank the reviewer for pointing this out. We have now included details on how we prepared the axon-jGCaMP8m, which was based on plasmids available at Addgene. Additionally, we have deposited our construct to Addgene ( https://www.addgene.org/216533/ ). We have also cited Janelia’s report on jGCaMP8, Zhang et al.

      (2) The authors should elaborate on the approach taken for experimental synchronization. Specifically, how was alignment achieved among two-photon imaging, treadmill recordings, aversive/appetitive stimuli, and videography? It would be important to document the software and hardware components used to generate the TTLs that trigger the pump, stimulator, cameras, etc.

      We have now included a more detailed explanation of the timing control. We use a custom-made MATLAB program that sends TTL square waves and analogue waves via a single National Instruments board (USB-6229) to control two-photon image acquisition, behavior camera image acquisition, water syringe movement, current flow from the stimulator, and sound presentation. Using a separate National Instruments board (PCIe-6363), we also continuously recorded at 30 kHz the frame timing of two-photon imaging, the frame timing of the behavior camera, copies of the command waves (sent to the syringe pump, the stimulator, and the speaker), and treadmill signals corresponding to running speed.
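
      As an illustration of how such continuously recorded timing signals can be used offline, a minimal Python sketch that detects frame-clock rising edges in a 30 kHz recording and maps a stimulus onset to the imaging frame in progress. The 2.5 V threshold, the function names, and the synthetic signal are hypothetical; the actual pipeline is MATLAB-based and is not reproduced here.

```python
from bisect import bisect_right
from typing import List, Sequence

FS_HZ = 30_000  # sampling rate of the timing recording, as stated above


def rising_edges(signal: Sequence[float], threshold: float = 2.5) -> List[int]:
    """Sample indices where a TTL-like signal crosses the threshold upward."""
    return [i for i in range(1, len(signal)) if signal[i - 1] < threshold <= signal[i]]


def frame_of_event(frame_onsets: Sequence[int], event_sample: int) -> int:
    """Index of the last imaging frame that began at or before the event.

    Returns -1 if the event precedes the first recorded frame.
    """
    return bisect_right(frame_onsets, event_sample) - 1


# Synthetic example: a frame clock that goes high for 2 samples every 10 samples.
frame_clock = [5.0 if i % 10 in (5, 6) else 0.0 for i in range(30)]
onsets = rising_edges(frame_clock)
stim_sample = 18  # e.g. rising edge of the recorded stimulator command copy
print(onsets, frame_of_event(onsets, stim_sample))  # [5, 15, 25] 1
# Event time in seconds from the start of the recording: stim_sample / FS_HZ
```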

      (3) The information regarding the cameras utilized in the study presents some confusion. In one instance, you mention, "To monitor licking behavior, the face of each mouse was filmed with a camera at 60 Hz (CM3-U3-13Y3M-CS, FLIR)" (Line 488). However, there's also a reference to filming facial expressions using an infrared web camera (Line 613). Could you clarify whether the FLIR camera (which is an industrial CMOS not a webcam) is referred to as a webcam? Alternatively, if it's a different camera being discussed, please provide product details, including pixel numbers and frame rate for clarity.

      We thank the reviewer for pointing this out. This was a mistake on our end. The camera used in the current project was a CM3-U3-13Y3M-CS, not a web camera. We have now corrected this.

      (4) Please provide more information about the methodology employed for lick detection. Specifically, did the authors solely rely on videography for this purpose? If so, why was an electrical (or capacitive) detector not used? It would provide greater accuracy in detecting licking.

      Lick detection was performed offline based on videography, using DeepLabCut. As licking occurs at a frequency of ~6.5 Hz (Xu, …, O’Connor Nature Neurosci, 2022), the movement can be detected at a frame rate of 60 Hz. Initially, we used both a lick sensor and videography. However, we favored videography because it could potentially provide non-binary information.
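
      A back-of-the-envelope check of this sampling argument, assuming licking is roughly periodic at the ~6.5 Hz cited above:

$$\frac{f_{\text{camera}}}{f_{\text{lick}}} = \frac{60\ \text{Hz}}{6.5\ \text{Hz}} \approx 9.2\ \text{frames per lick cycle}, \qquad f_{\text{camera}} = 60\ \text{Hz} > 2 f_{\text{lick}} = 13\ \text{Hz}$$

      The frame rate thus comfortably exceeds the Nyquist rate for the lick rhythm; whether each individual lick is resolved in practice also depends on the exposure time and the accuracy of the DeepLabCut tracking.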

      Other Minor Points:

      (5) Ensure consistency in the citation format; both Vander Weele et al. 2018 and Weele et al. 2019, share the same first author.

      Thank you for pointing this out. Endnote processes the first author’s name differently depending on the journal. We fixed the error manually. The first paper (2018) is an original research paper, and the second one (2019) is a review about how dopamine modulates aversive processing in the mPFC. We cited the second one in three instances where we mentioned review papers.

      (6) The distinction between "dashed vs dotted lines" in Figure 3K and 3M appears to be very confusing. Please consider providing a clearer visualization/labeling to mitigate this confusion.

      We have now changed the line styles.

      (7) Additionally plotting mean polar angles of aversive/appetitive axons as vectors in the Cartesian scatter plots (2J, 3I,J) would make interpretation easier.

      We have now made this change to Figures 2, 3, 4.
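
      The reviewer's suggestion can be sketched in Python as follows, assuming each axon's polar angle is measured from the reward axis of the reward/aversive scatter plot and the mean direction is taken as a circular mean of unit vectors. Both choices and the function names are hypothetical, not necessarily how the revised figures were made; a circular mean avoids wrap-around problems when some responses are negative and angles span more than one quadrant.

```python
import math
from typing import List, Tuple


def polar_angle_deg(reward: float, aversive: float) -> float:
    """Angle of an axon's (reward, aversive) point from the reward axis.

    0 deg: reward only; 45 deg: on the unity line; 90 deg: aversive only.
    """
    return math.degrees(math.atan2(aversive, reward))


def mean_direction(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Circular mean of the axons' polar angles (degrees) and the mean
    resultant length (0 to 1), computed by averaging unit vectors."""
    angles = [math.atan2(a, r) for r, a in points]
    mx = sum(math.cos(t) for t in angles) / len(angles)
    my = sum(math.sin(t) for t in angles) / len(angles)
    return math.degrees(math.atan2(my, mx)), math.hypot(mx, my)


# Toy example: one reward-only and one aversive-only axon average to 45 deg.
angle, length = mean_direction([(1.0, 0.0), (0.0, 1.0)])
print(round(angle, 1), round(length, 3))  # 45.0 0.707
```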

      (8) Data and codes should be shared in a public database. This is important for reproducible research and we believe that "available from the corresponding author upon reasonable request" is outdated language.

      We have uploaded the data to GitHub, https://github.com/pharmedku/2024-elife-da-axon.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors do not show which mouse each axon's data come from, making it hard to know whether differences arise from inter-mouse differences or from differences among axons. The best way to address this point is to show plots similar to Figure 2J & K broken down by mouse, to show whether each mouse had evidence of these two clusters.

      We have now made this change to Figure 2-figure supplement 3.

      (2) Line 166: Should this sentence point to panels 2F, G, H rather than 2I which doesn't show a shock response?

      We thank the reviewer for pointing this out. We have fixed the incorrect labels.

      (3) Line 195: The population-level bias toward aversive stimuli was shown previously using photometry, so it is not justified to say "for the first time" regarding this statement.

      We have adjusted this sentence so that the claim of “for the first time” is not associated with the population-level bias.

      (4) The paper lacks a discussion of the potential role that novelty plays in the amplitude of the responses, given that tail shocks occur less often than rewards. Is the amplitude of the first reward of the day larger than that of subsequent rewards? Would tail shock responses decay if they occurred in sequential trials?

      Following the reviewer's suggestion, we conducted a comparison of individual axonal responses to both conditioned and unconditioned stimuli across the first trial and subsequent trials. Our findings reveal a notable trend: aversive-preferring axons exhibited attenuation in response to CSreward, yet enhancement in response to CSaversive. Conversely, the response of these axons to USreward was attenuated, with no significant change observed for USaversive. In contrast, reward-preferring axons displayed an invariable activity pattern from the initial trial, highlighting the functional diversity present within dopamine axons. This analysis has been integrated into Figure 3-figure supplement 4 and is elaborated upon in the Discussion section.

      (5) Fix typo in Figure 1 - supplement 1. Shift

      We have now corrected this. Thank you.

      (6) The methods section needs information about trial numbers. Please indicate how many trials were presented to each mouse per day.

      We have now added the information about trial numbers to the Methods section.

      Reviewer #3 (Recommendations For The Authors):

      In line with the public review, my recommendation is for the authors to remain as objective about their data as possible. There are many points in the manuscript where the authors seem to directly contradict their own data. For example, they first detail that dopamine axons respond to unexpected water rewards. Indeed, they find that 40% of dopamine axons respond in this way. Then, a few paragraphs later, they state: "Previous studies have demonstrated that the overall dopamine release at the mPFC or the summed activity of mPFC dopamine axons exhibits a strong response to aversive stimuli (e.g., tail shock), but little to rewards". As detailed above, I do not think these data support the idea that dopamine axons in mPFC preferentially encode aversive outcomes. To examine a role for mPFC in preferential encoding of aversive stimuli, the authors would first have to equate the outcomes by magnitude and then compare how the axons acquire preferences across time. Alternatively, the more general process that I detail above would predict that giving mice two rewards that differ in magnitude (e.g., lots of food vs. a small amount of water) would produce the same results that the authors have seen here (i.e., a preference for the food, which is the larger and more salient outcome). Without other tests of how dopamine axons in mPFC respond in situations like this, I don't think any conclusion about mPFC favoring aversive stimuli can be made.

      As suggested, we have made the current manuscript as objective as possible, removing interpretations of what dopamine axons encode and emphasizing their functional diversity. In particular, we have removed the word ‘encode’ when describing the responses of dopamine axons.

      Although it may have appeared unclear, there was no contradiction within our data regarding the response to reward and aversive stimuli. We have now improved the readability of the Results and Methods sections. Concerning the interpretation of what exactly the mPFC dopamine axons encode, we have rewritten the discussion to be as objective about our data as possible, as suggested. We also have edited our title and abstract accordingly. Meanwhile, we wish to emphasize that our reward and aversive stimuli are standard paradigms commonly used in the field. We believe, and all the reviewers agreed, that reporting the diversity of dopamine axonal responses with a novel imaging design constitutes new insight for the neuroscience community. Therefore, we have decided to leave the introduction of new behavioral tasks for future studies and instead expanded our discussion.

      As mentioned, I think the experiments are executed really well and the technological aspects of the authors' methods are impressive. However, there are also some aspects of the data presentation that would be improved. Some of the graphs took a considerable amount of effort to unpack. For example, Figure 4 is hard going. Is there a way to better illustrate the main points that this figure wants to convey? Some of this might be helped by a more complete description in the figure captions about what the data are showing. It would also be great to see how the response of dopamine axons changes across trial within a session to the shock and water-predictive cues. Supp Figure 1 should be in the main text with standard error and analyses across time. Clarifying these aspects of the data would make the paper more relevant and accessible to the field.

      We thank the reviewer for pointing out that the legend of Figure 4 was incomplete. We have fixed it and improved the presentation of the figure. We have also prepared a new figure (Figure 3–figure supplement 4) comparing CSaversive and CSreward signals between the first trial and the remaining trials of each daily session, which reveals further functional diversity in dopamine axons. We have decided to keep Figure 1–figure supplement 2 as a figure supplement, with an additional analysis, because another reviewer pointed out that the design is not completely new. Furthermore, as eLife readers can easily access figure supplements, we believe it is appropriate to keep it in this form.

      Minor points:

      (1) What is the control period for the omission test? Was omission conducted for the shock?

      The control period for reward omission is a 2-second period just before the CS onset. We did not include shock omission, because a sufficient number of trials (> 6 trials) for the rare omission condition could not be achieved within a single day.

      (2) The authors should mention how similar the tones were that predicted water and shock.

      According to de Hoz and Nelken (2014), a frequency difference of 4–7% is enough for mice to discriminate between tones. In addition, anticipatory licking and running confirmed that the mice could discriminate between the frequencies. We have now included this information in the Discussion.

      (3) I realize the viral approach used in the current studies may not reveal where in the VTA the mPFC-projecting dopamine neurons are located. Are there data in the literature that speak to this? This is particularly important as we now know that there is considerable heterogeneity in dopamine neuronal responses, which is often captured by differences in medial/lateral position within the VTA.

      Some studies have suggested that mesocortical dopamine neurons are located in the medial posterior VTA (e.g., Lammel et al., 2008). However, with anterograde tracing in mice, the injection of conventional viruses or tracers cannot be spatially confined. We now refer to Lammel et al., 2008 in the Introduction.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      HP1 plays a pivotal role in orchestrating chromatin packaging through the formation of biomolecular condensates. The existence of distinct homologs offers an intriguing avenue for investigating the interplay between genetic sequence and condensate formation. In this study, the authors conducted extensive coarse-grained simulations to examine the phase separation behavior of HP1 paralogs. The authors also explored the intriguing possibility that different HP1 paralogs co-localize within multicomponent assemblies. Importantly, the study also examined the critical role of DNA in fine-tuning this complex process.

      Strengths:

      I applaud the authors for their methodical approach in conducting simulations aimed at dissecting the contributions of hinges, CTE, NTE, and folded regions. The comprehensive insights unveiled in Figure 3 compellingly substantiate the significance of these protein components in facilitating the process of phase separation.

      This systematic exploration has yielded several innovative revelations. Notably, the authors uncovered a nuanced interplay between the folded and disordered domains. Although disordered regions have traditionally been linked to driving phase separation through their capacity for forming multivalent interactions, the authors have demonstrated that the contribution of the CD cannot be overlooked, as it significantly impacts the saturation concentration.

      The outcomes of this study serve to elucidate the intricate mechanisms and regulatory aspects governing HP1 LLPS.

      Weaknesses:

      The authors do not provide an assessment of the quantitative precision of their model. To illustrate, HP1a is anticipated to undergo phase separation primarily under low salt concentrations. Does the model effectively capture this sensitivity to salt conditions? Regrettably, the specific salt conditions employed in the simulations are not explicitly stated. While I anticipate that numerous findings in the manuscript remain valid, it could be beneficial to acknowledge potential limitations tied to the simulations. For instance, might the absence of quantitative precision impact certain predictions, such as the CD's influence on phase separation?

      We thank the reviewer for their kind feedback and for highlighting the essential mechanistic insights obtained from our study. We have addressed the concerns raised by the reviewer below, and the specific amendments made in the manuscript are also delineated.

      We appreciate the reviewer's comment on our model. Our coarse-grained (CG) physics-based model integrates electrostatic and short-range interactions, with the latter parametrized using the Urry hydrophobicity scale. This approach bridges the timescale gap between simulation and experiment, offering a transferable framework to compute protein phase diagrams in temperature-concentration space that can be compared with experimental phase behavior (1). Additionally, the correlation between the per-residue vdW contact probabilities from AA and CG simulations (Fig. S1 f-h) underscores our model’s capability to uncover mechanistic insights into the phase separation of HP1 paralogs. Despite its simplicity and widespread adoption for studying sequence-dependent phase separation in biomolecular condensates, we recognize that our CG model does not yet fully replicate experimental observations or the nuanced effects of local secondary structure on phase-separation propensity. We are actively refining our methods and exploring new strategies to enhance the accuracy and efficiency of CG models for the study of biological phase separation.

      In assessing the influence of salt on the LLPS of HP1α, we note that Wang et al. (2) demonstrated that HP1α can undergo LLPS at a low salt concentration (50 mM KCl). Furthermore, Wohl et al. (3) showed that the CG HPS (Kapcha-Rossky) model can capture salt-dependent LLPS behavior through electrostatic screening in HP1a, a Drosophila homolog of human HP1α. In our CG model, the salt concentration enters through the Debye-Hückel term via a tunable screening length, which allows salt-dependent effects to be simulated in the low-salt regime. We have added Figure S5 to illustrate the influence of salt on the LLPS propensity of HP1α. In the low-salt regime (50 mM), the Csat of HP1α was reduced twofold compared with that at 100 mM. Increasing the salt concentration to 150 mM raised the Csat and began to destabilize the condensate. In the high-salt regime (200-500 mM), HP1α did not undergo phase separation, consistent with experimental observations (2, 4–6).
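      To make the link between salt concentration and screening length concrete, the sketch below evaluates the textbook Debye-Hückel relation for a 1:1 salt. It assumes a uniform, temperature-independent dielectric constant of 80 and T = 320 K (the temperature quoted in the figure caption); the function names are hypothetical, and this is an illustration of the physics rather than the authors' simulation code.

```python
import math

# SI constants (CODATA 2018; KB, E and NA are exact, EPS0 is the recommended value)
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
KB = 1.380649e-23        # Boltzmann constant, J/K
E = 1.602176634e-19      # elementary charge, C
NA = 6.02214076e23       # Avogadro constant, 1/mol


def debye_length_angstrom(salt_molar, temperature_k=320.0, eps_r=80.0):
    """Debye screening length in Å for a 1:1 salt (ionic strength = salt concentration).

    eps_r = 80 is an assumed, temperature-independent dielectric constant of water.
    """
    ionic_strength = salt_molar * 1000.0  # mol/L -> mol/m^3
    lam_m = math.sqrt(eps_r * EPS0 * KB * temperature_k / (2.0 * NA * E**2 * ionic_strength))
    return lam_m * 1e10  # m -> Å


def screened_coulomb_kj_mol(qi, qj, r_angstrom, debye_angstrom, eps_r=80.0):
    """Debye-Hückel pair energy (kJ/mol) between charges qi, qj (units of e) at distance r (Å)."""
    r_m = r_angstrom * 1e-10
    coulomb_j = qi * qj * E**2 / (4.0 * math.pi * EPS0 * eps_r * r_m)
    return coulomb_j * math.exp(-r_angstrom / debye_angstrom) * NA / 1000.0


if __name__ == "__main__":
    # Attraction between a +1/-1 pair 6 Å apart weakens as salt shortens the screening length.
    for salt_mm in (50, 100, 150, 200, 500):
        lam = debye_length_angstrom(salt_mm / 1000.0)
        u = screened_coulomb_kj_mol(+1, -1, 6.0, lam)
        print(f"{salt_mm:>3} mM: screening length {lam:5.2f} Å, pair energy at 6 Å {u:6.2f} kJ/mol")
```

      Under these assumptions the screening length is roughly 10 Å at 100 mM and scales as one over the square root of the ionic strength, so halving the salt to 50 mM lengthens it by a factor of √2, which is consistent in direction with the stronger electrostatically driven LLPS reported at low salt.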

      Author response image 1.

      Salt-dependent effects on the LLPS of the HP1α homodimer. (a, b) Density profiles and snapshots of HP1α homodimer simulations with box dimensions of 170x170x1190 Å³ at salt concentrations of 50, 100, 150, 200, 250, and 500 mM. The simulations were conducted at 320 K using the HPS-Urry model.
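      The response does not state how Csat is read off these density profiles. A common approach for slab simulations, assumed here, is to centre the slab in the periodic box and average the density in bins far from it. The sketch below does this; the `dilute_offset` cutoff is a hypothetical parameter that would be chosen by inspecting the profile.

```python
import numpy as np


def dilute_phase_concentration(z, density, dilute_offset):
    """Estimate C_sat as the mean density far from the slab in a periodic z-profile.

    z             : uniformly spaced bin centres (Å) spanning one box length
    density       : mean concentration per bin (e.g. mg/mL)
    dilute_offset : bins farther than this (Å) from the slab centre count as dilute
                    phase (hypothetical cutoff chosen by inspecting the profile)
    """
    z = np.asarray(z, dtype=float)
    density = np.asarray(density, dtype=float)
    n = z.size
    dz = z[1] - z[0]
    box = dz * n
    # Density-weighted circular mean locates the slab centre even when the
    # slab straddles the periodic boundary.
    theta = 2.0 * np.pi * (z - z[0]) / box
    angle = np.arctan2(np.sum(density * np.sin(theta)), np.sum(density * np.cos(theta)))
    centre = z[0] + (angle % (2.0 * np.pi)) * box / (2.0 * np.pi)
    # Shift the profile so the slab sits in the middle of the box.
    shift = int(round((z[n // 2] - centre) / dz))
    centred = np.roll(density, shift)
    dilute = np.abs(z - z[n // 2]) > dilute_offset
    return float(centred[dilute].mean())
```

      Because the estimate is an average over the dilute bins only, it is sensitive to how much dilute volume the box provides, which is why box-size checks such as those described later in this response matter for Csat comparisons.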

      However, the primary objectives of our study are to elucidate the molecular interactions and to delineate the domain contributions that dictate the distinct phase-separation behaviors of the HP1 paralogs. To this end, we standardized our simulation conditions to a physiological salt concentration of 100 mM for all paralog constructs, enabling direct comparison and physiologically relevant predictions, including those for the CD. To improve clarity, we have added the salt concentration used in the CG simulations to the Materials and Methods section and the relevant figure captions, and we have added the following sentences to the third paragraph of the Discussion section.

      “…Our CG simulations corroborate these experimental observations, indicating that a low salt concentration (50 mM) promotes the LLPS of HP1α. Raising the salt concentration weakens the electrostatic interactions and increases the Csat, eventually precluding HP1α’s phase separation at high salt regimes (200-500 mM) (Fig. S5).”

      Reviewer #2 (Public Review):

      In this paper, Phan et al. investigate the properties of human HP1 paralogs, their interactions, and their ability to undergo liquid-liquid phase separation. For this, they use a coarse-grained computational approach (validated with additional all-atom simulations) that allows them to explore complex mixtures. Matching (wet-lab) experimental results, HP1 beta (HP1b) exhibits different properties from HP1 alpha and gamma (HP1a,g), in that it does not phase separate. Using domain switch experiments, the authors determine that the more negatively charged hinge in HP1b, compared to HP1a and HP1g, is mainly responsible for this effect. Exploring heterotypic complexes and mixtures of HP1 subtypes with DNA, the authors further show that HP1a can serve as a scaffold for HP1b to enter condensed phases and that DNA can further stabilize phase-separated compartments. Most interestingly, they show that a multicomponent mixture containing DNA, HP1a, and HP1b generates spatial separation between the HP1 paralogs: due to the increased negative charge of DNA within the condensates, HP1b is pushed out and accumulates at the phase boundary. This represents an example of how complex assemblies could form in the cell.

      Overall, this is purely computational work, which however builds on extensive experimental results (including from the authors). The methods showcase how coarse-grained models can be employed to generate and test hypotheses about how proteins condense. Applied to HP1 proteins, the results from this tour-de-force study are consistent and convincing, within the experimental constraints. Moreover, they generate further models to test experimentally, in particular in light of multicomponent mixtures.

      There are, of course, some limitations to these models.

      First, the CG models employed probably will not be able to pick up more complex structure-driven interactions (i.e. specific binding of a peptide in a protein cleft, including defined H-bonds, or induced structural elements). Some of those interactions (i.e. beyond charge-charge or hydrophobics) may also play a role in HP1, and might be ignored here. There is also the question of specificity, i.e. how can diverse phases coexist in cells, when the only parameters are charge and hydrophobicity? Does the arrangement of charges in the NTD, hinges and CTDs matter or are only the average properties important?

      We thank the reviewer for the thoughtful comments. We also appreciate the opportunity to incorporate the feedback on the reviewer’s concerns below.

      We agree that the interaction picture is more complex and that many interaction modes may be involved in phase coexistence in the cellular environment. However, because of the system sizes and the required sampling, studying LLPS at atomistic resolution remains challenging with current state-of-the-art computer hardware. Our approach employs the CG model to reduce computational cost while still capturing the predominant interactions at the residue level. We have added plots (Fig. S1 f-h) showing the correlation of the per-residue vdW contact probability for each paralog between AA and CG simulations. The Pearson correlation coefficient is approximately 0.86, suggesting a strong positive linear correlation in contact propensity between AA and CG simulations.

      Author response image 2.

      Our sequence analysis reveals a high fraction of charged residues in HP1 paralogs, with Arg, Lys, Glu, and Asp constituting 39-45% of the total amino acid count in the sequence. This property may explain why electrostatic interactions predominantly govern the phase-separation behaviors of HP1 paralogs. Our findings on electrostatically driven phase separation and co-localization of HP1 paralogs are consistent with experimental observations by Larson et al. and Keenen et al. (5, 6). Significantly, we observe that the charge patterning in the disordered regions (NTE, hinge, and CTE) plays a critical role in the LLPS of HP1 paralogs, as articulated in the second paragraph of the Discussion section. Modifying this charge patterning, for example by phosphorylating serine residues in HP1α, excising the HP1α CTE, or substituting four acidic residues with basic ones in the HP1β hinge, can profoundly augment the LLPS of these proteins (4, 5, 7). Our in silico molecular details, complemented by in vitro observations, lay a solid foundation for future experiments. These future investigations may delve deeper into the specificity of interactions and the role of structural elements in modulating HP1 phase separation.

      Second, the authors fix CSD-CSD dimers, whereas these interactions are expected to be quite dynamic. In the particular example of HP1 proteins, having dimerization equilibria may change the behavior of complex mixtures significantly, e.g. in view of the proposed accumulation of HP1b at a phase boundary. This point would warrant more discussion in the paper. Moreover, the biological plausibility of such a behavior would be interesting. Is there any experimental data supporting such assemblies?

      We appreciate the reviewer's insightful comment regarding the dynamic nature of CSD-CSD interactions in HP1 proteins. Our assumption of fixed CSD-CSD dimers is grounded in the reported dissociation constant (Kd) values for HP1α and HP1β, which are in the nanomolar range, indicative of strong dimerization affinity (4, 8). While precise Kd values for HP1γ are not available, a study has demonstrated that HP1γ dimerization is crucial for its interaction with chromatin, suggesting a dimerization tendency as strong as that of its paralogs (9, 10). Furthermore, evidence from the literature underscores the dimeric functionality of HP1 paralogs mediated by their chromoshadow domains (CSDs), which are instrumental in forming stable genomic domains and in engaging in crucial interactions within chromatin architecture (5, 6, 11).

      However, we acknowledge that despite the strong dimerization affinity, the CSD-CSD interactions exhibit dynamics, which may influence the behavior of complex mixtures, particularly at phase boundaries. A study by Nielsen et al. (12) shows that mammalian HP1 paralogs can interact directly with one another to form heterodimers. Moreover, the CSD-CSD interface has been shown to act as a hub for transient interactions with diverse binding partner proteins (5, 13). These experimental observations reflect the dynamic nature of CSD-CSD interactions. However, due to the computational constraints and the focus of our study, a simplified static model was employed to gain initial insights into the phase separation behaviors of HP1 paralogs. We believe that the dynamic nature of CSD-CSD interactions and its implications for phase behavior in complex mixtures form an exciting avenue for future computational and experimental studies.

      In light of the reviewer’s comment, we have expanded our discussion in the 6th paragraph of the Discussion section:

      “... It is important to emphasize that our model is predicated on the assumption that HP1 proteins establish stable chromoshadow domain (CSD-CSD) dimers, a hypothesis supported by their Kd values being in the nanomolar range (13, 53). While this simplification serves as a useful starting point, it may not fully capture the dynamic nature of HP1 dimerization. Further computational and experimental studies are needed to better understand the behavior of complex mixtures of HP1 paralogs, particularly at phase boundaries.”

      References: 1) R. M. Regy, J. Thompson, Y. C. Kim, J. Mittal, Improved coarse‐grained model for studying sequence dependent phase separation of disordered proteins. Protein Sci., doi: 10.1002/pro.4094 (2021).

      2) L. Wang, Y. Gao, X. Zheng, C. Liu, S. Dong, R. Li, G. Zhang, Y. Wei, H. Qu, Y. Li, C. D. Allis, G. Li, H. Li, P. Li, Histone Modifications Regulate Chromatin Compartmentalization by Contributing to a Phase Separation Mechanism. Mol. Cell 76, 646-659.e6 (2019).

      3) S. Wohl, M. Jakubowski, W. Zheng, Salt-Dependent Conformational Changes of Intrinsically Disordered Proteins. J. Phys. Chem. Lett. 12, 6684–6691 (2021).

      4) C. Her, T. M. Phan, N. Jovic, U. Kapoor, B. E. Ackermann, A. Rizuan, Y. C. Kim, J. Mittal, G. T. Debelouchina, Molecular interactions underlying the phase separation of HP1α: role of phosphorylation, ligand and nucleic acid binding. Nucleic Acids Res., gkac1194 (2022).

      5) A. G. Larson, D. Elnatan, M. M. Keenen, M. J. Trnka, J. B. Johnston, A. L. Burlingame, D. A. Agard, S. Redding, G. J. Narlikar, Liquid droplet formation by HP1α suggests a role for phase separation in heterochromatin. Nature 547, 236–240 (2017).

      6) M. M. Keenen, D. Brown, L. D. Brennan, R. Renger, H. Khoo, C. R. Carlson, B. Huang, S. W. Grill, G. J. Narlikar, S. Redding, HP1 proteins compact DNA into mechanically and positionally stable phase separated domains. eLife 10, 1–38 (2021).

      7) W. Qin, A. Stengl, E. Ugur, S. Leidescher, J. Ryan, M. C. Cardoso, H. Leonhardt, HP1β carries an acidic linker domain and requires H3K9me3 for phase separation. Nucleus 12, 44–57 (2021).

      8) S. V. Brasher, The structure of mouse HP1 suggests a unique mode of single peptide recognition by the shadow chromo domain dimer. EMBO J. 19, 1587–1597 (2000).

      9) X. Li, S. Wang, Y. Xie, H. Jiang, J. Guo, Y. Wang, Z. Peng, M. Hu, M. Wang, J. Wang, Q. Li, Y. Wang, Z. Liu, Deacetylation induced nuclear condensation of HP1γ promotes multiple myeloma drug resistance. Nat. Commun. 14, 1290 (2023).

      10) Y. Mishima, C. D. Jayasinghe, K. Lu, J. Otani, M. Shirakawa, T. Kawakami, H. Kimura, H. Hojo, P. Carlton, S. Tajima, I. Suetake, Nucleosome compaction facilitates HP1γ binding to methylated H3K9. Nucleic Acids Res. 43, 10200–10212 (2015).

      11) D. O. Trembecka-Lucas, J. W. Dobrucki, A heterochromatin protein 1 (HP1) dimer and a proliferating cell nuclear antigen (PCNA) protein interact in vivo and are parts of a multiprotein complex involved in DNA replication and DNA repair. Cell Cycle 11, 2170–2175 (2012).

      12) A. L. Nielsen, M. Oulad-Abdelghani, J. A. Ortiz, E. Remboutsika, P. Chambon, R. Losson, Heterochromatin formation in mammalian cells: Interaction between histones and HP1 Proteins. Mol. Cell 7, 729–739 (2001).

      13) A. Thiru, D. Nietlispach, H. R. Mott, M. Okuwaki, D. Lyon, P. R. Nielsen, M. Hirshberg, A. Verreault, N. V. Murzina, E. D. Laue, Structural basis of HP1/PXVXL motif peptide interactions and HP1 localisation to heterochromatin. EMBO J. 23, 489–499 (2004).

      14) P. Yu Chew, J. A. Joseph, R. Collepardo-Guevara, A. Reinhardt, Thermodynamic origins of two-component multiphase condensates of proteins. Chem. Sci. 14, 1820–1836 (2023).

      Recommendations for the authors:

      In this important work, the authors apply a residue-resolution protein coarse-grained model to investigate the differences in molecule dimensions and phase behaviour of three HP1 paralogs, HP1 paralog mixtures, and HP1/DNA mixtures. The simulations are well designed to investigate the impact of HP1 sequence on its phase behaviour. The work reveals that electrostatic interactions are a key determinant of HP1 paralog phase behaviour; hence advancing our understanding of the molecular mechanisms driving the phase separation behaviour of HP1 paralogs. Notably, the authors uncovered a nuanced interplay between the folded and disordered domains of HP1. Although disordered regions have traditionally been linked to driving phase separation through their capacity for forming multivalent interactions, the authors demonstrate that the contribution of the CD cannot be overlooked, as it significantly impacts the saturation concentration.

      Essential revisions (based on reviewers assessment below):

      1) The manuscript describes the results of both single-molecule simulations and direct coexistence simulations. However, it is not very easy for the reader to determine which types simulations were performed in each section. The details on the simulations input parameters are also missing. Such details are needed throughout, i.e. to allow readers to follow the work and its implications. For instance, the specific salt conditions employed in the simulations are not explicitly stated. Since HP1 charge is presented as a key regulator for the modulation of HP1 paralogs radii of gyration and their phase behaviour, it is crucial for the authors to explicitly describe the salt concentration used for the different simulations and highlight how the relative differences observed are expected to change as the salt concentration decreases/increases.

      We have turned the first sentences of the relevant paragraphs into subtitles that distinguish the results for single homodimers in the dilute phase from those for multiple dimers in phase coexistence simulations.

      “Sequence variation affects the conformations of HP1 paralogs in the dilute phase.”

      “Sequence variation in HP1 paralogs leads to their distinct phase separation behaviors.”

      To improve the clarity, we have also added the following sentence to Fig. 2 caption.

      “… Figs. 2a-e show the results obtained under dilute conditions, while Figs. 2f-m illustrate the conditions of phase coexistence.”

      We have specified the salt concentration used in the CG simulations in the Materials and Methods section and the relevant figure captions to improve clarity. We also addressed the reviewer’s comment on salt concentration in the public review above.

      2) Since direct coexistence simulations suffer from important finite-size effects, especially for multi-component mixtures as those investigated here, describing how many proteins/DNA copies were used per system, the size of the simulation, and which checks were done to check for finite-size effects is important. Regarding this point, estimating C_sat from Direct Coexistence simulations is extremely challenging, given the sensitivity of the dilute phase concentration to the box dimensions. Hence, it would be valuable if the authors clarify that the differences on C_sat provided represent a qualitative comparison and are sensitive to the simulation conditions. Importantly, the observation of spatial segregation of components in multi-component condensates could be an artefact of the box dimensions, relative copies of the various components, and overall system density.

      We appreciate the reviewer’s concern regarding the finite-size effects in phase coexistence simulations and potential artifacts arising from box dimensions and system composition. In response to this, we have expanded the Materials and Methods section to elaborate on the specific checks to examine the finite-size effects. The new texts and additional SI figures are shown below.

      “Previous studies have demonstrated that slab geometry can help mitigate finite-size effects and facilitate efficient sampling of the phase diagram (41). To assess the potential impact of finite-size effects with our chosen box dimensions, we conducted a test using the HP1α homodimer, which serves as a representative system given the comparable sequence lengths of HP1 paralogs and their chimeras. By reducing the system size by 30% and constructing its phase diagram, we observed that both the original system size (50 dimers) and the reduced counterpart (35 dimers) produced similar phase diagrams, with critical temperatures of 353.3 K and 352.1 K, respectively, as shown in Figs. S4a,b.

      We further evaluated the influence of the xy cross-sectional area on the measurement of Csat. With the z-direction box length fixed at 1190 Å, we varied the xy cross-sectional area (120x120, 150x150, and 200x200 Å²) while keeping the protein density consistent with the control case (170x170 Å²). Given that HP1 dimers are multidomain proteins, a 120x120 Å² cross-section was the minimum size feasible to prevent particle overlap in HOOMD simulations due to the constraints of the small box size. Our findings indicate that the condensates remained stable across all tested cross-sectional areas and that there were no significant differences in Csat measurements within the margin of error, as depicted in Figs. S4c,d. These results confirm that our chosen box size is sufficiently large to minimize finite-size effects, thus ensuring the robustness of our results.”
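      The response reports critical temperatures (353.3 K and 352.1 K) for the two system sizes but not the fitting procedure. A common choice in slab simulations, assumed here, is to fit the coexistence densities to the critical scaling law with the 3D Ising exponent β = 0.325 and to obtain the critical density from the law of rectilinear diameters. The sketch below linearizes the scaling law so that an ordinary straight-line fit gives T_c; it does not reproduce the authors' numbers.

```python
import numpy as np

BETA = 0.325  # 3D Ising critical exponent (assumed)


def critical_temperature(temps, rho_dense, rho_dilute, beta=BETA):
    """Fit rho_H - rho_L = A * (1 - T/T_c)**beta and return T_c.

    Raising both sides to 1/beta makes the relation linear in T:
        (rho_H - rho_L)**(1/beta) = A**(1/beta) - (A**(1/beta) / T_c) * T
    so T_c = -intercept / slope of a straight-line fit.
    """
    temps = np.asarray(temps, dtype=float)
    gap = np.asarray(rho_dense, dtype=float) - np.asarray(rho_dilute, dtype=float)
    slope, intercept = np.polyfit(temps, gap ** (1.0 / beta), 1)
    return -intercept / slope


def critical_density(temps, rho_dense, rho_dilute, t_c):
    """Law of rectilinear diameters: (rho_H + rho_L) / 2 = rho_c + s * (T_c - T).

    The intercept of a straight-line fit against (T_c - T) is rho_c.
    """
    temps = np.asarray(temps, dtype=float)
    mean = (np.asarray(rho_dense, dtype=float) + np.asarray(rho_dilute, dtype=float)) / 2.0
    _, rho_c = np.polyfit(t_c - temps, mean, 1)
    return rho_c
```

      Comparing T_c obtained this way for a full and a reduced system is one simple finite-size check: if the two values agree closely, as reported above, the phase diagram is not strongly affected by the number of molecules.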

      Author response image 3.

      Finite-size analysis. (a) Phase diagrams for the HP1α homodimer (50 dimers) and for a system reduced in size by 30% (35 dimers), with critical temperatures of 353.3 K and 352.1 K, respectively. (b) Density profiles of HP1α and its reduced-size counterpart at various temperatures. (c, d) Density profiles and snapshots of HP1α homodimer simulations with box dimensions of 170x170x1190 Å³ and of systems with the z-direction length fixed at 1190 Å and varying cross-sectional areas: 120x120, 150x150, and 200x200 Å². The black dashed line shows the simulated saturation concentration of the wild-type HP1α homodimer in the 170x170x1190 Å³ box. The simulations were conducted at 320 K and 100 mM salt concentration. Error bars represent the standard deviation from triplicate simulation sets.
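      Keeping the protein density equal to the control while only the xy cross-section changes is simple arithmetic: with the z length fixed, the number of dimers must scale with the xy area. The sketch below assumes the control values from the text (50 dimers in a 170x170x1190 Å³ box) and rounds to whole dimers; the dimer counts actually used for the other cross-sections are not stated in the response, so these numbers are illustrative only.

```python
CONTROL_SIDE = 170.0  # Å, control cross-section 170 x 170 Å^2 (from the text)
CONTROL_DIMERS = 50   # HP1α dimers in the control system (from the text)
BOX_Z = 1190.0        # Å, z length held fixed (from the text)


def dimers_for_cross_section(side, control_side=CONTROL_SIDE, control_dimers=CONTROL_DIMERS):
    """Whole number of dimers keeping N / (side * side * BOX_Z) equal to the control.

    Because BOX_Z is fixed, constant density means N scales with the xy area.
    """
    return round(control_dimers * (side / control_side) ** 2)


def number_density(n_dimers, side, box_z=BOX_Z):
    """Dimers per Å^3 in a side x side x box_z box."""
    return n_dimers / (side * side * box_z)


if __name__ == "__main__":
    reference = number_density(CONTROL_DIMERS, CONTROL_SIDE)
    for side in (120.0, 150.0, 200.0):
        n = dimers_for_cross_section(side)
        ratio = number_density(n, side) / reference
        print(f"{side:.0f}x{side:.0f} Å^2: {n} dimers, density ratio to control {ratio:.3f}")
```

      Rounding to whole dimers leaves the density within a fraction of a percent of the control for these cross-sections, so any change in Csat across the tested areas would reflect box geometry rather than a change in overall protein concentration.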

      In response to the observed spatial segregation in our multi-component condensates, we have carefully considered finite-size effects and are confident that the segregation reflects genuine phase behavior rather than an artifact of simulation parameters. This interpretation is supported by findings from Chew et al. (14), who observed similar multilayered condensates and conducted thorough validations to verify these phases. To clarify our approach, we have included additional details in the Materials and Methods section of our manuscript.

      “... By selecting a box size that minimizes finite-size effects, we can ensure that the spatial segregation observed in our multi-component condensates reflects genuine phase behavior. This finding aligns with Chew et al. (66), who also reported well-separated multilayered condensates and conducted thorough validations to confirm these phases.”

      3) The authors should provide a clearer assessment of the quantitative precision of their model. For instance, the authors use all-atom simulations to compare with CG interaction maps. The all-atom maps are sparser due to less sampling, but the authors state that the maps are 'in good agreement'. How do the authors judge this? The issue of model validation is very important: to illustrate, HP1a is anticipated to undergo phase separation primarily under low salt concentrations. Does the model effectively capture this sensitivity to salt conditions? While numerous findings in the manuscript likely remain valid, it could be beneficial to acknowledge potential limitations tied to the simulations. For instance, might the absence of quantitative precision impact certain predictions, such as the CD's influence on phase separation?

      The CG models employed do not consider the specific binding of a peptide in a protein cleft, including defined H-bonds, or induced structural elements. Thus, the authors should discuss whether specific interactions (i.e. beyond charge-charge or hydrophobics) may also play a role in the phase behaviour of HP1, and why it makes sense to ignore them in this study. If the only important parameters are charge and hydrophobicity, how can diverse phases coexist in cells? Does the arrangement of charges in the NTD, hinges and CTDs matter or are only the average properties important?

      This is similar to the point made by Reviewer 2 in the Public Review. We have addressed these questions in the public review and incorporated new plots (Fig. S1 f-h) in the SI.

      4) The authors fix CSD-CSD dimers, whereas these interactions are expected to be quite dynamic. In the particular example of HP1 proteins, having dimerization equilibria may change the behaviour of complex mixtures significantly, e.g. in view of the proposed accumulation of HP1b at a phase boundary. This point warrants more discussion in the paper.

      We have addressed the comment in the public review and extended the discussion in the Discussion section.

      Reviewer #2 (Recommendations For The Authors):

      The authors use all-atom simulations to validate their CG model. In Figure S1, they compare interaction maps. Of course, the AA maps are sparser due to less sampling, but the authors state that the maps are 'in good agreement'. How do the authors judge this (they do not look very similar to me, e.g. the NTD-hinge interactions are mostly lacking)?

      This is similar to Reviewer 1’s concern. We agree that the sampling of the AA simulations (5 μs) is moderately limited because of the large size of the HP1 proteins (~400 residues per dimer). However, the expansion trends in the average dimensions of the HP1 paralogs agree with those from the CG simulations (Fig. S1 a,b). We also agree that the AA contact maps are relatively sparse, which makes direct comparison with the CG maps difficult. We have therefore added new plots (Fig. S1 f-h) showing the correlation of the per-residue vdW contact probability for each paralog between AA and CG simulations. The Pearson correlation coefficients are approximately 0.86, suggesting a strong positive linear correlation in contact propensity between AA and CG simulations.
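      The agreement metric described here can be sketched as follows: compute, for each residue, the fraction of frames in which it is in contact with another residue, do this for both trajectories, and take the Pearson correlation between the two per-residue profiles. This is a simplified illustration under hypothetical choices (residue-level coordinates, a 6 Å cutoff, a minimum sequence separation of 3); the AA analysis in the response uses vdW contacts between atoms, which this sketch does not reproduce.

```python
import numpy as np


def per_residue_contact_probability(coords, cutoff=6.0, min_sep=3):
    """Fraction of frames in which each residue contacts any residue >= min_sep away.

    coords : array of shape (n_frames, n_residues, 3), residue positions in Å.
    cutoff and min_sep are hypothetical illustration parameters.
    """
    coords = np.asarray(coords, dtype=float)
    idx = np.arange(coords.shape[1])
    # Exclude self-pairs and near neighbours along the chain.
    allowed = np.abs(idx[:, None] - idx[None, :]) >= min_sep
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # (n_frames, n_res, n_res)
    in_contact = (dist < cutoff) & allowed
    return in_contact.any(axis=2).mean(axis=0)


def pearson_r(x, y):
    """Pearson correlation coefficient between two per-residue profiles."""
    return float(np.corrcoef(np.asarray(x, dtype=float), np.asarray(y, dtype=float))[0, 1])


# Usage (hypothetical arrays): r = pearson_r(
#     per_residue_contact_probability(aa_coords), per_residue_contact_probability(cg_coords))
```

      Correlating per-residue profiles rather than full 2D maps sidesteps the sparsity of the AA contact maps, since each residue's probability pools many pairwise contacts; the trade-off is that it no longer indicates which domain pairs (for example NTD-hinge) are in contact.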

    1. Author Response

      The following is the authors’ response to the original reviews.

      REVIEWER #1:

      The authors present a carefully controlled set of experiments demonstrating an additional layer of complexity in GPCR signaling: endosomal signaling may differ depending on whether β-arrestin is associated with a G protein-bound V2R vasopressin receptor. The study uses state-of-the-art biosensor-based approaches and β-arrestin KO lines to assess this, and it adds to a growing body of evidence that G proteins and β-arrestin can associate with GPCR complexes simultaneously. The authors also demonstrate the possibility that Gαq might also be activated by the V2R. One thing that may need to be considered is the possibility that such "megacomplexes" might actually involve receptor dimers or oligomers.

      1.1 Can the authors please review the data that describes the concept of "GPCR megacomplexes"? I feel this is missing from the introduction. The notion means different things to different people. As you will see from my other comments, you should especially focus on evidence at the level of the single receptor.

      We appreciate the reviewer’s comments and have now included a more complete description of the GPCR megacomplex, or ‘megaplex’, concept in the introduction (page 2, 1st paragraph).

      1.2 The authors use mini-G proteins to conclude that V2R receptors interact with Gαq (in addition to Gαs). I would prefer if there were a more direct measure of this. Can the authors show that the receptor interacts with full-length Gαq (and not the other G proteins in Figure)? Is there a signaling phenotype associated with Gαq coupling? Is it sensitive to Gαq inhibition?

      This is an excellent point, and we are happy to expand on it. The ability of the V2R to activate Gq/11 has been demonstrated previously (Zhu, X. et al. Mol Pharmacol 46(3):460-9 (1994); Lykke, K. et al. Physiol Rep. 3(8):e12519 (2015); Avet, C. et al. eLife 11: e74101 (2022); Heydenreich, F.M. et al. Mol Pharmacol 102(3):139-49 (2022)). Therefore, we did not attempt to document this activation using more traditional assays. On the other hand, demonstrating an interaction between the V2R and a Gα subunit in cells is challenging for several reasons. First, the full-length Gα subunit is already located at the plasma membrane in the basal state and thus generates high background signals in proximity assays. Second, upon receptor activation, the interaction of the Gα subunit with the V2R is so transient that it is difficult, if not impossible, to capture in a proximity assay. Although miniG proteins are highly engineered, they maintain the coupling specificity of the different subtypes (Gαs, Gαi/o, Gαq/11, and Gα12/13) for GPCRs. In addition, because they are homogeneously expressed in the cytosol rather than at the membrane under basal conditions, they generate low background noise. Upon agonist stimulation, miniG proteins are recruited from the cytosol to the V2R at the plasma membrane, resulting in a robust signal in proximity assays. Thus, miniG proteins are unique in that they can detect GPCR–G protein interactions in cellular proximity assays, which is very challenging with full-length Gα subunits.

      That being said, we fully understand the reviewer’s concern and greatly value the effort to enhance the robustness of our study. Therefore, we have now monitored downstream signaling events of Gαq/11 in the absence or presence of the selective Gαq/11 inhibitor YM-254890 as a secondary method of documenting Gαq/11 activity. Specifically, we used a newly developed biosensor to measure production of diacylglycerol (DAG), a downstream second messenger of Gαq/11 activation, at both the plasma membrane and endosomes. Using a second biosensor, we detected general protein kinase C (PKC) activation, another downstream signaling event of Gαq/11 activation. We demonstrated that AVP stimulation leads to DAG production at both the plasma membrane and endosomes (Fig. 1C-D) as well as PKC activation (Fig. 1E), all of which are sensitive to YM-254890 inhibition (Fig. 1C-E). Together, these results strongly indicate that the V2R interacts with and activates Gαq/11.

      1.3 I raise a similar concern with Gαq coupling in endosomes.

      For the same reasons that miniG proteins are excellent tools for demonstrating V2R interaction with G proteins at the plasma membrane, they can also be used to detect V2R interaction with G proteins at endosomes by measuring proximity between miniG and an endosomal marker in response to agonist challenge. However, to ensure that the endosomal recruitment of miniGsq to the V2R demonstrated in our study corresponds to endosomal Gαq/11 activation, we monitored the production of DAG at early endosomes in the same way that we detected DAG production at the plasma membrane. As shown in Fig. 1D, stimulation of the V2R with AVP induces recruitment of the DAG-binding biosensor to the early endosomal marker Rab5. Pre-treatment of the cells with the selective Gαq/11 inhibitor YM-254890 abrogated this response, confirming that V2R activation leads to production of DAG at early endosomes in a Gαq/11-dependent manner (Fig. 1D).

      1.4 Can the confocal data be shown for Gαi and Gα12?

      Yes, we can certainly show these data as a negative control. We have now included the confocal data using Halo-mGsi as a negative control for confocal microscopy (Fig. 2). As seen in this figure, mGsi does not colocalize with Lck (plasma membrane) or with EEA1 (early endosomes) upon stimulation of cells with AVP, in line with a receptor that does not couple to Gαi/o.

      We did not include data using Halo-mG12 because this G protein subtype, like Gi/o, does not couple functionally to the V2R. Therefore, it is highly unlikely that we would obtain results different from those of the experiments using Halo-mGsi.

      1.5 The authors want us to believe that there is simultaneous binding of G proteins and β-arrestin. This is never demonstrated and is at odds with the structural basis of G protein and β-arrestin binding. Have the authors considered that "simultaneous" occupancy might simply reflect binding at distinct GPCR monomers in the context of dimeric or oligomeric receptors? They could, I suppose, provide data at the level of a single receptor rather than using the bulk BRET approaches used.

      We appreciate the comment and the opportunity to highlight some of our previous work, which addresses the megacomplexes at the level of a single receptor. First, we have characterized the megacomplex biochemically and structurally at low resolution (Thomsen ARB et al. 2016, Cell 166(4):907-19). The results unequivocally demonstrate that a single GPCR interacts simultaneously with heterotrimeric G protein at the receptor core and with β-arrestin via the phosphorylated receptor carboxy-terminal tail. We also documented the functionality of the megacomplex, as the receptor can interact with and activate the G protein, which was shown by three different biochemical approaches (Thomsen ARB et al. 2016, Cell 166(4):907-19). In addition, we solved a high-resolution cryo-EM structure of a megacomplex, further highlighting the architecture of this complex (Nguyen AH et al. 2019, Nat Struct Mol Biol 26:1123-31). As both biochemical and structural analyses were done in vitro with the receptor embedded in a detergent micelle, we also confirmed in molecular dynamics simulations that the megacomplex structural architecture fits naturally within the context of a membrane (Nguyen AH et al. 2019, Nat Struct Mol Biol 26:1123-31).

      In cells, we and others have also shown that GPCRs such as the V2R can bind β-arrestins exclusively via the phosphorylated carboxy-terminal tail, as they do in the megacomplex (Kumari P et al. 2016, Nat Commun 7:13416; Cahill III TJ et al. 2017, PNAS 114(10):2562-67; Kumari P et al. 2017, Mol Biol Cell 28(8):1003-10; Chen K et al. 2023, Nature (online doi: https://doi.org/10.1038/s41586-023-06420-x)). In addition, we and others have used BRET and confocal microscopy to show that the V2R and other GPCRs recruit G protein and β-arrestin simultaneously and that the three components colocalize in endosomes upon prolonged agonist exposure (Thomsen ARB et al. 2016, Cell 166(4):907-19; Chen K et al. 2023, Nature (online doi: https://doi.org/10.1038/s41586-023-06420-x)). As the reviewer correctly points out, in these cellular experiments (as well as in single-molecule microscopy), the resolution is not high enough to rule out that the receptors that co-recruit G protein and β-arrestin in endosomes could be dimeric rather than monomeric. Thus, we conducted a series of experiments with GPCR–β-arrestin fusions in which the two proteins are covalently attached at the receptor carboxy-terminal tail. We showed that although the GPCR–β-arrestin coupling was fully functional (with respect to β-arrestin promoting a high-affinity state of the receptor for agonist binding and constitutively internalizing the receptor), the receptor could still activate G proteins (Thomsen ARB et al. 2016, Cell 166(4):907-19; Nguyen AH et al. 2019, Nat Struct Mol Biol 26:1123-31), which demonstrates that the single-receptor megaplex can physically form in cells.

      We have now included an extra paragraph in the discussion to go over these megaplex-related considerations (5th paragraph in the discussion), and we thank the reviewer for raising this point.

      1.6 Please introduce abbreviations when you first use this- this was not done consistently.

      Thank you for noticing these errors, which we have now corrected.

      REVIEWER #2:

      This manuscript by Daly et al. probes the emerging paradigm of GPCR signaling from endosomes using the V2R as a model system, with an emphasis on Gαq/11 and β-arrestins. The study employs cellular imaging, enzyme complementation assays and energy transfer-based sensors to probe the potential formation of GPCR-G-protein-β-arrestin megaplexes. While the study is certainly very interesting, it appears to be very preliminary at many levels, and clearly requires further development in order to make robust conclusions. The authors should consider expanding this work further to make the points more convincingly and the work solid and impactful. The two corresponding authors are among the leaders in the field, having demonstrated the existence of megaplexes, and building on the work in a systematic fashion should certainly move the paradigm forward. As the work presented in the current manuscript is already pre-printed, the authors should take this opportunity to present a more complete and comprehensive story to the field.

      We are grateful for the time and efforts the reviewer has put into reviewing our work. We are certainly excited to learn that the reviewer finds our work “very interesting”. Regarding the robustness, we have added extra control experiments to increase the completeness of the study. These experiments include:

      • Measurements of AVP-stimulated diacylglycerol production, a signaling event downstream of Gαq/11 activation. These measurements were conducted at both the plasma membrane (Fig. 1C) and early endosomes (Fig. 1D) using a newly developed DAG-binding biosensor, and they demonstrate that the V2R activates Gαq/11 at both of these subcellular locations.

      • Monitoring of AVP-promoted protein kinase C activation, another downstream signaling effect of Gαq/11 activation (Fig. 1E). This approach provides independent evidence that the V2R activates Gαq/11.

      • Inhibition of signaling events downstream of Gαq/11 activation using the selective Gαq/11 inhibitor YM-254890. YM-254890 inhibits AVP-stimulated DAG production at both the plasma membrane and endosomes as well as PKC activation (Fig. 1C-E), which strongly confirms that these signaling outputs result from Gαq/11 activation.

      • We have also included the confocal data using Halo-mGsi as a negative control for confocal microscopy (Fig. 2). As seen in this figure, mGsi does not translocate to the plasma membrane or early endosomes upon stimulation with AVP, which confirms that the V2R does not couple to and activate Gαi/o.

      Finally, we would like to kindly remind the reviewer that the production of the pre-print manuscript is part of the peer-review process in eLife.

      2.1 The use of miniG proteins in these experiments is a major concern as these are highly engineered and may not represent the true features of G proteins. While these have been used as a readout in other publications, their use in demonstrating megaplex formation is sub-optimal, and native, full-length G proteins should be used.

      We are a bit unsure what the reviewer means by using native full-length G proteins. If the reviewer is suggesting co-immunoprecipitating the V2R with native unlabeled G protein and β-arrestin, it should be considered that the G protein interaction with the receptor is extremely transient and unlikely to survive the pull-down procedure unless stabilized by a nanobody or crosslinking. Although the β-arrestin interaction with the receptor is more stable in nature, co-immunoprecipitation with the receptor also requires crosslinking or stabilization with a Fab/nanobody. Therefore, we do not think this approach can be used as a more accurate way of detecting native megaplexes.

      If the reviewer is suggesting the use of full-length G proteins instead of miniG proteins in our cell-based proximity assays, we would like to highlight that this approach is somewhat prone to false-positive responses. The major reason is that G proteins are located in membrane regions close to the receptor, whereas β-arrestins are distributed throughout the cytosol. Upon activation of the V2R, β-arrestins translocate to the receptor at the plasma membrane, which results in enhanced BRET between V2R-coupled G protein subtypes and β-arrestins (see Author response image 1 below of preliminary data). This translocation also results in non-specific BRET signals between β-arrestins and G protein subtypes at the plasma membrane that do not couple to the V2R but are located in close proximity to the receptor. As these non-specific BRET signals do not report on the formation of functional V2R megaplexes (see Author response image 1), we have purposely not used this approach.

      Author response image 1.

      To overcome this technical hurdle in the detection of functional megaplexes, we have replaced full-length G proteins with miniG proteins, as the latter are located in the cytosol at resting state and only translocate to the membrane when a receptor adopts an active conformation. This replacement is advantageous since activation of megaplex-forming receptors such as the V2R results in simultaneous translocation of miniG proteins and β-arrestins from the cytosol to the receptor at the plasma membrane, which produces a highly specific proximity signal (see Author response image 2 below of preliminary data). When stimulating the V2R, we only observe increases in proximity between β-arrestin1 and the miniG proteins that are activated by the V2R (miniGs and miniGsq), but not the miniG proteins that are not activated by this receptor (miniGsi and miniG12) (see Author response image 2). Therefore, miniG proteins offer a more accurate experimental approach to detect functional megaplexes than full-length G proteins.

      Author response image 2.

      2.2 The interpretation of complementation (NanoLuc) or proximity (BRET) as evidence of signaling is not appropriate, especially when overexpression system and engineered constructs are being used.

      We thank the reviewer for raising this concern. We have previously demonstrated global Gαs activation and Gαs signaling in the form of cAMP production stimulated by the internalized V2R (Thomsen ARB et al. 2016, Cell 166(4):907-19). As mentioned previously, in the updated manuscript we have now included experiments to document downstream signaling events in response to Gαq/11 activation. These experiments include measurement of DAG production at the plasma membrane (Fig. 1C) and early endosomes (Fig. 1D), as well as phosphorylation/activation of PKC (Fig. 1E). Pre-incubation with the selective Gαq/11 inhibitor YM-254890 abrogated all of these downstream signals, confirming that the V2R stimulates Gαq/11 protein signaling at both the plasma membrane and endosomes (Fig. 1C-E).

      2.3 After the original work from the same corresponding authors on megaplex formation, the major challenge in the field is to demonstrate the existence and relevance of megaplex formation at endogenous levels of components, and the current study focuses solely on showing the proximity of Gαq and β-arrestins.

      We completely agree with the reviewer that it will be important to demonstrate the functionality of endogenous megaplexes, and we are currently working on this in other studies using different receptor systems. However, doing this is not trivial and will require overcoming major technical barriers that we feel are somewhat outside the scope of the current study. The goal of our V2R study is to demonstrate that V2R megaplexes form with Gαq/11, resulting in Gαq/11 activation at endosomes, and that endosomal G protein activation by the V2R can occur independently of β-arrestin, which, in our humble opinion, we accomplish.

      2.4 The study lacks a coherent approach, and the assays are often shifted back and forth between the two β-arrestin isoforms (1 and 2), for example, confocal vs. complementation etc.

      We understand the reviewer’s concern. However, unlike the β2-adrenergic receptor, which binds β-arrestin2 with higher affinity than β-arrestin1, the V2R has a strong affinity for both β-arrestin1 and β-arrestin2 (Oakley et al. 2000, JBC 275(22):17201-10). The V2R’s almost identical affinity for β-arrestin1 and β-arrestin2 is well illustrated in Fig. 3B. Thus, although different β-arrestin isoforms were used in some experiments, it is very unlikely that the overall results and conclusions of this study would change by adding experiments to ensure that both β-arrestin isoforms are used in every experiment.

      2.5 In every assay, only the G proteins and β-arrestins are monitored without a direct assessment of the presence of receptor, and absent that data, it is difficult to justify calling these entities megaplexes.

      MiniG proteins and β-arrestin come into close proximity upon agonist stimulation of the V2R. Using confocal microscopy, we observed this co-recruitment of miniGs/miniGsq and β-arrestin at endosomes specifically in response to prolonged V2R stimulation (Fig. 3D-F). In the absence of GPCR stimulation, both miniG and β-arrestin would be homogeneously distributed throughout the cytosol; thus, the only reason why both proteins are recruited to endosomes in response to AVP challenge is that they are recruited to the internalized, active V2R. This point was obviously not adequately described in the original manuscript, and we have now clarified it in the updated manuscript in the 8th sentence of the last paragraph of the "The V2R recruits Gαs/Gαq and βarrs simultaneously" section.

      REVIEWER #3:

      The manuscript by Daly et al. examines endosomal signaling of the vasopressin type 2 receptor using engineered mini G proteins (mG proteins) and a number of novel techniques to address whether sustained G protein signaling in the endosomal compartment is enhanced by β-arrestin. Employing these interesting techniques, they have shown how the V2R could activate Gαs and Gαq in the endosomal compartments and how this modulation could occur in an arrestin-dependent and -independent manner. Although the phenomenon of endosomal signaling is complex to address, the authors have tried their best to examine it using a number of well-controlled experiments. Though this is an interesting and well-carried-out study of endosomal signaling of G proteins, my concerns are:

      3.1 The study is done in overexpressed HEK 293 cells with these engineered constructs making me wonder if the kinetics would be the same in primary cells?

      The reviewer raises an interesting and valid point. It is possible that in primary cells the kinetics would differ slightly, and it would definitely be interesting to address this in a subsequent study. However, although the kinetics are an interesting aspect of our study, they are not our major take-home message, which is rather the subcellular localization of G protein activation and the role of β-arrestin in these events. We have now highlighted this aspect in our updated manuscript (1st paragraph of the discussion), and we thank the reviewer for raising this point.

      3.2 The use of the phrase "G protein activation independent of β-arrestins to a minor degree" would make me question its physiological relevance. The authors should discuss the relevance of their findings in physiological or pathological context.

      We are glad that the reviewer focuses on this point, and we would like to highlight that other GPCRs, including the glucagon-like peptide-1 receptor (GLP1R), internalize in a β-arrestin-independent manner (Claing A et al. 2000 PNAS 97(3):1119-24) while signaling through Gαs from endosomes. In the case of the GLP1R, this endosomal Gαs signaling promotes glucose-stimulated insulin secretion in pancreatic β-cells (Kuna RS et al. 2013 Am J Physiol Endocrinol Metab 305:E161-70). Consequently, β-arrestin-independent endosomal G protein signaling appears to have some physiological relevance. Similarly, in a very recent pre-print from the von Zastrow group (Blythe EE and von Zastrow M 2023 BioRxiv https://doi.org/10.1101/2022.09.07.506997), it was reported that endogenously expressed vasoactive intestinal peptide receptor 1 (VIPR1), which regulates gastro-intestinal functions, promotes robust G protein signaling from endosomes in a completely β-arrestin-independent fashion. This again suggests that endogenously expressed GPCRs can internalize and activate G proteins from endosomes independently of β-arrestin to produce physiological responses. We now discuss these studies in the 6th paragraph of the discussion.

      3.3 The confocal colocalization studies shown in Figure 2 and their conclusion "suggesting a certain level of endosomal Gαs/Gαq signaling despite the absence of βarr2" seem rather inconclusive.

      Unlike the V2R, a receptor that retains β-arrestin in endosomes upon internalization, V2β2AR quickly loses β-arrestin after internalization due to the low affinity of the β2AR carboxy-terminal tail for β-arrestin. In the previous Fig. 2 (now Fig. 3), after 45 minutes of AVP stimulation, no β-arrestin is visible at endosomes in cells expressing V2β2AR, as β-arrestin has already dissociated from the receptor and translocated back to the cytosol. However, clear green clusters of mGs and mGsq are still visible at endosomes, indicating the presence of active receptor interacting with Gαs or Gαq even though β-arrestin is back in the cytosol. We quantified the percentage of green mGs or mGsq clusters that do not colocalize with β-arrestin and have added this information to the updated version of the manuscript (Fig. 3G). In V2R-expressing cells, almost all active receptors that interact with Gαs or Gαq/11 also associate with β-arrestin (Fig. 3G). In contrast, in V2β2AR-expressing cells, approximately 75% of the active receptors do not interact with β-arrestin (Fig. 3G). This suggests that β-arrestin binding to V2R is not an absolute requirement for endosomal Gαs and Gαq activation by V2R. This point was not addressed adequately in the original manuscript, and we have now elaborated on it in the updated version in the last paragraph of the "The V2R recruits Gαs/Gαq and βarrs simultaneously" section.

      3.4 Though a novel observation it is not clear to me how V2R would internalize after activation without arrestin. Is it some sort of generalized microcytosis occurring in these overexpressed cells? Should discuss.

      This is certainly a very interesting observation, and one that other research laboratories have also made recently, in particular in the context of endosomal G protein signaling (Blythe EE and von Zastrow M 2023 BioRxiv https://doi.org/10.1101/2022.09.07.506997). The main and best-characterized pathway for GPCR internalization is clathrin-dependent, where receptors are most commonly associated with β-arrestins. However, for some GPCRs, β-arrestin association is not required for clathrin-mediated internalization. One example is the apelin receptor, which can internalize via clathrin-coated pits in a β-arrestin-independent manner (Pope GR et al. 2016 Mol Cell Endocrinol. 437:108-19). Alternatively, GPCRs can also internalize independently of any clathrin and β-arrestin association via caveolae or fast endophilin-mediated endocytosis (FEME). We have now expanded our discussion of possible mechanisms for β-arrestin-independent receptor internalization in the 6th paragraph of the discussion in the updated manuscript, and we thank the reviewer for the suggestion.

      3.5 Is use of mini G protein a good representation? The authors should justify.

      Excellent point, and one we have comprehensively discussed in our responses to reviewers 1 and 2 (points 1.2 and 2.1).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Bendzunas, Byrne et al. explore two highly topical areas of protein kinase regulation in this manuscript. Firstly, the idea that Cys modification could regulate kinase activity. The senior authors have published some standout papers exploring this idea of late, and the current work adds to the picture of how active site Cys might have been favoured in evolution to serve critical regulatory functions. Second, BRSK1/2 are understudied kinases listed as part of the "dark kinome" so any knowledge of their underlying regulation is of critical importance to advancing the field.

      Strengths:

      In this study, the authors pinpoint highly conserved, but BRSK-specific, Cys residues as key players in kinase regulation. There is a delicate balance between equating what happens in vitro with recombinant proteins relative to what the functional consequence of Cys mutation might be in cells or organisms, but the authors are very clear with the caveats relating to these connections in their descriptions and discussion. Accordingly, by extension, they present a very sound biochemical case for how Cys modification might influence kinase activity in cellular environs.

      Weaknesses:

      I have very few critiques for this study, and my major points are barely major.

      Major points

      (1) My sense is that the influence of Cys mutation on dimerization is going to be one of the first queries readers consider as they read the work. It would be, in my opinion, useful to bring forward the dimer section in the manuscript.

      We agree that the influence of Cys on BRSK dimerization is a topic of significant interest. Our primary focus was to explore oxidative regulation of the understudied BRSK kinases as they contain a conserved T-loop Cys, and we have previously demonstrated that equivalent residues at this position in related kinases were critical drivers of oxidative modulation of catalytic activity. We have demonstrated here that BRSK1 & 2 are similarly regulated by redox, and that this is due to oxidative modification of the T+2 Cys, in addition to Cys residues that are conserved amongst related ARKs as well as BRSK-specific Cys. Although we also provide evidence for limited redox-sensitive higher order BRSK species (dimers) in our in vitro analysis, these represent a small population of the total BRSK protein pool (this was validated by SEC-MALS analysis). As such, we do not have strong evidence to suggest that these limited dimers significantly contribute to the pronounced inhibition of BRSK1 & 2 in the presence of oxidizing agents, and instead believe that other biochemical mechanisms likely drive this response. This may result from oxidized Cys altering the conformation of the activation loop. Indeed, the formation of an intramolecular disulfide within the T-loop of BRSK1 & 2, which we detected by MS, is one such regulatory modification. It is noteworthy that intramolecular disulfide bonds within the T-loop of AKT and MELK have already been shown to induce an inactive state in the kinase, and we posit a similar mechanism for BRSKs.

      While we recognize the potential importance of dimerization in this context, our current data from in vitro and cell-based assays do not provide substantial evidence to assert dimerization as a primary regulatory mechanism. Hence, we maintained a more conservative stance in our manuscript, discussing dimerization in later sections where it naturally followed from the initial findings. That being said, we acknowledge the potential significance of dimerization in the regulation of the BRSK T-loop cysteine. We believe this aspect merits further investigation and could indeed be the focus of a follow-up study.

      (2) Relatedly, the effect of Cys mutation on the dimerization properties of preparations of recombinant protein is not very clear as it stands. Some SEC traces would be helpful; these could be included in the supplement.

      In order to determine whether our recombinant BRSK proteins (and T-loop mutants) existed as monomers or dimers, we performed SDS-PAGE under reducing and non-reducing conditions (Fig 7). This unambiguously revealed that a monomer was the prominent species, with little evidence of dimers under these experimental conditions (even in the presence of oxidizing agents). Although we cannot discount a regulatory role for BRSK dimers in other physiological contexts, we could not produce sufficient evidence to suggest that multimerization played a substantial role in modifying BRSK kinase activity in our assays. We note that our in vitro analysis was performed using truncated forms of the protein, and as such it is entirely possible that regions of the protein that flank the kinase domain may serve additional regulatory functions that may include higher order BRSK conformations. In this regard, although we have not included SEC traces of our recombinant proteins, we have included analytical SEC-MALS of the truncated proteins (Supplementary Figure 6) which we believe to be more informative. We have also now included additional SEC-MALS data for BRSK2 C176A and C183A (Supplementary Figure 6d and e), which supports our findings in Fig 7, demonstrating the presence of limited dimer species under non-reducing conditions.

      (3) Is there any knowledge of Cys mutants in disease for BRSK1/2?

      We have conducted an extensive search across several databases: COSMIC (Catalogue of Somatic Mutations in Cancer), ProKinO (Protein Kinase Ontology), and TCGA (The Cancer Genome Atlas). These databases are well-regarded for their comprehensive and detailed records of mutations related to cancer and protein kinases. Our analysis using the COSMIC and TCGA databases focused on identifying any reported instances of Cys mutations in BRSK1/2 that are implicated in cancer. Additionally, we utilized the ProKinO database to explore the broader landscape of protein kinase mutations, including any potential disease associations of Cys mutations in BRSK1/2. However, we found no evidence to indicate the presence of Cys mutations in BRSK1/2 that are associated with cancer or disease. This lack of association in the current literature and database records suggests that, as of our latest search, Cys mutations in BRSK1/2 have not been reported as significant contributors to pathogenesis.

      (4) In bar charts, I'd recommend plotting data points. Plus, it is crucial to report in the legend what error measure is shown, the number of replicates, and the statistical method used in any tests.

      We have added the data points to the bar charts and included statistical methods in figure legends.

      (5) In Figure 5b, the GAPDH loading control doesn't look quite right.

      The blot has been repeated and updated.

      (6) In Figure 7 there is no indication of what mode of detection was used for these gels.

      We have updated the figure legend to confirm that the detection method was western blot.

      (7) Recombinant proteins - more detail should be included on how they were prepared. Was there a reducing agent present during purification? Where did they elute off SEC... consistent with a monomer or higher order species?

      We have added ‘produced in the absence of reducing agents unless stated otherwise’ in the methods section to improve clarity. Although we have not added additional sentences to describe the elution profile of the BRSK proteins by SEC during purification, we believe that the inclusion of analytical SEC-MALS data is sufficient evidence that the proteins are largely monomeric under non-reducing conditions.

      Reviewer #2 (Public Review):

      Summary:

      In this study by Bendzunas et al., the authors show that the formation of intra-molecular disulfide bonds involving a pair of Cys residues near the catalytic HRD motif and a highly conserved T-Loop Cys with a BRSK-specific Cys at an unusual CPE motif at the end of the activation segment function as repressive regulatory mechanisms in BRSK1 and 2. They observed that mutation of the CPE-Cys only, contrary to the double mutation of the pair, increases catalytic activity in vitro and drives phosphorylation of the BRSK substrate Tau in cells. Molecular modeling and molecular dynamics simulations indicate that oxidation of the CPE-Cys destabilizes a conserved salt bridge network critical for allosteric activation. The occurrence of spatially proximal Cys amino acids in diverse Ser/Thr protein kinase families suggests that disulfide-mediated control of catalytic activity may be a prevalent mechanism for regulation within the broader AMPK family. Understanding the molecular mechanisms underlying kinase regulation by redox-active Cys residues is fundamental as it appears to be widespread in signaling proteins and provides new opportunities to develop specific covalent compounds for the targeted modulation of protein kinases.

      The authors demonstrate that intramolecular cysteine disulfide bonding between conserved cysteines can function as a repressing mechanism, as indicated by the effect of DTT and the consequent increase in activity of BRSK1 and 2 (WT). The cause-effect relationship of why mutation of the CPE-Cys only increases catalytic activity in vitro and drives phosphorylation of the BRSK substrate Tau in cells is not clear to me. The explanation given by the authors based on molecular modeling and molecular dynamics simulations is that oxidation of the CPE-Cys (that will favor disulfide bonding) destabilizes a conserved salt bridge network critical for allosteric activation. However, no functional evidence of the impact of the salt-bridge network is provided. If you mutate the two main Cys pairs (αE-CHRD and A-loop T+2-CPE), you lose the effect of DTT, as the disulfide pairs cannot be formed and hence no repression mechanism takes place; however, when looking at individual residues, I do not understand why mutating only the CPE results in the opposite effect unless it is independent of its connection with the T+2 residue on the A-loop.

      Strengths:

      This is an important and interesting study providing new knowledge in the protein kinase field with important therapeutic implications for the rational design and development of next-generation inhibitors.

      Weaknesses:

      There are several issues with the figures that this reviewer considers should be addressed.

      Reviewer #1 (Recommendations for The Authors):

      Major points

      Page 26 - the discussion could be more concise. There's an element of recapping the results, which should be avoided.

      Regarding the conciseness of the discussion section, we have thoroughly revised it to ensure a more succinct presentation, deliberately avoiding the recapitulation of results. The revised discussion now focuses on interpreting the findings and their implications, steering clear of redundancy with the results section.

      Figure 1b seems to be mislabeled/annotated. I recommend checking whether the figure legends match more broadly. Figure 1 appears to be incorrectly cited throughout the results.

      Thank you for pointing out the discrepancies in the labeling and citation of Figure 1b. We have carefully reviewed and corrected these issues to ensure that all figure labels, legends, and citations accurately reflect the corresponding data and illustrations. We appreciate your attention to detail and the opportunity to improve the clarity and accuracy of our presentation.

      Figure 6 - please include a color-coding key in the figure. Further support for these simulations could be provided by supplementary movies or plots of the interaction. Figure 4 colour palette should be adjusted for the spheres in the Richardson diagrams to have greater distinction.

      As suggested, we have amended the colour palette in Figure 4 to improve conformity throughout the figure.

      Minor points

      Figure 2 - it'd be helpful to know what the percentage coverage of peptides is.

      We have updated the figure legend to include peptide coverage for both proteins.

      Some typos - Supp 2 legend "Domians".

      Fixed

      Figure 6 legend - analyzed by needs a space;

      Fixed

      Fig 8 legend schematic misspelled.

      Fixed

      Broadly, if you Google T-loop you get a potpourri of enzyme answers. Why not just use Activation loop?

      The choice of "T-loop" over "Activation loop" in our manuscript was made to maintain consistency with other literature in the field, and in particular our previous paper “Aurora A regulation by reversible cysteine oxidation reveals evolutionarily conserved redox control of Ser/Thr protein kinase activity” where we refer to the activation loop cysteine as T-loop + 2. We acknowledge the varied enzyme contexts in which "T-loop" is used and agree on the importance of clarity. To address this, we made an explicit note in the manuscript that the "T-loop" is also referred to as the "Activation loop", ensuring readers are aware of the interchangeable use of these terms. Additionally, this nomenclature facilitates a more straightforward designation of cysteine residues within the loop (T+2 Cysteine). We believe this approach balances adherence to established conventions with the need for clarity and precision in our descriptions.

      Methods - what is LR cloning. Requires some definition. Some manufacturer detail is missing in methods, and referring to prior work is not sufficient to empower readers to replicate.

      We agree, and have added the following to the methods section:

      “BRSK1 and 2 were sub-cloned into pDest vectors (to encode the expression of N-terminal Flag or HA tagged proteins) using the Gateway LR Clonase II system (Invitrogen) according to the manufacturer’s instructions. pENTR BRSK1/2 clones were obtained in the form of Gateway-compatible donor vectors from Dr Ben Major (Washington University in St. Louis). The Gateway LR Clonase II enzyme mix mediates recombination between the attL sites on the Entry clone and the attR sites on the destination vector. All cloned BRSK1/2 genes were fully sequenced prior to use.”

      Page 7 - optimal settings should be reported. How were pTau signals quantified and normalised?

      We have added the following to the methods section:

      “A two-color Western blot detection method employing infrared fluorescence was used to measure the ratio of Tau phospho-serine 262 to total Tau. Total GFP-Tau was detected using a mouse anti-GFP antibody and visualized at 680 nm using goat anti-mouse IRDye 680, while phospho-Tau was detected using a Tau phospho-serine 262 specific antibody and visualized at 800 nm using goat anti-rabbit IRDye 800. Imaging was performed using a LI-COR Odyssey CLx with scan control settings of 169 μm resolution, medium quality and a 0.0 mm focus offset. Quantification was performed using LI-COR Image Studio on the raw image files. The phospho-Tau to total Tau ratio was determined by measuring the ratio of the fluorescence intensities measured at 800 nm (pTau) to those at 680 nm (total Tau).”
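      The per-lane ratio described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis (which was performed in Image Studio); the function names, the optional normalisation to a chosen control lane and the example intensities are hypothetical.

```python
def ptau_ratio(signal_800: float, signal_680: float) -> float:
    """Return phospho-Tau (800 nm) / total GFP-Tau (680 nm) for one lane.

    Inputs are integrated fluorescence intensities for the same band
    in the two channels (hypothetical values in the usage below).
    """
    if signal_680 <= 0:
        raise ValueError("total Tau (680 nm) signal must be positive")
    return signal_800 / signal_680


def fold_change(ratios: dict[str, float], control: str) -> dict[str, float]:
    """Express each lane's pTau/total ratio relative to a chosen control lane.

    This normalisation step is an illustrative assumption, not a step
    stated in the methods text.
    """
    base = ratios[control]
    return {lane: r / base for lane, r in ratios.items()}


# Usage with made-up intensities for two hypothetical lanes.
lanes = {
    "lane_A": ptau_ratio(1200.0, 800.0),
    "lane_B": ptau_ratio(3000.0, 1000.0),
}
print(fold_change(lanes, "lane_A"))
```

      Computing the ratio per lane before any comparison keeps differences in GFP-Tau expression between transfections from being read as differences in phosphorylation.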

      In the Figure 6g-j legend, the salt bridge is incorrectly annotated as E185-R248 rather than 258.

      Fixed

      Lines 393-395 provides a repeat statement on BRSKs phosphorylating Tau (from 388-389).

      We have removed the repetition and reworded the opening lines of the results section to improve the overall flow of the manuscript.

      Supp. Figure 1 is difficult to view - would it be possible to increase the size of the phylogenetic analysis?

      We thank the reviewer for this observation. We have rotated (90°) and expanded the figure so that it can be more clearly viewed.

      Supp. Figure 2 - BRSK1/2 incorrectly spelled.

      Fixed

      Please check the alignment of labels in Supp. Figure 3e.

      Fixed

      Reviewer #2 (Recommendations For The Authors):

      (1) In Figure 1, current panel b is not mentioned/described in the figure legend and as a consequence, the rest of the panels in the legends do not fit the content of the figure.

      Reviewer 1 also noted this error, and we have amended the manuscript accordingly.

      What is the rationale for using the HEK293T cells as the main experimental/cellular system? Are there cell lines that express both proteins endogenously so that the authors can recapitulate the results obtained from ectopic overexpression?

      The selection of HEK-293T cells was driven by their well-established utility in overexpression studies, which make them ideal for the investigation of protein interactions and redox regulation. This cell line's robust transfection efficiency and well-characterized biology provide a reliable platform for dissecting the molecular mechanisms underlying the redox regulation of proteins. Furthermore, the use of HEK-293T cells aligns with the broader scientific practice, serving as a common ground for comparability with existing literature in the field of BRSK1/2 signaling, protein regulation and interaction studies.

      The application of HEK-293T cells as a model system in our study serves as a foundational step towards eventually elucidating the functions of BRSK1/2 in neuronal cells, where these kinases are predominantly expressed and play critical roles. Given that BRSKs are classed as ‘understudied’ kinases, the choice of a HEK-293T co-overexpression system allowed us to analyze the direct effects of BRSK kinase activity (using phosphorylation of Tau as a readout) in a cellular context and in a more controlled manner. This approach not only aids in the establishment of a baseline understanding of the redox regulation of BRSK1/2, but also sets the stage for subsequent investigations in more physiologically relevant neuronal models.

      In current panel d, could the authors recapitulate the same experimental conditions as in current panel c?

      Figure 1 panel c shows that both BRSK1 and 2 are reversibly inhibited by oxidizing agents such as H2O2, whilst panels d and e show the concentration dependent activation and inhibition of the BRSKs with increasing concentrations of DTT and H2O2 respectively. The experimental conditions were identical, other than changing amounts of reducing and oxidizing agents, and used the same peptide coupled assays. Data for all experiments were originally collected in ‘real time’ as depicted in Fig 1c (increase in substrate phosphorylation over time). However, to aid interpretation of the data, we elected to present the latter two panels as dose response curves by calculating the change in the rate of enzyme activity (shown as pmol phosphate incorporated into the peptide substrate per min) for each condition. To aid the reader, we now include an additional supplementary figure (new supplementary figure 2) depicting BRSK1 and 2 dependent phosphorylation of the peptide substrate in the presence of different concentrations of DTT and H2O2 in a real time (kinetic) assay. The new data shown is a subset of the unprocessed data that was used to calculate the rates of BRSK activity in Fig 1d & e.
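      A per-condition rate of the kind plotted in Fig 1d & e could be derived from each real-time trace by fitting a straight line to the linear phase of product formation. The sketch below assumes an ordinary least-squares slope over points already judged to be linear; the response does not state which fitting method was used, and the function names and example values are hypothetical.

```python
def linear_rate(times_min, pmol_phosphate):
    """Least-squares slope (pmol phosphate per min) of one kinetic trace.

    Assumes every supplied point lies within the linear phase of the
    reaction; curved (substrate-depleted) regions should be excluded first.
    """
    n = len(times_min)
    if n != len(pmol_phosphate) or n < 2:
        raise ValueError("need at least two paired (time, pmol) points")
    mean_t = sum(times_min) / n
    mean_p = sum(pmol_phosphate) / n
    sxy = sum((t - mean_t) * (p - mean_p) for t, p in zip(times_min, pmol_phosphate))
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    return sxy / sxx


def rates_by_condition(traces):
    """Map each condition (e.g. a DTT or H2O2 concentration) to its fitted rate.

    `traces` is a dict: condition -> (times_min, pmol_phosphate).
    """
    return {cond: linear_rate(t, p) for cond, (t, p) in traces.items()}


# Usage with made-up traces for two hypothetical reductant concentrations (mM).
example = {
    0.0: ([0, 5, 10, 15], [0.0, 5.0, 10.0, 15.0]),
    1.0: ([0, 5, 10, 15], [0.0, 10.0, 20.0, 30.0]),
}
print(rates_by_condition(example))
```

      Plotting the resulting rates against concentration gives a dose-response curve, which is the presentation chosen for Fig 1d & e in place of the raw time courses.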

      Why did the authors use full-length constructs in these experiments and did not in e.g. Figure 2 where they used KD constructs instead?

      In the initial experiments, illustrated in Figure 1, we employed full-length protein constructs to establish a proof of concept, demonstrating the overall behavior and interactions of the proteins in their full-length form. This confirmed that BRSK1 & 2, which both contain a conserved T + 2 Cys residue that is frequently prognostic for redox sensitivity in related kinases, displayed a near-obligate requirement for reducing agents to promote kinase activity.  

      Subsequently, in Figure 2, our focus shifted towards delineating the specific regions within the proteins that are critical for redox regulation. By using constructs that encompass only the kinase domain, we aimed to demonstrate that the redox-sensitive regulation of these proteins is predominantly mediated by specific cysteine residues located within the kinase domain itself. This strategic use of the kinase domain of the protein allowed for a more targeted investigation. Furthermore, in our hands these truncated forms of the protein were more stable at higher concentrations, enabling more detailed characterization of the proteins by DSF and SEC-MALS. We predict that the flanking disordered regions of the full-length protein (as predicted by AlphaFold) contribute to this effect.

      (2) In Figure 2, Did the authors try to do LC/MS-MS in the same experimental conditions as in Figure 1 (e.g. buffer minus/plus DTT, H2O2, H2O2 + DTT)?

      We would like to clarify that the mass spectrometry experiments were conducted exclusively on proteins purified under native (non-reducing) conditions. We did not extend the LC/MS-MS analyses to include proteins treated with various buffer conditions such as minus/plus DTT, H2O2, or H2O2 + DTT as used in the experiments depicted in Figure 1. Given that we could readily detect disulfides in the absence of oxidizing agents, we did not see the benefit of additional treatment conditions as peroxide treatment of protein samples can frequently complicate interpretation of MS data. However, it should be noted that prior to MS analysis, tryptic peptides were subjected to a 50:50 split, with one half alkylated in the presence of DTT (as described in the methods section) to eliminate disulfides and other transiently oxidized Cys forms. Comparative analysis between reduced and non-reduced tryptic peptides improved our confidence when assigning disulfide bonds (which were eliminated in identical peptides in the presence of DTT).

      On panel b, why did the authors show alphafold predictions and not empiric structural information (e.g. X-ray, EM,..)?

      The AlphaFold models were primarily utilized to map the general locations of redox-sensitive cysteine pairs within the proteins of interest. Although we have access to the crystal structure of mouse BRSK2, it does not fully capture the active conformation seen in the AlphaFold model of the human version. The use of AlphaFold models for human proteins in this study aids in consistently tracking residue numbering across the manuscript, offering a useful framework for understanding the spatial arrangement of these critical cysteine pairs in their potentially active-like states. This approach facilitates our analysis and discussion by providing a reference for the structural context of these residues in the human proteins.

      What was the rationale for using the KD construct and not the FL as in Figure 1?

      The rationale to use the kinase domain was primarily based on the significantly lower confidence in the structural predictions for regions outside the kinase domain (KD). Our experimental focus was to investigate the role of conserved cysteine residues within the kinase domain, which are critical for the protein's function and regulation. This targeted approach allowed us to concentrate our analyses on the most functionally relevant and structurally defined portion of the protein, thereby enhancing the precision and relevance of our findings. As is frequently the case, truncated forms of the protein, consisting only of the kinase domain, are much more stable than their full length counterparts and are therefore more amenable to in vitro biochemical analysis. In our hands this was true for both BRSK1 and 2, and as such much of the data collected here was generated using kinase-domain (KD) constructs. Simulations using the KD structures are therefore much more representative of our original experimental setup.

      The BRSK1 KD construct appears to be rather inactive and not responsive to DTT treatment. Could the authors comment on the differences observed with the FL construct of Figure 1?

      It is important to note that BRSK1, in general, exhibits lower intrinsic activity compared to BRSK2. This reduced activity could be attributed to a range of factors, including the need for activation by upstream kinases such as LKB1, as well as potential post-translational modifications (PTMs) that may be absent in the bacterially expressed KD construct. The full-length forms of the protein were purified from Sf21 cells, and as such may have additional modifications that are lacking in the bacterially derived KD counterparts. We also cannot discount additional regulatory roles of the regions that flank the KD, and these may contribute in part to the modest discrepancy observed between constructs.  Despite these differences, it is crucial to emphasize that both the KD and FL constructs of BRSK1 are regulated by DTT, indicating a conserved redox-dependent activation for both of the related BRSK proteins.  

      (3) In Figure 4, panel A, wouldn't the authors expect that mutating one residue of a pair, e.g. C198A in BRSK1, would have the same effect as mutating C191 at the T+2 site? Did they try mutating individual sites of the aE/CHRD pair? The same applies to BRSK2.

      We appreciate the insightful comment. It's important to clarify that the redox regulation of these proteins is influenced not solely by the formation of disulfide bonds but also by the oxidation state of individual cysteine residues, particularly the T+2 Cys. This nuanced mechanism of regulation allows for a diverse range of functional outcomes based on the specific cysteine involved and its state of oxidation. This aspect forms a key finding of our paper, highlighting the complexity of redox regulation beyond mere disulfide bond formation. For example, AURA kinase activity is regulated by oxidation of a single T+2 Cys (Cys290, equivalent to Cys191 and Cys176 of BRSK1 and 2 respectively), but this regulation can be supplemented through artificial incorporation of a secondary Cys at the DFG+2 position (Byrne et al., 2020). This targeted genetic modification of AURA mirrors equivalent regulatory disulfide-forming Cys pairs that naturally occur in kinases such as AKT and MELK, and which provide an extra layer of regulatory fine tuning (and a possible protective role to prevent deleterious overoxidation) to the T+2 Cys. We surmise that in BRSK1 and 2 the CPE Cys is likewise an accessory regulatory element that locks the T+2 Cys in an inactive disulfide configuration; the T+2 Cys itself remains the dominant driver of BRSK redox sensitivity, as judged by the fact that CPE Cys mutants are still potently regulated by redox (Fig 4).

      In our preliminary analysis of BRSK1, we observed that mutation of individual sites within the aE/CHRD pair was as detrimental to kinase activity as a tandem mutation (see Author response image 1). As discussed in the manuscript, we think that these Cys may serve important structural regulatory functions and opted to focus on co-mutations of the aE/CHRD pair for the remainder of our investigation.

      Author response image 1.

      In vitro kinase assays showing rates of in vitro peptide phosphorylation by WT and Cys-to-Ala (aE/CHRD residues) variants of BRSK1 after activation by LKB1.

      In panels C and D, the same experimental conditions should have been measured as in A and B.

      Panels A and B were designed to demonstrate the enzymatic activity and the response to DTT treatment to establish the baseline redox regulation of the kinase and a panel of Cys-to-Ala mutant variants. In contrast, panels C and D were specifically focused on rescue experiments with mutants that showed a significant effect under the conditions tested in A and B. These panels were intended to further explore the role of redox regulation in modulating the activity of these mutants, particularly those that retained some level of activity or exhibited a notable response to redox changes.

      The rationale for this experimental design was to prioritize the investigation of mutants, such as those at the T+2 and CPE cysteine sites, which provided the most insight into the redox-dependent modulation of kinase activity. Other mutants, which resulted in inactivation, were deprioritized in this context as they offered limited additional information regarding the redox regulation mechanism. This focused approach allowed us to delve deeper into understanding how specific cysteine residues contribute to the redox-sensitive control of kinase function, aligning with the overall objective of elucidating the nuanced roles of redox regulation in kinase activity.

      (4) In Figure 5: Why did the authors use reduced glutathione instead of DTT? The authors should have recapitulated the same experimental conditions as in Figure 4 and not focused only on the T+2 or the CPE single mutants, but used the double and the aE/CHRD mutants as well, as internal controls and validation of the enzymatic assays using the modified peptide.

      Regarding the use of reduced glutathione (GSH) instead of DTT in Figure 5, we chose GSH for its well characterized biological relevance as an antioxidant in cellular responses to oxidative stress. Furthermore, while DTT has been widely used in experimental setups, it is also potentially cytotoxic at high concentrations.

      Addressing the point on experimental consistency with Figure 4, we appreciate the suggestion and indeed had already conducted such experiments (Previously Supp Fig 3, now changed to current Supp Fig 4). These experiments include analyses of BRSK mutant activity in a HEK-293T model. However, we chose not to focus on inactivating mutants (such as the aE/CHRD mutants which had depleted expression levels possibly as a consequence of compromised structural integrity) or pursue the generation of double mutant CMV plasmids, as these were deemed unlikely to add significant insights into the core narrative of our study. Our focus remained on the mutants that yielded the most informative results regarding the redox regulation mechanisms in the in vitro setting, ensuring a clear and impactful presentation of our findings.

      A time course evaluation of the reducing or oxidizing reagents should have been performed. Would we expect an increase in the levels of Tau phosphorylation, as a readout of BRSK1-2 activity, in WT samples in the presence of GSH and also in the case of the CPE mutant?

      We acknowledge the importance of such analyses in understanding the dynamic nature of redox regulation on kinase activity and have included a time course (Supp Fig 2 e-g). These results confirm a depletion of Tau phosphorylation over time in response to peroxide generated by the enzyme glucose oxidase.

      (5) In Figure 6, did the authors look at the functional impact of the residues that interact with the T+2 and CPE motifs, e.g. T174 and the E185-R258 tether?

      Our primary focus was on the salt bridges, as this is a key regulatory structural feature that is conserved across many kinases. Regarding the additional interactions mentioned, we have thoroughly evaluated their roles and dynamics through molecular dynamics (MD) simulations but did not find any results of significant relevance to warrant inclusion.

      (6) In Figure 7: Did the authors look at the oligomerization state of the BRSK1-2 multimers under non-reducing conditions? Were they also observed in the case of the FL constructs? What was the stoichiometry?

      Our current work indicates that the kinase domain of BRSK1-2 primarily exists in a monomeric state, with some evidence of dimerization or multimer formation under specific conditions. Our SEC-MALS (Supp Fig 6) and SDS-PAGE analyses (Figure 7) clearly demonstrate that monomers are overwhelmingly the dominant species under non-reducing conditions (>90 %). We also conclude that these limited oligomeric species can be removed by inclusion of reducing agents such as DTT (Figure 7), which may suggest a role for a Cys residue(s). Notably, removal of the T+2 Cys was insufficient to prevent multimerization.

      We were unable to obtain reliable SEC-MALS data for the full-length forms of the protein, likely due to the presence of disordered regions that flank the kinase domain, which result in a highly polydisperse and unstable preparation (at the concentrations required for SEC-MALS). Although we are therefore unable to comment on the stoichiometry of FL BRSK dimers, we can detect BRSK1 and 2 hetero- and homo-complexes in HEK-293T cells by IP, which supports the existence of limited BRSK1 & 2 dimers (Supp Fig 6a). However, we were unable to detect intermolecular disulfide bonds by MS, although this does not necessarily preclude their existence. The physiological role of BRSK multimerization (if any) and establishing specifically which Cys residues drive this phenomenon is of significant interest to our future investigations.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This manuscript represents a fundamental contribution demonstrating that fentanyl-induced respiratory depression can be reversed with a peripherally-restricted mu opioid receptor antagonist. The paper reports compelling and rigorous physiological, pharmacokinetic, and behavioral evidence supporting this major claim, and furthers mechanistic understanding of how peripheral opioid receptors contribute to respiratory depression. These findings reshape our understanding of opioid-related effects on respiration and have significant therapeutic implications given that medications currently used to reverse opioid overdose (such as naloxone) produce severe aversive and withdrawal effects via actions within the central nervous system.

      We thank the reviewers for their insightful comments and critiques, which we have incorporated into the manuscript. We believe these revisions have significantly improved the manuscript. Additionally, following discussions among the authors, we have revised the color scheme across all figures. For example, the color of the symbols in Figure 1B-D now matches the bars in Figure 1E-J, rather than the symbols. We feel that this change improves the clarity and visual consistency of the figures, making it easier to interpret the data across figures.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper shows that the synthetic opioid fentanyl induces respiratory depression in rodents. This effect is reversed by the opioid receptor antagonist naloxone, as expected. Unexpectedly, the peripherally restricted opioid receptor antagonist naloxone methiodide also blocks fentanyl-induced respiratory depression.

      Strengths:

      The paper reports compelling physiology data supporting the induction of respiratory distress in fentanyl-treated animals. Evidence suggesting that naloxone methiodide reverses this respiratory depression is compelling. This is further supported by pharmacokinetic data suggesting that naloxone methiodide does not penetrate into the brain, nor is it metabolized into brain-penetrant naloxone.

      Weaknesses:

      A weakness of the study is the fact that the functional significance of opioid-induced changes in neural activity in the nTS (as measured by cFos and GcAMP/photometry) is not established. Does the nTS regulate fentanyl-induced respiratory depression, and are changes in nTS activity induced by naloxone and naloxone methiodide relevant to their ability to reverse respiratory depression?

      Reviewer #2 (Public review):

      Summary:

      In this article, Ruyle and colleagues assessed the contribution of central and peripheral mu opioid receptors in mediating fentanyl-induced respiratory depression using both naloxone and naloxone methiodide, which does not cross the blood-brain barrier. Both compounds prevented and reversed fentanyl-induced respiratory depression to a comparable degree. The advantage of peripheral treatments is that they circumvent the withdrawal-like effects of naloxone. Moreover, neurons located in the nucleus of the solitary tract are no longer activated by fentanyl when naloxone methiodide is administered, suggesting that these responses are mediated by peripheral mu opioid receptors. The results delineate a role for peripheral mu opioid receptors in fentanyl-derived respiratory depression and identify a potentially advantageous approach to treating overdoses without inflicting withdrawal on the patients.

      Strengths:

      The strengths of the article include the intravenous delivery of all compounds, which increases the translational value of the article. The authors address both the prevention and reversal of fentanyl-derived respiratory depression. The experimental design and data interpretation are rigorous and appropriate controls were used in the study. Multiple doses were screened in the study and the approaches were multipronged. The authors demonstrated the activation of NTS cells using multiple techniques and the study links peripheral activation of mu opioid receptors to central activation of NTS cells. Both males and females were used in the experiments. The authors demonstrate the peripheral restriction of naloxone methiodide.

      Weaknesses:

      Naloxone is already broadly used to prevent overdoses from opioids so in some respects, the effects reported here are somewhat incremental.

      The reviewer is correct that naloxone is the standard antidote for reversing opioid-induced respiratory depression. However, its limitations, including the risk of precipitated withdrawal, are well-documented in both preclinical and clinical studies. The likelihood of withdrawal increases when multiple doses of naloxone are administered. Since naloxone-induced withdrawal is centrally mediated, this study aimed to evaluate a peripherally restricted MOR antagonist for its ability to prevent or reverse fentanyl-induced respiratory depression. A key finding is that NLXM reversed OIRD without inducing aversive behavior. This suggests that peripheral antagonists like NLXM may be integrated into intervention strategies that save lives while preventing the adverse behavioral and physiological effects that are observed after treatment with naloxone.

      Reviewer #3 (Public review):

      Summary:

      This manuscript outlines a series of very exciting and game-changing experiments examining the role of peripheral MORs in OIRD. The authors outline experiments that demonstrate a peripherally restricted MOR antagonist (NLX Methiodide) can rescue fentanyl-induced respiratory depression and this effect coincides with a lack of conditioned place aversion. This approach would be a massive boon to the OUD community, as there are a multitude of clinical reports showing that naloxone rescue post fentanyl over-intoxication is more aversive than the potential loss-of-life to the individuals involved. This important study reframes our understanding of successful overdose rescue with potential for reduced aversive withdrawal effects.

      Strengths:

      Strengths include the plethora of approaches arriving at the same general conclusion, the inclusion of both sexes and the result that a peripheral approach for OIRD rescue may side-step severe negative withdrawal symptoms of traditional NLX rescue.

      Weaknesses:

      The major weakness of this version relates to the data analysis assessing sex-specific contributions to the results.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Some points for the authors to consider are:

      (1) In the Abstract, it is unclear why "high potency and lipophilicity" contribute to opioid-induced respiratory depression.

      The higher potency of fentanyl compared to other opioids significantly increases the risk of overdose and subsequent respiratory depression. Its high lipophilicity facilitates rapid absorption and central nervous system penetration, which contributes to the rapid onset of cardiorespiratory depression. The narrow therapeutic window of fentanyl further emphasizes the critical need for timely intervention when an overdose has occurred, and for effective antagonists to reverse respiratory depression and save lives. We have revised the abstract to clarify these points.

      (2) Are the doses of fentanyl used in the study (2, 20, or 50 µg/kg IV) relevant to those achieved by fentanyl-exposed human drug users?

      In these studies, we intravenously administered three doses of fentanyl. The human equivalent doses (HED) of 20 µg/kg and 50 µg/kg fentanyl are ~3 µg/kg and ~8 µg/kg, respectively. These doses have previously been shown to induce respiratory depression in humans (Dahan et al., 2005).
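      The quoted human equivalent doses are consistent with the standard body-surface-area conversion, HED = animal dose × (animal Km / human Km). The sketch below assumes that conversion with the Km factors tabulated in the FDA guidance on maximum safe starting doses (rat 6, human 37); the response does not state which conversion method was used, and the function name is hypothetical.

```python
# Km = body weight (kg) / body surface area (m^2); values as tabulated in
# the FDA guidance on estimating maximum safe starting doses (assumed here).
KM_FACTOR = {"mouse": 3.0, "rat": 6.0, "human": 37.0}


def human_equivalent_dose(animal_dose_ug_per_kg: float, species: str = "rat") -> float:
    """Convert an animal dose (µg/kg) to a human equivalent dose (µg/kg)."""
    return animal_dose_ug_per_kg * KM_FACTOR[species] / KM_FACTOR["human"]


for dose in (20, 50):
    print(f"{dose} ug/kg in rat -> ~{human_equivalent_dose(dose):.1f} ug/kg in human")
```

      This prints about 3.2 and 8.1 µg/kg, which round to the ~3 and ~8 µg/kg quoted above. The conversion scales by body surface area rather than body weight, which is why the human doses are roughly one-sixth of the rat doses.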

      (3) In Figure 1, it appeared that only a small fraction of tyrosine hydroxylase-positive (TH+) neurons expressed cFos in response to fentanyl, and the degree of cFos expression was largely similar across all fentanyl doses tested. Thus, it is unclear whether TH+ neurons play a role in fentanyl-induced respiratory depression, and the value of these data is unclear (see point #6 below also).

      As shown in the mean data, the lowest dose of fentanyl, which was below the threshold for inducing OIRD, activated approximately 50% of tyrosine hydroxylase-positive (TH+) nTS neurons. In contrast, the highest dose of fentanyl resulted in a statistically significant increase, with ~75% of TH+ cells co-expressing Fos-IR.

      We included the assessment of catecholaminergic nTS cells for several reasons. The regions of the nTS evaluated in this study contain high expression of MOR and are the termination points of sensory afferent fibers transmitting cardiorespiratory information to the nTS (Aicher et al., 2000; Furdui et al., 2024). Catecholaminergic cells receive direct excitatory inputs from visceral afferents (Appleyard et al., 2007) and exhibit intensity-dependent increases in Fos-IR in rats exposed to hypoxic air (Kline et al., 2010; King et al., 2012). These neurons are essential for generating appropriate cardiorespiratory responses to hypoxic challenges (Bathina et al., 2013; King et al., 2015). As the reviewer notes, rats exposed to fentanyl exhibit a high degree of Fos-IR in the nTS, including in catecholaminergic neurons. Despite this robust fentanyl-induced activation of nTS neurons (increased Fos-IR), there appears to be a failure to initiate appropriate chemoreflex-mediated cardiorespiratory responses. Our photometry data further indicate that fentanyl-induced changes in neuronal activity are mediated, in part, by peripheral MOR. Collectively, these findings suggest that fentanyl impacts nTS activity through alterations in peripheral afferent signaling to the nTS, which may contribute to the severity and duration of OIRD.

      (4) It would help with the flow of the paper if the pharmacokinetic data shown in Figure 6 were presented earlier (as part of Figure 2).

      We have moved the biodistribution data earlier in the manuscript, now presenting it as Figure 2. The numbering of all subsequent figures has been adjusted accordingly.

      (5) In Figure 5, there appears to be a large number of GCaMP-expressing neurons located outside the nTS. To what degree can the changes in calcium signaling, attributed to alterations in neural activity in the nTS, be explained by altered activity of neurons located outside the nTS?

      The reviewer is correct that our viral spread extends beyond the boundaries of the nTS, raising the possibility that the responses observed in Figure 5 may be influenced by neural activity of cells outside the nTS. While some viral spread beyond the target region is unavoidable, calcium transients were measured at the tip of the fiber, which was positioned directly within the nTS.

      To address this concern further, we performed Fos immunohistochemistry in a subset of animals that received bilateral GCaMP virus injections into the nTS. Following fentanyl administration (50 µg/kg IV), brains were collected two hours later. As shown in Author response image 1, Fos-IR co-localized with GCaMP exclusively within the nTS boundaries. No Fos-IR was detected outside the nTS, including in GCaMP-expressing cells. Taken together, these findings support our conclusion that the data depicted in our photometry figure (now Figure 6) accurately represent fentanyl-induced activity changes in nTS neurons.

      Author response image 1.

      Arrowheads: Fos-negative GCaMP cell; Arrows: Co-labeled Fos/GCaMP cell; Asterisk: Fos+ GCaMP-negative cell

      (6) Currently, the cFos and photometry data are descriptive in nature. Are opioid-induced changes in nTS neural activity relevant to respiratory depression? If so, one might expect DREADD-mediated stimulation of the nTS neural activity (or stimulating nTS activity by some other means) would reverse fentanyl-induced respiratory depression similar to naloxone and methyl-naloxone.

      The reviewer raises an interesting point regarding the relevance of the nTS in the context of OIRD. The nTS is a major site of integration of sensory afferent information and is involved in the initiation of reflex responses that facilitate a return to homeostasis. As described above, we characterized the collective response of nTS neurons to intravenous fentanyl using both Fos immunohistochemistry and fiber photometry. Our data indicate that fentanyl-induced changes in nTS activity are strongly mediated by peripheral MOR. While the suggestion to use global chemogenetic activation of nTS neurons to reverse fentanyl-induced respiratory depression is intriguing, results from these experiments may be difficult to interpret due to the extensive heterogeneity of the nTS. However, we are currently conducting similar experiments using a more selective approach that will allow us to isolate and evaluate specific nTS phenotypes to better understand their contributions to OIRD.

      (7) Are peripherally restricted mu opioid receptor (MOR) agonists available? If so, it would strengthen the paper if such compounds could be used to show that stimulation of peripheral MORs is sufficient to induce respiratory distress independent of actions on centrally located MORs.

      Peripherally acting Mu Opioid Receptor Antagonists (PAMORAs) are indeed available and currently being evaluated in our laboratory.

      Reviewer #2 (Recommendations for the authors):

      Consider having the figures/data numbered in the order that they appear in the manuscript. Right now, Figure 6 is mentioned between Figures 1 and 2 (minor).

      Thank you for this suggestion. We have reordered the figures so that the biodistribution figure appears before the MOR antagonist pretreatment and reversal figures.

      Reviewer #3 (Recommendations for the authors):

      This manuscript outlines a series of very exciting and game-changing experiments examining the role of peripheral MORs in OIRD. The authors outline experiments that demonstrate a peripherally restricted MOR antagonist (NLX methiodide) can rescue fentanyl-induced respiratory depression and this effect coincides with a lack of conditioned place aversion. This approach would be a massive boon to the OUD community, as a multitude of clinical reports show that, for the individuals involved, naloxone rescue after fentanyl overdose is more aversive than the potential loss of life. This important study reframes our understanding of successful overdose rescue with potential for reduced aversive withdrawal effects.

      While this is an exciting and important study, there are a few minor to moderate critiques for the authors to consider. These are below.

      (1) Title: "devoid of aversive effects" - While CPA is a good, cumulative indicator of potential aversive effects, it is not an exhaustive one. Since no other withdrawal measures were included, this is an overstatement.

      The reviewer is correct in noting that our analysis of aversive effects is not exhaustive. Since we only assessed changes in aversive behavior between NLX and NLXM, we believe it is more accurate to modify the title accordingly. We have changed the title from “devoid of aversive effects” to “devoid of aversive behavior” to better reflect the scope of the experiments conducted.

      (2) Page 3, top line: MOR (mu opioid receptor) is highly expressed...

      An article should likely be included prior to MOR or make plural and adjust the sentence.

      Thank you for this suggestion. We have reworked this section in the manuscript.

      (3) Figure 6D: this figure is very important for the interpretation of every single figure. It should either be moved to figure 1 or 2 or combined with figure 1 or 2.

      Thank you for this suggestion. The biodistribution figure has been moved to Figure 2.

      (4) Page 5, line 164, Figure 21-D: remove the 1.

      Done.

      (5) Sex differences (or lack thereof):

      Throughout the manuscript, the authors report a lack of sex differences. However, while the data is not powered for the distinction of sex differences, there appears to be a bimodal distribution of the individual data points that likely corresponds to sex across most experiments. For example, in Figure 2E there are both color and clear dots, which this reviewer assumes indicate sex (however, this wasn't easily apparent, if it was commented on at all in the paper). If you look at the saline oxygen saturation (nadir) levels (2E), there is wide variability with the red-filled circles, but not the clear ones. This may indicate a bimodal distribution (and may be related to the baseline HR sex differences highlighted). This is also the case in Figure 2L but is perhaps more obvious in the CPA score data (Figure 4D), where it seems the negative CPA effects of NLX were likely driven primarily by one sex. While this reviewer does not expect a full powering of experiments for sex differences (and also is very appreciative of the inclusion of both sexes), including the full raw data, with sex indicated, in the supplemental data would greatly aid the field in general and allow those with a specific interest in this area to build upon this data. Additionally, further discussion regarding the potential role of sex differences in the translational value of these findings is also warranted.

      For all bar graphs, open symbols represent females and filled symbols represent males. This information can be found in the first paragraph of the Materials and Methods section. We have also added this information to each figure for increased visibility. We appreciate the acknowledgement of our inclusion of both sexes. For all experiments, we attempted to balance by sex. Unfortunately, we occasionally had to exclude animals for technical reasons (with clogged catheters being the most common reason for exclusion). This sometimes led to an imbalance in sex in some groups, as the reviewer has noted. In the graph of oxygen saturation nadir values in Fig 2E (now Fig 3E in the revised manuscript), all animals received intravenous fentanyl at a dose of 20 µg/kg. The reviewer is correct that there is greater variability in the males (filled symbols) compared to the females (open symbols) in this graph. However, this variability in the distribution was not observed in Fig 1E or Fig 4E, in which male and female rats received an identical dose of 20 µg/kg. Taking this into account, our overall interpretation of the data is that sex differences in the responses observed after intravenous fentanyl are relatively minor, and that the variability in Fig 3E is primarily due to a lower n compared to Fig 1E.

      All raw data will be uploaded to a data repository.

      (6) Page 7, line 209: Figure 5D should be Figure 6D.

      We have incorporated this change.

      (7) Page 8, line 267: Cure should be Curve.

      We have incorporated this change.

      (8) Discussion: Page 10, line 322 states that "no detectable NLX ... was found in brain tissue". This is incorrect based on Figure 6.

      The sentence the reviewer highlighted refers to detection of NLX or NLXM in brain tissue from animals that received intravenous NLXM. As shown in the biodistribution figure (now Figure 2 in the manuscript), an intravenous injection of NLXM did not result in NLX formation in the brain. We have reworked the sentence for clarity.

      (9) jGCaMP injections: Figure 5B/C shows the distribution of the GCaMP across animals. The optic fiber is placed directly over the nTS. However, how are we certain there isn't a nearby nucleus or structure outside the nTS that is contributing to the photometry data presented in D-G?

      See our response to comment (5) above.

      (10) Fiber Photometry and Sex: These studies unfortunately may have included only one animal of one sex in the fiber photometry data. While the inclusion is overall good, the single value for that sex, given the clustering of the data, suggests that there are differences. While the anesthesia may be driving this potential sex effect, it is not clear based on the data presented. For reference: https://link.springer.com/article/10.1007/s12975-012-0229-y

      The reviewer is correct that there was an imbalance of sex in this dataset. While we made every attempt to balance for sex across all experiments, we unfortunately had to exclude some animals for technical reasons (clogged catheter, missed injection site, etc.). This produced an imbalance in our photometry studies and did not allow us to thoroughly evaluate sex differences in fentanyl-induced changes in neural activity or in the responses to anesthesia. We have expanded on this limitation in the discussion.

      (11) Figure 5 - the bars are not the color indicated by the legend.

      We have corrected this in the figure. Thank you.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      While very positive towards our manuscript, this reviewer also points out three suggestions for improvement.

      Overall, there are not many weaknesses. The main one I noticed is with the lipidomic analysis shown in Figs 3C, 7C, S1 and S3. While these data are an essential part of the analysis and provide strong evidence for the conclusions of the study, it is unfortunate that the methods used did not enable the distinction between two 18:1 isomers. These two isomers of 18:1 are important in C. elegans biology, because one is a substrate for FAT-2 (18:1n-9, oleic acid) and the other is not (18:1n-7, cis-vaccenic acid). Although rarer in mammals, cis-vaccenic acid is the most abundant fatty acid in C. elegans and is likely the most important structural MUFA. The measurement of these two isomers is not essential for the conclusions of the study, but the manuscript should include a comment about the abundance of oleic vs vaccenic acid in C. elegans (authors can find this information, even in the fat-2 mutant, in other publications of C. elegans fatty acid composition). Otherwise, readers who are not familiar with C. elegans might assume the 18:1 that is reported is likely to be mainly oleic acid, as is common in mammals.

      Excellent point. As suggested by the reviewer, we now include a clarification of this in the text: "Consistent with previous publications [10], the levels of 18:1 fatty acids were greatly increased in the fat-2(wa17) mutant. It is important to note that the majority of these 18:1 fatty acids is likely 18:1n7 (vaccenic acid) and not 18:1n9 (OA) [10,23], which is the substrate of FAT-2; the lipid analysis methods used here are not able to distinguish between the two 18:1 species."

      The title could be less specific; it might be confusing to readers to include the allele name in the title.

      We thank the reviewer for the suggestion, and we have now modified the title:

      "Forward Genetics In C. elegans Reveals Genetic Adaptations To Polyunsaturated Fatty Acid Deficiency"

      There are two errors in the pathway depicted in Figure 1A. The 16:0-16:1 desaturation can be performed by FAT-5, FAT-6, and FAT-7. The 18:0-18:1 desaturation can only be performed by FAT-6 and FAT-7.

      We thank the reviewer for pointing out this mistake. The pathway in Fig. 1A has been corrected.

      Reviewer #2:

      This reviewer was also very positive towards our manuscript but offered several suggestions for additional experiments or changes to the manuscript.

      Major recommendations

      (1) To conclude that membrane rigidification is not the major cause of defects associated with fat-2 mutations, the authors need to show that fluidity is rescued by their treatments (oleic acid or NP-40). I honestly doubt that it is the case, as oleic acid is already abundant in fat-2 mutants. It is possible that the treatments, which are effective in rescuing fluidity in paqr-2 mutants, do not have the same effects in fat-2 mutants.

      The reviewer raises an important point. In an effort to address this, we have now performed a FRAP study on fat-2(wa17) mutants with/without NP40 as a fluidizing agent (with wild-type and paqr-2 mutants as controls). The new data, now included as Fig. 2J, show that NP40 did improve the fluidity of the intestinal cell membrane in the fat-2(wa17) mutant, though not to the same degree as in the paqr-2 mutant. This is now cited in the text as follows:

      "However, cultivating the fat-2(wa17) mutant in the presence of the non-ionic detergent NP40, which improves the growth of the paqr-2(tm3410) mutant [17], did not suppress the poor growth phenotype of the fat-2(wa17) mutant even though it did improve membrane fluidity as measured using FRAP (Fig. 2I-J). Similarly, supplementing the fat-2(wa17) mutant with the MUFA oleic acid (OA, 18:1), which also suppresses paqr-2(tm3410) phenotypes [17], did not suppress the poor growth phenotype of the fat-2(wa17) mutant (Fig 2K)."

      (2) It is not validated experimentally that the mutations converge into FTN-2 repression. This can be verified by analyzing mRNA or protein expression of FTN-2 in the egl-9 and hif-1 mutants obtained in the screening.

      Our manuscript does lean on several publications that previously established the HIF-1 pathway in C. elegans. Additionally, we have now added a qPCR experiment showing that the newly isolated hif-1(et69) allele indeed suppresses the expression of ftn-2. This was an especially valuable experiment since hif-1(et69) is proposed to act as a gain-of-function allele that would constitutively suppress ftn-2 expression. This new result is included as Fig. 6C and mentioned in the text:

      "Inhibition of egl-9 promotes HIF-1 activity [41], which we here verified for the egl-9(et60) allele using western blots (Fig 6A). Additionally, we found by qPCR that ftn-2 mRNA levels are as expected reduced by the proposed gain-of-function hif-1(et69) allele (Fig 6C). We conclude that the egl-9 and hif-1 suppressor mutations likely converge on inhibiting ftn-2 and thus act similarly to the ftn-2 loss-of-function alleles."

      (3) In the hif-1(et69) and ftn-2(et68) mutants, the rescues in lipid composition seem to be minor, with eicosapentaenoic acid (EPA) levels remaining low. The ftn-2 mutant data is especially concerning, as it suggests that egl-9 mutants rescue lipid composition via distinct mechanisms not including ftn-2 suppression. I suggest that the authors test the minimal doses of linoleic acid or EPA required to rescue fat-2 mutants and perform lipidomics to determine the degree of EPA restoration that is needed. If a low level of restoration is sufficient, the hif-1 and ftn-2 mutants might indeed rescue phenotypes via a restoration of EPA levels. Otherwise, other mechanisms have to be considered.

      In line with the above issue, the low level of EPA restoration in hif-1 and ftn-2 mutants raises the possibility that the mutations rescue fat-2 mutants downstream of lipid changes. The reduction in HIF-1 levels in fat-2 mutants also suggests that lipid changes affect HIF-1 expression. Thus, the "impossibility to genetically compensate PUFA deficiency" might be wrong. The above experiment would address this point too.

      The reviewer is entirely correct to consider alternative explanations. In the lipidomics in Fig 3, we see that fat-2(wa17) worms on NGM have only ~1.5-2 mol% EPA in phosphatidylcholines. When treated with 2 mM LA, EPA levels rise to ~10 mol%, still below the ~25% observed in N2, but perhaps this is sufficient to restore fat-2(wa17) health. Similarly, the hif-1(et69) and ftn-2(et68) mutant alleles elevate EPA levels to 5-7% in fat-2(wa17). Thus, we have a correlation where a significant increase in EPA, obtained either through LA supplementation or through suppressor mutations (e.g. egl-9(et60), hif-1(et69) or ftn-2(et68)), is associated with improved growth and health of the fat-2(wa17) mutant. However, correlation is of course not proof. The suggested experiment to titrate EPA to its lowest fat-2(wa17)-rescuing level and then perform lipidomics analysis could not be completed within a reasonable time frame during this revision. However, preliminary experiments showed that even 25 μM LA (most of which will be converted to EPA by the worms) is enough to rescue the fat-2(wa17) or fat-2 null mutant (Author response image 1), suggesting that even tiny amounts (far below the >250 μM used in our article) bring great benefits.

      Author response image 1.

      Nevertheless, we now acknowledge in the discussion that alternative explanations exist:

      "Other mechanisms are also possible. For example, mutations in the HIF-1 pathway could somehow reduce EPA turnover rates in the fat-2(wa17) mutant and allow its levels to rise above an essential threshold. This hypothesis is consistent with the observation that the suppressors can rescue both the fat-2(wa17) mutant and fat-2 RNAi-treated worms but not the fat-2 null mutant. It is even possible, though deemed unlikely, that the fat-2(wa17) suppressors act by compensating for the PUFA shortage via some undefined separate process downstream of the lipid changes and that they only indirectly result in elevated EPA levels."

      Additionally, another possible mechanism of action of the fat-2(wa17) suppressors could have been that they all cause upregulation of the FAT-2 protein. We have now explored this possibility using Western blots and found that this is an unlikely mechanism. This is presented in Fig. 6D-E and S3C-D, mentioned in the text as follows:

      "We also used Western blots to evaluate the abundance of the FAT-2 protein expressed from endogenous wild-type or mutant loci but to which a HA tag was fused using CRISPR/Cas9. We found that the FAT-2::HA levels are severely reduced when the locus contains the S101F substitution present in the wa17 allele, but restored close to wild-type levels by the fat-2(et65) suppressor mutation (Fig 6D-E, S3C-D Fig). The levels of FAT-2 in the HIF-1 pathway suppressors varied between experiments, with the suppressors sometimes restoring FAT-2 levels and sometimes not even when the worms were growing well (Fig 6D-E, S3C-D Fig). The fat-2(wa17) suppressors, except for the intragenic fat-2 alleles, likely do not act by increasing FAT-2 protein levels."

      (4) It should be tested how Fe2+ levels are changed in the mutants, and how effective the ferric ammonium citrate treatment is. The authors might use a ftn-1::GFP reporter for this purpose.

      We did obtain a strain carrying the ftn-1::GFP reporter but could not generate conclusive data with it. In particular, we saw no increase in fluorescence in fat-2(wa17) worms carrying suppressor mutations. However, we also found that even ferric ammonium citrate (FAC) treatment that rescues the fat-2(wa17) mutant did not result in measurably increased GFP levels, suggesting that the reporter is not sensitive enough.

      Minor comments

      (1) I think that putting Figure 6A in Figure 5 would be helpful for the readers, so that they understand that the mutations converge in the same pathway.

      This is now done.

      (2) Page 3: While it is clear that paqr-2 regulates lipid composition, I believe that it remains unclear if it "promote the production and incorporation of PUFAs into phospholipids to restore membrane homeostasis".

      A reference was missing to support that statement. Ruiz et al. (2023) is now cited for this (ref. 7).

      (3) C. elegans is extremely rich in EPA (see for example DOI: 10.3390/jcm5020019), but the lipidomics data in this study rather suggest that oleic acid is predominant. I recommend checking why this discrepancy occurs.

      In WT worms, OA (18:1n9) makes up only ~2%, whereas vaccenic acid (18:1n7) accounts for ~21% and EPA for slightly less, ~19% (Watts et al. 2002). These values match our lipidomics results, although we cannot distinguish between 18:1n9 and 18:1n7. See also answer to Reviewer #1, comment 1.

      (4) Abstract: The authors write that mammals do not synthesize PUFAs, which is almost correct, but they still produce the PUFA mead acid. Thus, the statement is not completely right.

      Didn't know that! From the literature, our understanding is that mammals synthesize mead acid during FA deficiency but not under normal conditions, so they do not regularly produce mead acid. We have now updated the introduction:

      "An exception to this exists during severe essential fatty acid deficiency when mammals can synthesize mead acid (20:3n9), though this is not a common occurrence [11]"

      (5) Page 10: Eicosanoids are C20 lipid mediators, thus those produced from docosahexaenoic acid are not eicosanoids. Correct the statement.

      We thank the reviewer for pointing this out. We now write:

      "EPA and DHA, being long-chain PUFAs, should have similar fluidizing effects on membrane properties (though in vitro experiments challenge this view [78]), and both can serve as precursors of eicosanoids or docosanoids, particularly inflammatory ones [79]."

      (6) Page 7: "hif-1(et69) is similarly able to suppress fat-2(wa17) when ftn-2 is knocked out" I am not sure that the data agrees with this statement, and it is unclear what we can conclude from such observation.

      Fig. 2D shows that ftn-2(et68) suppresses fat-2(wa17) even in the presence of a hif-1(ok2654) null allele, showing that no HIF-1 function is required once ftn-2 is mutated. Conversely, Fig S2E shows that combining both the hif-1(et69) and the ftn-2(ok404) null allele also suppresses fat-2(wa17) (the worms do not fully reach N2 length, but they are significantly longer and were fertile adults); this is merely the expected outcome if the pathway converges on loss of ftn-2 function, though other interpretations could be possible from this experiment alone.

      (7) S3 Fig: in panel B, is the last column ftn-2;egl-9 mutant? I would imagine that it is ftn-2;fat-2.

      We thank the reviewer for pointing this out. This has been corrected.

      (8) Fig 6B: how many times has this experiment been done?

      With these exact conditions (6 h and 20 h hypoxia) and this order of strains, the blot was done once, but the blot overall was done five times. We have now added another replicate in Fig. S3A.

      Note also that a few minor modifications have been made throughout the text, which can be seen in the Word file with tracked changes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to the Joint Public Review:

      We are indebted to eLife’s reviewing process for helping us improve our manuscript and for highlighting that our study provides new molecular insights into SFT pathogenesis.  

      Response to Reviewers:

      (1) The authors state that "NAB2-STAT6 localization is exclusively driven by EGR1 binding" yet WT1 motifs are also consistently enriched. Can you please touch upon the potential involvement of WT1 (or lack thereof, and why)?

      Our data suggest that EGR1 is the primary driver of NAB2-STAT6 localization. In fact, EGR1 is the most significantly enriched motif (Fig. 4) at NAB2-STAT6 binding sites and we detect an interaction between the fusion protein and EGR1 (Fig. 5). Conversely, we did not identify an interaction between NAB2-STAT6 and WT1. However, WT1 also belongs to the C2H2 zinc finger subclass and recognizes a motif bearing striking similarities to the EGR1/2 consensus. EGR1 has been previously described to bind WT1 motifs and to function as an activator of WT1 targets (as opposed to WT1 repressive abilities). See https://www.jbc.org/article/S0021-9258(20)74720-4/fulltext and https://www.sciencedirect.com/science/article/pii/S0378111901005935.

      (2) In the description of Figure 5C the authors observe nuclear staining of both NAB2 and STAT6 following NAB2-STAT6 fusion induction. They interpret this as evidence that the fusion stimulates nuclear translocation of endogenous NAB2. This statement can only be rigorously made if the authors can unequivocally demonstrate that their antibody exclusively detects endogenous NAB2 and not the NAB2 portion of the fusion. As presented, a more likely interpretation is that the NAB2 staining detects NAB2-STAT6 fusion protein. Since there is some cytoplasmic NAB2 signal still present, the findings in Figure 5C neither support nor disprove nuclear translocation of endogenous NAB2. It may be prudent to remove this section. Figure 5B is currently the best direct evidence of nuclear translocation.

      We agree with the reviewer that Fig. 5C does not rigorously show that NAB2-STAT6 fusion proteins drag endogenous NAB2 into the nucleus. The immunostaining reveals that wt NAB2 localization is overwhelmingly cytoplasmic at steady-state conditions (and prior to expression of the fusion protein). Instead, Figure 5B shows that endogenous NAB2 translocates to the nucleus upon NAB2-STAT6 expression. Additionally, figure 5A (along with Suppl. Fig. 5 E-F) demonstrates that endogenous NAB2 co-precipitates with NAB2-STAT6 fusions in nuclear extracts of U2OS and HEK293T cells. We have rephrased the paragraph accordingly.

      (3) Figure 5D: for the interpretation of the presented data to hold up, namely, NAB1 nuclear translocation upon NAB2-STAT6 expression, it is important to demonstrate that NAB1 antibodies do not cross-react with NAB2 given the similarity between NAB1 and NAB2. Without such control, another likely interpretation of the results in Figure 5D is that NAB1 antibody detects the NAB2 portion of the overexpressed fusion protein. This needs to be acknowledged in the text.

      We had similar concerns; therefore, we confirmed by immunoblot that the NAB1 antibody does not cross-react with NAB2 (see Author response image 1). We overexpressed FLAG-NAB2, HA-NAB1 and GFP constructs in HEK293T cells and performed immunoprecipitation with either HA or FLAG from whole cell extracts, followed by western blot using anti-NAB2 and anti-NAB1 polyclonal antibodies. We did not observe cross-reactivity of these antibodies. We now mention this antibody validation in the revised text.

      Author response image 1.

      (4) Also, to support the notion that NAB2-STAT6 fusion promotes nuclear translocation of the entire complex, an imaging approach detecting EGR1 similar to Figure 5C-D would be helpful. EGR1 staining also avoids the potential pitfall of NAB1/2 antibodies detecting NAB2-STAT6 overexpressed fusion instead of endogenous proteins.

      We agree with the reviewer that this would be a helpful approach. Unfortunately, none of the commercially available EGR1 antibodies that we tested were suitable for immunocytochemistry, as they either failed to show a proper signal or were marred by high nonspecific background signal.

      (5) The authors found increased mRNA expression of certain cytokines and secreted neuropeptides in SFTs. While this may be consistent with a secretory phenotype, additional evidence such as detection of elevated levels of these proteins in tumor lysates or in culture media is necessary to formally make this claim. Please rephrase.

      We have rephrased our claims as suggested. The revised text is now as follows: “We also identified a distinct secretory gene signature associated with SFTs. In fact, IGF2 is the most upregulated gene, via activation of an intronic enhancer by EGR1. IGF2 was pinpointed as the cause of hypoglycemia occurring in a very small subset of SFTs (Doege–Potter syndrome)(52). Our data suggest that IGF2 (and IGF1) upregulation is a common feature of all SFTs. In addition to insulin-like growth factors, SFTs may secrete a host of peptides with diverse functions in neuronal processes, chemotaxis, and growth stimulation. The previously unrecognized neuronal features and the putative secretory phenotype of SFTs set them apart from mesenchymal malignancies and relate them to neuroendocrine malignancies such as pheochromocytoma, oligodendroglioma and neuroblastoma.”

      (6) GSEA with 500 randomly selected genes from target datasets needs a more detailed description to clarify the method.

      To improve clarity, we added the following description: “Gene set enrichment analysis (GSEA) was done with 500 randomly selected genes from the given set of genes across the C2 collection of the human molecular signatures database or custom signatures using the GSEA function in clusterProfiler package in R (v4.6.2).”

      (7) In the IP-MS description, please double check the NaCl concentration in the second extraction step - 0.5mM seems low. Also, in the IP part, a buffer recipe appears to have been incorrectly pasted.

      We thank the reviewer for identifying this typo. Indeed, we used 0.5M NaCl instead of 0.5mM. We have corrected the co-IP buffer recipe accordingly.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This important study represents a comprehensive computational analysis of Plasmodium falciparum gene expression, with a focus on var gene expression, in parasites isolated from patients; it assesses changes that occur as the parasites adapt to short-term in vitro culture conditions. The work provides technical advances to update a previously developed computational pipeline. Although the findings of the shifts in the expression of particular var genes have theoretical or practical implications beyond a single subfield, the results are incomplete and the main claims are only partially supported.

      The authors would like to thank the reviewers and editors for their insightful and constructive assessment. We particularly appreciate the statement that our work provides a technical advance of our computational pipeline given that this was one of our main aims. To address the editorial criticisms, we have rephrased and restructured the manuscript to ensure clarity of results and to support our main claims. For the same reason, we removed the var transcript differential expression analysis, as this led to confusion.

      Public Reviews:

      Reviewer #1:

      The authors took advantage of a large dataset of transcriptomic information obtained from parasites recovered from 35 patients. In addition, parasites from 13 of these patients were reared for 1 generation in vitro, 10 for 2 generations, and 1 for a third generation. This provided the authors with a remarkable resource for monitoring how parasites initially adapt to the environmental change of being grown in culture. They focused initially on var gene expression due to the importance of this gene family for parasite virulence, then subsequently assessed changes in the entire transcriptome. Their goal was, first, to develop a more accurate and informative computational pipeline for assessing var gene expression and, second, to document the adaptation process at the whole transcriptome level.

      Overall, the authors were largely successful in their aims. They provide convincing evidence that their new computational pipeline is better able to assemble var transcripts and assess the structure of the encoded PfEMP1s. They can also assess var gene switching as a tool for examining antigenic variation. They also documented potentially important changes in the overall transcriptome that will be important for researchers who employ ex vivo samples for assessing things like drug sensitivity profiles or metabolic states. These are likely to be important tools and insights for researchers working on field samples.

      One concern is that the abstract highlights "Unpredictable var gene switching..." and states that "Our results cast doubt on the validity of the common practice of using short-term cultured parasites...". This seems somewhat overly pessimistic with regard to var gene expression profiling and does not reflect the data described in the paper. In contrast, the main text of the paper repeatedly refers to "modest changes in var gene expression repertoire upon culture" or "relatively small changes in var expression from ex vivo to culture", and many additional similar assessments. On balance, it seems that transition to culture conditions causes relatively minor changes in var gene expression, at least in the initial generations. The authors do highlight that a few individuals in their analysis showed more pronounced and unpredictable changes, which certainly warrants caution for future studies but should not obscure the interesting observation that var gene expression remained relatively stable during transition to culture.

      Thank you for this comment. We were happy to modify the wording in the abstract to make it consistent with the results presented, highlighting that modest but unpredictable var gene switching was observed while substantial changes were found in the core transcriptome. Moreover, any differences observed in the core transcriptome between ex vivo samples from naïve and pre-exposed patients are diminished after one cycle of cultivation, making inferences about parasite biology in vivo impossible.

      Therefore, in our opinion, the statement in the last sentence is well supported by the data presented.

      Line 43–47: “Modest but unpredictable var gene switching and convergence towards var2csa were observed in culture, along with differential expression of 19% of the core transcriptome between paired ex vivo and generation 1 samples. Our results cast doubt on the validity of the common practice of using short-term cultured parasites to make inferences about in vivo phenotype and behaviour.”

      Nevertheless, we would like to note that this study was in a unique position to assess changes at the individual patient level as we had successive parasite generations. This comparison is not done in most cross-sectional studies and therefore these small, unpredictable changes in the var transcriptome are missed.

      Reviewer #2:

      In this study, the authors describe a pipeline to sequence expressed var genes from RNA sequencing that improves on a previous one that they had developed. Importantly, they use this approach to determine how var gene expression changes with short-term culture. Their finding of shifts in the expression of particular var genes is compelling and casts some doubt on the comparability of gene expression in short-term culture versus var expression at the time of participant sampling. The authors appear to overstate the novelty of their pipeline, which should be better situated within the context of existing pipelines described in the literature.

      Other studies have relied on short-term culture to understand var gene expression in clinical malaria studies. This study indicates the need for caution in over-interpreting findings from these studies.

      The novel method of var gene assembly described by the authors needs to be appropriately situated within the context of previous studies. They neglect to mention several recent studies that present transcript-level novel assembly of var genes from clinical samples. It is important for them to situate their work within this context and compare and contrast it accordingly. A table comparing all existing methods in terms of pros and cons would be helpful to evaluate their method.

      We are grateful for this suggestion and agree that a table comparing the pros and cons of all existing methods would be helpful for the general reader and also highlight the key advantages of our new approach. A table comparing previous methods for var gene and transcript characterisation has been added to the manuscript and is referenced in the introduction (line 107).

      Author response table 1.

      Comparison of previous var assembly approaches based on DNA- and RNA-sequencing.

      Reviewer #3:

      This work focuses on the important problem of how to access the highly polymorphic var gene family using short-read sequence data. The approach that was most successful, and utilized for all subsequent analyses, employed a different assembler from their prior pipeline, and impressively, more than doubles the N50 metric.

      The authors then endeavor to utilize these improved assemblies to assess differential RNA expression of ex vivo and short-term cultured samples, and conclude that their results "cast doubt on the validity" of using short-term cultured parasites to infer in vivo characteristics. Readers should be aware that the various approaches to assess differential expression lack statistical clarity and appear to be contradictory. Unfortunately, there is no attempt to describe the rationale for the different approaches and how they might inform one another.

      It is unclear whether adjusting for life-cycle stage as reported is appropriate for the var-only expression models. The methods do not appear to describe what type of correction variable (continuous/categorical) was used in each model, and there is no discussion of the impact on var vs. core transcriptome results.

      We agree with the reviewer that the different methods and results of the var transcriptome analysis can be difficult to reconcile. To address this, we have included a summary table with a brief description of the rationale and results of each approach in our analysis pipeline.

      Author response table 2.

      Summary of the different levels of analysis performed to assess the effect of short-term parasite culturing on var and core gene expression: their rationale, method, results, and interpretation.

      Additionally, the var transcript differential expression analysis was removed from the manuscript, because this study was in a unique position to perform a more focused analysis of var transcriptional changes across paired samples, meaning the per-patient approach was more suitable. This allowed for changes in the var transcriptome to be identified that would have gone unnoticed in the traditional differential expression analysis.

      We thank the reviewer for this highly important comment about adjusting for life cycle stage. Var gene expression is highly stage-dependent, so any quantitative comparison between samples needs adjustment for developmental stage. To be consistent with the original paper, all life cycle stage adjustments were done using the mixture model proportions, as described in the results and methods sections:

      • Line 219–221: “Due to the potential confounding effect of differences in stage distribution on gene expression, we adjusted for developmental stage determined by the mixture model in all subsequent analyses.”

      • Line 722–725: “Var gene expression is highly stage dependent, so any quantitative comparison between samples needs adjustment for developmental stage. The life cycle stage proportions determined from the mixture model approach were used for adjustment.”

      The rank-expression analysis was not adjusted for life cycle stage, as the values were determined as a percentage contribution to the total var transcriptome. The var group level and the global var gene expression analyses were adjusted for life cycle stage by including the stage proportions as independent variables, as described in the results and methods sections.
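      To illustrate the percentage-contribution step, here is a minimal Python sketch with hypothetical transcript IDs and counts. Each value is a share of the sample's total var expression, so any factor that scales every var transcript in a sample by the same amount cancels out. This is why the ranking needs no stage adjustment, to the extent that stage affects all var transcripts alike.

```python
def var_rank_profile(counts):
    """Return (transcript, percent of total var expression, rank) tuples.

    counts maps hypothetical var transcript IDs to normalised read counts
    for one sample; rank 1 is the most highly expressed transcript.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no var expression")
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(tid, 100.0 * c / total, rank)
            for rank, (tid, c) in enumerate(ordered, start=1)]

profile = var_rank_profile({"var_A": 600.0, "var_B": 300.0, "var_C": 100.0})
# [('var_A', 60.0, 1), ('var_B', 30.0, 2), ('var_C', 10.0, 3)]
```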

      Var group expression:

      • Line 321–326: “Due to these results, the expression of group A var genes vs. group B and C var genes was investigated using a paired analysis on all the DBLα (DBLα1 vs DBLα0 and DBLα2) and NTS (NTSA vs NTSB) sequences assembled from ex vivo samples and across multiple generations in culture. A linear model was created with group A expression as the response variable, the generation and life cycle stage as independent variables and the patient information included as a random effect. The same was performed using group B and C expression levels.”

      • Line 784–787: “DESeq2 normalisation was performed, with patient identity and life cycle stage proportions included as covariates and differences in the amounts of var transcripts of group A compared with groups B and C assessed (Love et al., 2014). A similar approach was repeated for NTS domains.”
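      DESeq2's default normalisation step is the median-of-ratios size-factor estimate. The covariates (patient identity, life cycle stage) enter the model design afterwards and are not part of this step. The Python sketch below reproduces only the size-factor calculation on a hypothetical count matrix; it is not the authors' code, and the downstream negative binomial model is omitted.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors (the DESeq2 default normalisation).

    counts: list of rows (features) x columns (samples) of raw read counts.
    Features with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(counts[0])
    log_rows = [[math.log(c) for c in row] for row in counts
                if all(c > 0 for c in row)]
    if not log_rows:
        raise ValueError("no feature is non-zero in every sample")
    factors = []
    for j in range(n_samples):
        # log of (count in sample j / geometric mean of the feature across samples)
        ratios = [r[j] - sum(r) / n_samples for r in log_rows]
        factors.append(math.exp(median(ratios)))
    return factors
```

      Dividing each sample's counts by its size factor puts samples sequenced to different depths on a common scale before the covariate-adjusted comparison.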

      Global var gene expression:

      • Line 342–347: “A linear model was created (using only paired samples from ex vivo and generation 1) (Supplementary file 1) with proportion of total gene expression dedicated to var gene expression as the response variable, the generation and life cycle stage as independent variables and the patient information included as a random effect. This model showed no significant differences between generations, suggesting that differences observed in the raw data may be a consequence of small changes in developmental stage distribution in culture.”

      • Line 804–806: “Significant differences in total var gene expression were tested by constructing a linear model with the proportion of gene expression dedicated to var gene expression as the response variable, the generation and life cycle stage as independent variables and the patient identity included as a random effect.”

      The analysis of the conserved var gene expression was adjusted for life cycle stage:

      • Line 766–768: “For each conserved gene, Salmon normalised read counts (adjusted for life cycle stage) were summed and expression compared across the generations using a pairwise Wilcoxon rank test.”
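      For a single pair of generations, the paired Wilcoxon comparison can be sketched in Python as a signed-rank test with an exact null distribution. This is a minimal illustration under stated assumptions, not the authors' R code. Zero differences are dropped, the exact p-value is only valid when there are no tied absolute differences, and any multiple-testing correction across generation pairs is not shown.

```python
def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test of y versus x.

    Returns (W+, two-sided exact p-value). Zero differences are dropped;
    the exact null distribution assumes no tied absolute differences.
    """
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under H0 each rank is positive with probability 1/2, so count the
    # subsets of {1..n} that reach each possible rank sum.
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    w = min(w_plus, max_sum - w_plus)
    tail = sum(c for s, c in enumerate(counts) if s <= w)
    return w_plus, min(1.0, 2 * tail / 2 ** n)
```

      With five pairs that all change in the same direction, the smallest attainable two-sided p-value is 2/32 = 0.0625. Small paired designs like this one therefore have limited power.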

      And life cycle stage estimates were included as covariates in the design matrix for the domain differential expression analysis:

      • Line 771–773: “DESeq2 was used to test for differential domain expression, with five expected read counts in at least three patient isolates required, with life cycle stage and patient identity used as covariates.”
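      The count filter that precedes the domain analysis can be sketched in Python. Here "five expected read counts in at least three patient isolates" is read as at least 5 reads in at least 3 isolates; that reading is an assumption, and the domain names and counts below are hypothetical.

```python
def filter_domains(counts, min_count=5, min_samples=3):
    """Keep domains with >= min_count reads in >= min_samples isolates.

    counts maps a (hypothetical) domain name to its per-isolate read counts.
    """
    return {dom: row for dom, row in counts.items()
            if sum(c >= min_count for c in row) >= min_samples}

kept = filter_domains({"DBLa1": [5, 6, 7, 0], "CIDRa1": [10, 0, 0, 4]})
# only "DBLa1" passes: it has >= 5 reads in three isolates
```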

      Reviewer #1:

      1. In the legend to Figure 1, the authors cite "Deitsch and Hviid, 2004" for the classification of different var gene types. This is not the best reference for this work. Better citations would be Kraemer and Smith, Mol Micro, 2003 and Lavstsen et al, Malaria J, 2003.

      We agree and have updated the legend in Figure 1 with these references, consistent with the references cited in the introduction.

      2. In Figures 2 and 3, each of the boxes in the flow charts are largely filled with empty space while the text is nearly too small to read. Adjusting the size of the text would improve legibility.

      We have increased the size of the text in these figures.

      3. My understanding of the computational method for assessing global var gene expression indicates an initial step of identifying reads containing the amino acid sequence LARSFADIG. It is worth noting that VAR2CSA does not contain this motif. Will the pipeline therefore miss expression of this gene, and if so, how does this affect the assessment of global var gene expression? This seems relevant given that the authors detect increased expression of var2csa during adaptation to culture.

      To address this question, we have added an explanation in the methods section to better explain our analysis. Var2csa was not captured in the global var gene expression analysis, but was analyzed separately because of its unique properties (conservation, proposed role in regulating var gene switching, slightly divergent timing of expression, translational repression).

      • Line 802/3: “Var2csa does not contain the LARSFADIG motif, hence this quantitative analysis of global var gene expression excluded var2csa (which was analysed separately).”
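      Motif-based read screening can be sketched as a six-frame translation followed by a substring search. This minimal Python sketch assumes exact matches to LARSFADIG in any reading frame. The real pipeline's matching rules (e.g. tolerance to mismatches) are not described here, and all function names are hypothetical. The sketch also shows why a gene lacking the motif, such as var2csa, would fall outside this screen.

```python
# Standard genetic code; codons are enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def translate(seq):
    # Codons containing N or other ambiguity codes become 'X'.
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frame_translations(read):
    read = read.upper()
    rev = read.translate(COMPLEMENT)[::-1]  # reverse complement
    return [translate(s[f:]) for s in (read, rev) for f in range(3)]

def has_motif(read, motif="LARSFADIG"):
    """True if the motif occurs in any of the six translated frames."""
    return any(motif in prot for prot in six_frame_translations(read))
```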

      4. In Figures 4 and 7, panels a and b display virtually identical PCA plots, with the exception that panel a displays more generations. Why are both panels included? There doesn't appear to be any additional information provided by panel b.

      We agree and have removed Figure 7b for the core transcriptome PCA as it did not provide any new information. The var transcript differential analysis (displayed in Figure 4) has been removed from the manuscript.

      5. On lines 560–567, the authors state "However, the impact of short-term culture was the most apparent at the var transcript level and became less clear at higher levels." What are the higher levels being referred to here?

      We have replaced this sentence to make it clearer what the different levels are (global var gene expression, var domain and var type).

      • Line 526/7: “However, the impact of short-term culture was the most apparent at the var transcript level and became less clear at the var domain, var type and global var gene expression level.”

      Reviewer #2:

      The authors make no mention or assessment of previously published var gene assembly methods from clinical samples that focus on genomic or transcriptomic approaches. These include:

      https://pubmed.ncbi.nlm.nih.gov/28351419/

      https://pubmed.ncbi.nlm.nih.gov/34846163/

      These methods should be compared to the method for var gene assembly outlined by the co-authors, especially as the authors say that their method "overcomes previous limitations and outperforms current methods" (128-129). The second reference above appears to be a method to measure var expression in clinical samples and so should be particularly compared to the approach outlined by the authors.

      Thank you for pointing this out. We have included the second reference in the introduction of our revised manuscript, where we refer to var assembly and quantification from RNA-sequencing data. We abstained from including the first paper in this paragraph (Dara et al., 2017) as it describes a var gene assembly pipeline and not a var transcript assembly pipeline.

      • Line 101–105: “While approaches for var assembly and quantification based on RNA-sequencing have recently been proposed (Wichers et al., 2021; Stucke et al., 2021; Andrade et al., 2020; Tonkin-Hill et al., 2018; Duffy et al., 2016), these still produce inadequate assembly of the biologically important N-terminal domain region, have a relatively high number of misassemblies and do not provide an adequate solution for handling the conserved var variants (Table S1).”

      Additionally, we have updated the manuscript with a table (Table S1) comparing these two methods plus other previously used var transcript/gene assembly approaches (see comment to the public reviews).

      To address this particular comment in more detail: the first paper (Dara et al., 2017) describes a var gene assembly pipeline, not a var transcript assembly pipeline. It is based on assembling var exon 1 from unfinished whole genome assemblies of clinical samples and requires a prior step to filter out human DNA. The authors used two different assemblers, Celera for short reads (which is no longer maintained) and Sprai for long reads (>2000 bp), found that Celera performed worse than Sprai, and subsequently used the Sprai assemblies. Therefore, this method does not appear to be suitable for assembling short reads from RNA-seq.

      The second paper (Stucke et al., 2021) focuses more on enriching for parasite RNA, which precedes assembly. The capture method they describe would complement downstream var transcript assembly with our pipeline. Their assembly pipeline is similar to ours: they also performed de novo assembly on all P. falciparum-mapping and non-human-mapping reads and used the same assembler (but with different parameters). They clustered sequences using the same approach, but at 90% sequence identity rather than the 99% used in our approach. Stucke et al. then applied a 500 nt length cut-off, whereas our approach uses more stringent filtering. They annotated their de novo assembled transcripts with the known amino acid sequences used in the design of their capture array; our approach does not assume prior information on the var transcripts. Finally, their approach was validated only for its ability to recover the most highly expressed var transcript in 6 uncomplicated malaria samples, and they did not assess misassemblies.

      For the methods (619–621), were erythrocytes isolated by Ficoll gradient centrifugation at the time of collection or later?

      We have updated the methods section to clarify this.

      • Line 586–588: “Blood was drawn and either immediately processed (#1, #2, #3, #4, #11, #12, #14, #17, #21, #23, #28, #29, #30, #31, #32) or stored overnight at 4°C until processing (#5, #6, #7, #9, #10, #13, #15, #16, #18, #19, #20, #22, #24, #25, #26, #27, #33).”

      Was the current pipeline and assembly method assessed for var chimeras? This should be described.

      Yes, this was quantified in the Pf 3D7 dataset and also assessed in the German traveler dataset. For the 3D7 dataset, it is described in the results section and Figure S1.

      • Line 168–174: “However, we found high accuracies (> 0.95) across all approaches, meaning the sequences we assembled were correct (Figure 2 – Figure supplement 1b). The whole transcript approach also performed the best when assembling the lower expressed var genes (Figure 2 – Figure supplement 1e) and produced the fewest var chimeras compared to the original approach on P. falciparum 3D7. Fourteen misassemblies were observed with the whole transcript approach compared to 19 with the original approach (Table S2). This reduction in misassemblies was particularly apparent in the ring-stage samples.” - Figure S1:

      Author response image 1.

      Performance of novel computational pipelines for var assembly on Plasmodium falciparum 3D7: The three approaches (whole transcript: blue, domain approach: orange, original approach: green) were applied to a public RNA-seq dataset (ENA: PRJEB31535) of the intra-erythrocytic life cycle stages of 3 biological replicates of cultured P. falciparum 3D7, sampled at 8-hour intervals up until 40 hours post infection (hpi) and then at 4-hour intervals up until 48 hpi (Wichers et al., 2019). Boxplots show the data from the 3 biological replicates for each time point in the intra-erythrocytic life cycle: a) alignment scores for the dominantly expressed var gene (PF3D7_0712600), b) accuracy scores for the dominantly expressed var gene (PF3D7_0712600), c) number of contigs needed to assemble the dominant var gene (PF3D7_0712600), d) alignment scores for a middle-ranking expressed var gene (PF3D7_0937800), e) alignment scores for the lowest expressed var gene (PF3D7_0200100). The best BLAST hit (significance threshold = 1e-10) was chosen for each contig. The alignment score, defined as √(accuracy × recovery), was used to evaluate each method. The accuracy is the proportion of bases that are correct in the assembled transcript, and the recovery is the proportion of the true transcript that was assembled. Assembly completeness of the dominant var gene (PF3D7_0712600, length = 6648 nt) for the three approaches was assessed for each biological replicate: f) biological replicate 1, g) biological replicate 2, h) biological replicate 3. Dotted lines represent the start and end of the contigs required to assemble the var gene. Red bars represent assembled sequences relative to the whole sequence of the dominant var gene, whose true sequence is known (termed “reference transcript”).
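      The alignment score in the caption is the geometric mean of accuracy and recovery, so an assembly that is accurate but fragmentary scores low, and so does one that is complete but error-prone. A minimal Python sketch follows. The argument names are hypothetical, and the base counts would in practice come from the BLAST alignment of each contig to the reference transcript.

```python
import math

def alignment_score(correct_bases, assembled_length,
                    covered_reference_bases, reference_length):
    """sqrt(accuracy * recovery) for one assembled transcript.

    accuracy: fraction of assembled bases that are correct.
    recovery: fraction of the reference transcript covered by the assembly.
    """
    if assembled_length <= 0 or reference_length <= 0:
        raise ValueError("lengths must be positive")
    accuracy = correct_bases / assembled_length
    recovery = covered_reference_bases / reference_length
    return math.sqrt(accuracy * recovery)

# Hypothetical example: 95% accurate, half of the 6648 nt dominant var gene recovered.
score = alignment_score(950, 1000, 3324, 6648)  # about 0.689
```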

      For the ex vivo samples, this has been discussed in the results section, and we have now also added this information to Table 1.

      • Line 182/3: “Remarkably, with the new whole transcript method, we observed a significant decrease (2 vs 336) in clearly misassembled transcripts with, for example, an N-terminal domain at an internal position.”

      • Table 1:

      Author response table 3.

      Statistics for the different approaches used to assemble the var transcripts. Var assembly approaches were applied to malaria patient ex vivo samples (n=32) from (Wichers et al., 2021) and statistics determined. Shown are the total number of assembled var transcripts longer than 500 nt containing at least one significantly annotated var domain, the length of the longest assembled var transcript in nucleotides, and the N50 value. The N50 is defined as the sequence length of the shortest var contig such that all var contigs greater than or equal to this length together account for 50% of the total length of concatenated var transcript assemblies. Misassemblies represent the number of misassemblies for each approach. **Number of misassemblies was not determined for the domain approach due to its poor performance in other metrics.
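      The N50 definition above translates directly into a short calculation: sort contig lengths in descending order and return the first length at which the running total reaches half of the summed length. A minimal Python sketch follows; the contig lengths in the usage line are illustrative, not data from this study.

```python
def n50(lengths):
    """N50 of a collection of contig lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # cumulative length has reached 50% of the total
            return length
    raise ValueError("no contigs given")

example = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])  # 10 + 9 + 8 = 27 = half of 54 -> 8
```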

      Line 432: "the core gene transcriptome underwent a greater change relative to the var transcriptome upon transition to culture." Can this be shown statistically? It's unclear whether the difference in the sizes of the respective pools of the core genome and the var genes may account for this observation.

      We found 19% of the core transcriptome to be differentially expressed. The per patient var transcript analysis revealed individually highly variable but generally rather subtle changes in the var transcriptome. The different methods for assessing this make it difficult to statistically compare these two different results.

      The feasibility of this approach for field samples should be discussed in the Discussion.

      In the original manuscript we reflected on this already several times in the discussion (e.g., line 465/6; line 471–475; line 555–568). We now have added another two sentences at the end of the paragraph starting in line 449 to address this point. It reads now:

      • Line 442–451: “Our new approach used the most geographically diverse reference of var gene sequences to date, which improved the identification of reads derived from var transcripts. This is crucial when analysing patient samples with low parasitaemia, where var transcripts are hard to assemble due to their low abundance (Guillochon et al., 2022). Our approach has wide utility due to stable performance on both laboratory-adapted and clinical samples. Concordance between the different var expression profiling approaches (RNA-sequencing and DBLα-tag) on ex vivo samples increased by 13 percentage points using the new approach, compared with the original approach (96% in the whole transcript approach compared to 83% in Wichers et al., 2021). This suggests the new approach provides a more accurate method for characterising var genes, especially in samples collected directly from patients. Ultimately, this will allow a deeper understanding of relationships between var gene expression and clinical manifestations of malaria.”

      MINOR

      The plural form of PfEMP1 (PfEMP1s) is inconsistently used throughout the text.

      Corrected.

      404-405: statistical test for significance?

      Thank you for this suggestion. We have made two comparisons between the original analysis from Wichers et al., 2021 and our new whole transcript approach, testing the concordance of each RNA-seq approach with the DBLα-tag approach using paired Wilcoxon tests. These comparisons show that our new approach has significantly increased concordance with the DBLα-tag data. They also suggest that it may be better at capturing all expressed DBLα domains than the original analysis (and the DBLα-tag approach), although this second difference was not statistically significant. We now describe this in the results section.

      • Line 352–361: “Overall, we found a high agreement between the detected DBLα-tag sequences and the de novo assembled var transcripts. A median of 96% (IQR: 93–100%) of all unique DBLα-tag sequences detected with >10 reads were found in the RNA-sequencing approach. This is a significant improvement on the original approach (p= 0.0077, paired Wilcoxon test), in which a median of 83% (IQR: 79–96%) was found (Wichers et al., 2021). To allow for a fair comparison of the >10 reads threshold used in the DBLα-tag approach, the upper 75th percentile of the RNA-sequencing-assembled DBLα domains were analysed. A median of 77.4% (IQR: 61–88%) of the upper 75th percentile of the assembled DBLα domains were found in the DBLα-tag approach. This is a lower median percentage than the median of 81.3% (IQR: 73–98%) found in the original analysis (p= 0.28, paired Wilcoxon test) and suggests the new assembly approach is better at capturing all expressed DBLα domains.”
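      The per-sample concordance figure can be sketched as the percentage of unique DBLα-tag sequences with more than 10 reads that are also present among the RNA-seq-assembled DBLα domains. This minimal Python sketch assumes exact sequence matching for simplicity. The actual matching criterion between tags and assembled domains is not given here and is likely similarity-based, and the sequence names are hypothetical. The per-sample percentages from the two pipelines would then be compared with a paired Wilcoxon test.

```python
def percent_tags_recovered(tag_reads, assembled, min_reads=10):
    """Percentage of DBLalpha-tag sequences (> min_reads reads) found in the assembly.

    tag_reads maps each unique tag sequence to its read count; assembled is an
    iterable of assembled DBLalpha sequences. Exact matching is an assumption.
    """
    query = {seq for seq, n in tag_reads.items() if n > min_reads}
    if not query:
        raise ValueError("no tag sequences pass the read threshold")
    return 100.0 * len(query & set(assembled)) / len(query)

pct = percent_tags_recovered({"tagA": 50, "tagB": 12, "tagC": 3, "tagD": 40},
                             {"tagA", "tagD", "domX"})  # 2 of 3 tags -> ~66.7%
```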

      Figure 4: The letters for the figure panels need to be added.

      The figure has been removed from the manuscript.

      Reviewer #3:

      It is difficult from Table S2 to determine how many unique var transcripts would have enough coverage to be potentially assembled from each sample. It seems unlikely that 455 distinct vars (~14 per sample) would be expressed at a detectable level for assembly. Why not DNA-sequence these samples to get the full repertoire for comparison to RNA? Why would so many distinct transcripts be yielded from fairly synchronous samples?

      We know from controlled human malaria infections of malaria-naive volunteers that most var genes present in the genomic repertoire of the parasite strain are expressed at the onset of the human blood phase (heterogeneous var gene expression) (Wang et al., 2009; Bachmann et al., 2016; Wichers-Misterek et al., 2023). This pattern shifts to a more restricted, homogeneous var expression pattern in semi-immune individuals (expression of few variants), depending on the degree of immunity (Bachmann et al., 2019).

      Author response image 2.

      This cohort includes 15 first-time infections, which should also show more heterogeneous var gene expression than the pre-exposed individuals. Indeed, such a trend is already seen in the number of different DBLα-tag clusters found in the two patient groups (see figure panel from Wichers et al. 2021: blue, first-time infections; grey, pre-exposed). Moreover, Warimwe et al. 2013 have shown that asymptomatic infections have more homogeneous var expression than symptomatic infections. Therefore, we expect parasites from symptomatic infections to have a heterogeneous var expression pattern with multiple var gene variants expressed. Owing to our high read depth and improved var assembly pipeline, we could assemble these variants, even those with low expression.

      Moreover, the distinct transcripts found in the RNA-seq approach were confirmed with the DBLα-tag data. In our opinion, previous approaches may have underestimated the complexity of the var transcriptome in less immune individuals.

      Mapping reads to these 455 putative transcripts and using this count matrix for differential expression analysis seems very unlikely to produce reliable results. As acknowledged on line 327, many reads will be mis-mapped, and perhaps most challenging is that most vars will not be represented in most samples. In other words, even if mapping were somehow perfect, one would expect a sparse matrix that would not be suitable for statistical comparisons between groups. This is likely why the per-patient transcript analysis doesn't appear to be consistent. I would recommend the authors remove the DE sections utilizing this approach, or add convincing evidence that the count matrix is useable.

      We agree that this is a general issue of var differential expression analysis. Therefore, we have removed the var differential expression analysis from this manuscript as the per patient approach was more appropriate for the paired samples. We validated different mapping strategies (new Figure S6) and included a paragraph discussing the problem in the result section:

      • Line 237–255: “In the original approach of Wichers et al., 2021, the non-core reads of each sample used for var assembly were mapped against a pooled reference of assembled var transcripts from all samples, as a preliminary step towards differential var transcript expression analysis. This approach returned a small number of var transcripts which were expressed across multiple patient samples (Figure 3 – Figure supplement 2a). As genome sequencing was not available, it was not possible to know whether there was truly overlap in var genomic repertoires of the different patient samples, but substantial overlap was not expected. Stricter mapping approaches (for example, excluding transcripts shorter than 1500nt) changed the resulting var expression profiles and produced more realistic scenarios where similar var expression profiles were generated across paired samples, whilst there was decreasing overlap across different patient samples (Figure 3 – Figure supplement 2b,c). Given this limitation, we used the paired samples to analyse var gene expression at an individual subject level, where we confirmed the MSP1 genotypes and alleles were still present after short-term in vitro cultivation. The per patient approach showed consistent expression of var transcripts within samples from each patient but no overlap of var expression profiles across different patients (Figure 3 – Figure supplement 2d). Taken together, the per patient approach was better suited for assessing var transcriptional changes in longitudinal samples. It has been hypothesised that more conserved var genes in field isolates increase parasite fitness during chronic infections, making it necessary to identify them correctly (Dimonte et al., 2020, Otto et al., 2019). Accordingly, further work is needed to optimise the pooled sample approach to identify truly conserved var transcripts across different parasite isolates in cross-sectional studies.” - Figure S6:

      Author response image 3.

      Var expression profiles across different mapping approaches. Different mapping approaches were used to quantify the var expression profiles of each sample (ex vivo (n=13), generation 1 (n=13), generation 2 (n=10) and generation 3 (n=1)). In the pooled sample approach (a–c), all significantly assembled var transcripts (at least 1500 nt and containing at least 3 significantly annotated var domains) across samples were combined into a reference, and redundancy was removed using cd-hit (at sequence identity = 99%). The non-core reads of each sample were mapped to this pooled reference using a) Salmon, b) bowtie2 filtering for uniquely mapping paired reads with MAPQ and c) bowtie2 filtering for uniquely mapping paired reads with a MAPQ > 20. d) The per patient approach was applied. For each patient, the paired ex vivo and in vitro samples were analysed. The assembled var transcripts (at least 1500 nt and containing at least 3 significantly annotated var domains) across all the generations for a patient were combined into a reference, redundancy was removed using cd-hit (at sequence identity = 99%), and expression was quantified using Salmon. Pie charts show the var expression profile, with the relative size of each slice representing the relative percentage of total var gene expression of each var transcript. Different colours represent different assembled var transcripts, with the same colour code used across a–d.
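      The reference-building step of the per patient approach can be outlined in Python. The sketch pools one patient's assembled transcripts from all generations, keeps those of at least 1500 nt with at least 3 annotated var domains (thresholds as stated in the caption), and collapses redundancy. One simplification is loudly flagged: cd-hit's 99% identity clustering is replaced here by exact-duplicate removal, so this is an outline rather than a substitute. Sequences and labels are hypothetical.

```python
def per_patient_reference(transcripts, min_length=1500, min_domains=3):
    """Build a non-redundant per-patient var reference.

    transcripts: iterable of (generation, sequence, n_annotated_domains).
    Simplification: exact duplicates are collapsed, standing in for cd-hit
    clustering at 99% identity, which this sketch does not reproduce.
    """
    kept = {}
    for generation, seq, n_domains in transcripts:
        if len(seq) >= min_length and n_domains >= min_domains:
            kept.setdefault(seq, generation)  # remember the first generation seen
    return list(kept)

reference = per_patient_reference([
    ("ex vivo", "A" * 1600, 3),
    ("gen1", "A" * 1600, 4),   # duplicate of the ex vivo transcript
    ("gen1", "C" * 1400, 5),   # too short
    ("gen2", "G" * 2000, 2),   # too few annotated domains
    ("gen2", "T" * 1500, 3),
])
```

      Per-sample reads would then be quantified against this reference, as Salmon does in panel d.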

      For future cross-sectional studies, a per patient analysis that attempts to group per patient assemblies by some unifying structure (e.g., domains, homology blocks, domain cassettes) should be performed.
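The selection thresholds in the legend above (reference transcripts of at least 1500 nt with at least 3 significantly annotated var domains; for panel c, uniquely mapping read pairs with MAPQ > 20) and the per patient pooling can be sketched as follows. This is a minimal illustration only: the record types, field names and the grouping helper are hypothetical, the actual pipeline used bowtie2, Salmon and cd-hit, and redundancy removal with cd-hit (99% identity) is not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AssembledTranscript:
    # Hypothetical record for one assembled var transcript.
    name: str
    length_nt: int
    n_significant_domains: int  # significantly annotated var domains


@dataclass(frozen=True)
class PairAlignment:
    # Hypothetical summary of one aligned read pair.
    mapq: int
    paired: bool  # both mates aligned
    unique: bool  # no equally good alternative alignment


def keep_transcript(t, min_len=1500, min_domains=3):
    """Reference criterion: at least 1500 nt and at least 3 significant domains."""
    return t.length_nt >= min_len and t.n_significant_domains >= min_domains


def keep_pair(a, mapq_threshold=20):
    """Read criterion for panel c: uniquely mapping paired reads with MAPQ > 20."""
    return a.unique and a.paired and a.mapq > mapq_threshold


def per_patient_references(transcripts_by_sample, patient_of_sample):
    """Pool filtered transcripts from all generations of a patient into one
    reference per patient (redundancy removal, e.g. cd-hit, would follow)."""
    refs = defaultdict(list)
    for sample, transcripts in transcripts_by_sample.items():
        patient = patient_of_sample[sample]
        refs[patient].extend(t for t in transcripts if keep_transcript(t))
    return dict(refs)
```

Building one reference per patient, rather than one pooled reference, is what prevents reads from one patient being assigned to similar transcripts assembled from another patient.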

      Line 304. I don't understand the rationale for comparing naïve vs. prior-exposed individuals at ex-vivo and gen 1 timepoints to provide insights into how reliable cultured parasites are as a surrogate for var expression in vivo. Further, the next section (per patient) appears to confirm the significant limitation of the 'all sample analysis' approach. The conclusion on line 319 is not supported by the results reported in figures S9a and S9b, nor is the bold conclusion in the abstract about "casting doubt" on experiments utilizing culture adapted

      We have removed this comparison from the manuscript because of its inconsistencies with the var per patient approach. In addition, the conclusion in the abstract has been rephrased to reflect the fact that 19% of the core transcripts were differentially expressed within one cycle of cultivation.

      Line 372/391 (and for the other LMM descriptions). I believe you mean to say response variable, rather than explanatory variable. Explanatory variables are on the right hand side of the equation.

      Thank you for spotting this inaccuracy, we changed it to “response variable” (line 324, line 343, line 805).
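For reference, the generic matrix form of a linear mixed model makes the distinction explicit: the response variable sits on the left-hand side, and the explanatory (fixed-effect) variables enter through the design matrix on the right. The symbols below are standard textbook notation, not the specific model fitted in the manuscript.

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \boldsymbol{\varepsilon},
\qquad \mathbf{u} \sim \mathcal{N}(\mathbf{0}, \mathbf{G}),
\qquad \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^{2}\mathbf{I})
```

Here y is the response variable (the quantity being modelled), X holds the explanatory variables with fixed-effect coefficients β, and Z maps the random effects u (for example, a per-patient intercept, assumed here purely for illustration) onto the observations.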

      Line 467. Similar to line 304, why would comparisons of naïve vs. prior-exposed be informative about surrogates for in vivo studies? Without a gold-standard for what should be differentially expressed between naïve and prior-exposed in vivo, it doesn't seem prudent to interpret a drop in the number of DE genes for this comparison in generation 1 as evidence that biological signal for this comparison is lost. What if the generation 1 result is actually more reflective of the true difference in vivo, but the ex vivo samples are just noisy? How do we know? Why not just compare ex vivo vs generation 1/2 directly (as done in the first DE analysis), and then you can comment on the large number of changes as samples are less and less proximal to in vivo?

      In the original paper (Wichers et al., 2021), there were differences between the core transcriptomes of naïve and previously exposed patients. However, these differences appeared to diminish in vitro, suggesting that the in vivo core transcriptome is not fully maintained in vitro.

      We have added a sentence explaining the reasoning behind this analysis in the results section:

      • Lines 414–423: “In the original analysis of ex vivo samples, hundreds of core genes were identified as significantly differentially expressed between pre-exposed and naïve malaria patients. We investigated whether these differences persisted after in vitro cultivation. We performed differential expression analysis comparing parasite isolates from naïve (n=6) vs pre-exposed (n=7) patients, first between their ex vivo samples, and then between the corresponding generation 1 samples. Interestingly, when using the ex vivo samples, we observed 206 core genes significantly upregulated in naïve patients compared to pre-exposed patients (Figure 7 – Figure supplement 3a). Conversely, we observed no differentially expressed genes in the naïve vs pre-exposed analysis of the paired generation 1 samples (Figure 7 – Figure supplement 3b). Taken together with the preceding findings, this suggests that one cycle of cultivation makes the core transcriptomes of parasites more similar to each other, diminishing inferences about parasite biology in vivo.”

      Overall, I found the many DE approaches very frustrating to interpret coherently. If not dropped in revision, the reader would benefit from a substantial effort to clarify the rationale for each approach, and how each result fits together with the other approaches and builds to a concise conclusion.

      We agree that the manuscript contains many complex layers of analysis and that it is therefore important to explain the rationale for each approach. We have therefore included the summary Table 3 (see our response to the public review). Additionally, we have removed the var transcript differential expression analysis because of its limitations, which we hope has streamlined the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The current manuscript provides strong evidence that the molecular function of SLC35G1, an orphan human SLC transporter, is citrate export at the basolateral membrane of intestinal epithelial cells. Multiple lines of evidence, including radioactive transport experiments, immunohistochemical staining, gene expression analysis, and siRNA knockdown are combined to deduce a model of the physiological role of this transporter.

      Strengths:

      The experimental approaches are comprehensive, and together establish a strong model for the role of SLC35G1 in citrate uptake. The observation that chloride inhibits uptake suggests an interesting mechanism that exploits the difference in chloride concentration across the basolateral membrane.

      Weaknesses:

      Some aspects of the results would benefit from a more thorough discussion of the conclusions and/or model.

      For example, the authors find that SLC35G1 prefers the dianionic (singly protonated) form of citrate, and rationalize this finding by comparison with the substrate selectivity of the citrate importer NaDC1. However, this comparison has weaknesses when considering the physiological pH for SLC35G1 and NaDC1. NaDC1 binds citrate at a pH of ~5.4 (the pKa of citrate is 5.4, so there is a lot of dianionic citrate present under physiological circumstances). SLC35G1 binds citrate under pH conditions of ~7.5, where a very small amount of dianionic citrate is present. The data clearly show a pH dependence of transport, and the authors rule out proton coupling, but the discrepancy between the pH dependence and the physiological expectations should be addressed/commented on.

      Thank you for your insightful comment. As you pointed out, citrate exists mostly in its trianionic form in biological fluids at near-neutral pH; given its pKa, the dianionic form represents only a small portion (about 1/100) of total citrate. However, significant SLC35G1-specific uptake was observed at near-neutral pH (Figure 1G). Therefore, although SLC35G1-mediated citrate transport is less efficient under physiologically relevant near-neutral pH conditions, it could still play a role, particularly in intestinal absorption, where the concentration gradient of dianionic citrate could be maintained by continuous supply through NaDC1-mediated apical uptake.
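The ~1/100 estimate follows from the Henderson–Hasselbalch relation. A minimal sketch, assuming that only the dianion (HCit2-) and trianion (Cit3-) need to be considered near neutral pH and treating the relevant pKa as an input: with the pKa of 5.4 quoted by Reviewer #1, about 0.8% of citrate is dianionic at pH 7.5, consistent with the ~1/100 figure. The result is sensitive to the pKa chosen; the third pKa of citric acid is commonly tabulated as about 6.4 (25 °C, dilute solution), which would give roughly 7%.

```python
def dianion_fraction(ph: float, pka: float) -> float:
    """Fraction of citrate present as the dianion HCit2-.

    Two-species approximation: only HCit2- and Cit3- are considered, so the
    more protonated forms (H2Cit-, H3Cit) are assumed to be negligible.
    Henderson-Hasselbalch: [Cit3-] / [HCit2-] = 10 ** (pH - pKa).
    """
    trianion_per_dianion = 10.0 ** (ph - pka)
    return 1.0 / (1.0 + trianion_per_dianion)


# Compare the pKa quoted in the review (5.4) with a commonly tabulated pKa3 (~6.4).
for pka in (5.4, 6.4):
    print(f"pKa {pka}: {dianion_fraction(7.5, pka):.1%} dianionic at pH 7.5")
```

At pH 5.4 the same function returns 0.5 for a pKa of 5.4, i.e. equal amounts of dianion and trianion, which is the situation the reviewer describes for NaDC1.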

      The rationale for the series of compounds tested in Figure 1F, which includes metabolites with carboxylate groups, a selection of drugs including anion channel inhibitors and statins, and bile acids, is not described. Moreover, the lessons drawn from this experiment are vague and should be expanded upon. It is not clear what, if anything, the compounds that reduce citrate uptake have in common.

      Thank you for highlighting the need for clarity regarding the compounds tested in Figure 1F. The tested compounds were TCA cycle intermediates (fumarate, α-ketoglutarate, malate, pyruvate, and succinate) as candidate carboxylate substrates analogous to citrate; diverse anionic compounds (BSP, DIDS, probenecid, pravastatin, and taurocholate) as potential substrates or inhibitors; and diverse cationic compounds (cimetidine, quinidine, and verapamil) as those least likely to interact with SLC35G1. Among them, certain anionic compounds significantly reduced SLC35G1-specific citrate uptake, suggesting that they may interact with SLC35G1. However, we could not identify any structural features shared by these compounds other than their anionic moieties. We acknowledge that further work is needed to clarify such structural features. We have revised the relevant section on p. 3 (lines 25–32) accordingly.

      The transporter is described as a facilitative transporter, but this is not established definitively. For example, another possibility could involve coupling citrate transport to another substrate, possibly even chloride ion.

      Thank you for your insightful comment regarding the nature of SLC35G1's transport mechanism. While we have described SLC35G1 as a facilitative transporter based on our current data, we acknowledge that this has not been definitively proven, as you pointed out, and we cannot exclude the possibility that its sensitivity to extracellular Cl- reflects operation as a citrate/Cl- exchanger. To examine this possibility, we would need to manipulate the chloride gradient across the plasma membrane; in particular, generating an outward Cl- gradient to see whether it enhances citrate uptake could be a useful strategy. However, current techniques do not allow us to generate such a Cl- gradient effectively, which prevents us from conclusively testing this possibility. We recognize the importance of investigating this aspect further in future studies. Your suggestion highlights an important area for additional research to fully understand the transport mechanism of SLC35G1. We have added a comment on this issue on p. 4 (lines 1–3).

      Reviewer #2 (Public Review):

      Summary:

      The primary goal of this study was to identify the transport pathway that is responsible for the release of dietary citrate from enterocytes into blood across the basolateral membrane.

      Strengths:

      The transport pathway responsible for the entry of dietary citrate into enterocytes was already known, but the transporter responsible for the second step remained unidentified. The studies presented in this manuscript identify SLC35G1 as the most likely transporter that mediates the release of absorbed citrate from intestinal cells into the serosal side. This fills an important gap in our current knowledge of the transcellular absorption of dietary citrate. The exclusive localization of the transporter in the basolateral membrane of human intestinal cells and the human intestinal cell line Caco-2 and the inhibition of the transporter function by chloride support this conclusion.

      Weaknesses:

      (i) The substrate specificity experiments have been done with relatively low concentrations of potential competing substrates, considering the relatively low affinity of the transporter for citrate. Given that NaDC1 brings in not only citrate as a divalent anion but also other divalent anions such as succinate, it is possible that SLC35G1 is responsible for the release of not only citrate but also other dicarboxylates. The substrate specificity studies show that the dicarboxylates tested did not compete with citrate, implying that SLC35G1 is selective for dianionic citrate, but this conclusion might be flawed because of the low concentration of the competing substrates used in the experiment.

      Thank you for your valuable comment on our substrate specificity experiments. As you pointed out, we cannot rule out the possibility that dicarboxylates are recognized by SLC35G1 with low affinity, as the tested concentration was relatively low. However, at a concentration of 200 μM, a competing substrate with an affinity comparable to that of citrate could inhibit SLC35G1-specific citrate uptake by about 30%. Therefore, the compounds that had no significant effect likely have no affinity, or at least lower affinity than citrate, for SLC35G1. Further studies should explore a broader range of concentrations of potential substrates, including those with lower affinity. This would help clarify the substrate recognition characteristics of SLC35G1 and whether it indeed prefers citrate over dicarboxylates. We have added this point on p. 3 (lines 32–35).
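The ~30% figure can be checked with the standard competitive-inhibition rate equation. This back-of-the-envelope sketch is not the authors' analysis and assumes (i) a citrate Km of about 0.5 mM, the value quoted by Reviewer #2; (ii) a tracer citrate concentration far below Km (1 μM is used here, as in the uptake assay of Author response image 2, but the Figure 1F condition is an assumption); and (iii) a competitor whose inhibition constant Ki equals the citrate Km.

```python
def fractional_inhibition(s_mM: float, km_mM: float, i_mM: float, ki_mM: float) -> float:
    """Fractional reduction in uptake caused by a competitive inhibitor.

    Michaelis-Menten rate with and without a competitive inhibitor:
        v  = Vmax * S / (Km * (1 + I / Ki) + S)
        v0 = Vmax * S / (Km + S)
    Vmax cancels in v / v0; the function returns 1 - v / v0.
    """
    v_ratio = (km_mM + s_mM) / (km_mM * (1.0 + i_mM / ki_mM) + s_mM)
    return 1.0 - v_ratio


# Assumed values: 1 uM tracer citrate, Km = Ki = 0.5 mM, 200 uM competitor.
inhibition = fractional_inhibition(s_mM=0.001, km_mM=0.5, i_mM=0.2, ki_mM=0.5)
print(f"{inhibition:.0%}")  # prints 29%
```

Under the same assumptions, raising the competitor to 2 mM would give about 80% inhibition for an equally good substrate, which illustrates why testing higher concentrations, as Reviewer #2 suggests, would make the selectivity test more sensitive to weaker substrates.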

      (ii) The authors have used MDCK cells for assessment of the transcellular transfer of citrate via SLC35G1, but it is not clear whether this cell line expresses NaDC1 in the apical membrane as the enterocytes do. Even though the authors expressed SLC35G1 ectopically in MDCK cells and showed that the transporter localizes to the basolateral membrane, the question as to how citrate actually enters the apical membrane for SLC35G1 in the other membrane to work remains unanswered.

      Thank you for highlighting this important aspect of our study. The mechanism of apical citrate entry in MDCKII cells is unknown, although NaDC1 or a similar transporter may be involved. Nevertheless, this set of experiments successfully demonstrated the basolateral localization of SLC35G1 and its role in citrate efflux. Clarifying the apical entry mechanism will require more detailed characterization of the MDCKII model system in future studies, which would help in fully understanding the transcellular transport system for citrate. Studies using Caco-2 cells, or MDCKII cells double-transfected with NaDC1 and SLC35G1, should also be conducted in the future to gain more definitive insights into the transcellular transport mechanism for citrate in the intestine and to delineate the suggested cooperative role of NaDC1 and SLC35G1. We hope you will understand our handling of this issue.

      (iii) There is one other transporter that has already been identified for the efflux of citrate in some cell types in the literature (SLC62A1, PLoS Genetics; 10.1371/journal.pgen.1008884), but no mention of this transporter has been made in the current manuscript.

      Thank you for bringing up the relevance of SLC62A1, which has recently been identified as a citrate efflux transporter in some cell types (PLoS Genet, 16, e1008884, 2020). We have now included comments on this transporter in Introduction (p. 2).

      Reviewer #3 (Public Review):

      Summary:

      Mimura et al describe the discovery of the orphan transporter SLC35G1 as a citrate transporter in the small intestine. Using a combination of cellular transport assays, they show that SLC35G1 can mediate citrate transport in small intestinal cell lines. Furthermore, they investigate its expression and localization in both human tissue and cell lines. Limited evidence exists to date on both SLC35G1 and citrate uptake in the small intestine, therefore this study is an important contribution to both fields. However, the main claims by the authors are only partially supported by experimental evidence.

      Strengths:

      The authors convincingly show that SLC35G1 mediates uptake of citrate which is dependent on pH and chloride concentration. Putting their initial findings in a physiological context, they present human tissue expression data of SLC35G1. Their Transwell assay indicates that SLC35G1 is a citrate exporter at the basolateral membrane.

      Weaknesses:

      Further confirmation and clarification are required to claim that the SLC indeed exports citrate at the basolateral membrane as concluded by the authors. Most experiments measure citrate uptake, but the authors state that SLC35G1 is an exporter, mostly based on the lack of uptake at physiological conditions faced at the basolateral side. The Transwell assay in Figure 1L is the only evidence that it indeed is an exporter. However, in this experiment, the applied chloride concentration was not according to the proposed model (120 mM at the basolateral side). The Transwell assay, or a similar assay measuring export instead of import, should be carried out in knockdown cells to prove that the export indeed occurs through SLC35G1 and not through an indirect effect. Related to the mentioned chloride sensitivity, it is unclear how the proposed model works if the SLC faces high chloride conditions under physiological conditions though it is inhibited by chloride.

      Thank you for highlighting these important points. We used the Cl--rich medium in transcellular transport studies, as stated in the relevant section of Materials and Methods (p. 6, lines 2–5). The Cl- concentration (144 mM) was comparable to the physiological concentration in extracellular body fluids. To clarify this experimental condition, we have additionally noted it in the text (p. 4, line 9) and in the legends of Figs. 1K and 1L. The results indicate that basolaterally localized SLC35G1 can mediate citrate export effectively under Cl--rich extracellular conditions. The mechanism by which Cl- regulates transport remains unclear and is difficult to resolve at this time. We recognize the importance of investigating this aspect further in future studies, including the possibility that SLC35G1 might be a citrate/Cl- exchanger, as pointed out by Reviewer #1 (3rd comment).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The figures are very tiny and difficult to see. The inset in Figure 1C is much too small to be readable. I suggest enlarging the panels.

      Thank you for your feedback. As advised, we have enlarged the panels to improve visibility.

      Line 74: "certain anionic compounds significantly inhibited SLC35G1-specific citrate uptake, indicating they are also recognized by SLC35G1." This sentence should be reworded since the mechanism is not clear. The word "reduced" would be a better option than "inhibited." Are there other interpretations besides SLC35G1 binding to explain the observations?

      Thank you for your suggestion. We have reworded the sentence to improve clarity (p. 3, line 30). One may speculate that these compounds interact with SLC35G1, but the mechanism remains unclear.

      The manuscript is vague about how the transporter was discovered. If a screen of orphan transporters was performed to identify a citrate transporter, this should be described.

      Thank you for pointing out the need for more details regarding the discovery of the transporter. We have added some detailed description at the beginning of Results and Discussion (p. 3).

      Reviewer #2 (Recommendations For The Authors):

      Recommendations for the authors:

      (1) For transcellular transport of citrate and the role of SLC35G1, it would be better to use Caco-2 cells cultured on Transwells because these cells express NaDC1 in the apical membrane and the authors have shown that SLC35G1 is expressed in the basolateral membrane in this cell line. The mechanism for the entry of citrate into MDCK cells used in the present manuscript is not known. If the authors prefer to use MDCK cells because of their superior polarization, they can use a double transfection (NaDC1 and SLC35G1) to differentially express the two transporters in the apical versus basolateral membrane and then use the cells for transcellular transport of citrate.

      Please refer to our reply to your second review comment.

      (2) The substrate specificity experiments should use concentrations higher than 0.2 mM for competing dicarboxylates because the Km for citrate is only 0.5 mM. It is likely that NaDC1 brings in citrate and other dicarboxylates into enterocytes and then SLC35G1 mediates the efflux of these metabolic intermediates into blood.

      Please refer to our reply to your first review comment.

      (3) One major aspect of the transport function of this newly discovered citrate efflux transporter that has not been explored is the role of membrane potential in the transport function. The transporter is not coupled to Na or K or even H; so then the transport of citrate via this transporter must be electrogenic. Of course, this would be perfect for the transporter to function in the efflux of citrate because of the inside-negative membrane potential, but the authors need to show that the transporter is electrogenic. This can be examined through Caco-2 cells and/or MDCK cells expressing SLC35G1 and examining the impact of changes in membrane potential (valinomycin and K) on the transport of citrate.

      Thank you for your suggestion. As shown in Figure 1D, replacing Na-gluconate with K-gluconate, which induces plasma membrane depolarization, had no impact on the specific uptake of citrate, suggesting that SLC35G1-mediated citrate transport is independent of membrane potential. We have added this point on p. 3 (lines 21–24).

      (4) The localization studies mention Na/K ATPase component as a basolateral membrane marker, but the text describes it as BCRP. This needs to be corrected.

      Thank you for pointing out the mistake. We have corrected that. The marker was ATP1A1.

      Reviewer #3 (Recommendations For The Authors):

      Major points:

      (1) Most experiments measure citrate uptake, but the authors state that SLC35G1 is an exporter, mostly based on the lack of uptake at physiological conditions faced at the basolateral side. The Transwell assay in Figure 1L is the only evidence that it indeed is an exporter. However, in this experiment, the applied chloride concentration was not according to the proposed model (120mM at basolateral side). Why was this chloride concentration not mimicked accordingly in the Transwell assay?

      (2) The Transwell assay, or a similar assay measuring export instead of import, should be carried out in knockdown cells to prove that the export indeed occurs through SLC35G1 and not through an indirect effect.

      (3) Related to the mentioned chloride sensitivity, it is unclear how the proposed model works if the SLC faces high chloride conditions under physiological conditions though it is inhibited by chloride.

      Please refer to our reply to your review comments.

      Related to the localization of SLC35G1:

      (4) The polyclonal antibody against SLC35G1 should be validated to prove the specificity. This should be relatively straightforward given the authors have SLC35G1 knockdown cells.

      Thank you for your suggestion. To validate the specificity of the polyclonal antibody against SLC35G1, we prepared HEK293 cells transiently expressing SLC35G1 or SLC35G1 tagged with a FLAG epitope at the C-terminus (SLC35G1-FLAG). In the immunostained images, whereas only SLC35G1-FLAG was stained with the anti-FLAG antibody, both SLC35G1 and SLC35G1-FLAG were stained with the anti-SLC35G1 antibody, indicating that the anti-SLC35G1 antibody can recognize SLC35G1. In addition, the localization patterns of SLC35G1-FLAG observed with both antibodies were consistent, further indicating that the anti-SLC35G1 antibody recognizes SLC35G1 specifically. Based on these results, the specificity of the anti-SLC35G1 antibody was validated.

      Author response image 1.

      (5) To strengthen the data on the localization of SLC35G1, the cell lines should be co-stained with a plasma membrane marker as well, not just in tissue with ATP1A1. In polarized cells co-staining with apical and basolateral markers should be applied.

      In both polarized MDCKII and Caco-2 cells, the geometric distribution of SLC35G1 indicated localization to the basolateral membrane. This finding is consistent with the basolateral localization indicated by its colocalization with ATP1A1 in the human small intestinal section. We consider these results sufficient to support the basolateral localization of SLC35G1.

      General points:

      (6) In the abstract the authors mention that they focus on highly expressed orphan transporters in the small intestine as candidates. However, no other candidates are mentioned or discussed in the study. Consequently, this should be rephrased.

      Thank you for the advice. Also taking into consideration the third recommendation point by Reviewer #1, we have added some detailed description at the beginning of Results and Discussion (p. 3).

      (7) As far as mentioned there is exactly one (other) publication on SLC35G1 (10.1073/pnas.1117231108). The authors should discuss this only publication with functional data on SLC35G1 in more detail. How do the authors integrate their findings with the existing knowledge? For example, why did the authors not investigate the impact of Ca2+ on SLC35G1 transport?

      Thank you for your suggestion. In the earlier study, in which SLC35G1 was tagged with GFP, SLC35G1 was indicated to be localized mainly to the endoplasmic reticulum (ER). One possibility is that SLC35G1 was mistargeted to the ER because of the tagging used in that study. We have mentioned this possibility in the relevant section (p. 3, lines 9–11). We have also revised a related sentence on p. 3 (line 5).

      Regarding the other finding, that GFP-tagged SLC35G1 was indicated to interact with STIM1, we additionally examined the effect of STIM1 on SLC35G1-mediated citrate uptake. As shown in the accompanying figure, coexpression of HA-tagged STIM1 did not affect the elevated citrate uptake induced by FLAG-tagged SLC35G1, indicating that STIM1 has no impact on the citrate transport function of SLC35G1 at the plasma membrane.

      Author response image 2.

      (A) Effect of the coexpression of HA-tagged STIM1 on [14C]citrate (1 μM) uptake by FLAG-tagged SLC35G1 transiently expressed in HEK293 cells. The uptake was evaluated for 10 min at pH 5.5 and 37°C. Data represent the mean ± SD of three biological replicates. Statistical differences were assessed using ANOVA followed by Dunnett’s test. *, p < 0.05 compared with the control (gray bar). (B) Western blot analysis was conducted by probing for the HA and FLAG tags, using the whole-cell lysate samples (10 µg protein aliquots) prepared from cells expressing HA-STIM1 and/or FLAG-SLC35G1. The blots of β-actin are shown for reference.

      (8) Generally, the introduction could provide more background.

      In response to your suggestion and also to the third review comment from Reviewer #2, we have now additionally included comments on SLC62A1, which has recently been reported as a citrate efflux transporter in some cell types, in Introduction.

      Minor points:

      (9) There is a typo in Figure 1D: manniotol instead of mannitol.

      Thank you for pointing that out. We have corrected the typo in Figure 1D.

      (10) Figure 1J: The resolution is low and the localization to the basolateral membrane is not conclusive based on this image. It seems rather localized at the whole membrane and intracellularly too.

      Thank you for your feedback. We have enhanced the resolution of the image and also enlarged it to improve clarity and make the basolateral membrane localization more discernible.

      (11) Figure 1K: Clarification is needed if the experiment was performed in the Transwell plate. Based on the results from the pH titration experiment, it is expected that there is no uptake at pH7.4. Therefore, this experiment does not seem to provide additional evidence or support the conclusions drawn related to cellular polarization.

      Please refer to our reply to your review comments.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This article by Navratna et al. reports the first structure of human HGSNAT in an acetyl-CoA-bound state. Through careful structural analysis, the authors propose potential reasons why certain human mutations lead to lysosomal storage disorders and outline a catalytic mechanism. The structural data are of good quality, and the manuscript is clearly written. This study represents an important step toward understanding the mechanism of HGSNAT and is valuable to the field. I have the following suggestions:

      (1) The authors should characterize whether the purified protein is active. Otherwise, how does one know if the detergent used maintains the protein in a biologically relevant state? The authors should at least attempt to do so. If these prove to be challenging, at the very least, the authors should try a cell-based assay to demonstrate that the GFP tag does not interfere with the function.

      We addressed these concerns in the revised version and described these efforts in our previous response letter; we briefly summarize them here again. We attempted to measure the HGSNAT-catalyzed reaction by monitoring the decrease in acetyl-CoA in the presence of D-glucosamine (acetyl group acceptor), using a coupled-enzyme acetyl-CoA assay kit from SIGMA (MAK039) that converts acetyl-CoA to a fluorescent product measurable at Ex/Em of 535/587 nm. We observed a decrease in the level of acetyl-CoA (gray) upon the addition of HGSNAT (red) (Rebuttal figure 1).

      Author response image 1.

      Acetyl-CoA levels in the absence and presence of HGSNAT purified in digitonin. The decrease in the level of 10 μM acetyl-CoA was measured in the presence of 10 μM D-glucosamine and 30 nM HGSNAT at pH 7.5.

      While we were optimizing the assay, Xu et al. (2024, Nat Struct Mol Biol) published a structural and biochemical characterization of HGSNAT, showing that detergent-purified HGSNAT is active. In addition, we have shown by cryo-EM that the GFP-tagged HGSNAT we purified in detergent was already bound to the endogenous substrate ACO, an observation also reported by Xu et al. Finally, we performed LC-MS on GFP-tagged HGSNAT purified in detergent to detect bound ACO, which could subsequently be removed by dialysis. These results have been included in Figure S9. The endogenous binding of ACO to HGSNAT in detergent suggests that neither the tag nor the detergent is detrimental to function.

      (2) In Figure 5, the authors present a detailed schematic of the catalytic cycle, which I find to be too speculative. There is no evidence to suggest that this enzyme undergoes isomerization, similar to a transporter, between open-to-lumen and open-to-cytosol states. Could it not simply involve some movements of side chains to complete the acetyl transfer?

      We had already changed this figure in our latest submission; perhaps the changes were not obvious during review. We agree with the reviewer that the enzyme could likely achieve catalysis through simple side-chain movements without undergoing the extensive isomerization steps depicted in the original Figure 5. In the absence of data supporting large movements during the acetyl transfer reaction, the original Figure 5 appeared speculative. We have therefore revised Figure 5 based on the observations made in this study; the states shown in the revised figure involve no conformational changes and depict only acetyl transfer.

      Reviewer #2 (Public Review):

      Summary:

      This work describes the structure of Heparan-alpha-glucosaminide N-acetyltransferase (HGSNAT), a lysosomal membrane protein that catalyzes the acetylation reaction of the terminal alpha-D-glucosamine group required for degradation of heparan sulfate (HS). HS degradation takes place during the degradation of the extracellular matrix, a process required for restructuring tissue architecture, regulation of cellular function and differentiation. During this process, HS is degraded into monosaccharides and free sulfate in lysosomes.

      HGSNAT catalyzes the transfer of the acetyl group from acetyl-CoA to the terminal non-reducing amino group of alpha-D-glucosamine. The molecular mechanism by which this process occurs has not been described so far. One of the main reasons to study the mechanism of HGSNAT is that multiple mutations spanning the entire sequence of the protein, such as nonsense mutations, splice-site variants, and missense mutations, lead to dysfunction that causes abnormal accumulation of HS within the lysosomes. This accumulation is a cause of mucopolysaccharidosis IIIC (MPS IIIC), an autosomal recessive neurodegenerative lysosomal storage disorder, for which there are no approved drugs or treatment strategies.

      This paper provides a 3.26 Å structure of HGSNAT determined by single-particle cryo-EM. The structure reveals that HGSNAT is a dimer in detergent micelles and contains a density assigned to acetyl-CoA. The authors speculate about the molecular mechanism of the acetylation reaction, map the mutations known to cause MPS IIIC onto the structure, and speculate about the nature of the HGSNAT dysfunction caused by such mutations.

      Strengths:

      The paper describes a structure of HGSNAT, a member of the transmembrane acyl transferase (TmAT) superfamily. The high-resolution structure of HGSNAT bound to acetyl-CoA is important for our understanding of the HGSNAT mechanism. The density map is of high quality, except for the luminal domain. The location of the acetyl-CoA allows speculation about the mechanistic role of multiple residues surrounding this molecule. The authors thoroughly describe the architecture of HGSNAT and map the mutations leading to MPS IIIC.

      Reviewer #3 (Public Review):

      Summary:

      Navratna et al. have solved the first structure of a transmembrane N-acetyltransferase (TNAT), resolving the architecture of human heparan-alpha-glucosaminide N-acetyltransferase (HGSNAT) in the acetyl-CoA bound state using single particle cryo-electron microscopy (cryoEM). They show that the protein is a dimer, and define the architecture of the alpha- and beta-GSNAT fragments, as well as convincingly characterizing the binding site of acetyl-CoA.

      Strengths:

      This is the first structure of any member of the transmembrane acyl transferase superfamily, and as such it provides important insights into the architecture and acetyl-CoA binding site of this class of enzymes.

      The structural data is of a high quality, with an isotropic cryoEM density map at 3.3 Å facilitating building of a high-confidence atomic model. Importantly, the density for the acetyl-CoA ligand is particularly well-defined, as are the contacting residues within the transmembrane domain.

      The structure of HGSNAT presented here will undoubtedly lay the groundwork for future structural and functional characterization of the reaction cycle of this class of enzymes.

      Weaknesses:

      While the structural data for the state presented in this work is very convincing, and clearly defines the binding site of acetyl-CoA, to get a complete picture of the enzymatic mechanism of this family, additional structures of other states will be required.

      A weakness of the study is the lack of functional validation. The enzymatic activity of the enzyme characterized was not measured, and the enzyme lacks native proteolytic processing, so it is a little unclear whether the structure represents an active enzyme.

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      In the response to reviewers, the authors mention revised coordinates, but the revised coordinates provided to this reviewer do not reflect the stated changes (I assume a technical error somewhere).

      Perhaps the old coordinates in the deposition system were resubmitted with the revised draft. Nevertheless, we made the changes to the structure suggested by this reviewer in the previous round and have released the new coordinates (PDB ID: 8TU9).

      Is there any evidence for the interprotomer disulfide except for the map? For example, if it is a disulfide-linked dimer, one should see a shift in mobility on non-reducing vs reducing SDS-PAGE. Without this, the evidence from the map is not conclusive - while the symmetry-related cysteines are nearby to one another, based on the map I could argue that they could just as well be modeled with the cys sidechains reduced and pointing away from one another.

      In addition to modeling the disulfide into the cryo-EM map, we performed FSEC-based thermal melt analysis of the C334A mutant, which removes the cysteine involved in the disulfide at the dimer interface. C334A is still expressed as a dimer, suggesting that the C334 disulfide is not the only interaction stabilizing the dimer. Upon heating the detergent-solubilized protein, however, the FSEC peak for C334A corresponds to monomeric HGSNAT (Figure 4-Figure supplement 1 in main manuscript). We hypothesize that, in the absence of the C334 disulfide, the extensive hydrophobic side-chain interaction network displayed in Figure 2C maintains the integrity of the dimer, and that heating disrupts these non-disulfide interactions, rendering the protein monomeric. We also performed PAGE analysis, as suggested by this reviewer, and observed that reducing conditions yield a monomeric protein band (Author response image 2). While we were revising this manuscript, two other groups published structures of HGSNAT (Xu et al., 2024, Nat. Struct. Mol. Biol., and Zhao et al., 2024, Nat. Commun.). Both groups also identified this disulfide at the dimer interface in their HGSNAT structures. Zhao et al. showed that the disulfide is not crucial for dimerization and suggested that it can break depending on the conformation of HGSNAT. Our FSEC results agree with this observation.

      Author response image 2.

      Comparison of purified HGSNAT on native and reducing SDS-PAGE. The arrows on both gels indicate N-GFP-HGSNAT. The two bands on the SDS-PAGE gel may correspond to two differentially glycosylated forms of HGSNAT.


      The following is the authors’ response to the original reviews.

      (1) The authors should characterize whether the purified protein is active. Otherwise, how does one know if the detergent used maintains the protein in a biologically relevant state? The authors should at least attempt to do so. If this proves challenging, at the very least, the authors should try a cell-based assay to demonstrate that the GFP tag does not interfere with function. The authors would need to establish an in vitro assay using purified protein and assess the level of acetyl-CoA in the reaction (there are commercial kits and a long list of literature showing how to measure this). They could also follow the HS acetylation reaction by, e.g., HPLC-MS or NMR (among other methods).

      The cryo-EM sample was prepared without exogenous addition of ligand, as noted in the manuscript. Nevertheless, acetyl-CoA was bound endogenously to the protein, indicating that GFP-tagged HGSNAT can bind the ligand. Upon dialysis, acetyl-CoA is released from the protein, which we confirmed by LC-MS analysis (Fig S9). We purified the protein at a pH optimal for acetyl-CoA binding, as suggested by Bame, K. J. and Rome, L. H. (1985) and Meikle, P. J. et al. (1995). Because acetyl-CoA is present in a structure obtained using a GFP fusion, we argue that GFP does not interfere with protein stability or with the ability to bind the co-substrate. As demonstrated in the existing literature, the HGSNAT-catalyzed reaction is compartmentalized both spatially and by pH: binding of acetyl-CoA occurs on the cytosolic side and is optimal at pH 7.0-8.0, while transfer of the acetyl group to heparan sulfate occurs on the luminal side and is optimal at pH 5.0-6.0. We attempted to measure the HGSNAT-catalyzed reaction by monitoring the decrease in acetyl-CoA in the presence of D-glucosamine (the acetyl group acceptor), using a coupled-enzyme acetyl-CoA assay kit from SIGMA (MAK039) that converts acetyl-CoA to a fluorescent product measurable at Ex/Em of 535/587 nm. We observed a decrease in acetyl-CoA levels in the presence of the HGSNAT-ACO complex (blue) and apo HGSNAT (red), but the difference relative to the ACO standard (gray) was not significant. While we were optimizing the assay, Xu et al. (2024, Nat Struct Mol Biol) published a structural and biochemical characterization of HGSNAT showing that detergent-purified HGSNAT is active.

      Author response image 3.

      Acetyl-CoA levels in the absence and presence of HGSNAT purified in digitonin. The decrease in the level of 10 mM acetyl-CoA was measured in the presence of 10 mM D-glucosamine and 30 nM HGSNAT at pH 7.5.

      (2) In Figure 5, the authors present a detailed schematic of the catalytic cycle, which I find to be too speculative. There is no evidence to suggest that this enzyme undergoes isomerization, similar to a transporter, between open-to-lumen and open-to-cytosol states. Could it not simply involve some movements of side chains to complete the acetyl transfer? The speculative nature of this assumption needs to be clearly acknowledged throughout the manuscript and discussed in more detail. The authors could use HDX-MS or introduce cysteine residues in the hypothetical inward- and outward-facing cavities and test accessibility by incubating the purified protein with maleimides or other agents reacting with free cysteine.

      We thank the reviewers for this insightful critique. We agree that the enzyme could likely achieve catalysis through simple side-chain movements, without undergoing the extensive isomerization steps depicted in Figure 5. We also agree with the reviewer that HDX-MS could be the best way to experimentally monitor substrate-induced conformational dynamics within HGSNAT. In the absence of data supporting large movements during the acetyl transfer reaction, Figure 5 is speculative. We have now edited Figure 5 in the revised version of the manuscript based on the observations made in this study.

      (3) The acetyl-CoA-bound state is described as the open-to-lumen state. Indeed, from Figure 1C, the lumen opening appears much larger than the cytosol opening. Is there any small tunnel that connects the substrate site to the cytosol? In other words, is this state accessible to both the lumen and the cytosol, albeit with a larger opening toward the lumen? This question arises because, in Figure S5, the tunnel calculated by MOLE seems to also connect to the cytosol.

      Yes, the ACOS is likely accessible from both the lumen and the cytosol to varying degrees, as indicated by the MOLE prediction. However, binding of the bulky nucleoside head group of acetyl-CoA at the ACOS blocks the cytosolic entrance in the conformation discussed in this manuscript. The MOLE prediction was performed on a structure devoid of acetyl-CoA, and it is possible that the protein does not necessarily undergo isomerization between open-to-lumen and open-to-cytosol conformations during acetyl transfer. The ACOS is likely always accessible from both the lumen and the cytosol, but depending on the substrates or products bound, accessibility could be limited to either the lysosomal lumen or the cytosol. We have rewritten all statements mentioning an open-to-lumen conformation to reflect this argument.

      (4) The authors state, "Interestingly, in most of the detergent conditions we tested, HGSNAT was predominantly dimeric (Fig S1C-H)," and also mention, "In all the detergents we tested, HGSNAT eluted as a dimer, a testament to the extensive side-chain interaction network." The dimerization is said to be mediated by a disulfide bond. I would be surprised if the detergents the authors tested could break a disulfide bond. Therefore, can this observation truly serve as a testament to an "extensive" side-chain interaction network?

      We agree with the reviewer that detergents are unlikely to break a disulfide bond. To address this comment, we generated a C334A mutant of HGSNAT and extracted it from cells in 1% digitonin. The mutant is still expressed as a dimer (Fig S8E). However, upon heating the detergent-solubilized protein, the FSEC peak for C334A corresponds to monomeric HGSNAT (Fig S8I and S8K). We hypothesize that, in the absence of the C334 disulfide, the extensive hydrophobic side-chain interaction network displayed in Figure 2C maintains the integrity of the dimer, and that heating disrupts these non-disulfide interactions, rendering the protein monomeric.

      (5) Apart from the cryo-EM structure, the article does not provide any other experimental evidence to support or explain a molecular mechanism. Due to the complete absence of functional assays, mutagenesis analysis, or other structures such as a ternary complex or an acetylated enzyme intermediate, the mechanistic model depicted in Figure 5 should be taken with caution. This uncertainty needs to be clearly described in the manuscript text. Performing additional mutagenesis experiments to test key hypotheses, or further discussing relevant data from the literature, would strengthen the manuscript.

      We agree with the reviewer on the lack of supporting evidence for the mechanistic models proposed in Fig 5. They were based on previously reported biochemical characterization of HGSNAT by Rome & Crain (1981), Rome et al. (1983), Meikle et al. (1995), and Fan et al. (2011). However, we agree with the reviewer that this schematic is not experimentally proven and is speculative at best. We have edited Figure 5 in the revised version of the manuscript. In addition, we performed mutagenesis to study the stability of mutants (Fig S8) and LC-MS analysis to identify endogenously bound acetyl-CoA (Fig S9), strengthening parts of the manuscript. We discuss these findings in the Results and have modified the Discussion according to these suggestions.

      (6) It is discussed that H269 is an essential residue that participates in the acetylation reaction, possibly becoming acetylated during the process. However, there is no solid experimental evidence, e.g. mutagenesis analysis or structural analysis, in this or previous articles, that demonstrates this to be the case. Providing more information, ideally involving additional experimental work, would strengthen this aspect of the mechanism that is proposed. This would require establishing an in vitro assay, as described in 1).

      The role of H269 as a crucial catalytic residue was suggested by Bame, K. J. and Rome, L. H. (1986), who monitored the effect of chemical modification of amino acids on acetylation of HGSNAT-containing membranes. We generated N258I and H269A mutants of HGSNAT and analyzed their stability. N258I was more destabilized than H269A (Fig S8). We believe this reflects a loss of acetyl-CoA binding, because the TMs around the catalytic core in our cryo-EM structure are stabilized by interactions with acetyl-CoA. Recently, Xu et al. (2024, Nat Struct Mol Biol) reported that they did not observe acetylated histidine in their structure. However, both our structure and that of Xu et al. (2024) were obtained at cytosolic pH, and acetylation of H269 might occur at acidic lysosomal pH. Extensive structural and catalytic investigation of HGSNAT at low pH is required to rule out H269 acetylation as a step in the HGSNAT-catalyzed reaction.

      (7) In the discussion part, the authors mention previous studies in which it was postulated that the catalytic reaction can be described by a random order mechanistic model or a Ping Pong Bi Bi model. However, the authors leave open the question of which of these mechanisms best describes the acetylation reaction. The structure presented here does not provide evidence that could support one mechanism or the other. The authors could explore if an in vitro experimental measurement of protein activity would provide any information in this regard.

      We agree with the reviewer that a more detailed kinetic analysis is necessary to define the bisubstrate reaction mechanism of HGSNAT. All existing structural data on the two isoforms of HGSNAT were obtained at basic pH. As a result, the existing structures do not unambiguously establish the bisubstrate mechanism of HGSNAT. We believe that low-pH structural characterization, together with detailed kinetic and structural characterization of HGSNAT in membrane mimetics such as nanodiscs, could provide more insight into the mechanism. However, these studies are a future undertaking and are beyond the scope of this manuscript.

      (8) Although the authors map the mutations leading to MPS IIIC on the structure and use FoldX software to predict the impact of these mutations on folding and fold stability, there is no experimental evidence to support FoldX's predictions. It would be ideal if an additional test for these predictions were included in the manuscript. The authors could follow the unfolding of purified mutants by SEC, FSEC, or changes in intrinsic fluorescence to assess protein stability.

      As suggested, we prepared HGSNAT MPS IIIC variants and tested their expression and stability (Fig S8). These results have been included in the revised version of the manuscript.

      (9) Some sidechains that have quite strong sidechain density are missing atoms. I would be particularly careful with omitting sidechains that pack in the hydrophobic core, as this can tend to artificially reduce the clash score. Check F81, L62, P91 and V87, for example.

      We have revisited the modeling of these regions and deposited new coordinates.

      (10) W316 seems to have the wrong rotamer.

      This has been corrected in the new coordinate file that has been released.

      (11) N134 and N433 seem to have extra density. Are these known glycosylation sites?

      According to Hrebicek et al. (2006) and Feldhammer et al. (2009), HGSNAT has five predicted glycosylation sites: N66, N114, N134, N433, and N602. However, we observe NAG density only at N114, N134, and N433. These glycans have now been modeled in the structure.

      (12) At the C-terminal residue (Ile-635), the very C-terminal carboxylate is modeled pointing to a hydrophobic environment. It seems more likely to me that the Ile sidechain is packing here, with the C-terminal carboxylate facing the solvent.

      Thank you for pointing this out. We have edited the orientation of the Ile sidechain accordingly.

      Presentation and wording of results/methods:

      - Figure S3 legend "At places with missing density, the side chains were trimmed to C- alpha" - this is incorrect, I think the authors mean C-beta.

      We have corrected this error in the revised version of the manuscript.

      - Figure S3 legend - the authors refer to a gray mesh, where a transparent surface is displayed.

      Thanks for pointing this error out. We have corrected this in the revised version.

      - Some colloquial/vague wording in the main text (a lot of sentences start with "Interestingly, ..."). Making the wording more specific would help the reader, I think.

      We have removed ‘interestingly’ from the text and, per the reviewers’ suggestion, have rewritten parts of the manuscript for brevity.

      - Figure S2 legend, "throughout the processing workflow the resolution of luminal domain was used as a guidepost" - it is not entirely clear to me what this means in this context, perhaps revise the wording?

      We have rephrased this line in the revised draft of the manuscript.

      - Figure S2 and methods, Local refinements of LD and TMD are mentioned, but not indicated on the processing workflow.

      We have included a new Fig S2 and edited the legend to include these changes, per the reviewers’ suggestions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      I will summarize my comments and suggestions below.

      (1) Abstract:

      "Non-catalytic (pseudo)kinase signaling mechanisms have been described in metazoans, but information is scarce for plants." To the best of my understanding, EFR is an active protein kinase in vitro and in vivo and cannot be considered a pseudokinase. Consider rephrasing.

      We rephrased to: “Non-catalytic signaling mechanisms of protein kinase domains have been described in metazoans, but information is scarce for plants.”

      (2) Page 4: It should be noted that, while membrane-associated Rap-RiD systems have been used in planta to activate receptor kinase intracellular domains by promoting interaction with a co-receptor kinase domain, this system does not resemble the actual activation mechanism in the plasma membrane. This would be worth discussing when introducing the system. For example, the first substrates of the RK signaling complex may also be membrane-associated rather than freely diffusing in solution, which may be important for enzyme-substrate interaction.

      We inserted on page 4: “The RiD system was previously applied in planta, maintaining membrane-association by N-terminal myristoylation (Kim et al., 2021). For the in vitro experiments, the myristoylation sites were excluded to facilitate the production of recombinant protein.”

      (3) Page 4 and Fig 1: The catalytic Asp in BRI1 is D1027 and not D1009 (https://pubmed.ncbi.nlm.nih.gov/21289069/). Please check and prepare the correct mutant protein if needed.

      We clarified this in the text by stating that we mutated the HRD-aspartate to asparagine in all our kinase-dead mutants: “Kinase-dead variants with the catalytic residue (HRD-aspartate) replaced by asparagine (EFRD849N and BRI1D1009N), had distinct effects […]”. D1027 in BRI1 is the DFG-Asp, which was not mutated in our study.

      (4) Page 4 and Fig 1: Is BIK1 a known component of the BR signaling pathway and a direct BRI1 substrate? In other words, how specific is the trans-phosphorylation assay? In my opinion, a more suitable substrate for BRI1/BAK1 would be BSK1 or BSK3 (for example https://pubmed.ncbi.nlm.nih.gov/30615605/).

      Kinase-dead BIK1 is a reported substrate of BRI1. We clarified this in the Results section by inserting: “BIK1 was chosen as it is a reported substrate of both EFR/BAK1 and BRI1/BAK1 complexes (Lin et al., 2013).”

      (5) Fig. 1B: Why is BIK1 D202N partially phosphorylated in the absence of Rap? I would suggest adding control lanes showing BRI1, EFR, FLS2, BAK1 and BIK1 in isolation. Given that a nice in vitro activation system with purified components is available, why not compare the enzyme kinetics rather than band intensities at a single enzyme:substrate ratio?

      BIK1 D202N is partially phosphorylated because active BAK1 can trans-phosphorylate BIK1 D202N, as reported in a previous study (DOI: 10.1038/s41586-018-0471-x).

      (6) Page 4 and Fig 1: Is the kinase-dead variant of EFR indeed kinase dead? I could still see a decent autorad signal for this mutant when expressed in E. coli (Fig 1 A in Bender et al., 2021; https://pubmed.ncbi.nlm.nih.gov/34531323/). If this mutant is not completely inactive, could this change the interpretation of the experiments performed with the mutant protein in vitro and in planta in the current manuscript? In my opinion, it could be possible that a partially active EFR mutant can be further activated by BAK1, and in turn can phosphorylate BIK1 D202N. The differences in autorad signal for BRI1D1009N and EFRD849N are very small, and the entire mechanism hinges on this difference.

      We would like to emphasize that the mechanism hinges on the difference between non-dimerized and dimerized kinase domains in the in vitro kinase assay. BRI1 D1009N fails to enhance BIK1 D202N trans-phosphorylation compared to the non-dimerized sample, while EFR D849N is still capable of enhancing BIK1 transphosphorylation upon dimerization as indicated by quantification of autorads (Figure 1B/C). We have also addressed this point in a section on the limitations of our study.

      (7) Fig 1B. "Our findings therefore support the hypothesis that EFR increases BIK1 phosphorylation by allosterically activating the BAK1 kinase domain." To the best of my understanding, the presence of wild-type EFR in the EFR-BAK1 signaling complex leads to much better phosphorylation of BIK1D202N than the EFRD849N mutant does. How does that support the allosteric mechanism? By assuming that the D849N mutant is in an inactive conformation and fully catalytically inactive (see above)? Again, I think the data could also be interpreted such that the small difference in autorad signal for BIK1 between inactive BRI1 (but see above) and inactive EFR is due to EFR not being completely kinase dead (see above), rather than EFR being an allosteric regulator. To clarify this point, I would suggest (1) performing quantitative auto- and trans-(generic substrate) phosphorylation assays with wild-type and D849N EFR to derive enzyme kinetic parameters, (2) including the EFRD849N mutant in the HDX analysis, and (3) generating transgenic lines for EFRD849N/F761H/Y836F and EFRD849N/F761H/SSAA and comparing them to the existing lines in Fig. 3.

      Mutations, especially in proteins that require conformational plasticity for their function, can have pleiotropic effects, as a mutation may alter conformational plasticity and consequently both the catalytic and non-catalytic functions that depend on it. In such cases, it is difficult to fully disentangle catalytic and non-catalytic functions. Regarding EFR D849N, the D849N mutation may also impair the non-catalytic function by altering conformational plasticity, which would explain the difference observed between EFR and EFR D849N. As you rightly suggested, HDX would be one way to address this, but it would still not clarify whether catalytic activity contributes to activation. We instead attempted to produce analog-sensitive EFR variants for in vivo characterization of EFR-targeted catalytic inhibition. Unfortunately, we failed to produce an analog-sensitive variant for which we could show ATP-analog binding. To address your concern, we inserted a section on the limitations of the study.

      (8) Fig. 2B,C, supplement 3 C,D. Has it been assessed if the different EFR versions were expressed to similar protein levels and still localized to the PM?

      Localization of the mutant receptors has not been explicitly evaluated by confocal microscopy. However, the selected variant EFRF761H accumulates in stable Arabidopsis lines (Figure 3 – Supplement 1C), and BAK1 co-immunoprecipitated with all EFR variants upon elf18 treatment (Figure 3 B), indicating plasma membrane localization.

      (9) How the active-like conformation of EFR is in turn activating BAK1 is poorly characterized, but appears to be the main step in the activation of the receptor complex. Extending the HDX analyses to resting and Rap-activated receptor complexes could be a first step to address this question. I tried to come up with an experimental plan to test if indeed the kinase activity of BAK1 and not of EFR is essential for signal propagation, but this is a complex issue. You would need to be able to mimic an activated form of EFR (which you can), to make sure its inactive (possibly, see above) and likewise to engineer a catalytically inactive form of BAK1 in an active-like state (difficult). As such a decisive experiment is difficult to implement, I would suggest to discuss different possible interpretations of the existing data and alternative scenarios in the discussion section of the manuscript.

      We addressed your question of whether BAK1 kinase activity is essential for signal propagation by pairing EFRF761H with BAK1D416N (Figure 4 Supplement 2 C); this combination fails to induce signaling. In this case, EFRF761H is in its activated conformation but cannot activate downstream signaling. We also addressed this question with an in vitro kinase assay pairing EFR with BAK1D416N across a range of concentrations of the substrate BIK1D202N. We observed that the catalytic activity of BAK1, but not of EFR, was essential for BIK1 phosphorylation. However, this experiment does not address whether activated EFR can efficiently propagate signaling in the absence of BAK1 catalytic activity. In the limitations of the study section, we now discuss the importance of EFR catalytic activity for signaling activation.

      Author response image 1.

      BIK1 trans-phosphorylation depends on BAK1 catalytic activity. Increasing concentrations of BIK1 D202N were used as substrate for Rap-induced dimers of EFR-BAK1, EFR D849N-BAK1, and EFR-BAK1 D416N. BIK1 trans-phosphorylation depended on the catalytic activity of BAK1. Proteins were purified from E. coli λPP cells. Three experiments yielded similar results; a representative experiment is shown.

      Reviewer #2:

      All of my suggestions are minor.

      Figure 1B, I think it would be more useful to readers to explain the amino acid in the D-N change, rather than just call it D-to-N? Also, please label the bands on the stained gel; the shift on FKBP-BRI1 and FKBP-EFR are noticeable on the Coomassie stain.

      We implemented your suggestions.

      Figure 1-Supplement 1. There is still a signal for pS612 BAK1 (the text states 'also failed to induce BAK1 S612 phosphorylation', which is not quite correct). The authors could also mention the gel shift seen in BAK1, which appears absent in Y836F.

      We corrected the text, which now states: “To test whether the requirement for Y836 phosphorylation is similar, we immunoprecipitated EFR-GFP and EFRY836F-GFP from mock- or elf18-treated seedlings and probed co-immunoprecipitated BAK1 for S612 phosphorylation. EFRY836F also obstructed the induction of BAK1 S612 phosphorylation (Figure 1 – Supplement 1), indicating that EFRY836F and EFRSSAA impair receptor complex activation.” The gel shift of BAK1 you pointed out was not observed in replicate experiments, so we prefer not to comment on it.

      Figures 2 and 3 are full of a, b, c, d's, which I don't understand. Sorry.

      We used uppercase letters to indicate subpanels and lowercase letters to indicate the results of the statistical testing. In the figure caption, we have clarified that the lowercase letters refer to statistical comparisons.

      Figure 2 A. If each point on the x-axis is one amino acid, I think it would again be useful to name the amino acids that the gold or purple or blue colored lines extend through.

      Each point represents a peptide; peptides are sorted by the position of their first amino acid, from N-terminus to C-terminus. We have now added HDX plots for the individual peptides corresponding to the highlighted regions in subpanel A.

      Figure Supplement 1 is very small for what it is trying to show, even on the printed page. If this residue were to be phosphorylated, what would happen to the H-bond?

      We propose that VIa-Tyr phosphorylation would break the H-bond and cause displacement of the αC-β4 loop. Recent studies, published after our submission, highlight the importance of this loop for substrate coordination and ATP binding. Thus, phosphorylation of VIa-Tyr, by displacing this loop, may render the kinase unproductive. We have expanded the discussion to include this point.

      Figure 2B: Tyr 836 is not present in any of the alignments in Figure 2A. This should be rectified, because the text talks about the similarity to Tyr 156 in PKA.

      We have adjusted the alignments such that they now contain the VIa-Tyr residues of EFR and PKA.

      Figure 4D. Is there any particular reason that these blots are so hard to compare for FKBP and BAK1?

      We assume this refers to Figure 4 – Supplement 2 D. FKBP-EFR and FRB-BAK1 are both approximately the size of RuBisCO, the most abundant protein in plant protein samples, which overlaps with the FKBP- and FRB-tagged kinases on the blots. This makes these proteins difficult to detect.

      Reviewer #3:

      (1) The paper reporting the allosteric activation mechanism of EGFR should be cited.

      We will cite this paper.

      (2) The authors showed that "Rap addition increased BIK1 D202N phosphorylation when the BRI1 or EFR kinase domains were dimerized with BAK1, but no such effect was observed with FLS2". Please explain why FLS2 failed to enhance BIK1 transphosphorylation by Rap treatment?

      Even though BIK1 is a reported downstream signaling component of FLS2/BAK1, it might not be the most relevant one, and related RLCKs such as PBL1 might be better substrates for dimerized FLS2/BAK1. We have not tested this, however. Alternatively, the purified FLS2 kinase domain might be labile and unfold quickly even though it was kept on ice until the start of the assay, or the N-terminal FKBP tag may disrupt its function. As the reason for our observation is unclear, we have removed the FLS2 in vitro dimerization experiments from the manuscript.

      (3) Based solely on the data presented in Figure 1, it can be concluded that EFR's kinase activity is not required to facilitate BIK1 transphosphorylation. Therefore, the title of Figure 1, "EFR Allosterically Activates BAK1," may be inappropriate.

      We have changed the figure title to: “EFR facilitates BIK1 trans-phosphorylation by BAK1 non-catalytically.”

      (4) In Figure 1- Supplement 1, I could not find any bands in anti-GFP and anti-BAK1 pS612 of input. Please redo it.

      Indeed, we could not detect protein in the input samples of this experiment. BAK1 S612 phosphorylation is an activation mark and is not necessarily expected to be abundant enough for detection in input samples. EFR-GFP, however, is usually detected in input samples, as reported in Macho et al. 2014, the study in which these lines were generated. Why EFR-GFP was not detected in this set of experiments is unclear, but in our opinion this does not detract from the conclusions, since similar amounts of EFR-GFP were pulled down across all samples.

      (5) For Figure 2A, please mark the structure represented by each color directly in the figure.

      We have made the suggested change.

      (6) Please modify "EFRF761/Y836F and EFRF761H/SSAA restore BIK1 trans-phosphorylation" to "EFRF761H/Y836F and EFRF761H/SSAA restore BIK1 trans-phosphorylation".

      Thank you for spotting this. We changed it.

      (7) The HDX-MS analysis demonstrated that the EFR (Y836F) mutation inhibits the formation of the active-like conformation. Conversely, the EFR (F761H) mutation serves as a potent intragenic suppressor, significantly stabilizing the active-like conformation. Confirming through HDX-MS conformational testing that the EFR (Y836F F761H) double mutation does not hinder the formation of the active-like EFR kinase conformation would greatly strengthen the conclusions of the article.

      Response: We agree that this would be beneficial; we attempted it but could not produce enough protein for HDX-MS analysis. We now state this in a new section of the paper (“Limitations of the study”).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Duan et al. analyzed brain imaging data in the UK Biobank and found patterns in age-related changes in brain structure. They identified two patterns and found associations that differ between them.

      Strengths:

      This discovery harbors a substantial impact on aging and brain structure and function.

      Weaknesses:

      (1) Therefore, the study requires more validation. Most importantly, the data underlying the stratification into two groups are not obvious and lack detail. Can the groups also be stratified by other methods, e.g., PCA?

      Response: Thanks for the comment. In this study, principal component analysis (PCA) was applied to individualized deviations of anatomical regions of interest (ROIs) for dimensionality reduction, and the first 15 principal components, explaining approximately 70% of the total variation, were used to identify longitudinal brain aging patterns. The two patterns can be stratified by both a linear and a non-linear dimensionality reduction method: PCA and locally linear embedding (LLE) (1). The grey matter volume (GMV) of 40 ROIs at baseline was linearly adjusted for sex, assessment center, handedness, ethnicity, intracranial volume (ICV), and a second-degree polynomial in age, to be consistent with the whole-brain GMV trajectory model. There was a clear boundary between the two patterns in the projected coordinate space, indicating distinct structural differences in brain aging between them (Author response image 1).

      Author response image 1.

      Stratification of the identified brain aging patterns using linear and non-linear dimensionality reduction methods. (a) The principal component space of PC1 and PC2, and (b) two-dimensional projected locally linear embedding space derived from brain volumetric measures. Points have been colored and shaped according to grouping labels of the brain aging patterns.

      (2) Are there any external data that can be used for validation?

      Response: Thanks for the comment. We were given access to the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study, which aimed at determining the relationships between clinical, cognitive, imaging, genetic, and biochemical biomarkers across the entire spectrum of Alzheimer’s disease. ADNI recruits participants aged between 55 and 90 years at 57 sites in the United States and Canada, who undergo a series of initial tests that are repeated at intervals over subsequent years. 

      Unfortunately, ADNI does not provide appropriate and sufficient data, especially clinical, cognitive, and genetic data, to support an unbiased validation of the heterogeneity in structural brain aging patterns. Only 890 (31.83%) of the 2,796 subjects in ADNI were cognitively normal. Of these, 656 were included in the analyses after quality control of structural MRI and exclusion of those with missing covariates. Their mean age at the screening visit was 70.8 years (SD = 6.48 years), and 60.21% were female. There are thus substantial differences in population composition between ADNI and UK Biobank, with ADNI enrolling older subjects owing to its focus on defining the progression of Alzheimer’s disease.

      Moreover, among the 656 subjects with structural imaging data, the data needed to validate the clinical, cognitive, and genetic manifestations of the brain aging patterns were missing to varying degrees. For example, approximately 58% of subjects lacked baseline blood biochemistry tests and 82% lacked telomere length data, and genotype data had not been assayed for more than 70% of subjects. Among cognitive function tests, only the Mini-Mental State Examination results were complete, while other tests, such as the Trail Making Test and Digit Span Backward, were available for fewer than 10% of subjects.

      (3) Other previous discoveries or claims supporting the results of the study should be explored to support the conclusion.

      Response: Thanks for the suggestion. As noted in lines 274-277 of the manuscript, participants with brain aging pattern 2 (lower baseline total GMV and more rapid GMV decrease) were characterized by accelerated biological aging and cognitive decline. Previous research on brainAGE (2,3) (the difference between chronological age and the age predicted by a machine learning model of brain imaging data) showed that, as a biomarker of accelerated brain aging, an older brainAGE is associated with accelerated biological aging and early signs of cognitive decline, which is consistent with our findings in this study (lines 302-306).

      Further, our genome-wide association studies identified significant genetic loci contributing to accelerated brain aging, some of which were also found in previous GWAS of image-derived phenotypes (4), such as regional and tissue volume, cortical area, and white matter tract measurements, and of a specific brain aging mode derived using a data-driven decomposition approach (5) (lines 207-213).

      In addition, we demonstrated “last in, first out” mirroring patterns between structural brain aging and brain development, and found that these patterns are predominantly localized to the lateral and medial temporal cortex and the cingulate cortex, as noted in lines 231-234 of the manuscript. Previous studies of brain development and aging had already reported large differences between late adolescent development and aging in the patterns of change in the medial temporal cortex (6) (lines 315-317).

      (4) Sex was merely used as a covariate. Were there sex differences during brain aging? What was the sex ratio difference in groups 1 and 2?

      Response: Thanks for the comment. Sex differences in brain aging can be examined with sex-stratified whole-brain GMV trajectories. We fitted the growth curve and estimated the rate of change of total grey matter volume (TGMV) separately for males and females using generalized additive mixed models (GAMM). These models included 40,921 observations from 17,055 males and 19,958 females (Author response image 2). Among healthy UK Biobank participants aged 44-82 years, males had higher total GMV and a faster rate of GMV decrease over time, whereas females had lower total GMV and a slower rate of decrease. A similar conclusion was reported for normative brain-volume trajectories across the human lifespan (7). Supplementary Table 5 shows baseline and demographic characteristics for all participants and for participants stratified by brain aging pattern. Females slightly outnumbered males among all participants and within both brain aging pattern 1 (53.4% female) and pattern 2 (54.4% female). A χ² test showed no significant difference in the sex ratio between the two patterns (P = 0.06).
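      The sex-ratio comparison above is a test of independence on a 2 × 2 table (sex × brain aging pattern). A minimal Python sketch follows. It assumes a Pearson χ² test without continuity correction, since the response does not say which variant was used. Because only percentages are given here, the counts below are hypothetical and are not the study's data.

```python
import math

def chi2_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (df = 1).

    table: [[a, b], [c, d]], e.g. rows = sex, columns = brain aging pattern.
    No Yates continuity correction is applied (an assumption).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With df = 1, chi2 = Z^2, so the upper-tail P equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows = female/male, columns = pattern 1/pattern 2,
# chosen only to match the quoted 53.4% and 54.4% female proportions.
chi2, p = chi2_2x2([[534, 544], [466, 456]])
```

      The P value depends strongly on group size, so these hypothetical groups of 1,000 will not reproduce the reported P = 0.06. Only the mechanics of the test carry over.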

      Author response image 2.

      Total gray matter volume (TGMV) (a) and its estimated rate of change (b) for females (red) and males (blue). Rates of volumetric change for total gray matter and each ROI were estimated using GAMM, which incorporates both cross-sectional between-subject variation and longitudinal within-subject variation, from 22,067 observations for 19,958 females and 18,854 observations for 17,055 males. Covariates include assessment center, handedness, ethnicity, and ICV. Shaded areas around the fitted line denote 95% CIs.

      (5) Although statistically significant, Figure 3 shows minimal differences. LTL and phenoAge are displayed in adjusted values but what are the actual values that differ between patterns 1 and 2?

      Response: Thanks for the comment. We have modified the visualization of Figure 3 in the revised manuscript by adjusting the axes for the leucocyte telomere length (LTL) and PhenoAge variables and removing the whiskers from the boxplots. Associations between biological aging biomarkers and brain aging patterns are listed in Supplementary Table 6. Compared with brain aging pattern 1, participants in pattern 2, who had a more rapid GMV decrease, had shorter leucocyte telomere length (P = 0.009, Cohen’s d = -0.028) and higher PhenoAge (P = 0.019, Cohen’s d = 0.027) without covariate adjustment. Specifically, participants in brain aging pattern 1 had an average Z-standardized LTL of 0.083 (SD 0.98) and an average PhenoAge of 41.35 years (SD 8.17 years). Those in pattern 2 had an average Z-standardized LTL of 0.055 (SD 0.97) and an average PhenoAge of 41.58 years (SD 8.32 years).
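      To give a sense of scale for these effect sizes, the following sketch recomputes Cohen's d from the group means and SDs quoted above. It assumes a simple pooled SD with equal group sizes, because the group sizes are not stated here. The results are therefore approximate checks, not the reported estimates.

```python
import math

def cohens_d(mean1, sd1, mean2, sd2):
    """Cohen's d for group 2 minus group 1, assuming equal group sizes.

    With equal n, the pooled SD reduces to the root mean square of the two SDs.
    """
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean2 - mean1) / pooled_sd

# Summary statistics quoted in the response (pattern 1 vs pattern 2).
d_ltl = cohens_d(0.083, 0.98, 0.055, 0.97)        # Z-standardized LTL
d_phenoage = cohens_d(41.35, 8.17, 41.58, 8.32)   # PhenoAge, years

print(f"LTL: d = {d_ltl:.3f}")
print(f"PhenoAge: d = {d_phenoage:.3f}")
```

      Both values come out close to the reported -0.028 and 0.027. The small gaps reflect rounding of the quoted means and the equal-n assumption. Both magnitudes are far below the conventional 0.2 threshold for a "small" effect.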

      (6) It is not intuitive to link gene expression results shown in Figure 8 and brain structure and functional differences between patterns 1 and 2. Any overlap of genes identified from analyses shown in Figure 6 (GWAS) and 8 (gene expression)?

      Response: Thanks for the comment, and we apologize for the confusion. As described in the Results section “Gene expression profiles were associated with delayed brain development and accelerated brain aging”, 17 of the 45 genes mapped to GWAS-significant SNPs were found in the Allen Human Brain Atlas (AHBA) dataset. Expression of LGR4 (Spearman’s r = 0.56, permutation P = 2.5 × 10⁻⁴) was significantly associated with delayed brain development. Expression of ESR1 (Spearman’s r = 0.53, permutation P = 1.5 × 10⁻⁴) and FAM3C (Spearman’s r = -0.37, permutation P = 0.004) was significantly associated with accelerated brain aging. After the spatial permutation test, BDNF-AS remained positively associated with both delayed brain development and accelerated brain aging. The full associations between the expression profiles of the mapped genes and the estimated APC during brain development and brain aging are presented in Supplementary Tables 12 and 13, respectively.
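      The Spearman correlations above pair each gene's regional expression with the regional APC map and compare the result against a null distribution. The sketch below shows that computation in plain Python. It deliberately swaps in a naive permutation null that shuffles region labels. Unlike the spatial permutation test used in the response, this null ignores spatial autocorrelation and tends to give smaller P values, so it illustrates the logic only. All inputs are hypothetical.

```python
import random

def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def naive_permutation_test(expression, change_rate, n_perm=10000, seed=0):
    """Two-sided P for rho by shuffling region labels (NOT a spatial spin test)."""
    rng = random.Random(seed)
    observed = spearman(expression, change_rate)
    shuffled = list(change_rate)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(expression, shuffled)) >= abs(observed):
            exceed += 1
    # Add-one correction keeps the estimated P strictly above zero.
    return observed, (exceed + 1) / (n_perm + 1)
```

      For one gene across 34 cortical regions, `naive_permutation_test(expr_by_region, apc_by_region)` returns rho and a P value. Replacing the shuffle with spatially rotated ("spin") versions of the map would give a spatially aware null closer to the one described above.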

      Furthermore, we screened genes based on their contributions to, and effect directions in, the first PLS component for brain development and for brain aging. Several genes mapped to GWAS-significant SNPs were among those screened for inclusion in the functional enrichment analysis (Author response table 1). LGR4 (PLSw1 = 3.70, FDR-corrected P = 0.002) was associated with delayed development. ESR1 (PLSw1 = 3.91, FDR-corrected P = 6.12 × 10⁻⁴) and FAM3C (PLSw1 = -3.68, FDR-corrected P = 0.001) were associated with accelerated aging.
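      The first-component PLS weights (PLSw1) above come from relating a regions × genes expression matrix to a regional change-rate vector. A minimal pure-Python sketch of that step follows. It assumes a single response variable, for which the first PLS weight vector is Xᵀy after standardization. It also assumes that the reported PLSw1 values are weights divided by their bootstrap SD, a common imaging-transcriptomics convention that this response does not confirm. The data are synthetic.

```python
import random
import statistics

def standardize(values):
    """Z-score a list using the population SD."""
    mean, sd = statistics.fmean(values), statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def pls1_weights(X, y):
    """Unit-length first PLS weight vector for a single response.

    X: regions x genes (list of rows); y: one value per region.
    With one response, the first NIPALS weight vector is X^T y
    (after standardizing each gene column and y), scaled to unit length.
    """
    cols = [standardize(list(col)) for col in zip(*X)]
    yz = standardize(y)
    w = [sum(c_i * y_i for c_i, y_i in zip(col, yz)) for col in cols]
    norm = sum(v * v for v in w) ** 0.5
    return [v / norm for v in w]

def bootstrap_z(X, y, n_boot=500, seed=0):
    """Each gene's weight divided by its bootstrap SD (regions resampled)."""
    rng = random.Random(seed)
    w0 = pls1_weights(X, y)
    n = len(y)
    boot = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        boot.append(pls1_weights([X[i] for i in idx], [y[i] for i in idx]))
    sds = [statistics.stdev(gene_ws) for gene_ws in zip(*boot)]
    return [w / sd for w, sd in zip(w0, sds)]

# Synthetic example: 34 cortical regions, 3 hypothetical genes.
rng = random.Random(1)
y = [rng.gauss(0, 1) for _ in range(34)]      # regional change rates
X = [[yi + 0.1 * rng.gauss(0, 1),             # gene that tracks y
      -yi + 0.1 * rng.gauss(0, 1),            # gene that anti-tracks y
      rng.gauss(0, 1)] for yi in y]           # unrelated gene
weights = pls1_weights(X, y)
z_scores = bootstrap_z(X, y, n_boot=200)
```

      In a real analysis, genes would then be ranked by these standardized weights and FDR-corrected before enrichment analysis. Those steps are outside this sketch.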

      Author response table 1.

      Contributions and effect directions of the first PLS components in brain development and brain aging of genes that mapped to GWAS significant SNP. The bold P values reflect significance (P < 0.005, inclusion in the functional enrichment analysis) after FDR correction.

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to understand the heterogeneity of brain aging by analyzing brain imaging data. Based on the concept of structural brain aging, they divided participants into two groups based on the volume and rate of decrease of gray matter volume (GMV). The group with rapid brain aging showed accelerated biological aging and cognitive decline and was found to be vulnerable to certain neuropsychiatric disorders. Furthermore, the authors claimed the existence of a "last in, first out" mirroring pattern between brain aging and brain development, which they argued is more pronounced in the group with rapid brain aging. Lastly, the authors identified genetic differences between the two groups and speculated that the cause of rapid brain aging may lie in genetic differences.

      Strengths:

      The authors supported their claims by analyzing a large amount of data using various statistical techniques. There seems to be no doubt about the quality and quantity of the data. Additionally, they demonstrated their strength in integrating diverse data through various analysis techniques to conclude.

      Weaknesses:

      There appears to be a lack of connection between the analysis results and their claims. Readers lacking sufficient background knowledge of the brain may find it difficult to understand the paper. It would be beneficial to modify the figures and writing to make the authors' claims clearer to readers. Furthermore, the paper gives an overall impression of being less polished in terms of abbreviations, figure numbering, etc. These aspects should be revised to make the paper easier for readers to understand.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Gray matter volume (GMV) is defined later in the manuscript and may confuse readers.

      Response: Thanks for the comment. We have now defined GMV upon its first appearance in the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      (1) In conducting GWAS, the authors used total GMV at the age of 60 as a phenotype (line 195). It would be beneficial to provide additional explanation as to why only the data from individuals aged 60 were utilized, especially considering the ample availability of GMV data.

      Response: Thanks for the comment, and we apologize for the confusion. As described in the Methods section “Genome Wide Association Study to identify SNPs associated with brain aging patterns”, we performed genome-wide association studies (GWAS) in PLINK 2.0. The phenotype was each individual’s deviation in total GMV from the population average at 60 years of age. Data from all individuals were therefore used in the GWAS, not only data from those aged 60. To obtain this phenotype, each participant’s deviation from the population average at age 60 was calculated using the mixed-effects regression model described in the Methods section “Identification of longitudinal brain aging patterns”.
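      The key point above is that every participant receives a deviation at age 60, whatever their age at scanning. The sketch below illustrates this with a simplified two-stage approximation, not the mixed-effects model described in the Methods. It fits a pooled linear population trend and a separate least-squares line per subject, then compares both at age 60. The linear form, the omission of covariates, and the lack of random-effect shrinkage are all assumptions made for brevity. The subjects are hypothetical.

```python
import statistics

def fit_line(ages, values):
    """OLS intercept and slope; needs at least two distinct ages."""
    mean_age, mean_val = statistics.fmean(ages), statistics.fmean(values)
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (v - mean_val) for a, v in zip(ages, values))
    slope = sxy / sxx
    return mean_val - slope * mean_age, slope

def deviation_at_age(subjects, ref_age=60.0):
    """Per-subject deviation of predicted TGMV from the population line at ref_age.

    subjects: dict mapping subject id -> list of (age, tgmv) scans.
    """
    all_ages = [age for scans in subjects.values() for age, _ in scans]
    all_vals = [val for scans in subjects.values() for _, val in scans]
    pop_b0, pop_b1 = fit_line(all_ages, all_vals)
    population_at_ref = pop_b0 + pop_b1 * ref_age
    deviations = {}
    for sid, scans in subjects.items():
        b0, b1 = fit_line([a for a, _ in scans], [v for _, v in scans])
        # Evaluate each subject's own trajectory at ref_age, even if
        # none of their scans were taken at that age.
        deviations[sid] = (b0 + b1 * ref_age) - population_at_ref
    return deviations

# Hypothetical subjects scanned at 55 and 65 (nobody is scanned at exactly 60).
demo = {"A": [(55, 620.0), (65, 600.0)], "B": [(55, 600.0), (65, 580.0)]}
```

      A mixed model would also let subjects with a single scan contribute, through a random intercept, and would shrink noisy individual slopes toward the population slope. That is why the authors' approach is preferable to this two-stage sketch.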

      (2) Whole-brain gene expression data was linked to GMV (Line 237). Gray matter is known to account for about 40% of the total brain. Thus, interpreting whole-brain data in connection with GMV might introduce significant errors. Could this potential source of error be addressed?

      Response: Thanks for the comment. In our study, the Allen Human Brain Atlas (AHBA) dataset was processed using the abagen toolbox version 0.1.3 (https://doi.org/10.5281/zenodo.5129257) with the Desikan-Killiany atlas (8). This yielded a matrix of transcriptional values (83 regions × 15,633 genes) covering cortical and subcortical structures in both hemispheres, as well as the brainstem. Only data from the 34 cerebral cortical regions, not the whole brain, were included in the partial least squares (PLS) regression analysis of the association between regional rates of gray matter volume change and gene expression profiles. We have clarified in the revised manuscript that we used AHBA microarray expression data from cortical regions of interest (ROIs).

      (3) The paper lacks biological interpretation of the important genetic factors (SNPs and genes) for brain aging discovered in this study, as well as the results of gene ontology analysis. Many readers would be curious about the biological significance of these genetic differences and what kind of outcomes they may produce.

      Response: Thanks for the suggestion. As described in our manuscript, six independent single nucleotide polymorphisms (SNPs) were identified at the genome-wide significance level (P < 5 × 10⁻⁸) (Fig. 6). Two of them (rs10835187 and rs779233904) have also been associated in previous studies with multiple brain imaging phenotypes, such as regional and tissue volume, cortical area, and white matter tract measurements. Compared with the GWAS using global gray matter volume as the phenotype, our GWAS revealed an additional signal on chromosome 7 (rs7776725). This SNP maps to an intron of FAM3C, a gene encoding a secreted protein implicated in pancreatic cancer and Alzheimer’s disease. Another study using a data-driven decomposition approach also associated this signal with a specific brain aging mode. Another significant locus (rs10835187, P = 1.11 × 10⁻¹³) is an intergenic variant between LGR4-AS1 and LIN7C. It has been reported to be associated with bone density, brain volume, and total cortical area. LIN7C encodes the Lin-7C protein, which is involved in the localization and stabilization of ion channels in polarized cells such as neurons and epithelial cells. A previous study revealed an association of both allelic and haplotypic variation in LIN7C with ADHD. In addition, ESR1 was found to be involved in I-kappaB kinase/NF-kappaB signaling in the functional enrichment analysis associated with accelerated brain aging (Figure 8 and Supplementary Figure 5). Activation of this pathway leads to a variety of human pathologies, including neurodegenerative, inflammatory, autoimmune, and cancerous diseases (9).

      In summary, the GO biological process and KEGG pathway analyses indicate that synaptic transmission is an important process in the mechanisms shared by brain development and aging. Cellular processes such as autophagy, together with the progression of neurodegenerative diseases, are important in the mechanisms of brain aging.

      (4) As mentioned in the public review, it would be helpful if figures were revised to more clearly represent the claims.

      (4.1) For Figure 1, it would be beneficial to explain how the authors analyzed the differences between the mentioned cross-section and longitudinal trajectory, which they identified as a strength of the study.

      Response: We have added the strengths of adopting longitudinal data for modeling brain aging trajectories compared to only using cross-sectional data in Figure 1 caption in the revised manuscript:

      “Fig. 1 Overview of the study workflow. a, Population cohorts (UK Biobank and IMAGEN) and data sources (brain imaging, biological aging biomarkers, cognitive functions, genomic data) involved in this study. b, Brain aging patterns were identified using longitudinal trajectories of whole-brain GMV, which capture long-term, individualized variation that cross-sectional data alone cannot. Associations between brain aging patterns and other measurements (biological aging, cognitive functions, and PRS of major neuropsychiatric disorders) were then investigated. c, Mirroring patterns between brain aging and brain development were investigated using z-transformed brain volumetric change maps and gene expression analysis.”

      (4.2) In Figure 3, it's challenging to distinguish differences between patterns 1 and 2 in LTL and PhenoAge. (e.g. It's unclear whether Pattern 1 is higher or lower). Clarifying this visually would be useful.

      Response: We have modified the visualization of Figure 3 in the revised manuscript by adjusting the axes for the leucocyte telomere length (LTL) and PhenoAge variables and removing the whiskers from the boxplots.

      Author response image 3.

      Distributions of biological aging biomarkers (leucocyte telomere length (LTL) and PhenoAge) among participants with brain aging patterns 1 and 2.

      (4.3) Figure 7 explains the mirroring pattern, but it's hard to discern significant differences from the figures alone (especially in Figures 7b and 7c). Using an alternative method (graph, etc.) to clearly represent this would be appreciated.

      Response: We have included an arrow pointing to the brain regions with significant differences in each subfigure.

      Author response image 4.

      The “last in, first out” mirroring patterns between brain development and brain aging.

      (5) Abbreviations should be explained when they are first introduced in the paper. For example, GMV continues to be used without explanation, and in line 203, it is written out as 'gray matter volume'. ADHD and ASD first appear at line 172, but the explanation is found in lines 177-178. Additionally, there are terms without explanations in the manuscript. For instance, BMI is not explained in the main manuscript but is defined in the Supplementary Information (Table S6).

      Response: We have corrected the inappropriate formatting regarding misplaced and missing abbreviations in the revised manuscript and Supplementary Information.

      (6) Figure numbers should follow the order of appearance in the paper. The first Supplementary Fig. in the manuscript is Supplementary Figure 3. It should be Supplementary Figure 1.

      Response: We have relabeled the figures with the order of appearance in the paper in the revised manuscript and Supplementary Information.

      Reference:

      (1) Roweis, S. T. & Saul, L. K. Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326 (2000).

      (2) Christman, S. et al. Accelerated brain aging predicts impaired cognitive performance and greater disability in geriatric but not midlife adult depression. Translational Psychiatry 10, 317 (2020).

      (3) Elliott, M. L. et al. Brain-age in midlife is associated with accelerated biological aging and cognitive decline in a longitudinal birth cohort. Molecular Psychiatry 26, 3829–3838 (2021).

      (4) Smith, S. M. et al. An expanded set of genome-wide association studies of brain imaging phenotypes in UK Biobank. Nature Neuroscience 24, 737–745 (2021).

      (5) Smith, S. M. et al. Brain aging comprises many modes of structural and functional change with distinct genetic and biophysical associations. eLife 9, e52677 (2020).

      (6) Tamnes, C. K. et al. Brain development and aging: overlapping and unique patterns of change. NeuroImage 68, 63–74 (2013).

      (7) Bethlehem, R. A. et al. Brain charts for the human lifespan. Nature 604, 525–533 (2022).

      (8) Desikan, R. S. et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage 31, 968–980 (2006).

      (9) Singh, S. & Singh, T. G. Role of nuclear factor kappa B (NF-κB) signalling in neurodegenerative diseases: an mechanistic approach. Current Neuropharmacology 18, 918–935 (2020).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This is a comprehensive study that clearly and deeply investigates the function of GATA6 in human early cardiac development. 

      Strengths: 

      This study combines hESC engineering, differentiation, detailed gene expression, genome occupancy, and pathway modulation to elucidate the role of GATA6 in early cardiac differentiation. The work is carefully executed and the results support the conclusions. The use of publicly available data is well integrated throughout the manuscript. The RIME experiments are excellent. 

      Weaknesses: 

      Much has been known about GATA6 in mesendoderm development, and this is acknowledged by the authors. 

      We appreciate the comments and have tried to highlight both the early role of GATA6 in cardiac progenitor biology and the relevance of GATA6 haploinsufficiency to human congenital heart disease. We believe this adds value to other recently published work, including Sharma et al. (eLife, 2020).

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript by Bisson et al. describes the role of GATA6 in regulating cardiac progenitor cell (CPC) specification and cardiomyocyte (CM) generation using human embryonic stem cells (hESCs). The authors found that GATA6 loss-of-function hESCs exhibit early defects at the mesendoderm and lateral mesoderm patterning stages. Using RNA-seq and CUT&RUN assays, they found that genes of the Wnt and BMP programs were affected by the loss of GATA6 expression. Modulating Wnt and BMP signaling during early cardiac differentiation can partially rescue the CPC and CM defects in GATA6 heterozygous and homozygous mutant hESCs.

      Strengths: 

      The studies performed were rigorous and the rationale for the experimental design was logical. The results obtained were clear and supported the conclusions that the authors made regarding the role of GATA6 on Wnt and BMP pathway gene expression. 

      Weaknesses: 

      Given the wealth of studies that have been performed in this research area previously, the amount of new information provided in this study is relatively modest. Nevertheless, the results are quite clear and should make a strong contribution to the field.

      As with Reviewer #1, we appreciate the comments and have tried to highlight both the early role of GATA6 in cardiac progenitor biology and the relevance of GATA6 haploinsufficiency to human congenital heart disease.

      Reviewer #3 (Public review): 

      In this study, Bisson et al. analyzed the role of the GATA6 transcription factor in patterning the early mesoderm and generating cardiomyocytes. They used human embryonic stem cell differentiation assays and hiPSCs derived from a patient with heart defects associated with a GATA6 mutation. They identified a novel role for GATA6 in regulating genes involved in the WNT and BMP pathways, which was not noted in earlier analyses of GATA6 mutant hiPSCs during early cardiac mesoderm specification (Sharma et al., 2020). Modulation of the WNT and BMP pathways may partially rescue early cardiac mesoderm defects in GATA6 mutant hESCs. These results provide significant insights into how GATA6 loss-of-function and heterozygous mutations contribute to heart defects.

      I have the following comments: 

      (1) Throughout the manuscript, Bisson et al. alternate between different protocols to generate cardiomyocytes, which creates some confusion (e.g., Figure 1 vs. Supplemental Figure 2A). The authors should provide a clear justification for using alternative protocols.

      We agree and have clarified this issue in the revision (p. 6). The reviewer is correct that there are two widely used protocols for directed differentiation of PSCs to a cardiac fate: a cytokine-based protocol (Fig. 1A) and one that uses small molecules to manipulate the WNT pathway (CHIR protocol, Supplemental Fig. 2B). In our study, we used the CHIR protocol only for the experiments in Supplemental Figure 2B-E. Because our data implicated BMP and WNT as mediators of the GATA6-dependent program, we did this mainly to confirm that the phenotype observed with the cytokine-based protocol was not biased by the differentiation protocol. However, we found the CHIR protocol to be relatively inefficient overall for cardiac differentiation of the parental H1 hESCs and the various isogenic lines. In vitro cardiac differentiation protocols for hPSCs are known to vary between lines. They sometimes require extensive optimization of media components and concentrations, cell seeding densities, and batches of crucial reagents. The cytokine-based protocol we optimized worked most efficiently with our hPSC lines to generate cardiomyocytes, so we used it for the bulk of the experiments in this study.

      (2) The authors should thoroughly characterise the mesodermal identity and cardiomyocyte subtypes generated with the activin/BMP-induction protocol. They should also clarify whether defects in the expression of BMP- and WNT-related genes affect the formation of specific cardiomyocyte subtypes in a chamber-specific manner. This analysis is important, as Sharma et al. suggested a role for GATA6 in orchestrating outflow tract formation, and Bisson et al. similarly identified decreased expression of NRP1, a gene involved in outflow tract septation, in their GATA6 mutant cells.

      We agree that this is important, and we note that the mesodermal identities are already characterized quite thoroughly.

      See, for example, Fig. 2 (K+P+, Brachyury, EOMES) and Fig. 3G,H (lateral mesoderm and cardiac mesoderm RNA-seq, with GSEA comparing datasets from Koh et al.). The capacity of the cytokine-based protocol to generate both FHF- and SHF-derived subtypes has been rigorously evaluated by Keller and colleagues, whose work we now cite (Yang et al. 2022). Because the null cells do not generate CMs, chamber-specific subtypes cannot be evaluated in them; whether the GATA6 heterozygous mutants are biased is an interesting question. Indeed, the top GO term identified by CUT&RUN analysis for GATA6 at day 2 of differentiation is outflow tract morphogenesis. This is consistent with the interpretation by Sharma et al., but it implicates this program at a much earlier developmental stage, long before cardiomyocyte differentiation. We think this is one of the most important findings of our study and appreciate the chance to highlight it in the revision (p. 9, 17). When we evaluated chamber specificity in differentiated cardiomyocytes, we did not find significant differences, as shown in the panel below (day 20 of differentiation). Because our study focuses on early stages of progenitor specification rather than on cardiomyocyte differentiation, we agree that a more rigorous analysis would be valuable, and we have noted this as a limitation of the current study (p. 18).

      Author response image 1.

      (3) The authors developed an iPSC line derived from a congenital heart disease (CHD) patient with an atrial septal defect and observed that these cells generate cTnnT+ cells less efficiently. However, it remains unclear whether atrial cardiomyocytes (or those localised specifically at the septum) are being generated using the activin/BMP-induction protocol and the patient-derived iPSC line.

      As indicated above, our study focuses on cardiac progenitor specification. The patient-derived iPSCs showed CM differentiation defects similar to those of the targeted heterozygous hESC mutants. While we did not note any major differences in the expression of cardiomyocyte markers, whether the mutants are biased toward particular cardiomyocyte subtypes is an interesting question to pursue in future work.

      (4) The authors should also justify the necessity of using the patient-derived line to further analyse GATA6 function. 

      This is a good point, and as suggested we provided the justification (p. 5-6). This is the first patient-derived iPSC line published with a heterozygous GATA6 mutation along with an isogenic mutation-corrected control generated for cardiac directed differentiation. Patients with congenital heart disease (CHD) associated with GATA6 mutations are typically heterozygous (also true for many other CHD variants; presumably homozygous null embryos would not survive). It is important to query if phenotypes found using targeted mutations in hESCs (or iPSCs) model the human disease, since the patient cells (or the hESCs) likely have additional genetic variants that might interact with the GATA6 mutation. The fact that both types of heterozygous cells (patient-derived iPSCs and targeted hESCs) generate similar defects in CM differentiation provides evidence supporting the use of these human cellular models to study the genetic and cellular basis for congenital heart disease. This is particularly important, since other models, such as heterozygous mice, do not show such phenotypes.

      (5) Figure 3 suggests an enrichment of paraxial mesoderm genes in the context of GATA6 loss-of-function, which is intriguing given the well-established role of GATA6 in specifying cardiac versus pharyngeal mesoderm lineages in model organisms. Could the authors expand their analysis beyond GO term enrichment to explore which alternative fates GATA6 mutant cells may acquire? Additionally, how does the potential enrichment of paraxial mesoderm, rather than pharyngeal mesoderm, relate to the initial mesodermal induction from their differentiation protocol? Could the authors also rule out the possibility of increased neuronal cell fates? 

      We need to interpret our in vitro differentiation data cautiously in relation to what has been shown in vivo, since we are unlikely to be reproducing all the complex signaling that takes place in the embryo. Yet we do see modest increases in the expression of genes with paraxial mesoderm and ECM/mesenchymal signatures at day 2 or 3 of differentiation in the GATA6 mutant cells. We therefore now include a heatmap showing enriched paraxial mesoderm gene expression in the mutant cells (new Fig. 3I; see p. 10).

      A caveat of this result is that the cells are being differentiated toward a cardiac fate, so a bias toward alternative fates might be suppressed. We modified the protocol to favor paraxial fate by adding CHIR at day 2 (rather than XAV) and performing qPCR assays at day 3. This successfully induced paraxial mesoderm gene expression, but equally in wild-type, heterozygous, and null cells, so we do not feel it warrants further emphasis.

      Recommendations for the authors:  

      Reviewing Editor (Recommendations for the authors): 

      Incorporation of marker analysis for various stages of iPSC to CM differentiation (mesoderm, cardiac progenitor, CM subtypes) would increase the significance and support for the findings presented. Further data on the link (direct or indirect) between GATA6 and Wnt/BMP signalling would also add to the significance of this study. A number of textual changes/clarifications are also suggested to improve the manuscript. 

      We appreciate the feedback and provide responses for issues raised for markers, direct or indirect interactions, and textual changes/clarifications in the following sections. As indicated above, we did not find obvious alterations in cardiac subtypes, but since our study is focused on early progenitor specification, this is an interesting question that we think should be more rigorously evaluated in subsequent work.  

      Reviewer #1 (Recommendations for the authors)

      Minor details: 

      (1) On p6 "Principal component analysis (PCA) showed that the cells derived from each genotype were well separated from each other (Supplemental Figure 2C)". All genotypes should be in one PCA plot to better evaluate the three genotypes. 

      We prepared the new plot as suggested, presented as new Supplemental Fig. 2C. 

      (2) p10: "Chia et al.22 and found a significantly decreased enrichment in GATA6-/- cells relative to WT at day 2" decreased enrichment of what? Direct target genes? 

      Thank you for catching this. Yes, the text was changed to indicate a “decreased enrichment in GATA6-/- cells relative to WT at day 2 for putative direct GATA6 target genes.” 

      Reviewer #2 (Recommendations for the authors): 

      Overall, this is an interesting study that addresses the early developmental roles of GATA6 on cardiac differentiation. While the identification of Wnt and BMP pathway genes to be involved in GATA6 regulation is not entirely unexpected, the authors do bring forth some useful knowledge that helps to further elucidate the mechanism of pre-cardiac mesoderm regulation. Some suggestions for improvement are included below - 

      Major points: 

      (1) Since the loss of Gata6 in this study is global (either heterozygous or homozygous), it is likely that the very early requirement of Gata6 (e.g. the mesodermal stage of differentiation) is responsible for the cardiac transcriptional phenotype observed, rather than a specific role of Gata6 in the cardiac lineage, which would need to be addressed using a conditional knockout of Gata6 in an hPSC model. The authors should be more explicit when discussing the results as a disruption of mesodermal differentiation leading to loss of downstream cardiac lineage cells. For example, I would change the title "GATA6 loss-of-function impairs CM differentiation" to "GATA6 loss-of-function impairs mesodermal (or mesodermal lineage) differentiation" and show the changes in cardiac progenitor cell genes (Isl1, Tbx1, Hand1, and BAF60c/Smarcd3) in addition to cardiomyocyte genes, but no change in mesodermal genes (e.g. Brachyury/T, Eomes, Mesp1/2, etc.).

      We agree with the reviewer’s interpretation. The title of the section was changed as suggested. In Fig. 1, we show changes in cardiac progenitor cell genes (Isl1, Hand1, and BAF60c/Smarcd3), while we do not see changes in mesodermal genes in Fig. 2 (e.g. Brachyury, Eomes, Mesp1/2). We note that the defect may be specific to cardiac (or anterior lateral) mesoderm, as the ability to express paraxial mesoderm markers was not impaired.  

      (2) The use of NKX2.5, TBX5, TBX20, and GATA4 as markers for CPCs is not ideal. These markers are also expressed in differentiated cardiomyocytes. ISL1 or TBX1 for second heart field progenitors and HAND1 or BAF60c/Smarcd3 for first heart field progenitors would be ideal.  

      As suggested, we included an additional day 6 qPCR panel (new Fig. 1E) to evaluate the heart field progenitor markers. 

      (3) Much of the findings described in this study have been known in the field including the requirement of Wnt and BMP to induce mesodermal and subsequently cardiomyocyte differentiation. The key new information here is that Gata6 knockout disrupts Wnt and BMP signaling. It would help to further validate experimentally some of the Wnt and BMP genes as either direct or indirect targets of Gata6 using reporter assays. 

      While reporter assays are feasible and do provide relevant outputs, we feel that using any one, or even several, response elements in a reporter assay adds relatively little value compared to a comprehensive analysis of bona fide network components. To address the reviewer's concern, we have included heat maps profiling WNT and BMP pathway components to evaluate more rigorously and specifically the disruption of these signaling networks caused by loss of GATA6. Proving that endogenous genes are direct targets is challenging, but we mapped many GATA6 binding peaks to putative enhancers of WNT/BMP pathway genes (based on histone marks). We provide a list of these genes (new Fig. 4F) and distinguish them from WNT/BMP pathway genes that are not bound by GATA6 yet are down-regulated in the GATA6 mutant cells, and are therefore likely to be indirect targets (p. 12). 

      Minor points: 

      (1) Figures 1 and 2 - in the figure legend the labels w2, w4, m2, m5, m11, and m14 should be explained as the name of the clones of targeted hESC.  

      The legends were edited to provide this information.  

      (2) Supplemental Figure 3A - the resolution of the FACS plot is suboptimal. 

      We apologize and have corrected the plot resolution in the revised manuscript.  

      (3) Supplemental Table 1 - it's intriguing that amongst all the SWI/SNF factors, the one that is known to be cardiac-specific (SMARCD3) did not come up in the GATA6-RIME-enriched proteins. Is this a reflection of the early stage in which GATA6 plays a role in development (e.g. mesendoderm development but not precardiac mesoderm development when SMARCD3 is expressed)? 

      We agree and have noted this feature in the revised manuscript (p. 17). We note that SMARCD3 is expressed in the RNA-seq data as early as day 2. Although speculative, it may be that GATA6 primarily interacts with SWI/SNF complexes prior to the role for SMARCD3 in cardiac specification.

      Reviewer #3 (Recommendations for the authors): 

      (1) Figures 3G and 3H, as well as others, have resolution issues. The gene names are unreadable, and higher-resolution images should be provided. 

      We apologize for the resolution issues and these have been fixed in the revised version. 

      (2) In their early manipulation of the WNT and BMP pathways (Figure 6A), it is unclear whether the activin/BMP protocol shown in Figure 1A was used. If this is the case, the authors should compare their results to a wild-type + DOX EV condition for consistency. 

      We clarified in the revision (Fig. 6A) that all the experiments in Fig. 6 use the cytokine protocol. In the revised figure, we included the wild-type + DOX EV condition as suggested. 

      (3) In Figures 6C and 6D, the authors should include an analysis of a wild-type isogenic line under their new CHIR/LB condition for comparison. 

      As suggested, we included the WT isogenic line in the comparison. For Fig. 6C these are shown on a separate graph because the Y-axis values are very different. Note that the CHIR/LB treatments that improve mutant cell differentiation impact the WT cells in the opposite manner.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This is an interesting manuscript that extends prior work from this group identifying that a chemovar of Cannabis induces apoptosis of T-ALL cells by preventing NOTCH1 cleavage. Here the authors isolate specific components of the chemovar responsible for this effect to CBD and CBDV. They identify the mechanism of action of these agents as occurring via the integrated stress response. Overall the work is well performed but there are two lingering questions that would be helpful to address as follows:

      • Exactly how CBD and CBDV result in the upregulation of the TRPV1/integrated stress response is unclear. What is the most proximal target of these agents that results in these changes?

      The interaction of CBD and CBDV with TRPV1 has been thoroughly investigated by previous studies in the field. A few prominent examples are:

      (1) De Petrocellis, Luciano, Alessia Ligresti, Aniello Schiano Moriello, Marco Allarà, Tiziana Bisogno, Stefania Petrosino, Colin G. Stott, and Vincenzo Di Marzo. "Effects of cannabinoids and cannabinoid‐enriched Cannabis extracts on TRP channels and endocannabinoid metabolic enzymes." British journal of pharmacology 163, no. 7 (2011): 1479-1494.

      (2) Muller, Chanté, Paula Morales, and Patricia H. Reggio. "Cannabinoid ligands targeting TRP channels." Frontiers in molecular neuroscience 11 (2019): 487.

      (3) Iannotti, Fabio Arturo, Charlotte L. Hill, Antonio Leo, Ahlam Alhusaini, Camille Soubrane, Enrico Mazzarella, Emilio Russo, Benjamin J. Whalley, Vincenzo Di Marzo, and Gary J. Stephens. "Nonpsychotropic plant cannabinoids, cannabidivarin (CBDV) and cannabidiol (CBD), activate and desensitize transient receptor potential vanilloid 1 (TRPV1) channels in vitro: potential for the treatment of neuronal hyperexcitability." ACS chemical neuroscience 5, no. 11 (2014): 1131-1141.

      (4) Costa, Barbara, Gabriella Giagnoni, Chiara Franke, Anna Elisa Trovato, and Mariapia Colleoni. "Vanilloid TRPV1 receptor mediates the antihyperalgesic effect of the nonpsychoactive cannabinoid, cannabidiol, in a rat model of acute inflammation." British journal of pharmacology 143, no. 2 (2004): 247-250.

      (5) de Almeida, Douglas L., and Lakshmi A. Devi. "Diversity of molecular targets and signaling pathways for CBD." Pharmacology research & perspectives 8, no. 6 (2020): e00682.

      (6) Anand, Uma, Ben Jones, Yuri Korchev, Stephen R. Bloom, Barbara Pacchetti, Praveen Anand, and Mikael Hans Sodergren. "CBD effects on TRPV1 signaling pathways in cultured DRG neurons." Journal of Pain Research (2020): 2269-2278.

      Similarly, other studies have demonstrated the link between TRPV1 and the integrated stress response pathway (see below). These studies suggested that increased reactive oxygen species (ROS) production, the cyclooxygenase-2 (COX-2) enzyme, and other stressors lead to modulation of intracellular calcium levels by TRPV1.

      (1) Ho, Karen W., Nicholas J. Ward, and David J. Calkins. "TRPV1: a stress response protein in the central nervous system." American journal of neurodegenerative disease 1, no. 1 (2012): 1.

      (2) de la Harpe, Amy, Natasha Beukes, and Carminita L. Frost. "CBD activation of TRPV1 induces oxidative signaling and subsequent ER stress in breast cancer cell lines." Biotechnology and Applied Biochemistry 69, no. 2 (2022): 420-430.

      (3) Soliman, Eman, and Rukiyah Van Dross. "Anandamide‐induced endoplasmic reticulum stress and apoptosis are mediated by oxidative stress in nonmelanoma skin cancer: Receptor‐independent endocannabinoid signaling." Molecular Carcinogenesis 55, no. 11 (2016): 1807-1821.

      • Related to the above, all experiments to confirm the mechanism of action of CBD/CBDV rely on chemical agents, whose precise targets are not fully clear in some cases. Thus, some use of genetic means (such as by knockout of TRPV1, ATF4) to confirm the dependency of these pathways on drug response and NOTCH cleavage would be very helpful.

      Knockdown and pharmacological inhibition are two different approaches to studying the function of a specific gene or protein, and each has its advantages and limitations. We initially attempted to knock down CHAC1 but achieved only ~50% silencing (incomplete knockdown). Following treatment of MOLT-4 cells with the whole extract, we observed only a partial downregulation in the mRNA expression of the Notch intracellular domain (NICD) (left panel), which could account for the incomplete rescue from extract-induced death (right panel). We therefore turned to chemical inhibitors of different targets to confirm the signaling pathway.

      Author response image 1.

      Partial knockdown of CHAC1 hinders extract-induced cell death. (A) MOLT-4 cells were treated for 48 h with either an empty vector or shRNA against CHAC1 (369 and 739 denote two different target regions within CHAC1). CHAC1 gene expression was then assessed by qRT-PCR (N=3). (B) MOLT-4 cells were treated as in A, then exposed to vehicle control or whole extract (3 µg/mL) for an additional 24 h, and cell viability was assessed with XTT.

      Reviewer #2 (Public Review):

      Summary:

      The Meiri group previously showed that Notch1-activated human T-ALL cell lines are sensitive to a cannabis extract in vitro and in vivo (Ref. 32). In that article, the authors showed that Extract #12 reduced NICD expression and viability, which was partially rescued by restoring NICD expression. Here, the authors have identified three compounds of Extract #12 (CBD, 331-18A, and CBDV) that are responsible for the majority of the anti-leukemic activity and NICD reduction. Using a pharmacological approach, the authors determined that Extract #12 exerted its anti-leukemic and NICD-reducing effects through the CB2 and TRPV1 receptors. To determine the mechanism, the authors performed RNA-seq and observed that Extract #12 induces ER calcium depletion and stress-associated signals (ATF4, CHOP, and CHAC1). Since CHAC1 was previously shown to be a Notch inhibitor in neural cells, the authors assume that the cannabis compounds repress Notch S1 cleavage through CHAC1 induction. The induction of stress-associated signals, Notch repression, and anti-leukemic effects were reversed by the integrated stress response (ISR) inhibitor ISRIB. Interestingly, combining the 3 cannabinoids gave synergistic anti-leukemic effects in vitro and had growth-inhibitory effects in vivo.

      Strengths:

      (1) The authors show novel mechanistic insights that cannabinoids induce ER calcium release and that the subsequent integrated stress response represses activated NOTCH1 expression and kills T-ALL cells.

      (2) This report adds to the evidence that phytocannabinoids can show a so-called "entourage effect" in which minor cannabinoids enhance the effect of the major cannabinoid CBD.

      (3) This report dissects the main cannabinoids in the previously described Extract #12 that contribute to T-ALL killing.

      (4) The manuscript is clear and generally well-written.

      (5) The data are generally high quality and with adequate statistical analyses.

      (6) The data generally support the authors' conclusions. The exception is the experiments related to Notch.

      (7) The authors' discovery of the role of the integrated stress response might explain previous observations that SERCA inhibitors block Notch S1 cleavage and activation in T-ALL (Roti Cancer Cell 2013). The previous explanation by Roti et al was that calcium depletion causes Notch misfolding, which leads to impaired trafficking and cleavage. Perhaps this explanation is not entirely sufficient.

      Weaknesses:

      (1) Given the authors' previous Cancer Communications paper on the anti-leukemic effects and mechanism of Extract #12, the significance of the current manuscript is reduced.

      Our original manuscript consisted of extensive multidisciplinary research, and we were asked to divide the work into one paper that focuses on the cannabis plant and another that identifies the specific molecules and their underlying mechanism.

      We understand that our publication of the initial observations with the whole extract dampened the overall novelty presented here, but the previous publication lacked the detailed and strong mechanistic work presented here that explains how the cannabis extract exerted its antitumoral effects.

      In addition, the finding that 3 phytocannabinoids are needed, together with the synergy analysis, provides essential support for the ‘entourage effect’. This is a phenomenon in which minor proportions of cannabinoids and other plant components significantly modulate the effects of the main active components of cannabis, thereby producing more potent or more selective effects than a single major cannabinoid alone. The effect has been well demonstrated for endocannabinoids, but only in very few studies for phytocannabinoids.
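      The response cites a synergy analysis of the three phytocannabinoids but does not name the reference model that was used. Purely as an illustrative assumption, the sketch below uses Bliss independence, one common model, which compares the observed fractional inhibition of a combination with the inhibition expected if each compound acted independently. The function names and effect sizes are hypothetical and are not the authors' method or data.

```python
from math import prod


def bliss_expected(single_effects):
    """Expected fractional inhibition of a combination under Bliss independence.

    Assumption: Bliss independence is an illustrative choice; the source does
    not state which synergy model was used. Each effect is a fraction in [0, 1].
    """
    # Probability a cell escapes every agent is the product of (1 - effect_i).
    return 1.0 - prod(1.0 - e for e in single_effects)


def bliss_excess(observed, single_effects):
    """Observed minus Bliss-expected inhibition; a positive value suggests synergy."""
    return observed - bliss_expected(single_effects)


# Hypothetical single-agent inhibitions for three compounds (not real data).
singles = [0.2, 0.1, 0.3]
print(bliss_expected(singles))
print(bliss_excess(0.7, singles))
```

      With these hypothetical inputs the Bliss expectation is 1 - 0.8 × 0.9 × 0.7 = 0.496, so an observed inhibition of 0.7 gives a positive excess of about 0.2. Other reference models (for example Loewe additivity or the Chou-Talalay combination index) can classify the same data differently.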

      (2) It would be important to connect the authors' findings and a wealth of literature on the role of ER calcium/stress on Notch cleavage, folding, trafficking, and activation.

      We mentioned three previous papers (refs. 34-36) that guided our investigation. Following this reviewer’s comment, we added a few lines to the Discussion connecting our findings to previous work on ER stress and Notch activation, with the appropriate references.

      (3) There is an overreliance on data from a single cell line, MOLT4. MOLT4 is a good initial choice, as it is Notch-mutated, Notch-dependent, and representative of the most common T-ALL subtype, TAL1. However, there are no confirmatory data in other TAL1-positive T-ALLs or interrogation of other T-ALL subtypes.

      As mentioned by the reviewer, this study followed a previous publication in which 7 different cell lines were assessed (MOLT-4, CCRF-CEM, Jurkat, Loucy, HPB-ALL, DND-41, and T-ALL1). MOLT-4 cells were used to investigate the mechanism, and both MOLT-4 and CCRF-CEM cells were used to investigate the effect of the cannabinoid combination or the whole extract in vivo.

      (4) Fig. 6H. The effects of the cannabinoid combination might be statistically significant but seem biologically weak.

      Survival rates are presented in Fig. 6H for the combination of the cannabinoids and in Supplementary Fig. S6C for the whole extract. While this mouse model provides valuable insights, the biological significance and the translation of findings to human patients require cautious interpretation.

      (5) Fig. 3. Based on these data, the authors conclude that the cannabinoid combination induces CHAC1, which represses Notch S1 cleavage in T-ALL cells. The concern is that Notch signaling is highly context-dependent. CHAC1 might inhibit Notch in neural cells (Refs. 34-35), but it might not do this in a different context like T-ALL. It would be important to show evidence that CHAC1 represses S1 cleavage in the T-ALL context. More importantly, Fig. 3H clearly shows the cannabinoid combination inducing ATF4 and CHOP protein expression, but the effects on CHAC1 protein do not seem to be satisfactory as a mechanism for Notch inhibition. Perhaps something else is blocking Notch expression?

      We understand the reviewer’s concern. Previous work has also shown CHAC1 upregulation in the context of Notch signaling in leukemia, including, recently, in T-ALL:

      (1) Meng, X., Matlawska-Wasowska, K., Girodon, F., Mazel, T., Willman, C.L., Atlas, S., Chen, I.M., Harvey, R.C., Hunger, S.P., Ness, S.A. and Winter, S.S., 2011. GSI-I (Z-LLNle-CHO) inhibits γ-secretase and the proteosome to trigger cell death in precursor-B acute lymphoblastic leukemia. Leukemia, 25(7), pp. 1135-1146.

      (2) Chang, Yoon Soo, Joell J. Gills, Shigeru Kawabata, Masahiro Onozawa, Giusy Della Gatta, Adolfo A. Ferrando, Peter D. Aplan, and Phillip A. Dennis. "Inhibition of the NOTCH and mTOR pathways by nelfinavir as a novel treatment for T cell acute lymphoblastic leukemia." International Journal of Oncology 63, no. 5 (2023): 1-12.

      As for the second part of the reviewer’s comment, we tested both the mRNA transcript and the protein expression of CHAC1. The increase is clearly shown at 60 min for the mRNA (Fig. 3D and Fig. 4F) and for the protein (Supplementary Fig. S4G-I).

      To show the direct involvement of CHAC1, we used knockdown. Although the knockdown was incomplete (~50% reduction), it clearly shows the involvement of CHAC1 in the mechanism leading to reduced viability of these cancer cells.

      Author response image 2.

      Partial knockdown of CHAC1 hinders extract-induced cell death. (A) MOLT-4 cells were treated for 48 h with either an empty vector or shRNA against CHAC1 (369 and 739 denote two different target regions within CHAC1). CHAC1 gene expression was then assessed by qRT-PCR (N=3). (B) MOLT-4 cells were treated as in A, then exposed to vehicle control or whole extract (3 µg/mL) for an additional 24 h, and cell viability was assessed with XTT.

      (6) Fig. 4B-C/S5D-E. These Western blots of NICD expression are consistent with the cannabinoid combination blocking Furin-mediated NOTCH1 cleavage, which is reversed by ISR inhibition. However, there are many mechanisms that regulate NICD expression. To support their conclusion that the effects are specifically Furin-mediated, the authors should probe full-length (uncleaved) NOTCH1 in their Western blots.

      We have probed for the full-length Notch1 in our previously published paper (Cancer Communications, Supplementary Fig. S1G-I). As we have shown here the three cannabinoids together mimic the effect of the whole extract, we did not repeat the experiments with full-length Notch1.

      (7) Fig. S4A-B. While these pharmacologic data suggest that Extract #12 reduces NICD expression through the CB2 receptor and the TRPV1 channel, the doses used are very high (50 µM). To exclude off-target effects, these data should be paired with genetic data to support the authors' conclusions.

      We performed a dose-response experiment before choosing the doses used for the inhibitors of CB2 and TRPV1 (see below). The dose of 50 µM was selected as it did not affect the viability of the cells.

      Author response image 3.

      Dose-response of CB2 and TRPV1 inhibitors in MOLT-4 cells. MOLT-4 cells were treated with increasing concentrations (µM) of (A) CB2 inhibitor AM630 or (B) TRPV1 inhibitor AMG9810; and 24 hrs later the viability of the cells was assessed with XTT.

      Reviewer #2 (Recommendations For The Authors):

      (1) In Fig. 6H, it is unclear why the authors are using CCRF-CEM cells, which are known to be resistant to Notch inhibitors, rather than popular cell lines that are Notch-dependent (e.g. CUTLL1, DND-41, HPB-ALL). Since cannabinoids seem to kill at least in part through Notch inhibition, the effects would be predicted to be greater in Notch-dependent T-ALL cell lines than in Notch-independent cell lines like CCRF-CEM. To show stronger in vivo preclinical efficacy, another suggestion is to combine cannabinoids with tolerable dosing of gamma-secretase inhibitors, as published by the Michelle Kelliher group.

      We have shown in our previous publication that both MOLT-4 and CCRF-CEM cells are dependent on Notch for their propagation, while other cell lines of T-ALL such as Loucy and Jurkat do not. Therefore, we treat CCRF-CEM as Notch-dependent. We discuss the possibility of using the cannabinoid combination with other treatments, specifically chemotherapy, to enhance effectiveness.

      (2) To increase significance, this reviewer suggests strengthening the mechanism. However, this reviewer understands the challenge of identifying the correct mechanism. Thus, an alternative would be to increase clinical relevance. Some specific suggestions are described below.

      (a) With regard to increasing mechanistic insights, the authors should be aware of some papers that might be helpful. Roti et al. (Cancer Cell 2013) showed that SERCA inhibitors like thapsigargin reduce ER calcium levels and block Notch signaling by inhibiting NOTCH1 trafficking and Furin-mediated (S1) cleavage of Notch1. Multiple EGF repeats and all three Lin12/Notch repeats in the extracellular domains of Notch receptors require calcium for proper folding (Aster Biochemistry 1999; Gordon Nat. Struct. Mol. Biol. 2007; Hambleton Structure 2004; Rand Protein Sci 1997). Thus, Roti et al. concluded that ER calcium depletion blocks NOTCH1 S1 cleavage. This effect seems to be conserved in Drosophila, as Periz and Fortini (EMBO J, 1999) showed impaired Notch cleavage in Ca2+-ATPase-mutated Drosophila cells. Besser et al. should consider these papers when exploring the mechanism by which ER calcium release induced by the cannabinoid combination blocks activated NOTCH1 expression. Similarities and differences should be discussed.

      As mentioned above and stated also by the reviewer, many papers have shown the cleavage and/or activation of Notch following ER stress.

      (b) With regard to increasing clinical relevance, the authors should consider testing the effects of the cannabinoid combination on primary samples, PDX models, and/or genetically engineered mouse models. Pan-Notch inhibitors like gamma-secretase inhibitors (GSIs) have been disappointing in clinical trials because of excessive on-target toxicity, in particular in the intestine. The authors should consider exploring whether the cannabinoids might be superior to GSIs with regard to intestinal toxicity and why that might be (e.g. receptor expression).

      We thank the reviewer and agree that clinical relevance is of utmost importance. As obtaining primary tumor cells from patients is challenging, we assessed the whole cannabis extract in a PDX model. This extract is already being used by patients. We added this result as Supplementary Fig. S7 and address it in the Results and in the Materials and Methods section.

      (3) Since the authors have performed gene expression profiling, another test to confirm that Extract #12 acts through the Notch pathway is to perform enrichment analysis for known Notch target genes in T-ALL (e.g. Wang PNAS 2013).

      We performed the analysis and this is how we pinpointed the involvement of ATF4, CHOP and CHAC1 of the integrated stress response pathway.

      Minor concern:

      Supplemental Table S4. According to the text (page 10, line 160) and the table title, these data are RNA-seq. However, according to the GSE154287 annotation, these data are Affymetrix arrays. There are no gene names in the GSE table. Are the IDs probesets rather than genes?

      Indeed, the gene analysis data are Affymetrix arrays and the title was corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Summary:

      The manuscript by Bohra et al. describes the indirect effects of ligand-dependent gene activation on neighboring non-target genes. The authors used single-molecule RNA-FISH (targeting both mature and intronic regions), 4C-seq, and enhancer deletions to demonstrate that the non-enhancer-targeted gene TFF3, located in the same TAD as the target gene TFF1, alters its expression when TFF1 expression declines at the end of the estrogen signaling peak. Since the enhancer does not loop with TFF3, the authors conclude that mechanisms other than estrogen receptor- or enhancer-driven induction are responsible for TFF3 expression. Moreover, ERα intensity correlations show that both high and low levels of ERα are unfavorable for TFF1 expression. The ERα level correlations are further supported by overexpression of GFP-ERα. The authors conclude that the transcriptional machinery used by TFF1 for its acute activation can negatively affect TFF3 at the peak of signaling, but once the condensate dissolves, TFF3 benefits from it for its low-level expression.

      Strengths:

      The findings are indeed intriguing. The authors have maintained appropriate experimental controls, and their conclusions are well-supported by the data.

      Weaknesses:

      There are some major and minor concerns that relate to the approach, data presentation, and discussion, but I think they can be addressed with further effort.

      We thank the reviewer for their positive comments on the paper. We have addressed all their specific recommendations below.  

      The deletion of enhancer reveals the absolute reliance of TFF1 on its enhancers for its expression. Authors should elaborate more on this as this is an important finding.

      We thank the reviewer for the comment. We have now added a more detailed discussion of the requirement of the enhancer for TFF1 expression in the revised manuscript (lines 368-385).  

      In Fig. 1, TFF3 expression is shown to be induced upon E2 signaling by qRT-PCR, while smFISH does not display a similar pattern. The authors attribute this discrepancy to the overall low expression of TFF3. In my opinion, this argument could be further supported by relevant literature, if available. Additionally, do GRO-seq data reveal any changes in TFF3 expression following estrogen stimulation? The GRO-seq track shown in Fig. 1 should be rescaled to TFF3 expression so that its expression changes can be appreciated.

      We have now included a browser shot of the TFF3 region showing the GRO-seq signal over the E2 time course (Fig. S1C). We observed increased transcription toward the 3’ end of the TFF3 gene body at 3h. This increased transcription at 3h is consistent with the smFISH data. The relative changes in TFF3 expression measured by qRT-PCR and by smFISH for intronic transcripts differ somewhat; we speculate that PCR-amplification-dependent measurements may be more biased for genes expressed at low levels, and that smFISH with intronic probes may be a more sensitive assay for detecting such changes.    

      Since the mutually exclusive relationship between TFF1 and TFF3 is based on snapshots in fixed cells, can the authors comment on whether the same cell that expresses TFF1 at 1h expresses TFF3 at 3h? Perhaps calculations based on the total number of cells that express these genes at 1h and 3h would be useful.

      As the reviewer points out, since these are fixed cells, we cannot comment on the fate of the same cell at two time points. To address this limitation, future work could employ cells with endogenous tags for TFF1 and TFF3 and live-cell imaging. In a fixed-cell assay, as the reviewer suggests, we can ask whether the fraction of cells showing high TFF3 expression at 3h is similar to the fraction showing high TFF1 expression at 1h. To quantify these fractions, we plotted the fraction of cells showing high TFF1 and TFF3 expression at 1h and 3h. We identified truly high-expressing cells using the mean plus one standard deviation of the single-cell counts as the threshold: at E2-1hr for TFF1 (80 or more transcripts) and at E2-3hr for TFF3 (36 or more transcripts). The percentage of cells with high TFF1 expression at 1h (12.06 ± 2.1) is indeed comparable to that with high TFF3 expression at 3h (12.50 ± 2.0) (Fig. 2C and Author response image 1). We note that if the transcript counts were normally distributed, a predetermined fraction would be expected to lie above these thresholds, and comparable fractions could arise purely from the underlying statistics. In our experiments, however, this is unlikely given the many outliers that affect both the mean and the standard deviation, and the lack of normality and high dispersion of the single-cell distributions. Although the fractions are comparable, we cannot be certain that it is the same set of cells that goes from high TFF1 expression to high TFF3 expression, but that is certainly a possibility. We thank the reviewer for suggesting this comparison.

      Author response image 1.

      The graph shows the percentage of cells with high expression of TFF1 and TFF3 at 1h and 3h post E2 signaling. The thresholds were calculated by pooling absolute RNA counts from 650 analyzed cells (as in Fig. 2C). The mean and standard deviation of the single-cell data were calculated, and the mean plus one standard deviation was used as the threshold for identifying high-expressing cells. For TFF1, which is maximally expressed at 1h, the threshold was 80; for TFF3, which is maximally expressed at 3h, the threshold was 36. The fractions of cells expressing above 80 (TFF1) and 36 (TFF3) transcripts were calculated from three independent repeats. The mean of means and the standard deviation across the three experiments are plotted.
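      The mean-plus-one-SD thresholding described above can be sketched in Python as follows. This is a minimal sketch under assumptions, not the authors' analysis code: the function names are hypothetical, per-cell transcript counts are assumed to be available as one array per repeat, and the use of the sample standard deviation (ddof=1) is an assumption because the text does not specify it.

```python
import numpy as np


def mean_plus_sd_threshold(pooled_counts):
    """Threshold = mean + one standard deviation of pooled single-cell counts.

    Assumption: sample standard deviation (ddof=1).
    """
    pooled = np.asarray(pooled_counts, dtype=float)
    return pooled.mean() + pooled.std(ddof=1)


def percent_high(counts, threshold):
    """Percentage of cells with counts at or above the threshold ("80 and above")."""
    counts = np.asarray(counts, dtype=float)
    return 100.0 * np.mean(counts >= threshold)


def summarize_repeats(repeats, threshold):
    """Mean of the per-repeat percentages and their standard deviation."""
    pct = np.array([percent_high(r, threshold) for r in repeats])
    return pct.mean(), pct.std(ddof=1)


# Hypothetical per-cell TFF1 counts at E2-1h from three repeats (illustrative only).
repeats = [np.array([5, 12, 95, 40, 81]),
           np.array([3, 88, 20, 15, 60]),
           np.array([7, 9, 110, 79, 30])]
threshold = 80  # value reported above for TFF1 at 1 h
mean_pct, sd_pct = summarize_repeats(repeats, threshold)
print(f"{mean_pct:.1f} +/- {sd_pct:.1f} % of cells at or above {threshold}")
```

      Using ">=" mirrors the "80 and above" wording; the same functions apply to TFF3 at 3h with a threshold of 36.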

      The authors conclude that TFF3 is not directly regulated by the enhancer or the estrogen receptor. Does ERα bind the TFF3 promoter? 

      ERα ChIP-seq performed at 1h and 3h of signaling suggests that the TFF3 promoter is not bound by ERα, as shown in supplementary Fig. 1B and S1B. However, a peak upstream of the TFF1 promoter is visible, and it is lost at 3h. 

      Minor comments:

      The figures would benefit from resizing of panels. There is very little space between the panels.

      We have now resized the figures in the revised manuscript.

      The discussion section could include an extrapolation on the relationship between ERα concentration and transcriptional regulation. Given that ERα levels have been shown to play a critical role in breast cancer, exploring how varying concentrations of ERα affect gene expression, including the differential regulation of target and non-target genes, would provide valuable insights into the broader implications of this study.

      This is a very important point that was missing from the manuscript. We have included this in the discussion in the revised manuscript (line 426-430).

      Reviewer #2:

      Summary:

      In this manuscript by Bohra et al., the authors use the well-established estrogen response in MCF7 cells to interrogate the role of genome architecture, enhancers, and estrogen receptor concentration in transcriptional regulation. They propose there is competition between the genes TFF1 and TFF3 which is mediated by transcriptional condensates. This reviewer does not find these claims persuasive as presented. Moreover, the results are not placed in the context of current knowledge.

      Strengths:

      High levels of ERalpha expression seem to diminish the transcriptional response. Thus, the results in Fig. 4 have potential insight into ER-mediated transcription. Yet this observation is not pursued in great depth, for example with mutagenesis of ERalpha. Moreover, this phenomenon, which falls under the general description of non-monotonic dose response, has been treated in great depth in the literature (e.g. PMID: 22419778). For example, the result the authors describe in Fig. 4 has been reported and in fact mathematically modeled in PMID 23134774. One possible avenue for improving this paper would be to dig into this result at the single-cell level using deletion mutants of ERalpha or by perturbing co-activators.

      We thank the reviewer for pointing us to the relevant literature on our observation, which will enhance the manuscript. We have discussed these findings in relation to ours in the Discussion section (lines 400-413). We thank the reviewer for the insight on non-monotonic behavior.
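      One way to visualize the non-monotonic (inverted-U) relationship between ERα level and TFF1 output discussed above is to bin cells by nuclear ERα intensity and compare the mean transcript count per bin. The sketch below is hypothetical: the function names, quantile binning and number of bins are assumptions for illustration, not the analysis used for Fig. 4.

```python
import numpy as np


def binned_response(er_intensity, tff1_counts, n_bins=5):
    """Mean ERα intensity and mean TFF1 count per ERα quantile bin (hypothetical helper)."""
    er = np.asarray(er_intensity, dtype=float)
    y = np.asarray(tff1_counts, dtype=float)
    edges = np.quantile(er, np.linspace(0.0, 1.0, n_bins + 1))
    # Interior edges only: each cell gets a bin index in 0..n_bins-1.
    idx = np.digitize(er, edges[1:-1], right=True)
    centers = np.array([er[idx == b].mean() if np.any(idx == b) else np.nan
                        for b in range(n_bins)])
    means = np.array([y[idx == b].mean() if np.any(idx == b) else np.nan
                      for b in range(n_bins)])
    return centers, means


def is_non_monotonic(means):
    """True if the highest mean response falls in an interior bin (inverted U)."""
    peak = int(np.nanargmax(means))
    return 0 < peak < len(means) - 1


# Illustrative, made-up data: response peaks at intermediate ERα levels.
er = np.arange(10)
tff1 = np.array([1, 1, 5, 5, 9, 9, 5, 5, 1, 1])
centers, means = binned_response(er, tff1)
print(centers, means, is_non_monotonic(means))
```

      Quantile bins keep cell numbers similar across bins; fixed-width bins would be an equally reasonable choice if ERα intensities are calibrated.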

      Weaknesses:

      There are concerns with the sm-RNA FISH experiments. It is highly unusual to see so much intronic signal away from the site of transcription (Fig. 2) (PMID: 27932455, 30554876), which suggests to me the authors are carrying out incorrect thresholding or have a substantial amount of labelling background. The Cote paper cited in the manuscript is likewise inconsistent with their findings and is cited in a misleading manner: they see splicing within a very small region away from the site of transcription. 

      We thank the reviewer for this comment and apologize if they feel we misrepresented the argument of Coté et al. This has now been rectified in the manuscript. However, we do not agree that the intronic signals away from the site of transcription are an artefact. First, the images presented here are representative 2D projections of 3D Z-stacks, whereas the full 3D stack is used for spot counting with a widely used algorithm that reports spot counts that are constant over a wide range of thresholds (Raj et al., 2008). The veracity of the automated counts was initially verified by comparison to manual counts. Even in the 2D representations, the extragenic intronic signals appear at thresholds similar to those of the transcription sites. 
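      The threshold-robustness check mentioned above can be illustrated with a short sketch in the spirit of Raj et al. (2008): filter the 3D stack with a Laplacian-of-Gaussian, count connected components above each threshold, and look for a plateau in the count. The filter width, plateau tolerance and function names are assumptions; this is not the specific software the authors used.

```python
import numpy as np
from scipy import ndimage


def spot_counts_vs_threshold(stack, thresholds, sigma=1.5):
    """Count connected spots in a 3D stack for each threshold (hypothetical helper)."""
    # Negative LoG turns bright, blob-like spots into positive peaks.
    filtered = -ndimage.gaussian_laplace(np.asarray(stack, dtype=float), sigma=sigma)
    counts = []
    for t in thresholds:
        _, n_spots = ndimage.label(filtered > t)
        counts.append(n_spots)
    return np.array(counts)


def widest_plateau(counts, rel_tol=0.05):
    """Return (start, stop) indices (stop exclusive) of the longest near-constant run."""
    best = (0, 0)
    start = 0
    for i in range(1, len(counts) + 1):
        end_of_run = i == len(counts) or (
            abs(counts[i] - counts[start]) > rel_tol * max(counts[start], 1))
        if end_of_run:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = i
    return best


# Synthetic stack with four identical diffraction-limited spots (illustrative only).
stack = np.zeros((9, 40, 40))
for y, x in [(10, 10), (10, 30), (30, 10), (30, 30)]:
    stack[4, y, x] = 1000.0
stack = ndimage.gaussian_filter(stack, 1.0)
peak = (-ndimage.gaussian_laplace(stack, 1.5)).max()
counts = spot_counts_vs_threshold(stack, np.linspace(0.05, 0.5, 10) * peak)
print(counts, widest_plateau(counts))
```

      A real smFISH stack would show counts falling steeply at low thresholds (noise) and at high thresholds (losing dim spots), with the plateau in between taken as the robust count.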

      The signal does not arise from non-specific background labeling, for the following reasons:

      • To further support the time-course smFISH data and its interpretation without depending on the dispersed intronic signal, we analyzed the number of alleles firing (sites of transcription) per cell at a given time under the three conditions. We counted the sites of transcription in each cell and calculated the percentage of cells showing 1, 2, 3, 4 or >4 sites. The percentage of cells showing a single site of transcription for TFF1 is very high in uninduced cells and decreases at 1h. At 1h, the percentage of cells showing 2, 3 and 4 sites of transcription increases, and it decreases again at 3h (Author response image 2A). This agrees with the interpretation drawn from the mean intronic counts away from the site of transcription. Similarly, for TFF3, the number of cells showing 2, 3 and 4 sites of transcription increases slightly at 3hr compared to uninduced and 1hr (Author response image 2B). Several cells have no alleles firing at a given time, as quantified in the graphs on the right showing the total fraction of cells with zero versus non-zero alleles firing (Author response image 2A-B). A non-specific signal would be present in all cells.

      • Beyond our work, there is literature on post-transcriptional splicing of RNA suggesting that intronic signal can be found at relatively large distances from the site of transcription. Waks et al. showed that some fraction of unspliced RNA could be observed up to 6-10 microns away from the site of transcription, suggesting that there can be a delay between transcription and (alternative) splicing (Waks et al., 2011). Pan-nuclear dispersed intronic signals can arise because more than one allele can fire at a time at different nuclear locations. The spread of intronic transcripts in our images is also limited in cells in which only one allele is firing at E2-1 hour (Author response image 2C) and in uninduced cells (Author response image 2D). Furthermore, Coté et al. discuss that “Of note, we see that increased transcription level correlates with intron dispersal, suggesting that the percentage of splicing occurring away from the transcription site is regulated by transcription level for at least some introns. This may explain why we observe posttranscriptional splicing of all genes we measured, as all were highly expressed.” This is in line with our interpretation that intron signal dispersal can occur with post-transcriptional splicing (Coté et al., 2023). Additionally, other studies have suggested that transcripts do not necessarily undergo co-transcriptional splicing, which leads us to conclude that intronic signal can be found farther away from the site of transcription. Coulon et al. showed that splicing can occur after transcript release from the site and suggested that no strict checkpoint ensures intron removal before release, so that splicing and release are kinetically uncoupled (Coulon et al., 2014). 
      Similarly, using live-cell imaging, it was shown that splicing is not always coupled with transcription, and that this can depend on the nature and structural features of the transcript (such as blockage of the polypyrimidine tract, which results in delayed recognition) (Vargas et al., 2011). Drexler et al. showed that, in contrast to the shorter Drosophila transcripts, splicing of the terminal intron in mammalian cells can occur post-transcriptionally (Drexler et al., 2020). Using RNA polymerase II ChIP-seq time-course data from ERα activation in MCF-7 cells, Honkela et al. showed that a large number of genes can show significant delays between the completion of transcription and mRNA production, which was attributed to the rapid completion of transcription on shorter genes leading to splicing-associated delays (Honkela et al., 2015). More recently, comparisons of nascent and mature RNA levels suggested a time lapse between transcription and splicing for genes that are early responders during signaling (Zambrano et al., 2020). The presence of significant numbers of TFF1 nascent RNAs in the nucleus in our data is consistent with the above observations. 

      • Uniform intensities across many transcripts suggest that these are true signals arising from RNA molecules, which would not be the case for non-specific background signal (Author response image 2E).

      • Splicing occurs in the nucleus, so intron-containing pre-transcripts should be nuclear. Thus, intronic signals should remain localized to the nucleus, unlike mature mRNAs, which translocate to the cytoplasm after processing; exonic signals can therefore be found in both the nucleus and the cytoplasm. Consistent with this, we observe no cytoplasmic signal for the intronic probes, which remains localized within the nucleus as expected (Author response image 2F), while exonic signals are observed in both compartments. This suggests that the signal comes from true pre-transcripts; there is no reason for non-specific background labelling to remain restricted to the nucleus.

      • The mean intronic label counts for both TFF1 and TFF3 increase upon E2 induction compared to the uninduced condition (Fig. 2B). Similarly, the mean intronic counts for both genes drop drastically in the TFF1-enhancer-deleted cells (Fig. 3C, D). This change in the number of intronic signals specifically upon induction and enhancer deletion suggests that the signal is not an artefact but arises from true nascent transcripts that are sensitive to stimulus or enhancer deletion.

      • We expect intronic signals to colocalize with exonic signals in the nucleus, while some exonic signals, representing more mature mRNA, will not colocalize with intronic ones. Indeed, we observe clear colocalization between intronic and exonic signals in the nucleus, while exonic signals also occur independently of intronic ones in both the nucleus and the cytoplasm. This demonstrates that the intronic signals in our experiments are specific and not simply background labelling (Author response image 2G).

      These studies and the arguments above lead us to conclude that the presence of intronic transcripts in the nucleus away from the site of transcription is not an artefact. We hope the reviewer will agree. These analyses have now been included in the manuscript as Supplementary Figure 6 and added at lines 106-111, 201-204, 215-217 and 231-235. We thank the reviewer for raising this important point.
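      The intronic/exonic colocalization argument in the last point can be quantified with a nearest-neighbour search on spot coordinates. The sketch below is an illustration under assumptions: spot centroids are taken to be available as (z, y, x) coordinates in microns, and the matching radius max_dist is a hypothetical parameter that would have to be chosen from the optical resolution. It is not the authors' procedure.

```python
import numpy as np
from scipy.spatial import cKDTree


def colocalized_fraction(query_xyz, reference_xyz, max_dist):
    """Fraction of query spots with a reference spot within max_dist (hypothetical helper)."""
    q = np.asarray(query_xyz, dtype=float)
    r = np.asarray(reference_xyz, dtype=float)
    if len(q) == 0:
        return float("nan")
    if len(r) == 0:
        return 0.0
    dist, _ = cKDTree(r).query(q, k=1)  # distance to nearest reference spot
    return float(np.mean(dist <= max_dist))


# Illustrative coordinates (microns): every intronic spot has an exonic partner,
# but one exonic spot (e.g. a mature, cytoplasmic mRNA) has no intronic partner.
intronic = [[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]]
exonic = [[0.1, 0.0, 0.0], [10.0, 10.0, 10.2], [50.0, 50.0, 50.0]]
print(colocalized_fraction(intronic, exonic, 0.3))  # intronic spots with exonic partner
print(colocalized_fraction(exonic, intronic, 0.3))  # exonic spots with intronic partner
```

      The asymmetry between the two fractions is the expected signature: close to 1 for intronic spots, lower for exonic spots.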

      Author response image 2.

      Dynamic induction and RNA localization of TFF1 and TFF3 transcription across cell populations using smRNA FISH. A. Bar graph depicting the percentage of cells with 1, 2, 3, 4, or more than 4 sites of transcription for TFF1 (left). The graph shows the mean of means from different repeats of the experiment, and error bars denote SEM (n>200, N=3). Only cells with at least one allele firing were counted; cells with no active alleles were not included. The graph on the right shows the number of cells with zero or non-zero alleles firing. B. Bar graph depicting the percentage of cells with 1, 2, 3, 4 or more than 4 sites of transcription for TFF3 (left). The graph shows the mean of means from different repeats of the experiment, and error bars denote SEM (n>200, N=3). Only cells with at least one allele firing were counted; cells with no active alleles were not included. The graph in the middle shows the number of cells with 2, 3, 4 or more than 4 sites of transcription for TFF3. The graph on the right shows the number of cells with zero or non-zero alleles firing. C. Images from a single-molecule RNA FISH experiment showing InTFF1 transcripts in cells induced with E2 for 1 hour. When a single allele of TFF1 is firing, the transcripts show a more spatially restricted localisation. The scale bar is 5 microns. D. Images from a single-molecule RNA FISH experiment showing InTFF1 transcripts in uninduced cells. When a single allele of TFF1 is firing and transcription is low, the transcripts show a more spatially restricted localisation. The scale bar is 5 microns. E. A line profile through several transcripts in the nucleus shows uniform and similar intensities, indicating that these are true signals. F. 60X representative images from a single-molecule RNA FISH experiment showing InTFF1 and ExTFF1 transcripts (top) and InTFF3 and ExTFF3 transcripts (bottom). 
      There is no intronic signal in the cytoplasm, while exonic signals are found in both the nucleus and the cytoplasm. The scale bar is 5 microns. G. 60X representative images from a single-molecule RNA FISH experiment showing InTFF1 and ExTFF1 transcripts. All intronic signals colocalize with exonic signals, whereas, as expected, not all exonic signals colocalize with intronic signals, since some represent more mature mRNA. The scale bar is 5 microns.
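      The per-cell summary in panels A and B (percentage of cells with 1, 2, 3, 4 or >4 transcription sites among cells with at least one active allele, counts of cells with zero versus non-zero sites, and the mean of means with SEM across repeats) could be computed as below. This is a hypothetical sketch assuming a per-cell count of transcription sites is already available; the function names are illustrative.

```python
import numpy as np


def site_distribution(sites_per_cell):
    """Percent of active cells with 1, 2, 3, 4 or >4 sites, plus zero/non-zero counts."""
    s = np.asarray(sites_per_cell, dtype=int)
    active = s[s > 0]  # cells with no active allele are excluded from the percentages
    pct = {}
    for k in (1, 2, 3, 4):
        pct[str(k)] = 100.0 * np.mean(active == k) if active.size else 0.0
    pct[">4"] = 100.0 * np.mean(active > 4) if active.size else 0.0
    zero_vs_nonzero = {"zero": int(np.sum(s == 0)), "non-zero": int(active.size)}
    return pct, zero_vs_nonzero


def mean_and_sem(per_repeat_values):
    """Mean of per-repeat means and the standard error across repeats."""
    v = np.asarray(per_repeat_values, dtype=float)
    return v.mean(), v.std(ddof=1) / np.sqrt(v.size)


# Illustrative counts of transcription sites for six cells in one repeat.
pct, zero_vs_nonzero = site_distribution([0, 1, 1, 2, 5, 0])
print(pct, zero_vs_nonzero)
```

      In practice, site_distribution would be run per repeat and each category's percentages passed to mean_and_sem to obtain the plotted bars and error bars.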

      One substantial way to improve the manuscript is to take a careful look at previous single cell analysis of the estrogen response, which in some cases has been done on the exact same genes (PMID: 29476006, 35081348, 30554876, 31930333). In some of these cases, the authors reach different conclusions than those presented in the present manuscript. Likewise, there have been more than a few studies that have characterized these enhancers (the first one I know of is: PMID 18728018). Also, Oh et al. 2021 (cited in the manuscript) did show an interaction between TFF1e and TFF3, which seems to contradict the conclusion from Fig. 3. In summary, the results of this paper are not in dialogue with the field, which is a major shortcoming. 

      We thank the reviewer for pointing out these important studies. The studies from Prof. Larson's group are particularly insightful (Rodriguez et al., 2019). We have now included these in the Discussion (lines 106-111 and 420-424), where we discuss the differences and similarities between our findings and those of Larson’s group and Mancini’s group (Patange et al., 2022; Stossi et al., 2020). 

      The 4C-seq data from Oh et al. 2021 are fully consistent with our observations in Fig. 3: they also observed little to no interaction between TFF1e and TFF3p in WT cells, and only upon TFF1p deletion did TFF1e engage with TFF3p. In agreement with this, we observe little to no interaction between TFF1e and TFF3p in WT cells (Fig. 3A). This is also consistent with our model of competition for resources between these two genes. Oh et al. show interaction between TFF1e and TFF3 when the TFF1 promoter is deleted, indicating that when the primary promoter is unavailable the enhancer is retargeted to the next available gene (Oh et al., 2021). They do not show that TFF1e and TFF3 interact in WT cells or at any time point of E2 signalling.

      In the opinion of this reviewer, there are few - if any - experiments to interrogate the existence of LLPS for diffraction-limited spots such as those associated with transcription. This difficulty is a general problem with the field and not specific to the present manuscript. For example, transient binding will also appear as a dynamic 'spot' in the nucleus, independently of any higher-order interactions. As for Fig. 5, I don't think treating cells with 1,6 hexanediol is any longer considered a credible experiment. For example, there are profound effects on chromatin independent of changes in LLPS (PMID: 33536240).  

      We are cognizant of and appreciate the limitations pointed out by the reviewer. We and others have previously shown, using an ImmunoFISH assay, that ERα forms condensates on the TFF1 chromatin region (Saravanan et al., 2020). The data below show the relative mean ERα intensity on TFF1 FISH spots and on random regions, clearly showing the appearance of a condensate at the TFF1 site. Further, deletion of TFF1e reduces the size of this condensate. Thus, we expect that these ERα condensates are characterized by higher-order interactions and are disrupted by treatment with 1,6-hexanediol. As the reviewer mentions, these condensates are below a micron in size, but most TF condensates are of similar size. We agree with the reviewer that 1,6-hexanediol treatment is a brute-force experiment with several irreversible effects on chromatin. However, we used it at a low concentration for a short period of time, and it has been used in several papers (Chen et al., 2023; Gamliel et al., 2022). The opposite patterns of TFF1 and TFF3 expression upon 1,6-hexanediol treatment suggest that there is specificity. Condensates could also be perturbed with ERα mutants (N-terminal IDR truncations); however, the transcriptional response of these mutants is also altered because recruitment of coactivators that recognize the ERα N-terminus is perturbed, which makes it difficult to distinguish ERα function from condensate formation.
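      The relative ERα enrichment at TFF1 FISH spots versus random nuclear regions described above can be sketched as a ratio of local mean intensities. This is a minimal 2D illustration under stated assumptions: a single ERα image plane, spot centres in pixel coordinates, a square window of hypothetical half-width r, and random positions drawn from a nuclear mask. The authors' ImmunoFISH quantification may differ.

```python
import numpy as np


def window_mean(image, yx, r):
    """Mean intensity in a (2r+1) x (2r+1) window, clipped at the image border."""
    y, x = int(yx[0]), int(yx[1])
    return image[max(y - r, 0):y + r + 1, max(x - r, 0):x + r + 1].mean()


def relative_enrichment(image, spot_yx, nucleus_mask, r=2, n_random=200, seed=0):
    """Mean ERα signal at FISH spots divided by mean signal at random nuclear positions."""
    rng = np.random.default_rng(seed)
    spot_mean = np.mean([window_mean(image, p, r) for p in spot_yx])
    ys, xs = np.nonzero(nucleus_mask)
    pick = rng.choice(len(ys), size=n_random, replace=True)
    rand_mean = np.mean([window_mean(image, (ys[i], xs[i]), r) for i in pick])
    return spot_mean / rand_mean


# Synthetic ERα image with one bright patch at a hypothetical TFF1 spot.
image = np.ones((50, 50))
image[20:25, 20:25] = 5.0
mask = np.ones_like(image, dtype=bool)
ratio = relative_enrichment(image, [(22, 22)], mask)
print(f"relative ERα enrichment at spot: {ratio:.2f}")
```

      A ratio well above 1 indicates local ERα accumulation at the FISH spot; comparing this ratio between wild-type and TFF1e-deleted cells would mirror the comparison described in the text.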

      References:

      Chen, L., Zhang, Z., Han, Q., Maity, B. K., Rodrigues, L., Zboril, E., Adhikari, R., Ko, S.-H., Li, X., Yoshida, S. R., Xue, P., Smith, E., Xu, K., Wang, Q., Huang, T. H.-M., Chong, S., & Liu, Z. (2023). Hormone-induced enhancer assembly requires an optimal level of hormone receptor multivalent interactions. Molecular Cell, 83(19), 3438-3456.e12. https://doi.org/10.1016/j.molcel.2023.08.027

      Coté, A., O’Farrell, A., Dardani, I., Dunagin, M., Coté, C., Wan, Y., Bayatpour, S., Drexler, H. L., Alexander, K. A., Chen, F., Wassie, A. T., Patel, R., Pham, K., Boyden, E. S., Berger, S., Phillips-Cremins, J., Churchman, L. S., & Raj, A. (2023). Post-transcriptional splicing can occur in a slow-moving zone around the gene. eLife, 12. https://doi.org/10.7554/eLife.91357.2

      Coulon, A., Ferguson, M. L., de Turris, V., Palangat, M., Chow, C. C., & Larson, D. R. (2014). Kinetic competition during the transcription cycle results in stochastic RNA processing. eLife, 3, e03939. https://doi.org/10.7554/eLife.03939

      Drexler, H. L., Choquet, K., & Churchman, L. S. (2020). Splicing Kinetics and Coordination Revealed by Direct Nascent RNA Sequencing through Nanopores. Molecular Cell, 77(5), 985-998.e8. https://doi.org/10.1016/j.molcel.2019.11.017

      Gamliel, A., Meluzzi, D., Oh, S., Jiang, N., Destici, E., Rosenfeld, M. G., & Nair, S. J. (2022). Long-distance association of topological boundaries through nuclear condensates. Proceedings of the National Academy of Sciences of the United States of America, 119(32), e2206216119. https://doi.org/10.1073/pnas.2206216119

      Honkela, A., Peltonen, J., Topa, H., Charapitsa, I., Matarese, F., Grote, K., Stunnenberg, H. G., Reid, G., Lawrence, N. D., & Rattray, M. (2015). Genome-wide modeling of transcription kinetics reveals patterns of RNA production delays. Proceedings of the National Academy of Sciences of the United States of America, 112(42), 13115. https://doi.org/10.1073/pnas.1420404112

      Oh, S., Shao, J., Mitra, J., Xiong, F., D’Antonio, M., Wang, R., Garcia-Bassets, I., Ma, Q., Zhu, X., Lee, J.-H., Nair, S. J., Yang, F., Ohgi, K., Frazer, K. A., Zhang, Z. D., Li, W., & Rosenfeld, M. G. (2021). Enhancer release and retargeting activates disease-susceptibility genes. Nature, 595(7869), Article 7869. https://doi.org/10.1038/s41586-021-03577-1

      Patange, S., Ball, D. A., Wan, Y., Karpova, T. S., Girvan, M., Levens, D., & Larson, D. R. (2022). MYC amplifies gene expression through global changes in transcription factor dynamics. Cell Reports, 38(4). https://doi.org/10.1016/j.celrep.2021.110292

      Raj, A., van den Bogaard, P., Rifkin, S. A., van Oudenaarden, A., & Tyagi, S. (2008). Imaging individual mRNA molecules using multiple singly labeled probes. Nature Methods, 5(10), Article 10. https://doi.org/10.1038/nmeth.1253

      Rodriguez, J., Ren, G., Day, C. R., Zhao, K., Chow, C. C., & Larson, D. R. (2019). Intrinsic Dynamics of a Human Gene Reveal the Basis of Expression Heterogeneity. Cell, 176(1–2), 213-226.e18. https://doi.org/10.1016/j.cell.2018.11.026

      Saravanan, B., Soota, D., Islam, Z., Majumdar, S., Mann, R., Meel, S., Farooq, U., Walavalkar, K., Gayen, S., Singh, A. K., Hannenhalli, S., & Notani, D. (2020). Ligand dependent gene regulation by transient ERα clustered enhancers. PLOS Genetics, 16(1), e1008516. https://doi.org/10.1371/journal.pgen.1008516

      Stossi, F., Dandekar, R. D., Mancini, M. G., Gu, G., Fuqua, S. A. W., Nardone, A., De Angelis, C., Fu, X., Schiff, R., Bedford, M. T., Xu, W., Johansson, H. E., Stephan, C. C., & Mancini, M. A. (2020). Estrogen-induced transcription at individual alleles is independent of receptor level and active conformation but can be modulated by coactivators activity. Nucleic Acids Research, 48(4), 1800. https://doi.org/10.1093/nar/gkz1172

      Vargas, D. Y., Shah, K., Batish, M., Levandoski, M., Sinha, S., Marras, S. A. E., Schedl, P., & Tyagi, S. (2011). Single-Molecule Imaging of Transcriptionally Coupled and Uncoupled Splicing. Cell, 147(5), 1054–1065. https://doi.org/10.1016/j.cell.2011.10.024

      Waks, Z., Klein, A. M., & Silver, P. A. (2011). Cell-to-cell variability of alternative RNA splicing. Molecular Systems Biology, 7(1), 506. https://doi.org/10.1038/msb.2011.32

      Zambrano, S., Loffreda, A., Carelli, E., Stefanelli, G., Colombo, F., Bertrand, E., Tacchetti, C., Agresti, A., Bianchi, M. E., Molina, N., & Mazza, D. (2020). First Responders Shape a Prompt and Sharp NF-κB-Mediated Transcriptional Response to TNF-α. iScience, 23(9), 101529. https://doi.org/10.1016/j.isci.2020.101529

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors in this paper investigate the nature of the activity in the rodent EPN during a simple freely moving cue-reward association task. Given that primate literature suggests movement coding whereas other primate and rodent studies suggest mainly reward outcome coding in the EPNs, it is important to try to tease apart the two views. Through careful analysis of behavior kinematics, position, and neural activity in the EPNs, the authors reveal an interesting and complex relationship between the EPN and mouse behavior.

      Strengths:

      (1) The authors use a novel freely moving task to study EPN activity, which displays rich movement trajectories and kinematics. Given that previous studies have mostly looked at reward coding during head-fixed behavior, this study adds a valuable dataset to the literature.

      (2) The neural analysis is rich and thorough. Both single-neuron-level and population-level (i.e. PCA) analyses are employed to reveal what EPN encodes.

      Thank you very much for this appreciation.

      Weaknesses:

      (1) One major weakness in this paper is the way the authors define the EPN neurons. Without a clear method of delineating EPN vs other surrounding regions, it is not convincing enough to call these neurons EPNs solely from looking at the electrode cannula track from Figure 2B. Indeed, EPN is a very small nucleus and previous studies like Stephenson-Jones et al (2016) have used opto-tagging of Vglut2 neurons to precisely label EPN single neurons. Wallace et al (2017) have also shown the existence of SOM and PV-positive neurons in the EPN. By not using transgenic lines and cell-type specific approaches to label these EPN neurons, the authors miss the opportunity to claim that the neurons recorded in this study do indeed come from EPN. The authors should at least consider showing an analysis of neurons slightly above or below EPN and show that these neurons display different waveforms or firing patterns.

      We thank the reviewer for their comment and welcome the opportunity to expand on the inclusion criteria for the studied units. 

      As part of another study, we recorded in the EPN with optrodes and performed photoidentification in PV-Cre animals. We found optoidentified units both in animals with correct placement (within the EPN) and in those with off-target placement (within the thalamus or medial to the EPN). Thus, despite the use of Cre animals, we relied on histology to ensure correct EPN recording. We believe that optotagging based purely on neural markers such as PV, SOM, VGLUT or VGAT would not provide a better anatomical delineation of the EPN, since adjacent structures are rich in those same markers. The thalamic reticular nucleus is just dorsal to the EPN, and it has been shown to express both SOM and PV (Martinez-Garcia et al., 2020). 

      On the other hand, the lateral hypothalamus (just medial to the EPN) also expresses vGluT2 and SOM. Stephenson-Jones et al. (2016), Extended Data Figure 1, panel g, shows vGluT2 and somatostatin labeling, with substantial expression in neurons dorsal, ventral and medial to the EPN. Thus, we believe that viral strategies relying on single neuronal markers still depend on careful histological analysis of recording sites.

      A combination of neural markers or more complex viral strategies might be more suitable for delineating the EPN. For example, for anatomical tracing, Stephenson-Jones et al. (2016) used a rabies-virus-based approach involving a retrogradely transported virus, targeting projection sites through two injections. Two-step viral approaches were also used by Wallace et al. (2017). We attempted a two-step viral approach, using an anterogradely transported Cre-expressing virus (AAV1.hSyn.Cre.WPRE.hGH) injected into the striatum and a Cre-dependent ChR2 injected into the EPN. However, our preliminary experiments showed that this double viral approach markedly decreased the performance of animals during the task (we attempted re-training 2-3 weeks after the viral infections, and animals failed to turn to the side contralateral to the injections). We believe that this approach might have had a toxic effect (Zingg et al., 2017). 

      To this point, a recent paper (Lazaridis et al., 2019) repeated an optogenetic experiment from the Stephenson-Jones et al. study using a set of different viral approaches and concluded that increasing the activity of GPi-LHb neurons is not aversive, contrary to what had previously been reported. Thus, future studies that improve anatomical specificity are a must, but they will require viral approaches amenable to the behavioral paradigm.

      We looked for differences in waveform, firing rate and firing pattern between units recorded within the EPN and units recorded above or below it; however, we did not find a marker that produced a clear demarcation. The figure below shows both the units included in this study and the excluded ones, illustrating that the two groups clearly overlap.

      Author response image 1.

      Finally, we completely agree with the reviewer in that there is still room for improvement. We have further expanded the Methods section to explain better our efforts to include units recorded within the EPN. Further, we have added a paragraph within the Discussion section to point out this limitation (lines 871-876).

      Methods (lines 116-131):

      “Recordings. Movable microwire bundles (16 microwires, 32 micrometers in diameter, held inside a cannula, Innovative Neurophysiology, Durham, NC) were stereotaxically implanted just above the entopeduncular nucleus (-0.8 AP, 1.7 ML, 3.9 DV). Post-surgical care included antibiotic, analgesic and anti-inflammatory pharmacological treatment. After 5 days of recovery, animals were retrained for 1-2 weeks. Unitary activity was recorded for 2-6 days at each dorsoventral electrode position, and the session with the best electrophysiological (signal-to-noise ratio > 2, stability across time) and behavioral (performance, number of trials > 220) quality was selected. Microwire electrodes were advanced in 50-micrometer dorsoventral steps for 500 micrometers in total. After experiment completion, animals were perfused with a 4% paraformaldehyde solution. Brains were extracted, dehydrated with a 30% sucrose solution and sectioned in a cryostat into 30-micron-thick slices. Slices were mounted and photographed using a light microscope. Microwire tracks of the 16-microwire bundle were analyzed (Fig. 2A-B), and only animals with tracks traversing the EPN were selected (6 out of 10). Finally, we located the final position of the microwire tips and inferred the dorsoventral recording position of each recording session. Only units recorded within the EPN were included.” 
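      The depth bookkeeping and session-selection rule in the Methods quote above can be made explicit with a short sketch. The implant depth (3.9 DV, treated here as millimetres of depth), the 50-micrometer steps, the 500-micrometer total travel and the SNR > 2 and > 220-trial cut-offs come from the text. The session dictionary layout, the tie-breaking rule (highest SNR, then most trials) and the EPN depth bounds are hypothetical assumptions, and the stability and performance criteria are not modelled.

```python
import numpy as np

IMPLANT_DV_MM = 3.9  # from the Methods (assumed to be depth in mm)
STEP_MM = 0.05       # 50-micrometer steps
TOTAL_MM = 0.5       # 500 micrometers of total travel


def dv_positions(implant_dv=IMPLANT_DV_MM, step=STEP_MM, total=TOTAL_MM):
    """All dorsoventral recording depths, including the implant position."""
    n_steps = int(round(total / step))
    return implant_dv + step * np.arange(n_steps + 1)


def select_session(sessions, min_snr=2.0, min_trials=220):
    """Pick one session per depth; tie-breaking by SNR then trials is an assumption."""
    ok = [s for s in sessions if s["snr"] > min_snr and s["trials"] > min_trials]
    if not ok:
        return None
    return max(ok, key=lambda s: (s["snr"], s["trials"]))


def units_in_epn(unit_depths_mm, epn_top_mm, epn_bottom_mm):
    """Boolean mask of units whose inferred depth lies within hypothetical EPN bounds."""
    d = np.asarray(unit_depths_mm, dtype=float)
    return (d >= epn_top_mm) & (d <= epn_bottom_mm)


# Illustrative sessions recorded at one depth (values invented).
sessions = [{"snr": 1.5, "trials": 300}, {"snr": 3.0, "trials": 250},
            {"snr": 2.5, "trials": 400}, {"snr": 4.0, "trials": 100}]
print(dv_positions())
print(select_session(sessions))
```

      With 50-micrometer steps over 500 micrometers there are 11 recording depths, from the implant position to 0.5 mm below it.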

      Discussion (lines 871-876):

      “A weakness of the current study is the lack of characterization of neuronal subtypes. An area of opportunity for future research could be to perform photo-identification of neuronal subtypes within the EPN, which could contribute to the overall description of the information representation. Further, detailed anatomical viral vector strategies could help improve the anatomical localization of recordings, reduce reliance on histological examination, and resolve some current controversies (Lazaridis et al., 2019).” 

      (2) The authors fail to replicate the main finding about EPN neurons, which is that they encode outcome in a negative manner. Both Stephenson-Jones et al. (2016) and Hong and Hikosaka (2008) show a reward response during the outcome period, where firing goes down during reward and up during neutral or aversive outcomes. However, the top panel of Figure 2G shows that the mean population activity is higher during correct trials and lower during incorrect trials. This could be interesting given that the authors might be recording from another part of the EPN that has not been studied before. However, without convincing evidence that the neurons recorded are from the EPN in the first place (point 1), it is hard to interpret these results and reconcile them with previous studies.

      We thank the reviewer for pointing out that we need to better explain how EPN units encode outcome. We now provide an additional panel in Figure 4, its corresponding text in the Results section (lines 544-562) and a new paragraph in the Discussion related to this comment.

      We believe that we do indeed recapitulate the findings of both Stephenson-Jones et al. (2016) and Hong and Hikosaka (2008). Both studies focus on a specific subpopulation of GPi/EPN neurons that project to the lateral habenula (LHb). Stephenson-Jones et al. (2016) posit that GPi-LHb neurons (which they opto-tag as vGluT2) exhibit a decreased firing rate during rewarding outcomes. Hong and Hikosaka (2008) antidromically identified LHb-projecting neurons within the GPi and found reward-positive and reward-negative neurons, which respectively increased or decreased their firing rate with a rewarding outcome (red and green dots on the x-axis of Figure 5A in their paper).

      As the reviewer pointed out, the z-score may be misleading. Therefore, in our study we also decomposed population activity along a reward axis through dPCA. When marginalizing for reward in Figure 3F, we find that the weights of individual units on this axis are centered around zero, with both positive and negative values (Figure 3F, right panel). Thus, units can code a rewarding outcome as either an increase or a decrease in activity. We show example units with such modulation in Figure 3-1g and h.

      We had segregated our analysis of spatio-temporal and kinematic coding by the reward coding of units in Figure 4L-M. Yet, following this comment and to further clarify this segregation, we have added panels showing the mean z-score of units during outcome evaluation in Figure 4L.

      We amended the main text to better explain these findings (lines 544-562).

      “Previous reports suggest that EPN units that project to the lateral habenula encode reward as a decrease in firing rate. Thus, we wished to ask whether reward-encoding units can code kinematic and spatio-temporal variables as well.

      To this end, we first segregated units according to their reward coding properties: reward-positive units (which increased activity with reward) and reward-negative units (which decreased activity with reward). We performed auROC on the 250 ms after head entry, comparing rewarded and incorrect trials (p<0.001, permutation test). Mean activity of reward-insensitive, reward-positive and reward-negative units is shown in Fig. 4L. Next, we performed dimensionality reduction on the coefficients of the model that best explained both contexts (kinematic + spatio-temporal model on pooled data) using UMAP (McInnes et al., 2018). We observe a continuum rather than discrete clusters (Fig. 4L). Note that individual units are color coded according to their responsivity to reward; we did not find clear clustering by reward responsivity either.”
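      The reward-sensitivity split described in this excerpt (auROC on activity in the 250 ms after head entry, rewarded vs. incorrect trials, permutation test at p<0.001) could be sketched roughly as below. This is a hypothetical illustration, not the authors' analysis code: the function names, the two-sided test against chance (0.5), the number of label shuffles and any input spike counts are assumptions.

```python
import numpy as np


def auroc(pos, neg):
    """Area under the ROC curve: probability that a random value from `pos`
    exceeds a random value from `neg` (ties count as 0.5)."""
    pos = np.asarray(pos, dtype=float)
    neg = np.asarray(neg, dtype=float)
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


def auroc_permutation_test(rewarded, incorrect, n_perm=5000, seed=0):
    """Two-sided permutation test on auROC by shuffling trial labels."""
    rng = np.random.default_rng(seed)
    rewarded = np.asarray(rewarded, dtype=float)
    incorrect = np.asarray(incorrect, dtype=float)
    observed = auroc(rewarded, incorrect)
    pooled = np.concatenate([rewarded, incorrect])
    n_rew = rewarded.size
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(pooled)
        null[i] = auroc(shuffled[:n_rew], shuffled[n_rew:])
    # Count shuffles at least as far from chance (0.5) as the observed value.
    extreme = np.sum(np.abs(null - 0.5) >= abs(observed - 0.5))
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


def classify_unit(rewarded, incorrect, alpha=0.001, n_perm=5000, seed=0):
    """Label a unit reward-positive, reward-negative or insensitive.

    rewarded / incorrect: per-trial spike counts in the post-entry window.
    """
    auc, p_value = auroc_permutation_test(rewarded, incorrect, n_perm, seed)
    if p_value >= alpha:
        return "insensitive", auc, p_value
    return ("positive" if auc > 0.5 else "negative"), auc, p_value
```

Because the smallest attainable p-value is 1/(n_perm + 1), the number of shuffles has to comfortably exceed 1/alpha for a p<0.001 criterion to be reachable at all.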

      Paragraph added in the discussion (lines 749-755):

      “In this study, we found that rewarding outcomes can be represented by EPN units through either an increase or a decrease in firing rate (Fig. 3F, 3-1g-h, 4L). While Stephenson-Jones et al. (2016) found that lateral habenula (LHb)-projecting neurons within the EPN of mice primarily encoded rewarding outcomes by a decrease in firing rate, Hong and Hikosaka (2008) observed that in primates, LHb-projecting units could encode reward through either a decrease or an increase in firing rate. Thus, our results align more closely with the latter study, which also employed an operant conditioning task.”

      (3) The authors say that: 'reward and kinematic coding are not mutually exclusive, challenging the notion of distinct pathways and movement processing'. However, it is not clear whether the data presented in this work support this statement. First, the authors have not attempted to record from the entire EPN. Thus it is possible that the coding might be more segregated in other parts of the EPN. Second, EPN neurons have previously been shown to display positive firing for negative outcomes and vice versa, something which the authors do not find here. It is possible that those neurons might not encode kinematic and movement variables. Thus, the authors should point out in the main text that the EPN activity recorded might be missing some parts of the whole EPN.

      We thank the reviewer for the opportunity to expand on this topic. We believe it is certainly possible that other, unrecorded regions of the EPN might exhibit greater segregation of reward and kinematics. However, we considered it worthwhile to point out that, in the dataset collected in this study, reward-sensitive units encode kinematics in a similar fashion to reward-insensitive ones (Fig. 4L,M). Moreover, we asked specifically whether reward-negative units (which decrease their firing rate with rewarding outcomes, as previously reported) encode kinematic and spatio-temporal variables with a different strength than reward-insensitive ones, and found no significant differences (Fig. 4M).

      We did indeed find units that displayed decreased firing rate upon rewarding outcomes, as has been previously reported. We have addressed this fact more thoroughly in point (2). 

      Finally, we agree with the reviewer that the dataset collected in this study is by no means exhaustive of the entire EPN and have thus included a sentence pointing this out in the Discussion section (lines 805-806):

      “Given that we did not record from the entire EPN, it is still possible that another region of the nucleus might exhibit more segregation.”

      (4) The authors use an IR beam system to record licks and make a strong claim about the nature of lick encoding in the EPN. However, the authors should note that IR beam system is not the most accurate way of detecting licks given that any object blocking the path (paw or jaw-dropping) will be detected as lick events. Capacitance based, closed-loop detection, or video capturing is better suited to detect individual licks. Given that the authors are interested in kinematics of licking, this is important. The authors should either point this out in the main text or verify in the system if the IR beam is correctly detecting licks using a combination of those methods.

      We thank the reviewer for the opportunity to clarify how lick events were acquired. We have experience using electrical alternatives to IR-beam lickometers; however, we believe they were not well suited to this application. Closed-loop lickometers generally use a metallic grid upon which animals stand so that the loop can be closed; however, we wanted a transparent floor. We have found capacitance-based lickometers useful in head-fixed conditions but have noticed that they are very dependent on animal position and on the proximity of other body parts such as limbs. Given the freely moving nature of the task, this was difficult to control. Finally, both electrical alternatives are more prone to noise and may introduce electrical artifacts that could contaminate the spiking signal. This is why we opted to use a slit, in combination with an IR beam, that would fit only the tongue and that forced enough protrusion for individual licks to be monitored. Further, the slit could not fit other body parts like the paw or jaw. We have now included a video (Supp. Video 2) showing a close-up of this behavior that better conveys how the jaw and paw do not fit inside the slit. The following text has been added to the corresponding Methods section (lines 97-98):

      “The lickometer slit was just wide enough to fit the tongue and deep enough to evoke a clear tongue protrusion.”

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should verify using opto-tagging of either Vglut2, SOM, or PV neurons whether they can see the same firing pattern. If not, the authors should address this weakness in the paper.

      We thank the reviewer for this important point; we have provided a more detailed reply above.

      (2)The way dPCA or PCA is applied to the data is not stated at all in the main text. Are all units from different mice combined? Or applied separately for each mouse? How does that affect the interpretation of the data? At least a brief text should be included in the main text to guide the readers.

      We thank the reviewer for pointing out this important omission. We have included an explanation in the Methods section and in the Main text.

      Methods (lines 182-184):

      “For all population-level analyses, individual units recorded from all sessions and all animals were pooled to construct a pseudo-simultaneous population response from data that were mostly recorded separately.”
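      As a rough illustration of the pooling step quoted above, the sketch below stacks each unit's trial-averaged responses into a units × conditions × time array and then runs an ordinary PCA on it via SVD. It is a hypothetical sketch rather than the authors' pipeline: the data layout, the function names and the use of plain PCA (not dPCA, which additionally demixes task parameters) are assumptions. Trials from different units are never paired; only condition means are combined, which is what makes the population pseudo-simultaneous.

```python
import numpy as np


def build_pseudo_population(units, conditions):
    """Pool separately recorded units into one pseudo-simultaneous population.

    units: list with one dict per unit (from any session or animal), mapping a
        condition label to an array of shape (n_trials, n_timebins) of firing
        rates. Trial counts may differ between units and conditions.
    conditions: ordered list of condition labels shared by all units.
    Returns an array of shape (n_units, n_conditions, n_timebins).
    """
    population = []
    for unit in units:
        # Only trial-averaged responses are stacked; trials recorded from
        # different units are never paired with each other.
        population.append(
            [np.asarray(unit[c], dtype=float).mean(axis=0) for c in conditions]
        )
    return np.asarray(population)


def population_pcs(population, n_components=3):
    """Ordinary PCA (not dPCA) of the pooled population via SVD.

    Returns unit weights for the leading components, shape
    (n_units, n_components), and their fraction of explained variance.
    """
    n_units = population.shape[0]
    # Rows are units (features); columns are condition x time samples.
    X = population.reshape(n_units, -1)
    X = X - X.mean(axis=1, keepdims=True)
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    return U[:, :n_components], explained[:n_components]
```

Pooling this way implicitly treats units from different animals as samples from one population, which is the homogeneity assumption acknowledged in the Discussion.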

      Main text (lines 397-399):

      “For population level analyses throughout the study, we pooled recorded units from all animals to construct a pseudo-simultaneous population.”

      Discussion (lines 729-730):

      “…(from pooled units from all animals to construct a pseudo-simultaneous population, which assumes homogeneity across subjects)”

      (3) The authors argue that they do not find 'value coding' in this study. However, the authors never manipulate reward size or probability, but only the uncertainty or difficulty of the task. This might be better termed 'difficulty', and it is difficult to say whether this correlates with value in this task. For instance, mice might be very confident about the choice, even for an intermediate frequency sweep, if the mouse had waited long enough to hear the full sweep. In that case, the difficulty would not correlate with value, given that the mouse will think the value of the port it is going to is high. Thus, authors should avoid using the term value.

      We agree with the reviewer. We have modified the text to specify that difficulty was the variable being studied and added the following sentence in the Discussion (lines 747-748):

      “It is still possible that by modifying reward contingencies such as droplet size value coding could be evidenced.”

      (4) How have the authors obtained the bottom panel of Figure 7D? It is not at all clear what this correlation represents. Are the authors looking at a correlation between instantaneous firing rate and lick rate during a lick bout?

      We thank the reviewer for pointing out this omission. It is indeed the correlation coefficient between the instantaneous firing rate and the instantaneous lick rate within a lick bout. We have added labeling to Figure 7D and pointed this out in the main text (lines 680-681):

      “Fig. 7D, lower panel, shows the correlation coefficient between the instantaneous firing rate and the instantaneous lick rate within a lick bout for all units.”
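      One plausible way to compute such a per-unit coefficient is sketched below: spike and lick times within a bout are binned, smoothed with a Gaussian kernel to give instantaneous rates in Hz, and the two rate traces are correlated with Pearson's r. The bin width, kernel width and function names are assumptions for illustration; the response does not state how instantaneous rates were estimated (e.g., kernel smoothing versus inverse inter-event intervals).

```python
import numpy as np


def instantaneous_rate(event_times, t_start, t_stop, bin_s=0.01, sigma_s=0.05):
    """Gaussian-smoothed event rate (Hz) between t_start and t_stop (s)."""
    n_bins = int(round((t_stop - t_start) / bin_s))
    edges = t_start + bin_s * np.arange(n_bins + 1)
    counts, _ = np.histogram(np.asarray(event_times, dtype=float), bins=edges)
    # Normalized Gaussian kernel spanning roughly +/- 4 standard deviations.
    half = int(np.ceil(4 * sigma_s / bin_s))
    x = bin_s * np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma_s) ** 2)
    kernel /= kernel.sum()
    # Centered slice of the full convolution keeps exactly n_bins samples,
    # even for bouts shorter than the kernel.
    smoothed = np.convolve(counts, kernel)[half:half + counts.size]
    return smoothed / bin_s  # counts per bin -> events per second


def rate_lick_correlation(spike_times, lick_times, bout_start, bout_stop,
                          bin_s=0.01, sigma_s=0.05):
    """Pearson r between instantaneous firing rate and lick rate in one bout."""
    fr = instantaneous_rate(spike_times, bout_start, bout_stop, bin_s, sigma_s)
    lr = instantaneous_rate(lick_times, bout_start, bout_stop, bin_s, sigma_s)
    if fr.std() == 0 or lr.std() == 0:
        return float("nan")  # correlation is undefined for a flat trace
    return float(np.corrcoef(fr, lr)[0, 1])
```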

      Reviewer #2 (Public Review):

      This paper examined how the activity of neurons in the entopeduncular nucleus (EPN) of mice relates to kinematics, value, and reward. The authors recorded neural activity during an auditory-cued two-alternative choice task, allowing them to examine how neuronal firing relates to specific movements like licking or paw movements, as well as how contextual factors like task stage or proximity to a goal influence the coding of kinematic and spatiotemporal features. The data shows that the firing of individual neurons is linked to kinematic features such as lick or step cycles. However, the majority of neurons exhibited activity related to both movement types, suggesting that EPN neuronal activity does not merely reflect muscle-level representations. This contradicts what would be expected from traditional action selection or action specification models of the basal ganglia.

      The authors also show that spatiotemporal variables account for more variability compared to kinematic features alone. Using demixed Principal Component Analysis, they reveal that at the population level, the three principal components explaining the most variance were related to specific temporal or spatial features of the task, such as ramping activity as mice approached reward ports, rather than trial outcome or specific actions. Notably, this activity was present in neurons whose firing was also modulated by kinematic features, demonstrating that individual EPN neurons integrate multiple features. A weakness is that what the spatiotemporal activity reflects is not well specified. The authors suggest some may relate to action value due to greater modulation when approaching a reward port, but acknowledge action value is not well parametrized or separated from variables like reward expectation.

      We thank the reviewer for the comment. We indeed believe that further exploring these spatiotemporal signals is important and will be the subject of future studies.

      A key goal was to determine whether activity related to expected value and reward delivery arose from a distinct population of EPN neurons or was also present in neurons modulated by kinematic and spatiotemporal features. In contrast to previous studies (Hong & Hikosaka 2008 and Stephenson-Jones et al., 2016), the current data reveals that individual neurons can exhibit modulation by both reward and kinematic parameters. Two potential differences may explain this discrepancy: First, the previous studies used head-fixed recordings, where it may have been easier to isolate movement versus reward-related responses. Second, those studies observed prominent phasic responses to the delivery or omission of expected rewards - responses largely absent in the current paper. This absence suggests a possibility that neurons exhibiting such phasic "reward" responses were not sampled, which is plausible since in both primates and rodents, these neurons tend to be located in restricted topographic regions. Alternatively, in the head-fixed recordings, kinematic/spatial coding may have gone undetected due to the forced immobility.

      Thank you for raising this point. Nevertheless, there is some phasic activity associated with reward responses, which can be seen in the new panel in Figure 4L.

      Overall, this paper offers needed insight into how the basal ganglia output encodes behavior. The EPN recordings from freely moving mice clearly demonstrate that individual neurons integrate reward, kinematic, and spatiotemporal features, challenging traditional models. However, the specific relationship between spatiotemporal activity and factors like action value remains unclear.

      We thank the reviewer for their valuable comments.

      Reviewer #2 (Recommendations For The Authors):

      One small suggestion is to make sure that all the panels in the figures are well annotated. I struggled in places to know what certain alignments or groupings meant because they were not labelled. An example would be what do the lines correspond to in the lower panels of Figure 2D and E. I could figure it out from other panels but it would have helped if each panel had better labelling.

      Thanks for pointing this out; we have improved labelling across the figures and corrected the specific example you pointed out.

      The paper is very nice though. Congratulations!

      Thank you very much.

      Editor's note:

      Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      We thank the editor for the comment. A statistics table has been added.

      References:

      Lazaridis, I., Tzortzi, O., Weglage, M., Märtin, A., Xuan, Y., Parent, M., Johansson, Y., Fuzik, J., Fürth, D., Fenno, L. E., Ramakrishnan, C., Silberberg, G., Deisseroth, K., Carlén, M., & Meletis, K. (2019). A hypothalamus-habenula circuit controls aversion. Molecular Psychiatry, 24(9), 1351–1368. https://doi.org/10.1038/s41380-019-0369-5

      Martinez-Garcia, R. I., Voelcker, B., Zaltsman, J. B., Patrick, S. L., Stevens, T. R., Connors, B. W., & Cruikshank, S. J. (2020). Two dynamically distinct circuits drive inhibition in the sensory thalamus. Nature, 583(7818), 813–818. https://doi.org/10.1038/s41586-020-2512-5

      McInnes, L., Healy, J., Saul, N., & Großberger, L. (2018). UMAP: Uniform Manifold Approximation and Projection. Journal of Open Source Software, 3(29), 861. https://doi.org/10.21105/joss.00861

      Zingg, B., Chou, X.-L., Zhang, Z.-G., Mesik, L., Liang, F., Tao, H. W., & Zhang, L. I. (2017). AAV-Mediated Anterograde Transsynaptic Tagging: Mapping Corticocollicular Input-Defined Neural Pathways for Defense Behaviors. Neuron, 93(1), 33–47. https://doi.org/10.1016/j.neuron.2016.11.045

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This paper reports valuable results regarding the potential role and time course of the prefrontal cortex in conscious perception. Although the sample size is small, the results are clear and convincing, and strengths include the use of several complementary analysis methods. The behavioral test includes subject report so the results do not allow for distinguishing between theories of consciousness; nevertheless, results do advance our understanding of the contribution of prefrontal cortex to conscious perception.

      We greatly appreciate the encouraging assessment from the editor and reviewers. In particular, we thank the three reviewers for their professional and constructive comments, which helped us improve the manuscript substantially.

      Public Reviews:

      Reviewer #1 (Public Review):

      This is a clear and rigorous study of intracranial EEG signals in the prefrontal cortex during a visual awareness task. The results are convincing and worthwhile, and strengths include the use of several complementary analysis methods and clear results. The only methodological weakness is the relatively small sample size of only 6 participants compared to other studies in the field. Interpretation weaknesses that can easily be addressed are claims that their task removes the confound of report (it does not), and claims of primacy in showing early prefrontal cortical involvement in visual perception using intracranial EEG (several studies already have shown this). Also the shorter reaction times for perceived vs not perceived stimuli (confident vs not confident responses) have been described many times previously and are not a new result.

      We greatly appreciate the reviewer’s encouraging opinion. Below we address the reviewer’s specific questions and comments point by point.

      ‘The only methodological weakness is the relatively small sample size of only 6 participants compared to other studies in the field.’

      We agree that the sample size is relatively small in the present study. To compensate for this shortcoming, we rigorously verified each result at both the individual and population levels, similar to the data analysis approach used in non-human primate studies.

      Interpretation weaknesses that can easily be addressed are claims that their task removes the confound of report (it does not),

      Thank you very much for your comment. We agree that our task does not remove the confound of report entirely. However, we believe that our task minimizes motor confounds by temporally dissociating the emergence of awareness from the motor response and by balancing the direction of the motor response between aware and unaware conditions. Following the reviewer’s comment, we have modified the text in the revised manuscript as follows: “This task removes the confound of motor-related activity”.

      ...and claims of primacy in showing early prefrontal cortical involvement in visual perception using intracranial EEG (several studies already have shown this).

      We agree that several iEEG studies, including ERP and HFA studies, have shown early involvement of the prefrontal cortex in visual perception. However, these studies did not investigate the differential activity between conscious and unconscious conditions; thus, the prefrontal activity they reported might be correlated with unconscious rather than conscious processing. In the present study, we compared neural activity in the PFC between conscious and unconscious trials and found a correlation between PFC activity and conscious perception. Although one iEEG study (Gaillard et al., 2009) reported awareness-specific PFC activation, that awareness-related activity started 300 ms after visual stimulus onset, ~100 ms later than the early awareness-related activity in our study. Moreover, because that study had a limited number of electrodes (2 patients with 19 recording sites, mostly in mesiofrontal and peri-insular regions), its ability to explore awareness-related activity in the PFC was restricted. In the present study, we recorded from many more sites (245), covering multiple PFC areas. Our results further show earlier awareness-related activity (~200 ms after visual stimulus onset) in ERP, HFA and PLV measures, which sheds new light on the role of the PFC in conscious perception.

      We have added this discussion to the MS (lines 522-536).

      Also the shorter reaction times for perceived vs not perceived stimuli (confident vs not confident responses) have been described many times previously and are not a new result.

      Thank you very much for your comment. We agree that reaction time is strongly modulated by confidence level, as has been described previously (Broggin, Savazzi, & Marzi, 2012; Marzi, Mancini, Metitieri, & Savazzi, 2006). However, in previous studies, different confidence levels were usually induced by presenting stimuli with different physical properties, such as spatial frequency, eccentricity and contrast. It is well known that more salient stimuli are processed faster and speed up visuomotor transformation, eventually shortening the reaction time (Corbetta & Shulman, 2002; Posner & Petersen, 1990). Therefore, the dependence of visual processing on stimulus salience is confounded with the effect of visual awareness on reaction time, making it hard to attribute the shorter reaction time in the more salient condition purely to visual awareness. In contrast, in the present study we created a near-threshold condition in which the salience (contrast) of the visual stimulus was very similar in the aware and unaware conditions, in order to eliminate the influence of stimulus salience on reaction time. We therefore think that the difference in reaction time in our study mainly reflects modulation by awareness state, which has not been reported previously.

      We have added this discussion to the MS (lines 497-507).

      Reviewer #1 (Recommendations For The Authors):

      Specific comments follow:

      Abstract: "we designed a visual awareness task that can minimize report-related confounding" and in the Introduction lines 112-115: "Such a paradigm can effectively dissociate awareness-related activity from report-related activity in terms of time... and report behavior"; Discussion lines 481-483 "even after eliminating the influence of the confounding variables related to subjective reports such as motion preparation" and other similar statements in the manuscript should be removed. The task involves report using eye movements with every single stimulus. The fact that there is report for both perceived and not perceived stimuli, that the direction of report is not determined until the time of report, and that there is delay between stimulus and report, does not remove the report-related post-perceptual processing that will inevitably occur in a task where overt report is required for every single trial. For example, brain activity related to planning to report perception will only occur after perceived trials, regardless of the direction of eye movement later decided upon. This preparation to respond is different for perceived and not perceived stimuli, but is not part of the perception itself. In this way the current task is not at all unique and does not substantially differ from many other report-based tasks used previously.

      The objective of the present study is to assess whether the PFC is involved in the emergence of visual awareness. To do so, it is crucial to determine the subjective awareness state as accurately as possible. Considering the disadvantages of non-report paradigms in determining the subjective awareness state (Tsuchiya et al., TiCS, 2015; Mashour et al., Neuron, 2020), we employed a balanced report paradigm. It has been argued (Merten & Nieder, PNAS, 2011) that in balanced report paradigms subjects cannot prepare any motor response during the delay period, because only the appearance of a rule cue (a change in the color of the fixation point at the end of the delay period) informs them of the appropriate motor action. In this case, post-perceptual processing during the delay period might reflect non-motor cognitive activity. Alternatively, as the reviewer noted, post-perceptual processing might relate to planning to report perception, which differs between perceived and not perceived stimuli. Therefore, to date, the nature of post-perceptual processing remains controversial. Following the reviewer’s comment, we have modified the description of our task as follows: “we designed a visual awareness task that can minimize report-related motor confounding”. We have also changed “report-related” to “motor-related” throughout the manuscript.

      Figures 3, 4 changes in posterior middle frontal gyri suggest early frontal eye field involvement in perception. This should be interpreted in the context of many previous studies showing FEF involvement in signal detection. The authors claim that "earlier visual awareness related activities in the prefrontal cortex were not found in previous iEEG studies, especially in the HG band" on lines 501-502 of the Discussion. This statement is not true and should be removed. The following statement in the Discussion on lines 563-564 should be removed for the same reasons: "our study detected 'ignition' in the human PFC for the first time." Authors should review and cite the following studies as precedent among others:

      Blanke O, Morand S, Thut G, Michel CM, Spinelli L, Landis T, Seeck M (1999) Visual activity in the human frontal eye field. Neuroreport 10 (5):925-930. doi:10.1097/00001756-199904060-00006

      Foxe JJ, Simpson GV (2002) Flow of activation from V1 to frontal cortex in humans. A framework for defining "early" visual processing. Exp Brain Res 142 (1):139-150. doi:10.1007/s00221-001-0906-7

      Gaillard R, Dehaene S, Adam C, Clemenceau S, Hasboun D, Baulac M, Cohen L, Naccache L (2009) Converging intracranial markers of conscious access. Plos Biology 7 (3):e61

      Gregoriou GG, Gotts SJ, Zhou H, Desimone R (2009) High-frequency, long-range coupling between prefrontal and visual cortex during attention. Science 324:1207-1210

      Herman WX, Smith RE, Kronemer SI, Watsky RE, Chen WC, Gober LM, Touloumes GJ, Khosla M, Raja A, Horien CL, Morse EC, Botta KL, Hirsch LJ, Alkawadri R, Gerrard JL, Spencer DD, Blumenfeld H (2019) A Switch and Wave of Neuronal Activity in the Cerebral Cortex During the First Second of Conscious Perception. Cereb Cortex 29 (2):461-474.

      Khalaf A, Kronemer SI, Christison-Lagay K, Kwon H, Li J, Wu K, Blumenfeld H (2022) Early neural activity changes associated with stimulus detection during visual conscious perception. Cereb Cortex. doi:10.1093/cercor/bhac140

      Kwon H, Kronemer SI, Christison-Lagay KL, Khalaf A, Li J, Ding JZ, Freedman NC, Blumenfeld H (2021) Early cortical signals in visual stimulus detection. Neuroimage 244:118608.

      We agree that several iEEG studies, including ERP and HFA studies, have shown early involvement of the prefrontal cortex in visual perception. However, these studies did not investigate the differential activity between conscious and unconscious conditions; thus, the prefrontal activity they reported might be correlated with unconscious rather than conscious processing. In the present study, we compared neural activity in the PFC between conscious and unconscious trials and found a correlation between PFC activity and conscious perception. Although one iEEG study (Gaillard et al., 2009) reported awareness-specific PFC activation, that awareness-related activity started 300 ms after visual stimulus onset, ~100 ms later than the early awareness-related activity in our study. Moreover, because that study had a limited number of electrodes (2 patients with 19 recording sites, mostly in mesiofrontal and peri-insular regions), its ability to explore awareness-related activity in the PFC was restricted. In the present study, we recorded from many more sites (245), covering multiple PFC areas. Our results further show earlier awareness-related activity (~200 ms after visual stimulus onset) in ERP, HFA and PLV measures, which sheds new light on the role of the PFC in conscious perception.

      We have added this discussion to the MS (lines 522-533).

      Minor weakness that should be mentioned in the Discussion: The intervals for the FP (fixation period) and Delay period were both fixed at 600 ms instead of randomly jittered, so that subjects likely had anticipatory activity predictably occurring with each grating and cue stimulus.

      Thank you very much for your comment. We agree that subjects might have had anticipatory activity during the experiment. In fact, we designed the task this way in order to balance the effects of attention and anticipation between aware and unaware conditions. We have added this discussion to the MS (lines 467-469).

      The faster reaction times for perceived/confident responses vs not perceived/unconfident responses has been reported many times previously in the literature and should be acknowledged rather than being claimed as a novel finding. Authors should modify p. 163 lines 160-162, first sentence of the Discussion lines 445-446 "reaction time.. shorter" claiming this was a novel finding; same for lines 464-467. Please see the following among others:

      Broggin E, Savazzi S, Marzi CA (2012) Similar effects of visual perception and imagery on simple reaction time. Q J Exp Psychol (Hove) 65 (1):151-164. doi:10.1080/17470218.2011.594896

      Chelazzi L, Marzi CA, Panozzo G, Pasqualini N, Tassinari G, Tomazzoli L (1988) Hemiretinal differences in speed of light detection in esotropic amblyopes. Vision Res 28 (1):95-104

      Marzi CA, Mancini F, Metitieri T, Savazzi S (2006) Retinal eccentricity effects on reaction time to imagined stimuli. Neuropsychologia 44 (8):1489-1495. doi:10.1016/j.neuropsychologia.2005.11.012

      Posner MI (1994) Attention: the mechanisms of consciousness. Proceedings of the National Academy of Sciences of the United States of America 91 (16):7398-7403

      Sternberg S (1969) Memory-scanning: mental processes revealed by reaction-time experiments. Am Sci 57 (4):421-457

      Thank you. Because of the limit on the number of citations, we have cited some of these papers in the revised manuscript.

      Methods lines 658-659: "results under LU and HA conditions were classified as the control group and were only used to verify and check the results during calculation." However the authors show these results in the figures and they are interesting. HA stimuli show earlier responses than NA stimuli. This is a valuable result which should be discussed and interpreted in light of the other findings.

      We thank the reviewer for this comment. We have added a discussion of this result to the revised MS (lines 535-536).

      General comment on figures: Many of the figure elements are tiny and the text labels and details can't be seen at all, especially single trial color plots, and the brain insets showing recording sites.

      We have modified the figures accordingly.

      Other minor comments: Typo: Figure 2 legend, line 169 "The contrast level resulted in an awareness percentage greater than 25%..." is missing a word and should say instead something like "The contrast level that resulted in an awareness percentage greater than 25%..."

      Thanks. We have corrected the typo accordingly.

      Figure 2 Table description in text line 190 says "proportions of recording sites" but the Table only shows number of recording sites and number of subjects, not "proportions." This should be corrected in the text.

      Thanks. We have corrected the error.

      Figure 3, and other figures, should always label the left and right hemispheres to avoid ambiguity.

      Thank you. We have made the correction accordingly. In the caption of Figure 2D (line 189), we revised the sentence to read: ‘In all brain images, right side of the image represents the right side of the brain’.

      Methods line 666. The saccadic latency calculations paragraph should have a separate heading before it, to separate it from the Behavioral data analysis section.

      Thanks. It has been corrected in line 725.

      Reviewer #2 (Public Review):

      The authors attempt to address a long-standing controversy in the study of the neural correlates of visual awareness, namely whether neurons in prefrontal cortex are necessarily involved in conscious perception. Several leading theories of consciousness propose a necessary role for (at least some sub-regions of) PFC in basic perceptual awareness (e.g., global neuronal workspace theory, higher order theories), while several other leading theories posit that much of the previously reported PFC contributions to perceptual awareness may have been confounded by task-based cognition that co-varied between the aware and unaware reports (e.g., recurrent processing theory, integrated information theory). By employing intracranial EEG in human patients and a threshold detection task on low-contrast visual stimuli, the authors assessed the timing and location of neural populations in PFC that are differentially activated by stimuli that are consciously perceived vs. not perceived. Overall, the reported results support the view that certain regions of PFC do contribute to visual awareness, but at time-points earlier than traditionally predicted by GNWT and HOTs.

      Reply: We greatly appreciate the reviewer’s encouraging comments.

      Major strengths of this paper include the straightforward visual threshold detection task including the careful calibration of the stimuli and the separate set of healthy control subjects used for validation of the behavioral and eye tracking results, the high quality of the neural data in six epilepsy patients, the clear patterns of differential high gamma activity and temporal generalization of decoding for seen versus unseen stimuli, and the authors' interpretation of these results within the larger research literature on this topic. This study appears to have been carefully conducted, the data were analyzed appropriately, and the overall conclusions seem warranted given the main patterns of results.

      Reply: We greatly appreciate the reviewer’s encouraging comments.

      Weaknesses include the saccadic reaction time results and the potential flaws in the design of the reporting task. This is not a "no report" paradigm, rather, it's a paradigm aimed at balancing the post-perceptual cognitive and motor requirements between the seen and unseen trials. On each trial, subjects/patients either perceived the stimulus or not, and had to briefly maintain this "yes/no" judgment until a fixation cross changed color, and the color change indicated how to respond (saccade to the left or right). Differences in saccadic RTs (measured from the time of the fixation color change to moving the eyes to the left or right response square) were evident between the seen and unseen trials (faster for seen). If the authors' design achieved what they claim on page 3, "the report behaviors were matched between the two awareness states ", then shouldn't we expect no differences in saccadic RTs between the aware and unaware conditions? The fact that there were such differences may indicate differences in post-perceptual cognition during the time between the stimulus and the response cue. Alternatively, the RT difference could reflect task-strategies used by subjects/patients to remember the response mapping rules between the perception and the color cue (e.g., if the YES+GREEN=RIGHT and YES+RED=LEFT rules were held in memory, while the NO mappings were inferred secondarily rather than being actively held in memory). This saccadic RT result should be better explained in the context of the goals of this particular reporting-task.

      The objective of the present study is to assess whether the PFC is involved in the emergence of visual awareness. To do so, it is crucial to determine the subjective awareness state as accurately as possible. Given the disadvantage of no-report paradigms in determining the subjective awareness state (Tsuchiya et al., TiCS, 2015; Mashour et al., Neuron, 2020), we employed a balanced report paradigm. It has been argued (Merten & Nieder, PNAS, 2011) that in balanced report paradigms, subjects cannot prepare a motor response during the delay period. This is because they learn the appropriate motor action only after the rule cue appears (the color change of the fixation point at the end of the delay period). In this case, post-perceptual processing during the delay period might reflect non-motor cognitive activity, such as working memory (Mashour et al., Neuron, 2020). Alternatively, as the reviewer notes, post-perceptual processing might relate to planning the report of perception, which differs for perceived and not-perceived stimuli (Aru et al., Neurosci Biobehav Rev, 2012). The nature of post-perceptual processing therefore remains controversial. In light of the reviewer’s comment and other views, we have modified the description of our task as follows: “we designed a visual awareness task that can minimize report-related motor confounding”. We have also changed “report-related” to “motor-related” throughout the rest of the manuscript.

      Regarding whether the saccadic RT in our balanced response paradigm should be expected to be similar between the aware and unaware conditions, we think the RTs would be similar only if the delay period were long enough for the “no” decision to be completed. In a previous study (Merten & Nieder, PNAS, 2011), the neuronal encoding of the “no” decision did not appear until 2 s after stimulus cue onset. In our task, however, the delay period lasted only 600 ms. This was long enough to form the “yes” decision but not the “no” decision, which may explain why our data show shorter RTs in the aware than in the unaware condition.

      We fully agree with the reviewer’s alternative interpretation of the RT difference between the aware and unaware conditions in our study. On this view, the difference reflects the task strategies subjects/patients used to remember the response mapping rules between the perception and the color cue (e.g., if the YES+GREEN=RIGHT and YES+RED=LEFT rules were held in memory, while the NO mappings were inferred secondarily rather than being actively held in memory). We have added a discussion of these points to the revised manuscript (lines 492-496).

      Nevertheless, the current results do help advance our understanding of the contribution of PFC to visual awareness. These results, when situated within the larger context of the rapidly developing literature on this topic (using "no report" paradigms), e.g., the recent studies by Vishne et al. (2023) Cell Reports and the Cogitate consortium (2023) bioRxiv, provide converging evidence that some sub-regions of PFC contribute to visual awareness, but at latencies earlier than originally predicted by proponents of, especially, global neuronal workspace theory.

      We greatly appreciate the reviewer’s encouraging comments.

      Reviewer #2 (Recommendations For The Authors):

      Abstract: "the spatiotemporal overlap between the awareness-related activity and the interregional connectivity in PFC suggested that conscious access and phenomenal awareness may be closely coupled." I strongly suggest revising this sentence. The current results cannot be used to make such a broad claim about p-consciousness vs. a-consciousness. This study used a balanced trial-by-trial report paradigm, which can only measure conscious access.

      We thank the reviewer for this comment. We have removed this sentence from the revised manuscript.

      Task design: A very similar task was used previously by Schröder et al. (2021) J Neurosci. See specifically, their Figure 1, and Figure 4B-C. Using almost the exact same "matching task", the authors of this previous study show that they get a P3b for both the perceived and not-perceived conditions, confirming that post-perceptual cognition/report confounds were not eliminated, but instead were present in (and balanced between) both the perceived/not-perceived trials due to the delayed matching aspect of the design. This previous paper should be cited and the P3b result should be considered when assessing whether cognition/report confounds were addressed in the current study.

      Thank you very much for reminding us of the study by Schröder et al. We apologize for not citing this closely related study in our previous manuscript. Schröder et al. found that in the direct report task, the P3b differed significantly between perceived and not-perceived trials. In the matching task, by contrast, the P3b was present in both perceived and not-perceived trials and did not differ significantly between them. Based on these findings, Schröder et al. argued that the P3b reflects task-specific post-perceptual cognition/report rather than the emergence of awareness per se. Given the similarity between their task and ours, we agree that our task cannot fully eliminate the confound between post-perceptual cognition/report-related activity and awareness-related activity. Nevertheless, our task can minimize the confound between motor-related activity and the emergence of awareness. It does so by separating the two in time and balancing the direction of the response movements. We have therefore changed the term “report-related” to “motor-related” in the revised manuscript.

      On page 2, lines 71-75, the authors' review of the Frassle et al. (2014) experiment should be revised for accuracy. In this study, all PFC activity did not disappear as the authors claim. Also, the main contrast in the Frassle et al. study was rivalry vs. replay. However, in both of these conditions, visual awareness was changing with the main difference being whether there was sensory conflict between the two eyes or not. Such a contrast would presumably subtract out the common activity patterns related to visual awareness changes, while isolating rivalry (and the resulting neural competition) vs. non-rivalry (and the lack of such competition) which is not broadly relevant for the goal of measuring neural correlates of visual awareness which are present in both sides of the contrast (rivalry and replay).

      Thank you very much for your suggestion. We agree and have revised this passage in the MS (lines 71-76):

      ‘For instance, a functional magnetic resonance imaging (fMRI) study employing human binocular rivalry paradigms found that when subjects need to manually report the changing of their awareness between conflict visual stimuli, the frontal, parietal, and occipital lobes all exhibited awareness-related activity. However, when report was not required, awareness-related activation was largely diminished in the frontal lobe but remained in the occipital and parietal lobes’

      On page 2, lines 76-78, the authors write, "no-report paradigm may overestimate unconscious processing because it cannot directly measure the awareness state". This should be reworded for clarity, as report paradigms also do not "directly measure the awareness state". All measures of awareness are indirect, either via subjects verbal or manual reports, or via behaviors or other physiological measures like OKN, pupillometry, etc. It's also not clear as written why no-report paradigms might overestimate unconscious processing.

      Thank you very much for your suggestion. We agree and have modified the description (lines 76-80):

      ‘Nevertheless, the no-report paradigm may overestimate the neural correlates of awareness by including unconscious processing, because it infers the awareness state through other relevant physiological indicators, such as optokinetic nystagmus and pupil size (Tsuchiya, Wilke, Frassle, & Lamme, 2015). In the absence of subjective reports, it remains controversial regarding whether the presented stimuli are truly seen or not.’

      However, the no-report paradigm may overestimate the neural correlates of awareness. It infers the awareness state from other relevant physiological indicators, such as optokinetic nystagmus and pupil size (Tsuchiya et al., 2015), in the absence of subjective reports. It also remains controversial whether the stimuli presented in such paradigms are truly seen as opposed to being merely potentially visible but unattended.

      On page 5, line 155, there is a typo. This should be Figure 2C, not 2B.

      Thanks. We have modified the description.

      On page 5, lines 160-162, the authors state, "The results showed that the saccadic reaction time in the aware trials was systematically shorter than that in the unaware trials. Such results demonstrate that visual awareness significantly affects the speed of information processing in the brain." I don't understand this. If subjects can never make a saccade until the fixation cross changes color, both for Y and N decisions, why would a difference in saccadic reaction times indicate anything about visual awareness affecting the speed of information processing in the brain? Doesn't this just show that the Red/Green x Left/Right response contingencies were easier to remember and execute for the Yes-I-did-see-it decisions compared to the No-I-didn't-see-it decisions?

      We agree and have added a discussion of this point to the revised manuscript (lines 492-496):

      ‘An alternative interpretation for RT difference between aware and unaware condition in our study is that the difference in task-strategies used by subjects/patients to remember the response mapping rules between the perception and the color cue (e.g., if the YES+GREEN=RIGHT and YES+RED=LEFT rules were held in memory, while the NO mappings were inferred secondarily rather than being actively held in memory).’

      In Figure 3B (and several other figures) due to the chosen view and particular brain visualization used, many readers will not know whether the front of brain is up and back of brain down or vice versa (there are no obvious landmarks like the cerebellum, temporal sulcus, etc.). I suggest specifying this in the caption or better yet on the figure itself.

      Thanks. We have added these descriptions in the caption of Figure 2D.

      Line 189 ‘In all brain images, right and up sides of each image represent the right and up sides of the brain’.

      In Figure 3B, the color scale may confuse some readers. When I first inspected this figure, I immediately thought the red meant positive voltage or activation, while the blue meant negative voltage or deactivation. Only later, I realized that any color here is meaningful. Not sure if an adjustment of the color scale might help, or perhaps not normalizing (and not taking absolute values of the voltage diffs, but maintaining the +/- diffs)?

      We thank the reviewer for this comment and apologize for not clearly explaining why we normalized the activity as absolute values and chose a color scale from 0 to 20. The main reason is that the biological significance of LFP polarity is not yet clearly understood (Einevoll et al., Nat Rev Neurosci, 2013). To simplify this complex issue, we treat the change in LFP magnitude during the delay period of our task as awareness-related activity, regardless of whether its actual value is positive or negative. We therefore first calculated the absolute value of the activity difference between aware and unaware trials at each recording site. We then used Shepard’s method (see Methods for details) to calculate the activity at each vertex and projected it onto the surface of the brain template, as shown in Fig. 3B.

      We have added the description in the MS (lines 794-800).

      Following the reviewer’s suggestion, we tried adjusting the color scale to range from -20 to 20. However, the resulting topographic heatmap distinguished less clearly between brain regions with different strengths of awareness-related activity. We would therefore prefer to keep our original analysis and presentation of these results.
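      To make the normalization and projection described in this response concrete, here is a minimal Python sketch. It assumes the classic form of Shepard’s method, which is inverse-distance weighting with exponent 2 and no search radius. The response does not give the exact settings, so these are assumptions. The function names `awareness_magnitude` and `shepard_interpolate` are hypothetical and are not taken from the authors’ code.

```python
import numpy as np


def awareness_magnitude(aware_lfp, unaware_lfp):
    """Unsigned awareness-related activity: |aware - unaware| per site (and time)."""
    return np.abs(np.asarray(aware_lfp, dtype=float) - np.asarray(unaware_lfp, dtype=float))


def shepard_interpolate(site_xyz, site_values, vertex_xyz, power=2.0):
    """Project site values onto surface vertices by Shepard (inverse-distance) weighting.

    site_xyz:    (n_sites, 3) electrode coordinates
    site_values: (n_sites,) value per site, e.g. one awareness_magnitude snapshot
    vertex_xyz:  (n_vertices, 3) template-surface coordinates
    power:       distance exponent (assumed 2; not stated in the response)
    """
    site_xyz = np.asarray(site_xyz, dtype=float)
    site_values = np.asarray(site_values, dtype=float)
    vertex_xyz = np.asarray(vertex_xyz, dtype=float)

    # Euclidean distance from every vertex to every site: (n_vertices, n_sites)
    dist = np.linalg.norm(vertex_xyz[:, None, :] - site_xyz[None, :, :], axis=-1)

    out = np.empty(len(vertex_xyz))
    on_site = dist == 0
    coincident = on_site.any(axis=1)
    # A vertex lying exactly on an electrode takes that electrode's value
    out[coincident] = site_values[on_site[coincident].argmax(axis=1)]
    # Elsewhere: weighted average of all sites with weights 1 / d**power
    weights = 1.0 / dist[~coincident] ** power
    out[~coincident] = (weights @ site_values) / weights.sum(axis=1)
    return out
```

      Because the absolute value is taken before interpolation, every projected value is non-negative. This is consistent with a 0-to-20 color scale in which any color marks a difference in magnitude rather than a polarity.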

      Figure 3B: Why choose seemingly arbitrary time points in this figure? What's the significance of 247 and 314 and 381ms (why not show 200, 250, 300, etc.)? Also, are these single time-points or averages within a broader time window around this time-point, e.g., 225-275ms for the 250ms plot?

      We thank the reviewer for this helpful comment and apologize for not clearly explaining why we chose these 8 time points to illustrate the spatiotemporal characteristics of awareness-related activity in Fig. 3B. To identify awareness-related activity, we analyzed the activity difference between aware and unaware trials during the delay period (180-650 ms after visual stimulus onset). The whole dynamic process is presented in the SI as a video (Video S1). Here, we sampled the activity at 8 time points (180 ms, 247 ms, 314 ms, etc.) that divide this 470 ms window into equal intervals of ~67 ms.

      We have added the description in the MS (lines 213-215).
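      The listed time points (180, 247, 314, 381 ms, …) are spaced 67 ms apart. The short Python sketch below generates such points under the assumption that 8 snapshots span the 180-650 ms window with the spacing rounded to whole milliseconds. It also selects the nearest frame of a time-resolved map for each point. The response does not say whether each snapshot is a single sample or an average over a window, so this sketch assumes single samples. `snapshot_times` and `frames_at` are hypothetical helpers.

```python
import numpy as np


def snapshot_times(start_ms=180, end_ms=650, n_points=8):
    """n_points evenly spaced onsets across [start_ms, end_ms], spacing rounded to whole ms."""
    step = round((end_ms - start_ms) / (n_points - 1))  # 470 / 7 is about 67.1, rounded to 67
    return [start_ms + k * step for k in range(n_points)]


def frames_at(times_ms, maps, targets_ms):
    """Return the map frame nearest to each target time.

    times_ms: (n_times,) sample times in ms
    maps:     (n_times, n_vertices) time-resolved topographies
    Frames are single samples; no averaging over a surrounding window.
    """
    times_ms = np.asarray(times_ms, dtype=float)
    idx = [int(np.argmin(np.abs(times_ms - t))) for t in targets_ms]
    return np.asarray(maps)[idx]
```

      With the defaults, `snapshot_times()` gives 180, 247, 314, 381, 448, 515, 582 and 649 ms.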

      Figure 3D: It's not clear how this figure panel is related to the data shown in Fig3A. In Fig3A, the positive amplitude diffs all end at around 400ms, but in Fig3D, these diffs extend out to 600+ms. I suggest adding clarity about the conversion being used here.

      We thank the reviewer for this comment and apologize for not clearly describing how we analyzed the population activity (Fig. 3D) in the previous version of the manuscript. Because the biological significance of LFP polarity is not yet clearly understood, we treat the change in LFP magnitude during the delay period of our task as awareness-related activity, regardless of whether its actual value is positive or negative. To analyze awareness-related population activity, we therefore first calculated the absolute value of the activity difference between aware and unaware trials at each recording site. We then pooled the data from 43 recording sites and calculated the mean and standard error of the mean (SEM) (Fig. 3D). As shown in Fig. 3A, the activity difference between aware (red) and unaware (blue) trials lasts until or beyond the end of the delay period. Thus, the awareness-related population activity in Fig. 3D extends to 600 ms.

      We have added the description in the MS (lines 769-777).
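      The population measure described above can be sketched in a few lines of Python. This is a minimal illustration, assuming trial-averaged LFPs arranged as sites × time points and an SEM computed as the sample standard deviation (ddof = 1) divided by √n. The function name is hypothetical. The toy example shows why the unsigned population trace can stay elevated when signed differences of opposite polarity at different sites would cancel.

```python
import numpy as np


def population_awareness_activity(aware, unaware):
    """Mean and SEM across sites of the unsigned aware-minus-unaware difference.

    aware, unaware: (n_sites, n_times) trial-averaged LFP for each recording site.
    """
    diff = np.abs(np.asarray(aware, dtype=float) - np.asarray(unaware, dtype=float))
    n_sites = diff.shape[0]
    mean = diff.mean(axis=0)
    sem = diff.std(axis=0, ddof=1) / np.sqrt(n_sites)  # sample SD / sqrt(n)
    return mean, sem


# Toy example: two sites whose differences have opposite sign at the second time point.
aware = [[1.0, 2.0], [1.0, -2.0]]
unaware = [[0.0, 0.0], [0.0, 0.0]]
mean, sem = population_awareness_activity(aware, unaware)
# mean -> [1.0, 2.0]; the signed mean at the second time point would be 0
```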

      Figure 6D could be improved by making the time labels much bigger, perhaps putting them on the time axis on the bottom rather than in tiny text above each brain.

      Thank you for this comment. We have modified the figure accordingly.

      Page 18, line 480: "our results show that the prefrontal cortex still displays visual awareness-related activities even after eliminating the influence of the confounding variables related to subjective reports such as motion preparation" This is too strong of a statement. It's not at all clear whether confounding variables related to subjective reports (especially the cognition needed to hold in mind the Y/N decision about seeing the stimulus prior to the response cue) were eliminated with the design used here. In other places of the manuscript, the authors use "minimized" which is more accurate.

      Thank you for this comment. We have revised this statement accordingly.

      Page 19, section starting on line 508: The authors should consider citing the study by Vishne et al. (2023), which was just accepted for publication recently, but has been posted on bioRxiv for almost a year now: https://www.biorxiv.org/content/10.1101/2022.08.02.502469v1 . And on page 20, line 563, the authors claim that to the best of their knowledge, they were the first to detect "ignition" in PFC in human subjects. Consider revising this statement, now that you know about the Vishne et al. paper.

      We agree.

      Thank you for drawing our attention to these papers. We have cited this study and discussed it in the revised manuscript (lines 522-533). We agree that several iEEG studies have shown early involvement of the PFC in visual perception (Vishne et al. 2023; Khalaf et al. 2023; Kwon et al. 2021). However, these studies did not compare neural activity between conscious and unconscious conditions. This leaves open the possibility that the ERP and HFA correlated with unconscious information processing rather than awareness-specific processing. In the present study, we compared neural activity in the PFC between conscious and unconscious trials and found that PFC activity specifically correlated with conscious perception. As mentioned in the previous version of the manuscript, one iEEG study (Gaillard et al. 2009) reported awareness-specific activity in the PFC. However, that activity started more than 300 ms after visual stimulus onset, about 100 ms later than the early awareness-related activity in our study. Nevertheless, in response to the reviewer’s comment, we have modified our argument as follows (lines 621-623):

      ‘However, as discussed above, in contrast with previous studies, our study detected earlier awareness-specific ‘ignition’ in the human PFC, while minimizing the motor-related confounding.’

      Experimental task section of Methods: Were any strategies for learning the response cue matching task suggested to patients/subjects, and/or did any patients/subjects report which strategy they ended up using? For example, if I were a subject in this experiment, I would remember and mentally rehearse the rules: "YES+GREEN = RIGHT" and "YES+RED = LEFT". For trials in which I didn't see anything, I wouldn't need to hold 2 more rules in mind, as they can be inferred from the inverse of the YES rules (and it's much harder to hold 4 things in mind than 2). This extra inference needed to get to the NO+GREEN = LEFT and NO+RED = RIGHT rules would likely cause me to respond slightly slower to the NO trials compared to the YES trials, leading to saccadic RT effects in the same direction the authors found. More information about the task training and strategies used by patients/subjects would be helpful.

      We agree and discussed this in lines 492-496.

      Reviewer #3 (Public Review):

      The authors report a study in which they use intracranial recordings to dissociate subjectively aware and subjectively unaware stimuli, focusing mainly on prefrontal cortex. Although this paper reports some interesting findings (the videos are very nice and informative!) the interpretation of the data is unfortunately problematic for several reasons. I will detail my main comments below. If the authors address these comments well, I believe the paper may provide an interesting contribution to further specifying the neural mechanisms important for conscious access (in line with Gaillard et al., Plos Biology 2009).

      Reply: We greatly appreciate the reviewer’s encouraging comments.

      The main problem with the interpretation of the data is that the authors have NOT used a so called "no-report paradigm". The idea of no report paradigms is that subjects passively view a certain stimulus without the instruction to "do something with it", e.g., detect the stimulus, immediately or later in time. Because of the confusion of this term, specifically being related to the "act of reporting", some have argued we should use the term no-cognition paradigm instead (Block, TiCS, 2019, see also Pitts et al., Phil Trans B 2018). The crucial aspect is that, in these types of paradigms, the critical stimulus should be task-irrelevant and thus not be associated with any task (immediately or later). Because in this experiment subjects were instructed to detect the gratings when cued 600 ms later in time, the stimuli are task relevant, they have to be reported about later and therefore trigger all kinds of (known and potentially unknown) cognitive processes at the moment the stimuli are detected in real-time (so stimulus-locked). You could argue that the setup of this delayed response task excludes some very specific report related processes (e.g., the preparation of an eye-movement), which is good, however this is usually not considered the main issue. For example when comparing masked versus unmasked stimuli (Gaillard et al., 2009 Plos Biology), these conditions usually also both contain responses but these response related processes are "averaged out" in the specific contrasts (unmasked > masked). In this paper, RT differences between conditions (that are present in this dataset) are taken care of by using this delayed response in this paper, which is a nice feature for that and is not the case for the above example set-up.

      Given the task instructions, and this being merely a delayed-response task, it is to be expected that prefrontal cortex shows stronger activity for subjectively aware versus subjectively unaware stimuli. Unfortunately, given the nature of this task, the novelty of the findings is severely reduced. The authors cannot claim that prefrontal cortex is associated with "visual awareness", or what people have called phenomenal consciousness (this is the goal of using no-cognition paradigms). The only conclusion that can be drawn is that prefrontal cortex activity is associated with accessing sensory input: and hence conscious access. This less novel observation has been shown many times before and there is also little disagreement about this issue between different theories of consciousness (e.g., global workspace theory and local recurrency theories both agree on this).

      We fully agree that no-report/no-cognition paradigms involve less post-perceptual cognition than report paradigms. We designed the balanced response task to minimize the motor-related component of post-perceptual processing, although this task does not eliminate all post-perceptual cognition. However, we respectfully disagree with the reviewer’s view that our task cannot assess the involvement of the PFC in the emergence of awareness. As stated in the manuscript, we found early awareness-related activity (~200 ms) in the PFC that resembles the VAN in EEG studies. This finding indicates an association of the PFC with the emergence of visual awareness (phenomenal consciousness).

      The best solution at this point seems to rewrite the paper entirely in light of this. My advice would be to state in the introduction that the authors investigate conscious access using iEEG and then not refer too much to no-cognition paradigm or maybe highlight some different strategies about using task-irrelevant stimuli (see Canales-Johnson et al., Plos Biology 2023; Hesse et al., eLife 2020; Hatamimajoumerd et al Curr Bio 2022; Alilovic et al., Plos Biology 2023; Pitts et al., Frontiers 2014; Dwarakanath et al., Neuron 2023 and more). Obviously, the authors should then also not claim that their results solve debates about theories regarding visual awareness (in the "no-cognition" sense, or phenomenal consciousness), for example in relation to the debate about the "front or the back of the brain", because the data do not inform that discussion. Basically, the authors can just discuss their results in detail (related to timing, frequency, synchronization etc) and relate the different signatures that they have observed to conscious access.

      The objective of the present study is to assess whether the PFC is involved in the emergence of visual awareness (i.e., phenomenal consciousness). We found early awareness-related activity in the PFC (~200 ms after visual stimulus onset), including ERP, high-gamma activity and phase synchronization. This activity indicates an association of the PFC with the emergence of visual awareness. We would therefore like to keep the basic framing of the manuscript and revise it according to the reviewers’ comments.

      On the other hand, we fully agree with the reviewer that report paradigms are more suitable for studying access consciousness. Indeed, we have found that awareness-related activity in the PFC can be separated into two subgroups: early activity with shorter latency (~200 ms after stimulus onset) and late activity with longer latency (> 350 ms after stimulus onset). The two subgroups also differ in time course. The early activity declined to baseline within ~200 ms during the delay period, whereas the late activity lasted throughout the delay period and persisted into the next stage of the task (the color change of the fixation point). Moreover, the early activity occurs primarily in the PFC contralateral to the visual stimulus, whereas the late activity occurs in both contralateral and ipsilateral PFC. The early awareness-related activity resembles the VAN in EEG studies (associated with p-consciousness), while the late awareness-related activity resembles the P3b (associated with a-consciousness). We plan to report these results soon in a separate paper.

      I think the authors have to discuss the Gaillard et al PLOS Biology 2009 paper in much more detail. Gaillard et al also report a study related to conscious access contrasting unmasked and masked stimuli using iEEG. In this paper they also report ERP, time frequency and phase synchronization results (and even Granger causality). Because of the similarities in approach, I think it would be important to directly compare the results presented in that paper with results presented here and highlight the commonalities and discrepancies in the Discussion.

      We thank the reviewer for this comment. We have performed additional analyses and expanded the discussion accordingly. We have also extended the discussion of other relevant studies in the revised manuscript.

      In lines 528-549,

      ‘Although one iEEG study reported awareness-specific PFC activation, the awareness-related activity started 300 ms after the onset of visual stimuli, which was ~100 ms later than the early activity in our study. Also, due to the limited number of electrodes in PFC (2 patients with 19 recording sites mostly in mesiofrontal and peri-insular regions), their experiments were restricted while exploring the awareness-related activity in PFC. In the present study, the number of recording sites (245) were much more than previous study and covered more areas in PFC. Our results further show earlier awareness-related activity (~ 200 ms after visual stimuli onset), including ERP, HFA and PLV. These awareness-related activity in PFC occurred even earlier (~150 ms after stimulus onset) for the salient stimulus trials (Fig. 3A and D and Fig. 4A and D, HA condition).

      However, the proportions of awareness-related sites are much smaller than that reported by Gaillard et al., which peaked at ~60%. One possible reason for the difference is that more PFC subregions were sampled in the present study and awareness-related activity is unevenly distributed across PFC. We noticed that the peri-insular regions and middle frontal gyrus (MFG), which were similar to the regions reported by Gaillard et al., seemed to show a larger fraction of awareness-related sites than other subregions during the delay period (0-650 ms after stimulus onset). To test this possibility and compare our results with those of Gaillard et al., we calculated the proportion of awareness-related sites in the peri-insular and MFG regions. Although this proportion was larger in the peri-insular regions and MFG than in other subregions, it was still much lower than that reported by Gaillard et al. An alternative explanation for the difference between the two studies is the more complex task used by Gaillard et al. Nevertheless, we think these new results contribute to our understanding of the neural mechanisms underlying conscious perception, especially the role of PFC.’

      In lines 601-603:

      ‘The only human iEEG study reported that the phase synchronization of the beta band in the aware condition also occurred relatively late (> 300 ms) and was mainly confined to posterior zones rather than PFC.’
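      For readers less familiar with phase-synchronization measures, the phase-locking value (PLV) between two recording sites is commonly computed as the magnitude of the trial-averaged unit vector of their phase difference. The sketch below assumes that instantaneous phases at one time point have already been extracted for each trial (for example, from a band-pass filtered, Hilbert-transformed signal); the function name `plv` is illustrative and is not the pipeline used in the manuscript.

```python
import cmath
import math


def plv(phases_x, phases_y):
    """Phase-locking value across trials for two sites at one time point.

    phases_x, phases_y: instantaneous phases in radians, one per trial.
    Returns a value in [0, 1]: 1 means a constant phase difference across
    trials; values near 0 mean the phase difference is spread uniformly.
    """
    if len(phases_x) != len(phases_y) or not phases_x:
        raise ValueError("need two equal-length, non-empty phase lists")
    # Sum the unit vectors exp(i * (phi_x - phi_y)) and normalize by trial count.
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_x, phases_y))
    return abs(total) / len(phases_x)


# Constant phase lag of 0.5 rad across trials -> PLV of 1.
print(round(plv([0.1, 0.7, 2.0], [-0.4, 0.2, 1.5]), 6))
# Phase differences spread evenly around the circle -> PLV near 0.
print(round(plv([0.0, math.pi / 2, math.pi, 3 * math.pi / 2], [0.0] * 4), 6))
```

      In practice PLV is computed at every time point and frequency band and compared against a baseline or surrogate distribution; those steps are omitted here.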

      As for the Granger causality analysis between PFC and the occipital lobe, because this study focused mainly on PFC and there were few recording sites in the occipital lobe, we plan to perform this analysis in future studies after collecting more data.

      In the Gaillard paper they report a figure plotting the percentage of significant frontal electrodes across time (figure 4A) in which it can be seen that significant electrodes emerge after approximately 250 ms in PFC as well. It would be great if the authors could make a similar figure to compare results. In the current paper there are much more frontal electrode contacts than in the Gaillard paper, so that is interesting in itself.

      We thank the reviewer for this constructive comment. We performed an analysis similar to that of Gaillard et al. and plotted the results in the figure below. Awareness-related sites started to emerge about 200 ms after visual stimulus onset in both ERP and HG activity. The proportion of awareness-related sites peaked at ~14% (~8% for HG) at 300-400 ms. However, these proportions are much smaller than that reported by Gaillard et al., which peaked at ~60%. One possible reason for the difference is that more PFC subregions were sampled in the present study and awareness-related activity is unevenly distributed across PFC. We noticed that the peri-insular regions and middle frontal gyrus (MFG), which were similar to the regions reported by Gaillard et al., seemed to show a larger fraction of awareness-related sites than other subregions during the delay period (0-650 ms after stimulus onset). To test this possibility and compare our results with those of Gaillard et al., we calculated the proportion of awareness-related sites in the peri-insular and MFG regions. Although this proportion was larger in the peri-insular regions and MFG than in other subregions, it was still much lower than that reported by Gaillard et al. An alternative explanation for the difference between the two studies is the more complex task used by Gaillard et al.
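      The time-resolved proportion described above (analogous to Figure 4A of Gaillard et al.) amounts to counting, in each time bin, the fraction of PFC sites flagged as awareness-related. A minimal sketch, assuming the per-site, per-bin significance decisions (including any multiple-comparison correction) have already been made; the function name and the example data are hypothetical.

```python
def percent_significant_by_bin(significant):
    """Percentage of recording sites flagged as significant in each time bin.

    significant: one list of booleans per recording site, with one entry per
    time bin (True = awareness-related in that bin). All sites must share the
    same number of bins.
    """
    if not significant:
        raise ValueError("no recording sites")
    n_bins = len(significant[0])
    if any(len(site) != n_bins for site in significant):
        raise ValueError("all sites must have the same number of time bins")
    n_sites = len(significant)
    # Booleans sum as 0/1, so this counts significant sites per bin.
    return [
        100.0 * sum(site[b] for site in significant) / n_sites
        for b in range(n_bins)
    ]


# Hypothetical example: 4 sites x 3 time bins.
sites = [
    [False, True, True],
    [False, False, True],
    [False, False, False],
    [False, True, False],
]
print(percent_significant_by_bin(sites))  # [0.0, 50.0, 50.0]
```

      Restricting `significant` to sites in a given subregion (for example, the peri-insular regions and MFG) gives the subregion-specific curve.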

      We have added this figure and discussion to the revised manuscript as a new result (Figure 4E & S2 and lines 537-549).

      Author response image 1.

      Percentage of awareness-related sites in ERP and HG analysis. n, number of recording sites in PFC.

      Author response image 2.

      Percentage of awareness-related sites in ERP and HG analysis in the pars opercularis and middle frontal gyrus (MFG). n, number of recording sites.

      In my opinion, some of the most interesting results are not highlighted: the findings that subjectively unaware stimuli show increased activations in the prefrontal cortex as compared to stimulus absent trials (e.g., Figure 4D). Previous work has shown PFC activations to masked stimuli (e.g., van Gaal et al., J Neuroscience 2008, 2010; Lau and Passingham J Neurosci 2007) as well as PFC activations to subjectively unaware stimuli (e.g., King, Pescetelli, and Dehaene, Neuron 2016) and this is a very nice illustration of that with methods having more detailed spatial precision. Although potentially interesting, I wonder about the objective detection performance of the stimuli in this task. So please report objective detection performance for the patients and the healthy subjects, using signal detection theoretic d'. This gives the reader an idea of how good subjects were in detecting the presence/absence of the gratings. Likely, this reveals far above chance detection performance and in that case I would interpret these findings as "PFC activation to stimuli indicated as subjectively unaware" and not unconscious stimuli. See Stein et al., Plos Biology 2021 for a direct comparison of subjectively and objectively unaware stimuli.

      We thank the reviewer for these helpful and valuable comments. We did notice that PFC activity in the subjectively unaware condition (stimulus contrast near the perceptual threshold) is significantly higher than in the stimulus-absent condition. These results, obtained with sEEG recordings that have much higher spatial resolution than brain imaging and scalp EEG, support the findings of previous studies (citations). Because the neural correlates of unaware processing are an active and interesting topic, after careful consideration we have decided to report these results in a separate paper rather than add them to the current manuscript, in order to avoid distraction.

      Following the reviewer’s suggestion regarding objective detection performance in our task, we computed the signal detection theoretic d’. The d’ values were similar in patients and healthy subjects (1.81±0.27 and 2.12±0.37, respectively), indicating that objective detection performance in our task was well above chance. Since our task measures only subjective awareness, we agree with the reviewer that our results should be interpreted as “PFC activation to stimuli indicated as subjectively unaware” rather than as activation to objectively unaware (unconscious) stimuli. We will emphasize this point in our next paper.
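      For reference, d’ is the difference between the z-transformed hit rate (stimulus-present trials reported as seen) and false-alarm rate (stimulus-absent trials reported as seen). The response does not state how rates of 0 or 1 were handled, so the sketch below offers the log-linear correction (0.5 added to each count) as an assumed option; the function name and counts are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections, loglinear=True):
    """Signal detection sensitivity d' = z(hit rate) - z(false-alarm rate).

    Counts come from stimulus-present trials (hits, misses) and
    stimulus-absent trials (false_alarms, correct_rejections).
    With loglinear=True, 0.5 is added to each count (log-linear correction),
    which keeps z finite when a rate would otherwise be 0 or 1.
    """
    add = 0.5 if loglinear else 0.0
    hit_rate = (hits + add) / (hits + misses + 2 * add)
    fa_rate = (false_alarms + add) / (false_alarms + correct_rejections + 2 * add)
    if not (0.0 < hit_rate < 1.0 and 0.0 < fa_rate < 1.0):
        raise ValueError("rates of 0 or 1 give infinite z; use loglinear=True")
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: 84/100 hits on present trials, 16/100 false alarms
# on absent trials.
print(round(d_prime(84, 16, 16, 84, loglinear=False), 3))
```

      A d’ of 0 corresponds to chance detection, so values around 2 indicate that subjects could discriminate present from absent trials well.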

      We have added the d’ values to the manuscript (lines 149-150).

      In Figure 7 of the paper the authors want to make the case that the contrast does not differ between subjectively aware stimuli and subjectively unaware stimuli. However so far they've done the majority of their analyses across subjects, and for this analysis the authors only performed within-subject tests, which is not a fair comparison imo. Because several P values are very close to significance I anticipate that a test across subjects will clearly show that the contrast level of the subjectively aware stimuli is higher than of the subjectively unaware stimuli, at the group level. A solution to this would be to sub-select trials from one condition (NA) to match the contrast of the other condition (NU), and thereby create two conditions that are matched in contrast levels of the stimuli included. Then do all the analyses on the matched conditions.

      We thank the reviewer for this helpful comment. Regarding the comment “However so far they've done the majority of their analyses across subjects, and for this analysis the authors only performed within-subject tests, which is not a fair comparison imo”, if we understand correctly, the reviewer considers it unfair that the analysis of neural activity in PFC was performed across subjects whereas the stimulus contrast analysis between NA and NU was performed within individual subjects. This is not the case. In the neural activity analysis, significant awareness-related sites were first identified within each individual subject (Fig. 3A and Fig. 4A, and Methods), in the same way as in the stimulus contrast analysis (see Methods). Only in the population activity analysis was the activity of awareness-related sites pooled for further analysis.

      To provide further evidence that the awareness-related activity in PFC is not highly correlated with stimulus contrast, we compared the activity differences between two pairs of conditions with very different contrast differences: high-contrast aware (HA) versus NA (large contrast difference, ~14%), and NA versus NU (slight contrast difference, ~0.2%). Our working hypothesis was that, if PFC activity is closely correlated with stimulus contrast, the activity difference between HA and NA should be much larger than that between NA and NU. To test this hypothesis, we analyzed data from the two patients for whom the previous analysis showed a significant or near-significant difference in stimulus contrast between NA and NU (patients #2 and #1; Author response image 3, below). The results (Author response image 3) show that the averaged activity difference (0-650 ms after visual stimulus onset) between HA and NA was similar to that between NA and NU, even though the contrast difference was much larger between HA and NA than between NA and NU. These results indicate that the awareness-related activity in PFC cannot be explained solely by the contrast difference between NA and NU conditions. We therefore think it is not necessary to perform the analysis suggested by the reviewer (“A solution to this would be to sub-select trials from one condition (NA) to match the contrast of the other condition (NU), and thereby create two conditions that are matched in contrast levels of the stimuli included. Then do all the analyses on the matched conditions”). In addition, the limited number of trials in our dataset makes this analysis difficult.

      Author response image 3.

      Relationship between stimulus contrast and PFC activity. The x axis represents the stimulus contrast difference between two paired conditions: aware versus unaware near the perceptual threshold (NA - NU, red dots) and aware at high contrast versus aware near the perceptual threshold (HA - NA, blue dots). The y axis represents the activity difference between the paired conditions. The activity difference is similar for the two pairs despite the large difference in their contrast differences. These results indicate that the greater activity in NA trials than in NU trials (Fig. xx-xx) cannot be explained by the slight difference in stimulus contrast between NA and NU trials.

      Related, Figure 7B is confusing and the results are puzzling. Why is there such a strong below chance decoding on the diagonal? (also even before stimulus onset) Please clarify the goal and approach of this analysis and also discuss/explain better what they mean.

      We have removed Figure 7B because of the confusing decoding results on the diagonal.

      I was somewhat surprised by several statements in the paper and it felt that the authors may not be aware of several intricacies in the field of consciousness. For example, a statement like the following "Consciousness, as a high-level cognitive function of the brain, should have some similar effects as other cognitive functions on behavior (for example, saccadic reaction time). With this question in mind, we carefully searched the literature about the relationship between consciousness and behavior; surprisingly, we failed to find any relevant literature." This is rather problematic for at least two reasons. First, not everyone would agree that consciousness is a high-level cognitive function and second there are many papers arguing for a certain relationship between consciousness and behavior (Dehaene and Naccache, 2001 Cognition; van Gaal et al., 2012, Frontiers in Neuroscience; Block 1995, BBS; Lamme, Frontiers in Psychology, 2020; Seth, 2008 and many more). Further, the explanation for the reaction time differences in this specific case is likely related to the fact that subjects' confidence in that decision is much higher in the aware trials than in the unaware trials, hence the speeded response for the first. This is a phenomenon that is often observed if one explores the "confidence literature". Although the authors have not measured confidence I would not make too much out of this RT difference.

      We agree and have modified the text accordingly (lines 492-507):

      ‘An alternative interpretation of the RT difference between the aware and unaware conditions in our study is that it reflects the task strategies used by subjects/patients to remember the response-mapping rules between the perception and the color cue (e.g., the YES+GREEN=RIGHT and YES+RED=LEFT rules may have been held in memory, while the NO mappings were inferred secondarily rather than actively held in memory).

      Another possibility is that reaction time is strongly modulated by confidence level, as described in previous studies (Broggin et al., 2012; Marzi et al., 2006). However, in those studies, confidence levels were usually manipulated by presenting stimuli with different physical properties, such as spatial frequency, eccentricity and contrast. Because visual processing depends on the salience of the visual stimulus, this confounds the effect of visual awareness on the reaction time of responsive movements, making it hard to attribute shorter reaction times in more salient conditions purely to visual awareness. In contrast, in the present study we created a condition (near the awareness threshold) in which the salience (contrast) of the visual stimulus is very similar in the aware and unaware conditions, in order to eliminate the influence of stimulus salience on reaction time. We think that the difference in reaction time in our study is mainly due to modulation by awareness state, which has not been reported previously.’

      I would be interested in a lateralized analysis, in which the authors compare the PFC responses and connectivity profiles using PLV as a factor of stimulus location (thus comparing electrodes contralateral to the presented stimulus and electrodes ipsilateral to the presented stimulus). If possible this may give interesting insights in the mechanism of global ignition (global broadcasting), supposing that for contralateral electrodes information does not have to cross from one hemisphere to another, whereas for ipsilateral electrodes that is the case (which may take time). Gaillard et al refer to this issue as well in their paper, and this issue is sometimes discussed regarding to Global workspace theory. This would add novelty to the findings of the paper in my opinion.

      We thank the reviewer for these helpful and valuable suggestions and have performed the analysis accordingly. We find that awareness-related ERP activation in PFC first occurs only in the contralateral PFC, with a latency of about 200 ms, and then occurs in both contralateral and ipsilateral PFC about 100 ms later. In addition, the magnitude of awareness-related activity is stronger in the contralateral than in the ipsilateral PFC during the early phase (200-400 ms), after which the activity becomes similar in the two hemispheres. Moreover, awareness-related HG activity appears only in the contralateral PFC. These results reveal the spatiotemporal characteristics of visual awareness-related activity across the two hemispheres. We plan to report these results in a separate paper soon.

      Reviewer #3 (Recommendations For The Authors):

      Some of the font sizes in the figures are too small.

      We have modified accordingly.

      To me, the abbreviations are confusing, (NA/NU etc). I would try to come up with easier ones or just not use abbreviations.

      We have revised the text accordingly and tried to avoid abbreviations.

      The data/scripts availability statement states "available upon reasonable request". I would suggest that the authors make the data openly available when possible, and I believe eLife requires that as well.

      We thank the reviewer for this suggestion. Because several ongoing studies are based on this dataset, we plan to make the data openly available once these studies are complete, provided that national policy does not restrict it.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Comment 1: The authors need to do more to cite the prior work of others. CCL2 allelic expression imbalance tied to the rs13900 alleles was first reported by Johnson et al. (Pharmacogenet Genomics. 2008 Sep; 18(9): 781-791) and should be cited in the Introduction on line 128 next to the Pham 2012 reference. Also, in the Results section, line 142, please provide references for the statement "We and others have previously reported a perfect linkage disequilibrium between rs1024611 in the CCL2 cis-regulatory region and rs13900 in its 3′ UTR" since the linkage disequilibrium for these 2 SNPs is not reported in the ENSEMBL server for the 1000 genomes dataset.

      We thank the reviewer for pointing out this omission in citing prior work. We acknowledge that Johnson et al. (2008) reported the association between rs13900 and CCL2 allelic expression imbalance using the SNaPshot method while examining cis-acting variants in 42 candidate genes. To acknowledge these prior studies, we now cite Johnson et al. (2008) along with Pham et al. (2012), which linked rs13900 to CCL2 allelic expression imbalance. The text in the Introduction (lines 128-130) has been updated accordingly:

      “We and others have demonstrated AEI in CCL2 using rs13900 as a marker with the T allele showing a higher expression level relative to C allele (Johnson et al., 2008; Pham et al., 2012).”

      We have cited previous studies that reported strong linkage disequilibrium between rs1024611 and rs13900 within the CCL2 gene, with D’=1 and R<sup>2</sup>=0.96 (Hubal et al., 2010; Intemann et al., 2011; Kasztelewicz et al., 2017; Pham et al., 2012), on line 144. To address the concern that linkage disequilibrium (LD) between rs1024611 and rs13900 is not reported, we reviewed the pairwise LD data by population in the ENSEMBL server for the 1000 Genomes dataset and confirmed that LD between rs1024611 and rs13900 is reported there, with D’=1 and R<sup>2</sup>=0.92 to 1.0 in specific populations. We have included a table (Author response table 1) showing the pairwise LD between rs13900 and rs1024611 reported in the ENSEMBL server for the 1000 Genomes dataset, together with a URL to the ENSEMBL data.

      Author response table 1.

      Pairwise linkage disequilibrium data between rs13900 and rs1024611 by population, as reported in the ENSEMBL server for the 1000 Genomes dataset

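      F. Variant, focus variant; R<sup>2</sup>, squared correlation between alleles at the two loci; D’, normalized linkage disequilibrium coefficient (the difference between the observed and expected frequency of a given haplotype, scaled by its maximum possible value).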

      URL: https://www.ensembl.org/Homo_sapiens/Variation/HighLD?db=core;r=17:34252269-34253269;v=rs1024611;vdb=variation;vf=959559590;second_variant_name=rs13900
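      The footnote measures are the standard pairwise LD statistics for two biallelic loci: D = p_AB - p_A p_B, D’ = |D| / D_max, and r² = D² / (p_A(1 - p_A) p_B(1 - p_B)). A minimal sketch of these textbook formulas follows; it is not ENSEMBL’s implementation, and the frequencies in the example are hypothetical. The example illustrates that D’ = 1 (one of the four haplotypes absent) can coexist with r² below 1 when allele frequencies at the two loci differ, which is why D’=1 is compatible with the population-specific R<sup>2</sup> range of 0.92 to 1.0.

```python
def ld_stats(p_ab, p_a, p_b):
    """Pairwise LD between two biallelic loci from haplotype/allele frequencies.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a, p_b: frequencies of allele A (locus 1) and allele B (locus 2).
    Returns (D, |D'|, r^2).
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # observed minus expected haplotype frequency
    # D_max depends on the sign of D (Lewontin's normalization).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical: allele A always occurs with allele B, but B is more common.
d, d_prime, r2 = ld_stats(p_ab=0.3, p_a=0.3, p_b=0.4)
print(round(d_prime, 3), round(r2, 3))
```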

      Comment 2: Certain details of the experimental protocols need to be further elaborated or clarified to contextualize the significance of the findings. For example, in the results line 184 the authors state "Using nascent RNA allows accurate determination of mRNA decay by eliminating the effects of preexisting mRNA." How does measuring nascent RNA enable the accurate determination of mRNA decay? Doesn't it measure allele-specific mRNA synthesis? Please elaborate, as this is a key result of the study. Can the authors provide a reference supporting this statement?

      Measuring only newly synthesized RNA allows mRNA decay to be determined precisely, because it eliminates the contribution of preexisting mRNA. Metabolic labeling of RNA (e.g., with 4-thiouridine, or with 5-ethynyl uridine as used here) allows exclusive capture of newly synthesized RNA, so that RNA decay can be quantified without interference from preexisting RNA. We agree that nascent RNA measurement primarily reflects the synthesis rate rather than degradation. However, in conjunction with actinomycin D-based transcription inhibition, it can be used to determine accurately the decay of newly synthesized RNA (Russo et al., 2017). Our aim was therefore to use nascent RNA to study decay kinetics. The imbalance in CCL2 allelic expression does occur at the transcriptional level, as seen in the group not treated with actinomycin D (Figure 2C), although post-transcriptional mechanisms that alter transcript stability cannot be ruled out. We therefore employed a novel approach that assesses both synthesis and degradation by combining actinomycin D inhibition and nascent RNA capture in the same experimental setup. In the presence of actinomycin D, we detected a much greater allelic difference in the expression levels of the rs13900T and C alleles four hours post-treatment, suggesting a role for post-transcriptional mechanisms in CCL2 AEI.

      We have expanded the Methods section in the revised draft to include experimental details on the capture of nascent RNA and the subsequent downstream analysis (lines 553-563):

      “Newly synthesized RNA was isolated using the Click-iT Nascent RNA Capture Kit (Invitrogen, Cat No: C10365) following the manufacturer’s protocol. Peripheral blood mononuclear cells (PBMCs) or monocyte-derived macrophages (MDMs) obtained from heterozygous individuals were stimulated with lipopolysaccharide (LPS) for 3 hours in the presence of 0.2 mM 5-ethynyl uridine (EU) (Jao and Salic, 2008; Paulsen et al., 2013). After the pulse, the culture medium was replaced with fresh growth medium devoid of EU. To assess RNA stability, actinomycin D (5 µg/mL) was added, and samples were collected at 0, 1, 2, and 4 h post-treatment. EU-labeled RNA was biotinylated by a click reaction and captured on streptavidin beads. The captured RNA was used for cDNA synthesis (SuperScript VILO kit, Cat No: 11754250), PCR amplification, and allelic quantification.”
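      Once allele-specific signal has been quantified at each time point after actinomycin D addition (0, 1, 2 and 4 h in the protocol above), a decay constant and half-life can be estimated per allele by fitting first-order decay, ln(fraction remaining) = a - k t, with half-life ln 2 / k. The sketch below assumes first-order decay and fractions already normalized to the 0 h sample; it illustrates the calculation and is not the analysis reported in the manuscript. The allele values in the example are made up.

```python
import math


def decay_rate(times_h, fractions):
    """First-order decay constant k (1/h) from a log-linear least-squares fit.

    Fits ln(fraction) = a - k * t; fractions are signals normalized to 0 h.
    """
    if len(times_h) != len(fractions) or len(times_h) < 2:
        raise ValueError("need at least two paired time points")
    if any(f <= 0 for f in fractions):
        raise ValueError("fractions must be positive to take logarithms")
    logs = [math.log(f) for f in fractions]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(logs) / n
    s_tt = sum((t - t_mean) ** 2 for t in times_h)
    if s_tt == 0:
        raise ValueError("time points must not all be equal")
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, logs))
    return -s_ty / s_tt  # slope of ln(fraction) vs t is -k


def half_life(times_h, fractions):
    """Half-life in hours, ln(2) / k, assuming first-order decay."""
    k = decay_rate(times_h, fractions)
    if k <= 0:
        raise ValueError("no net decay over the sampled interval")
    return math.log(2) / k


# Hypothetical fractions remaining for two alleles after actinomycin D.
t_hours = [0, 1, 2, 4]
allele_1 = [1.0, 0.84, 0.71, 0.50]
allele_2 = [1.0, 0.71, 0.50, 0.25]
print(f"allele 1 half-life ~ {half_life(t_hours, allele_1):.1f} h")
print(f"allele 2 half-life ~ {half_life(t_hours, allele_2):.1f} h")
```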

      Comment 3: Also, they next state that the assay was carried out using cells treated with actinomycin D (line 186). Doesn't actinomycin D block transcription? The original study by Jia et al 2008 in PNAS reported that low concentration of ActD (100 nM) blocked RNA pol I and higher concentration (2 uM) blocked RNA pol II. This or the study on which the InVitrogen kit is based should be cited. The concentration of actinomycin D used to treat the cells should be given. They report that the T allele transcript was more abundant than the C allele transcript in nascent RNA. Why doesn't that argue for a transcriptional mechanism rather than an RNA-stability mechanism? This result should be discussed in the Discussion.

      In our study, we used actinomycin D at 5 µg/mL (3.98 µM), which, as noted by the reviewer, effectively inhibits RNA polymerase II (Pol II) activity. We have updated the manuscript to include these details and cited the original work (Jao and Salic, 2008; Paulsen et al., 2013), which investigated the effects of various concentrations of ActD on RNA polymerases I and II (line 557). A discussion of the RNA stability mechanism is provided in the Results section (lines 196-198).
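      The molar value above follows from the molecular weight of actinomycin D (C62H86N12O16, about 1255.4 g/mol; the response does not state which value was used, so this is an assumption). A small conversion sketch:

```python
def ug_per_ml_to_um(conc_ug_per_ml, mol_weight_g_per_mol):
    """Convert a mass concentration (ug/mL) to a molar concentration (uM).

    1 ug/mL = 1 mg/L; dividing by the molecular weight (g/mol = mg/mmol)
    gives mmol/L (mM), and multiplying by 1000 gives uM.
    """
    return conc_ug_per_ml / mol_weight_g_per_mol * 1000.0


ACTINOMYCIN_D_MW = 1255.42  # g/mol, assumed value for C62H86N12O16
print(round(ug_per_ml_to_um(5.0, ACTINOMYCIN_D_MW), 2))  # 3.98
```

      Running the same conversion in reverse shows that the 2 µM concentration cited by the reviewer corresponds to roughly 2.5 µg/mL.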

      Comment 4: In their bioinformatics analysis of the allele-specific CCL2 mRNAs, they reported that the analysis obtained a score of 1e (line 214). What does that mean? Is it significant?

      We acknowledge that the notation “a score of 1e” was unclear and thank the reviewer for pointing this out. We have clarified its meaning in the revised manuscript by adding the following text to the Results section (line 223):

      “The score of 1e was obtained using RBP-Var, a bioinformatics tool that scores variants involved in post-transcriptional interaction and regulation (Mao et al., 2016). The annotation system rates the functional confidence of variants from category 1 to 6. Category 1 is the most significant and includes variants that are known expression quantitative trait loci (eQTLs) likely affecting an RBP binding site, RNA secondary structure and expression, whereas category 6 is assigned to variants with minimal likelihood of affecting RBP binding. Subcategories provide further annotation, ranging from the most informative variants (a) to the least informative (e). The reported score of 1e denotes that the variant is located in a motif for RBP binding. Although the scoring system is hierarchical from 1a to 1e, with decreasing confidence in the variant’s function, all variants in category 1 are considered potentially functional to some degree.”

      Comment 5: In Figure 3A, why is the rare SNP rs181021073 shown? This SNP does not come up anywhere else in the paper. For clarity, it should be removed from Figure 3A.

      We thank the reviewer for pointing out the error in Figure 3A and apologize for the oversight. We agree that the SNP rs181021073 is not mentioned anywhere in the manuscript and that its inclusion in Figure 3A may have caused confusion. We have removed this SNP from the revised figure.

      Comment 6: For the RNA EMSA results presented in Fig. 4C with recombinant ELAVL1 (HuR), there is clearly a loss of unbound T allele probe with increasing concentrations of the recombinant protein (without a concomitant increase in shifted complex). This suggests that the T allele probe is degraded or loses its fluorescent tag in the presence of recombinant HuR, whereas the C allele probe does not. The quantitation of the shifted complex presented in Fig. 4D as a percentage of bound and unbound probe is therefore artificially elevated for the T allele compared to the C allele. In fact, there seems to be little difference between the shifted complexes with the T and C allele probes. The authors should explain this difference in free probe levels.

      We appreciate the reviewer’s constructive critique of the RNA EMSA results in Fig. 4C. To address it, we repeated the experiments to analyze the differential binding of rs13900T/C allele-bearing probes with increasing concentrations of recombinant HuR. No degradation or loss of the fluorescent tag for the T allele probe was observed in the presence of recombinant HuR in three independent experiments (Author response image 1). This indicates that the C and T allele probes are comparably stable and are not affected by increasing concentrations of recombinant HuR. The apparent reduction in unbound T allele probe in Figure 4C may be due to saturation at higher HuR concentrations rather than degradation.

      Author response image 1.

      Differential binding and stability of oligoribonucleotide probes containing rs13900C or T alleles with recombinant HuR. (A) REMSA with labeled oligoribonucleotides containing either rs13900C or rs13900T and recombinant HuR at the indicated concentrations. (B, C) Representative quantitative densitometric analysis of HuR binding to the oligoribonucleotides bearing rs13900T or C. The signal in the bound fraction was normalized to the free probe. The figure represents data from three independent experiments (mean ± SEM).
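      The reviewer’s concern in Comment 6 hinges on how band intensities are converted into a binding measure. A minimal sketch of a percent-bound quantification with mean ± SEM across independent gels, assuming background-subtracted densitometry values per lane; the function names and numbers are hypothetical and this is not necessarily the normalization used in Fig. 4D. Because the denominator includes the free probe, a loss of free probe alone would raise the percentage, so the total lane signal is worth inspecting alongside it.

```python
from math import sqrt
from statistics import mean, stdev


def percent_bound(bound_signal, free_signal):
    """Percentage of probe signal in the shifted complex for one lane."""
    total = bound_signal + free_signal
    if total <= 0:
        raise ValueError("lane has no signal")
    return 100.0 * bound_signal / total


def mean_sem(values):
    """Mean and standard error of the mean across independent experiments."""
    if len(values) < 2:
        raise ValueError("need at least two replicates for SEM")
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical densitometry (bound, free) for one HuR concentration,
# from three independent gels.
lanes = [(1200.0, 3600.0), (1500.0, 3500.0), (900.0, 2700.0)]
m, sem = mean_sem([percent_bound(b, f) for b, f in lanes])
print(f"{m:.1f} +/- {sem:.1f} % bound")
```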

      Comment 7: In the Methods section, concentrations and source of reagents should be given. For example, what was the bacterial origin of LPS and concentration? What concentration of actinomycin D? What was the source? Was it provided with the nascent RNA kit? In describing the riboprobes used for REMSA, please underline the allele in the sequences (lines 549 and 550).

      Thank you for your detailed feedback and suggestions regarding the Materials and Methods section. We regret not providing detailed information on reagent concentrations and sources in the Methods section. We have now added the necessary details, and a summary of the materials and reagents used is presented in a supplementary table (Supplementary Table 4) to enable others to replicate our experiments accurately. For the riboprobes used in the RNA electrophoretic mobility shift assay, we have underlined and bolded the allele in the sequences as suggested (lines 603-604).

      Comment 8: For polysome profiling on line 603, please provide a protocol for the differentiation of primary macrophages from monocytes (please cite an original protocol, not a prior paper that does not give a detailed protocol).

      We agree with the reviewer’s comment and have added the following text on the differentiation of primary macrophages from monocytes to the Methods section, citing the original protocol (line 668):

      “Human monocytes were isolated from fresh blood as described earlier (Gavrilin et al., 2009) with slight modification. Briefly, peripheral blood mononuclear cells were isolated by density gradient centrifugation using Histopaque, followed by immunomagnetic negative selection using the EasySep Human Monocyte isolation kit. A high purity of CD14+ cells (≥90%) was consistently achieved with this procedure, as confirmed by flow cytometry. The purified monocytes were immediately differentiated into macrophages by treatment with 50 ng/mL M-CSF (PeproTech) for 72 h, and flow cytometric measurement of the surface markers CD64+, CD206+ and CD44 was used to confirm differentiation.” These data are now shown in the new Supplementary Figure S6.

      Comment 9: In the legend of Figure 2, please replace "5 ug of actinomycin D" with the actual concentration used.

      We appreciate your attention to detail and thank you for pointing out the error in the legend of Figure 2. We regret the oversight and have made the suggested change (Line 739).

      Comment 10: In the Discussion, the authors cite the study of CCL2 mRNA stabilization by HuR in mice by Sasaki et al (lines 407-9). Is regulation of CCL2 mRNA by HuR in the mouse relevant to human studies?

      How conserved is the 3'UTR of mouse and human CCL2? Is the rs13900 variant located in a conserved region? How many putative HuR sites are found in the 3'UTR of human and mouse CCL2 3'UTR? Does HuR dimerize (see Pabis et al 2019, NAR)? This information could be added to the Discussion.

      Thank you for this valuable comment. We appreciate the suggestion to include information on HuR dimerization in our discussion. In reporting the overall structure and domain arrangement of HuR, Pabis et al. (2019) identified RRM3 dimerization involving Trp261 as a key requirement for the functional activity of HuR in vitro. This finding provides additional context for understanding HuR’s role in regulating CCL2 expression. We have added the following lines to the Discussion (lines 421-428), acknowledging HuR’s ability to dimerize and citing the relevant references.

      “HuR consists of three RNA recognition motifs (RRMs) that are highly conserved and canonical in nature (Ripin et al., 2019). In the absence of RNA, the three RRMs are flexibly linked, but upon RNA binding they transition to a more compact arrangement. Mutational analysis revealed that HuR function is inseparably linked to RRM3 dimerization and RNA binding. Dimerization enables recognition of tandem AREs by dimeric HuR (Pabis et al., 2019) and explains how this protein family can regulate numerous targets found in pre-mRNAs, mature mRNAs, miRNAs and long noncoding RNAs.”

      We aligned the CCL2 3’UTR sequences from five mammalian species and found that the region flanking the rs13900/HuR binding site is relatively conserved (Author response image 2). Based on PAR-CLIP datasets, there are four HuR binding regions in the human CCL2 3’UTR (Lebedeva et al., 2011). However, the region overlapping rs13900 appears to be predominantly involved in CCL2 regulation (Fan et al., 2011). This information has been included in the Discussion.
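
One way to put a number on "relatively conserved" is the percent identity of each species against human over the aligned window. The sketch below is illustrative only: it assumes gaps are written as '-' and skips gap columns, the sequences are short made-up placeholders (not the real CCL2 3’UTR), and it is not necessarily how the alignment in Author response image 2 was scored.

```python
def percent_identity(ref: str, other: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(ref.upper(), other.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are not scored
        compared += 1
        matches += a == b  # True counts as 1
    return 100.0 * matches / compared if compared else 0.0


# Placeholder aligned strings (hypothetical, NOT real CCL2 sequence data)
alignment = {
    "Hu": "TATTTATTCAGTA",
    "MO": "TATTTATTTAGTA",
    "DO": "TATT-ATTCAGTA",
}
identities = {sp: percent_identity(alignment["Hu"], seq) for sp, seq in alignment.items()}
```

Skipping gap columns keeps an indel from being counted as a mismatch; a stricter score could penalize gaps instead.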

      Author response image 2.

      Cross-species alignment of the CCL2 3’UTR region flanking rs13900, using homologous regions from 5 mammals (Hu, human; CH, chimpanzee; MO, mouse; RA, rat; DO, dog). rs13900 is shown within brackets; Y, pyrimidine.

      Reviewer #2 (Recommendations For The Authors):

      Comment 1: The supplemental figures need appropriate figure legends.

      We regret the oversight and thank the reviewer for bringing it to our attention. We have now included figure legends for the supplemental figures in the revised manuscript.

      Comment 2: The data on LPS-induced CCL2 expression in PBMCs should be represented as a scatter plot with statistical significance to enhance clarity and interpretability.

      We thank the reviewer for this constructive suggestion. In the revised Figure 2A the induction of CCL2 expression by LPS in PBMCs obtained from 6 volunteers is represented as a scatter plot. We have also included individual data points in the updated figure and statistical significance to improve clarity and interpretability.

      Comment 3: The stability of CCL2 mRNA in control cells needs comparison with treated cells for context. The stability of a housekeeping gene (such as GAPDH or ACTB) should always be included as a control in actinomycin D experiments. Clarify the differential stability of rs13900C vs. rs13900T alleles.

      We used 18S rRNA to normalize data in the mRNA stability studies because it is abundant and has been recommended for such studies: in well-controlled studies, it is relatively unaltered following Act D treatment compared with other housekeeping genes (Barta et al., 2023). We also compared the 18S Ct values of the Act D-treated and untreated samples in this study and found them to be comparable (Author response image 3).

      Author response image 3.

      Ct values of 18S rRNA in Act D-treated and control samples in Fig. 2.
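
A reference-gene check of this kind amounts to comparing mean Ct between treated and untreated samples. The sketch below uses hypothetical placeholder Ct values (not the data in Author response image 3) and assumes roughly 100% amplification efficiency, under which a shift of s cycles corresponds to a 2^|s|-fold change in template.

```python
from statistics import mean, stdev


def reference_shift(ct_control, ct_treated):
    """Return mean Ct of each group and the treated-minus-control difference (cycles)."""
    m_control, m_treated = mean(ct_control), mean(ct_treated)
    return m_control, m_treated, m_treated - m_control


# Hypothetical 18S rRNA Ct values (placeholders, not data from this study)
control_ct = [9.8, 10.1, 9.9]
act_d_ct = [10.0, 10.2, 9.9]

m_c, m_t, shift = reference_shift(control_ct, act_d_ct)
# Assuming ~100% efficiency, a shift of s cycles is roughly a 2**|s|-fold change
apparent_fold = 2 ** abs(shift)
spread = (stdev(control_ct), stdev(act_d_ct))  # within-group variability
```

A shift that is small relative to the within-group spread is what "comparable" would look like numerically.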

      Comment 4: In the main text and the methods, the authors state that nascent RNA was obtained in the presence of actinomycin D and EU. However, actinomycin D blocks the transcription of nascent RNAs, therefore the findings in Figure 2C do not reflect nascent RNA

      Please see our response to Reviewer 1, Comment 2. We would like to emphasize that, to assess the differential role of rs13900 in nascent RNA decay, we combined nascent RNA labeling with transcriptional inhibition. Briefly, PBMCs from a heterozygous individual were either left unstimulated or stimulated with LPS and pulsed with 5-ethynyl uridine (EU; 0.2 mM) for 3 h, after which the medium was replaced with EU-free growth medium. RNA was collected at 0, 1, 2, and 4 h after actinomycin D treatment (5 µg/mL) to assess the stability of the nascent RNA.
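
A chase sampled at 0, 1, 2, and 4 h after actinomycin D can be turned into a half-life by assuming first-order decay. In the sketch below, remaining mRNA is computed as 2^-ΔΔCt against the 0 h sample (ΔCt = target Ct minus 18S Ct), and ln(remaining) is fit against time by least squares, giving t½ = ln 2 / k. The Ct values are hypothetical placeholders, ~100% PCR efficiency is assumed, and this is not presented as the authors' exact analysis.

```python
import math


def relative_remaining(ct_target, ct_ref):
    """Fraction of mRNA left at each time point vs the first (t = 0) sample, via 2^-ddCt."""
    d0 = ct_target[0] - ct_ref[0]
    return [2 ** -((t - r) - d0) for t, r in zip(ct_target, ct_ref)]


def half_life(times_h, remaining):
    """Fit ln(remaining) = a - k*t by least squares and return ln(2)/k in hours."""
    ys = [math.log(x) for x in remaining]
    n = len(times_h)
    mx, my = sum(times_h) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in times_h)
    sxy = sum((x - mx) * (y - my) for x, y in zip(times_h, ys))
    k = -sxy / sxx
    if k <= 0:
        raise ValueError("no decay detected")
    return math.log(2) / k


# Hypothetical Ct values (placeholders, not study data)
times = [0.0, 1.0, 2.0, 4.0]
ct_18s = [10.0, 10.0, 10.0, 10.0]
ct_ccl2 = [20.0, 20.5, 21.0, 22.0]

remaining = relative_remaining(ct_ccl2, ct_18s)
hl = half_life(times, remaining)
```

Run separately per allele (or per condition), the two half-lives could then be compared directly.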

      Comment 5: Figure 4A is not clearly described or labeled. What are lanes 2 and 6?

      Figure 4 has now been updated to clearly describe all lanes. Lanes 2 and 6 show the mobility shift observed after incubating whole-cell extracts with oligonucleotide probes bearing rs13900C and rs13900T, respectively.

      Comment 6: Figure 4C and Figure 4D: the charts in Figure 4D do not seem to reflect the changes in Figure 4C. How was the mean variant calculated? How do the authors explain the different quantities in unbound/free RNA in rs13900C compared to rs13900T?

      We appreciate the reviewer’s constructive critique of the RNA EMSA results in Fig. 4C. To address it, we repeated the experiments to analyze differential binding of the rs13900T/C probes with increasing concentrations of recombinant HuR. No degradation or loss of the fluorescent tag was observed for the T-allele probe in the presence of HuR (Author response image 1). This indicates that the C- and T-allele probes have comparable stability that is not affected by increasing concentrations of recombinant HuR. The apparent reduction in unbound T-allele probe in Figure 4C may therefore reflect saturation at higher HuR concentrations rather than degradation. Note also that at a limiting HuR concentration (50 µM), the T-bearing oligoribonucleotide binds more purified HuR (compare lanes 2 & 6 in Author response image 1).
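
Expressing each lane as fraction bound, bound / (bound + free), normalizes for differences in total probe signal between lanes, which bears on the question of unequal free-RNA quantities. The sketch below assumes background-subtracted band intensities from densitometry; the lane names and numbers are hypothetical placeholders.

```python
def fraction_bound(bound: float, free: float) -> float:
    """Fraction of probe signal in the shifted band(s) of one lane."""
    total = bound + free
    if total <= 0:
        raise ValueError("lane has no signal")
    return bound / total


# Hypothetical background-subtracted band intensities per lane: (bound, free)
lanes = {
    "C_probe_low_HuR": (20.0, 80.0),
    "T_probe_low_HuR": (35.0, 65.0),
}
fractions = {name: fraction_bound(b, f) for name, (b, f) in lanes.items()}
```

Plotting fraction bound against HuR concentration for each allele would give titration curves that can be compared on the same scale.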

      Comment 7: Figure 5A does not look like an IP. The authors should show the heavy and light chains and clarify why there is co-precipitation of beta-actin with IgG and HuR. Also, they should include input samples. Figure 5B: given that in a traditional RIP the mRNA is not cross-linked and fragmented, any region of CCL2 mRNA would be amplified, not just the 3'UTR. In other words, Figure 5B can be valuable to show the enrichment of CCL2 mRNA in general, but not the enrichment of a specific region.

      We understand the reviewer’s concern regarding Figure 5A and 5B. Due to sample limitations, we were unable to confirm these results using antibodies against the heavy and light chains. However, co-precipitation of β-actin with IgG and HuR can result from its non-specific binding to protein G; a recent study reported non-specific precipitation of proteins such as p53, p65, and β-actin by protein G or A (Zeng et al., 2022). We also include a figure from the quality-check document provided by MBL Life Sciences for their RIP Assay Kit (RN 1001), which was used in our study. Author response image 4 shows that even pre-clearing the lysate may not remove ubiquitously expressed proteins such as β-actin or GAPDH, which can persist as contaminants in pull-down samples. Hence, the presence of β-actin in the IgG and HuR IP fractions may be due to non-specific interactions with the agarose beads.

      Author response image 4.

      MBL RIP-Assay Kit’s quality check. Quality check of immunoprecipitated endogenous PTBP1 expressed in Jurkat cells. Lane 1: Jurkat (WB-positive cells); Lane 2: Jurkat + normal rabbit IgG; Lane 3: Jurkat + anti-PTBP1.

      We agree with the reviewer that traditional RIP without cross-linking and fragmentation allows amplification of any region of the CCL2 mRNA. However, the higher CCL2 signal in α-HuR immunoprecipitates indirectly reflects enrichment of HuR-associated CCL2 mRNA. Moreover, 3’-UTR-targeting primers were used for amplification to examine HuR binding at this region. We believe this approach indicates that the observed enrichment reflects HuR association with the 3’-UTR rather than with other parts of the transcript.
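
RIP-qPCR enrichment is commonly reported as percent input, with HuR IP compared against the IgG control. The sketch below shows that generic calculation; it is not taken from the MBL kit protocol. The 10% input fraction and all Ct values are hypothetical placeholders, and ~100% PCR efficiency is assumed. With intact (non-fragmented) RNA, the number reports recovery of transcripts that carry the amplified 3’-UTR amplicon.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Percent of input recovered in an IP; the input Ct is first scaled to 100% of lysate."""
    ct_input_adj = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2 ** (ct_input_adj - ct_ip)


# Hypothetical Ct values for a 3'-UTR amplicon (placeholders, not study data); 10% input saved
pi_hur = percent_input(ct_ip=24.0, ct_input=25.0, input_fraction=0.10)
pi_igg = percent_input(ct_ip=27.0, ct_input=25.0, input_fraction=0.10)
fold_over_igg = pi_hur / pi_igg
```

Because both IPs share the same input, the fold over IgG reduces to 2^(Ct_IgG - Ct_HuR).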

      Comment 8: Construct Validation in Luciferase Assays (Figure 6): The authors need to confirm equal transfection amounts of constructs and show changes in luciferase mRNA levels. It would be better to use a dual luciferase construct for internal normalization.

      We thank the reviewer for the comments on the luciferase reporter assay. As stated in the Methods (Line 658), equal amounts (0.5 µg) of each construct were transfected. We chose to normalize reporter activity to total protein concentration rather than use a dual-reporter system, to avoid crosstalk with co-transfected control plasmids; this is now stated in the Materials and Methods section (Lines 662-664). According to the manufacturer, the optimized design of the LightSwitch Assay system used in our study allows a single-reporter design when transfection is highly efficient. We verified the presence of the correct insert in the CCL2 LightSwitch 3’UTR reporter constructs (Author response image 5). We also sequenced the vector backbone of both constructs to rule out any inadvertently introduced mutations.
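
Normalizing to total protein means dividing each well's luciferase reading by that well's protein content before comparing alleles. The sketch below is a minimal illustration with hypothetical relative light unit (RLU) and protein values; it does not reproduce the study's data.

```python
from statistics import mean


def per_ug_protein(rlu, protein_ug):
    """Luciferase signal (RLU) divided by total protein (ug) for each well."""
    if len(rlu) != len(protein_ug):
        raise ValueError("one protein value is needed per well")
    return [r / p for r, p in zip(rlu, protein_ug)]


# Hypothetical replicate wells (placeholders, not study data)
t_norm = per_ug_protein([12000.0, 15000.0], [20.0, 25.0])
c_norm = per_ug_protein([6000.0, 9000.0], [20.0, 30.0])
t_over_c = mean(t_norm) / mean(c_norm)
```

Per-well normalization corrects for differences in cell number or lysis efficiency, but unlike a co-transfected reporter it does not correct for well-to-well transfection efficiency.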

      Author response image 5.

      Schematic of the LightSwitch 3’UTR vector. (A) Vector information. The vector contains a multiple cloning site (MCS) downstream of the Renilla luciferase gene (RenSP). The human CCL2 3’UTR is cloned into the MCS downstream of the reporter gene, so that it becomes part of a hybrid transcript in which the luciferase coding sequence is fused to the CCL2 UTR sequence. Constructs containing the rs13900C or rs13900T allele were generated by site-directed mutagenesis of the CCL2 LightSwitch 3’UTR reporter and validated by Sanger sequencing. (B&C) Sequence chromatograms of the constructs containing the CCL2 3’UTR insert, showing rs13900C and rs13900T, respectively. The results confirm the fidelity of the constructs used in the reporter assay.

      Comment 9: Polysome Data Presentation: The authors should present the distribution of luciferase mRNA (rs13900T and rs13900C) in all fractions separately and include data on the translation of a control like ACTB or GAPDH.

      Since our assessment of CCL2 allele-specific enrichment in the polysome fractions from MDMs of heterozygous donors did not yield a consistent pattern of differential loading (Supplementary Table 3), we used 3’UTR reporter-based assays to estimate the impact of the rs13900T and C alleles on overall translational output (translatability). Translatability was calculated as luciferase activity normalized to luciferase mRNA levels, after adjusting activity for total protein and mRNA for 18S rRNA, using a previously reported method (Zhang et al., 2017). Because relative allele enrichment in polysome fractions was not measured in our in vitro reporter assays, we cannot present the distribution of luciferase mRNA across individual fractions. Author response image 6 shows the proportion of CCL2 mRNA in the cytosolic, monosome, and polysome fractions of MDM lysates from heterozygous donors, along with 18S rRNA quantification.
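
The translatability ratio can be illustrated as protein-normalized luciferase activity divided by 18S-normalized luciferase mRNA (2^-ΔCt). This is a minimal sketch of that general idea under the assumption of ~100% PCR efficiency; it is not claimed to be the exact calculation of Zhang et al. (2017), and all inputs are hypothetical placeholders.

```python
def translatability(activity_per_ug: float, ct_luc: float, ct_18s: float) -> float:
    """Protein-normalized luciferase activity divided by 18S-normalized luciferase mRNA (2^-dCt)."""
    relative_mrna = 2 ** -(ct_luc - ct_18s)
    return activity_per_ug / relative_mrna


# Hypothetical inputs (placeholders, not study data)
t_allele = translatability(600.0, ct_luc=24.0, ct_18s=12.0)
c_allele = translatability(300.0, ct_luc=23.0, ct_18s=12.0)
t_over_c = t_allele / c_allele
```

In this toy case, the T construct yields twice the activity from half as much mRNA, so its translatability is four-fold higher. This shows why activity alone can understate a per-transcript difference.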

      Author response image 6.

      Determination of rs13900C/T allelic enrichment in polysome fractions and its effect on polysome loading. Polysome profiles were obtained by sucrose gradient centrifugation of macrophages before and after stimulation with LPS (1 µg/mL) for 3 h. (A&B) CCL2 mRNA shifts from monosome-associated fractions to heavier polysomes following LPS stimulation, indicating increased translation efficiency. (C&D) In contrast, the distribution of 18S rRNA shows no significant shift after LPS treatment (mean ± SEM, n=4). The percentage of mRNA loaded on polysomes was calculated using the ΔCt method (mean ± SEM, n=4). (E&F) CCL2 AEI measurement in polysomes of macrophages from heterozygous donors (n=2). Genomic DNA and cDNA were subjected to Sanger sequencing, and the peak heights of both alleles were used to determine the relative abundance of each allele.
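
A Ct-based percentage distribution across gradient fractions can be computed by converting each fraction's Ct to a relative amount (2^-Ct) and expressing it as a share of the total. The sketch below assumes the same volume of every fraction was reverse-transcribed, no spike-in normalization, and ~100% efficiency. The fraction names and Ct values are hypothetical placeholders, not a reconstruction of the authors' ΔCt pipeline.

```python
def percent_distribution(ct_by_fraction: dict) -> dict:
    """Share of a transcript in each gradient fraction from raw Ct values (2^-Ct), as a percentage."""
    amounts = {name: 2 ** -ct for name, ct in ct_by_fraction.items()}
    total = sum(amounts.values())
    return {name: 100.0 * amt / total for name, amt in amounts.items()}


# Hypothetical Ct values per fraction (placeholders, not study data)
ccl2_ct = {"cytosolic": 25.0, "monosome": 24.0, "light_polysome": 23.0, "heavy_polysome": 23.0}
dist = percent_distribution(ccl2_ct)
polysome_pct = dist["light_polysome"] + dist["heavy_polysome"]
```

Running the same function on 18S rRNA Ct values gives the control distribution for comparison.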

      Comment 10: Please explain in detail how primary monocytes were transfected with siRNAs for more than 72 hours. Typically, primary monocytes are very hard to transfect, have a very limited lifespan in culture (around 48 hours), and show a high level of cell death upon transfection. If monocytes were differentiated into macrophages, explain in detail how it was done and provide supporting citations from the literature.

      We agree that transfecting primary monocytes is challenging, given their limited lifespan in culture and susceptibility to cell death following transfection, and we apologize for not describing the lentiviral transduction of primary macrophages in sufficient detail in the Methods. To overcome these limitations, we used monocytes undergoing differentiation into macrophages rather than fully differentiated macrophages. Cells were transduced using a slightly modified version of the method described by Plaisance-Bonstaff et al. (2019). Briefly, monocytes were purified by negative selection from PBMCs obtained from donors homozygous for rs13900C or rs13900T. After purification, cells were seeded in 24-well plates at 0.5 x10<sup>6</sup> cells per well and cultured in medium supplemented with 50 ng/mL M-CSF (Fig. S6 and Fig. S7). After 24 h, ready-to-use GFP-tagged pCMV6-HuR or CMV-null lentiviral particles (Amsbio, Cambridge, MA) were transduced into 0.5 x10<sup>6</sup> cells in the presence of polybrene (60 µg/mL) at an MOI of 1. Cells were stimulated with LPS for 3 h and processed for HuR and CCL2 expression 72 h after transduction. These data are now shown in the new Supplementary Figure S7.
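
The multiplicity of infection fixes how much viral stock is added: the number of transducing units (TU) equals MOI × cell number, and the volume is that number divided by the stock titer. The cell number and MOI below come from the text. The titer is a hypothetical placeholder, since the actual lot titer is not given here.

```python
def lentivirus_volume_ul(cells: float, moi: float, titer_tu_per_ml: float) -> float:
    """Volume of viral stock (uL) that delivers `moi` transducing units per cell."""
    if titer_tu_per_ml <= 0:
        raise ValueError("titer must be positive")
    return cells * moi * 1000.0 / titer_tu_per_ml  # 1 mL = 1000 uL


# 0.5 x 10^6 cells at MOI 1 (from the text); the titer is a hypothetical placeholder
volume = lentivirus_volume_ul(cells=0.5e6, moi=1.0, titer_tu_per_ml=1e7)
```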

      Comment 11: The authors should prove the binding of HuR to the 3'UTR of CCL2 not only in vitro but also in cells. For this aim, a CLIP including RNA fragmentation followed by RT-PCR or sequencing would be more informative than a RIP. It would be helpful also to demonstrate the different binding to the 3'UTR variants (rs13900C vs. rs13900T).

      We thank the reviewer for this valuable suggestion on validating HuR binding to the 3’UTR in cells. Several independent datasets, including CLIP studies, have already demonstrated that HuR binds to the CCL2 3’UTR, including the region spanning the rs13900 locus; we have summarized these studies in Supplementary Table 2. Due to sample limitations, we were unable to confirm these results in new experiments. The existing data and the experimental evidence provided in this manuscript strongly suggest that HuR binds within the 3’UTR. In addition, a previous study (Fan et al., 2011) showed that only the first 125 bp of the CCL2 3’UTR, which flank rs13900, bound HuR strongly, whereas the CCL2 coding region and other regions of the 3’UTR did not. This further suggests that HuR binding to CCL2 is localized to the 3’UTR region flanking rs13900. Please note that the primers used to amplify the RIP material were 3’UTR-specific.

      Comment 12: To quantify nascent RNA, Figure 2C should be replaced by new experiments. To label nascent RNA, authors can perform a run on/run-off experiments only with EU, without actinomycin D. As aforementioned, ActD blocks the transcription of new RNA, therefore is not useful for studying nascent RNA.

      We thank the reviewer for the suggestion and would like to emphasize that, when measuring the rs13900C/T allelic ratio in nascent RNA, the experimental setup evaluated AEI both in the presence and in the absence of the transcriptional inhibitor actinomycin D. The data presented in Figure 2C show that AEI is amplified in the presence of actinomycin D compared with no actinomycin D treatment. This supports our hypothesis that rs13900T confers greater stability on the CCL2 message. We apologize for not mentioning the non-Act D treatment in the Methods; the necessary changes have been made in the revised manuscript (Lines 553-63).
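
The AEI comparison rests on a simple ratio. The cDNA T/C peak-height ratio at the heterozygous site is divided by the genomic DNA T/C ratio, which corrects for unequal sequencing signal between the two bases. The sketch below assumes peak height is proportional to template amount. All peak heights are hypothetical placeholders.

```python
def aei(cdna_t: float, cdna_c: float, gdna_t: float, gdna_c: float) -> float:
    """Allelic expression imbalance: cDNA T/C peak-height ratio divided by the gDNA T/C ratio."""
    return (cdna_t / cdna_c) / (gdna_t / gdna_c)


# Hypothetical Sanger peak heights (placeholders, not study data)
gdna = (800.0, 800.0)                    # (T, C) peak heights in genomic DNA
aei_no_actd = aei(900.0, 600.0, *gdna)   # nascent RNA without actinomycin D
aei_actd = aei(960.0, 400.0, *gdna)      # nascent RNA after actinomycin D chase
amplification = aei_actd / aei_no_actd
```

A T/C ratio above 1 that grows after transcription is blocked is what would be expected if the T-allele transcript decays more slowly than the C-allele transcript.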

      Comment 13: The authors should also investigate the role of TIA1 as a potential RBP and explore the possibility that TIA1 may interact more with the C allele to suppress translation.

      Based on existing studies, we highlighted the importance of RNA-binding proteins such as TIA1 and U2AF56 that may interact with the CCL2 transcript (Lines 408-09). However, exploring TIA1 binding and its functional consequences is beyond the scope of the current study. We thank the reviewer for this comment; this aspect will be pursued in future studies.

      Comment 14: It would be informative if the authors included study limitations and potential clinical implications of these findings, particularly regarding therapeutic approaches targeting CCL2.

      We note that the submitted manuscript did include the limitations of our study; they were discussed at the relevant points rather than in a separate section. For instance, Line 398 emphasizes the need for in-depth studies of the association between rs13900 and the canonical CCL2 transcript. The need for additional studies on SNP-induced structural changes in RNA and their implications for RBP accessibility is highlighted at Lines 417-419. The inconclusive results on differential polysome loading, and the need for further research on the impact of rs13900 on CCL2 translatability in primary cells, are noted at Lines 457-459. At Lines 484-485, we mention our further studies exploring differential binding of HuR to other regions of the CCL2 3’UTR.

      Multiple studies have proposed functional interference with HuR as a novel therapeutic strategy, particularly in cancer, inflammation, neurodegeneration, and autoimmune disorders. These approaches include inhibitors such as MS-444, KH-3, and CMLD-2 that disrupt the interaction between HuR and AREs or the mRNAs of target genes involved in disease pathology (Chaudhary et al., 2023; Fattahi et al., 2022; Lang et al., 2017; Liu et al., 2020; Wang et al., 2019; Wei et al., 2024), offering a potential new avenue for treatment. Our findings provide unique insights into the regulation of CCL2 expression by both rs13900 and HuR. We strongly believe that rs13900 and HuR represent new druggable targets for M/M-mediated disorders such as inflammatory diseases, cancer, and cardiovascular diseases. The potential clinical implications are now discussed in the revised manuscript (Lines 487-494).

      References

      Barta, N., Ordog, N., Pantazi, V., Berzsenyi, I., Borsos, B.N., Majoros, H., Pahi, Z.G., Ujfaludi, Z., Pankotai, T., 2023. Identifying Suitable Reference Gene Candidates for Quantification of DNA Damage-Induced Cellular Responses in Human U2OS Cell Culture System. Biomolecules 13.

      Chaudhary, S., Appadurai, M.I., Maurya, S.K., Nallasamy, P., Marimuthu, S., Shah, A., Atri, P., Ramakanth, C.V., Lele, S.M., Seshacharyulu, P., Ponnusamy, M.P., Nasser, M.W., Ganti, A.K., Batra, S.K., Lakshmanan, I., 2023. MUC16 promotes triple-negative breast cancer lung metastasis by modulating RNA-binding protein ELAVL1/HUR. Breast Cancer Res 25, 25.

      Fan, J., Ishmael, F.T., Fang, X., Myers, A., Cheadle, C., Huang, S.K., Atasoy, U., Gorospe, M., Stellato, C., 2011. Chemokine transcripts as targets of the RNA-binding protein HuR in human airway epithelium. J Immunol 186, 2482-2494.

      Fattahi, F., Ellis, J.S., Sylvester, M., Bahleda, K., Hietanen, S., Correa, L., Lugogo, N.L., Atasoy, U., 2022. HuR-Targeted Inhibition Impairs Th2 Proinflammatory Responses in Asthmatic CD4(+) T Cells. J Immunol 208, 38-48.

      Hubal, M.J., Devaney, J.M., Hoffman, E.P., Zambraski, E.J., Gordish-Dressman, H., Kearns, A.K., Larkin, J.S., Adham, K., Patel, R.R., Clarkson, P.M., 2010. CCL2 and CCR2 polymorphisms are associated with markers of exercise-induced skeletal muscle damage. J Appl Physiol (1985) 108, 1651-1658.

      Intemann, C.D., Thye, T., Forster, B., Owusu-Dabo, E., Gyapong, J., Horstmann, R.D., Meyer, C.G., 2011. MCP1 haplotypes associated with protection from pulmonary tuberculosis. BMC Genet 12, 34.

      Jao, C.Y., Salic, A., 2008. Exploring RNA transcription and turnover in vivo by using click chemistry. Proc Natl Acad Sci U S A 105, 15779-15784.

      Johnson, A.D., Zhang, Y., Papp, A.C., Pinsonneault, J.K., Lim, J.E., Saffen, D., Dai, Z., Wang, D., Sadee, W., 2008. Polymorphisms affecting gene transcription and mRNA processing in pharmacogenetic candidate genes: detection through allelic expression imbalance in human target tissues. Pharmacogenet Genomics 18, 781-791.

      Kasztelewicz, B., Czech-Kowalska, J., Lipka, B., Milewska-Bobula, B., Borszewska-Kornacka, M.K., Romanska, J., Dzierzanowska-Fangrat, K., 2017. Cytokine gene polymorphism associations with congenital cytomegalovirus infection and sensorineural hearing loss. Eur J Clin Microbiol Infect Dis 36, 1811-1818.

      Lang, M., Berry, D., Passecker, K., Mesteri, I., Bhuju, S., Ebner, F., Sedlyarov, V., Evstatiev, R., Dammann, K., Loy, A., Kuzyk, O., Kovarik, P., Khare, V., Beibel, M., Roma, G., Meisner-Kober, N., Gasche, C., 2017. HuR Small-Molecule Inhibitor Elicits Differential Effects in Adenomatosis Polyposis and Colorectal Carcinogenesis. Cancer Res 77, 2424-2438.

      Lebedeva, S., Jens, M., Theil, K., Schwanhausser, B., Selbach, M., Landthaler, M., Rajewsky, N., 2011. Transcriptome-wide analysis of regulatory interactions of the RNA-binding protein HuR. Mol Cell 43, 340-352.

      Liu, S., Huang, Z., Tang, A., Wu, X., Aube, J., Xu, L., Xing, C., Huang, Y., 2020. Inhibition of RNA-binding protein HuR reduces glomerulosclerosis in experimental nephritis. Clin Sci (Lond) 134, 1433-1448.

      Mao, F., Xiao, L., Li, X., Liang, J., Teng, H., Cai, W., Sun, Z.S., 2016. RBP-Var: a database of functional variants involved in regulation mediated by RNA-binding proteins. Nucleic Acids Res 44, D154-163.

      Pabis, M., Popowicz, G.M., Stehle, R., Fernandez-Ramos, D., Asami, S., Warner, L., Garcia-Maurino, S.M., Schlundt, A., Martinez-Chantar, M.L., Diaz-Moreno, I., Sattler, M., 2019. HuR biological function involves RRM3-mediated dimerization and RNA binding by all three RRMs. Nucleic Acids Res 47, 1011-1029.

      Paulsen, M.T., Veloso, A., Prasad, J., Bedi, K., Ljungman, E.A., Tsan, Y.C., Chang, C.W., Tarrier, B., Washburn, J.G., Lyons, R., Robinson, D.R., Kumar-Sinha, C., Wilson, T.E., Ljungman, M., 2013. Coordinated regulation of synthesis and stability of RNA during the acute TNF-induced proinflammatory response. Proc Natl Acad Sci U S A 110, 2240-2245.

      Pham, M.H., Bonello, G.B., Castiblanco, J., Le, T., Sigala, J., He, W., Mummidi, S., 2012. The rs1024611 regulatory region polymorphism is associated with CCL2 allelic expression imbalance. PLoS One 7, e49498.

      Plaisance-Bonstaff, K., Faia, C., Wyczechowska, D., Jeansonne, D., Vittori, C., Peruzzi, F., 2019. Isolation, Transfection, and Culture of Primary Human Monocytes. J Vis Exp.

      Ripin, N., Boudet, J., Duszczyk, M.M., Hinniger, A., Faller, M., Krepl, M., Gadi, A., Schneider, R.J., Sponer, J., Meisner-Kober, N.C., Allain, F.H., 2019. Molecular basis for AU-rich element recognition and dimerization by the HuR C-terminal RRM. Proc Natl Acad Sci U S A 116, 2935-2944.

      Russo, J., Heck, A.M., Wilusz, J., Wilusz, C.J., 2017. Metabolic labeling and recovery of nascent RNA to accurately quantify mRNA stability. Methods 120, 39-48.

      Wang, J., Hjelmeland, A.B., Nabors, L.B., King, P.H., 2019. Anti-cancer effects of the HuR inhibitor, MS-444, in malignant glioma cells. Cancer Biol Ther 20, 979-988.

      Wei, L., Kim, S.H., Armaly, A.M., Aube, J., Xu, L., Wu, X., 2024. RNA-binding protein HuR inhibition induces multiple programmed cell death in breast and prostate cancer. Cell Commun Signal 22, 580.

      Zeng, X., Zeng, W.H., Zhou, J., Liu, X.M., Huang, G., Zhu, H., Xiao, S., Zeng, Y., Cao, D., 2022. Removal of nonspecific binding proteins is required in co-immunoprecipitation with nuclear proteins. Biotechniques 73, 289-296.

      Zhang, X., Chen, X., Liu, Q., Zhang, S., Hu, W., 2017. Translation repression via modulation of the cytoplasmic poly(A)-binding protein in the inflammatory response. Elife 6.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The primary weakness of the paper concerns its conclusion of having generated "homogenous mature microglia", partly based on the RNAseq analysis. However, the comparison of gene profiles was carried out only between "hiPSC-derived mature microglia" and the proliferating myeloid progenitors. While the transcriptome profiles revealed a trend of enrichment of microglia-like gene expression in "hiPSC-derived mature microglia" compared to proliferating myeloid progenitors, this is not sufficient to claim they are "mature microglia". It is important that one carries out a comparative analysis of the RNAseq data with those of primary human microglia, which may be done by leveraging the public database. To convincingly claim these cells are mature microglia, questions need to be addressed including how similar the molecular signatures of these cells are compared with the fully differentiated primary microglia cell or if they remain progenitor-like or take on mosaic properties, and how they distinguish from macrophages.

      We greatly appreciate the insightful comments and suggestions from the reviewers, which were instrumental in enhancing our data analysis and organization. In response to the feedback, we have updated the terminology from “mature microglia” to simply “microglia” while clarifying in our text that these are fully differentiated microglia under single-type cell culture conditions.

      Guided by the reviewer's advice, we incorporated RNA-seq data from human brain microglia studies by the laboratories of Dr. Poon and Dr. Blurton-Jones (Abud et al., Neuron, 2017) and Dr. Huitinga (van der Poel et al., Nat Commun, 2019). We then compared the gene expression profiles of our fully differentiated hiPSC-derived microglia with those of fetal and adult brain microglia (see Fig. 2 Suppl. B, C, and D; Suppl. Tables 1 and 2). The correlation analysis revealed that our hiPSC-derived microglia closely resemble fetal and adult brain microglia and differ significantly from monocytes and inflammatory monocytes.
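
A sample-to-sample correlation of this kind is often computed as a Pearson correlation of log2(TPM + 1) values over the genes shared by both datasets. The response does not state which measure or gene set was used, so the choice below is an assumption; Spearman rank correlation is a common alternative. The TPM values are hypothetical placeholders.

```python
import math


def log_expr(tpm):
    """log2(TPM + 1) transform, often applied before correlating RNA-seq profiles."""
    return [math.log2(x + 1.0) for x in tpm]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical TPM values for the same four genes in two samples (placeholders)
ipsc_mg = [0.0, 1.0, 3.0, 7.0]
brain_mg = [1.0, 3.0, 7.0, 15.0]
r = pearson(log_expr(ipsc_mg), log_expr(brain_mg))
```

Computing r for every pair of samples (hiPSC-derived microglia, fetal and adult microglia, monocytes) yields the matrix that a correlation heatmap displays.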

      (2) While the authors attempted to demonstrate the functional property of "hiPSC-derived mature microglia" in culture, they used LPS challenge, which is an inappropriate assay. This is because human microglia respond poorly to LPS alone but need to be activated by a combination of LPS with other factors, such as IFNγ. Their data that "hiPSC-derived mature microglia" showed robust responses to LPS indeed implicates that these cells do not behave like mature human microglia.

      We appreciate the feedback. In response, we cultured hiPSC-derived microglia and treated them with IFNγ, LPS, or the combination IFNγ+LPS (Figure 3 suppl.). The IFNγ+LPS combination notably enhanced the expression of IL1a, IL1b, TNFa, CCL8, and CXCL10, whereas IL6 and CCL2 levels remained unchanged. IFNγ alone significantly elevated the expression of TNFa, CCL8, CXCL10, and CCL2. These outcomes align with those reported by Rustenhoven et al. (Sci Rep, 2016), suggesting that the functional responses of our hiPSC-derived microglia closely mirror those of primary human adult microglia.

      (3) The resolution of Figs. 4 - 6 is so low that even some of the text and labels are hardly readable. Based on the morphology shown in Fig. 4 and the statement in line 147, these hiPSC-derived "cells altered their morphology to a rounded shape within an hour of incubation and rapidly internalized the fluorescent-labeled particles". This is a peculiar response. Usually, microglia do not respond to fluorescent-labeled zymosan by turning into a rounded shaped within an hour when they internalize them. Such a behavior usually implicates weak phagocytotic capacity.

      Thank you for your insightful comments. During submission, the PDF version of the main text was converted online, resulting in low-quality output; we have since replaced it with a high-resolution version. The changes in cell morphology following zymosan phagocytosis may be attributable to the high zymosan concentration used (2 mg/mL). We assessed the effect of zymosan concentration on the morphology of hiPSC-derived microglia (Figure 4 suppl B) and found that the cells adopt an amoeboid, rounded shape at zymosan concentrations above 20 µg/mL. To clarify this point, we have amended the text to read: "The cells altered their morphology and rapidly internalized the fluorescent-labeled particles."

      (4) Data presented in Fig. 5 are not very convincing to support that transplanted cells were immunopositive for "human CD11b (Fig.5C), as well as microglia signature markers P2ry12 and TMEM119 (Fig.5D)" (line 167). The resolution and magnification of Fig. 5D is too low to tell the colocalization of tdT and human microglial marker immunolabeling. In the flat-mount images (C, I), hCD11b immunolabeling is not visible in the GCL or barely visible in the IPL. This should be discussed.

      We are grateful for the reviewer's comments. As mentioned above, the low image quality was due to the online conversion of the PDF version. We have now submitted both high-quality PDF and Word versions for the reviewer's assessment, in which the colocalization of tdT with human P2ry12 and TMEM119 is clearly visible. We have also updated the hTMEM119 staining images in Figure 5D. The hCD11b staining results agree with those of mouse CD11b staining, with notably stronger staining of microglia in the outer plexiform layer (OPL). Whether this reflects a staining issue, differences in CD11b expression between microglia in the OPL and the ganglion layer (GL), or sample differences due to varying conditions is not yet clear and warrants further investigation.

      (5) Microglia respond to injury by becoming active and lose their expression of the resting state microglial marker, such as P2ry12, which is used in Fig. 6 for detection of migrated microglia. To confirm that these cells indeed respond to injury like native microglia, one should check for activated microglial markers and induction of pro-inflammatory cytokines in the sodium iodate-injury model.

      The reviewer's insights are spot-on. We extracted mRNA from preserved retinas and reverse-transcribed it to cDNA for qRT-PCR using human-specific primers, as detailed in the updated Table 5. Three days after retinal pigment epithelium (RPE) injury, the transplanted hiPSC-derived microglia showed increased expression of inflammatory cytokines and upregulation of genes related to phagocytosis, migration, and adhesion. Conversely, the expression of microglia-specific signature genes and neurotrophic factors decreased (Figure 7 suppl.).
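
Relative expression from qRT-PCR is typically reported with the Livak 2^-ΔΔCt method: the target Ct is normalized to a reference gene in each condition, and the injured sample is then compared with the control. The sketch below assumes both assays have ~100% efficiency and that both target and reference use human-specific primers, so that mouse retinal RNA does not contribute. The reference gene is not named in the response; all Ct values are hypothetical placeholders.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Livak 2^-ddCt fold change of a target gene (assumes ~100% efficiency for both assays)."""
    d_treated = ct_target_treated - ct_ref_treated
    d_control = ct_target_control - ct_ref_control
    return 2 ** -(d_treated - d_control)


# Hypothetical Ct values with human-specific primers (placeholders, not study data)
fc = fold_change_ddct(ct_target_treated=23.0, ct_ref_treated=15.0,
                      ct_target_control=25.0, ct_ref_control=15.0)
```

Here a target that crosses threshold two cycles earlier after injury, with an unchanged reference, gives a four-fold increase.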

      Reviewer #1 (Recommendations For The Authors):

      Line 52: "Microglia cell repopulation research suggests that: 1) if no injury or infection occurs, retinal microglia cells can sustain their homeostasis indefinitely" - this statement is too strong or delivers a confusing message; it needs clarification or to be backed up by evidence. Recent single cell RNA sequencing analyses suggest that even under a normal condition, residential microglia do not present as a single homeostatic cell cluster, rather a subpopulation of activated inflammatory microglia are constantly detectable in the normal retina. This is likely because normal retinal neurons can be stressed due to various reasons, such as the temporal accumulation of misfolded proteins, exposed to strong light, or ageing, etc.

      We appreciate the comments. We changed the sentence to read, "Microglia cell repopulation research suggests that: 1) retinal resident microglia cells can sustain their population with the local dividing and migration if any perturbations do not exceed the threshold of the recovery speed by local neighbor microglia cells."

      Line 83: "we applied an appropriate protocol for culturing human iPSC-derived microglia cells" - it would be more appropriate if the word "appropriate" can be replaced by either "unique" or a phrase like "we adopted a (previously published) protocol...".

      Thanks! We changed it to “We modified a previously published protocol to culture human iPSC-derived microglia cells.”

      Fig. 1F,G: A method of flow cytometry will provide more comprehensive cell quantification for percentages of positively labeled cells than cell counts under high magnification confocal images.

      Thanks for the comments! We agree with the reviewer. Given the experimental resources available, quantification of the confocal images provides a reasonable assessment. We will perform flow cytometry analysis in future experiments.

      Reviewer #2 (Public review):

      Weaknesses:

      Gene expression analysis of mature microglia cells should be better interpreted, and it would be beneficial to compare the iPSC-derived microglia gene set to a human microglial cell line (for example, HMC3) instead of myeloid progenitor cells.

      The way that the manuscript has been written, unfortunately, is not optimal. I recommend that the entire manuscript be edited and proofread in English. The text contains spelling and grammar mistakes, and the manuscript is inconsistent in several parts. The manuscript should also be revised for a scientific paper format.

      We appreciate the reviewer's comments and have taken them into consideration along with similar inquiries from Reviewer 1. Following the suggestions, we compared the gene expression profiles of our hiPSC-derived microglia with those of fetal and adult human brain microglia, as shown in the updated Fig. 2 Suppl. B, C and D, and in Suppl. Tables 1 and 2. The correlation analysis demonstrated that the hiPSC-derived microglia cells closely resemble fetal and adult brain microglia and differ significantly from monocytes and inflammatory monocytes. Additionally, we have revised the manuscript to adhere more closely to the conventional scientific format.

      Reviewer #2 (Recommendations For The Authors):

      Specific suggestions for improvement:

      - Regarding the characterization of human iPSC-derived microglia, P2RY12 is a general hematopoietic cell marker. One cannot judge the maturity of microglia only by P2RY12 expression (for example, line 261). The expression of more specific markers such as TMEM119 and PROS1 should be studied and discussed.

      We are thankful for the reviewer's valuable feedback. In response:

      We have removed the term "mature" and clarified that the hiPSC-derived microglia we studied are fully differentiated within single-type cell culture conditions.

      We performed a comparative analysis of the gene expression profiles between our hiPSC-derived microglia and microglia from human brains, as illustrated in the updated Fig.2. Suppl. B, C and D. The results affirm that hiPSC-derived microglia closely resemble human fetal and adult microglia.

      We noted that the expression of TMEM119 in hiPSC-derived microglia under in vitro single-type cell culture conditions is notably low, as shown in Author response image 1A. This suggests that the stimulatory factors in our single-type cell culture may not be sufficient to induce TMEM119 expression in microglia. TMEM119 expression may instead require the retinal environment or interactions with neurons and/or other glial cells. This mirrors the behavior of peripheral monocytes that infiltrate the tissue under pathological conditions: they initially lack TMEM119 but later differentiate into microglia-like macrophages that express it, as reported by Ma et al. in Sci Rep (2017).

      Additionally, our findings suggest that PROS1 is not uniquely characteristic of microglia but is expressed across a variety of cell types. Within our specific culture conditions, we noted a higher expression of PROS1 in microglial progenitor cells, as shown in Author response image 1B and C.

      Author response image 1.

      - In Figure 2, Part E, the names of the genes or pathways in the figure are not clear, and are these genes the set that are the most differentially expressed between iPSCs-derived microglia and MPC? The analysis needs more explanation.

      We regret any confusion caused by our previous explanation. To clarify, we compiled a list of microglia-enriched genes from the work of the Barres laboratory (Bennett et al., Proc Natl Acad Sci U S A, 2016) and from our own RNA sequencing data of mouse retinal microglia, identifying a total of 130 genes predominantly expressed in microglia (Suppl. Table 3). We then applied this gene list to our hiPSC-derived microglia RNA sequencing data, which identified 71 microglia-specific genes. These 71 genes were subjected to Ingenuity Pathway Analysis (IPA) to visualize the signaling pathways involved. Details of these microglia genes can be found in the updated Suppl. Table 3.

      - Lines 124 to 128 mention that high expression of Stat3, IL1b, and IL6 and their central role in pathway analysis emphasize the efficiency of the maturation protocol. Regarding the fact that Stat3, IL1b, and IL6 are contributors to proinflammatory pathways, it is not convincing that the high expression of these genes in iPSC-derived microglia demonstrates the efficiency of the maturation protocol, given that microglia are not stimulated.

      Thanks for the comments! We added sentences describing the comparison between hiPSC-derived microglia and human brain microglia. We have also replaced “mature” with “functional.” The sentence now reads, “Thus, our method of obtaining differentiated microglia is a reliable method to generate a large number of homogenous functional microglia cells.”

      - Statistical analysis is missing for some graphs, for example, figures 1-3 and 5.

      We appreciate the comments. We have added the statistical results in the revised version.

      - The legend for Figure 3 needs to be rewritten. The graphs or applied assays should be explained in the legend, not the interpretation of the data.

      The legend was rewritten.

      - There is no Figure 3 in the supplement figures file.

      We added Figure 3. Suppl.

      - hTMEM119 staining in Figure 5, Part D, is mostly background. Please provide another image.

      The images lost clarity during online conversion because of their low resolution. We replaced them with new hTMEM119 staining images in Figure 5D.

      - In line 176, Figure 5I is not mentioned.

      Thank you very much! We added a reference to Figure 5I.

      - Lines 241 to 244 state that more than 50% of the AMD-associated genes are highly expressed in retinal microglia according to Fig. discussion suppl A & B. It is not clear that the gene set that was used for analysis is from a healthy retinal microglia or AMD-related ones. Please explain precisely.

      Thank you for your feedback. The gene list we referenced originates from a Genome-Wide Association Study (GWAS) that compared patients with Age-related Macular Degeneration (AMD) to healthy cohorts. We did not directly utilize this list in our experiments but referred to it to underscore the importance of microglia cells in the context of AMD.

      Some of the English proofreading and manuscript format comments:

      Line 805: Iba1 is written in lowercase. Is it human IBA1? It is not consistent with the way it is written in the text (in line 117, for example).

      Thank you for pointing out the error. We have standardized all instances to “Iba1”. The Iba1 antibody we used throughout is from Wako (#019–19741), which labels both mouse and human microglial cells.

      Line 814: microglia-enriched gene expression instead of microglia-enrich gene expression

      Thank you! We changed it.

      Line 345: Starting a sentence with lower case letter.

      Thank you! We changed it.

      Line 342: Myeloid lineage instead of myeloid cell linage.

      Thank you! We changed it.

      Line 815: What does FPKM stand for? The abbreviations should be explained.

      FPKM stands for Fragments Per Kilobase of transcript per Million mapped reads. We have defined the abbreviation in the text.
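      For reference, the standard definition of FPKM for a transcript i can be written as follows. The symbols are illustrative notation, not the manuscript's own: X_i is the number of fragments mapped to transcript i, l_i is the transcript length in bases, and N is the total number of mapped fragments in the library.

      ```latex
      \mathrm{FPKM}_i = \frac{X_i}{\left(l_i / 10^{3}\right)\left(N / 10^{6}\right)} = \frac{X_i \times 10^{9}}{l_i \, N}
      ```

      As a hypothetical example, 500 fragments mapped to a 2,000-base transcript in a library of 20 million mapped fragments gives 500 / (2 × 20) = 12.5 FPKM.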

      Line 309: The manuscript has occasionally referred to PLX-5622 without a minus. Please follow a uniform format.

      We changed all “PLX5622” to “PLX-5622”.

      Lines 327-331: should be rewritten.

      The mentioned paragraph was rewritten.

      Lines 335-340: should be rewritten.

      The mentioned sentence was rewritten.

      Line 135: "qRT-PCR" instead of "QPCR", as it is also mentioned in the methods and material. The correction also applies to all the QPCRs in the text.

      We replaced “QPCR” with “qRT-PCR” throughout the text.

      Figure 3: Graph B should be right side of graph A

      Image descriptions: It is better to place the image labels on the left side of the image, for example, in Figure 5B (GL, IPL and OPL).

      Thanks for the suggestion. We changed the image organization as per the reviewer’s advice.

      Lines 258 to 260 in the discussion have also been repeated with the same words in the introduction.

      The mentioned paragraph was rewritten.

      Lines 327-331 should be rewritten.

      The mentioned paragraph was rewritten.

      Lines 335-340 should be rewritten.

      The mentioned paragraph was rewritten.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Comments

      Reviewer 1

      (1) Despite the well-established role of Netrin-1 and UNC5C in guiding embryonic commissural axons, it remains unclear which cell type(s) express Netrin-1 or UNC5C in the dopaminergic axons and their targets. For instance, the data in Figure 1F-G and Figure 2 are quite confusing. Are Netrin-1 and UNC5C expressed in all cell types or only in dopamine-positive neurons in these two mouse models? It will also be important to provide quantitative assessments of UNC5C expression in dopaminergic axons at different ages.

      Netrin-1 is a secreted protein, and in this manuscript we did not examine which cell types express Netrin-1. This question is not the focus of the study, and we consider it irrelevant to the main issue we are addressing, which is where Netrin-1+ cells are present within the forebrain regions we examined. At the reviewer’s request, we include below images showing Netrin-1 protein and Netrin-1 mRNA expression in the forebrain. In Author response image 1 below, we show a high-magnification immunofluorescent image of a coronal forebrain section showing Netrin-1 protein expression.

      Author response image 1.

      This confocal microscope image shows immunofluorescent staining for Netrin-1 (green) localized around cell nuclei (stained by DAPI in blue). This image was taken from a coronal section of the lateral septum of an adult male mouse. Scale bar = 20 µm.

      In Author response images 2 and 3 below, we show low- and high-magnification images from an RNAscope experiment confirming that cells in the forebrain regions examined express Netrin-1 mRNA.

      Author response image 2.

      This confocal microscope image of a coronal brain section of the medial prefrontal cortex of an adult male mouse shows Netrin-1 mRNA expression (green) and cell nuclei (DAPI, blue). Brain regions are as follows: Cg1: Anterior cingulate cortex 1, DP: dorsopeduncular cortex, fmi: forceps minor of the corpus callosum, IL: Infralimbic Cortex, PrL: Prelimbic Cortex

      Author response image 3.

      A higher-resolution image from the same sample as in Author response image 2 shows Netrin-1 mRNA (green) and cell nuclei (DAPI; blue). DP = dorsopeduncular cortex

      Regarding UNC5c, this receptor homologue is expressed by dopamine neurons in the rodent ventral tegmental area (Daubaras et al., 2014; Manitt et al., 2010; Phillips et al., 2022). This does not preclude UNC5c expression in other cell types. UNC5c receptors are ubiquitously expressed in the brain throughout development, performing many different developmental functions (Kim and Ackerman, 2011; Murcia-Belmonte et al., 2019; Srivatsa et al., 2014). In this study we are interested in UNC5c expression by dopamine neurons, and particularly by their axons projecting to the nucleus accumbens. We therefore used immunofluorescent staining in the nucleus accumbens, showing UNC5 expression in TH+ axons. This work adds to the study by Manitt et al., 2010, which examined UNC5 expression in the VTA. Manitt et al. used Western blotting to demonstrate that UNC5 expression in VTA dopamine neurons increases during adolescence, as can be seen in the following figure:

      References:

      Daubaras M, Bo GD, Flores C. 2014. Target-dependent expression of the netrin-1 receptor, UNC5C, in projection neurons of the ventral tegmental area. Neuroscience 260:36–46. doi:10.1016/j.neuroscience.2013.12.007

      Kim D, Ackerman SL. 2011. The UNC5C Netrin Receptor Regulates Dorsal Guidance of Mouse Hindbrain Axons. J Neurosci 31:2167–2179. doi:10.1523/jneurosci.5254-10.2011

      Manitt C, Labelle-Dumais C, Eng C, Grant A, Mimee A, Stroh T, Flores C. 2010. Peri-Pubertal Emergence of UNC-5 Homologue Expression by Dopamine Neurons in Rodents. PLoS ONE 5:e11463-14. doi:10.1371/journal.pone.0011463

      Murcia-Belmonte V, Coca Y, Vegar C, Negueruela S, Romero C de J, Valiño AJ, Sala S, DaSilva R, Kania A, Borrell V, Martinez LM, Erskine L, Herrera E. 2019. A Retino-retinal Projection Guided by Unc5c Emerged in Species with Retinal Waves. Current Biology 29:1149-1160.e4. doi:10.1016/j.cub.2019.02.052

      Phillips RA, Tuscher JJ, Black SL, Andraka E, Fitzgerald ND, Ianov L, Day JJ. 2022. An atlas of transcriptionally defined cell populations in the rat ventral tegmental area. Cell Reports 39:110616. doi:10.1016/j.celrep.2022.110616

      Srivatsa S, Parthasarathy S, Britanova O, Bormuth I, Donahoo A-L, Ackerman SL, Richards LJ, Tarabykin V. 2014. Unc5C and DCC act downstream of Ctip2 and Satb2 and contribute to corpus callosum formation. Nat Commun 5:3708. doi:10.1038/ncomms4708

      (2) Figure 1 used shRNA to knockdown Netrin-1 in the Septum and these mice were subjected to behavioral testing. These results, again, are not supported by any valid data that the knockdown approach actually worked in dopaminergic axons. It is also unclear whether knocking down Netrin-1 in the septum will re-route dopaminergic axons or lead to cell death in the dopaminergic neurons in the substantia nigra pars compacta?

      First, we want to clarify and emphasize that our knockdown approach was not designed to knock down Netrin-1 in dopamine neurons or their axons. Our goal was to knock down Netrin-1 expression in cells expressing this guidance cue gene in the dorsal peduncular cortex.

      We have previously established the efficacy of the shRNA Netrin-1 knockdown virus used in this experiment for reducing the expression of Netrin-1 (Cuesta et al., 2020). The shRNA reduces Netrin-1 levels in vitro and in vivo.

      We agree that our experiments do not address the fate of the dopamine axons that are misrouted away from the medial prefrontal cortex. This research is ongoing, and we have now added a note regarding this to our manuscript.

      Our current hypothesis, based on experiments being conducted as part of another line of research in the lab, is that these axons are rerouted to a different brain region, which they then ectopically innervate. In these experiments we are finding that male mice exposed to tetrahydrocannabinol in adolescence show reduced dopamine innervation in the medial prefrontal cortex in adulthood but increased dopamine input in the orbitofrontal cortex. In addition, these mice show increased action impulsivity in the Go/No-Go task in adulthood (Capolicchio et al., Society for Neuroscience 2023 Abstracts).

      References:

      Capolicchio T., Hernandez, G., Dube, E., Estrada, K., Giroux, M., Flores, C. (2023) Divergent outcomes of delta 9 - tetrahydrocannabinol in adolescence on dopamine and cognitive development in male and female mice. Society for Neuroscience, Washington, DC, United States [abstract].

      Cuesta S, Nouel D, Reynolds LM, Morgunova A, Torres-Berrío A, White A, Hernandez G, Cooper HM, Flores C. 2020. Dopamine Axon Targeting in the Nucleus Accumbens in Adolescence Requires Netrin-1. Frontiers Cell Dev Biology 8:487. doi:10.3389/fcell.2020.00487

      (3) Another issue with Figure 1J: it is unclear whether the viruses were injected into a WT mouse model or into a Cre mouse model driven by a promoter specifically expressed in the dorsal peduncular cortex. The authors should provide evidence that Netrin-1 mRNA and protein are indeed significantly reduced. The authors should also show the anatomical extent of virus diffusion to confirm that the virus specifically infected cells in the dorsal peduncular cortex.

      All the virus knockdown experiments were conducted in wild-type mice; we added this information to Figure 1K.

      The efficacy of the shRNA in knocking down Netrin-1 was demonstrated by Cuesta et al. (2020) both in vitro and in vivo, as we show in our response to the reviewer’s previous comment above.

      We also now provide anatomical images demonstrating the localization of the injection and area of virus diffusion in the mouse forebrain. In Author response image 4 below the area of virus diffusion is visible as green fluorescent signal.

      Author response image 4.

      Fluorescent microscopy image of a mouse forebrain demonstrating the localization of the injection of a virus to knock down Netrin-1. The location of the virus is in green, while cell nuclei are in blue (DAPI). Abbreviations: DP: dorsopeduncular cortex, IL: infralimbic cortex

      References:

      Cuesta S, Nouel D, Reynolds LM, Morgunova A, Torres-Berrío A, White A, Hernandez G, Cooper HM, Flores C. 2020. Dopamine Axon Targeting in the Nucleus Accumbens in Adolescence Requires Netrin-1. Frontiers Cell Dev Biology 8:487. doi:10.3389/fcell.2020.00487

      (4) The authors need to provide information regarding the efficiency and duration of knocking down. For instance, in Figure 1K, the mice were tested after 53 days post injection, can the virus activity in the brain last for such a long time?

      In our study we are interested in the role of Netrin-1 expression in the guidance of dopamine axons from the nucleus accumbens to the medial prefrontal cortex. The critical window for these axons leaving the nucleus accumbens and growing to the cortex is early adolescence (Reynolds et al., 2018b). This is why we injected the virus at the onset of adolescence, at postnatal day 21. As dopamine axons grow from the nucleus accumbens to the prefrontal cortex, they pass through the dorsal peduncular cortex. We disrupted Netrin-1 expression at this point along their route to determine whether it is the Netrin-1 present along their route that guides these axons to the prefrontal cortex. We hypothesized that the shRNA Netrin-1 virus would disrupt the growth of the dopamine axons, reducing the number of axons that reach the prefrontal cortex and therefore the number of axons that innervate this region in adulthood.

      We conducted our behavioural tests in adulthood, after the critical window during which dopamine axon growth occurs, so as to observe the enduring behavioural consequences of this misrouting. This experimental approach is designed so that the shRNA Netrin-1 virus is expressed in cells in the dorsopeduncular cortex during adolescence, when the dopamine axons are growing.

      References:

      Capolicchio T., Hernandez, G., Dube, E., Estrada, K., Giroux, M., Flores, C. (2023) Divergent outcomes of delta 9 - tetrahydrocannabinol in adolescence on dopamine and cognitive development in male and female mice. Society for Neuroscience, Washington, DC, United States [abstract].

      Reynolds LM, Yetnikoff L, Pokinko M, Wodzinski M, Epelbaum JG, Lambert LC, Cossette M-P, Arvanitogiannis A, Flores C. 2018b. Early Adolescence is a Critical Period for the Maturation of Inhibitory Behavior. Cerebral Cortex 29:3676–3686. doi:10.1093/cercor/bhy247

      (5) In Figure 1N-Q, silencing Netrin-1 results in fewer DA axons targeting the infralimbic cortex, so why do the Netrin-1 knockdown mice show improved behavior?

      This is indeed an intriguing finding, and we have now added a mention of it to our manuscript. We have demonstrated that misrouting dopamine axons away from the medial prefrontal cortex during adolescence alters behaviour, but why this improves performance (i.e., reduces action impulsivity) is currently unknown to us. One potential explanation is that the dopamine axons are misrouted to a different brain region that is also involved in controlling impulsive behaviour, perhaps the dorsal striatum (Kim and Im, 2019) or the orbital prefrontal cortex (Jonker et al., 2015).

      We would also like to note that we are finding that other manipulations that appear to reroute dopamine axons to unintended targets can lead to reduced action impulsivity as measured using the Go/No-Go task. As we mentioned above, current experiments in the lab, which are part of a different line of research, are showing that male mice exposed to tetrahydrocannabinol in adolescence show reduced dopamine innervation in the medial prefrontal cortex in adulthood, but increased dopamine input in the orbitofrontal cortex. In addition, these mice show increased action impulsivity in the Go/No-Go task in adulthood (Capolicchio et al., Society for Neuroscience 2023 Abstracts).

      References

      Capolicchio T., Hernandez, G., Dube, E., Estrada, K., Giroux, M., Flores, C. (2023) Divergent outcomes of delta 9 - tetrahydrocannabinol in adolescence on dopamine and cognitive development in male and female mice. Society for Neuroscience, Washington, DC, United States [abstract].

      Jonker FA, Jonker C, Scheltens P, Scherder EJA. 2015. The role of the orbitofrontal cortex in cognition and behavior. Rev Neurosci 26:1–11. doi:10.1515/revneuro-2014-0043

      Kim B, Im H. 2019. The role of the dorsal striatum in choice impulsivity. Ann N York Acad Sci 1451:92–111. doi:10.1111/nyas.13961

      (6) What is the effect of knocking down UNC5C on dopamine axons guidance to the cortex?

      We have found that mice that are heterozygous for a nonsense Unc5c mutation, and as a result have reduced levels of UNC5c protein, show reduced amphetamine-induced locomotion and stereotypy (Auger et al., 2013). In the same manuscript we show that this effect only emerges during adolescence, in concert with the growth of dopamine axons to the prefrontal cortex. This is indirect but strong evidence that UNC5c receptors are necessary for correct adolescent dopamine axon development.

      References

      Auger ML, Schmidt ERE, Manitt C, Dal-Bo G, Pasterkamp RJ, Flores C. 2013. unc5c haploinsufficient phenotype: striking similarities with the dcc haploinsufficiency model. European Journal of Neuroscience 38:2853–2863. doi:10.1111/ejn.12270

      (7) In Figures 2-4, the authors only showed the amount of DA axons and UNC5C in the NAcc. However, it remains unclear whether these experiments also affect the projections of dopaminergic axons to other brain regions that are critical for the behavioral phenotypes. What about other brain regions, such as the prefrontal cortex? Do DA axon projections and UNC5c levels in the cortex show a pattern similar to that in the NAcc?

      UNC5c receptors are expressed throughout development and are involved in many developmental processes (Kim and Ackerman, 2011; Murcia-Belmonte et al., 2019; Srivatsa et al., 2014). We cannot say whether the pattern we observe here is unique to the nucleus accumbens, but it is certainly not universal throughout the brain.

      The brain region we focus on in our manuscript, in addition to the nucleus accumbens, is the medial prefrontal cortex. Close and thorough examination of the prefrontal cortices of adult mice revealed practically no UNC5c expression by dopamine axons. However, we did observe very rare cases of dopamine axons expressing UNC5c. It is not clear whether these rare cases are present before or during adolescence.

      Below is a representative set of images of this observation, which is now also included as Supplementary Figure 4:

      Author response image 5.

      Expression of UNC5c protein in the medial prefrontal cortex of an adult male mouse. Low (A) and high (B) magnification images demonstrate that there is little UNC5c expression in dopamine axons in the medial prefrontal cortex. Here we identify dopamine axons by immunofluorescent staining for tyrosine hydroxylase (TH, see our response to comment #9 regarding the specificity of the TH antibody for dopamine axons in the prefrontal cortex). This figure is also included as Supplementary Figure 4 in the manuscript. Abbreviations: fmi: forceps minor of the corpus callosum, mPFC: medial prefrontal cortex.

      References:

      Kim D, Ackerman SL. 2011. The UNC5C Netrin Receptor Regulates Dorsal Guidance of Mouse Hindbrain Axons. J Neurosci 31:2167–2179. doi:10.1523/jneurosci.5254-10.2011

      Murcia-Belmonte V, Coca Y, Vegar C, Negueruela S, Romero C de J, Valiño AJ, Sala S, DaSilva R, Kania A, Borrell V, Martinez LM, Erskine L, Herrera E. 2019. A Retino-retinal Projection Guided by Unc5c Emerged in Species with Retinal Waves. Current Biology 29:1149-1160.e4. doi:10.1016/j.cub.2019.02.052

      Srivatsa S, Parthasarathy S, Britanova O, Bormuth I, Donahoo A-L, Ackerman SL, Richards LJ, Tarabykin V. 2014. Unc5C and DCC act downstream of Ctip2 and Satb2 and contribute to corpus callosum formation. Nat Commun 5:3708. doi:10.1038/ncomms4708

      (8) Can overexpression of UNC5c or Netrin-1 in male winter hamsters mimic the observations in summer hamsters? Or overexpression of UNC5c in female summer hamsters to mimic the winter hamster? This would be helpful to confirm the causal role of UNC5C in guiding DA axons during adolescence.

      This is an excellent question. We are very interested in both increasing and decreasing UNC5c expression in hamster dopamine axons to see if we can directly manipulate summer hamsters into winter hamsters and vice versa. We are currently exploring virus-based approaches to design these experiments and are excited for results in this area.

      (9) The entire study relied on using tyrosine hydroxylase (TH) as a marker for dopaminergic axons. However, the expression of TH (either by IHC or IF) can be influenced by other environmental factors, that could alter the expression of TH at the cellular level.

      This is an excellent point that we now carefully address in our methods by adding the following:

      In this study we pay great attention to the morphology and localization of the fibres from which we quantify varicosities to avoid counting any fibres stained with TH antibodies that are not dopamine fibres. The fibres that we examine and that are labelled by the TH antibody show features indistinguishable from the classic features of cortical dopamine axons in rodents (Berger et al., 1974; 1983; Van Eden et al., 1987; Manitt et al., 2011), namely they are thin fibres with irregularly-spaced varicosities, are densely packed in the nucleus accumbens, sparsely present only in the deep layers of the prefrontal cortex, and are not regularly oriented in relation to the pial surface. This is in contrast to rodent norepinephrine fibres, which are smooth or beaded in appearance, relatively thick with regularly spaced varicosities, increase in density towards the shallow cortical layers, and are in large part oriented either parallel or perpendicular to the pial surface (Berger et al., 1974; Levitt and Moore, 1979; Berger et al., 1983; Miner et al., 2003). Furthermore, previous studies in rodents have noted that only norepinephrine cell bodies are detectable using immunofluorescence for TH, not norepinephrine processes (Pickel et al., 1975; Verney et al., 1982; Miner et al., 2003), and we did not observe any norepinephrine-like fibres.

      Furthermore, we are not aware of any other processes in the forebrain that are known to be immunopositive for TH under any environmental conditions.

      To reduce confusion, we have replaced the abbreviation for dopamine – DA – with TH in the relevant panels in Figures 1, 2, 3, and 4 to clarify exactly what is represented in these images. As can be seen in these images, fluorescent green labelling is present only in axons, which is to be expected of dopamine labelling in these forebrain regions.

      References:

      Berger B, Tassin JP, Blanc G, Moyne MA, Thierry AM (1974) Histochemical confirmation for dopaminergic innervation of the rat cerebral cortex after destruction of the noradrenergic ascending pathways. Brain Res 81:332–337.

      Berger B, Verney C, Gay M, Vigny A (1983) Immunocytochemical Characterization of the Dopaminergic and Noradrenergic Innervation of the Rat Neocortex During Early Ontogeny. In: Proceedings of the 9th Meeting of the International Neurobiology Society, pp 263–267 Progress in Brain Research. Elsevier.

      Levitt P, Moore RY (1979) Development of the noradrenergic innervation of neocortex. Brain Res 162:243–259.

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C (2011) The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394.

      Miner LH, Schroeter S, Blakely RD, Sesack SR (2003) Ultrastructural localization of the norepinephrine transporter in superficial and deep layers of the rat prelimbic prefrontal cortex and its spatial relationship to probable dopamine terminals. J Comp Neurol 466:478–494.

      Pickel VM, Joh TH, Field PM, Becker CG, Reis DJ (1975) Cellular localization of tyrosine hydroxylase by immunohistochemistry. J Histochem Cytochem 23:1–12.

      Van Eden CG, Hoorneman EM, Buijs RM, Matthijssen MA, Geffard M, Uylings HBM (1987) Immunocytochemical localization of dopamine in the prefrontal cortex of the rat at the light and electron microscopical level. Neurosci 22:849–862.

      Verney C, Berger B, Adrien J, Vigny A, Gay M (1982) Development of the dopaminergic innervation of the rat cerebral cortex. A light microscopic immunocytochemical study using anti-tyrosine hydroxylase antibodies. Dev Brain Res 5:41–52.

      (10) Are Netrin-1/UNC5C the only signal guiding dopamine axon during adolescence? Are there other neuronal circuits involved in this process?

      Our intention for this study was to examine the role of Netrin-1 and its receptor UNC5C specifically, but we do not suggest that they are the only molecules to play a role. The process of guiding growing dopamine axons during adolescence is likely complex and we expect other guidance mechanisms to also be involved. From our previous work we know that the Netrin-1 receptor DCC is critical in this process (Hoops and Flores, 2017; Reynolds et al., 2023). Several other molecules have been identified in Netrin-1/DCC signaling processes that control corpus callosum development and there is every possibility that the same or similar molecules may be important in guiding dopamine axons (Schlienger et al., 2023).

      References:

      Hoops D, Flores C. 2017. Making Dopamine Connections in Adolescence. Trends in Neurosciences 1–11. doi:10.1016/j.tins.2017.09.004

      Reynolds LM, Hernandez G, MacGowan D, Popescu C, Nouel D, Cuesta S, Burke S, Savell KE, Zhao J, Restrepo-Lozano JM, Giroux M, Israel S, Orsini T, He S, Wodzinski M, Avramescu RG, Pokinko M, Epelbaum JG, Niu Z, Pantoja-Urbán AH, Trudeau L-É, Kolb B, Day JJ, Flores C. 2023. Amphetamine disrupts dopamine axon growth in adolescence by a sex-specific mechanism in mice. Nat Commun 14:4035. doi:10.1038/s41467-023-39665-1

      Schlienger S, Yam PT, Balekoglu N, Ducuing H, Michaud J-F, Makihara S, Kramer DK, Chen B, Fasano A, Berardelli A, Hamdan FF, Rouleau GA, Srour M, Charron F. 2023. Genetics of mirror movements identifies a multifunctional complex required for Netrin-1 guidance and lateralization of motor control. Sci Adv 9:eadd5501. doi:10.1126/sciadv.add5501

      (11) Finally, despite the authors' claim that the dopaminergic axon project is sensitive to the duration of daylight in the hamster, they never provided definitive evidence to support this hypothesis.

      By “definitive evidence” we think that the reviewer is requesting a single statistical model including measures from both the summer and winter groups. Such a model directly tests whether dopamine axon growth is sensitive to daylight duration. Therefore, we ran these models, one for male hamsters and one for female hamsters.

      In both sexes we find a significant effect of daylength on dopamine innervation, interacting with age. Male age by daylength interaction: F = 6.383, p = 0.00242. Female age by daylength interaction: F = 21.872, p = 1.97 × 10⁻⁹. The full statistical analysis is available as a supplement to this letter (Response_Letter_Stats_Details.docx).
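      For readers who want to see what the age by daylength interaction term tests, the sketch below computes the interaction F statistic by hand. It is illustrative only and is not the analysis code used for this study: it assumes a balanced design (the same number of animals in every age × daylength cell), which may not match our data, and the function name interaction_f and the example numbers are hypothetical. Unbalanced designs call for a regression-based ANOVA instead.

```python
from statistics import fmean


def interaction_f(cells):
    """F statistic for the A x B interaction in a balanced two-way ANOVA.

    cells[i][j] holds the observations for level i of factor A (e.g. age)
    and level j of factor B (e.g. daylength). Hypothetical helper: it only
    handles balanced designs, i.e. every cell has the same number of values.
    """
    a = len(cells)
    b = len(cells[0])
    n = len(cells[0][0])
    if any(len(row) != b for row in cells) or any(
        len(c) != n for row in cells for c in row
    ):
        raise ValueError("this sketch only handles balanced designs")
    if n < 2:
        raise ValueError("need at least two observations per cell")

    cell_means = [[fmean(c) for c in row] for row in cells]
    grand = fmean(m for row in cell_means for m in row)
    a_means = [fmean(row) for row in cell_means]
    b_means = [fmean(cell_means[i][j] for i in range(a)) for j in range(b)]

    # Interaction: what remains of each cell mean after removing both main effects.
    ss_ab = n * sum(
        (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
        for i in range(a)
        for j in range(b)
    )
    # Error: scatter of individual observations around their own cell mean.
    ss_err = sum(
        (y - cell_means[i][j]) ** 2
        for i in range(a)
        for j in range(b)
        for y in cells[i][j]
    )
    df_ab = (a - 1) * (b - 1)
    df_err = a * b * (n - 1)
    return (ss_ab / df_ab) / (ss_err / df_err)
```

      With made-up numbers, interaction_f([[[1, 3], [3, 5]], [[3, 5], [9, 11]]]) gives 4.0 on (1, 4) degrees of freedom, whereas purely additive cell means (no interaction) give 0.0. A p-value then follows from the upper tail of the F distribution, for example scipy.stats.f.sf(F, df_ab, df_err).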

      Reviewer 3

      (1) Fig 1 A and B don't appear to be the same section level.

      The reviewer is correct that Fig 1B is anterior to Fig 1A. We have changed Figure 1A to match the section level of Figure 1B.

      (2) Fig 1C. It is not clear that these axons are crossing from the shell of the NAC.

      We have added a dashed line to Figure 1C to highlight the boundary of the nucleus accumbens, which hopefully emphasizes that there are fibres crossing the boundary. We also include here an enlarged image of this panel:

      Author response image 6.

      An enlarged image of Figure 1C in the manuscript. The nucleus accumbens (left of the dotted line) is densely packed with TH+ axons (in green). Some of these TH+ axons can be observed extending from the nucleus accumbens medially towards a region containing dorsally oriented TH+ fibres (white arrows).

      (3) Fig 1. Measuring width of the bundle is an odd way to measure DA axon numbers. First the width could be changing into adulthood for various reasons including change in brain size. Second, I wouldn't consider these axons in a traditional bundle. Third, could DA axon counts be provided, rather than these proxy measures?

      With regard to potential changes in brain size, we agree that this could have explained the increased width of the dopamine axon pathway. That is why it was important for us to use stereology to measure the density of dopamine axons within the pathway. If the width had increased but no new axons had grown along the pathway, we would have seen a decrease in axon density from adolescence to adulthood. Instead, our results show that the density of axons remained constant.

      We agree with the reviewer that the dopamine axons do not form a traditional “bundle”. Therefore, throughout the manuscript we now avoid using the term bundle.

      Although we cannot count every single axon, an accurate estimate of this number can be obtained using stereology, an unbiased method for efficiently quantifying large, irregularly distributed populations of objects. We used stereology to count TH+ axons in an unbiased subset of the total area occupied by these axons. Unbiased stereology is the gold-standard technique for estimating populations of anatomical objects, such as axons, that are so numerous that it would be impractical or impossible to measure every single one. Here and elsewhere we generally report results as densities and areas of occupancy (Reynolds et al., 2022). To avoid confusion, we now clarify that we are measuring the width of the area that dopamine axons occupy (rather than the dopamine axon “bundle”).
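      This letter does not specify which stereological probe was used, so the following is a hedged illustration only. It assumes an optical-fractionator-style design, in which sections, areas within sections and section thickness are each sampled systematically at a known fraction (ssf, asf and tsf). Under that assumption every counted object stands in for 1/(ssf × asf × tsf) objects in the whole region, and dividing the estimated total by the volume of the region gives a density. The function names, sampling fractions and counts are hypothetical.

```python
def fractionator_estimate(counts, ssf, asf, tsf):
    """Optical-fractionator-style estimate of the total number of objects.

    counts: objects counted in each sampled counting frame.
    ssf, asf, tsf: section, area and thickness sampling fractions, each in (0, 1].
    Hypothetical helper for illustration only.
    """
    for f in (ssf, asf, tsf):
        if not 0 < f <= 1:
            raise ValueError("sampling fractions must lie in (0, 1]")
    # Each counted object represents 1 / (ssf * asf * tsf) objects in the region.
    return sum(counts) / (ssf * asf * tsf)


def density(total, reference_volume):
    """Objects per unit volume of the region they occupy (hypothetical helper)."""
    return total / reference_volume
```

      With hypothetical counts of 10, 20 and 30 and sampling fractions of 0.5, 0.25 and 0.5, the estimate is 960 objects. If the occupied region doubled in volume (say from 2.0 to 4.0 units) while the total stayed at 960, density would halve from 480 to 240 per unit volume. That drop is what we would have expected if the pathway had merely widened without new axons; instead, density stayed constant.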

      References:

      Reynolds LM, Pantoja-Urbán AH, MacGowan D, Manitt C, Nouel D, Flores C. 2022. Dopaminergic System Function and Dysfunction: Experimental Approaches. Neuromethods 31–63. doi:10.1007/978-1-0716-2799-0_2

      (4) TH in the cortex could also be of noradrenergic origin. This needs to be ruled out to score DA axons

      This is the same comment as Reviewer 1 #9. Please see our response below, which we have also added to our methods:

      In this study we pay great attention to the morphology and localization of the fibres from which we quantify varicosities to avoid counting any fibres stained with TH antibodies that are not dopamine fibres. The fibres that we examine and that are labelled by the TH antibody show features indistinguishable from the classic features of cortical dopamine axons in rodents (Berger et al., 1974; 1983; Van Eden et al., 1987; Manitt et al., 2011), namely they are thin fibres with irregularly-spaced varicosities, are densely packed in the nucleus accumbens, sparsely present only in the deep layers of the prefrontal cortex, and are not regularly oriented in relation to the pial surface. This is in contrast to rodent norepinephrine fibres, which are smooth or beaded in appearance, relatively thick with regularly spaced varicosities, increase in density towards the shallow cortical layers, and are in large part oriented either parallel or perpendicular to the pial surface (Berger et al., 1974; Levitt and Moore, 1979; Berger et al., 1983; Miner et al., 2003). Furthermore, previous studies in rodents have noted that only norepinephrine cell bodies are detectable using immunofluorescence for TH, not norepinephrine processes (Pickel et al., 1975; Verney et al., 1982; Miner et al., 2003), and we did not observe any norepinephrine-like fibres.

      References:

      Berger B, Tassin JP, Blanc G, Moyne MA, Thierry AM. 1974. Histochemical confirmation for dopaminergic innervation of the rat cerebral cortex after destruction of the noradrenergic ascending pathways. Brain Res 81:332–337.

      Berger B, Verney C, Gay M, Vigny A. 1983. Immunocytochemical Characterization of the Dopaminergic and Noradrenergic Innervation of the Rat Neocortex During Early Ontogeny. In: Proceedings of the 9th Meeting of the International Neurobiology Society. Progress in Brain Research. Elsevier. pp. 263–267.

      Levitt P, Moore RY. 1979. Development of the noradrenergic innervation of neocortex. Brain Res 162:243–259.

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C. 2011. The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394.

      Miner LH, Schroeter S, Blakely RD, Sesack SR. 2003. Ultrastructural localization of the norepinephrine transporter in superficial and deep layers of the rat prelimbic prefrontal cortex and its spatial relationship to probable dopamine terminals. J Comp Neurol 466:478–494.

      Pickel VM, Joh TH, Field PM, Becker CG, Reis DJ. 1975. Cellular localization of tyrosine hydroxylase by immunohistochemistry. J Histochem Cytochem 23:1–12.

      Van Eden CG, Hoorneman EM, Buijs RM, Matthijssen MA, Geffard M, Uylings HBM. 1987. Immunocytochemical localization of dopamine in the prefrontal cortex of the rat at the light and electron microscopical level. Neurosci 22:849–862.

      Verney C, Berger B, Adrien J, Vigny A, Gay M. 1982. Development of the dopaminergic innervation of the rat cerebral cortex. A light microscopic immunocytochemical study using anti-tyrosine hydroxylase antibodies. Dev Brain Res 5:41–52.

      (5) Netrin staining should be provided with NeuN + DAPI; it's not clear these are all cell bodies. An in situ of Netrin would help as well.

      A similar comment was raised by Reviewer 1 in point #1. Below are immunofluorescence and RNAscope images showing expression of Netrin-1 protein and mRNA in the forebrain.

      Author response image 7.

      This confocal microscope image shows immunofluorescent staining for Netrin-1 (green) localized around cell nuclei (stained by DAPI in blue). This image was taken from a coronal section of the lateral septum of an adult male mouse. Scale bar = 20 µm

      Author response image 8.

      This confocal microscope image of a coronal brain section of the medial prefrontal cortex of an adult male mouse shows Netrin-1 mRNA expression (green) and cell nuclei (DAPI, blue). RNAscope was used to generate this image. Brain regions are as follows: Cg1: Anterior cingulate cortex 1, DP: dorsopeduncular cortex, IL: Infralimbic Cortex, PrL: Prelimbic Cortex, fmi: forceps minor of the corpus callosum

      Author response image 9.

      A higher-resolution image from the same sample as in Author response image 8 shows Netrin-1 mRNA (green) and cell nuclei (DAPI; blue). DP = dorsopeduncular cortex

      (6) The Netrin knockdown needs validation. How strong was the knockdown etc?

      This comment was also raised by Reviewer 1 #1.

      We have previously established that the shRNA Netrin-1 knockdown virus used in this experiment reduces Netrin-1 expression both in vitro and in vivo (Cuesta et al., 2020).

      References:

      Cuesta S, Nouel D, Reynolds LM, Morgunova A, Torres-Berrío A, White A, Hernandez G, Cooper HM, Flores C. 2020. Dopamine Axon Targeting in the Nucleus Accumbens in Adolescence Requires Netrin-1. Frontiers Cell Dev Biology 8:487. doi:10.3389/fcell.2020.00487

      (7) If the conclusion is that knocking down Netrin in the cortex decreases DA innervation of the IL, how can that be reconciled with Netrin-Unc repulsion?

      This is an intriguing question and one that we are in the planning stages of addressing with new experiments.

      Although we do not have a mechanistic answer for how a repulsive receptor helps guide these axons, we would like to note that indirect evidence from a previous study by our group also suggests that reducing UNC5c signaling in dopamine axons in adolescence increases dopamine innervation of the prefrontal cortex (Auger et al., 2013).

      References:

      Auger ML, Schmidt ERE, Manitt C, Dal-Bo G, Pasterkamp RJ, Flores C. 2013. unc5c haploinsufficient phenotype: striking similarities with the dcc haploinsufficiency model. European Journal of Neuroscience 38:2853–2863. doi:10.1111/ejn.12270

      (8) The behavioral phenotype in Fig 1 is interesting, but it's not clear if it's related to DA axons/signaling. In general, no evidence in this paper is provided for the role of DA in the adolescent behaviors described.

      We agree with the reviewer that the behaviours we describe in adult mice are complex and are likely to involve several neurotransmitter systems. However, there is ample evidence for the role of dopamine signaling in cognitive control behaviours (Bari and Robbins, 2013; Eagle et al., 2008; Ott et al., 2023) and our published work has shown that alterations in the growth of dopamine axons to the prefrontal cortex leads to changes in impulse control as measured via the Go/No-Go task in adulthood (Reynolds et al., 2023, 2018a; Vassilev et al., 2021).

      The other adolescent behaviour we examined was risk-taking-like behaviour in male and female hamsters (Figures 4 and 5), as a means of characterizing how this behaviour matures over time. We decided not to use the Go/No-Go task because, as far as we know, it has never been employed in Siberian hamsters and it would be difficult to implement. Instead, we chose the light/dark box paradigm, which requires no training and is ideal for charting behavioural changes over short time periods. Indeed, risk-taking behaviour in rodents and in humans changes from adolescence to adulthood, paralleling changes in prefrontal cortex development, including the gradual input of dopamine axons to this region.

      References:

      Bari A, Robbins TW. 2013. Inhibition and impulsivity: Behavioral and neural basis of response control. Progress in neurobiology 108:44–79. doi:10.1016/j.pneurobio.2013.06.005

      Eagle DM, Bari A, Robbins TW. 2008. The neuropsychopharmacology of action inhibition: cross-species translation of the stop-signal and go/no-go tasks. Psychopharmacology 199:439–456. doi:10.1007/s00213-008-1127-6

      Ott T, Stein AM, Nieder A. 2023. Dopamine receptor activation regulates reward expectancy signals during cognitive control in primate prefrontal neurons. Nat Commun 14:7537. doi:10.1038/s41467-023-43271-6

      Reynolds LM, Hernandez G, MacGowan D, Popescu C, Nouel D, Cuesta S, Burke S, Savell KE, Zhao J, Restrepo-Lozano JM, Giroux M, Israel S, Orsini T, He S, Wodzinski M, Avramescu RG, Pokinko M, Epelbaum JG, Niu Z, Pantoja-Urbán AH, Trudeau L-É, Kolb B, Day JJ, Flores C. 2023. Amphetamine disrupts dopamine axon growth in adolescence by a sex-specific mechanism in mice. Nat Commun 14:4035. doi:10.1038/s41467-023-39665-1

      Reynolds LM, Pokinko M, Torres-Berrío A, Cuesta S, Lambert LC, Pellitero EDC, Wodzinski M, Manitt C, Krimpenfort P, Kolb B, Flores C. 2018a. DCC Receptors Drive Prefrontal Cortex Maturation by Determining Dopamine Axon Targeting in Adolescence. Biological psychiatry 83:181–192. doi:10.1016/j.biopsych.2017.06.009

      Vassilev P, Pantoja-Urban AH, Giroux M, Nouel D, Hernandez G, Orsini T, Flores C. 2021. Unique effects of social defeat stress in adolescent male mice on the Netrin-1/DCC pathway, prefrontal cortex dopamine and cognition (Social stress in adolescent vs. adult male mice). Eneuro ENEURO.0045-21.2021. doi:10.1523/eneuro.0045-21.2021

      (9) Fig2 - boxes should be drawn on the NAc diagram to indicate sampled regions. Some quantification of Unc5c would be useful. Also, some validation of the Unc5c antibody would be nice.

      The images presented were taken medial to the anterior commissure and we have edited Figure 2 to show this. However, we did not notice any intra-accumbens variation, including between the core and the shell. Therefore, the images are representative of what was observed throughout the entire nucleus accumbens.

      To quantify UNC5c in the accumbens, we conducted a Western blot experiment in male mice at different ages. A one-way ANOVA with band intensity (relative to the average band intensity of 15-day-old mice) as the response variable and age as the predictor variable showed a significant effect of age (F = 5.615, p = 0.01). Post hoc analysis revealed that 15-day-old mice have less UNC5c in the nucleus accumbens than 21- and 35-day-old mice.

      Author response image 10.

      The graph depicts the results of a Western blot experiment of UNC5c protein levels in the nucleus accumbens of male mice at postnatal days 15, 21 or 35 and reveals a significant increase in protein levels at the onset of adolescence.
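      To make the one-way ANOVA on UNC5c band intensity concrete, here is a minimal from-scratch sketch of the F statistic. It is illustrative only, not the analysis code used for the experiment, and the function name and example values are hypothetical. It also shows why expressing every band relative to the P15 average does not change the test: dividing all values by the same constant scales the between-group and within-group sums of squares equally, so F is unchanged.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA; groups is a list of lists of values.

    Hypothetical helper for illustration only.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and some within-group replication")
    grand = fmean(y for g in groups for y in g)
    # Between-group variation: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    # Within-group variation: scatter of each value around its own group mean.
    ss_within = sum((y - fmean(g)) ** 2 for g in groups for y in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))
```

      For the hypothetical groups [1, 2, 3], [4, 5, 6] and [7, 8, 9], F = 27.0 on (2, 6) degrees of freedom, and halving every value leaves F at 27.0. In practice, scipy.stats.f_oneway(*groups) returns the same statistic together with its p-value.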

      Our methods for this Western blot were as follows: Samples were prepared as previously described (Torres-Berrío et al., 2017). Briefly, mice were sacrificed by live decapitation and brains were flash frozen in heptane on dry ice for 10 seconds. Frozen brains were mounted in a cryomicrotome and two 500 µm sections were collected for the nucleus accumbens, corresponding to plates 14 and 18 of the Paxinos mouse brain atlas. Two tissue core samples were collected per section, one for each side of the brain, using a 15-gauge tissue corer (Fine surgical tools Cat. no. NC9128328) and ejected into a microtube on dry ice. The tissue samples were homogenized in 100 µl of standard radioimmunoprecipitation assay buffer using a handheld electric tissue homogenizer. The samples were clarified by centrifugation at 15,000 × g for 30 minutes at 4 °C. Protein concentration was quantified using a bicinchoninic acid assay kit (Pierce BCA protein assay kit, Cat. no. PI23225), and samples were denatured with standard Laemmli buffer for 5 minutes at 70 °C. 10 µg of protein per sample was loaded and run by SDS-PAGE in a Mini-PROTEAN system (Bio-Rad) on an 8% acrylamide gel, stacking for 30 minutes at 60 V and resolving for 1.5 hours at 130 V. The proteins were transferred to a nitrocellulose membrane for 1 hour at 100 V in standard transfer buffer on ice. The membranes were blocked using 5% bovine serum albumin dissolved in Tris-buffered saline with Tween 20 and probed with primary (UNC5c, Abcam Cat. no. ab302924) and HRP-conjugated secondary antibodies for 1 hour. α-tubulin was probed and used as a loading control. The probed membranes were resolved using SuperSignal West Pico PLUS chemiluminescent substrate (ThermoFisher Cat. no. 34579) in a ChemiDoc MP Imaging system (Bio-Rad). Band intensity was quantified using the ChemiDoc software and all ages were normalized to the P15 age group average.
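      The normalization step in these methods can be sketched as follows. This is a hedged illustration rather than our actual analysis: the methods state that α-tubulin served as the loading control and that all ages were normalized to the P15 average, and the sketch assumes the two steps were combined by first dividing each UNC5c band by the α-tubulin band from the same lane. The function name and intensity values are hypothetical.

```python
from statistics import fmean


def normalize_to_reference(bands, reference="P15"):
    """Express each sample's UNC5c signal relative to the reference-age mean.

    bands: list of (age_group, unc5c_intensity, tubulin_intensity) tuples.
    Returns {age_group: [normalized values, ...]}.
    Hypothetical helper; assumes loading correction by per-lane ratio.
    """
    # Correct each lane for loading by dividing by its alpha-tubulin signal.
    ratios = [(age, target / loading) for age, target, loading in bands]
    ref_values = [r for age, r in ratios if age == reference]
    if not ref_values:
        raise ValueError(f"no samples for reference group {reference!r}")
    ref_mean = fmean(ref_values)
    # Divide every lane by the reference-group mean so that group averages 1.
    normalized = {}
    for age, r in ratios:
        normalized.setdefault(age, []).append(r / ref_mean)
    return normalized
```

      With hypothetical P15 lanes giving ratios of 1.0 and 3.0 (mean 2.0) and one P21 lane with a ratio of 4.0, the normalized values are 0.5 and 1.5 for P15 and 2.0 for P21, so the P15 group averages exactly 1.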

      Validation of the UNC5c antibody was performed in the lab of Dr. Liu, from whom it was kindly provided. Briefly, in the validation study the authors showed that the anti-UNC5C antibody can detect endogenous UNC5C expression and the level of UNC5C is dramatically reduced after UNC5C knockdown. The antibody can also detect the tagged-UNC5C protein in several cell lines, which was confirmed by a tag antibody (Purohit et al., 2012; Shao et al., 2017).

      References:

      Purohit AA, Li W, Qu C, Dwyer T, Shao Q, Guan K-L, Liu G. 2012. Down Syndrome Cell Adhesion Molecule (DSCAM) Associates with Uncoordinated-5C (UNC5C) in Netrin-1-mediated Growth Cone Collapse. The Journal of biological chemistry 287:27126–27138. doi:10.1074/jbc.m112.340174

      Shao Q, Yang T, Huang H, Alarmanazi F, Liu G. 2017. Uncoupling of UNC5C with Polymerized TUBB3 in Microtubules Mediates Netrin-1 Repulsion. J Neurosci 37:5620–5633. doi:10.1523/jneurosci.2617-16.2017

      (10) "In adolescence, dopamine neurons begin to express the repulsive Netrin-1 receptor UNC5C, and reduction in UNC5C expression appears to cause growth of mesolimbic dopamine axons to the prefrontal cortex".....This is confusing. Figure 2 shows a developmental increase in Unc5c not a decrease. So when is the "reduction in Unc5c expression" occurring?

      We apologize for the mistake in this sentence. We have corrected the relevant passage in our manuscript as follows:

      In adolescence, dopamine neurons begin to express the repulsive Netrin-1 receptor UNC5C, particularly when mesolimbic and mesocortical dopamine projections segregate in the nucleus accumbens (Manitt et al., 2010; Reynolds et al., 2018a). In contrast, dopamine axons in the prefrontal cortex do not express UNC5c except in very rare cases (Supplementary Figure 4). In adult male mice with Unc5c haploinsufficiency, there appears to be ectopic growth of mesolimbic dopamine axons to the prefrontal cortex (Auger et al., 2013). This miswiring is associated with alterations in prefrontal cortex-dependent behaviours (Auger et al., 2013).

      References:

      Auger ML, Schmidt ERE, Manitt C, Dal-Bo G, Pasterkamp RJ, Flores C. 2013. unc5c haploinsufficient phenotype: striking similarities with the dcc haploinsufficiency model. European Journal of Neuroscience 38:2853–2863. doi:10.1111/ejn.12270

      Manitt C, Labelle-Dumais C, Eng C, Grant A, Mimee A, Stroh T, Flores C. 2010. Peri-Pubertal Emergence of UNC-5 Homologue Expression by Dopamine Neurons in Rodents. PLoS ONE 5:e11463-14. doi:10.1371/journal.pone.0011463

      Reynolds LM, Pokinko M, Torres-Berrío A, Cuesta S, Lambert LC, Pellitero EDC, Wodzinski M, Manitt C, Krimpenfort P, Kolb B, Flores C. 2018a. DCC Receptors Drive Prefrontal Cortex Maturation by Determining Dopamine Axon Targeting in Adolescence. Biological psychiatry 83:181–192. doi:10.1016/j.biopsych.2017.06.009

      (11) In Fig 3, a statistical comparison should be made between summer male and winter male, to justify the conclusions that the winter males have delayed DA innervation.

      This analysis was also suggested by Reviewer 1, #11. Here is our response:

      We analyzed the summer and winter data together in ANOVAs separately for males and females. In both sexes we find a significant effect of daylength on dopamine innervation, interacting with age. Male age by daylength interaction: F = 6.383, p = 0.00242. Female age by daylength interaction: F = 21.872, p = 1.97 × 10⁻⁹. The full statistical analysis is available as a supplement to this letter (Response_Letter_Stats_Details.docx).

      (12) Should axon length also be measured here (Fig 3)? It is not clear why the authors have switched to varicosity density. Also, a box should be drawn in the NAC cartoon to indicate the region that was sampled.

      It is not feasible to quantify axon length in the prefrontal cortex because we cannot distinguish individual axons. Rather, they are “tangled”: they twist and turn in a multitude of directions as they make contact with various dendrites, and they branch extensively. It would therefore be impossible to accurately quantify the number or length of individual axons. Using unbiased stereology to quantify varicosities is a valid, well-characterized and straightforward alternative (Reynolds et al., 2022).

      References:

      Reynolds LM, Pantoja-Urbán AH, MacGowan D, Manitt C, Nouel D, Flores C. 2022. Dopaminergic System Function and Dysfunction: Experimental Approaches. Neuromethods 31–63. doi:10.1007/978-1-0716-2799-0_2

      (13) In Fig 3, Unc5c should be quantified to bolster the interesting finding that Unc5c expression dynamics are different between summer and winter hamsters. Unc5c mRNA experiments would also be important to see if similar changes are observed at the transcript level.

      We agree that it would be very interesting to see how UNC5c mRNA and protein levels change over time in summer and winter hamsters, both in males, as the reviewer suggests here, and in females. We are working on conducting these experiments in hamsters as part of a broader expansion of our research in this area. These experiments will take considerable time, and at this point we feel that they are beyond the scope of this manuscript.

      (14) Fig 4. The peak in exploratory behavior in winter females is counterintuitive and needs to be better discussed. In general, the light dark behavior seems quite variable.

      This is indeed a very interesting finding, which we have expanded upon in our manuscript as follows:

      When raised under a winter-mimicking daylength, hamsters of either sex show a protracted peak in risk taking. In males, it is delayed beyond 80 days old, but the delay is substantially less in females. This is a counterintuitive finding considering that dopamine development in winter females appears to be accelerated. Our interpretation of this finding is that the timing of the risk-taking peak in females may reflect a balance between different adolescent developmental processes. The fact that dopamine axon growth is accelerated does not imply that all adolescent maturational processes are accelerated. Some may be delayed, for example those that induce axon pruning in the cortex. The timing of the risk-taking peak in winter female hamsters may therefore reflect the amalgamation of developmental processes that are advanced with those that are delayed – producing a behavioural effect that is timed somewhere in the middle. Disentangling the effects of different developmental processes on behaviour will require further experiments in hamsters, including the direct manipulation of dopamine activity in the nucleus accumbens and prefrontal cortex.

      Full Reference List

      Auger ML, Schmidt ERE, Manitt C, Dal-Bo G, Pasterkamp RJ, Flores C. 2013. unc5c haploinsufficient phenotype: striking similarities with the dcc haploinsufficiency model. European Journal of Neuroscience 38:2853–2863. doi:10.1111/ejn.12270

      Bari A, Robbins TW. 2013. Inhibition and impulsivity: Behavioral and neural basis of response control. Progress in neurobiology 108:44–79. doi:10.1016/j.pneurobio.2013.06.005

      Cuesta S, Nouel D, Reynolds LM, Morgunova A, Torres-Berrío A, White A, Hernandez G, Cooper HM, Flores C. 2020. Dopamine Axon Targeting in the Nucleus Accumbens in Adolescence Requires Netrin-1. Frontiers Cell Dev Biology 8:487. doi:10.3389/fcell.2020.00487

      Daubaras M, Bo GD, Flores C. 2014. Target-dependent expression of the netrin-1 receptor, UNC5C, in projection neurons of the ventral tegmental area. Neuroscience 260:36–46. doi:10.1016/j.neuroscience.2013.12.007

      Eagle DM, Bari A, Robbins TW. 2008. The neuropsychopharmacology of action inhibition: cross-species translation of the stop-signal and go/no-go tasks. Psychopharmacology 199:439–456. doi:10.1007/s00213-008-1127-6

      Hoops D, Flores C. 2017. Making Dopamine Connections in Adolescence. Trends in Neurosciences 1–11. doi:10.1016/j.tins.2017.09.004

      Jonker FA, Jonker C, Scheltens P, Scherder EJA. 2015. The role of the orbitofrontal cortex in cognition and behavior. Rev Neurosci 26:1–11. doi:10.1515/revneuro-2014-0043

      Kim B, Im H. 2019. The role of the dorsal striatum in choice impulsivity. Ann N York Acad Sci 1451:92–111. doi:10.1111/nyas.13961

      Kim D, Ackerman SL. 2011. The UNC5C Netrin Receptor Regulates Dorsal Guidance of Mouse Hindbrain Axons. J Neurosci 31:2167–2179. doi:10.1523/jneurosci.5254-10.2011

      Manitt C, Labelle-Dumais C, Eng C, Grant A, Mimee A, Stroh T, Flores C. 2010. Peri-Pubertal Emergence of UNC-5 Homologue Expression by Dopamine Neurons in Rodents. PLoS ONE 5:e11463-14. doi:10.1371/journal.pone.0011463

      Murcia-Belmonte V, Coca Y, Vegar C, Negueruela S, Romero C de J, Valiño AJ, Sala S, DaSilva R, Kania A, Borrell V, Martinez LM, Erskine L, Herrera E. 2019. A Retino-retinal Projection Guided by Unc5c Emerged in Species with Retinal Waves. Current Biology 29:1149-1160.e4. doi:10.1016/j.cub.2019.02.052

      Ott T, Stein AM, Nieder A. 2023. Dopamine receptor activation regulates reward expectancy signals during cognitive control in primate prefrontal neurons. Nat Commun 14:7537. doi:10.1038/s41467-023-43271-6

      Phillips RA, Tuscher JJ, Black SL, Andraka E, Fitzgerald ND, Ianov L, Day JJ. 2022. An atlas of transcriptionally defined cell populations in the rat ventral tegmental area. Cell Reports 39:110616. doi:10.1016/j.celrep.2022.110616

      Purohit AA, Li W, Qu C, Dwyer T, Shao Q, Guan K-L, Liu G. 2012. Down Syndrome Cell Adhesion Molecule (DSCAM) Associates with Uncoordinated-5C (UNC5C) in Netrin-1-mediated Growth Cone Collapse. The Journal of biological chemistry 287:27126–27138. doi:10.1074/jbc.m112.340174

      Reynolds LM, Hernandez G, MacGowan D, Popescu C, Nouel D, Cuesta S, Burke S, Savell KE, Zhao J, Restrepo-Lozano JM, Giroux M, Israel S, Orsini T, He S, Wodzinski M, Avramescu RG, Pokinko M, Epelbaum JG, Niu Z, Pantoja-Urbán AH, Trudeau L-É, Kolb B, Day JJ, Flores C. 2023. Amphetamine disrupts dopamine axon growth in adolescence by a sex-specific mechanism in mice. Nat Commun 14:4035. doi:10.1038/s41467-023-39665-1

      Reynolds LM, Pantoja-Urbán AH, MacGowan D, Manitt C, Nouel D, Flores C. 2022. Dopaminergic System Function and Dysfunction: Experimental Approaches. Neuromethods 31–63. doi:10.1007/978-1-0716-2799-0_2

      Reynolds LM, Pokinko M, Torres-Berrío A, Cuesta S, Lambert LC, Pellitero EDC, Wodzinski M, Manitt C, Krimpenfort P, Kolb B, Flores C. 2018a. DCC Receptors Drive Prefrontal Cortex Maturation by Determining Dopamine Axon Targeting in Adolescence. Biological psychiatry 83:181–192. doi:10.1016/j.biopsych.2017.06.009

      Reynolds LM, Yetnikoff L, Pokinko M, Wodzinski M, Epelbaum JG, Lambert LC, Cossette M-P, Arvanitogiannis A, Flores C. 2018b. Early Adolescence is a Critical Period for the Maturation of Inhibitory Behavior. Cerebral cortex 29:3676–3686. doi:10.1093/cercor/bhy247

      Schlienger S, Yam PT, Balekoglu N, Ducuing H, Michaud J-F, Makihara S, Kramer DK, Chen B, Fasano A, Berardelli A, Hamdan FF, Rouleau GA, Srour M, Charron F. 2023. Genetics of mirror movements identifies a multifunctional complex required for Netrin-1 guidance and lateralization of motor control. Sci Adv 9:eadd5501. doi:10.1126/sciadv.add5501

      Shao Q, Yang T, Huang H, Alarmanazi F, Liu G. 2017. Uncoupling of UNC5C with Polymerized TUBB3 in Microtubules Mediates Netrin-1 Repulsion. J Neurosci 37:5620–5633. doi:10.1523/jneurosci.2617-16.2017

      Srivatsa S, Parthasarathy S, Britanova O, Bormuth I, Donahoo A-L, Ackerman SL, Richards LJ, Tarabykin V. 2014. Unc5C and DCC act downstream of Ctip2 and Satb2 and contribute to corpus callosum formation. Nat Commun 5:3708. doi:10.1038/ncomms4708

      Torres-Berrío A, Lopez JP, Bagot RC, Nouel D, Dal-Bo G, Cuesta S, Zhu L, Manitt C, Eng C, Cooper HM, Storch K-F, Turecki G, Nestler EJ, Flores C. 2017. DCC Confers Susceptibility to Depression-like Behaviors in Humans and Mice and Is Regulated by miR-218. Biological psychiatry 81:306–315. doi:10.1016/j.biopsych.2016.08.017

      Vassilev P, Pantoja-Urban AH, Giroux M, Nouel D, Hernandez G, Orsini T, Flores C. 2021. Unique effects of social defeat stress in adolescent male mice on the Netrin-1/DCC pathway, prefrontal cortex dopamine and cognition (Social stress in adolescent vs. adult male mice). Eneuro ENEURO.0045-21.2021. doi:10.1523/eneuro.0045-21.2021

      Private Comments

      Reviewer #1

      (12) The language should be improved. Some expression is confusing (lines 178-179). Also some spelling errors (e.g. Figure 1M).

      We have removed the word “Already” to make the sentence in lines 178-179 clearer; however, we could not find a spelling error in Figure 1M or its caption. We have further edited the manuscript for clarity and flow.

      Reviewer #2

      (1) The authors claim to have revealed how the 'timing of adolescence is programmed in the brain'. While their findings certainly shed light on molecular, circuit and behavioral processes that are unique to adolescence, their claim may be an overstatement. I suggest they refine this statement to discuss more specifically the processes they observed in the brain and animal behavior, rather than adolescence itself.

      We agree with the reviewer and have revised the manuscript to specify that we are referring to the timing of specific developmental processes that occur in the adolescent brain, not adolescence overall.

      (2) Along the same lines, the authors should also include a more substantive discussion of how they selected their ages for investigation (for both mice and hamsters). For mice, their definition of adolescence (P21) is earlier than some (e.g. Spear L.P., Neurosci. and Beh. Reviews, 2000).

      There are certainly differences of opinion between researchers as to the precise definition of adolescence and the period it encompasses. Spear, 2000, provides one excellent discussion of the challenges related to identifying adolescence across species. This work gives specific ages only for rats, not mice (as we use here), and characterizes post-natal days 28-42 as being the conservative age range of “peak” adolescence (page 419, paragraph 1). Immediately thereafter the review states that the full adolescent period is longer than this, and it could encompass post-natal days 20-55 (page 419, paragraph 2).

      We have added the following statement to our methods:

      There is no universally accepted way to define the precise onset of adolescence. Therefore, there is no clear-cut boundary to define adolescent onset in rodents (Spear, 2000). Puberty can be more sharply defined, and puberty and adolescence overlap in time, but the terms are not interchangeable. Puberty is the onset of sexual maturation, while adolescence is a more diffuse period marked by the gradual transition from a juvenile state to independence. We, and others, suggest that adolescence in rodents spans from weaning (postnatal day 21) until adulthood, which we take to start on postnatal day 60 (Reynolds and Flores, 2021). We refer to “early adolescence” as the first two weeks postweaning (postnatal days 21-34). These ranges encompass discrete DA developmental periods (Kalsbeek et al., 1988; Manitt et al., 2011; Reynolds et al., 2018a), vulnerability to drug effects on DA circuitry (Hammerslag and Gulley, 2014; Reynolds et al., 2018a), and distinct behavioral characteristics (Adriani and Laviola, 2004; Makinodan et al., 2012; Schneider, 2013; Wheeler et al., 2013).

      References:

      Adriani W, Laviola G. 2004. Windows of vulnerability to psychopathology and therapeutic strategy in the adolescent rodent model. Behav Pharmacol 15:341–352. doi:10.1097/00008877-200409000-00005

      Hammerslag LR, Gulley JM. 2014. Age and sex differences in reward behavior in adolescent and adult rats. Dev Psychobiol 56:611–621. doi:10.1002/dev.21127

      Hoops D, Flores C. 2017. Making Dopamine Connections in Adolescence. Trends in Neurosciences 1–11. doi:10.1016/j.tins.2017.09.004

      Kalsbeek A, Voorn P, Buijs RM, Pool CW, Uylings HBM. 1988. Development of the Dopaminergic Innervation in the Prefrontal Cortex of the Rat. The Journal of Comparative Neurology 269:58–72. doi:10.1002/cne.902690105

      Makinodan M, Rosen KM, Ito S, Corfas G. 2012. A critical period for social experience-dependent oligodendrocyte maturation and myelination. Science 337:1357–1360. doi:10.1126/science.1220845

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C. 2011. The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394. doi:10.1523/jneurosci.0606-11.2011

      Reynolds LM, Flores C. 2021. Mesocorticolimbic Dopamine Pathways Across Adolescence: Diversity in Development. Front Neural Circuit 15:735625. doi:10.3389/fncir.2021.735625

      Reynolds LM, Yetnikoff L, Pokinko M, Wodzinski M, Epelbaum JG, Lambert LC, Cossette MP, Arvanitogiannis A, Flores C. 2018. Early Adolescence is a Critical Period for the Maturation of Inhibitory Behavior. Cerebral cortex 29:3676–3686. doi:10.1093/cercor/bhy247

      Schneider M. 2013. Adolescence as a vulnerable period to alter rodent behavior. Cell and tissue research 354:99–106. doi:10.1007/s00441-013-1581-2

      Spear LP. 2000. Neurobehavioral Changes in Adolescence. Current directions in psychological science 9:111–114. doi:10.1111/1467-8721.00072

      Wheeler AL, Lerch JP, Chakravarty MM, Friedel M, Sled JG, Fletcher PJ, Josselyn SA, Frankland PW. 2013. Adolescent Cocaine Exposure Causes Enduring Macroscale Changes in Mouse Brain Structure. J Neurosci 33:1797–1803. doi:10.1523/jneurosci.3830-12.2013

      (3) Figure 1 - the conclusions hinge on the Netrin-1 staining, as shown in panel G, but the cells are difficult to see. It would be helpful to provide clearer, more zoomed images so readers can better assess the staining. Since Netrin-1 expression reduces dramatically after P4 and they had to use antigen retrieval to see signal, it would be helpful to show some images from additional brain regions and ages to see if expression levels follow predicted patterns. For instance, based on the Allen Brain Atlas, it seems that around P21, there should be high levels of Netrin-1 in the cerebellum, but low levels in the cortex. These would be nice controls to demonstrate the specificity and sensitivity of the antibody in older tissue.

      We do not study the cerebellum and have never stained this region; doing so now would require generating additional tissue, and we are not convinced it would add enough to the information already provided to justify this. Note that we have previously stained the forebrain for Netrin-1, providing broad staining of many brain regions (Manitt et al., 2011).

      References:

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C. 2011. The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394. doi:10.1523/jneurosci.0606-11.2011

      (4) Figure 3 - Because mice tend to avoid brightly-lit spaces, the light/dark box is more commonly used as a measure of anxiety-like behavior than purely exploratory behavior (including in the paper they cited). It is important to address this possibility in their discussion of their findings. To bolster their conclusions about the coincidence of circuit and behavioral changes in adolescent hamsters, it would be useful to add an additional measure of exploratory behaviors (e.g. hole board).

      Regarding the light/dark box test, this is an excellent point. We prefer the term “risk taking” to “anxiety-like” and now use the former term in our manuscript. Furthermore, our interest in the behaviour is purely to chart the development of adolescent behaviour across our treatment groups, not to study a particular emotional state. Regardless of the specific emotion or emotions governing the light/dark box behaviour, it is an ideal test for charting adolescent shifts in behaviour as it is well-characterized in this respect, as we discuss in our manuscript.

      (5) Supplementary Figure 4,5 The authors defined puberty onset using uterine and testes weights in hamsters. While the weights appear to be different for summer and winter hamsters, there were no statistical comparison. Please add statistical analyses to bolster claims about puberty start times. Also, as many studies use vaginal opening to define puberty onset, it would be helpful to discuss how these measurements typically align and cite relevant literature that described use of uterine weights. Also, Supplementary Figures 4 and 5 were mis-cited as Supp. Fig. 2 in the text (e.g. line 317 and others).

      These are great suggestions. We have added statistical analyses to Supplementary Figures 5 and 6 and provided vaginal opening data as Supplementary Figure 7. The statistical analyses confirm that all three measures are delayed in winter hamsters compared to summer hamsters.

      We have also added the following references to the manuscript:

      Darrow JM, Davis FC, Elliott JA, Stetson MH, Turek FW, Menaker M. 1980. Influence of Photoperiod on Reproductive Development in the Golden Hamster. Biol Reprod 22:443–450. doi:10.1095/biolreprod22.3.443

      Ebling FJP. 1994. Photoperiodic Differences during Development in the Dwarf Hamsters Phodopus sungorus and Phodopus campbelli. Gen Comp Endocrinol 95:475–482. doi:10.1006/gcen.1994.1147

      Timonin ME, Place NJ, Wanderi E, Wynne-Edwards KE. 2006. Phodopus campbelli detect reduced photoperiod during development but, unlike Phodopus sungorus, retain functional reproductive physiology. Reproduction 132:661–670. doi:10.1530/rep.1.00019

      (6) The font in many figure panels is small and hard to read (e.g. 1A,D,E,H,I,L...). Please increase the size for legibility.

      We have increased the font size of our figure text throughout the manuscript.

      Reviewer #3

      (15) Fig 1 C,D. Clarify the units of the y axis

      We have now fixed this.

      Full Reference List

      Adriani W, Laviola G. 2004. Windows of vulnerability to psychopathology and therapeutic strategy in the adolescent rodent model. Behav Pharmacol 15:341–352. doi:10.1097/00008877-200409000-00005

      Hammerslag LR, Gulley JM. 2014. Age and sex differences in reward behavior in adolescent and adult rats. Dev Psychobiol 56:611–621. doi:10.1002/dev.21127

      Hoops D, Flores C. 2017. Making Dopamine Connections in Adolescence. Trends in Neurosciences 1–11. doi:10.1016/j.tins.2017.09.004

      Kalsbeek A, Voorn P, Buijs RM, Pool CW, Uylings HBM. 1988. Development of the Dopaminergic Innervation in the Prefrontal Cortex of the Rat. The Journal of Comparative Neurology 269:58–72. doi:10.1002/cne.902690105

      Makinodan M, Rosen KM, Ito S, Corfas G. 2012. A critical period for social experience-dependent oligodendrocyte maturation and myelination. Science 337:1357–1360. doi:10.1126/science.1220845

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C. 2011. The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394. doi:10.1523/jneurosci.0606-11.2011

      Reynolds LM, Flores C. 2021. Mesocorticolimbic Dopamine Pathways Across Adolescence: Diversity in Development. Front Neural Circuit 15:735625. doi:10.3389/fncir.2021.735625

      Reynolds LM, Yetnikoff L, Pokinko M, Wodzinski M, Epelbaum JG, Lambert LC, Cossette M-P, Arvanitogiannis A, Flores C. 2018. Early Adolescence is a Critical Period for the Maturation of Inhibitory Behavior. Cerebral cortex 29:3676–3686. doi:10.1093/cercor/bhy247

      Schneider M. 2013. Adolescence as a vulnerable period to alter rodent behavior. Cell and tissue research 354:99–106. doi:10.1007/s00441-013-1581-2

      Spear LP. 2000. Neurobehavioral Changes in Adolescence. Current directions in psychological science 9:111–114. doi:10.1111/1467-8721.00072

      Wheeler AL, Lerch JP, Chakravarty MM, Friedel M, Sled JG, Fletcher PJ, Josselyn SA, Frankland PW. 2013. Adolescent Cocaine Exposure Causes Enduring Macroscale Changes in Mouse Brain Structure. J Neurosci 33:1797–1803. doi:10.1523/jneurosci.3830-12.2013

    1. Author Response

      The following is the authors’ response to the original reviews.

      To the reviewers.

      We appreciate the detailed and in-depth review of our manuscript. Below are our comments and responses. Many of the requested data are already presented in the Supplementary Figures of the manuscript. There seem to be two main concerns: the first regards the evidence of TLR2 expression in HFSCs, and the second regards CEP/TLR2. As detailed below, we utilized 3 different methods to document TLR2 expression: a TLR2-reporter mouse, staining for TLR2, and qPCR of isolated cells for TLR2. The source of CEP (the data are in Supplementary Fig. 5A, B and in the references below) and its nature (it is not a protein, but a metabolic product of oxidation of the polyunsaturated fatty acid DHA by MPO, among other ROS sources) are also explained below.

      1) “The expression analysis of TLR2 is questionable. Many of the conclusions about the level of target genes are based on quantifying fluorescence intensity in microscopy images (e.g., TLR2 level in young or aged mice, BMP7 levels in mice with/without TLR2 KO). This could be strengthened by using qPCR to measure gene expression levels in FACS-sorted HFSCs, which would provide more accurate quantification. Additionally, the authors should test if the TLR2 antibody used is valid.”

      In most instances we used the TLR2 reporter mouse, which presents an advantage over immunostaining. Fig. 2A-H shows expression of the TLR2 reporter, not staining with TLR2 antibodies. For selected experiments we utilized immunostaining with an anti-TLR2 antibody (Santa Cruz Biotechnology, sc-21759), which was validated in our previous publication (McCoy MG et al. Endothelial TLR2 promotes proangiogenic immune cell recruitment and tumor angiogenesis. Sci Signal. 2021 Jan 19;14(666):eabc5371. doi:10.1126/scisignal.abc5371). In Fig. S2E of that manuscript we validated this antibody using TLR2 knockout tissue. In the current paper, we further validate the anti-TLR2 antibody by showing its co-localization with the TLR2-GFP reporter (Fig. S1A).

      We then confirmed reporter and immunostaining data by qPCR showing Tlr2 expression in FACS-purified mouse HFSCs in anagen, telogen, and catagen (Fig.2J), in mouse epidermal cells and FACS-purified HFSCs (Fig.2K), and FACS-purified HFSCs isolated from Control and TLR2HFSC-KO mice (Fig.4E).

      The mechanistic link between TLR2 and BMP signaling was first identified using RNA-seq on FACS-purified HFSCs (Supplementary Fig. 4), then verified using qPCR (Fig. 4E shows Bmp7, Bmp2, and Bmpr1a), and only then was immunohistochemical staining for BMP7 and phospho-SMAD1/5/9 used (Fig. 4A-D, F-H). Note that much of the requested evidence is presented in the Supplementary data. Other mechanistic links shown using qPCR include Nfkb2, Il1b, Il6, and Bmp7 in FACS-purified mouse HFSCs treated with BSA control or CEP (Fig. 6Q, 6R).

      “As the reviewers note, it is not clear whether the TLR2+ signal is located at the basal side of bulge stem cells, basement membrane underlying bulge stem cells, or dermal sheath cells encapsulating bulge structure. Co-staining with basement membrane markers such as collagen and laminin or HFSC basal side membrane markers such as Itga6, Itgb1, and Itgb4 will clarify this. In addition, showing the expression pattern of TLR2 in full skin including epidermis and dermis would be helpful. As TLR2 is highly expressed in immune cells or blood endothelial cells, if the antibody staining is valid, strong positive signals should present in the cells. Moreover, testing the TLR2 antibody in Tlr2 knock-out mouse tissues would be an appropriate control experiment.”

      Once again, in most instances we used the TLR2 reporter mouse rather than TLR2 immunostaining (see the Fig. 2 legend). The anti-TLR2 antibody was validated in TLR2 KO tissue, as described above. Fig. 2K compares Tlr2 mRNA expression in mouse epidermal cells with that in FACS-purified HFSCs by qPCR.

      TLR2 signal is detected in several cell types within the hair follicle as well as in dermal cells surrounding the hair follicles, such as lymphocytes, resident tissue macrophages, fibroblasts, and fibroblast precursors (https://www.proteinatlas.org/ENSG00000137462-TLR2/single+cell+type). In Author response image 1 below, white arrows point to TLR2-positive cells around the hair follicle. In our paper, we focus on HFSC TLR2 and use the respective inducible, tissue-specific TLR2 KO. The contribution of TLR2 in other cell types can be assessed by comparing the phenotypes of global TLR2 KO mice, TLR2 KO-WT bone marrow chimeras, and HFSC-specific TLR2 KO mice. The results are presented in both the main and supplementary figures: Fig. 5D-I and Supplementary Fig. 5I-K show global TLR2 KO; Fig. 6H-I and Supplementary Fig. 5G-H show bone marrow chimeras; and Figs. 3, 4, 5J-M and 5J-N show the main focus, HFSC-TLR2 KO. Overall, the phenotype (delay of hair regeneration after wounding) appears strongest in global TLR2 KO mice, whereas the bone marrow chimera and HFSC-specific KO phenotypes are comparable. Thus, TLR2 on bone marrow-derived cells complements the main role of TLR2 on HFSCs.

      Author response image 1.

      Staining for TLR2 (white), DAPI (blue) and Keratin 17 (purple) is shown.

      “The increase in expression of TLR2 during the hair follicle stem cell activation should be documented by FACS and/or qPCR. This is important because as noted by one of the reviewers.”

      The original observation was made using both a TLR2 reporter mouse and immunostaining, and the data were confirmed by qPCR showing Tlr2 mRNA expression in FACS-purified mouse HFSCs in anagen, telogen, and catagen (Fig. 2J).

      “In Fig 1D, the authors mentioned that they re-analyzed published RNA-seq data (Greco et al., 2009) to show the increase of Tlr2 and Tlr6 expression in late telogen compared to early telogen. However, there is no RNA-seq data in that paper, but only microarray data of bulge vs HG comparison and dermal papillae cells (DP) in early, mid, late Telo. If the authors used DP data to show the increase of Tlr2 transcripts in late Telo, the analysis is completely wrong and has to be corrected. The problem is compounded by the fact that in other published HFSC RNA-seq datasets (Yang et al., Cell, 2017, Adam et al., Nature Cell Biology, 2020), the expression levels of Tlr2 and Tlr6 are very low (below 5 TPM). In Fig 1G, the authors also re-analyzed Morinaga et al., 2021 data to show the reduction of Tlr2 expression in HFSCs in high-fat diet mice. However, in the raw data of Morinaga et al., 2021 (GSE169173), Tlr2 expression FPKM values are below 1 in both normal diet and high-fat diet samples, which are too low to perform comparative analysis and are not statistically meaningful. Like Tlr2, the expressions of Tlr1 and Tlr6, which form heterodimer with TLR2, are almost 0. Thus, the authors should revisit the dataset and revise their analysis and conclusion.”

      To document the existence of Tlr2 and Tlr6 expression in HFSCs, the authors should perform RNA-seq-based gene expression analysis by themselves. Otherwise, the authors' TLR2 expression analyses in Fig 1 are not convincing. These are serious issues that the authors will want to rectify so that eLife readers will not discount their findings and importance.”

      This is correct: we analyzed published microarray data, not RNA-seq data (Greco et al., 2009), using the GEO2R tool, which allowed us to compare mRNA expression levels between early, middle, and late telogen in bulge CD34-positive cells. We have changed "RNA-seq" (the term was used incorrectly) to "RNA microarray" in the main text.

      In our manuscript, TLR2 expression is documented not only in Fig. 1, but also in Fig. 2 and Supplementary Fig. 1. We utilized 3 different methods to document TLR2 expression: a TLR2-reporter mouse, staining for TLR2, and qPCR of isolated cells for Tlr2. Fig. 2K compares Tlr2 mRNA expression in mouse epidermal cells with that in FACS-purified HFSCs by qPCR, documenting increased TLR2 expression in HFSCs. Likewise, Fig. 2J shows qPCR for Tlr2 in HFSCs during various phases of hair growth.

      “In Fig 2, to support the expression of Tlr2 in HFSCs, the authors utilized TLR2-GFP mice and showed the strong GFP expression in HFSCs, hair bulb, and ORS. However, as the expression data in Fig 1 are questionable, the GFP reporter data should be carefully analyzed with proper control experiments. For example, although TLRs are highly expressed in immune cells and endothelial cells, which are abundantly present in skin, Fig 2 data did not show the GFP expression in these cells. Instead, the GFP signals looked very specific to epithelial compartments, which is odd. Again, to convince readers, the authors should provide more comprehensive analyses of expression patterns of TLR2-GFP mice in skin. Also, if the TLR2-GFP signals faithfully reflect the actual expression of Tlr2 mRNA, the GFP signals should increase in late telogen compared to early telogen. The authors should check whether TLR2-GFP expression follows this pattern.”

      The specificity of the TLR reporter was characterized by Price et al., 2018 (A Map of Toll-like Receptor Expression in the Intestinal Epithelium Reveals Distinct Spatial, Cell Type-Specific, and Temporal Patterns. Immunity 49). Thus, the TLR2 reporter mouse is well characterized (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6152941/) and represents one of the best available tools for showing TLR2 expression.

      Expression of TLR2 on endothelial cells and validation of the anti-TLR2 antibody were performed in McCoy et al., Science Signaling, as mentioned above. Also, as discussed above, we show a strong correlation between TLR2-GFP reporter expression and TLR2 expression by co-immunostaining with GFP and TLR2 antibodies, with appropriate isotype-matched non-immune antibodies as negative controls.

      There is no doubt that TLR2 is expressed on immune, endothelial and epithelial cells. According to the Human Protein Atlas, TLR2 expression is identified in skin fibroblasts, keratinocytes, melanocytes, etc., so our findings are well supported by the literature (https://www.proteinatlas.org/ENSG00000137462-TLR2/single+cell+type). Indeed, we detected TLR2 in cells surrounding the hair follicle (see Author response image 1). TLR2 signal was detected in nearly all niches of hair follicles, including the CD34-positive cells.

      In Fig. S1 we demonstrated an increased level of TLR2 in late (competent) telogen compared to early (refractory) telogen using immunostaining for TLR2-GFP. The results mirrored the published microarray data in Fig. 1D. Again, reporter and immunostaining results have been validated by qPCR for Tlr2.

      The levels of TLR2 might be heavily influenced by the environment, e.g., pathogen exposure. In this regard, note that the mice in this study were kept in conventional, not pathogen-free, conditions.

      “Overall, the existence of Tlr2 expression in HFSCs is still questionable. Without resolving these, genetic deletion of Tlr2 in HFSCs cannot be rationalized.”

      In our manuscript, TLR2 expression is documented not only in Fig. 1, but also in Fig. 2 and Supplementary Fig. 1. We utilized 3 different methods to document TLR2 expression: a TLR2-reporter mouse, staining for TLR2, and qPCR of isolated cells for Tlr2. Besides these data, we show functional responses to the canonical TLR2 ligand PAM3CSK4 and to the previously characterized endogenous ligand CEP, using proliferation assays, western blotting, and many other approaches. In numerous immunostainings we show co-localization of TLR2 and CD34 (Fig. 2) using IMARIS surface rendering and colocalization tools. Our conclusions are further supported by published results, as discussed above.

      2) “The central conclusion of this study is that the activation of TLR2 can suppress BMP signaling; however, the molecular link between TLR2 and BMP signaling is still missing. Given the importance of this finding, it would be intriguing to further investigate how TLR2 activation suppresses BMP signaling. A better characterization of the molecular-level interaction between TLR2 and BMP signaling can further enhance the impact of this study.

      -The published dataset should be re-analyzed, as some images and their quantification do not appear to be matched. Representative images should be used.”“In Fig 4, the authors propose that the activation of TLR2 pathway inhibits the BMP signaling pathway, which makes HFSCs quiescent. In TLR2-HFSC-KO, the authors showed that BMP7 is increased and pSMAD1/5/9 is sustained. The increase in BMP7 expression and SMAD activation should be demonstrated by additional assays. Are SMAD target genes activated in the cKO mice?”

      This mechanistic link between TLR2 and BMP was originally identified by RNA-seq, confirmed by qPCR, and then confirmed by immunostaining for both BMP7 and BMP pathway activation, based on phospho-SMAD1/5/9 levels. The connection to the BMP pathway was also shown by western blotting (Supplementary Fig. 4B, C). Rescue experiments were performed using Noggin injections. According to our data, numerous SMAD target genes are upregulated in TLR2-HFSC-KO, such as Kank2, Ptk2b, Scarf2, Camk1, and Dpysl2, as well as Bmp2 and Bmp7, and these changes were confirmed by qPCR analysis in Fig. 4E. Additional evidence is shown in Fig. 6, which demonstrates that the endogenous TLR2 ligand CEP (carboxyethylpyrrole) acts through a similar, BMP-dependent pathway. Supplementary Fig. 4 adds further details to this link: Supplementary Fig. 4B, C shows that TLR2 activation by the canonical ligand PAM3CSK4 inhibits BMP-induced pSMAD levels (western blot is shown). As anticipated, PAM3CSK4 upregulated NFkB, whereas BMP stimulation had little or no effect on NFkB. To summarize: TLR2 affects both BMP7 production and BMP-induced downstream signaling, as judged by phospho-SMADs. The latter connection appears to go in one direction: TLR2 signaling affects BMP-induced pSMADs, whereas BMP signaling does not seem to substantially change TLR2-dependent NFkB. We plan to delve into the intersection of these important pathways in future work.

      “Functionally, downregulation of BMP signaling by injecting Noggin, a BMP antagonist, in TLR2HFSC-KO mice induces HFSC proliferation. These functional data are solid. However, it is still curious how TLR2 signaling interact with BMP pathway molecularly. Is it transcriptional regulation or translational regulation? Perhaps, RNA-seq analysis of TLR2HFSC-KO could give some hints to answer this question. Furthermore, checking out other signaling pathways such as WNT/LEF1 and pCREB, which are important for hair cycle activation and NFkB, a downstream effector of TLR signaling would be helpful to interrogate mechanistic insights.”

      As discussed above, TLR2 affects both BMP7 production and BMP-induced downstream signaling, as judged by phospho-SMADs. The latter connection appears to go in one direction: TLR2 signaling affects BMP-induced pSMADs, whereas BMP signaling does not seem to substantially change TLR2-dependent NFkB.

      Indeed, in addition to BMP signaling, Wnt signaling and β-catenin stabilization within HFSCs are known to trigger their activation (Deschene et al., 2014). However, this axis remained unchanged upon TLR2HFSC-KO (as shown in Supplementary Fig. 4J). Several published reports describe crosstalk between TLR and BMP signaling; for example, one study (doi: 10.1089/scd.2013.0345. Epub 2013 Nov 7) showed that activation of TLR4 inhibits BMP-induced pSMAD1/5/8 and that this connection requires NFkB. We probed NFkB activation; please see the responses above.

      However, we were not able to detect a substantial effect of NFkB inhibition on BMP signaling in hair follicles (not shown).

      3) “The function of CEP, a proposed endogenous ligand of TLR2, is still not clear. The authors imply that the decreased CEP level in aged mice could lead to deficient TLR2 signaling, which could further cause aging-associated hair regeneration defects. But this has not been demonstrated. What are the BMPs and pSmad1/5 levels in aged skin? Another important experiment to confirm the importance of this link during aging would be to inject CEP into the aged skin and examine whether this could restore hair regeneration in aged mice. Does CEP activate hair cycling during the endogenous pathway? What might be the source of CEP? Does CEP treatment activate BMP7 signaling? The authors should clarify these issues. The authors suggested that CEP is an endogenous ligand of TLR2, and administration of CEP induces hair cycle entry in a TLR2-dependent manner. How potent is CEP in terms of HFSC activation? In Fig 6Q, CEP increases the expression of Nfkb2, Il1b, and Il6, but the fold changes are marginal. Also, if CEP is a critical ligand, the loss of CEP by a genetic deletion or a pharmacological inhibition should result in the delay of hair cycle entry. Furthermore, the source of CEP expression is curious. Is it expressed by HFSCs or dermal fibroblast or immune cells? Finally, comparing the effect of CEP to the effect of other bacterial origin Tlr2 ligands such as heat killed bacteria, purified microbial cell-wall components, and synthetic agonists (Pam3CSK4) would be helpful. It is curious if HFSC directly senses the bacterial materials and triggers hair follicle regeneration or are indirectly directed by immune cells and endothelial cells, which could be primary sensor.”

      CEP is not a protein; it is an oxidative stress-generated metabolite of the polyunsaturated fatty acid DHA (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5360178/). It is therefore impossible to generate a knockout of this molecule. As demonstrated in previous publications (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2990914/, https://pubmed.ncbi.nlm.nih.gov/34871763/), CEP serves as a critical endogenous ligand supporting TLR2 signaling in the absence of pathogens. While other endogenous TLR2 ligands, such as HMGBs or HSPs, exist (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4373479/), CEP binds to TLR2 directly, and its generation is aided by MPO (myeloperoxidase), among other peroxidases and sources of reactive oxygen/nitrogen species. MPO (produced by immune cells, among others) serves as an innate immune defense against pathogens, but it also generates CEP adducts (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6034644/) in both protein and lipid form. Knockout of MPO diminishes CEP generation in skin (PMC6034644), thereby demonstrating a causal relationship between MPO and CEP generation.

      Author response image 2.

      Additional immunostaining of mouse skin for Keratin 17 (purple), CEP (green) and MPO (red). Similar staining is in S.Fig.5A and quantification is in S.Fig.5B.

      Also, the above-mentioned manuscripts show that the effects of CEP are milder than, but overall comparable to, those of the canonical TLR2 agonist PAM3CSK4. As we mention in the present manuscript, tissues of normal young mice are devoid of CEP (which is generated in response to inflammation), with the exception of hair follicles. This is likely attributable to the secretion of MPO by hair follicles (PMID: 36402231), especially under conditions of inflammation (PMID: 32893875). Supplementary Fig. 5A, B shows that MPO is present at high levels in the sebaceous gland (as part of an antimicrobial mechanism). Again, MPO is a secreted enzyme and is likely to be a source of continuous oxidation of DHA into CEP in hair follicles. We also document that both TLR2 and CEP levels in hair follicles (but not in other tissues, an important point for CEP) are reduced in aging. Likewise, Supplementary Fig. 5A, B shows that MPO secretion in hair follicles is reduced by more than 60% in aging mice. Thus, it is likely that reduced MPO levels in aging hair follicles produce less CEP. Together with reduced TLR2 levels, the lack of CEP might contribute to hair loss in aging.

      We show that, similar to TLR2, CEP in hair follicles operates via a BMP7-dependent pathway (see Fig. 6). We also provide results using the canonical bacterial TLR2 ligand PAM3CSK4, whose effect on HFSC proliferation is similar to that of CEP and likewise TLR2-dependent. TLR2 blocking approaches were used (Supp. Fig. 4B, C, D, E, Supp. Fig. 5D-5F). It remains to be seen whether CEP is required for normal hair cycling and whether its administration might improve hair loss in aging subjects.

      “The impacts of CEP/TLR2 on proliferation of keratinocytes is still weak. How much of this effect is a result of NFkB activation, and how much is simply due to inhibiting BMP signaling?

      The impact of TLR2 on proliferation was demonstrated using a variety of mouse models, from global TLR2 KO to bone marrow chimeras to HFSC-specific TLR2 KO, again using multiple approaches. The same applies to the effects of CEP and of the canonical TLR2 ligand PAM3CSK4, which were shown both in vivo and in culture to be TLR2-dependent (Fig. 6M-O and Supplementary Fig. 4D-E). As for the NFkB connection, see our responses above. It seems that the connection between TLR2 and the BMP pathway occurs independently of NFkB activation.

      4) The links between TLR2 pathway and aging and obesity are only correlative. Although the authors suggest that the reduction of TLR2 expression in aging and obesity may diminish hair growth (Fig 1), there is no direct functional evidence that supports this possibility. If the authors wish to make this claim, they should test the roles of TLR2 and CEP in aging and obesity conditions.”

      We show that both TLR2 and CEP are reduced in aging and that this pathway contributes to hair cycling and to regeneration upon wounding; we do not wish to claim more.

      5) More minor points:

      “Fig.4: The Noggin treatment in TLR2 KO mice is an important experiment. However, it is unclear why Noggin only enhances proliferation (Ki67 level) in HG but not in the bulge. This discrepancy should be addressed.”

      As we showed in Fig. 3B-3F, TLR2 HFSC-KO mice have a prolonged first telogen. Noggin treatment during the first postnatal telogen promotes the telogen-to-anagen transition in TLR2HFSC-KO mice, which is characterized by activation of HG cells before the bulge cells. According to the literature, bulge cells remain quiescent during late telogen, whereas HG cells become Ki67-positive, and the proliferation of HG cells drives the telogen-to-anagen transition.

      (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2668200/

      https://www.sciencedirect.com/science/article/pii/S0022202X15404518?via%3Dihub

      https://journals.biologists.com/jcs/article/114/19/3419/34892/Hair-follicle-predetermination).

      “Fig.5: Does TLR2 cKO slow down wound healing, in addition to affecting pigmentation and the number of hair follicles?”

      In our previous publication, we demonstrated that deletion of TLR2 in HFSC does not affect wound healing process. Instead, endothelial TLR2 promotes wound vascularization and healing.

      (see Xiong et al. Timely Wound Healing Is Dependent on Endothelial but Not on Hair Follicle Stem Cell Toll-Like Receptor 2 Signaling. Journal of Investigative Dermatology, Volume 142, Issue 11, November 2022, Pages 3082-3092.e1).

      “There is no panel B in Fig.4. There is no image in Fig 4D. Please correct this properly.”

      We have corrected Fig. 4.

      “Discussion: The constant production of CEP in homeostatic skin and in the absence of inflammation should be further discussed. Additionally, the possible causes of reducing CEP levels during aging should also be further discussed.”

      We explained the sources of CEP generation above, with MPO as one of the key enzymes. The data on MPO levels in hair follicles of young and old mice are presented in Supplementary Fig. 5A, B. Since we have previously shown that MPO produces CEP from DHA (PMC6034644), the reduction in MPO with aging is likely to contribute to reduced CEP levels.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      α-synuclein (syn) is a critical protein involved in many aspects of human health and disease. Previous studies have demonstrated that post-translational modifications (PTMs) play an important role in regulating the structural dynamics of syn. However, how post-translational modifications regulate syn function remains unclear. In this manuscript, Wang et al. reported an exciting discovery that N-acetylation of syn enhances the clustering of synaptic vesicles (SVs) through its interaction with lysophosphatidylcholine (LPC). Using an array of biochemical reconstitution, single vesicle imaging, and structural approaches, the authors uncovered that N-acetylation caused distinct oligomerization of syn in the presence of LPC, which is directly related to the level of SV clustering. This work provides novel insights into the regulation of synaptic transmission by syn and might also shed light on new ways to control neurological disorders caused by syn mutations.

      We thank the reviewer for appreciating the importance of our work and his/her positive comments.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors employed DLS to quantify the percentage of SV clustering in Fig. 1c and d. As DLS usually measures particle size distribution, I am not sure how the data was plotted in Fig. 1c and d. It would be great to show a representative raw dataset here.

      We thank the reviewer for the comment. To address this, we now show four representative DLS datasets of different α-Syn variants mediating SV clustering (Author response image 1). In addition to reporting the particle distribution weighted by light-scattering intensity, DLS can convert the intensity data into a particle size distribution based on particle number counts. In our analysis, particle diameters around 50 nm are considered to represent single SVs, whereas diameters larger than 120 nm indicate SV clusters. Specifically, as shown in Author response image 1, adding Ac-α-syn to a homogeneous SV sample changed the distribution from a single SV particle species (Author response image 1d) to three distinct species (Author response image 1a), with 68.5% of the particles being single SVs and 31.5% being SV clusters.

      Author response image 1.

      Representative raw dataset of α-Syn-mediated synaptic vesicle (SV) clustering monitored by dynamic light scattering (DLS). The grey-colored rows represent small particles (< 5 nm) that contributed zero to the particle number count.
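      The size-threshold classification described in this response can be sketched as a short calculation on a number-weighted size distribution. This is a minimal illustration under stated assumptions, not the analysis used for the figure: it assumes the DLS export is a list of (diameter in nm, number weight) bins, ignores bins below 5 nm (as in the caption, they contribute nothing to the number count), counts bins above 120 nm as SV clusters and treats everything in between as single SVs. The function name and the example bins are hypothetical.

```python
from typing import Iterable, Tuple


def sv_cluster_fractions(
    bins: Iterable[Tuple[float, float]],
    min_nm: float = 5.0,
    cluster_nm: float = 120.0,
) -> Tuple[float, float]:
    """Return (% single SVs, % SV clusters) from number-weighted DLS bins.

    Each bin is (diameter_nm, number_weight). Bins smaller than min_nm are
    ignored; bins larger than cluster_nm count as clusters; all remaining
    bins are treated as single vesicles (assumed cut-offs).
    """
    single = 0.0
    cluster = 0.0
    for diameter_nm, weight in bins:
        if diameter_nm < min_nm:
            continue  # very small particles: no contribution to the count
        if diameter_nm > cluster_nm:
            cluster += weight
        else:
            single += weight
    total = single + cluster
    if total == 0:
        raise ValueError("no particles above the minimum size")
    return 100.0 * single / total, 100.0 * cluster / total


# Hypothetical bins for illustration only (not measured data)
example = [(3.0, 0.0), (52.0, 80.0), (250.0, 15.0), (900.0, 5.0)]
print(sv_cluster_fractions(example))
```

      Normalizing by the total retained weight keeps the two percentages summing to 100% even if the exported bins do not.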

      (2) Syn-lipid interactions are known to be altered by mutations involved in neurodegenerative diseases. I am wondering how those mutations will affect SV clustering mediated by the interaction of LPC with N-acetylated syn.

      We thank the reviewer for the insightful comment. Our data indicate that N-acetylation enhances the binding of the N-terminal region of α-syn to LPC, thereby facilitating SV clustering. This enhancement arises because N-acetylation neutralizes the positive charge of α-syn’s N-terminal region, promoting its insertion into LPC-rich membranes through hydrophobic interactions. We therefore expect that any mutation that weakens the membrane-binding capability of N-terminally unmodified α-syn may also decrease SV clustering mediated by the interaction between Ac-α-syn and LPC.

      In separate work (doi: 10.1093/nsr/nwae182, Fig. S8), we compared the binding affinity of LPC for wild-type N-terminally unmodified α-syn and six Parkinson’s disease (PD) familial mutants (A30P, E46K, H50Q, G51D, A53E, and A53T). Among these, only the A30P mutation significantly decreased binding to LPC. Furthermore, using the same single-vesicle assay setup, we demonstrated in another paper (doi: 10.1073/pnas.2310174120, Fig. 4C) that A30P-mutated α-Syn lost its ability to promote SV clustering. Therefore, among the six PD mutations, A30P is the one most likely to significantly impact SV clustering mediated by the interaction between Ac-α-syn and LPC.

      (3) The crosslinking data in Fig. 4 was obtained using LPC or PS liposomes. I am wondering if these results truly mimic physiological conditions. Could the authors use SVs for these experiments?

      We thank the reviewer for the suggestion. To elucidate the mechanistic differences between N-terminally unmodified α-syn and N-acetylated α-syn, we used pure LPC and PS liposomes for clarity. Native SVs contain many synaptic proteins, which could complicate or obscure the interaction patterns of Ac-α-syn through potential crosstalk with other SV proteins. Additionally, the complex lipid environment of SV membranes would make it harder to decipher the specific molecular mechanism by which Ac-α-Syn facilitates SV clustering through LPC.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors provide evidence that posttranslational modification of synuclein by N-acetylation increases clustering of synaptic vesicles in vitro. When using liposomes the authors found that while clustering is enhanced by the presence of either lysophosphatidylcholine (LPC) or phosphatidylcholine in the membrane, N-acetylation enhanced clustering only in the presence of LPC. Enhancement of binding was also observed when LPC micelles were used, which was corroborated by increased intra/intermolecular cross-linking of N-acetylated synuclein in the presence of LPC.

      Strengths:

      It has been known for many years that synuclein binds to synaptic vesicles, but the physiological role of this interaction is still debated. The strength of this manuscript is clearly in the structural characterization of the interaction of synuclein and lipids (involving NMR-spectroscopy) showing that the N-terminal 100 residues of synuclein are involved in LPC-interaction, and the demonstration that N-acetylation enhances the interaction between synuclein and LPC.

      We thank the reviewer for their positive assessment of our work.

      Weaknesses:

      Lysophosphatides form detergent-like micelles that destabilize membranes, with their steady-state concentrations in native membranes being low, questioning the significance of the findings. Oddly, no difference in binding between the N-acetylated and unmodified form was observed when the acidic phospholipid phosphatidylserine was included. It remains unclear to which extent binding to LPC is physiologically relevant, particularly in the light of recent reports from other laboratories showing that synuclein may interact with liquid-liquid phases of synapsin I that were reported to cause vesicle clustering.

      We appreciate the reviewer’s insightful comments. Indeed, in another paper (doi: 10.1093/nsr/nwae182), using a conventional α-Syn pull-down assay and LC-MS lipidomics, we found that α-Syn preferentially binds lysophospholipids in both in vivo and in vitro systems. Additionally, by comparing the lipid compositions of mouse brains, SVs, and SV lipid-raft membranes, we found LPC levels to be twice as high in SVs as in brain homogenates, and twice as high in lipid-raft membranes as in non-lipid-raft membranes. Altogether, these findings underscore the physiological relevance of understanding the mechanism by which Ac-α-syn mediates SV clustering through LPC.

      Liquid-liquid phase separation has been implicated in the assembly and maintenance of SV clusters, and we believe that the SV cluster liquid phase is interconnected by highly abundant proteins with multivalent low-affinity interactions. Besides the previously discovered protein-protein interactions between α-Syn and synapsin (doi: 10.1016/j.jmb.2021.166961) or VAMP2 (doi: 10.1038/s41556-024-01456-1) that contribute to SV condensates, protein-lipid interactions between α-Syn and acidic phospholipids or LPC may also play a role. Furthermore, post-translational modifications, such as N-acetylation of α-Syn, may also contribute to SV condensates.

      Reviewer #2 (Recommendations For The Authors):

      In Fig. 2, the authors indicate that for the binding assay both vesicle populations, the immobilized "acceptor" and the superfused "donor" population were labeled with different fluorescent dyes whereas in the text it is stated that the immobilized acceptor liposomes were unlabeled. Please clarify. Moreover, a control is missing showing that binding indeed depends on the immobilised liposome fraction and does not occur in their absence. This control is important because due to the long incubation times non-specific adsorption may occur which may be enhanced by adding destabilizing LPC or charged PS to the membrane.

      We thank the reviewer for pointing out this inconsistency. To avoid signal leakage from a high concentration of DiD-labeled vesicles upon green laser irradiation, we immobilized unlabeled vesicles. We have revised Figure 2a and its caption accordingly.

      Regarding the control mentioned by the reviewer, we agree that non-specific binding could occur during the long incubation. In fact, the layer of densely packed liposomes (100 μM) immobilized on the imaging surface also serves to reduce non-specific interactions. In the absence of this layer of immobilized liposomes, we observed a high level of non-specific binding that significantly affected our experiments. Therefore, clustering experiments must be performed in the presence of immobilized liposomes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript co-authored by Pál Barzó et al is very clear and very well written, demonstrating the electrophysiological and morphological properties of human cortical layer 2/3 pyramidal cells across a wide age range, from age 1 month to 85 years using whole-cell patch clamp. To my knowledge, this is the first study that looks at the cross-age differences in biophysical and morphological properties of human cortical pyramidal cells. The community will also appreciate the significant effort involved in recording data from 485 cells, given the challenges associated with collecting data from human tissue. Understanding the electrophysiological properties of individual cells, which are essential for brain function, is crucial for comprehending human cortical circuits. I think this research enhances our knowledge of how biophysical properties change over time in the human cortex. I also think that by building models of human single cells at different ages using these data, we can develop more accurate representations of brain function. This, in turn, provides valuable insights into human cortical circuits and function and helps in predicting changes in biophysical properties in both health and disease.

      Strengths:

      The strength of this work lies in demonstrating how the electrophysiological and morphological features of human cortical layer 2/3 pyramidal cells change with age, offering crucial insights into brain function throughout life.

      Weaknesses:

      One potential weakness of the paper is that the methodology could be clearer, especially in how different cells were used for various electrophysiological measurements and the conditions under which the recordings were made. Clarifying these points would improve the study's rigor and make the results easier to interpret.

      Reviewer #2 (Public review):

      Summary:

      In this study, Barzo and colleagues aim to establish an appraisal for the development of basal electrophysiology of human layer 2/3 pyramidal cells across life and compare their morphological features at the same ages.

      Strengths:

      The authors have generated recordings from an impressive array of patient samples, allowing them to directly compare the same electrophysiological features as a function of age and other biological features. These data are extremely robust and well organised.

      Weaknesses:

      The use of spine density and shape characteristics is performed from an extremely limited sample (2 individuals). How reflective these data are of the population is not possible to interpret. Furthermore, these data assume that spines fall into discrete types - which is an increasingly controversial assumption.

      Many data are shown according to somewhat arbitrary age ranges. It would have been more informative to plot by absolute age, and then perform more rigorous statistics to test age-dependent effects.

      Overall, the authors achieve their aims by assessing the physiological and morphological properties of human L2/3 pyramidal neurons across life. Their findings have extremely important ramifications for our understanding of human life and implications for how different neuronal properties may influence neurological conditions.

      Reviewer #3 (Public review):

      Summary:

      To understand the specificity of age-dependent changes in the human neocortex, this paper investigated the electrophysiological and morphological characteristics of pyramidal cells in a wide age range from infants to the elderly.

      The results show that some electrophysiological characteristics change with age, particularly in early childhood. In contrast, the larger morphological structures, such as the spatial extent and branching frequency of dendrites, remained largely stable from infancy to old age. On the other hand, the shape of dendritic spines is considered immature in infancy, i.e., the proportion of mushroom-shaped spines increases with age.

      Strengths:

      Whole-cell recordings and intracellular staining of pyramidal cells in defined areas of the human neocortex allowed the authors to compare quantitative parameters of electrophysiological and morphological properties between finely divided age groups.

      They succeeded in finding symmetrical changes specific to both infants and the elderly, and asymmetrical changes specific to either infants or the elderly. The similarity of pyramidal cell characteristics between areas is unexpected.

      Weaknesses:

      Human L2/3 pyramidal cells are thought to be heterogeneous, as L2/3 has expanded to a high degree during the evolution from rodents to humans. However, the diversity (subtyping) is not revealed in this paper.

      Recommendations for the authors: 

      Reviewer #1 (Recommendations for the authors):

      The manuscript co-authored by Pál Barzó et al is very clear and very well written, demonstrating the electrophysiological and morphological properties of the human cortical layer 2/3 pyramidal cells across a wide age range, from age 1 month to 85 years using whole-cell patch clamp. To my knowledge, this is the first study that looks at the cross-age differences in morphological and electrophysiological properties of human cortical pyramidal cells. The community will also appreciate the significant effort involved in recording data from 485 cells, given the challenges associated with collecting data from human tissue. Understanding the electrophysiological properties of individual cells, which are essential for brain function, is crucial for comprehending human cortical circuits. I think this research enhances our knowledge of how biophysical properties change over time in the human cortex. I also think that by building models of human single cells at different ages using these data, we can develop more accurate representations of brain function. This, in turn, provides valuable insights into human cortical circuits and function and helps in predicting changes in biophysical properties in both health and disease.

      We are grateful for the positive evaluation of our work. We also thank the reviewers for their comments and believe that our manuscript has improved significantly with their help. In addition to the reviewer’s suggestions for improvement, further cell reconstructions were performed to make the anatomical data more robust (n = 1, 2, 3, 3, 4, 3, and 2 additional reconstructions in the infant, early childhood, late childhood, adolescence, young adulthood, middle adulthood, and late adulthood age groups, respectively; Σn = 18). Four additional cells were added to the spine analysis, and the statistics associated with each additional dataset were updated.

      I have some comments, particularly regarding the methodology and data presentation, to improve the clarity of the paper.

      (1) I assume the tissue is from the resected area adjacent to the tumor. Could you please clarify this in the Methods section?

      Thank you for this comment. It has been clarified in the Methods section with the following sentence: “We used human cortical tissue adjacent to the pathological lesion that had to be surgically removed from patients (n = 63 female, n = 45 male) as part of the treatment for tumors, hydrocephalus, apoplexy, cysts, and arteriovenous malformation.”

      (2) Regarding the presentation of data in the Methods section, could you please clarify whether the authors used different cells for measuring the various electrophysiological properties? The number of recorded cells for calculating subthreshold properties (e.g., late adulthood: n = 113) differs from the number the cells used for calculating suprathreshold properties (e.g., late adulthood: n = 83). If this is the case, it may make it difficult to compare the electrophysiological properties. Could you please clarify this?

      The different sample sizes are indeed due to the different quality criteria defined for the analysis of fast and slow signals. For the analysis of fast signals (e.g. AP half-width, AP upstroke velocity, AP amplitude), higher quality requirements were applied, and cells with high series resistance (> 30 MΩ) were excluded. We have updated and clarified the recording conditions in the text, figures, and Methods section accordingly.

      (3) Additionally, they mentioned that their recordings were done at zero holding current and at more than -50 pA. Could you clarify whether the data from these two sets of experiments were combined? If so, please provide an explanation in the methods section.

      Our aim was to determine membrane properties at rest. However, for technical reasons related to the amplifier, a continuous holding current was present during the measurement in some experiments (3.5% of all experiments). These holding currents ranged from -50 pA to +60 pA. In previous tests on mouse neurons, we found no linear correlation between the electrophysiological properties and the holding current within this range. This is reported in the Methods section.

      (4) This section needs revision. It is unclear why different series resistances (Rs) or different cells were used to compute various electrophysiological properties: "To calculate passive membrane properties (resting membrane potential, input resistance, time constant, and sag) either cells with series resistance (Rs): 22.85 ± 9.04 MΩ (ranging between -4.55 MΩ and 56.76 MΩ) and 0 pA holding current (n = 154), or cells with holding current > -50 pA (-7.46 ± 28.56 pA, min: -49.89 pA, max: 59.68 pA) and Rs < 30 MΩ (18.96 ± 6.48 MΩ) (n = 23) were used. For the analysis of high frequency action potential features (AP half-width, AP up-stroke velocity, AP amplitude and rheobase) cells with Rs < 30 MΩ (n = 331 cells with Rs 19.2 ± 6.6 MΩ) and holding current > -50 pA (n = 308 with 0 pA holding current and Rs: 19.22 ± 6.59 MΩ, n = 23 with holding current: -7.46 ± 28.56 pA and Rs: 18.96 ± 6.48 MΩ) were used."

      To make this section clearer, we simplified the cell groups used to analyse the different electrophysiological properties and revised the Methods section as follows: “For the analysis of the electrophysiological recordings, n = 457 recordings with a series resistance (Rs) of 24.93 ± 11.18 MΩ (max: 63.77 MΩ) were used. For the analysis of fast parameters related to the action potential (AP half-width, AP upstroke velocity, AP amplitude and rheobase), higher quality requirements were set and cells with Rs > 30 MΩ were excluded. This reduced the data set to n = 331 cells with Rs 19.42 ± 6.2 MΩ.”
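      The inclusion rule quoted in this response amounts to a simple filter on series resistance followed by a mean ± SD summary. The sketch below only illustrates that logic; the Recording record type, the function names and the example values are hypothetical, and it keeps a recording whose Rs is exactly 30 MΩ, since the text only states that cells with Rs > 30 MΩ were excluded.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import List, Tuple


@dataclass
class Recording:
    """Hypothetical record for one patch-clamp recording."""
    cell_id: str
    series_resistance_mohm: float  # Rs in MΩ


def fast_ap_subset(recordings: List[Recording], rs_max_mohm: float = 30.0) -> List[Recording]:
    """Keep recordings usable for fast AP parameters.

    Cells with Rs > rs_max_mohm are excluded; Rs equal to the threshold is
    kept (assumption: the text only says cells with Rs > 30 MΩ were excluded).
    """
    return [r for r in recordings if r.series_resistance_mohm <= rs_max_mohm]


def rs_summary(recordings: List[Recording]) -> Tuple[int, float, float]:
    """Return (n, mean Rs, sample SD of Rs); needs at least two recordings."""
    values = [r.series_resistance_mohm for r in recordings]
    return len(values), mean(values), stdev(values)


# Hypothetical recordings (values invented for illustration)
all_recs = [
    Recording("c1", 12.0),
    Recording("c2", 25.0),
    Recording("c3", 31.0),
    Recording("c4", 45.0),
]
fast = fast_ap_subset(all_recs)
print([r.cell_id for r in fast], rs_summary(fast))
```

      All recordings feed the slower passive measurements, while only the filtered subset feeds the fast AP measurements, which is why the two analyses report different n.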

      (5) The authors recorded the sag ratio using a -100 pA injected current. Is there a technical reason why they did not inject more than -100 pA?

      There is no particular technical reason; like other groups, we have used this current amplitude to record voltage responses for many years.

      (6) In the abstract, the authors mentioned that data were recorded from ages 1 month to 85 years. However, in the results, they stated that data were recorded from ages 0 to 85 years. Could you please clarify this discrepancy?

      We corrected this discrepancy.

      (7) Additionally, the results mention that data were collected from 485 human cortical layer 2/3 (L2/3) pyramidal cells, but subthreshold membrane features such as resting membrane potential, input resistance, time constant (tau), and sag ratio were calculated in 475 cortical pyramidal cells from 99 patients. Could you please clarify these discrepancies? In the discussion "We recorded from n = 457 human cortical excitatory pyramidal cells from the supragranular layer from birth to 85 years"

      Thank you for pointing this out; we have corrected the error. Although our full data set contained 485 pyramidal cells, 28 recordings were excluded from the electrophysiological analysis and were used for morphological evaluation only; therefore, 457 recordings were used for passive parameter measurements.

      (8) Regarding the distance from the pia to the border layer L1/L2, did the authors notice any differences across ages?

      To investigate whether the thickness of cortical layer 1 changes throughout life, we measured the L1 thickness and found no significant differences between age groups (P = 0.09, Kruskal-Wallis test) (Author response image 1).

      Author response image 1.

      Thickness of cortical layer 1 at different life stages. (A) Boxplot shows the thickness of layer 1. (B) Scatter plot shows the distribution of L1 thickness measured on the reconstructed cells. Age is shown in years on a logarithmic scale, dots are color-coded according to the corresponding age groups.
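      The age-group comparison of layer 1 thickness uses a Kruskal-Wallis test. For readers less familiar with it, here is a minimal, standard-library-only sketch of the rank-based H statistic (with the usual tie correction) and its chi-square p-value with k − 1 degrees of freedom. It is an illustration with toy data, not the authors' analysis code; in practice a library routine such as scipy.stats.kruskal would normally be used.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # positions i..j become ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability for integer df >= 1 (closed forms)."""
    y = x / 2.0
    m = df // 2
    if df % 2 == 0:
        # Q(m, y) = exp(-y) * sum_{j=0}^{m-1} y**j / j!
        return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(m))
    # Q(m + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{j=1}^{m} y**(j - 1/2) / Gamma(j + 1/2)
    tail = sum(y ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, m + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def kruskal_wallis(*groups: Sequence[float]) -> Tuple[float, float]:
    """Return (H, p) for k independent groups, with the standard tie correction."""
    values = [v for g in groups for v in g]
    n = len(values)
    ranks = average_ranks(values)
    rank_term, start = 0.0, 0
    for g in groups:
        group_ranks = ranks[start:start + len(g)]
        rank_term += sum(group_ranks) ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * rank_term - 3.0 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n)
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction > 0:
        h /= correction
    return h, chi2_sf(h, len(groups) - 1)


# Toy data: three non-overlapping groups (not the layer 1 measurements)
h, p = kruskal_wallis([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h, 3), round(p, 4))
```

      Because the test works on ranks rather than raw values, it does not assume normally distributed thickness values within each age group.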

      (9) I am not sure why they referred to the data as layer 2/3 when most of the data, based on Figure 1E, were recorded from a distance of 0-200 µm from the L1/L2 border. Could it be that there is no significant depth-dependent variation in electrophysiological properties, as reported by Berg (2021), Kalmbach (2018), and Chameh (2021)?

      Although the vast majority of our data comes from a distance of less than 200 μm from the L1/L2 border, we cannot neglect the fact that our dataset also contains a small number of cells deeper than this, which are layer 3 cells. Apart from some differences shown in Supplementary Figures 7-9, we found no general difference between cells located at a distance of less than 200 μm and more than 200 μm from the L1 border.

      (10) In Figure 1, there is variability in resting membrane potential (RMP), tau, and input resistance (IR) within the infant age group. However, this trend is not observed in the sag ratio. Could you please discuss this finding?

      The large variance in the data is due to dramatic changes in these three parameters during the first year of life. Supplementary Figure 3 compares the parameter distributions of patients aged 0-6 months and 6-12 months. The sag amplitude in these cells is generally low, leaving little room for comparably large changes.

      (11) Did the authors use a K-Nearest Neighbors (KNN) test to assess the accuracy of the infant cluster in Figure 3F?

      Based on eight electrophysiological features of the cells (resting Vm, input resistance, tau, sag ratio, rheobase, AP half-width, AP up-stroke, and AP amplitude), the infant pyramidal cells on a UMAP form a distinct group (Author response image 2A) represented by cluster 4 on Author response image 2B. When calculating the sum of the Euclidean distances of cells within the cluster from the centroid, the isolated infant group (cluster 4) shows the smallest distance value from the centroid (cluster 1: 40.2, cluster 2: 36.21, cluster 3: 39.96, cluster 4: 5.72, cluster 5: 39.2, cluster 6: 55.74, cluster 7: 54.27), demonstrating that infant cells create a discrete cluster distinct from other age groups (Author response image 2B).

      Author response image 2.

      (A) Uniform Manifold Approximation and Projection (UMAP) of 8 selected electrophysiological properties (resting Vm, input resistance, tau, sag ratio, rheobase, AP half-width, AP up-stroke, and AP amplitude) with data points for 331 cortical L2/3 pyramidal cells, colored with the corresponding age groups. (B) UMAP colored by k-means clustering with 7 clusters, red crosses represent the centroids of the clusters.
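      The cluster-compactness measure described in this response (the sum of Euclidean distances from each cell to its cluster's centroid) can be computed directly from embedding coordinates and cluster labels. A minimal sketch, assuming the 2-D UMAP coordinates and k-means labels have already been computed elsewhere; the function name and the toy coordinates are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid_distances(
    points: Sequence[Point], labels: Sequence[int]
) -> Dict[int, Tuple[float, float]]:
    """Per cluster: (sum, mean) of Euclidean distances of members to their centroid."""
    members: Dict[int, List[Point]] = defaultdict(list)
    for point, label in zip(points, labels):
        members[label].append(point)
    result: Dict[int, Tuple[float, float]] = {}
    for label, pts in members.items():
        dim = len(pts[0])
        # centroid = coordinate-wise mean of the cluster's points
        centroid = tuple(sum(p[d] for p in pts) / len(pts) for d in range(dim))
        total = sum(math.dist(p, centroid) for p in pts)
        result[label] = (total, total / len(pts))
    return result


# Hypothetical 2-D embedding coordinates and cluster labels
coords = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (10.0, 14.0)]
labels = [0, 0, 1, 1]
print(centroid_distances(coords, labels))
```

      A summed distance grows with the number of cells in a cluster, so the sketch also returns the mean distance per cell, which compares clusters of different sizes on an equal footing.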

      (12) Missing citation: 'Previous research has shown that the biophysical properties of human pyramidal cells show depth-related correlations throughout L2/3 (Berg et al., 2021).' Please include citations for Kalmbach (2018) and Chameh (2021).

      We thank the reviewer for the additional references; these studies are now cited.

      (13) Have they noticed any morphological properties differences among the different cortical lobes (Parietal, Temporal, Frontal, and Occipital). It would be beneficial to present this data, especially since they have a sufficient sample size from each cortical lobe.

      The majority of our data set on the morphological properties of pyramidal cells comes from the parietal (n = 17 cells) and temporal lobe (n = 15). We found no significant differences in the morphological properties of cells from these two brain regions and no differences between age groups in the same cortical lobes.

      (14) Have the authors found differences in spine characteristics among different cortical areas, as reported previously (doi: 10.1023/a:1024134312173)?

      We found morphological differences in dendritic spines among the different brain regions; however, our data are too limited to draw definitive conclusions.

      Reviewer #2 (Recommendations for the authors):

      Major

      (1) I believe that these data presented in all main text figures would be more intuitive to be plotted on a log(age) scale, such as shown in supplementary Figure 13. The bounds of the ages used for different groups, as summarised in Figure 1 feel somewhat arbitrary.

      Recent neuroscientific studies on postnatal ageing mainly use the age-group comparison format (Kang 2011, Bethlehem 2022), with groups defined based on milestones in the cognitive, motor, social-emotional, and language/communication domains of observable behaviour (Zubler et al. 2022; for detailed definitions see Kang 2011). Since many parameters do not vary linearly but follow a U-shaped (or inverted U-shaped) trajectory, their statistical quantification is not straightforward, so we retained the age-group format for the main graphs. However, at the reviewer's suggestion, electrophysiological and morphological parameters are presented on a log(age) scale as supplementary figures (Supplementary Figures 2, 4 and 6), and further statistical analysis was carried out without grouping the data (see response 5).

      (2) The authors present a lot of data values in the text, which is also shown in the figures. This makes reading of the manuscript somewhat difficult in places. For brevity, it may be best to present this data as supplementary tables.

      Thank you for this suggestion. We have inserted these data as tables.

      (3) I am unclear why the authors excluded cells that fired doublets or triplets in Figure 4? Were these included in the passive and AP-specific analysis - but excluded from F-I plots? Please clarify the rationale and the relative abundance of these physiological types based on age - one might predict that more initial-burst firing types are associated with older neurons?

      Thank you for drawing attention to this anomaly. We have updated the figures and text by adding the cells with initial burst firing. These cells are also included in the analysis of passive and action potential properties. In our overall dataset, 6.78% of cells show burst firing. By age group, the proportions are: infant: 0%, early childhood: 3.57% (1 cell), late childhood: 0%, adolescence: 11.11% (6 cells), young adulthood: 10.11% (9 cells), middle adulthood: 10.71% (6 cells), late adulthood: 7.96% (9 cells) of all cells in the respective group.

      (4) The statistical analyses performed in Figure 6 are not justified. From the authors' description of these data, they derive spine density measurements from 1 infant and 1 aged adult, then perform pseudoreplicated analysis in these individuals. These data would require greater replication from infant and aged groups - with the possible inclusion of a younger adult group also. It would be ideal to have n=3/age group to allow robust statistical analysis.

      Thank you for this point. Accordingly, we have expanded our data set to include n = 3 infant pyramidal cells (83 days old, from one patient) and n = 3 pyramidal cells from three late adulthood patients (64.3 ± 2.08 years old).

      (5) Given the high number of individuals and replicates throughout this manuscript, a more circumspect approach to statistics would be appreciated, e.g. a generalised linear mixed effects model - with age as a fixed effect and sex, patient, etc as random effects. This may reveal the greatest statistical power of these important and rich data.

      We used a Generalized Additive Mixed Model (GAMM) to describe the relationship between age and the various passive and active electrophysiological features. Age was modelled as a fixed effect with a cubic spline smoothing term, and gender, brain area, surgical procedure, and hemisphere were included as random effects. In the GAMM incorporating these four random effects, the age dependence of the examined parameters (resting membrane potential, input resistance, tau, sag ratio, rheobase current, AP half-width, AP up-stroke velocity, AP amplitude, first AP latency, adaptation) was significant; the only exception was the F-I slope. We also observed associations of gender, brain area, hemisphere, and surgical procedure with various intrinsic properties. Author response table 1 below shows the statistical values of the GAMM alongside those of the statistical tests used in the manuscript.

      Author response table 1.

      Statistical significance of patient attributes. *In the pairwise comparison, the age of cells in the two groups was significantly different: female (subthreshold: 37.36 ± 26.25 years old, suprathreshold: 38.3 ± 25.6 y.o.) vs. male (subthreshold: 24.86 ± 23.7 y.o., suprathreshold: 25.7 ± 23.93 y.o.); subthreshold: P = 1.96 × 10⁻⁶, suprathreshold: P = 3.25 × 10⁻⁵, Mann-Whitney test. **In the pairwise comparison, the age of cells in the two groups was significantly different: surgical procedure: tumor removal (subthreshold: 33.72 ± 24.33 y.o., suprathreshold: 36.43 ± 27.07 y.o.) vs. VP shunt (subthreshold: 27.38 ± 29.69 y.o., suprathreshold: 27.07 ± 29.37 y.o.); subthreshold: P = 3.68 × 10⁻³, suprathreshold: P = 1.64 × 10⁻³, Mann-Whitney test.

      (6) Regarding the morphological diversity of dendritic spines. There is some debate in the field as to whether the distinction of specific dendritic spine types - as conveyed in this manuscript - are true subtypes or reflect a continuum of diverse morphology (see Tønnesen et al., 2014 Nature Neuroscience). It is appreciated that the approach taken by the authors is the dogma within the field - however, dogma should continue to be challenged. Given that the authors have used DAB labelling combined with light microscopy, the possibility of accurately measuring spine morphology required for determining this continuum is extremely limited (e.g. Li et al., (2023) ACS Chemical Neuroscience). I would suggest that alongside the inclusion of further replicates for their spine analysis, the authors tone down their discussion of spine subtypes given the absence of any synaptic data presented in this current study to support the maturation (or otherwise) of dendritic spine synapses.

      Many thanks to the reviewer for this comment. We agree that our method has limitations for spine categorization. To increase the reliability of our results, we increased the number of pyramidal cells in the infant and late adult groups. We also revised the figure and, as suggested by Reviewer #3, added photos of spines in each category alongside the schematic drawings to illustrate each phenotype. In the discussion, we only address the differences between the two readily separable forms, mushroom and filopodial spines, and highlight results that confirm findings already known in the literature. Although these concerns are valid, we would point to the statement in the Li et al. (2023) reference above that "...the most sophisticated equipment may not always be necessary for answering some research questions". We believe that it is worth sharing our data and the somewhat subjective grouping, which we hope to report in more detail in the future.

      Minor

      (1) The supplemental materials are not ordered according to their introduction in the text. They should be reordered to follow the order in which they are mentioned in the text.

      Thank you for your comment; we have corrected the order of the supplementary figures.

      (2) In Supplementary Figure 13, it would be informative to include some form of linear regression to confirm whether an age-dependent effect on neuronal morphology exists.

      We have added linear regression to the figure.

      (3) Figure 3D: should this be AP, not Ap?

      Thank you for drawing attention to this; we have corrected the typo in the figure.

      (4) For UMAP analysis in Figure 3, please provide a table of the features that were used for the 32 & 8-parameter UMAPs respectively.

      We have added a table to the Materials and methods section listing all the electrophysiological features included in the UMAPs.

      (5) For morphology, please include pia and L1/2 border for reconstructions shown for clarity.

      We indicated both the pia mater and the L1/2 border on the figure showing all the reconstructions (Supplementary Figure 10).

      Reviewer #3 (Recommendations for the authors):

      Major:

      (1) Data were obtained from different cortical areas of human patients of different ages. The electrophysiological characteristics were largely independent of other attributes such as disease, gender, and cortical areas (Supplementary Figure 2). To support the conclusion that age is one of the key attributes responsible for change, a similar morphological analysis would be necessary for gender.

      We updated the text and the supplementary section with Supplementary Figures 18-21 to determine whether age-related differences in biophysical characteristics are affected by the patient's gender.

      (2) 'mushroom-shaped, thin, filopodial, branched, and stubby spines'

      Show photographs of individual typical spine types to make the classification easier to understand.

      To make the classification more understandable, we have updated the corresponding figure (Figure 6) with representative photos of the dendritic spine types.

      (3) Some electrophysiological parameters of the infant group showed higher deviations compared to other age groups. A UMAP (Supplementary Figure 2) shows that some infant neurons form a small cluster, while other infant neurons are scattered with neurons of other ages. Are there any differences between infant neurons in the small cluster and other infant neurons with respect to attributes other than age?

      For most of the electrophysiological parameters, the infant age group showed age-dependent variability, as illustrated in Supplementary Figures 2, 3, 4, and 6. The infant cells in the small cluster do not group by gender, brain region, or medical condition, as shown in Supplementary Figure 5.

      (4) A recent paper (Benavides-Piccione et al. 2024, doi:10.1093/cercor/bhae180) reported that some morphological parameters of human layer 3 neurons differ between occipital and temporal regions. Area-dependent morphological differences have been also reported in non-human primates. Discussion of potential contradictions may therefore be requested.

      Most of the cells we reconstructed originated from the parietal and temporal regions (parietal: n = 20, temporal: n = 23, frontal: n = 15, occipital: n = 5). We found no significant differences in morphological features between these two regions, nor when we compared cells from the same brain region across age groups.

      (5) L2/3 cells of rodents are morphologically differentiated according to cortical depth. If individual L2/3 cells of humans are less differentiated than those of rodents, this point should be discussed.

      Depth-related morphological heterogeneity has been reported previously (Berg et al., 2021). However, our dataset on the morphological characteristics of pyramidal cells is from the upper L2/3 region, with somata located 117.85 ± 65.3 μm (range: 11.05-243.3 μm) from the L1/L2 border. Therefore, we cannot conclude from our data whether human L2/3 cells are less differentiated than those of rodents.

      Minor:

      (1) Cell body morphology may affect electrophysiological properties. However, morphological quantification of cell bodies has not been reported. It may be added.

      In our DAB-labeled samples, we could not reliably measure the total volume of the cell body in the reconstructions; therefore, we do not report soma morphology in the manuscript. When comparing the cell body area in the middle sections of the somata of the reconstructed cells between age groups, we found no significant differences (P = 0.082, Kruskal–Wallis test).

      (2) 'The adaptation of the AP frequency response'

      Describe how this parameter was obtained.

      The adaptation of the AP frequency response or adaptation was calculated as the average adaptation of the interspike interval between consecutive APs.
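      The reply above does not state the exact normalization used for "average adaptation of the interspike interval". As an illustration only, the Python sketch below assumes the normalized ISI-difference form used in some cell-typing pipelines, i.e. the mean over n of (ISI[n+1] - ISI[n]) / (ISI[n+1] + ISI[n]); this formula and the function name `adaptation_index` are assumptions, not necessarily the manuscript's exact method.

```python
from typing import Sequence


def adaptation_index(spike_times: Sequence[float]) -> float:
    """Mean normalized change between consecutive interspike intervals (ISIs).

    Assumed definition (hypothetical, for illustration):
        mean over n of (ISI[n+1] - ISI[n]) / (ISI[n+1] + ISI[n])
    0 means regular firing; positive values mean the firing rate slows down.
    """
    if len(spike_times) < 3:
        raise ValueError("need at least three spikes (two ISIs)")
    # Interspike intervals between consecutive APs.
    isis = [later - earlier for earlier, later in zip(spike_times, spike_times[1:])]
    # Normalized difference for each pair of consecutive ISIs.
    ratios = [(nxt - cur) / (nxt + cur) for cur, nxt in zip(isis, isis[1:])]
    return sum(ratios) / len(ratios)
```

      For example, spike times of 0, 10, 30, and 70 ms give ISIs of 10, 20, and 40 ms, and an index of 1/3, reflecting a steadily slowing spike train.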

      (3) 'we excluded cells showing initial duplet or triplet action potential bursts'

      Why were the burst cells excluded from the analysis?

      We have modified the figures and text to include cells with initial burst firing.

      (4) Electrophysiological characteristics to be analyzed:

      Spike thresholds and afterhyperpolarizations

      We found age-related differences in the amplitude of the afterhyperpolarization (P = 2.56*10<sup>-30</sup>, Kruskal-Wallis test) and in the threshold of the action potential (P = 5.24*10<sup>-12</sup>, Kruskal-Wallis test) (Author response image 3).

      Author response image 3.

      Age-dependence of afterhyperpolarization and AP threshold. (A-B) Boxplots show the differences in afterhyperpolarization (AHP) amplitude (A) and AP threshold (B) between age groups. Asterisks indicate statistical significance (* P < 0.05, ** P < 0.01, *** P < 0.001, Kruskal-Wallis test with post-hoc Dunn test). (C-D) Scatter plots show AHP amplitude (C) and AP threshold (D) across the lifespan. Age is shown on a logarithmic scale, dots are colored according to the corresponding age group.
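      To illustrate the test combination named in the caption, the Python sketch below runs a Kruskal-Wallis test across groups and then Dunn's post-hoc pairwise comparisons on pooled ranks with the standard tie correction. The Bonferroni adjustment and the function name `kruskal_dunn` are assumptions for illustration; the caption does not state which multiple-comparison correction was applied.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def kruskal_dunn(groups: dict[str, list[float]]):
    """Kruskal-Wallis H test followed by Dunn's pairwise post-hoc test.

    Returns (H, p_kruskal, {(group_a, group_b): Bonferroni-adjusted p}).
    The Bonferroni correction is an assumption, not taken from the manuscript.
    """
    names = list(groups)
    samples = [np.asarray(groups[name], dtype=float) for name in names]
    h_stat, p_kw = stats.kruskal(*samples)

    # Rank all observations together (average ranks for ties).
    pooled = np.concatenate(samples)
    ranks = stats.rankdata(pooled)
    n_total = pooled.size
    _, tie_counts = np.unique(pooled, return_counts=True)
    tie_term = np.sum(tie_counts**3 - tie_counts) / (12.0 * (n_total - 1))

    # Split the pooled ranks back into their groups and take mean ranks.
    sizes = [s.size for s in samples]
    mean_ranks = [r.mean() for r in np.split(ranks, np.cumsum(sizes)[:-1])]

    n_pairs = len(names) * (len(names) - 1) // 2
    pairwise = {}
    for i, j in combinations(range(len(names)), 2):
        se = np.sqrt((n_total * (n_total + 1) / 12.0 - tie_term)
                     * (1.0 / sizes[i] + 1.0 / sizes[j]))
        z = (mean_ranks[i] - mean_ranks[j]) / se
        p_raw = 2.0 * stats.norm.sf(abs(z))  # two-sided normal p-value
        pairwise[(names[i], names[j])] = min(1.0, p_raw * n_pairs)
    return h_stat, p_kw, pairwise
```

      With three clearly separated groups, only the most distant pair may reach significance after correction, which is why the omnibus and post-hoc results are reported together.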

      (5) 'We identified and labeled each spine on n = 2 fully 3D-reconstructed cells'

      To which cortical area do these cells belong?

      At what depths are they distributed?

      Is it possible to report the number of spines, in addition to the density per unit length?

      We increased the number of cells in which we analyzed dendritic spine density. The data shown in Figure 6 are from pyramidal cells from an infant patient (n = 3 from a single patient) and late adulthood patients (n = 3 from 3 patients) (Supplementary Figure 13). The infant cells are from the same patient; the sample is from the right parietal lobe, and the patient was 83 days old. The older cells are from three different patients (#1: 65 years old, right temporal lobe; #2: 66 years old, right parietal lobe; #3: 62 years old, right frontal lobe). Infant cells are located 144.43 ± 45.26 µm (#1: 109.3, #2: 128.49, #3: 195.5 µm) and late adult cells 161.22 ± 66.22 µm (#1: 183.5, #2: 213.42, #3: 86.73 µm) from the L1/2 border. We provide the number of spines in an additional supplementary table (Supplementary Table 2).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Thank you for your time and consideration of our submission. We also thank the reviewers for their consideration and helpful comments. We have revised the introduction, results, and discussion sections of the manuscript in accordance with the reviewers' suggestions, which have enhanced the clarity of our work. Specifically, we have clarified that the aim of the study is to report newly discovered sperm behaviours inside the uterus via high-resolution deep-tissue live imaging, and to stimulate further studies and discussion in the field of postcopulatory sexual selection in mice based on our observations. To the best of our knowledge, many of the specific sperm behaviours described in our manuscript are reported here for the first time, documented through direct observation inside the living reproductive tract.

      We have also restructured our manuscript and moved our hypothetical interpretations of our experimental observations to the discussion section. We hope that these revisions have clarified our claims and that our revised manuscript effectively communicates the importance of our findings and their value in prompting new questions and insights that encourage further studies. We believe that our work clearly demonstrates the importance of sperm/reproductive tract interaction, which cannot be adequately studied in artificial environments, and may become an important guideline for designing future experiments and studies.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors want to determine the role of the sperm hook of the house mouse sperm in movement through the uterus. The authors are trying to distinguish between two hypotheses put forward by others on the role of the sperm hook: (1) the sperm cooperation hypothesis (the sperm hook helps to form sperm trains) vs (2) the migration hypothesis (that the sperm hook is needed for sperm movement through the uterus). They use transgenic lines with fluorescent labels to sperm proteins, and they cross these males to C57BL/6 females in pathogen-free conditions. They use 2-photon microscopy on ex vivo uteri within 3 hours of mating and the appearance of a copulation plug. There are a total of 10 post-mating uteri that were imaged with 3 different males. They provide 10 supplementary movies that form the basis for some of the quantitative analysis in the main body figures. Their data suggest that the role of the sperm hook is to facilitate movement along the uterine wall. 

      We thank the reviewer for summarizing our work and for the critical review of our paper. As summarized, the sperm hook has been primarily associated with the sperm cooperation (sperm train) hypothesis and the migration hypothesis. However, we would like to emphasize that the aim of our work was neither to adjudicate between the two hypotheses nor to disprove either of them, but rather to develop an experimental platform that enables detailed observation of sperm migration dynamics within the live reproductive tract.

      Through live imaging, we observed both the formation of sperm trains and interactions between the sperm and the female reproductive tract epithelium. However, in our observations, the rarely observed sperm trains showed no advantage in terms of faster movement. Because these events were infrequent in our experiments, we are not asserting that the sperm train hypothesis is invalid, but rather reporting our observations as they are.

      The main findings of our work lie in the newly observed dynamic behaviours of mouse sperm interacting with the female reproductive tract epithelium. These include tapping and the associated guided movement along the uterine wall, anchoring and the related resistance to internal fluid flow and migration through the utero-tubal junction, and self-organized behaviour while clinging onto the colliculus tubarius. We have extensively revised the manuscript structure to clarify our findings.

      Strengths: 

      Ex vivo live imaging of fluorescently labeled sperm with 2-photon microscopy is a powerful tool for studying the behavior of sperm. 

      Weaknesses: 

      The paper is descriptive and the data are correlations. 

      The data are not properly described in the figure legends. 

      When statistical analyses are performed, the authors do not comment on the trend that sperm from the three males behave differently from each other. This weakens confidence in the results. For example, in Figure 1 the sperm from male 3613 (blue squares) look different from male 838 (red circles), but all of these data are considered together. The authors should comment on why sperm across males are considered together when the individual data points appear to be different across males. 

      Thank you for your comments and suggestions. We have revisited all figure legends and made the necessary amendments (shown in the red-lined manuscript). Please note that, for a better flow of the paper, the previous Figure 1 has been changed to Figure 2 in the revised manuscript.

      Regarding the analysis using different males, we would like to explain the statistics used. We used generalized linear mixed models to test the effect of Angle and Distance to the wall on the migration kinetic parameters. The advantage of generalized linear mixed models is that they treat individual variation in the data as a random error term, thereby controlling for it.

      There are two main factors contributing to individual variation. One is, as you pointed out, the difference in sperm from different males. However, we used genetically similar mice, so genetic variation should be minimal. Nonetheless, individual differences such as age, stress level, and body condition likely cause some variation. As these factors cannot be controlled, we used the mixed model approach, in which observations are grouped by individual. This approach enabled us to test the effect of each explanatory variable (Angle and Distance) within an individual.

      The second factor that could cause variation is female oestrous status. To avoid artifacts that could influence sperm behaviour, we did not use any invasive methods, such as hormone injections, to control or induce oestrus. We controlled for this possible effect by including the mating date as a random effect. Since each female was used only once, the mating date captures the variation caused by each female.

      To further verify that the variation between individual males does not affect our results, we conducted analyses per individual male and per mating date (i.e., per female). As clearly shown, sperm from individual males or females also show consistent correlations with distance from the uterine wall. As pointed out, while the mean sperm speed may differ between individuals, this is not the question we address here; our interest is the effect of the distance between sperm and the uterine wall. Additionally, the variation between males is not consistently larger than the effect of mating day (female), which together suggest that modelling male variation separately is not essential. We have added this information as a Supplementary Figure (Fig. S3) in the revised supplementary materials.

      We also performed the same analysis for the effects of distance from the wall on sperm SWR and LIN (linearity of forward progression), for which no statistical significance was found in the models. As seen in the following figures, the regression lines drawn for each male and mating date likewise show no statistically significant effect of distance to the wall on SWR and LIN.
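      The per-male and per-date check described above amounts to fitting a separate regression line within each group and comparing the slopes. The Python sketch below is a simplified illustration using ordinary least squares via `numpy.polyfit`; it is not the authors' GLMM, and the function name `slopes_by_group` and its argument names are hypothetical.

```python
import numpy as np


def slopes_by_group(distance, speed, group):
    """Fit speed = slope * distance + intercept separately for each group.

    `group` labels each observation (e.g. male ID or mating date). This is a
    per-group least-squares sketch, not the mixed model used in the manuscript.
    Returns {group_label: slope}.
    """
    distance = np.asarray(distance, dtype=float)
    speed = np.asarray(speed, dtype=float)
    group = np.asarray(group)
    slopes = {}
    for label in np.unique(group):
        mask = group == label
        if mask.sum() < 2:
            continue  # a line needs at least two points
        slope, _intercept = np.polyfit(distance[mask], speed[mask], 1)
        slopes[str(label)] = float(slope)
    return slopes
```

      Groups with different mean speeds (intercepts) but slopes of the same sign would indicate a consistent within-individual effect of distance, which is the pattern the mixed model is designed to test while accounting for those intercept differences.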

      In summary, the statistical approach we used here has successfully reflected variations in sperm kinetics from different males as well as the variance from different females. We hope that our explanations and additional analysis answer your concerns. 

      Movies S8-S10 are single data points and no statistical analyses are performed. Therefore, it is unclear how penetrant the sperm movements are. 

      With respect to Movie S8, Figure 4A and B (Figure 5A and B in the current revised manuscript) depict the trajectories of accumulated spermatozoa (sperm trains) in the female uterus, as shown in Movie S8. We have added this information to the revised figure legend (L 293) for clarity. Over more than 100 hours of observation, comprising over 10 TB of images, we did not observe any sperm train that moved faster than single spermatozoa. The three sperm trains presented in Fig. 5B were the sperm trains that moved in the head-forward direction. Most other identifiable trains, or clusters, did not or could not move forward, as their heads were randomly entangled. Although we agree that a statistical test for Movie S8 (and Fig. 5B) would be valuable, the small number of sperm trains we found did not allow meaningful statistical tests. Instead, we provided all data in the box plots in Fig. 5C so that readers can evaluate and understand our points. We believe that this is a more neutral way of presenting our data than providing statistical significance.

      Regarding Movies S9 and S10, we are not entirely sure whether we understood your comments correctly. It would be very helpful if you could point us to the specific lines of the manuscript, as we would like to address your concerns and suggestions, and we believe that your input will improve our manuscript. We did not describe the penetration of sperm in these movies. Movies S9 and S10 show newly found sperm behaviours inside the UTJ and isthmus. We observed that sperm beating is influenced by the width of the luminal space as well as by internal flow, as seen in Movies S9 and S10. As our animal model only expresses red fluorescence in the midpiece, beating frequency cannot be measured accurately. However, we can clearly observe that beating is not continuous and nearly halts depending on local variations in the reproductive tract. We revised our description of the changes in beating speed in the revised manuscript (LL 305-335).

      Movies S1B - did the authors also track the movement of sperm located in the middle of the uterus (not close to the wall)? Without this measurement, they can't be certain that sperm close to the uterus wall travels faster. 

      We revised Movie S1B to include the videos that were used for the sperm migration kinetics analysis in Figure 2 (previously Figure 1). As the movies, the graph, and the statistical analysis show, there is a clear trend of slower sperm migration with increasing distance from the uterine wall. Regarding your comment about the middle of the uterus (away from the wall), we have added another movie (Movie S1C) that was acquired at different depths from the wall (moving towards the centre of the uterus). As clearly seen in Movie S1C, imaging deeper into the uterus reveals an increasing number of inactive or slow-moving spermatozoa. Since the diameter of the uterus easily exceeds 2 mm, we currently do not have optical access to the exact centre of the uterus, but at all observable depths, spermatozoa near the wall were clearly faster.

      Movie S5A is of lower magnification (200 µm scale bar), while the others have 50 and 20 µm scale bars. Individual sperm movement can be observed at 20 µm (Movie S5C). If the authors want to prove that there is no upsucking movement of sperm by the uterine contractions, they need to provide a high-magnification image.

      The main focus of video S5A is the intramural UTJ, where spermatozoa are located in rows within a narrow luminal space (see Author response image 1). If up-suck-like passive sperm carriage occurred, sperm would be expected to move from the uterus into the intramural UTJ, as depicted in Author response image 1 (left). However, no such sperm movement was seen in our observations, as shown in Movie S5A. Importantly, as indicated by an arrow from 5 sec to 6 sec in Movie S5A, some spermatozoa are moving downward (see also Author response image 1, right). This is the opposite direction to that expected for up-suck-like sperm carriage.

      Genetic evidence also indicates that up-suck-like passive carriage does not account for sperm migration from the uterus to the UTJ. If environmental up-suck-like passive transfer played an important role, it would be difficult to explain why some genetically modified spermatozoa cannot pass the entrance of the intramural UTJ (Nakanishi et al., 2004, Biol. Reprod.; Li et al., 2013, J. Mol. Cell Biol.; Larasati et al., 2020, Biol. Reprod.; Qu et al., 2021, Protein Cell).

      Author response image 1.

      The left image represents what is expected when up-suck like passive sperm carriage occurs. The right image represents what is actually experimentally observed in the intramural UTJ (see Movie S5A). The direction of the arrowheads indicates the direction of sperm movement.

      Movie S8 - if the authors want to make the case that clustered sperm do not move faster than unclustered sperm, then they need to show Movie S8 at higher magnification. They also need to quantify these data. 

      We understand your concern. As shown in Figure 5B, we included all sperm kinetics data for each sperm train and each unlinked spermatozoon around the trains as individual dots. The only analysis we did not conduct was a statistical test on these data, as it could be misleading given the large difference in sample size (3 trains vs. 181 unlinked spermatozoa). As the medians of the sperm kinetic parameters are similar except for SWR, we concluded that sperm trains are not necessarily faster than unlinked single spermatozoa. Since there is no known advantage in sperm competition for spermatozoa (including sperm trains) with intermediate speeds (for example, in IVF, fertilization success is high when fast, active spermatozoa with normal morphology are selected; Vaughan & Sakkas, 2019, Biol. Reprod.), it is questionable whether forming sperm trains that are no faster than unlinked spermatozoa confers an advantage in our data.

      However, we do not agree with your comment regarding the need for higher magnification. Measuring sperm migration speeds (kinetic parameters) in this study does not require measuring exact tail movements. Only sperm heads were tracked to measure their trajectories, and such tracking was better done at low magnification. For example, measuring the speed of a car does not require magnification sufficient to see the wheels rotate. Additionally, including observation magnification as a factor in all four GLMM models for Figure 2 (Table S3) does not change the result, showing that magnification does not influence our analysis.

      Movie S9C - what is the evidence that these sperm are dead or damaged? 

      Thank you for your valid comment. We tracked sperm movements for at least 10 minutes, and such entangled spermatozoa in the UTJ never became active again. As you can see in the new Movie S9b, the entangled spermatozoa were also acrosome-reacted (the green acrosome signal is gone), while active spermatozoa in the same video respond to peristaltic movement. However, as you pointed out, we did not measure their viability with appropriate dyes. Although we also considered extracting these spermatozoa and performing viability tests, we could not find a way to specifically extract the exact spermatozoa that were imaged. Considering your comments, we changed the terms "damaged" or "dead" to "inactive" in the revised manuscript (LL 313-316; Figure 6D legend, LL 380-384).

      Movie S10 - both slow- and fast-moving sperm are seen throughout the course of the movie, which does not support the authors' conclusion that sperm tails beat faster over time. 

      There must have been a misunderstanding. We did not state anywhere in the main manuscript, including the figure legends and related movie captions, that sperm beating got faster over time. As correctly pointed out, the sperm beating speed changes over time (rather than increasing over time) and correlates with internal fluid flow and the width of the luminal space (LL 320-332). Please let us know if you meant something else.

      Reviewer #2 (Public Review): 

      Summary: 

      The specific objective of this study was to determine the role of the large apical hook on the head of mouse sperm (Mus musculus) in sperm migration through the female reproductive tract. The authors used a custom-built two-photon microscope system to obtain digital videos of sperm moving within the female reproductive tract. They used sperm from genetically modified male mice that produce fluorescence in the sperm head and flagellar midpiece to enable visualization of sperm moving within the tract. Based on various observations, the authors concluded that the hook serves to facilitate sperm migration by hooking sperm onto the lining of the female reproductive tract, rather than by hooking sperm together to form a sperm train that would move them more quickly through the tract. The images and videos are excellent and inspirational to researchers in the field of mammalian sperm migration, but interpretations of the behaviors are highly speculative and not supported by controlled experimentation. 

      Thank you for your critical review and valuable comments on our manuscript. As pointed out, some of our findings and suggestions were largely observation based. However, to the best of our knowledge, many of our observations are novel, particularly in the context of live imaging inside the female uterus and reproductive tract. We believe these observations open doors to many questions and follow up studies that can be envisioned based on our findings, which is what drives science forward. 

      That being said, we entirely agree that many follow-up experiments need to be designed and performed, especially to validate the exact molecular mechanisms of the observed dynamics. Unfortunately, we currently lack the proper molecular experimental toolsets to perform further tests. We have removed much of the hypothetical discussion from the results section and moved it to the discussion section. We hope that our revision more clearly separates the observed experimental data from our interpretations.

      Strengths: 

      The microscope system developed by the authors could be of interest to others investigating sperm migration. 

      The new behaviors shown in the images and videos could be of interest to others in the field, in terms of stimulating the development of new hypotheses to investigate. 

      Weaknesses: 

      The authors stated several hypotheses about the functions of the sperm behaviors they saw, but the hypotheses were not clearly stated or tested experimentally. 

      The hypothesis statements were weakened by the use of hedge words, such as "may". 

      We appreciate your helpful comments and have revised our hypotheses and suggestions accordingly. We have removed instances of “may” or revised it to be more direct. We have also moved most of our interpretations and hypotheses from the results to the discussion section. 

      It is important to note that experimental approaches to test what we suggested from our findings in the current ex vivo observation platform are not trivial and require extensive investigation of several unknown factors of the female reproductive tract. For instance, detailed information on the chemical characteristics and fluid dynamics of the female reproductive tract is essential to build a microfluidic channel that accurately resembles the uterus and oviduct and replicates what we found in an entire, living, extracted organ. This poses a significant challenge and requires collaborative expertise from many labs, which we hope to build in the near future.

      Furthermore, our biggest concern is that, even if we were to construct an appropriate microfluidic channel to test sperm migration, the sperm behaviours that we observed under natural conditions might well not be replicated in artificial environments. This raises the question of whether in silico or in vitro findings can truly reproduce what we report here from ex vivo observation inside a living organ.

      To share our experience related to this difficulty: at the initial stage of our study, we attempted sperm injection combined with fluorescent beads to visualize the fluid flow, as well as staining the female reproductive tract and spermatozoa after mating. However, none of these approaches produced meaningful results. Another potential approach to testing our claims is to use genetic engineering to indirectly confirm the influence of sperm hook morphology on sperm behaviour. However, such an approach lacks a mechanical demonstration of how the sperm hook interacts with the female reproductive tract.

      It is unfortunate that the sperm behaviours we found and reported here are considered highly speculative. The main findings of our work lie in the newly observed dynamic behaviours of mouse sperm interacting with the female reproductive tract epithelium. Specifically, these behaviours include tapping and the associated guided movement along the uterine wall, anchoring and the related resistance to internal fluid flow and migration through the utero-tubal junction, and self-organized behaviour while clinging onto the colliculus tubarius.

      We have extensively revised the manuscript structure to clarify our findings and integrated our points into the introduction. Although we understand that our hypotheses may be considered speculative and that the causal relationship between the sperm hook and its role in sperm migration requires further experimental work, we believe that the image-based observations of the dynamic behaviours of spermatozoa are solid. We believe our findings will facilitate further studies and discussion in the field of postcopulatory sexual selection in rodents.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      The manuscript is written for experts in a fairly small field. I recommend that the authors rewrite the manuscript to make it more accessible to people outside of the field. These suggestions include:

      (1) Provide a diagram of the female reproductive tract in Figure 1. 

      a. Indicate where sperm enter the tract and the location of the oocyte they are trying to reach. 

      b. Label all areas of the uterus that are mentioned in this study and be consistent about the label. 

      (2) All movies should have a diagram of the location of the uterus that is being imaged. 

      Thank you for the great suggestion. We have added a diagram of the female reproductive tract in the revised Figure 1A. In response to your comments 1a and b, we have indicated such information by including eggs in the ampulla and arrows that indicate sperm migration direction. We have also labelled the name of the specific areas that were studied in the manuscript.

      We are unsure how to integrate the diagram into all movies without reframing the videos, which could seriously corrupt the files. More importantly, we think that adding the same diagram to all movies may clutter the visuals and obscure the annotations and subjects in the movies. Instead, we have referred to the common diagram (Figure 1A) in each movie caption, specifying where the video was taken. Thank you for the suggestion. With this information, we hope readers can now more easily understand where we made the observations.

      (3) The major questions in the field need to be better described in the introduction. 

      Thank you for your valuable suggestions and specific comments, which have greatly helped improve our manuscript. As suggested, we have revised our introduction and discussion sections by adding more literature and integrating studies from a wider range of postcopulatory sexual selection research (LL 34-57, LL 385-398).

      (4) The major question that the authors are trying to address should be described in the introduction. 

      Thank you for the helpful suggestion. We have clarified in the introduction that our aim was to contribute to the field of postcopulatory sexual selection in rodents by advancing methodological progress and to stimulate discussion and future research on the function of the sperm hook in murine rodents (LL 76-94) based on our observations.

      (5) A discussion of the sperm hook should be provided. How many species have this structure (or similar structure)? 

      We have integrated your point into the revised discussion section. Essentially, most murine rodent species have sperm hooks, although their exact shapes differ. However, as there are over 500 species and not all of them have been examined, we do not know exactly how many have this structure. Therefore, in the discussion we cited studies that examined species variation in sperm hook characteristics and its possible correlation with sperm competition (LL 385-417). We also cited the papers by Breed (2004) and by Roldan et al. (1992), which investigated murine rodents with a sperm hook, in the introduction (LL 58-61).

      (6) The figure legends must describe everything in the figure or movie. 

      Thank you for the helpful suggestion. We had previously been concerned that our figure legends were too long. We have now included further information in the figure legends and movie captions. We have also revised the movies by adding some clips following our revision (Movie S1).

      Reviewer #2 (Recommendations For The Authors): 

      Here are some specific concerns I had about the clarity of approach to experiments and interpretations of results. 

      In the Introduction, the authors stated that the study was intended to determine the function of the hooks on the mouse sperm heads. However, in the Results section, the authors did not explain the rationale for the first set of experiments with respect to the overall objective of the study. In this experiment, the authors measured the velocities of sperm swimming in the uterus and found that the sperm moved faster when closer to the uterine wall (VCL, VSL). They concluded that migration along the uterine wall "may" be an efficient strategy for reaching the entrance to the uterotubal junction (UTJ) and did not explain how this related to the function of the hooks. 

      Thank you for your critical comment and guidance. We have swapped the order of Figure 1 and Figure 2 and revised the results section to integrate your points. At the initial stage of the study, we expected to find evidence that sperm trains aid sperm migration in the female uterus (which has not been observed in the live uterus; previous work was done in vitro with sperm extracted from the epididymis or from the uterus after mating). However, we found something unexpected: dynamic sperm hook-related movements that facilitate sperm migration inside the female uterus by playing a mechanical role in the interaction between sperm and the uterine wall. These results, previously presented in Figure 2, have been reorganized as the new Figure 1.

      Based on this observation, our research then moved to clarifying whether such sperm-epithelium interaction indeed helps sperm migration. This led us to measure sperm kinetics in relation to their distance and angle to the uterine wall. We have revised our introduction and results sections to integrate these points, and we hope that our revision answers your questions. We have also reduced the use of ‘may’ or ‘can’ in the results section. In the revised manuscript, we have moved such hypotheses to the discussion section and focused the results section on what we observed.

      The authors proposed that the sperm hook "may" play a crucial role in determining the direction of migration. When sperm encountered a uterine wall, significantly more changed migration direction toward the pro-hook direction than toward the anti-hook direction. In Figure 2B, sperm behavior is not visually understandable nor clearly explained. 

      Thank you for the helpful comments. We have removed “may” and “might” to make our claim clearer and more concise. We have also revised the previous Figure 2B by combining it with the previous Figure 2C (together they now form Figure 1C). In addition, we have revised Figure 1B by increasing the line thickness of the pro-wall-hook sperm trajectory and adding the anti-wall-hook trajectory. We hope that these revisions make the figure easier to understand.

      In Figure 2E, are the authors showing that the tip of the hook is caught between two epithelial cells? Please clarify the meaning of this figure. 

      Please clarify the difference between "tapping" and "anchoring". 

      Thank you for the detailed comments. As you pointed out, we currently have no evidence that sperm can be caught in the intercellular gaps of the epithelium. We have removed this source of confusion by deleting the gap from the revised figure (Figure 1E). We have also included definitions of anchoring (LL 142-143) and tapping (LL 128-130). Anchoring facilitates the attachment of sperm to the uterine epithelium. Such anchoring also involves the sperm head becoming caught in an inter-mucosal fold or gap, particularly at the entrance of the intramural UTJ at the end of the uterus. Tapping is an interaction between the head hook and the epithelium in which the sperm hook taps (or pats) the surface. Sperm tapping may be a byproduct of flagellar beating when spermatozoa migrate in the pro-wall-hook direction along the uterine wall (epithelium), or it may play some role in sperm migration. As we currently cannot draw a conclusion, we did not discuss the possible function of tapping in the manuscript.

      The authors proposed that opposite sliding of neighboring mucosal folds lining the UTJ would cause small openings to form, through which perhaps only one sperm at a time could enter and pass through the UTJ into the oviduct. This hypothesis was not actually tested. 

      Imaging deep inside tissue is challenging because light scatters as it penetrates biological tissue. While this is also true for the uterus, the intramural UTJ is especially difficult to image because it consists of several thick muscle and cell layers (see Movie S5A). Another challenge is that the peristaltic movement of the UTJ results in constant motion, which made continuous tracking of individual sperm through the entire UTJ impossible in our current experiments. We have moved this hypothesis to the discussion section and restated that it is a purely hypothetical model (LL 399-406). We hope that our model encourages the community to design or establish an improved ex-vivo observation system that can test this hypothetical model in the near future.

      Next, the authors hypothesized that sperm that encounter the small openings in the UTJ may then be guided onward and the hooks could prevent backward slipping. This was also not tested. 

      As you noted, the proposed role of the sperm hook in aiding sliding and preventing backward slipping could not be tested directly in our ex-vivo observation platform, which relies on the natural movement of the living organ. However, we believe that these limitations also highlight the importance of continued research and the development of more advanced methodologies in this field.

      We would also like to note that Movie S3A and B, as well as Movie S5B, provide direct observations of spermatozoa resisting the internal flow caused by reproductive tract contractions. We referred to these movies and pointed out the role of anchoring (sperm attachment) in preventing sperm from being squeezed out (LL 140-149, LL 224-241). Unfortunately, we cannot currently conceive how this behaviour could be tested further in a microfluidic device or ex-vivo system that resembles the uterus. In line with your suggestion, we have rewritten the related results section and moved the related discussion from the results to the discussion section (LL 224-241, LL 399-417). 

      The authors observed that large numbers of uterine sperm are attached to the entrance of the UTJ. Some sperm clustered and synchronized their flagellar beating. The authors speculated that this behavior served to push sperm in clusters onward through the UTJ. 

      We would like to note that we did not speculate that sperm clustering and synchronization could serve to push spermatozoa in a cluster onward through the UTJ. We only described what we observed in the recorded videos: the flow generated by clustered spermatozoa pushed away other spermatozoa, as seen in Movie S7 (LL 261-264). Although such sperm cooperation (blocking the passage of later sperm) is possible, we cannot draw that conclusion from our observations. The possibility you pointed out (pushing sperm onward through the UTJ) was suggested by Qu et al. in 2021 [Cooperation-based sperm clusters mediate sperm oviduct entry and fertilization, Protein & Cell] based on their observations of cleared, dead reproductive tracts.

      The authors found only a few sperm trains in the uterus, UTJ, and oviduct, so they could not measure sufficient numbers of samples to test whether sperm trains swim faster than single sperm. Without sufficient data, they concluded that the "sperm trains did not move faster than unlinked single spermatozoa." 

      We would like to take this opportunity to clarify our claims. We do not claim that our current experiments give a final verdict on whether the hypothesis that sperm trains swim faster is correct. The phrase “sperm trains did not move faster” was not intended to mean that the sperm train hypothesis is invalid. We did not draw a conclusion; we simply described the experimental data we observed (LL 279-286). We would once again like to emphasize that the main claim of our manuscript is not to rule out the sperm train hypothesis, but to present the various dynamic interactions of the sperm head with the female reproductive tract. To make the statement more balanced, we revised the sentence to “observed sperm trains did not move faster or slower than unlinked single spermatozoa” (LL 281-282).

      The authors hypothesized that the dense sperm clusters at the entrance into the UTJ could prevent the rival's sperm from entering the UTJ (due to plugging entrance and/or creating an outward flow to sweep back the rival's sperm), but they did not test it. 

      We agree that we were not able to test this possible function of the sperm cluster at the UTJ entrance. Following your concerns, we revised the results section (LL 256-264) by removing most of our discussion of the observed phenomena. We also moved some interpretation to the discussion section (LL 421-437) and suggested that future work using appropriate microfluidic channel designs or sequential double-mating experiments could provide additional tests (LL 443-447). However, we would like to point out that Movie S7C clearly shows surrounding sperm being swept away from the sperm clusters. Since the sperm density is high, this is almost equivalent to a particle image velocimetry experiment, and the effect of the outward flow generated by the sperm clusters is clearly visible.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weakness#1: The authors claim to have identified drivers that label single DANs in Figure 1, but their confocal images in Figure S1 suggest that many of those drivers label additional neurons in the larval brain. It is also not clear why only some of the 57 drivers are displayed in Figure S1.

      As described in the Results section, we screened 57 GAL4 driver lines based on previous reports. These included drivers that had been shown to label a single dopaminergic neuron (DAN) or a small subset of DANs in the larval or adult brain hemisphere, suggesting potential for specific DAN labeling in larvae.

      In Figure 1, TH-GAL4 was used to cover all neurons in the DL1 cluster, while R58E02 and R30G08 are well-known drivers for pPAM. The fly strains in Figure 1h, k, l, and m were reported as single-DAN strains in larvae[1], while the strains in Figure 1e, f, and g were reported to label only a few DANs in adult brains[2,3]. We examined these strains, and only some of them labeled single DANs in the 3rd instar larval brain hemisphere (Figure 1f, g, h, l and m). Among them, only the strains in Figure 1f and h labeled a single DAN in the brain hemisphere without labeling other non-DANs. The other strains labeled non-DANs in addition to single DANs (Figure 1g, l and m). When the ventral nerve cord (VNC) is also considered, the strain in Figure 1h also labeled neurons in the VNC (Figure S1e), while the strain in Figure 1f did not (Figure S1c).

      In summary, the driver shown in Figure 1f (R76F02AD;R55C10DBD, labeling DAN-c1) is the only line we identified that labels a single DAN in the 3rd instar larval brain hemisphere without additional labeling. The other lines shown in Figure 1 (g, h, l, m) label a single DAN but also include some non-DANs. Figure 1 focuses on strains that label a single or a pair of DANs.

      Labeling patterns for all 57 driver lines are summarized in Table 1. Figure S1 includes representative examples; full confocal images for all screened strains are available upon request, as stated in the figure legend.

      Weakness #2: Critically, R76F02-AD; R55C10-DBD labels more than one neuron per hemisphere in Figure S1c, and the authors cite Xie et al. (2018) to note that this driver labels two DANs in adult brains. Therefore, the authors cannot argue that the experiments throughout their paper using this driver exclusively target DAN-c1.

      Figure S1c shows a single dopaminergic (DA) neuron in each brain hemisphere. While additional GFP-positive signals were occasionally observed, they did not originate from the cell bodies of DA neurons, as these were not labeled by the tyrosine hydroxylase (TH) antibody. These additional GFP signals primarily appeared to be neurites, including axonal terminals, although we cannot rule out the possibility that some represent false-positive signals or weakly stained non-neuronal cell bodies. This interpretation is based on the analysis of 22 third-instar larval brains.

      To clarify this point in the manuscript, we added the following sentence to the Results section: “Based on the analysis of 22 brain samples, we observed this driver strain labels one neuron per hemisphere in the third-instar larval brain (Figure 2a–d, Figure S1c, Table S3).” Additionally, Table S3 was included to summarize the DAN-c1 labeling pattern across all 22 samples. An enlarged inset highlighting GFP-positive signals was also added to Figure S1c.

      Weakness #3: Missing from the screen of 57 drivers is the driver MB320C, which typically labels only PPL1-γ1pedc in the adult and should label DAN-c1 in the larva. If MB320C labels DAN-c1 exclusively in the larva, then the authors should repeat their key experiments with MB320C to provide more evidence for DAN-c1 involvement specifically.

      We thank the reviewer for this insightful suggestion. The MB320C driver primarily labels the PPL1-γ1pedc neuron in the adult brain, along with one or two additional weakly labeled cells. It would indeed be interesting to examine the expression pattern of this driver in third-instar larval brains. If it is found to label only DAN-c1 at this stage, we could consider using it to knock down D2R and assess whether this recapitulates our current findings.

      While we agree that this is a promising direction for future studies, we believe it is not essential for the current manuscript, given the specificity of the DAN-c1 driver (please see our response to Reviewer #3 for details). Nonetheless, we appreciate the reviewer’s suggestion, and we recognize that MB320C could be a valuable tool for future experiments.

      Weakness #4: The authors claim that the SS02160 driver used by Eschbach et al. (2020) labels other neurons in addition to DAN-c1. Could the authors use confocal imaging to show how many other neurons SS02160 labels? Given that both Eschbach et al. and Weber et al. (2023) found no evidence that DAN-c1 plays a role in larval aversive learning, it would be informative to see how SS02160 expression compares with the driver the authors use to label DAN-c1.

      We do not have our own images of DANs in the brains of the SS02160 driver cross line. However, Extended Data Figure 1 in the paper by Eschbach et al. shows four strongly labeled neurons in each brain hemisphere[4], indicating that this driver does not label only one neuron, DAN-c1.

      Weakness #5: The claim that DAN-c1 is both necessary and sufficient in larval aversive learning should be reworded. Such a claim would logically exclude any other neuron or even the training stimuli from being involved in aversive learning (see Yoshihara and Yoshihara (2018) for a detailed discussion of the logic), which is presumably not what the authors intended because they describe the possible roles of other DANs during aversive learning in the discussion.

      We agree with the reviewer that the terms “necessary” and “sufficient” may be too exclusive and could unintentionally exclude contributions from other neurons. As noted in the Discussion section, we acknowledge that additional dopaminergic neurons may also play roles in larval aversive learning. To reflect this, we have revised our wording to use “important” and “mediates” instead of the more definitive terms “necessary” and “sufficient,” making our conclusions more accurate and appropriately measured.

      Weakness #6: Moreover, if DAN-c1 artificial activation conveyed an aversive teaching signal irrespective of the gustatory stimulus, then it should not impair aversive learning after quinine training (Figure 2k). While the authors interpret Figure 2k (and Figure 5) to indicate that artificial activation causes excessive DAN-c1 dopamine release, an alternative explanation is that artificial activation compromises aversive learning by overriding DAN-c1 activity that could be evoked by quinine.

      This is an excellent point, and we agree that we cannot rule out the possibility that artificial activation interferes with aversive learning by overriding the natural activity of DAN-c1 that would normally be evoked by quinine. The observed results with TRPA1 could potentially be attributed to dopamine depletion, inactivation due to prolonged depolarization, or neural adaptation. However, we believe that our hypothesis - that over-excitation of DAN-c1 impairs learning - is more consistent with our experimental findings and with previously published data. Our rationale is as follows: (1) Associative learning in larvae occurs only when the conditioned stimulus (CS, e.g., an odor such as pentyl acetate) and unconditioned stimulus (US, e.g., quinine) are paired. In wild-type larvae, the CS depolarizes a subset of Kenyon cells in the mushroom body (MB), while the US induces dopamine (DA) release from DAN-c1 into the lower peduncle (LP) compartment (Figure 7a). When both stimuli coincide, calcium influx from CS activation and Gαs signaling via D1-type dopamine receptors activate the MB-specific adenylyl cyclase, rutabaga, which functions as a coincidence detector (Figure 7d). (2) Rutabaga converts ATP to cAMP, activating the PKA signaling pathway and modifying synaptic strength between Kenyon cells and mushroom body output neurons (MBONs) (Figure 7d). These changes in synaptic strength underlie learned behavioral responses to future presentations of the same odor. (3) Our results show that D2R is expressed in DAN-c1, and that D2R knockdown impairs aversive learning. Since D2Rs typically inhibit neuronal excitability and reduce cAMP levels[5], we hypothesize that D2R acts as an autoreceptor in DAN-c1 to restrict DA release. When D2R is knocked down, this inhibition is lifted, leading to increased DA release in response to the US (quinine). 
The resulting excess DA, in combination with CS-induced calcium influx, would elevate cAMP levels in Kenyon cells excessively - disrupting normal learning processes (Figure 7b). This is supported by studies showing that dunce mutants, which have elevated cAMP levels, also exhibit aversive learning deficits[6]. (4) The TRPA1 activation results are consistent with our over-excitation model. When DAN-c1 was artificially activated at 34°C in the distilled water group, this mimicked the natural activation by quinine, producing an aversive learning response toward the odor (Figure 2k or new Figure 2i, DW group). Similarly, in the sucrose group, artificial activation mimicked quinine, producing a learning response that reflected both appetitive and aversive conditioning (Figure 2k, SUC group). (5) Over-excitation impairs learning in the quinine group. When DAN-c1 was activated during quinine exposure, both artificial and natural activation combined to produce excessive DA release. This over-excitation likely disrupted the cAMP balance in Kenyon cells, impairing learning and resulting in failure of aversive memory formation (Figure 2k, QUI group). This phenotype closely mirrors the effect of D2R knockdown in DAN-c1. (6) Optogenetic activation of DAN-c1 during aversive training similarly produced elevated DA levels due to both natural and artificial stimulation. This again would result in MBN over-excitation and a corresponding learning deficit. When optogenetic activation occurred during non-training phases (resting or testing), no additional DA was released during training, and aversive learning remained intact (Figure 5b). (7) Notably, when optogenetic activation was applied during training, we observed no aversive learning in the distilled water group and no reduction in the sucrose group (Figure 5c, 5d). 
We interpret this as evidence that the optogenetic stimulation was strong enough to cause elevated DA release in both groups, impairing learning in a manner similar to D2R knockdown or TRPA1 overactivation. (8) We extended this over-excitation framework to directly activate Kenyon cells (MBNs). Since MBNs are involved in both appetitive and aversive learning, their over-excitation disrupted both types of learning (Figure 6), further supporting our hypothesis. In summary, we propose that DAN-c1 activity is tightly regulated by D2R autoreceptors to ensure appropriate levels of dopamine release during aversive learning. Disruption of this regulation - either through D2R knockdown or artificial overactivation of DAN-c1 - results in excessive DA release, over-excitation of Kenyon cells, and impaired learning. This over-excitation model is consistent with both our experimental results and prior literature.

      Weakness #7: The authors should not necessarily expect that D2R enhancer driver strains would reflect D2R endogenous expression, since it is known that TH-GAL4 does not label p(PAM) dopaminergic neurons.

      As in the example of TH-GAL4, it is possible that the D2R driver strains only partially reflect the expression pattern of endogenous D2R in larval brains. However, when we crossed the D2R driver strains with the GFP-tagged D2R strain, we observed co-localization in DM1 and DL2b dopaminergic neurons, as well as in mushroom body neurons (Figure S3c to h). In addition, D2R knockdown with D2R-miR directly supported the conclusion that the GFP-tagged D2R strain reflects the expression pattern of endogenous D2R (Figure 4b to d; signals were reduced in DM1). In summary, we think the D2R driver strains support the expression pattern we observed with the GFP-tagged D2R strain, especially in DM1 DANs.

      Weakness #8: Their observations of GFP-tagged D2R expression could be strengthened with an anti-D2R antibody such as that used by Lam et al., (1999) or Love et al., (2023).

      Love et al. (2023) used the antibody originally described by Draper et al.[6]. We attempted to use the same antibody in our experiments; however, we were unable to detect clear signals following staining. This may be due to a lack of specificity for neurons in the Drosophila larval brain or incompatibility with our staining protocol. Unfortunately, we were unable to locate a copy of the Lam (1999) paper for further reference.

      Weakness #9: Finally, the authors could consider the possibility other DANs may also mediate aversive learning via D2R. Knockdown of D2R in DAN-g1 appears to cause a defect in aversive quinine learning compared with its genetic control (Figure S4e). It is unclear why the same genetic control has unexpectedly poor aversive quinine learning after training with propionic acid (Figure S5a). The authors could comment on why RNAi knockdown of D2R in DAN-g1 does not similarly impair aversive quinine learning (Figure S5b).

      We re-analyzed the data related to DAN-g1. Interestingly, knockdown of D2R in DAN-g1 larvae trained with quinine (QUI) showed a significant difference in response index (R.I.) compared to the distilled water (DW) control group. However, it also differed significantly from the DAN-g1 genetic control group trained with QUI (two-way ANOVA with Tukey’s multiple comparisons, p = 0.0002), while it was not significantly different from the UAS-D2R-miR genetic control group (p = 0.2724). Furthermore, knockdown of D2R in DAN-g1 did not lead to aversive learning deficits when larvae were trained with a different odorant, propionic acid (ProA; Figure S5a). Similarly, using an RNAi line to knock down D2R in DAN-g1 did not result in learning impairment when larvae were trained with pentyl acetate (PA; Figure S5b). These inconsistencies may stem from differences in stimulus intensity across odorants, as well as the variable efficiency of the knockdown strategies (microRNA vs. RNAi). Based on these results, we propose that D2Rs in DAN-g1 may modulate larval aversive learning in a quantitative manner but do not play as critical a role as those in DAN-c1, where knockdown produces a clear qualitative effect. We have added this paragraph to the Discussion section of the manuscript.

      Reviewer #2 (Public review):

      Weakness#1: It is not completely clear how the system of DAN-c1, MB neurons and behavioral performance works. We can be quite sure that DAN-c1;Shits1 reduced dopamine release and impaired aversive memory (Figure 2h). Similarly, DAN-c1;ChR2 increased dopamine release and also impaired aversive memory (Figure 5b). However, it is not clear what is happening with DAN-c1;TrpA1 (Figure 2K). In this case, thermo-induction appears to impair the behavioral performance of all three conditions (QUI, DW and SUC), and the behavior is quite distinct from the increase and decrease of dopamine tone (Figure 2h and 5b).

      The study successfully examined the role of D2R in DAN-c1 and MB neurons in olfactory conditioning. The conclusions are well supported by the data, with the exception of the claim that dopamine release from DAN-c1 is sufficient for aversive learning in the absence of an unconditioned stimulus (Figure 2K). Otherwise, the authors need to provide a better explanation of this point.

      Please refer to our response to Weakness #6 of Reviewer #1 above.

      Reviewer #3 (Public review):

      Weakness #1: It is a strength of the paper that it analyses the function of dopamine neurons (DANs) at the level of single, identified neurons, and uses tools to address specific dopamine receptors (DopRs), exploiting the unique experimental possibilities available in larval Drosophila as a model system. Indeed, the result of their screening for transgenic drivers covering single or small groups of DANs and their histological characterization provides the community with a very valuable resource. In particular the transgenic driver to cover the DANc1 neuron might turn out useful. However, I wonder in which fraction of the preparations an expression pattern as in Figure 1f/ S1c is observed, and how many preparations the authors have analyzed. Also, given the function of DANs throughout the body, in addition to the expression pattern in the mushroom body region (Figure 1f) and in the central nervous system (Figure S1c) maybe attempts can be made to assess expression from this driver throughout the larval body (same for Dop2R distribution).

      We thank the reviewer for the positive comments and thoughtful suggestions.

      Regarding the R76F02AD; R55C10DBD strain, we examined 22 third instar larval brains expressing GFP, Syt-GFP, or Den-mCherry. DAN-c1 was clearly labeled in all brains. In approximately half of the samples, only DAN-c1 was labeled. In the remaining samples, 1 to 5 additional weakly labeled somata were observed, typically without associated neurites. Only 1 or 2 strongly labeled non-DAN-c1 cells were occasionally detected. These additional labeled neurons were rarely dopaminergic. In the ventral nerve cord (VNC), 8 out of 12 samples showed no labeled cells; the remaining 4 samples had 2–4 strongly labeled cells. These results support our conclusion that the R76F02AD; R55C10DBD combination predominantly and specifically labels DAN-c1 in the third instar larval brain. As for the reviewer’s question about the expression pattern of R76F02AD; R55C10DBD and D2R in the larval body, we agree that this is a very interesting avenue for further investigation. However, our current study is focused on the central nervous system and larval learning behaviors. We hope to explore this question more fully in future work.

      We added the following sentence to the Results section: “Based on analysis of 22 brain samples, we believe this driver strain consistently labels one neuron per hemisphere in the third-instar larval brain (Figure 2a - d, Figure S1c, Table S3).” In addition, we included Table S3 to summarize the DAN-c1 labeling patterns observed across these samples.

      Weakness #2: A first major weakness is that the main conclusion of the paper, which pertains to associative memory (last sentence of the abstract, and throughout the manuscript), is not justified by their evidence. Why so? Consider the paradigm in Figure 2g, and the data in Figure 2h (22 degrees, the control condition), where the assay and the experimental rationale used throughout the manuscript are introduced. Different groups of larvae are exposed, for 30min, to an odour paired with either i) quinine solution (red bar), ii) distilled water (yellow bar), or iii) sucrose solution (blue bar); in all cases this is followed by a choice test for the odour on one side and a distilled-water blank on the other side of a testing Petri dish. The authors observe that odour preference is low after odour-quinine pairing, intermediate after odour-water pairing and high after odour-sucrose pairing. The differences in odour preference relative to the odour-water case are interpreted as reflecting odour-quinine aversive associations and odour-sucrose appetitive associations, respectively. However, these differences could just as well reflect non-associative effects of the 30-min quinine or sucrose exposure per se (for a classical discussion of such types of issues see Rescorla 1988, Annu Rev Neurosci, or regarding Drosophila Tully 1988, Behav Genetics, or with some reference to the original paper by Honjo & Furukubo-Tokunaga 2005, J Neurosci that the authors reference, also Gerber & Stocker 2007, Chem Sens).

      As it stands, therefore, the current 3-group type of comparison does not allow conclusions about associative learning.

      We adopted the single-odor larval learning paradigm from Honjo et al., who first developed and validated this method for studying larval olfactory associative learning[7,8]. To address the reviewer’s concern regarding potential non-associative effects of a 30-minute exposure to quinine or sucrose, we refer to multiple lines of evidence provided in Honjo’s studies: (1) Honjo et al. demonstrated that only larvae receiving paired presentations of the odor and the unconditioned stimulus (quinine or sucrose) exhibited learned responses. Exposure to either stimulus alone, or temporally dissociated presentations, failed to induce any learned response. (2) When tested with a second, non-trained odorant, larvae responded only to the odorant previously paired with the unconditioned stimulus. This rules out generalized olfactory suppression and confirms odor-specific associative learning. (3) Well-characterized learning mutants (e.g., rutabaga, dunce) that show deficits in adult reciprocal odor learning also failed to exhibit learned responses in this single-odor paradigm, further supporting its validity. (4) In our study, we used two distinct odorants (pentyl acetate and propionic acid) and two independent D2R knockdown approaches (UAS-miR and UAS-RNAi). We consistently observed that D2R knockdown in DAN-c1 impaired aversive learning. Importantly, naïve olfactory, gustatory, and locomotor assays ruled out general sensory or motor defects. Comparisons with control groups (odor paired with distilled water) also ruled out non-associative effects such as habituation. Taken together, these results strongly support the conclusion that the single-odor paradigm is a robust and reliable assay for assessing larval olfactory associative learning in Drosophila. We have added a section to the Discussion to clarify and defend the use of this paradigm in our study.

      Weakness #3: A second major weakness is apparent when considering the sketch in Figure 2g and the equation defining the response index (R.I.) (line 480). The point is that the larvae that are located in the middle zone are not included in the denominator. This can inflate scores and is not appropriate. That is, suppose from a group of 30 animals (line 471) only 1 chooses the odor side and 29, bedazzled after 30-min quinine or sucrose exposure or otherwise confused by a given opto- or thermogenetic treatment, stay in the middle zone... a P.I. of 1.0 would result.

      During the test phase, larvae were given 5 min to wander on the testing plate. Under most conditions, more than half of the larvae (>50%) explore the plate, while the rest remain in the middle zone and are excluded from the calculation. We used 25-50 larvae in each learning assay, so roughly 10-30 larvae end up in the two semicircular areas. Indeed, based on our raw data, an R.I. of 1 rarely occurs; most R.I. values fall between -0.2 and 0.8. We acknowledge that the R.I. is a nonlinear function of the larval counts, so it changes more steeply as it approaches -1 or 1. However, because most values fall between -0.2 and 0.8, we consider such border effects negligible provided that enough larvae (10-30) are included in the calculation.
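      To make the denominator issue concrete, the short Python sketch below computes a response index under an assumption: that it takes the usual single-odor form R.I. = (N_odor - N_blank) / (N_odor + N_blank), with larvae in the middle zone excluded from both terms. The exact equation is in the manuscript's Methods (line 480) and is not reproduced here, and the function names are hypothetical. The sketch shows the reviewer's extreme case and the fact that one larva switching sides shifts the R.I. by 2 / (N_odor + N_blank), so the index becomes coarser as fewer larvae leave the middle zone.

```python
def response_index(n_odor: int, n_blank: int) -> float:
    """Assumed R.I.: (N_odor - N_blank) / (N_odor + N_blank).

    Larvae that stay in the middle zone are simply not passed in, i.e. they
    are excluded from both numerator and denominator.
    """
    n_scored = n_odor + n_blank
    if n_scored == 0:
        raise ValueError("no larvae left the middle zone; R.I. is undefined")
    return (n_odor - n_blank) / n_scored


def step_per_larva(n_odor: int, n_blank: int) -> float:
    """Change in R.I. when a single scored larva switches sides."""
    return 2 / (n_odor + n_blank)


# Reviewer's extreme case: 1 larva on the odor side, 0 on the blank side,
# 29 in the middle zone (ignored), which gives the maximal R.I. of 1.0
print(response_index(1, 0))    # 1.0
# A plate with 30 scored larvae
print(response_index(20, 10))  # 0.333...
print(step_per_larva(20, 10))  # 0.0666...
# With only 10 scored larvae, one larva moves the R.I. by 0.2
print(step_per_larva(6, 4))    # 0.2
```

      Under this assumed form, the R.I. is linear in the proportion of scored larvae on the odor side; what grows as fewer larvae are scored is the granularity of the index (2 / N per larva), which is the quantity the 10-30 scored larvae mentioned above keep small.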

      Weakness #4: Unless experimentally demonstrated, claims that the thermogenetic effector shibire/ts reduces dopamine release from DANs are questionable. This is because firstly, there might be shibire/ts-insensitive ways of dopamine release, and secondly because shibire/ts may affect co-transmitter release from DANs.

      The Shibire<sup>ts1</sup> gene encodes a temperature-sensitive dynamin mutant; expressing it in target neurons blocks neurotransmitter release at temperatures above 30 °C by impairing synaptic vesicle recycling[7]. It is a widely used tool to examine whether a target neuron is involved in a specific physiological function. We cannot rule out the existence of Shibire<sup>ts1</sup>-insensitive routes of dopamine release. However, blocking dopamine release from DAN-c1 with Shibire<sup>ts1</sup> already altered learning responses (Figure 2h). This result indicates that dopamine release from DAN-c1 during training is important for larval aversive learning, which supports our hypothesis.

      Regarding potential co-transmitter release, we agree that this is an important question. Yamazaki et al. recently reported that co-transmitters in the dopaminergic system modulate adult olfactory memories in Drosophila[9], and we cannot rule out roles for co-released neurotransmitters/neuropeptides in larval learning. Ideally, monitoring real-time dopamine release from DAN-c1 in wild-type and TH-knockdown larvae would answer this question. However, live imaging of dopamine release from a single dopaminergic neuron is not practical for us at this time. On the other hand, the roles of dopamine receptors in olfactory associative learning support the importance of dopamine in Drosophila learning. The D1 receptor dDA1 has been shown to be involved in both adult and larval appetitive and aversive learning[10,11]. In our work, D2R in the mushroom body played important roles in both larval appetitive and aversive learning (Figure 6a). All of this evidence points to the importance of dopamine in Drosophila olfactory associative learning. In addition, much remains unknown about the identity of co-released neurotransmitters/neuropeptides and their potentially complex interactions. We believe that investigating co-released neurotransmitters/neuropeptides is beyond the scope of this study.

      Weakness #5: It is not clear whether the genetic controls when using the Gal4/ UAS system are the homozygous, parental strains (XY-Gal4/ XY-Gal4 and UAS-effector/ UAS-effector), or as is standard in the field the heterozygous driver (XY-Gal4/ wildtype) and effector controls (UAS-effector/ wildtype) (in some cases effector controls appear to be missing, e.g. Figure 4d, Figure S4e, Figure S5c).

      Almost all controls we used were homozygous parental strains, which did not show abnormal behaviors in learning, naïve sensory, or locomotion assays. The only exception is the DAN-c1 control: larvae from the homozygous R76F02AD; R55C10DBD strain showed markedly reduced locomotion speed (Figure S6). To prevent this reduced locomotion speed from affecting learning performance, we used heterozygous R76F02AD; R55C10DBD/wildtype larvae as the control, which showed normal learning, naïve sensory, and locomotion abilities (Figure 4e to i).

      Figure 4d is a column graph quantifying the efficiency of D2R knockdown with miR. Because the knockdown effect must be induced and quantified in specific DANs (DM1), only TH-GAL4 can serve as the control group, rather than UAS-D2R-miR. The control groups missing from Figures S4e and S5c are shown in other figures (Figure 4e).

      We described this in the Materials and Methods part, “All control strains used in learning assays were homozygous (except DAN-c1×WT), while all experimental groups (D2R knockdown and thermogenetics) used were heterozygous by crossing the corresponding control strains”.

      We also reorganized Figures S4e and S5c to include the control groups, making them easier to understand.

      Weakness #6: As recently suggested by Yamada et al 2024, bioRxiv, high cAMP can lead to synaptic depression (sic). That would call into question the interpretation of low-Dop2R leading to high-cAMP, leading to high-dopamine release, and thus the authors interpretation of the matching effects of low-Dop2R and driving DANs.

      We appreciate the reviewer’s suggestion. We read this study, which also addresses a question we raised in the Discussion: the discrepancy between cAMP elevation in mushroom body neurons and the reduced MBN-MBON synaptic plasticity after olfactory associative learning in Drosophila. The authors offer an explanation for the existing D1R-cAMP elevation-MBN-MBON LTD axis, which is very helpful for our understanding of the learning mechanism. However, we do not think it explains our D2R-related mechanisms. We have added this reference to our citations.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Throughout the behavioral experiments, a defect in aversive learning is defined as a relative increase in the response index (RI) after olfactory training with quinine (red) and a defect in appetitive learning as a relative decrease in RI after training with sucrose (blue). Training with distilled water (yellow) is intended to be a control for comparisons within genotypes/treatment groups but causes interpretation issues if it is also affected by experimental manipulations.

      The authors typically make comparisons between quinine, water, and sucrose within each group, but this often forces readers to infer the key comparisons of interest. For example, the key comparison in Figure 2h is the statistically significant difference between the red groups, which differ only in the temperature used during training. Many other figure panels in the paper would also benefit from more direct statistical comparisons, particularly Figure 2k.

      While I recognize the value of the water control, I strongly recommend that the authors make statistical comparisons directly between genotypes/treatment groups where possible and to interpret results with more caution when the water RI score differs substantially between groups. Also, since the authors are conducting two-way ANOVAs before Dunnett's multiple comparisons tests, they ideally should report the p-value for the main effect of each factor, plus the interaction p-value between the two factors before making multiple comparisons.

      We appreciate the reviewer’s suggestion. In response, we re-analyzed all learning assay data in Figures 2 and 4 using two-way ANOVA followed by Tukey’s multiple comparisons test. Unlike our previous analysis, which only compared each experimental group to its corresponding DW control, we now compared all groups against one another. First, we found that most R.I. values from different temperature conditions (Figure 2) or genotypes (Figure 4) trained with DW were not significantly different, with the exception of the data in Figure 2i (formerly Figure 2k; discussed further below). The R.I. from DAN-c1 × D2R-miR larvae trained with QUI was significantly different from both genotype control groups (DAN-c1 × WT and UAS-D2R-miR), while no significant difference was observed between the two controls trained with QUI. Thus, this more comprehensive statistical approach supports the conclusions we previously reported. Second, as the reviewer noted, the new analysis allows for a more direct interpretation of our findings. For example, in the thermogenetic experiments using the Shibire<sup>ts1</sup> strain, the R.I. of DAN-c1 × UAS-Shibire<sup>ts1</sup> larvae trained with QUI at 34°C was not significantly different from the DW group at 34°C, but was significantly different from the QUI group at 22°C. Both findings support our conclusion that blocking dopamine release from DAN-c1 impairs larval aversive learning (Figure 2f).

      In the dTRPA1 activation experiments, the R.I. of DAN-c1 × UAS-dTRPA1 larvae trained with DW at 34°C was significantly lower than that of the DW group at 22°C and the QUI group at 34°C, but not significantly different from the QUI group at 22°C (Figure 2i). These results indicate that activating DAN-c1 during training is sufficient to drive aversive learning even in the absence of QUI. Interestingly, when DAN-c1 × UAS-dTRPA1 larvae were trained with QUI at 34°C, their R.I. was significantly higher than that of the DW group at 34°C and significantly different from the QUI group at 22°C, but not significantly different from the DW group at 22°C (Figure 2i). We interpret this as evidence that simultaneous activation of DAN-c1 by both QUI and dTRPA1 leads to over-excitation, which in turn impairs aversive learning.

      We have revised the figures (Figures 2, 4, 5, and 6) and updated the corresponding Results sections to reflect this new statistical analysis. Additionally, we now report the p-values for interaction, row factor, and column factor - either in Table S4 (for Figure 2) or in the figure captions for Figures 4, 5, 6, S4, S5, and S7.
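      For readers less familiar with where the interaction, row factor, and column factor terms come from, the following is a minimal pure-Python sketch of the F statistics of a two-way ANOVA with replication. It assumes a balanced design (the same number of R.I. replicates in every genotype/temperature × training cell), which real datasets may not satisfy, and it is not the authors' analysis pipeline. The function name and toy numbers are hypothetical. P-values would then come from the F distribution (for example, SciPy's `scipy.stats.f.sf(F, df_effect, df_error)`), and the Tukey post hoc comparisons are not shown.

```python
from statistics import fmean


def two_way_anova_balanced(cells):
    """F statistics for a balanced two-way ANOVA with replication.

    cells[i][j] holds the replicate values for row level i (e.g. genotype or
    temperature) and column level j (e.g. QUI, DW, SUC training). Every cell
    must contain the same number of replicates n >= 2.
    """
    a, b = len(cells), len(cells[0])
    n = len(cells[0][0])
    if a < 2 or b < 2:
        raise ValueError("each factor needs at least 2 levels")
    if any(len(row) != b for row in cells) or any(
        len(c) != n for row in cells for c in row
    ):
        raise ValueError("design must be balanced")
    if n < 2:
        raise ValueError("need at least 2 replicates per cell")

    cell_means = [[fmean(c) for c in row] for row in cells]
    row_means = [fmean(row) for row in cell_means]
    col_means = [fmean([cell_means[i][j] for i in range(a)]) for j in range(b)]
    grand = fmean(row_means)  # valid because the design is balanced

    # Sums of squares for each source of variation
    ss_rows = b * n * sum((m - grand) ** 2 for m in row_means)
    ss_cols = a * n * sum((m - grand) ** 2 for m in col_means)
    ss_inter = n * sum(
        (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_error = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )

    df_rows, df_cols = a - 1, b - 1
    df_inter = df_rows * df_cols
    df_error = a * b * (n - 1)
    ms_error = ss_error / df_error

    # Each entry: (F statistic, numerator df, denominator df)
    return {
        "row factor": (ss_rows / df_rows / ms_error, df_rows, df_error),
        "column factor": (ss_cols / df_cols / ms_error, df_cols, df_error),
        "interaction": (ss_inter / df_inter / ms_error, df_inter, df_error),
    }


# Toy numbers (not data from this study): 2 rows x 2 columns, 2 replicates
toy = [
    [[1, 3], [3, 5]],
    [[5, 7], [7, 9]],
]
for effect, (f, df1, df2) in two_way_anova_balanced(toy).items():
    print(f"{effect}: F({df1}, {df2}) = {f:.2f}")
# row factor: F(1, 4) = 16.00
# column factor: F(1, 4) = 4.00
# interaction: F(1, 4) = 0.00
```

      In the toy data, the row and column effects are purely additive, so the interaction F is zero. A nonzero interaction would indicate, for example, that the effect of training condition differs between genotypes or temperatures, which is the pattern the reviewer asked to be reported before the multiple comparisons.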

      (2) The authors' motivation to find tools that label DANs other than DAN-c1 was unclear until much later in the paper when I saw the screening experiments in Figures S4 and S5. The authors could provide a clearer justification for why they focus on DAN-c1 in Figure 2 rather than another DAN for which they found a specific driver in Figure 1. The motivation for looking at individual pPAM neurons was also unclear.

      We sincerely appreciate the reviewer’s thoughtful suggestion. Our study was initially motivated by the goal of characterizing the expression pattern of D2R in the larval brain. From there, we aimed to identify DAN drivers that label specific pairs of dopaminergic neurons, enabling us to assess the functional role of D2R in distinct DAN subtypes through targeted knockdown experiments. This approach ultimately led us to focus on DAN-c1, as it was the only neuronal population for which D2R knockdown resulted in a learning deficit. We then returned to examine the functional significance of DAN-c1 in aversive learning. While we recognize that a more comprehensive narrative might be desirable, the current structure of our manuscript reflects the most logical progression of our work based on our research priorities and experimental outcomes. We did explore alternative manuscript structures, such as beginning with the D2R expression pattern, but found that the current format best conveys our findings and rationale.

      Regarding our motivation to study individual PAM neurons: we aimed to identify whether D2R plays a role in a specific pair of pPAM neurons involved in larval appetitive learning. However, we were unable to find a driver that exclusively labels DAN-j1, which we believe to be the key neuron in this context (see Figure 1). As a result, our investigation into appetitive learning did not progress beyond the observation of D2R expression in pPAM neurons (Figure 3d), and we did not proceed with learning assays in this context. While we acknowledge the limitations of our study, we believe that our focus on DAN-c1 is well-justified based on both our findings and the tools currently available. We respectfully note that a major restructuring of the manuscript would not necessarily clarify the rationale for focusing on DAN-c1, and therefore we have maintained the current organization.

      (3) The authors should also double-check and update the expression patterns of the drivers in Table 1 using references such as the FlyLight online resource. For example, MB438B labels PPL1-α'2α2, PPL1-α3, PPL1-γ1pedc according to FlyLight, not just PPL1-γ1pedc as initially reported by Aso and Hattori et al. (2014).

      We appreciate the reviewer’s suggestion. We have double-checked and updated the driver expression patterns in Table 1, using FlyLight data as a reference.

      (4) Interpreting overlaid green-and-red fluorescence confocal images would be difficult for any colorblind readers; I suggest that the authors consider using a more friendly color set.

      We thank the reviewer for the suggestion. In our study, we need three distinct colors to represent different channels. We tested an alternative color scheme using cyan, magenta, and yellow (CMY) instead of the standard red, green, and blue (RGB). As a comparison (see below), we used a GFP-labeled R76F02AD;R55C10DBD (DAN-c1) brain as an example. In our evaluation, the RGB combination provided clearer visualization and appeared more natural, while the CMY scheme looked somewhat artificial. Therefore, we decided to retain the original RGB color scheme and did not modify the colors in the figures.

      Author response image 1.

      (5) For Figure 4d, counting each DAN as an individual N would violate the assumption of independence made by the unpaired t test, since multiple DANs are found in each brain and therefore are not independent. Instead, it would be better to count each individual N as the average intensity of the four DANs measured in each brain.

      We revised the analysis of microRNA efficiency by averaging the fluorescence intensity of DANs within each brain, treating each brain as a single sample. Based on this approach, we re-plotted Figure 4d.
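      The per-brain aggregation can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the function names, brain IDs, and intensity values are hypothetical, and the test is assumed to be the pooled-variance (Student's) form of the unpaired t test. The key step is collapsing the four DAN intensities of each brain to one mean, so that N counts brains rather than neurons. Only the t statistic is computed; the p-value would come from the t distribution with N1 + N2 - 2 degrees of freedom (SciPy's `scipy.stats.ttest_ind` performs both steps).

```python
from collections import defaultdict
from statistics import fmean


def per_brain_means(measurements):
    """Collapse per-DAN intensities to one value per brain.

    measurements: iterable of (brain_id, intensity) pairs, one per DAN.
    Returns {brain_id: mean intensity}, so each brain contributes N = 1.
    """
    by_brain = defaultdict(list)
    for brain_id, intensity in measurements:
        by_brain[brain_id].append(intensity)
    return {brain: fmean(values) for brain, values in by_brain.items()}


def student_t(sample1, sample2):
    """Unpaired, pooled-variance Student's t statistic."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = fmean(sample1), fmean(sample2)
    ss1 = sum((x - m1) ** 2 for x in sample1)
    ss2 = sum((x - m2) ** 2 for x in sample2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / (pooled_var * (1 / n1 + 1 / n2)) ** 0.5


# Hypothetical intensities: 3 brains per group, 4 DANs measured per brain
control_dans = [("c1", 10), ("c1", 12), ("c1", 11), ("c1", 11),
                ("c2", 13), ("c2", 13), ("c2", 12), ("c2", 14),
                ("c3", 12), ("c3", 12), ("c3", 12), ("c3", 12)]
knockdown_dans = [("k1", 5), ("k1", 7), ("k1", 6), ("k1", 6),
                  ("k2", 8), ("k2", 8), ("k2", 7), ("k2", 9),
                  ("k3", 7), ("k3", 7), ("k3", 7), ("k3", 7)]

control = list(per_brain_means(control_dans).values())      # [11.0, 13.0, 12.0]
knockdown = list(per_brain_means(knockdown_dans).values())  # [6.0, 8.0, 7.0]
print(len(control), len(knockdown))             # 3 3 (N = brains, not DANs)
print(round(student_t(control, knockdown), 3))  # 6.124
```

      Treating all 12 DANs per group as independent would inflate N and the degrees of freedom and understate the variance between brains; averaging first keeps the unit of replication at the brain, which is what the reviewer's independence argument requires.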

      (6) Finally, the authors ought to make it clearer throughout the paper that they have implicated a pair of DAN-c1 neurons in aversive learning, not just a single DAN as currently stated in the title.

      We thank the reviewer for this suggestion regarding terminology. We have changed all instances of “single neuron” to “a pair of neurons”.

      Reviewer #2 (Recommendations for the authors):

      (1) The results section presents: "Activation of DAN-c1 with dTRPA1 at 34°C during training induced repulsion to PA in the distilled water group (Figure 2k). These data suggested that DAN-c1 excitation and presumably increased dopamine release is sufficient for larval aversive learning in the absence of gustatory pairing."

      An alternative interpretation is that 30 min of TrpA activation depletes the synaptic vesicle pool, or inactivates neurons because of prolonged depolarization, or that DANs show firing rate adaptation (e.g. see Pulver et al. 2009; doi:10.1152/jn.00071.2009). In such a case, DA release would be reduced rather than increased. Therefore, the interpretation that DAN-c1 activation is both necessary and sufficient in larval aversive learning is difficult to sustain.

      In this regard, it is important to know how the sensory-motor abilities are affected during a 30-min thermo-induction at 34°C.

      We thank the reviewer for the thoughtful suggestion. Regarding the concern about potential dopamine depletion or neuronal inactivation, we believe a comparison with the Shibire<sup>ts1</sup> experiments helps clarify the interpretation. Activation of Shibire<sup>ts1</sup> during training with distilled water did not result in aversive learning (Figure 2f), which is a distinct phenotype from that observed with dTRPA1 activation (Figure 2i). This suggests that the phenotypes seen with dTRPA1 activation are not due to reduced dopamine release. Additionally, as the reviewer suggested, we have revised our conclusion to state that “DAN-c1 is important for larval aversive learning,” rather than claiming it is both necessary and sufficient.

      (2) The GRASP system can label the contact of a cell in close proximity like synaptic contacts, but also other situations like no synaptic contact. It would be useful to use a more specific synaptic labelling tool, like the trans-synaptic tracing system (Talay et al., 2017 https://doi.org/10.1016/j.neuron.2017.10.011), which provides a better label of synaptic contact.

      We really appreciate the reviewer’s suggestion. First, we acknowledge that there are five general methods to reveal synaptic connections between neurons: immunohistochemistry (IHC), neuron labeling, viral tracing, GRASP, and electron microscopy (EM). Among these, IHC is not sufficiently convincing, viral tracing is challenging and rarely used in Drosophila, and EM, while the most accurate, is prohibitively expensive for our current goals. For these reasons, we chose the GRASP system to demonstrate the synaptic connections from dopaminergic neurons to the mushroom body. Second, we used an activity-dependent version of the GRASP system, in which split-GFP1-10 is linked to synaptic proteins (e.g., synaptobrevin)[12] rather than to cell surface proteins such as CD4 or CD8. This version substantially reduces false-positive signals compared with the earlier version tagged with cell surface proteins. While we admit that this method does not provide evidence of synaptic connections as solid as EM, it is the most efficient method available to us for showing synaptic connections from dopaminergic neurons to the mushroom body. Finally, we thank the reviewer for suggesting the literature on trans-synaptic tracing methods. Unfortunately, this method is not suitable for our goal, as it labels the entire postsynaptic neuron. In our study, we use GRASP to identify specific dopaminergic neurons based on their synaptic locations and compartments within the mushroom body lobe. We require labeling at the subcellular level because, as noted, DAN-c1 forms synapses specifically in the lower peduncle (LP) of the mushroom body lobe, which is part of the axonal bundles of mushroom body neurons. Trans-synaptic tracing would label the entire mushroom body, making it impossible to distinguish DAN-c1 from other DL1 dopaminergic neurons.

      (3) Previously, Honjo et al (2009) used a petri dish of 8.5 cm and a filter paper for reinforcement of 5.5 cm. In this study the petri dish was 10 cm and the size of the filter paper was not informed. That is important information because it will determine the probability of conditioning.

      A square piece of filter paper (0.25 cm<sup>2</sup>) was used to hold odorants in this study. We have added this information to the Materials and Methods.

      (4) Statistic analysis of Behavioral performance of Fig 2H-I was made by ANOVA followed by Dunnett multiple comparisons test. Which was the control group? In each graph 2 independent Dunnett tests were performed against the DW control group?

      We have re-analyzed the data using a two-way ANOVA followed by Tukey’s multiple comparison test, as suggested by Reviewer #1. In Figure 2f-j (previously Figure 2h-l), the DW groups serve as the control groups. In our new analysis, we compared data across all groups using Tukey’s multiple comparison test, with particular focus on comparisons to the corresponding DW control groups.

      (5) The sample size in staining experiments of figures 1-4 were not informed.

      We have added Table S2 in the supplementary materials to provide the N numbers for brain samples used in the figures.

      (6) Color code in Fig 5 is missing, I assumed that is the same as in figure 4e

      We added color code in the figure legend of Figure 5.

      (7) Line 506 "0.1% QH solutions" should be 0.1% QUI solutions

      Changed.

      (8) There is no information on the availability of data

      We added a Data Availability Statement: Data will be made available on request.

      Reviewer #3 (Recommendations for the authors):

      (1) Axes of behavioural experiments should better show the full span of possible values (-1;1) to allow a fair assessment.

      We have adjusted the axes in all learning assay graphs to a range from -1 to 1 for consistency and clarity.

      (2) Ns should better be given within the figures.

      We have added Table S2 in the supplementary materials to provide the N numbers for brain samples used in the figures. Additionally, Tables S4 to S6 include the N numbers for the learning assays. While we initially considered including the N numbers within the figure captions, we found it challenging to present this information clearly and efficiently. Therefore, we decided to summarize the N numbers in the tables instead.

      (3) Dot- or box-plots would be better for visualizing the data than means and SEMs.

      We agree with the reviewer’s suggestion. In the behavioral assay graphs, both dot plots and mean ± SEM have been included for better visualization of the data.

      (4) The paper reads as if Dop2R would reduce neuronal activity, rather than "just" cAMP levels. Such a misunderstanding should be avoided.

      We appreciate the reviewer’s comment. Under most conditions, dopamine binding to D2Rs activates the Gαi/o pathway, which inhibits adenylyl cyclase (AC) and reduces cAMP levels. This reduction in cAMP ultimately leads to decreased neuronal activity. In other words, D2R activation typically has an inhibitory effect on neurons. Additionally, D2R can exert inhibitory effects through other signaling pathways, such as the inhibition of voltage-gated calcium channels via Gβγ subunits[5]. As noted in the Introduction, dopamine receptors are also involved in other signaling cascades, including the PKC, MAPK, and CaMKII pathways. In the context of our study, based on the current understanding of molecular signaling in Drosophila olfactory associative learning, we continue to consider the D2R-mediated AC-cAMP-PKA signaling pathway the most important one. However, we cannot rule out the involvement of other signaling pathways.

      (5) It would be better if citations were more clearly separated into ones that refer to adult flies versus work on larvae.

      We separated the citations related to adult flies from those working on larvae.

      (6) Line 81-83. DopECR is not found in mammals, is it?

      You are correct. DopECR is not found in mammals. This non-canonical receptor shares structural homology with vertebrate β-adrenergic-like receptors. It can be activated rapidly by dopamine as well as insect ecdysteroids[13,14].

      (7) Line 99: Better "a" learning center (some forms of learning work without mushroom bodies).

      We have revised the text from "the learning center" to "a learning center," as suggested by the reviewer.

      (8) Supplemental figures should be numbered according to the sequence in which they are mentioned in the text.

      We have rearranged the sequence of supplemental figures to match the order in which they are referenced in the text.

      (9) It is striking that dTRPA1-driving DANc1 is punishing in the water condition but that this effect does not summate with quinine punishment (but rather seems to impair it). Maybe you can back this up by ChR- or Chrimson-driving DANc1? Or by silencing DANc1 by GtACR1?

      We appreciate the reviewer’s suggestion. Indeed, we observed similar but not identical results when we used ChR2 to activate DAN-c1 during the training stage (Figure 5b and c). We found that activating DAN-c1 with quinine (QUI) impaired aversive learning (Figure 5b), consistent with our findings using dTRPA1 activation of DAN-c1 when trained in QUI at 34°C (Figure 2i). We propose that the over-excitation of DAN-c1, whether induced by QUI or artificial manipulation (optogenetics and thermogenetics), impairs aversive learning, which aligns with our findings for D2R knockdown (Figure 4e). However, there are some differences between dTRPA1 and ChR2 activation. While dTRPA1 activation induced aversive learning when trained with distilled water (DW) at 34°C (Figure 2i), ChR2 did not induce aversive learning under the same conditions (Figure 5c). We believe this difference is due to the varying activation levels between the two manipulations. Our optogenetic stimulus may have been stronger than the thermogenetic one, potentially leading to over-excitation in the DW group, preventing aversive learning. In the QUI group, the more severe over-excitation impaired aversive learning, producing a phenotype similar to that observed with other over-excitation methods (e.g., thermogenetics or D2R knockdown), where the phenotype reached a maximum level. We have also addressed these points in the Discussion section.

      (10) Unless I got the experimental procedure wrong, isn't it surprising that Figure S7b does not uncover a punishing effect of driving TH-Gal4 neurons?

      This optogenetic experiment with ChR2 expressed in TH-GAL4 neurons was our initial, pilot attempt to activate DAN-c1 with ChR2. As explained in our response to question (9), the absence of a punishing effect in the DW group when TH-GAL4 neurons were activated during training may be due to our optogenetic stimulus being too strong. This likely over-excited DAN-c1 (one of the neurons labeled by TH-GAL4), impairing aversive learning and preventing typical aversive behaviors from appearing.

      (11) It seems that Figure1f´ is repeated, in a mirrored manner, in Figure 2e.

      We have removed Figure 2e, as it was deemed redundant and not necessary for this section.

      Reference

      (1) Saumweber, T. et al. Functional architecture of reward learning in mushroom body extrinsic neurons of larval Drosophila. Nat Commun 9, 1104 (2018). https://doi.org/10.1038/s41467-018-03130-1

      (2) Aso, Y. & Rubin, G. M. Dopaminergic neurons write and update memories with cell-type-specific rules. Elife 5 (2016). https://doi.org/10.7554/eLife.16135

      (3) Xie, T. et al. A Genetic Toolkit for Dissecting Dopamine Circuit Function in Drosophila. Cell Rep 23, 652-665 (2018). https://doi.org/10.1016/j.celrep.2018.03.068

      (4) Eschbach, C. et al. Recurrent architecture for adaptive regulation of learning in the insect brain. Nat Neurosci 23, 544-555 (2020). https://doi.org/10.1038/s41593-020-0607-9

      (5) Neve, K. A., Seamans, J. K. & Trantham-Davidson, H. Dopamine receptor signaling. J Recept Signal Transduct Res 24, 165-205 (2004). https://doi.org/10.1081/rrs-200029981

      (6) Draper, I., Kurshan, P. T., McBride, E., Jackson, F. R. & Kopin, A. S. Locomotor activity is regulated by D2-like receptors in Drosophila: an anatomic and functional analysis. Dev Neurobiol 67, 378-393 (2007). https://doi.org/10.1002/dneu.20355

      (7) Honjo, K. & Furukubo-Tokunaga, K. Induction of cAMP response element-binding protein-dependent medium-term memory by appetitive gustatory reinforcement in Drosophila larvae. J Neurosci 25, 7905-7913 (2005). https://doi.org/10.1523/JNEUROSCI.2135-05.2005

      (8) Honjo, K. & Furukubo-Tokunaga, K. Distinctive neuronal networks and biochemical pathways for appetitive and aversive memory in Drosophila larvae. J Neurosci 29, 852-862 (2009). https://doi.org/10.1523/JNEUROSCI.1315-08.2009

      (9) Yamazaki, D., Maeyama, Y. & Tabata, T. Combinatory Actions of Co-transmitters in Dopaminergic Systems Modulate Drosophila Olfactory Memories. J Neurosci 43, 8294-8305 (2023). https://doi.org/10.1523/jneurosci.2152-22.2023

      (10) Selcho, M., Pauls, D., Han, K. A., Stocker, R. F. & Thum, A. S. The role of dopamine in Drosophila larval classical olfactory conditioning. PLoS One 4, e5897 (2009). https://doi.org/10.1371/journal.pone.0005897

      (11) Kim, Y. C., Lee, H. G. & Han, K. A. D1 dopamine receptor dDA1 is required in the mushroom body neurons for aversive and appetitive learning in Drosophila. J Neurosci 27, 7640-7647 (2007). https://doi.org/10.1523/JNEUROSCI.1167-07.2007

      (12) Macpherson, L. J. et al. Dynamic labelling of neural connections in multiple colours by trans-synaptic fluorescence complementation. Nat Commun 6, 10024 (2015). https://doi.org/10.1038/ncomms10024

      (13) Abrieux, A., Duportets, L., Debernard, S., Gadenne, C. & Anton, S. The GPCR membrane receptor, DopEcR, mediates the actions of both dopamine and ecdysone to control sex pheromone perception in an insect. Front Behav Neurosci 8, 312 (2014). https://doi.org/10.3389/fnbeh.2014.00312

      (14) Lark, A., Kitamoto, T. & Martin, J. R. Modulation of neuronal activity in the Drosophila mushroom body by DopEcR, a unique dual receptor for ecdysone and dopamine. Biochim Biophys Acta Mol Cell Res 1864, 1578-1588 (2017). https://doi.org/10.1016/j.bbamcr.2017.05.015

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like first to thank the Editor as well as the two reviewers for their enthusiasm and careful evaluation of our manuscript. We also appreciate their thoughtful and constructive comments and suggestions. They did, however, raise concerns regarding experimental design, data analysis, and over-interpretation of our findings. We endeavored to address these concerns by refining our framing, including additional analyses, and rewriting parts of our discussion section. We hope our response better explains the rationale of our experimental design and data interpretation. In addition, we acknowledge the limitations of our present study so that they can benefit future investigations into this topic. Our detailed responses are provided below.

      Reviewer #1 (Public Review)

      This study examines whether the human brain uses a hexagonal grid-like representation to navigate in a non-spatial space constructed by competence and trustworthiness. To test this, the authors asked human participants to learn the levels of competence and trustworthiness for six faces by associating them with specific lengths of bar graphs that indicate their levels in each trait. After learning, participants were asked to extrapolate the location from the partially observed morphing bar graphs. Using fMRI, the authors identified brain areas where activity is modulated by the angles of morphing trajectories in six-fold symmetry. The strength of this paper lies in the question it attempts to address. Specifically, the question of whether and how the human brain uses grid-like representations not only for spatial navigation but also for navigating abstract concepts, such as social space, and guiding everyday decision-making. This question is of emerging importance.

      Thanks very much again for the evaluation and comments. Please find our revision plans for each comment below.

      The weak points of this paper are that its findings do not sufficiently support its arguments, for several reasons:

      (1) Does the grid-like activity reflect 'navigation over the social space' or 'navigation in sensory feature space'? The grid-like representation in this study could simply reflect the transition between stimuli (the length of bar graphs). Participants in this study associated each face with a specific length of two bars, and the 'navigation' was guided only by the morphing of a bar graph image. Moreover, no social cognition was required to perform the task in which the grid-like activity was estimated. For the social decision-making, which was conducted separately, we do not know whether participants needed to navigate between faces in a social space. Instead, they could recall the bar graphs associated with faces and compute the decision values by comparing the length of bars. Notably, in the trust game in this study, competence and trustworthiness are not equally important for making a decision (Equation 1). The expected value is more sensitive to one than to the other. This also suggests that the space might reflect perceptual differences rather than social values.

      The Reviewer raises an interesting point. We apologize for not addressing this possibility clearly enough in our original manuscript, and we will improve the clarity in our revision. To address this issue, we would like to break it into two sub-questions and answer them separately: 1) Are participants merely memorizing the values associated with each avatar, or do they place the avatars on a two-dimensional map in their internal representation? 2) If so, are the two dimensions of this internal representation social dimensions relating to competence and trust, or sensory dimensions relating to bar height (i.e., social space or sensory space)?

      For the first question, we hope our analysis of the distance effect on reaction time in the comparison task addresses this issue. The analysis rests on the idea that distance is a measure of similarity between two avatars in the 2D social space. The closer two avatars are, the more similar they are; hence distinguishing them will be harder and will result in longer reaction times. If participants were merely memorizing the avatars as six isolated instances without integrating them into a low-dimensional map, then the avatars should be equidistant (as if they were lying on the vertices of a 5-simplex) and would not show a distance effect. Therefore, we interpreted a stronger distance effect as a behavioural index of a better internal map-like representation. This approach is adopted from the work by Park et al. (2020), who used the distance effect to demonstrate that human brains map abstract relationships among entities from piecemeal learning.
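
      The distance-effect logic in the paragraph above can be illustrated with a short Python sketch. The avatar coordinates and the simulated reaction times are hypothetical and do not come from the study; the sketch only shows one way to compute a distance effect as a regression slope across avatar pairs, and why six items stored as isolated, mutually equidistant points (the 5-simplex case) leave no variation in distance that could explain reaction times.

```python
import itertools

import numpy as np

# Hypothetical (competence, trustworthiness) coordinates for six avatars;
# illustrative values only, not the study's stimuli.
AVATARS = np.array([
    [1.0, 4.0], [2.0, 2.0], [3.0, 5.0],
    [4.0, 1.0], [5.0, 3.0], [6.0, 6.0],
])


def pairwise_distances(coords):
    """Return every unordered pair of item indices and its Euclidean distance."""
    pairs = list(itertools.combinations(range(len(coords)), 2))
    dists = np.array([np.linalg.norm(coords[i] - coords[j]) for i, j in pairs])
    return pairs, dists


def distance_effect(dists, rts):
    """OLS slope of reaction time on pairwise distance.

    A negative slope means closer (more similar) pairs take longer to compare.
    """
    slope, _intercept = np.polyfit(dists, rts, deg=1)
    return slope


pairs, dists = pairwise_distances(AVATARS)

# Simulated comparison RTs in seconds: slower for closer pairs, plus noise.
rng = np.random.default_rng(0)
rts = 1.2 - 0.08 * dists + rng.normal(0.0, 0.02, size=dists.shape)
print(f"{len(pairs)} pairs, distance effect = {distance_effect(dists, rts):.3f} s per unit")

# Six isolated items: the rows of a 6x6 identity matrix are mutually
# equidistant (vertices of a regular 5-simplex), so distance cannot vary.
_, simplex_dists = pairwise_distances(np.eye(6))
print("distinct simplex distances:", np.unique(np.round(simplex_dists, 6)))
```

      In this sketch a negative slope is the signature described above: pairs that lie close together on the 2D map take longer to compare, whereas in the simplex case every pair has the same distance and the slope cannot be estimated at all.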

      For the second question of ‘social space’ vs. ‘sensory space’, our study adopted the paradigm developed by Constantinescu et al. (2016), who used a similar way to construct a conceptual space and found that such a space can be represented with a grid-like code in the entorhinal and prefrontal cortex. We stayed close to the original design by Constantinescu et al. (2016) in the hope that our work could provide, to some extent, a close replication of their result using non-spatial social concepts instead. Indeed, this led to a limitation of our study: participants passively traverse the artificial space rather than actively navigating in it to make decisions or inferences. We also did not find evidence as strong as that reported in previous grid-like coding fMRI studies. This may be due to low signal quality in the medial temporal region, although we are not entirely sure. Nevertheless, we don’t think our findings contradict or disprove previous findings in any way. Here we would also like to point to the work by Park et al. (2021). Their task involved making novel inferences in a 2D social hierarchy space, and they found that a grid-like code in the entorhinal cortex and medial prefrontal cortex supports such novel inferences. Hence, we argue that results from these studies and partial evidence from our study collectively support the idea that the entorhinal cortex is important for representing abstract knowledge (spatial and non-spatial).

      (2) Does the brain have a common representation of faces in a social space? In this study, participants don't need to have a map-like representation of six faces according to their levels of social traits. Instead, they can remember the values of each trait. The evidence of neural representations of the faces in a 2-dimensional social space is lacking. The authors argued that the relationship between the reaction times and the distances between faces provides evidence of the formation of internal representations. However, this can be found without the internal representation of the relationships between faces. If the authors seek internal representations of the faces in the brain, it would be important to show that this representation is not simply driven by perceptual differences between bar graphs that participants may recall in association with each face.

      Considering these caveats, it is hard for me to agree that the authors provide evidence to support their claims.

      With regard to the common representation of faces, this is a potential limitation of our paradigm, because our current task design did not include a face-presentation stage that would properly test this question. With regard to the asymmetry between the two dimensions in determining expected value, we think that the prerequisite for identifying six-fold grid-like coding is an abstract space formed by orthogonal dimensions, i.e., competence and trustworthiness in our task are not correlated. In addition, the scanner task does not require computation of expected value. However, we do think it is worth investigating whether the extent to which each dimension contributes to decision-making and inference distorts the grid-like representation of the map. Our prediction is that the entorhinal cortex maintains a representation of the map that is invariant to this aspect, so that it can support inferences in different contexts where different weights may be assigned to different dimensions. This will be an interesting hypothesis for future studies to test. We hope that our revision plans, together with the above considerations, address the Reviewer’s comments.

      Reviewer #2 (Public Review)

      Summary:

      In this work, Liang et al. investigate whether an abstract social space is neurally represented by a grid-like code. They trained participants to 'navigate' around a two-dimensional space of social agents characterized by the traits of warmth and competence, then measured neural activity as participants imagined navigating through this space. The primary neural analysis consisted of three procedures: 1) identifying brain regions exhibiting the hexagonal modulation characteristic of a grid-like code, 2) estimating the orientation of each region's grid, and 3) testing whether the strength of the univariate neural signal increases when a participant is navigating in a direction aligned with the grid, compared to a direction that is misaligned with the grid.

      From these analyses, the authors find the clearest evidence of a grid-like code in the prefrontal cortex and weaker evidence in the entorhinal cortex.
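
      To make the three procedures summarized above concrete, here is a minimal Python sketch of the kind of quadrature regression commonly used in grid-like coding fMRI analyses. It is not the authors' pipeline: the simulated signal, the 12° orientation, the amplitude and noise levels, the even split into estimation and test halves, and the ±15° "aligned" bins are all illustrative assumptions, and the voxel-wise and group-level statistics a real analysis would apply are omitted.

```python
import numpy as np


def hexagonal_regressors(theta):
    """Design matrix [cos(6*theta), sin(6*theta), 1] for angles in radians."""
    return np.column_stack([np.cos(6 * theta), np.sin(6 * theta), np.ones_like(theta)])


def estimate_grid_orientation(theta, signal):
    """Fit the quadrature model; return (orientation phi in radians, amplitude).

    Since A*cos(6*(theta - phi)) = A*cos(6*phi)*cos(6*theta) + A*sin(6*phi)*sin(6*theta),
    phi = atan2(b_sin, b_cos) / 6 (within (-30°, 30°]) and A = hypot(b_cos, b_sin).
    """
    betas, *_ = np.linalg.lstsq(hexagonal_regressors(theta), signal, rcond=None)
    b_cos, b_sin = betas[0], betas[1]
    return np.arctan2(b_sin, b_cos) / 6.0, np.hypot(b_cos, b_sin)


def is_aligned(theta, phi):
    """True where theta lies within ±15° of phi + k*60° (the aligned 30° bins)."""
    offset = np.mod(theta - phi + np.pi / 12, np.pi / 3)  # relative angle wrapped to [0°, 60°)
    return offset < np.pi / 6


# Simulated data: hypothetical orientation, trajectory directions, amplitude, noise.
rng = np.random.default_rng(1)
true_phi = np.deg2rad(12.0)
theta = rng.uniform(0.0, 2 * np.pi, 400)
signal = 0.5 * np.cos(6 * (theta - true_phi)) + rng.normal(0.0, 0.3, theta.size)

# Procedures 1-2: hexagonal modulation strength and orientation from one half of the trials.
phi_hat, amp_hat = estimate_grid_orientation(theta[:200], signal[:200])

# Procedure 3: on the held-out half, compare aligned vs misaligned trajectories.
test_theta, test_signal = theta[200:], signal[200:]
aligned = is_aligned(test_theta, phi_hat)
contrast = test_signal[aligned].mean() - test_signal[~aligned].mean()
print(f"orientation = {np.rad2deg(phi_hat):.1f} deg, amplitude = {amp_hat:.2f}, "
      f"aligned - misaligned = {contrast:.3f}")
```

      Estimating the orientation and testing the aligned/misaligned contrast on separate halves of the data keeps the third procedure independent of the second; without that split, the contrast would be biased toward showing alignment.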

      Strengths:

      The work demonstrates the existence of a grid-like neural code for a socially-relevant task, providing evidence that such coding schemes may be relevant for a variety of two-dimensional task spaces.

      Thank you very much again for your careful evaluation and thoughtful comments. Please find our response to the comments below.

      Weaknesses:

      In various parts of this manuscript, the authors appear to use a variety of terms to refer to the (ostensibly) same neural regions: prefrontal cortex, frontal pole, ventromedial prefrontal cortex (vmPFC), and orbitofrontal cortex (OFC). It would be useful for the authors to use more consistent terminology to avoid confusing readers.

      Thanks for pointing out the inconsistent use of terms; we will try to improve this in the revision of our manuscript.

      Claims about a grid code in the entorhinal cortex are not well-supported by the analyses presented. The whole-brain analysis does not suggest that the entorhinal cortex exhibits hexagonal modulation; the strength of the entorhinal BOLD signal does not track the putative alignment of the grid code there; multivariate analyses do not reveal any evidence of a grid-like representational geometry.

      On a conceptual level, it is not entirely clear how this work advances our understanding of grid-like encoding of two-dimensional abstract spaces, or of social cognition. The study design borrows heavily from Constantinescu et al. 2016, which is itself not an inherent weakness, but the Constantinescu et al. study already suggests that grid codes are likely to underlie two-dimensional spaces, no matter how abstract or arbitrary. If there were a hypothesis that there is something unique about how grid codes operate in the social domain, that would help motivate the search for social grid codes specifically, but no such theory is provided. The authors do note that warmth and competence likely have ecological importance as social traits, but other past studies have used slightly different social dimensions without any apparent loss of generality (e.g., Park et al. 2021). There are some (seemingly) exploratory analyses examining how individual difference measures like social anxiety and avoidance might affect the brain and behavior in this study, but a strong theoretical basis for examining these particular measures is lacking.

      We acknowledge that we used very similar dimensions to those in the work by Park et al. (2021). While Park and colleagues (2021) took a more innovative and rigorous approach, we tried to stay close to the original design by Constantinescu et al. (2016) in the hope that our work could provide, to some extent, a close replication of their result. Our data were collected before the 2021 paper came out, and, as the comment points out, we did not find evidence as complete and convincing as in these previous grid-like coding fMRI papers. This may be due to low signal quality in the medial temporal region, although we are not entirely sure. But we don’t think our current findings contradict or disprove previous findings in any way.

      I found it difficult to understand the analyses examining whether behavior (i.e., reaction times) and individual difference measures (i.e., social anxiety and avoidance) can be predicted by the hexagonal modulation strength in some region X, conditional on region X having a similar estimated grid alignment with some other region Y. It is possible that I have misunderstood the authors' logic and/or methodology, but I do not feel comfortable commenting on the correctness or implications of this approach given the information provided in the current version of this manuscript.

      We apologize for not being clear enough in the manuscript, and we will improve the clarity in our revision. This exploratory analysis aims to examine whether the strength of the grid-like representation of the social value map correlates with behavioral indicators of a map-like representation, and whether it correlates with participants’ social traits. For the behavioral indicator, we used the distance effect in the reaction time of the comparison task outside the scanner. The closer a pair of avatars are, the more similar they are; hence distinguishing them will be harder and will result in longer reaction times when making comparison judgements. If participants were merely memorizing the avatars as six isolated instances without integrating them into a map, all avatars should be equidistant and there would be no distance effect. We interpreted stronger grid-like activity as a neural index of a better representation of the 2D social space, and a stronger distance effect as a behavioral index of a better internal map-like representation.

      It was puzzling to see passing references to multivariate analyses using representational similarity analysis (RSA) in the main text, given that RSA is only used in analyses presented in the supplementary material.

      We speculated that RSA in the entorhinal ROI might be more sensitive than the whole-brain univariate analysis for identifying a grid-like code, because a previous paper on grid-like coding in olfactory space (Bao et al., 2019) did not identify a grid-like representation with univariate analysis but did identify it with RSA. However, we failed to find evidence of a grid-like code in the entorhinal ROI aligned to its own putative grid orientation with the RSA approach. We reported this result in the main text to show that we carried out a relatively thorough investigation, testing the hypothesis with various approaches, and we decided to add references to the RSA approach in the main text as well.

      Reviewer #3 (Public Review)

      Liang and colleagues set out to test whether the human brain uses distance and grid-like codes in social knowledge using a design where participants had to navigate in a two-dimensional social space based on competence and warmth during an fMRI scan. They showed that participants were able to navigate the social space and found distance-based codes as well as grid-like codes in various brain regions, and the grid-like code correlated with behavior (reaction times).

      On the whole, the experiment is designed appropriately for testing for distance-based and grid-like codes and is relatively well-powered for this type of study, with a large amount of behavioral training per participant. They revealed that a number of brain regions correlated positively or negatively with distance in the social space, and found grid-like codes in the frontal polar cortex and posterior medial entorhinal cortex, the latter in line with prior findings on grid-like activity in the entorhinal cortex. The current paper seems quite similar conceptually and in design to previous work, most notably by Park et al., 2021, Nature Neuroscience.

      Thanks very much again for your careful evaluation and comments. Please find our response to the comments below.

      Below, I raise a few issues and questions on the evidence presented here for a grid-like code as the basis of navigating abstract social space or social knowledge.

      (1) The authors claim that this study provides evidence that humans use a spatial / grid code for abstract knowledge like social knowledge.

      Specifically, these data do not add anything new to this argument. As with almost all studies that test for a grid code in a similar "conceptual" space (not only the current study), the problem is that when the space is not a uniform, square or circular, two-dimensional space, there is no reason the code will be perfectly grid-like, i.e., show six-fold symmetry. In real-world scenarios of social space (as well as navigation and semantic concepts), the space must be higher-dimensional, or at least more than two-dimensional. It is unclear whether this generalizes to larger spaces where not all parts of the space are relevant. Modelling work from Tim Behrens' lab (e.g., Whittington et al., 2020) and Bradley Love's lab (e.g., Mok & Love, 2019) has shown or argued this to be the case. In experimental work, as in mazes from the Mosers' labs (e.g., Derdikman et al., 2009) or trapezoid environments from the O'Keefe lab (Krupic et al., 2015), there are distortions in mEC cells, which would not pass as grid cells in terms of the six-fold symmetry criterion.

      The authors briefly discuss the limitations of this at the very end but do not really say how this speaks to the goal of their study and the claim that social space or knowledge is organized as a grid code and if it is in fact used in the brain in their study and beyond. This issue deserves to be discussed in more depth, possibly referring to prior work that addressed this, and raising the issue for future work to address the problem - or if the authors think it is a problem at all.

      Thanks very much for the references to papers that we had not considered sufficiently in our discussion. We will endeavour to discuss the topic in more depth in our revision. In summary, we raised this discussion point because various research groups have found grid-like representations in 2D artificial conceptual spaces. We think that the next step toward a stronger claim would be to find representations of more spontaneous non-spatial maps.

      Data and analysis

      (2) Concerning the negative correlation of distance with activation in the fusiform gyrus and visual cortex: this is a slightly puzzling but potentially interesting finding. However, could this be related to reaction times? The larger the distance, the longer the reaction times, so the original finding might reflect larger activations with smaller distances.

      Thanks very much for the suggestion. However, we did not find a correlation between response time in the choice stage of the scanner task and the negative distance activation in the fusiform gyrus (Figures below). Meanwhile, as the morph period remains the same in each trial, the negative correlation of distance with activation in the fusiform gyrus could also be interpreted as a positive correlation of morphing speed with activation in the fusiform gyrus. Indeed, stronger negative activation indicates larger activation for smaller distances, but we are uncertain what this indicates about the functional role of the fusiform gyrus in our current task.

      Author response image 1.

      (3) Concerning the correlation of grid-like activity with behavior: is the correlation with reaction time just about how long people took (rather than a task-related neural signal)? The authors have only reported correlations with reaction time. The issue here is that the duration of reaction times also relates to the starting positions of each trial and where participants will navigate to. Considering the speed-accuracy tradeoff, could performance accuracy be negatively correlated with these grid consistency metrics? Or it could be positively correlated, which would suggest the grid signal reflects a good representation of the task.

      We apologize for not being clear enough in the manuscript, and we will improve the clarity in our revision. The reaction time used to calculate the distance effect comes from a task outside the scanner. The closer a pair of avatars are, the more similar they are; hence distinguishing them will be harder and will result in longer reaction times when making comparison judgements. If participants were merely memorizing the avatars as six isolated instances without integrating them into a map, all avatars should be equidistant and there would be no distance effect. We interpreted stronger grid-like activity as a neural index of a better representation of the 2D social space, and a stronger distance effect as a behavioural index of a better internal map-like representation. This was the motivation behind this analysis.

      References

      Bao, X., Gjorgieva, E., Shanahan, L. K., Howard, J. D., Kahnt, T., & Gottfried, J. A. (2019). Grid-like Neural Representations Support Olfactory Navigation of a Two-Dimensional Odor Space. Neuron, 102(5), 1066-1075 e1065. https://doi.org/10.1016/j.neuron.2019.03.034

      Constantinescu, A. O., O'Reilly, J. X., & Behrens, T. E. J. (2016). Organizing conceptual knowledge in humans with a gridlike code. Science, 352(6292), 1464-1468. https://doi.org/10.1126/science.aaf0941

      Park, S. A., Miller, D. S., & Boorman, E. D. (2021). Inferences on a multidimensional social hierarchy use a grid-like code. Nat Neurosci, 24(9), 1292-1301. https://doi.org/10.1038/s41593-021-00916-3

      Park, S. A., Miller, D. S., Nili, H., Ranganath, C., & Boorman, E. D. (2020). Map Making: Constructing, Combining, and Inferring on Abstract Cognitive Maps. Neuron, 107(6), 1226-1238 e1228. https://doi.org/10.1016/j.neuron.2020.06.030

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors present the cryo-EM structure of the PSI-fucoxanthin chlorophyll a/c-binding proteins (FCPs) supercomplex from the diatom Thalassiosira pseudonana CCMP1335 at a global resolution of 2.3 Å. This exceptional resolution allows the authors to construct a near-atomic model of the entire supercomplex and elucidate the molecular details of the FCP arrangement. The high-resolution structure reveals subunits not previously identified in earlier reconstructions and models, and the authors complement it with sequence analysis of PSI-FCPIs from other diatoms and red algae. Additionally, the authors use their model in conjunction with a phylogenetic analysis to compare and contrast the structural features of the T. pseudonana supercomplex with those of Chaetoceros gracilis, uncovering key structural features that contribute to the efficiency of light energy conversion in diatoms.

      The study employs the advanced technique of single particle cryo-electron microscopy to visualize the complex architecture of the PSI supercomplex at near-atomic resolution and analyze the specific roles of FCPs in enhancing photosynthetic performance in diatoms.

      Overall, the approach and data are both compelling and of high quality. The paper is well written and will be of wide interest for comprehending the molecular mechanisms of photosynthesis in diatoms. This work provides valuable insights for applications in bioenergy, environmental conservation, plant physiology, and membrane protein structural biology.

      We thank you very much for your highly positive evaluation and comments on our manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript elucidated the cryo-electron microscopic structure of a PSI supercomplex incorporating fucoxanthin chlorophyll a/c-binding proteins (FCPs), designated as PSI-FCPI, isolated from the diatom Thalassiosira pseudonana CCMP1335. Combining structural, sequence, and phylogenetic analyses, the authors provided solid evidence to reveal the evolutionary conservation of protein motifs crucial for the selective binding of individual FCPI subunits and provided valuable information about the molecular mechanisms governing the assembly and selective binding of FCPIs in diatoms.

      Strengths:

      The manuscript is well-written and presented clearly as well as consistently. The supplemental figures are also of high quality.

      Weaknesses:

      Only minor comments (provided in recommendations for authors) to help improve the manuscript.

      We thank you very much for your highly positive evaluation and comments on our manuscript.

      Reviewer #3 (Public Review):

      Summary:

      Understanding the structure and function of the photosynthetic machinery is crucial for grasping its mode of action. Photosystem I (PSI) plays a vital role in light-driven electron transfer, which is essential for generating cellular reducing power. A primary strategy to mitigate light and environmental stresses involves incorporating peripheral light-harvesting proteins. Among various lineages, the number of LHCIs and their protein and pigment compositions differ significantly in PSI-LHCI structures. However, it is still unclear how LHCIs recognize their specific binding sites in the PSI core. This study aims to address this question by obtaining a high-resolution structure of the PSI supercomplex, including fucoxanthin chlorophyll a/c-binding proteins (FCPs), referred to as PSI-FCPI, isolated from the diatom Thalassiosira pseudonana. Through structural and sequence analyses, distinct protein-protein interactions are identified at the interfaces between FCPI and PSI subunits, as well as among FCPI subunits themselves.

      Strengths:

      The primary strength of this work lies in its superb isolation and structural determination, followed by clear discussion and conclusions. However, the interactions among the protein complexes and their relevance in formulating general rules are not definitively established. While efficiency is a crucial aspect, preventing damage is equally important, and currently, we cannot infer this from the provided structures.

      Weaknesses:

      The interactions among the protein complexes and their relevance in formulating general rules are not definitively established. While efficiency is a crucial aspect, preventing damage is equally important, and currently, we cannot infer this from the provided structures.

      We thank you very much for your highly positive evaluation and comments on our manuscript. This study aimed to decipher the interactions among different protein subunits within the PSI-FCPI supercomplex, from which we wish to draw their relevance in formulating general rules. While we agree that preventing damage is equally important, it is unclear to us what kind of damage you are referring to, and we consider that this may need to be treated in another publication, as we cannot elucidate everything in one paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Line 69: "Diatoms are one of the most important phytoplankton in aquatic environments and contribute to the primary production in the ocean remarkably." Check the sentence, something is missing.

      We modified the sentence as follows:

      "Diatoms are among the most essential phytoplankton in aquatic environments, playing a crucial role in the global carbon cycle, supporting marine food webs, and contributing significantly to nutrient cycling, thus ensuring the health and sustainability of marine ecosystems"

      (2) Supplementary Figure 1B: The SDS-PAGE gel shows multiple bands. Do the authors know the identity of these proteins, or have they considered analyzing the bands using mass spectrometry? The band at ~17 kDa is particularly intense. Could you comment on this? Have you tried running a Native-PAGE gel?

      We did not identify protein bands by MS analysis. The protein bands in the PSI-FCPI supercomplex of this diatom have been identified by Ikeda et al. 2013. The protein bands of our sample were similar to those of Ikeda et al. 2013. To explain this, we modified the sentences and cited Ikeda et al. 2013 in the revised manuscript (lines 89-91).

      "The PSI-FCPI supercomplexes were purified from the diatom T. pseudonana CCMP1335 and analyzed by biochemical and spectroscopic techniques (Fig. S1). Notably, the protein bands of PSI-FCPI closely resembled those reported in a previous study (31)."

      The ~17 kDa protein band appears to correspond to FCPIs, as identified in Ikeda et al. 2013. We did not perform BN-PAGE on this sample; however, we performed trehalose density gradient centrifugation (Fig. S1A).

      (3) Can the authors comment on the position of the FCPI subunits in the PSI supercomplex in diatoms compared to the arrangement of LHCIs in complex with PSI in cyanobacteria, green algae, and angiosperms? This information would be useful to incorporate into the text.

      We previously compared the PSI-FCPI structures of the diatom C. gracilis to the PSI-LHCI structures of land plant, green alga, and red alga (Nagao et al., 2020). Also, Xu et al. 2020 compared the C. gracilis PSI-FCPI structure to the PSI-LHCI structures of land plant, green alga, and red alga. The binding sites between FCPIs and LHCIs are conserved to some extent. However, our recent study revealed that no orthologous relationship exists among LHCs bound to PSI between primitive red algae and diatoms (Kato et al., 2024). Consequently, we found that the information obtained from structural comparisons alone is extremely limited. To avoid misinterpretation, this study focused on comparing the structures and amino acid sequences of FCPIs between T. pseudonana and C. gracilis.

      (4) Line 104: Despite achieving high resolution, the authors modeled only six lipid densities (the PDB model contains actually 9 lipids, you should correct it in the text). Do you believe this is due to the detergent used for purification? Can you comment on the position, identity, and potential role of the lipids within your model?

      There are 6 lipids associated with the PSI core and 3 with FCP, giving rise to a total of 9 lipids. We have described it in our original text (lines 102-104 in the modified manuscript). Additionally, our structure reveals unidentified densities which likely represent lipids; they are modeled as 88 unknown lipids (UNLs). Thus, there are more lipids in the supercomplex. However, we also observed 4 β-DDM molecules (LMT) in the structure, which are used as detergents. Thus, it is possible that some lipids have dissociated and been replaced by detergents. Many of the observed lipids are located between subunits, likely contributing to the stabilization of the complex.

      (5) Line 111: The global resolution is very high. Why does the unknown protein have such low resolution that it was impossible to model it properly and perform de novo identification from the density map? Is it due to a lower abundance of particles with this subunit bound? Have you tried improving this with 3D classification/ focus refinement /density modification?

      The Unknown subunit (UNK) is located peripherally, and its density is significantly lower compared to the neighboring subunits, which may suggest a low abundance. We applied density modification using Topaz for 3D map denoising, but the effect was minimal. As the low abundance of UNK may be the cause, 3D classification and focus refinement also had limited impact.

      (6) Figure 2A: It would be useful to show the density map for the subunit together with the model, especially to demonstrate visualization of the long loop.

      We added the model and map of Psa29 to Figure S4C in the revised manuscript.

      (7) Given the proximity of Psa29 to PsaC, is the protein involved in electron shuttling? If so, could you comment on this? In line 131, you state that Psa29 was not found in other organisms. Can the authors speculate on the potential role of this protein in diatoms?

      The function of Psa29 is unknown at present. However, Psa29 does not contain any cofactors, indicating that it does not contribute to electron transfer reactions. To understand the function of Psa29, a deletion mutant of this gene is required for examining its functional and physiological roles in diatom photosynthesis. To explain this, we added the following sentences to the revised manuscript (lines 129-133):

      "However, the functional and physiological roles of Psa29 remain unclear at present. It is evident that Psa29 does not have any pigments, quinones, or metal complexes, suggesting no contribution of Psa29 to electron transfer reactions within PSI. Further mutagenesis studies will be necessary to investigate the role of Psa29 in diatom photosynthesis."

      (8) Line 163: "Among the FCPI subunits, only FCPI-1 has BCRs in addition to Fxs and Ddxs (Figure S6A). FCPI-1 is a RedCAP, which belongs to the LHC protein superfamily but is distinct from the LHC protein family (6, 7)." It would be useful if the authors could add the carotenoid model embedded in the cryoEM density map to the figure to show the features that led to modeling BCR instead of other carotenoids. Additionally, it would be helpful to include in the text why RedCAPs differ from LHCIs and their proposed role.

      We added the model and map of two BCRs in FCPI-1 (RedCAP) to Figure S4F in the revised manuscript.

      Phylogenetic analysis showed that RedCAPs are distinct from the LHC protein family. This has been explained in lines 163-164. Also, the functional and physiological roles of RedCAP remain unclear. To explain this, we added the sentence "; however, the functional and physiological roles of RedCAP remain unclear" to the revised manuscript (lines 164-165).

      (9) Line 185: "However, it is unknown (i) whether CgRedCAP is indeed bound to the C. gracilis PSI-FCPI supercomplex and (ii) if a loop structure corresponding to the Q96-T116 loop of TpRedCAP exists in CgRedCAP." Have the authors attempted to model the protein using AlphaFold? If so, are there significant differences? Could you speculate on the absence of RedCAP in C. gracilis? Do you believe it is due to using a different detergent or related to environmental factors?

      We did not model CgRedCAP using AlphaFold. Our recent study “Kato et al. 2024” proposed that CgRedCAP binds to the LHCI-1 site in the PSI-FCPI structure based on sequence comparison. There are two types of PSI-FCPI supercomplexes, one having 16 FCPIs and the other having 24 FCPs, from C. gracilis. The different antenna sizes may depend on the growth conditions of C. gracilis (Nagao et al. 2020). These explanations were already described in the manuscript (lines 243-246).

      (10) Line 193: Figure 8 is mentioned before Figures 4-7.

      We apologize for the incorrect figure number. Figure 8 refers to Supplementary Figure 8, so we corrected the citation to Fig. S8B in the revised manuscript.

      (11) Line 223: FCPI-4 interacts only with FCPI-5, primarily through the interaction of Y196/4 with the FCPI-5 backbone. Is this interaction facilitated by other factors such as lipids, carotenoids, or other ligands? Also, FCPI-4 occupies a peculiar position compared to other LHCIs proteins (it is peripheral to FCPI-4 and FCPI-5). Do you believe this could be due to a transient interaction with the complex? Could the presence of this protein be related to the growth conditions experienced by the plant? Are there any literature reports on environmental conditions influencing FCPI arrangements? Including this information in the text would be interesting.

      Y196/4 interacts with only backbones by hydrogen-bond interactions; therefore, other cofactors do not contribute to the interactions.

      We do not believe that the interaction of FCPI-4 is transient; rather, this binding appears to be stable within the complex. Given that the PSI-FCPI supercomplexes were isolated by anion exchange chromatography, FCPI-4 and FCPI-5 are tightly associated within this complex. However, it is important to note that the expression of diatom FCPI proteins can indeed vary depending on growth conditions, as highlighted in our previous study (Nagao et al., 2020). While the peculiar position of FCPI-4 may not be directly related to transient interactions, environmental conditions could still influence the overall arrangement and expression levels of FCPIs. This information has already been described in the manuscript (lines 243-246).

      (12) Given the high resolution of your map, the overall model quality does not seem to match the map quality. Specifically, the clash score (10) and sidechain outliers (3%) are elevated. Could you comment on this? Do you believe it is related to the high number of ligands?

      Our structure contains a total of 295 ligands, including cofactors, detergents, and unknown lipids. We believe the high clash score and number of sidechain outliers are due to the large number of ligands present.

      (13) Supplementary Figure 2: You should show the 3D classes that were discarded.

      According to your comment, we added the 3D classes that were discarded and the sentence "Red boxes highlight selected particles from each 3D classification." to Figure S2 and its legend in the revised manuscript.

      (14) Which masks were used for refinement? How were they generated, and which parameters were chosen? This information should be added to the Materials and Methods section. You should show the masks used during classification, for example.

      We used a 240 Å spherical mask for refinement and classification, without applying any reference mask as input. To explain this, we added the corresponding sentence to Methods in the revised manuscript (lines 347-348) as follows:

      "A 240-Å spherical mask was used during the 3D classification and refinement processes."

      (15) Were any extra proteins detected in the early stages of the cryoEM analysis (i.e., 2D classification) that were discarded? Could you visualize the superior oligomeric states of the supercomplex?

      In the single-particle analysis, no larger particles than the analyzed complex were detected. The results of 2D classification using a sufficiently large spherical mask with a diameter of 320 Å are shown below.

      Author response image 1.

      (16) Have you tried using cryoSPARC for data analysis? If so, could you comment on that?

      We did not use cryoSPARC for data analysis.

      Reviewer #2 (Recommendations For The Authors):

      I have some minor comments below to help improve the manuscript. The line numbers below refer to those in the Word version of the manuscript.

      (1) Figure 1 legend, line 559, "membrane normal"? Panel A and B, structures with the same colors, do they refer to the closely related or interacted parts? For example, the red color for FCP1-1 in A and PsaA in B. If not, the authors may want to clarify it.

      The term 'membrane normal' refers to the direction perpendicular to the surface of a membrane. It is a concept frequently used in physics and biology to describe the orientation relative to the membrane's plane.

      The shared colors in Panels A and B of Figure 1 do not indicate closely related or interacting parts. According to your comments, the colors of subunits were revised in the revised manuscript.

      (2) Line 109-117. "Psa28 is a novel subunit found in the C. gracilis PSI-FCPI structure, and its name follows the nomenclature as suggested previously (31).... After psaZ, the newly identified genes should be named psa27, psa28, etc., and the corresponding proteins are called Psa27, Psa28, etc... Psa28 was also named PsaR in the PSI-FCPI structure of C. gracilis (16)". It is confusing. Was Psa28 named twice, PsaR and Psa28? It would be helpful to add a simple explanation here.

      According to your comment, we modified the sentence as follows (lines 117-118):

      " However, Xu et al. named the subunit as PsaR in the PSI-FCPI structure of C. gracilis "

      (3) Line 134, "One of the Car molecules in PsaJ was identified as ZXT103 in the T. pseudonana PSI-FCPI structure but it is BCR112 in the C. gracilis PSI-FCPI structure (15)". Figure S4D mentioned BCR863 but did not mention BCR112. Figure S4C, D, it may need better explanations of the colors and labels, and indicate which parts are from T. pseudonana or C. gracilis.

      BCR112 was misnumbered; the correct number is BCR103. In response to your comments, we revised Figure S4C and D by labeling the characteristic pigments in the revised manuscript.

      (4) Figure S7, although mentioned in the legend, it would be helpful to label interaction pairs on the figure directly with corresponding colours.

      According to your comments, we modified the Figure and legends in the revised manuscript.

      (5) Figure 3E, it is better to avoid red/green colours in one figure as some readers may be colour-blind. It would also be helpful to label each FCPI with the same colour as its structure on the figure directly.

      According to your comments, we modified Figure 3E in the revised manuscript.

      (6) Line 185, "structures similar to the Q96-T116 loop in TpRedCAP found in the present study (Figure 8B).". The authors refer to Figure S8B? I have the same comment for line 186, Figure 8C.

      We apologize for the incorrect figure number. Figure 8 refers to Supplementary Figure 8, so we corrected the citation to Fig. S8B in the revised manuscript.

      (7) Line 270, "TpLhcq10 cannot bind at the FCPI-2 site". Why not use FCPI-3 for TpLhcq10?

      This means that the gene product of TpLhcq10 binds at the FCPI-3 site but not at the other sites such as FCPI-2. To avoid misreading, we modified the sentence as follows:

      "TpLhcq10 binds specifically at the FCPI-3 site but not at the other sites such as FCPI-2" (lines 278-279)

      Reviewer #3 (Recommendations For The Authors):

      I have no technical or conceptual suggestions at the current stage.

      Thank you.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The Bagnat and Rawls groups' previously published work (Park et al., 2019) described the kinetics and genetic basis of protein absorption in a specialized cell population of young vertebrates termed lysosome-rich enterocytes (LREs). In this study they seek to understand how the presence and composition of the microbiota impacts the protein absorption function of these cells and reciprocally, how diet and intestinal protein absorption function impact the microbiome.

      Strengths of the study include the functional assays for protein absorption performed in live larval zebrafish, which provides detailed kinetics on protein uptake and degradation with anatomic precision, and the gnotobiotic manipulations. The authors clearly show that the presence of the microbiota or of certain individual bacterial members slows the uptake and degradation of multiple different tester fluorescent proteins.

      To understand the mechanistic basis for these differences, the authors also provide detailed single-cell transcriptomic analyses of cells isolated based on both an intestinal epithelial cell identity (based on a transgenic marker) and their protein uptake activity. The data generated from these analyses, presented in Figures 3-5, are valuable for expanding knowledge about zebrafish intestinal epithelial cell identities, but of more limited interest to a broader readership. Some of the descriptive analysis in this section is circular because the authors define subsets of LREs (termed anterior and posterior) based on their fabp2 expression levels, but then go on to note transcriptional differences between these cells (for example in fabp2) that are a consequence of this initial subsetting.

      Inspired by their single-cell profiling and by previous characterization of the genes required for protein uptake and degradation in the LREs, the authors use quantitative hybridization chain reaction RNA-fluorescent in situ hybridization to examine transcript levels of several of these genes along the length of the LRE intestinal region of germ-free versus mono-associated larvae. They provide good evidence for reduced transcript levels of these genes that correlate with the reduced protein uptake in the mono-associated larval groups.

      The final part of the study (shown in Figure 7) characterized the microbiomes of 30-day-old zebrafish reared from 6-30 days on defined diets of low and high protein and with or without homozygous loss of the cubn gene required for protein uptake. The analysis of these microbiomes notes some significant differences between fish genotypes by diet treatments, but the discussion of these data does not provide strong support for the hypothesis that "LRE activity has reciprocal effects on the gut microbiome". The most striking feature of the MDS plot of Bray-Curtis distance between zebrafish samples shown in Figure 7B is the separation by diet independent of host genotype, which is not discussed in the associated text. Additionally, the high protein diet microbiomes have a greater spread than those of the low protein treatment groups, with the high protein diet cubn mutant samples being the most dispersed. This pattern is consistent with the intestinal microbiota under a high protein diet regimen and in the absence of protein absorption machinery being more perturbed in stochastic ways than in hosts competent for protein uptake, consistent with greater beta dispersal associated with more dysbiotic microbiomes (described as the Anna Karenina principle here: https://pubmed.ncbi.nlm.nih.gov/28836573/). It would be useful for the authors to provide statistics on the beta dispersal of each treatment group.

      Overall, this study provides strong evidence that specific members of the microbiota differentially impact gene expression and cellular activities of enterocyte protein uptake and degradation, findings that have a significant impact on the field of gastrointestinal physiology. The work refines our understanding of intestinal cell types that contribute to protein uptake and their respective transcriptomes. The work also provides some evidence that microbiomes are modulated by enterocyte protein uptake capacity in a diet-dependent manner. These latter findings provide valuable datasets for future related studies.

      We thank the Reviewer for their thorough and kind assessment. We appreciate the suggestion for edits and for pointing out areas that needed further clarification.

      One point in need of further explanation is the use of fabp6 (referred to as fabp2 by the reviewer) to define anterior LREs and their gene expression pattern, which includes high levels of fabp6, something that was deemed a "circular argument" by the reviewer. The rationale for using fabp6 as a reference is that we were able to define its spatial pattern in relation to other LRE markers and the neighboring ileocyte population using transgenic markers (Lickwar et al., 2017; Wen et al., 2021). Thus, far from being a circular argument, using fabp6 allowed us to identify other markers that are differentially expressed between anterior and posterior LREs, which share a core program that we highlight in our study. In the revised manuscript, we clarified this point (lines 166 – 169).

      We followed the Reviewer’s suggestion to test if LRE activity and dietary protein affected beta dispersal. Our analyses revealed that beta dispersion was not significantly different between our experimental conditions. We added details about this analysis (lines 384 – 386) and a new supplemental figure panel (Figure S7C).
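      As a rough illustration of the quantities discussed above, the sketch below computes the Bray-Curtis dissimilarity between two abundance profiles and summarizes the spread of a treatment group as its mean pairwise dissimilarity. This is a simplified proxy and not necessarily the analysis performed in the revised manuscript: formal tests of beta dispersion (for example, `betadisper` in the R package vegan) measure each sample's distance to its group centroid in principal-coordinate space. The function names and the example abundance vectors are hypothetical.

```python
"""Hypothetical sketch: Bray-Curtis dissimilarity and a simple within-group spread proxy."""
from itertools import combinations


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical, 1 = disjoint)."""
    if len(x) != len(y):
        raise ValueError("abundance vectors must have the same length")
    total = sum(x) + sum(y)
    if total == 0:
        return 0.0
    shared = sum(min(a, b) for a, b in zip(x, y))
    return 1.0 - 2.0 * shared / total


def mean_within_group_dissimilarity(samples):
    """Mean pairwise Bray-Curtis dissimilarity among the samples of one group."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        raise ValueError("need at least two samples")
    return sum(bray_curtis(a, b) for a, b in pairs) / len(pairs)


# Hypothetical taxon counts (two taxa per sample) for two treatment groups.
tight_group = [[10, 10], [10, 10]]
spread_group = [[20, 0], [0, 20], [10, 10]]
print(mean_within_group_dissimilarity(tight_group))   # 0.0
print(mean_within_group_dissimilarity(spread_group))  # about 0.67
```

In this toy example, the tightly clustered group has no within-group dissimilarity, whereas the spread-out group scores about 0.67; a larger spread of this kind is what the Anna Karenina principle associates with more dysbiotic communities.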

      Reviewer #2 (Public review):

      Summary:

      The authors set out to determine how the microbiome and host genotype impact host protein-based nutrition.

      Strengths:

      The quantification of protein uptake dynamics is a major strength of this work and the sensitivity of this assay shows that the microbiome and even mono-associated bacterial strains dampen protein uptake in the host by causing down-regulation of genes involved in this process rather than a change in cell type.

      The use of fluorescent proteins in combination with transcript clustering in the single-cell seq analysis deepens our understanding of the cells that participate in protein uptake along the intestine. In addition to the lysosome-rich enterocytes (LREs), subsets of enteroendocrine cells, acinar cells, and goblet cells also take up protein. Intriguingly, these non-LRE cells did not show lysosome-based protein degradation; importantly, however, the transcripts upregulated in these cells include dab2 and cubn, genes previously shown to be essential for protein uptake.

      The derivation of zebrafish mono-associated with single strains of microbes paired with HCR to localize and quantify the expression of host protein absorption genes shows that different bacterial strains suppress these genes to variable extents.

      The analysis of microbiome composition, when host protein absorption is compromised in cubn-/- larvae or by reducing protein in the food, demonstrates that changes to host uptake can alter the abundance of specific microbial taxa like Aeromonas.

      Weaknesses:

      The finding that neurons are positive for protein uptake in the single-cell data set is not adequately discussed. It is curious because the cldn:GFP line used for sorting does not mark neurons and if the neurons are taking up mCherry via trans-synaptic uptake from EECs, those neurons should be mCherry+/GFP-; yet methods indicate GFP+ and GFP+/mCherry+ cells were the ones collected and analyzed.

      We thank the Reviewer for the kind and positive assessment of our work, for suggestions to improve the accessibility and clarity of the manuscript, and for pointing out an issue related to a neuronal population that needed further clarification.

      It turns out that there is a population of neurons that express cldn15la. They are not easily visualized by microscopy because IECs express this gene much more highly. However, the endogenous cldn15la transcripts can be found in neurons as shown in a recently published dataset (PMID: 35108531) as well as in this study. We added a discussion point to clarify this issue (lines 463 – 465).

      Reviewer #3 (Public review):

      Summary:

      Childers et al. address a fundamental question about the complex relationship within the gut: the link between nutrient absorption, microbial presence, and intestinal physiology. They focus on the role of lysosome-rich enterocytes (LREs) and the microbiota in protein absorption within the intestinal epithelium. By using germ-free and conventional zebrafish, they demonstrate that microbial association leads to a reduction in protein uptake by LREs. Through impressive in vivo imaging of gavaged fluorescent proteins, they detail the degradation rate within the LRE region, positioning these cells as key players in the process. Additionally, the authors map protein absorption in the gut using single-cell sequencing analysis, extensively describing LRE subpopulations in terms of clustering and transcriptomic patterns. They further explore the monoassociation of ex-germ-free animals with specific bacterial strains, revealing that the reduction in protein absorption in the LRE region is strain-specific.

      Strengths:

      The authors employ state-of-the-art imaging to provide clear evidence of the protein absorption rate phenotype, focusing on a specific intestinal region. This innovative method of fluorescent protein tracing expands the field of in vivo gut physiology.

      Using both conventional and germ-free animals for single-cell sequencing analysis, they offer valuable epithelial datasets for researchers studying host-microbe interactions. By capitalizing on fluorescently labelled proteins in vivo, they create a new and specific atlas of cells involved in protein absorption, along with a detailed LRE single-cell transcriptomic dataset.

      Weaknesses:

      While the authors present tangible hypotheses, the data are primarily correlative, and the statistical methods are inadequate. They examine protein absorption in a specific, normalized intestinal region but do not address confounding factors between germ-free and conventional animals, such as size differences, transit time, and oral gavage, which may impact their in vivo observations. This oversight can lead to bold conclusions, where the data appear valuable but require more nuance.

      The sections of the study describing the microbiota or attempting functional analysis are elusive, with related data being overinterpreted. The microbiome field has long used 16S sequencing to characterize the microbiota, but its variability due to experimental parameters limits the ability to draw causative conclusions about the link between LRE activity, dietary protein, and microbial composition. Additionally, the complex networks involved in dopamine synthesis and signalling cannot be fully represented by RNA levels alone. The authors' conclusions on this biological phenomenon based on single-cell data need support from functional and in vivo experiments.

      We thank the Reviewer for their assessment and for pointing out some areas that needed to be explained better and/or discussed.

      The Reviewer mentions some potential confounding factors (i.e., size differences, transit time, oral gavage) in the gnotobiology experiments. We would like to convey that these aspects have been addressed in our experimental design and are now clarified in the revised manuscript: 1- larval sizes were recorded and found to be similar between GF and monoassociated larvae (Figure S6A); 2- while intestinal transit time may be affected by microbes and is a topic of interest, in our assay luminal mCherry cargo is present at high levels throughout the gut and is not limiting at any point during the experiment; 3- gavage, which is necessary for quantitative assays, is indeed an experimental manipulation that may somehow alter the subjects (the same is true for microscopy and virtually any research method). However, it cannot explain differences between GF and CV or alter our conclusions via microbial or dietary effects. We now elaborate on the former point in the revised discussion (line 426). A new panel has been added to Fig. S6 to show that standard length was similar in GF and monoassociated larvae (Figure S6A).

      We are aware that microbial community composition is often highly variable between experiments and this necessitates adequately high biological replication and inclusion of internal controls to allow conclusions to be drawn. Nevertheless, studies evaluating the utility of 16S rRNA gene sequencing have found that this analysis reveals important impacts of environmental factors on the gut microbiome (PMIDs: 21346791, 31409661, 31324413). Our results provide further evidence that 16S rRNA gene sequencing remains a useful method to detect perturbations to the zebrafish gut microbiome. Reproducing previous findings, we detected many of the core zebrafish microbiota strains in our samples that have been identified by other studies (PMIDs: 26339860, 21472014, 17055441). To ensure the robustness of our results, we included several biological replicates for each condition, co-housed genotypes and included large sample sizes to minimize environmental variability between groups. In response to this reviewer concern, we have added a supplemental beta diversity plot and statistical analyses showing that the microbiomes in our larvae were significantly different from the diets or tank water (Figure S7A). This analysis shows that the host environment influenced microbial community composition (lines 376 – 378). We also added an additional supplemental panel and performed analysis showing that the experimental replicates (i.e., different tanks) were not a significant source of variation in this study (lines 378 – 380) (Figure S7B). This result underscores that the microbiota in these larvae were influenced by both the host and diet.

      Regarding dopamine pathways, we acknowledge that it involves complex biology that will require dedicated studies. In this work, we simply point out gene expression patterns we find interesting as they may inform future studies.

      Finally, the Reviewer mentions the use of inadequate statistical methods for some analyses without specifying them or suggesting alternative analyses; only the need to justify the use of two-way ANOVA is made explicit. On this point, we respectfully disagree and would like to emphasize that we use statistical methods that are standard in the field (PMID: 37707499). We nevertheless added a justification for the use of two-way ANOVA where appropriate (lines 635-637, 653-654, 773-776). The two-way ANOVA was used to compare fluorescence profiles of gavaged cargoes or HCR probes along the length of the LRE region. This test accounts for differences in fluorescence between experimental conditions in segments (30 μm) along the LRE region (~300 μm). This allows us to capture differences in fluorescence between experimental conditions while accounting for heterogeneity in the LRE region. Please see our comment below for more information about our use of the two-way ANOVA.
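      To make the structure of this comparison concrete, the following sketch computes the F statistics of a balanced two-way ANOVA with experimental condition and LRE segment as factors. It is an illustrative sketch under stated assumptions, not the authors' analysis pipeline: it assumes an equal number of larvae in every condition-segment cell, treats all measurements as independent (a repeated-measures design, in which each larva contributes every segment, would require a different error term), and reports only F statistics. The function name and the fluorescence values are hypothetical; p-values could be obtained from the F distribution, for example with `scipy.stats.f.sf`.

```python
"""Hypothetical sketch: balanced two-way ANOVA (condition x LRE segment)."""


def _mean(values):
    return sum(values) / len(values)


def two_way_anova(data):
    """Return F statistics and degrees of freedom for a balanced two-way ANOVA.

    data maps condition -> segment -> list of replicate measurements
    (for example, the fluorescence of each larva in one 30-um segment).
    """
    conditions = list(data)
    segments = list(data[conditions[0]])
    n = len(data[conditions[0]][segments[0]])
    for c in conditions:
        if set(data[c]) != set(segments):
            raise ValueError("every condition must cover the same segments")
        if any(len(data[c][s]) != n for s in segments):
            raise ValueError("design must be balanced (equal replicates per cell)")
    if n < 2:
        raise ValueError("need at least two replicates per cell")

    a, b = len(conditions), len(segments)
    cell = {(c, s): _mean(data[c][s]) for c in conditions for s in segments}
    grand = _mean(list(cell.values()))  # equals the overall mean in a balanced design
    row = {c: _mean([cell[c, s] for s in segments]) for c in conditions}
    col = {s: _mean([cell[c, s] for c in conditions]) for s in segments}

    # Sums of squares for the two main effects, their interaction, and residual error.
    ss_condition = b * n * sum((row[c] - grand) ** 2 for c in conditions)
    ss_segment = a * n * sum((col[s] - grand) ** 2 for s in segments)
    ss_interaction = n * sum(
        (cell[c, s] - row[c] - col[s] + grand) ** 2 for c in conditions for s in segments
    )
    ss_error = sum(
        (x - cell[c, s]) ** 2 for c in conditions for s in segments for x in data[c][s]
    )

    df = {"condition": a - 1, "segment": b - 1,
          "interaction": (a - 1) * (b - 1), "error": a * b * (n - 1)}
    ms_error = ss_error / df["error"]
    return {
        "F_condition": (ss_condition / df["condition"]) / ms_error,
        "F_segment": (ss_segment / df["segment"]) / ms_error,
        "F_interaction": (ss_interaction / df["interaction"]) / ms_error,
        "df": df,
    }


# Hypothetical fluorescence values: two larvae per condition in each of two segments.
profiles = {
    "GF": {"seg1": [4, 6], "seg2": [8, 10]},
    "CV": {"seg1": [2, 4], "seg2": [3, 5]},
}
result = two_way_anova(profiles)
print(result["F_condition"], result["F_segment"], result["F_interaction"])  # 12.25 6.25 2.25
```

The segment factor absorbs the position-dependent differences along the LRE region, so the condition effect is assessed against variation that remains after accounting for that heterogeneity.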

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Please provide in the materials and methods the strain identifiers and sources of the bacteria used in the study.

      Thank you for the suggestions. Strain identifiers and source information were added to the methods (lines 576-579).

      Reviewer #2 (Recommendations for the authors):

      (1) This is a very satisfying and thorough analysis of the reciprocal influence of diet, microbiome, and host genotype on protein absorption by the host. Below I make suggestions that mainly relate to making the paper more accessible to a broader audience.

      (2) Line 233 Starts a section that reports the findings of the scRNA dataset. The writing is inconsistent with respect to how the genes are listed: whether abbreviation only or spelled out followed by abbreviation. I prefer the latter. For example, slc10a2 is a bile acid Na cotransporter but for those not in the know, they would have to look this up. Perhaps adding a supplementary table that provides a gene list of those discussed in the text with abbreviation/spelled-out, and KEGG terms.

      Thank you for pointing out inconsistent gene labeling. We have revised the text with spelled out gene names followed by abbreviations.

      (3) Line 461 Where did the neurons come from when you were sorting cldn+ cells?

      Neuronal expression of cldn15la was detected in our data and other published datasets (PMID: 37995681, 35108531). We added a note to the text clarifying that neuronal cells can express cldn15la (lines 463-465).

      (4) Line 561 1x tricaine should be converted to percentage in solution or concentration throughout.

      The tricaine concentration was 0.2 mg/mL. We added this detail to the methods (line 596).

      (5) Line 612 Please clarify how normalizations are carried out: is it to the peak value in the germ-free condition? CV never reaches 1.

      AUC values were normalized to the peak value in the GF condition at 60 minutes PG. We clarified this step in the methods (lines 618-619).
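      A minimal sketch of this normalization follows. It assumes that each AUC is the area under a fluorescence profile sampled along the LRE region and computed with the trapezoidal rule; every AUC is then divided by the GF AUC at 60 minutes PG, so the GF value at that time point becomes 1 and all other values are expressed relative to it. The function names, segment positions, and intensities are hypothetical.

```python
"""Hypothetical sketch: AUCs of fluorescence profiles normalized to the GF value at 60 min PG."""


def trapezoid_auc(positions, intensities):
    """Area under a sampled intensity profile using the trapezoidal rule."""
    if len(positions) != len(intensities) or len(positions) < 2:
        raise ValueError("need matching sequences with at least two points")
    return sum(
        (positions[i + 1] - positions[i]) * (intensities[i] + intensities[i + 1]) / 2
        for i in range(len(positions) - 1)
    )


def normalize_aucs(profiles, positions, reference_key):
    """Compute each profile's AUC and divide it by the AUC of the reference profile."""
    aucs = {key: trapezoid_auc(positions, values) for key, values in profiles.items()}
    reference = aucs[reference_key]
    if reference == 0:
        raise ValueError("reference AUC is zero; cannot normalize")
    return {key: auc / reference for key, auc in aucs.items()}


# Hypothetical positions (um) along the LRE region and mCherry intensities.
positions_um = [0, 30, 60, 90]
profiles = {
    ("GF", 60): [2.0, 4.0, 4.0, 2.0],
    ("CV", 60): [1.0, 2.0, 2.0, 1.0],
}
normalized = normalize_aucs(profiles, positions_um, reference_key=("GF", 60))
print(normalized)  # {('GF', 60): 1.0, ('CV', 60): 0.5}
```

Because the reference is a single GF time point, a CV curve normalized this way stays below 1 whenever CV uptake is lower than the GF peak, which is the pattern the reviewer noted.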

      (6) Lines 654-663: I think mCherry here should be mTurquoise?

      Thank you for catching this typo. We corrected it in the text.

      (7) In Figure 1 Please consider adding a color so that magenta does not represent BOTH germ-free AND mCherry.

      Due to the many colors of fluorescent proteins and HCR probes in this paper, we were not able to find an alternative plot line color to represent GF.

      (8) In Figure 2 I suggest consistency with respect to the order you present GF/CV

      Figure 1 GF->CV

      Figure 2 CV->GF

      My preference is GF->CV

      Images in Figure 2 were re-ordered following reviewer’s recommendation.

      Here, 20 minute time point also appears qualitatively different between GF and CV.

      There can be slight differences in LREs between individuals. These images were selected because they represented the average differences in the amount of mTurquoise degradation activity that occurred between 20 – 60 minutes post-flushing in the GF and CV conditions.

      In Figure 3E Figure legend refers to being able to see BSA in vacuoles. The image should be modified to show this- currently too small.

      In response, we enlarged the confocal microscopy images showing DQ red BSA in the LRE region (Figure 3E). We added a panel with confocal microscopy images of the LREs in 6 dpf larva gavaged with DQ red BSA (Figure S3F). These images show that DQ red BSA fluorescence was localized to the LRE lysosomal vacuole.

      In Figure 5D, Posterior LRE should be pink not green in the key to the right of the heatmap.

      Thank you for catching this error. We have corrected the colors (Figure 5D).

      Reviewer #3 (Recommendations for the authors):

      (1) Introduction and context:

      Expand the introduction to include more background on microbial-mediated protein absorption, with references to relevant findings in Drosophila. This will provide a stronger foundation for the study's contributions to the field.

      Thank you for this suggestion. We added information about microbe-mediated amino acid harvest in Drosophila to the introduction (lines 49-53).

      (12) Methodological suggestions:

      Measure and report differences between germ-free (GF) and conventional (CV) animals, such as transit time, to account for potential confounding factors in protein absorption dynamics.

      We respectfully assert that a transit assay is not required for this study and could actually create confusion as an effect in transit time could be interpreted as a contributing factor when it is in fact not the case due to the experimental design. This is because the concentration of luminal protein was equivalent in GF and CV larvae (Figure S1E), so the LREs had equal saturating access to those proteins in both conditions. Furthermore, we showed the microbiota did not degrade fluorescent protein (Figure S1F). Therefore, we feel confident that there was lower protein uptake in the LREs of CV larvae because the microbiome exerted regulatory effects on LRE activity.

      Provide detailed information on the gating strategy used for single-cell sorting to enhance the dataset's utility and support claims about cell changes.

      The methods we used for sorting cells were previously described (PMID: 31474562). In this manuscript, we describe them under the heading “Fluorescence activated cell sorting for single cell RNA-sequencing.”

      Explain the "GeneRatio" metric in figure legends for clarity.

      The GeneRatio is the ratio of genes associated with each individual GO term to the number of genes associated with the domain. An explanation was added to the caption (Figure S3C).
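      The definition above can be illustrated with a short sketch. It assumes the convention of common GO enrichment tools such as clusterProfiler, in which GeneRatio is reported as k/n: k is the number of input genes annotated to the GO term, and n is the number of input genes annotated to any term in the same GO domain (biological process, molecular function, or cellular component). The function name and the gene annotations below are hypothetical and do not reflect real GO annotations.

```python
"""Hypothetical sketch: GeneRatio for one GO term, reported as k/n."""


def gene_ratio(input_genes, term_genes, domain_genes):
    """Return (k, n) for a GO term.

    k: input genes annotated to the term.
    n: input genes annotated to any term in the term's GO domain.
    """
    in_domain = set(input_genes) & set(domain_genes)
    if not in_domain:
        raise ValueError("no input genes are annotated in this GO domain")
    k = len(in_domain & set(term_genes))
    return k, len(in_domain)


# Hypothetical annotations for illustration only.
input_genes = ["cubn", "dab2", "fabp6", "slc10a2", "fabp2"]
domain_genes = ["cubn", "dab2", "fabp6", "slc10a2"]
term_genes = ["cubn", "dab2"]

k, n = gene_ratio(input_genes, term_genes, domain_genes)
print(f"GeneRatio = {k}/{n} = {k / n:.2f}")  # GeneRatio = 2/4 = 0.50
```

Input genes without any annotation in the domain (fabp2 in this toy example) do not count toward n, which is why the denominator is 4 rather than 5.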

      (13) Visual and statistical improvements:

      Include images of labeled peptidases within lysosome-rich enterocytes (LREs) to reinforce findings.

      Thank you for the suggestion. We added images of labeled peptidases in the LRE region (Figure S6D-E).

      For Panels 4-F and 5-D, consider using violin plots of selected genes to improve clarity and emphasize major ideas.

      In Figure 4F, the heatmap shows multiple genes were upregulated in mCherry-positive cells. We tried the plotting suggested by the reviewer and felt that violin plots could not convey this message as clearly. Likewise, the heatmap in Figure 5D effectively shows the gradient of expression between ileocytes, anterior and posterior LREs.

      Strengthen statistical analysis by employing more rigorous methods and justifying their selection, such as using two-way ANOVA where appropriate.

      We used two-way ANOVA to analyze protein uptake or HCR probe fluorescence along the length of the LRE region. This test allowed us to compare fluorescence between experimental conditions across multiple LRE segments (see Author response image 1 below for an example). As our assays show, the LRE region is heterogeneous, with segments showing different levels of activity and gene expression. Two-way ANOVA is appropriate because it accounts for this heterogeneity by comparing fluorescence across multiple segments.

      Author response image 1.

      Our figures display these fluorescence levels as line plots (above, left) rather than bar plots (above, right). The results are easier to visualize and interpret in line plots, which also display the fluorescence profiles in greater detail.

      (14) Technical corrections:

      Correct figure references: Figure 5 about tryptophan metabolism should be 5A, S5G-S5H.

      We corrected the figure references.

      Line 518: Spell out "heterozygotes" instead of using "hets".

      We changed the term from “hets” to “heterozygotes.”

      (15) Revise Figure S2 citation to match the actual figure labeling.

      We corrected the text to indicate “Figure S2” rather than “Figure S2A.”

      Additional manuscript modification

      · Figure panels 3B-C, S3A-B, 4A-C: Two clusters were relabeled with improved descriptors based on our updated annotations. The clusters “Pharynx-esophagus-cloaca 1” (PEC1) and PEC2 were relabeled as “Pharynx-cloaca 1” and “Pharynx-cloaca 2.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:  

      Overall, the conclusions appear appropriately supported by the data, and the data appear of high quality.

      Strengths:

      The particular strengths of the paper include an impressive combination of genomic and imaging-based approaches and insightful genetically engineered cell systems. The manuscript reports interesting and potentially important findings. The text is generally very well written, the ideas are clearly explained, and the reasoning is easy to follow.

      Weaknesses:

      The main weakness seems to be that the heat and ethanol shock approaches likely elicit pleiotropic effects, and therefore it is a challenge to test the causal relationship between various observations. Nevertheless, even as indirect effects might contribute to some of the authors' observations, the results are definitively worth reporting.  

      We agree that these two proteotoxic stresses can impact cell physiology in multiple ways and discuss this on lines 132-143 and 500-519. Moreover, in this revision we have more rigorously quantified the extent of proteotoxic stress elicited by the 39°C heat shock and 8.5% ethanol stress (Figure 1E; see response 1 to Reviewer 2). We have additionally added new Figure 2 that reveals an important difference in the way Hsf1 and its negative regulator, the Hsp70 co-chaperone Sis1, respond to HS and ES. This difference is evident at two different intensities for each stress as described in more detail below (see response 1 to Reviewer 2).

      Presentation of some of the data could be improved.

      We agree and have made improvements/data additions to multiple figures: Figure 1E; Figures 3A, B; Figures 4A, B; Figure 7 (data drawn from original Fig. 6 and Fig. 6 – fig. suppl. 1 and reorganized); Fig. 8B; Figure 9; Figure 10. Corresponding enhancements to the supplemental figures have been made as well. 

      Reviewer #2:  

      (1) The central finding of the study highlights the different dynamics of Hsf1, Pol II, and gene organization in response to heat shock versus ethanol stress. However, one important limitation to consider is that the two chosen conditions may not be directly comparable. For a balanced assessment, the authors should ideally expose yeast to various ethanol concentrations and different heat shock temperatures, ensuring the observed differences stem from the nature of the stressor rather than suboptimal stress intensity. At the very least, an additional single ethanol concentration point on each side of 8.5% should be investigated to ensure that 8.5% is near the optimum. In fact, comparing the number of Hsp104 foci in the two conditions in Fig. 1E and F suggests that the yeast is likely experiencing different intensities of stress for the chosen heat shock condition and ethanol concentration used in this study.

      We thank the reviewer for this important suggestion. In this revision, we have included an enhanced analysis of the yeast cellular response to each of these stresses. As illustrated in revised Figure 1, the two stresses used throughout this study – 39°C heat shock and 8.5% ethanol stress – both elicit a proteotoxic response, as assayed by the de novo formation of Hsp104 clusters. While a 10 min exposure to 8.5% ethanol results in the formation of multiple discrete (spherical) foci, a 10 min exposure to the elevated temperature leads to the appearance of multiple, largely diffuse Hsp104 clusters, some of which are spherical (new Fig. 1D). Notwithstanding the difference in morphology, we have attempted to quantify these clusters using Imaris v. 10.0.1 image analysis software; the results are depicted in Fig. 1E. This quantification suggests that 8.5% ethanol elicits a more intense stress than exposure to 39°C. A caveat is that it is unclear whether diffuse Hsp104 clusters are comparable to compact Hsp104 foci (see response 3 below).

      Beyond the apparent difference in intensity, a new analysis presented in new Figure 2 reveals that heat shock, elicited by temperature upshift to either 39°C or 42°C, induces relocalization of the J-protein Sis1 – a key negative regulator of Hsf1 – from the nucleoplasm to the nucleolar periphery. Sis1’s perinucleolar ring localization agrees with previous findings in 39°C heat-shocked cells (Feder et al., 2021). Ethanol stress, whether 5% or 8.5%, initially causes Sis1 to relocalize diffusely throughout the nucleus and cytosol. At 10 min, Sis1 localizes to the periphery of the nucleus, in marked contrast to what is observed in response to heat shock. These new results are described on lines 174-191.

      Taking these two observations together, we asked whether a less severe ethanol stress (5%) would induce Hsf1 puncta. It does, and as rapidly as 8.5% ethanol (data are presented in revised Figure 8-figure supplement 1). Interestingly, in the presence of 5% ethanol, Hsf1 puncta begin to dissolve at 30 min. This strongly contrasts with the case when cells are exposed to 8.5% ethanol (Figure 8; Figure 8-figure supplement 1). As we state in this revision (lines 414-424), the sustained presence of condensates that we originally observed is likely the consequence of the intensity of the proteotoxic stress elicited by exposure to 8.5% ethanol; analogous responses to these two stress conditions have been observed before (lines 495-501). 

      (2) A second significant concern is the use of the term "Hsf1 condensate". Chowdhary et al.'s 2022 Molecular Cell study highlighted an inhomogeneous distribution and rapid dynamics of Hsf1 clustering upon heat shock, with sensitivity to 1,6-hexanediol, which is interpreted as evidence for condensation by LLPS. However, this interpretation has been criticized severely by McSwiggen et al. Genes Dev 2019 and Musacchio EMBO J 2022. It is important to mention that 1,6-hexanediol is known to affect chromatin organization (Itoh et al. Life Science Alliance 2021). Describing such clusters as 'condensates' without further experimental evidence is premature.  

      While we appreciate and largely agree with the point made by this reviewer, we prefer to maintain the term “condensate”. Banani et al. (2017) originally defined “biomolecular condensate” to mean self-organized, membrane-free compartments that concentrate specific biomolecules. It was never meant to imply LLPS, although its widespread use in the literature has led to that implication. We clarify our use of this term on lines 99-104.   

      (3) Figure 1: Why does ethanol stress at 0 min display a larger number of Hsp104 foci per cell than heat shock at the same time? How are foci defined by the authors? In Fig. 1D, there are many smaller puncta. A comparative assessment of the number and size of foci for heat shock and ethanol stress would be beneficial.

      We thank the reviewer for raising this point and have addressed it as follows.  First, we repeated the assay with a different strain (DPY1561) and increased the number of cells assayed from 40 to 200. This larger sample size created the same T=0 baseline for both stresses (Figure 1E). Second, we define Hsp104 foci as diffraction-limited structures with a diameter of ~0.4 µm (lines 747-749).  Third, employing Imaris v. 10.0.1, we quantified foci size (= volume) and a summary graph has been added to Figure 1E that also displays the number of foci per cell. In the legend to this figure, we point out that to conduct this analysis we assumed that the diffuse Hsp104 clusters seen in HS cells are comparable to the compact Hsp104 foci in ES cells (lines 1169-1171). 

      (4) Figure 2: Selecting a housekeeping gene with consistent expression levels is crucial for meaningful qPCR analysis. Do SCR1 mRNA levels fluctuate during heat shock or ethanol stress?  

      We thank the reviewer for this question. In revised Figure 3 – figure supplement 1C we provide a new graph (reproduced here) revealing that the levels of SCR1 do not significantly change under either heat shock or ethanol stress relative to the non-stressed control (0 min). One-way ANOVA analysis was performed for both HS and ES and p values were 0.094 and 0.083, respectively (calculated using GraphPad Prism 8).

      (5) Additionally, certain genes, such as TMA10 and SSA4, lack visible bars at time 0. Are these levels undetectable? The varying y-axis scales are confusing; presenting data as relative fold changes could offer a clearer perspective.

      Transcript levels for all genes evaluated here are detectable, even in the basal unstressed state. They are not visible on the histogram for certain genes at T= 0 due to the prodigious fold-increase in RNA elicited by heat shock.  However, to address this concern, we have added a bar graph inset displaying basal transcript levels for each gene in revised Figure 3. We reproduce data for SSA4 and TMA10 in the graphs below. In addition, we present transcript levels in new Figure 3 - figure supplement 1 for cells subjected to ethanol stress to allow a better appreciation of their increase over time. 

      Author response image 1.
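      For readers who prefer the relative fold changes the reviewer suggested, one common way to express RT-qPCR data relative to the unstressed (0 min) sample is the 2^-ΔΔCt method, using SCR1 as the reference transcript. The sketch below is not necessarily the normalization used in the manuscript; it assumes roughly 100% amplification efficiency for both amplicons, and the function name and Ct values are hypothetical.

```python
def fold_change_vs_t0(ct_target, ct_ref, ct_target_t0, ct_ref_t0):
    """Relative expression versus the 0 min sample by the 2^-ddCt method.

    Assumes ~100% PCR efficiency, so each cycle doubles the product.
    ct_ref / ct_ref_t0 are Ct values of the reference transcript (e.g. SCR1).
    """
    delta_ct = ct_target - ct_ref            # normalize the stressed sample
    delta_ct_t0 = ct_target_t0 - ct_ref_t0   # normalize the 0 min control
    delta_delta_ct = delta_ct - delta_ct_t0
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold 6 cycles earlier
# after stress, while the reference Ct is unchanged.
print(fold_change_vs_t0(22.0, 15.0, 28.0, 15.0))  # prints 64.0
```

      Because the reference Ct enters both terms, a stable reference such as SCR1 (revised Figure 3 - figure supplement 1C) is what makes the ratio interpretable.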

      (6) Line 239: The evidence for chromatin compaction is unconvincing. An increase in H3 occupancy by ChIP might indicate a reduction in histone exchange dynamics but may not relate to overall chromatin compaction. The authors use H2A-mCherry to suggest a decrease in chromatin volume, but this data is not persuasive. Did the authors observe any changes in nuclear size? Perhaps quantifying chromatin compaction more directly, using signal intensity per volume, would be informative.

      To address this concern, we attempted to quantify the integrated density of H2A-mCherry using ImageJ software. While the volume decreased under both stresses, the integrated density increased only under ethanol stress. We speculate that this may be due to photobleaching, which has been reported for heat shock; the combination of heat and acidic pH contributes to loss of fluorescence signal (Alkaabi et al., 2005). Although the integrated density supports the idea of global chromatin compaction under ethanol stress, given the above concerns with the HS sample we elected not to present these data.

      (7) Line 340: The claim of a "strong spatiotemporal correlation" isn't evident from the data. Could correlation coefficients be provided? There is potential anti-correlation in Fig. 6 - Figure Supplement 1C.

      We thank the reviewer for this excellent suggestion. We now present an analysis of the correlation between HSP104 – HSP12 coalescence and HSP104 transcription for both HS and ES time courses, using single cell data of Figures 7D, 7E and Figure 7- suppl. 1D.  This analysis is presented in new Figure 7F.
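      As an illustration of the kind of correlation coefficient the reviewer requested, the sketch below computes a Pearson coefficient between two per-cell time series, for example a 0/1 HSP104-HSP12 coalescence score and an HSP104 transcription signal sampled at the same time points. The scoring scheme and values are hypothetical, and the analysis in new Figure 7F may use a different statistic.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical single-cell time course (one value per imaging time point).
coalesced = [0, 1, 1, 1, 0, 0]                      # 1 = loci coalesced
transcription = [0.1, 0.9, 1.0, 0.8, 0.2, 0.0]      # normalized signal
print(round(pearson_r(coalesced, transcription), 3))  # prints 0.98
```

      A negative coefficient from the same function would capture the potential anti-correlation the reviewer noted; with a binary coalescence score this Pearson value is equivalent to a point-biserial correlation.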

      (8) Figure 8: The WT data in Fig 8 seem inconsistent with Fig. 4 (e.g. the interaction frequency for HSP104 and SSA2). Are these fluctuations between experiments, or are they side effects of IAA treatment? The use of ethanol as an IAA solvent vehicle raises concerns. It would be beneficial if the authors could demonstrate that 1.7% ethanol in the control does not induce ethanol stress.

      We acknowledge that there existed an inconsistency in the magnitude of intergenic interaction frequencies reported in the two experiments for HSP104 and SSA2. Some of this might be attributed to the fact that different strains were used, W303-1B in Figure 4 and LRY016 (W303-1B; LEU2::pGPD1osTIR1) in Figure 8. Nonetheless, in each experiment there was a prodigious fold-increase in interaction frequency over the no stress (T= 0 min) control for both HS and ES conditions and moreover, in each experiment the magnitude of this interaction was greater for the 2.5 min HS sample vs. the 10 min ES sample. However, to obviate this concern, we have removed the HSP104-SSA2 analysis from Figure 9 (corresponds to original Fig. 8).

      Regarding the second point, we cannot entirely rule out the concern that the 1.7% ethanol vehicle might impact 3C interaction frequencies. It is unlikely to be significant, however, given that most other pairwise tests evaluated in the two experiments (Figs. 5 and 9) resulted in similar 3C values. In particular, there was no consistent trend towards higher (or lower) interaction frequencies in the IAA experiment of Fig. 9.  

      Reviewer #3:  

      This is an interesting manuscript that builds off of this group's previous work focused on the interface between Hsf1, heat shock protein (HSP) mRNA production, and 3D genome topology. Here the group subjects the yeast Saccharomyces cerevisiae to either heat stress (HS) or ethanol stress (ES) and examines Hsf1 and Pol II chromatin binding, Histone occupancy, Hsf1 condensates, HSP gene coalescence (by 3C and live cell imaging), and HSP mRNA expression (by RT-qPCR and live cell imaging). The manuscript is well written, and the experiments seem well done, and generally rigorous, with orthogonal approaches performed to support conclusions…While identifying a mechanistic basis for the results [presented here] would be a tough task perhaps beyond the scope of this study, it would nevertheless be helpful to place these results in context with a series of other studies…importantly, this work left out PMID: 32015439 (HSF1 phase transition mediates stress adaptation and cell fate decisions) which is particularly relevant considering that it shows that it is human HSF1 condensate resolution rather than simple condensate formation that is associated with HSF1 transcriptional activity - which is similar to the findings here with this particular dose of HS resulting in resolution and high transcriptional activity versus ES resulting in resolution failure and lower activity. 

      We thank the Reviewer for pointing out this oversight. In this revision, we cite Gaglia et al., 2020 and several others reporting HSF1 foci formation in human cells exposed to heat shock. The single cell analysis of Gaglia et al argued that dissolution of large HSF1 foci (aka “nuclear stress bodies”), typically several µm in diameter and localized over satellite III DNA repeats (Jolly et al., 1997, 2002), correlates with HSP gene activation. Importantly, these condensates are postulated to act as reservoirs of HSF1, sequestered away from HSP genes (Gaglia et al., 2020).  In contrast, Zhang et al., 2022 has shown that human HSF1 inducibly forms small condensates (~300 nm) that localize over HSP genes and whose formation directly correlates with HSP gene activation (we discuss the Jolly, Gaglia and Zhang findings on lines 382-394). Likewise, our work shows that in yeast, Hsf1 inducibly forms small, dynamic clusters that colocalize with HSR genes within 2.5 min of exposure to elevated temperature; these dissolve ~20-60 min later (Figure 8 and Figure 8-supp. 1). In concert with Hsf1 condensate formation, HSR gene repositioning and transcription/ Pol II recruitment are likewise evident within 2.5 min. Therefore, in HS cells there exists coordinate induction of condensate formation, Pol II recruitment, transcription and intergenic interactions (for a detailed kinetic analysis of HSR gene interactions, see Figures 5 and 6 of Chowdhary et al, 2017).  This tight temporal relationship is absent in ethanol stressed cells (Figures 3, 4, 5, 6, 7, 8; summarized in Figure 10 and Table 1).

      It is also worth noting that the stresses themselves are quite different - ethanol can be used as a carbon source and so beyond inducing proteotoxic stress, the yeast are presumably adapting to this distinct metabolic state. Basically, it is not clear whether these differences are due to the dose of stress, versus we are looking at an early timepoint as ES initiates a genome-wide chromatin restructuring and gene expression reprogramming that goes beyond a response to proteotoxic stress. This reviewer is not suggesting a barrage of new experiments, but perhaps discussion points to contextualize results.

      We thank the reviewer for this suggestion and in our revised manuscript discuss these issues (lines 414-424 and 486-498 [5% vs. 8.5% ethanol]; lines 500-519 [ethanol as a metabolite]).

      Recommendations for the authors:

      Reviewer #1:

      (1) In Figure 1E, the number of foci in control (0 min) cells is very different for the two conditions. Could the authors clarify/check this? Based on the mean numbers at time point 0, the control cells for the ethanol treatment already contain about 10-20 Hsp104 foci, compared to around 5 foci per cell in the control for heat shock.

      We thank the reviewer for raising this point. We repeated the assay with a different strain (DPY1561) and, as shown in Figure 1E, confirmed that the control samples have a similar number of foci.  

      (2) In the same Figure 1E, is the P-value relative to the control or the same time point in the other treatment? A comparison across treatments would be necessary to support the claim in lines 168-171 of the text.

      The statistical analysis (Mann Whitney test) was performed by comparing each stress timepoint to the no stress control. We clarify this in the figure legend. 

      (3) In Figure 1D, the heat-shock condition shows the same cells that are used in the control, but the cells in the ethanol-shock condition are different. This is a bit visually misleading compared to the experimental setup shown in panel 1C. The authors could show the control cells for the ethanol condition as well.

      We thank the reviewer for this excellent suggestion and have added the 0 min image for the ethanol stress conditions.

      (4) In Figure 7B adding images at 60min would help underscore the point that the condensates are stable in ethanol shocked cells.

      We appreciate this suggestion as well and have included a 60 min timepoint for both stresses (Figure 8B). 

      Reviewer #2:

      (1) Line 113: Has it not been established that yeast Hsf1 is constitutively trimeric?

      In yeast, only a fraction of Hsf1 is thought to be constitutively trimeric and it is this species that binds high-affinity HSEs even under non-stressful conditions (Giardina & Lis, 1995; Pincus et al., 2018). We have added this clarification to the text (lines 121-123). 

      (2) Ethanol can precipitate proteins, especially in rich media like YPD. Did the authors notice any protein precipitation? If yes, how do they account for effects due to nutrient loss by precipitation?

      This is an interesting point, but we did not notice any precipitates in either rich or synthetic liquid media containing 8.5% (v/v) ethanol for any of the time points used in the experiments.

      (3) Figure 3: The figure appears incomplete. Can enhancer, promoter, coding region, and 3'UTR be shown consistently for all genes examined?

      In response to this point, we have simplified this figure (new Fig. 4) by uniform presentation of factor occupancy at enhancer, promoter, and coding region loci for all but one of the genes evaluated. For HSP12 (330 bp), we were unable to distinguish promoter from coding region since the average sonicated chromatin fragment obtained using a Bioruptor is ~300 bp. Therefore, we evaluated only the HSP12 coding region for Pol II and histone H3 occupancy. 

      (4) Figure 4: The comparison between heat shock at 2.5 min and ethanol stress at later points is puzzling. Why not use consistent time points as in Fig. 3?

      Time points for the two stresses examined in this figure (new Fig. 5) were selected to represent times of peak intergenic interaction between HSR genes. These times were derived from our earlier analysis of 3C interactions during a heat shock time course (Figs. 5, 6 of Chowdhary et al., 2017) and ES data presented in this study, including Fig. 4 (Pol II ChIP time course) and Fig. 6 (3C time course). Data presented in Figs. 5 and 6 are consistent with the notion that intergenic interactions in cells subjected to ethanol stress are delayed relative to those observed in heat shocked cells, peaking in most cases at ~10 min (vs. ~2.5 min for heat stress (Chowdhary et al., 2017)).  

      (5) Figure 5: Fig. 5B top panel seems to show color inconsistencies for bars at 0 and 120 min. Also, the x-axis on the top left panel seems to have a typo; should it read "10," not "0?"

      We thank the reviewer for the observation. We changed the graphs in new Figure 6 to display the same color for all time points.  We also fixed the typo. 

      (6) Line 302: The evidence presented supports maximal mRNA levels, but the claim of "maximal transcription" requires support from nascent RNA analysis.

      We agree that RT-qPCR measures mRNA abundance, not nascent transcription. We have changed the text to refer to “transcript levels” where pertinent (lines 301-302; 1331-1332).

      (7) How long do loci remain coalescent during heat shock versus ethanol stress? Both 3C and imaging analyses do not differentiate between frequency and duration, which seems essential for understanding interaction dynamics.

      We thank the reviewer for this excellent question. In new Fig. 7D,E (data drawn from Fig. 6 – fig. suppl. 1), HSR gene coalescence detected in single cells over a HS or ES time course is charted.  Interpretable data exist for a small number of cells. Moreover, for both HS and ES states, in certain cells coalescence between the representative Hsf1 target genes HSP104 and HSP12 dissolves and then reappears. With this caveat in mind, the data suggest that HSP104-HSP12 coalescence can last at least 15 min in HS cells and up to 30 min in ES cells. We have not emphasized this point in the manuscript since a far more comprehensive analysis – beyond the scope of this study – is required.

      (8) For longer analyses, how do the authors accommodate potential ethanol concentration changes due to evaporation?

      For liquid cultures, we relied on maintaining minimal changes in the vapor pressure within the experimental vessel; to facilitate that, flasks were tightly covered to minimize evaporation and temperature was kept at 25°C. For most molecular analyses (RT-qPCR, ChIP, 3C), we confined our analysis to the first 60 min. For microscopy, the samples were encased within a concave slide, covered by a coverslip, as illustrated below. In addition, to tightly seal the coverslip on the slide we used petrolatum.  This arrangement minimized evaporation.

      Author response image 2.

      (9) Figure 9: This legend seems to have an incomplete sentence: "(represented using ...)."

      We have substituted an entirely new model in this revised manuscript (new Figure 10) that omits the use of an ellipsis. (We had used it to symbolize a delay in the appearance of HSR gene transcription in ES cells.)

      References  

      Alkaabi, K. M., Yafea, A., & Ashraf, S. S. (2005). Effect of pH on thermal- and chemical-induced denaturation of GFP. Applied Biochemistry and Biotechnology, 126(2), 149–156. https://doi.org/10.1385/ABAB:126:2:149

      Chowdhary, S., Kainth, A. S., & Gross, D. S. (2017). Heat Shock Protein Genes Undergo Dynamic Alteration in Their Three-Dimensional Structure and Genome Organization in Response to Thermal Stress. Molecular and Cellular Biology, 37(24), 1–23. https://doi.org/10.1128/mcb.00292-17

      Feder, Z. A., Ali, A., Singh, A., Krakowiak, J., Zheng, X., Bindokas, V. P., Wolfgeher, D., Kron, S. J., & Pincus, D. (2021). Subcellular localization of the J-protein Sis1 regulates the heat shock response. Journal of Cell Biology, 220(1), e202005165. https://doi.org/10.1083/JCB.202005165

      Gaglia, G., Rashid, R., Yapp, C., Joshi, G. N., Li, C. G., Lindquist, S. L., Sarosiek, K. A., Whitesell, L., Sorger, P. K., & Santagata, S. (2020). HSF1 phase transition mediates stress adaptation and cell fate decisions. Nature Cell Biology, 22(2), 151–158. https://doi.org/10.1038/s41556-019-0458-3

      Giardina, C., & Lis, J. T. (1995). Dynamic protein-DNA architecture of a yeast heat shock promoter. Molecular and Cellular Biology, 15(5), 2737–2744. https://doi.org/10.1128/mcb.15.5.2737

      Jolly, C., Konecny, L., Grady, D. L., Kutskova, Y. A., Cotto, J. J., Morimoto, R. I., & Vourc’h, C. (2002). In vivo binding of active heat shock transcription factor 1 to human chromosome 9 heterochromatin during stress. Journal of Cell Biology, 156(5), 775–781. https://doi.org/10.1083/jcb.200109018

      Jolly, C., Morimoto, R. I., Robert-Nicoud, M., & Vourc’h, C. (1997). HSF1 transcription factor concentrates in nuclear foci during heat shock: Relationship with transcription sites. Journal of Cell Science, 110(23), 2935–2941. https://doi.org/10.1242/jcs.110.23.2935

      Pincus, D., Anandhakumar, J., Thiru, P., Guertin, M. J., Erkine, A. M., & Gross, D. S. (2018). Genetic and epigenetic determinants establish a continuum of Hsf1 occupancy and activity across the yeast genome. Molecular Biology of the Cell, 29(26), 3168–3182. https://doi.org/10.1091/mbc.E18-06-0353

      Zhang, H., Shao, S., Zeng, Y., Wang, X., Qin, Y., Ren, Q., Xiang, S., Wang, Y., Xiao, J., & Sun, Y. (2022). Reversible phase separation of HSF1 is required for an acute transcriptional response during heat shock. Nature Cell Biology, 24(3), 340–352. https://doi.org/10.1038/s41556-022-00846-7

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      This study of mixed glutamate/GABA transmission from axons of the supramammillary nucleus to dentate gyrus seeks to sort out whether the two transmitters are released from the same or different synaptic vesicles. This conundrum has been examined in other dual-transmission cases and even in this particular pathway, there are different views. The authors use a variety of electrophysiological and immunohistochemical methods to reach the surprising (to me) conclusion that glutamate- and GABA-filled vesicles are distinct yet released from the same nerve terminals. The strength of the conclusion rests on the abundance of data (approaches) rather than the decisiveness of any one approach, and I came away believing that the boutons may indeed produce and release distinct types of vesicles, but have reservations. 

      We thank the reviewer for his/her evaluation of our work. Several studies have reported that various combinations of two transmitters are co-released from different synaptic vesicles in the central nervous system. In this regard, we think that co-transmission of glutamate and GABA from different synaptic vesicles is not surprising. To better explain how much is known about co-release of dual transmitters in the brain, we have now added new sentences to the Introduction describing segregated co-release of two neurotransmitters at other synapses (lines 63-80).

      Accepting the conclusion, one is now left with another conundrum, not addressed even in the discussion: how can a single bouton sort out VGLUTs and VIAATs to different vesicles, position them in distinct locations with nm precision, and recycle them without mixing? And why do it this way instead of with single vesicles having mixed chemical content? For example, could a quantitative argument be made that separate vesicles allow for higher transmitter concentrations? I feel the paper needs to address these problems with some coherent discussion, at minimum. 

      Although these questions are very important and interesting, little is known about the molecular mechanisms by which VGluT2 and VIAAT are sorted to different vesicles and by which each vesicle population is kept segregated. That is why we had not mentioned sorting mechanisms in the original manuscript. Nevertheless, in response to the reviewer’s suggestion, we have now added new sentences to the Discussion describing possible mechanisms for the sorting and segregation of VGluT2 and VIAAT (lines 439-462).

      As for the question of why glutamate and GABA are released from different synaptic vesicles, we mention the functional significance of releasing the two transmitters separately, rather than from single vesicles, several times in the Introduction (lines 94-100), Results (lines 300-302), and Discussion (lines 406-408, 521-522). Although transmitter concentrations in the vesicles are an interesting point to consider, we think this issue is beyond the scope of the present study. Given that manipulation of vesicular transmitter content is technically possible (Hori and Takamori, 2021), this issue awaits further investigation.

      Major concerns: 

      (1) Throughout the paper, the authors use repetitive optogenetic stimulation to activate SuM fibers and co-release glutamate and GABA. There are several issues here: first, can the authors definitively assure the reader that all the short-term plasticity is presynaptic and not due to ChR2 desensitization? This has not been addressed. Second, can the authors also say that all the activated fibers release both transmitters? If for example 20% of the fibers retained a one-transmitter identity and had distinct physiological properties, could that account for some of the physiological findings? 

      Thank you for raising this important point. To examine whether repetitive light illumination induces ChR2 desensitization, we recorded the fiber volley extracellularly. Paired-pulse stimulation and trains of 10 stimuli at 5, 10, and 20 Hz reliably evoked fiber volleys of similar amplitude throughout light stimulation. These results clearly indicate that repetitive light stimulation reliably activates ChR2 and elicits action potentials in SuM axons. These new findings are now included in Figure 1-figure supplement 2 and Figure 5-figure supplement 2. We also previously demonstrated, by direct patch-clamp recordings from ChR2-expressing hippocampal mossy fiber terminals, that 125 light pulses at 25 Hz reliably elicited action potentials (Fig. S1: Fukaya et al., 2023). Therefore, we believe that when ChR2 expression is high, repetitive light stimulation reliably induces action potentials and drives synaptic transmission with high efficiency.

      We found that most SuM terminals (95%) contain both VGluT2 and VIAAT (Figure 1E). This anatomical evidence strongly indicates that most SuM terminals can release both glutamate and GABA, and that SuM fibers with a single transmitter identity should constitute only a minor population.

      (2) PPR differences in Figures 1F-I are statistically significant but still quite small. You could say they are more similar than different in fact, and residual differences are accounted for by secondary factors like differential receptor saturation. 

      In this experiment, the light intensity was adjusted to yield less than 80% of the maximum response, as described in the Methods section of the original and revised manuscripts, minimizing the possibility of receptor saturation. We also excluded the possibility that the PPR differences arise from differential receptor saturation or desensitization by using a low-affinity AMPA receptor antagonist and a low-affinity GABAA receptor antagonist (Figure 5-figure supplement 3). These results indicate that the PPR differences are of presynaptic origin.

      (3) The logic of the GPCR experiments needs a better setup. I could imagine different fibers released different transmitters and had different numbers of mGluRs, so that one would get different modulations. On the assumption that all the release is from a single population of boutons, then either the mGluRs are differentially segregated within the bouton, or the vesicles have differential responsiveness to the same modulatory signal (presumably a reduced Ca current). This is not developed in the paper. 

      Based on our minimal stimulation results and anatomical analysis, we believe that many SuM terminals contain both glutamate and GABA. Therefore, both forms of transmission can be modulated by mGluRs and GABAB receptors within the same terminals. As the reviewer pointed out, differential responsiveness of glutamate-containing and GABA-containing vesicles to the GPCR signal could be one molecular mechanism underlying the differential effects of GPCRs on EPSCs and IPSCs. In addition, the spatial coupling between GPCRs and the active zones for glutamate and GABA release within the same SuM terminals may differ, which could give rise to differential modulation of glutamate and GABA release. These possible mechanisms are now described in the Discussion (lines 469-476).

      (4) The biphasic events of Figures 3 and S3: I find these (unaveraged) events a bit ambiguous. Another way to look at them is that they are not biphasic per se but rather are not categorizable. Moreover, these events are really tiny, perhaps generated by only a few receptors whose open probability is variable, thus introducing noise into the small currents. 

      We agree with the reviewer that some events are tiny and that some small currents could be masked by background noise. We acknowledge that detecting biphasic events by minimal stimulation has technical limitations. Because biphasic events were detected automatically and defined as an EPSC-IPSC sequence, that is, an outward peak current following an inward current within 20 ms of light illumination (as described in the Methods section), we cannot exclude the possibility that the detected biphasic events include false biphasic responses. To compensate for these technical issues, we also used strontium-induced asynchronous release as an independent approach and obtained results similar to those of the minimal stimulation experiments (Figures 3E and 3F). Furthermore, we confirmed that the amplitudes and kinetics of EPSCs or IPSCs evoked by minimal light stimulation were not altered by blocking their counterpart currents (Figure 3-figure supplement 2). Even if false biphasic responses were accidentally included in the analysis, biphasic events remain a minor population, and we detected discernible independent EPSCs and IPSCs, which constituted the major population of synaptic responses mediated by uniquantal release. Thus, multiple lines of evidence support distinct release of glutamate and GABA from SuM terminals.
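      The detection rule described in this reply (an outward peak following an inward peak within 20 ms of the light pulse) can be sketched as a simple classifier for a single sweep. This is a hypothetical illustration, not the authors' analysis code: it assumes a holding potential at which EPSCs are inward (negative) and IPSCs outward (positive), and the fixed `threshold_pa` stands in for whatever noise criterion the Methods actually use.

```python
def classify_sweep(times_ms, current_pa, stim_ms, window_ms=20.0, threshold_pa=5.0):
    """Classify one light-evoked sweep as 'biphasic', 'EPSC', 'IPSC', 'failure' or 'unclassified'.

    Assumed sign convention: inward (EPSC) currents are negative, outward (IPSC)
    currents positive. threshold_pa is a hypothetical detection threshold.
    """
    # Keep only samples inside the post-stimulus detection window.
    window = [(t, amp) for t, amp in zip(times_ms, current_pa)
              if stim_ms <= t <= stim_ms + window_ms]
    if not window:
        return "failure"
    t_in, amp_in = min(window, key=lambda s: s[1])    # largest inward deflection
    t_out, amp_out = max(window, key=lambda s: s[1])  # largest outward deflection
    inward = amp_in <= -threshold_pa
    outward = amp_out >= threshold_pa
    if inward and outward:
        # Counted as biphasic only if the outward peak FOLLOWS the inward peak.
        return "biphasic" if t_out > t_in else "unclassified"
    if inward:
        return "EPSC"
    if outward:
        return "IPSC"
    return "failure"


# Synthetic sweep: 0.5-ms sampling, light pulse at 10 ms,
# inward peak (-20 pA) at 13 ms followed by an outward peak (+15 pA) at 18 ms.
times = [0.5 * k for k in range(100)]
sweep = [0.0] * len(times)
sweep[26], sweep[36] = -20.0, 15.0
print(classify_sweep(times, sweep, stim_ms=10.0))
```

      Because only the single largest inward and outward deflections in the window are used, a noise excursion crossing the threshold on either side can produce a false "biphasic" call, which is the limitation acknowledged in the reply.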

      (5) Figure 4 indicates that the immunohistochemical analysis is done on SuM terminals, but I do not see how the authors know that these terminals come from SuM vs other inputs that converge in DG. 

      We thank the reviewer for raising an important point. As shown in Figure 4A, B, almost all VGluT2-positive terminals in the GC layer co-expressed VIAAT. We are aware that VTA neurons reportedly project to the GC layer of the DG and co-release glutamate and GABA (Ntamati and Luscher, 2016). Contrary to this report, our retrograde tracing analysis did not reveal direct projections from the VTA to the DG. These new data are now included in Figure 4-figure supplement 1. We also added pre-embedding immunogold EM analysis, in which SuM terminals were virally labeled with eYFP, confirming that they form both asymmetric and symmetric synapses (revised Figure 4F). Together, these new data demonstrate that SuM terminals in the GC layer form both asymmetric and symmetric synapses. While our results strongly suggest that VGluT2-positive terminals and SuM terminals in the GC layer are nearly identical, we cannot fully exclude the possibility that other inputs originating from unidentified brain regions co-express VGluT2 and VIAAT in the GC layer. Therefore, in Figure 4 of the revised manuscript, we use “VGluT2-positive terminals” instead of “SuM terminals”.

      (6) Figure 4E also shows many GluN1 terminals not associated with anything, not even Vglut, and the apparent numbers do not mesh with the statistics. Why? 

      In triple immunofluorescence for VGluT2, VIAAT, and GluN1, free GluN1 puncta were predominantly observed in the molecular layer. Given that VGluT2-positive terminals are sparse in the molecular layer, these GluN1 puncta are primarily associated with VGluT1, the dominant subtype. In this study, we focused the analysis of GluN1 puncta specifically on the GC layer, excluding the molecular layer. To avoid miscommunication, we changed the original Figure 4E to the new Figure 4G, which focuses on the GC layer and aligns with the quantitative analysis. Additionally, we used ultrathin sections (100-nm-thick) to enhance spatial resolution, which limits the detection of co-localization events within this confined spatial range, as noted in the Discussion (line 485-488).

      (7) Do the conclusions based on the fluorescence immuno mesh with the apparent dimensions of the EM active zones and the apparent intermixing of labeled vesicles in immuno EM? 

      To further support our immunofluorescence results, we performed an EM study and found that a single SuM terminal formed both asymmetric and symmetric synapses on a GC soma (revised Figures 4E and 4F). These new data and our immunofluorescence results indicate that a single SuM terminal forms both glutamatergic and GABAergic synapses on a GC and co-releases glutamate and GABA. 

      As the reviewer pointed out, our immuno-EM shows that VGluT2- and VIAAT-labeled vesicles appear to intermix at asymmetric and symmetric synapses. Accordingly, in the revised manuscript, Figure 7 has been modified to show the intermixing of glutamate- and GABA-containing vesicles in the SuM terminal. It should be noted that, because of low labeling efficiency, our immuno-EM images do not represent the whole population of glutamate- and GABA-containing synaptic vesicles. Vesicles might be distributed with a bias toward their release sites (more VGluT2-containing vesicles close to asymmetric synapses and more VIAAT-containing vesicles close to symmetric synapses), as reported previously (Root et al., 2018). Alternatively, our results could be explained by another mechanism: co-release of glutamate and GABA from the same vesicles, with one transmitter undetected because its postsynaptic receptor is absent. This possibility is now mentioned in the Discussion (lines 512-520). The detailed vesicle configuration within a single SuM terminal will have to be investigated in future studies.

      (8) Figure 6 is not so interesting to me and could be removed. It seems to test the obvious: EPSPs promote firing and IPSPs oppose it. 

      We believe these results are necessary for two reasons. First, we showed that the glutamate/GABA co-transmission balance changes dynamically in a frequency-dependent manner (Figure 5). In terms of physiological significance, it is important to demonstrate how these frequency-dependent changes affect GC firing. Therefore, we believe that Figure 6, which shows how repetitive SuM stimulation modulates GC firing, is necessary for this paper. Second, we previously reported excitatory effects of SuM inputs on GC firing, suggesting important roles of glutamatergic transmission from SuM inputs in synaptic plasticity (Hashimotodani et al., 2018; Hirai et al., 2022; Tabuchi et al., 2022). In contrast, how GABAergic co-transmission contributes to SuM-GC synaptic plasticity and DG information processing was not well understood. Our results in Figure 6, which demonstrate the inhibitory effects of GABAergic co-transmission on GC firing during high-frequency repetitive SuM input activity, clearly show the contribution of GABAergic co-transmission to short-term plasticity at SuM-GC synapses. For these reasons, we would like to keep Figure 6. We hope that our explanations convince the reviewer. 

      Reviewer #2:

      Summary:

      In this study, the authors investigated the release properties of glutamate/GABA co-transmission at the supramammillary nucleus (SuM)-granule cell (GC) synapses using in vitro electrophysiology and anatomical approaches at the light and electron microscopy level. They found that SuM to dentate granule cell synapses, which co-release glutamate and GABA, exhibit distinct differences in paired-pulse ratio, Ca2+ sensitivity, presynaptic receptor modulation, and Ca2+ channel-vesicle coupling configuration for each neurotransmitter. The study shows that glutamate/GABA co-release produces independent glutamatergic and GABAergic synaptic responses, with postsynaptic targets segregated. They show that most SuM boutons form distinct glutamatergic and GABAergic synapses in close proximity, characterized by GluN1 and GABAA α1 receptor labeling, respectively. Furthermore, they demonstrate that glutamate/GABA co-transmission exhibits distinct short-term plasticity, with glutamate showing frequency-dependent depression and GABA showing frequency-independent stable depression. 

      Their findings suggest that these distinct modes of glutamate/GABA co-release by SuM terminals serve as frequency-dependent filters of SuM inputs. 

      Strengths:

      The conclusions of this paper are mostly well supported by the data. 

      We thank the reviewer for their positive and constructive comments on our manuscript.

      Weaknesses: 

      Some aspects of Supplementary Figure 1A and the table need clarification. Specifically, the claim that the authors have stimulated an axon fiber rather than axon terminals is not convincingly supported by the diagram of the experimental setup. Additionally, the antibody listed in the primary antibodies section recognizes the gamma2 subunit of the GABAA receptor, not the alpha1 subunit mentioned in the results and Figure 4. 

      We have now answered these questions in recommendations section below.

      Reviewer #3:

      Summary: 

      In this manuscript, Hirai et al investigated the release properties of glutamate/GABA co-transmission at SuM-GC synapses and reported that glutamate/GABA co-transmission exhibits distinct short-term plasticity with segregated postsynaptic targets. Using optogenetics, whole-cell patch-clamp recordings, and immunohistochemistry, the authors reveal distinct transmission modes of glutamate/GABA co-release as frequency-dependent filters of incoming SuM inputs. 

      Strengths: 

      Overall, this study is well-designed and executed; conclusions are supported by the results. This study addressed a long-standing question of whether GABA and glutamate are packaged in the same vesicles and co-released in response to the same stimuli in the SuM-GC synapses (Pedersen et al., 2017; Hashimotodani et al., 2018; Billwiller et al., 2020; Chen et al., 2020; Li et al., 2020; Ajibola et al., 2021). Knowledge gained from this study advances our understanding of neurotransmitter co-release mechanisms and their functional roles in the hippocampal circuits. 

      Weaknesses:

      No major issues are noted. Some minor issues related to data presentation and experimental details are listed below. 

      We appreciate the reviewer’s positive view of our study. We responded in more detail in recommendations section below.

      Recommendations for the authors:

      Reviewer #1:

      (1) The blue color for VIAAT in panel 1C is extremely hard to see. 

      Thank you for pointing this out. We have changed the color for VIAAT to cyan in Figure 1C and D in the revised manuscript.

      (2) Line 329 "perforant" not "perfomant".  

      We appreciate the reviewer’s careful attention. We have corrected this typo in the revised manuscript.

      Reviewer #2:

      To convincingly demonstrate that the authors stimulated SuM axon fiber instead of SuM terminals (Supplementary Figures 1A), they should provide an image showing the distribution of SuM-labeled fibers and axon terminals reaching the dentate gyrus (DG) and the trace of the optic fiber, rather than providing a diagram of the experimental setup. 

      We appreciate the reviewer’s suggestion. We have now provided a new image of the experimental setup (Figure 1-figure supplement 1A) showing a single GC, the distribution of SuM fibers in the GC layer, and the illumination area at each location. Because SuM inputs make synapses onto the GC soma and onto dendrites close to the cell body, SuM-GC synapses on a recorded GC are confined to a very limited area. This characteristic synaptic localization allowed us to control the illumination area without illuminating the SuM terminals on the recorded GC. The delayed onsets of EPSCs/IPSCs evoked by stimulation over the axons (Figure 1-figure supplement 1C, D) also support that the SuM terminals on the recorded GCs were outside the illumination area.

      Additionally, the authors should clarify the discrepancy between the antibody mentioned in the list of primary antibodies, which recognizes the gamma2 subunit of the GABAA receptor, and the alpha1 subunit of the GABAA receptor mentioned in the results and Figure 4. 

      We apologize for this mistake. As described in the main text and figure, we used an antibody against the α1 subunit of the GABAA receptor. Table S1 has been corrected in the revised version of the paper.

      Reviewer #3:

      (1) In Figure 1, the authors used two [Ca2+]o concentrations to study the EPSC and IPSC amplitudes. How does the Ca2+ concentration affect the PPR in the EPSC and IPSC, respectively? 

      Given that lowering the extracellular Ca2+ concentration reduces release probability, 1 mM extracellular Ca2+ is expected to increase the PPR compared with 2.5 mM. Indeed, we observed that lowering the extracellular Ca2+ concentration increased the relative amplitudes of the 2nd to 10th responses (both EPSCs and IPSCs) during train stimulation (Figure 5).
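      Why a lower release probability raises the PPR can be illustrated with a textbook vesicle-depletion model. This is a hypothetical sketch, not the authors' analysis: it assumes each pulse releases a fixed fraction p of a readily releasable pool, that the pool refills by a fixed fraction of its deficit between pulses (`refill`), and it ignores facilitation and postsynaptic factors. With no refilling, the model's paired-pulse ratio is simply 1 − p, so reducing p (as lowering [Ca2+]o does) increases both the PPR and the relative size of later responses in a train.

```python
def train_responses(p_release, n_pulses, pool_size=100.0, refill=0.0):
    """Responses to a stimulus train in a simple depletion model.

    p_release: fraction of the ready pool released per pulse (0 < p <= 1).
    refill:    fraction of the pool deficit recovered between pulses (0..1).
    """
    pool = pool_size
    responses = []
    for _ in range(n_pulses):
        released = p_release * pool
        responses.append(released)
        pool -= released
        pool += refill * (pool_size - pool)  # partial recovery before the next pulse
    return responses


def normalized(responses):
    """Normalize each response in the train to the first one."""
    return [r / responses[0] for r in responses]


for p in (0.5, 0.2):  # hypothetical "high Ca2+" vs "low Ca2+" release probabilities
    train = train_responses(p, n_pulses=10, refill=0.2)
    ppr = train[1] / train[0]
    print(f"p = {p}: PPR = {ppr:.2f}, 10th/1st = {normalized(train)[-1]:.2f}")
```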

      (2) In Figure 2D, does baclofen also have a dose-dependent effect on the inhibition of the EPSC and IPSC similar to the DCG-IV in Figure 2C? 

      Thank you for your question. Because we aimed to demonstrate the differential inhibitory effects of baclofen at a single concentration on glutamatergic and GABAergic co-transmission, we did not examine dose dependence in detail. In response to the reviewer’s comment, we tested the effects of a higher concentration of baclofen on EPSCs and IPSCs. As shown in the figure below, 50 µM baclofen inhibited EPSCs and IPSCs to a similar extent. Therefore, by comparing the inhibitory effects of two concentrations of baclofen (5 and 50 µM), we believe that baclofen, like DCG-IV, has a dose-dependent inhibitory effect on both EPSCs and IPSCs.

      Author response image 1.

      (3) In Figure 2E, statistical labels, such as "*" or "n.s." (not significant), should be provided on the plots to facilitate the reading of figures. 

      In response to the reviewer’s comment, we have provided statistical labels in the Figure 2E.

      (4) In Figure 3A, the latency of the evoked EPSC for the lower light stimulation groups seems to be much slower than the one shown on the left or other figures in the paper, such as Figure 1F.

      Please double-check if the blue light stimulation label is placed in the right location. 

      Corrected, thanks.

      (5) The use of minimal light stimulation in optogenetic experiments is not appropriately justified or described. More detailed information should be provided, such as whether the optogenetic stimulation is performed on the axon or the terminals of the SuM. 

      We appreciate the reviewer’s suggestion. To effectively detect stochastic synaptic responses, light stimulation was applied to SuM terminals. We now state this in the revised manuscript (line 212) and further justify the use of minimal light stimulation (lines 207-209). 

      References

      Fukaya R, Hirai H, Sakamoto H, Hashimotodani Y, Hirose K, Sakaba T (2023) Increased vesicle fusion competence underlies long-term potentiation at hippocampal mossy fiber synapses. Sci Adv 9:eadd3616.

      Hashimotodani Y, Karube F, Yanagawa Y, Fujiyama F, Kano M (2018) Supramammillary Nucleus Afferents to the Dentate Gyrus Co-release Glutamate and GABA and Potentiate Granule Cell Output. Cell Rep 25:2704-2715 e2704.

      Hirai H, Sakaba T, Hashimotodani Y (2022) Subcortical glutamatergic inputs exhibit a Hebbian form of long-term potentiation in the dentate gyrus. Cell Rep 41:111871.

      Hori T, Takamori S (2021) Physiological Perspectives on Molecular Mechanisms and Regulation of Vesicular Glutamate Transport: Lessons From Calyx of Held Synapses. Front Cell Neurosci 15:811892.

      Ntamati NR, Luscher C (2016) VTA Projection Neurons Releasing GABA and Glutamate in the Dentate Gyrus. eNeuro 3.

      Root DH, Zhang S, Barker DJ, Miranda-Barrientos J, Liu B, Wang HL, Morales M (2018) Selective Brain Distribution and Distinctive Synaptic Architecture of Dual Glutamatergic-GABAergic Neurons. Cell Rep 23:3465-3479.

      Tabuchi E, Sakaba T, Hashimotodani Y (2022) Excitatory selective LTP of supra-mammillary glutamatergic/GABAergic co-transmission potentiates dentate granule cell firing. Proc Natl Acad Sci U S A 119:e2119636119.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Goetz et al. takes a new perspective on sensory information processing in cells. In contrast to previous studies, which have used population data to build a response distribution and which estimate sensory information at about 1 bit, this work defines sensory information at the single cell level. To do so, the authors take two approaches. First, they estimate single cells' response distributions to various input levels from time-series data directly. Second, they infer these single-cell response distributions from the population data by assuming a biochemical model and extracting the cells' parameters with a maximum-entropy approach. In either case, they find, for two experimental examples, that single-cell sensory information is much higher than 1 bit, and that the reduction to 1 bit at the population level is due to the fact that cells' response functions are so different from each other. Finally, the authors identify examples of measurable cell properties that do or do not correlate with single-cell sensory information.

      The work brings an important and distinct new insight to a research direction that generated strong interest about a decade ago: measuring sensory information in cells and understanding why it is so low. The manuscript is clear, the results are compelling, and the conclusions are well supported by the findings. Several contributions should be of interest to the quantitative biology community (e.g., the demonstration that single cells' sensory information is considerably larger than previously implied, and the approach of inferring single-cell data from population data with the help of a model and a maximum-entropy assumption).

      We thank the reviewer for the excellent summary of our research.

      Reviewer #2 (Public Review):

      In this paper the authors present an existing information theoretic framework to assess the ability of single cells to encode external signals sensed through membrane receptors.

      The main point is to distinguish actual noise in the signaling pathway from cell-cell variability, which could be due to differences in their phenotypic state, and to formalize this difference using information theory.

      After correcting for this cellular variability, the authors find that cells may encode more information than one would estimate from ignoring it, which is expected. The authors show this using simple models of different complexities, and also by analyzing an imaging dataset of the IGF/FoxO pathway.
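      Why this gain is expected can be made explicit with a standard information-theoretic argument. This is our own sketch, not a derivation reproduced from the manuscript; it assumes that the input U (ligand level) is chosen independently of the cell state Θ, that X is the single-cell response, that ⟨I_Cee⟩ denotes the population average of single-cell channel capacities, and that I_CSA is the cell-state agnostic capacity.

```latex
\begin{align}
I(U;X,\Theta) &= I(U;\Theta) + I(U;X \mid \Theta) = I(U;X \mid \Theta)
  && \text{since } U \perp \Theta, \\
I(U;X,\Theta) &= I(U;X) + I(U;\Theta \mid X) \ge I(U;X)
  && \text{since } I(U;\Theta \mid X) \ge 0, \\
\langle I_{Cee} \rangle
  = \mathbb{E}_{\theta}\Big[\max_{p(u)} I(U;X \mid \Theta=\theta)\Big]
  &\ge \max_{p(u)} \mathbb{E}_{\theta}\big[I(U;X \mid \Theta=\theta)\big]
  = \max_{p(u)} I(U;X \mid \Theta)
  \ge \max_{p(u)} I(U;X) = I_{CSA}.
\end{align}
```

      The first inequality in the last line holds because an average of per-cell maxima cannot be smaller than the maximum of the average; the second follows from the two identities above. Equality requires I(U;Θ | X) = 0, that is, knowing the cell state adds nothing about the input once the response is observed.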

      The implications of the work are limited because the analysed data is not rich enough to draw clear conclusions. Specifically,

      • the authors do not distinguish what could be methodological noise inherent to microscopy techniques (segmentation etc), and actual intrinsic cell state. It's not clear that cell-cell variability in the analyzed dataset is not just a constant offset or normalization factor. Other authors (e.g. Gregor et al Cell 130, 153-164) have re-centered and re-normalized their data before further analysis, which is more or less equivalent to the idea of the conditional information in the sense that it aims to correct for this experimental noise.

      We thank the reviewer for the comment. However, we do not believe that our results are a consequence of normalization artifacts. Prior to modeling the single-cell data, we removed well-dependent background fluorescence, which should account for technical variation related to overall offsets in the data. We agree with the reviewer that background subtraction may not fully account for technical variability; for example, some of the cell-to-cell variability may be attributable to issues such as incorrect segmentation. Unfortunately, attempting to remove this technical variability through cell-specific normalization, as suggested by the reviewer [1], would also largely remove the true biological effects related to extensivity (cell size, total protein abundance). We note that these effects are a direct function of cell-state variables (see, for example, Cohen-Saidon et al. [2], who use cell-state-specific normalization to improve signaling fidelity). Therefore, an increase in mutual information after normalization reflects not only the removal of technical noise but also the removal of the effects of cell-state variables.

      Nonetheless, as the reviewer suggested, we performed a cell-specific normalization in which the mean nuclear FoxO level of each cell in the absence of IGF was normalized to one. Then, for each ligand concentration, we pooled FoxO responses across all cells and computed the channel capacity corresponding to the cell-state agnostic mutual information, I_CSA. As expected, I_CSA increases from ∼0.9 bits to ∼1.3 bits with cell-specific normalization (Author response image 1). However, this value remains significantly lower than the average cell-state specific mutual information, ⟨I_Cee⟩ ≈ 1.95 bits. Finally, we note that cell-specific normalization does not change the channel capacity calculated at the single-cell level, because these calculations do not depend on linear transformations of the data (centering and normalization). Therefore, we do not think that our analysis of the experimental data suffers from artifacts related to microscopy.
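      A minimal sketch of the cell-specific normalization described here, assuming each cell's responses are stored in a dict keyed by a hypothetical cell ID and divided by that cell's own no-IGF baseline. It illustrates why the operation erases purely multiplicative (extensive) differences between cells, such as one cell containing twice as much FoxO, which is the kind of biological cell-state effect the reply argues should not be treated as noise.

```python
def normalize_to_baseline(responses, baselines):
    """Divide every response of a cell by that cell's own pre-stimulus (no-IGF) level.

    responses: dict mapping a (hypothetical) cell ID to a list of nuclear FoxO
               readings, one per ligand concentration.
    baselines: dict mapping the same cell IDs to the mean no-IGF FoxO level.
    """
    return {cell: [r / baselines[cell] for r in values] for cell, values in responses.items()}


# Two hypothetical cells whose responses differ only by a 2-fold gain
# (e.g., one cell contains twice as much total FoxO).
responses = {"cell_1": [100.0, 80.0, 60.0], "cell_2": [200.0, 160.0, 120.0]}
baselines = {"cell_1": 100.0, "cell_2": 200.0}
normalized_responses = normalize_to_baseline(responses, baselines)
print(normalized_responses)
```

      After normalization the two cells become indistinguishable, so the pooled response distributions narrow and the pooled mutual information rises, but the 2-fold gain, a genuine difference in cell state, is no longer visible.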

      Author response image 1.

      Author response image 1. Left: nuclear FoxO response averaged over all cells in the population across different ligand concentration. Right: nuclear FoxO response was first normalized at the single cell level and then averaged over all cells in the population across different ligand concentrations.
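      To make the distinction between a pooled (cell-state agnostic) capacity and per-cell capacities concrete, the sketch below computes channel capacities with the Blahut-Arimoto algorithm, a standard iterative method for discrete channels. It is not the authors' pipeline: it assumes responses have already been discretized into bins, and the two "cells" are hypothetical noiseless channels whose responses differ only by an offset. Each cell alone distinguishes all four inputs (2 bits), whereas the pooled channel, in which cell identity is unknown, loses information because the two cells' response ranges overlap.

```python
import math


def blahut_arimoto(W, tol=1e-9, max_iter=10_000):
    """Capacity (bits) of a discrete channel with W[u][x] = p(x | u).

    Returns (capacity_bits, optimal_input_distribution).
    """
    n_in, n_out = len(W), len(W[0])
    p = [1.0 / n_in] * n_in  # start from a uniform input distribution
    lower = 0.0
    for _ in range(max_iter):
        # Output marginal q(x) = sum_u p(u) p(x | u).
        q = [sum(p[u] * W[u][x] for u in range(n_in)) for x in range(n_out)]
        # D[u] = KL divergence (nats) between p(x | u) and q(x).
        D = [sum(W[u][x] * math.log(W[u][x] / q[x]) for x in range(n_out) if W[u][x] > 0)
             for u in range(n_in)]
        weights = [p[u] * math.exp(D[u]) for u in range(n_in)]
        z = sum(weights)
        lower, upper = math.log(z), max(D)  # capacity lies in [lower, upper]
        p = [w / z for w in weights]        # multiplicative input update
        if upper - lower < tol:
            break
    return lower / math.log(2), p


def mixture_channel(channels, weights):
    """Cell-state agnostic channel: p(x | u) = sum_s w_s * p(x | u, s)."""
    n_in, n_out = len(channels[0]), len(channels[0][0])
    return [[sum(w * ch[u][x] for ch, w in zip(channels, weights)) for x in range(n_out)]
            for u in range(n_in)]


# Two hypothetical cell states: 4 ligand levels, 6 response bins.
# Cell A reports level u in bin u; cell B reports it in bin u + 2 (an offset).
n_inputs, n_bins = 4, 6
cell_a = [[1.0 if x == u else 0.0 for x in range(n_bins)] for u in range(n_inputs)]
cell_b = [[1.0 if x == u + 2 else 0.0 for x in range(n_bins)] for u in range(n_inputs)]

cap_a, _ = blahut_arimoto(cell_a)
cap_b, _ = blahut_arimoto(cell_b)
cap_pooled, _ = blahut_arimoto(mixture_channel([cell_a, cell_b], [0.5, 0.5]))
avg_cell_specific = 0.5 * cap_a + 0.5 * cap_b
print(f"average per-cell capacity: {avg_cell_specific:.3f} bits")
print(f"pooled (cell-state agnostic) capacity: {cap_pooled:.3f} bits")
```

      For these toy channels the pooled capacity is 1.5 bits, below the 2-bit average of the per-cell capacities. This mirrors, qualitatively only, the gap between I_CSA and ⟨I_Cee⟩ reported for the FoxO data.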

      • in the experiment, each condition is shown only once and sequentially. This means that the reproducibility of the response upon repeated exposures in a single cell was not tested, casting doubt on the estimate of the response fidelity (estimated as the variance over time in a single response).

      The reviewer raises an excellent question about the persistence of cell states. To verify that cell states are indeed conserved on the time scale of the experiment, we reanalyzed data generated by Gross et al. [3], in which cells were stimulated with IGF (37.5 pM), followed by a washout that allowed the cells to return to pre-stimulation nuclear FoxO levels, and then re-stimulated with the same amount of IGF. Nuclear FoxO responses were measured at the single-cell level after 90 minutes of IGF exposure in both rounds. Since the response x to the same input u was measured twice in the same cell (x_1 and x_2), we could evaluate the intrinsic variability in response at the single-cell level. We then compared this intrinsic variability to the extrinsic, cell-state dependent variability in the population.

      To do so, we computed for each cell the difference between the two responses, δ = x_1 − x_2. Author response image 2 shows the histogram p(δ) computed from the data (pink) and from the model trained on the single-cell data (blue). We also computed p(δ_0), the distribution of differences between the responses of two different cells, from both the data and the model.

      As Author response image 2 shows, the distribution p(δ) is significantly narrower than p(δ_0), indicating that intracellular variability is much smaller than across-population variability and that a cell’s responses to the same stimulus are well conserved, especially compared with the responses of randomly picked pairs of cells. This shows that cell states, and the corresponding responses to extracellular perturbations, are conserved at least on the time scale of the experiment. Therefore, our estimates of cell-to-cell variability in signaling fidelity are stable and reliable. We have now incorporated this discussion in the manuscript (lines 275-281).
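      The comparison of p(δ) with p(δ_0) can be sketched on synthetic data. This is an illustration only: each hypothetical cell has its own mean response (cell-to-cell variability) plus small independent noise on each of two identical stimulations, and the spreads of the within-cell differences δ and the between-cell differences δ_0 are compared via their standard deviations. The Gaussian parameters are assumptions chosen for illustration, not values from Gross et al.

```python
import random
import statistics


def within_and_between_differences(x1, x2, n_pairs, seed=0):
    """Return (delta, delta0).

    delta[i]  = x1[i] - x2[i]: same cell, two identical stimulations.
    delta0[k] = x1[i] - x1[j]: two distinct, randomly picked cells.
    """
    rng = random.Random(seed)
    delta = [a - b for a, b in zip(x1, x2)]
    delta0 = []
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(x1)), 2)
        delta0.append(x1[i] - x1[j])
    return delta, delta0


# Synthetic example: large cell-to-cell spread (SD 20), small intrinsic noise (SD 3).
rng = random.Random(42)
cell_means = [rng.gauss(100.0, 20.0) for _ in range(1000)]
x1 = [m + rng.gauss(0.0, 3.0) for m in cell_means]
x2 = [m + rng.gauss(0.0, 3.0) for m in cell_means]
delta, delta0 = within_and_between_differences(x1, x2, n_pairs=1000)
print("SD of delta :", round(statistics.stdev(delta), 2))
print("SD of delta0:", round(statistics.stdev(delta0), 2))
```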

      Author response image 2.

      Author response image 2. Left: Cells were treated with 37.5 pM of IGF for 90 minutes, washed out for 120 minutes and again treated with 37.5 pM of IGF. Nuclear FoxO was measured during the treatment and the washout. The distributions on the left show the difference in FoxO levels in single cells after the two 90 minutes IGF stimulations (pink: data, blue: model). Right: Distribution of difference in FoxO levels in two randomly picked cells after 90 minutes of exposure to 37.5 pM IGF.

      • another dataset on the EGF/EGFR pathway is analyzed, but no conclusion can be drawn from it because single-cell information cannot be directly estimated from it. The authors instead use a maximum-entropy Ansatz, which cannot be validated for lack of data.

      We thank the reviewer for this comment. We agree with the reviewer that we have not verified our predictions for the EGF/EGFR pathway. That study was meant to show the potential generality of our analysis. We look forward to validating our predictions for the EGF/EGFR pathway in future studies.

      Reviewer #3 (Public Review):

      Goetz, Akl and Dixit investigated the heterogeneity in the fidelity of sensing the environment by individual cells in a population using computational modeling and analysis of experimental data for two important and well-studied mammalian signaling pathways: the insulin-like growth factor (IGF)/FoxO and epidermal growth factor (EGF)/EGFR pathways. They quantified this heterogeneity using the conditional mutual information between the input (e.g., level of IGF) and output (e.g., level of FoxO in the nucleus), conditioned on the "state" variables which characterize the signaling pathway (such as abundances of key proteins, reaction rates, etc.). First, using a toy stochastic model of a receptor-ligand system, which constitutes the first step of both signaling pathways, they constructed the population average of the mutual information conditioned on the number of receptors and maximized over the input distribution and showed that it is always greater than or equal to the usual or "cell state agnostic" channel capacity. They constructed the probability distribution of cell state dependent mutual information for the two pathways, demonstrating agreement with experimental data in the case of the IGF/FoxO pathway using previously published data. Finally, for the IGF/FoxO pathway, they found the joint distribution of the cell state dependent mutual information and two experimentally accessible state variables: the response range of FoxO and total nuclear FoxO level prior to IGF stimulation. In both cases, the data approximately follow the contour lines of the joint distribution. Interestingly, high nuclear FoxO levels, and therefore lower associated noise in the number of output readout molecules, are not correlated with higher cell state dependent mutual information, contrary to what one might expect. 
This paper contributes to the vibrant body of work on information theoretic characterization of biochemical signaling pathways, using the distribution of cell state dependent mutual information as a metric to highlight the importance of heterogeneity in cell populations. The authors suggest that this metric can be used to infer "bottlenecks" in information transfer in signaling networks, where certain cell state variables have a lower joint distribution with the cell state dependent mutual information.

      The utility of a metric based on the conditional mutual information to quantify fidelity of sensing and its heterogeneity (distribution) in a cell population is supported in the comparison with data. Some aspects of the analysis and claims in the main body of the paper and SI need to be clarified and extended.

      1. The authors use their previously published (Ref. 32) maximum-entropy based method to extract the probability distribution of cell state variables, which is needed to construct their main result, namely p_CeeMI (I). The salient features of their method, and how it compares with other similar methods of parameter inference should be summarized in the section with this title. In SI 3.3, the Lagrangian, L, and Rm should be defined.

      We thank the reviewer for the comment and apologize for the omission. We have now revised the manuscript to cite previous reviews of methods that infer probability distributions of cell-state variables [4] (lines 156-168). Notably, as we argued in our previous work [5], no current method can efficiently estimate a joint distribution over parameters that is consistent with both measured single-cell data and models of signaling networks. Therefore, we could not use alternative approaches to infer parameter distributions. We have now expanded our discussion of the method in the Supplementary Information.

      2. Throughout the text, the authors refer to "low" and "high" values of the channel capacity. For example, a value of 1-1.5 bits is claimed to be "low". The authors need to clarify the context in which this value is low: In some physically realistic cases, the signaling network may need to simply distinguish between the presence or absence of a ligand, in which case this value would not be low.

      We agree with the reviewer that small channel capacities might be sufficient for cells to carry out some tasks, in which case a low channel capacity does not necessarily indicate that a network fails to perform its task. Indeed, how much information is needed for a specific task is a related but distinct question from how much information is transmitted through a signaling network. Both questions are essential to understanding a cell's signaling behavior, but the former is much harder to answer in a generalizable way. In contrast, the latter can be answered quantitatively using the analysis presented in our manuscript.

      3. Related to (2), the authors should comment on why in Fig. 3A, I_Cee = 3 bits. Importantly, where does the fact that the network is able to distinguish between 2^3 = 8 ligand levels come from? Is this related to the choice (and binning) of the input ligand distribution (described in the SI)?

      We thank the reviewer for the comment. The network can distinguish between all inputs used in the in silico experiment precisely because the noise at the cellular level is small enough that there is negligible overlap between single-cell response distributions. Indeed, as the number of equally spaced inputs increases, the mutual information grows sub-linearly (more slowly than log2 of the number of inputs), especially when the number of inputs is very high, because the single-cell response distributions begin to overlap.
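      To make the link between negligible response overlap and a value of log2(number of inputs) concrete, the sketch below computes the mutual information between equally weighted discrete inputs and a Gaussian single-cell response by numerical integration. It is an illustration under stated assumptions, not the authors' model: the Gaussian response, the function names, and the means and noise level are all hypothetical choices.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def mutual_information_bits(means, sigma, n_grid=20000):
    """I(U; X) in bits for equally weighted discrete inputs u_i whose
    (hypothetical) single-cell response is X | u_i ~ Normal(means[i], sigma**2).

    The integral over X is approximated with a midpoint Riemann sum.
    """
    p_u = 1.0 / len(means)
    lo = min(means) - 8.0 * sigma
    hi = max(means) + 8.0 * sigma
    dx = (hi - lo) / n_grid
    info = 0.0
    for k in range(n_grid):
        x = lo + (k + 0.5) * dx
        cond = [gaussian_pdf(x, mu, sigma) for mu in means]  # p(x | u_i)
        p_x = p_u * sum(cond)                                 # p(x)
        for c in cond:
            if c > 1e-300:  # skip negligible densities to avoid underflow in the ratio
                info += p_u * c * math.log2(c / p_x) * dx
    return info


if __name__ == "__main__":
    n_levels = 8
    # Well-separated responses: I approaches log2(8) = 3 bits.
    separated = [10.0 * i for i in range(n_levels)]
    print(mutual_information_bits(separated, sigma=1.0))
    # Same number of inputs with overlapping responses: I falls well below 3 bits.
    crowded = [1.0 * i for i in range(n_levels)]
    print(mutual_information_bits(crowded, sigma=1.0))
```

      With 8 inputs the information can never exceed 3 bits; it reaches that ceiling only when the response distributions barely overlap, which is the situation described above.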

      4. The authors should justify the choice of the gamma distribution in a number of cases (e.g., distribution of ligand, distribution of cell state parameters, such as number of receptors, receptor degradation rate, etc.).

      We thank the reviewer for the comment. We note that previous work on protein abundances and gene expression levels (e.g., see [6]) has reported positively skewed distributions that are well fit by gamma or log-normal distributions. Moreover, many stochastic models of protein abundance and signaling networks are known to produce abundances distributed according to a negative binomial distribution, which arises as a Poisson distribution with a gamma-distributed rate and is the discrete counterpart of the gamma distribution. Therefore, we chose gamma distributions in our study. We have now clarified this point in the Supplementary Information. At the same time, the gamma distribution serves only as a regularization for the finite data; in principle, our analysis and conclusions do not depend on the choice of gamma distributions for the abundances of proteins, ligands, and cell parameters.
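      The gamma/negative binomial connection can be checked numerically. In the sketch below, each cell's expression rate is drawn from a gamma distribution and the count given that rate is Poisson; the resulting counts should match the negative binomial moments, mean = kθ and variance = kθ(1 + θ). The shape k = 4 and scale θ = 5 are hypothetical values, and the sampler is a generic illustration rather than any model from the manuscript.

```python
import math
import random
import statistics


def poisson_sample(lam, rng):
    """One Poisson(lam) draw via Knuth's multiplication method (adequate for moderate lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def gamma_poisson_counts(shape, scale, n_cells, seed=0):
    """Counts per cell when each cell's rate is Gamma(shape, scale) and the
    count given that rate is Poisson: a negative binomial mixture."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_cells):
        rate = rng.gammavariate(shape, scale)  # cell-to-cell variability in the rate
        counts.append(poisson_sample(rate, rng))
    return counts


if __name__ == "__main__":
    shape, scale = 4.0, 5.0  # hypothetical values
    counts = gamma_poisson_counts(shape, scale, n_cells=20000, seed=1)
    # Sample moments vs. negative binomial theory: mean = k*theta, var = k*theta*(1 + theta)
    print(statistics.fmean(counts), shape * scale)
    print(statistics.variance(counts), shape * scale * (1 + scale))
```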

      5. Referring to SI Section 2, it is stated that the probability of the response (receptor binding occupancy) conditioned on the input ligand concentration and number of receptors is a Poisson distribution. Indeed this is nicely demonstrated in Fig. S2. Therefore it is the coefficient of variation (std/mean) that decreases with increasing R0, not the noise (which is strictly the standard deviation) as stated in the paper.

      We thank the reviewer for the comment. We have now corrected our text.
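      A short numerical check of this point: for a Poisson count with mean m, the standard deviation is √m (it grows with R0) while the coefficient of variation is 1/√m (it shrinks). The sketch assumes, purely for illustration, that the mean occupancy is a fixed fraction of R0; the bound fraction of 0.5 is hypothetical.

```python
import math


def poisson_noise(mean_count):
    """Standard deviation and coefficient of variation of a Poisson count."""
    sd = math.sqrt(mean_count)
    return sd, sd / mean_count


if __name__ == "__main__":
    bound_fraction = 0.5  # hypothetical fraction of receptors occupied at a fixed ligand dose
    for r0 in (100, 1_000, 10_000):
        sd, cv = poisson_noise(bound_fraction * r0)
        print(f"R0={r0:>6}  mean={bound_fraction * r0:>7.1f}  sd={sd:6.2f}  cv={cv:.4f}")
```

      Here the standard deviation rises (about 7.1, 22.4, 70.7) while the CV falls (about 0.14, 0.045, 0.014), which is why the corrected wording refers to the CV rather than the noise.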

      6. In addition to explicitly stating what the input (IGF level) and the output (nuclear GFP-tagged FoxO level) are, it would be helpful if it is also stated what is the vector of state variables, theta, corresponding to the schematic diagram in Fig. 2C.

      We thank the reviewer for the comment. We have now corrected our text in the supplementary material as well as the main text (Figure 2 caption).

      7. Related to Fig. 2C, the statement in the caption: "Phosphorylated Akt leads to phosphorylation of FoxO which effectively shuttles it out of the nucleus." needs clarification: From the figure, it appears that pFoxO does not cross the nuclear membrane, in which case it would be less confusing to say that phosphorylation prevents reentry of FoxO into the nucleus.

      We thank the reviewer for the comment. We have now corrected our text (Figure 2 caption).

      8. The explanations for Fig. 2D, E and insets are sparse and therefore not clear. The authors should expand on what is meant by model and experimental I(theta). What is CC input dose? Also in Fig. 2E, the overlap between the blue and pink histograms means that the value of the blue histogram for the final bin - and therefore agreement or lack thereof with the experimental result - is not visible. Also, the significance of the values 3.25 bits and 3 bits in these plots should be discussed in connection with the input distributions.

      We thank the reviewer for the comment. We have now corrected our text (Figure 2 caption and lines 249-251).

      9. While the joint distribution of the cell state dependent mutual information and various biochemical parameters is given in Fig. S7, there is no explanation of what these results mean, either in the SI or main text. Related to this, while a central claim of the work is that establishing this joint distribution will allow determination of cell state variables that differentiate between high and low fidelity sensing, this claim would be stronger with more discussion of Figs. 3 and S7. The related central claim that cell state dependent mutual information leads to higher fidelity sensing at the population level would be made stronger if it can be demonstrated that in the limit of rapidly varying cell state variables, the I_CSA is retrieved.

      We thank the reviewer for this excellent comment. We have now added more discussion about interpreting the correlation between cell state variables and cell-state specific mutual information (lines 294-306). We also appreciate the suggestion about a toy model calculation to show that dynamics of cell state variables affects cell state specific mutual information. We have now performed a simple calculation to show how dynamics of cell state variables affects cells’ sensing ability (lines 325-363). Specifically, we constructed a model of a receptor binding to the ligand wherein the receptor levels themselves changed over time through a slow process of gene expression (Author response image 3, main text Figure 4). In this model, the timescales of fluctuations of ligand-free receptors on the cell surface can be tuned by speeding up/slowing down the degradation rate of the corresponding mRNA while keeping the total amount of steady state mRNA constant. As shown in Author response image 3, the dependence of cell-specific mutual information on cell state variable diminishes when the time scale of change of cell state variables is fast.


      Author response image 3. Cell state dynamics governs cell state conditioned mutual information. A. In a simple stochastic model, receptor mRNA is produced at a constant rate from the DNA and then translated into ligand-free receptors. The number of ligand-bound receptors after a short exposure to ligands is considered the output. B. A schematic showing the dynamics of receptor numbers when mRNA dynamics are slow compared with signaling time scales. C. Conditioning on receptor numbers leads to differing abilities to sense the environment when the time scale τ of mRNA dynamics is long. In contrast, when the mRNA dynamics are fast (large τ⁻¹), conditioning on cell state variables does not lead to differences in sensing ability.
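      The timescale argument can be illustrated with a much simpler hypothetical sketch than the authors' model: only the mRNA birth-death step is simulated (Gillespie algorithm), the steady-state mean k/γ is held fixed while the degradation rate γ is varied, and we measure how strongly each cell's mRNA number at the start of a signaling window predicts its number at the end. For this immigration-death process the stationary correlation at lag t is exp(-γt). All rates and names here are illustrative assumptions.

```python
import math
import random


def poisson_sample(lam, rng):
    """One Poisson(lam) draw via Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_birth_death(k_prod, gamma, m0, t_end, rng):
    """Gillespie simulation of 0 -> mRNA (rate k_prod) and mRNA -> 0 (rate gamma * m).
    Returns the copy number at time t_end."""
    t, m = 0.0, m0
    while True:
        total = k_prod + gamma * m
        t += rng.expovariate(total)
        if t > t_end:
            return m
        if rng.random() * total < k_prod:
            m += 1
        else:
            m -= 1


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def state_memory(gamma, mean_mrna=20.0, t_signal=1.0, n_cells=2000, seed=0):
    """Correlation between each cell's mRNA number at the start and the end of a
    signaling window. k_prod / gamma is held fixed, so only the timescale 1/gamma changes."""
    rng = random.Random(seed)
    k_prod = gamma * mean_mrna
    start, end = [], []
    for _ in range(n_cells):
        m0 = poisson_sample(mean_mrna, rng)  # stationary distribution is Poisson(k_prod / gamma)
        start.append(m0)
        end.append(simulate_birth_death(k_prod, gamma, m0, t_signal, rng))
    return pearson(start, end)


if __name__ == "__main__":
    for gamma in (0.01, 10.0):  # slow vs. fast mRNA turnover (hypothetical)
        print(gamma, state_memory(gamma), math.exp(-gamma * 1.0))
```

      When turnover is slow, a cell's state is effectively frozen over the signaling window, so conditioning on it is informative; when turnover is fast, the starting state says almost nothing about the state during signaling.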

      Reviewer #1 (Recommendations For The Authors):

      My major concerns are mainly conceptual, as described below. With proper attention to these concerns, I feel that this manuscript could be a good candidate for the eLife community.

      Major concerns:

      1. The manuscript convincingly demonstrates that cells are good sensors after all, and that heterogeneity makes their input-output functions different from each other. This raises the question of what happens downstream of sensing. For single-celled organisms, where it may be natural to define behavioral consequences at the single-cell level, it may very well be relevant that single-cell information is high, even if cells respond differently to the environment. But for cells in multicellular organisms, like those studied here, I imagine that most behavioral consequences of sensing occur at the multicellular level. Thus, many cells' responses are combined into a larger response. Because their responses are different, their high-information individual responses may combine into a low-information collective response. In fact, one could argue that a decent indicator of the fidelity of this collective response is indeed the population-level information measure estimated in previous works. Thus, a fundamental question that the authors must address is: what is the ultimate utility of reliable, but heterogeneous, responses for a multicellular system? This question has an important bearing for the relevance of their findings.

      We thank the reviewer for this thought-provoking comment. We agree that the fidelity with which cells sense their environment, especially cells in multicellular organisms, may not always need to be very high. We speculate that when the biological function of a collection of cells can be expressed as an average over the responses of individual cells, high-information but heterogeneous cells can be considered equivalent to low-information homogeneous cells. Examples of such functions are population differentiation to maintain the relative proportions of different cell types in a tissue and the production of a certain amount of extracellular enzyme.

      In contrast, we believe that when the biological function involves collective action, spatial patterning, or temporal memory, the difference between reliable but heterogeneous population and unreliable homogeneous population will become significant. We plan to explore this topic in future studies.

      2. The authors demonstrate that the agreement is good between their inference approach and the direct estimation of response distributions from single-cell time series data. In fact, the agreement is so good that it raises the question of why one would need the inference approach at all. Is it because single-cell time series data is not always available? Is that why the authors used it for one example and not the other? The validation is an asset, but I imagine that the inference approach is complicated and may make assumptions that are not always true. Thus, its utility and appropriate use must be clarified.

      We thank the reviewer for the comment. As the reviewer correctly pointed out, live cell imaging data is not always available and has limited scope. Specifically, optical resolution limits measurements of multiple targets. Moreover, typical live cell measurements capture total abundance or localization and not post-translational modifications (phosphorylation, methylation, etc.), which are crucial to signaling dynamics. The most readily available single-cell data, such as those measured using single-cell RNA sequencing, immunofluorescence, or flow cytometry, are necessarily snapshots. Therefore, computational models that can connect underlying signaling networks to snapshot data become essential when imputing single-cell trajectories. In addition, the modeling also allows us to identify network parameters that correlate most strongly with cellular heterogeneity. We have now clarified this point in the manuscript (lines 366-380).

      Minor comments:

      1. I would point out that the maximum values in the single-cell mutual information distributions (Fig 2D and E) correspond to log2 of the number of input levels, corresponding to perfect distinguishability of each of the equally-weighted input states. It is clear that many of the mutual information values cluster toward this maximum, and it would help readers to point out why.

      We thank the reviewer for the comment. We have now included a discussion about the skew in the distribution in the text (lines 251-260).

      2. Line 216 references Fig 2C for the EGF/EGFR pathway, but Fig 2C shows the FoxO pathway. In fact, I did not see a schematic of the EGF/EGFR pathway. It may be helpful to include one, and for completeness perhaps also one for the toy model, and organize the figures accordingly.

      We thank the reviewer for the comment. We did not include three separate schematics because the schematics of the EGF/EGFR model and the toy model are subsets of the schematic of the IGF/FoxO model. We have now clarified this point in the manuscript (Figure 2 caption).

      Reviewer #2 (Recommendations For The Authors):

      • the simple model of Fig. 2A would gain from a small cartoon explaining the model and its parameters.

      We thank the reviewer for the comment. We did not include a schematic for the toy model as it is a subset of the schematic of the IGF/FoxO model. The schematic of the toy model is included in the supplementary information.

      • L should be called u, and B should be called x, to be consistent with the rest of the notations in the paper.

      We have decided to keep the notation originally presented in the manuscript.

      • legend of 2E and D should be clarified. "CC input dose" is cryptic. The x axis is the input dose, the y axis is its distribution at the argmax of I. CC is the max of I, not its argmax. Likewise "I" in the legend for the colors should not be used to describe the insets, which are input distributions.

      We have now changed this in the manuscript.
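      The distinction the reviewer draws (CC is the maximum of I over input distributions, while the inset shows the input distribution at the argmax) can be illustrated with the Blahut-Arimoto algorithm for a discrete channel. This is a generic sketch, not necessarily the optimization the authors used; the channel matrices below are textbook examples, not the IGF/FoxO model.

```python
import math


def channel_capacity(W, tol=1e-10, max_iter=10_000):
    """Blahut-Arimoto iteration for a discrete channel with W[x][y] = p(y | x).

    Returns (capacity in bits, capacity-achieving input distribution).
    The capacity is the max of I over p(x); the returned p is the argmax.
    """
    n_in, n_out = len(W), len(W[0])
    p = [1.0 / n_in] * n_in
    lower = 0.0
    for _ in range(max_iter):
        q = [sum(p[x] * W[x][y] for x in range(n_in)) for y in range(n_out)]  # output distribution
        # d[x] = KL( p(y|x) || q(y) ) in nats
        d = [sum(W[x][y] * math.log(W[x][y] / q[y]) for y in range(n_out) if W[x][y] > 0)
             for x in range(n_in)]
        weights = [p[x] * math.exp(d[x]) for x in range(n_in)]
        z = sum(weights)
        lower, upper = math.log(z), max(d)  # bounds bracketing the capacity
        p = [w / z for w in weights]
        if upper - lower < tol:
            break
    return lower / math.log(2), p


if __name__ == "__main__":
    eps = 0.1
    bsc = [[1 - eps, eps], [eps, 1 - eps]]  # binary symmetric channel
    cap, p_opt = channel_capacity(bsc)
    print(cap, p_opt)  # about 0.531 bits with a uniform input
```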

      • the data analysis of the IGF/FoxO pathway should be explained in the main text, not the SI. Otherwise it's impossible to understand how one arrives at, or how to interpret, figure 2E, which is central to the paper. For instance the fact that p(x|u,theta) is assumed to be Gaussian, and how the variance and mean are estimated from the actual data is very important to understand the significance of the results.

      While we have added more details in the manuscript in various places, for the sake of brevity and clarity, we have decided to keep the details of the calculations in the supplementary materials.

      • there's no Methods section. Most of the paper's theoretical work is hidden in the SI, while it should be described in the methods.

      We thank the reviewer for the comment. However, we believe that adding a Methods section would break the narrative of the paper. The methods are described in the supplementary materials in sufficient detail to reproduce our results. Additionally, we provide a link to the GitHub page that contains all scripts related to the manuscript.

      PS: please submit a PDF of the SI for review, so that people can read it on any platform (as opposed to a word document, especially with equations)

      We have now done this.

      Reviewer #3 (Recommendations For The Authors):

      1. Subplots in Fig. 1, inset in Fig. 3 are not legible due to small font.

      We have now increased the font.

      2. Mean absolute error in Fig. S5 and relative error in related text should be clarified.

      We have now clarified this in the manuscript.

      3. Acronyms (MACO, MERIDIAN) should be defined.

      We have now made these changes.

      References

      1. Gregor T, Tank DW, Wieschaus EF, Bialek W. Probing the limits to positional information. Cell. 2007;130(1):153-64. doi: 10.1016/j.cell.2007.05.025. PubMed PMID: WOS:000248587000018.

      2. Cohen-Saidon C, Cohen AA, Sigal A, Liron Y, Alon U. Dynamics and Variability of ERK2 Response to EGF in Individual Living Cells. Mol Cell. 2009;36(5):885-93. doi: 10.1016/j.molcel.2009.11.025. PubMed PMID: WOS:000272965400020.

      3. Gross SM, Dane MA, Bucher E, Heiser LM. Individual Cells Can Resolve Variations in Stimulus Intensity along the IGF-PI3K-AKT Signaling Axis. Cell Syst. 2019;9(6):580-8 e4.

      4. Loos C, Hasenauer J. Mathematical modeling of variability in intracellular signaling. Current Opinion in Systems Biology. 2019;16:17-24.

      5. Dixit PD, Lyashenko E, Niepel M, Vitkup D. Maximum Entropy Framework for Predictive Inference of Cell Population Heterogeneity and Responses in Signaling Networks. Cell Syst. 2020;10(2):204-12 e8.

      6. Taniguchi Y, Choi PJ, Li GW, Chen H, Babu M, Hearn J, Emili A, Xie XS. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science. 2010;329(5991):533-8. doi: 10.1126/science.1188308. PubMed PMID: 20671182; PMCID: PMC2922915.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors attempted to dissect the function of a long non-coding RNA, lnc-FANCI-2, in cervical cancer. They profiled lnc-FANCI-2 in different cell lines and tissues, generated knockout cell lines, and characterized the gene using multiple assays.

      Strengths:

      A large body of experimental data has been presented and can serve as a useful resource for the scientific community, including transcriptomics and proteomics datasets. The reported results also span different parts of the regulatory network and open up multiple avenues for future research.

      Thanks for your positive comments on the strengths.

      Weaknesses:

      The write-up is somewhat unfocused and lacks deep mechanistic insights in some places.

      Because lnc-FANCI-2 is a novel lncRNA that had never been explored in any functional study, our report is the first to find that it regulates RAS signaling. Thus, this report focuses on lnc-FANCI-2 and the RAS signaling pathway, but it also includes some important screening data that help readers understand how we arrived at RAS signaling.

      Reviewer #2 (Public review):

      The study by Liu et al provides a functional analysis of lnc-FANCI-2 in cervical carcinogenesis, building on their previous discovery of FANCI-2 being upregulated in cervical cancer by HPV E7.

      The authors conducted a comprehensive investigation by knocking out (KO) FANCI-2 in CaSki cells and assessing viral gene expression, cellular morphology, altered protein expression and secretion, altered RNA expression through RNA sequencing (verification of which by RT-PCR is well appreciated), protein binding, etc. Verification experiments by RT-PCR, Western blot, etc., are notable strengths of the study.

      The KO and KD were related to increased Ras signaling and EMT and reduced IFN-γ/α responses.

      Thanks for your positive comments. It took us several years to reach this understanding of lnc-FANCI-2 function.

      Although the large amount of data is well acknowledged, it is a limitation that most data come from CaSki cells, in which FANCI-2 localization is different from SiHa cells and cancer tissues (Figure 1). The cytoplasmic versus nuclear localization is somewhat puzzling.

      Regarding lnc-FANCI-2 localization, it can be both cytoplasmic and nuclear in cervical cancer tissues, HPV16- or HPV18-infected keratinocytes, and the HPV16+ cervical cancer cell line CaSki, which contains multiple integrated copies of HPV16 DNA. Surprisingly, however, it is detected mostly in the nucleus of HPV16+ SiHa cells, which contain only one copy of integrated HPV16 DNA (Yu, L., et al. mBio 15: e00729-24, 2024). Regardless of its localization, knockdown of lnc-FANCI-2 in SiHa cells induces RAS signaling, leading to increased expression of p-AKT and p-Erk1/2 (Suppl. Fig. S6B).

      Reviewer #3 (Public review):

      Summary:

      A long noncoding RNA, lnc-FANCI-2, was previously reported by this group to be regulated by the HPV E7 oncoprotein and a cellular transcription factor, YY1. The current study shows that the function of lnc-FANCI-2 in HPV16-positive cervical cancer is to intrinsically regulate RAS signaling, thereby furthering our understanding of additional cellular alterations during HPV oncogenesis. The authors used advanced technical approaches, such as KO, transcriptome, IRPCRP, and LC-MS/MS analyses, in the current study and concluded that KO of lnc-FANCI-2 significantly increases RAS signaling, especially phosphorylation of Akt and Erk1/2.

      Strengths:

      (1) HPV E6E7 are required for full immortalization and maintenance of the malignant phenotype of cervical cancer, but they are NOT sufficient for full transformation and tumorigenesis. This study helps further understanding of other cellular alterations in HPV oncogenesis.

      (2) lnc-FANCI-2 is upregulated in cervical lesion progression from CIN1, CIN2-3 to cervical cancer, cancer cell lines, and HPV transduced cell lines.

      (3) Viral E7 of high-risk HPVs and host transcription factor YY1 are two major factors promoting lnc-FANCI-2 expression.

      (4) Proteomic profiling of cytosolic and secreted proteins showed inhibition of MCAM, PODXL2, and ECM1 and increased levels of ADAM8 and TIMP2 in KO cells.

      (5) RNA-seq analyses revealed that KO cells exhibited significantly increased RAS signaling but decreased IFN pathways.

      (6) Increased phosphorylated Akt and Erk1/2, IGFBP3, MCAM, VIM, and CCND2 (cyclin D2) and decreased RAC3 were observed in KO cells.

      Thanks for your positive comments. It has taken us almost nine years to gradually understand lnc-FANCI-2 functions, which are more complex than we initially thought.

      Weaknesses:

      (1) Although the authors observed increased lnc-FANCI-2 in HPV16- and HPV18-transduced cells, as well as in other cervical cancer tissues, HPV18-positive HeLa cells exhibited different lnc-FANCI-2 expression.

      Both HPV16 and HPV18 infections induce lnc-FANCI-2 expression in keratinocytes (Liu H., et al. PNAS, 2021). However, HPV18+ cervical cancer cell lines HeLa and C4II cells (Figure S1A and S1B) do not express lnc-FANCI-2 as we see in HPV-negative cell lines such as HCT116, HEK293, HaCaT, and BCBL1 cells. Although we don’t know why, our preliminary data show that the lnc-FANCI-2 promoter functions well and is sensitive to YY1 binding in lnc-FANCI-2 expressing CaSki and C33A cells in our dual luciferase assays but is much less sensitive to YY1 binding in HeLa and HCT116 cells, indicating some unknown cellular factors negatively regulating lnc-FANCI-2 promoter activity.

      Author response image 1.

      A firefly luciferase (FLuc) reporter containing either the wild-type (−600 wt) or YY1-binding-site-mutated lnc-FANCI-2 promoter was evaluated in CaSki, HeLa, C33A, and HCT116 cells for its promoter activity, with Renilla luciferase (RLuc) activity driven by a TK promoter serving as an internal control. The two YY1-binding motifs (A and B), with an X marking each mutation, are illustrated in the diagram on the right.
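      For readers less familiar with dual-luciferase assays, the sketch below shows one common normalization: each well's FLuc signal is divided by its RLuc signal, and the ratios are expressed relative to the mean ratio of a reference construct (here assumed to be the −600 wt promoter) measured in the same cell line. The authors' exact normalization is not stated in this response; the function name and all readings below are hypothetical placeholders, not data from this study.

```python
from statistics import fmean, stdev


def relative_promoter_activity(fluc, rluc, ref_fluc, ref_rluc):
    """Mean and SD of FLuc/RLuc ratios, expressed relative to the mean ratio
    of a reference construct measured in the same cell line."""
    ref_mean = fmean([f / r for f, r in zip(ref_fluc, ref_rluc)])
    relative = [(f / r) / ref_mean for f, r in zip(fluc, rluc)]
    return fmean(relative), stdev(relative)


if __name__ == "__main__":
    # Placeholder triplicate readings (arbitrary units), NOT data from this study.
    wt_fluc, wt_rluc = [200.0, 220.0, 180.0], [100.0, 110.0, 90.0]
    mut_fluc, mut_rluc = [52.0, 60.0, 41.0], [100.0, 115.0, 82.0]
    print(relative_promoter_activity(mut_fluc, mut_rluc, wt_fluc, wt_rluc))
```

      Normalizing to RLuc within each well corrects for differences in transfection efficiency and cell number, which matters when comparing promoter activity across cell lines such as CaSki and HeLa.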

      (2) Previous studies and data in the current study showed a steady increase in lnc-FANCI-2 during cancer progression; however, the authors did not observe significant changes in cell behavior (both morphology and proliferation) in lnc-FANCI-2 KO cells.

      Thanks. We do see decreases in cell proliferation, colony formation, and cell migration, accompanied by increased cell senescence, in the lnc-FANCI-2 KO cells compared with the parental WT cells. These data are now added to the revised Fig. 1 and the revised supplemental Fig. S3.

      (3) The authors observed the significant changes of RAS signaling (downstream) in KO cells, but they provided limited interpretations of how these results contributed to full transformation or tumorigenesis in HPV-positive cancer.

      As stated in our title, lnc-FANCI-2 intrinsically restricts RAS signaling and phosphorylation of Akt and Erk in HPV16-infected cervical cancer. Presumably, high RAS-AKT-ERK signaling inhibits tumor cell survival by inducing senescence, as we show in our new Figure 1 and supplemental Fig. S3. A similar finding was reported in a lung cancer study (Patricia Nieto, et al. Nature 548: 239-243, 2017).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major comments:

      (1) A major issue is that parts of the manuscript read like a collection of experimental results. However, some of the results do not contribute directly to the central story. Besides confusing the reader, the large amount of apparently disparate results can raise more questions. For example:

      a) Why is lnc-FANCI-2 highly expressed in HPV16-infected cervical cancer cell lines (but not in HPV18-infected cells)?

      b) How do p53 and RB repress the expression of lnc-FANCI-2?

      c) What regulates the sub-cellular localization of lnc-FANCI-2?

      d) How does lnc-FANCI-2 negatively regulate RAS signalling?

      e) How does MAP4K4 bind to lnc-FANCI-2?

      f) Do lnc-FANCI-2 and MAP4K4 require each other to regulate RAS signalling?

      g) How does RAS signalling regulate the transcription of MCAM and IGFBP3?

      h) How does MCAM feedback on RAS? Do the different MCAM isoforms impact on RAS signalling differently?

      i) How does IGFBP3 feedback on ERK but not AKT?

      j) How do the other mentioned proteins like ADAM8 fit into the regulatory network?

      k) Each question will require a lot more work to address. I think it would be good if the authors could think through carefully what the key message(s) in the current manuscript should be and then present a more focused write-up.

      Thanks for the critical comments. Because this study is the first to explore lnc-FANCI-2 functions, we chose to be comprehensive. We believe these data are important for guiding future studies. We really appreciate the reviewer listing many questions (a to k) related to HPV infection, cell biology, RAS signaling, and cancer biology. Addressing each question satisfactorily would require a separate study, but fortunately, our report points in these directions with some preliminary data for future studies. Our responses to questions a to k are below:

      a) Both HPV16 and HPV18 infection induce lnc-FANCI-2 expression in keratinocytes (Liu H., et al. PNAS, 2021). However, HPV18+ cervical cancer cell lines HeLa and C4II cells (Figure S1A and S1B) do not express lnc-FANCI-2 as we see in HPV-negative cell lines such as HCT116, HEK293, HaCaT, and BCBL1 cells. Although we don’t know why, our preliminary data show that lnc-FANCI-2 promoter functions well and is sensitive to YY1 binding in lnc-FANCI-2 expressing CaSki and C33A cells but is much less sensitive to YY1 in HeLa and HCT116 cells, indicating some unknown cellular factors negatively regulating lnc-FANCI-2 promoter activity.

      b) We don’t know whether p53 and pRB could repress the expression of lnc-FANCI-2, although C33A cells bearing mutant p53 and mutant pRB express high amounts of lnc-FANCI-2. However, KD of E2F1 had no effect on lnc-FANCI-2 promoter activity in CaSki cells (Liu, H., et al. PNAS, 2021).

      c) RNA cellular localization can be affected by many factors, including splicing, export, and polyadenylation. As lnc-FANCI-2 is a long non-coding RNA, its regulation of cellular location could be more complicated than mRNAs and thus could be a future research direction.  

      d) The conclusion that lnc-FANCI-2 negatively regulates RAS signaling is based on both lnc-FANCI-2 KO and KD studies. Please see the proposed hypothetical model in Figure 8E.

      e) The binding of MAP4K4 to lnc-FANCI-2 was demonstrated by our IRPCRP-mass spectrometry (Fig. 8A and 8C), although the exact binding site on lnc-FANCI-2 was not explored. Many enzymes have turned out to be RNA-binding proteins (Castello A., et al. Trends Endocrinol. Metab. 26: 746-757, 2015; Hentze MW., et al. Nat. Rev. Mol. Cell Biol. 19: 327-341, 2018).

      f) Yes, they partly depend on each other in regulating RAS signaling. We found that KD of MAP4K4 in parental CaSki cells (Figure 8D) had a larger effect on RAS signaling (MCAM, IGFBP3, p-Akt) than in lnc-FANCI-2 KO ΔPr-A9 cells. In contrast, the latter displayed more p-Erk1/2 than that induced by KD of lnc-FANCI-2 in the parental CaSki cells (Figure S7C).

      g) We believe RAS signaling most likely regulates the transcription of MCAM and IGFBP3 through phosphorylated transcription factors (Figure 8E diagram).

      h) As a signaling molecule with at least 13 ligands/coreceptors (Joshkon A., et al. Biomedicines 8: 633, 2020), the increased MCAM appears to sustain RAS signaling (Fig. 7J and Fig. 8E). We assume that full-length cytoplasmic MCAM plays the predominant role in RAS signaling because it is more abundant than the cleaved nuclear MCAM, which lacks both the transmembrane and cytoplasmic regions. In addition, RAS signaling occurs mainly in the cytosol.

      i) The exact mechanism remains unknown. Lnc-FANCI-2 KO cells exhibit high levels of IGFBP3 RNA and protein and of p-Erk1/2, but less so of p-Akt, possibly because IGFBP3 regulates MAPK-mediated Erk phosphorylation more than PI3K-mediated Akt phosphorylation.

      j) The dysregulation of RAS signaling and ADAM protein activity is implicated in various cancers. ADAM proteins can modulate RAS signaling by cleaving and releasing ligands that activate or inactivate RAS-related pathways (Schafer B., et al. JBC 279: 47929-38, 2004; Ohtsu H., et al. Am J Physiol Cell Physiol 291: C1-C10, 2006; Dang M, et al. JBC 286: 17704-17713, 2011; Kleino I, et al. PLoS One 10: e0121301, 2015). Some ADAM proteins are involved in the migration and invasion of cancer cells, and their loss can promote the degradation of KRAS (Huang Y-K., et al. Nat Cancer 5: 400-419, 2024). In this revision, we include a brief discussion of ADAMs and RAS signaling.

      k) We agree with the reviewer that each question will require much more work to address. However, as this study explores lnc-FANCI-2 function for the first time, we prefer to retain all of the data we selected for this write-up. We hope reviewer 1 will be satisfied with our responses to questions a to j.

      (2) Figures S1A & S1C - Replicates are needed.

      Yes, we have repeated all of the experiments. The quantification shown in Figure S1A and S1C was performed in triplicate, and error bars have been added to the updated figure.

      (3) Figure S1D - There seems to be some lnc-FANCI-2 RNA in the nucleus of CaSki cells as well. Please quantify the relative amount of lnc-FANCI-2 in the nucleus vs cytoplasm.

      Yes, a small fraction of lnc-FANCI-2 is in the nucleus of CaSki cells as we reported (Liu H., PNAS, 2021, Movies S1 and S2). We did quantify by fractionation and RT-qPCR the relative amount of lnc-FANCI-2 in the nucleus vs cytoplasm in Figure S1C. 
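      One common way to turn fractionation RT-qPCR into nuclear versus cytoplasmic percentages is sketched below. It assumes that equal fraction-equivalents of each RNA preparation were assayed and that amplification efficiency is about 100% (a factor of 2 per cycle); the authors' calculation may include further normalization, for example to fraction-specific marker RNAs. The function name and Ct values are hypothetical.

```python
def nuclear_cytoplasmic_split(ct_nuclear, ct_cytoplasmic, efficiency=2.0):
    """Fraction of a transcript in the nucleus and cytoplasm from RT-qPCR Ct values,
    assuming equal fraction-equivalents of each RNA preparation were assayed and a
    per-cycle amplification factor `efficiency` (2.0 = 100% efficiency)."""
    nuc = efficiency ** (-ct_nuclear)       # relative amount in the nuclear fraction
    cyto = efficiency ** (-ct_cytoplasmic)  # relative amount in the cytoplasmic fraction
    total = nuc + cyto
    return nuc / total, cyto / total


if __name__ == "__main__":
    # Hypothetical Ct values: nuclear signal appears 3 cycles later than cytoplasmic.
    nuc_frac, cyto_frac = nuclear_cytoplasmic_split(25.0, 22.0)
    print(f"nuclear {nuc_frac:.1%}, cytoplasmic {cyto_frac:.1%}")  # nuclear 11.1%, cytoplasmic 88.9%
```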

      (4) Figure S2B - (a) For ΔPr-A9 cells, it looks like there is an increase in E6 and a decrease in E7, instead of "little change" as the authors claimed. (b) I suggest checking the protein levels for all the control and KO clones.

      Thanks for the questions. We observed some variation in E6 and E7 detection, and the submitted blot was one representative result. We regrew the lnc-FANCI-2 KO clones A9 and B3 and reexamined the expression of HPV16 E6/E7 proteins and their downstream targets, p53 and E2F1. As shown in new Figure S3A expt II, we again saw some variation in detection (~20-30%), and this variation did not translate into a noticeable change in their downstream targets. Thus, we do not consider these changes significant enough to draw a conclusion in our study; they most likely arise from sampling variation in the assays.

      (5) In the Proteome Profiler Human sReceptor Array analysis, multiple proteins were highlighted as having at least 30% change. But it is unclear how they relate to RAS signaling.

      Thanks for this comment. Cellular soluble receptors are essential for RAS signaling, the EMT pathway, and IFN responses. For example, the dysregulation of RAS signaling and ADAM protein activity is implicated in various cancers. ADAM proteins can modulate RAS signaling by cleaving and releasing ligands that activate or inactivate RAS-related pathways (Schafer B., et al. JBC 279: 47929-38, 2004; Ohtsu H., et al. Am J Physiol Cell Physiol 291: C1-C10, 2006; Dang M, et al. JBC 286: 17704-17713, 2011; Kleino I, et al. PLoS One 10: e0121301, 2015). Some ADAM proteins are involved in the migration and invasion of cancer cells, and their loss can promote the degradation of KRAS (Huang Y-K., et al. Nat Cancer 5: 400-419, 2024). In this revision, we include a brief discussion of ADAMs and RAS signaling.

      (6) Does knockdown of MAP4K4 lead to an increase in MCAM and IGFBP3?

      Yes, MAP4K4 KD in parental WT CaSki cells does lead to an increase in MCAM (~70%) and IGFBP3 (~30%), similar to the knockdown of lnc-FANCI-2 shown in the revised Figure 8D.

      Minor comments:

      (7) In the opinion of this reviewer the title is somewhat unwieldy.

      Thanks. We have shortened the title to “The lnc-FANCI-2 intrinsically restricts RAS signaling in HPV16-infected cervical cancer”.

      (8) The abstract can be more focused and doesn't have to mention so many gene names. In fact, the significance paragraph works better as an abstract. For the significance, the authors can provide another write-up on the implications of their research instead.

      Thanks. We have revised the abstract and added the implications of this research.

      (9) The last sentence of the introduction feels a little abrupt. It would be good to elaborate a little more on the key findings.

      Thank you for this critical comment. We have revised it as follows: In this report, we demonstrate that lnc-FANCI-2 in HPV16-infected cells controls RAS signaling through interaction with MAP4K4 and other RNA-binding proteins. Ablation of lnc-FANCI-2 in the cells promotes RAS signaling and phosphorylation of Akt and Erk. High levels of lnc-FANCI-2 and low levels of MCAM expression in cervical cancer patients correlate with improved survival, indicating that lnc-FANCI-2 plays a critical role in regulating RAS signaling to affect cervical cancer progression and patient outcomes.

      (10) Typo on line 191: Should be ADAM8 and not ADMA8.

      Corrected.

      Reviewer #2 (Recommendations for the authors):

      The paper contains a vast amount of data and would greatly benefit from an expanded version of the schematic of Figure 8E summarizing the main results. Including additional details on FANCI-2 regulation by HPV (primarily from previous studies) and its implications for HPV16-driven carcinogenesis would provide a more comprehensive overview.

      Thanks for the suggestion. We have modified Figure 8E to include the roles of HR-HPV E7 and YY1 in the regulation of lnc-FANCI-2 transcription.

      Further specific comments:

      (1) The introduction may be shortened to increase readability (e.g. lines 77-90; 94-105).

      We have shortened the introduction by deleting lines 94-105 of our initial submission.

      (2) Lines 55-57 the number of cervical cancer diagnoses and mortality need to be updated to the latest literature. The reference is from 2012.

      Thanks. We have revised and updated these figures accordingly with a new citation (Bray F., et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 74, 229-263 (2024)).

      (3) Line 61: Progression rate of CIN3 is incorrect (31% in 30 years according to reference 5).

      Thanks. Corrected.

      (4) Lines 108-112 are difficult to understand and should be rewritten.

      Thanks. Revised accordingly.

      (5) Line 116 Is this correct or should 'but' be 'and'?

      Thanks. Corrected accordingly.

      (6) Figure 1A top: The difference between cervical cancer and normal areas is hard to see in the top figure. The region labeled as "normal" does not resemble typical differentiating epithelium or normal glandular epithelium, though this is difficult to assess accurately from the image provided. I suggest adding HE staining and also the histotypes.

      We have added an H&E staining panel of the corresponding region to Figure 1A, which clearly shows the normal and cancer regions. Both cervical cancer tissues were cervical squamous cell carcinomas.

      (7) HFK-HPV16 & 18 cells (Figure 1B) are not described in the Materials & Methods.

      Thanks. We have revised the Materials and Methods to cite our two previous publications.

      (8) Figure 2E (RNA scope on FANCI-2 KO) only shows 2 to 3 cells, which makes it somewhat difficult to assess downregulated expression in the KO. I suggest replacing these with pictures showing more cells (i.e. >10) to strengthen the results.

      We have replaced the image in Figure 2E to include more cells.

      (9) The spindle-like morphology in deltaPr-A9 cells shown in FigS2A is not very distinct. Including images at higher magnification could help clarify this feature.

      Good comment. We have enlarged the images for a better view and revised the corresponding text.

      (10) Both protein and RNA expression analysis have been performed on WT CaSki cells and FANCI-2 KO cells. If I am correct there is little overlap between the significantly changed gene products. What does this mean? Have you looked into the comparison?

      The DEGs identified by RNA-seq reflect a genome-wide transcriptome change, whereas the protein array we used covered only 105 soluble protein receptors. Nevertheless, 9 of 15 (60%) membrane proteins detected in cell lysates (PODXL2, ECM1, NECTIN2, MCAM, ADAM9, CDH5, ADAM10, ITGA5, NOTCH1, SCARF2, ADAM8, TIMP2, LGALS3BP, CDH13, and ITGB6) showed consistent changes in expression (underlined) in both the RNA-seq and protein array assays. We have revised the text with this information (page 11). The inconsistent expression of the other six proteins (40%) between the two assays could be due to post-translational mechanisms, such as protein stability, modification, and secretion.

      (11) Figure S7, which presents TCGA data and survival, is quite complex. It would be more effective to display a similar figure for FANCI-2, as was done for MCAM in Figure 7I, to simplify the comparison and enhance clarity.

      Thanks. However, the suggested figure for lnc-FANCI-2 has already been published in our PNAS paper (Liu H., et al. PNAS, 2021). Figure S8 in this revision is the result of our in-house GradientScanSurv pipeline, a new approach for correlating expression with survival more accurately.

      What do the Figures look like if you analyse only HPV16+ patients versus HPV18+ patients, considering that FANCI-2 upregulation in cell lines is related to HPV16 and not 18? Is there an effect of histotype? Or tumor stage?

      HPV18-infected keratinocytes express high levels of lnc-FANCI-2. The lack of lnc-FANCI-2 expression in the two HPV18+ cell lines, HeLa and C4II, and in HPV-negative cell lines such as HCT116 could be due to the presence of some unknown repressive factors. In our dual-luciferase assays, the lnc-FANCI-2 promoter responded well to YY1 binding in CaSki and C33A cells, which express lnc-FANCI-2, but not in HeLa and HCT116 cells.

      (12) It remains puzzling that FANCI-2 upregulation was previously shown to already occur in CIN lesions and increase further in cervical cancer, while the current data indicate that FANCI-2 suppresses AKT activation. If I am correct Akt activation has been linked to cervical carcinogenesis. Similarly, line 434 states that increased MCAM might promote cervical tumorigenesis, implying that low FANCI-2 would stimulate tumorigenesis. If I understand correctly, the increase in FANCI-2 observed in CIN lesions would reflect a "brake" on the carcinogenic pathway and its sustained increase in cancer might indicate that growth is still (partly) controlled. As mentioned earlier, a Figure illustrating the relation between FANCI-2, HPV, and the carcinogenic process would be beneficial for clarity.

      Yes. Increased MCAM, but low level of lnc-FANCI-2, correlates with poor cervical cancer survival. We have revised Figure 8E to illustrate this relation better.  

      (13) Could part of the potentially conflicting findings be explained by CaSki cells being of metastatic origin? Related to this, does the expression of FANCI-2 or MCAM depend on the tumor stage?

      Thanks for this important suggestion. Unfortunately, we found that the expression of lnc-FANCI-2 and MCAM is not associated with cervical cancer stage based on the TCGA data (http://gepia.cancer-pku.cn/index.html). See the data below:

      Author response image 2.

      Despite some lingering uncertainty, the extensive experiments conducted using KO and KD cells do provide compelling evidence that lnc-FANCI-2 function is linked to RAS signaling and EMT.

      Thanks for your positive review and instructive comments.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors observed increased lnc-FANCI-2 in HPV16- and HPV18-transduced cells and in other cervical cancer tissues, but HPV18-positive HeLa cells showed different expression of lnc-FANCI-2. I suggest the authors discuss this difference further, for example in terms of HPV genotype, HPV genome status in host cells, or cell type.

      Thanks. We previously found that infection of keratinocytes with HPV16, HPV18, and other HR-HPVs could induce lnc-FANCI-2 expression (Liu H., et al. PNAS, 2021). In this report, we found that HPV18+ HeLa and C4II cells and HPV-negative cell lines do not express lnc-FANCI-2. Our preliminary lnc-FANCI-2 promoter activity assays showed the presence of negative regulatory factor(s) in cells that do not express lnc-FANCI-2. See the data in Author response image 1.

      We have revised our discussion to include these luciferase data as data not shown.

      (2) I suggest the authors discuss more details on how the changes of RAS signaling in KO cells help our further understanding of the molecular mechanisms for HPV-associated full-cell transformation and malignancy in addition to the well-known functions of HPV E6 and E7.

      Thanks. We have modified the Figure 8E as suggested by reviewer 2 and revised the discussion further.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Summary:

      This paper performs fine-mapping of the silkworm mutant bd and its fertile allelic version, bdf, narrowing the causal intervals down to a small region containing a handful of genes. In this region, the gene orthologous to mamo is impaired by a large indel, and its function is later confirmed using expression profiling, RNAi, and CRISPR KO. All these experiments convincingly show that mamo is necessary for the suppression of melanic pigmentation in the silkworm larval integument. The authors also use in silico and in vitro assays to probe the potential effector genes that mamo may regulate.

      Strengths:

      The genotype-to-phenotype workflow, combining forward (mapping) and reverse genetics (RNAi and CRISPR loss-of-function assays) linking mamo to pigmentation, is extremely convincing.

      Response: Thank you very much for your affirmation of our work. The reviewer discussed the parts of our manuscript that involve evolution sentence by sentence. We have further refined the description in this regard and improved the logical flow. Thank you again for your help.

      Weaknesses:

      1) The last section of the results, entitled "Downstream target gene analysis" is primarily based on in silico genome-wide binding motif predictions.

      While the authors identify a potential binding site using EMSA, it is unclear how much this general approach over-predicted potential targets. While I think this work is interesting, its potential caveats are not mentioned. In fact, the Discussion section seems to trust the high number of target genes as a reliable result. Specifically, the authors correctly say: "even if there are some transcription factor-binding sites in a gene, the gene is not necessarily regulated by these factors in a specific tissue and period", but then propose a biological explanation that not all binding sites are relevant to expression control. This takes the radical shortcut of assuming that predicted binding sites are actual in vivo binding sites. This may not be true, as I'd expect that only a subset of binding motifs predicted by Position Weight Matrices (PWMs) are real in vivo binding sites with a ChIP-seq or Cut-and-Run signal. This is particularly problematic for PWMs that feature only 5-nt signature motifs, as inferred here for mamo-S and mamo-L, simply because we can expect many predicted sites by chance.

      Response: Thank you very much for your careful work. The analysis and identification of transcription factor-binding sites is an important issue in gene regulation research. Techniques such as ChIP-seq can be used to experimentally identify the binding sites of transcription factors (TFs). However, reports using these techniques often only detect specific cell types and developmental stages, resulting in a limited number of downstream target genes for some TFs. Interestingly, TFs may regulate different downstream target genes in different cell types and developmental stages.

      Previous research has suggested that the ZF-DNA binding interface can be understood as a “canonical binding model”, in which each finger contacts DNA in an antiparallel manner. The binding sequence of the C2H2-ZF motif is determined by the amino acid residue sequence of its α-helical component. Considering the first amino acid residue in the α-helical region of the C2H2-ZF domain as position 1, positions -1, 2, 3, and 6 are key amino acids for recognizing and binding DNA. The residues at positions -1, 3, and 6 specifically interact with base 3, base 2, and base 1 of the DNA sense sequence, respectively, while the residue at position 2 interacts with the complementary DNA strand (Wolfe SA et al., 2000; Pabo CO et al., 2001). Based on this principle, predicted C2H2-ZF binding sites provide a useful reference. For the 5-nt PWM, we referred to a study in D. melanogaster in which this motif was identified by EMSA (Shoichi Nakamura et al., 2019). In the new version, we have rewritten this section.

      Pabo CO, Peisach E, Grant RA. Design and selection of novel Cys2His2 zinc finger proteins. Annu Rev Biochem. 2001;70:313-340.

      Wolfe SA, Nekludova L, Pabo CO. DNA recognition by Cys2His2 zinc finger proteins. Annu Rev Biophys Biomol Struct. 2000;29:183-212.

      Nakamura S, Hira S, Fujiwara M, et al. A truncated form of a transcription factor Mamo activates vasa in Drosophila embryos. Commun Biol. 2019;2:422. Published 2019 Nov 20.

      2) The last part of the current discussion ("Notably, the industrial melanism event, in a short period of several decades ... a more advanced self-regulation program") contains important logical shortcuts that assign "agency" to the evolutionary process. For instance, this section conveys the idea that phenotypically relevant mutations may not be random. I believe some of this is due to translation issues in English, as I understand that the authors want to express the idea that some parts of the genome are paths of least resistance for evolutionary change (e.g. the regulatory regions of developmental regulators are likely to articulate morphological change). But the language and tone are made worse by the mention that in another system, a mechanism involving photoreception drives adaptive plasticity, making it sound like the authors want to make a Lamarckian argument here (inheritance of acquired characteristics), or a point about orthogenesis (e.g. the idea that the environment may guide non-random mutations).

      Because this last part of the current discussion suffers from confused statements on modes and tempo of regulatory evolution and is rather out of topic, I would suggest removing it.

      In any case, it is important to highlight here that while this manuscript is an excellent genotype-to-phenotype study, it has very few comparative insights on the evolutionary process. The finding that mamo is a pattern or pigment regulatory factor is interesting and will deserve many more studies to decipher the full evolutionary story behind this Gene Regulatory Network.

      Response: Thank you very much for your careful work. In this part of the manuscript, we introduced some assumptions that make the statement slightly unconventional. The color pattern of insects is an adaptive trait. The bd and bdf mutants used in the study arose spontaneously. As frequently varying and readily observable phenotypes, color patterns have been used as models for evolutionary research (Wittkopp PJ et al., 2011). Darwin's theory of natural selection has epoch-making significance. I deeply believe in the theory that species strive to evolve through natural selection. However, with the development of molecular genetics, the Darwinian view that phenotypic evolution results from undirected random mutations and the slow accumulation of micromutations has been increasingly challenged.

      The prerequisite for undirected random mutations and micromutations is excessive reproduction to generate a sufficiently large population. A sufficiently large population can contain sufficient genotypes to face various survival challenges. However, it is difficult to explain how some small groups and species with relatively low fertility rates have survived thus far. More importantly, the theory cannot explain the currently observed genomic mutation bias. In scientific research, every theory is constantly being modified to adapt to current discoveries. The most famous example is the debate over whether light is a particle or a wave, which has lasted for hundreds of years. However, in the 20th century, both sides seemed to compromise with each other, believing that light has a wave‒particle duality.

      In summary, we have rewritten this section to reduce unnecessary assumptions.

      Wittkopp PJ, Kalay G. Cis-regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nat Rev Genet. 2011;13(1):59-69.

      Minor Comment:

      The gene models presented in Figure 1 are obsolete, as there are more recent annotations of the Bm-mamo gene that feature more complete intron-exon structures, including for the neighboring genes in the bd/bdf intervals. It remains true that the mamo locus encodes two protein isoforms.

      An example of the Bm-mamo locus annotation can be found at https://www.ncbi.nlm.nih.gov/gene/101738295. RNA-seq expression tracks (including from larval epidermis) can be displayed in the embedded genome browser from the link above using the "Configure Tracks" tool.

      Based on these more recent annotations, I would say that most of the work on the two isoforms remains valid, but FigS2, and particularly Fig.S2C, need to be revised.

      Response: Thank you very much for your careful work. In this study, we referred to the predicted genes of SilkDB, NCBI, and Silkbase. The number of predicted genes and the length of gene mRNAs differ to varying degrees among these databases. We relied mainly on SilkDB because it is based on the first silkworm genome, has been used for the longest time, and has a relatively large number of users. In the revised manuscript, we have added the predicted genes of NCBI and Silkbase in Figure S1.

      Author response image 1.

      The predicted genes and qPCR analysis of candidate genes in the responsible genomic region for the bd mutant. (A) The predicted genes in SilkDB; (B) the predicted genes in GenBank; (C) the predicted genes in Silkbase; (D) analysis of nucleotide differences in the responsible region of bd; (E) investigation of the expression levels of candidate genes.

      Reviewer #2 (Public Review):

      Summary:

      The authors tried to identify new genes involved in melanin metabolism and its spatial distribution in the silkworm Bombyx mori. They identified the gene Bm-mamo as playing a role in caterpillar pigmentation. By functional genetic and in silico approaches, they identified putative target genes of the Bm-mamo protein. They showed that numerous cuticular proteins are regulated by Bm-mamo during larval development.

      Strengths:

      • preliminary data about the role of cuticular proteins to pattern the localization of pigments

      • timely question

      • challenging question because it requires the development of future genetic and cell biology tools at the nanoscale

      Response: Thank you very much for your affirmation of our work. The reviewer's familiarity with the color patterns of Lepidoptera is helpful, and the recommendation raised has provided us with very important assistance. This has allowed us to make significant progress with our manuscript.

      Weaknesses:

      • statistical sampling limited

      • the discussion would gain in being shorter and refocused on a few points, especially the link between cuticular proteins and pigmentation. The article would be better if the last evolutionary-themed section of the discussion is removed.

      A recent paper has been published on the same gene in Bombyx mori (https://www.sciencedirect.com/science/article/abs/pii/S0965174823000760) in August 2023. The authors must discuss and refer to this published paper through the present manuscript.

      Response: Thank you very much for your careful work. First, we believe that competitive research is sometimes coincidental and sometimes intentional. Our research began in 2009, when we began to construct the recombinant population. In 2016, we published an article on comparative transcriptomics (Wu et al. 2016). The authors of the article mentioned above have a strong interest in our research, and their work builds on our transcriptome analysis, with the aim of making a preemptive publication. To discourage such behavior, we cannot cite it and do not want to discuss it in our paper.

      Songyuan Wu et al. Comparative analysis of the integument transcriptomes of the black dilute mutant and the wild-type silkworm Bombyx mori. Sci Rep. 2016 May 19;6:26114. doi: 10.1038/srep26114.

      Reviewer #1 (Recommendations For The Authors):

      1) Please consider using a more recent annotation model of the B. mori genome to revise your Results Section 1, Fig. 1, and Fig. S2: https://www.ncbi.nlm.nih.gov/gene/101738295

      Specifically, you used BGIM_ gene models, while the current annotation, such as the one above featured in the NCBI database, provides more accurate intron-exon structures without splitting mamo into two genes. I believe this can be done with minor revisions of the figures, and you could keep the BGIM_ gene names for the text.

      Response: Thank you very much for your careful work. The GenBank database of NCBI (National Center for Biotechnology Information) is a very good resource that we often used and referred to during this research. Our research started in 2009, so we mainly referred to the SilkDB database (Jun Duan et al., 2010), although we also consulted other databases, such as NCBI and Silkbase (https://silkbase.ab.a.u-tokyo.ac.jp/cgi-bin/index.cgi). Because the SilkDB database was constructed based on the first published silkworm genome data, it has been used for the longest time and has a relatively large number of users. Researchers are still using these data (e.g., Kejie Li et al., 2023).

      The prediction of the mamo gene as two genes (BGIBMGA012517 and BGIBMGA012518) in SilkDB is mainly due to alternative splicing of the mamo gene. BGIBMGA012517 corresponds to the shorter transcript (mamo-s) of the mamo gene. Because of differences in sequenced individuals, sequencing methods, and gene prediction methods, the number and sequences of predicted genes differ among databases. We have added schematic diagrams of the genes predicted by NCBI and Silkbase, and the expression levels of the newly predicted genes are shown in Supplemental Figure S1.

      Jun Duan et al., SilkDB v2.0: a platform for silkworm (Bombyx mori) genome biology. Nucleic Acids Res. 2010 Jan;38(Database issue):D453-6. doi: 10.1093/nar/gkp801.

      Kejie Li et al., Transcriptome analysis reveals that knocking out BmNPV iap2 induces apoptosis by inhibiting the oxidative phosphorylation pathway. Int J Biol Macromol. 2023 Apr 1;233:123482. doi: 10.1016/j.ijbiomac.2023.123482. Epub 2023 Jan 31.

      Author response image 2.

      The predicted genes and qPCR analysis of candidate genes in the responsible genomic region for the bd mutant. (A) The predicted genes in SilkDB; (B) the predicted genes in GenBank; (C) the predicted genes in Silkbase; (D) analysis of nucleotide differences in the responsible region of bd; (E) investigation of the expression levels of candidate genes.

      2) As I mentioned in my public review, I strongly believe the interpretation of the PWM binding analyses requires much more conservative statements, taking into account that short 5-nt motifs are expected by chance. The work in this section is interesting, but the manuscript would benefit from a substantial rewrite of the corresponding Discussion section, making clear that the in silico approach is prone to identifying many sites in the genome, and that very few of those sites are probably relevant, for probabilistic reasons. I would recommend statements such as "Future experiments assessing the in vivo binding profile of Bm-mamo (eg. ChIP-seq or Cut&Run), will be required to further understand the GRNs controlled by mamo in various tissues".

      Response: Thank you very much for your careful work. Previous research has suggested that the ZF-DNA binding interface can be understood as a “canonical binding model”, in which each finger contacts DNA in an antiparallel manner. The binding sequence of the C2H2-ZF motif is determined by the amino acid residue sequence of its α-helical component. Considering the first amino acid residue in the α-helical region of the C2H2-ZF domain as position 1, positions -1, 2, 3, and 6 are key amino acids for recognizing and binding DNA. The residues at positions -1, 3, and 6 specifically interact with base 3, base 2, and base 1 of the DNA sense sequence, respectively, while the residue at position 2 interacts with the complementary DNA strand (Wolfe SA et al., 2000; Pabo CO et al., 2001). Based on this principle, the prediction of DNA recognition motifs of C2H2-type zinc finger proteins currently has good accuracy.

      The predicted DNA binding sequence (GTGCGTGGC) of the mamo protein in Drosophila melanogaster is highly consistent with that of the silkworm. In addition, in D. melanogaster, bases 1 to 7 of the predicted mamo binding sequence (GTGCGTG) are highly similar to the DNA binding sequence obtained from EMSA experiments (Seiji Hira et al., 2013). Furthermore, another study of the D. melanogaster mamo protein used five bases (TGCGT) as its core DNA recognition sequence (Shoichi Nakamura et al., 2019). The JASPAR database (https://jaspar.genereg.net) also contains some shorter (4-6 nt) DNA recognition sequences; for example, the DNA binding sequence of Ubx in Drosophila melanogaster is TAAT (ID MA0094.1). However, we used longer mamo DNA binding motifs (9 nt and 15 nt) to scan the 2 kb genomic regions near predicted genes, and over 70% of predicted genes had these motifs nearby. This analysis was carried out with common software and pipelines. Because of factors such as sufficient protein concentration, DNA accessibility, the absence of suppressors, and suitable ionic conditions, zinc finger transcription factors are more likely to bind specific DNA sequences in vitro than in vivo. Using ChIP-seq or Cut&Run to analyze various silkworm tissues and developmental stages would yield a comprehensive DNA-binding map of mamo and exclude some of the false positives generated by prediction. Thank you for your suggestion; we will conduct this work in our next research step. In addition, for brevity, we deleted the predicted data (Supplemental Tables S7 and S8) that used shorter motifs.

      Pabo CO, Peisach E, Grant RA. Design and selection of novel Cys2His2 zinc finger proteins. Annu Rev Biochem. 2001;70:313-340.

      Wolfe SA, Nekludova L, Pabo CO. DNA recognition by Cys2His2 zinc finger proteins. Annu Rev Biophys Biomol Struct. 2000;29:183-212.

      Anton V Persikov et al., De novo prediction of DNA-binding specificities for Cys2His2 zinc finger proteins. Nucleic Acids Res. 2014 Jan;42(1):97-108. doi: 10.1093/nar/gkt890. Epub 2013 Oct 3.

      Seiji Hira et al., Binding of Drosophila maternal Mamo protein to chromatin and specific DNA sequences. Biochem Biophys Res Commun. 2013 Aug 16;438(1):156-60. doi: 10.1016/j.bbrc.2013.07.045. Epub 2013 Jul 20.

      Shoichi Nakamura et al., A truncated form of a transcription factor Mamo activates vasa in Drosophila embryos. Commun Biol. 2019 Nov 20;2: 422. doi: 10.1038/s42003-019-0663-4. eCollection 2019.
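      The disagreement over motif length can be made concrete with a back-of-the-envelope calculation of how often a motif is expected to appear by chance. The sketch below is an illustration only, not the analysis used in the manuscript. It assumes equiprobable, independent bases, counts exact matches of a single fixed motif on each strand of the 2 kb window mentioned in the response, and uses a hypothetical function name, expected_chance_matches. Real genomes have biased base composition, and PWM scanning with a score threshold also accepts near-matches, so real scans would generally report more chance hits than these exact-match figures.

```python
def expected_chance_matches(motif_len: int, window_len: int = 2000,
                            both_strands: bool = True) -> float:
    """Expected number of exact matches of one fixed motif in a random window.

    Simplifying assumptions (not taken from the manuscript): each base is
    A, C, G, or T with probability 0.25, independently of its neighbours,
    and matches on the two strands are counted separately.
    """
    if motif_len > window_len:
        return 0.0
    positions = window_len - motif_len + 1  # possible start positions per strand
    strands = 2 if both_strands else 1
    p_match = 0.25 ** motif_len             # chance one position spells the motif
    return strands * positions * p_match


if __name__ == "__main__":
    # Motif lengths discussed in this exchange: 5 nt (TGCGT core), 9 nt, and 15 nt.
    for k in (5, 9, 15):
        print(f"{k:>2}-nt motif: {expected_chance_matches(k):.4g} "
              f"expected exact matches per 2 kb window")
```

      Under these assumptions, a 5-nt motif is expected to occur about 3.9 times in every 2 kb window by chance alone, whereas an exact 9-nt match is expected roughly 0.015 times. This illustrates why short motifs predict sites near almost every gene, and why the choice of motif length and scoring threshold matters when interpreting a genome-wide hit rate.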

      3) In my opinion, the last section of the Discussion needs to be completely removed ("Notably, the industrial melanism event, in a short period of several decades ... a more advanced self-regulation program"), as it is over-extending the data into evolutionary interpretations without any support. I would suggest instead writing a short paragraph asking whether the pigmentary role of mamo is a Lepidoptera novelty, or if it could have been lost in the fly lineage.

      Below, I tried to comment point-by-point on the main issues I had.

      Wu et al: Notably, the industrial melanism event, in a short period of several decades, resulted in significant changes in the body color of multiple Lepidoptera species(46). Industrial melanism events, such as changes in the body color of pepper moths, are heritable and caused by genomic mutations(47).

      Yes, but the selective episode was brief, and the relevant "carbonaria" mutations may have existed for a long time at low-frequency in the population.

      Response: Thank you very much for your careful work. Moth species often have melanic variants at low frequencies outside industrial regions. Recent molecular genetic work has revealed that the melanic (carbonaria) allele of the peppered moth had a single origin in Britain. Further research indicated that the mutation causing industrial melanism of the peppered moth (Biston betularia) in the UK is the insertion of a transposable element into the first intron of the cortex gene. Interestingly, statistical inference based on the distribution of recombined carbonaria haplotypes indicates that this transposition event occurred in approximately 1819, a date highly consistent with a detectable frequency being reached in the mid-1840s (Arjen E Van't Hof, et al., 2016). This molecular evidence suggests that the single-origin melanic mutation (carbonaria) arose in the UK around the period of industrial development, rather than being an ancient genotype. We have rewritten this part of the manuscript.

      Arjen E Van't Hof, et al., The industrial melanism mutation in British peppered moths is a transposable element. Nature. 2016 Jun 2;534(7605):102-5. doi: 10.1038/nature17951.

      Wu et al: If relying solely on random mutations in the genome, which have a time unit of millions of years, to explain the evolution of the phenotype is not enough.

      What you imply here is problematic for several reasons.

      First, as you point out later, some large-effect mutations (e.g. transpositions) can happen quickly.

      Second, it's unclear what "the time unit of millions of years" means here... mutations occur, segregate in populations, and are selected. The speed of this process depends on the context and genetic architectures.

      Third, I think I understand what you mean with "to explain the evolution of the phenotype is not enough", but this would probably need a reformulation and I don't think it's relevant to bring it here. After all, you used loss-of-function mutants to explain the evolution of artificially selected mutants. The evolutionary insights from these mutants are limited. Random mutations at the mamo locus are perfectly sufficient here to explain the bd and bdf phenotypes and larval traits.

      Response: Thank you very much for your careful work. Charles Darwin himself argued that “natural selection can act only by taking advantage of slight successive variations; she can never take a leap, but must advance by the shortest and slowest steps” (Darwin, C. R. 1859). This ‘micromutational’ view of adaptation proved extraordinarily influential. However, the accumulation of micromutations is slow and requires a very long time to produce a substantial phenotypic change, and it may account for only a proportion of cases. Interestingly, recent molecular studies have shown that the evolution of some morphological traits involves a modest number of genetic changes (H Allen Orr, 2005).

      One example is the genetic basis of armor-plate and pelvic reduction in three-spined sticklebacks (Gasterosteus aculeatus) in postglacial lakes. Although the marine form of this species has thick armor, lake populations (recently derived from the marine form) do not. Repeated independent evolution of the lake morphology has resulted in reduced armor plates and pelvic structures, and there is no doubt that these morphological changes are adaptive. Research has shown that pelvic loss in different natural stickleback populations occurs through regulatory mutations deleting a tissue-specific enhancer (Pel) of the pituitary homeobox transcription factor 1 (Pitx1) gene. The researchers genotyped 13 pelvic-reduced populations from disparate geographic locations; 9 of the 13 carried deletions of varying lengths, all located in the Pel enhancer. Uniformly random mutation across the genome cannot readily explain such similar mutations in different populations. The authors suggested that the Pitx1 locus of the stickleback genome may be prone to double-strand DNA breaks that are subsequently repaired by nonhomologous end joining (NHEJ) (Yingguang Frank Chan et al., 2010).

      The bd and bdf mutants used in this study arose spontaneously, and natural mutation is one of the driving forces of evolution. Nevertheless, we have rewritten this section.

      Darwin, C. R. The Origin of Species (J. Murray, London, 1859).

      H Allen Orr. The genetic theory of adaptation: a brief history. Nat Rev Genet. 2005 Feb;6(2):119-27. doi: 10.1038/nrg1523.

      Yingguang Frank Chan et al., Adaptive evolution of pelvic reduction in sticklebacks by recurrent deletion of a Pitx1 enhancer. Science. 2010 Jan 15;327(5963):302-5. doi: 10.1126/science.1182213. Epub 2009 Dec 10.

      Wu et al: Interestingly, the larva of peppered moths has multiple visual factors encoded by visual genes, which are conserved in multiple Lepidoptera, in the skin. Even when its compound eyes are covered, it can rely on the skin to feel the color of the environment to change its body color and adapt to the environment(48). Therefore, caterpillars/insects can distinguish the light wave frequency of the background. We suppose that perceptual signals can stimulate the GRN, the GRN guides the expression of some transcription factors and epigenetic factors, and the interaction of epigenetic factors and transcription factors can open or close the chromatin of corresponding downstream genes, which can guide downstream target gene expression.

      This is extremely confusing because you are bringing in a plastic trait here. It's possible there is a connection between the sensory stimulus and the regulation of mamo in peppered moths, but this is a mere hypothesis. Here, by mentioning a plastic trait, this paragraph sounds as if it was making a statement about directed evolution, especially after implying in the previous sentence that (paraphrasing) "random mutations are not enough". To be perfectly honest, the current writing could be misinterpreted and co-opted by defenders of the Intelligent Design doctrine. I believe and trust this is not your intention.

      Response: Thank you very much for your careful work. The plasticity of larval body color in the peppered moth is very interesting, but we mainly wanted to emphasize that the larval skin expresses visual genes whose products can sense environmental color by perceiving light, and that these genes are conserved in many insects. Human skin also expresses opsins, which might initiate light-induced signaling pathways (Haltaufderhyde K et al., 2015). This suggests that light perception by animal skin, with feedback through signaling pathways, is a common phenomenon. For clarity, we have rewritten this section of the manuscript.

      Haltaufderhyde K, Ozdeslik RN, Wicks NL, Najera JA, Oancea E. Opsin expression in human epidermal skin. Photochem Photobiol. 2015;91(1):117-123.

      Wu et al: In addition, during the opening of chromatin, the probability of mutation of exposed genomic DNA sequences will increase (49).

      Here again, this is veering towards a strongly Lamarckian view with the environment guiding specific mutation. I simply cannot see how this would apply to mamo, nothing in the current article indicates this could be the case here. Among many issues with this, it's unclear how chromatin opening in the larval integument may result in heritable mutations in the germline.

      Response: Thank you very much for your careful work. A previous study in Arabidopsis thaliana showed that mutation is biased across the genome: compared with intergenic regions, the mutation frequency is reduced by half inside gene bodies and by two-thirds in essential genes. The authors also compared mutation rates among genes with different functions; the coding regions of essential genes (such as those involved in translation) had the lowest mutation rates, whereas the coding regions of genes with specialized functions (such as environmental response) had the highest. These patterns were largely explained by epigenomic features (J Grey Monroe et al., 2022).

      In eukaryotes, chromatin is organized into repeating units of nucleosomes, each consisting of a histone octamer with DNA wrapped around it. This structure can protect DNA. When a gene is activated, its chromatin region opens locally and becomes accessible. Research has found that chromatin organization is a major influence on regional mutation rates (Schuster-Böckler B et al., 2012; Lawrence MS et al., 2013; Polak P et al., 2015), and that binding of transcription factors to DNA can impair nucleotide excision repair and increase the mutation rate at the bound sites (Radhakrishnan Sabarinathan et al., 2016). In addition, mamo belongs to the BTB-ZF family, whose members can recruit chromatin regulators such as DNA methyltransferase 1 (DNMT1), cullin 3 (CUL3; Mathew R et al., 2012), histone deacetylase 1 (HDAC1), and histone acetyltransferase 1 (HAT1) to remodel chromatin at specific genomic sites. Although mutation rates can be predicted from epigenomic features, the forms of mutations are diverse and random. Therefore, this view does not violate the randomness of mutation. For clarity, we have rewritten this section of the manuscript.

      J Grey Monroe, et al. Mutation bias reflects natural selection in Arabidopsis thaliana. Nature. 2022 Feb;602(7895):101-105.

      Sabarinathan R, Mularoni L, Deu-Pons J, Gonzalez-Perez A, López-Bigas N. Nucleotide excision repair is impaired by binding of transcription factors to DNA. Nature. 2016;532(7598):264-267.

      Schuster-Böckler B, Lehner B. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature. 2012;488(7412):504-507.

      Lawrence MS, Stojanov P, Polak P, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013;499(7457):214-218.

      Polak P, Karlić R, Koren A, et al. Cell-of-origin chromatin organization shapes the mutational landscape of cancer. Nature. 2015;518(7539):360-364.

      Mathew R, Seiler MP, Scanlon ST, et al. BTB-ZF factors recruit the E3 ligase cullin 3 to regulate lymphoid effector programs. Nature. 2012;491(7425):618-621.

      Wu et al: Transposon insertion occurs in a timely manner upstream of the cortex gene in melanic pepper moths (47), which may be caused by the similar binding of transcription factors and opening of chromatin.

      No, we do not think that the peppered moth mutation is Lamarckian at all, as seems to be implied here (notice that by mentioning the peppered moth twice, you are juxtaposing a larval plastic trait and then a purely genetic wing trait, making it even more confusing). Also, "in a timely manner" is superfluous, because all the data are consistent with a chance mutation eventually being picked up by strong directional selection. The mutation and selection did NOT occur at the same time.

      Response: Thank you very much for your careful work. The transposon insertion into the first intron of the cortex gene that causes industrial melanism in the peppered moth occurred in approximately 1819, close to the period of industrial development in the UK (Arjen E Van't Hof, et al., 2016). In multiple Heliconius species, cortex is the shared genetic basis for the regulation of wing color patterns. Interestingly, the cortex SNPs associated with wing color pattern do not overlap among different Heliconius lineages, such as H. erato dephoon and H. erato favorinus, which suggests that these cortex mutations have different origins (Nadeau NJ et al., 2016). In addition, cortex is a candidate gene for changes in wing color pattern in Junonia coenia (van der Burg KRL et al., 2020) and Bombyx mori (Ito K et al., 2016). Overall, cortex is an evolutionary hotspot for wing color pattern variation in multiple butterflies and moths. Moreover, the variants observed at cortex in these species are diverse, including SNPs, indels, transposon insertions, and inversions. This indicates that, although the insect genome contains evolutionary hotspots, the variation at these hotspots is random. Therefore, this view is not detached from randomness.

      Arjen E Van't Hof, et al., The industrial melanism mutation in British peppered moths is a transposable element. Nature. 2016 Jun 2;534(7605):102-5. doi: 10.1038/nature17951.

      Nadeau NJ, Pardo-Diaz C, Whibley A, et al. The gene cortex controls mimicry and crypsis in butterflies and moths. Nature. 2016;534(7605):106-110.

      van der Burg KRL, Lewis JJ, Brack BJ, Fandino RA, Mazo-Vargas A, Reed RD. Genomic architecture of a genetically assimilated seasonal color pattern. Science. 2020;370(6517):721-725.

      Ito K, Katsuma S, Kuwazaki S, et al. Mapping and recombination analysis of two moth colour mutations, Black moth and Wild wing spot, in the silkworm Bombyx mori. Heredity (Edinb). 2016;116(1):52-59.

      Wu et al: Therefore, we proposed that the genetic basis of color pattern evolution may mainly be system-guided programmed events that induce mutations in specific genomic regions of key genes rather than just random mutations of the genome.

      While the mutational target of pigment evolution may involve a handful of developmental regulator genes, you do not have the data to infer such a strong conclusion at the moment.

      The current formulation is also quite strong and teleological: "system-guided programmed events" imply intentionality or agency, an idea generally assigned to the anti-scientific Intelligent Design movement. There are a few examples of guided mutations, such as the adaptation phase of gRNA motifs in bacterial CRISPR assays, where I could see the term "system-guided programmed events" being applicable. But it is irrelevant here.

      Response: Thank you very much for your careful work. The CRISPR-Cas9 system is indeed very well known. In addition, recent studies have identified programmable RNA-guided nucleases in eukaryotes, such as Fanzor. Fanzor (Fz) was reported in 2013 as a eukaryotic, transposon-encoded TnpB-IS200/IS605 protein, and it was initially thought that Fz proteins (and prokaryotic TnpBs) might regulate transposon activity through methyltransferase activity (Saito M et al., 2023). Fz has recently been shown to be a eukaryotic programmable RNA-guided endonuclease (Saito M et al., 2023). Although this system has so far been found in organisms such as fungi and mollusks, it raises the possibility that similar systems exist in other animals. Moreover, before these gene-editing systems became popular, zinc finger nucleases (ZFNs) were already being studied as gene-editing tools in many species. DNA recognition by ZFNs depends on their zinc finger motifs (Urnov FD et al., 2005), the same type of motif by which zinc finger transcription factors recognize their DNA-binding sites.

      Furthermore, a very important evolutionary event in sexual reproduction is chromosomal recombination during meiosis, which generates new combinations of alleles. Current research has found that recombination events are not randomly distributed. In mice and humans, the transcription factor PRDM9 specifies the sites of double-strand breaks (DSBs) for meiotic recombination. PRDM9 is a histone methyltransferase consisting of three main regions: an amino-terminal region resembling the family of synovial sarcoma X (SSX) breakpoint proteins, which contains a Krüppel-associated box (KRAB) domain and an SSX repression domain (SSXRD); a PR/SET domain (a subclass of SET domains), surrounded by a pre-SET zinc knuckle and a post-SET zinc finger; and a long carboxy-terminal C2H2 zinc finger array. In most mammalian species, during early meiotic prophase, PRDM9 determines recombination hotspots through H3K4 and H3K36 trimethylation (H3K4me3 and H3K36me3) of nucleosomes near its DNA-binding sites. Meiotic DSBs are then formed at these hotspots through the combined action of SPO11 and TOPOVIBL, and proteins such as RAD51 are involved in repairing the breaks. In summary, programmed induction and repair of DSBs are widespread in organisms (Bhattacharyya T et al., 2019).

      These studies indicate that, alongside randomness, the genome also exhibits a degree of programmability.

      Saito M, Xu P, Faure G, et al. Fanzor is a eukaryotic programmable RNA-guided endonuclease. Nature. 2023;620(7974):660-668.

      Urnov FD, Miller JC, Lee YL, et al. Highly efficient endogenous human gene correction using designed zinc-finger nucleases. Nature. 2005;435(7042):646-651.

      Bhattacharyya T, Walker M, Powers NR, et al. Prdm9 and Meiotic Cohesin Proteins Cooperatively Promote DNA Double-Strand Break Formation in Mammalian Spermatocytes [published correction appears in Curr Biol. 2021 Mar 22;31(6):1351]. Curr Biol. 2019;29(6):1002-1018.e7.

      Wu et al: Based on this assumption, animals can undergo phenotypic changes more quickly and more accurately to cope with environmental changes. Thus, seemingly complex phenotypes such as cryptic coloring and mimicry that are highly similar to the background may have formed in a short period. However, the binding sites of some transcription factors widely distributed in the genome may be reserved regulatory interfaces to cope with potential environmental changes. In summary, the regulation of genes is smarter than imagined, and they resemble a more advanced self-regulation program.

      Here again, I can agree with the idea that certain genetic architectures can evolve quickly, but I cannot support the concept that the genetic changes are guided or accelerated by the environment. And again, none of this is relevant to the current findings about Bm-mamo.

      Response: Thank you very much for your careful work. Darwin's theory of natural selection is of epoch-making significance, and we fully accept that species evolve through natural selection. However, with the development of molecular genetics, the view that phenotypic evolution results from undirected random mutations and the slow accumulation of micromutations has been increasingly challenged.

      This view requires the overproduction of offspring to generate sufficiently large populations, which contain enough genotypes to meet various survival challenges. However, it is difficult to explain how some small populations and species with relatively low fecundity have survived to the present. More importantly, the theory cannot explain the genomic mutation bias observed today. In science, every theory is constantly modified to accommodate new discoveries. A famous example is the centuries-long debate over whether light is a particle or a wave, which was reconciled in the 20th century by the concept of wave-particle duality.

      Epigenetics has developed rapidly since 1987 and is now widely accepted. It is defined as stable inheritance caused by chromosomal conformational changes without alteration of the DNA sequence, in contrast to genetics, which studies variation in gene sequences. However, an increasing number of studies have found that histone modifications can affect DNA sequence variation. In addition, histones and epigenetic factors are themselves encoded by genes in the genome. Therefore, genetics and epigenetics should be viewed as interacting rather than parallel. Moreover, some transcription factors play important roles in epigenetic modification. Meiotic recombination is a key process that ensures the correct segregation of homologous chromosomes through DNA double-strand break repair. The transcription factor PRDM9 can determine recombination hotspots through H3K4 and H3K36 trimethylation (H3K4me3 and H3K36me3) of nucleosomes near its DNA-binding sites (Bhattacharyya T et al., 2019). Interestingly, mamo has been identified as an important candidate factor for setting meiotic recombination hotspots in Drosophila (Winbush A et al., 2021).

      Bhattacharyya T, Walker M, Powers NR, et al. Prdm9 and Meiotic Cohesin Proteins Cooperatively Promote DNA Double-Strand Break Formation in Mammalian Spermatocytes [published correction appears in Curr Biol. 2021 Mar 22;31(6):1351]. Curr Biol. 2019;29(6):1002-1018.e7.

      Winbush A, Singh ND. Genomics of Recombination Rate Variation in Temperature-Evolved Drosophila melanogaster Populations. Genome Biol Evol. 2021;13(1): evaa252.

      Reviewer #2 (Recommendations For The Authors):

      Major comments

      Response: Thank you very much for your careful work. First, we believe that competing research is sometimes coincidental and sometimes intentional. Our research began in 2009, when we began constructing the recombinant population. In 2016, we published an article on comparative transcriptomics (Wu et al. 2016). The authors of the article mentioned above took a strong interest in our research and built on our transcriptome analysis, with the aim of publishing preemptively.

      To discourage such behavior, we choose not to cite it or discuss it in our paper.

      Songyuan Wu et al. Comparative analysis of the integument transcriptomes of the black dilute mutant and the wild-type silkworm Bombyx mori. Sci Rep. 2016 May 19;6:26114. doi: 10.1038/srep26114.

      • line 52-54. The numerous biological functions of insect coloration have been thoroughly investigated. It is reasonable to expect more references for each function.

      Response: Thank you very much for your careful work. We have made the appropriate modifications.

      Sword GA, Simpson SJ, El Hadi OT, Wilps H. Density-dependent aposematism in the desert locust. Proc Biol Sci. 2000;267(1438):63-68. … Behavior.

      Barnes AI, Siva-Jothy MT. Density-dependent prophylaxis in the mealworm beetle Tenebrio molitor L. (Coleoptera: Tenebrionidae): cuticular melanization is an indicator of investment in immunity. Proc Biol Sci. 2000;267(1439):177-182. … Immunity.

      Hadley NF, Savill A, Schultz TD. Coloration and its thermal consequences in the New Zealand tiger beetle Neocicindela perhispida. J Therm Biol. 1992;17:55-61. … Thermoregulation.

      Hu YG, Shen YH, Zhang Z, Shi GQ. Melanin and urate act to prevent ultraviolet damage in the integument of the silkworm, Bombyx mori. Arch Insect Biochem Physiol. 2013;83:41-55. … UV protection.

      Stevens M, Ruxton GD. Linking the evolution and form of warning coloration in nature. Proc Biol Sci. 2012;279:417-426. … Aposematism.

      Dasmahapatra KK, et al. Butterfly genome reveals promiscuous exchange of mimicry adaptations among species. Nature. 2012;487:94-98. … Mimicry.

      Gaitonde N, Joshi J, Kunte K. Evolution of ontogenic change in color defenses of swallowtail butterflies. Ecol Evol. 2018;8(19):9751-9763. Published 2018 Sep 3. … Crypsis.

      Tullberg BS, Merilaita S, Wiklund C. Aposematism and crypsis combined as a result of distance dependence: functional versatility of the colour pattern in the swallowtail butterfly larva. Proc Biol Sci. 2005;272:1315-1321. … Aposematism and crypsis combined.

      • line 59-60. This general statement needs to be rephrased. I suggest remaining simple by indicating that insect coloration can be pigmentary, structural, or bioluminescent. About the structural coloration and associated nanostructures, the authors could cite recent reviews, such as: Seago et al., Interface 2009 + Lloyd and Nadeau, Current Opinion in Genetics & Development 2021 + "Light as matter: natural structural colour in art" by Finet C. 2023. I suggest doing the same for recent reviews that cover pigmentary and bioluminescent coloration in insects. The very recent paper by Nishida et al. in Cell Reports 2023 on butterfly wing color made of pigmented liquid is also unique and worth considering.

      Response: Thank you very much for your careful work. We have made the appropriate modifications.

      Insect coloration can be pigmentary, structural, or bioluminescent. Pigments are mainly synthesized by the insects themselves and form solid particles that are deposited in the cuticle of the body surface and in the wing scales (10, 11). Interestingly, a recent study found that bile pigments and carotenoids are incorporated into the hemolymph and circulate through the wing membranes of two butterflies (Siproeta stelenes and Philaethria diatonica), providing color in the form of liquid pigments (12). Pigments produce color by selective absorption and/or scattering of light, depending on their physical properties (13). By contrast, structural colors, such as metallic colors and iridescence, are generated by optical interference and grating diffraction from microstructures or nanostructures of the body surface or appendages (such as scales) (14, 15). Pigmentary and structural colors are widespread in insects and are visible to the naked eye only under illumination. However, some insects, such as fireflies, emit colors (green to orange) in the dark through bioluminescence (16). Bioluminescence occurs when luciferase catalyzes the oxidation of the small-molecule substrate luciferin (17). In conclusion, insect color patterns have evolved to be highly sophisticated and are closely related to the insects' living environments. For example, cryptic coloration can deceive other animals through close resemblance to the surrounding environment. However, the molecular mechanism by which insects form precise color patterns that match their living environment is still unknown.

      • RNAi approach. I have no doubt that obtaining phenocopies by electroporation might be difficult. However, I find the final sampling a bit limited to draw conclusions from the RT-PCR (n=5 and n=3 for phenocopies and controls). Three control individuals is a very low number. Moreover, it would be nice to see the variability on the plot, using for example violin plots.

      Response: Thank you very much for your careful work. In the RNAi experiment, we injected more than 20 individuals in both the experimental and control groups. We have added the RNAi data to Figure 4.

      Author response table 1.

      • Figure 6. Higher magnification images of Dazao and Bm-mamo knockout are needed, as shown in Figure 5 on RNAi.

      Response: Thank you very much for your careful work. We have added enlarged images.

      Author response image 3.

      • Phylogenetic analysis/Figure S6. I am not sure to what extent the sampling is biased or not, but if not, it is noteworthy that mamo does not show duplicated copies (negative selection?). It might be interesting to discuss this point in the manuscript.

      Response: Thank you very much for your careful work. mamo belongs to the BTB/POZ zinc finger family, which shows significant expansion in vertebrates. For example, there are 3 members in C. elegans, 13 in D. melanogaster, 16 in Bombyx mori, 58 in M. musculus and 63 in H. sapiens (Wu et al., 2019). These members contain conserved BTB/POZ domains but vary in the number and amino acid composition of their zinc finger motifs. Because the zinc finger motifs bind different DNA recognition sequences, family members may differ in their downstream target genes. Therefore, when searching for orthologs in different species, we required high conservation of the zinc finger motif sequences. Under these strict criteria, only one ortholog was found in each species.

      • Differentially-expressed genes and CP candidate genes (line 189-191). The manuscript would gain in clarity if the authors explained their procedure in more detail. For instance, they moved from a list of 191 genes to CP genes only. Can they say a little bit more about the non-CP genes that are differentially expressed? Maybe quantify the number of CPs among the total number of differentially-expressed genes to show that CPs are the main class?

      Response: Thank you very much for your careful work. We have added nr (NCBI Non-Redundant Protein Sequence Database) annotations for the 191 differentially expressed genes (DEGs) to Supplemental Table S3. Among them, there were 19 cuticular protein (CP) genes, 17 antibacterial peptide genes, 6 transporter genes, 5 transcription factor genes, 5 cytochrome genes, 53 enzyme-encoding genes, and others. Because CP genes were significantly enriched among the DEGs, and previous studies have found that BmorCPH24 can affect pigmentation, we first investigated the CP genes.
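      The reviewer asked how CPs compare with the other DEG classes. The minimal Python sketch below, which is not the analysis used in this study, takes only the category counts stated above, computes each class's share of the 191 DEGs, and provides a one-sided hypergeometric test of the kind usually used to support an enrichment claim. The genome-wide background (total annotated genes N and total CP genes K) is not given in this response and must be supplied; all function and variable names are hypothetical.

```python
from math import comb

# Category counts among the 191 DEGs, copied from the response above.
DEG_TOTAL = 191
DEG_CATEGORIES = {
    "cuticular protein": 19,
    "antibacterial peptide": 17,
    "transporter": 6,
    "transcription factor": 5,
    "cytochrome": 5,
    "enzyme-encoding": 53,
}


def category_fractions(counts: dict[str, int], total: int) -> dict[str, float]:
    """Share of the DEG list held by each category, plus the unassigned remainder."""
    named = sum(counts.values())
    fractions = {name: n / total for name, n in counts.items()}
    fractions["others"] = (total - named) / total
    return fractions


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """One-sided enrichment p-value P(X >= k).

    X counts CP genes among n DEGs drawn without replacement from N background
    genes, K of which are CP genes. N and K are hypothetical inputs here.
    """
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


if __name__ == "__main__":
    for name, frac in category_fractions(DEG_CATEGORIES, DEG_TOTAL).items():
        print(f"{name:>22}: {frac:6.1%}")
```

      With the genome-wide gene number and CP count supplied as N and K, `hypergeom_upper_tail(19, 191, K, N)` would give the one-sided p-value for CP enrichment among the DEGs; exact integer arithmetic via `math.comb` avoids rounding problems for small tail probabilities.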

      • Interaction between Bm-mamo. It is not clear why the authors chose to investigate the physical interaction of Bm-mamo protein with the putative binding site of yellow, and not with the sites upstream of tan and DDC. Do the authors test one interaction and assume the conclusion stands for the y, tan and DDC?

      Response: Thank you very much for your careful work. In D. melanogaster, yellow is the most studied pigment gene, and its upstream and intronic sequences have been shown to contain multiple cis-regulatory elements. Because of its important role in pigmentation and the variability of its cis-regulatory sequences among species, yellow has become a model for studying cis-regulatory elements (Laurent Arnoult et al. 2013, Gizem Kalay et al. 2019, Yaqun Xin et al. 2020, Yann Le Poul et al. 2020). We used yellow as an example to illustrate target-gene regulation by Bm-mamo. We have added this description to the discussion.

      Laurent Arnoult et al. Emergence and diversification of fly pigmentation through evolution of a gene regulatory module. Science. 2013 Mar 22;339(6126):1423-6. doi: 10.1126/science.1233749.

      Gizem Kalay et al. Redundant and Cryptic Enhancer Activities of the Drosophila yellow Gene. Genetics. 2019 May;212(1):343-360. doi: 10.1534/genetics.119.301985. Epub 2019 Mar 6.

      Yaqun Xin et al. Enhancer evolutionary co-option through shared chromatin accessibility input. Proc Natl Acad Sci U S A. 2020 Aug 25;117(34):20636-20644. doi: 10.1073/pnas.2004003117. Epub 2020 Aug 10.

      Yann Le Poul et al. Regulatory encoding of quantitative variation in spatial activity of a Drosophila enhancer. Sci Adv. 2020 Dec 2;6(49):eabe2955. doi: 10.1126/sciadv.abe2955. Print 2020 Dec.

      • Please note that some controls are missing for the EMSA experiments. For instance, the putative binding-sites should be mutated and it should be shown that the interaction is lost.

      Response: Thank you very much for your careful work. In this study, we found that the DNA recognition sequence of mamo is highly conserved across multiple species. In D. melanogaster, mamo has been shown to bind directly to an intron of the vasa gene to activate its expression, using the recognition sequence TGCGT (Shoichi Nakamura et al. 2019). We chose a longer sequence containing this core, GTGCGTGGC, to detect mamo binding. This binding mechanism is consistent across species.
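      To check candidate regulatory regions (for example, upstream of yellow, tan, or DDC) for the motif, the short Python sketch below scans a sequence on both strands for exact matches to the core TGCGT motif or the longer GTGCGTGGC probe. It assumes exact matching only (no mismatches and no position weight matrix), the function names are hypothetical, and it is an illustration rather than the pipeline used in this study.

```python
# Core mamo recognition motif reported in D. melanogaster and the longer probe used here.
CORE = "TGCGT"
PROBE = "GTGCGTGGC"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse, giving the opposite strand 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every exact, possibly overlapping, match.

    A '-' hit means the reverse complement of the motif occurs on the given strand.
    Reverse-complement palindromes are reported once, as '+'.
    """
    seq = seq.upper()
    motif = motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        if window == rc and rc != motif:
            hits.append((i, "-"))
    return hits
```

      For example, `find_motif(PROBE, CORE)` returns `[(1, "+")]`, confirming that the probe contains the core motif. The same function can be used to confirm that a mutated probe designed as an EMSA control no longer contains the core on either strand.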

      • Figure 7 and supplementary data. How were the CP names attributed? According to automatic genome annotation of Bm genes and proteins? Based on Drosophila genome and associated gene names? Did the authors perform phylogenetic analyses to name the different CP genes?

      Response: Thank you very much for your careful work. CPs are named according to their conserved motifs and their order on the chromosome. Sequence identification and phylogenetic analysis of silkworm CPs have been carried out in previous reports (Zhengwen Yan et al. 2022, Ryo Futahashi et al. 2008). Members of the same family share sequence similarity across species, and their functions may be similar. We have given the full gene names in the text, for example, changing CPR2 to BmorCPR2.

      Zhengwen Yan et al. A Blueprint of Microstructures and Stage-Specific Transcriptome Dynamics of Cuticle Formation in Bombyx mori. Int J Mol Sci. 2022 May 5;23(9):5155.

      Ningjia He et al. Proteomic analysis of cast cuticles from Anopheles gambiae by tandem mass spectrometry. Insect Biochem Mol Biol. 2007 Feb;37(2):135-46.

      Maria V Karouzou et al. Drosophila cuticular proteins with the R&R Consensus: annotation and classification with a new tool for discriminating RR-1 and RR-2 sequences. Insect Biochem Mol Biol. 2007 Aug;37(8):754-60.

      Ryo Futahashi et al. Genome-wide identification of cuticular protein genes in the silkworm, Bombyx mori. Insect Biochem Mol Biol. 2008 Dec;38(12):1138-46.

      • Discussion. I think the discussion would gain in being shorter and refocused on the understudied role of CPs. Another non-canonical aspect of the discussion is the reference to additional experiments (e.g., parthenogenesis line 290-302, figure S14). This is not the place to introduce more results, and it breaks the flow of the discussion. I encourage the authors to reshuffle the discussion: 1) summary of their findings on mamo and CPs, 2) link between pigmentation mutant phenotypes, pigmentation pattern and CPs, 3) general discussion about the (evo-)devo importance of CPs and link between pigment deposition and coloration. Three important papers should be mentioned here:

      1) Matsuoka Y and A Monteiro (2018) Melanin pathway genes regulate color and morphology of butterfly wing scales. Cell Reports 24: 56-65... Yellow has a pleiotropic role in cuticle deposition and pigmentation.

      2) https://arxiv.org/abs/2305.16628... Link between nanoscale cuticle density and pigmentation

      3) https://www.cell.com/cell-reports/pdf/S2211-1247(23)00831-8.pdf... Variation in pigmentation and implication of endosomal maturation (gene red).

      Response: Thank you very much for your careful work. We have rewritten the discussion section.

      1) We have summarized our findings.

      Bm-mamo may affect the synthesis of melanin in epidermal cells by regulating yellow, DDC, and tan; regulate the maturation of melanin granules in epidermal cells through BmMFS; and affect the deposition of melanin granules in the cuticle by regulating CP genes, thereby comprehensively regulating the color pattern in caterpillars.

      2) We describe the relationships among pigmentation mutant phenotypes, pigmentation patterns, and CPs.

      Previous studies have shown that loss of expression of BmorCPH24, which encodes an important component of the endocuticle, can lead to dramatic changes in body shape and a significant reduction in caterpillar pigmentation (53). We crossed Bo (BmorCPH24 null mutation) with bd to obtain F1 (Bo/+Bo, bd/+) individuals, then intercrossed the F1 and observed the F2 phenotypes. In double mutant (Bo, bd) individuals, the lunar spots and star spots were reduced and light-colored stripes appeared on the body segments, but the other areas still showed significant melanin pigmentation (Fig. S13). By contrast, previous studies showed that introducing Bo by genetic crosses into the L strain (in which ectopic expression of wnt1 generates lunar stripes on each body segment) (24) and the U strain (in which overexpression of SoxD causes excessive melanin pigmentation of the epidermis) (58) markedly reduces their pigmentation (53). Interestingly, pigmentation decreased more in the double mutants (Bo, L) and (Bo, U) than in (Bo, bd). This suggests that Bm-mamo has a stronger ability than wnt1 and SoxD to regulate pigmentation. On the one hand, mamo may be a stronger regulator of the melanin metabolic pathway; on the other hand, mamo may regulate other CP genes that reduce the impact of BmorCPH24 deficiency.

      3) We discuss the (evo-)devo importance of CPs and the relationship between pigment deposition and coloration.

      CP genes usually account for over 1% of the total genes in an insect genome and can be categorized into several families, including CPR, CPG, CPH, CPAP1, CPAP3, CPT, CPF and CPFL (68). The CPR family is the largest group of CPs and contains a chitin-binding domain called the Rebers and Riddiford (R&R) motif (69). Variation in the R&R consensus sequence allows subdivision into three subfamilies (RR-1, RR-2, and RR-3) (70). Among the 28 CPs, we identified 11 RR-1 genes, 6 RR-2 genes, 4 hypothetical cuticular protein (CPH) genes, 3 glycine-rich cuticular protein (CPG) genes, 3 cuticular protein Tweedle motif (CPT) genes, and 1 CPFL (like the CPFs in a conserved C-terminal region) gene. The RR-1 consensus is usually more variable among species than RR-2, which suggests that RR-1 genes may have species-specific functions. RR-2 genes often cluster into several branches, which may be due to gene duplication events in co-orthologous groups and may result in conserved functions between species (71). CPH proteins are so classified because they lack known motifs. In the epidermis of Lepidoptera, CPH genes are often highly expressed; for example, BmorCPH24 has the highest expression level in the silkworm larval epidermis (72). CPG proteins are rich in glycine. CPH and CPG genes are less commonly found in insects outside the order Lepidoptera (73), suggesting that they may provide Lepidoptera-specific functions. CPT proteins contain a Tweedle motif, and the TweedleD1 mutation has a dramatic effect on body shape in D. melanogaster (74). CPFL members are relatively conserved across species and may be involved in the synthesis of larval cuticles (75). CPT and CPFL genes may therefore have relatively conserved functions among insects. CP genes are a group of rapidly evolving genes, and their copy numbers may vary considerably among species.

      In addition, RNAi experiments on 135 CP genes in the brown planthopper (Nilaparvata lugens) showed that knockdown of 32 CP genes leads to significant defective phenotypes, such as lethality and developmental retardation. This suggests that these 32 CP genes are indispensable, whereas other CP genes may have redundant and complementary functions (76). Previous studies found that construction of the silkworm larval cuticle requires the precise expression of over two hundred CP genes (22). The production, interaction, and deposition of CPs and pigments are complex and precise processes, and our research shows that Bm-mamo plays an important regulatory role in this process in silkworm caterpillars. To further understand the role of CPs, future work should aim to identify the functions of important cuticular protein genes and their deposition mechanism in the cuticle.

      Minor comments - Title. At this stage, there is no evidence that Bm-mamo regulates caterpillar pigmentation outside of Bombyx mori. I suggest specifying 'silkworm caterpillars' in the title.

      Response: Thank you very much for your careful work. We have modified the title.

      • Abstract, line 29. Because the knowledge on pigmentation pathway(s) is advanced, I would suggest writing 'color pattern is not fully understood' instead of 'color pattern is not clear'.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 29. I suggest 'the transcription factor' rather than 'a transcription factor'.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 30. If you want to mention the protein, the name 'Bm-mamo' should not be italicized.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 30. 'in the silkworm'.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 31. 'mamo' should not be italicized.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 31. 'in Drosophila' rather than 'of Drosophila'.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 32. Please specify whether the gamete function is conserved in insects or in all animals.

      Response: Thank you very much for your careful work. The sentence was changed to “This gene has a conserved function in gamete production in Drosophila and silkworms and evolved a pleiotropic function in the regulation of color patterns in caterpillars.”

      • Introduction, line 51. I am not sure what the authors mean by 'under natural light'. Please rephrase.

      Response: Thank you very much for your careful work. We have deleted “under natural light”.

      • line 43. I find that the sentence 'In some studies, it has been proven that epidermal proteins can affect the body shape and appendage development of insects' is not necessary here. Furthermore, this sentence breaks the flow of the teaser.

      Response: Thank you very much for your careful work. We have deleted this sentence.

      • line 51-52. 'Greatly benefit them' should be rephrased in a more neutral way. For example, 'color patterns have been shown to be involved in...'.

      Response: Thank you very much for your careful work. We have modified this to “and the color patterns have been shown to be involved in…”

      • line 62. CPs are secreted by the epidermis, but I would say that CPs play their structural role in the cuticle, not directly in the epidermis. I suggest rephrasing this sentence and adding references.

      Response: Thank you very much for your careful work. We have modified “epidermis” to “cuticle”.

      • line 67. Please indicate that the pathways have been identified/reported in Lepidoptera (11). Otherwise, the reader does not know whether you refer to previous biochemical studies in Drosophila, for example.

      Response: Thank you very much for your careful work. We have modified this sentence to: “Moreover, the biochemical metabolic pathways of pigments used for color patterning in Lepidoptera…have been reported.”

      • line 69. Examples of pleiotropic factors and associated references are missing. For example, I suggest adding: engrailed (Dufour, Koshikawa and Finet, PNAS 2020) + antennapedia (Prakash et al., Cell Reports 2022) + optix (Reed et al., Science 2011), etc. References for clawless and abdominal-A are also needed.

      Response: Thank you very much for your careful work. We have made modifications.

      • line 76. The simpler term moth might be enough (instead of Lepidoptera).

      Response: Thank you very much for your careful work. We have modified this to “insect”.

      • line 96. I would simplify the text by writing "Then, quantitative RT-PCR was performed..."

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 112. 'Predict' instead of 'estimate'?

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 113. I would rather indicate the full name first, then indicate mamo between brackets.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 144. The Perl script needs to be made accessible on public repository.

      Response: Thank you very much for your careful work.

      • line 147-150. Too many technical details here. The details are already indicated in the material and methods section. Furthermore, the details break the flow of the paragraph.

      Response: Thank you very much for your careful work. We have modified this section.

      • line 152. Please make the link with the phenotypes observed in Figure 1; simply state that the RNAi phenocopies mimic the mutant alleles.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 153-157. Too many technical details here. The details are already indicated in the material and methods section. Furthermore, the details break the flow of the paragraph.

      Response: Thank you very much for your careful work. We have simplified this paragraph.

      • line 170. Please rephrase 'conserved in 30 species' because it might be understood as conserved in 30 species only, and not in other species.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 182. Maybe explain the rationale behind restricting the analysis to +/- 2kb. Can you cite a paper that shows that most of binding sites are within 2kb from the start codon?

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 182. '14,623 predicted genes'.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 183. '10,622 genes'

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 183. Redundancy. Please remove 'silkworm' or 'B. mori'.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 187. '10,072 genes'

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 188. '9,853 genes'

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 200. "Therefore, the differential...in caterpillars" is a strong statement.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 204. Remove "The" in front of eight key genes. A reference is also needed, perhaps a recent review on the biochemical pathway of melanin in insects.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 220. This sentence is too general and vague. Please make explicit what you mean by "in terms of evolution". Number of insect species? Diversity of niche occupancy? Morphological, physiological diversity?

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 285. The verb "believe" should be replaced by a more neutral one.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 354-355. This sentence needs to be rephrased in a more objective way.

      Response: Thank you very much for your careful work. We have rewritten this sentence.

      • line 378. Missing reference for MUSCLE.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 379. Pearson model?

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 408. "The CRISPRdirect online software was used...".

      Response: Thank you very much for your careful work. We have modified this sentence.

      • Figure 1. In the title, I suggest indicating Dazao, bd, bdf as they appear in the figure. Please specify 'silkworm larval development'.

      Response: Thank you very much for your careful work. We have modified this figure title.

      • Figure 3. In the title, is the word 'pattern' really necessary? In the legend, please indicate the meaning of the acronyms AMSG and PSG.

      Response: Thank you very much for your careful work. We have modified this figure legend.

      • Figure S7A. Typo 'Znic finger 1', 'Znic finger 2', 'Znic finger 3',

      Response: Thank you very much for your careful work. We have fixed these typos.

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary:

      The authors found that genetic and pharmacological inhibition of CERS1, an enzyme implicated in ceramide biosynthesis, worsens muscle fibrosis and inflammation during aging.

      Strengths:

      The study raises an interesting issue by arguing against CERS1 inhibition as a therapeutic strategy for sarcopenia. Overall, the article is well written and clear.

      Weaknesses:

      Many of the experiments confirm previously published data, which also showed a decline of CERS1 in ageing and included the generation and characterization of a muscle-specific knockout mouse line. How the increased amount of very long-chain ceramides (Cer C24) and the decrease in shorter ones (Cer C18) might influence muscle mass, force production, fibrosis and inflammation in aged mice has not been addressed mechanistically.

      We thank the reviewer for the assessment and would like to point out that Cers1 had not previously been studied in the context of aging. Moreover, our unbiased pathway analyses in human skeletal muscle link CERS1 to myogenic differentiation for the first time, which we validate in cell culture systems. To gain mechanistic insight, as suggested by Reviewer #1, we performed additional experiments to examine how Cers1-derived C18 and Cers2-derived C24 ceramide species affect myogenesis. We recently showed that knocking out Cers2 reduces C24:0/C24:1 and promotes muscle cell maturation (PMID: 37118545, Fig. 6m-r and Supplementary Fig. 5e). This suggests that the very long-chain C24 ceramides might indeed drive the effect we see upon Cers1 inhibition, because we observe an accumulation of C24 ceramides upon Cers1 (C18) inhibition (Fig 2B, Fig 3B, Fig 4A, Fig S3E), which is associated with impaired muscle maturation (Fig 4B-C, Fig S3G-I, Fig S4G-I). To study whether impaired muscle cell differentiation upon Cers1 inhibition depends on Cers2, we knocked down Cers1 alone or in combination with Cers2. Gene expression analyses and immunohistochemical validation and quantification show that the reduced muscle cell maturation mediated by Cers1KD is rescued by simultaneous knockdown of Cers2. We propose that reducing Cers1 function during aging might lead to an increase in sphingosine levels, as has been shown previously (PMID: 31692231). Increased sphingosine triggers cell apoptosis due to its toxicity (PMID: 12531554). Therefore, channeling accumulating sphingosine towards C24 ceramides may avoid toxicity but, as we show in this manuscript, will reduce the myogenic potential in muscle. However, if C24 production is also blocked by Cers2 inhibition, sphingosine is forced towards the production of other ceramides that are potentially less toxic and less detrimental to myogenesis. We added these new data to the revised manuscript as new Fig 5D-E and new Fig S5G-I.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Wohlwend et al. investigates the effects of inhibiting ceramide synthase Cers1 on skeletal muscle function during aging. The authors propose a role for Cers1 in myogenesis and age-related sarcopenia. Both pharmacological and AAV-driven genetic inhibition of Cers1 in 18-month-old mice lead to reduced C18 ceramides in skeletal muscle, exacerbating age-dependent features such as muscle atrophy, fibrosis, and centrally nucleated fibers. Similarly, inhibition of the Cers1 orthologue in C. elegans reduces motility and causes alterations in muscle morphology.

      Strengths:

      The study is well-designed, carefully executed, and provides highly informative and novel findings that are relevant to the field.

      Weaknesses:

      The following points should be addressed to support the conclusions of the manuscript.

      (1) It would be essential to investigate whether P053 treatment of young mice induces age-dependent features besides muscle loss, such as muscle fibrosis or regeneration. This would help determine whether the exacerbation of age-dependent features depends solely on Cers1 inhibition or is associated with other factors related to the age-dependent decline in cell function. Additionally, considering the reported role of Cers1 in whole-body adiposity, it is necessary to present data on body weight and fat mass in P053-treated aged mice.

      We thank the reviewer for suggesting that we study Cers1 inhibition in young mice. In fact, a previous study showed that muscle-specific Cers1 knockout in young mice impairs muscle function (PMID: 31692231). Similar to our observations, these authors reported reduced muscle fiber size and muscle force. Therefore, we do not believe that the effects of Cers1 inhibition we observed in aged mice are specific to aging, although the phenotypic consequences are accentuated in aged mice. As requested by the reviewer, we provide the body weights and fat mass of the mice (Author response image 1A-B). The reduced fat mass upon P053 treatment is in line with previously reported reductions in fat mass upon Cers1 inhibition in young mice fed a chow or high-fat diet (PMID: 30605666, PMID: 30131496), again suggesting that the effect of Cers1 inhibition might not be specific to aging.

      Author response image 1.

      (A-B) Body mass (A) and fat mass as % of body mass (B) measured by EchoMRI in 22mo C57BL/6J mice intraperitoneally injected with DMSO or P053 (n=7-12 per group). (C-D) Grip strength measurements in all limbs (C) or only the forelimbs (D) in 24mo C57BL/6J mice intramuscularly injected with AAV9 particles containing scramble or Cers1-targeting shRNA (n=8 per group). (E-F) Pax7 gene expression in P053- or AAV9-treated mice (n=6-7 per group) (E), or in mouse C2C12 muscle progenitor cells treated with 25nM scramble or Cers1-targeting shRNA (n=8 per group) (F). (G) Proliferation as measured by luciferase intensity in mouse C2C12 muscle cells treated with 25nM scramble or Cers1-targeting shRNA (n=24 per group). Each column represents one biological replicate. (H) Overlaid FACS traces of Annexin-V (BB515, left) and propidium iodide (Cy5, right) in mouse C2C12 myotubes treated with 25nM scramble or Cers1-targeting shRNA (n=3 per group). Quantification (right): early apoptosis (Annexin+/PI-), late apoptosis (Annexin+/PI+), necrosis (Annexin-/PI+), viability (Annexin-/PI-). (I) Normalized Cers2 gene expression in mouse C2C12 muscle cells treated with 25nM scramble or Cers1-targeting shRNA (n=6-7 per group). (J-K) Representative mitochondrial respiration traces of digitonin-permeabilized mouse C2C12 muscle cells treated with DMSO or P053 (J), with quantification of basal, ATP-linked, and proton leak respiration as well as spare capacity and maximal capacity-linked respiration (n=4 per group). (L) Reactive oxygen species production in mitochondria of mouse C2C12 muscle cells treated with DMSO or P053. (M) Enriched gene sets related to autophagy and mitophagy in muscles of 24mo C57BL/6J mice intramuscularly injected with AAV9 particles containing scramble or Cers1-targeting shRNA (left), or intraperitoneally injected with DMSO or P053 (right). Color gradient indicates normalized effect size; dot size indicates statistical significance (n=6-8 per group). (N) Representative confocal Proteostat® stainings with quantifications of DMSO- and P053-treated mouse muscle cells expressing APPSWE (top) and human primary myoblasts isolated from patients with inclusion body myositis (bottom). (O) Stillness duration during a 90-second interval in adult day 5 C. elegans treated with DMSO or 100uM P053. (P) Lifespan of C. elegans treated with DMSO or P053 (n=144-147 per group; for method details see main manuscript page 10).

      (2) As grip and exercise performance tests evaluate muscle function across several muscles, it is not evident how intramuscular AAV-mediated Cers1 inhibition solely in the gastrocnemius muscle can have a systemic effect or impact different muscles. This point requires clarification.

      The grip strength measurements presented in the manuscript come from hindlimb grip strength, as stated in the Methods section. We also measured grip strength in all four limbs and in the forelimbs only (Author response image 1C-D). Forelimb strength did not change; only hindlimb grip strength was significantly different in AAV-Cers1KD compared to the scramble control AAV (Fig 3I), which is in line with the fact that we injected the AAV only in the hindlimbs. This is similar to our previous data, where we saw altered muscle function upon intramuscular (IM) AAV delivery in the gastrocnemius (PMID: 34878822, PMID: 37118545). Given its size, the gastrocnemius likely makes the largest contribution to hindlimb grip strength, and possibly even to overall grip strength, as suggested by a trend towards reduced grip strength in all four limbs (Author response image 1C). We also suspect that the hindlimb muscles make the largest contribution to uphill running, as we also observed an effect on running performance. While we carefully injected a minimal amount of AAV into the gastrocnemius to avoid leakage, we cannot completely rule out that some AAV might have spread to other muscles. We added this information to the discussion of the manuscript as a potential limitation of the study.

      (3) To further substantiate the role of Cers1 in myogenesis, it would be crucial to investigate the consequences of Cers1 inhibition under conditions of muscle damage, such as cardiotoxin treatment or eccentric exercise.

      While it would be interesting to study Cers1 in the context of muscle regeneration, and possibly mouse models of muscular dystrophy, we think such work would go beyond the scope of the current manuscript.

      (4) It would be informative to determine whether the muscle defects are primarily dependent on the reduction of C18-ceramides or the compensatory increase of C24-ceramides or C24-dihydroceramides.

      To gain mechanistic insight, as suggested by Reviewer #2, we performed additional experiments to examine how Cers1-derived C18 and Cers2-derived C24 ceramide species affect myogenesis. We recently showed that knocking out Cers2 reduces C24:0/C24:1 and promotes muscle cell maturation (PMID: 37118545, Fig. 6m-r and Supplementary Fig. 5e). This suggests that the very long-chain C24 ceramides might indeed drive the effect we see upon Cers1 inhibition, because we observe an accumulation of C24 ceramides upon Cers1 (C18) inhibition (Fig 2B, Fig 3B, Fig 4A, Fig S3E), which is associated with impaired muscle maturation (Fig 4B-C, Fig S3G-I, Fig S4G-I). To study whether impaired muscle cell differentiation upon Cers1 inhibition depends on Cers2, we knocked down Cers1 alone or in combination with Cers2. Gene expression analyses and immunohistochemical validation and quantification show that the reduced muscle cell maturation mediated by Cers1KD is rescued by simultaneous knockdown of Cers2. These data, together with our previous results showing that Degs1 knockout reduces myogenesis (PMID: 37118545, Fig. 6s-x and Fig. 7), suggest that C24/dhC24 might contribute to the age-related impairments in myogenesis. We added these new results to the revised manuscript as new Fig 5D-E and new Fig S5G-I.

      (5) Previous studies from the research group (PMID 37118545) have shown that inhibiting the de novo sphingolipid pathway by blocking SPTLC1-3 with myriocin counteracts muscle loss and that C18-ceramides increase during aging. In light of the current findings, certain issues need clarification and discussion. For instance, how would myriocin treatment, which reduces Cers1 activity through upstream inhibition of the pathway, have a positive effect on muscle? Additionally, it is essential to explain how the reduction of Cers1 gene expression with aging (Fig. 1B) relates to the age-dependent increase in C18-ceramides (PMID 37118545).

      Blocking the upstream enzyme of the ceramide pathway (SPT1) shuts down the entire pathway, which is overactive in aging, and therefore seems beneficial for muscle aging. While most enzymes in the ceramide pathway that we have studied so far (SPTLC1, CERS2) revealed muscle benefits in terms of myogenesis, inflammation (PMID: 35089797; PMID: 37118545) and muscle protein aggregation (PMID: 37196064), the CERS1 enzyme shows opposite effects. This is also visible in the direction of CERS1 expression compared to the other enzymes in one of our previously published studies (PMID: 37118545, Fig. 1e and Fig. 1f). In the current study, we show that, as opposed to the inhibition of Sptlc1 or Cers2, Cers1 inhibition indeed exacerbates age-related inflammation and impairments in myogenesis. As the reviewer points out, both C18- and C24-ceramides seem to accumulate upon muscle aging. We think this is due to an overall overactive ceramide biosynthesis pathway. Blocking C18-ceramides via Cers1 inhibition results in the accumulation of C24-ceramides and worsens muscle phenotypes (see reply to question #4). On the other hand, blocking C24-ceramides via Cers2 inhibition improves muscle differentiation. These observations, together with the finding that Cers1-mediated inhibition of muscle differentiation depends on proper Cers2 function (new Fig 5D-E, new Fig S5G-I), point towards C24-ceramides as the main culprit of reduced muscle differentiation. Hence, at least a significant part of the benefits of blocking SPTLC1 might have been related to reducing very long-chain ceramides. We believe that reduced Cers1 expression in skeletal muscle upon aging, observed by us and others (PMID: 31692231), might reflect a compensatory mechanism to make up for an overall overactive ceramide flux in aged muscles. Reducing Cers1 function during aging might lead to an increase in sphingosine levels, as has been shown previously (PMID: 31692231). Increased sphingosine triggers cell apoptosis due to its toxicity (PMID: 12531554). Therefore, channeling accumulating sphingosine towards C24 ceramides may avoid toxicity but, as we show in this manuscript, will reduce the myogenic potential in muscle. However, if C24 production is also blocked by Cers2 inhibition (new Fig 5D-E, new Fig S5G-I), sphingosine is forced towards the production of other ceramides that are potentially less toxic and less detrimental to myogenesis. These data have now been added to the revised manuscript (see page 7). Details were added to the discussion of the manuscript (see page 8).

      Addressing these points will strengthen the manuscript's conclusions and provide a more comprehensive understanding of the role of Cers1 in skeletal muscle function during aging.

      Reviewer #1 (Recommendations For The Authors):

      The authors found that genetic and pharmacological inhibition of CERS1, an enzyme implicated in ceramide biosynthesis, worsens muscle fibrosis and inflammation during aging.

      Even though many of the experiments only confirm previously published data (ref 21, 11, 37, 38), which also showed a decline of CERS1 in ageing and included the generation and characterization of a muscle-specific knockout mouse line, the study raises an interesting issue by arguing against CERS1 inhibition as a therapeutic strategy for sarcopenia, and it opens new questions about how inhibition of SPTLC1 (upstream of CERS1) has beneficial effects in healthy aging (ref 15, published by the same authors).

      Overall, the article is well written and clear. However, there is a major weakness: how the increased amount of very long-chain ceramides (C24) and the decrease in shorter ones (Cer C18) might influence muscle mass, force production, fibrosis and inflammation in aged mice has not been addressed mechanistically. At the present stage, the manuscript is descriptive and confirms the CERS1-mediated function in preserving muscle mass. The authors should consider the following points:

      Comments:

      (1) Muscle data

      (a) The effect of CERS1 inhibition on myotube formation must be better characterized. Which step of myogenesis is affected? Is impaired stem cell renewal, MyoD+ myoblast replication/differentiation, myoblast fusion, or increased cell death the major cause of the small myotubes? Minor points: Figure S1C: show the C14:00 level at 200 h; text of Fig S2A and 1F: MRF4 and Myogenin are not early genes in myogenesis, please correct; Fig S2B and 2C: changes in transcripts do not necessarily mean changes in protein or myotube differentiation, and therefore the authors must test myotube formation and myosin expression.

      Cers1 inhibition seems to affect differentiation and myoblast fusion. To test the other suggested mechanisms, we performed the following experiments. Inhibiting Cers1 in mice, either systemically with the pharmacological Cers1 inhibitor P053 or by intramuscular delivery of AAV expressing a short hairpin RNA (shRNA) against Cers1, did not affect Pax7 transcript levels (Author response image 1E). Moreover, we did not observe an effect of shRNA targeting Cers1 on Pax7 levels in mouse C2C12 muscle progenitor cells (Author response image 1F). To characterize the effect of Cers1 inhibition on muscle progenitor proliferation/renewal, we treated C2C12 muscle progenitors with scramble shRNA or shRNA targeting Cers1 and measured proliferation using CellTiter-Glo (Promega). Cers1KD had no significant effect on cell proliferation (Author response image 1G). Next, we assayed cell death in differentiating Cers1-deficient C2C12 myotubes by FACS analysis of Annexin V (left) and propidium iodide (right) staining. We found no difference in early apoptosis, late apoptosis, necrosis, or muscle cell viability, suggesting that cell death can be ruled out as an explanation for the smaller myotubes (Author response image 1H). These findings support the notion that the inhibitory effect of Cers1 knockdown on muscle maturation is primarily based on effects on myogenesis rather than on apoptosis. Our data in the manuscript also suggest that Cers1 inhibition affects myoblast fusion, as shown by reduced myonucleation upon Cers1KD (Fig S3H right, Fig S5I).

      (b) The phenotype of CERS1 knockdown is milder than that of P053-treated mice (Fig S5D and Figure 3F, 3H are not significant) despite similar changes in Cer18:0, Cer24:0 and Cer24:1 concentrations in muscle. Why?

      Increases in very long-chain ceramides were in fact larger upon P053 administration than upon AAV-mediated knockdown. For example, Cer24:0 levels increased by >50% upon P053 administration, compared to 20% upon AAV injection. Moreover, dhC24:1 increased 6.5-fold vs 2.5-fold upon P053 vs AAV treatment, respectively. These differences might not only explain the slightly attenuated phenotypes in the AAV-treated mice but also underline the notion that very long-chain ceramides might cause muscle deterioration. We believe that inhibiting the enzymatic activity of Cers1 (P053) is a more efficient strategy to reduce ceramide levels than degrading Cers1 transcripts. However, we cannot completely rule out multi-organ, systemic effects of P053 treatment beyond its direct effect on muscle. We added these details to the discussion of the revised manuscript (see page 8 of the revised manuscript).

      (c) The authors talk about a possible compensation by the CERS2 isoform, but they never showed CERS2 mRNA expression or protein levels after treatment. Is CERS2 more highly expressed when CERS1 is downregulated in skeletal muscle?

      We appreciate the suggestion of the reviewer. We found no change in Cers2 mRNA levels upon Cers1 inhibition in mouse C2C12 myoblasts (Author response image 1I). We would like to point out that mRNA abundance might not be the optimal readout for enzymes, because enzymatic activity is not necessarily reflected by transcript levels. We therefore think metabolite levels are a better proxy of enzymatic activity. It should also be pointed out that “compensation” might not be an accurate description, as the sphingoid base substrate might simply be more available upon Cers1KD, so that more substrate is present for Cers2 to synthesize very long chain ceramides. This “re-routing” has been previously described in the literature and hypothesized to serve to avoid toxic (dh)sphingosine accumulation (PMID: 30131496). We therefore changed the wording in the revised manuscript to be more precise.

      (d) Force measurement of AAV CERS1-downregulated muscles could be a plus for the study (to assay contractile function).

      In the current study we measured grip strength in mice, which has previously been shown to be a good proxy of muscle strength and general health (PMID: 31631989). Indeed, our results of reduced muscle grip strength are in line with previous work showing reduced contractility in muscles of Cers1-deficient mice (PMID: 31692231).

      (e) How are degradation pathways affected by the downregulation of CERS1? Is autophagy/mitophagy affected? How are mTOR and protein synthesis affected? A recent paper showed that CerS1 silencing leads to a reduction in C18:0-Cer content, with a subsequent increase in the activity of the insulin pathway and an improvement in skeletal muscle glucose uptake. Could it be that CERS1 downregulation increases mTOR signalling and decreases the autophagy pathway? Measuring autophagic flux using colchicine in vivo would be useful to test this hypothesis.

      Cers1 in skeletal muscle has indeed been linked to metabolic homeostasis (see PMID: 30605666). In line with their finding in young mice, we also find reduced fat mass upon P053 treatment in aged mice (Author response image 1A-B). We also examined mitochondrial bioenergetics upon blocking Cers1 with P053 using O2k oxygraphy (Author response image 1J-L). The results show that Cers1 inhibition in mouse muscle cells increases mitochondrial respiration, similar to what has been shown before (PMID: 30131496). However, we also found that reactive oxygen species production in mouse muscle cells is increased upon P053 treatment, suggesting the presence of dysfunctional mitochondria upon inhibiting Cers1 with P053. We next examined the mitophagy/autophagy degradation pathways suggested by the reviewer and did not find convincing evidence that Cers1 has a major impact on autophagy- or mitophagy-derived gene sets in mice treated with shRNA against Cers1 or with the Cers1 pharmacological inhibitor P053 (Author response image 1M).

      We then assessed the effect of Cers1 inhibition on transcript levels related to mTORC1/protein synthesis, as suggested by the reviewer. Cers1 knockdown in differentiating mouse muscle cells showed only a weak trend to reduce mTORC1 and its downstream targets (new Fig S4A). In line with this, there was no notable difference in protein synthesis in differentiating, Cers1-deficient mouse C2C12 myoblasts as assessed by L-homopropargylglycine (HPG) amino acid labeling using confocal microscopy (new Fig S4B) or FACS analyses (new Fig S4C). However, Cers1KD increased transcripts related to the myostatin-Foxo1 axis as well as the ubiquitin proteasome system (e.g. atrogin-1, MuRF1) (new Fig S4D), suggesting that Cers1 inhibition increases protein degradation. We added these details to the revised manuscript on page 7. We recently implicated the ceramide pathway in regulating muscle protein homeostasis (PMID: 37196064). Therefore, we assessed the effect of Cers1 inhibition with the P053 pharmacological inhibitor on protein folding in muscle cells using the Proteostat dye, which intercalates into the cross-beta spine of quaternary protein structures typically found in misfolded and aggregated proteins. Interestingly, inhibiting Cers1 further increased misfolded proteins in C2C12 mouse myoblasts expressing the Swedish mutation in APP and in human myoblasts isolated from patients with inclusion body myositis (Author response image 1N). These findings suggest that Cers1 deficiency might upregulate protein degradation to compensate for the accumulation of misfolded and aggregating proteins, which might contribute to the impaired muscle function observed upon Cers1 knockdown. Further studies are needed to disentangle the underlying mechanisms.

      (f) The balance of ceramides has been found to play roles in mitophagy and fission, with an impact on cell fate and metabolism. Did the authors check how mitochondrial morphology, mitophagy, or mitochondrial dynamics (fission and fusion) are altered in CERS1 knockdown muscles? There is growing evidence that mitochondrial dysfunction contributes to the development of fibrosis and inflammation.

      Previously, CERS1 has been studied in the context of metabolism and mitochondria (for reference, please see PMID: 26739815, PMID: 29415895, PMID: 30605666, PMID: 30131496). In summary, these studies demonstrate that C18 ceramide levels are inversely related to insulin sensitivity in muscle and mitochondria, and that Cers1 inhibition improves insulin-stimulated suppression of hepatic glucose production and reduces high-fat diet-induced adiposity. Moreover, improved mitochondrial respiration, increased citrate synthase activity, and increased energy expenditure were reported upon Cers1 inhibition. Lack of Cers1 specifically in skeletal muscle was also reported to improve systemic glucose homeostasis. While these studies agree on the effect of Cers1 inhibition on fat loss, results on glucose homeostasis and insulin sensitivity differ depending on whether a pharmacologic or a genetic approach was used to inhibit Cers1. The current manuscript describes the effect of CERS1 on muscle function and myogenesis because these were the pathways most strongly correlated with CERS1 in human skeletal muscle (Fig 1C), and the impact of Cers1 on these pathways is poorly studied, particularly in the context of aging. We therefore refer to the studies mentioned above for the effects of CERS1 on mitochondria and metabolism.

      (2) C.elegans data:

      (a) The authors used a maternal RNAi protocol to knock down lagr-1 and showed altered muscle morphology at day 5. They also exposed worms to the drug P053 from the L4 stage. Furthermore, the authors also used a transgenic ortholog, lagr-1, to perform the experiments. All of these approaches consistently showed reduced movement. It would be important to show rescue of the muscle phenotype by overexpressing the CERS1 ortholog in knockdown transgenic animals.

      We used RNAi to knock down the Cers1 orthologue, lagr-1, in C. elegans. Therefore, we do not have transgenic animals. Overexpressing lagr-1 in the RNAi-treated animals would also not be possible, as the RNA from the overexpression construct would simply be degraded.

      (b) The authors showed data on the distance traveled by C. elegans. It would be interesting to specify whether body bends, reversals, and stillness are affected in RNAi and transgenic knockdown worms.

      As suggested by the reviewer, we measured thrashing and stillness and found reduced thrashing (new Fig S5B) and a trend towards increased stillness (Author response image 1O) in P053-treated worms on day 5 of adulthood, which is the day we observed significant differences in muscle morphology and movement (Fig 4D-E, Fig S5A). These data are now included in the revised manuscript.

      (c) Is there an effect on lifespan extension by knocking down CERS1?

      We performed two independent lifespan experiments in C. elegans treated with the Cers1 inhibitor P053 and found reduced lifespan in both replicate experiments (for the second replicate, see Author response image 1P). We added these data to the revised manuscript as new Fig 4H.

      How do the authors explain the beneficial effect of sptlc1 inhibition on healthy muscle aging? Please discuss this further in the article, even if no definitive explanation is possible at the moment.

      We believe that blocking the upstream enzyme of the ceramide pathway (SPT1) shuts down the entire pathway, which is overactive in aging, and is therefore more beneficial for muscle aging. Our current work suggests that at least a significant part of the Sptlc1-KD benefits might stem from blocking very long chain ceramides. While inhibition of SPTLC1 and CERS2 revealed muscle benefits in terms of myogenesis, inflammation (PMID: 35089797; PMID: 37118545), and muscle protein aggregation (PMID: 37196064), the CERS1 enzyme shows opposite effects, which is also visible in Fig 1e and Fig 1f of PMID: 37118545. In the current study, we show that Cers1 inhibition indeed exacerbates aging defects in myogenesis and inflammation, as opposed to the inhibition of Sptlc1 or Cers2. The fact that the inhibitory effect of Cers1 on muscle differentiation depends on the clearance of Cers2-derived C24-ceramides suggests that reducing very long chain ceramides might be crucial for healthy muscle aging. We added these details to the discussion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their time and their insightful and constructive comments. We are pleased that the reviewers found that this study “opens the way for novel future work” and the findings “interesting”. We have experimentally addressed the points raised by the reviewers and have substantially revised the manuscript, modifying 30 figure panels. The reviewers’ points are specifically addressed below.

      1) The authors concluded that an accumulation of Ly6Clo monocytes occurred in the Rbpjfl/fl Lyz2cre/cre mouse by examining the percentage of cells among CD45+ cells in Figure 1. It would be helpful if the authors could give an account of the total cell count numbers of monocyte subsets per ml of blood and in the bone marrow to give the readers a better idea of the extent of increase as cell percentages among CD45+ cells may be influenced by the number of other immune subsets.

      We thank the reviewer for raising these points. In this research, we crossed Rbpjfl/fl mice with Lyz2-Cre mice, which carry Cre recombinase inserted in the lysozyme M (Lyz2) gene locus; this results in the selective deletion of RBP-J in myeloid cells such as monocytes, macrophages, and granulocytes. We then examined neutrophil levels in the bone marrow and blood. The percentage of neutrophils was similar to that of control mice, in line with the findings reported in the literature (Metzemaekers et al. 2020). Furthermore, the proportion of Ly6Chi monocytes in RBP-J deficient mice was similar to that of control mice, which is consistent with the literature (Ginhoux et al. 2014). Based on these results, we reasoned that the changes observed in the proportion of Ly6Clo monocytes reliably indicate the alterations occurring in Ly6Clo monocytes within the Rbpjfl/flLyz2cre/cre mice.

      2) The authors demonstrated no significant differences in bone marrow progenitor and monocyte numbers, and therefore concluded that monocyte egress from the bone marrow did not contribute to the increase in Ly6Clo monocyte numbers in the blood (Figure 1B-D). As the exact increase in cell number in the blood is unclear, the changes in bone marrow monocyte numbers might be too small to be reflected in their percentage calculations. Given that CCR2 was also found to play a role in Ly6Clo monocyte homeostasis in Rbpjfl/fl Lyz2cre/cre mice, could the authors demonstrate, through transwell experiments, whether Rbpj-deficient Ly6Clo monocytes are more responsive to CCL2? This would also provide readers with a more in-depth mechanism of how increased CCR2 on Rbpj-deficient Ly6Clo monocytes leads to their accumulation in the periphery.

      The experimental results regarding the proportion of monocytes and precursor cells in the bone marrow were derived from multiple experiments. Neither the data from individual experiments nor the final integrated data revealed significant differences between control mice and Rbpjfl/flLyz2cre/cre mice. Therefore, we believe that even small changes in cell numbers would have been reflected in altered proportions. We attempted transwell experiments, but unfortunately they were not technically successful: nearly all sorted Ly6Clo monocytes attached to the transwell membrane, making it challenging to draw a conclusion regarding the responsiveness of RBP-J deficient Ly6Clo monocytes to CCL2.

      3) In the parabiosis experiment conducted in Figure 3C-E, the authors provide conclusive evidence that the accumulation of Rbpj-deficient Ly6Clo monocytes was cell intrinsic as Rbpj-deficient Ly6Clo monocytes continued to accumulate in the blood of control counterparts. Monocytes have also been shown to accumulate in the spleen and re-enter or home back to the bone marrow. Assessing if there is a change in monocyte homing abilities in Rbpj-deficient Ly6Clo monocytes by examining their numbers in the spleen and bone marrow of control parabiotic mice would substantiate their claims that the defect was cell intrinsic and provide further understanding for the readers of why Rbpj-deficient Ly6Clo monocytes accumulate in the blood.

      We thank the reviewer for raising this interesting point. We also analyzed the proportions of GFP- Ly6Chi monocytes and Ly6Clo monocytes in the bone marrow of parabiotic mice. There were no significant differences in the proportion of GFP- monocytes between the control mice and the KO animals (see the figure A below). We also measured the expression of CXCR4 in bone marrow Ly6Clo monocytes. Rbpjfl/flLyz2cre/cre mice exhibited normal expression of CXCR4 (see Author response image 1 below), which participates in the homing of classical and nonclassical monocytes to bone marrow and spleen monocyte reservoirs (Chong et al. 2016). These results suggest that the homing ability of RBP-J deficient Ly6Clo monocytes is not altered.

      Author response image 1.

      4) Authors should provide cell counts for Figure 5B to demonstrate the extent CCR2 depletion affects the number of Ly6Clo monocytes in Rbpjfl/fl Lyz2cre/cre mice as explained in point 1.

      As mentioned above, we believe that the proportion of circulating monocytes provides, to some extent, evidence of the impact of CCR2 deficiency on Ly6Clo monocytes.

      Reviewer #2

      1) The confirmation of knockout in supplemental figure 1A shows only a two-thirds knockdown, when expression should be almost totally gone. Perhaps poor primer design, cell sorting error, or low Cre penetrance is to blame, but this is below the standard one would expect from a knockout.

      Kang et al (PMID: 31944217) evaluated the knockout efficiency of Rbpj in sorted colonic macrophages of Rbp-jfl/flLyz2cre/cre mice using qPCR and immunoblotting. The qPCR result indicated a two-thirds knockdown, while the immunoblotting results demonstrated efficient deletion of RBP-J protein in Rbp-jfl/flLyz2cre/cre mice. As pointed out by the reviewer, the observed two-thirds knockdown, which is lower than the expected complete knockout, may be attributable to primer design.

      2) Many figures (e.g. 1A) only show proportional data (%) when the addition of cell numbers would also be informative

      We thank the reviewer for bringing up these points. Indeed, multiple articles studying monocytes show only changes in cell proportions. As mentioned above, we believe that analyzing the proportion of circulating monocytes offers valuable evidence of the influence of RBP-J deficiency on Ly6Clo monocytes.

      3) Many figures only have an n of 1 or 2 (e.g. 2B, 2C)

      Here, we employed annexin V (AnnV) and propidium iodide (PI) staining to evaluate apoptosis and cell death in Ly6Chi and Ly6Clo blood monocytes from control and RBP-J deficient mice. The results showed no significant difference in the levels of apoptosis and cell death between the two groups (see Author response image 2 below). Statistical analysis of Ki-67 expression across multiple experiments likewise showed no significant difference between control and RBP-J deficient mice (see the figure B below). In Figure 2C, each dot represents 2-3 mice, and no differences were observed between control and RBP-J deficient mice at multiple time points during the repeated measurements.

      Author response image 2.

      4) Sometimes strong statements were based on the lack of statistical significance, when more n number could have changed the interpretation (e.g. 2G, 3E)

      We have derived the corresponding conclusions based on the observed experimental results.

      5) There is incomplete analysis (e.g. network analysis) and interpretation of the RNA-sequencing results (figure 4); the difference between the genotypes in both monocyte subsets would provide a more complete picture and potentially reveal mechanisms.

      We thank the reviewer for raising this point. We agree that a more comprehensive analysis, including a comparison between the genotypes in both monocyte subsets, would provide a deeper understanding and potentially uncover underlying mechanisms. Having observed alterations in blood Ly6Clo monocytes in RBP-J deficient mice, our primary focus was on analyzing the differentially expressed genes within this monocyte subset to gain further insight into its specific characteristics and behavior. We have also deposited the sequencing data sets in the Gene Expression Omnibus under accession number GSE208772 so that interested researchers can access and download the data.

      6) The experiments in Figures 5 and 7 are missing a control (Lyz2cre/cre Ccr2RFP/RFP or the Rbpj+/+ versions) and may have been misinterpreted. For example if the control (RBP-J WT, CCR2 KO) was used then it would almost certainly show falling Ly6C low numbers compared to RBP-J WT CCR2 WT, but RBP-J KO CCR2 KO would still have more Ly6c low monocytes than RBP-J WT, CCR2 KO - meaning that the RBP-J function is independent of CCR2. I.e. Ly6c low numbers are mostly dependent on CCR2 but this is irrespective of RBP-J.

      The diminished Ly6Clo monocytes in Rbpjfl/flLyz2cre/creCcr2RFP/RFP (DKO) mice can be divided into two distinct subpopulations: one portion originates from Ly6Chi monocytes, while the other comprises Ly6Clo monocytes characterized by heightened CCR2 expression. The Ly6Clo monocytes that remain in DKO mice exhibit CCR2 expression levels within the normal range when compared to Lyz2cre/cre mice, but lower levels compared to RBP-J deficient mice (Figure 5A). These findings suggest that RBP-J exerts regulatory influence over Ly6Clo monocytes, at least in part, through CCR2.

      7) Figure 6 was difficult to interpret because of the lack of shown gating strategy. This reviewer assumes that alveolar macrophages were gated out of analysis

      The gating strategy of lung interstitial macrophage in the manuscript Figure 6 was consistent with the published work (Schyns et al, cited in the manuscript). We also measured alveolar macrophages (AM) from control and RBP-J deficient mice bronchoalveolar lavage fluid. At the resting state, RBP-J deficient mice exhibited normal AM frequency and number (see Author response image 3 below).

      Author response image 3.

      8) The statements around Figure 7 are not completely supported by the evidence, i) a significant proportion of CD16.2+ cells were CCR2 independent and therefore potentially not all recently derived from monocytes, and ii) there is nothing to suggest that the source was not Ly6C high monocytes that differentiated - the manuscript in general seems to miss the point that the source of the Ly6C low cells is almost certainly the Ly6C high monocytes - which further emphasises the importance of both cells in the sequencing analysis

      Schyns et al and Sabatel et al showed that the numbers of IM and CD16.2+ cells were similar in Ccr2-sufficient and Ccr2-/- mice, demonstrating that CD16.2+ cells were Ccr2 independent. The number of CD16.2+ cells was significantly reduced in Rbpjfl/flLyz2cre/creCcr2RFP/RFP mice compared to Rbpjfl/flLyz2cre/cre mice, in line with the decreased numbers of lung Ly6Clo monocytes and blood Ly6Clo monocytes, showing that CD16.2+ cells depended on Ccr2 for their presence in Rbpjfl/flLyz2cre/cre mice.

      9) The authors did not refer to or cite a similar 2020 study that also investigated myeloid deletion of Rbpj (Qin et al. 2020 - https://doi.org/10.1096/fj.201903086RR). Qin et al identified that Ly6Clo alveolar macrophages were decreased in this model - it is intriguing to synthesise these two studies and hypothesise that the ly6c low monocytes steal the lung niche, but this was not discussed

      We thank the reviewer for bringing this study to our attention. According to their findings, myeloid-specific RBP-J deficiency resulted in a decrease in Ly6CloCD11bhi alveolar macrophages but an increase in Ly6CloCD11blo alveolar macrophages after bleomycin treatment, while the total number of alveolar macrophages showed no significant difference. These results suggest that RBP-J may play a role in regulating the balance between these specific alveolar macrophage subsets in response to bleomycin-induced injury, without affecting the overall population of alveolar macrophages. This may be different from what we observe in interstitial macrophages under resting conditions.

      Reviewer #3

      1) It is curious that the authors do not see the increase in circulating monocytes reflected in the spleen however, the n-number is 2. Increasing the n-number would enable the author to understand the data which is not interpretable at the moment. There are multiple other places in which a low n-number makes it hard to fully understand the biology (eg Figure 2C&E)

      Although we counted the number of splenic monocyte subsets in only two mice, the proportion of splenic monocyte subsets was calculated from a larger number of mice in our study.

      2) Given that Ly6Clow monocytes are thought to be longer lived than Ly6C+ monocytes, and there is still considerable labelling of Ly6Clow monocytes at the end of the 96 hours analysed in the EdU experiment, it is not possible to determine from the data here whether RBPJ deficiency increases life span. Could it be that differences in %EdU+ cells would only be seen at later time points, and would become apparent if the timeline was extended?

      In the latex bead experiment, the proportion of latex+ Ly6Clo monocytes at day 7 did not differ between control and RBP-J deficient mice, indicating that the lifespan of Ly6Clo monocytes was not increased.

      3) Similarly for the latex bead experiment. Given that there is only n=2 at the first time point and only ~30% of Ly6Clow monocytes are Latex+, it is very hard to conclusively claim that RBP-J does not influence monocyte survival or proliferation. An interesting experiment to assess whether RBP-J is increasing monocyte survival could be an adoptive transfer model in which Ly6Clow monocytes are injected into a congenic mouse and tracked over time.

      In RBP-J deficient mice, there was an increase in the proportion of Ly6Clo monocytes. We hypothesized that, even with this lower proportion of latex+ cells, differences would be easy to observe; however, no differences were observed between control and RBP-J deficient mice in our experiment.

      4) RNA-seq: Ccr2 and Itgax are not the top hits. The authors do not investigate the top hits which may provide very interesting insight into how RBP-J influences monocyte biology.

      We thank the reviewer for raising these points. We also analyzed some of the top changed genes. The top two genes in the downregulated gene list are Hes1 and Nrarp, which are regulated by the Notch pathway (Krebs et al 2001 and Radtke et al 2010). We tested blood monocytes, but the monocyte subset populations displayed no differences between Hes1fl/flRbp-jfl/flLyz2cre/cre and Rbp-jfl/flLyz2cre/cre mice (data not shown). As shown in Figure 2- figure supplement 1A, expression of Nr4a1 showed no significant differences between control and RBP-J deficient mice. The top gene in the upregulated gene list is Erdr1, which has been reported to play a role in cellular survival (Soto et al 2017), whereas blood monocyte subsets in RBP-J deficient mice displayed normal survival.

      5) The PCA plot in figure 4C- it would be interesting to see where all the biological replicates fall.

      We agree with the reviewer’s assessment that observing the positions of all biological replicates on the PCA plot may indeed yield valuable insights. However, it is worth noting that the upregulated and downregulated genes also offer suggestive hints.

      6) Based on CCR2 expression and CD11c expression, monocytes from RBP-J deficient mice look more like Ly6C+ monocytes - could it be that RBP-J is increasing conversion from Ly6C+ monocytes to Ly6Clow? Or could it be that Ly6Clow monocytes are heterogeneous and RBP-J is increasing survival or conversion of one subtype of Ly6Clow monocytes but looking at all Ly6Clow monocytes together is masking this?

      Ly6Clo monocytes can be subdivided into different subpopulations depending on surface markers such as CD43, MHC-II, CD11c, and CCR2 (Jakubzick et al 2013 and Ginhoux et al. 2014). Carlin et al found that a subset of blood Ly6Clow cells was independent of both Ccr2 and Nr4a1. As the reviewer notes, Ly6Clo monocytes are heterogeneous. Therefore, altered survival in a certain subpopulation of Ly6Clo monocytes is possible.

      7) The data presented here suggest that lung CD16.2+ interstitial macrophages are derived from Ly6Clow monocytes which are increased via CCR2. Although the data are suggestive, they are not conclusive, lineage tracing and CCR2 blockade or better, conditional CCR2 deficiency would help to strengthen the claim.

      Schyns et al showed that the number of CD16.2+ cells was similar in Ccr2-sufficient and Ccr2-/- mice, demonstrating that CD16.2+ cells were Ccr2 independent. However, the number of CD16.2+ cells was significantly reduced in Rbpjfl/flLyz2cre/creCcr2RFP/RFP mice compared to Rbpjfl/flLyz2cre/cre mice, in line with the decreased numbers of lung Ly6Clo monocytes and blood Ly6Clo monocytes. Moreover, the turnover of lung Ly6Chi and Ly6Clo monocytes was normal. These results indicate that CD16.2+ cells depended on Ccr2 for their presence in Rbpjfl/flLyz2cre/cre mice.

      8) The figures could do with more headings/ more detailed legends to help the reader, for example including what is BM, what is blood, what is spleen. Figure 2E needs the days labelled on or above the histograms.

      We thank the reviewer for raising this important point. We have now added more detailed legends to the figures.

      9) Gating strategies should be included to help the reader understand which cells you are looking at, especially for Figure 6&7.

      The gating strategy for Figures 6 and 7 followed the method reported in the literature, which included the identification of alveolar macrophages. Additionally, we labeled the markers for cell populations in the figure.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study aims to understand the malaria antigen-specific cTfh profile of children and adults living in a malaria holoendemic area. PBMC samples from children and adults were unstimulated or stimulated with PfSEA-1A or PfGARP in vitro for 6h and analysed by a cTfh-focused panel. Unsupervised clustering and analysis on cTfh were performed.

      The main conclusions are:

      (1) the cohort of children has more diverse (cTfh1/2/17) recall responses compared to the cohort of adults (mainly cTfh17) and

      (2) Pf-GARP stimulates better cTfh17 responses in adults, thus a promising vaccine candidate.

      Strengths:

      This study is in general well-designed and with excellent data analysis. The use of unsupervised clustering is a nice attempt to understand the heterogeneity of cTfh cells. Figure 9 is a beautiful summary of the findings.

      Weaknesses:

      (1) Most of my concerns relate to the use of PfSEA-1A and PfGARP to analyse the cTfh in vitro stimulation response. In vitro stimulation of cTfh cells has been frequently used (e.g. Dan et al, PMID: 27342848), usually by antigen stimulation for 9h followed by analysis of CD69/CD40L expression, or for 18h followed by analysis of CD25/OX40. However, the authors use a different strategy that has not been validated to analyse in vitro stimulated cTfh. Also, they excluded CD25+ cells, which might be activated cTfh. I am concerned about whether the conclusions based on these results are reliable.

      It has been shown by Dan et al that cTfh cells hardly produce cytokines. However, in this paper, the authors report significant secretion of IL-4 and IFNg by some cTfh clusters after 6h stimulation. If the stimulation is antigen-specific through the TCR, why do cTfh1 cells upregulate IL-4 but not IFNg in Figure 6? I believe that including representative FACS plots of IL-4, IFNg, and IL21 staining, and using %positive rather than MFI, would make the conclusion more convincing. Similarly, the authors should validate whether TCR stimulation in their system for 6h can induce robust BCL6/cMAF expression in cTfh cells. Moreover, there is no CD40L expression. Does this mean that TCR stimulation-mediated BCL6/cMAF upregulation and cytokine secretion precede CD40L expression?

      In summary, I am particularly concerned about the method used to analyse PfSEA-1A and PfGARP-specific cTfh responses because it lacks proper validation. I am unsure if the conclusions related to PfSEA-1A/PfGARP-specific responses are reliable.

      An unfortunate reality of these types of complex immunologic studies is that it takes time to optimize a multiparameter flow cytometry panel, run this number of samples, and then conduct the analysis (not to mention the time it takes for a manuscript to be accepted for peer-review). An unexpected delay, frankly, was the COVID-19 pandemic when non-essential research lab activities were put on hold. We designed our panel in 2019 and referred to the “T Follicular Helper Cells” Methods and Protocols book from Springer 2015. Obviously the field of human immunology took a huge leap forward during the pandemic as we sought to characterize components of protective immunity, and as a result there are several new markers we will choose for future studies of Tfh subsets. We agree with the reviewer that cytokine expression kinetics differ depending on the in vitro stimulation conditions. Due to small blood volumes obtained from healthy children, we were limited in the number of timepoints we could test. However, since we were most interested in IL21 expression, we found 6 hrs to be the best in combination with the other markers of interest during our optimization experiments. We did find IFNg expression from non-Tfh cells, therefore we believe our stimulation conditions worked.

      Dan et al used stimulated tonsil cells to assess CXCR5<sup>pos</sup>PD1<sup>pos</sup>CD45RA<sup>neg</sup> Tfh and CXCR5<sup>neg</sup> CD45RA<sup>neg</sup> non-Tfh, whereas in our study we evaluated CXCR5<sup>pos</sup>PD1<sup>pos</sup>CD45RA<sup>neg</sup> Tfh from PBMCs. The PBMC work of Dan et al used EBV/CMV or other pathogen products as stimuli and gated only on CD25<sup>pos</sup>OX40<sup>pos</sup> cells, which are not the cells we assess in our study. This might explain in part the differences in cytokine kinetics, as we evaluated CD25<sup>neg</sup> PBMCs only. However, we agree that more recent studies focused on CXCR5<sup>pos</sup>PD1<sup>pos</sup> cells included more activation-induced marker (AIM) markers, which are missing from our study and limit the depth of our analysis.

      Percentage of positive cells and MFI are complementary data. The percentage of positive cells indicates only which cells express the marker of interest, without giving a quantitative value of this expression. MFI indicates how much of the marker of interest is expressed by cells, which is important as it can indicate the degree of activation or exhaustion per cell. Meta-cluster analysis is not ideal for assessing the percentage of positivity, whereas it does provide essential information regarding the intensity of expression. We added supplemental figures 14 (Bcl6 and cMAF), 15 (IFNg and IL21) and 16 (IL4 and IL21), in which the percentage of positive cells was manually gated directly from the total CXCR5<sup>pos</sup>CD4<sup>pos</sup>CD45RA<sup>neg</sup>CD25<sup>neg</sup> TfH based on the FMO or negative control, and we overlaid the positive cells on the UMAP of all the CXCR5<sup>pos</sup>CD4<sup>pos</sup>CD45RA<sup>neg</sup>CD25<sup>neg</sup> meta-clusters. Results from the manual gating are consistent with the results we show using clustering. However, manual gating helps to better visualize that antigen-specific IL21 expression was statistically significant in children, whereas the high background observed for adults did not reveal higher expression after stimulation, perhaps suggesting an upper threshold of cytokine expression (supplemental figure 15). The following sentence has been added to the methods at the end of the “OMIQ analysis” section: “However, the percentage of IFN𝛾-, IL-4-, IL-21-, Bcl6-, or cMAF-positive cells using manual gating can be found in Supplemental Figures 14, 15, and 16, along with the overlay of the gated positive cells on the CD4<sup>pos</sup>CXCR5<sup>pos</sup>CD25<sup>neg</sup> UMAP and the cytoplots of the gated positive cells for each meta-cluster.”
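      To illustrate why the two read-outs are complementary, the sketch below computes both from the per-cell intensities of one marker in one sample. It is a minimal illustration, not the OMIQ workflow: it assumes a single positivity threshold taken as a high quantile of the FMO control (the 0.99 quantile is a hypothetical choice), and it uses the median as the MFI, whereas some pipelines report the mean. All function names are hypothetical.

```python
import statistics


def fmo_threshold(fmo_values, quantile=0.99):
    """Hypothetical gating rule: the given quantile of the FMO control intensities."""
    ordered = sorted(fmo_values)
    index = min(int(quantile * len(ordered)), len(ordered) - 1)
    return ordered[index]


def percent_positive(values, threshold):
    """Dichotomous read-out: percentage of cells above the threshold."""
    positives = sum(1 for v in values if v > threshold)
    return 100.0 * positives / len(values)


def mfi(values):
    """Continuous read-out: median fluorescence intensity over all cells."""
    return statistics.median(values)


# Two hypothetical samples with the same fraction of positive cells
# but very different expression intensity on the positive cells.
sample_a = [100, 200, 1500, 1600]
sample_b = [100, 200, 5000, 6000]
threshold = 1000
for name, sample in (("a", sample_a), ("b", sample_b)):
    print(name, percent_positive(sample, threshold), mfi(sample))
```

      Both samples are 50% positive, yet their MFIs (850 and 2600) differ about threefold, which is the kind of per-cell intensity difference that the percentage alone cannot capture.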

      Indeed, cMAF can be induced by TCR signaling, ICOS and IL6 (Imbratta et al., 2020). However, in our study populations, ICOS was expressed (see Author response image 1, panel A) in the absence of any stimulation, suggesting that CXCR5<sup>pos</sup>CD4<sup>pos</sup>CD25<sup>neg</sup>CD45RA<sup>neg</sup> cells were already capable of expressing cMAF. After gating Bcl6- and cMAF-positive cells based on their FMOs (Author response image 1, panels B and C, respectively), we overlaid the positive cells on the CXCR5<sup>pos</sup>CD4<sup>pos</sup>CD25<sup>neg</sup>CD45RA<sup>neg</sup> UMAP. Most of our cells already expressed cMAF alone (Author response image 1, panel D) or co-expressed cMAF and Bcl6 (Author response image 1, panel E), confirming that they are TfH cells, whereas very few cells expressed Bcl6 alone (Author response image 1, panel F). Because we knew that cT<sub>FH</sub> cells already express Bcl6 and cMAF, we focused our analysis on the intensity of their expression to assess whether our vaccine candidates induced more expression of these transcription factors.

      Author response image 1.

      (2) The section between lines 246-269 is confusing. In line 249, comparing the abundance after antigen stimulation is improper because 6h stimulation (under Golgi stop) should not induce cell division. I think the major conclusions are contained in Figure 5e: (A) antigen stimulation does not alter the cell number in each cluster, and (B) children have more MC03 and MC06 and fewer MC02, etc. The authors should consider removing the statements between lines 255-259 because the trends are the same regardless of stimulation.

      We agree that there is no cell division after 6h and that the meta-clusters did not proliferate during such a short in vitro stimulation. In the context of cluster analysis, ‘abundance’ refers to the contribution of events by each group to the concatenated data. After the meta-clusters are defined and then deconvoluted by study group, certain meta-clusters can be more abundant in one group than in another, meaning that this group contributed more events to that particular meta-cluster.

      Dimensionality reduction is more nuanced than manual gating and reveals a continuum of marker expression between cell subsets, as there is no hard “straight line” threshold such as the one used in 2D gating. Because of this, changes in marker expression levels after stimulation can shift cells from one cluster to another, thereby changing cluster abundance without any proliferation.
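      The point that abundance can change without cell division can be made concrete with a small sketch. It assumes that each cell carries a study-group label and a meta-cluster label, and it only computes descriptive proportions; it is not the edgeR differential-abundance test used in the manuscript, and the function and labels are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_abundance(events):
    """events: iterable of (group, meta_cluster) pairs, one per cell.

    Returns {group: {meta_cluster: percentage of that group's cells}}.
    """
    totals = Counter()
    counts = defaultdict(Counter)
    for group, cluster in events:
        totals[group] += 1
        counts[group][cluster] += 1
    return {
        group: {cluster: 100.0 * n / totals[group] for cluster, n in counts[group].items()}
        for group in totals
    }


# The same four hypothetical cells before and after stimulation: no cell divides,
# but one cell's marker expression shifts it from MC02 to MC09.
unstimulated = [("child", "MC02"), ("child", "MC02"), ("child", "MC09"), ("child", "MC09")]
stimulated = [("child", "MC02"), ("child", "MC09"), ("child", "MC09"), ("child", "MC09")]
print(cluster_abundance(unstimulated)["child"]["MC09"])
print(cluster_abundance(stimulated)["child"]["MC09"])
```

      The number of cells is unchanged (four in each condition), yet MC09 rises from 50% to 75% of the group's events, which is what a shift in phenotype, rather than proliferation, looks like in this kind of analysis.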

      To clarify how this type of analysis is interpreted, we have modified lines 255-259 as follows:

      “In contrast, the quiescent PfSEA-1A- and PfGARP-specific cT<sub>FH</sub>2-like cluster (MC02) was significantly more abundant in adults compared to children (Figure 5c and 5d, pf<0.05). Interestingly, following PfGARP stimulation, the activated cT<sub>FH</sub>1/17-like subset (MC09) became more abundant in children compared to adults (Figure 5d, pf<0.05 with a False Discovery Rate=0.08), but no additional subsets shifted phenotype after PfSEA-1A stimulation (Figure 5c).”

      Reviewer #2 (Public Review):

      Summary:

      Forconi et al explore the heterogeneity of circulating Tfh cell responses in children and adults from malaria-endemic Kenya, and further compare such differences following stimulation with two malaria antigens. In particular, the authors also raised an important consideration for the study of Tfh cells in general, which is the hidden diversity that may exist within the current 'standard' gating strategies for these cells. The utility of multiparametric flow cytometry as well as unbiased clustering analysis provides a potentially potent methodology for exploring this hidden depth. However, the current state of analysis presented does not aid the understanding of this heterogeneity. This main goal of the study could hopefully be achieved by putting all the parameters used in one context, before dissecting such differences into their specific clinical contexts.

      Strengths:

      Understanding the full heterogeneity of Tfh cells in the context of infection is an important topic of interest to the community. The study included clinical groupings such as age group differences and differences in response to different malaria antigens to further highlight context-dependent heterogeneity, which offers new knowledge to the field. However, improvements in data analyses and presentation strategies should be made in order to fully utilize the potential of this study.

      Weaknesses:

      In general, most studies using multiparameter analysis coupled with an unbiased grouping/clustering approach aim to describe differences between all the parameters used for defining groupings, prior to exploring differences between these groupings in specific contexts. However, the authors have opted to separate these into sections using "subset chemokine markers", "surface activation markers" and then "cytokine responses", yet nuances within all three of these major groups were taken into account when defining the various Tfh identities. Thus, it would make sense to show how all of these parameters are associated with one another within one specific context to first logically establish to the readers how we can better define Tfh heterogeneity. When presented this way, some of the less clear identities, such as "MC03/MC04/MC05/MC08", may even be better revealed. Once established, all of these clusters can then be explored in further detail to understand cluster-specific differences in children vs adults, and in the various stimulation conditions. Since the authors also showed that many of the activation markers were not significantly altered post-stimulation, there is no real obstacle to merging the entire dataset for the first part of this study, which is to define Tfh heterogeneity in an unbiased manner regardless of age groups or stimulation conditions. Other studies using similar approaches such as Mathew et al 2020 (doi: 10.1126/science.abc8) or Orecchioni et al 2017 (doi: 10.1038/s41467-017-01015-3) can be referred to for more effective data presentation strategies.

      Accordingly, the expression of cytokines and transcription factors can only be reliably detected following stimulation. However, the underlying background responses need to be taken into account for understanding "true" positive signals. The only raw data for this was shown in the form of a heatmap where no proper ordering was given to ensure that readers can easily interpret the expression of these markers following stimulation relative to no stimulation. Thus, it is difficult to reliably interpret any real differences reported without this. Finally, the authors report differences in either cluster abundance or cluster-specific cytokine/ transcription factor expression in Tfh cell subsets when comparing children vs adults, and between the two malaria antigens. The comparisons of cytokine/transcription factor between groups will be more clearly highlighted by appropriately combining groupings rather than keeping them separate as in Figures 6 and 7.

      Thank you for sharing these references. Similar to the SPADE clustering and viSNE dimensionality-reduction algorithms used in Orecchioni et al, our FlowSOM algorithm with consensus meta-clustering used all the extracellular markers from our panel, which includes both the chemokine receptors and the activation markers, even though they are presented separately in figures 3 and 4 of our manuscript. This was explained in the methods section (lines 573-587). We then chose the UMAP algorithm for visual dimensionality reduction of the meta-clusters generated by FlowSOM consensus meta-clustering, as explained in the “OMIQ analysis” subsection of our methods (lines 588-604). Therefore, we believe we have conducted the analysis as this reviewer suggests, even though we chose to show the figures that were informative to our story. The heatmaps make it possible to see which combinations of markers do or do not respond to the different conditions and between groups, and all the raw data are presented in supplemental figures 10 to 13, where bar plots show the differences expressed in the heatmaps. We believe this strengthens our interpretation of the results.

      Regarding the transcription factor and cytokine background, we added supplemental figures 14, 15 and 16, in which we used manual gating to select Bcl6-, cMAF-, IFNg-, IL21- or IL4-positive cells directly from total CXCR5<sup>pos</sup>CD4<sup>pos</sup>CD45RA<sup>neg</sup>CD25<sup>neg</sup> TfH cells based on the FMO or negative control, and we overlaid the positive cells on the UMAP of all the CXCR5<sup>pos</sup>CD4<sup>pos</sup>CD45RA<sup>neg</sup>CD25<sup>neg</sup> meta-clusters. Moreover, all the dot plots (with their statistics) used for the heatmaps in figures 6 and 7 can be found in supplemental figures 10, 11, 12 and 13. These supplemental figures address the concerns above by showing the differences in signal between unstimulated and stimulated conditions.

      Reviewer #3 (Public Review):

      Summary:

      The goal of this study was to carry out an in-depth granular and unbiased phenotyping of peripheral blood circulating Tfh specific to two malaria vaccine candidates, PfSEA-1A and PfGARP, and correlate these with age (children vs adults) and protection from malaria (antibody titers against Plasmodium antigens). The authors further attempted to identify any specific differences in the Tfh responses to these two distinct malaria antigens.

      Strengths:

      The authors had access to peripheral blood samples from children and adults living in a malaria-endemic region of Kenya. The authors studied these samples using in vitro restimulation in the presence of specific malaria antigens. The authors generated a very rich data set from these valuable samples using cutting-edge spectral flow cytometry and a 21-plex panel that included a variety of surface markers, cytokines, and transcription factors.

      Weaknesses:

      - Quantifying antigen-specific T cells by flow cytometry requires the use of either 1- tetramers or 2- in vitro restimulation with specific antigens followed by identification of TCR-activated cells based on de-novo expression of activation markers (e.g. intracellular cytokine staining and/or surface marker staining). Although authors use an in vitro restimulation strategy, they do not focus their study on cells de-novo expressing activation markers as a result of restimulation; therefore, their study is not really on antigen-specific cTfh. Moreover, the authors report no changes in the expression of activation markers commonly used to identify antigen-specific T cells upon in vitro restimulation (including IFNg and CD40L); therefore, it is not clear if their in vitro restimulation with malaria antigens actually worked.

      We understand the reviewer’s point of view and apologize for any confusion. IFNg was expressed but was not statistically different between groups. Using manual gating, we found that IFNg increased upon stimulation in CD8 T cells, whereas the increase in CD4<sup>pos</sup>CXCR5<sup>pos</sup> cells was not statistically significant (supplemental figure 15, panel C), confirming our primary observation using clustering analysis. These results showed that our malaria antigens induced an IFNg response in some participants, but not all of them, revealing heterogeneity in this response among individuals within the same group.

      Regarding CD40L, supplemental figure 7 shows that some of our meta-clusters expressed more CD40L upon stimulation, but again without leading to statistical differences between groups. Combined with the increased expression of other cytokines and transcription factors, this shows that our stimulation did indeed work. However, because of the high variation within groups, there were no statistical differences across our groups. Because CD40L is not the only marker of specific T cell activation, and not all T cells respond through this marker alone, a more comprehensive multimarker AIM panel might have highlighted differences between groups. We recognize the limitations of our study and believe that future studies will benefit from more activation markers commonly used to identify antigen-specific T cells, such as CD69, OX40 and 4-1BB (AIM panel), among other markers.

      - CXCR5+CD4+ memory T cells have been shown to present multi-potency and plasticity, capable of differentiating to non-Tfh subsets upon re-challenge. Although authors included in their flow panel a good number of markers commonly used in combination to identify Tfh (CXCR5, PD-1, ICOS, Bcl-6, IL-21), they only used one single marker (CXCR5) as their basis to define Tfh, thus providing a weak definition for Tfh cells and follow up downstream analysis.

      Sorry for the confusion. Even though we subsampled CD4<sup>pos</sup>CXCR5<sup>pos</sup>CD25<sup>neg</sup> cells to run our FlowSOM, we showed the different levels of expression across meta-clusters (figure 4, panels A and B) of PD1 (Tfh being PD1-positive cells) and ICOS (indicating the activation stage of the Tfh, “T Follicular Helper Cells” Methods and Protocols book from Springer 2015). We also included an overlay of the manually gated Bcl6-cMAF double-positive cells on the CXCR5<sup>pos</sup>CD45RA<sup>neg</sup>CD25<sup>neg</sup> CD4 T cell UMAP plot to show that most of them express Bcl6 (supplemental figure 14). Interestingly, the manually gated IL21-positive cells were less abundant, particularly for children (supplemental figure 15). Because we were not able to include all the markers that are now used to define Tfh cells, we referred to our cell subsets as “TFH-like”. This is an acknowledged limitation of our study. Due to the limited blood volume obtained from children and the cost of running multiplex flow cytometry assays, our results showing antigen-specific heterogeneity of Tfh subsets will have to be validated in future studies that include these additional defining markers.

      - Previous works have used FACS-sorting and in vitro assays for cytokine production and B cell help to study the functional capacity of different cTfh subsets in blood from Plasmodium-infected individuals. In this study, authors do not carry out any such assays to isolate and evaluate the functional capacity of the different Tfh subsets identified. Thus, all the suggestions for the role that these different cTfh subsets may have in vivo in the context of malaria remain highly hypothetical.

      Unfortunately, the low blood volumes obtained from children prevented us from running in vitro functional assays, and the study design did not allow us to correlate them with protection. However, since the function of Tfh subsets identified in malaria-exposed individuals has been evaluated using Pf lysates in other studies, we referenced those studies when interpreting the differences we reported in Tfh subset recognition between malaria antigens. If either of these antigens moves forward into vaccine trials, then evaluating their function would be important.

      - The authors have not included malaria unexposed control groups in their study, and experimental groups are relatively small (n=13).

      Our study design did not include the recruitment of malaria-naive negative controls, as its goal was to assess malaria antigen-specific responses to the potential new vaccine targets PfSEA-1A and PfGARP, comparing their quality and abundance between malaria-exposed children and adults. We did, however, test 3 malaria-naive adults and found no non-specific activation after stimulation with these two malaria antigens. Since this was done as part of our assay optimization, we did not feel the need to show these negative findings.

      Even with our small sample size, we demonstrated significant age-associated differences in malaria antigen-specific responses from cT<sub>FH</sub>-like subsets.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor points are:

      (1) Line 88, cTfh cells are not only from GC-Tfh, they have GC-independent origin (He et al, PMID: 24138884).

      The following sentence was added at line 88: “Interestingly, cT<sub>FH</sub> cells can also come from peripheral cT<sub>FH</sub> precursor CCR7<sup>low</sup>PD1<sup>high</sup>CXCR5<sup>pos</sup> cells; thus, they also have a GC-independent origin (He, Cell, 2013 PMID: 24138884).”

      (2) I believe all participants were free of blood-stage infection upon enrolment. But can authors clearly state this information between lines 151-159?

      We mentioned in the methods, lines 495-496, that “Participants were eligible if they were healthy and not experiencing any symptoms of malaria at the time venous blood was collected”. However, using qPCR we found 5 children with blood-stage malaria infection. As shown in Author response image 2, when comparing malaria-free to blood-stage-infected children, no differences were observed without stimulation. However, MC03 was more abundant upon malaria antigen stimulation in the blood-stage group, whereas MC04 was more abundant in the malaria-free group upon PfGARP stimulation only, confirming that our stimulation worked.

      Author response image 2.

      Reviewer #3 (Recommendations For The Authors):

      (1) The strategy for gating on antigen-specific cTfh cells needs to be revised. The correct approach would be to gate on those cells that respond by de-novo expression of activation markers upon antigen restimulation (also termed activation-induced markers. e.g. CD69, CD40L, CXCL13 and IL-21, Niessl 2020; CD69, CD40L, CD137 and OX40, Lemieux 2023; CD137 and OX40, Grifoni 2020). As it stands, the study is not really on antigen-specific T cells, but rather on the overall CD4 T cell compartment plus or minus antigenic stimulation.

      We recognize the limitation in our flow panel design that prevents us from performing this gating. We originally based our panel design on the “T follicular helper cells methods and protocols” book (Springer 2015), which used CD45RA, CD25, CXCR5, CCR6, CXCR3, CCR7, ICOS and PD1 to define cT<sub>FH</sub>. We had already optimized our 21-color panel, purchased reagents and started to run our experiments by the time the publications of Niessl, Lemieux and Grifoni modified how TFH cells are defined. We optimized and performed our assay from November 2019 to March 2020, finishing running the samples during the first quarantine. Because of the urgent need for research on SARS-CoV-2, in which we were involved from that time onward, the analysis of our TFH work was substantially delayed. Moreover, 2020 was also the year in which many TFH papers came out with better ways to define cT<sub>FH</sub> and responses to antigen stimulation. In our future studies, our panel will include AIM.

      (2) It is not clear if the antigenic stimulation actually worked. Does the proportion of IFNg+ or IL-4+ or IL-21+ or CD40L+ or CD25+ CD4 or CD8 T cells increase following in vitro antigen restimulation?

      Yes, using manual gating, we were able to show an increase in IL4 (supplemental figure 16, panels B and C) and IL21 (supplemental figure 15, panels J and K) production in both children and adults. However, we did not observe significant production of IFNg (supplemental figure 15, panel C) or changes in CD40L expression (supplemental figure 7) after malaria antigen stimulation, although our positive control, SEB, worked. So, our stimulation assay worked, but these 2 malaria antigens did not significantly induce these cytokines. This could be because the responses are too low to detect in every participant, since these are single antigens rather than the whole parasite lysates used in other studies. It could also be that these antigens do not stimulate CD40L or IFNg in all our participants. We raised this limitation in the discussion (line 473) as follows: “Although the heterogeneity in the response of CD40L and IFNγ suggests that our tested malaria antigens did not induce significant differences in the expression of these markers in all our participants, our panel did not include other activation-induced markers, such as OX40, 4-1BB, and CD69”.

      (3) It is not clear what is the proportion of cTfh over the total CD4 T cell compartment among the different groups. Does this vary among different groups? It would be valuable to display this as an old-fashioned combination of contour plots with outliers for illustrating flow cytometry and bar graphs for the cumulative data.

      The proportion of CD3<sup>pos</sup>CD4<sup>pos</sup>CD25<sup>neg</sup>CXCR5<sup>pos</sup> cTfh cells among total CD4 T cells did not differ between groups (figure 2).

      (4) The gating strategy could be refined and become more robust if adding additional markers in combination with CXCR5 for identifying cTfh (e.g. CXCR5+Bcl6+).

      Thank you for this suggestion. An overlay of Bcl6 expression can be found in supplemental figure 14 where we confirm that our CXCR5+ cT<sub>FH</sub>-like subsets express cMAF and Bcl6.

      (5) The protocols for intracellular and intranuclear staining seem to be incomplete in Materials and Methods. In particular, cell permeabilization strategies seem to be missing.

      Our apologies for this oversight; we added the following sentences to the methods at line 545: “Cells were fixed and permeabilized for 45 mins using the transcription factor buffer set (BD Pharmingen) followed by a wash with the perm-wash buffer. Intracellular staining was performed at 4 °C for 45 more mins followed by two washes using the kit’s perm-wash buffer”.

      (6) In Materials and Methods, the authors mention they have used fluorescence minus one control to set their gating strategy. It would be valuable to show these, either on the main body or as part of supplementary figures.

      We added the cytoplots of the FMOs and/or negative controls, as appropriate, to supplemental figures 14 (cMAF and Bcl6), 15 (IFNg and IL21) and 16 (IL4 and IL21).

      (7) Line 194 and Figure 3, it is not clear the criteria that the authors used for down-sampling events before FlowSOM analysis. Was this random? Was this done with unstimulated or stimulated samples?

      We chose to down-sample on CD3<sup>pos</sup>CD4<sup>pos</sup>CD25<sup>neg</sup>CD45RA<sup>neg</sup>CXCR5<sup>pos</sup> cells prior to running FlowSOM so that the cluster analysis focused only on the differences among those cells. The down-sampling used 1,000 CD3<sup>pos</sup>CD4<sup>pos</sup>CD25<sup>neg</sup>CD45RA<sup>neg</sup>CXCR5<sup>pos</sup> cells from each fcs file (unstimulated and stimulated samples). If an fcs file had more than 1,000 CXCR5<sup>pos</sup> cells, the down-sampling was done randomly by the OMIQ platform algorithm to select only 1,000 CXCR5<sup>pos</sup> cells within that specific fcs file. This last sentence was added to the methods at line 593.
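      The down-sampling rule described above can be sketched as follows. This is an illustration under stated assumptions, not OMIQ's implementation: it assumes uniform random sampling without replacement, assumes that files with 1,000 or fewer gated cells are kept whole, and adds a hypothetical `seed` argument so that the selection is reproducible.

```python
import random


def downsample(events, n=1000, seed=None):
    """Keep at most n events from one fcs file.

    Files with n or fewer events are returned unchanged; otherwise n events are
    drawn uniformly at random without replacement.
    """
    events = list(events)
    if len(events) <= n:
        return events
    rng = random.Random(seed)
    return rng.sample(events, n)


# Hypothetical gated CXCR5-positive events from two files (integers stand in for cells).
files = {"unstimulated": list(range(5000)), "stimulated": list(range(800))}
subsampled = {name: downsample(cells, n=1000, seed=42) for name, cells in files.items()}
print({name: len(cells) for name, cells in subsampled.items()})
```

      Capping each file at the same number of events keeps any single participant or condition from dominating the concatenated data used for clustering.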

      (8) Lines 201, 202: As it stands, the authors' take on the role of different cTfh subsets during infection remains highly speculative. Are these differences in cTfh phenotypes actually reflected in their in vitro capacity to provide B cell help (e.g. as in the Obeng-Adjei 2015 paper) or to produce IL-21, express co-stimulatory molecules, or any other characteristic that would allow one to better infer their functional roles during infection? Any additional in vitro analysis of the functional capacity of isolated cTfh subsets identified in this research would greatly increase its value.

      We agree with the reviewer that this sentence is speculative, and we have rephrased it as follows: “First, we found different CXCR5 expression levels between meta-clusters (Figure 3b); CXCR5 is essential for cT<sub>FH</sub> cells to migrate to the lymph nodes and interact with B-cells”. We would have liked to perform in vitro functional assays. However, as explained above, we did not have sufficient cells collected from children to do so.

      (9) It is not clear why authors omitted IL-17 and did not use IFNg and IL-4 to refine their definition of Th1, Th2 and Th17 cTfh.

      We would have liked to include IL-17; however, we were constrained by having access only to a 4-laser cytometer at the time we ran our assay. Because we needed to prioritize markers, and because cTfh1 cells had been shown, when we were designing our flow panel, to be preferentially activated during episodes of acute febrile malaria in children (Obeng-Adjei), we chose to focus on IFNg and IL4 to differentiate Tfh1 from Tfh2, in addition to other markers as surrogates of functional potential. We did not use IFNg and IL4 to refine our definition of Tfh1, Tfh2 and Tfh17, as recent publications have shown that IL4 is expressed not only in Tfh2 but also, at lower intensity, in the other Tfh subsets (Gowthaman among others). Therefore, IFNg and IL4 by themselves were not sufficient to properly define the different Tfh subsets. In future studies, we plan to include transcription factor profiles (T-bet, BATF, GATA3) to further refine the definitions of Tfh subsets.

      (10) Lines, 226, 228, based on the combination of markers that the MC03 subset expresses, it is tempting to think that this is the only "truly" committed Tfh subset from the entire analysis. Please, discuss.

      If the reviewer is referring to changes in marker expression levels indicating that these cells have not reached a level of differentiation that would make them reliable (i.e., “true”) Tfh cells, we agree that this is an important question now that we have technology that can measure and analyse so many phenotypic markers at once. This highlights the need for the scientific method: replicating study findings to determine whether they are consistent given the same study design and experimental conditions.

      (11) Lines 243 244, Again, is this reflected in functional capacity?

      The study described in this manuscript did not include functional assays. However, this does not change the key finding that different malaria antigens behaved differently, demonstrating heterogeneity in Tfh recognition of malaria antigens. Regarding CD40L expression, we did not observe differences between groups; however, some individuals showed increased CD40L expression (supplemental figure 7). It is possible that some individuals responded through other activation-induced markers (CD69, ICOS, OX40, 4-1BB among others) and that our stimulation was not long enough to assess CD40L expression upon malaria antigen stimulation. We have addressed this limitation by editing lines 243-244 as follows: “we were unable to find statistical differences in CD40L expression between groups as only a few individuals responded through it (supplemental figure 7).”

      (12) Lines 243, 244, Are these cTfh subsets exclusively detected in malaria-exposed individuals? This is confounded by the lack of a malaria unexposed control group in this study, which would have been highly valuable.

      We agree with the reviewer that having malaria-naive children as a negative control group would have been valuable. However, this study was conducted in Kenya, where all children are suspected to have had at least one malaria infection. We also did not have ethical approval or the means to enroll children in the USA who would not have been exposed to malaria as a negative control group. Since we were also evaluating differences by age group, comparing US adults would not have helped to address this point. Therefore, this remains an open question that might be addressed by another study recruiting children in non-malaria-endemic areas.

      (13) Line 267, as the authors have not gated on T cells de-novo expressing activation markers in response to antigen restimulation, how do they know these are indeed antigen-specific cTfh?

      OMIQ analysis accounts for marker expression levels in the resting cells (unstimulated well) for each individual compared to each experimental/stimulated well. The algorithm computationally determines whether the expression level changed without an arbitrary positive threshold, keeping expression levels as a continuous rather than a dichotomous variable, which is the power of unbiased cluster analyses. Therefore, we conclude that these cells are antigen-specific based on the statistical difference in expression intensity between the resting and the stimulated cells. Nevertheless, manual gating of “de-novo” responding cells produced the same results as assessing the MFI of each meta-cluster (supplemental figures 14, 15 and 16).
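      As a simplified illustration of comparing each participant's stimulated well with their own unstimulated well, the sketch below computes a background-subtracted MFI per participant and applies an exact two-sided sign test to the paired differences. The sign test is a deliberately simple stand-in chosen for this example; it is not the statistical procedure used by OMIQ or in the manuscript, and the participant IDs and values are hypothetical.

```python
from math import comb


def delta_mfi(stimulated, unstimulated):
    """Per-participant background-subtracted MFI: stimulated minus unstimulated well."""
    return {pid: stimulated[pid] - unstimulated[pid] for pid in stimulated if pid in unstimulated}


def sign_test_p(deltas):
    """Exact two-sided sign test on paired differences; zero differences are dropped."""
    nonzero = [d for d in deltas if d != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    k = min(sum(d > 0 for d in nonzero), sum(d < 0 for d in nonzero))
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical Bcl6 MFI for one meta-cluster in eight participants.
unstim = {f"P{i}": 1000 for i in range(1, 9)}
stim = {f"P{i}": 1000 + 50 * i for i in range(1, 9)}
deltas = delta_mfi(stim, unstim)
print(deltas["P1"], sign_test_p(list(deltas.values())))
```

      Because every participant serves as their own baseline, a consistent increase across individuals can be detected even when absolute intensities vary between people; with all eight differences positive, the exact p-value is 2/2^8 ≈ 0.0078.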

      (14) Lines, 292-295, it is very surprising that Tfh cells would not produce IL-21 upon restimulation. Have the authors observed upregulation of IL-21 following SEB restimulation?

      Yes, we observed IL21-positive cells upon SEB stimulation (supplemental figure 15, panels J and K). However, we found unexpectedly high background levels of IL21, specifically within the adult group (supplemental figure 15, panels K and M), making it challenging to find antigen-specific increases above background. Interestingly, an increase in IL21 using manual gating was observed upon PfSEA-1A or PfGARP stimulation in children (supplemental figure 15, panels J and L).

      (15) In Figures 3 and 4, it is not clear if there are any significant differences in expression of different markers between different cTfh subsets and/or different conditions. Moreover, the lack of differences in response to antigen stimulation seems to suggest that it did not work adequately.

      We intentionally chose a 6-hour stimulation to better assess changes in cytokines, which we did. However, because it is a short stimulation, we did not expect dramatic changes in the extracellular markers presented in figures 3 and 4. A longer stimulation, such as 24h, would better highlight these changes.

      (16) Figure 5b would benefit from bar graphs.

      Please find below the bar graphs for the highlighted meta-clusters in figure 5b. We did not include these bar graphs in figure 5 as they do not bring new information; they repeat the information already presented in the EdgeR plot.

      Author response image 3.

      (17) Figures 6 and 7 would greatly benefit from showing individual examples of old-fashioned contour with outliers flow plots to illustrate the different cTfh subsets identified in the study.

      Contour plots with outliers showing the different cT<sub>FH</sub> subsets are provided in supplemental figure 4.

      (18) Figures 3,4, 6, and 7, the authors exclusively focused on the study of MFI to measure the expression of cytokine and transcription factors among different groups/stimulations. Have the authors observed any differences in the percentage or absolute counts of cytokine+ and/or TF+ between different subsets of cTfh and/or different conditions?

      Yes. We added supplemental figures 14 (transcription factors) and 15/16 (cytokines), in which cytokines and transcription factors were assessed using manual gating. IL4 in total CD4<sup>pos</sup>CXCR5<sup>pos</sup> cells was significantly increased upon stimulation in both adults and children, whereas IFNg was not. However, IFNg was significantly higher in total CD8<sup>pos</sup> cells, showing that the stimulation worked even though total CD4<sup>pos</sup>CXCR5<sup>pos</sup> cells did not express IFNg. Finally, we observed a trend toward more IL21<sup>pos</sup>CD4<sup>pos</sup>CXCR5<sup>pos</sup> cells in adults, which was not significant because of high background, whereas IL21 was significantly increased upon stimulation in children. Both cMAF and Bcl6 were significantly increased upon stimulation in children only.

      (19) Figure 8, the definition for high and low PfGARP antibody titers seems rather arbitrary. Are these associations still significant when attempting a regular correlation analysis between Ab values (i.e. Net MFI) and different cTfh subsets?

      Yes, the definition of high and low PfGARP antibody levels is arbitrary, but the antibody data (Figure 1b) were naturally bimodal. As a sub-analysis, we therefore assessed the association between PfGARP antibody levels and cT<sub>FH</sub> subsets (Author response image 4). We checked the correlation between the abundance of the meta-clusters and the level of anti-PfGARP and anti-PfSEA IgG after PfGARP and PfSEA stimulation. We also checked the correlation between the change in Bcl6 and cMAF MFI after stimulation (PfGARP or PfSEA-1A minus unstimulated) in each meta-cluster and the level of anti-PfGARP and anti-PfSEA IgG. However, because of our small sample size, we believe these results are not robust enough and that we risk over-interpreting the data. We therefore chose not to include this analysis in the manuscript.

      Author response image 4.

      (20) The comprehensive 21-plex panel that authors used in this study could generate insights on additional immune cells beyond cTfh (e.g. additional CD4 T cell subsets, CD8 T cells, CD19 B cells). It is not clear why the authors limited their analysis to cTfh only.

      The primary goal of the study was to assess the cT<sub>FH</sub> response to malaria vaccine candidates. However, we were able to assess IFNg expression in CD8 T cells upon stimulation using manual gating, as shown in supplemental figure 15. Without additional markers to define other CD4 T cell or B cell subsets more clearly, we do not believe this dataset would characterize antigen-specific responses to malaria antigens in enough depth to yield new insight.

      (21) Minor point, the punctuation should be revised throughout the manuscript.

      As requested by the reviewer, punctuation was revised throughout the manuscript by our departmental scientific writer, Dr. Trombly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study reveals the RelA/Stat3-dependent gene program in the liver influences intestinal homeostasis. The evidence supporting the conclusions is compelling, although some additional experiments will strengthen the study. The work will be of interest to scientists in gastrointestinal research fields.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors showed that activation of RelA and Stat3 in hepatocytes of DSS-treated mice induced CYPs and thereby produced primary bile acids, particularly CDCA, which exacerbated intestinal inflammation.

      Strengths:

      This study reveals the RelA/Stat3-dependent gene program in the liver influences intestinal homeostasis.

      Our reply: We thank the reviewer for the positive feedback and for appreciating the strength of our study.

      Weaknesses:

      Additional evidence will strengthen the conclusion.

      (1) In Fig. 1C, photos show that phosphorylation of RelA and Stat3 was induced in only a few hepatocytes. The authors conclude that activation of both RelA and Stat3 induces inflammatory pathways. Therefore, the authors should show that phosphorylation of RelA and Stat3 is induced in the same hepatocytes during DSS treatment.

      Our reply: The reviewer has raised a pertinent issue regarding Figure 1, as later in our study we suggest that the combined activation of RelA and Stat3 is critical for aggravating the colitogenic phenotype in the murine model.

      To address this issue, we co-stained fixed liver tissue from untreated and DSS-treated wild-type mice with p-RelA (Ser536) and p-Stat3 (Ser727) antibodies. Author response image 1 below shows the single stainings for p-RelA (Ser536), p-Stat3 (Ser727), and DAPI (to demarcate the nuclei), as well as the merged image (p-RelA + p-Stat3).

      Author response image 1.

      Next, the signal intensities of p-RelA (Ser536) and p-Stat3 (Ser727) per nucleus were calculated and plotted as a box plot. The median p-RelA and p-Stat3 signal intensities are higher in DSS-treated samples than in control samples, suggesting that most hepatocytes in the treated samples have both p-RelA and p-Stat3 in their nuclei.

      Author response image 2.

      We further calculated the number of nuclei in the DSS-treated samples with intensities above the 90th percentile of the control samples, as well as the percentage overlap of p-RelA-positive with p-Stat3-positive nuclei and vice versa (Author response table 1 below).

      Author response table 1.
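      The thresholding and overlap calculation described above can be sketched in Python as follows. This is a minimal illustration that assumes the per-nucleus intensities have already been exported as lists, with the same index referring to the same nucleus for both stains. The function names, the "inclusive" percentile method, and the example values are hypothetical and are not taken from the authors' pipeline.

```python
from statistics import quantiles


def percentile_90(control_intensities):
    """90th percentile of control nuclear intensities.

    quantiles(n=10) returns the 9 decile cut points; the last one is the 90th percentile.
    """
    return quantiles(control_intensities, n=10, method="inclusive")[-1]


def positive_nuclei(intensities, threshold):
    """Indices of nuclei whose intensity is above the threshold."""
    return {i for i, value in enumerate(intensities) if value > threshold}


def percent_overlap(a, b):
    """Percentage of nuclei in set a that are also in set b (0.0 if a is empty)."""
    return 100.0 * len(a & b) / len(a) if a else 0.0


# Hypothetical per-nucleus intensities (arbitrary units); index i is the same nucleus in both DSS lists.
control_rela = [10, 12, 9, 11, 14, 10, 13, 12, 11, 15]
control_stat3 = [20, 22, 19, 25, 21, 23, 20, 24, 22, 26]
dss_rela = [30, 12, 28, 35, 9, 27, 31]
dss_stat3 = [40, 21, 38, 44, 39, 18, 42]

rela_pos = positive_nuclei(dss_rela, percentile_90(control_rela))
stat3_pos = positive_nuclei(dss_stat3, percentile_90(control_stat3))
print(len(rela_pos), len(stat3_pos))
print(percent_overlap(rela_pos, stat3_pos), percent_overlap(stat3_pos, rela_pos))
```

      Deriving the cut-off from the control distribution avoids an arbitrary positivity threshold. The overlap is reported in both directions because the p-RelA-positive and p-Stat3-positive sets can differ in size.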

      Together, our analysis indicates that RelA and Stat3 are activated in the same hepatocytes, producing the downstream effects that we observe after DSS treatment.

      (2) In Fig. 5, the authors treated mice with CDCA intraperitoneally. In this experiment, the concentration of CDCA in the colon of CDCA-treated mice should be shown.

      Our reply: We experimentally examined whether CDCA administered intraperitoneally at the dose used in our study reaches the colon. To quantify colonic CDCA, we performed targeted mass spectrometry; the data are shown as a bar plot below.

      Author response image 3.

      The plot shows that CDCA levels are significantly higher in CDCA-supplemented mice than in their corresponding controls (vehicle only). The data have been added to supplementary figure S5b, and the main text has been modified accordingly.

      Reviewer #2 (Public Review):

      Singh and colleagues employ a methodical approach to reveal the function of the transcription factors Rela and Stat3 in the regulation of the inflammatory response in the intestine.

      Strengths of the manuscript include the focus on the function of these transcription factors in hepatocytes and the discovery of their role in the systemic response to experimental colitis. While the systemic response to induced colitis is appreciated, the cellular and molecular mechanisms that drive such systemic response, especially those involving other organs beyond the intestine are an active area of research. As such, this study contributes to this conceptual advance. Additional strengths are the complementary biochemical and metabolomics approaches to describe the activation of these transcription factors in the liver and their requirement - specifically in hepatocytes - for the production of bile acids in response to colitis.

      Our reply: We express our gratitude to the reviewer for recognizing and appreciating the mechanistic insight provided by our work, and for considering it valuable in advancing conceptual understanding in the relevant field.

      Some weaknesses are noted in the presentation of the data, including a comprehensive representation of findings in all conditions and genotypes tested.

      Our reply: We thank the reviewer for the comment. We have modified the figures to represent the findings comprehensively, as described below:

      ● In Figure 2C, we have added the Alcian Blue-stained control samples to show that there were no qualitative differences in mucin levels between relaΔhepstat3Δhep and wild-type mice.

      ● We have also modified the figure 2D for a better presentation of the data.

      ● We have included histopathological analysis for the relaΔhepstat3Δhep mice in Figures S3a and S3b, following a format similar to the wild-type data previously provided as Figure S1a and S1b.

      ● For Figure 5C, the corresponding untreated samples with and without CDCA supplementation have been provided in the supplementary section Figure S5e.

      ● For Figures 2E, 3E, and 4C, the RT-qPCR data of the DSS-treated samples are plotted relative to their corresponding control samples, hence only two conditions are displayed in each bar plot. We have modified the figure legends accordingly for clarity.

      Reviewer #3 (Public Review):

      Summary:

      The authors try to elucidate the molecular mechanisms underlying the intra-organ crosstalks that perpetuate intestinal permeability and inflammation.

      Strengths:

      This study identifies a hepatocyte-specific rela/stat3 network as a potential therapeutic target for intestinal diseases via the gut-liver axis using both murine models and human samples.

      Our reply: We thank the reviewer for appreciating the therapeutic potential of our work.

      Weaknesses:

      (1) The mechanism by which DSS administration induces the activation of the Rela and Stat3 pathways and subsequent modification of the bile acid pathway remains unclear. As the authors state, intestinal bacteria are one candidate, and this needs to be clarified. I recommend the authors investigate whether gut sterilization by administration of antibiotics or germ-free conditions affect (1) the activation of the Rela and Stat3 pathway in the liver of DSS-treated WT mice and (2) the reduction of colitis in DSS-treated relaΔhepstat3Δhep mice.

      Our reply: We thank the reviewer for raising the role of the gut microbiota in colitis in our mouse model. In accordance with the reviewer's recommendation, we sterilized the gut by antibiotic administration to evaluate whether intestinal bacteria contribute to the activation of the RelA and Stat3 pathways in the liver of DSS-treated WT mice.

      (a) A brief schematic of the experimental design is provided below, and the methods are described in detail in the supplementary methods.

      Author response image 4.

      Liver tissue extracts from mice treated with DSS for 6 days, with or without prior antibiotic treatment, were probed for p-Stat3 (Ser727) to examine the activation status of the hepatic Stat3 pathway. As the blot below shows, p-Stat3 (Ser727) signals are reduced after antibiotic treatment. p-Stat3 (Ser727) was a prominent activation signal at day 6 of DSS treatment, as observed in Figure 1D,E.

      Author response image 5.

      These results suggest that Stat3 activation is hampered by antibiotic treatment. Because RelA and Stat3 act in coordination, downstream activation is presumably also modulated upon gut sterilization. However, a sterilized gut is unlikely to be physiologically relevant, and intestinal bacteria together with bile acid levels are likely to modulate the RelA/Stat3 pathways.

      (b) Hepatic deficiency of RelA and Stat3 may have modified the gut microbiome in relaΔhepstat3Δhep mice because of altered bile composition. Moreover, the gut microbiota is a key determinant of colitis outcome. Future studies will therefore be important to examine the role of the gut microbiome in the resistance of relaΔhepstat3Δhep mice to colitogenic insults.

      (2) It has not been shown whether DSS administration causes an increase in primary bile acids, represented by CDCA, in the colon of WT mice following activation of the Rela and Stat3 pathways, as demonstrated in Figure 6.

      Our reply: We kindly refer the reviewer to Figure 4B, which shows an increase in CDCA levels in colonic tissue that corresponds to the CDCA levels in liver tissue (Figure 4A), indicating that this increase may be driven by the hepatic RelA and Stat3 pathways.

      (3) The implications of these results for IBD treatment, especially in what ways they may lead to therapeutic intervention, need to be discussed.

      Our reply: We are grateful to the reviewer for bringing this topic for discussion.

      Until now, only immunosuppressive agents and immunomodulators have conventionally been considered as therapeutic measures to manage IBD. However, with increasing research on the role of hepatic bile acid metabolism during experimental colitis, its potential in the clinical setting cannot be overlooked. The potential of bile acids as a therapeutic target has been harnessed before: bile acid sequestrants have been used to treat hyperlipidemia 46. Remedies such as fecal microbiota transplantation, which serve to normalize bile acid ratios in the gut, have emerged over the last decade as potential therapeutics for IBD 47, 40. However, altering hepatic bile acid metabolism has remained unexplored for IBD, possibly due to a lack of mechanistic insight. Toward this, our work demonstrates the pro-inflammatory potential of CDCA during colitis following activation of the RelA/Stat3 pathway. Suppressing RelA/Stat3-induced CDCA could benefit IBD patients while preserving basal bile acid levels (through FXR signaling). Thus, our studies identify a hepatocyte-specific RelA/Stat3 network as a potential therapeutic target for intestinal diseases. Another approach could be to use bile acid sequestrants as a combinatorial therapy alongside existing treatments, temporarily decreasing primary bile acid levels in the colon until the proinflammatory pathways are dampened.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor:

      Fig. 4C should be Fig. 4D and vice versa.

      Our reply: We have swapped Fig. 4C and Fig. 4D and corresponding changes have been incorporated in the main text.

      Reviewer #2 (Recommendations For The Authors):

      Please make note of the following specific comments:

      The immunostainings for phosphorylated p-Rela and STAT3 are unclear. Is there nuclear translocation of these phosphorylated transcription factors? Can the authors enumerate the percentage of cells in which nuclear translocation (presumably in hepatocytes) is detected?

      Our reply: We apologize that the immunostainings for phosphorylated RelA and Stat3 were unclear. To clarify the data, we quantified the stained sections and plotted the results.

      First, we co-stained fixed liver tissue from untreated and DSS-treated wild-type mice with p-RelA (Ser536) and p-Stat3 (Ser727) antibodies; a representative image used for the analysis is provided below. DAPI was used to demarcate the nuclear boundaries of the hepatocytes, and the signal intensities of p-RelA (Ser536) and p-Stat3 (Ser727) were quantified using ZenBlue software.

      Author response image 6.

      The box plot below shows the calculated nuclear intensities of p-RelA and p-Stat3 in control (untreated) and DSS-treated samples. The median p-RelA and p-Stat3 signal intensities are higher in DSS-treated samples than in control samples, suggesting that p-RelA and p-Stat3 translocate to the nuclei in most hepatocytes of the treated samples.

      Author response image 7.

      The figure legends for Figures 2C and D are flipped. Please correct.

      Our reply: Thank you for pointing this out. We apologize for the error and have corrected Figure 2 accordingly.

      For all H&E stainings, the authors should include histological scoring of disease severity.

      Our reply: Thank you for this suggestion. Histological scores quantifying the microscopy data are shown below as a dot plot for untreated and DSS-treated colon samples. The sections were scored using the scale described by Ren Y et al. 2019 (doi: 10.1038/s41598-019-53305-z).

      Author response image 8.

      We have added the dot plot to supplementary figure 2d, and the scoring method is described in the supplementary methods section.

      Please include Alcian Blue Staining in non-DSS treated WT and rel/stat3 double cKO mice.

      Our reply: Thank you for pointing this out. We have added Alcian Blue staining of non-DSS-treated WT and rel/stat3 double KO mice to Figure 2C.

      For Figure 3C, can the authors indicate in the figure itself which bile acid is being represented (not only in the Figure legend)?

      Our reply: Thank you for the suggestion. We have indicated the respective bile acid in Figure 3C for better understanding.

      As these data are from untargeted metabolomics, were other bile acids detected?

      Our reply: These data are part of a separate study conducted by our collaborator and will form part of a new manuscript focused on human studies.

      Can the authors validate the downregulation of key enzymes shown in Figure 3D, E at the protein level?

      Our reply: We agree with the reviewer that mRNA levels are not definitive determinants of pathway activation but rather indicators of probable activation, and that protein levels are more informative. However, the metabolomic data in subsequent figures (Figure 4A, B) support our findings in Figure 3D, E, which makes the RT-qPCR data a more robust indicator of activated hepatic bile acid biosynthesis.

      The figure legends for Figures 4C and D are flipped. Please correct.

      Our reply: Following the suggestion of Reviewer 1, we have swapped Fig. 4C and Fig. 4D and corrected the legend placement accordingly. Thank you for pointing this out.

      Also, please include representative images for the data represented in 4C.

      Our reply: Thank you for the query. Representative confocal microscopy images have been added as Figure S4.

      Figure 5B should indicate that the data presented is from double cKO mice.

      Our reply: We have indicated in the figure that the colon length data are from double KO animals, to make the representation clear for readers. Thank you for raising this.

      Please correct typos: "entrocytic" and "Untread" in Figure Legend 5.

      Our reply: Thank you for pointing out the errors in the legend. We apologize for them and have corrected Figure 5.

      Figure S4 includes a dataset (qPCR for Mmp3) that is not described. Neither Figure S4 nor S5 are described in the text.

      Our reply: Thank you for the query. First, Figures S4 and S5 are referred to in the text; we apologize that these references were not properly highlighted.

      Second, the RT-qPCR data for Mmp3 have been removed from the supplementary figures, as they may not be relevant to the study.

      Overall, the manuscript should be edited to ensure the correct use of English. Please also note that the last name of the first author seems to be missing in the main text.

      Our reply: Thank you for the suggestion. We have rechecked the manuscript for errors and corrected them. The first author has a single name (no surname), and we would like to correct this in the final version of the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors need to show if DSS treatment affects the serological or histological changes in the liver of relaΔhepstat3Δhep mice.

      Our reply: To address this, we analyzed key serological markers of liver damage and examined tissue histology.

      The pathophysiological parameters of the liver of DSS-treated relaΔhepstat3Δhep mice have been added to the revised manuscript as figures S3a and S3b. The serological parameters remain within the physiological range upon DSS treatment (Author response image 9a), and the histological parameters remain unaltered compared with control tissue (Author response image 9b).

      Taken together, DSS treatment has little effect on the liver of relaΔhepstat3Δhep mice at both the tissue and functional levels.

      Author response image 9.

      (2) It is recommended to use a second model to verify if this phenomenon is applicable to colitic status in general.

      Our reply: We appreciate this suggestion. This is an ongoing study, and we hope to further examine the role of hepatic RelA and Stat3 in the TNBS-induced colitis model and in the T cell transfer model of colitis.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Liu et al. present CROWN-seq, a technique that simultaneously identifies transcription-start nucleotides and quantifies N6,2'-O-dimethyladenosine (m6Am) stoichiometry. This method is derived from ReCappable-seq and GLORI, a chemical deamination approach that differentiates A and N6-methylated A. Using ReCappable-seq and CROWN-seq, the authors found that genes frequently utilize multiple transcription start sites, and isoforms beginning with an Am are almost always N6-methylated. These findings are consistently observed across nine cell lines. Unlike prior reports that associated m6Am with mRNA stability and expression, the authors suggest here that m6Am may increase transcription when combined with specific promoter sequences and initiation mechanisms. Additionally, they report intriguing insights on m6Am in snRNA and snoRNA and its regulation by FTO. Overall, the manuscript presents a strong body of work that will significantly advance m6Am research.

      Strengths:

      The technology development part of the work is exceptionally strong, with thoughtful controls and well-supported conclusions.

      We appreciate the reviewer for the very positive assessment of the study. We have addressed the concerns below.

      Weaknesses:

      Given the high stoichiometry of m6Am, further association with upstream and downstream sequences (or promoter sequences) does not appear to yield strong signals. As such, transcription initiation regulation by m6Am, suggested by the current work, warrants further investigation.

      We thank the reviewer for the insightful comments. We have softened the language related to m<sup>6</sup>Am and transcription regulation. We fully agree with the reviewer that future investigation is required to determine the molecular mechanism linking m<sup>6</sup>Am and transcription regulation.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript "Decoding m6Am by simultaneous transcription-start mapping and methylation quantification" Liu and co-workers describe the development and application of CROWN-Seq, a new specialized library preparation and sequencing technique designed to detect the presence of cap-adjacent N6,2'-O-dimethyladenosine (m6Am) with single nucleotide resolution. Such a technique was a key need in the field since prior attempts to get accurate positional or quantitative measurements of m6Am positioning yielded starkly different results and failed to generate a consistent set of targets. As noted in the strengths section below the authors have developed a robust assay that moves the field forward.

      Furthermore, their results show that most mRNAs whose transcription start nucleotide (TSN) is an 'A' are in fact m6Am (85%+ for most cell lines). They also show that snRNAs and snoRNAs have a substantially lower prevalence of m6Am TSNs.

      Strengths:

      Critically, the authors spent substantial time and effort to validate and benchmark the new technique with spike-in standards during development, cross-comparison with prior techniques, and validation of the technique's performance using a genetic PCIF1 knockout. Finally, they assayed nine different cell lines to cross-validate their results. The outcome of their work (a reliable and accurate method to catalog cap-adjacent m6Am) is a particularly notable achievement and is a needed advance for the field.

      Weaknesses:

      No major concerns were identified by this reviewer.

      We thank the reviewer for the positive assessment of the method and dataset. We have addressed the concerns below.

      Mid-level Concerns:

      (1) In Lines 625 and 626, the authors state that “our data suggest that mRNAs initate (mis-spelled by authors) with either Gm, Cm, Um, or m6Am.” This reviewer took those words to mean that for A-initiated mRNAs, m6Am was the ‘default’ TSN. This contradicts their later premise that promoter sequences play a role in whether m6Am is deposited.

      We thank the reviewer for the comment. We have changed this sentence to “Instead, our data suggest that mRNAs initiate with either Gm, Cm, Um, or Am, where Am are mostly m<sup>6</sup>Am modified.” The revised sentence separates the processes of transcription initiation and m<sup>6</sup>Am deposition, which should avoid confusing the reader.

      (2) Further, the following paragraph (lines 633-641) uses fairly definitive language that is unsupported by their data. For example in lines 637 and 638 they state “We found that these differences are often due to the specific TSS motif.” Simply, using ‘due to’ implies a causative relationship between the promoter sequences and m6Am has been demonstrated. The authors do not show causation, rather they demonstrate a correlation between the promoter sequences and an m6Am TSN. Finally, despite claiming a causal relationship, the authors do not put forth any conceptual framework or possible mechanism to explain the link between the promoter sequences and transcripts initiating with an m6Am.

      (3) The authors need to soften the language concerning these data and their interpretation to reflect the correlative nature of the data presented to link m6Am and transcription initiation.

      Regarding (2) and (3): we have softened the language in the revised manuscript. Specifically, for lines 633-641 of the original manuscript, we have changed “are often due to” to “are often related to”, which claims a correlation rather than causation.

      Reviewer #3 (Public review):

      Summary:

      m6Am is an abundant mRNA modification present on the TSN. Unlike the structurally similar and abundant internal mRNA modification m6A, m6Am’s function has been controversial. One way to resolve controversies surrounding mRNA modification functions has been to develop new ways to better profile said mRNA modification. Here, Liu et al. developed a new method (based on GLORI-seq for m6A-sequencing), for antibody-independent sequencing of m6Am (CROWN-seq). Using appropriate spike-in controls and knockout cell lines, Liu et al. clearly demonstrated CROWN-seq’s precision and quantitative accuracy for profiling transcriptome-wide m6Am. Subsequently, the authors used CROWN-seq to greatly expand the number of known m6Am sites in various cell lines and also determine m6Am stoichiometry to generally be high for most genes. CROWN-seq identified gene promoter motifs that correlate best with high stoichiometry m6Am sites, thereby identifying new determinants of m6Am stoichiometry. CROWN-seq also helped reveal that m6Am does not regulate mRNA stability or translation (as opposed to past reported functions). Rather, m6Am stoichiometry correlates well with transcription levels. Finally, Liu et al. reaffirmed that FTO mainly demethylates m6Am, not of mRNA but of snRNAs and snoRNAs.

      Strengths:

      This is a well-written manuscript that describes and validates a new m6Am-sequencing method: CROWN-seq as the first m6Am-sequencing method that can both quantify m6Am stoichiometry and profile m6Am at single-base resolution. These advantages facilitated Liu et al. to uncover new potential findings related to m6Am regulation and function. I am confident that CROWN-seq will likely be the gold standard for m6Am-sequencing henceforth.

      Weaknesses:

      Though the authors have uncovered a potentially new function for m6Am, they need to be clear that without identifying a mechanism, their data might only be demonstrating a correlation between the presence of m6Am and transcriptional regulation rather than causality.

      We thank the reviewer for the very positive assessment of the CROWN-seq method. We have softened the language related to the correlation between m<sup>6</sup>Am and transcription regulation.

      Reviewer recommendations:

      We thank the reviewers for their constructive suggestions. In the revised manuscript, we have corrected the errors and updated the requested discussions and figures.

      Reviewer #1 (Recommendations for the authors):

      (1) The prior work from the research group, "Reversible methylation of m6Am in the 5′ cap controls mRNA stability" (PMID: 28002401), should be cited, even if the current findings differ from earlier conclusions-particularly in line 58 and the section titled "m6Am does not substantially influence mRNA stability or translation".

      We thank the reviewer for this comment. We have added the citation.

      (2) I wonder why the authors chose to convert A to I before capping and recapping, as RNA fragmentation caused by chemical treatment may introduce noise into these processes.

      We thank the reviewer for this comment. This is a very good point, and we did consider this alternative protocol. We have two concerns about performing decapping and recapping before A-to-I conversion: (1) it is unclear whether the 3’-desthiobiotin, which is essential for 5’ end enrichment, is stable during the harsh A-to-I conversion; and (2) performing decapping and recapping first requires more enzyme and 3’-desthiobiotin-GTP, which are the major costs of library preparation. This is because the input of CROWN-seq (~1 μg mRNA) is much higher than that of ReCappable-seq (~5 μg total RNA or ~250 ng mRNA). In the current protocol, many 5’ ends are highly fragmented and therefore lost during A-to-I conversion; as a result, less enzyme and 3’-desthiobiotin-GTP are needed.

      (3) During CROWN-seq benchmarking, the authors found that 93% of reads mapped to transcription start sites, implying a 7% noise level with a spike-in probe. This noise could lead to false positives in TSN assignments in real samples. It appears that additional filters (e.g., a known TSS within 100 nt) were applied to mitigate false positives. If so, I recommend that the authors clarify these filters in the main text.

      We thank the reviewer for this comment. We think that the spike-in probes may lead to an underestimation of the accuracy of TSN mapping. The spike-in probes are made by in vitro transcription with m7Gpppm6AmG or m7GpppAmG cap analogs. We found that in vitro transcription exhibits a small amount of non-specific initiation, which yields spike-in probes whose 5’ ends are not precisely aligned with the intended TSS. To better illustrate the mapping accuracy of CROWN-seq, we provide Figure 2H, which compares the non-conversion rates of newly found A-TSNs between wild-type and PCIF1 knockout cells. If the newly found A-TSNs are real, they should show high non-conversion rates in wild-type cells (i.e., high m6Am) and almost zero non-conversion rates (i.e., Am) in PCIF1 knockout cells. As expected, most of the newly found A-TSNs are true A-TSNs, since they are m6Am in wild-type cells and Am in PCIF1 knockout cells. Thus, we think that CROWN-seq maps TSSs with high precision. We have clarified this in the Discussion.
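      The consistency check described above amounts to a two-threshold rule, sketched here in Python. The cut-offs WT_MIN and KO_MAX, the function names, and the toy rates are hypothetical, since no numeric thresholds are given in this response; this illustrates the logic, not the analysis behind Figure 2H.

```python
# Hypothetical cut-offs for illustration only; the response states no numbers.
WT_MIN = 0.5   # minimum wild-type non-conversion rate taken to indicate m6Am
KO_MAX = 0.1   # maximum PCIF1-KO non-conversion rate taken to indicate Am


def consistent_with_pcif1(wt_rate: float, ko_rate: float,
                          wt_min: float = WT_MIN, ko_max: float = KO_MAX) -> bool:
    """True if an A-TSN looks like m6Am in wild-type and like Am in PCIF1 KO."""
    return wt_rate >= wt_min and ko_rate <= ko_max


def fraction_consistent(sites: dict[str, tuple[float, float]]) -> float:
    """Fraction of sites (id -> (WT rate, KO rate)) that pass the check."""
    if not sites:
        return 0.0
    passed = sum(consistent_with_pcif1(wt, ko) for wt, ko in sites.values())
    return passed / len(sites)


# Made-up rates for three hypothetical A-TSNs
toy = {"site_1": (0.92, 0.02), "site_2": (0.85, 0.05), "site_3": (0.88, 0.60)}
print(fraction_consistent(toy))  # two of the three toy sites pass
```

      A site that stays highly non-converted in the knockout (site_3 here) would be one of the outliers worth inspecting individually.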

      (4) I wonder if PCIF1 knockout affects TSN choice and abundance. If not, this data should be presented. If so, how are these changes accounted for in Figure 2H and Figure S5?

      We thank the reviewer for this comment. PCIF1 KO does not substantially affect TSN choice. For each gene, we calculated the correlation (Pearson’s r) of relative TSN expression between wild-type and PCIF1 KO cells. Most genes show similar TSN choices (high Pearson’s r) in wild-type and PCIF1 KO cells. Thus, PCIF1 KO does not alter global TSN usage.

      Author response image 1.
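      The per-gene comparison of TSN choice can be sketched in Python, assuming read counts per TSN for each gene in both conditions, listed in the same order. The function names, the toy counts, and the choice to skip genes with fewer than three TSNs (with two TSNs, Pearson’s r is always +1, -1, or undefined) are assumptions for illustration, not the authors’ exact procedure.

```python
import math


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson's r; returns NaN if either vector has zero variance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return float("nan")
    return cov / math.sqrt(va * vb)


def relative_usage(counts: list[float]) -> list[float]:
    """Convert one gene's per-TSN read counts into fractions summing to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def per_gene_tsn_correlation(wt: dict[str, list[float]],
                             ko: dict[str, list[float]],
                             min_tsns: int = 3) -> dict[str, float]:
    """Pearson's r of relative TSN usage (WT vs PCIF1 KO) for each shared gene.

    Assumes both dicts list the same TSNs in the same order for a gene.
    Genes with fewer than `min_tsns` TSNs, or with zero reads, are skipped.
    """
    result = {}
    for gene, wt_counts in wt.items():
        ko_counts = ko.get(gene)
        if ko_counts is None or len(ko_counts) != len(wt_counts):
            continue
        if len(wt_counts) < min_tsns or sum(wt_counts) == 0 or sum(ko_counts) == 0:
            continue
        result[gene] = pearson(relative_usage(wt_counts), relative_usage(ko_counts))
    return result


# Toy, made-up counts for two hypothetical genes
wt = {"GENE_A": [80, 15, 5], "GENE_B": [10, 90]}
ko = {"GENE_A": [78, 17, 5], "GENE_B": [12, 88]}
print(per_gene_tsn_correlation(wt, ko))  # GENE_B is skipped (only two TSNs)
```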

      (5) The manuscript refers to Am as a rare modification in mRNA (e.g., introduction lines 101-102; discussion lines 574, 608; and possibly other locations) without specifying that this applies only to transcription start sites. As this study does not cover entire mRNA sequences, these statements may be misleading.

      We thank the reviewer for this comment.  We have clarified it.

      Reviewer #2 (Recommendations for the authors):

      (1) On line 122, the authors state that: "On average, a gene uses 9.5 ± 9 (mean and s.d., hereafter) TSNs (Figure 1A)." However, they do not discuss the dispersion apparent in the TSNs they observed. Figure panels 1A, B, and S1A, B show a range of 120 bases or less. What is the predominant range of distances between annotated TSNs and the newly identified ones?

      1a) For example, what percentage of new TSNs fall within 20, 50, or 75 bases of the annotated sites? Additional text describing the distribution of these TSNs would help readers better understand the diversity inherent in these novel 5' RNA ends. This additional text is probably best placed in the CROWN-seq section related to Figure 2 or S2.

      We thank the reviewer for this comment. We have updated Figure S2 to describe the newly found TSSs. Their location depends on their CROWN-seq coverage: TSSs with higher coverage tend to overlap with or lie close to known TSSs, whereas TSSs with low coverage tend to lie farther from annotated TSSs.
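      The reviewer’s question about what fraction of new TSNs lie within 20, 50, or 75 nt of an annotated site reduces to a nearest-neighbour distance calculation. A minimal Python sketch, assuming TSS positions are integer coordinates on a single chromosome and strand (real data would first be grouped by chromosome and strand); the function names and toy coordinates are hypothetical, not the authors’ pipeline.

```python
import bisect


def nearest_distances(new_tss: list[int], annotated_tss: list[int]) -> list[int]:
    """Distance (nt) from each new TSS to the closest annotated TSS.

    Assumes every position is on the same chromosome and strand.
    """
    if not annotated_tss:
        raise ValueError("need at least one annotated TSS")
    ref = sorted(annotated_tss)
    out = []
    for pos in new_tss:
        i = bisect.bisect_left(ref, pos)
        # The nearest annotated site is either ref[i] (at or after pos) or ref[i - 1].
        candidates = []
        if i < len(ref):
            candidates.append(ref[i] - pos)
        if i > 0:
            candidates.append(pos - ref[i - 1])
        out.append(min(candidates))
    return out


def fraction_within(distances: list[int], cutoffs=(20, 50, 75)) -> dict[int, float]:
    """Fraction of distances that are <= each cutoff."""
    if not distances:
        return {c: 0.0 for c in cutoffs}
    return {c: sum(d <= c for d in distances) / len(distances) for c in cutoffs}


annotated = [1000, 5000]          # hypothetical annotated TSSs
new = [1010, 1060, 4900, 5030]    # hypothetical newly found TSSs
d = nearest_distances(new, annotated)
print(d, fraction_within(d))      # [10, 60, 100, 30] {20: 0.25, 50: 0.5, 75: 0.75}
```

      Stratifying the same calculation by coverage bins would show the coverage dependence described in the response.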

      1b) The alternate TSNs can have effects on splicing patterns and isoform identity. Providing a few sentences to explain how regularly this occurs would be helpful.

      We thank the reviewer for this comment; it is a very interesting point. Different TSNs can indeed be associated with different splicing patterns. Although identifying splicing patterns regulated by TSNs is beyond the scope of this study, we have discussed this possibility in the revised Discussion.

      (2) On lines 241 and 242, the authors mention that 1,284 sites were excluded from the analysis because of low read count (under 20, as explained in the figure legend), distance from the TSS, or false negatives (which are not explained). Although I agree that the authors are justified in setting these sites aside, the information could be useful to readers planning follow-up work if their mRNAs of interest are among these 1,284 sites.

      2a) An annotation of all of these sites (broken down by category, i.e. the 811, the 343, and the 130) as a supplementary table should be provided.

      We thank the reviewer for this comment. We have added the categories to the revised Table S1.

      (3) Although I have marked several typos/grammar mistakes in several parts of this review, others exist elsewhere in the text and should be corrected.

      We thank the reviewer for this comment. We have corrected them.

      (4) In lines 122 and 123 the authors say "Only ~9% of genes contain a single TSN (Figure 1A)." However, their figure shows 81% with a single TSN. Why is there a 10% discrepancy?

      We thank the reviewer for this comment. We have corrected the plot in Figure 1A to match the description.

      (5) The first Tab of Table S2 is labeled 'Legend', but is blank. Is this intentional?

      We thank the reviewer for this comment. We have updated the table legends.

      (6) On lines 70 and 76 of the supplementary figure file pertaining to Figure S2, the legend labels for Figure S2E and S2F are incorrect; they should be changed to S2G and S2H.

      (7) In Figure 4A 'percentile' is misspelled.

      (8) The color-coding legend for the 4 bases is missing from (and should be added to) Figure S4A.

      (9) On lines 984, 1163, and 1194, the '2s' should be subscripted where appropriate.

      Regarding (6) to (9): We thank the reviewer for finding these issues. We have now corrected them.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors should discuss if their results can definitively distinguish between the SSCA+1GC motif promoting m6Am that, in turn, promotes transcription, versus the SCA+1GC motif promoting m6Am but also separately promoting transcription in a m6Am-independent manner. The authors should also discuss this in light of recent findings by An et al. (2024 Mol. Cell), which support the former conclusion.

      We thank the reviewer for the suggestion. We have updated the Discussion to explain how our findings and those of An et al. support each other.

      (2) Given that the authors showed that m6Am promotes gene expression (Figure 5) but does not affect mRNA stability (Fig. S5), logic dictates that m6Am must regulate mRNA transcription. However, the authors should explain why this regulation would be focused on transcription initiation rather than on other aspects of transcription, e.g., premature termination, pause release, and elongation.

      We thank the reviewer for this comment. In this study, we did not profile the 3’ ends of nascent RNAs, so we can only draw conclusions about the overall transcription process, not about a specific step. We have updated the Discussion to mention that An et al. found that m6Am can sequester PCF11 and thereby promote transcription, so some of the effects we observe could be related to differences in premature termination.

      (3) The authors should add alternative versions of Figure 1D with three colours corresponding to Am vs. m6Am vs. Cm/Gm/Um for all the cell lines on which they performed CROWN-seq.

      We thank the reviewer for this comment. We have updated Figure S5 to show the fraction of Am vs. m6Am vs. Cm/Gm/Um.

      (4) Figure 2H (left): Please comment on the few outliers that still show high non-conversion even in PCIF1-KO cells.

      We thank the reviewer for this comment. We have discussed the outliers in the main text. These outliers can be found in the revised Table S3.

      (5) Line 254: "Second, if these sites were RNA fragments they would not contain m6Am." is missing a comma.

      (6) S2G and S2H labelling in Figure S2 legends is wrong.

      Regarding (5) and (6): We thank the reviewer for these comments. We have corrected them.

      (7) Figure 3D: Many gene names are printed multiple times (e.g., ACTB is printed 5 times). Is this correct? Does each dot represent one cell line?

      We thank the reviewer for this comment. Repeated gene names correspond to different transcription-start nucleotides of the same gene. We now clarify that each instance refers to a different start site.

      (8) S5A-C: Even if there is no substantial difference, the authors should still display the Student's t-test P-values, as they did for S5D-G.

      We thank the reviewer for this comment. We have updated the P-values.

      (9) Figure 5C and S5E: Why are the authors not showing the respective analysis for C-TSN and U-TSN genes?

      We thank the reviewer for this comment. Most mRNAs start with A or G, so we selected G-TSNs as the control. Unlike G-TSNs, which occur in diverse sequence and promoter contexts, C-TSNs and U-TSNs are unusual. Genes that mainly use C-TSNs and U-TSNs are the so-called “5’ TOP (Terminal OligoPyrimidine)” genes. The 5’ TOP genes are mostly related to translation and metabolism, so their expression reflects the homeostasis of cell metabolism. We were therefore concerned that any differential expression of C-TSN and U-TSN genes between wild-type and PCIF1 knockout cells might reflect specific effects on TOP gene transcriptional regulation rather than general effects of PCIF1 on transcription.

      (10) Lines 82, 470, 506, and 676: The authors should also cite Koh et al. (2019 Nat. Comm.) in these lines, which describe how snRNAs can also be m6Am-methylated and how FTO targets these same snRNAs for demethylation.

      We thank the reviewer for this comment. We have updated the citation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript presents a method to infer causality between two genes (and potentially proteins or other molecules) based on the non-genetic fluctuations among cells using a version of the dual-reporter assay as a causal control, where one half of the dual-reporter pair is causally decoupled, as it is inactive. The authors propose a statistical invariant identity to formalize this idea. 

      We thank the referee for this summary of our work. 

      Strengths: 

      The paper outlines a theoretical formalism, which, if experimentally used, can be useful in causal network inference, which is a great need in the study of biological systems. 

      We thank the referee for highlighting the potential value of our proposed method.

      Weaknesses: 

      The practical utility of this method may not be straightforward, and the method may be quite difficult to execute. Additionally, further investigations are needed to provide evidence of the broad applicability of the method to naturally occurring systems and its scalability beyond the simple circuit in which it is experimentally demonstrated.

      We agree with these two points and have rewritten the manuscript, in particular highlighting the considerable future work that remains to be done to establish the broad applicability and scalability of our method.

      In the rewritten manuscript we explicitly spell out potential practical issues and state that our proof-of-principle feasibility study does not guarantee that our method will work in systems beyond the narrowly sampled test circuits. This helps readers clearly distinguish between what we claim to have done and what remains to be done. The rewritten parts and additional clarifications are:

      Abstract (p. 1), Introduction (p. 1-2), Sec. “Proposed additional tests” (p. 8), and “Limitations of this study” (p. 10).

      Reviewer #2 (Public Review): 

      Summary: 

      This paper describes a new approach to detecting directed causal interactions between two genes without directly perturbing either gene. To check whether gene X influences gene Z, a reporter gene (Y) is engineered into the cell in such a way that (1) Y is under the same transcriptional control as X, and (2) Y does not influence Z. Then, under the null hypothesis that X does not affect Z, the authors derive an equation that describes the relationship between the covariance of X and Z and the covariance of Y and Z. Violation of this relationship can then be used to detect causality. 

      The authors benchmark their approach experimentally in several synthetic circuits. In four positive control circuits, X is a TetR-YFP fusion protein that represses Z, which is an RFP reporter. The proposed approach detected the repression interaction in two or three of the positive control circuits. The authors constructed sixteen negative control circuit designs in which X was again TetR-YFP, but where Z was either a constitutively expressed reporter or simply the cellular growth rate. The proposed method detected a causal effect in one of the eight negative controls, which the authors argue is not a false positive, but due to an unexpected causal effect. Overall, the data support the practical usefulness of the proposed approach. 

      We thank the referee for their summary of our work.

      Strengths: 

      The idea of a "no-causality control" in the context of detected directed gene interactions is a valuable conceptual advance that could potentially see play in a variety of settings where perturbation-based causality detection experiments are made difficult by practical considerations. 

      By proving their mathematical result in the context of a continuous-time Markov chain, the authors use a more realistic model of the cell than, for instance, a set of deterministic ordinary differential equations. 

      We thank the referee for summarizing the value of our work. 

      Caveats: 

      The term "causally" is used in the main-text statement of the central theorem (Eq 2) without a definition of this term. This makes it difficult to fully understand the statement of the paper's central theorem without diving into the supplement.  

      We thank the referee for this suggestion. In the revised manuscript we now define causal effects right before the statement of the main theorem of the main text (p. 2). We have also added a definition of the causal network arrows in the caption of Fig. 1 to help readers better understand our central claim.

      The basic argument of theorem 1 appears to rely on establishing that x(t) and y(t) are independent of their initial conditions. Yet, there appear to be some scenarios where this property breaks down: 

      (1) Theorem 1 does not seem to hold in the edge case where R=beta=W=0, meaning that the components of interest do not vary with time, or perhaps vary in time only due to measurement noise. In this case x(t), y(t), and z(t) depend on x(0), y(0), and z(0). Since the distributions of x(0), y(0), and z(0) are unspecified, a counterexample to the theorem may be readily constructed by manipulating the covariance matrix of x(0), y(0), and z(0). 

      (2) A similar problem may occur when transition probabilities decay with time. For example, suppose again that R=0 and that X is degraded by a protease (B), but this protease is subject to its own first-order degradation. The deterministic version of this situation can be written, for example, as dx/dt=-bx and db/dt=-b. In this system, x(t) approaches x(0)exp(-b(0)) for large t. Thus, as above, x(t) depends on x(0). If similar dynamics apply to the Y and Z genes, we can make all genes depend on their initial conditions, thus producing a pathology analogous to the above example.

      The reviewer does not know when such examples may occur in (bio)physical systems. Nevertheless, since one of the advantages of mathematics is the ability to correctly identify the domain of validity for a claim, the present work would be strengthened by "building a fence" around these edge cases, either by identifying the comprehensive set of such edge cases and explicitly prohibiting them in a stated assumption set, or by pointing out how the existing assumptions already exclude them.  

      We thank the referee for bringing to our attention these edge cases, which indeed violate our theorem as stated. In the revised manuscript we have “built a fence” around these edge cases by adding two requirements to the premise of our theorem. First, we have added the requirement that the degradation rate does not decay to zero for any possible realization. That is, if beta(t) is the degradation rate of X and Y in a particular cell over time, then the time average of beta(t) over all time must be non-zero. Second, we have added the requirement that the system has evolved long enough for the dual-reporter averages <x> and <y>, along with the covariances Cov(x, z_{k}) and Cov(y, z_{k}), to have reached a time-independent stationary state.

      With these requirements, no assumptions need to be made about the initial conditions of the system, because any differences in the initial conditions decay away as the system reaches stationarity. For instance, the referee’s example (1) is not possible with these requirements because beta(t) can no longer remain zero. Similarly, example (2) is excluded because the time average of its degradation rate is zero, which is no longer allowed (i.e., (1/T) times the integral of b(0)exp(-t) from 0 to T tends to 0 as T goes to infinity).
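      For reference, the referee’s example (2) can be worked through explicitly from the deterministic equations dx/dt = -bx and db/dt = -b. This short derivation (a sketch, not taken from the revised SI) shows both the lasting memory of x(0) and the vanishing time-averaged degradation rate:

```latex
\begin{aligned}
\frac{db}{dt} &= -b \;\Rightarrow\; b(t) = b(0)\,e^{-t},\\
\frac{dx}{dt} &= -b(t)\,x \;\Rightarrow\; x(t) = x(0)\exp\!\Big(-\int_0^t b(s)\,ds\Big)
  = x(0)\exp\!\big(-b(0)\,(1-e^{-t})\big) \xrightarrow{t\to\infty} x(0)\,e^{-b(0)},\\
\bar{b}_T &= \frac{1}{T}\int_0^T b(0)\,e^{-t}\,dt = \frac{b(0)\,(1-e^{-T})}{T} \xrightarrow{T\to\infty} 0 .
\end{aligned}
```

      The first limit is the dependence on initial conditions that breaks the theorem; the second is exactly what the new non-zero time-average requirement rules out.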

      Note that adding the condition that degradation cannot decay to exactly zero does not reduce the biological applicability of the theorem. But, as the referee correctly points out, any mathematical theorem needs to be accurately stated and to stand on its own, regardless of whether biological systems could realize particular edge cases. Also note that the requirement that the cellular ensemble has reached a time-independent distribution of cell-to-cell variability can be (approximately) verified experimentally by taking snapshots of ensemble variability at two sufficiently separated moments in time.
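      The two-snapshot check could be sketched as follows, comparing the stationarity-relevant statistics (<x>, <y>, Cov(x, z), Cov(y, z)) between two time points. The function names, the tolerances, and the toy data are hypothetical. Real snapshots sample different cells, so a statistical comparison (e.g., bootstrapped confidence intervals) would be more appropriate than fixed tolerances.

```python
import math
from statistics import fmean


def cov(a: list[float], b: list[float]) -> float:
    """Population covariance across the cells of one snapshot."""
    ma, mb = fmean(a), fmean(b)
    return fmean([(p - ma) * (q - mb) for p, q in zip(a, b)])


def summary(snap: dict[str, list[float]]) -> list[float]:
    """<x>, <y>, Cov(x, z), Cov(y, z) for one snapshot (a single z for brevity)."""
    x, y, z = snap["x"], snap["y"], snap["z"]
    return [fmean(x), fmean(y), cov(x, z), cov(y, z)]


def looks_stationary(snap_t1, snap_t2, rel_tol=0.05, abs_tol=1e-3) -> bool:
    """Crude check that all summary statistics agree between two time points."""
    return all(math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
               for a, b in zip(summary(snap_t1), summary(snap_t2)))


# Made-up per-cell levels at two time points
t1 = {"x": [8, 10, 12, 10], "y": [10, 8, 10, 12], "z": [16, 20, 24, 20]}
t2 = {"x": [16, 20, 24, 20], "y": [10, 8, 10, 12], "z": [16, 20, 24, 20]}
print(looks_stationary(t1, t1), looks_stationary(t1, t2))  # True False
```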

      In response to the referee’s comment, we have added the above requirements to the statement of the theorem in the main text. We have also added the requirement of non-decay of the degradation rate to the definition of the system in SI Sec. 4, along with the stationarity requirement in Theorem 1 in SI Sec. 5. We have also added mathematical details to the proof of the invariant in SI Sec. 5.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      This manuscript presents a method to infer causality between two genes (and potentially proteins or other molecules) based on the non-genetic fluctuations among cells using a version of the dual-reporter assay as a causal control, where one half of the dual-reporter pair is causally decoupled, as it is inactive. The authors propose a statistical invariant identity to formalize this idea. They propose and experimentally demonstrate the utility of this idea with a synthetic reporter system in bacteria. 

      The paper is well written and clearly outlines the principle and the mathematical invariant relationship, both by giving the reader an intuitive understanding of why the relationship must hold and through the mathematical derivation of the proof of Theorem 1.

      The paper outlines a theoretical formalism which, if experimentally used, could be useful for causal network inference, a great need in the study of biological systems. However, the practical utility of this method may not be straightforward, and the method may be quite difficult to execute. We think this work could offer a platform to advance the field of network inference, but would encourage the authors to address the following comments.

      We thank the reviewer for the positive comments on readability, summarizing the value of our work, as well as the critical comments below that helped us improve the manuscript.

      Major comments: 

      (1) Although the invariant identity seems theoretically sound, the data from synthetic engineered circuits in this manuscript do not show that the invariant holds for natural causal relations between genes in wild-type cells. In all the positive control synthetic circuits (numbers 1 to 4), the target gene Z, i.e., RFP, was always on the plasmid, and in circuit #4 there was an additional endogenous copy. The authors recapitulate the X-to-Z causality in circuits 1, 2, and 3 but not 4. Ultimately, the utility of this method lies in its ability to capture causality from endogenous correlations; this observation suggests that the method might not be useful for that task.

      We thank the referee for their careful reading of our synthetic circuits and sincerely apologize for an error in our description of circuit #4 in the schematic of Table S2 of the supplement. We incorrectly stated that this circuit contained a chromosomally expressed RFP. In fact, in circuit #4 RFP was only on the plasmid just like in the circuits #1-3. We have corrected the schematic in the revised manuscript and have verified that the other circuits are correctly depicted.

      In the revised manuscript, we now explicitly spell out that all our “positive control” test cases had the genes of interest expressed on plasmids, and that we have not shown that our method successfully detected causal interactions in a chromosomally encoded gene regulatory circuit, see additional statements in Sec. “Causally connected genes that break the invariant” on p. 6. 

      In the absence of explicit experimental evidence, it is important to consider whether chromosomally encoded circuits are expected to cause problems for our method, which is based on a fluctuation test. Due to plasmid copy-number fluctuations, X and Z will fluctuate significantly more when expressed on plasmids than when expressed chromosomally. However, because this additional variability is shared between X and Z, it does not help our analysis, which relies on stochastic differences in X and Z expression due to “intrinsic noise” effects downstream of copy-number fluctuations. The additional “extrinsic noise” fluctuations due to plasmid copy-number variability would wash out violations of Eq. (2) rather than amplify them. If anything, we thus expect our test cases to have been harder to analyze than endogenous fluctuations. This theoretical expectation is indeed borne out by numerical test cases in the revised supplement, in which plasmid copy-number fluctuations severely reduced the violations of Eq. (2); see the new SI Sec. 15.

      Additionally, the case of the outlier circuit (number 12) suggests that exogenous expression of certain genes may lead to an imbalance of natural stoichiometry and lead to indirect effects on target genes which can be misinterpreted as causal relations. Knocking out the endogenous copy may potentially ameliorate this issue but that remains to be tested. 

      We agree with the referee that the expression of exogenous genetic reporters can potentially affect cellular physiology and lead to undesired effects. In the revised manuscript we now explicitly spell out that the metabolic burden or the phototoxicity of introducing fluorescent proteins could in principle cause artificial interactions that do not correspond to the natural gene regulatory network, see Sec. “Proposed additional tests” on p. 8.

      However, it is also important to consider that test circuit #12 is a synthetic circuit whose genes were expressed at extremely high levels (discussed in the 3rd paragraph of Sec. “Evidence that RpoS mediated stress response affected cellular growth in the outlier circuit”, p. 8), which led to the presumed cellular burden. Arguably, natural systems would not typically exhibit such high expression levels, but, importantly, even if they did, our method does not necessarily rely on fluorescently tagged proteins; it can, in principle, also be applied to data from other methods, such as transcript counting through sequencing or in situ hybridization of fluorescent probes.

      Ultimately, the value of this manuscript will be greatly elevated if the authors successfully demonstrate the recapitulation of some known naturally existing causal and non-causal relations. For this, the authors can choose any endogenous gene Z that is causally controlled by gene X. Gene X can be on the exogenous plasmid along with the reporter and the shared promoter. The same applies to another gene Z' that is not causally controlled by gene X. A knockout of endogenous X may be required, depending on which genes are chosen.

      If the authors think the above experiments are outside the scope of this manuscript, they should at least address these issues and comment on how this method could be effectively used by other labs to deduce causal relations between their favorite genes. 

      Because a full analysis of naturally occurring gene interactions was beyond the scope of our work, we agree with the referee’s suggestion to add a section discussing the limitations of our experimental results. In the revised manuscript we reiterate that additional investigations are needed to show that the method works to detect causal interactions between endogenous genes, see Abstract (p. 1), Introduction (p. 1-2), Sec. “Proposed additional tests” (p. 8), and “Limitations of this study” (p. 9). In the original manuscript we explicitly spelled out how other researchers can potentially carry out this further work in the subsections titled “Transcriptional dual reporters” (p. 3) and “Translational dual reporters” (p. 3). In the revised manuscript, we have added a section “Proposed additional tests” (p. 8) in which we propose an experiment analogous to the one proposed by the referee above, involving an endogenous gene circuit found in E. coli, as an example to test our invariant.

      (2) For a convincing theoretical exposition, we suggest the authors simulate a larger network (for instance, a network with >10 nodes), like the one shown schematically in Figure 1, and demonstrate that the invariant relationship holds for the causally disconnected entities but is violated for the causally related entities. It would also be interesting to see if any quantification of the causal distance between "X" and the different causally related entities could be inferred.

      We thank the referee for this suggestion. We have added SI Sec. 14, where we present simulation results for a larger network with 10 nodes. We find that all of the components not affected by X satisfy Eq. (2), as they must. However, it is important to note that we have analytically proven the invariant of Eq. (2) for all systems that satisfy the assumptions of our theorem. It applies equally to networks with 5, 100, or 10,000 components. The main purpose of the simulations presented in Fig. 2 is to illustrate our results and to show that correlation coefficients do not satisfy such an invariant; they are not used as a proof of our mathematical statements.

      We thank the referee for the interesting suggestion of quantifying a “causal distance”. Unfortunately, the degree to which Eq. (2) is violated cannot directly be equated with an absolute measure of the “causal distance” of an interaction. This is because both the strength of the interaction and the size of the stochastic fluctuations in X affect the degree to which Eq. (2) is violated. The distance from the line should thus be interpreted as a lower bound on the causal effect from X to Z, because we do not know the magnitude of the stochastic effects inherent to the expression of the dual reporters X and Y. While the dual reporters X and Y are identically regulated, they will differ due to stochastic fluctuations. The propagation of these fluctuations from X to Z is what creates an asymmetry between the normalized covariances. In the most extreme example, if X and Y do not exhibit any stochastic fluctuations, we have x(t)=y(t) for all times, and Eq. (2) will not be violated even in the presence of a strong causal link from X to Z.

      However, it might be possible to infer a relative causal distance to compare causal interactions within cells.

      That is, in a given network, the normalized covariances between X, Y and two other components of interest, Z1 and Z2, that are affected by X can be compared. If the asymmetry between η_xz1 and η_yz1 is larger than the asymmetry between η_xz2 and η_yz2, then we might be able to conclude that X affects Z1 more strongly than it affects Z2, because the intrinsic fluctuations in X are the same in both cases.

      In response to the referee’s comment and to test the idea of a relative causal distance, we have simulated a larger network made of 10 components. In this network, X affects a cascade of components called Z8, Z9, and Z10; see the additional SI Sec. 14. Here, the causal distance can be defined as the distance down the cascade: Z8 is closest to X and so has the largest causal strength, whereas Z10 has the weakest. Indeed, simulating this system, we find that the asymmetry between η_xz8 and η_yz8 is the largest, whereas that between η_xz10 and η_yz10 is the smallest. We also find that all of the components not affected by X have normalized covariances that satisfy Eq. (2). This result suggests that the relative causal distance or strength in a network could potentially be estimated from the degree of the violations of Eq. (2).

      However, we note that these are preliminary results. In the case of the specific regulatory cascade now considered in SI Sec. 14, the idea of a causal distance can be well defined. Once feedback is introduced into the system, this definition may no longer make sense. For instance, consider the same network that we simulate in SI Sec. 14, but where the most downstream component in the cascade, Z10, feeds back and affects X and Y. In such a circuit it is unclear whether Z8 or Z10 is “causally closer” to X. A more thorough theoretical analysis, equipped with a more universal quantitative definition for causal distance or strength, would be needed to deduce what information can be inferred from the relative distances in the violations of Eq. (2). While this defines an interesting research question, answering it goes beyond the scope of the current manuscript. 
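      The relative comparison discussed above can be sketched in Python, assuming (as an assumption here, not a definition quoted from the manuscript) that the normalized covariance is η_ab = Cov(a, b)/(⟨a⟩⟨b⟩), computed across cells in one snapshot. The function names and toy data are hypothetical. The sketch also reproduces the extreme case from the response: if y(t) = x(t), no asymmetry can appear, however strongly X affects Z.

```python
from statistics import fmean


def eta(a: list[float], b: list[float]) -> float:
    """Normalized covariance Cov(a, b) / (<a><b>) across cells (assumed definition)."""
    ma, mb = fmean(a), fmean(b)
    return fmean([(p - ma) * (q - mb) for p, q in zip(a, b)]) / (ma * mb)


def asymmetry(x, y, z) -> float:
    """|eta_xz - eta_yz|; expected to vanish (up to sampling noise) if X does not affect Z."""
    return abs(eta(x, z) - eta(y, z))


def rank_by_asymmetry(x, y, targets: dict[str, list[float]]) -> list[str]:
    """Order components Z by decreasing asymmetry (a relative, not absolute, measure)."""
    return sorted(targets, key=lambda k: asymmetry(x, y, targets[k]), reverse=True)


# Made-up per-cell levels; Z1 tracks X strongly, Z2 weakly.
x = [8, 10, 12, 10]
y = [10, 8, 10, 12]
targets = {"Z2": [19, 20, 21, 20], "Z1": [16, 20, 24, 20]}
print(rank_by_asymmetry(x, y, targets))   # ['Z1', 'Z2']
print(asymmetry(x, x, targets["Z1"]))     # 0.0: with y = x the asymmetry vanishes
```

      As the response cautions, such a ranking is only comparable across targets that share the same X, and may lose its meaning once feedback is present.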

      Minor comments: 

      - The method relies on gene X and reporter Y having the same control, which would result in similar dynamics. The authors do not quantitatively compare YFP and CFP expression to check whether this indeed holds for the synthetic circuits. It would be useful to know how much deviation between the two can be tolerated without affecting the outcome.

      We thank the referee for their comment. The invariant of Eq. (2) is indeed only guaranteed to hold when the transcription rate of Y is proportional to that of X. How strongly the levels of X and Y covary depends on the stochastic effects intrinsic to the expression of the dual reporters, as well as on how similar the transcriptional control of X and Y is. The stochastic difference between X and Y is exactly what we exploit.

      However, in the limit of high YFP and CFP levels, intrinsic fluctuations that cause stochastic expression differences between X and Y become negligible, and we can directly infer from time traces whether the two are indeed tightly co-regulated. Below, we show two single-cell traces taken with our experimental setup in which the YFP and CFP fluorescence trajectories are almost exactly proportional. Both traces are from circuit #10 as defined in Table S4.

      Author response image 1.

      We chose the above traces because they showed the highest correlation between YFP and CFP levels. Other traces with lower expression levels have lower correlations due to the effects of intrinsic noise (see Tables S2-S4). However, a trace in which YFP is almost perfectly proportional to CFP throughout can only occur if the YFP and CFP genes are under the same control. And, since the control of the YFP and CFP genes is identical in all of our synthetic circuits (with the same promoters and plasmid positions), these data strongly suggest that our dual reporters are tightly co-regulated in all the synthetic circuits. Moreover, the negative control experiments presented in Fig. 3E provide a natural consistency check that YFP and CFP are under the same control and satisfy Eq. (1).

      We agree that it would be useful to know how much the X and Y production rates can differ for Eq. (2) to hold. Importantly, our proven theorem already allows the rates to differ by an unspecified proportionality constant. In response to the referee’s comment, we have derived a more general condition under which our approach holds. In the newly added SI Sec. 7, we prove that Eq. (2) also holds when the rates differ, as long as the difference is stochastic with zero mean. We also prove that Eq. (2) holds in the face of multiplicative noise that is independent of the X and Y production rates.

      However, the production rates of X and Y cannot differ arbitrarily. Some types of differences between the X and Y production rates can lead to deviations from Eq. (2) even when there is no causal interaction. To highlight this, we added the results of simulations of a toy model in which the X and Y production rates differ by an additive noise term that does not average to zero (see Fig. S19B of the newly added SI Sec. 7).

      - The invariant should potentially hold true for any biological species that are causally related, e.g., protein-protein interactions. Also, this method could potentially find many applications in eukaryotic cells. Although it's outside the scope of current work to experimentally demonstrate such applications, the authors should comment on experimental strategies for applying this method that overcome potential pitfalls (e.g., the presence of enhancers in eukaryotic cells).

      We thank the referee for this suggestion. We agree that there are potential pitfalls that could come into effect when our proposed approach is applied to more complex systems such as eukaryotic gene expression. In response to the referee’s comment, we have added an explicit discussion of these potential pitfalls in the discussion section “Limitations of this study” (see p. 10).

      In particular, in eukaryotes there are many genes for which promoter sequences may not be the sole determinant of transcription rates. Other factors involved in gene regulation include enhancers, epigenetic modifications, and bursts in gene expression, to name a few. We therefore propose a few strategies: positioning the passive reporter at a genomic locus similar to that of the gene of interest, measuring the regulatory activity of the gene of interest and of its passive reporter with a separate method, and applying the invariant to a third gene that is known to have no causal interaction, as a consistency check. In addition, we include a new SI section (SI Sec. 8) showing that the invariant holds in the face of many types of bursty gene expression dynamics.

      However, the above is not a comprehensive list. Some of the issues the referee mentions are serious and may not be straightforward to overcome. We now spell this out explicitly in the revised manuscript (p. 10). 

      - In the legend of Fig. 1, the sentence "Data points here are for..." is missing a few words, or needs to be rephrased. 

      We thank the referee for this comment. We have rewritten the figure caption, which now reads “Data points are numerical simulations of specific example networks (see SI for details) to illustrate the analytically proven theorem of Eq. 2.”

      - Fig. 2 talks about the uncertainties associated with each point on the scatter plots. However, it is difficult to understand the quantification in such a plot. It would be great to have a plot quantifying the uncertainties in the invariant relation for the different topologies studied, specifically in order to understand if one topology is consistently deviating more from the x=y line than the other topologies studied here.  

      We thank the referee for this suggestion. In the supplement of the revised manuscript, we have added Figs. S3, S4, and S5 to separately quantify the uncertainty of the different processes plotted in Fig. 2, and we have added a new section (SI Sec. 11) that discusses the processes simulated in Fig. 2 in more detail. In short, each simulated process generated at most ~5% outliers when considering 95% confidence intervals (the maximum being 5.01% for process 5, see Fig. S5). These outliers were then re-simulated with a larger number of simulations to reduce the sampling error, which resulted in 0% outliers (see Sec. “Confidence intervals for finite sampling error” in Materials and Methods, p. 11). Some simulated processes generated larger percentage errors in the normalized covariances than others, but this is expected, as different processes have different dynamics and therefore sample the underlying distributions to different degrees.

      Note that the invariant of Eq. 2 is analytically proven for all tested topologies, as none of them includes a causal effect from X to Z. Any deviation of the numerical data from the straight-line prediction of Eq. 2 (right column in Fig. 2C) is due to the finite sampling of a stochastic process, i.e., to estimating the true covariance by the sample covariance. Each parameter set was simulated several times, which allowed us to estimate the sampling error from differences between repeated samples. In the additional SI figures, we now quantify this error for the different topologies.

      In addition to the above changes, we want to highlight that the simulations presented in Fig. 2 are not intended to prove our statements or to explore the behavior of different topologies. The purpose of the data presented in the right column of Fig. 2C is to illustrate the theoretical invariant and to act as a numerical sanity check of our analytically proven result. In contrast, the data in the left column of Fig. 2C illustrate that correlations do not satisfy an invariant like Eq. 2, which applies to covariances but not to correlations.
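      The covariance-versus-correlation distinction can be illustrated with a short stochastic simulation. The Python sketch below is a minimal illustration under stated assumptions, not one of the networks simulated for Fig. 2: it assumes that the normalized covariances in Eq. (2) take the standard dual-reporter form η_ab = Cov(a,b)/(⟨a⟩⟨b⟩), and the three-species network, its rate values, and the function name `dual_reporter_stats` are hypothetical. Z is produced and degraded independently of X and Y, X is produced at rate z, the reporter Y at rate αz (α = 2), and all molecules degrade at rate 1. For these linear rates the stationary moments give η_xz = η_yz = 0.025, whereas corr(x,z) ≈ 0.41 and corr(y,z) = 0.5, so time-averaged estimates should agree for the normalized covariances (up to sampling error) but not for the correlations.

```python
import math
import random


def dual_reporter_stats(t_end=5000.0, burn_in=50.0, k_z=20.0, alpha=2.0, seed=1):
    """Gillespie simulation of a hypothetical network with no X -> Z effect.

    Reactions (every species degrades at rate 1 per molecule):
        Z -> 0 at rate z,        0 -> X at rate z,    X -> 0 at rate x,
        0 -> Y at rate alpha*z,  Y -> 0 at rate y,    0 -> Z at rate k_z.
    Returns time-averaged normalized covariances and correlations.
    """
    rng = random.Random(seed)
    z, x, y = round(k_z), round(k_z), round(alpha * k_z)  # start near the means
    t = 0.0
    tw = 0.0  # total time weight accumulated after burn-in
    sx = sy = sz = sxx = syy = szz = sxz = syz = 0.0
    while t < t_end:
        # Z birth is listed last: its rate is always positive, so it is a
        # safe fallback if round-off leaves u just above the other rates.
        rates = (z, z, x, alpha * z, y, k_z)
        total = sum(rates)
        dt = rng.expovariate(total)
        if t >= burn_in:
            w = min(dt, t_end - t)  # the state is constant over [t, t + dt)
            tw += w
            sx += w * x
            sy += w * y
            sz += w * z
            sxx += w * x * x
            syy += w * y * y
            szz += w * z * z
            sxz += w * x * z
            syz += w * y * z
        t += dt
        # Choose the reaction with probability proportional to its rate.
        u = rng.random() * total
        i = 0
        while i < len(rates) - 1 and u >= rates[i]:
            u -= rates[i]
            i += 1
        if i == 0:
            z -= 1
        elif i == 1:
            x += 1
        elif i == 2:
            x -= 1
        elif i == 3:
            y += 1
        elif i == 4:
            y -= 1
        else:
            z += 1
    mx, my, mz = sx / tw, sy / tw, sz / tw
    vx, vy, vz = sxx / tw - mx**2, syy / tw - my**2, szz / tw - mz**2
    cxz, cyz = sxz / tw - mx * mz, syz / tw - my * mz
    return {
        "eta_xz": cxz / (mx * mz),
        "eta_yz": cyz / (my * mz),
        "corr_xz": cxz / math.sqrt(vx * vz),
        "corr_yz": cyz / math.sqrt(vy * vz),
    }
```

      Because Y is made at twice the rate of X in this sketch, it carries less relative intrinsic noise and therefore correlates more strongly with Z, even though neither X nor Y affects Z; the normalized covariances are unaffected by this difference.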

      - The legend for Fig. 3 seems to end abruptly. There likely needs to be more.  

      We thank the referee for catching this mistake. We have corrected the accidentally truncated figure caption of Fig. 3.

      - There is a typo in equation (5.3) on page 23 of supplementary material, there should be x instead of y in the degradation equation of x. 

      We thank the referee for catching this mistake which has been corrected in the revised manuscript.

      - In the supplemental material, to understand the unexpected novel discovery of causality, Figure S5 is presented. However, this doesn't give the context for other negative controls designed, and the effect of rfp dynamics (which can be seen in the plots both in the main paper and the supplement) in the growth rate of cells in those constructs. As a baseline, it would be nice to have those figures.  

      We thank the referee for this suggestion. We have now included representative RFP traces together with the growth rates for the other negative control circuits (see Fig. S10). In addition, we have included the cross-correlation functions between RFP and growth rate in these negative control circuits (see Fig. S10A). In all cases, RFP and growth rate are negatively correlated, but the outlier circuit exhibits the largest negative correlation.
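      A lagged cross-correlation function of the kind referred to here can be sketched as follows, assuming two equally spaced traces from the same lineage (for example, RFP concentration and instantaneous growth rate). The function name, the use of whole-trace means and standard deviations, and the sign convention (a positive lag compares the first trace with later values of the second) are our own assumptions; the estimator actually used for Fig. S10A may differ.

```python
import math


def cross_correlation(a, b, max_lag):
    """Normalized cross-correlation C(lag) = corr(a[t], b[t + lag]).

    Means and standard deviations are taken over the whole traces; each
    lag averages only over the overlapping part of the two traces.
    """
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sa = math.sqrt(sum((v - ma) ** 2 for v in a) / n)
    sb = math.sqrt(sum((v - mb) ** 2 for v in b) / n)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(a[t], b[t + lag]) for t in range(n) if 0 <= t + lag < n]
        s = sum((u - ma) * (v - mb) for u, v in pairs) / len(pairs)
        result[lag] = s / (sa * sb)
    return result
```

      A strongly negative value at some lag means that high values of the first trace tend to be followed, that many samples later, by low values of the second.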

      The comparison suggested by the referee thus highlights that, in isolation, a negative correlation between RFP and growth rate is only weak evidence for our hypothesized causal interaction, because such correlations can also arise when growth rate affects volume dilution and thus RFP concentration. Crucially, we therefore also considered the overall variability of growth rate and found that the outlier circuit has the largest growth-rate variability, which indicates that something is affecting the growth rate of those cells (see Fig. S10B). Comparing the magnitude of RFP variability against other strains requires restricting the comparison group to synthetic circuits that carry RFP on the chromosome rather than on a plasmid. This is why we compare the CV of the outlier with the CV of circuit #5, which corresponds to the “regular” repressilator (i.e., the outlier circuit without the endogenous lacI gene). As an additional comparison, we computed the CV for a strain of E. coli that does not contain a synthetic plasmid at all but still contains the RFP gene on the chromosome. We find the CVs in the outlier circuit to be larger than in these two comparison strains, suggesting that the outlier circuit causes additional fluctuations in RFP and growth rate. We now spell this out explicitly in the revised manuscript (see Sec. “Evidence that RpoS mediated stress response affected cellular growth in the outlier circuit“, p. 8).

      The referee is correct that the above arguments are only circumstantial evidence, but they do show that the data is consistent with a plausible explanation of the hypothesized causal interaction. Our main evidence for an RpoS mediated stress response that explains the deviations from Eq. 2 in the outlier circuit is the perturbation experiment in which the deviation disappears for the RpoS knockout strain. We now spell out this argument explicitly in the revised manuscript (see Sec. “Evidence that RpoS mediated stress response affected cellular growth in the outlier circuit“, p. 8).

      Reviewer #2 (Recommendations For The Authors): 

      The proof of theorem 1 relies on an earlier result, lemma 1. Lemma 1 only guarantees the existence of a "dummy" system that satisfies the separation requirement and preserves the dynamics of X and Y. However, in principle, it may be possible to maintain the dynamics of X and Y while still changing the relationship between Cov(X,Zk) and Cov(Y,Zk). This could occur if the dynamics of Zk differ in a particular way between the original system and the dummy system. So lemma 1 needs to be a little stronger: it needs to mention that the dynamics of Zk are preserved, or something along these lines. The proof of lemma 1 appears to contain the necessary ingredients for what is actually needed, but this should be clarified.

      We agree with the referee that this is an important distinction. Lemma 1 does in fact guarantee that any component Zk that is not affected by X and Y will have the same dynamics in the “dummy” system. However, as the referee points out, this is stated neither in the lemma nor in its proof. In response to the referee’s comment, we have made it clear in the lemma statement that the Zk dynamics are preserved in the “dummy” system, and we have also added details to the proof to show that this is the case (see Lemma 1 on p. 27 of the SI).

      Readers who are familiar with chemical reaction diagrams, but not birth-death process diagrams may waste some time trying to interpret Equation 1 as a chemical reaction diagram with some sort of rate constant as a label on each arrow (I did this). It may be helpful to either provide a self-contained definition of the notation used, or mention a source where the necessary definitions can be found. 

      We agree with the referee. In the revised manuscript we have added a description of the notation used below Equation 1 of the main text, see p. 2. The notational overloading of the “arrow notation” is a perennial problem in the field and we thank the referee for reminding us of the need to clarify what the arrows mean in our diagrams.

      It would be helpful if the authors could propose a rule for deciding whether dependence is detected or not. As it stands presently, the output of the approach seems to be a chart like that in Figure 3D where you show eta_xz and eta_yz with confidence interval bars and the reader must visually assess whether the points more-or-less fall on the line of unity. It would be better to have some systematic procedure for making a "yes or no" call as to whether a causal link was detected or not. Having a systematic detection rule would allow you to make a call as to whether dependence in circuit 3 was detected or not. It would also allow you or a future effort to evaluate the true positive rate of the approach in simulated settings. 

      We thank the referee for this suggestion. In the revised manuscript, we have added an explicit rule for detecting causality using the invariant of Eq. (2). Specifically, Eq. (2) can be rewritten as r = 1, where r = etaXZ/etaYZ is the covariability ratio. Given the 95% confidence interval for the experimentally determined covariability ratio r, we say that there is a causal interaction if the confidence interval does not overlap with the value r = 1.

      This corresponds to a null hypothesis test at the 2.5% significance level rather than the 5% level, for the following reason. Suppose we measure a covariability ratio r_m with 95% confidence interval [r_m - e_m, r_m + e_m] for some error e_m, and, without loss of generality, r_m > 1 (the argument is symmetric for r_m < 1). Then Prob(r < r_m - e_m) = 2.5% and Prob(r > r_m + e_m) = 2.5%, where r is the true value of the covariability ratio. Under the null hypothesis of no causal interaction, r = 1. However, Prob(1 > r_m + e_m) = 0, because r_m > 1 implies r_m + e_m > 1, so the value 1 can only fall outside the error bars on the lower side. The probability that 1 falls outside the error bars under the null hypothesis is therefore 2.5%.
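      The detection rule can be sketched in Python as follows. This is a minimal illustration, not the procedure used in the manuscript: it assumes that each cell contributes one independent snapshot (x, y, z) and that η_ab = Cov(a,b)/(⟨a⟩⟨b⟩), and it uses a percentile bootstrap for the 95% confidence interval of r, which may differ from the interval construction described in Materials and Methods. All function names are hypothetical.

```python
import random


def normalized_cov(a, b):
    """eta_ab = Cov(a, b) / (<a><b>), using the 1/n covariance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n
    return cov / (ma * mb)


def covariability_ratio(x, y, z):
    """r = eta_xz / eta_yz; Eq. (2) predicts r = 1 without an X -> Z effect."""
    return normalized_cov(x, z) / normalized_cov(y, z)


def detect_causality(x, y, z, n_boot=400, level=0.95, seed=0):
    """Return (r, ci_low, ci_high, detected).

    A causal effect of X on Z is flagged only when the bootstrap
    confidence interval for r excludes the value 1.
    """
    rng = random.Random(seed)
    n = len(x)
    boots = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cells with replacement
        boots.append(covariability_ratio([x[i] for i in idx],
                                         [y[i] for i in idx],
                                         [z[i] for i in idx]))
    boots.sort()
    tail = (1.0 - level) / 2.0
    lo = boots[round(tail * n_boot)]
    hi = boots[round((1.0 - tail) * n_boot) - 1]
    r = covariability_ratio(x, y, z)
    return r, lo, hi, not (lo <= 1.0 <= hi)
```

      For example, if y is an exact multiple of x, every resample gives r = 1 and nothing is flagged, whereas a reporter whose coupling to Z differs from that of X yields an interval that excludes 1.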

      This proposed rule is the same rule that we used to detect statistical outliers in our simulations, where we found a “false positive” rate of 2.3% over 6522 simulated systems due to statistical sampling error (as discussed in the Materials and Methods section). In response to the referee’s suggestion, we have added the section “A rule for detecting causality in the face of measurement uncertainty” (p. 4). We also apply the rule to the experimental data and find that the rule detects 2/4 causal interactions in Fig. 3D. We have clarified this in the Fig. 3D caption, in the main text, and we have added a figure in the SI (Fig. S2) where we apply the null hypothesis test on the measured covariability ratios. 

      Note that whether the third interaction is “detected” or not depends on the cut-off value used. We chose the common 95% convention for consistency with traditional statistical approaches. With this rule, one of the data points lies right at the threshold of detection but ultimately falls into the “undetected” category if a strictly binary answer is sought.

      It would be helpful to mention what happens when the abundance of a species hits zero. Specifically, there are two ways to interpret the arrow from X to X+d with a W on top: 

      Interpretation (1): 

      P(X+d | X) = W if X_i + d_i ≥ 0 for all i;
      P(X+d | X) = 0 if X_i + d_i < 0 for at least one i.

      Interpretation (2): 

      P(X+d | X) = W regardless of whether X+d < 0;
      W = 0 whenever X_i + d_i < 0 for at least one i.

      Interpretation (1) corresponds to a graph where the states are indexed on the non-negative integers. Interpretation (2) corresponds to a graph where the states are indexed on the integers (positive or negative), and W is responsible for enforcing the non-negativity of mass. I believe you need the second interpretation because the first interpretation leads to problems with your definition of causality. For example, consider the reaction: 

      (Na, K) -- 0.1 --> (Na-1, K+1) 

      This could occur if Na and K are the intracellular concentrations of sodium and potassium ions in a cell that has an ATP-driven sodium-potassium exchanger whose rate is limited by the frequency with which extracellular potassium ions happen to flow by. Per the definition of causality found in the appendix, Na has no causal effect on K since Na does not show up in the reaction rate term. However, under interpretation (1), Na clearly has a causal effect on K according to a reasonable definition of causality because if Na=0, then the reaction cannot proceed, whereas if Na>0 then it can. However, under interpretation (2), the reaction above cannot exist and so this scenario is excluded. 

      We thank the referee for this comment that helped us clarify the meaning of arrows with propensities. In short, interpretation (2) corresponds to the definition of our stochastic systems. This is consistent with the standard notation used for the chemical master equation. As the referee points out, because molecular abundances cannot be negative, any biochemical system must then have the property that the propensity of a reaction must be equal to zero when the system is in a state in which an occurrence of that reaction would take one of the abundances to negative numbers. Stochastic networks that do not have this property cannot correspond to biochemical reaction networks.

      In the revised manuscript, we now spell this out explicitly to avoid any confusion, see SI page 25.

      Furthermore, we discuss the referee’s example in which the rate of exchanging Na for K through an ion exchanger is approximately independent of the intracellular Na concentration. Because molecular abundances cannot become negative, the rate cannot be truly constant: at low concentrations it must decrease, becoming exactly zero when no Na molecules remain.

      Importantly, agreement with Eq. (2) does not imply that there is no causal effect from X to Zk; it is a deviation from Eq. (2) that implies the existence of a causal effect from X to Zk. Therefore, although the referee’s example above would constitute a causal interaction in our framework, it would not lead to a deviation from Eq. (2), because the fluctuations in Na (which we exploit) do not propagate to K. From a practical point of view, our method thus detects whether changing X over the observed range affects the production and degradation rates of Zk.
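      The requirement that a propensity vanish whenever its reaction would drive an abundance negative (interpretation (2) above) can be checked mechanically. The Python sketch below scans a small grid of states; the grid bound, the function names, and the gated rate used for the Na/K example are hypothetical illustrations, not the forms used in the SI. The referee's constant rate of 0.1 fails the check at Na = 0, whereas a rate that drops to exactly zero when no Na remains passes it.

```python
from itertools import product


def violates_nonnegativity(propensity, stoich, max_count=5):
    """Return the first state on the grid 0..max_count where `propensity` > 0
    although firing the reaction with change vector `stoich` would make some
    abundance negative; return None if no such state exists."""
    for state in product(range(max_count + 1), repeat=len(stoich)):
        after = [s + d for s, d in zip(state, stoich)]
        if min(after) < 0 and propensity(state) > 0:
            return state
    return None


# The referee's Na/K exchange reaction: (Na, K) -> (Na - 1, K + 1).
exchange = (-1, +1)


def constant_rate(state):
    """Referee's example: rate 0.1 regardless of the Na count."""
    return 0.1


def gated_rate(state):
    """Hypothetical variant: rate 0.1 while Na > 0, exactly zero at Na = 0."""
    na, _k = state
    return 0.1 if na > 0 else 0.0
```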

      In the course of setting up the negative control benchmark circuits, a perturbation-based causal validation would be nice. For instance, first, verify that X does not affect Z by intervening on X (e.g. changing its copy number or putting it under the control of an inducible promoter), and ensuring that Z's activity is not affected by such interventions upon X. This approach would help to adjudicate questions of whether the negative control circuits actually have an unknown causal link. The existing benchmark is already reasonably solid in my view, and I do not know how feasible this would be with the authors' setup, but I think that a perturbation-based validation could in principle be the gold standard benchmark.  

      We agree that additional perturbation-based validation tests on all of the negative control circuits would strengthen the evidence that our method works as advertised. While such experiments are beyond the scope of our current work, we now explicitly point out the benefits of such additional controls in the revised Discussion.

      Below is a series of comments about typography, mostly about section 4 of the supplement. 

      We thank the referee for their careful reading and highlighting those mistakes.

      At the bottom of page 21, Z_aff is defined as the set of components that are affected by X. However, later Z_aff seems to refer to components affected by X or Y. For instance, in the proof of lemma 1, it is written "However, because a is part of z_aff, the {ak} variables must be affected by X and/or Y." 

      We thank the referee for catching this mistake. We have changed the definition of Z_aff throughout the supplement to refer to components affected by X or Y. If it can be experimentally ensured that Y is a passive reporter (i.e., it does not affect other components in the cell), then the theorem can only be violated if X affects Z. 

      In the equation following Eq 5.2, W_k and d_k should be W_i and d_i ?  

      Yes, the referee is correct. In the revised manuscript we have corrected W_k and d_k to W_i and d_i. 

      In Eq 5.3 in the lower-left transition diagram, I think a "y" should be an "x". 

      Yes, the referee is correct. In the revised manuscript we have fixed this typo.

      In the master equation above Eq 5.5, the "R" terms for the y reactions are missing the alpha term, and I think two of the beta terms need to be multiplied by x and y respectively.  

      The referee is correct. In the revised manuscript we have fixed this typo.

      The notation of Eq 5.8, where z_k(t) is the conditional expectation of z_kt, is strange and difficult to follow. Why does z_k(t) not get a bar over it like its counterparts for x, y, R, and beta? The bars, although not a perfect solution, do help.  

      We agree with the referee’s comment and have added further explanations to define the averages in question, see SI p. 28. In short, when we condition on the history of the components not affected by X or Y, we in effect condition on the time trajectories of z_{k} (when it is part of the components not affected by X and/or Y) and beta (since it only depends on the components not affected by X or Y). We thus previously did not include the bars when taking the averages of these components in the conditional space because the conditioning in effect sets their time-trajectories (so they become deterministic functions of time). In the revised manuscript we now also denote these conditional expectations with bars and we have added comments to the proof to clarify their definition.

      I think it would be helpful to show how the relationship <x>=<y>/alpha is obtained from Eq 5.5.  

      We agree with this suggestion and have added the derivations, see Eqs. (5.9) - (5.13) in the revised SI. 

      In the main text, the legend of Fig 3 cuts off mid-sentence.  

      We thank the referee for catching this mistake which has been fixed in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Oor et al. report the potentially independent effects of the spatial and feature-based selection history on visuomotor choices. They outline compelling evidence, tracking the dynamic history effects based on their clever experimental design (urgent version of the search task). Their finding broadens the framework to identify variables contributing to choice behavior and their neural correlates in future studies.

      Strengths:

      In their urgent search task, the variable processing time of the visual cue leads to a dichotomy in choice performance - uninformed guesses vs. informed choices. Oor et al. did rigorous analyses to find a stronger influence of the location-based selection history on the uninformed guesses and a stronger influence of the feature-based selection history on the informed choices. It is a fundamental finding that contributes to understanding the drivers of behavioral variance. The results are clear.

      Weaknesses:

      (1) In this urgent search task, as the authors stated in line 724, the variability in performance was mainly driven by the amount of time available for processing the visual cue. The authors used processing time (PT) as the proxy for this "time available for processing the visual cue." But PT itself is already a measure of behavioral variance since it is also determined by the subject's reaction time (i.e., PT = Reaction time (RT) - Gap). In that sense, it seems circular to explain the variability in performance using the variability in PT. I understand the Gap time and PT are correlated (hinted by the RT vs. Gap in Figure 1C), but Gap time seems to be more adequate to use as a proxy for the (imposed) time available for processing the visual cue, which drives the behavioral variance. Can the Gap time better explain some of the results? It would be important to describe how the results are different (or the same) if Gap time was used instead of PT and also discuss why the authors would prefer PT over Gap time (if that's the case).

      We thank Rev. 1 for requesting clarification of this important point. As Rev. 1 notes, PT is a derived variable, computed for each trial by subtracting the Gap interval from RT (PT = RT − Gap). While it is true that Gap and PT are (inversely) correlated, it is precisely because of the variance in RT that Gap alone is not an adequate (and certainly not the best) predictor of choice outcome. First, note that if the Gap were fixed, there would still be variance in RT and in outcome, and any dependence of outcome on time would necessarily be explained by the PT. This is true at any Gap. So, clearly, the PT predicts outcome in a way that the Gap cannot. It is easy to see why: the Gap is the part of the RT interval during which no cue information is present, whereas the PT is the part of the same interval during which it is. Therefore, if one accepts the logical premise that the likelihood of a correct choice depends on the amount of time available to view the Cue before making that choice (i.e., the definition of PT), it follows that the relationship between PT and performance should be tighter than that between Gap and performance. And, indeed, this is the case. Mean accuracy declines systematically as a function of Gap, as expected, but this relationship is much weaker than the one between accuracy and PT.

      The comparison requested by Rev. 1, of how accuracy varies as a function of PT versus how it varies with Gap, has appeared in earlier publications (Stanford et al., 2010; Shankar et al., 2011; Salinas et al., 2014), and we now include it here for the current dataset by adding plots of accuracy versus Gap as a new panel in Fig. 1 (Fig. 1c). That PT (not Gap) better predicts the likelihood of success on a given trial is evident from comparing the tachometric (Fig. 1b) and psychometric curves (Fig. 1c). The tachometric curves rise from chance to asymptotic performance over a short range of PT (~75 ms), with well-defined inflection points identifying key transitions in performance (e.g., from guesses to increasingly informed choices). In contrast, the psychometric function plotting average accuracy versus Gap (Fig. 1c) varies much more gradually, a loss of temporal definition attributable to the failure to account for the RT’s contribution to determining PT on each trial at a given Gap.
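      The difference between conditioning on PT and conditioning on Gap can be sketched in Python. Each hypothetical trial record holds RT, Gap, and whether the choice was correct; PT is computed per trial as RT − Gap, as defined above, and accuracy is then averaged within fixed-width bins. The record layout, bin width, and function name are illustrative assumptions, not the binning or smoothing used for Fig. 1.

```python
from collections import defaultdict


def accuracy_by_bin(trials, key, bin_ms):
    """Fraction correct per bin of `key` ('pt' or 'gap'), all times in ms."""
    if key not in ("pt", "gap"):
        raise ValueError("key must be 'pt' or 'gap'")
    counts = defaultdict(lambda: [0, 0])  # bin start -> [n_correct, n_total]
    for tr in trials:
        value = tr["rt"] - tr["gap"] if key == "pt" else tr["gap"]
        b = (value // bin_ms) * bin_ms  # floor to the start of the bin
        counts[b][0] += int(tr["correct"])
        counts[b][1] += 1
    return {b: c / n for b, (c, n) in sorted(counts.items())}
```

      With two trials per Gap, one at PT = 50 ms (an error) and one at PT = 150 ms (correct), binning by PT separates guesses from informed choices (accuracy 0 and 1), whereas binning by Gap mixes them into a single bin with accuracy 0.5.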

      (2) The authors provide a compelling account of how the urgent search task affords

      (i) more pronounced selection history effects on choice and

      (ii) dissociating the spatial and feature-based history effects by comparing their different effects on the tachometric curves. However, the authors didn't discuss the limits of their task design enough. It is a contrived task (one of the "laboratory tasks"), but the behavioral variability in this simple task is certainly remarkable. Yet, is there any conclusion we should avoid from this study? For instance, can we generalize the finding in more natural settings and say, the spatial selection history influences the choice under time pressure? I wonder whether the task is simple yet general enough to make such a conclusion.

      As Rev. 1 notes, the CO task is a laboratory task that produces large history effects. Importantly, however, we do not think urgency is causal or essential to the existence of such effects (this is now stated more explicitly in the first section of the Results); it is simply a powerful tool for revealing and characterizing them. As noted in the Discussion, our results are consistent with studies that, using simpler, non-urgent tasks, demonstrated either reward-driven spatial biases or color priming effects. The CO task uses urgency to generate a psychometric function that time-resolves perceptually informed from perceptually uninformed choices, and it thereby provides the logical key to disambiguating the simultaneous contributions of perceptual and non-perceptual biases to performance. This was essential to our demonstration that distinct biases act independently on the same saccade choices.

      In a natural setting, we would certainly expect the respective magnitudes of such non-volitional history-based biases to be highly context dependent, but it would be difficult, if not impossible, to discern their relative impact on natural behavior. That said, we think that the biases revealed by the CO task are exemplary of those that would manifest in natural behaviors depending on the real-world context to which such behaviors correspond. Here, it is important to emphasize that the spatial- and feature-based biases we observed were not strategic, on average neither helping nor hindering overall performance. Thus, in the real-world we might expect the expression of similar biases to be an important source of behavioral variance. These observations are now summarized in the penultimate paragraph of the Discussion.

      (3) Although the authors aimed to look at both inter- and intra-trial temporal dynamics, I'm not sure if the results reflect the true within-trial dynamics. I expected to learn more about how the spatial selection history bias develops as the Gap period progresses (as the authors mentioned in line 386, the spatial history bias must develop during the Gap interval). Does Figure 3 provide some hints in this within-trial temporal dynamics?

      Because it is based on the location of the saccadic choice(s) on previous trial(s), we might expect a signal of spatial bias to be present before and during the Gap period and perhaps even before a trial begins (i.e., during the intertrial interval). However, because behavioral bias is a probabilistic measure of saccade tendency, we have no way of knowing whether such a signal is present during periods devoid of saccadic choices. Note that, for both monkey subjects, average RT exceeded the duration of the longest Gap employed (Fig. 1), which means that relatively few saccades occurred prior to Cue onset. That said, it is clear in Figs. 2, 3, and 6 that location bias is evident for saccades initiated at the transition between Gap and Cue intervals (PT=0). Anecdotally, we can report that spatial bias is evident when we extend our analysis further back into the range of negative PTs (i.e., into the Gap interval), but the statistics are weak given the paucity of trials at that point. Nevertheless, this is consistent with a bias that exists from the beginning of the trial, as would be expected from neurophysiological studies by Hikosaka's lab using a simpler but comparable spatial bias task.

      Although our data do not unequivocally identify the temporal origin of the spatial bias, they clearly show that the bias is present early (at short PTs) and diminishes rapidly as the perceptual information accrues (at long PTs). Thus, the PT-dependent temporal dynamics that are revealed clearly suggest that spatial and perceptual biases operate over different intra-trial time frames, one decreasing and the other increasing. As mentioned by Rev. 1, Fig. 3 emphasizes this dichotomy.

      (4) The monkeys show significant lapse rates (enough error trials for further analyses). Do the choices in the error trials reflect the history bias? For example, if errors are divided in terms of PTs, do the errors with short PT reflect more pronounced spatial history bias (choosing the previously selected location) compared to the errors with long PT?

      The short answer is “yes”. Errors generally show a PT-dependent influence of history bias. However, correct and error trials are the result of the same biased dynamics, and analyzing them separately post-hoc does not provide much additional insight about the history effects beyond that provided by the tachometric curves themselves.

      To see this, first consider the figure below (Author response image 1). Two tachometric curves conditioned on color history are shown (left). These are the two extreme curves plotted in Fig. 2a, which correspond to the 4S (i.e., 4 repeats of the current target color) and 4D (4 color repeats and then a switch) conditions. Each of these curves already shows the probability of making an error at each PT but, indeed, we can compare the proportions of correct and error trials at short PTs (guesses) and long PTs (informed choices). These are indicated by the bar graphs on the right. Now, the effect of a bias would be to create a difference in success rate between repetitions (4S, blue) and switches (4D, red) relative to the overall, unbiased expectation (indicated by dotted lines). For color-based history, there is no bias at short PT: the proportions of correct choices are almost exactly at the expected chance level (filled bars coincide with dotted line). In contrast, at long PTs, there is a differential effect, but it is due both to a proportion of correct trials that is higher than expected in the 4S case (filled blue bar above dotted line) and to a proportion of correct trials that is lower than expected in the 4D case (filled orange bar below dotted line). This is exactly as one would expect if the current choice was biased by target color history.

      Author response image 1.

      A similar analysis can be done for location history (Author response image 2, which shows the two extreme curves from Fig. 2e). In this case the bias is much stronger at short PTs, and the difference between repeats (4S, blue) and switches (4D, red) is largely explained by a proportion of correct choices that is much higher than expected by chance in the 4S condition (filled blue bar well above dotted line). This makes sense, because a rewarded location is likely to become the next guess, so if the target happens to appear again at that same location, the subsequent guess is more likely than chance to be correct. At longer PTs, the differential effect is smaller, as would be expected for more informed choices, but it is again driven by the 4S condition. Importantly, in the case of location the total number of S trials is much smaller than the total number of D trials (because a target-location repetition has a probability of only 0.25), so it only makes sense to compare the proportions of correct (or error) trials, not the absolute numbers, between those conditions.
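      The comparison of proportions described above can be illustrated with a minimal Python sketch; it is not the authors' analysis code. It assumes a hypothetical trial record of (history condition, PT in ms, correct), a hypothetical 100 ms cutoff separating short-PT guesses from long-PT informed choices, and reference proportions supplied by the caller (the 0.25 and 0.75 values in the toy example are illustrative only). In the toy data, the 4S and 4D conditions contain the same number of correct trials (three each), yet their proportions correct differ (0.75 vs 0.25) because S trials are rarer, which is why proportions rather than absolute counts are compared.

```python
from collections import defaultdict

# Assumption: PTs below this cutoff count as guesses (short PT).
# The value is illustrative only, not taken from the study.
SHORT_PT_MAX_MS = 100.0


def pt_bin(pt_ms: float) -> str:
    """Label a trial's processing time as a short-PT or long-PT choice."""
    return "short" if pt_ms < SHORT_PT_MAX_MS else "long"


def correct_rate_vs_expected(trials, expected):
    """Fraction correct per (history condition, PT bin), minus a reference.

    trials:   iterable of (condition, pt_ms, correct), e.g. ("4S", 80.0, True)
    expected: dict mapping PT bin -> reference fraction correct
              (for example, chance level for guesses), supplied by the caller.
    Returns {(condition, bin): (fraction_correct, n_trials, deviation)}.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for condition, pt_ms, correct in trials:
        key = (condition, pt_bin(pt_ms))
        totals[key] += 1
        hits[key] += int(correct)
    result = {}
    for key, n in totals.items():
        frac = hits[key] / n
        result[key] = (frac, n, frac - expected[key[1]])
    return result


# Toy data (hypothetical): S trials are rarer than D trials,
# as with target-location repeats.
toy = (
    [("4S", 80.0, True)] * 3 + [("4S", 80.0, False)] * 1
    + [("4D", 80.0, True)] * 3 + [("4D", 80.0, False)] * 9
)
stats = correct_rate_vs_expected(toy, {"short": 0.25, "long": 0.75})
for key, (frac, n, dev) in sorted(stats.items()):
    print(key, n, round(frac, 2), round(dev, 2))
```

      Passing the reference proportions in explicitly, rather than pooling them from the data, avoids letting the far more numerous D trials dominate the "unbiased" baseline.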

      Author response image 2.

      In summary, although it is possible to examine the separate dependencies of correct and error trials on history and PT, the distinction is not very useful. Only the frequency of errors relative to that of correct choices is fully interpretable; the frequency of short-PT errors relative to that of long-PT errors, for instance, is much less informative.

      Reviewer #2 (Public review):

      Summary:

      This is a clear and systematic study of trial history influences on the performance of monkeys in a target selection paradigm. The primary contribution of the paper is to add a twist in which the target information is revealed after, rather than before, the cue to make a foveating eye movement. This twist results in a kind of countermanding of an earlier "uninformed" saccade plan by a new one occurring right after the visual information is provided. As with countermanding tasks in general, time now plays a key factor in the success of this task, and it is time that allows the authors to quantitatively assess the parametric influences of things like previous target location, previous target identity, and previous correctness rate on choice performance. The results are logical and consistent with the prior literature, but the authors also highlight novelties in the interpretation of prior-trial effects that they argue are enabled by the use of their paradigm.

      Strengths:

      Careful analysis of a multitude of variables influencing behavior

      Weaknesses:

      Results appear largely confirmatory.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors provide comprehensive accounts of the urgent search task in multiple places in the manuscript. But the description can be simpler and more consistent throughout. I found it confusing when the authors compared their task with previous search tasks used by Bichot and Schall, McPeek et al. I believe the authors wanted to explain that it is not just the urgency but the fact that the target color being randomly interleaved also contributes to the pronounced history bias in their task. I appreciate their thorough comparison with previous studies but it can be distracting or lose focus. It might read better if this statement can be expanded in the Discussion, not in the Results (lines 366-376).

      We thank the reviewer for pointing this out. We agree that the paragraph in question was ambiguous and appeared to elaborate a Discussion point, which was not our intent. Indeed, as the reviewer noted, the main point was that the randomization of the target colors (and not urgency) is the critical aspect of the task that makes it surprisingly difficult for the monkeys. We have revised the paragraph to emphasize this conclusion and the two empirical results from our own data that support it. The agreement with prior studies, which is somewhat tangential, is now briefly mentioned at the end of the paragraph. It should now be clear that the text mainly describes current data that are relevant to the interpretation of the main results.

      (2) It's important to state that feature-based selection history bias is not merely due to the monkey's intrinsic bias to one color over the other (red vs green). The authors did a nice job controlling that, as mentioned in Methods (lines 194-196) and supplementary figure (Figure 1 - Figure Supplement 2). It would be helpful for readers to read in Results as well.

      Thank you for the suggestion. We now mention this in the second section of the Results.

      (3) D trial examples for the location history in Results can be confusing to readers (lines 407-409; left-left-right, up-up-left). The examples in Methods (lines 224-229; left-up-right, up-down-left) are better to convey the preceding (different) trials can be of any kind.

      Indeed. Both types of example are now mentioned in the Results.

      Reviewer #2 (Recommendations for the authors):

      I have only minor comments:

      (1) In the abstract, I'm not sure what "when combined" means in the last sentence. What is combined? Selection history and stimulus salience? If so, this is not very clear. Also, it might be nice to end the abstract on how the study addresses the three components of attention that the abstract started with in the first place (salience, task, and history). Otherwise, I spent multiple abstract reads (before even reading the rest of the paper) trying to see whether indeed the paper addresses the three components of attention that were so prominently described at the beginning of the abstract or not. And, I still could not convince myself of whether all three were addressed by the study or not (I then resorted to proceeding with a reading of the rest of the paper).

      Thanks for pointing this out. We have reworded the abstract to clarify that we are focusing on selection history, not salience or top-down attention.

      (2) Line 72: isn't stimulus location still a feature????

      Our nomenclature here is intended to be consistent with the commonly applied distinction between “spatial” and “feature”-based attention that underscores the distinct mechanistic underpinnings of “where” and “what”.

      (3) Lines 76-79: I'm very confused here. The part about "guesses can be strongly biased toward an arbitrary location early on". However, I expected the later part of the sentence to still stick to location and mention what the temporal dynamic is. Instead, it discusses perceptual bias, which I presume is the color thing. So, the net result is that I'm a bit confused about how *both* location and color behave in *both* early and late times.

      We have rewritten the end of this paragraph to clarify when and how location and feature biases manifest in behavior. It may be useful to note the following. The tachometric curve describes different types of choices distinguished by their timing, guesses at short PTs vs informed decisions at long PTs. However, this also corresponds to the degree to which perceptual information becomes available over time within a single trial. Namely, perceptual information is initially absent but arrives later on. The revised text now reflects this distinction, making the logic for the expected results clearer.

      (4) Last paragraph of the introduction (lines 80-82): it would be helpful to justify here why the psychophysics were done in monkeys in this study, instead of humans.

      We now allude to the reason these studies were done in monkeys but feel that more elaboration of this point is better left to Discussion. The Discussion now more explicitly states that the current data are closely related to neurophysiological studies of spatial attention and color priming in monkeys (beginning of 4th paragraph).

      - Line 389: this kind of formulation is much clearer to me than lines 76-79 mentioned above.

      As noted, the above-mentioned section has been revised.

      - I'm a bit confused by Figure 4 in the sense that some of the effect sizes are not too different from Figure 2, even when there are some intermediate inconsistent trials. I guess the problem is aggravated by the different axis ranges in Figures 2 and 4.

      All the 1S and 1D data points are the same in both figures, as they should be, but the problem is that, otherwise, the two figures are just not comparable. Apples and oranges. To see this, note that the trends for the difference between S and D conditions should go in opposite directions as trials go further into the past, and indeed they do. In Figures 2c, f, the differences between 1S and 1D results are small, and those between 4S and 4D results are the largest because both S and D effects grow away from the average with more repetitions. In contrast, in Figure 4b-d, the differences between S and D shrink as the effect of a single trial becomes more distant (differences are largest between 1S and 1D results, smallest between 1S9x and 1D9x results). The only slightly ambiguous trend is that of Figure 2g, because the S data are noisier. We have expanded the text surrounding Figure 4 to highlight the different expected trends for this analysis in contrast to that presented in Figure 2. This should clarify the qualitative difference between the two.

      - On a related note, it is odd that the summary figures (e.g. Figures. 2, 4, etc) are vertically aligned such that the dependent measure is on the x-axis rather than the y-axis. For example, looking at Figure 2, it would make much more sense if panels b-d and f-h were rotated by 90 deg, such that the vertical axis is indeed the low asymptote or high asymptote or RT. This would directly correlate with the same data in panels a and e in the same figure and would be much easier to follow. Then, later in the paper, Fig. 8 suddenly does the dependent measure on the y-axis, as I said. I think it can help to use similarly consistent plotting approaches across all (or most) analyses.

      We tried other formats but settled on the current one because we felt it made it (slightly) easier to compare the patterns across history conditions between any two of the 6 bar graphs in each figure (in Figs 2, 5, 6), in part because it prevents any confusion with the PT axes. As this does not make a substantial difference either way, we prefer to maintain the present arrangement. Additional labels are now included, which should make the figures a bit more friendly.

      - At the beginning of the paper, I was under the impression that this will really be a free viewing search task (e.g. Wolfe search arrays or old Nakayama search arrays), but then it became clear later that it was still an instructed task, with the only difference being that the target onset is now 4 targets. I think this distinction should be clarified very early on, in order to avoid confusion by the readers. The reason I say this is that with enforced fixation, there are other factors in this task that come into play, like the monkey's individual microsaccade rates etc, which can modulate performance since they also have a form of countermanding that is like the one imposed by the compelled saccade task. So, better alert the readers to the context of the task early on.

      Thanks. We have provided additional detail when introducing the task for the first time in the Introduction, along with a citation to an earlier publication in which the specific task is described. There should be no ambiguity now.

      Reviewing Editor Comments:

      Short Assessment:

      This important study makes compelling use of the monkey animal model to capture the long-time course over which trial history affects decision-making under time pressure, showing decisions are affected by the stimulus sequence extending back as many as four trials previously.

      Summary:

      Decision-making is variable, but how much of this variability can be accounted for by the immediate previous history is not well known. Using an "urgent" saccade, Oor et al manipulated how much time monkeys had to process evidence, and evaluated what they did when there was too little time to make an evidence-based decision. They report that the history affected performance as far back as 4 previous trials and that different aspects of the stimulus history (color and location) affected performance differently.

      Strengths:

      The key strengths of this paper are that the monkey paradigm permitted a study under highly controlled conditions with stable performance across sessions and enough trials to conduct the history analysis farther back in time than is possible with smaller data sets. While the fact that prior history affects decisions was previously known, this study provides a careful quantification of the effect -- which proves to be quite large - as well as an assessment of both location and feature histories in combination with each other. The manuscript is well-written and easy to follow.

      Weaknesses and recommendations for the authors:

      (1) The figures are lovely but could use some more text/design elements to clarify, and there is space to do so. e.g., in Figure 2, there could be titles to indicate that the top row involves the color history and the bottom row involves location history. The information is there, in the y labels of panels B and F, but it takes a while to see that.

      Done. Titles have been added to Figure 2 and several others.

      (2) Furthermore, the abbreviations 1D, 4S, etc are explained in the legend but it seems there is room to spell them out or include a graphic to indicate what they mean.

      The labels 1D, 4S, etc are difficult to spell out because each one represents multiple conditions; for instance, 2S may correspond to green-green or red-red target colors, and so on. Figure legends have been edited to more clearly indicate that S and D labels correspond to repeat and switch trials, respectively, and that the associated number indicates how far back the history goes.

      (3) The terms "low asymptote" and "high asymptote" could be indicated in a graphic of a tachometric function, smoothing the transition to the rightmost panels. (Consider also alternative terms - perhaps "floor" and "ceiling" might be more readily understandable than asymptote to the student reader??).

      Thanks for the suggested terms, “floor” and “ceiling”, which we’ve adopted. They are indeed more natural. Figure 2a now indicates that floor and ceiling accuracies correspond to opposite ends of the PT axis.

      (4) The units for the asymptotes are not indicated - I assume these are "% correct" but that would be helpful to clarify.

      Yes. Units for floor and ceiling (and RT) are now indicated in all figures.

      (5) Figure 3 - "PT", and "1S-1D" could be spelled out, and the meaning of the two colored traces could be in the figure itself rather than only in the legend. Similar suggestions apply about labeling, abbreviations apply in subsequent figures.

      PT is now spelled out in all figures other than Figure 1, and labels for the two traces were added to Figure 3. Thanks for all the detailed suggestions.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      For the colony analysis, it is unclear from the methods and main text whether the initial individual sorted colonies were split and subject to different conditions to support the claim of bi-potency. The finding that 40% of colonies displayed tenogenic differentiation, may instead suggest heterogeneity of the sorted progenitor population. The methods as currently described, suggest that two different plates were subject to different induction conditions. It is therefore difficult to assess the strength of the claim of bi-potency.

      Thanks for your valuable comment. We apologize for the confusing description of the colony assay. We first obtained CD29+/CD56+ myogenic progenitors by FACS. These freshly isolated cells were then randomly seeded into 96-well plates at a density of 1 cell/well, and the single cell in each well was cultured in growth medium for ten days to form a colony. Myogenic induction was then performed in three 96-well plates and tenogenic induction in another three 96-well plates for subsequent analyses. We agree that the sorted population could contain heterogeneous myogenic progenitors. Over 95% of colonies successfully differentiated into myotubes, while 40% of colonies displayed tenogenic differentiation (Fig. 2g). Because the freshly obtained CD29+/CD56+ myogenic progenitors were randomly allocated to tenogenic or myogenic induction, the undifferentiated cells in the two groups were considered equivalent samples. Furthermore, the optimal tenogenic differentiation conditions for these cells remain to be determined. Thus, we believe the colony analysis, combined with the data in Figure 1 and Figure 2, supports the bipotency of human CD29+/CD56+ myogenic progenitors.

      This group uses the well-established CD56+/CD29+ sorting strategy to isolate muscle progenitor cells, however recent work has identified transcriptional heterogeneity within these human satellite cells (ie Barruet et al, eLife 2020). Given that they identify a tenocyte population in their human muscle biopsy in Figure 1a, it is critical to understand the heterogeneity contained within the population of human progenitors captured by the authors' FACS strategy and whether tenocytes contained within the muscle biopsy are also CD56+/CD29+.

      Thanks for your constructive suggestion. We have included more samples in the scRNA-seq analysis and reanalyzed the data. The scRNA-seq data revealed that all the CD29+/CD56+ cells were myogenic progenitors, accounting for 19.3% of all myogenic progenitors (Fig. 1e). By contrast, no CD29+/CD56+ tenocytes were detected (Fig. 1d), and tenocytes made up only a small percentage (0.06%) of all mononuclear cells. Thus, human CD29+/CD56+ cells are myogenic progenitors, and the tenocytes contained within the muscle biopsy are not CD56+/CD29+. In addition, both published research and our results indicate heterogeneity among CD29+/CD56+ myogenic progenitors. Since the main purpose of the current study was to investigate the tenogenic differentiation potential of CD29+/CD56+ myogenic progenitors, this heterogeneity will be investigated in future studies.

      The bulk RNA sequencing data presented in Figure 3 to contrast the expression of progenitor cells under different differentiation conditions are not sufficiently convincing. In particular, it is unclear whether more than one sample was used for the RNAseq analyses shown in Figure 3. The volcano plots have many genes aligned on distinct curves suggesting that there are few replicates or low expression. There is also a concern that the sorted cells may contain tenocytes as tendon genes SCX, MKX, and THBS4 were among the genes upregulated in the myogenic differentiation conditions (shown in Figure 3b).

      Thanks for your comment. Each group consisted of three samples for RNAseq analyses. We apologize for a minor analysis error in Fig. 3b and Fig. 3c; both panels have been reanalyzed in the revised version. There was no significant difference in tendon-related marker genes after myogenic differentiation (Fig. 3b), whereas these tenogenic genes were significantly upregulated after tenogenic induction (Fig. 3c). As for tenocyte contamination, the scRNA-seq data showed no CD29+/CD56+ double-positive tenocytes (please see our response to Comment 2). Moreover, almost all of the sorted cells highly expressed the myogenic progenitor markers PAX7/MYOD1/MYF5 (Fig. 1f-g), and only low expression levels of tendon markers were detected in these cells (Fig. 2a-c). Furthermore, although tendon genes were slightly upregulated under myogenic differentiation conditions, they were dramatically upregulated under tenogenic differentiation conditions (Fig. 2c). Thus, we believe the bulk RNA sequencing data provide additional evidence for the tenogenic differentiation ability of human CD29+/CD56+ myogenic progenitors.

      Reviewer #2 (Public Review):

      The scRNAseq assay using the total mononuclear cell population did not provide meaningful insight into the CD56+/CD29+ cell population. Information on CD56+/CD29+ cells may have been lost because these cells are a minority of the total skeletal muscle mononuclear population, especially given that the total cell number used for scRNAseq was very low and no information was provided on participant number or repeat sample number for this assay. Using these data to claim a stem cell lineage relationship between MuSCs and tenocytes may not be convincing, as seeing both cell types in the total muscle mononuclear population does not establish a lineage connection between them.

      Thanks for your constructive suggestion. We have included more samples in the scRNA-seq analysis and reanalyzed the data. Three samples with a total of 57,193 cells were included in the analysis. As shown in Fig. 1d and 1e, the joint expression analysis revealed that all the CD29+/CD56+ cells were myogenic progenitors, accounting for 19.3% of all myogenic progenitors. In addition, we agree that the pseudotime analysis could be misleading, given the inherent limitations of inferring lineage relationships from pseudotime plots, so we have removed this analysis.

      The TGF-b pathway assay uses a small-molecule inhibitor of TGF-b to probe Smad2/3. The assay conclusion that the Smad2/3 pathway is responsible for tenocyte differentiation may be an overinterpretation without Smad2/3-specific inhibitors being applied in the experiments.

      Thanks for your comment. We agree and have revised this in the revised version (Figure 7, Lines 306-326).

      Reviewer #3 (Public Review):

      This dual differentiation capability was not observed in mouse muscle stem cells.

      Thanks for your comment. We have explored the tenogenic differentiation potential of mouse MuSCs both in vivo and in vitro. However, they showed low tenogenic differentiation ability (Figure 4), which might be due to species differences. One possibility is that maintaining the homeostasis of the locomotor system, and the locomotor ability of the whole organism, is more demanding in humans given their much longer life span and larger body size. Thus, the current study also suggests that animal studies may not always be clinically relevant when investigating human diseases.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      The methods section contained insufficient details for sample tissue for many methods, including the single cell analysis, RNA FISH, and for in vivo cardiotoxin treatment. ie. how were the samples subclustered for the monocle pseudotime analysis; how many cells were counted in the FISH shown in Fig 1e/f, does the n=5 refer to tissue sections or biological replicates?; for the double injury, what was the cardiotoxin dose?

      Thanks for your comment. Three samples with a total of 57,193 cells were analyzed in the single-cell analysis (Line 464). We removed the RNA FISH data because they provided limited support for the bipotential ability of human CD29+/CD56+ myogenic progenitors. In addition, since the pseudotime analysis could be misleading, given the inherent limitations of inferring lineage relationships from pseudotime plots, we also removed this analysis. For the double injury, 15 μl of 10 μM cardiotoxin was used for lineage tracing (Line 533).

      Additionally, the RNA sequencing datasets are not currently publicly available under the accession numbers provided.

      The raw RNA sequencing data have been uploaded to NCBI (accession numbers: PRJNA1178160, PRJNA1012476 and PRJNA1012828) and will be released immediately upon publication.

      The poor resolution of 1d makes it impossible to read any of the gene names or interpret the expression profiles of their proposed trajectories.

      Since the pseudotime analysis could be misleading, given the inherent limitations of inferring lineage relationships from pseudotime plots, we have removed this analysis.

      What does the color key for 3a refer to? It is not indicated in the figure or legend.

      Thanks for your comment. The color key in 3a refers to “Scaled expression values”, which has been added in the revised version.

      scRNAseq of the sorted CD29/56+ population could help uncover possible cell heterogeneity within these muscle progenitors and which sub-populations of myogenic progenitor cells have tenogenic potential.

      Thanks for your valuable suggestion. We included more cells from three biological replicates in the scRNA-seq analysis and found that CD29+/CD56+ cells were exclusively myogenic progenitors (Fig. 1d and 1e). We agree that additional scRNAseq would help clarify the possible cell heterogeneity within these muscle progenitors. Since the main scope of the current study is to investigate the bipotential ability of CD29+/CD56+ myogenic progenitors, scRNAseq of the sorted CD29+/CD56+ population will be performed in future studies.

      Typos: Line 459 sored cells... preparasion with Chromium Single Cell 3' Reagent Kits (10X genomics, cat# 1000121-1000157). Figure 4E - typo in the word tamoxifen.

      Thanks for your valuable suggestion. We apologize for the typos and have corrected them (Line 459 and Fig. 4e).

      Reviewer #2 (Recommendations For The Authors):

      (1) scRNAseq is performed in total mononuclear cells isolated from human skeletal muscle. The cell number (around 15000 cells) seems very low for this assay, given the CD56+/CD29+ cells are a minority population in this sequencing, the data does not seem to provide meaningful insight into the MuSC cell identities. No information on sample numbers and number of patient participants can be found in the paper.

      Thanks for your comment. We added more cells and reanalyzed the data in the revised manuscript. Three samples with a total of 57,193 cells were analyzed (Line 464). The joint expression analysis revealed that all the CD29+/CD56+ cells were myogenic progenitors, accounting for 19.3% of all myogenic progenitors (Fig. 1d and 1e). These scRNA-seq data, combined with the functional experiments, confirmed the MuSC identity of CD29+/CD56+ cells among mononuclear cells.

      In this regard, the paragraph starts with "To confirm the single cell analysis results, we first isolated myogenic progenitor cells from human muscle biopsy using FACS as described previously" which is misleading as the scRNAseq is not the result of the sorted cells. Please reword this paragraph to clarify.

      The related paragraph has been reworded (Line 84-95).

      Similarly, the existence of myocytes and tenocytes in scRNAseq does not necessarily prove a stem cell and mature cell lineage relationship. Please edit the wording to avoid overinterpretation.

      Thank you for this reminder. Since the pseudotime analysis could be misleading, given the inherent limitations of inferring lineage relationships from pseudotime plots, we have removed this analysis.

      (2) The in vitro differentiation assays are well performed, which included bulk culture and clonal culture. The efficiencies of those two assays seem to have discrepancies which may need clarification. Again, no sample numbers and repeats have been reported.

      Since the tenogenic differentiation period for bulk culture was 12 days, myotubes formed by CD29+/CD56+ myogenic progenitors with only myogenic differentiation potential would no longer be alive at that point. Thus, the apparent efficiency of bulk culture was higher than that of clonal culture. As stated in the statistical analysis section, at least three biological replicates and technical repeats were performed for each experimental group (Line 577).

      In these paragraphs, terminologies including MuSCs, myogenic progenitors, CD56+/CD29+, and Pax7+ are used interchangeably, which generates confusion while reading. It is probably best to consistently use the cell sorting markers to refer to this cell population throughout the paper.

      Thanks for your constructive suggestion. The cell population is now consistently referred to as CD29+/CD56+ myogenic progenitors throughout the paper.

      Information on the proliferation rate and expansion of the MuSCs would be useful but not provided.

      Thanks for your comment. An analysis of cell proliferation has been added to Figure 1 (Fig. 1h).

      The murine cell differentiation assays are not as convincing as the human study. The assay regarding "mouse muscle CD29+/CD56+ cells were isolated for tenogenic induction. However, very few mouse muscle CD29+/CD56+ cells expressed myogenic progenitor cell marker Pax7, MyoD1 and Vcam1" does not add any value to the work as those markers are not mouse MuSC markers to start with.

      Thanks for your comment. The experiments concerning mouse muscle CD29+/CD56+ cells have been removed to avoid confusion.

      The Pax7-cre-TdTomato assay was also not convincing, as a negative finding may not be the best proof of absence.

      Thanks for your comment. Pax7-positive cells consistently express tdTomato in this system, enabling lineage tracing. In the current study, large numbers of tdTomato+ myofibers were observed after muscle injury (SFig. 2c-d), suggesting that the tracing system works well. However, fewer than 0.2% of tendon cells originated from tdTomato+ MuSCs, even four months after tendon removal (Fig. 4f-g). Comparing these in vivo data between murine MuSCs and human CD29+/CD56+ myogenic progenitors, we believe they indicate the poor tenogenic differentiation ability of murine MuSCs.

      (5) TGFb as a pathway of smad2/3 mediated tenocyte differentiation assays were well done albeit not novel. Using TGFb universal inhibitor may not accurately state the pathways were due to SMAD2/3 inhibition either.

      We agree with your comment and the conclusion concerning SMAD2/3 has been deleted throughout the manuscript.

      The paper also needs thorough proofreading. Currently, typographic, grammatical, and logical sequences of writing do not lend the paper to easy reading.

      (1) Figure 1K and 1I have similar legends but presumably K is referring to MuSC and I is referring to differentiated cells.

      (2) Tenogenic and myogenic induction should be changed to tenogenic/myogenic differentiation as they are the cells at the end of differentiation.

      (3) Figure 6, it is not clear how the "human cells" are calculated in this assay.

      Thanks for your constructive comment. (1) The figure legends in Figure 1 have been revised (Lines 797-804). (2) “Tenogenic and myogenic induction” has been changed to “tenogenic/myogenic differentiation” throughout the manuscript when referring to cells at the end of differentiation (Fig. 1, Fig. 2, Fig. 3, Fig. 4, Fig. 7 and SFig. 1). (3) In Figure 6, “human cells” refers to injured tendons transplanted with human CD29+/CD56+ myogenic progenitors. To evaluate the function of human CD29+/CD56+ myogenic progenitors, a PBS group was included as a negative control and an uninjured group as a normal control.

      Reviewer #3 (Recommendations For The Authors):

      (1) The full extent of the differentiation potential of CD29+/CD56+ stem/progenitor cells has not been thoroughly evaluated. There can also exist heterotopic ossification in injured tendon sites. Thus, it remains unclear whether these cells are truly bipotent as the authors claim, or can they differentiate into chondrocytes and osteoblasts.

      Thanks for your comment. The current study focused on the tenogenic differentiation potential of CD29+/CD56+ myogenic progenitors, so the research priority was their bipotential ability. We agree that the chondrogenic and osteogenic potential of CD29+/CD56+ myogenic progenitors is also important and will investigate it in future studies.

      (2) In Figure 3, the GO analysis also shows increased enrichment of muscle-related terms, including muscle contraction and filament. Please clarify this.

      The tenogenic differentiation efficiency of CD29+/CD56+ myogenic progenitors was about 40% in the clonal assay, so some cells may still undergo myogenic differentiation under this tenogenic induction system. Thus, muscle-related terms, including muscle contraction and filament, could also be enriched in the GO analysis.

      (3) The authors use TNC staining to evaluate cell transplantation. My concern is whether TNC expression is specific to the tendon site, or whether engrafted human cells also express TNC at other sites, such as muscle.

      TNC is a well-known tendon-related marker. As shown in Figures 6b and 6c, although some human cells (labeled by Lamin A/C) were engrafted in the muscle tissue area (labeled by MyHC), these engrafted human cells did not express TNC in muscle. In addition, we used the tendon-related markers SCX and TNMD to confirm the tenogenic differentiation ability of engrafted human cells in vivo (SFig. 3a and 3b).

      (4) The authors demonstrate that CD29+/CD56+ human stem/progenitor cells could efficiently engraft and contribute to myofiber regeneration in vivo. However, why did only a few transplanted human cells differentiate into myofibers (labeled by MyHC) in the tendon injury model, even with CTX injection?

      Thanks for your comment. Because skeletal muscle can regenerate from resident muscle progenitor cells, regeneration of CTX-injured muscle depended not only on CD29+/CD56+ myogenic progenitors but also on native murine MuSCs. Thus, it is expected that only a few transplanted human cells differentiated into myofibers (labeled by MyHC) in the tendon injury model, even with CTX injection.

      (5) Figure 7 shows the crucial role of TGFβ/SMAD signaling in the tenogenesis of human CD29+/CD56+ stem/progenitor cells. However, can activation of TGFβ/SMAD signaling facilitate the tenogenic differentiation of mouse MuSCs? This point is crucial for clarifying differences in MuSCs between species.

      Thanks for your valuable suggestion. We performed a series of pilot assays to investigate whether activating TGFβ signaling facilitates tenogenic differentiation of mouse MuSCs (Author response image 1). As shown, activating TGFβ signaling with SRI-011381 slightly increased the expression of tenogenic markers in murine MuSCs. This is an interesting topic, and we will investigate it in future studies.

      Author response image 1.

      Activation of the TGFβ signaling pathway slightly elevated the tenogenic differentiation ability of murine MuSCs. (a) Immunofluorescence staining of the tendon markers Scx and Tnc in murine MuSCs induced toward tenogenic differentiation with or without the TGFβ signaling pathway agonist SRI-011381. Scale bars, 50 µm. (b) Quantification of Scx and Tnc fluorescence intensity in murine MuSCs that underwent tenogenic induction with or without SRI-011381. Error bars indicate standard deviation (n=5). (c) Protein levels of Tnc and Scx. Murine MuSCs were induced toward tenogenic differentiation with or without SRI-011381. Total protein was extracted from cells before and after differentiation and subjected to Tnc and Scx immunoblotting. GAPDH served as the loading control.

      (6) Please quantify the western blot (WB) data throughout the manuscript.

      Thanks for your comment. The western blot data have been quantified throughout the manuscript.

      (7) Throughout the manuscript, the RT-qPCR data should indicate what the fold changes are relative to.

      Thanks for your comment. A statement that GAPDH served as the reference gene has been added to the figure legends of the RT-qPCR results.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This study provides a thorough analysis of Nup107's role in Drosophila metamorphosis, demonstrating that its depletion leads to developmental arrest at the third larval instar stage due to disruptions in ecdysone biosynthesis and EcR signaling. Importantly, the authors establish a novel connection between Nup107 and Torso receptor expression, linking it to the hormonal cascade regulating pupariation.

      However, some contradictory results weaken the conclusions of the study. The authors claim that Nup107 is involved in the translocation of EcR from the cytoplasm to the nucleus. However, the evidence provided in the paper suggests it more likely regulates EcR expression positively, as EcR is undetectable in Nup107-depleted animals, even below background levels.

      We appreciate the concern raised in this public review. However, we must clarify that we do not claim that Nup107 directly regulates the translocation of EcR from the cytoplasm to the nucleus; rather, Nup107 regulates ecdysone hormone (20E) synthesis, which in turn affects EcR translocation. In the manuscript, we posed the hypothesis of whether Nup107 regulates EcR nuclear translocation (9th line of the 2nd paragraph on page 6). We have now spelled this out more clearly in the 3rd subsection title of the Results section and in the Discussion (8th line of the 2nd paragraph on page 11).

      20E acts through EcR to induce the transcription of EcR-responsive genes, including EcR itself. This creates a positive autoregulatory loop that enhances EcR levels through ecdysone signaling (1). Since Nup107 depletion reduces ecdysone levels, it disrupts this autoregulatory loop of EcR transcription, which could contribute to the reduced EcR levels seen in Nup107-depleted animals.

      Additionally, the link between Nup107 and Torso is not fully substantiated. While overexpression of Torso appears to rescue the lack of 20E production in the prothoracic gland, the distinct phenotypes of Torso and Nup107 depletion (developmental delay in the former versus complete larval arrest in the latter) complicate understanding of Nup107's precise role.

      We understand that the developmental delay differs between Torso and Nup107 depletion. However, the two molecules being compared here are very different, and variability in their depletion could contribute to the observed phenotypic differences (2). Even if Torso and Nup107 were depleted to comparable levels, we believe that Nup107, being more widely expressed and involved in regulating various cellular processes, would induce stronger defects.

      Further, we think that RNAi-mediated depletion of Nup107 in the prothoracic gland (PG) causes a significant reduction in PG size, which may produce a pronounced defect in 20E biosynthesis through the Halloween genes and thus a stronger developmental arrest.

      To clarify these discrepancies, further investigation into whether Nup107 interacts with other critical signaling pathways related to the regulation of ecdysone biosynthesis, such as EGFR or TGF-β, would be beneficial and could strengthen the findings.

      In summary, although the study presents some intriguing observations, several conclusions are not well-supported by the experimental data.

      We agree with the reviewer’s suggestion. As noted in the literature, five RTKs (torso, InR, EGFR, Alk, and Pvr) stimulate the PI3K/Akt pathway, which plays a crucial role in PG function and in controlling pupariation and body size (3). We have examined torso and EGFR signaling. Torso overexpression rescued the Nup107 defects; however, constitutively active EGFR (BL-59843) did not rescue the phenotype (data not shown). Nonetheless, we plan to examine EGFR pathway activation by measuring pERK levels in Nup107-depleted PGs.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Kawadkar et al. investigates the role of Nup107 in developmental progression via the regulation of ecdysone signaling. The authors identify an interesting phenotype of whole-body Nup107 RNAi depletion in Drosophila: developmental arrest at the late larval stage. Nup107-depleted larvae exhibit mislocalization of the ecdysone receptor (EcR) from the nucleus to the cytoplasm and reduced expression of EcR target genes in salivary glands, indicative of compromised ecdysone signaling. This mislocalization of EcR in salivary glands was phenocopied when Nup107 was depleted only in the prothoracic gland (PG), suggesting that it is not nuclear transport of EcR but the presence of ecdysone (normally secreted from the PG) that is affected. Consistently, whole-body levels of ecdysone were shown to be reduced in Nup107 KD, particularly at the late third instar stage, when a spike in ecdysone normally occurs. Importantly, the authors could rescue the developmental arrest and EcR mislocalization phenotypes of Nup107 KD by adding exogenous ecdysone, supporting the notion that Nup107 depletion disrupts ecdysone biosynthesis, which arrests normal development. Additionally, they found that the Nup107 KD phenotype can also be rescued by overexpression of the receptor tyrosine kinase torso, which is thought to be the upstream regulator of ecdysone synthesis in the PG. Transcript levels of torso are also shown to be downregulated in the Nup107 KD, as are transcript levels of multiple ecdysone biosynthesis genes. Together, these experiments reveal a new role of Nup107 or nuclear pore levels in hormone-driven developmental progression, likely via regulation of torso levels and torso-stimulated ecdysone biosynthesis.

      Strengths:

      The developmental phenotypes of an NPC component presented in the manuscript are striking and novel, and the data appears to be of high quality. The rescue experiments are particularly significant, providing strong evidence that Nup107 functions upstream of torso and ecdysone levels in the regulation of developmental timing and progression.

      Weaknesses:

      The underlying mechanism, however, is not clear, and any insight into how Nup107 may regulate these pathways would greatly strengthen the manuscript. Some suggestions to address this are detailed below.

      Major questions:

      (1) Determining how specific this phenotype is to Nup107 vs. to reduced NPC levels overall would give some mechanistic insight. Does knocking down other components of the Nup107 subcomplex (the Y-complex) lead to similar phenotypes? Given the published gene regulatory function of Nup107, do other gene regulatory Nups such as Nup98 or Nup153 produce these phenotypes?

      We thank the reviewer for raising this concern. When working with a Nup subcomplex such as the Nup107 complex, this concern is anticipated but difficult to address, as many Nups function beyond their complex identity. Our observations with all other members of the Nup107 complex, including dELYS, indicate that, except for Nup107, none of the tested Nup107-complex members induced larval developmental arrest.

      In this study, we primarily focused on the Nup107 complex (outer ring complex) of the NPC. However, previous studies in Drosophila S2 cells have reported that Nup98 and Nup153 interact with chromatin (4, 5, 6). We have now examined nucleoporins outside this complex, such as Nup153.

      We ubiquitously depleted Nup153 using the Actin5C-Gal4 driver and compared the pupariation profile of the knockdown larvae with that of control larvae. In contrast to Nup107 knockdown, depleting Nup153 to less than 50% of control levels had no impact on pupariation (Author response image 1).

      Author response image 1.

      Nup153 depletion does not affect Drosophila metamorphosis. Actin5C-Gal4 is used as a ubiquitous driver. (A) Comparison of pupariation profiles of control and Nup153 knockdown organisms. (B) Quantification of Nup153 knockdown efficiency. Data are represented from at least three independent experiments. Statistical significance was derived from Student’s t-test. Error bars represent SEM. ***p < 0.001.

      (2) On a related issue, does this level of Nup107 KD produce lower NPC levels? It is expected to, but actual quantification of nuclear pores in Nup107-depleted tissues should be added. These and the above experiments would help address a key mechanistic question: is this phenotype the result of lower numbers of nuclear pores or specifically of Nup107?

      We agree with this concern. To address it, we stained control and Nup107-depleted salivary glands with the mAb414 antibody, which specifically recognizes FG-repeat Nups. While Nup107 intensity is significantly reduced at the nuclear envelope in Nup107-depleted salivary glands, mAb414 staining appears unperturbed (Author response image 2).

      Author response image 2.

      Nup107 depletion does not perturb overall NPC composition. Comparison of salivary gland nuclei from control and Nup107 knockdown larvae. Nup107 is shown in green, and mAb414, which stains other FG-repeat-containing nucleoporins, is shown in red. Scale bars, 5 µm.

      (3) Additional experiments on how Nup107 regulates torso would provide further insight. Does Nup107 regulate transcription of torso, or perhaps its mRNA export? Looking at nascent levels of the torso transcript and the localization of its mRNA can help answer this question. Alternatively, does Nup107 physically bind Torso?

      While the concern regarding torso transcript level is genuine, we have already reported in the manuscript that Nup107 directly regulates torso expression. When Nup107 is depleted, torso levels go down, which in turn controls ecdysone production and subsequent EcR signaling (Figure 6B of the manuscript).

      However, the exact mechanism by which Nup107 regulates torso expression is still unclear. Since Nup107 is known to interact with chromatin (7), it may affect torso transcription. A stable and physiologically relevant interaction between Nup107 and the Torso protein in a cellular context is unlikely, largely because of their distinct subcellular localizations. Investigating this further would require substantial time to generate reagents and perform experiments, and is currently beyond the scope of this manuscript.

      (4) The depletion level of Nup107 RNAi specifically in the salivary gland vs. the prothoracic gland should be compared by RT-qPCR or western blotting.

      Although we know that the Nup107 protein signal is reduced in the SG upon knockdown (Figure 3B), we had not compared Nup107 transcript levels in these two tissues (SG and PG) upon RNAi. As suggested, we evaluated the knockdown efficiency of Nup107 using the salivary gland-specific driver AB1-Gal4 and the prothoracic gland-specific driver Phm-Gal4. Our results indicate a significant reduction in Nup107 transcript levels upon Nup107 RNAi in both the SG and the PG compared with their respective controls (Author response image 3).

      Author response image 3.

      Nup107 levels are significantly reduced upon Nup107<sup>KK</sup> RNAi. Quantification of Nup107 transcript levels from control and Nup107-depleted larvae [tissue-specific depletion using AB1-Gal4 (A) and Phm-Gal4 (B)]. Data are represented from at least three independent experiments. Statistical significance was derived from Student’s t-test. Error bars represent SEM. **p < 0.004.

      (5) The UAS-torso rescue experiment should also include the control of an additional UAS construct - so Nup107; UAS-control vs Nup107; UAS-torso should be compared in the context of rescue to make sure the Gal4 driver is functioning at similar levels in the rescue experiment.

      This is a valid point, and we took it into account while planning the experiment. In such cases, GAL4 dilution can be critical. We have demonstrated in Figure S7 that GAL4 dilution does not confound our observations. We used Nup107<sup>KK</sup>; UAS-GFP as a control alongside Nup107<sup>KK</sup>; UAS-torso. The presence of GFP signal in the prothoracic glands, together with their reduced size, indicates that genes downstream of both UAS sequences are transcribed; we therefore conclude that GAL4 dilution does not play a role here.

      Minor:

      (6) Figures and figure legends can stand to be more explicit and detailed, respectively.

      We have revisited all figures and their corresponding legends to ensure appropriate and explicit details are provided.

      Reviewer #3 (Public review):

      Summary:

      In this study by Kawadkar et al., the authors investigate the developmental role of Nup107, a nucleoporin, in regulating the larval-to-pupal transition in Drosophila through RNAi knockdown and CRISPR-Cas9-mediated gene editing. They demonstrate that Nup107, an essential component of the nuclear pore complex (NPC), is crucial for regulating ecdysone signaling during developmental transitions. The authors show that the depletion of Nup107 disrupts these processes, offering valuable insights into its role in development.

      Specifically, they find that:

      (1) Nup107 depletion impairs pupariation during the larval-to-pupal transition.

      (2) RNAi knockdown of Nup107 results in defects in EcR nuclear translocation, a key regulator of ecdysone signaling.

      (3) Exogenous 20-hydroxyecdysone (20E) rescues pupariation blocks, but rescued pupae fail to close.

      (4) Nup107 RNAi-induced defects can be rescued by activation of the MAP kinase pathway.

      Strengths:

      The manuscript provides strong evidence that Nup107, a component of the nuclear pore complex (NPC), plays a crucial role in regulating the larval-to-pupal transition in Drosophila, particularly in ecdysone signaling.

      The authors employ a combination of RNAi knockdown, CRISPR-Cas9 gene editing, and rescue experiments, offering a comprehensive approach to studying Nup107's developmental function.

      The study effectively connects Nup107 to ecdysone signaling, a key regulator of developmental transitions, offering novel insights into the molecular mechanisms controlling metamorphosis.

      The use of exogenous 20-hydroxyecdysone (20E) and activation of the MAP kinase pathway provides a strong mechanistic perspective, suggesting that Nup107 may influence EcR signaling and ecdysone biosynthesis.

      Weaknesses:

      The authors do not sufficiently address the potential off-target effects of RNAi, which could impact the validity of their findings. Alternative approaches, such as heterozygous or clonal studies, could help confirm the specificity of the observed phenotypes.

      This is a valid point, and we are aware of the potential off-target effects of RNAi. To confirm the specificity of the RNAi phenotypes and minimize off-target effects, we used two independent RNAi lines against Nup107 (Nup107<sup>GD</sup> and Nup107<sup>KK</sup>). Both lines induced comparable reductions in Nup107, and with either line, ubiquitous and PG-specific knockdown produced similar phenotypes. Although the Nup107<sup>GD</sup> line produced a somewhat stronger knockdown than the Nup107<sup>KK</sup> line, we preferentially used the Nup107<sup>KK</sup> line because the Nup107<sup>GD</sup> line is based on a P-element insertion whose exact landing site is unknown. Furthermore, the Nup107<sup>GD</sup> line has a predicted off-target: a 19-bp sequence that aligns with the bifocal (bif) sequence. The bif-encoded protein is involved in axon guidance and the regulation of axon extension. In contrast, the Nup107<sup>KK</sup> line has no predicted off-target, and its precise landing site on the second chromosome is known. Thus, the Nup107<sup>KK</sup> line was ultimately used in our experiments because of its clearer and more reliable genetic background.

      In addition, we are investigating Nup107 knockdown in the prothoracic gland, whose cells are polytene. The prothoracic gland also contains a limited number of cells, approximately 50-60 (8). Given this, a clonal study may not yield the phenotype.

      NPC Complex Specificity: While the authors focus on Nup107, it remains unclear whether the observed defects are specific to this nucleoporin or if other NPC components also contribute to similar defects. Demonstrating similar results with other NPC components would strengthen their claims.

      We thank the reviewer for raising this concern. When working with a Nup subcomplex such as the Nup107 complex, this concern is anticipated but difficult to address, as many Nups function beyond their complex identity. Our observations with all other members of the Nup107 complex, including dELYS, indicate that, except for Nup107, none of the Nup107-complex members induced larval developmental arrest. Since the study is primarily focused on the Nup107 complex (outer ring complex) of the NPC, we have not examined many nucleoporins outside this complex. However, knockdown of Nup153, a nuclear basket nucleoporin, was comparable to the control, with no developmental delay (Author response image 1).

      Although the authors show that Nup107 depletion disrupts EcR signaling, the precise molecular mechanism by which Nup107 influences this process is not fully explored. Further investigation into how Nup107 regulates EcR nuclear translocation or ecdysone biosynthesis would improve the clarity of the findings.

      We appreciate the concern raised. Based on our observations, we have proposed that Nup107 acts upstream of the PTTH-torso-20E-EcR axis that regulates developmental transitions. We know that Nup107 regulates torso levels, but we do not know whether Nup107 directly interacts with torso. We would also like to address whether Nup107 controls PTTH levels.

      However, we must emphasize that Nup107 does not directly regulate the translocation of EcR. On the contrary, we have demonstrated that when Nup107 is depleted only in the salivary gland, EcR still translocates into the nucleus. Thus, we conclude that EcR translocation is 20E dependent and Nup107 independent. Further, we have argued that Nup107 regulates the expression of the Halloween genes required for ecdysone biosynthesis. We are interested in determining whether Nup107 associates with chromatin, directly or through an intermediary protein, to bring about the changes in gene expression required for normal development.

      There are some typographical errors and overly strong phrases, such as "unequivocally demonstrate," which could be softened. Additionally, the presentation of redundant data in different tissues could be streamlined to enhance clarity and flow.

      We thank the reviewer for this observation. We have made our best efforts to remove all typographical errors and have revised our statements so that they better reflect our conclusions.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the authors):

      The manuscript presents compelling evidence that Nup107 plays a role in regulating ecdysone production. However, significant concerns remain regarding the effects on EcR localization and expression, as well as the claimed link between PTTH/Torso signaling and Nup107's function, as the evidence provided is not conclusive.

      The hypothesis that Nup107 mediates EcR translocation from the cytoplasm to the nucleus appears misinterpreted by the authors. Based on the presented images, particularly for the prothoracic gland (PG; Figure 3C), Nup107 depletion seems to impact EcR protein levels rather than its localization. This conclusion is supported by data showing that EcR transcripts are autonomously downregulated in the absence of Nup107. Furthermore, the restoration of nuclear EcR levels upon exogenous 20E supplementation suggests that (1) Nup107 is dispensable for EcR activation and function, and (2) its primary role lies in regulating ecdysone production.

      We appreciate the concern raised by the reviewer. However, we must clarify that we do not claim that Nup107 directly regulates the translocation of EcR from the cytoplasm to the nucleus; rather, Nup107 regulates ecdysone hormone (20E) synthesis, which in turn affects EcR translocation. In the manuscript, we posed the hypothesis of whether Nup107 regulates EcR nuclear translocation (9th line of the 2nd paragraph on page 6). We have now spelled this out more clearly in the 3rd subsection title of the Results section and in the Discussion (8th line of the 2nd paragraph on page 11).

      20E acts through EcR to induce the transcription of EcR-responsive genes, including EcR itself. This creates a positive autoregulatory loop that enhances EcR levels through ecdysone signaling (1). Since Nup107 depletion reduces ecdysone levels, it disrupts this autoregulatory loop of EcR transcription, which could contribute to the reduced EcR levels seen in Nup107-depleted animals.

      Given that nucleoporins are known to influence mRNA transport (for instance, Nup107 has been shown to control Scn5a mRNA transport; Guan et al., 2019), the observed effects on Halloween gene and EcR expression may stem from disruptions in mRNA transport to the cytoplasm. The downregulation of Shade further supports this hypothesis, as restricted ecdysone biosynthesis typically induces Shade upregulation in peripheral tissues. Quantifying potential mRNA accumulation in the nuclei of PG cells in Nup107-depleted animals would clarify this.

      The reviewer raises a valid point, and we agree that Nup107 has been shown to control Scn5a mRNA transport (Guan et al., 2019). The observed effects on Halloween gene and EcR expression could indeed stem from disrupted mRNA export to the cytoplasm. However, if Nup107 regulated the mRNA export of the Halloween genes and EcR, we would not expect torso overexpression to rescue the Nup107 developmental delay phenotype. Instead, overexpressing torso in the Nup107-depleted background activates torso pathway-dependent Halloween gene expression and rescues the developmental delay phenotype of Nup107 depletion.

      With the current data, it is difficult to conclusively claim a role for Nup107 in EcR translocation or expression. Additional experiments, such as EcR overexpression in Nup107-depleted animals or Nup107 overexpression, would help determine its precise role.

      We appreciate the concern raised by the reviewer. We did attempt to rescue the Nup107 depletion phenotype by overexpressing EcR (BL-6868) in the Nup107-RNAi background. However, this approach did not rescue the developmental delay caused by Nup107 depletion. This suggests that the phenotype is not merely due to low EcR levels but rather to reduced availability of ecdysone and, consequently, reduced EcR signaling.

      The second major issue is the proposed link between Nup107 and PTTH/Torso signaling. The authors suggest that Nup107 regulates ecdysone production through Torso expression based on rescue experiments. However, this is inconsistent with the distinct phenotypes observed when Nup107 or Torso signaling is disrupted. While PTTH/Torso signaling causes only a modest developmental delay (12 hours to 2 days, depending on the mutant), Nup107 depletion results in a complete developmental arrest at the larval stage. This discrepancy raises doubts about the assertion that Torso overexpression alone rescues such a severe phenotype. One possibility is that PTTH levels are upregulated in Nup107-depleted animals, leading to overactivation of the pathway when Torso is overexpressed. Quantifying PTTH levels in Nup107-depleted animals could address this.

      The reviewer raises a valid point, and we fully acknowledge this concern. While we do not fully agree with the idea that PTTH is upregulated in Nup107-depleted larvae, we agree that quantifying PTTH levels upon Nup107 depletion can provide useful insight. We therefore quantified PTTH levels in Nup107-depleted larvae and found no significant change in PTTH expression compared with controls (Author response image 4).

      Author response image 4.

      Nup107 knockdown does not affect PTTH levels. Quantification of PTTH transcript levels from control and Nup107-depleted larvae (prothoracic gland-specific depletion using Phm-Gal4). Data are represented from at least three independent experiments. Statistical significance was derived from Student's t-test. ns, not significant.

      Another possibility is that the stock used for Torso overexpression, which includes a trk mutant, may introduce genetic interactions that overactivate the pathway. Using a clean UAS-Torso stock would resolve this issue.

      We appreciate the reviewer’s observation regarding the use of the Torso overexpression line (BL-92604), which carries the trk null allele on the second chromosome. The cleaved form of Trk serves as a ligand for the Torso receptor. Given this role, it is not clear to us how a line carrying a trk null allele, when used for torso overexpression, would overactivate the pathway.

      Nevertheless, we recognized this concern, and the fly line used in this study and reported in the manuscript was generated from the BL-92604 line using the following genetic strategy. First, a double balancer stock (Sco/CyO; MKRS/TM6.Tb) was used to generate the Sco/CyO; UAS-torso/UAS-torso genotype. This line was subsequently combined with the Nup107<sup>KK</sup> line. Through this double balancer strategy, the second chromosome was effectively replaced by the one carrying the Nup107 RNAi construct, ensuring that our final experimental setup is free from any trk mutant contamination.

      Moreover, the rescue of Nup107 depletion phenotypes by RasV12 overexpression suggests that multiple RTKs, not just Torso, are affected. EGFR signaling, the primary regulator of ecdysone biosynthesis in the PG during the last larval stage, is notably absent from the authors' analysis. EGFR inactivation is known to arrest development, and previous studies indicate that Nup107 can reduce EGFR pathway activity (Kim et al, 2010). The authors should analyze EGFR pathway activity in the absence of Nup107. Overexpressing EGF ligands like Vein or Spitz in the PG (rather than the receptor) in a Nup107-depleted background would provide more relevant insights.

      Ras GTPase is one of the common effector molecules downstream of activated receptor tyrosine kinases. Rescue with a constitutively active form of Ras (RasV12) identifies one route that is activated downstream of the torso receptor; it does not indicate that all the different RTKs are affected or involved. Our aim in performing this rescue experiment was to test whether the pathway activated downstream of torso involves Ras GTPase.

      As noted in the literature, five RTKs (torso, InR, EGFR, Alk, and Pvr) stimulate the PI3K/Akt pathway, which plays a crucial role in the PG in controlling pupariation and body size (3). Although EGFR signaling is important, PTTH/Torso signaling is considered the primary mediator of metamorphic timing. In response to the suggestion to analyze EGFR pathway activity in the absence of Nup107, we attempted to rescue the phenotype by overexpressing constitutively active EGFR (BL-59843) in the Nup107-depleted background (data not shown). We used constitutively active EGFR to bypass the requirement for its ligands (Vein and Spitz). However, this approach did not rescue the phenotype, which further suggests that EGFR is not the RTK pathway targeted in this context. The rescue with torso indicates that Nup107 regulates torso-mediated Ras/Erk signaling to control metamorphosis.

      Additional issues require clarification:

      (1) RNAi Efficiency: In Figure 1C, the Nup107GD line shows a stronger knockdown effect than Nup107KK, yet most experiments were conducted with the weaker line. This might explain the residual Nup107 protein observed in Figure 2. Could the authors justify this choice?

      This is a valid point, and we are aware of the potential consequences of RNAi off-target effects. To confirm that the phenotypes reflect genuine Nup107 knockdown and to minimize off-target effects, we used two independent RNAi lines against Nup107 (Nup107<sup>GD</sup> and Nup107<sup>KK</sup>). Both lines reduced Nup107 to comparable levels, and ubiquitous and PG-specific knockdown with either line produced similar phenotypes. Although the Nup107<sup>GD</sup> line produced a somewhat stronger knockdown than the Nup107<sup>KK</sup> line, we preferentially used the Nup107<sup>KK</sup> line because the Nup107<sup>GD</sup> line is based on a P-element insertion whose exact landing site is unknown. Furthermore, an off-target is predicted for the Nup107<sup>GD</sup> line: a 19-bp sequence aligns with the bifocal (bif) sequence, which encodes a protein involved in axon guidance and the regulation of axon extension. In contrast, the Nup107<sup>KK</sup> line has no predicted off-target, and its precise landing site on the second chromosome is known. We therefore used the Nup107<sup>KK</sup> line for most experiments because of its clearer and more reliable genetic background.

      (2) Control Comparisons: In Figure 3, the effects of Nup107 depletion on EcR expression in salivary glands (SG) and PG are shown, but only SG controls are provided. Including PG controls would enable proper comparisons. These controls should also be added to Figures 5, 6, and S5.

      As suggested by the reviewer, we also examined EcR localization in the prothoracic gland (Author response image 5). When PGs from control larvae, Nup107-RNAi larvae, and Nup107-RNAi larvae overexpressing torso were stained for EcR, the results were indistinguishable from those observed in SGs of the corresponding genotypes. This suggests that Nup107 regulates EcR signaling by regulating 20E biosynthesis.

      Author response image 5.

      Prothoracic gland-specific torso expression rescues EcR nuclear translocation defects. Immunofluorescence-based detection of the nucleocytoplasmic distribution of EcR (EcR antibody, red) in third instar larval prothoracic gland nuclei of control, PG-specific Nup107 knockdown (Phm-Gal4>Nup107<sup>KK</sup>), and PG-specific Nup107 knockdown with torso overexpression (Phm-Gal4>Nup107<sup>KK</sup>; UAS-torso) larvae. DNA is stained with DAPI. Scale bars, 20 μm.

      (3) Clarify the function of Torso in the text: The authors must revise their description of Torso signaling as the primary regulator of ecdysone production in both the results and discussion sections. Specifically, in the results section, the claim that Torso depletion induces developmental arrest is inaccurate. Instead, available evidence, including Rewitz et al. 2009, demonstrates that Torso depletion causes a delay of approximately five days rather than a complete developmental arrest. This discrepancy should be corrected to avoid overstating the role of Torso signaling in ecdysone regulation and to align the manuscript with established findings.

      We agree with the reviewer. We have incorporated the suggestion at the relevant place in the main manuscript.

      Reviewer #3 (Recommendations for the authors):

      These findings suggest that Nup107 is involved in regulating ecdysone signaling during developmental transitions, with depletion of Nup107 disrupting hormone-regulated processes. Moreover, the rescue experiments hint that Nup107 might directly influence EcR signaling and ecdysone biosynthesis, though the precise molecular mechanism remains unclear.

      Overall, the manuscript presents compelling data supporting Nup107's role in regulating developmental transitions. However, I have a few comments for consideration:

      Major Comments:

      (1) RNAi Specificity: While RNAi is a powerful tool, the authors do not sufficiently address potential off-target effects, which could undermine the conclusions. Although a mutant Nup107 is described, it is lethal; are heterozygous or clonal studies possible to validate the findings more robustly?

      This is a valid point, and we are aware of the potential consequences of RNAi off-target effects. To confirm that the phenotypes reflect genuine Nup107 knockdown and to minimize off-target effects, we used two independent RNAi lines against Nup107 (Nup107<sup>GD</sup> and Nup107<sup>KK</sup>). Both lines reduced Nup107 to comparable levels, and ubiquitous and PG-specific knockdown with either line produced similar phenotypes. Although the Nup107<sup>GD</sup> line produced a somewhat stronger knockdown than the Nup107<sup>KK</sup> line, we preferentially used the Nup107<sup>KK</sup> line because the Nup107<sup>GD</sup> line is based on a P-element insertion whose exact landing site is unknown. Furthermore, an off-target is predicted for the Nup107<sup>GD</sup> line: a 19-bp sequence aligns with the bifocal (bif) sequence, which encodes a protein involved in axon guidance and the regulation of axon extension. In contrast, the Nup107<sup>KK</sup> line has no predicted off-target, and its precise landing site on the second chromosome is known. We therefore used the Nup107<sup>KK</sup> line for most experiments because of its clearer and more reliable genetic background.

      Following the reviewer's suggestion, we considered conducting heterozygous and clonal analyses using the Nup107 mutant. However, our Nup107 knockdown studies were carried out in the prothoracic gland, which has a limited number of cells (50-60) and is known to exhibit polyteny (8). Given these features of the prothoracic gland, a clonal study is unlikely to yield the phenotype. Nevertheless, we will consider pursuing this approach.

      (2) NPC Complex Specificity: It remains unclear whether the observed defects are specific to Nup107 or if other NPC components also cause similar defects. If the authors are unable to use Nup107 mutants, they could demonstrate similar defects with other critical NPC members to bolster their claim.

      We thank the reviewer for raising this concern. It is an expected concern when working with a multi-subunit complex such as the Nup107 complex, but it is difficult to address because many Nups function beyond their membership in a given complex. Our analysis of Nup153-depleted organisms indicates no developmental delay or defect. We also assessed knockdown of all other members of the Nup107 complex, including dELYS; apart from Nup107, no member of the complex induced third instar developmental arrest with failure to pupariate. However, a null mutant of Nup133, the direct interactor of Nup107 within the Nup107 complex, delays pupariation (unpublished data).

      (3) Molecular Mechanism of EcR Signaling: The manuscript shows that Nup107 depletion affects EcR signaling and ecdysone biosynthesis, but the molecular basis of this regulation is not fully explored. Does phosphorylated ERK (p-ERK) fail to enter the nucleus? Clarifying this mechanism would strengthen the study's impact.

      We appreciate the reviewer’s insightful comment and fully agree with the concern. To address this, we examined the subcellular localization of phosphorylated ERK (p-ERK) in the prothoracic glands of control larvae, Nup107-depleted larvae, and Nup107-depleted larvae with torso overexpression. In control larvae, p-ERK was predominantly nuclear. In Nup107-depleted larvae, however, p-ERK was largely retained in the cytoplasm, indicating impaired pathway activation and nuclear translocation. Notably, overexpression of torso in the Nup107-depleted background restored nuclear localization of p-ERK in the prothoracic gland (Author response image 6). These findings suggest that Nup107 regulates Drosophila metamorphosis, in part, through modulation of torso-mediated MAPK signaling.

      Author response image 6.

      Nup107 regulates torso activation-dependent p-ERK localization. Detection of the nucleocytoplasmic distribution of p-ERK (anti-p-ERK antibody, green) in third instar larval prothoracic glands of control, PG-specific Nup107 knockdown (Phm-Gal4>Nup107<sup>KK</sup>), and PG-specific torso overexpression in the Nup107 knockdown background (Phm-Gal4>Nup107<sup>KK</sup>; UAS-torso). DNA is stained with DAPI. Scale bars, 20 µm.

      Minor Comments:

      (1) The manuscript contains typographical errors that may hinder readability. Additionally, some phrases (e.g., "unequivocally demonstrate") may be overly strong. Consider adjusting language to reflect the nature of the data more accurately.

      We agree with the reviewer and have edited the main manuscript accordingly, correcting typographical errors where relevant.

      (2) The data presentation could be improved by eliminating redundancy. Some sections repeat similar findings in different tissues, which could be consolidated to improve clarity and flow.

      While we agree with the comment, some tissue redundancy was unavoidable in presenting the data for the EcR translocation studies. However, we have included EcR localization and p-ERK translocation data from the prothoracic gland in this response to provide a complementary, non-redundant tissue perspective (Author response images 5 and 6).

      References:

      (1) Varghese, Jishy, and Stephen M Cohen. “microRNA miR-14 acts to modulate a positive autoregulatory loop controlling steroid hormone signaling in Drosophila.” Genes & development vol. 21,18 (2007): 2277-82. doi:10.1101/gad.439807

      (2) Rewitz, Kim F et al. “The insect neuropeptide PTTH activates receptor tyrosine kinase torso to initiate metamorphosis.” Science (New York, N.Y.) vol. 326,5958 (2009): 1403-5. doi:10.1126/science.1176450

      (3) Pan, Xueyang, and Michael B O'Connor. “Coordination among multiple receptor tyrosine kinase signals controls Drosophila developmental timing and body size.” Cell reports vol. 36,9 (2021): 109644. doi:10.1016/j.celrep.2021.109644

      (4) Pascual-Garcia, Pau et al. “Metazoan Nuclear Pores Provide a Scaffold for Poised Genes and Mediate Induced Enhancer-Promoter Contacts.” Molecular cell vol. 66,1 (2017): 63-76.e6. doi:10.1016/j.molcel.2017.02.020

      (5) Pascual-Garcia, Pau et al. “Nup98-dependent transcriptional memory is established independently of transcription.” eLife vol. 11 e63404. 15 Mar. 2022, doi:10.7554/eLife.63404

      (6) Kadota, Shinichi et al. “Nucleoporin 153 links nuclear pore complex to chromatin architecture by mediating CTCF and cohesin binding.” Nature communications vol. 11,1 2606. 25 May. 2020, doi:10.1038/s41467-020-16394-3

      (7) Gozalo, Alejandro et al. “Core Components of the Nuclear Pore Bind Distinct States of Chromatin and Contribute to Polycomb Repression.” Molecular cell vol. 77,1 (2020): 67-81.e7. doi:10.1016/j.molcel.2019.10.017

      (8) Shimell, MaryJane, and Michael B O'Connor. “Endoreplication in the Drosophila melanogaster prothoracic gland is dispensable for the critical weight checkpoint.” microPublication biology vol. 2023 10.17912/micropub.biology.000741. 21 Feb. 2023, doi:10.17912/micropub.biology.000741

    1. Author response:

      The following is the authors’ response to the original reviews.

      We have responded to these criticisms below and have revised the main text and figures. Here, we outline the major points of our responses:

      (1) The reviewers asked for more clarification regarding cell type annotation in the lung mesenchyme, as shown in Figure 3C. We have included a new supplementary figure (Supplementary Figure 2) that shows differentially expressed genes among these mesenchymal cell subsets using several visualizations, including a heatmap, UMAP plots, and the dot plot originally shown in Supplementary Figure 1D. The remaining supplementary figures have been renumbered.

      (2) We acknowledge the lack of consensus in the field regarding the nomenclature of fibroblast subsets in the developing mouse lung. We are not attempting to define new subsets; rather, we adopted annotations based on previously published work. Specifically, we used Seurat to define mesenchymal cell clusters and then compared the gene expression patterns of these clusters with published work by Hurskainen et al. (Bernard Thebaud’s group) and Narvaez Del Pilar et al. (Jichao Chen’s group). We acknowledge that these annotations might conflict with other published data, but any approach to choosing a cell label would be subject to scrutiny. For example, Col13a1 fibroblasts share markers with cells that others have defined as lipofibroblasts or alveolar fibroblasts. Similarly, Col14a1 fibroblasts appear to share markers with matrix fibroblasts. Further work is clearly needed to address these discrepancies, and we hope that making our data publicly available will help that effort.

      (3) The reviewers asked us to interrogate changes in canonical markers of fibroblast subsets (e.g., lipofibroblasts, matrix fibroblasts) to address whether the apparent loss of myofibroblasts could be explained by a change in myofibroblast specification/differentiation. We have included these data in the responses, but because we are unable to draw clear conclusions from these results, we do not feel they warrant inclusion in the manuscript/figures.

      (4) As highlighted in the eLife assessment, our study does not include tissue validation (i.e., immunohistochemistry) of myofibroblast markers to distinguish whether the loss of myofibroblasts is attributable to a lack of proliferation and/or changes in differentiation/specification. We spent considerable time over the past few months attempting to address these questions; however, we were unable to produce convincing PDGFRa staining on tissues collected during our original studies. Without PDGFRa staining, we regretfully could not co-stain for other useful markers to assess proliferation (EdU), apoptosis (TUNEL or caspase), or fibroblast function/specification (ACTA2, SM22a/TAGLN, ADRP, etc.). We suspect that these experiments would require optimization of tissue fixation/processing at the time of harvest or the inclusion of a Pdgfra lineage tool for better identification of these cells by immunohistochemistry. Because the majority of Pdgfra lineage tools rely on a knock-in/knock-out approach, data generated using these tools should be interpreted with caution, given that our results show that Pdgfra haploinsufficiency alone worsens disease outcomes after hyperoxia exposure.

      In summary, we have addressed several concerns raised by the reviewers and have attempted to perform some of the additional experiments suggested.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors used both the commonly used neonatal hyperoxia model and cell-type-specific genetic inactivation of Tgfbr2 to study the basis of BPD. The bulk of the analyses focus on the mesenchymal cells. Results indicate impaired myofibroblast proliferation, resulting in decreased cell number. Inactivation of Ect2 in Pdgfra-lineaged cells, preventing cytokinesis of myofibroblasts, led to alveolar simplification. Together, the findings demonstrate that disrupted myofibroblast proliferation is a key contributor to BPD pathogenesis.

      Strengths:

      Overall, this comprehensive study of BPD models advances our understanding of the disease. The data are of high quality.

      Weaknesses:

      The critiques are mostly minor and can be addressed without extensive experimentation.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors systematically explore the mechanism(s) of impaired postnatal lung development relevant to BPD (bronchopulmonary dysplasia) in two murine models of 'alveolar simplification', namely hyperoxia and epithelial loss of TGFb signaling. The work presented here is of great importance, given the limited treatment options for a poorly understood clinical entity that is frequently encountered in newborns and carries high morbidity and mortality, and given the unclear role of TGFb signaling, its signaling levels, and its cellular effects during secondary alveolar septum formation, a lung structure-generating event heavily impacted by BPD. The authors show that hyperoxia and epithelial TGFb signaling loss have similar detrimental effects on lung structure and mechanical properties (emphysema-like phenotype) and are associated with significantly decreased numbers of PDGFRa-expressing cells, the major cell pool responsible for generating postnatal myofibroblasts. They then use a single-cell transcriptomic approach combined with pathway enrichment analysis for both models to elucidate common factors that affect alveologenesis. Using cell communication analysis (NicheNet) between epithelial cells and myofibroblasts, they confirm increased projected TGFb-TGFbR interactions and decreased projected interactions for PDGFA-PDGFRA and other key pathways, such as SHH and WNT. Based on these results, they go on to show in a series of experiments that, surprisingly, increased TGFb appears reactive to postnatal lung injury and rather protective/homeostatic in nature, and they establish the requirement for alpha V integrins, but not the subtype alphaVbeta6, a known activator of TGFb signaling implicated in adult lung fibrosis.

      The authors then go beyond the TGFb axis evaluation to show that mere inhibition of proliferation by conditional KO of Ect2 in the Pdgfra lineage results in alveolar simplification, pointing out the pivotal role of PDGFRa-expressing myofibroblasts in normal postnatal lung development.

      Strengths:

      (1) The approach including both pharmacologic and mechanistically-relevant transgenic interventions both of which produced consistent results provides robustness of the results presented here.

      (2) Further adding to this robustness is the use of moderate levels of hyperoxia at 75% FiO2, which is less extreme than 100% FiO2 frequently used by others in the field, and therefore favors the null hypothesis.

      (3) The prudent use of advanced single-cell analysis tools, such as NicheNet to establish cell interactions through the pathways they tested and the validation of their scRNA-seq results by analysis of two external datasets. Delineation of the complexity of signals between different cell types during normal and perturbed lung development, such as attempted successfully in this study, will yield further insights into the underlying mechanism(s).

      (4) The combined readout of lung morphometric (MLI) and lung physiologic parameters generates a clinically meaningful readout of lung structure and function.

      (5) The systematic evaluation of TGFb signaling better determines the role in normal and postnatally-injured lungs.

      Weaknesses:

      (1) While the study convincingly establishes the effect of lung injury on the proliferation of PDGFRa-expressing cells, differentiation is equally important. Characterization of PDGFRa expressing cells and tracking the changes in the injury models in the scRNA analysis, a key feature of this study, would benefit from expansion in this regard. PDGFRa lineage gives rise to several key fibroblast populations, including myofibroblasts, lipofibroblasts, and matrix-type fibroblasts (Collagen13a1, Collagen14a1). Lipofibroblasts constitute a significant fraction of PDGFRa+ cells, and expand in response to hyperoxic injury, as shown by others. Collagen13a1-expressing fibroblasts expand significantly under both conditions (Figure 3), and appear to contain a significant number of PDGFRa-expressing cells (Suppl Fig.1). Effects of the applied injuries on known differentiation markers for these populations should be documented. Another important aspect would be to evaluate whether the protective/homeostatic effect of TGFb signaling is supporting the differentiation of myofibroblasts. Postnatal Gli1 lineage gains expression of PDGFRa and differentiation markers, such as Acta2 (SMA) and Eln (Tropoelastin). Loss of PDGFRa expression was shown to alter Elastin and TGFb pathway-related genes. TGFb signaling is tightly linked to the ECM via LTBPs, Fibrillins, and Fibulins. An additional analysis in the aforementioned regard has great potential to more specifically identify the cell type(s) affected by the loss of TGFb signaling and allow analysis of their specific transcriptomic changes in response and underlying mechanism(s) to postnatal injury.

      We attempted to conduct additional analyses of our sequencing data to evaluate the impact of lung injury on the differentiation of Pdgfra-expressing cells toward other fibroblast lineages. To specifically address the impact of hyperoxia on fibroblast differentiation, we subsetted wildtype cells collected at the P7 timepoint (while pups were still undergoing hyperoxia treatment) from the larger data set. Shown below are several violin plots comparing gene expression between RA and O2 conditions across the mesenchymal populations.

      Although there are some interesting observations in this analysis, we could not identify a consistent theme in these data that clearly answers the reviewers’ questions. We see a clear reduction of Pdgfra and Eln in both myofibroblast subsets with hyperoxia, which supports our finding that the myofibroblast subsets are reduced. Acta2 and Tagln appear slightly lower in alveolar myofibroblasts but higher in ductal myofibroblasts. Interestingly, both Acta2 and Tagln are also higher in Col14a1 fibroblasts with hyperoxia. The functional relevance of these data is unclear, because per-cell expression of Acta2 appears higher in ductal myofibroblasts even though the relative contribution of these cells is reduced (Figure 3D-E). Col14a1 fibroblasts show increased Acta2 and Tagln expression and are slightly increased in proportion at P7 with hyperoxia treatment (Figure 3D), albeit to a much lesser degree than Col13a1 fibroblasts.

      Author response image 1.

      Markers of ductal myofibroblasts, including Hhip, Cdh4, and Aspn, all appear lower with hyperoxia. Interestingly, Plin2 expression is only slightly increased in Col13a1 fibroblasts with hyperoxia treatment, and its expression is also increased in alveolar myofibroblasts. Tcf21, another marker commonly used to identify lipofibroblasts, is similarly increased in myofibroblasts during hyperoxia, although its expression is conversely lower in Col13a1 and Col14a1 fibroblasts in our data. Overall, these data appear consistent with recently published work by Ricetti et al., who observed an increase in lipofibroblast gene signatures and a reduction in myofibroblast gene signatures with hyperoxia treatment.

      Author response image 2.

      Author response image 3.

      The ability of our data to clearly identify changes in cell fate differentiation is limited by our use of Seurat to define cell clusters, because clustering is likely to mask subtle gene expression changes in a small number of cells nested within a parent cluster. In the Plin2 example above, the change in Plin2 expression within myofibroblasts is not large enough for Seurat to separate these cells from their parent clusters as a distinct lineage, nor are these cells, at this time point, similar enough to Col13a1 fibroblasts or lipofibroblasts to be classified as such. Increasing the number of dimensions used to define Seurat clusters might be sufficient to resolve this subset as a distinct cluster; however, this would come at the expense of creating many more cell subsets with increasingly small populations that would be difficult to analyze further.

      One alternative approach to these questions about differentiation would be pseudotime analysis of our sequencing data to predict cell lineage. Unfortunately, such analyses are beyond the scope of the current study, but we hope that investigators wishing to apply this approach will make use of our public data set. Another approach could use a pulse-chase lineage experiment in which Pdgfra-expressing cells are labeled at the onset of injury and the differentiation of these labeled cells is compared after injury. Li et al. conducted a similar experiment with hyperoxia, in which Pdgfra-expressing cells were labeled during embryonic development and then analyzed postnatally following hyperoxia exposure. The authors noted a decrease in both lineage-labeled myofibroblasts and lineage-labeled lipofibroblasts and concluded that Pdgfra-lineage cells were lost with hyperoxia treatment rather than undergoing aberrant differentiation. While such experiments have their own caveats related to the timing and efficiency of labeling, they offer a more conclusive way to address differences in cell specification than our sequencing- and flow cytometry-based approaches.

      Author response image 4.

      Author response image 5.

      (2) Of the three major lung abnormalities encountered in BPD, the authors focus on alveolarization impairment in great detail, to a very limited extent on inflammation, and not on vascularization impairment. However, this would be important not only to better capture the established pathohistologic abnormalities of BPD, but also it is needed since the authors alter TGFb signaling, and inflammatory and vascular phenotypes with developmental loss of TGFb signaling and its activators have been described. Since the authors make the point about the absence of inflammation in their BPD model, it will be important to show the evidence.

      We acknowledge that vascular changes contribute significantly to BPD pathogenesis; however, our study was not designed to adequately characterize changes in vascular/endothelial cells. We were motivated to focus on the lung mesenchyme after observing a dramatic loss of PDGFRa+ cells in our initial characterization of the hyperoxia injury model (Figure 2). At the onset of our study, existing publicly available data did not contain enough mesenchymal cells for in-depth analysis. To generate new observations and hypotheses within the lung mesenchyme, we enriched our single-cell preparations for mesenchymal cells during FACS sorting to ensure sufficient cell numbers for downstream analysis.

      (3) Conceptually it would be important that in the discussion the authors reconcile their findings in the experimental BPD models in light of human BPD and the potential implications it might have on new ways to target key pathways and cell types for treatment. This allows the scientific community to formulate the next set of questions in a disease-relevant manner.

      We have edited text in the discussion to address this point.

      Reviewer #3 (Public Review):

      Summary:

      This paper seeks to understand the role of alveolar myofibroblasts in abnormal lung development after saccular stage injury.

      Strengths:

      Multiple models of neonatal injury are used, including hyperoxia and transgenic models that target alveolar myofibroblasts.

      Weaknesses:

      There are several weaknesses that leave the conclusions significantly undersupported by the data as presented:

      (1) There is no validation of the decreased number of myofibroblasts suggested by flow cytometry/scRNAseq at the level of the tissue. Given that multiple groups have reported increased myofibroblasts (aSMA+ fibroblasts) in humans with BPD and in mouse models, demonstrating a departure from prior findings with tissue validation in the mouse models is essential. There are many reasons for decreased numbers of a subpopulation by flow cytometry, most notably that injured cells may be less likely to survive the cell sorting process.

      Unfortunately, we were unable to produce convincing PDGFRa staining on tissues collected during our original studies. Without PDGFRa staining, we regretfully could not co-stain for other useful markers to assess proliferation (EdU), apoptosis (TUNEL or caspase), or fibroblast function/specification (aSMA/ACTA2, SM22a/TAGLN, ADRP, etc.). We suspect that these experiments would require optimization of tissue fixation/processing at the time of harvest or the inclusion of a Pdgfra lineage tool for better identification of these cells by immunohistochemistry. Because the majority of Pdgfra lineage tools rely on a knock-in/knock-out approach, data generated using these tools should be interpreted with caution, given that our results show that Pdgfra haploinsufficiency alone worsens disease outcomes after hyperoxia exposure.

      Our single-cell data show increased expression of Acta2 and Tagln in the plots shown above, which might be consistent with the increased aSMA staining that others have observed in these settings. Interestingly, transcripts of both genes are reduced in alveolar myofibroblasts but increased in ductal myofibroblasts, Col13a1 fibroblasts, Col14a1 fibroblasts, and vascular smooth muscle. We did not include aSMA antibody staining in our flow cytometry experiments, but it would add value to future efforts to characterize the phenotypic changes occurring in these injury models.

      (2) The hallmark genes used to define the subpopulations are not given in the single-cell data. As the definition of fibroblast subtypes remains an area of unsettled discussion in the field, it is possible that the decreased number reflects classification rather than a true difference. Tissue validation and more transparency in the methods used for single-cell sequencing would be critical here.

      See the response above and the new Supplementary Figure 2.

      (3) There is an oversimplification of neonatal hyperoxia as a "BPD model" used here, without reference to detailed prior work demonstrating that the degree and duration of hyperoxia dramatically change the phenotype. For example, Morty et al. have shown that hyperoxia of 85% or more for 14 days is required to demonstrate the septal thickening observed in severe human BPD. Other than one metric of lung morphometry (MLI), which is missing units on the y-axis, and flexiVent data, the authors have not fully characterized this model. Prior work comparing 75% O2 exposure for 5, 8, or 14 days shows that in the 8-day exposed group (similar to the model used here), much of the injury was reversible. What evidence do the authors have that hyperoxia alone is an accurate model of the permanent structural injury seen in human BPD?

      At the onset of our studies, we noted that several groups were using widely variable protocols ranging from 60-100% O2 exposure. Morty et al. have indeed conducted thorough experiments to characterize various hyperoxia exposure protocols. In their 2017 study (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5312005/), they showed that 85% O2 from P1-P7 was sufficient to increase septal thickness compared with control mice, and that this change was comparable to that produced by P1-P14 exposure to 85% O2. Interestingly, they also noted that some therapeutic interventions could rescue disease caused by 60% O2 but not by 85% O2 exposure. Our criteria in choosing a treatment protocol were that (1) nursing dams and pups survived hyperoxia exposure, (2) injury was reproducible across cohorts, and (3) injury was not reversible simply by recovery in room air. Recent work indicated that 75% O2 exposure was sufficient to cause the alveolar simplification phenotype we sought to investigate. In our hands, we did not observe mortality of nursing dams or pups, except for litters lost to cannibalism or failure of cross-fostering.

      We are confident that the injury caused by our hyperoxia protocol is not reversible simply by recovering mice in room air. Several groups have phenotyped mice at P4, P10, or P14, immediately following the conclusion of hyperoxia treatment. To ensure that we were studying a lasting, irreversible phenotype, we conducted our endpoint studies (morphometry and lung physiology) at P40. Because mice continue to undergo alveolarization until ~P36-P39, we reasoned that this additional recovery time following cessation of hyperoxia would allow for spontaneous recovery if the injury were transient. Additionally, shown below are unpublished flexiVent data in which mice were treated for 10 days with 75% O2 and allowed to recover until analysis at 10 weeks of age. These results are entirely consistent with the flexiVent data included in the manuscript, and the persistence of lung physiologic changes in adult mice suggests the presence of permanent underlying structural changes. We did not conduct morphometry/MLI studies at later timepoints, but we have no reason to expect a different outcome given the clear results from lung physiology.

      Author response image 6.

      (4) Thibeault et al published a single-cell analysis of neonatal hyperoxia in 2021, with seemingly contrasting findings. How does this dataset compare in context?

      Our data are complementary to the single-cell analysis published by Thebaud et al. We included a re-analysis of their mesenchymal data in Supplementary Figure 2, which shows that they also observed a relative decrease in myofibroblast clusters at P7 and P14 following hyperoxia treatment. Figure 4 of their paper highlights the top differentially expressed genes between RA and O2 in Col13a1 FB and myofibroblasts, and we observe nearly identical findings within each of these clusters in our dataset. Below we have created dot plots of P7 wild-type samples for the same selected genes shown in Figure 4G of the Thebaud et al. paper. Note that their clustering pooled all myofibroblasts into one cluster, whereas our data divide them into alveolar myofibroblasts and ductal myofibroblasts. In addition, their dataset pools all timepoints (P3, P7, and P14) for display, while for simplicity the plot we selected here shows only P7 cells. These data show that the general trends are identical to those observed by Thebaud et al., and that differences in genes such as Acta2 can be explained by the opposing changes in the two myofibroblast clusters, consistent with the violin plots above: Acta2 is reduced in hyperoxia in alveolar myofibroblasts but increased in ductal myofibroblasts.

      Author response image 7.

      Alveolar myoFB

      Author response image 8.

      Ductal myoFB

      One difference between our two datasets is the relative contribution of myofibroblasts and Col13a1 fibroblasts to the entire mesenchymal population. Over 50% of all mesenchymal cells in our preparations are myofibroblasts, while most of their mesenchymal cells are Col13a1 fibroblasts. These differences are likely due to differences in tissue digestion and cell preparation protocols. Despite these differences, their data show the same trends of decreased myofibroblasts and a relative expansion of Col13a1 fibroblasts.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 1, for the hyperoxia model, it is informative to have the analysis done at P40, while most of the previous studies using this model focus on outcomes shortly after the end of the hyperoxia regimen. The authors state "we did not see evidence of fibrosis, scarring, or inflammation." It will be helpful to include data supporting this conclusion, especially ACTA2, CTHRC1, and CD45 staining.

      We did not conduct trichrome staining or hydroxyproline assays to quantify fibrosis because H&E staining showed no gross histologic changes consistent with scarring or fibrosis. We have amended the text to say “we did not see evidence of fibrosis or scarring,” since we did not present any data characterizing the immune cell compartment.

      (2) Figure 3, single cell analysis, naming of the clusters is confusing. Is "alveolar myofibroblasts" the same as "secondary crest myofibroblasts"? Is "Col13a1 FB" the same as "alveolar fibroblasts" and "Col14a1 FB" the same as "adventitial fibroblasts"? The loss of myofibroblasts is intriguing because, by staining, there is an increase of ACTA2+ cells. Are ACTA2+ cells not myofibroblasts in scRNAseq data?

      As mentioned in responses above, we used Jichou Chen’s nomenclature of “alveolar myofibroblasts” and “ductal myofibroblasts”, but we agree that the former cluster is most consistent with “secondary crest myofibroblasts”. To distinguish the two remaining clusters of fibroblasts, we used the same nomenclature as Thebaud et al.’s single-cell dataset: “Col13a1 FB” and “Col14a1 FB”. The Col13a1 FB cluster is most consistent with “alveolar fibroblasts” and shows high expression of several genes used to define “lipofibroblasts”, though it is unclear whether the latter represent a subcluster within the Col13a1 FB cluster.

      As shown above, Acta2 is expressed broadly within the lung mesenchyme with highest levels found in myofibroblasts and smooth muscle cells.

      (3) Phosphorylated SMAD2/3 staining (e.g. Cell Signaling antibody) in the two models will be informative to show where TGF signaling activity is altered.

      We have not been able to use pSMAD2/3 staining to infer changes in TGFb signaling at the resolution needed to address this question. Other groups have shown qPCR and western blot data for SMAD2/3 signaling from whole-lung extracts, but these approaches lack cell-type specificity and do not address spatial changes. We attempted to incorporate pSMAD2/3 staining into our flow cytometry experiments, but the staining protocol did not work in our hands.

      (4) Is cell death increased in the multiple models that showed simplification?

      While our EdU experiments address proliferation, we were unable to perform histologic co-staining of PDGFRa with TUNEL or caspase to assess apoptosis/cell death in our different models. Shown here are data from P7 wild-type mice in which Cdkn1a (which promotes cell-cycle arrest) and the pro-apoptotic genes Bax, Bak1, and Fas are all upregulated in hyperoxia in several mesenchymal cell populations, including myofibroblasts.

      Author response image 9.

      (5) Wording: "These data suggest that avb6 does not play a role in TGFb activation during normal development or neonatal hyperoxia, while av-integrins in the lung mesenchyme are required for normal development and play a protective role in response to hyperoxia." The first half of the sentence is missing a reference to the epithelium.

      Text now reads "These data suggest that epithelial avb6 does not play a role…”

      Reviewer #2 (Recommendations For The Authors):

      The reviewer greatly appreciates the work presented here, especially the hard task of addressing combined signaling pathway input into key mesenchymal cell types during an essential expansion of alveolar surface area in postnatal lung and its effect upon disturbance.

      The issues of concern are mentioned in the public review and are expanded upon below:

      (1) Expanded characterization of PDGFRa+ expressing cells in the scRNA dataset is needed (see public review). Also included should be some of the key myofibroblast genes (elastin, Acta2, etc.) and their changes in the relevant cell populations. It would be important to show (at least at the transcriptional level) that myofibroblast differentiation is impaired if the author claims that the alveolarization defect is due to functional myofibroblast impairment. Furthermore, Ect2 expression and changes with treatments should be shown for the different cell populations (relevant to Figure 9).

      See responses above

      (2) The authors stated that they did not find evidence of fibrosis, scarring, or inflammation, but did not provide data to support this statement. Given the importance of at least the inflammatory component in BPD, the absence of inflammation needs to be shown, especially in the TGFBR2-cKO model, where their own data show a trend toward increased CD45+ cell numbers (Figure 2) and upregulated inflammatory upstream regulators (IL10, IFNa, IKBKB, CEBPB) in the IPA (Figure 3). BAL and/or tissue analysis by flow cytometry or IHC has been used to assess different immune cell populations. For the evaluation of vascular impairment, the single-cell dataset contains endothelial cells, vascular smooth muscle, and pericytes, which allows interrogation following the two different types of injury (hyperoxia and TGFbR2 cKO) used for the scRNA-seq experiments.

      A full characterization of the immune cell or vascular/endothelial cell compartment within our models is beyond the scope of this current study as we were focusing on the shared changes observed within the lung mesenchyme. None of these compartments exist in isolation, so of course there are likely to be correlative and/or causative changes observed in each of the different models which we studied. We did consider further phenotypic analysis of the immune cells by flow cytometry within our different models, but deferred these experiments for future studies. As mentioned earlier we have omitted the reference to “no inflammation”.

      (3) The authors should report the number of litters per experiment and experimental group and the mortality in each group, and, if mortality is present, visualize it using e.g. Kaplan-Meier curves. Switching of the dams during treatment, the early postnatal injections and treatments, and variability in outcome measures between litters all have to be anticipated. Therefore at least 2, but preferably 3, litters per experiment should be examined to show reproducibility.

      All experiments were conducted with at least 2-3 contemporaneous litters in each treatment group as this was necessary to have enough animals per treatment condition/group to achieve statistical significance. This was essential as all experiments were conducted on the C57BL/6 background where litter sizes are typically 6-8 pups in our colony. We did not encounter any maternal mortality related to hyperoxia exposure while rotating between hyperoxia and normoxia every 48 hrs. Loss of pups in our experiments was mostly due to cannibalism either immediately after birth or from neglect due to failure of cross-fostering.

      (4) The reviewer is concerned about using PBS as a control for experiments involving antibody treatment, in this case, 1D 11. The use of an isotype IgG would be the most appropriate and convincing control. In this case, an isotype-matched murine IgG1 control (13C4) has already been generated and is commercially available. While the reviewer does not suggest repeating all experiments, at least one small experiment showing that control IgG does not alter the lung phenotype with hyperoxia when compared with 1D11 would be important.

      We appreciate the reviewer’s suggestion and will consider an isotype antibody comparison in future studies. While not directly comparing 1D11 to isotype, we can share data in which we compared PBS to a different antibody. In this experiment, we attempted to use antibody blockade during the first 10 days of life while mice were undergoing hyperoxia treatment to target a specific component of the TGFb pathway. We observed no difference in outcomes either in RA or O2 when comparing PBS to xxx antibody. We cannot share the antibody identity due to intellectual property reasons, however additional studies confirmed that this antibody likely had no impact due to poor in vivo blocking activity.

      Author response image 10.

      (5) While inhibited proliferation is one possible explanation for the decrease in PDGFRa expression in the injured mice, increased and/or premature apoptosis (before the physiologically observed wave at P14-P20) should be considered as another. Also, do the authors propose that only proliferation causes the alveolarization impairment, and that differentiation plays no significant role here? If so, that would mean that there are some fully differentiated myofibroblasts in the alveolar septa, but not enough to create the multitude of alveolar septal walls. Have the authors evaluated the decrease in secondary alveolar septa formed per alveolar airspace? This measure would give some sense of whether septum initiation was prevented or whether septa were formed but are structurally abnormal, e.g. due to altered ECM (a suspected decrease in elastin and SMA expression if myofibroblast differentiation was impaired) or altered cell content (a suspected decrease in myofibroblasts and an increase in other cell types, such as lipofibroblasts).

      Apoptosis/cell death is likely to play a role in addition to inhibited proliferation. See the violin plots shown above, in which cell-cycle arrest and pro-apoptotic genes are upregulated within the mesenchyme. Because we were unable to optimize tissue sectioning and staining for the samples collected at the early time points of our experiments (i.e., P4, P7, P10, P14), we cannot co-stain for markers of apoptosis to answer this question directly. Future experiments will focus on further characterization of these early changes, with particular attention to altered fibroblast phenotypes within the alveolar septa.

      (6) An illustration depicting key cells and the pathways involved in cartoon format would be a useful addition and visualize the important conclusions of this paper for the reader.

      We appreciate this suggestion but think the results are sufficiently straightforward that a summary cartoon would not add much.

      Figure 4A: the legend appears to be switched. The gray square seems to align with the epithelial ligands, while the blue square aligns with receptors.

      Thank you for identifying this mistake – fixed.

      Names of transgenic lines used through manuscript:

      Please use the correct name, as per JAX would be either Gli1tm3(cre/ERT2)Alj/J or Gli1-CreERT2.

      Please use the correct name, as per JAX would be either Pdgfratm1.1(cre/ERT2)Blh/J or Pdgfrα-CreERT2.

      PDGFRa-CRE would be JAX# 013148.

      The transgenic lines have been noted in the methods, and we have edited the text of the manuscript to reflect the correct names of these lines. For Supplementary Figure 4, which compares Gli1-CreERT2 to Pdgfrα-CreERT2, we kept our prior nomenclature because it better reflects that each of these lines is haploinsufficient at its targeted locus and that the controls are cre-negative littermates.

      We did not use the PDGFRa-CRE line (JAX# 013148).

      Reviewer #3 (Recommendations For The Authors):

      - More transparency about the single-cell analysis is required: 1) how are cell types and clusters defined? 2) what strategy was used for ambient RNA? 3) how do the controls compare with recently published mouse developmental datasets? 4) how does this model compare with the single-cell dataset published by Thibeault et al in 2021 (neonatal hyperoxia x 14 days with multiple time points used)?

      See responses above.

      - Tissue level validation of these findings is essential by RNA ISH or IF. While validation that the same process is at play in human tissue would be ideal, if this is not available, the conclusions must be tempered in the discussion.

      See responses above.

      - Is this more mild neonatal injury reversible in mice? As noted above, more characterization of this model (and placing it in the context of other more widely published models would be helpful).

      See responses above.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weaknesses:

      (1) Important details about the nature of DEG comparisons between the wild type and the Lrrk2 G2019S model are missing.

      Please see the recommendations section below for specific responses to individual comments from Reviewer #1.

      (2) Some aspects of the integration between snRNA-seq and MERFISH data are not clear, and many MERFISH-identified cells do not appear to have a high-confidence cluster transfer into the snRNA-seq data space. Imputation is used to overcome some issues with the MERFISH dataset, but it is not clear that this is appropriate.

      Please see the recommendations section below for specific responses to individual comments from Reviewer #1.

      Reviewer #2 (Public review):

      (1) In the GO pathway analyses (both GSEA and DEG GO), I did not see a correction applied to the gene background considered. The study focusses on dopaminergic neurons and thus the gene background should be restricted to genes expressed in dopaminergic neurons, rather than all genes in the mouse genome. The problem arises that if we randomly sample genes from dopaminergic neurons instead of the whole genome, we are predisposed to sampling genes enriched in relevant cell-type-specific roles (and their relevant GO terms) and correspondingly depleted in genes enriched in functions not associated with this cell type. Thus, I am unsure whether the results presented in Figures 8 and 9 may be more likely to be obtained just by randomly sampling genes from a dopaminergic neuron. The background should be limited and these functional analyses rerun.

      Thank you for pointing out this important concern. We agree that overrepresentation analyses (ORAs) are vulnerable to selecting cell-type-specific markers as significantly differentially expressed, thus inflating detection of cell-type-associated gene sets rather than those truly altered by experimental condition. We have therefore re-run the GO analyses in our study with the background gene set adjusted for each individual comparison. For dataset-level GO in Fig 8, the background was defined as genes detected in at least 5% of all cells (to approximate the inclusion of cluster-specific genes). For comparisons of subsets within the dataset (i.e., a family or cluster) across conditions, a minimum detection level of 10% of cells was used to define the background. The same thresholds were applied to filter the DEG lists used as input for GO. Interestingly, this correction appears to have filtered out, or lowered the significance of, some of the more generic brain-associated pathways that we initially presented, such as axonogenesis or learning and memory, and we feel even more confident in our original interpretation.
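      To illustrate the background correction, the following is a minimal Python sketch, not the pipeline used in the study, of how a detection-based background (genes detected in at least 5% or 10% of cells) can be built and then used as the universe for a one-sided hypergeometric overrepresentation test. The function names and the cells-by-genes list-of-lists input are hypothetical choices for illustration.

```python
from math import comb


def background_genes(counts, genes, min_frac):
    """Genes detected (count > 0) in at least `min_frac` of cells.

    counts: hypothetical cells x genes matrix as a list of per-cell lists.
    """
    n_cells = len(counts)
    keep = set()
    for j, gene in enumerate(genes):
        detected = sum(1 for cell in counts if cell[j] > 0)
        if detected / n_cells >= min_frac:
            keep.add(gene)
    return keep


def ora_pvalue(degs, term_genes, background):
    """One-sided hypergeometric P(X >= k) for DEG overlap with a GO term.

    Both the DEG list and the term are first restricted to the background,
    so genes never detected in these cells cannot drive enrichment.
    """
    degs = set(degs) & background
    term = set(term_genes) & background
    N, K, n = len(background), len(term), len(degs)
    k = len(degs & term)
    # math.comb returns 0 when the lower argument exceeds the upper one
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)
```

      Restricting both the DEG list and the term to the same universe is the key design choice; passing the whole genome as `background` reproduces the uncorrected analysis.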

      Functional class scoring methods like GSEA, however, differ from ORAs in that they do not use a hypergeometric test to calculate overrepresentation, since no distinction is made between significant and non-significant differential expression (nor is a background gene set provided as input to this tool). GSEA takes the full DE results as input, ranking genes according to their association with either group. Thus, genes that are simply enriched in DA neurons should appear toward both extremes of the ranked list rather than being skewed uniformly toward one extreme. Per the GSEA authors’ user manual and original source paper, the complete DE testing results should be provided as input for GSEA (excluding genes with detection levels so low that their differential expression and/or ranking is likely to be artifactual):
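      For readers less familiar with this distinction, the sketch below computes a weighted running-sum enrichment score of the kind described in the original GSEA paper (weight exponent p = 1). It is a simplified illustration, not the Broad implementation: it omits normalization and permutation testing, and the input format (gene, statistic) pairs sorted in descending order is an assumption. It shows that a gene set concentrated at either end of the ranked list produces a large positive or negative score, with no significance cutoff or background involved.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Weighted Kolmogorov-Smirnov-like running sum over a ranked list.

    ranked: list of (gene, statistic) sorted by statistic, descending.
    Returns the running-sum value with the largest absolute deviation.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n, n_hit = len(ranked), sum(hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must overlap the list but not cover it")
    # normalizer: total weight of the genes in the set
    nr = sum(abs(s) ** p for (_, s), hit in zip(ranked, hits) if hit)
    miss_step = 1.0 / (n - n_hit)
    running, best = 0.0, 0.0
    for (_, s), hit in zip(ranked, hits):
        running += (abs(s) ** p / nr) if hit else -miss_step
        if abs(running) > abs(best):
            best = running
    return best
```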

      “The GSEA algorithm does not filter the expression dataset and generally does not benefit from your filtering of the expression dataset. During the analysis, genes that are poorly expressed or that have low variance across the dataset populate the middle of the ranked gene list and the use of a weighted statistic ensures that they do not contribute to a positive enrichment score. By removing such genes from your dataset, you may actually reduce the power of the statistic and processing time is rarely a factor as GSEA can easily analyze 22,000 genes with even modest processing power. However, an exception exists for RNA-seq datasets where GSEA may benefit from the removal of extremely low count genes (i.e., genes with artifactual levels of expression such that they are likely not actually expressed in any of the samples in the dataset).” [https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideFrame.html]

      In our study, this filtering of very low expression genes (to account for artifactually inflated fold changes or a large number of ties in the rank list that are subsequently ordered at random) occurred at the level of DE testing using the Seurat FindMarkers command, in which differential expression calculations were only performed for genes that were detected in a minimum of 10% of cells in the dataset.

      (2) In the scDRS results, I am unsure what is significant and what is not. The authors refer to relative measures in the text ("highest"), but I do not know whether these differences are significant or whether any associations are significantly unexpected. Can the x-axis of the scDRS results presented in Figure 9H and I be replaced with a corrected p-value instead of the scDRS score?

      An important distinction should be made here between scDRS and approaches that use overrepresentation analyses to assess associations of DEGs with putative risk genes, such as the GO analyses performed in our paper. The scDRS score represents the relative association of each individual cell’s expression profile (among all other cells in the dataset) with PD risk loci, using the underlying SNPs and associations described in GWAS summary statistics (see Methods or Zhang et al., Nat Genetics 2022 for more details). While scDRS can generate a p value for each individual cell in the dataset, it has no native method for defining group-level p values, and we have not attempted to calculate group-level p values here. To compare cluster-level mean scDRS scores and determine their significance, we created bootstrapped 95% confidence intervals for the mean scDRS score of each cluster or family (shown by the error bars in forest plots 9G and 9H). A score of 0 represents the null hypothesis of no association between gene expression and PD risk loci; thus, if the 95% confidence interval does not overlap 0, the mean scDRS score for a given group can be regarded as significantly different from zero at the 5% level. Similarly, groups can be compared with each other in the same way to determine whether the group-level mean scDRS score differs significantly across a given pair. However, such comparisons of confidence-interval overlap should be interpreted cautiously, as the large number of possible comparisons creates the potential for Type I error. We have added language to clarify what the scDRS score represents and to ensure it is not conflated with approaches such as GO or GSEA.
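      As a rough illustration of the group-level procedure described above, the following Python sketch computes a percentile bootstrap 95% confidence interval for the mean of per-cell scores in one cluster and checks whether it excludes 0. The resampling count, seed, and function names are assumptions for illustration, not the settings used in the study.

```python
import random
from statistics import mean


def bootstrap_mean_ci(scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-cell scores in one group.

    Returns (observed_mean, lower, upper). n_boot and seed are
    illustrative defaults, not values taken from the study.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    rng = random.Random(seed)
    n = len(scores)
    # resample cells with replacement and record each resample's mean
    boot_means = sorted(mean(rng.choices(scores, k=n)) for _ in range(n_boot))
    lower = boot_means[int(round(alpha / 2 * n_boot))]
    upper = boot_means[int(round((1 - alpha / 2) * n_boot)) - 1]
    return mean(scores), lower, upper


def excludes_zero(ci):
    """True when the interval lies entirely above or below the null value 0."""
    _, lower, upper = ci
    return lower > 0 or upper < 0
```

      Applying `bootstrap_mean_ci` to each cluster's per-cell scores yields the points and error bars of a forest plot. As noted above, comparing many intervals pairwise is informal and does not control the family-wise error rate.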

      (3) The results discussed at the bottom of page 13 [page 14 of new version] state that 48.82% of the proteins encoded by the Calb1 DEGs have pre-synaptic localisations, as opposed to 45.83% of the SOX6 DEGs, which does not support the statement that "greater proportions of DEGs are associated with presynaptic locations in cells from vulnerable DA neurons (Sox6 family, [and in particular, Sox6^tafa1]), compared to less vulnerable ones (Calb1 family)".

      Thank you for pointing this out; the error here lies in the wording of the results. The percentages mentioned above describe the percentages within the synaptic localized genes rather than the total DEG lists. We have rephrased this section for clarity to include both the percentages within this category as well as the total (the results of which are in line with our original statement).

      (4) While an interest in the Sox6^tafa1 subtype is explained through their expression of Anxa1 denoting a previously identified subtype associated with locomotory behaviours, it was unclear to me how to interpret the functional associations made to DEGs in this subtype taken out of context of other subtypes. Given all the other subtypes, it is not possible to ascertain how specific and thus how interesting these results are unless other subtypes are analysed in the same way and this Sox6^tafa1 subtype is demonstrated as unusual given results from other subtypes.

      In our study, we chose to focus specifically on this population given its unique acceleration-locked functional activity pattern observed in Azcorra & Gaertner et al, Nat Neuro 2023, and because technical limitations warrant cautious application of the above approach. We agree that, given the data presented, the associations of this population with the described DEGs cannot be interpreted as unique to this population, and we have added language to this effect in the text. There are two major challenges to analyzing all other subtypes to provide a comparison. First, given the number of subtypes involved and the number of downstream analyses, this analysis is computationally intensive. More importantly, the results cannot be easily compared across populations because cluster size and internal heterogeneity vary between clusters, so the statistical power to detect DEGs will inherently differ across these populations (i.e., smaller or more heterogeneous clusters would be expected to yield fewer DEGs reaching significance). While pseudobulk testing is effective for mitigating these factors, our limited sample number (n=2 independently generated datasets per group) dramatically underpowers pseudobulk differential expression testing. One solution is to uniformly limit each cluster to the smallest observed cluster size through random down-sampling. While this makes the ‘n’ in DE calculations uniform, it potentially worsens the problem of internal heterogeneity: heterogeneity would remain roughly constant while ‘n’ decreases, increasing the variability of results for larger clusters.
      To provide a comparator for the population of interest, we used this down-sampling approach to compare Sox6^Tafa1 with another cluster within the VTA, Calb1^Stac, which also expresses high levels of Anxa1 and Aldh1a1, given the broad interest in these markers as proxies for vulnerability. The results of this comparison are now shown in Figure S10.
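      The down-sampling step can be sketched as follows. This is a minimal Python illustration, assuming cells are identified by IDs and grouped by an arbitrary label (for example a cluster name, or a hypothetical (cluster, genotype) tuple); it is not the code used for Figure S10. Every group is randomly reduced to the size of the smallest group, using a fixed seed for reproducibility.

```python
import random
from collections import defaultdict


def downsample_clusters(cell_ids, labels, target=None, seed=0):
    """Randomly subsample every group to the same number of cells.

    labels may be any hashable group key, e.g. a cluster name or a
    hypothetical (cluster, genotype) tuple. If target is None, the size
    of the smallest group is used. random.sample raises ValueError if
    target exceeds a group's size.
    """
    by_group = defaultdict(list)
    for cell_id, label in zip(cell_ids, labels):
        by_group[label].append(cell_id)
    if target is None:
        target = min(len(cells) for cells in by_group.values())
    rng = random.Random(seed)
    return {label: sorted(rng.sample(cells, target))
            for label, cells in by_group.items()}
```

      Equalizing group sizes makes the number of cells entering each DE test uniform, but, as discussed above, it does nothing to equalize within-cluster heterogeneity.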

      (5) On p12, the authors highlight Mir124a-1hg, which encodes miR-124. This gene is upregulated in Figure 8D, but the authors note that it has been shown to be downregulated in PD patients and some PD mouse models. Can the authors comment on the directional difference?

      We have adjusted the text to reflect this discrepancy and speculate on why this may be observed. In short, one hypothesis is that miR-124, given its proposed neuroprotective effects, is increased in DA neurons facing toxic metabolic insults as a compensatory response. In our prodromal model without observable degeneration, this could represent an early sign of cell stress. While speculative, in PD patients or overtly degenerative models, lack of compensatory miR-124 or fulminant cell death among vulnerable cells could result in an observed decrease in miR-124 expression.

      (6) Lastly, can the authors comment on the selection of a LogFC cut-off of 0.15 for their DEG selection? I couldn't see this explained (apologies if I missed it).

      The 0.15 cutoff was selected arbitrarily based on the observed range of fold changes seen among our differentially expressed genes. However, importantly, this cutoff was not used for defining DEGs for downstream analyses such as GSEA or GO, nor for defining significance of differential expression, which was done purely based on FDR-adjusted p values <0.05. The selection of 0.15 affects only the coloring seen in the volcano plot, which we have decided to move to supplemental figures given the uniformly small effect size seen in individual genes and a separate reviewer comment regarding concern in the field over differential expression testing methods in single-cell datasets. Instead, this figure now focuses on highlighting pathway- and gene-set level comparisons that can provide easier interpretation of small, but concordant changes across swaths of genes.
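      To make the separation between significance calling and plot coloring explicit, here is a minimal Python sketch, assuming Benjamini-Hochberg as the FDR procedure (the response specifies only "FDR-adjusted"). Significance depends only on the adjusted p value, and the 0.15 log fold-change cutoff only controls highlighting. The function names are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify(genes, log_fc, pvals, fdr=0.05, fc_cut=0.15):
    """Map gene -> (significant, highlighted_in_volcano_plot).

    Significance uses only the adjusted p value; fc_cut affects only
    the highlighting, mirroring the role of the 0.15 cutoff.
    """
    padj = bh_adjust(pvals)
    result = {}
    for gene, fc, q in zip(genes, log_fc, padj):
        significant = q < fdr
        result[gene] = (significant, significant and abs(fc) > fc_cut)
    return result
```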

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the MERFISH dataset, only around half of the DAergic cells (2,297 of 4,532) were successfully projected into the snRNA-seq UMAP space, based on a similarity score > 0.5. Additionally, key transcripts that were used to define the snRNA-seq clusters (such as Sox6) were not identified at all in the MERFISH dataset. This raises some questions about the ability to integrate and compare these datasets directly, which are not fully considered in the manuscript. These discrepancies are smoothed over using imputation, which allows specific class-defining genes such as Sox6 to be plotted on spatial coordinates in Figure 4D. However, imputation is not without caveats, and the appropriateness of the imputation is not well considered in the text.

      We fully agree with the reviewer that the use of an imputation approach needs to be clarified and justified thoroughly. We added a sentence to better clarify the process of imputation on Page 9 “The imputed gene expression is extrapolated from anchors established from pairwise correspondences of cell expression levels between MERFISH and snRNA-Seq datasets.” This pair-wise cell correspondence as defined by anchors can be assessed using Seurat confidence score. We acknowledge the fact that only about 50% of cells could confidently be transferred onto the snRNA-Seq data. This is the result of using a stringent confidence level of 0.5 (similar to previous publications, PMID: 38092916 & 38092912). We preferred mapping fewer high-confidence cells than potentially misrepresenting the spatial location of some of these clusters.

      It is also important to demonstrate the reliability of gene imputation. Indeed, as pointed out by the reviewer, some probes such as Sox6 were not detected in the MERFISH dataset. To strengthen our data integration, and as already mentioned in the manuscript, we excluded 219 genes based on the deviation of average counts per cell between the datasets. The fact that the imputed expression of Sox6 closely reflects its well-characterized distribution (PMIDs: 25127144, 30104732, 25437550, 34758317) strengthened our confidence in our imputation pipeline. We also examined the correlation of imputed gene expression with the transcripts detected in our MERFISH experiments. We added a new supplemental figure (S7) showing the correlations between MERFISH-detected and imputed expression for 8 genes (4 each from the Sox6 and Calb1 families). Together, Figs S6 and S7 show the range of correlations between imputed and actual MERFISH transcripts. Altogether, we observe relatively high correlation between the number of detected transcripts per gene in the snRNA-Seq and MERFISH datasets.
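      The per-gene comparison between imputed and measured expression amounts to a correlation across matched cells. The Python sketch below assumes two hypothetical dictionaries mapping each gene to per-cell values in the same cell order, and uses Pearson correlation. The response does not state which correlation measure was used, so this choice is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if constant)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation undefined for a constant vector
    return sxy / math.sqrt(sxx * syy)


def per_gene_correlation(imputed, measured):
    """imputed, measured: dict gene -> per-cell values in the same cell order."""
    return {g: pearson(imputed[g], measured[g]) for g in imputed if g in measured}
```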

      In addition, we added a paragraph discussing limitations of gene expression imputation on page 17: “A strength of our study is that it utilizes advantages of each transcriptomic approach, the deep molecular profiling of individual cells using snRNA-Seq and the spatial resolution of MERFISH. For instance, we relied on gene expression imputation to ascribe expression level to genes not covered/detected in our MERFISH probe panel. Gene imputation as described by Stuart et al.(92) has been used in several recent studies integrating spatial and transcriptomic data(46, 47). It relies on identifying anchors that enable projection of MERFISH data onto the UMAP space of a snRNA-Seq dataset and then uses neighboring cells to extrapolate the expression of genes not included in our probe panel. This approach was used to impute Sox6 expression, which accurately reflects what has been reported in prior immunofluorescence and in situ hybridization studies(11, 27, 38, 43, 55). Moreover, imputed gene expression levels correlated strongly with MERFISH detected transcript for most genes further supporting our approach (Fig S6 and S7). Nevertheless, dataset integration has limitations that should be considered. First, imputed gene expression relies on the ability to identify reliable anchors linking the snRNA-Seq and MERFISH datasets. These anchors are determined in part by the choice of genes included on probe panels and thus could indirectly influence the reliability of imputed gene expression. Secondly, gene counts per cell in MERFISH are determined via segmentation of images, which is susceptible to artifacts and bias from centrally versus peripherally localized gene transcripts. In summary, although limitations are present in multi-modal transcriptomic analyses, merging these two approaches provided a molecular and spatial map of the DA system that could not have been resolved by either method alone.”
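      The idea of extrapolating a gene's expression from neighboring cells in a shared space can be illustrated with a simplified nearest-neighbor analogue. The following Python sketch is explicitly not Seurat's anchor-based transfer (which weights anchors rather than raw neighbors); it assumes each MERFISH (query) cell and each snRNA-Seq (reference) cell already has coordinates in a common embedding, and it imputes expression as an inverse-distance-weighted mean over the k nearest reference cells. All names and parameters are hypothetical.

```python
import math


def knn_impute(query, reference, ref_expr, k=5, eps=1e-9):
    """Impute genes for query cells from their k nearest reference cells.

    query, reference: lists of coordinate tuples in a shared embedding.
    ref_expr: per-reference-cell lists of gene values (same gene order).
    Simplified inverse-distance kNN analogue, not Seurat's anchor weighting.
    """
    if not ref_expr:
        raise ValueError("reference expression must be non-empty")
    n_genes = len(ref_expr[0])
    imputed = []
    for q in query:
        # k nearest reference cells to this query cell
        nearest = sorted((math.dist(q, r), i) for i, r in enumerate(reference))[:k]
        weights = [1.0 / (d + eps) for d, _ in nearest]
        total = sum(weights)
        imputed.append([
            sum(w * ref_expr[i][g] for w, (_, i) in zip(weights, nearest)) / total
            for g in range(n_genes)
        ])
    return imputed
```

      As the quoted paragraph notes, the quality of such an estimate depends entirely on how well the shared space aligns the two modalities, which is why anchor confidence filtering matters.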

      (2) In the discussion, the authors argue that the cellular classifications identified here for DA neurons are more likely to reflect discrete cell types than cell states. The rationale for this conclusion is largely based on the absence of subtype differences between wild-type and LRRK2 G2019S transgenic mice. I do not find this argument to be convincing, because it is still possible that certain subdivisions simply reflect dynamic cell states that are also not grossly altered in the mutant mouse. A stronger argument for this claim would be to include trajectory-based analyses that do not show predicted transition points between nearby or related clusters.

      We thank the reviewer for pointing out this particular limitation, as differentiating “cell types” from “cell states” has been debated in the field for years with no consensus emerging on how to address the issue. As suggested, we performed a trajectory analysis using Monocle3 on both control and Lrrk2 samples. We built the trajectory map taking cluster 20 as the starting node. To avoid potentially biased trajectories induced by different cell coverage, we downsampled the Lrrk2 condition to match the number of cells in the wildtype condition. As expected, since most of the DA clusters are not segregated in the UMAP space, the trajectory analysis showed predicted transitions between clusters (see Author response image 1A and 1B). Even though some clusters’ pseudotime scores were statistically different between the wildtype and Lrrk2 samples, they remained similar overall (Author response image 1C). This analysis suggests that the LRRK2<sup>G2019S</sup> mutation induces a mild transcriptional perturbation but does not result in a major cell state drift. Indeed, we believe changes in the observed trajectory path would disappear as the number of cells analyzed increases. Because of this bias introduced by cell coverage, we prefer not to include this trajectory analysis in the manuscript to avoid misleading readers. Thus, as suggested by the reviewer, we softened our claim to: “This suggests that our taxonomic scheme is agnostic to a mild perturbation such as LRRK2<sup>G2019S</sup>, indicating that our clusters are reflective of cell types rather than cell states. It is possible that with more severe perturbations, such as a toxin lesion, more substantial alterations of taxonomic schemes are observed(86, 93). However, we expect that for mild insults, day-to-day behavioral changes, or pharmacological paradigms, our clusters will be resistant to changes, although individual gene levels may vary. Nonetheless, we cannot definitively confirm that a given DA neuron cannot convert from one subtype to another. Ultimately, alternative approaches such as detailed fate mapping of clusters or RNA-seq-based trajectory analyses with greater numbers of sampled cells could be used to resolve this question.”

      Author response image 1.

      (A) Trajectory analysis of wildtype and (B) LRRK2<sup>G2019S</sup> samples. (C) Pseudotime scores for each cluster across wildtype and Lrrk2 conditions. Error bars represent confidence intervals at a 5% false discovery rate.

      (3) The relationship between individual samples, GEMwell, and sequenced library should be clarified. If independent samples were combined into one GEMwell, this should be explicitly stated for clarity.

      We have revised the text to better clarify the methodology. In brief, each of our 4 independent samples (2 control, 2 mutants; equal sexes per sample) were isolated from n=2 pooled mice (for a total n=8 mice across the 4 samples). Each sample was processed in its own GEM well to produce 4 distinct libraries that were subsequently sequenced and analyzed as described.

      (4) Please include more details on DEG testing in the manuscript, this is key for interpreting the robustness of certain findings. Ideally, pseudobulked comparisons would be used here (given concerns in the field that DEG testing where N = number of cells artificially inflates the statistical power, violates assumptions of independence, and results in false positive DEGs).

      While we agree that pseudobulk analysis would be ideal for reducing false positives, our study, although exceptionally large in the total number of DA cells profiled, was generated from 4 total 10X libraries as described above, without any mechanism to definitively demultiplex to the original n=8 source mice. Thus, pseudobulk comparisons would be performed using only n=2 per group, which is below the recommended sample size for these methods. Given this concern, we have moved the volcano plot from Figure 8D to the supplementary figures and added language to the methods and relevant figure legend acknowledging the limitation in Seurat’s default differential expression analysis methodology.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The innate immune system serves as the first line of defense against invading pathogens. Four major immune-specific modules (the Toll pathway, the Imd pathway, melanization, and phagocytosis) play critical roles in orchestrating the immune response. Traditionally, most studies have focused on the function of individual modules in isolation. However, in recent years, it has become increasingly evident that effective immune defense requires intricate interactions among these pathways. 

      Despite this growing recognition, the precise roles, timing, and interconnections of these immune modules remain poorly understood. Moreover, addressing these questions represents a major scientific undertaking. 

      Strengths: 

      In this manuscript, Ryckebusch et al. systematically evaluate both the individual and combined contributions of these four immune modules to host defense against a range of pathogens. Their findings significantly enhance our understanding of the layered architecture of innate immunity. 

      We thank the reviewer for their kind assessment.

      Weaknesses: 

      While I have no critical concerns regarding the study, I do have several suggestions to offer that may help further strengthen the manuscript. These include: 

      (1) Have the authors validated the efficiency of the mutants used in this study? It would be helpful to include supporting data or references confirming that the mutations effectively disrupted the intended immune pathways. 

      We have done so in Figure 1.

      (2) Given the extensive use of double, triple, and quadruple mutants, a more detailed description of the mutant construction process is warranted. 

      We now provide a supplement (File S1) that details the successive genetic crosses and recombinations that were required to generate these compound fly stocks carrying multiple mutations. We also provide some information regarding rapid screening of stocks for phenotypes. Of note, some of these fly stocks have been deposited at VDRC, as they will be useful to the fly community for assessing immune modules in a controlled background, and complete stock information will be tied to these stocks there.

      Reviewer #2 (Public review): 

      Summary: 

      In this work, the authors take a holistic view of Drosophila immunity by selecting four major components of fly immunity often studied separately (Toll signaling, Imd signaling, phagocytosis, and melanization), and studying their combinatory effects on the efficiency of the immune response. They achieve this by using fly lines mutant for one of these components, or modules, as well as for a combination of them, and testing the survival of these flies upon infection with a plethora of pathogens (bacterial, viral, and fungal). 

      Strengths: 

      It is clear that this manuscript has required a large amount of hands-on work, considering the number of pathogens, mutations, and timepoints tested. In my opinion, this work is a very welcome addition to the literature on fly immune responses, which obviously do not occur in one type of response at a time, but in parallel, subsequently, and/or are interconnected. I find that the major strength of this work is the overall concept, which is made possible by the mutations designed to target the specific immune function of each module (at least seemingly) without major effects on other functions. I believe that the combinatory mutants will be of use for the fly community and enable further studies of the interplay of these components of immune response in various settings. 

      To control for the effects arising from the genetic variation other than the intended mutations, the mutants have been backcrossed into a widely used, isogenized Drosophila strain called w1118. Therefore, the differences accounted for by the genotype are controlled. 

      I also appreciate that the authors have investigated the two possible ways of dealing with an infection: tolerance and resistance, and how the modules play into those. 

      We thank the reviewer for their kind assessment. 

      Weaknesses: 

      While controlling for the background effects is vital, the w1118 background is problematic (an issue not limited to this manuscript) because of the wide effects of the white mutation on several phenotypes (also other than eye color/eyesight). It is a possibility that the mutation influences the functionality of the immune response components, for example, via effects of the faulty tryptophan handling on the metabolism of the animal. 

      I acknowledge that it is not reasonable to ask for data in different backgrounds better representing a "wild type" fly (however that is defined is another question), but I think this matter should be brought up and discussed. 

      We agree with the reviewer and have included caveats on the different genetic effects brought about by the combinatory mutant approach, including differences in white gene status, insertion of GFP or DsRed markers, and the nature of genetic mutations (Line 142 onward).

      “Of note, the strains used in this study differ in the presence/absence of the white<sup>+</sup> gene, present in the PPO1<sup>∆</sup>, NimC1<sup>1</sup> and eater<sup>1</sup> mutations. In addition to its well-established function in eye pigmentation, the white gene can also impact host neurology and intestinal stem cell proliferation (Ferreiro et al., 2017; Sasaki et al., 2021). We did not observe any obvious correlations between white<sup>+</sup> gene status and susceptibilities in this study. Moreover, in a previous study looking at the cumulative effects of AMP mutations on lifespan, white gene status and fluorescent markers did not readily explain differences in longevity (Hanson and Lemaitre, 2023). We therefore believe that the extreme immune susceptibility we have created through deficiencies for pathways regulating hundreds of genes, or major immune modules, overwhelms the potential effects of white<sup>+</sup> and other transgenic markers. For additional information on which stocks bear which markers, see discussion in Supplementary file 1.”

      Of interest, we were highly conscious of this concern in working with combinatory AMP mutants which differed in white, GFP, and DsRed copies. However, even over the many weeks of snowballing effects on microbiota community composition and structure, we found no trends tied strictly to white+ or to other genetic insertions on lifespan (Hanson and Lemaitre, 2023; DMM).

      The whole study has been conducted on male flies. Immune responses show quite extensive sex-specific variation across a variety of species studied, also in the fly. But the reasons for this variation are not fully understood. Therefore, I suggest that the authors conduct a subset of experiments on female flies to see if the findings apply to both sexes, especially the infection-specificity of the module combinations.  

      We thank the reviewer for this suggestion. We have performed the requested experiments, and include female survival trends in Figure 4supp1. We have added the following text to the main manuscript (Line 554):

      “All survival experiments to this point were done with males. We therefore assessed key survival trends for these infections in females to learn whether the dynamics we observed were consistent across sexes (Figure 4supp1). For all three pathogens (Pr. rettgeri, Sa. aureus, C. albicans) the rank order of susceptibility was broadly similar between males and females, with higher rates of mortality in females overall. Thus, we found no marked sex-by-genotype interaction. Interestingly, the greater susceptibility of females in our hands is true even for ∆ITPM flies, although there are only a few surviving flies on which we can base these conclusions. However, these data may suggest that the sexual dimorphism in defense against infection that we see against these pathogens is due to factors independent of the immune modules we disrupted.”

      It is worth noting that male-female sex dichotomies in infection are inconsistent across the literature, with strong lab-specific effects (Belmonte et al., 2020 and personal observation). In our lab setting, we consistently see higher female mortality than male mortality when compared, independent of pathogen and mutant background. We have not seen notable interaction terms of sex and genotype for most immune-deficient mutants. It is quite interesting to have done these experiments with ∆ITPM, however, which reveals at least a trend suggesting this dichotomy is independent of the four immune modules we deleted. Still, our infection conditions kill most males, and so it would be good to replicate this sex-specific ∆ITPM result in a dedicated study with doses chosen to improve the resolution of male-female differences. For now, we prefer to use conservative language and avoid overinterpreting this trend, but do feel it merits mentioning.

      Recommendations for the authors:

      Comment on statistical requests

      Both reviewers requested further clarity on the statistical analyses supplemental to Figure 3. We have addressed these comments as follows.

      First, we now provide an additional supplementary .zip file containing summary statistics for all survival data in Figure 3 (Supplementary File 3). We have additionally added this text to line 226 to make this data treatment more clear:

      “…we chose to focus on major differences apparent in summary statistics, highlighting…”

      And we highlight that all survival data are also provided as Kaplan-Meier survival curves in the main or supplementary figures in Line 233:

      “Kaplan-Meier survival curves for all experiments are provided in the main text or supplementary information”.

      Second, as outlined in the main text, we were unable to sample across all pathogen-by-genotype interactions systematically, and this unfortunately complicates robust statistical modelling. We addressed the challenge of finding meaningful statistical differences by focusing on trends only if they were (i) consistent across experimental replicates, (ii) of a consistent logic across comparable genotypes, ensuring random inter-experimental noise was not unduly shaping interpretations, and (iii) of a mean lifespan difference ≥1.0 days compared to wild-type, and compared to relevant unchallenged or clean-injury controls. This last choice was especially important because not all experimental replicates included all genotypes, owing to challenges of animal husbandry and coordination among multiple researchers over five years of data collection. As a result, in our initial analyses a Cox mixed-effects model proved to be of little use: it was insensitive to important experiment batch effects visible to the eye, because statistically affected genotypes were not present in all experiments.

      We therefore ensured that behaviour relative to controls within experiments was consistent, rather than comparing genotypes to controls across the sum of experiments with a post-hoc treatment attempting to apportion variance to experiment batch (which could not do so for some genotypes and some batches). Due to differences in baseline health and the dynamics explained by studies like Duneau et al. (2017; eLife), there is an expected unequal variance of genotype-by-pathogen interactions across experiment batches. Unfortunately, this unequal variance, coupled with incomplete sampling across experiment batches, means that “highly significant” differences can emerge that do not hold up to scrutiny when compared to controls taken only from within an experiment batch. Thus, we chose to forgo a Cox mixed-effects model approach entirely. Instead, our highly conservative approach, focusing only on very large effects with a mean lifespan difference ≥1.0 days, mitigates these issues. We have taken great care to ensure that any results we highlight stand up to inter-experiment batch effects. We would further draw the reviewers’ attention to our response to Reviewer 2 relating to Figure 3, which emphasizes the level of conservatism that we are applying.

      At the end of the Discussion, we have added the following sentence to emphasize these limitations:

      “…a combinatorial mutation approach to deciphering immune function can be extended even to the broad level of whole immune modules. Of note, we were unable to systematically sample all genotype-by-pathogen interactions equally. We have therefore been highly conservative in our reporting of major effects. There are likely many important interactions not discussed in our study. Future investigations may highlight important biology that is apparent in our data, but which we may not have mentioned here. To this end, we have deposited our isogenic immunity fly stocks in the Vienna Drosophila Resource Centre to facilitate their use. Beyond immunity, our tools can also be of use to study various questions at the cutting edge of aging, memory, neurodegeneration, cancer, and more, where immune genes are repeatedly implicated. We hope that this set of lines will be useful to the community to better characterize the Drosophila host defense.”

      We recognise this response may not fully satisfy the reviewers’ requests. While the use of summary statistics is simple, our rules for highlighting interactions of importance are defined, readily understood and interpreted, and draw attention to key trends that are backed by a solid understanding of the data and its limitations. We have taken this approach out of a responsibility to avoid making spurious assertions that stem from underpowered statistical models rather than from the biology itself.

      Reviewer #1 (Recommendations for the authors): 

      (1) Lines 1092-1093 - Please double-check the labeling of the panels in Figure 2. It appears that panels A and C correspond to single-module mutants, whereas panels B and D refer to compound-module mutants. 

      We have modified Figure 2 and Figure 2supp1 labelling. We also realise there was an error in the column titling that contributed to the confusion. We hope the new layout is clear, and thank the reviewers for noting this issue.

      (2) Lines 347-377 - Figure 2D is not cited in the text. 

      We now cite Fig2D in Line 356.

      (3) P values should be indicated in Figure 2 and Figure 3 for all relevant comparisons. Additionally, "ns" (not significant) should be added in Figure 5A-B. 

      We make the effort to show key uninfected survival trends in Figure 2, and list the total flies (n_flies) in Fig3 to provide the reader with the underlying confidence in the trends observed. We focus on differences of mean lifespan of at least 1 day, and which are consistent in direction across combinatory mutations. We have avoided the multiple comparisons of Cox proportional hazards survival analyses throughout this study because they are overly sensitive for our purposes, as we have done previously when systematically comparing many genotypes to each other (see Hanson and Lemaitre, 2023; DMM).

      (4) Minor points: Hml-Gal4, UAS-GFP should be italic; Line 192-- "uL" and "uM"; Line 596: P>.05.

      We have made these changes. We’re unsure what the comment regarding P>.05 referred to, but have removed spaces and made it non-italics. 

      Reviewer #2 (Recommendations for the authors): 

      Statistical analyses and their outcomes are clearly indicated only for the data in Figure 1 and Figure 5 and in the supplement for Figure 1, while they are not reported/not easily accessible for other data. For the main figures, statistics should be indicated in the figure for an easier assessment of the data. In case of multiple comparisons potentially crowding the plots too much, statistics may be in a supplementary file/table. 

      See response above.

      In case of the hemocytes, besides phagocytosis, I would think that ROS generation via the DUOX/NOX system is also an integral part of the immune response against pathogens, and that has not been included here. That might be an interesting addition for future experiments. As the NimC1, eater double mutant flies are said to have fewer hemocytes, it is possible that this function of the hemocytes is affected as well. This could be commented on in the text. 

      The reviewer raises a good point. The role of DUOX and NOX in ROS responses is not assessed in our study. To our knowledge, DUOX and NOX participate primarily in the wound repair response, or in epithelial renewal at damage sites or in the gut. In our study on systemic immunity, we did not assess the role of clotting or the precise function of ROS, and we have missed other host defense or stress response mechanisms as well (e.g. constitutively expressed AMP-like genes, TEPs, JAK-STAT) that likely play a role in the systemic immune defense. Considering the lethality caused by Nox and Duox mutations, there would be inherent genetic difficulties in recombining these as multiple mutations. Unfortunately, this makes it difficult to include these processes in our analysis in a systematic manner. We are already happy to have generated fly lines lacking four immune modules simultaneously, even if they are not fully immune deficient. We have mentioned this point in the discussion (Line 613 onward).

      Of note, the NimC1, eater double mutants actually have decreased hemocyte counts at the adult stage (Melcarne et al., 2019). Thus, NimC1, eater double mutants are impaired not only in phagocytosis, but in the overall cellular response. We make a point to outline this in Lines 225-257 and 607.

      I think it could be mentioned that the melanization response at larval stage (against parasitoids) functions differently from the melanization described here (requiring hemocyte differentiation and PPO3).

      A good point. We have added this mention in Line 97:

      “In addition, a third PPO gene (PPO3) is specifically expressed by lamellocytes, specialized hemocytes that differentiate in larvae responding to and enveloping invading parasites (Dudzic et al., 2015)”.

      Overall, the clarity of the figures and figure legends could be worked on to make them a bit easier to follow. Below are some of my suggestions: 

      (1) In Figure 2, adding headings to parts C & D (similarly to A & B) would make it easier to follow what is happening in the figure at a glance. Also, it is rather difficult to visually follow which strain is which in the plots. I'd suggest adding the key/legend for single mutants below 2A & B, and the key for the double mutants below C & D. If a mutant is present in A & B and in C & D, it could be included in both keys. I also think that it would be intuitive to present the single mutants by dashed lines and double mutants by continuous lines (or vice versa), so that one would easily distinguish between them. Of note, the figure legend says that A & B are single mutants, but for example in B there are also some double mutants (?). 

      We have modified Figure 2 and Figure 2supp1 labelling. We also realise there was an error in the column titling that contributed to the confusion. We hope the new layout is clear, and thank the reviewers for noting this issue.

      (2) In Figure 3, it looks like ΔMel is almost identical to controls in the clean injury survival, but in Figure 2C, it is clearly doing worse. I might be missing something here, but would like the authors to clarify the matter. Also, the meaning of the numbers in the heat map could be explained in the figure legend and/or added to the figure (color key). 

      The reviewer is correct. We thank the reviewer for this astute observation. Inadvertently, we used an old version of the Figure 2 preparation where only a subset of experiments was entered in the Prism data file rather than the total data used to inform Figure 3. This issue affected all genotypes.

      We have reviewed the data in Figure 2, Figure 2supp1, and Figure 3, and updated these figures accordingly to ensure they represent the full survival data. We have also incorporated new experiments into the summed data, both related to male-female differences and to fill gaps in the data from the 1<sup>st</sup> submission. We also note that, due to rounding to the 1<sup>st</sup> decimal, the difference between WT and ΔMel appears slightly underrepresented: the true difference (over the 7-day lifespan) is 0.37 days. We have provided a version of this figure rounded to 2 decimal places below, but prefer the simpler 1-decimal version in the main text for readability. The updated Figure 2 now accurately reflects the full data summarized in Figure 3.

      We will also take this opportunity to highlight how conservative our ≥1.0 days difference approach is. Breaking down survival curve patterns in Figure 2 relative to mean differences in Figure 3, for clean injury, approximately 75% of ΔMel flies survive to day 7, with mortality mostly taking place between days 3-7. The result is a mean lifespan of 6.37 days. On a survival curve, this difference appears quite strong, but in our mean lifespan table the difference is rather muted (WT vs. ΔMel difference = 0.37 days). Thus, differences of ≥1.0 days reflect very strong trends in survival data that are near-guaranteed to be independent of experimental noise. While we note issues that prevented us from a fully systematic sampling for all experiments, we are confident that the ≥1.0 day differences we highlight, using the rules explained in the main text, are robust. While this approach could be seen as overly conservative, it is our preference in this initial study, containing combinations of 25 treatments and 14 genotypes, to be highly conservative. Future studies may investigate other strong differences we have not highlighted, and the data we provide here can help generate expectations and guide those studies.
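      To make the arithmetic behind this point concrete, the short Python sketch below computes a mean lifespan over a 7-day window and applies the ≥1.0-day rule. It is an illustrative sketch only, not the analysis code behind Figure 3. It assumes that flies still alive at day 7 are counted as living the full 7 days, and the two cohorts are hypothetical toy values rather than study data; the helper names `mean_lifespan` and `is_highlighted` are likewise hypothetical.

```python
# Illustrative sketch only (not the analysis code behind Figure 3).
# Assumption: flies alive at the end of the window are assigned the full
# window length, i.e. mean lifespan is restricted to the 7-day observation period.

WINDOW_DAYS = 7.0     # length of the survival observation window
THRESHOLD_DAYS = 1.0  # minimum mean-lifespan difference highlighted in the text


def mean_lifespan(death_days, n_survivors, window=WINDOW_DAYS):
    """Mean days lived per fly; survivors count as living the whole window."""
    n_flies = len(death_days) + n_survivors
    total_days = sum(death_days) + n_survivors * window
    return total_days / n_flies


def is_highlighted(mutant_mean, wt_mean, threshold=THRESHOLD_DAYS):
    """True only if the genotype's mean lifespan differs from WT by >= threshold."""
    return abs(wt_mean - mutant_mean) >= threshold


# Hypothetical toy cohorts of 10 flies each (not study data).
wt = mean_lifespan([6.0], n_survivors=9)                # (6 + 9*7) / 10 = 6.9
mutant = mean_lifespan([3.0, 4.0, 5.0], n_survivors=7)  # (12 + 7*7) / 10 = 6.1
print(round(wt, 2), round(mutant, 2), is_highlighted(mutant, wt))  # 6.9 6.1 False
```

      In this toy example, 30% mortality between days 3 and 5 lowers the mean lifespan by only 0.8 days relative to the control cohort: a shift that would look clear on a survival curve, yet still falls below the 1.0-day threshold.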

      Author response image 1.

      Figure 3 with 2 decimal places of rounding for mean lifespans. The 7-day clean injury mean lifespan of WT is 6.74 days, and of ΔMel is 6.37 days. Due to rounding, in the 1-decimal Figure 3 this difference appears to be only 0.3 days, but it is closer to 0.4 days. Regardless, this level of difference, which appears rather clearly in a survival curve, is well below the level of difference we have chosen to highlight in our study.
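      The rounding effect described in this legend can be checked directly from the two mean lifespans quoted (WT 6.74 days, ΔMel 6.37 days). The Python snippet below is only an arithmetic illustration: rounding each mean to one decimal first (6.7 and 6.4) shrinks the apparent difference from 0.37 to 0.3 days.

```python
# Mean 7-day clean-injury lifespans quoted in the legend above.
wt_mean = 6.74
mel_mean = 6.37

# Difference computed from the 2-decimal values.
true_diff = round(wt_mean - mel_mean, 2)

# Difference a reader infers from the 1-decimal table: 6.7 - 6.4.
table_diff = round(round(wt_mean, 1) - round(mel_mean, 1), 1)

print(true_diff, table_diff)  # 0.37 0.3
```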

      (1) Figure 4: I find it very tedious to compare CFUs among different mutants from the plots. As the idea is to compare bacterial loads among the mutants at different timepoints, it would be easier to compare them if the data were shown within a timepoint (CFUs of each mutant at 2h, at 6h, and so on). This is also how the results are written in the text (within a time point). Would it also be clearer if the CFU plots were named, for example: " A', B', and C'"? 

      We appreciate this note. We feel both representations have merits and pitfalls, but prefer our original design showing the progression of bacterial growth within each genotype first. However, we have added dotted lines representing the wild-type bacterial loads at 2hpi, 12hpi, and 24hpi to assist the reader in making across-genotype comparisons at key time points. In this way, the reader can see whether the error bars (StDev) overlap the wild-type mean, and so make more intuitive judgements about whether these differences are meaningful.

      (2) Figure 2D is not referred to in the text. 

      We now cite Fig2D in Line 356.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The modeling approaches are very sophisticated, and clearly demonstrate the selective nature of acute ketamine to reduce the impact of trial losses on subsequent performance, relative to neutral or gain outcomes. The authors then, not unreasonably, suggest that this effect is important in the context of the negative bias in interpreting events that is prominent in depression, in that if ketamine reduces the ability of negative outcomes to alter behavior, this may be a mechanism for its rapid acting antidepressant effects.

      However, there is a very strong assumption in this regard, as shown by the first sentence of the discussion, which implies this is a systematic study of ketamine's acute antidepressant effects. In actuality, this is a study of the acute effects of ketamine on reinforcement learning (RL) modeled parameters. A primary concern here is that an effect presented as a "robust antidepressant-like behavioral effect" should be more enduring than just an alteration during the acute administration. As it is, the link to an "antidepressant effect" is based solely on the selective effects on losses. This is not to say this is not an interesting observation, worthy of exploration. It is noted that a similar lack of enduring effects on outcome evaluation is observed in humans, as shown in supplemental fig. S4, but there is no accompanying citation for the human work.

      We agree with the reviewer that the way we linked the study results to ketamine’s antidepressant action can be misleading and based on a rather strong assumption which was not systematically tested in the study. We made the following changes to the manuscript:

      (1) These results constitute a rare report of a robust antidepressant-like behavioral effect produced by therapeutic doses of ketamine during acute phase (<1 hour) after injection (Introduction, 3rd paragraph, line 8-9 in the original manuscript).

      Changed to: These results constitute a rare report of an acute effect of therapeutic dose of ketamine on the processing of affectively negative events during dynamic decision-making.

      (2) We clarified in the Discussion that our study is to gain insights into, but not a systematic investigation of ketamine’s antidepressant action as follows:

      (2.1) A sentence was added (1st paragraph of Discussion): Using a token-based decision task and extensive computational modeling, we examined the behavioral modulation induced by therapeutic doses of ketamine to gain insights into possible early signs of ketamine’s antidepressant activity.

      (2.2) Consistent with the findings from humans, ketamine’s effect on outcome evaluation was acute and did not last over subsequent days (Supplemental Figure S4) (Discussion, 2nd paragraph, line 6-7 in the original manuscript).

      Changed to: While ketamine’s antidepressant effect is reported to be sustained over a period of a week (5), ketamine’s effect on outcome evaluation was acute and did not last over subsequent days (Supplemental Figure S4). This discrepancy might be attributable to possible differences in the state of the brain network between healthy subjects and those with depression, as well as to the type of measures taken to assess ketamine’s effect.

      (2.3) A sentence was added (Discussion, last sentence of the 2nd paragraph): Nevertheless, systematic studies are required to understand whether the reduced aversiveness to loss in our task might share the same mechanisms that underlie ketamine’s antidepressant action.

      One question that comes to mind in terms of the selectivity observed is whether similar work has been done to examine the acute effects of any other drugs. If ketamine is unique in this regard, that would be quite interesting.

      We think this is an interesting idea. However, comparing ketamine’s effect to that of other drugs is beyond the scope of the current study. We hope to answer this question in future studies.

      Reviewer #2 (Public Review):

      Oemisch and Seo set out to examine the effects of low-dose ketamine on reinforcement learning, with the idea that alterations in reinforcement learning and/or motivation might inform our understanding of what alterations co-occur with potential antidepressant effects. Macaques performed a reinforced/punished matching pennies task while under effects of saline or ketamine administration and the data were fit to a series of reinforcement learning models to determine which model described behavior under saline most closely and then what parameters of this best-fitting model were altered by ketamine. They found a mixed effect, with two out of three macaques primarily exhibiting an effect of ketamine on processing of losses and one out of three macaques exhibiting an effect of ketamine on processing of losses and perseveration. They found that these effects of ketamine appeared to be dissociable from the nystagmus effects of the ketamine.

      The findings are novel and the data suggesting that ketamine is primarily having its effects on processing of losses (under the procedures used) are solid. However, it is unclear whether the connection between processing of losses and the antidepressant effects of ketamine is justified, and the current findings may be more useful for those studying reinforcement learning than for those studying depression and antidepressant effects. In addition, different behavioral procedures co-occurred with different patterns of ketamine effects: one macaque, tested with different parameters than the other two, exhibited ketamine effects that were best fit by a different model. This suggests that it may be difficult to generalize these findings to reinforcement learning more broadly.

      (1) First, the authors should be more explicit and careful in the connection they are trying to make about the link between loss processing and depression. The authors call their effect a "robust antidepressant-like behavioral effect" but there are no references to support this or discussion of how the altered loss processing would relate directly to the antidepressant effects.

      We agree with the reviewer’s point about the way we connected the study results to ketamine’s antidepressant action. This concern overlaps with Reviewer #1’s concern; please refer to our responses (2), (2.1), (2.2), and (2.3) above.

      (2) It appears that the monkey P was given smaller rewards and punishers than the other two monkeys and this monkey had an effect of ketamine on perseveration that was not observed in the other two monkeys. Is this believed to be due to the different task, or was this animal given a different task because of some behavioral differences that preceded the experiment? The authors should also discuss what these differences may mean for the generality of their findings. For example, might there be some set of parameters where ketamine would only alter perseveration and not processing of losses?

      Although the best-fitting ketamine model for monkey P includes an additional element – perseveration, we believe that monkey P’s baseline behavior and ketamine’s effect are not significantly different from the other two monkeys for the following reasons.

      First, monkey P was the first animal in which we tested ketamine’s effect. We therefore aimed to bring the other two monkeys’ baseline behavior close to that of monkey P, to reduce variability in ketamine’s effect that could be attributable to differences in baseline behavior before pharmacological manipulation. We had to adjust the payoff matrix for the subsequent animals (Y and B) because these monkeys were more sensitive to loss and seldom chose the “risky” target (the one yielding loss). To make their behavior similar to that of monkey P, we adjusted the asymmetry between the risky and the safe targets so that the loss (neutral) outcome also occurred from the safe (risky) target. Eventually, this adjustment made the baseline behavior similar across all three monkeys. The goal of the study was to reliably measure ketamine’s effect, not to study individual differences that can naturally occur with the same task parameters. Therefore, we believe that adjusting the payoff matrix helped us reliably detect ketamine’s effect from a common baseline behavior.

      Second, the best-fitting model for monkey P (K-model 7) and that for the other two monkeys (K-model 4) make very similar predictions, both qualitatively and quantitatively, as seen in the revised Figure 4. The outcome-value parameters estimated from these two models in monkey P are also very similar, as seen in the revised Table 3. In addition, the difference in BIC between the model that includes only perseveration modulation (K-model 6) and the model that also incorporates outcome-value modulation (K-model 7) is 441, whereas the difference in BIC between K-model 7 and the model that includes only outcome-value modulation (K-model 4) is as small as 4. These BIC results indicate that, in monkey P, the variability explained by ketamine’s modulation of outcome evaluation is markedly larger than that explained by its modulation of perseveration.
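      To make the BIC argument concrete, the following minimal Python sketch converts the two BIC differences reported for monkey P (K-model 6 minus K-model 7 = 441; K-model 4 minus K-model 7 = 3.99) into approximate relative model weights using the common exp(-ΔBIC/2) approximation. The weighting step is an illustrative assumption on our part, not an analysis reported in the manuscript; the `bic` helper only restates the textbook definition, k·ln(n) - 2·ln(L).

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Textbook BIC: n_params * ln(n_obs) - 2 * ln(L). Lower values indicate a better model."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def bic_weights(delta_bic: dict) -> dict:
    """Approximate relative model weights: exp(-dBIC / 2), normalized to sum to 1."""
    best = min(delta_bic.values())
    raw = {name: math.exp(-(d - best) / 2.0) for name, d in delta_bic.items()}
    total = sum(raw.values())
    return {name: r / total for name, r in raw.items()}


# BIC differences reported for monkey P, expressed relative to K-model 7.
delta_bic = {"K-model 7": 0.0, "K-model 4": 3.99, "K-model 6": 441.0}
weights = bic_weights(delta_bic)
for name, w in weights.items():
    print(f"{name}: {w:.3g}")
```

      Under this approximation, a difference of about 4 still leaves K-model 4 with roughly 12% of the weight, whereas a difference of 441 leaves K-model 6 with effectively none, which is the asymmetry the argument above relies on.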

      Therefore, we conclude that ketamine’s effect was not significantly different between monkey P and the other two monkeys. We clarified this in the revised manuscript by adding the following paragraph to the Results section:

      “Unlike monkey Y and B, the best-fitting model for monkey P indicated that ketamine increased overall tendency to switch choice in addition to outcome-dependent modulation of outcome evaluation. However, BIC differed only slightly (dBIC = 3.99) between the best-fitting (K-model 7) and the second-best model (K-model 4) and the model predictions for choice behavior were very similar both qualitatively and quantitatively (Table 3, Figure 4). We conclude that the behavioral effects of ketamine were consistent across all three monkeys.”

      (3) The authors should discuss whether the plasma ketamine levels they observed are similar to those seen with rapid antidepressant ketamine or are higher or lower.

      We added the following sentence, with a reference, to the first paragraph of the Results section:

      “Plasma concentration and its time course over 60 minutes were also comparable to those measured after 0.5mg/kg in human subjects (35).”

      (35) Zarate CA, Brutsche N, Laje G, Luckenbaugh DA, Venkata SLV, Ramamoorthy A, et al (2012): Relationship of ketamine’s plasma metabolites with response, diagnosis, and side effects in major depression. Biol Psychiatry, 72: 331-338.

      (4) For Figure 4 or S3, the authors should show the data fitted to model 7, which was the best for one of the animals.

      To facilitate comparison between the two models, we added the parameters and model predictions from both K-model 7 and K-model 4 for monkey P to Table 3 and Figure 4. The revised Table 3 and Figure 4 are as follows:

      Author response table 1.

      Maximum likelihood parameter estimates of the best models for saline and ketamine sessions.

      In all three animals, the model incorporating valence-dependent change in outcome evaluation best fit the choice data from ketamine sessions with (K-model 7 in the parenthesis, P) or without (K-model 4, P and Y/B) additional change in the tendency of choice perseveration (Figure 3, Table 3).

      Author response image 1.

      Ketamine-induced behavioral modulation simulated with differential forgetting model (for saline session) and best-fitting K-model (for ketamine session).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to Public Comments

      (1) BioRxiv version history.

      Reviewer 1 correctly noted that we have posted different versions of the paper on bioRxiv and that there were significant changes between the initial version and the one posted as part of the eLife preprint process. Here we provide a summary of that history.

      We initially posted a bioRxiv preprint in November 2021 (Version 1) that included the results of two experiments. In Experiment 1, we compared conditions in which the stimulation frequency was 2 kHz, 3.5 kHz, or 5.0 kHz. In Experiment 2, we replicated the 3.5 kHz condition of Experiment 1 and included two amplitude-modulated (AM) conditions, with a 3.5 kHz carrier signal modulated at 20 Hz or 140 Hz. Relative to sham stimulation, non-modulated kTMP at 2 kHz and 3.5 kHz increased cortical excitability in Experiment 1. This effect was replicated in Experiment 2.

      In the original posting, we reported that there was an additional boost in excitability in the 20 Hz AM condition above that of the non-modulated condition. However, in re-examining the results, we recognized that the 20 Hz AM condition included an outlier that was pulling the group mean higher. We should have caught this outlier in the initial submission given that the resultant percent change for this individual is 3 standard deviations above the mean. Given the skew in the distribution, we also performed a log transform on the MEPs (which improves the normality and homoscedasticity of MEP distributions) and repeated the analysis. However, even here the participant’s results remained well outside the distribution. As such, we removed this participant and repeated all analyses. In this new analysis, there was no longer a significant difference between the 20 Hz AM and non-modulated conditions in Experiment 2. Indeed, all three true stimulation conditions (non-modulated, AM 20 Hz, AM 140 Hz) produced a similar boost in cortical excitability compared to sham. Thus, the results of Experiment 2 are consistent with those of Experiment 1, showing, in three new conditions, the efficacy of kHz stimulation on cortical excitability. But the results fail to provide evidence of an additional boost from amplitude modulation. 
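      The outlier check described above can be sketched as a simple 3-SD rule applied to the raw values and again after a log transform. This is a minimal Python sketch under stated assumptions: the per-participant post/pre MEP ratios are hypothetical, and the exact screening procedure is not specified here beyond the 3-SD observation. Percent change is a linear rescaling of the ratio, (ratio - 1) × 100, so its z-scores are identical to those of the ratio.

```python
import math
import statistics


def z_scores(values):
    """Standardize values using the sample mean and sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def flag_outliers(values, threshold=3.0):
    """Return the indices whose absolute z-score exceeds the threshold (3 SD by default)."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]


# Hypothetical post/pre MEP amplitude ratios, one per participant; the last one is extreme.
ratios = [1.05, 1.10, 0.95, 1.20, 1.15, 1.00, 1.08, 1.12, 0.98, 1.25,
          1.18, 1.03, 1.07, 0.92, 1.22, 1.10, 1.06, 1.14, 1.01, 4.00]

raw_flags = flag_outliers(ratios)
log_flags = flag_outliers([math.log(r) for r in ratios])
print("flagged on raw scale:", raw_flags)
print("flagged on log scale:", log_flags)
```

      As in the case described above, the log transform reduces the skew but does not by itself bring an extreme participant back inside the distribution.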

      We posted a second bioRxiv preprint in May, 2023 (Version 2) with the corrected results for Experiment 2, along with changes throughout the manuscript given the new analyses.

      Given the null results for the AM conditions, we decided to run a third experiment before submitting the work for publication. Here we used an alternative form of amplitude modulation (see Kasten et al., NeuroImage 2018). In brief, we again observed a boost in cortical excitability from non-modulated kTMP at 3.5 kHz, but no additional effect of amplitude modulation. This work is included in the third bioRxiv preprint (Version 3), the paper that was submitted to and reviewed at eLife.

      (2) Statistical analysis.

      Reviewer 1 raised a concern with the statistical analyses performed on aggregate data across experiments.  We recognize that this is atypical and was certainly not part of an a priori plan. Here we describe our goal with the analyses and the thought process that led us to combine the data across the experiments.

      Our overarching aim is to examine the effect of different kTMP waveforms (carrier frequency and amplitude-modulation frequency), matched at the same estimated cortical E-field (2 V/m), on corticospinal excitability. Our core comparison was of the active conditions relative to a sham condition (E-field = 0.01 V/m). We included the non-modulated 3.5 kHz condition in Experiments 2 and 3 to provide a baseline from which we could assess whether amplitude modulation produced a measurable difference from that observed with non-modulated stimulation. Thus, this non-modulated condition, as well as the sham condition, was repeated in all three experiments. This provided an opportunity to examine the effect of kTMP with a relatively large sample and to assess how well the effects replicate, and it shaped the strategy we have taken in reporting the results.

      As a first step, we present the data from the 3.5 kHz non-modulated and sham conditions (including the individual participant data) for all three experiments in Figure 4. We used a linear mixed effect model to examine whether there was an effect of Experiment (Exps 1, 2, 3) and observed no significant difference within either condition. Given this, we opted to pool the data for the sham and 3.5 kHz non-modulated conditions across the three experiments. Once the data were pooled, we examined the effect of the carrier frequency and amplitude-modulation frequency of the kTMP waveform.

      (3) Carry-over effects

      As suggested by Reviewer 1, we will examine in the revision if there is a carry-over effect across sessions (for the most part, 2-day intervals between sessions). For this, we will compare MEP amplitude in baseline blocks (pre-kTMP) across the four experimental sessions.

      Reviewer 1 also commented that mixing the single- and paired-pulse protocols might have impacted the results. While our a priori focus was on the single-pulse results, we wanted to include multiple probes given the novelty of our stimulation method. Mixing single- and different paired-pulse protocols has been relatively common in the non-invasive brain stimulation literature (e.g., Nitsche, 2005; Huang et al., 2005; López-Alonso, 2014; Batsikadze et al., 2013), and we are unaware of any reports suggesting that mixed designs (single and paired) distort the picture compared to pure designs (single only).

      (4) Sensation and Blinding

      Reviewer 2 brought up concerns about the sham condition and blinding of kTMP stimulation. We do think that kTMP is nearly ideal for blinding. The amplifier does emit an audible tone (at least for individuals with normal hearing) when set to an intensity that produces a 2 V/m E-field. For this reason, the participants and the experimenter wore ear plugs. Moreover, we played a 3.5 kHz tone in all conditions, including the sham condition, which effectively masked the amplifier sound. We measured the participant’s subjective ratings of annoyance, pain, and muscle twitches after each kTMP session (active and sham). Using a linear mixed effect model, we found no difference between active and sham for each of these ratings, suggesting that sensation was similar for active and sham (Fig 8). This matches our experience that kHz stimulation in the range used here produces no perceptible sensation from the coil. To blind the experimenters (and participants), we used a coding system in which the experimenter typed in a number that had been randomly paired to a stimulation condition; the pairing varied across participants in a manner unknown to the experimenter.
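      A coding scheme of this kind can be sketched as a per-participant random mapping from numeric codes to conditions, generated and held by someone other than the experimenter. This Python sketch is hypothetical: it assumes three-digit codes and uses the four Experiment 1 conditions (sham, 2 kHz, 3.5 kHz, 5 kHz); the function name and code format are illustrative and are not the authors' actual system.

```python
import random

# Experiment 1 conditions as described in the response.
CONDITIONS = ["sham", "2 kHz", "3.5 kHz", "5 kHz"]


def make_blinding_key(participant_ids, conditions, seed=None):
    """Map random three-digit codes to conditions, independently for each participant."""
    rng = random.Random(seed)
    key = {}
    for pid in participant_ids:
        # Distinct codes per participant, so the same code never implies the same condition.
        codes = rng.sample(range(100, 1000), len(conditions))
        # code -> condition; this table stays with someone other than the experimenter.
        key[pid] = dict(zip(codes, conditions))
    return key


key = make_blinding_key(["P01", "P02"], CONDITIONS, seed=2024)
for pid, mapping in key.items():
    print(pid, "codes entered by the experimenter:", sorted(mapping))
```

      Because the codes are drawn fresh for each participant, knowing one participant's assignment gives no information about another's.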

      Reviewer 1 asked why we did not explicitly ask participants if they thought they were in an active or sham condition. This would certainly be a useful question. However, we did not want to alert them of the presence of a sham condition, preferring to simply describe the study as one testing a new method of non-invasive brain stimulation. Thus, we opted to focus on their subjective ratings of annoyance, pain, and finger twitches after kTMP stimulation for each experimental session.

      Response to Recommendations for the Authors

      Reviewer #1: 

      Reviewer #1, in the public review, noted the possibility of carry-over effects and suggested that we compare the amplitude of the MEPs in the pre-kTMP blocks across the four sessions.

      Although we did not anticipate carry-over effects lasting 2 or more days, we have now conducted an analysis using a linear mixed effect model with a fixed factor of Session and a random factor of Participant. The results show no effect of Session [χ2(3) = 4.51, p = 0.211].

      Author response table 1.

      Detailed comments and some suggestions to maybe improve the writing and figures: 

      Abstract: 

      BioRxiv Version 1: "We replicated this effect in Experiment 2 and found that amplitude-modulation at 20 Hz produced an additional boost in cortical excitability. " 

      BioRxiv Version 2, 3 and current manuscript: "Although amplitude-modulated kTMP increased MEP amplitude compared to sham, no enhancement was found compared to non-modulated kTMP." 

      I am a little concerned about this history because the conclusions seem to have changed. It looks like the new data have a larger number of subjects, which could explain the divergence, although it is generally not good practice to analyze the data at interim time points without accounting for alpha spending. It appears that the data analysis methods may also have changed, as some of the extreme points in version 1 no longer seem to appear in the new manuscript (Figure 4, Sham, Experiment 1).

      In the public review above we explain in detail the different versions of the bioRxiv preprint and how the results changed from the first version to the current manuscript.

      Introduction:

      "Second, the E-fields for the two methods exist in orthogonal subspaces" Can you explain what this means?

      Thank you for this suggestion, we have updated the paper (pg. 4, line 78-81) by adding two sentences to explain what we mean by orthogonal subspaces and describe the consequences of this with respect to the E-fields resulting from tES and TMS. Specifically, we now comment that even if the E-fields of tES and TMS are similar in focality, they may target different populations of neurons.  

      "In addition, the kTMP waveform can be amplitude modulated to potentially mimic E-fields at frequencies matching endogenous neural rhythms [15]." That may be so, but reference [15] makes the exact opposite point, namely, that kHz stimulation has little effect on neuronal firing until you get to very strong fields. The paper that makes that claim is by Nir Grossman, but in my view, it is flawed as responses are most likely due to peripheral nerve (axon) stimulation there given the excessive currents used in that study. The reference to Wang and Peterchev [17] is in agreement with that by showing that you need 2 orders of magnitude stronger fields to activate neurons. 

      The reviewers are correct that Ref 15 (Esmaeilpour et al., 2021), as well as Wang et al., 2023, use much higher E-fields than we target in our present study. However, our point here is that, while we cannot use our approach to apply E-fields at endogenous frequencies, we can amplitude modulate the kHz carrier frequency at these lower frequencies. We cited Esmaeilpour et al. (2021) because they show that high-frequency stimulation with amplitude-modulated waveforms resulted in dynamic modulation at the “beating” frequency. Given that we are well within subthreshold space in this paper, and well below the E-field levels in Esmaeilpour et al. (2021), the open question is whether amplitude modulation at this level will be able to perturb neural activity (e.g., increase the power of endogenous oscillations at the targeted frequency).

      To address this concern, we modified the sentence (pg.6, lines 120-121) to now read "In addition, the kTMP waveform can be amplitude modulated at frequencies matching endogenous neural rhythms." In this way, we are describing a general property of kTMP (as well as other methods that can use high frequency signals).

      I am not aware of any in-vitro study showing the effects of kHz stimulation at 2V/m. The review paper by Neudorfer et al is very good. But if I got it correctly in a quick read it is not clear that there is experimental evidence for subthreshold effects. They do talk about facilitation, but the two experimental papers cited there on the auditory nerve don't quantify field magnitudes. I would really love it if you could point me to a relevant empirical study showing the effects of kHz stimulation at 2 V/m. 

      Perhaps all this is a moot point as you are interested in lasting (plastic) effects on MEP. For this, you cite one study with 11 subjects showing the effects of kHz tACS on MEPs [20]. I guess that is a start. The reference [21] is only a safety study, so it is probably not a good reference for that. Reference [22] also seems out of place as it is a modeling study. The effects on depression of low-intensity magnetic stimulation in references [23-26] are intriguing. 

      We agree with the reviewer that Ref 20 (now Ref 18: Chaieb, Antal & Paulus; 2011) is the most relevant one to cite here since it provides empirical evidence for changes in neural excitability from kHz stimulation, and in fact, serves as the model for the current study. We have retained Refs 23-26 (now Ref 19-22: Rohan et al., 2014; Carlezon et al., 2005; Rohan et al., 2004 & Dublin et al., 2019) since they also do show kHz effects on mood and removed Refs 21 (Chaieb et al., 2014) and 22 (Wang et al., 2018) for the reasons cited by the Reviewer.

      Figure 1: "The gray dashed function depicts the dependence of scalp stimulation threshold upon frequency [14]." It's hard to tell from that reference what the exact shape is, but the frequency dependence is likely steeper than what is shown here, i.e. 2 mA at 10 Hz can be really quite unpleasant. 

      We have removed the gray dashed line given that this might be taken to suggest a discrete transition. We now just have a graded transition to reflect that the tolerance of tES is subjective. We start the shading at 2 mA for the lowest frequencies given that there is general agreement that 2 mA is well-tolerated and decrease the shading intensity as frequency increases. The general aim of the figure is not to make strong claims about the threshold of scalp discomfort for tES, but to show that kTMP can target much higher cortical E-fields within the tolerable range.

      Methods:

      Procedures:

      It does not seem like double-blinding has been directly assessed.

      We did not assess double blinding by directly assessing whether the participant was in a sham or active condition. We did not want to alert the participants of the presence of a sham condition after the first session of the 4-session study, preferring to simply describe the study as a test of a new method of non-invasive brain stimulation. For this reason, we opted to focus on their subjective ratings of annoyance, pain, and finger twitches after kTMP stimulation for each experimental session. These ratings did not differ between active and sham kTMP, which suggests kTMP has good potential for double blinding.

      MEP data analysis: Taking the mean of log power is unusual, but I suppose the reference provided gives a good justification. Does this explain the deviation from the biorxiv v1 results? 

      We opted to perform a logarithmic transformation of MEP amplitudes to improve the normality and homoscedasticity of the MEP distribution. We cite three papers (Refs 50-52: Peterchev et al., 2013, Nielsen 1996a, & Nielsen 1996b) that have applied a similar approach in handling MEP data. We had not done the transformation in the first bioRxiv but opted to do so in the eLife submission based on further review of the literature. We note that the two analyses produce similar statistical outcomes once we removed the outlier discussed in the Public Review.

      "Interactions were tested by comparing a model in which the fixed effects were restricted to be additive against a second model that could have multiplicative and additive effects." Not sure what this means. Why not run a full model with interactions included and read off the stats from that single model for the various factors? Should one not avoid running multiple models as one would have to correct p-values for multiple comparisons for every new test? 

      We used the lme4 package in R to fit our linear mixed effect models (Ref 54: Bates, Mächler, Bolker & Walker, 2015). The package intentionally omits p-values for individual models or factors because, as its authors note, there is no consensus in the field on how to calculate p-values for parameter estimates in complex situations (e.g., unbalanced designs). Instead, they suggest model comparison using the likelihood-ratio test to obtain and report p-values, which is what we report in the current manuscript.

      We revised the text in the section Linear Mixed Effects Models to state that likelihood ratio tests were used to obtain p-values to remove any confusion.
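      The likelihood-ratio test compares a reduced model with a full model that adds the factor of interest. The statistic is 2·(log L_full - log L_reduced), referred to a χ2 distribution whose degrees of freedom equal the number of added parameters. In lme4 this is what `anova(reduced_model, full_model)` reports, refitting REML fits by maximum likelihood by default. This stdlib-only Python sketch reproduces the p-value reported for the carry-over analysis, χ2(3) = 4.51, p = 0.211; the two log-likelihoods in the usage lines are hypothetical values chosen to give that statistic.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df (closed forms)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{j=1}^{(df-1)/2} x^(j-1) / (1*3*...*(2j-1))
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, df_diff: int):
    """Return (chi-square statistic, p-value) for nested models fitted by maximum likelihood."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df_diff)


# Hypothetical log-likelihoods; Session has four levels, so the full model adds 3 parameters.
stat, p = likelihood_ratio_test(-1000.0, -997.745, df_diff=3)
print(f"chi2(3) = {stat:.2f}, p = {p:.3f}")
```

      The same function applied to the single-degree-of-freedom pairwise contrasts, e.g. χ2(1) = 1.43, gives a p-value consistent with the reported 0.232.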

      Procedures:

      kTMP: Nice that fields were measured. Would be nice to see the data that established the empirical constant k.

      We have expanded our discussion of how we established k in the Methods section. We first derived k using the equation E0 = k·fc·I, based on previously published reports of the current (I) and frequency (fc) of the MagVenture Cool-B65 coil (now Refs 29-30: Deng, Lisanby & Peterchev, 2013; Drakaki, Mathiesen, Siebner, Madsen & Thielscher, 2022). We then verified this value using the triangular E-field probe to within 5% error.
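      The scaling E0 = k·fc·I follows from electromagnetic induction: the induced field is proportional to dI/dt, and for a sinusoidal coil current of amplitude I at frequency fc the peak dI/dt is 2π·fc·I, so the factor 2π and the coil geometry (at a fixed reference location) are absorbed into k. This Python sketch shows only the calibration arithmetic; all numeric values are hypothetical placeholders, not the published Cool-B65 figures, and the 5% check mirrors the probe verification described above.

```python
def estimate_k(e_field_ref: float, freq_ref_hz: float, current_ref_a: float) -> float:
    """Solve E0 = k * fc * I for k from one reference (E-field, frequency, current) triple."""
    return e_field_ref / (freq_ref_hz * current_ref_a)


def current_for_target(k: float, e_target: float, freq_hz: float) -> float:
    """Coil current amplitude needed for a target E-field at a given carrier frequency."""
    return e_target / (k * freq_hz)


def within_tolerance(measured: float, predicted: float, tol: float = 0.05) -> bool:
    """True if a probe measurement matches the prediction within a relative tolerance."""
    return abs(measured - predicted) / predicted <= tol


# Placeholder reference values (NOT the published Cool-B65 numbers).
k = estimate_k(e_field_ref=150.0, freq_ref_hz=2500.0, current_ref_a=3000.0)
current = current_for_target(k, e_target=2.0, freq_hz=3500.0)
print(f"k = {k:.3g} V/(m*Hz*A); current for 2 V/m at 3.5 kHz = {current:.1f} A")
print("probe check:", within_tolerance(measured=2.07, predicted=2.0))
```

      Since E0 is linear in both fc and I, a given E-field target at a higher carrier frequency requires proportionally less current.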

      Figure 3, spectrum. The placement of the fm label on the left panel is confusing. It suggests that fm was at the edge of the spectrum shown, which would not be the best way to show that there is nothing there - obviously, there isn't, but the figure could be more didactic. 

      Thanks for pointing this out. We modified the figure, moving the ‘fm’ label to the center of the first panel. This change makes it clear that there is no peak at the amplitude modulated frequency.
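      The point of the revised spectrum panel can be checked numerically. Sinusoidal amplitude modulation, (1 + m·cos 2πfm·t)·cos 2πfc·t, expands by the product-to-sum identity into components at fc and fc ± fm only, so there is no energy at fm itself. This Python sketch assumes 100% modulation depth, a 3.5 kHz carrier, a 20 Hz modulation frequency, and a hypothetical 20 kHz sampling rate; it does not model the alternative modulation scheme used in Experiment 3.

```python
import cmath
import math


def am_signal(fc, fm, depth, fs, n):
    """Sinusoidal AM: (1 + depth*cos(2*pi*fm*t)) * cos(2*pi*fc*t), n samples at rate fs."""
    return [
        (1 + depth * math.cos(2 * math.pi * fm * k / fs)) * math.cos(2 * math.pi * fc * k / fs)
        for k in range(n)
    ]


def amplitude_at(signal, freq, fs):
    """Amplitude of a single DFT bin; exact when freq * len(signal) / fs is an integer."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq * k / fs) for k, x in enumerate(signal))
    return 2 * abs(acc) / n


FS, N = 20_000, 2_000  # hypothetical sampling: 0.1 s window, 10 Hz bin spacing
s = am_signal(fc=3500, fm=20, depth=1.0, fs=FS, n=N)
for f in (20, 3480, 3500, 3520):
    print(f"{f:>5} Hz: {amplitude_at(s, f, FS):.3f}")
```

      The 20 Hz bin is essentially zero, while the carrier and the two sidebands at fc ± fm carry all of the energy.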

      "a trio of TMS assays of cortical excitability" Can you clarify what this means? 

      Sorry for the confusion. The trio of TMS assays refers to the single-pulse protocol and the two paired-pulse protocols (SICI and ICF). We edited the Procedure section to clarify this (pg. 9, lines 195-197).

      Figure 2A: it would be nice to indicate which TMS blocks were single pulse and which were the two paired-pulse protocols. It is hard to keep track of it all for the three different experiments. 

      We have now clarified in the text (see above) that all three probes were used in each block for Experiments 1 and 2, and only the single-pulse probe in Experiment 3. We have modified the legend for Figure 2 to also provide this information.

      Results:

      "Based on these results, we combined the data across the three experiments for these two conditions in subsequent analyses." This strikes me as inappropriate. Should not a single model have been used with a fixed effect of experiment and fixed effect of stimulation condition?

      We recognize that pooling data across experiments may be atypical. Indeed, our initial plan was to simply analyze each experiment on its own (completely within-subject analysis). However, after completing the three experiments, we realized that since the sham and non-modulated 3.5 kHz conditions were included in each experiment, we had an opportunity to examine the effect of kTMP in a relatively large N study (for NIBS research). Before pooling the data, we wanted to make sure that the factor of experiment did not impact the results and our analysis showed there was no effect of experiment. Note that we did not include the factor of stimulation condition in this model because we did not want to do multiple comparisons of the same contrast (3.5 kHz compared to sham). By pooling the data before analysis of the stimulation conditions we could then focus on our two key independent variables: 1) kTMP carrier frequency and 2) kTMP amplitude modulated frequency, doing fewer significance tests to minimize multiple comparisons. The linear mixed effect (LME) model allows us to include a random effect of participant. In this way, we account for the fact that some comparisons are within subjects and some comparisons are between subjects.

      The reviewer is correct that after pooling the data, we could have continued to include the factor of experiment in the LME models. This factor could still account for variance even though it was not significant in the initial test. Given this, we have now reanalyzed the data including the fixed factor of experiment in all the comparisons that contain data from multiple experiments. This has led us to modify the text in the Methods section under Linear Mixed Effects Models and in the Results section under Repeated kTMP Conditions (3.5 kHz and Sham) across Experiments. In addition, the results of the LME models have been updated throughout the Results section. We note that the pattern of results was unchanged with this modification of our analyses.

      "Pairwise comparisons of each active condition to sham showed that an increase was observed following both 2 kHz ..." I suppose this is all for Experiment 1? It is a little confusing to go back and forth between combining experiments and then separate analyses per experiment without some guiding text, aside from being a bit messy from the statistical point of view. 

      We did not go back to performing separate analyses of the experiments after pooling the data. Once we ran the test to justify pooling the data, subsequent tests were done with the pooled data to evaluate the effects of carrier frequency and amplitude modulation.

      Figure 5 is confusing because the horizontal lines with ** on top seem to refer to the same set of sham subjects, but the subjects of Experiments 2 and 3 are different from those of Experiment 1, so these pairwise comparisons mix between-subject and within-subject comparisons. Did I get that right?

      Yes, that is correct. As noted above, we pooled the data after showing that there was no effect of experiment. Thus, the data for the sham and 3.5 kHz non-modulated conditions come from three different experiments. There was some overlap of subjects between Experiments 1 and 2 (Experiment 3 was all new participants). We used a linear mixed effect model so that we could account for this mixed design. Participant was always included as a random factor, which allows us to account for the fact that some comparisons are within subjects and some are between subjects. Based on a previous comment, we now include Experiment as a fixed factor (see above), which provides a way to evaluate variance across the different experiments.

      "We next compared sham vs. active non-modulated kTMP and found that active kTMP produced a significant increase in corticospinal excitability [χ2(1) = 23.46 p < 0.001" Is this for the 3.5Hz condition? 

      No, that is for an omnibus comparison of non-modulated kTMP (including 2 kHz, 3.5 kHz and 5 kHz conditions) vs. sham. We have edited the paper to include the three conditions that are included as the active non-modulated kTMP conditions for clarity (pg. 22, line 463). Having observed a significant omnibus result, we continued with paired comparisons: “Pairwise comparisons of each active condition to sham showed that an increase was observed following both 2 kHz [χ2(1) = 6.90, p = 0.009; d = 0.49] and 3.5 kHz kTMP [χ2(1) = 37.75, p < 0.001; d = 0.70; Fig 5: Non-Modulated conditions]. The 5 kHz condition failed to reach significance [χ2(1) = 1.43, p = 0.232; d = 0.21].”

      Paired-Pulse Assays: There are a number of results here without pointing to a figure, and at one point there is a reference to Figure 6, which may be in error. It would help to point the reader to a visual corresponding to the stats.

      Thank you. This was an error on line 542. It should have read Figure 7. We have added two other pointers to Figure 7 where we discuss the absence of an effect of kTMP on SICI.

      Reviewer #2 (Recommendations For The Authors):

      I would recommend a couple of changes to the background.

      "Orthogonal subspaces" line 78. This is a fairly formal term that has little relevance here, although the difference between scalar and vector potential-based fields is interesting to think about. If it stays, it should be mathematically supported, but it's easily rewritten to deliver the gist of it. 

      We have updated the paper by adding text that we hope will clarify what we mean by orthogonal subspaces (pg. 4, line 78-81). We note that we developed the math behind this statement in a previous paper (Ref # 10: Sheltraw et al., 2021). We have changed the location of the citation so that it directly follows these sentences and will provide a pointer to readers interested in the physics and math concerning orthogonal subspaces. 

      The statement that the scalp e-field for TES is greater than the e-field for TMS for similar cortical fields needs a little more clarification, since historically the two have operated orders of magnitude apart, and it is easy to misread and trip over this statement (although it is factually true). Presenting a couple of numbers at cortical and scalp positions would help illustrate the point. What is initially easy to miss is that you are not considering applying TES at traditional TMS levels, but rather TMS at TES values.

      We appreciate the feedback and have updated this section to provide the reader with a better intuition of this point. We now specify that the scalp to cortical E-field ratio is approximately 18 times larger for tES compared to TMS and cite our previous paper which has much more detail about how this was calculated.

      Note that the figures show scalp sensation at around 1.0 V/m while the text states 0.5 V/m; cortical depths are an important thing for the reader to keep in mind.

      This comment, when considered in tandem with one of the comments of Reviewer 1 led us to revise Figure 1. We removed the dashed gray line which might be taken to suggest a strict cutoff in terms of tolerability (which we did not intend). We now use shading that fades away to make the point of continuity. We have extended this down to a cortical E-field of 0.5 V/m to correspond with the text.  

      This is a nicely done and carefully reported experiment and I look forward to seeing more. 

      Thank you for your kind note!

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Summary:

      In the present study, the authors identified the ternary complex formed by NCAN, TNC, and HA as an important factor facilitating the multipolar-to-bipolar transition in the intermediate zone (IZ) of the developing cortex. NCAN binds HA via its N-terminal Link modules, while TNC cross-links NCAN through the CDL domain at the C-terminus. The expression and correct localization of these three factors facilitate the multipolar-bipolar transition necessary for immature neurons to migrate radially. TNC and NCAN are also involved in neuronal morphology. The authors used a wide range of techniques to study the interaction between these three molecules in the developing cortex. In addition, single and double KO mice for NCAN and TNC were analyzed to decipher the role of these molecules in neuronal migration and morphology.

      Strengths:

      The study of the formation of the cerebral cortex is crucial to understanding the pathophysiology of many neurodevelopmental disorders associated with malformation of the cerebral cortex. In this study, the authors showed, for the first time, that the ternary complex formed by NCAN, TNC, and HA promotes neuronal migration. The results regarding the interaction between the three factors forming the ternary complex are convincing.

      We appreciate the reviewers' positive assessment of our research.

      Weaknesses:

      However, regarding the in vivo experiments, the authors should consider some points for the interpretation of the results:

      • The authors did not use the proper controls in their experiments. For embryonic analysis, such as cortical migration, neuronal morphology, and protein distribution (Fig. 6, 7, and 9), mutant mice should be compared with control littermates, since differences in the results could be due to differences in embryonic stages. For example, in Fig. 6 the dKO is more developed than the WT embryo.

      It was challenging to compare double knockout mice with control littermates. When crossing Ncan and Tnc double heterozygous mice, the probability of obtaining double knockout mice is 1/16. Given an average litter size of around 8, acquiring a substantial number of double knockout mice would necessitate an impractical number of breeding pairs. Consequently, we were constrained to use non-littermate control mice. To address potential differences in developmental stages, we analyzed 19-20 embryos obtained from five individuals in each group, demonstrating that the observed differences between the two groups are more substantial than the inherent variability within each group.
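      The breeding argument can be made concrete with Mendelian arithmetic. This is a minimal sketch assuming that Ncan and Tnc assort independently and that double knockout embryos are not under-represented at the stages analyzed. The 1/16 probability, the litter size of 8 and the target of about 20 embryos come from the response; the function names are illustrative.

```python
# Double-heterozygous intercross: Ncan+/- Tnc+/- x Ncan+/- Tnc+/-.
# Assumes independent assortment and no embryonic loss of double knockouts.
P_HOMOZYGOUS_ONE_LOCUS = 1 / 4
P_DKO = P_HOMOZYGOUS_ONE_LOCUS ** 2  # 1/16


def expected_dko(litter_size: int) -> float:
    """Expected number of double knockout embryos per litter."""
    return litter_size * P_DKO


def prob_at_least_one_dko(litter_size: int) -> float:
    """Probability that a litter contains at least one double knockout."""
    return 1 - (1 - P_DKO) ** litter_size


def litters_needed(target_dko: int, litter_size: int) -> float:
    """Expected number of litters required to collect target_dko embryos."""
    return target_dko / expected_dko(litter_size)


LITTER = 8  # average litter size stated in the response
print(f"P(DKO) = {P_DKO}")
print(f"expected DKO per litter = {expected_dko(LITTER)}")
print(f"P(at least one DKO in a litter) = {prob_at_least_one_dko(LITTER):.3f}")
print(f"litters for 20 DKO embryos = {litters_needed(20, LITTER):.0f}")
```

      Under these assumptions, a litter of 8 yields 0.5 double knockout embryos on average, only about 40% of litters contain any, and roughly 40 litters would be needed to collect 20 double knockout embryos.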

      • The authors claim that NCAN and TNC are involved in neuronal migration based on experiments using single KO embryos. This is a strong statement considering the mild results, with no significant difference in the case of TNC KO embryos, and, once again, the use of embryos from different litters.

      We agree with the reviewer's comment that a single deletion of TNC has a minimal impact on neuronal migration. We have revised the Results section to reflect the mild nature of the TNC KO phenotype more accurately.

      Page 8, line 225: "In NCAN KO mice, a significantly lower percentage of labeled cells resided in the upper layer (Bin2), and more cells remained in the lower layer (Bin5) than in WT mice (Figure 7a). In contrast, the impact of a single deletion of TNC on neuronal cell migration was minimal. Although TNC KO mice exhibited a tendency to have a higher proportion of labeled cells in the lower layer (Bin4) than in WT mice, this did not reach statistical significance (Figure 7a). The delay in neuronal migration observed in the single KO mice was milder when compared to that observed in DKO mice (Figure 6a-c), suggesting that simultaneous deletion of both NCAN and TNC is necessary for a more pronounced impairment in neuronal cell migration."

      • The measurement of immunofluorescence intensity is not the right method to compare the relative amount of protein between control and mutant embryos unless there is a right normalization.

      We agree that measuring immunofluorescence intensity alone is insufficient for comparing the relative amount of protein. In Figure 8, we have employed Western blotting to compare the protein levels, revealing an approximately 50% reduction in NCAN and TNC following hyaluronidase digestion. In Figures 7b and 7c, we demonstrated alterations in the localization patterns of TNC and NCAN in Ncan KO and Tnc KO mice; however, we did not mention their quantity.

      • Page 7, line 206. "No significant abnormalities were observed in the laminar structure in 4-week-old DKO mice". The authors should be more careful with this statement since they did not check the lamination of the adult cortex. I would recommend staining control and mutant mice with markers of different cortical populations, such as Cux1, Ctip2, and Tbr1, to assess this point.

      In response to the suggestion, we have conducted additional experiments to provide a more detailed examination of the laminar structure in the cerebral cortex. The results have been incorporated into the revised manuscript as follows:

      Page 7, line 209: "To investigate the laminar organization of the postnatal cerebral cortex, we analyzed the distribution of NeuN-positive postmitotic neurons in DKO mice at 2 weeks of age. No notable abnormalities were observed in the laminar structure of DKO mice (Figure 6-figure supplement 3a, b). Additionally, the laminar distribution of Ctip2-positive deep layer neurons showed no significant differences between WT and DKO mice (Figure 6-figure supplement 3a, c)."

      • The authors do not explain how they measured the intensity of TNC around the transfected Turbo-RFP-positive neurons.

      We added the following description to the Materials and Methods:

      Page 18, line 608: "Images were captured in the IZ region containing Turbo-RFP-positive neurons using a 100X magnification objective lens with 3.0X optical zoom on an AX R confocal microscope (Nikon). A total of 10 optical sections were acquired with a step size of 190 nm. Z-projection views were generated, and the staining intensity of TNC around Turbo-RFP-positive neurons was measured in a 59 × 59 µm area using ImageJ FIJI."
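      The quantification described above can be sketched in a few lines. The Methods text does not specify the projection type, the pixel size, or how the 59 × 59 µm area was positioned. This sketch therefore assumes a maximum-intensity projection (comparable to FIJI's Z Project with the Max Intensity option) and a square ROI centred on each Turbo-RFP-positive neuron. The pixel size passed to the hypothetical helper `roi_half_width_px` is likewise an assumed parameter.

```python
from statistics import mean


def max_projection(stack):
    """Maximum-intensity projection of a z-stack stored as stack[z][y][x]."""
    height, width = len(stack[0]), len(stack[0][0])
    return [[max(plane[y][x] for plane in stack) for x in range(width)]
            for y in range(height)]


def roi_half_width_px(roi_um, pixel_size_um):
    """Half-width in pixels of a square ROI with edge length roi_um (e.g. 59 um)."""
    return int(round(roi_um / pixel_size_um / 2))


def square_roi_mean(image, cy, cx, half_width_px):
    """Mean intensity of a square ROI centred on (cy, cx), clipped to the image."""
    y0, y1 = max(0, cy - half_width_px), min(len(image), cy + half_width_px + 1)
    x0, x1 = max(0, cx - half_width_px), min(len(image[0]), cx + half_width_px + 1)
    return mean(image[y][x] for y in range(y0, y1) for x in range(x0, x1))


if __name__ == "__main__":
    # Toy 2-plane, 3x3 stack standing in for the 10 optical sections.
    stack = [
        [[1, 2, 1], [2, 9, 2], [1, 2, 1]],
        [[3, 1, 0], [1, 5, 1], [0, 1, 3]],
    ]
    projected = max_projection(stack)
    print(projected)
    print(square_roi_mean(projected, cy=1, cx=1, half_width_px=1))
```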

      • The loading control of the western blots should be always included.

      In Figure 6-figure supplement 1, we have incorporated western blot data using a GAPDH antibody as a loading control. We have added an explanation in the figure legend of Figure 3c, stating that we analyzed the same samples as those used in Figure 1e.

      • For Fig. 3e, I think the values are represented relative to E18 rather than to P2.

      Thank you for pointing that out. As suggested, we have corrected the representation in Fig. 3e to be relative to E18 instead of P2.

      • I would recommend authors use the standard nomenclature for the embryonic stages. The detection of the vaginal plug is considered as E0.5 and therefore, half a day should be added to embryonic stages (E14.5...).

      We have revised our manuscript to designate the detection of the vaginal plug as E0.5, and subsequently, we have adjusted all embryonic stages by adding half a day, such as E14.5.

      • Fig 10K: I do not see the differences in the number of neurites in the graph.

      We have modified the presentation from a box-and-whisker plot to a bar graph to enhance the visibility of differences in the average number of neurites.

      • Line 37: Not all of the cerebral cortex is structured in 6 layers but the neocortex.

      We have changed 'cerebral cortex' to 'cerebral neocortex.'

      Reviewer 2

      Summary:

      ECM components are prominent constituents of the pericellular environment of CNS cells and form complex and dynamic interactomes in the pericellular spaces. Based on bioinformatic analysis, more than 300 genes have been attributed to the so-called matrisome, many of which are detectable in the CNS. Yet, not much is known about their functions, while increasing evidence suggests important contributions to developmental processes, neural plasticity, and inhibition of regeneration in the CNS. In this respect, the present work offers new insights and adds interesting aspects to the facets of ECM contributions to neural development. This is even more relevant in view of the fact that neurocan has recently been identified as a potential risk gene for neuropsychiatric diseases. Because ECM components occur in the interstitial space and are linked in interactomes, their study is very difficult. A strength of the manuscript is that the authors used several approaches to shed light on ECM function, including proteome studies, the generation of knockout mouse lines, and the analysis of in vivo labeled neural progenitors. This multi-perspective approach made it possible to reveal hitherto unknown properties of the ECM and highlighted its importance for the overall organization of the CNS.

      Strengths:

      Systematic analysis of the ternary complex between neurocan, TNC, and hyaluronic acid; establishment of KO mouse lines to study the function of the complex; use of in utero electroporation to investigate the impact on neuronal migration.

      We appreciate the reviewers' insightful comments.

      Weaknesses:

      The analysis is focused on neuronal progenitors; however, the potential impact of the molecules of interest, in particular of their removal, on the differentiation and/or survival of neural stem/progenitor cells is not addressed. The potential receptors involved are not considered. It also seems that it is rather the passage to the outer areas of the forming cortex that is compromised, which is not the same as the migration process. The movement of the cells itself is not included in the analysis.

      In this study, we demonstrated that the ternary complex of NCAN, TNC, and HA is predominantly localized in the subplate/intermediate zone. This region lacks neural stem/progenitor cells but serves as the initiation site for the radial migration of postmitotic neurons. Consequently, our study focused on the role of the ternary complex in neuronal migration and polarity formation. We acknowledge that we did not investigate in-depth the potential effects of ECM perturbation on the differentiation and survival of neural stem/progenitor cells. However, as highlighted by the reviewer, it is important to explore the effects on neural stem/progenitor cells. To address this concern, we analyzed Pax6-positive radial glial cells and Tbr2-positive intermediate progenitor cells in the ventricular zone of wild-type and Ncan/Tnc double knockout (DKO) mice. Immunohistochemical analysis revealed no significant differences between WT and DKO mice (Figure 6-figure supplement 4a). Furthermore, the morphology of nestin-positive radial fibers exhibited no distinguishable variations between WT and DKO mice (Figure 6-figure supplement 4b, c).

      (1) In the description of the culture of cortical neurons the authors mentioned the use of 5% horse serum as a medium constituent. HS is a potent stimulus for astrocyte differentiation and astrocytes in vitro release neurocan. Therefore, the detection of neurocan in the supernatant of the cultures as shown in Figure 1h might as well reflect release by cultivated astrocytes.

      As pointed out by the reviewer, Figure 1h did not conclusively demonstrate that neurons are the sole source of NCAN production. Indeed, in situ hybridization analysis revealed the widespread distribution of Ncan mRNA throughout the cerebral cortex (Figure 2a). This result suggests that the production of NCAN involves not only neurons but also other cell populations, including radial glial cells and astrocytes. While we acknowledge the potential contribution of other cell types to NCAN production, Ncan expression by neurons during radial migration is a crucial aspect of our findings (Figure 1i, j). We have revised the manuscript as follows:

      Page 5, line 111: "This result suggested the secretion of NCAN by developing neurons; however, we cannot rule out the involvement of coexisting glial cells in the culture system. To investigate the expression of Ncan mRNA during radial migration in vivo, we labeled radial glial cells in the VZ with GFP through in utero electroporation at E14.5 (Figure 1i, Figure 1-figure supplement 1)."

      (2) It is known that neurocan in vivo is expressed by neurons, but may be upregulated in astrocytes after lesion, or in vitro, where the cells become reactive.

      We have incorporated the following description into the discussion:

      Page 11, line 359: "Previous studies have reported an upregulation of NCAN and TNC in reactive astrocytes, indicating the potential formation of the ternary complex of NCAN, TNC, and HA in the adult brain in response to injury (Deller et al., 1997; Haas et al., 1999)."

      (3) Do NCAN KO neurons show an increase in neurite growth on the TNC substrates? The response on POL was changed (Fig. 10h-k), but the ECM substrates were not tested with the KO neurons.

      The impact of ECM substrates on NCAN KO neurons has not been investigated, and this remains an avenue for further exploration in our ongoing research. Future studies aim to elucidate the NCAN-TNC connection by identifying TNC cell surface receptors and unraveling the subsequent intracellular signaling pathways.

      (4) Do the authors have an explanation for why the ternary complex is concentrated in the SP/IZ zone?

      In the mature brain, hyaluronan acts as a scaffold that facilitates the accumulation of ECM components, including proteoglycans and tenascins around neurons. Therefore, it is conceivable that the ECM components bind to hyaluronan in the embryonic brain, resulting in its accumulation in the subplate/intermediate zone. In support of this hypothesis, enzymatic digestion of hyaluronan in the subplate/intermediate zone led to the disappearance of TNC and NCAN accumulation (Figure 8a-c). This result may account for the disparity observed, where Tnc mRNA is expressed in the ventricular zone while the TNC protein localizes to the subplate/intermediate zone.

      (5) Are hyaluronic acid synthesizing complexes (HAS) concentrated in the SP/IZ?

      In response to the reviewer's comment, we investigated the localization of Has2 and Has3 mRNA using in situ hybridization. However, due to the relatively low expression levels of these enzymes, we could not obtain clear signals (Author response image 1). Further research is needed to understand the mechanisms behind the localization of hyaluronan in the intermediate zone.

      Author response image 1.

      In situ hybridization analysis of Has2 and Has3 mRNA in the E16.5 cerebral cortex. Upper images show results of in situ hybridization using antisense probes against Has2 and Has3. Lower images show in situ hybridization using sense probes as negative controls.

      (6) CSPGs as well as TNC are part of the neural stem/progenitors cell niche environment. Does the removal of either of the ECM compounds affect the proliferation, differentiation, and/or survival of NSPCs, or their progeny?

      (7) This question relates to the fact that the migration process itself is not visualized in the present study, but rather its outcome: the quantitative distribution of labeled neurons in the different bins of the analysis. This could also derive from modified cell numbers.

      As pointed out by the reviewer, previous studies have shown the role of CSPGs and TNC as components of the neural stem/progenitor cell niche (see reviews by Faissner et al., 2017; Faissner and Reinhard, 2015). However, as mentioned in Response #2, based on our analyses, we did not observe a reduction in neural stem/progenitor cells in NCAN/TNC double-knockout mice. While we cannot precisely explain this discrepancy, it is worth noting that many past studies evaluated the activities of the ECM molecules in in vitro systems such as neurospheres. The observed differences may stem from variations in experimental systems.

      (8) What is the role of the ECM in the SP/IZ area? Do the cells need the ECM to advance, the reduction would then leave the neuronal progenitors in the VZ area? This somehow contrasts with interpretations that the ECM acts as an obstacle for neurite growth or cell migration, or as a kind of barrier.

      The role of the ECM is multifaceted, with certain ECM molecules known to inhibit neurite outgrowth while others facilitate it. Additionally, the effects of ECM can vary depending on the cell type. It is established that after migrating neurons adhere to radial fibers, they utilize these fibers as a scaffold to migrate toward the cortical surface. However, in the subplate/intermediate zone, migrating neurons have not yet adhered to radial fibers. This study provides evidence that multipolar neurons undergo morphological changes into bipolar cells with the assistance of the NCAN, TNC, and HA complex. Subsequently, this facilitates their movement along radial fibers.

      (9) A direct visualization of the movement of neural progenitors in the tissue as has been for example performed by the Kriegstein laboratory might help resolve some of these issues.

      As suggested by the reviewer, utilizing live imaging techniques to directly observe the movement of neural progenitors within the tissue is indeed a powerful tool. We recognize the significance of addressing these points in future research.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zhang et al. investigated the relationship between monocular and binocular responses of V1 superficial-layer neurons using two-photon calcium imaging. They found a strong relationship in their data: neurons that exhibited a greater preference for one eye or the other (high ocular dominance) were more likely to be suppressed under binocular stimulation, whereas neurons that are more equivalently driven by each eye (low ocular dominance) were more likely to be enhanced by binocular stimulation. This result chiefly demonstrates the relationship between ocular dominance and binocular responses in V1, corroborating what has been shown previously using electrophysiological techniques but now with greater spatial resolution (albeit with lower temporal resolution). The binocular responses were well fitted by a model that institutes divisive normalization between the eyes and accounts for both the suppression and enhancement phenomena observed in the subpopulation of binocular neurons. In so doing, the authors reify the importance of incorporating ocular dominance in computational models of binocular combination.

      The conclusions of this paper are mostly well supported by the data, but there are some limitations of the methodology that need to be clarified, and an expansion of how the results relate to previous work would better contextualize these important findings in the literature.

      Strengths:

      The two-photon imaging technique used to resolve the activity of individual neurons within intact brain tissue grants a host of advantages. Foremost, two-photon imaging confers considerably high spatial resolution. As a result, the authors were able to sample and analyze the activity from thousands of verified superficial-layer V1 neurons. The animal model used, awake macaques, is also highly relevant for the study of binocular combination. Macaques, like humans, are binocular animals, meaning they have forward-facing eyes that confer overlapping visual fields. Importantly, macaque V1 is organized into cortical columns that process specific visual features from the separate eyes just like in humans. In combination with a powerful imaging technique, this allowed the authors to evaluate the monocular and binocular response profiles of V1 neurons that are situated within neighboring ocular dominance columns, a novel feat. To this aim, the approach was well-executed and should instill further confidence in the notion that V1 neurons combine monocular information in a manner that is dependent on the strength of their ocular dominance.

      Weaknesses:

      While two-photon imaging provides excellent spatial resolution, its temporal resolution is often lower than that of some other techniques, such as electrophysiology. This limits the ability to study the fast dynamics of neuronal activity, a well-understood trade-off of the method. The issue is rather that the authors draw comparisons to electrophysiological studies without explicitly acknowledging the temporal difference between these techniques. In a similar vein, two-photon imaging is limited spatially in terms of cortical depth, preferentially sampling from neurons in layers 2/3. This limitation does not invalidate any of the interpretations but should be considered by readers, especially when making comparisons to previous electrophysiological reports using linear microelectrode arrays that sample from all cortical layers. Indeed, it is likely that a complete picture of early cortical binocular processing will require high spatial resolution (i.e., sampling from neurons in neighboring ocular dominance columns, from pia mater to white matter) at biophysically relevant timescales (1 ms resolution, capturing response dynamics over the full duration of the stimulus presentation, including the transient onset and steady-state periods).

      To address the same concern raised by all three reviewers, we discussed the technical limitations of two-photon calcium imaging at the end of the Discussion, including limited imaging depth, low temporal resolution, and nonlinearity. The relevant text is copied here:

      (Ln 304) “Limitations of the current study

      Although it can sample a large number of neurons at cellular resolution and with low sampling bias, two-photon calcium imaging has known limitations that make it best regarded as a complementary research tool to electrophysiological recordings.

      For example, two-photon imaging can only sample neurons from superficial-layers, while binocular neurons also exist in deeper layers, and even neurons in the input layer are affected by feedback from downstream binocular neurons to exhibit binocular response properties (Dougherty, Cox, Westerberg, & Maier, 2019). Furthermore, calcium signals are relatively slow and cannot reveal the fast dynamics of neuronal responses. Due to these spatial and temporal limitations, a more complete picture of the neuronal mechanisms underlying binocular combination of monocular responses may come from studies using both technologies.

      In addition, calcium signals may exaggerate the nonlinear properties of neurons. Although calcium signals indicated by GCaMP5, our favored choice of calcium indicator, display a linear relationship to neuronal spike rates within a range of 10-150 Hz (Li, Liu, Jiang, Lee, & Tang, 2017), weak and strong signals outside this range are more nonlinear, and may appear weaker and stronger, respectively, than electrode-recorded effects. Consequently, the differences in population responses between monocular and binocular stimulations revealed by this study might be less pronounced.”

      (Recommendations For The Authors):

      Overall, my main suggestion for the authors to improve the paper is to revise some of the interpretations of their results in relation to previous research. The purpose of the present study was to illustrate a more complete picture of the binocular combination of monocular responses by taking into consideration the ocular dominance of V1 cells (lines 34-36). A study published earlier this year had an identical purpose (Mitchell et al., Current Biology, 2023) and arrived at a highly similar conclusion (and also applied divisive normalization to fit their data). I would ask that this paper be mentioned in the introduction and discussed.

      The Mitchell et al 2023 paper is added to the Introduction and Discussion:

      (Ln 50) “In addition (to the Dougherty et al 2019 paper from the same group), Mitchell, Carlson, Westerberg, Cox, and Maier (2023) reported that binocular combination of monocular stimuli with different contrasts is also affected by neurons’ eye preference.”

      (Ln 286) “The critical roles of ocular dominance have been largely overlooked by extant binocular vision models to our knowledge, except that Anderson and Movshon (1989) demonstrated that a model consisting of multiple ocular dominance channels can better explain their psychophysical adaptation data, and that Mitchell et al. (2023) revealed that binocular combination of different contrasts presented to the two eyes is affected by neurons’ ocularity preference.”

      Nevertheless, the results of the present study are very valuable. They add substantial spatial resolution and sophisticated relational analysis of monocular and binocular responses that Mitchell et al., 2023 did not include. Therefore, my suggestion is to emphasize the advantages of two-photon imaging in the introduction, focusing on the ability to image neurons in neighboring ocular dominance columns. The rigorous modeling of the relationship between nearby neurons with a range of eye preferences, in tandem with the incredible yield of two-photon imaging, is what sets this paper apart from previous electrophysiological work.

      The finding that binocular responses were dependent on ocular dominance is largely consistent with previous electrophysiological results. However, there should be a paragraph in the discussion section that speaks to the limitations of comparing two-photon imaging data to electrophysiological data. Namely, there are two limitations:

      (1) These two techniques confer different temporal resolutions. It is conceivable that some of the electrophysiological relationships (for example, those described by Dougherty et al., 2019) may depend on the temporal window over which the data were averaged, typically 50-100 ms around stimulus onset, or 100-250 ms comprising the neurons' sustained response to the stimulus. This possible explanation of the difference in obtained results would be especially useful for the discussion paragraph starting at line 232. It would also help readers to mention the advantage of high temporal resolution (i.e., the benefits of electrophysiology), since (a) recent work has distinguished between sequential stages of binocular combination (Cox et al., 2019) and (b) modern models of V1 neurons emphasize recurrent feedback to explain V1 temporal dynamics (see Heeger et al., 2019; Rubin et al., 2015), which could prove relevant for the combination of stimuli in the two eyes (Fleet et al., 1997).

      Our discussion regarding the technical limitations of 2-p calcium imaging has been listed earlier. Specific to the Dougherty et al. (2019) paper, we added the following discussion to address the difference in temporal resolution between the two technologies.

      (Ln 266) “In addition, it is unclear whether the discrepancies are caused by different temporal resolutions of electrode recording and calcium imaging. The results of Dougherty et al. (2019) represent changes of neuronal spike activities over a period of approximately 50-200 ms after the stimulus onset, which may reflect the sustained neuronal responses to the stimulus and possible feedback signals. Calcium signals are much slower and indicative of the aggregated neuronal responses over a longer period (up to 1000 ms in the current study). They should have smeared, rather than exaggerated, the differences between monocular and binocular responses, although we cannot exclude the possibility that some neuronal response changes beyond 200 ms are responsible for the discrepancies.”
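      The argument that slower calcium signals should smear, rather than exaggerate, a short-latency difference can be illustrated with window averaging alone. The rates below are hypothetical. The sketch deliberately ignores indicator kinetics, which would low-pass filter the signal further; it only shows how a difference confined to 50-200 ms is diluted when responses are averaged over 0-1000 ms.

```python
def step_trace(duration_ms, epochs):
    """1-ms-binned rate trace built from (start_ms, end_ms, rate_hz) epochs; 0 elsewhere."""
    trace = [0.0] * duration_ms
    for start, end, rate in epochs:
        for t in range(start, end):
            trace[t] = rate
    return trace


def window_mean(trace, start_ms, end_ms):
    """Mean rate over the half-open window [start_ms, end_ms)."""
    segment = trace[start_ms:end_ms]
    return sum(segment) / len(segment)


# Hypothetical traces: the same sustained response in both conditions, with the
# binocular condition differing only during an early 50-200 ms epoch.
mono = step_trace(1000, [(50, 1000, 20.0)])
bino = step_trace(1000, [(50, 200, 30.0), (200, 1000, 20.0)])

for label, (start, end) in {"50-200 ms": (50, 200), "0-1000 ms": (0, 1000)}.items():
    ratio = window_mean(bino, start, end) / window_mean(mono, start, end)
    print(f"{label}: binocular/monocular = {ratio:.3f}")
```

      Here the binocular/monocular ratio drops from 1.5 in the 50-200 ms window to about 1.08 over the full second.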

      (2) The sample of V1 neurons in this study is limited to cells in the most superficial layers of the cortex (layers 2/3). This limitation is, of course, well understood, but it should be mentioned at least in the context of studying the formative mechanisms of binocular combination in V1 (since we know that binocular neurons also exist in layers 5/6, and there is now substantial evidence that even layer 4 neurons are not as "monocular" as we previously thought (Dougherty et al., 2019)).

      See our discussion regarding the technical limitations of 2-p calcium imaging listed earlier.

      In short, I believe the paper would be improved by (1) adding the above citations in the appropriate places, (2) acknowledging in the introduction that this question has been investigated electrophysiologically but emphasizing the advantages of two-photon imaging, and (3) adding a paragraph to the discussion section that discusses the temporal and spatial limitations when using two-photon imaging to study binocular combination, particularly when comparing the results to electrophysiology.

      Reviewer #2 (Public Review):

      Summary:

      This study examines the pattern of responses produced by the combination of left-eye and right-eye signals in V1. For this, they used calcium imaging of neurons in V1 of awake, fixating monkeys. They take advantage of calcium imaging, which yields large populations of neurons in each field of view. With their data set, they observe how response magnitude relates to ocular dominance across the entire population. They analyze carefully how the relationship changed as the visual stimulus switched from contra-eye only, ipsi-eye only, and binocular. As expected, the contra-eye-dominated neurons responded strongly with a contra-eye-only stimulus. The ipsi-eye-dominated neurons responded strongly with an ipsi-eye-only stimulus. The surprise was responses to a binocular stimulus. The responses were similarly weak across the entire population, regardless of each neuron's ocular dominance. They conclude that this pattern of responses could be explained by interocular divisive normalization, followed by binocular summation.
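      The qualitative pattern described here can be reproduced by a toy model: suppression for strongly monocular neurons and enhancement for balanced ones. This is not the fitted model from the study. It assumes each eye's weighted drive is divided by a pool containing the contrast in both eyes, with a semi-saturation constant sigma = 1 and unit exponents, and the ocular weights are hypothetical.

```python
def normalized_response(w_contra, w_ipsi, c_contra, c_ipsi, sigma=1.0):
    """Toy interocular normalization followed by binocular summation.

    Each eye's drive (ocular weight x contrast) is divided by a normalization
    pool containing the contrast in BOTH eyes; the two normalized signals are
    then summed. The pool form, sigma and unit exponent are assumptions.
    """
    pool = sigma + c_contra + c_ipsi
    contra_signal = w_contra * c_contra / pool
    ipsi_signal = w_ipsi * c_ipsi / pool
    return contra_signal + ipsi_signal


def condition_responses(w_contra, w_ipsi, contrast=1.0, sigma=1.0):
    """Responses to contra-only, ipsi-only and binocular stimulation."""
    contra = normalized_response(w_contra, w_ipsi, contrast, 0.0, sigma)
    ipsi = normalized_response(w_contra, w_ipsi, 0.0, contrast, sigma)
    both = normalized_response(w_contra, w_ipsi, contrast, contrast, sigma)
    return contra, ipsi, both


for w in (1.0, 0.8, 0.6, 0.5):
    contra, ipsi, both = condition_responses(w, 1.0 - w)
    odi = (contra - ipsi) / (contra + ipsi)  # ocular dominance index
    best_mono = max(contra, ipsi)
    effect = "enhanced" if both > best_mono else "suppressed"
    print(f"ODI {odi:+.2f}: best monocular {best_mono:.3f}, "
          f"binocular {both:.3f} ({effect})")
```

      Because the weights in this toy sum to one, every neuron gives the same binocular response (1/3 at unit contrast), while its best monocular response grows with ocular dominance. This loosely echoes the uniformly weak binocular responses described above, but here it follows directly from the assumed normalization pool.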

      Strengths:

      A major strength of this work is that the model-fitting was done on a large population of simultaneously recorded neurons. This approach is an advancement over previous work, which did model-fitting on individual neurons. The fitted model in the manuscript represents the pattern observed across the large population in V1, and washes out any particular property of individual neurons. Given the large neuronal population from which the conclusion was drawn, the authors provide solid evidence supporting their conclusion. They also observed consistency across 5 fields of view.

      The experiments were designed and executed appropriately to test their hypothesis. Their data support their conclusion.

      Weaknesses:

      One weakness of their study is that calcium signals can exaggerate the nonlinear properties of neurons. Calcium imaging renders poor responses poorer and strong responses stronger, compared to single-unit recording. In particular, the dramatic change in the population response between monocular stimulation and binocular stimulation could actually be less pronounced when measured with single-unit recording methods. This means their choice of recording method could have accidentally exaggerated the evidence of their finding.

      We discussed the nonlinearity of calcium signals as part of the technical limitations of 2-p calcium imaging. The calcium indicator we use, GCaMP5, has a reasonably wide range over which its signal is linearly related to spike rate. Outside this range, however, nonlinearity is indeed a concern.

      (Ln 314) “In addition, calcium signals may exaggerate the nonlinear properties of neurons. Although signals indicated by GCaMP5, our favored choice of calcium indicator, display a linear relationship to neuronal spike rate within a range of 10-150 Hz (Li et al., 2017), weak and strong signals outside this range are more nonlinear and may appear weaker and stronger, respectively, than electrode-recorded effects. Consequently, the changes in population responses between monocular and binocular stimulations revealed by this study might be less pronounced.”

      The implication of their finding is that strong ocular dominance is the result of release from interocular suppression by a monocular stimulus, rather than the lack of binocular combination as many traditional studies have assumed. This could significantly advance our understanding of the binocular combination circuitry of V1. The entire population of neurons could be part of a binocular combination circuitry present in V1.

      This is a very good insight. We added the following sentences to the end of the first paragraph of Discussion:

      (Ln 242) “These findings imply that, at least for neurons in superficial layers of V1, significant ocular dominance may result from a release of interocular suppression during monocular stimulation, an unusual viewing condition as our vision is typically binocular, rather than from a lack of binocular combination of inputs from upstream monocular neurons.”

      (Recommendations For The Authors):

      Line 150: "To model interocular response suppression, responses from each eye in Eq. 2 were further normalized by an interocular suppression factor wib or wcb," I recommend the authors improve their explanation of how they arrived at Eq. 3 from Eq. 2. As it stands, my impression is that they have one model for the responses to monocular stimulation, and another model for the responses to binocular stimulation. What I think is missing is that both equations are derived from the same model. Monocular stimulation is a situation in which the stimulus in one eye's contrast is zero. Could the authors clarify whether this situation produces an interocular suppression of zero, and how that leads to Eq. 2?

      We rewrote the modeling part to show that Equations 1-3 are sequential steps of development for the same model. We also added a brief paragraph to discuss how Eq. 3 could lead to Eq. 2 under monocular viewing:

      (Ln 166) “Although not shown in Eq. 3, we also assumed that the nonlinear exponent b depends on the contrast of the stimulus presented to the other eye (i.e., Sc or Si). Consequently, when Sc or Si = 0 under monocular stimulation, Rc or Ri = 0 (Eq. 1), and interocular suppression wib or wcb = 1, so Eq. 3 changes back to Eq. 2. It is only when Sc and Si are equal and close to 1, as in the current study, that interocular suppression and binocular combination take the form of the current Eq. 3.”

      Line 225: "However, individually, compared to monocular responses, responses of monocular neurons more preferring the stimulated eye are actually suppressed, and only responses of binocular neurons are increased by binocular stimulation." This sentence is difficult to follow. I recommend the authors improve clarity by breaking up the sentence into several sentences. If I understand correctly, they summarize the pattern in the data that is indicative of interocular divisive normalization, i.e., their final conclusion.

      This sentence no longer exists in the Discussion.

      Line 426: "Third, for those showing significant orientation difference, the trial-based orientation responses of each neuron were fitted with a Gaussian model with a MATLAB nonlinear least squares function:" The choice of using a Gaussian function to fit orientation tuning was probably suboptimal. A Gaussian function provides an adequate fit only for neurons whose tuning is very sharp. The responses outside of the peak fall down to the baseline and the two ends meet. Otherwise, the two ends do not meet. An adequate fit would be achieved with a function of a circular variable, which wraps around 180 deg. I recommend using a Von Mises function for fitting orientation tuning.

      We agree with the reviewer that the Von Mises function is more accurate than the Gaussian for fitting orientation tuning functions. Indeed, we use it to fit the orientation tuning of V4 neurons, many of which have two peaks. For the current V1 data, the differences between Von Mises and Gaussian fits are very small, as shown in the orientation functional maps from three macaques below. Because we also use the same Gaussian fitting of orientation tuning in several published papers and in papers currently under review, we prefer to keep the Gaussian fitting results in the manuscript.

      Author response image 1.
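      To make the reviewer's point concrete, here is a minimal Python sketch comparing the two tuning models. The parameterizations (baseline `base`, amplitude `amp`, width `sigma_deg` for the Gaussian, concentration `kappa` for the von Mises) and the small-angle rule used to match their widths are illustrative assumptions, not the fitting equations used in the manuscript. Because orientation repeats every 180°, the von Mises form uses the doubled angle, so its two ends meet by construction; the Gaussian agrees closely with it only when tuning is sharp.

```python
import numpy as np


def gaussian_tuning(theta_deg, pref_deg, sigma_deg, amp, base):
    """Gaussian orientation tuning (illustrative parameterization)."""
    # Wrap the orientation difference into [-90, 90) deg.
    d = (np.asarray(theta_deg) - pref_deg + 90.0) % 180.0 - 90.0
    return base + amp * np.exp(-d**2 / (2.0 * sigma_deg**2))


def von_mises_tuning(theta_deg, pref_deg, kappa, amp, base):
    """Von Mises orientation tuning; the doubled angle gives a 180-deg period."""
    d = np.deg2rad(np.asarray(theta_deg) - pref_deg)
    return base + amp * np.exp(kappa * (np.cos(2.0 * d) - 1.0))


def matched_kappa(sigma_deg):
    """Assumed small-angle match: cos(2d) - 1 ~ -2 d**2, so kappa = 1 / (4 sigma**2), sigma in radians."""
    s = np.deg2rad(sigma_deg)
    return 1.0 / (4.0 * s**2)


def max_difference(sigma_deg, pref_deg=90.0):
    """Largest absolute gap between the two width-matched curves over 0-179 deg."""
    theta = np.arange(0.0, 180.0, 1.0)
    g = gaussian_tuning(theta, pref_deg, sigma_deg, amp=1.0, base=0.1)
    v = von_mises_tuning(theta, pref_deg, matched_kappa(sigma_deg), amp=1.0, base=0.1)
    return float(np.max(np.abs(g - v)))


if __name__ == "__main__":
    for sigma in (15.0, 45.0):
        print(f"sigma = {sigma:>4} deg: max |Gaussian - von Mises| = {max_difference(sigma):.3f}")
```

      In this sketch, the gap between the two curves stays small for a sharply tuned neuron (sigma = 15°) and grows substantially for a broadly tuned one (sigma = 45°), where the Gaussian's tails fall toward baseline while the von Mises curve wraps around smoothly.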

      Reviewer #3 (Public Review):

      The authors have made simultaneous recordings of the responses of large numbers of neurons from the primary visual cortex using optical two-photon imaging of calcium signals from the superficial layers of the cortex. Recordings were made to compare the responses of the cortical neurons under normal binocular viewing of a flat screen with both eyes open and monocular viewing of the same screen with one eye's view blocked by a translucent filter. The screen displayed visual stimuli comprising small contrast patches of Gabor function distributions of luminance, a stimulus that is known to excite cortical neurons.

      This is an important data set, given the large numbers of neurons recorded. The authors present a simple model to explain the binocular combination of neuronal signals from the right and left eyes.

      The limitations of the paper as written are as follows. These points can be addressed with some additional analysis and rewriting of sections of the paper. No new experimental data need to be collected.

      (1) The authors should acknowledge the fact that these recordings arise from neurons in the superficial layers of the cortex. This limitation arises from the usual constraints on optical imaging in the macaque cortex. This means that the sample of neurons forming this data set is not fully representative of the population of binocular neurons within the visual cortex. This limitation is important in comparing the outcome of these experiments with the results from other studies of binocular combination, which have used single-electrode recording. Electrode recording will result in a sample of neurons that is drawn from many layers of the cortex, rather than just the superficial layers.

      See our discussion regarding the technical limitations of 2-p calcium imaging listed earlier.

      (2) Single-neuron recording of binocular neurons in the primary visual cortex has shown that these neurons often have some spontaneous activity. Assessment of this spontaneous level of firing is important for accurate model fitting [1]. The paper here should discuss the level of spontaneous neuronal firing and its potential significance.

      We have noticed previously that at non-optimal spatial frequencies, calcium responses to a moving Gabor grating are close to zero (Guan et al., Prog Neurobiology, 2021, Fig. 1B), but we cannot tell whether this is due to calcium response nonlinearity or to a close-to-zero level of spontaneous neuronal activity. Prince et al. (2002) reported low spontaneous responses of V1 neurons with moving grating stimuli (e.g., about 3 spikes/sec in one exemplar neuron, their Fig. 1B), so this effect appears to be small. In our data fitting, the Gaussian model does include an orientation-unspecific component, which represents the neuronal response at a non-preferred orientation but not necessarily the spontaneous activity.

      (3) The arrangements for visual stimulation and comparison of binocular and monocular responses mean that the stereoscopic disparity of the binocular stimuli is always at zero or close to zero. The animal's fixation point is in the centre of a single display that is viewed binocularly. The fixation point is, by definition, at zero disparity. The other points on the flat display are also at zero disparity or very close to zero because they lie in the same depth plane. There will be some small deviations from exactly zero because the geometry of the viewing arrangements results in the extremities of the display being at a slightly different distance than the centre. Therefore, the visual stimulation used to test the binocular condition is always at zero disparity, with a slight deviation from zero at the edges of the display, and never changes. [There is a detail that can be ignored. The experimenters tested neurons with visual stimulation at different real distances from the eyes, but this is not relevant here. Provided the animals accurately converged their eyes on the provided binocular fixation point, then the disparity of the visual stimuli will always be at or close to zero, regardless of viewing distance in these circumstances.] However, we already know from earlier work that neurons in the visual cortex exhibit a range of selectivity for binocular disparity. Some neurons have their peak response at non-zero disparities, representing binocular depths nearer than the fixation depth or beyond it. The response of other neurons is maximally suppressed by disparities at the depth of the fixation point (so-called Tuned Inhibitory [TI] neurons). 
The simple model and analysis presented in the paper for the summation of monocular responses to predict binocular responses will perform adequately for neurons that are tuned to zero disparity, so-called tuned excitatory neurons [TE], but is necessarily compromised when applied to neurons that have other, different tuning profiles. Specifically, when neurons are stimulated binocularly with a non-preferred disparity, the binocular response may be lower than the monocular response[2, 3]. This more realistic view of binocular responses needs to be considered by the authors and integrated into their modelling.

      We agree and include the following texts when discussing the future work:

      (Ln 298) “In addition, in our experiments, binocular stimuli were presented with zero disparity, which best triggered the responses of neurons with zero-disparity tuning. A more realistic model of binocular combination also requires the consideration of neurons with other disparity-tuning profiles.”

      (4) The data in the paper show some features that have been reported before but are not captured by the model. Notably for neurons with extreme values of ocular dominance, the binocular response is typically less than the larger of the two monocular responses. This is apparent in the row of plots in Figure 2D from individual animals and in the pooled data in Figure 2E. Responses of this type are characteristic of tuned inhibitory [TI] neurons[2]. It is not immediately clear why this feature of the data does not appear in the summary and analysis in Figure 3.

      This difference is indeed captured by the model, which can be more easily appreciated in Fig. 4A where monocular and binocular model simulations are plotted in the same panel. In the text, we also wrote: (Ln 195) “It is apparent that binocular responses cannot be explained by the sum of monocular responses, as binocular responses are substantially lower than the summed monocular responses for both monocular and binocular neurons. Nor can binocular responses be explained by the responses to the preferred eye, as binocular responses are also lower than those to the preferred eye (the larger of the two monocular responses) for monocular neurons.”

      The paper text states that the responses were "first normalized by the median of the binocular responses". This will certainly get rid of this characteristic of the data, but this step needs better justification, or an amendment to the main analysis is needed.

      The relevant sentence has been rewritten as “Monocular and binocular data of each FOV/depth, as well as the pooled data, were first normalized by the respective median of the binocular responses of all neurons in the same FOV/depth.” This normalization brings the overall binocular responses to around unity, to facilitate comparisons among all FOVs/depths, but it does not affect the overall characteristics of the data.
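      The per-FOV normalization described above can be sketched as follows; the function name, the array names, and the example values are hypothetical. Because all responses in one FOV/depth are divided by the same scalar, the ratio between a neuron's binocular response and its preferred-eye response is unchanged, which is why the step rescales the data without altering their overall characteristics.

```python
import numpy as np


def normalize_fov(contra, ipsi, binoc):
    """Divide all responses of one FOV/depth by that FOV's median binocular response."""
    contra, ipsi, binoc = (np.asarray(x, dtype=float) for x in (contra, ipsi, binoc))
    scale = np.median(binoc)
    if scale <= 0:
        raise ValueError("median binocular response must be positive")
    return contra / scale, ipsi / scale, binoc / scale


# Hypothetical responses of four neurons in a single FOV (arbitrary units).
contra = np.array([0.8, 0.5, 0.2, 0.1])
ipsi = np.array([0.1, 0.3, 0.4, 0.9])
binoc = np.array([0.3, 0.4, 0.2, 0.5])

c_n, i_n, b_n = normalize_fov(contra, ipsi, binoc)
print(np.median(b_n))  # median binocular response should now be (approximately) 1
# Binocular / preferred-eye ratios are the same before and after normalization.
print(np.allclose(b_n / np.maximum(c_n, i_n), binoc / np.maximum(contra, ipsi)))
```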

      In the present form, the model and analysis do not appear to fit the data in Figure 2 as accurately as needed.

      Thank you for pointing out this problem; the fits for FOV C_270 and for the pooled data were especially inaccurate. The issue has been largely fixed by weighting each datum by its standard deviation (please see the updated Fig. 3).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The manuscript by Rios et al. investigates the potential of GSK3 inhibition to reprogram human macrophages, exploring its therapeutic implications in conditions like severe COVID-19. The authors present convincing evidence that GSK3 inhibition shifts macrophage phenotypes from pro-inflammatory to anti-inflammatory states, thus highlighting the GSK3-MAFB axis as a potential therapeutic target. Using both GM-CSF- and M-CSF-dependent monocyte-derived macrophages as model systems, the study provides extensive transcriptional, phenotypic, and functional characterizations of these reprogrammed cells. The authors further extend their findings to human alveolar macrophages derived from patient samples, demonstrating the clinical relevance of GSK3 inhibition in macrophage biology.

      The experimental design is sound, leveraging techniques such as RNA-seq, flow cytometry, and bioenergetic profiling to generate a comprehensive dataset. The study's integration of multiple model systems and human samples strengthens its impact and relevance. The findings not only offer insights into macrophage plasticity but also propose novel therapeutic strategies for macrophage reprogramming in inflammatory diseases.

      Strengths:

      (1) Robust Experimental Design: The use of both in vitro and ex vivo models adds depth to the findings, making the conclusions applicable to both experimental and clinical settings.

      (2) Thorough Data Analysis: The extensive use of RNA-seq and gene set enrichment analysis (GSEA) provides a clear transcriptional signature of the reprogrammed macrophages.

      (3) Relevance to Severe COVID-19: The study's focus on macrophage reprogramming in the context of severe COVID-19 adds clinical significance, especially given the relevance of macrophage-driven inflammation in this disease.

      Weaknesses:

      There are no significant weaknesses in the study, though some minor points could be addressed for clarity and completeness, as outlined in the recommendations below.

      Many thanks for these comments. Please find below our responses to the specific recommendations.

      Recommendations for the authors:

      (1) In lines 263-266, the term "MoMac-VERSE" and its associated clusters are introduced without sufficient explanation. The authors should provide additional clarification on what these clusters represent and how they were derived.

      We have revised the text according to the reviewer’s suggestion and followed the original nomenclature of the MoMac-VERSE monocyte/macrophage clusters, also recognizing the procedure for their identification. The newly modified text now states: "Thus, analysis of the MoMac-VERSE (a resource that identified conserved monocyte and macrophage states derived from healthy and pathologic human tissues) (GSE178209) (2), indicated that GSK3 inhibition augments the expression of the gene sets that define MoMac-VERSE subsets identified as long-term resident macrophages [Cluster HES1_Mac (#2)] and tumor-associated macrophages with an M2-like signature [Clusters HES1_Mac (#2), TREM2_Mac (#3), C1Q<sup>hi</sup>_Mac (#16) and FTL_Mac (#17)] (2) (Figure 1H)."

      (2) In line 283, the reference labeled "2227" appears incorrect. It seems to be a formatting issue, and it might refer to references 22-27. Please verify and correct.

      All wrongly formatted references throughout the manuscript have been checked and corrected.

      (3) In line 353, the reference is incorrect. Please review and ensure that all references are properly cited throughout the manuscript.

      All wrongly formatted references throughout the manuscript have been checked and corrected.

      (4) In line 368, one of the patient samples shows a decreased IL-10 response after CHIR treatment. The authors should acknowledge the heterogeneity in the primary cell responses and adjust the conclusion accordingly to reflect this variability.

      We have modified the text following the reviewer’s comment, and acknowledge the heterogeneity in the production of IL-10 after GSK3 inhibition in the three analyzed samples. The modified text now states: "Consistent with these findings, CHIR-AMØ exhibited higher expression of MAFB (Figure 6F) whose increase correlated with an augmented secretion of Legumain, CCL2 and IL-10 (Figure 6G), although the latter was only seen in two samples, probably reflecting heterogeneity in primary cell responses."

      (5) Figure 7B: the UMAP shows 4 populations, but according to the visualization in the sup fig 3, there should be many more clusters. How do the authors explain this? Are these patient-specific clusters? Also, IMs can be separated into at least subpopulations. Can the authors plot also bona fide macrophage markers expressed by all subpopulations?

      To clarify this whole issue, and avoid misleading visualization of donor-specific clusters (see below), we have now replaced all UMAP plots shown in the previous version (in old Figure 7 and old Supplementary Figure 3) with new UMAP plots after running scVI reduction. In addition, we are including a new Supplementary Figure (new Supplementary Figure 3) that contains the information of the 21310 single-cell transcriptomes from human lungs reported in GSE128033 (ref. 47) after filtering and integration [nFeature > 200 and < 6000; Unique Molecular Identifiers (nCount) > 1000; % of mitochondrial genes < 15%]. Besides, old Supplementary Figure 3 has been replaced by the new Supplementary Figure 4, which includes the information of the single-cell transcriptomes from human lung macrophages selected from GSE128033 (ref. 47) based on their expression of the monocyte/macrophage-associated markers CD163, FABP4, LYVE1 or FCN1.

      Addressing the first question, the UMAPs in old Figure 7B and old Supplementary Figure 3B had a different number of clusters because old Figure 7B was derived from old Supplementary Figure 3B by grouping macrophage clusters according to the expression of previously defined markers, to limit the weight of donor-specific clusters. Specifically, the macrophage clusters from old Figure 7B were re-grouped according to the differential expression of:

      - FCN1 (including clusters 4, 7 and 12 from Figure 7B): Infiltrating monocytes.

      - FABP4 and TYMS-negative (including clusters 0, 2, 5 and 13 from Figure 7B), or MARCO and INHBA (cluster 9 from Figure 7B) or PPARG (cluster 11 from Figure 7B): Alveolar macrophages (AMØ).

      - TYMS, MKI67, TOP2A and NUSAP1 (cluster 15 from Figure 7B): Proliferating AMØ.

      - LYVE1 or RNASE1 or LGMN (including clusters 1, 3, 6, 8, 10 and 14 from Figure 7B): Interstitial Macrophages (IMØ).

      As the reviewer suggested, this type of UMAP plot yielded a large number of donor-specific clusters. To avoid such a misleading representation, we have now plotted UMAPs after running scVI reduction in every case. The new plots are now shown in new Figure 7A, new Figure 7B, new Supplementary Figure 3 (containing the information of the 21310 single-cell transcriptomes from GSE128033) and the novel Supplementary Figure 4 (with the information of the single-cell transcriptomes from human lung macrophages from GSE128033).

      Finally, to address the last issue, we have now plotted the expression of genes used for macrophage definition (CD163, FABP4, LYVE1, FCN1), as well as proliferation-associated genes (TYMS, MKI67, TOP2A, NUSAP1) and other bona fide macrophage marker genes (SPI1, FOLR2) in Supplementary Figure 4C.
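      For readers less familiar with this workflow, the cell-level filtering thresholds quoted in the response above (nFeature > 200 and < 6000, nCount > 1000, mitochondrial fraction < 15%) amount to a simple Boolean mask. The sketch below uses Python with NumPy; it is not the authors' Seurat v5 code, and the function names, the strict inequalities, and the `MT-` gene-name prefix used to flag mitochondrial genes are assumptions for illustration.

```python
import numpy as np


def percent_mito(counts, gene_names, prefix="MT-"):
    """Percentage of each cell's UMI counts from genes whose names start with `prefix` (assumed convention).

    counts: cells x genes array of UMI counts.
    """
    counts = np.asarray(counts, dtype=float)
    is_mito = np.array([g.upper().startswith(prefix) for g in gene_names])
    total = counts.sum(axis=1)
    mito = counts[:, is_mito].sum(axis=1)
    # Guard against empty cells (total == 0) by returning 0% for them.
    return np.divide(100.0 * mito, total, out=np.zeros_like(total), where=total > 0)


def qc_mask(counts, gene_names):
    """True for cells with 200 < nFeature < 6000, nCount > 1000 and < 15% mitochondrial counts."""
    counts = np.asarray(counts)
    n_feature = (counts > 0).sum(axis=1)  # genes detected per cell
    n_count = counts.sum(axis=1)          # UMIs per cell
    pct_mt = percent_mito(counts, gene_names)
    return (n_feature > 200) & (n_feature < 6000) & (n_count > 1000) & (pct_mt < 15.0)
```

      Cells for which the mask is `True` would be kept for integration; in an R/Seurat pipeline the same thresholds are typically applied with `subset()` on the corresponding metadata columns.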

      (6) statistics should be indicated in every figure legend and for every subfigure where applicable.

      We have now included the specific statistical procedure applied for each Figure and panel.

      Reviewer #2 (Public review):

      The study by Rios and colleagues provides the scientific community with a compelling exploration of macrophage plasticity and its potential as a therapeutic target. By focusing on the GSK3-MAFB axis, the authors present a strong case for macrophage reprogramming as a strategy to combat inflammatory and fibrotic diseases, including severe COVID-19. Using a robust and comprehensive methodology, the study conducts broad transcriptomic and functional analyses and offers valuable mechanistic insights while highlighting its clinical relevance.

      Strengths:

      Well performed and analyzed

      Weaknesses:

      Additional analyses, including mechanistic studies, would increase the value of the study

      In an effort to address the comment of the reviewer, we have performed more detailed analysis of the kinetics and dose-response effects of GSK3 inhibition, which are now provided as new Supplementary Figure 3A.

      Regarding additional mechanistic studies, we decided to explore the relationship between inactive GSK3β and MAFB levels at the early stages of M-CSF- or GM-CSF-driven monocyte-to-macrophage differentiation. These experiments, performed in three independent monocyte preparations, indicated that, 48 hours into differentiation, M-CSF promoted both a large increase in MAFB expression and a slight (albeit significant) rise in inactive GSK3β (P-Ser9-GSK3β) (compared to either untreated or GM-CSF-treated monocytes), further supporting the macrophage re-programming effect of GSK3. However, since the M-CSF-promoted increase in MAFB levels was much more robust than the enhancement in inactive GSK3β, we hypothesize that proteasomal degradation of MAFB might also differ between M-CSF-dependent (M-MØ) and GM-CSF-dependent (GM-MØ) monocyte-derived macrophages.

      Author response image 1.

      Total GSK3β, p-Ser9-GSK3β and MAFB levels in three preparations of freshly purified monocytes either unstimulated (-) or stimulated with M-CSF (10 ng/ml) or GM-CSF (1,000 U/ml) at different time points, as determined by Western blot (upper panel). Vinculin protein levels were determined as protein loading control. Mean ± SEM of the GSK3β/Vinculin, p-Ser9-GSK3β/Vinculin, and MAFB/Vinculin protein ratios from the three independent experiments are shown (lower panel) (paired Student’s t test: *, p<0.05; ****, p<0.001).

      Based on this finding, we then determined proteasome activity in fully differentiated M-CSF- and GM-CSF-dependent monocyte-derived macrophages. Use of the Immunoproteasome Activity Fluorometric Assay Kit II (UBPBio) in M-MØ and GM-MØ, either untreated or exposed to the proteasome inhibitor MG132, revealed that immunoproteasomal and proteasomal activity is significantly stronger in GM-MØ than in M-MØ, as demonstrated in assays for chymotrypsin-like (ANW) and branched amino acid preferring (PAL) activity (immunoproteasome), and trypsin-like (KQL) activity (both proteasome and immunoproteasome). This result suggested that immunoproteasomal activity might indeed contribute to the differential expression of MAFB in M-MØ and GM-MØ.

      Author response image 2.

      Immunoproteasome activity in M-MØ and GM-MØ, either untreated or exposed to MG132, as determined using the Immunoproteasome Activity Fluorometric Assay Kit II (UBPBio) on the three indicated peptides (upper panel). Mean ± SEM of three independent experiments are shown (paired Student’s t test: *, p<0.05) (lower panel).

      Consequently, we next set up experiments to assess whether the proteasome inhibitor MG132 could enhance the expression of MAFB-dependent genes in GM-MØ. Preliminary results of GM-MØ exposure to MG132 for 6 hours indicated an increase in the expression of MAFB protein and the MAFB-dependent genes LGMN and IL10, as well as a reduction in the expression of the GM-MØ-specific gene CD1C.

      Author response image 3.

      A. Schematic representation of the exposure of GM-MØ to MG132 for 6 hours. B. MAFB protein levels in four independent preparations of GM-MØ exposed to either DMSO (DMSO-GM-MØ) or the proteasome inhibitor MG132 (MG132-GM-MØ) for 6 hours, as determined by Western blot (left panel). GAPDH protein levels were determined as protein loading control. Mean ± SEM of the MAFB/GAPDH protein ratios from the four independent experiments are shown (right panel) (paired Student’s t test: ***, p<0.005). C. Relative mRNA levels of the indicated genes in DMSO-GM-MØ and MG132-GM-MØ, as determined by RT-PCR on seven independent samples (paired Student’s t test: ***, p<0.005; ****, p<0.001).

      Unfortunately, this proteasome inhibitor (MG-132) caused a marked reduction in cell viability after 6-8 hours. Since a similar decrease in cell viability was observed with the immunoproteasome inhibitor ONX-0914, we could not proceed any further with this approach.

      In response to the reviewer’s suggestion to include mechanistic insights in the manuscript, we are providing these results (and the corresponding figures) for the reviewer’s information only, to make clear our attempts to comply with his/her request.

      Recommendations for the authors:

      The results are of interest, and only some minor issues need to be addressed to strengthen the conclusions of the study.

      We gratefully thank the reviewer for his/her comments. 

      (1) This study employs a single dose of 10 μM of the GSK3 inhibitor CHIR-99021 for 48 hours, which is reasonable for in vitro studies. However, further investigation into the effect of different doses and exposure times could provide additional insight into optimal dosing and the durability of reprogramming effects. In addition, would alternative GSK3 inhibitors have comparable effects?

      Following the reviewer’s suggestion, we have performed a kinetics and dose-response analysis of the effects of CHIR-99021, using MAFB protein levels as a readout. This experiment is now shown in new Supplementary Figure 1A, which replaces the old Supplementary Figure 1A panel, where a shorter kinetics was presented. The results of this new experiment indicate a maximal effect at 10µM CHIR-99021, reached 24-48 hours after treatment. The text has been modified accordingly, and it now states: "Kinetics and dose-response analysis of the effects of CHIR-99021 on MAFB expression showed that maximal protein levels were achieved after a 24-48 hour exposure to 10µM CHIR-99021 (Supplementary Figure 1A), conditions that were used hereafter."

      Regarding the use of alternative GSK3 inhibitors, we had already provided that information in Supplementary Figure 1B, where the effects of SB-216763 (10 µM) or LiCl (10 mM) were evaluated. The huge reversal of the Tyr<sup>216</sup>/Ser<sup>9</sup> GSK3β phosphorylation ratio observed with CHIR-99021 was not seen with other GSK3 inhibitors, as indicated in the text. In any event, we believe that the relevance of this result with SB-216763 or LiCl is minimized by the results generated after siRNA-mediated GSK3 knockdown (shown in Figure 4), that completely reproduced the effects seen with CHIR-99021.

      (2) Why in the "reanalysis of single cell RNAseq data" section, the authors use Seurat v5 (R) but then change to python, and the other way around?

      As indicated in the documentation for Integrative Analysis in Seurat v5 (https://satijalab.org/seurat/articles/seurat5_integration), scVIIntegration requires the reticulate package, which allows a Python environment to be run from within R.

      (3) When the authors refer to the clusters enriched in MoMacVERSE, they use the labels of the clusters (for example #2 or #3). I would suggest using the annotations described in the original paper, to link it to the bibliography published through the labels established in the paper.

      We have revised the text according to the reviewer’s suggestion and followed the original nomenclature of the MoMac-VERSE monocyte/macrophage clusters, also recognizing the procedure for their identification. The newly modified text now states: "Thus, analysis of the MoMac-VERSE (a resource that identified conserved monocyte and macrophage states derived from healthy and pathologic human tissues) (GSE178209) (2), indicated that GSK3 inhibition augments the expression of the gene sets that define MoMac-VERSE subsets identified as long-term resident macrophages [Cluster HES1_Mac (#2)] and tumor-associated macrophages with an M2-like signature [Clusters HES1_Mac (#2), TREM2_Mac (#3), C1Q<sup>hi</sup>_Mac (#16) and FTL_Mac (#17)] (2) (Figure 1H)."

      (4) In line 309. Is there any significance on the "having a stronger effect"?

      We apologize for the misleading sentence. The phrase has been modified for better clarity, and the text now states: "Like CHIR-99021, silencing of both GSK3A and GSK3B augmented the expression of MAFB, with the simultaneous silencing of both GSK3A and GSK3B genes having a stronger effect (Figure 4B), and modulated the expression of 329 genes (Figure 4C,D)."

      (5) In line 337, "(22)(27)", are these references?

      All wrongly formatted references throughout the manuscript have been checked and corrected.

      (6) In the single-cell reanalysis, could you please provide integration Qc plots? It would be interesting to have it on the paper.

      To clarify this whole issue, and avoid misleading visualization of donor-specific clusters (see below), we have now replaced all UMAP plots shown in the previous version (in old Figure 7 and old Supplementary Figure 3) with new UMAP plots after running scVI reduction. In addition, we are including a new Supplementary Figure (new Supplementary Figure 3) that contains the information of the 21310 single-cell transcriptomes from human lungs reported in GSE128033 (ref. 47) after filtering and integration [nFeature > 200 and < 6000; Unique Molecular Identifiers (nCount) > 1000; % of mitochondrial genes < 15%]. Besides, old Supplementary Figure 3 has been replaced by the new Supplementary Figure 4, which includes the information of the single-cell transcriptomes from human lung macrophages selected from GSE128033 (ref. 47) based on their expression of the monocyte/macrophage-associated markers CD163, FABP4, LYVE1 or FCN1.

      As requested by the reviewer, we are now providing the QC plots for the re-analysis in the new Supplementary Figures 3 and 4.
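      The per-cell filtering thresholds stated above can be written as a simple predicate. The sketch below is a hypothetical illustration, not the authors' actual pipeline: the `CellQC` record, its field names and the example barcodes are assumptions, and only the numeric cut-offs come from the response.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell QC metrics (hypothetical record; field names are assumptions)."""
    barcode: str
    n_feature: int  # number of detected genes (nFeature)
    n_count: int    # number of unique molecular identifiers (nCount)
    pct_mt: float   # percentage of counts from mitochondrial genes


def passes_qc(cell: CellQC) -> bool:
    """Thresholds quoted in the response:
    200 < nFeature < 6000, nCount > 1000 and mitochondrial fraction < 15%."""
    return (
        200 < cell.n_feature < 6000
        and cell.n_count > 1000
        and cell.pct_mt < 15.0
    )


def filter_cells(cells):
    """Keep only the cells that pass every threshold."""
    return [c for c in cells if passes_qc(c)]


cells = [
    CellQC("CELL_1", 2500, 8000, 4.2),    # passes all thresholds
    CellQC("CELL_2", 150, 1200, 3.0),     # too few detected genes
    CellQC("CELL_3", 6500, 30000, 2.0),   # above the upper gene limit
    CellQC("CELL_4", 1800, 900, 5.0),     # too few UMIs
    CellQC("CELL_5", 3000, 9000, 22.0),   # mitochondrial fraction too high
]
kept = filter_cells(cells)
print([c.barcode for c in kept])  # ['CELL_1']
```

      All bounds are strict, so a cell sitting exactly on a cut-off (for example, exactly 200 genes or exactly 15% mitochondrial counts) is excluded.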

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      In its current form, I would exclude the cryo-EM data from the manuscript. It does not add much and it is distracting from the excellent work that you did on the functional characterization of the variant. Alternatively, you could try to improve the resolution and see if you can get some more meaningful analysis out of the structures? I noticed that you only collected very small datasets. If you decide to pursue a higher resolution reconstruction, collecting more movies will give you a better chance to obtain a higher resolution.

      We thank the reviewer for this invaluable feedback. Although our structure is currently at low resolution, it still provides valuable insight into the proximity of the splice insert to the N412 glycan density. This proximity, together with the low resolution of the map, prevented us from modeling all of the splice residues. Notably, this structure is the first depiction of this particular splice variant and therefore lays a foundation for subsequent studies in the field; hence, we would like to keep it in the manuscript. As suggested by the reviewers, we have now included comparisons of our structure with the recently reported GluK1-2a receptor structure (Mayerson et al. 2022). We plan to pursue higher-resolution structures in the future.

      I would probably also exclude the RNAseq analysis. I think that Figure 1 is fine, but the supplement 1 is not very successful in convincing me that the exon 9 is expressed mainly in early stages of brain development. In addition, the plot in Figure 1 indicates strong expression in the cerebellar cortex in 20s and 30s. If you decide to keep the data, I strongly encourage you to include more details on the analysis in the methods section.

      Thanks for this insightful comment. We have now modified this section extensively for better clarity. Indeed, the expression of this variant appears to be dynamic across brain regions, and this is now specified in the revised manuscript. Figure 1 shows the expression of GRIK1 exon 9 across regions of the human brain and donor ages. Supplementary Figure 1 zooms in on one such region, the cerebral cortex, where we observe the highest expression of GRIK1. In this region, we also observed higher expression of exon 9 in the early stages of development. The color scales of Figure 1 (0-4 RPKM) and Supplementary Figure 1 (0-6 RPKM) differ because other exons are more highly expressed in Supplementary Figure 1; for example, an expression of 4 RPKM appears red in Figure 1 but orange-yellow in Supplementary Figure 1. With Supplementary Figure 1, we wanted to show the expression of exon 9 relative to other exons across developmental stages, indicating that GluK1-1 is highly expressed in the early stages of life. More details on the analysis have now been added to the Methods section.

      Additionally, there are a few minor issues in the data presentation:

      (1) In Fig. 2C there seems to be a mismatch between the green dose-response plot and the GluK1-2a trace shown. The plot reports an EC50 of 187.7 µM, whereas in the sample trace 0.25 mM agonist produces only ~20% activation.

      We have verified the data and statistics and confirmed that they are consistent with the values reported in the manuscript. Figure 2C shows representative traces from a single cell, whereas the EC50 value was calculated by fitting the Hill equation to data averaged over 5 cells.
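      The distinction drawn above, between a single representative trace and an EC50 fitted to the mean of several cells, can be made concrete with the Hill equation. The following is a minimal sketch under stated assumptions, not the authors' analysis code: the function names, the optional Hill coefficient `n_hill` (defaulting to 1) and the example responses are hypothetical.

```python
def hill_response(conc_uM, ec50_uM, n_hill=1.0):
    """Fractional response (0 to 1) predicted by the Hill equation:
    R = c**n / (c**n + EC50**n)."""
    if conc_uM <= 0:
        return 0.0
    cn = conc_uM ** n_hill
    return cn / (cn + ec50_uM ** n_hill)


def mean_curve(per_cell_responses):
    """Point-by-point mean of normalized responses across cells.

    Each inner list holds one cell's responses at the same ordered set of
    concentrations; the Hill fit is then performed on this averaged curve."""
    n_cells = len(per_cell_responses)
    return [sum(values) / n_cells for values in zip(*per_cell_responses)]


# Hypothetical normalized responses from three cells at four concentrations.
cells = [
    [0.05, 0.30, 0.70, 0.95],
    [0.10, 0.45, 0.80, 0.98],
    [0.02, 0.20, 0.60, 0.90],
]
print(mean_curve(cells))
print(hill_response(187.7, 187.7))  # 0.5: at c = EC50 the response is half-maximal
```

      Because the fit is made to the averaged curve, an individual cell whose own EC50 differs from the population mean can show a noticeably larger or smaller response at a given concentration than the fitted curve predicts.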

      (2) The axis label is misprinted in Figure 3C

      Thanks. Corrected.

      (3) In Fig 5 supplement 1, panel B - the 3 last labels above the western blot lanes are off so it is difficult to see which sample corresponds to which lane.

      Thanks. We have corrected the figure.

      Reviewer #2 (Recommendations For The Authors):

      Overall, I congratulate the authors on this nicely done study. It represents a large body of work.

      We thank the reviewer for his/her time and positive comments.

      I have several minor corrections that the authors could consider for the revision of the manuscript. P7: The desensitization rate of GluK1-2a was "delayed"; replace by "increased".

      Corrected.

      P9, last line: "0.37; P.." Please add the P value.

      P value has been added as suggested.

      P11: The authors indicate that the K368/375/379/382H376-E mutant exhibits a significant difference in desensitization properties in the presence of Neto1, but on the first line of p11 they provide a P value above 0.05.

      We thank the reviewer for pointing out this discrepancy and have fixed it. We discussed two mutants that show slower desensitization than GluK1-1a co-expressed with Neto1. The difference is significant for the K to E mutant, while the τdes value for the K368/375/379/382H376-E mutant shows the same trend without reaching significance. We have now modified the text to explain this more clearly.

      P19: The calculation of the weighted mean τDes is not clear and should be better explained.

      Thanks. We have added more details in the Methods section. We analyzed the current decays in response to 1–2 ms or 1 s applications by fitting a single exponential function or the sum of two exponential functions. From these fits we derived a weighted mean τdes using the formula [(τ1 × amplitude1) + (τ2 × amplitude2)]/[amplitude1 + amplitude2]. The τ values are the time constants obtained from the exponential fits, and the amplitudes are the estimated contributions of each component to the total peak current amplitude.

      τw = [(A1 × τ1) + (A2 × τ2)] / (A1 + A2)

      This is a weighted mean, where A1 and A2 are the amplitudes and τ1 and τ2 are the corresponding time constants; each time constant contributes in proportion to its component's share of the total amplitude.
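      The amplitude-weighted mean time constant defined above can be computed directly from the fitted amplitudes and time constants. This is an illustrative sketch; the function name and the example values (a 5 ms component carrying 80% of the amplitude and a 50 ms component carrying 20%) are hypothetical.

```python
def weighted_tau(amplitudes, taus):
    """Amplitude-weighted mean time constant:
    tau_w = sum(A_i * tau_i) / sum(A_i).

    Amplitudes should share the same sign (e.g. all inward currents) so that
    each component's weight is its fraction of the total amplitude."""
    if len(amplitudes) != len(taus) or not amplitudes:
        raise ValueError("need matching, non-empty amplitude and tau lists")
    total = sum(amplitudes)
    if total == 0:
        raise ValueError("total amplitude must be non-zero")
    return sum(a * t for a, t in zip(amplitudes, taus)) / total


# Hypothetical two-exponential fit: a 5 ms component carrying 80% of the
# amplitude and a 50 ms component carrying 20%.
print(weighted_tau([80.0, 20.0], [5.0, 50.0]))  # 14.0
```

      For a decay fitted with a single exponential, the weighted mean simply reduces to that fit's τ.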

      P19 the rate of recovery was obtained by fitting the one-phase association "with" exponential function. With is missing.

      We have corrected this error. Thanks.

      P21: Which method was used for site-directed mutagenesis?

      Overlapping PCR was carried out for mutagenesis using the primers listed in Figure 4-table supplement 1. A ligation-free cloning approach (Zhang et al., 2017) was used. It has now been elaborated in the methodology section under Site directed mutagenesis.

      P21 and 22: Provide complete references for reagents, including the species of antibodies.

      Thanks. We have added all the details in the methods section now. 

      Anti-His: Rabbit mAb #12698 (Cell Signaling Technology)

      Anti-Neto1: Rabbit #SAB3500679 (Sigma Aldrich)

      Anti-GFP: Mouse mAb G1546 (Sigma Aldrich)

      Anti-actin: Mouse mAb A3853 (Sigma Aldrich)

      P22: How much anti-His antibody was used with 40 µL of Protein A?

      We used 2 µg of antibody per 40 µL of Protein A slurry. This has now been added to the Methods.

      P23 Authors seem to have used a virus to express protein but the protocol is not given. For example what is P2 virus?

      We have now modified the manuscript to include details of baculovirus generation following the protocol described in Goehring et al. 2014. As in that protocol, the second-generation virus (P2) generated in insect (Sf9) cells was used to infect suspension-adapted HEK293-T cells for large-scale GluK1-1aEM protein expression.

      Reviewer #3 (Recommendations For The Authors):

      Major concerns:

      (1) The effect of the splice insert on Gluk1 regulation by Neto proteins is not fully clear. For example, experiments in Fig. 3G indicate that the desensitization time for Gluk1-1a + Neto2 is ~32 ms. This value is half compared with data obtained from whole-cell experiments shown in Fig. 3A (~70 ms). What is the reason for this discrepancy? If variability is observed between experiments, I wonder how valid are the comparisons made in panel A between GluK1-1a + Neto2 vs GluK1-2a + Neto2 groups. In the case of recovery analysis, authors found significant differences comparing both groups in the presence of Neto (Fig. 3B) but recovery times are not identical for Gluk1-1a vs Gluk1-2a (without Neto). Thus, I wonder if the fold change related to the control group (without Neto) is different.

      We appreciate your detailed feedback, which has allowed us to clarify and reinforce the validity of our experimental findings. Different recording configurations were used: outside-out patches (Fig. 3G) and whole-cell recordings (Fig. 3A). Whole-cell recordings average responses over a larger membrane area and have slower solution exchange than outside-out patch recordings, which may have contributed to the difference in desensitization times. Nevertheless, the whole-cell and outside-out patch recordings showed similar trends. Further, all data except those presented in Figs 3G and 3H are from whole-cell recordings. We performed multiple independent experiments and used rigorous statistical analyses to validate our comparisons, and we report mean values with standard deviations or confidence intervals to provide a more accurate representation of the data.

      Neto1 significantly speeds up recovery from desensitization for both variants, with a more pronounced effect on GluK1-1a (GluK1-1a + Neto1: 0.68 s) than on GluK1-2a (GluK1-2a + Neto1: 1.15 s). The difference in recovery times between the two variants is likely due to the splice insert in GluK1-1a. Neto2, on the other hand, slows recovery for both variants without significant differential effects. In the absence of Neto, recovery from the desensitized state is faster for GluK1-1a than for GluK1-2a, although the difference is not significant.

      In the case of the glutamate concentration-response curve (Fig. 3C), EC50 values for Neto1 and Neto2 are relatively the same, but this approach on its own does not provide insights about the role of the splice insert. Previous experiments with the Gluk1 reveal differences between EC50 in the presence of Neto1 or 2 (Fisher, 2015), suggesting that the insert could regulate glutamate binding affinity, but still, this point is not directly demonstrated in this work.

      Thanks for this insightful comment. Indeed, we cannot conclude that the splice residues directly affect glutamate sensitivity, and we have modified the text accordingly. The Fisher paper showed that both Neto1 and Neto2 influence the glutamate sensitivity of GluK1-2a: the EC50 of 124.6 ± 16.2 µM for the receptor alone shifted to 4.4 ± 0.4 µM and 13.7 ± 4.2 µM in the presence of Neto1 and Neto2, respectively, a noticeable effect that did not differ substantially between the two Neto proteins. Our observations for GluK1-1a are similar, with both Neto1 and Neto2 causing a leftward shift.
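      Expressed as fold changes, the Fisher (2015) EC50 values quoted above correspond to roughly 28-fold (Neto1) and 9-fold (Neto2) decreases for GluK1-2a. The short calculation below only reproduces that arithmetic from the numbers in the text; the function and constant names are illustrative.

```python
# EC50 values (µM) for GluK1-2a, as quoted from Fisher (2015) in the response.
EC50_ALONE = 124.6
EC50_NETO1 = 4.4
EC50_NETO2 = 13.7


def fold_decrease(ec50_alone, ec50_with_neto):
    """How many times lower the EC50 is with the auxiliary subunit.

    Values > 1 mean a leftward shift of the concentration-response curve,
    i.e. higher apparent glutamate sensitivity."""
    return ec50_alone / ec50_with_neto


print(f"Neto1: {fold_decrease(EC50_ALONE, EC50_NETO1):.1f}-fold")  # Neto1: 28.3-fold
print(f"Neto2: {fold_decrease(EC50_ALONE, EC50_NETO2):.1f}-fold")  # Neto2: 9.1-fold
```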

      (2) Similar to the previous point, a proper interpretation of mutant data is missing in the manuscript. From current data, it is difficult to visualize the role of the insert on Neto-dependent regulation, mainly, because of the fact that some mutations alone affect Gluk1-1 channel properties. The authors conclude their data by stating that "while the modulation of the receptor by Neto 1 is affected by mutations in splice insert, the modulation by Neto 2 remains largely unaffected" (Page 13). However, this statement is confusing since the co-expression of Gluk1-1a with Neto2 (Fig. 5) prevents the effect caused by mutation K368 alone (Fig. 4), indicating that modulations by Neto 2 are indeed potentially affected by the mutations. Please, clarify. Also, the effect of the K368/375/379/382H376-E mutant on Neto modulation (pink bar in Fig. 5) is impossible to interpret properly since the effect of the mutation alone is not shown in the manuscript.

      Thanks for seeking this important clarification. It is indeed true that splice residue mutations themselves affect the receptor functional properties in comparison to the wild-type receptors. For the sake of clarity, we have presented the effect of splice mutants on receptor properties separately from the effect of mutations on modulation by Neto proteins. Figure 4 demonstrates a comparison between wild-type and mutant receptors without the Neto proteins, showcasing different kinetic properties, while Figure 5 provides detailed information on the role of the insert in Neto-dependent regulation. 

      It is true that we could not record the effect of the K368/375/379/382H376-E mutant alone or when coexpressed with Neto2, because its low peak amplitudes (listed in Table 1) prevented reliable comparisons. However, robust currents were observed when the same mutant was coexpressed with Neto1, so comparisons for this mutant were made against wild-type GluK1-1a + Neto1.

      We have now modified the statement "while the modulation of the receptor by Neto 1 is affected by mutations in splice insert, the modulation by Neto 2 remains largely unaffected" and the last paragraph as follows:

      “Neto1 appears to have more pronounced effects on the mutant receptors compared to Neto2. Specifically, Neto1 significantly slowed desensitization for the K368-E mutant, accelerated recovery from desensitization for K368-E and K368/375/379/382H376-E mutants, increased agonist efficacy for K368-E and K375/379/382H376-E mutants, and altered rectification properties for K368-E and K368/375/379/382H376-E mutants. In contrast, Neto2 had fewer significant effects on the mutant receptors, with the main impact being an increase in agonist efficacy for the K368-E mutant. Notably, Neto2 did not significantly affect desensitization, recovery from desensitization, or rectification properties of the mutant receptors when compared with wild-type GluK1-1a coexpressed with Neto2. These findings suggest that the splice residues in GluK1-1a differentially influence receptor modulation by Neto1 and Neto2, with Neto1 showing more extensive modulation of the mutant receptors' functional properties.”

      (3) An open question after reading this interesting work is if the proposed change in Neto regulation because of the splice insert is due to changes in Gluk1-Neto interactions or because the rearrangement after interaction with Neto proteins is different. Pull-down experiments (Fig 5 Sup.1) suggest that the splice insert and all the mutants tested do not prevent interaction with Neto proteins. I wonder if the authors could complement their data with a quantitative approach/analysis to demonstrate if the splice insert and the mutants affect Neto1/2 interactions (as expected for the rationale when creating the mutants).

      Thank you for this insightful suggestion. You raise an important point about distinguishing between changes in GluK1-Neto interactions and potential differences in receptor rearrangement after Neto binding. While our pull-down experiments suggest that the splice insert and mutants do not prevent Neto interactions (probably due to a larger interaction interface all along the receptor), a quantitative approach would indeed provide more nuanced information. In future studies, we plan to use a quantitative approach such as surface plasmon resonance to assess changes in interactions upon mutations in the splice insert and/or Neto proteins in different states of the receptor. In addition, obtaining cryo-EM structures of GluK1 splice variants in complex with Neto1 and Neto2 would provide crucial insights into their interaction interfaces and any conformational changes induced by binding.

      (4) Related to the Gluk1-1a structure, the authors state that the overall structure is similar to the one without the insert (page 14); however, this is not properly shown in the manuscript. Even if the overall architecture of the channel is the same, authors should make a proper/adequate comparison between both structures/domains to support their claims. Also, one should expect that the insertion of 15 amino acids would affect in some way the closing neighboring domains. The differential effect of the splice insert on glutamate and kainate EC50 values (Fig. 2 and Fig. 2 sup.1), suggests that the insert could introduce a sort of rearrangement in the binding domain. Thus, I wonder if a more elaborated analysis of the current structural data could reveal some structural insights that would explain the specific functional differences due to the splice insert. If the low resolution and the missing residues avoid making some comparisons and establish differences between sidechain orientations, still, a proper comparison between the domain backbones would be helpful to validate the author's statement at least. Also, I wonder if the changes could be resolved better in a closed state or APO structure, instead of the desensitized structure. Finally, are the structures obtained in DDM and nanodiscs similar?

      As suggested by the reviewer, we have added a new supplementary figure, “Figure 6-figure supplement 9,” showing a superimposition of GluK1-1aEM (detergent-solubilized or reconstituted in nanodiscs) and GluK1-2a (PDB: 7LVT; silver), which shows that the structures are conserved overall in the desensitized state.

      As is evident from the figure and the RMSD values mentioned above, we do not observe significant movements in either the ATD or the LBD layer of GluK1-1a relative to GluK1-2a. In addition, the DDM-solubilized and nanodisc-reconstituted GluK1-1a structures (Panel A) are very similar, with an RMSD of ~2.19 Å across all 2664 Cα atom pairs. Owing to the low resolution of our structures, we have refrained from more detailed structural comparisons.

      Our efforts to capture the closed state or apo state structures have failed due to either severe orientation bias (only top views) or increased heterogeneity. 
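      For reference, the Cα RMSD quoted above is the square root of the mean squared distance between paired Cα atoms after the two models have been superimposed. The sketch below assumes that superposition and atom pairing have already been done (for example, in a molecular graphics program); the function name and toy coordinates are hypothetical, and this is not the software used for the reported values.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD (Å) between paired, already-superimposed C-alpha coordinates.

    coords_a, coords_b: equal-length sequences of (x, y, z) tuples, where the
    i-th atom in one list corresponds to the i-th atom in the other."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total_sq = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total_sq += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total_sq / len(coords_a))


# Toy example: every paired atom is displaced by exactly 1 Å along z.
model_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_2 = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(ca_rmsd(model_1, model_2))  # 1.0
```

      The value depends on how the models were superimposed and on which atoms were paired, so RMSDs are only comparable when computed over the same atom set.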

      (5) Methods section lacks relevant information for proper data interpretation as well as for replicating some experiments in the future. For example:

      A) The experimental design to determine the rectification index with a ramp protocol is not clear: 1) Why did the authors apply a ramp protocol if receptors desensitize over time? Please clarify the protocol.

      Ramp protocols were used only for the wild-type receptors to compare their voltage-dependent behavior, as this was the first study to compare the two splice variants. All kainate receptors (GluK1-GluK5) desensitize over time. However, their rectification properties have previously been studied (in both the absence and presence of Neto proteins) using ramp protocols, as these are faster than step protocols.

      B) Are polyamines included in the solutions to perform the rectification assays?

      No, polyamines were not added to the intracellular solution, and the effect of the endogenous polyamine block was measured. This has now been specified in the results as well as the methods section.

      C) It is not clear if the experiments to calculate IK/IG ratios were performed in the same preparation (This is, the same cell was stimulated with glutamate and then kainate or vice versa).

      Indeed, the glutamate and kainate responses were recorded from the same cell (each cell was stimulated first with glutamate and then with kainate) so that the responses could be compared. This is now specified in the Methods section.
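      Recording both agonists in the same cell lets the ratio be formed cell by cell, so differences in receptor expression between cells cancel before averaging. The sketch below illustrates this; the function names and the peak-current values are hypothetical.

```python
import math
import statistics


def ik_ig_ratios(paired_peaks):
    """Kainate/glutamate peak ratio per cell.

    paired_peaks: (kainate_peak, glutamate_peak) tuples recorded in the SAME
    cell, so differences in receptor expression between cells cancel."""
    return [ik / ig for ik, ig in paired_peaks]


def mean_and_sem(values):
    """Mean and standard error of the mean (needs at least two values)."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical inward peak currents (pA); absolute amplitudes differ between
# cells, but the per-cell ratios are similar.
peaks = [(-50.0, -500.0), (-120.0, -1000.0), (-30.0, -300.0)]
ratios = ik_ig_ratios(peaks)
mean, sem = mean_and_sem(ratios)
print(f"IK/IG = {mean:.3f} ± {sem:.3f} (n = {len(ratios)})")
```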

      D) The experimental design for calculating recovery is not clear.

      We employed a double pulse protocol to measure receptor recovery. The protocol involved applying two consecutive pulses of agonist stimulation to the receptor. Initially, we applied a brief agonist pulse to activate the receptor, followed by a specific recovery period. After the recovery period, we administered a second agonist pulse to assess the receptor's recovery response. The receptor's recovery was determined by comparing the response amplitude of the second pulse to that of the first pulse, providing valuable insights into the receptor's recovery kinetics. Recovery rates were calculated with single exponential association fits in Prism. We have now modified the text for better clarity.
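      The double-pulse analysis described above can be sketched as follows. This is a simplified illustration, not the Prism procedure: it assumes the one-phase association model Y(t) = Y0 + (Plateau - Y0)(1 - exp(-k t)) with Y0 = 0 and Plateau = 1, and estimates k by a least-squares line through the origin of -ln(1 - Y) against the recovery interval t, whereas a nonlinear fit would estimate Y0, Plateau and k together. Function names and example intervals are hypothetical.

```python
import math


def fraction_recovered(first_peak, second_peak):
    """Peak of the second (test) pulse relative to the first (conditioning) pulse."""
    return second_peak / first_peak


def one_phase_association(t, k, y0=0.0, plateau=1.0):
    """Y(t) = Y0 + (Plateau - Y0) * (1 - exp(-k * t))."""
    return y0 + (plateau - y0) * (1.0 - math.exp(-k * t))


def fit_rate_constant(intervals_s, fractions):
    """Estimate k (1/s), assuming Y0 = 0 and Plateau = 1.

    Under that assumption -ln(1 - Y) = k * t, so k is the slope of a
    least-squares line through the origin. Points with Y >= 1 are skipped
    because the logarithm is undefined there."""
    num = 0.0
    den = 0.0
    for t, y in zip(intervals_s, fractions):
        if y >= 1.0:
            continue
        z = -math.log(1.0 - y)
        num += t * z
        den += t * t
    if den == 0.0:
        raise ValueError("no usable data points")
    return num / den


# Hypothetical noise-free data generated with k = 1.0 /s (tau = 1 s).
intervals = [0.25, 0.5, 1.0, 2.0, 4.0]
fractions = [one_phase_association(t, 1.0) for t in intervals]
k_fit = fit_rate_constant(intervals, fractions)
print(f"k = {k_fit:.3f} /s, tau_recovery = {1.0 / k_fit:.3f} s")  # k = 1.000 /s, tau_recovery = 1.000 s
```

      The linearization weights points differently from a nonlinear least-squares fit, so with noisy data the two approaches can give somewhat different estimates of k.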

      E) Please indicate the species used for both functional and Cryo-EM (rat Gluk1 isoform?).

      Thanks for pointing this out. We have now specified in relevant methodology sections that Rattus norvegicus GluK1 and Neto proteins were used in this study.

      F) Please describe the nanodisc reconstitution protocol and how the nanodisc protein was purified, if appropriate.

      The MSP1E3D1 was purified by following the protocol given by the Sligar group in 2014 (doi: 10.1016/S0076-6879(09)64011-8). The nanodisc reconstitution protocol has now been elaborated in the revised manuscript.

      G) Site-directed mutagenesis methodology is incomplete. Please check.

      We have now elaborated this section to include more details.

      Minor concerns:

      (1) Authors state that splice residues are ~30A away from the TM domain. Currently, there is no friendly representation showing the localization of the splice in the structure, besides Fig.6E. The manuscript could benefit itself if authors include a better 3D representation or a scheme to highlight the position of the splice relative to critical domains.

      Thanks for pointing this out. The distance between TRP 381 CA (ATD) and LEU 636 CA (TM3) is 92.10 Å. We have changed the value in the text to ~92 Å.

      Author response image 1.

      (2) Authors mention that mutations in the insert to alanine show normal traffic to the plasma membrane but low current amplitude. Then, I wonder if single-channel conductance, mean open time or open probability is affected by the splice insert. Showing the effects of the insert on single-channel properties would strengthen the manuscript's quality.

      This is a good suggestion. However, as can be seen from our whole-cell and outside-out patch data, we obtained low peak amplitudes (<50 pA) for many of our receptor-only constructs, and some recordings had a high SEM owing to heterogeneity between cells of the same population. We will consider studying the single-channel properties of these receptors in future experiments.

      (3) It is unclear how the insert or the mutations specifically affect glutamate- or kainate-induced responses because authors analyze IK/IG ratios only. Maybe authors could consider including an analysis of the role of the insert on specific glutamate- or kainate-induced response to gain insights about ligand selectivity.

      All values have been included in the Excel raw-data file. We have included the desensitization kinetics of the mutant receptors in response to glutamate and compared them with wild-type GluK1-1a. Kainate-induced responses were very heterogeneous (high SEM for % desensitization) and hence have not been included in the main data.

      (4) Please be consistent with nomenclature along the manuscript to avoid confusion. For example, Are Gluk-1-1 and Gluk-1-1a referring to the same variant?

      GluK1-1 is used in the abstract and introduction, where we introduce the N-terminal splice variants, which either contain the 15 residues (GluK1-1) or lack them (GluK1-2). The C-terminal splice variants of GluK1 are named "a-d", with "a" being the variant with the smallest C-terminal domain. Later in the manuscript, we use only the term GluK1-1a to denote the ATD splice variant with the shortest C-terminal domain.

      The introduction and spatiotemporal results talk about the GluK1-1 receptors wherein the 

      (5) Legend figure 2: Repeated phrase should be removed. Please check.

      (6) Page 8: "This is similar to the effect observed in GluK1-2 receptors whereby the glutamate EC50 was shown to increase by Neto proteins [Neto1: 34-fold and Neto2: 7.5-fold (Palacios-Filardo et al., 2016) and Neto1/2: 10-30X (Fisher, 2015)]". It seems that values from Fisher's paper are backward. Please correct. 

      (7) Page 9. Second paragraph. Spelling mistake when referring to Fig. 3G.

      Thanks for pointing out the inadvertent errors; we have now corrected all of them.

      (8) Figure 3: The title in Y axis overlaps with the figure. Please check.

      We have corrected the error.

      (9) Page 10: "In addition, K375/379/382H376-E mutant also exhibited a slowdown in the recovery (K375/379/382H376-E: 4.83 ± 0.31 s P=0.2774) (Figure 4C; Table 1)." Statistical analysis indicates this is not correct. Please tone down this statement. For example: "...mutant also exhibited a trend to a slowdown in the recovery although differences do not reach statistical significance".

      Thanks. We have modified the statement as suggested.

      (10) Page 11: "and a reduction was observed for K375/379/382H376-E receptors (1.17 ± 0.28 P=0.3733) compared to wild-type (Figure 4D; Table 1)." Same issue as the previous minor comment.

      Thanks. We have modified the statement as suggested.

      (11) Page 11: "We observed that mutants K368-E and K368/375/379/382H376-E, desensitize significantly slower in the presence of Neto1" This statement is not true for K368/375/379/382H376-E mutant. Please correct.

      Thanks. We have modified the statement as suggested and specified the difference.

      (12) Legend Figure 4. Colored asterisks are not clear in the figure. Please check.

      Thanks. The reference to colored asterisks has been removed from the legend as they are not used.

      (13) Representative data shown in Fig 5 sup.2A do not match very well with the final quantification shown in Fig 5A. Please check. Also, the authors state in the result section (page 10) that data shown in Fig. 5A indicate that "GluK1-1a modulation by Neto 1 is influenced by the splice residues". This could be true only for residue K368; however, this is not so obvious since the two mutants containing K368E are inconsistent. Please check and clarify.

      Only representative traces are shown in Fig 5 sup 2 A. However, the quantification shown in Fig 5 A is from multiple cells. We have rechecked all the data and found it to be consistent. We have rewritten this section and modified it for better clarity.

      (14) Figure 6-supplement 2: Please incorporate missing values of MW standards in panel B.

      Thanks. We have modified the figure to include values for MW standards.

      (15) It is not clear the rationale for showing construct C552Y C557V C575S in Fig. 6 sup.3, panel A. This mutant is not mentioned in the manuscript.

      It has been mentioned in the methodology section under “Construct design for expression and purification of rat GluK1-1aEM”. It (C552Y C557V C576S) is one of the constructs used in optimizations that were checked for good protein yields. Based on FSEC protein profiles, we used C552Y, C557V (2X Cys mutant) as GluK1-1aEM, which is mentioned in the same section.

      (16) Fig. 6 sup.4 Not clear what does mean w.r.c. Please specify in the legend.

      The abbreviation "w.r.t." (with respect to) is now defined in the manuscript.

      (17) Suggestion to improve data presentation in Fig. 4D and Fig. 3 sup.1B: For easier comparison of IK/IG ratios, representative traces for kainate and glutamate in the same group could be shown using the same Y-scale.

      It has been purposely shown with two different Y-scales due to the differences in peak amplitudes in the presence of glutamate or kainate. 

      (18) Fig. 3 sup.1A: Based on the figure legend, horizontal bars representing the application of glutamate are not consistent with time scale bars. Please, check. In the same figure, panel B, the representative traces shown for GluK-1a-Neto1 are not consistent with IK/IG ratio shown in Fig. 3D.

      Thanks, we have corrected the horizontal bars representing glutamate application. The representative traces shown for GluK-1a-Neto1 were rechecked and are consistent with the IK/IG ratio shown in Fig. 3D.

      (19) I wonder if the authors could discuss the lack of Neto1 effect on the wild type Gluk1-2a channel, as proposed previously.

      Sheng et al., 2015 showed that Neto1 enhances the desensitization onset of GluK1. However, it is unclear which GluK1 splice variants were used in that study. GluK1 has several splice variants, and in the present study we specifically compared GluK1-1a and GluK1-2a. We did not observe an effect of Neto1 on wild-type GluK1-2a with either of the two techniques used in our study (whole-cell and outside-out patch recordings). However, as our data show, the GluK1-2a receptor alone displays faster desensitization kinetics than reported in a previous study (Copits et al., 2011). The differences could stem from different experimental conditions, such as the constructs and recording conditions used.

      Copits BA, Robbins JS, Frausto S, Swanson GT. Synaptic targeting and functional modulation of GluK1 kainate receptors by the auxiliary neuropilin and tolloid-like (NETO) proteins. Journal of Neuroscience. 2011 May 18;31(20):7334-40.

      Sheng N, Shi YS, Lomash RM, Roche KW, Nicoll RA. Neto auxiliary proteins control both the trafficking and biophysical properties of the kainate receptor GluK1. Elife. 2015 Dec 31;4:e11682. doi: 10.7554/eLife.11682. PMID: 26720915; PMCID: PMC4749551.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The bacterial neurotransmitter:sodium symporter homologue LeuT is a well-established model system for understanding the fundamental basis of how human monoamine transporters, such as the dopamine and serotonin transporters, couple ion movement to neurotransmitter uptake. Here the authors provide convincing data to show that the K+ catalyses the return step of the transport cycle in LeuT by binding to one of the two sodium sites. The paper is an important contribution, but it's still unclear exactly where K+ binds in LeuT, and how to incorporate K+ binding into a transport cycle mechanism.

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript tackles an important question, namely how K+ affects substrate transport in the SLC6 family. K+ effects have previously been reported for DAT and SERT, but the prototypical SLC6-fold transporter LeuT was not known to be sensitive to the K+ concentration. In this manuscript, the authors demonstrate convincingly that K+ inhibits Na+ binding, and Na+-dependent amino acid binding at high concentrations, and that K+ inside of vesicles containing LeuT increases the transport rate. However, outside K+ apparently had very little effect. Uptake data are supplemented with binding data, using the scintillation proximity assay, and transition metal FRET, allowing the observation of the distribution of distinct conformational states of the transporter.

      Overall, the data are of high quality. I was initially concerned about the use of solutions of very high ionic strength (the Km for K+ is in the 200 mM range), however, the authors performed good controls with lower ionic strength solutions, suggesting that the K+ effect is specific and not caused by artifacts from the high salt concentrations.

      The major issue I have with this manuscript is with the interpretation of the experimental data. Granted that the K+ effect seems to be complex. However, it seems counterintuitive that K+ competes with Na+ for the same binding site, while at the same time accelerating the transport rate. Even if K+ prevents rebinding of Na+ on the inside of vesicles, it would be expected that K+ then stabilizes this Na+-free conformation, resulting in a slowing of the transport rate. However, the opposite is found. I feel that it would be useful to perform some kinetic modeling of the transport cycle to identify a mechanism that would allow K+ to act as a competitive inhibitor of Na+ binding and rate-accelerator at the same time.

      This ties into the second point: It is not mentioned in the manuscript what the configuration of the vesicles is after LeuT reconstitution. Are they right-side out? Is LeuT distributed evenly in inside-out and right-side out orientation? Is the distribution known? If yes, how does it affect the interpretation of the uptake data with and without K+ gradient?

      Finally, mutations were only made to the Na1 cation binding site. These mutations have an effect mostly to be expected, if K+ would bind to this site. However, indirect effects of mutations can never be excluded, and the authors acknowledge this in the discussion section. It would be interesting to see the effect of K+ on a couple of mutants that are far away from Na+/substrate binding sites. This could be another piece of evidence to exclude indirect effects, if the K+ affinity is less affected.

      Reviewer #2 (Public Review):

      To characterize the relationship between Na+ and K+ binding to LeuT, the effect of K+ on Na+-dependent [3H]leucine binding was studied using a scintillation proximity assay. In the presence of K+, the apparent affinity for sodium was reduced but the maximal binding capacity was unchanged, consistent with a competitive mechanism of inhibition between Na+ and K+.
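      The competitive interpretation can be made concrete with a minimal sketch. It assumes simple one-site (hyperbolic) binding, which is a simplification since Na+-dependent binding to LeuT may be cooperative, and the function names and constants below are hypothetical, not fitted to the data. Under competition, K+ scales the apparent Na+ dissociation constant by (1 + [K+]/Ki) while leaving the maximal binding unchanged:

```python
def apparent_kd(kd_na, k_conc, ki_k):
    """Apparent Na+ dissociation constant in the presence of competing K+.

    kd_na: Na+ dissociation constant without K+ (mM, hypothetical)
    k_conc: K+ concentration (mM)
    ki_k: K+ inhibition constant (mM, hypothetical)
    """
    return kd_na * (1.0 + k_conc / ki_k)


def bound_fraction(na_conc, k_conc, bmax, kd_na, ki_k):
    """One-site Na+-dependent binding with competitive K+ (hyperbolic)."""
    kd_app = apparent_kd(kd_na, k_conc, ki_k)
    return bmax * na_conc / (kd_app + na_conc)


# With K+ equal to its inhibition constant, the apparent Kd doubles ...
print(apparent_kd(10.0, 200.0, 200.0))  # 20.0
# ... but saturating Na+ still reaches the same Bmax.
print(round(bound_fraction(1e6, 200.0, 1.0, 10.0, 200.0), 4))  # 1.0
```

      A right-shift of the Na+ curve with an unchanged plateau is the signature the reviewer describes; a noncompetitive inhibitor would instead lower the plateau.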

      To obtain a more direct readout of K+ binding to LeuT, tmFRET was used. This method relies on the distance-dependent quenching of a cysteine-conjugated fluorophore (FRET donor) by a transition metal (FRET acceptor), and it provides a conformational readout of both ion and ligand binding. Together with the effect of K+ on Na+-dependent [3H]leucine binding, the findings support the existence of a specific K+ binding site in LeuT and indicate that K+ binding to this site induces an outward-closed conformation.
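      tmFRET data are commonly interpreted with the Förster relation, E = 1 / (1 + (r/R0)^6), where the efficiency E is estimated from the fractional quenching of donor fluorescence by the bound metal. The sketch below shows that conversion. It assumes ideal Förster behavior and uses a hypothetical R0, so it illustrates the principle rather than the analysis used in the manuscript:

```python
def efficiency_from_quenching(f_with_metal, f_without_metal):
    """FRET efficiency estimated as fractional quenching of the donor."""
    return 1.0 - f_with_metal / f_without_metal


def distance_from_efficiency(efficiency, r0):
    """Invert E = 1 / (1 + (r / r0)**6) for the donor-acceptor distance r."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * ((1.0 - efficiency) / efficiency) ** (1.0 / 6.0)


def efficiency_from_distance(r, r0):
    """Förster efficiency for a donor-acceptor distance r."""
    return 1.0 / (1.0 + (r / r0) ** 6)


R0_HYPOTHETICAL = 12.0  # angstrom; illustrative value only
e = efficiency_from_quenching(40.0, 100.0)  # 60% quenching
print(distance_from_efficiency(e, R0_HYPOTHETICAL))
```

      The steep sixth-power dependence is why shifts in quenching report on conformational changes (for example, outward-open versus outward-closed) rather than on absolute distances with high accuracy.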

      It was previously shown that in proteoliposomes containing reconstituted LeuT, intra-vesicular K+ increases the concentrative capacity for [3H]alanine. To obtain insights into the mechanistic basis of this phenomenon, purified LeuT was reconstituted into liposomes containing a variety of cations, including Na+ and K+, followed by measurements of [3H]alanine uptake driven by a Na+ gradient.

      The ionic composition of the external medium was manipulated to determine whether the stimulation of [3H]alanine uptake by K+ was due to an outward-directed potassium gradient serving as a driving force for sodium-dependent substrate transport by moving in the direction opposite to that of sodium and the substrate. Remarkably, it was found that it is intra-liposomal K+ per se that increases the transport rate of alanine, and not a K+ gradient, suggesting that binding of K+ to the intracellular face of the transporter could prevent the rebinding of sodium and the substrate, thereby reducing their efflux from the cell. These conclusions assume that the measured radioactive transport is via right-side-out liposomes rather than their inverted counterparts (in the case of a random orientation of the transporters in the proteoliposomes). Even though this assumption is likely to be correct, it should be tested.

      Since K+ and Na+ binding are competitive and K+ excludes substrate binding, the authors chose to focus on the Na1 site, where the carboxyl group of the substrate serves as one of the groups that coordinate the sodium ion. This was done by introducing conservative mutations of the amino acid residues forming the Na1 site. The potassium interaction in these mutants was monitored by sodium-dependent radioactive leucine binding. Moreover, the effect of Na+ with and without substrate, as well as that of potassium, on the conformational equilibria was measured by tmFRET on mutants introduced into the construct used for these measurements. The results suggest that K+ binding to LeuT modulates substrate transport and that the K+ affinity and selectivity of LeuT are sensitive to mutations in the Na1 site, pointing to the Na1 site as a candidate site for the K+ interaction in some NSS members.

      The data presented in this manuscript are of very high quality. They are a detailed extension of results by the same group (Billesbolle et al., ref. 16) and provide more detailed information on the importance of the Na1 site for the potassium interaction. Clearly, this calls for identification of the binding site in a potassium-bound LeuT structure in the future. Presumably LeuT was studied here because it is relatively easy to determine its structure in many conformational states. Furthermore, convincing evidence showed that the stimulatory effect of K+ on transport is not due to energization of substrate accumulation but rather to the binding of this cation to a specific site.

      Reviewer #1 (Recommendations For The Authors):

      • Include a transport mechanism that can account for the K+ effects.

      We appreciate the opportunity to elaborate further on how we envision this complex mechanism. It is generally known that, within the LeuT-fold transporters, the return step is rate-limiting for the transport process. Our data suggest that K+ binds to the inward-facing apo form.

      Accordingly, we propose that the role of K+ binding is to help LeuT overcome the rate-limiting step. We propose the following mechanistic model: when Na+ and substrate are released to the intracellular environment, the transporter must return to the outward-facing conformation. This can happen in (at least) two ways: 1) The transporter in its apo form closes the inner gate and opens to the extracellular side, ready to perform a new transport cycle. 2) The transporter rebinds Na+, which allows the rebinding of substrate. It can then either run in reverse (efflux) or release its content once again. The transporter can also rebind only Na+ and release it again to the cytosol.

      The purpose of K+ binding is to prevent Na+ rebinding and to promote a conformational state of the transporter, which does not allow Na+ binding. Even though Na+ has a higher affinity for the site, K+ is much more abundant.

      This model is supported by our previous experiment, showing that intravesicular K+ prevents [3H]alanine efflux while LeuT performs Na+-dependent alanine transport. Thus, the increase in Vmax could be due to a decreased efflux (exchange mode), or a facilitation of the rate-limiting step, or a combination of the two.

      Note that the model does not require that K+ is counter-transported. It just has to prevent Na+ rebinding. However, even though we failed to show K+ counter-transport, it does not mean that it does not happen. Further experiments must clarify this issue.
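      The reviewer's request for kinetic modeling can be illustrated with a deliberately minimal sketch. It is not the authors' model: it assumes a hypothetical three-state cycle (outward-facing apo O, inward-facing loaded L, inward-facing apo I) with lumped, made-up rate constants, and it represents K+ only through a competitive block of Na+ rebinding to the inward-facing apo state. Because rebinding pulls transporters back from I to L instead of letting them return to O, lowering the rebinding rate raises steady-state turnover, even though K+ competes with Na+ for the same site:

```python
def cycle_flux(k_in, k_back, k_rel, k_reb, k_ret):
    """Steady-state turnover of a hypothetical cycle O <-> L <-> I -> O.

    k_in/k_back: O -> L and L -> O (outside binding/translocation, lumped)
    k_rel/k_reb: L -> I (intracellular release) and I -> L (rebinding)
    k_ret:       I -> O (return of the apo transporter, rate-limiting)
    """
    # Occupancies relative to L, from the steady-state balance at I and O.
    p_l = 1.0
    p_i = k_rel * p_l / (k_reb + k_ret)
    p_o = (k_back * p_l + k_ret * p_i) / k_in
    total = p_o + p_l + p_i
    # In a single cycle, net flux equals the flux through the one-way step.
    return k_ret * p_i / total


def rebinding_rate(k_reb0, k_conc, ki_k):
    """Na+ rebinding rate reduced by competitive K+ (hypothetical Ki)."""
    return k_reb0 / (1.0 + k_conc / ki_k)


rates = dict(k_in=1.0, k_back=1.0, k_rel=1.0, k_ret=0.1)
no_k = cycle_flux(k_reb=rebinding_rate(1.0, 0.0, 200.0), **rates)
with_k = cycle_flux(k_reb=rebinding_rate(1.0, 200.0, 200.0), **rates)
print(no_k < with_k)  # True: blocking rebinding speeds turnover
```

      The other possibility raised in the response, that K+-bound LeuT returns to the outward-facing state faster, could be explored in the same sketch by also increasing k_ret when K+ is present.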

      To be more explicit about our proposed mechanistic model, we have expanded the last paragraph in the Discussion section. It now reads:

      “We propose that K+ binding either facilitates the LeuT transition from inward- to outward-facing (the rate-limiting step of the transport cycle), or solely prevents the rebinding and possible efflux of Na+ and substrate. It could also be a combination of both. Either way, intracellular K+ will lead to an increase in Vmax and concentrative capacity. Note that our previous experiment showed an increased [3H]alanine efflux when LeuT transports alanine in the absence of intra-vesicular K+ (ref. 16). Specifically, the mechanistic impact of K+ could be to catalyze LeuT away from the state that allows the rebinding of Na+ and substrate. This way, K+ binding would decrease the possible rebinding of intracellularly released Na+ and substrate, thereby rectifying the transport process and increasing the concentrative capacity and Vmax (Figure 6). Our results suggest that K+ is not counter-transported but rather helps LeuT overcome an internal rate-limiting energy barrier. However, further investigations must be performed before any conclusive statement can be made here.”

      • Describe the orientation of the transporter in the vesicles.

      When working with reconstituted NSS, the transport activity is determined by the Na+ gradient. This is also evident in the experiments where we dissipate the Na+ gradient: here we find transport activity comparable to background. The literature also shows that directionality is rarely determined for transport proteins in reconstituted systems. That said, it is difficult to know how the inside-out LeuT contributes to the transport process. Will it work in reverse and contribute to the accumulation of intravesicular [3H]alanine? If so, to what extent? It will likely not be affected by the intravesicular K+. Therefore, its possible contribution will ‘work against’ our results and decrease the apparent K+ effects reported herein. Taken together, unless the vast majority of LeuT molecules are inside-out, knowing the actual proportion will not, from our perspective, affect our interpretations and conclusions.

      That said, we have also been curious about this issue, and prompted by the reviewer's question, we performed the suggested experiment. We have inserted the results in Figure 3 – Figure supplement 1D. The figure shows that a fraction of the reconstituted LeuT is susceptible to thrombin cleavage of the accessible C-terminus. We have quantified the cleaved fraction to around 40% of the total (see Author response image 1 below). It is, however, a crude estimate, since it is difficult to perform reliable densitometry with fractions that close together. Thus, we are reluctant to add a quantitative measure in the article text.

      Author response image 1.
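      The step from the cleaved fraction to an orientation estimate can be written out as follows. This is a sketch under stated assumptions: that LeuT's C-terminus is cytoplasmic in its native orientation, so in intact vesicles only inside-out copies expose the thrombin site, and that digestion of the DDM-solubilized sample defines the maximal cleavable fraction. The band intensities in the example are hypothetical and chosen to reproduce the ~40% estimate quoted above:

```python
def cleaved_fraction(full_length_intensity, cleaved_intensity):
    """Densitometric fraction of the cleaved band in one lane."""
    return cleaved_intensity / (full_length_intensity + cleaved_intensity)


def orientation_fractions(cleaved_intact, cleaved_solubilized):
    """Return (right_side_out, inside_out) fractions of reconstituted LeuT.

    cleaved_intact: fraction cleaved in intact proteoliposomes
    cleaved_solubilized: fraction cleaved after DDM solubilization,
        used to correct for incomplete digestion
    Assumes the thrombin site faces the vesicle exterior only in
    inside-out copies (C-terminus cytoplasmic in the native orientation).
    """
    inside_out = cleaved_intact / cleaved_solubilized
    return 1.0 - inside_out, inside_out


intact = cleaved_fraction(60.0, 40.0)       # hypothetical band intensities
solubilized = cleaved_fraction(0.0, 100.0)  # complete digestion in DDM
print(orientation_fractions(intact, solubilized))  # (0.6, 0.4)
```

      Because both numbers come from neighboring bands of similar mass, small densitometry errors propagate directly into the estimate, which is consistent with the authors' reluctance to report it quantitatively.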

      We have inserted the following in the main text:

      “It is difficult to control the directionality of proteins when they are reconstituted into lipid vesicles; they insert in both orientations, outside-out and inside-out. In the case of LeuT, it is the imposed Na+ gradient that determines the directionality of transport. Uptake through the inside-out transporters will probably also occur. Note that the inside-out LeuT will not have the K+ binding site exposed to the intra-vesicular environment. Accordingly, a proportion of the transporters will likely not be influenced by the added K+, which will tend to mask the contribution of K+ to transport by the right-side-out LeuT. To investigate LeuT directionality in our reconstituted samples, we performed thrombin cleavage of accessible C-termini on intact and perforated vesicles, respectively. The result suggests that the proportion of LeuT inserted as outside-out is larger than the proportion with an inside-out orientation (Figure 3 – Figure supplement 1D).”

      For the inserted Figure 3 – Figure supplement 1D, we have added the following legend:

      “(D) SDS-PAGE analysis of LeuT proteoliposomes following time-dependent thrombin digestion of accessible C-termini (reducing the mass of LeuT by ~1.3 kDa). The reaction was terminated by the addition of PMSF at the specified time points. The lanes corresponding to the time-dependent proteolysis are flanked by lanes containing proteoliposomes without thrombin (left, 0 min) or digested in the presence of DDM (right, 180 min+DDM). Arrows indicate bands of full-length (top) and cleaved (bottom) LeuT.”

      • Check the effects of mutations away from the Na1 cation binding site.

      We have included LeuT K398C in the study as a negative control for unspecific effects on Na+ and K+ binding. The mutant exhibits Na+-dependent [3H]leucine binding and K+ dependency similar to LeuT WT (see Table 2 and Table 2 - Figure Supplement 1G).

      As a minor point, the authors use the term "affinity" liberally. However, unless these are direct binding experiments, the term "apparent affinity" may be more appropriate, since Km values are affected by the transport cycle (in uptake), as well as binding of cations/substrate.

      We thank the reviewer for emphasizing this important point. We have revised the manuscript accordingly. We use ‘affinity’ when it has been determined under equilibrium conditions, either as a SPA binding experiment or based on tmFRET. We use the term ‘Km’ when the apparent affinity has been determined during non-equilibrium conditions such as during substrate transport.

      Reviewer #2 (Recommendations For The Authors):

      As mentioned in part 2, it is important to show the effect of internal potassium on transport in-sided liposomes. This could be done using the methodology developed by Tsai et. al. Biochemistry 51 (2012) 1557-1585.

      We appreciate this important point and have performed the suggested experiment; see our response to Reviewer #1, comment 2.

      In the Abstract and throughout it is mentioned that K+ is not counter transported, yet on the bottom of p. 16 it is mentioned that this is possible.

      We have tried to be very cautious with any interpretation of whether K+ only binds or is also counter-transported. Either way, it must facilitate a transition towards a non-Na+-binding state. We tried to differentiate between the two possibilities by investigating whether an outward-directed K+ gradient alone could drive transport (Figure 3E). We do not observe any significant difference from background (no gradient). However, this experiment is only weakly informative: it is still possible that K+ is counter-transported but that the K+ gradient does not impose any driving force. Instead, it ensures a rectification of the Na+-dependent substrate transport. If so, this experiment would come up negative even if K+ is counter-transported.

      To be more explicit, we have changed the wording on page 16.

      Our results suggest that K+ is not counter-transported, but rather helps LeuT overcome an internal rate-limiting energy barrier. However, further investigations must be performed before any conclusive statement can be made here.

      Fig.2-Fig. Supplement 1: it is important to show that the effect of leucine is sodium-dependent by adding the control K+ and leucine.

      We thank the reviewer for suggesting this important control. We have added the experiment to Figure 2 – Figure supplement 1 as suggested. The effect is not different from K+ alone supporting the SPA-binding data that K+-binding does not promote substrate binding.

      Point for discussion: Whereas potassium is counter-transported in SERT, there are conflicting interpretations on this in DAT (Ref. 15 from the list and Bhat et al., eLife (2021) 10:e67996). The situation in LeuT seems like the scenario described by Bhat et al.

      We appreciate the suggestion of a link between LeuT and hDAT. However, as mentioned above, we find it too early to be certain about this option. We now mention the mechanistic similarity in the Discussion following our description of the proposed mechanistic model (see first request from reviewer #1):

      “If K+ is not counter-transported, LeuT might comply with the mechanism previously suggested for the human DAT (ref. 31).”

      Fig. 5-Fig. Supplement 1: Why are no data on N27Q and N286Q given? If these mutants have no transport activity this should be stated. Moreover, alanine uptake by A22V is almost sodium independent and is also very fast, suggesting binding, not transport. Are the counts sensitive to ionophores like nigericin?

      We appreciate this important point. Indeed, the LeuT N27Q and N286Q are transport inactive. This information is now inserted in the main text when describing the conformational dynamics of N27QtmFRET and N286QtmFRET.

      We agree with the reviewer that the [3H]alanine uptake for A22V is not very conclusive. The vesicles with Na+ on both sides (open diamonds) do allow [3H]alanine binding. Vesicles with added gramicidin are similar in activity. The fast rate could indeed suggest a binding event. This we also do not rule out in the main text. However, the contribution in activity from LeuT A22V in vesicles with a Na+ gradient cannot be explained by a binding event alone. Then it should bind more [3H]alanine in the presence of a Na+ gradient, which is possible, but hard to imagine. Also, the alanine affinity for LeuT A22V is ~1 µM (Table 1). At this affinity it should be literally impossible to detect any binding because the off-rate is so fast that it would all dissociate during the washing procedure.

      We have described the data and left out any interpretation (e.g. changed ‘[3H]alanine transport’ to ‘[3H]alanine activity’). In addition, we have replaced: “This correlates with the lack of changes in conformational equilibrium observed in the tmFRET data between the NMDG+, Na+ and K+ states.” with: “Further investigations must clarify whether the changes in observed [3H]alanine activity constitutes a transport- or a binding event.”

      Lower part of p. 16. The Authors speculate "that the mechanistic impact of K+ binding could be to accelerate a transition away from the conformation where Na+ and substrate are released, to a state where they can no longer rebind and thus revert the transport process (efflux)". This could be easily tested by measuring exchange, which should not be influenced by potassium.

      We performed this experiment in Billesbolle et al. 2016. Nat Commun (Fig. 1f). We show that the exchange is decreased in the presence of K+. We hypothesize that this is because K+ binding forces LeuT away from the exchange mode.

    Author Response

      The following is the authors’ response to the original reviews.

      Positive comments:

      We appreciate the positive comments of the editor and reviewers. The editor noted that the paper presents a “technological advance” that has enabled “important insights about the brain circuits through which the cerebellum could participate in social interactions.” Reviewer 1 thought this was a “timely and important study with solid evidence for correlative conclusions” and that the experiments were “technically challenging” and “well-performed”. Reviewer 2 stated that the finding of correlated activity between the regions is “interesting as non-motor functions of the cerebellum are relatively little explored.” They also thought “that the data are presented clearly, and the manuscript is well-written”. Reviewer 3 mentioned that “this approach can be useful for many neuroscientists”. We thank all the positive comments from the editors and all the reviewers.

      Reviewer #1 (Public Review)

      While the novelty of the device is strongly emphasized, I find that its value is somewhat diminished by the wire-free device developed by the same group as it should thus be possible to perform calcium imaging wire-free and electrophysiological recording via a single conventional cable (or also via wireless headstages).

      While it would be potentially possible to use a wire-free Miniscope in parallel with a wired electrophysiology recording system, this would result in a larger footprint on the animal’s head, more than a gram in increased weight due to an added LiPo battery, a larger electrophysiology head-stage, and limited recording length due to a battery capacity of around 20 minutes. Our main goal for the development of the E-scope platform was to develop an expandable electrophysiology recording board that would work with all previously built UCLA Miniscopes while also streamlining the integration of power and data into the coaxial cable connection already familiar to hundreds of labs using Miniscopes. The vast majority of Miniscope experiments are done using wired systems and we aimed to support the expansion of those systems instead of requiring a more substantial switch to using wire-free Miniscopes.

      The role of the identified network activations in social interactions is not touched upon.

      We agree with the reviewer that we have not discovered a causal role for the co-modulated activity patterns we have observed. As these causal experiments will require the development of real-time techniques for blocking socially evoked changes in firing rate in cerebellum and ACC, we are currently planning experiments to address causality. These results will be described in a future publication.

      Reviewer #1 (Recommendations for the Authors):

      Please provide the number of recorded mice.

      The number is now provided in the revised manuscript.

      If the recorded areas (cerebellar cortex, DN, and ACC) are part of the same circuit regulating social interactions, it would be nice to get insights into the directionality of the circuit. The authors favor the possibility that during social behavior, cerebellar efferences indirectly influence ACC activities (as in Figure 4A), however, no evidence is presented to support this interpretation. ACC activities might also indirectly influence PC firing. It may be possible to get insights into this by comparing the timing of neuronal activity in the different areas with respect to social onset.

      For this study, we mainly focused on the output of the cerebellar circuit to the cortex, as previous work shows that the dentate nucleus projects to the thalamus, which in turn projects to ACC and other cortical regions (Badura et al., eLife, 2018; Kelly et al., Nat. Neurosci., 2020). The temporal resolution of calcium imaging is limited (the rise time of calcium events with genetically encoded indicators is hundreds of milliseconds), so it is insufficient to precisely assess the relative onset timing of the two regions. Our work certainly does not rule out cortical influences on PC firing.

      Reviewer #2 (Public Review)

      However, the causal relationship is far from established with the methods used, leaving it unclear if these two brain regions are similarly engaged by the behavior or if they form a pathway/loop.

      As indicated in our response to Reviewer #1’s similar critique, the goal of the presented study is to demonstrate the feasibility and capabilities of this novel device. This new tool will allow us to conduct a comprehensive and rigorous study to assess the causal role of the interactions between the cerebellum and ACC in social behavior (as well as other behaviors). These experiments are being designed now.

      Reviewer #2 (Recommendations for the Authors):

      It is unclear what is entirely unique about the E-scope. It seems that its advance is simply a common cable that allows interfacing with both devices (lighter weight than two cables is stated in the Discussion). Is this really an advance? What are its limitations? E.g., how close can the recording sites be to one another? How can it be configured for any other extracellular recording approach (tetrodes, 64-channel arrays, or Neuropixels)?

      In our experience, multiple wire tethers to different head-mounted devices on an animal significantly impact its behavior. Therefore, one of the major advantages of the UCLA Miniscope Platform is the use of a single, flexible coaxial cable to minimize the impact of tethering on behavior. The E-Scope platform builds on this work by incorporating electrophysiology recording capabilities into this single, flexible coaxial cable. Additionally, the electrophysiology recording hardware is backwards compatible with all previously built UCLA Miniscopes and can run through the open-source and commercial commutators already used in Miniscope experiments.

      The available bandwidth of the single shared coaxial cable can carry megapixel Miniscope imaging along with the maximum data output of a 32-channel Intan Ephys IC. The E-Scope platform presented here runs the Intan Ephys IC at 20 kSps for all 32 channels instead of the maximum 30 kSps because of microcontroller speed limitations, but this could be overcome by using a faster microcontroller or clock, or by slightly reducing the total number of electrodes sampled. Finally, the E-Scope was designed to support any electrode type supported by the Intan Ephys IC. This includes up to 32 channels of passive probes such as single electrodes, tetrodes, silicon probes, and flexible multi-channel arrays, but not Neuropixels, which use custom active electronics on the probe to multiplex, sample, and serialize electrophysiology data.
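      The electrophysiology share of the link can be checked with simple arithmetic. The sketch assumes 16-bit samples from the Intan amplifier chip and ignores framing and protocol overhead, so the numbers are lower bounds on what the cable and microcontroller must handle:

```python
def ephys_bitrate(n_channels, sample_rate_hz, bits_per_sample=16):
    """Raw electrophysiology payload in bits per second (no overhead)."""
    return n_channels * sample_rate_hz * bits_per_sample


for rate_hz in (20_000, 30_000):
    mbps = ephys_bitrate(32, rate_hz) / 1e6
    print(f"32 ch @ {rate_hz // 1000} kSps: {mbps:.2f} Mbit/s")
# 32 ch @ 20 kSps: 10.24 Mbit/s
# 32 ch @ 30 kSps: 15.36 Mbit/s
```

      In terms of conversions, running at 30 kSps would require the microcontroller to handle 960,000 samples per second instead of 640,000, which is the speed limitation described above.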

      The authors only analyzed simple spikes in PCs for social-related activity. What about complex spikes? Is this correlated with ACC activity?

      Complex spikes were detectable to the extent that we were able to define that the recorded cell was a PC, but because these cells were recorded in freely behaving mice, accurate complex spike detection was not reliable enough to be used for further correlational analyses.

      The data is sampled in the two regions (cerebellum and ACC) at very different rates (imaging is much slower than electrophysiology; ephys data was binned). How does this affect the correlation plots?

      We generated firing rate maps for the cerebellar neural activity using a bin size that matched the sampling frequency of calcium imaging (see Methods). Specifically, to study the relationship between the electrophysiology and calcium imaging data, we binned the spike trains using 33 ms bins to match the calcium imaging sampling rate (~30 Hz). This limits the temporal resolution available for fine-scale correlations, but the correlations that we report are on a behaviorally relevant timescale. The fine temporal resolution of the electrophysiology data can, however, still be used to examine the relationship between cerebellar output and specific social behavior epochs in more detail.
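      Binning spikes to the imaging clock, as described above, can be sketched with NumPy. The sketch assumes hypothetical inputs: spike and frame timestamps on a shared clock in seconds, and one calcium (for example ΔF/F) value per frame. Consecutive frame timestamps serve as bin edges, so the last frame has no bin and is dropped from the calcium trace before computing a lag-0 Pearson correlation:

```python
import numpy as np


def spikes_per_frame(spike_times_s, frame_times_s):
    """Firing rate (Hz) in each interval between consecutive frames."""
    edges = np.asarray(frame_times_s, dtype=float)
    counts, _ = np.histogram(spike_times_s, bins=edges)
    return counts / np.diff(edges)


def lag0_correlation(spike_times_s, frame_times_s, calcium_trace):
    """Pearson r between binned spiking and the frame-aligned calcium trace."""
    rate = spikes_per_frame(spike_times_s, frame_times_s)
    trace = np.asarray(calcium_trace, dtype=float)[: rate.size]
    return np.corrcoef(rate, trace)[0, 1]


frames = np.arange(0, 300) * 0.033  # ~30 Hz imaging clock
rng = np.random.default_rng(0)
spikes = np.sort(rng.uniform(0, frames[-1], 2000))
rate = spikes_per_frame(spikes, frames)
print(rate.size)  # 299 bins for 300 frames
```

      Using the recorded frame timestamps as edges, rather than a nominal 33 ms grid, keeps the two modalities aligned even if the imaging frame rate drifts slightly.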

      For the correlation analysis, over what time frame was the activity relationship examined? How was this duration determined?

      Author response image 1.

      The main criterion for the time frame used in the correlation analysis was the behavioral timescale of social interaction [see figure above for the number of social (red) and object (blue) interaction bouts (a), their duration (b) and coefficient of variation (CV) (c)]. Overall, the time frame was based on the average duration of the social interactions (~3 s). We analyzed periods from 3.8 s before to 5.8 s after interaction onset. Accordingly, the cross-correlograms were constructed using a maximum lag of 5 s. In the article, we report the correlation at lag 0.
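      A lagged correlation with the parameters stated above (33 ms bins, maximum lag of 5 s, correlation reported at lag 0) might be computed as in the sketch below. It assumes two equal-length, already binned signals and computes a plain Pearson correlation at each shift, which is not necessarily the exact normalization used in the manuscript:

```python
import numpy as np


def cross_correlogram(x, y, bin_s=0.033, max_lag_s=5.0):
    """Pearson r between x and y at each lag; positive lag means y lags x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    max_lag = int(max_lag_s / bin_s)  # 151 bins for 5 s at 33 ms
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.empty(lags.size)
    for i, lag in enumerate(lags):
        if lag >= 0:
            a, b = x[: x.size - lag], y[lag:]  # pairs x[t] with y[t + lag]
        else:
            a, b = x[-lag:], y[: y.size + lag]  # pairs x[t + |lag|] with y[t]
        r[i] = np.corrcoef(a, b)[0, 1]
    return lags * bin_s, r


rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
y = np.roll(x, 10)  # y is x delayed by 10 bins
lags_s, r = cross_correlogram(x, y)
print(round(lags_s[np.argmax(r)], 3))  # 0.33
```

      The value at the central lag (index max_lag) is the lag-0 correlation reported in the article; the remaining lags indicate whether one signal systematically leads the other.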

      The relationship between the cerebellum and ACC seems unconvincing. If two brain regions are similarly engaged by the behavior, wouldn't they have a high correlation? Is the activity in one region driving the other?

      We cite studies showing an indirect anatomical and functional connection between the cerebellum and the ACC or prefrontal cortex (Badura et al., eLife, 2018). Also, as stated in the introduction, the ACC is a recognized brain area in studies of social behavior. In the results, we state that increased correlations within groups of neurons that are similarly engaged during a specific epoch of the social interaction were an expected finding. What was not expected was that the distribution of correlations did not differ when the social epochs were removed, suggesting that intrinsic connectivity does not drive the difference in correlations.

      However, because there is a cerebello-cortical loop, further study will be needed to understand which area initiates this type of activity during social behavior.

      • In the figures, the color-coded scale bars should be labeled as z-scores (confusing without them).

      • In Figure 4, the color differences for Soc-ACC, Soc+ACC and SocNS ACC should be more striking as it is hard to tell them apart because they are all similar shades of blue-gray.

      We thank the reviewer for their suggestions for improving the figures. We have incorporated these changes in Figures 2 and 3 and their figure supplements. Graphs in Figure 4D-G have been edited to make the lines more visible to the reader.

      Reviewer #3 (Public Review)

      However, a mouse weighs between 20 and 40 g, so that an implant of 4.5 g is still quite considerable. It can be expected that this has an impact on the behavior and, possibly, the well-being of the animals. Whether this is the case or not, is not really addressed in this study.

      In our experience, the weight of the E-Scope (4.5 g) is near the maximum tolerated by mice. We therefore acclimated each mouse to the weight with dummy scopes of increasing weight over a 7-10 day period, during which the animals showed normal exploratory behavior. Specifically, there is no change in the sociability of the animals (Figure 2A), and the animals cover the large arena (48 × 48 cm, Figure 2H).

      Overall, the description of animal behavior is rather sparse. The methods state only that stranger age-matched mice were used, but do not state their sex. The nature of the social interactions was not described. Was there aggressive behavior, sexual approach and/or intercourse? Did the stranger mice attack or damage the E-Scope? Were the interactions comparable (using which parameters?) with and without the E-Scope attached? It is not even described what the authors define as an "interaction bout" (Figure 2A). The number of interaction bouts is counted per 7 minutes, I presume? This is not specified explicitly.

      As mentioned in the methods section of the original version of our manuscript, all the target mice were age-matched male mice. As per the reviewer’s suggestion, we have now added to the manuscript that, before any of our social interaction experiments, aggressive or agitated mice were removed after assessing their behavior in the arena during habituation. In all trials, the mice were meeting each other for the first time.

      We also mention in the methods section of our manuscript, that social behaviors were evaluated by proximity between the subject mouse and novel target mouse (2 cm from the body, head, or base of tail). From our recordings, we did not observe any aggressive, mounting, nor any other dominance behavior over the E-Scope subject mouse during the 7 minutes of social interaction assessment. Social interaction bouts in Figure 2A show the average number of social interaction bouts during the recording time. This has now been expanded upon in our revised manuscript.
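      The proximity criterion described above translates into bout detection roughly as follows. This is a minimal sketch assuming a hypothetical per-frame distance (cm) between the subject mouse and the nearest of the target mouse's body, head, or base of tail. The response does not say whether short gaps are merged or whether a minimum bout duration is imposed, so neither is done here:

```python
import numpy as np


def interaction_bouts(dist_cm, fps, threshold_cm=2.0):
    """Return (start_s, end_s) for each run of frames within threshold_cm."""
    close = np.asarray(dist_cm, dtype=float) <= threshold_cm
    # Pad with False so bouts touching the first or last frame are closed.
    padded = np.concatenate(([False], close, [False]))
    changes = np.diff(padded.astype(int))
    starts = np.flatnonzero(changes == 1)  # first frame of each bout
    ends = np.flatnonzero(changes == -1)   # frame after the last bout frame
    return [(float(s) / fps, float(e) / fps) for s, e in zip(starts, ends)]


bouts = interaction_bouts([5, 1, 1, 5, 5, 1.5, 5], fps=1)
print(bouts)  # [(1.0, 3.0), (5.0, 6.0)]
```

      The bout count per session and the durations (end minus start) are then the quantities summarized per 7-minute recording.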

      It would be very insightful if the authors would describe which events they considered to be action potentials, and which not. Similarly, the raw traces of Figure 1E are declared to be single-unit recordings of Purkinje cells. Partially due to the small size of the traces (invisible in print and pixelated in the digital version), I have a hard time recognizing complex spikes and simple spikes in these traces. This is a bit worrisome, as the authors declare the typical duration of the pause in simple spike firing after a complex spike to be 20-100 ms. In my experience, such long pauses are rare in this region, and definitely not typical. In the right panel of Figure 1A, an example of a complex spike-induced pause is shown. This pause is around 15 ms, so not typical according to the text, and starts only around 4 ms after the complex spike, which should not be the case and suggests either a misalignment of the figure or the detection of complex spike spikelets as simple spikes, while the abnormally long pause suggests that the authors fail to detect a lot of simple spikes. The authors could provide more confidence in their data by including more raw data, making explicit how they analyzed the signals, and by reporting basic statistics of firing properties (like rate, cv or cv2, pause duration). In this respect, Figure 2 - figure supplement 3 shows quite a large percentage of cells to have either a very low or a very high firing rate.

      We now provide a better example of simple spikes and complex spikes in Fig 1E and have corrected our statement in the body of the manuscript. As the reviewer notes, the previous version of the SS x CS cross-correlation histogram in Figure 1G was not the best example, because of the detected CS spikelets. However, the detection of CS spikelets has little impact on the interpretation of the results. We have replaced this figure with a better example of the SS x CS cross-correlation histogram.
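      The basic firing statistics requested by the reviewer (rate, CV, CV2, and the pause after a complex spike) are straightforward to compute once spikes are sorted. A minimal sketch, assuming simple-spike (SS) and complex-spike (CS) timestamps in seconds; CV2 is computed as the mean of 2|ISI(n+1) - ISI(n)| / (ISI(n+1) + ISI(n)), and the function names are hypothetical:

```python
import numpy as np


def firing_stats(ss_times_s):
    """Mean rate, CV and CV2 of the inter-spike intervals (ISIs)."""
    isi = np.diff(np.sort(np.asarray(ss_times_s, dtype=float)))
    rate_hz = 1.0 / isi.mean()   # rate from the mean ISI
    cv = isi.std() / isi.mean()  # global irregularity
    # Local irregularity: compares each ISI with the next one.
    cv2 = np.mean(2.0 * np.abs(np.diff(isi)) / (isi[1:] + isi[:-1]))
    return {"rate_hz": rate_hz, "cv": cv, "cv2": cv2}


def cs_pauses(ss_times_s, cs_times_s):
    """Time from each CS to the next SS (CSs with no later SS are skipped)."""
    ss = np.sort(np.asarray(ss_times_s, dtype=float))
    cs = np.asarray(cs_times_s, dtype=float)
    idx = np.searchsorted(ss, cs, side="right")
    valid = idx < ss.size
    return ss[idx[valid]] - cs[valid]


# Alternating 10 ms / 30 ms ISIs: mean rate 50 Hz, CV 0.5, CV2 1.0
isis = np.tile([0.010, 0.030], 50)
spikes = np.concatenate(([0.0], np.cumsum(isis)))
print({k: round(float(v), 3) for k, v in firing_stats(spikes).items()})
# {'rate_hz': 50.0, 'cv': 0.5, 'cv2': 1.0}
```

      The pause measure is sensitive to exactly the issues raised by the reviewer: missed simple spikes lengthen it, and CS spikelets misclassified as simple spikes shorten it.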

      The number of Purkinje cells recorded during social interactions is quite low: only 11 cells showed a modulation in their spiking activity (it is unclear whether in complex spikes, simple spikes, or both). During object interaction, only 4 cells showed a significant modulation. It is unclear whether these 4 are a subset of the 11, or whether "social cells" and "object cells" are different categories. With so few cells, each showing different types of modulation, the group of cells for each type of modulation is very small, going down to 2 cells per group. It is doubtful whether meaningful interpretation is possible here.

      While the number of neurons is not as high as those reported for other regions, the neurons presented depict the full range of responses to social behavior. It is extremely difficult to obtain stable neurons in freely behaving, socially interacting animals, and only a handful of neurons could be recorded in each animal. Among these recorded neurons, only a subset responds to social interactions, further reducing the numbers. The results, however, are consistent among cell types, and the direction of modulation fits with the inhibitory connectivity between PCs and DN neurons. To our knowledge, we are the first group to publish the activity of PC and DN neurons from freely behaving mice during social behavior.

      Neural activity patterns observed during social interaction do not necessarily relate specifically to social interaction, but can also occur in a non-social context. The authors control for this by comparing social interactions with object interactions, but I miss a direct comparison between the two conditions, both in terms of behavior (currently only the number of interactions is counted, not their duration or intensity) and in terms of neural activity. There is some analysis of the interaction between movement and cerebellar activity (Figure 2 - figure supplement 4), but it is unclear to what extent social interactions and movements are separated here. It would already help to mark the social interactions in the trajectory plots (e.g., Fig. 2H), for example by plotting social interaction-related movements in red and the rest of the trajectories in black.

      We have updated the social interaction plots in Figure 2H in the revised version of the manuscript.

      Reviewer #3 (Recommendations for the Authors):

      Increase the number of cerebellar neurons that are recorded.

      Due to the difficulty of the experiment and the low yield of cerebellar recordings, substantially increasing the number of neurons would require many more experiments, which are not feasible at this time.

      Include more raw data and make the analysis procedure more insightful with illustrations of intermediate steps.

      We have included a more thorough description of the analysis in the methods section of the revised manuscript.

      Provide a better description of the behavior.

      We have increased the level of detail regarding the mouse behavior in the Results and Methods sections. This includes a more detailed description of the parameters we used to analyze the social interaction.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Summary:

      The authors aim to test the sensory recruitment theory of visual memory, which assumes that visual sensory areas are recruited for working memory, and that these sensory areas represent visual memories in a similar fashion to how perceptual inputs are represented. To test the overlap between working memory (WM) and perception, the authors use coarse stimulus (aperture) biases that are known to account for (some) orientation decoding in the visual cortex (i.e., stimulus energy is higher for parts of an image where a grating orientation is perpendicular to an aperture edge, and stimulus energy drives decoding). Specifically, the authors show gratings (with a given "carrier" orientation) behind two different apertures: one is a radial modulator (with maximal energy aligned with the carrier orientation) and the other an angular modulator (with maximal energy orthogonal to the carrier orientation). When the subject detects contrast changes in these stimuli (the perceptual task), orientation decoding only works when training and testing within each modulator, but not across modulators, showing the impact of stimulus energy on decoding performance. Instead, when subjects remember the orientation over a 12s delay, orientation decoding works irrespective of the modulator used. The authors conclude that representations during WM are therefore not "sensory-like", given that they are immune to aperture biases. This invalidates the sensory recruitment hypothesis, or at least the part assuming that when sensory areas are recruited during WM, they are recruited in a manner that resembles how these areas are used during perception.

      Strengths:

      Duan and Curtis very convincingly show that aperture effects that are present during perception do not appear to be present during the working memory delay. Especially when the debate about "why can we decode orientations from human visual cortex" was in full swing, many may have quietly assumed this to be true (e.g., "the memory delay has no stimuli, and ergo no stimulus aperture effects"), but it is definitely not self-evident and nobody thought to test it directly until now. In addition to the clear absence of aperture effects during the delay, Duan and Curtis also show that when stimulus energy aligns with the carrier orientation, cross-generalization between perception and memory does work (which could explain why perception-to-memory cross-decoding also works). All in all, this is a clever manipulation, and I'm glad someone did it, and did it well.

      Weaknesses:

      There seems to be a major possible confound that prohibits strong conclusions about "abstractions" into "line-like" representations, which is spatial attention. What if subjects simply attend to the endpoints of the carrier grating, or to the edge of the screen where the carrier orientation "intersects", in order to do the task? This may also result in reconstructions that have higher BOLD at areas close to the stimulus/screen edges along the carrier orientation. The question then would be whether this is truly an "abstracted representation", or whether subjects are merely using spatial attention to do the task.

      Alternatively (and this reaches back to the "fine vs coarse" debate), another argument could be that during memory, what we are decoding is indeed fine-scale inhomogeneous sampling of orientation preferences across many voxels. This is clearly not the most convincing argument, as the spatial reconstructions (e.g., Figure 3A and C) show higher BOLD for voxels with receptive fields that are aligned to the remembered orientation (which is in itself a form of coarse-scale bias), but it could still play a role.

      To conclude that the spatial reconstruction from the data indeed comes from a line-like representation, you'd need to generate modeled reconstructions of all possible stimuli and representations. Yes, Figure 4 shows that a line results in a modeled spatial map that resembles the WM data, but many other stimuli might too, and some may better match the data. For example, the alternative hypothesis (attention to grating endpoints) may very well lead to a model output very comparable to the one from a line. However, testing this would not suffice, as there may be an inherent inverse problem (with multiple stimuli that can lead to the same visual field model).

      The main conclusion, and title of the paper, that visual working memories are abstractions of percepts, is therefore not supported. Subjects could be using spatial attention, for example. Furthermore, even if it is true that gratings are abstracted into lines, this form of abstraction would not generalize to any non-spatial feature (e.g., color cannot become a line, contrast cannot become a line, etc.), which means it has limited explanatory power.

      We thank the reviewer for bringing up these excellent questions.

      First, to test the alternative hypothesis of spatial attention, we fed a dot image into the image-computable model. We placed the dot where we suspect one might place their spatial attention, namely, at the edge of the stimulus that is tangent to the orientation of the grating. We generated the model response for three orientations and their combination by rotating and averaging. Author response image 1 below shows that this model does not match the line-like representation we reported. Nonetheless, we would like to avoid arguing that attention plays no role. We strongly suspect that if one were attending to multiple places along a path that makes up a line, it would produce the results we observed. But this introduces a circularity in the logic, where one cannot distinguish between attention to a line-like representation and a line of attention being the line-like representation.

      Author response image 1.

      Reconstruction maps for the dot image at the stimulus edge for 15°, 75°, and 135°, and combined across the three orientation conditions.

      Second, we remain agnostic to the question of whether fine-scale inhomogeneous sampling of orientation-selective neurons may drive some of the decoding results we report here. It is possible that our line-like representations are driven by neurons tuned to the sample orientation that have receptive fields that lie along the line. Here, we instead focus on testing the idea that WM decoding does not depend on aperture biases.

      Finally, we agree with the reviewer that there is much more work to be done in this area. Our working hypothesis, that WM representations are abstractions of percepts, is admittedly based on Occam's razor and an appeal to efficient coding principles. We also agree that these results may not generalize to all forms of WM (e.g., color). As always, there is a tradeoff between interpretability (visual spatial formats in retinotopically organized maps) and generalizability. Frankly, we have no idea how one might test these ideas when subjects might be using the most common type of memory reformatting: linguistic representations, which are incredibly efficient.

      Additional context:

      The working memory and perception tasks are rather different. In this case, the perception task does not require the subject to process the carrier orientation (which is largely occluded, and possibly not that obvious without paying attention to it), but attention is paid to contrast. In this scenario, stimulus energy may dominate the signal. In the WM task, subjects have to work out what orientation is shown to do the task. Given that the sensory stimulus in both tasks is brief (1.5s during memory encoding, and 2.5s total in the perceptual task), it would be interesting to look at decoding (and reconstructions) for the WM stimulus epoch. If abstraction (into a line) happens in working memory, then this perceptual part of the task should still be susceptible to aperture biases. It allows the authors to show that it is indeed during memory (and not merely the task or attentional state of the subject) that abstraction occurs.

      Again, this is an excellent question. We used a separate perceptual task, rather than the stimulus epoch, as a control mainly for two reasons. First, we used a control task in which participants had to process the contrast, not the orientation, of the grating because we were concerned that participants would reformat the grating into a line-like representation to make their judgments. To avoid this, we used a task similar to the one used when previous researchers first found the stimulus vignetting effect (Roth et al., 2018). Again, our main goal was to focus on the bottom-up visual features. Second, because of the sluggishness of the BOLD response, combined with our task design (i.e., the memory delay always followed the target stimulus), we cannot disentangle the visual and memory responses that co-exist in this epoch. Any result could be misleading.

      What's also interesting is what happens in the passive perceptual condition, and the fact that spatial reconstructions for areas beyond V1 and V2 (i.e., V3, V3AB, and IPS0-1) align with (implied) grating endpoints, even when an angular modulator is used (Figure 3C). Are these areas also "abstracting" the stimulus (in a line-like format)?

      We agree these findings are interesting, and they replicate what we found in our previous paper (Kwak & Curtis, Neuron, 2022). We believe that these results do imply that these areas indeed store a reformatted line-like WM representation that is not biased by vignetting. We would like to add a note of caution, however, because the decoding results in the higher-order areas (V3AB, IPS0-1, etc.) are somewhat poor, especially in comparison to V1, V2, and V3 (see Figure 2).

      Reviewer #2:

      Summary:

      According to the sensory recruitment model, the contents of working memory (WM) are maintained by activity in the same sensory cortical regions responsible for processing perceptual inputs. A strong version of the sensory recruitment model predicts that stimulus-specific activity patterns measured in sensory brain areas during WM storage should be identical to those measured during perceptual processing. Previous research casts doubt on this hypothesis, but little is known about how stimulus-specific activity patterns during perception and memory differ. Through clever experimental design and rigorous analyses, Duan & Curtis convincingly demonstrate that stimulus-specific representations of remembered items are highly abstracted versions of representations measured during perceptual processing and that these abstracted representations are immune to aperture biases that contribute to fMRI feature decoding. The paper provides converging evidence that neural states responsible for representing information during perception and WM are fundamentally different, and provides a potential explanation for this difference.

      Strengths:

      (1) The generation of stimuli with matching vs. orthogonal orientations and aperture biases is clever and sets up a straightforward test regarding whether and how aperture biases contribute to orientation decoding during perception and WM. The demonstration that orientation decoding during perception is driven primarily by aperture bias while during WM it is driven primarily by orientation is compelling.

      (2) The paper suggests a reason why orientation decoding during WM might be immune to aperture biases: by weighting multivoxel patterns measured during WM storage by spatial population receptive field estimates from a different task, the authors show that remembered, but not actively viewed, orientations form "line-like" patterns in retinotopic cortical space.

      We thank the reviewer for noting the strengths in our work.

      Weaknesses:

      (1) The paper tests a strong version of the sensory recruitment model, where neural states representing information during WM are presumed to be identical to neural states representing the same information during perceptual processing. As the paper acknowledges, there is already ample reason to doubt this prediction (see, e.g., earlier work by Kok & de Lange, Curr Biol 2014; Bloem et al., Psych Sci, 2018; Rademaker et al., Nat Neurosci, 2019; among others). Still, the demonstration that orientation decoding during WM is immune to aperture biases known to drive orientation decoding during perception makes for a compelling demonstration.

      We agree with the reviewer, and would add that the main problem with the sensory recruitment model of WM is that it remains underspecified. The work cited above and in our paper, and the results in this report, are only the beginning of efforts to fully detail what it means to recruit sensory mechanisms for memory.

      (2) Earlier work by the same group has reported line-like representations of orientations during memory storage but not during perception (e.g., Kwak & Curtis, Neuron, 2022). It's nice to see that result replicated during explicit perceptual and WM tasks in the current study, but I question whether the findings provide fundamental new insights into the neural bases of WM. That would require a model or explanation describing how stimulus-specific activation patterns measured during perception are transformed into the "line-like" patterns seen during WM, which the authors acknowledge is an important goal for future research.

      We agree with the reviewer that perhaps some might see the current results as an incremental step given our previous paper. However, we would point out that researchers have been decoding memorized orientation from the early visual cortex for 15 years, and not one of those highly impactful studies had ever done what we did here, which was to test if decoded WM representations are the product of aperture biases. Not only do our results indicate that decoding memorized orientation is immune to these biases, but they critically suggest a reason why one can decode orientation during WM.

      Reviewer #3:

      Summary:

      In this work, Duan and Curtis addressed an important issue related to the nature of working memory representations. This work is motivated by findings illustrating that orientation decoding performance for perceptual representations can be biased by the stimulus aperture (modulator). Here, the authors examined whether the decoding performance for working memory representations is similarly influenced by these aperture biases. The results provide convincing evidence that working memory representations have a different representational structure, as the decoding performance was not influenced by the type of stimulus aperture.

      Strengths:

      The strength of this work lies in the direct comparison of decoding performance for perceptual representations with working memory representations. The authors take a well-motivated approach and illustrate that perceptual and working memory representations do not share a similar representational structure. The authors test a clear question, with a rigorous approach and provide convincing evidence. First, the presented oriented stimuli are carefully manipulated to create orthogonal biases introduced by the stimulus aperture (radial or angular modulator), regardless of the stimulus carrier orientation. Second, the authors implement advanced methods to decode the orientation information present, in visual and parietal cortical regions, when directly perceiving or holding an oriented stimulus in memory. The data illustrates that working memory decoding is not influenced by the type of aperture, while this is the case in perception. In sum, the main claims are important and shed light on the nature of working memory representations.

      We thank the reviewer for noting the strengths in our work.

      Weaknesses:

      I have a few minor concerns that, although they don't affect the main conclusion of the paper, should still be addressed.

      (1) Theoretical framing in the introduction: Recent work has shown that decoding of orientation during perception does reflect orientation selectivity, and it is not only driven by the stimulus aperture (Roth, Kay & Merriam, 2022).

      Excellent point, and similar to the point made by Reviewer 1. We have now adjusted our text and cite this paper in the Introduction.

      Below, we paste our response to Reviewer 1:

      “Second, we remain agnostic to the question of whether fine-scale inhomogeneous sampling of orientation-selective neurons may drive some of the decoding we report here. It is possible that our line-like representations are driven by neurons tuned to the sample orientation that have receptive fields that lie along the line. Here, we instead focus on testing the idea that WM decoding does not depend on aperture biases.”

      (2) Figure 1C illustrates the principle of how the radial and angular modulators bias the contrast energy extracted by the V1 model, which in turn would influence orientation decoding. It would be informative if the carrier orientations used in the experiment were shown in this figure, or at a minimum it would be mentioned in the legend that the experiment used 3 carrier orientations (15°, 75°, 135°) clockwise from vertical. Relatedly, when trying to find more information regarding the carrier orientation, the 'Stimuli' section of the Methods incorrectly mentions that 180 orientations are used as the carrier orientation.

      We apologize for not clearly indicating the stimulus features in the figure. We have now added the information about the target orientations to the Figure 1C legend. We have also corrected the mistakes in the Methods section about the carrier orientation and the details of the task. Briefly, participants were asked to give a continuous report over 180 orientations. We now clarify that “We generated 180 orientations for the carrier grating to cover the whole orientation space during the continuous report task.”

      (3) The description of the image-computable V1 model in the Methods is incomplete, and at times inaccurate. i) The model implements 6 orientation channels, which is inaccurately referred to as a bandwidth of 60° (should be 180/6 = 30°). ii) The steerable pyramid combines information across phase pairs to obtain a measure of contrast energy for a given stimulus. Here, it is only mentioned that the model contains different orientation and spatial scale channels. I assume there were also 2 phase pairs, and they were combined in some manner (squared and summed to create contrast energy). Currently, it is unclear what the model output represents. iii) The spatial scale channel with the maximal response differences between the 2 modulators was chosen as the final model output. What spatial frequency does this channel refer to, and how does this spatial frequency relate to the stimulus?

      (i) First, we thank the reviewer for pointing out this mistake: the range of orientations should be 180° instead of 360°. We have corrected this in the revised version.

      (ii) Second, we apologize for not being clear. In the second paragraph of the “Simulate model outputs” section, we wrote,

      “For both types of stimuli, we used three target orientations (15°, 75°, and 135° clockwise from vertical), which had two kinds of phases for both the carriers and the modulators. We first generated the model’s responses to each target image separately, then averaged the model responses across all phases for each orientation condition.”

      We have corrected this text by now writing,

      “For both types of stimuli, we used three target orientations (15°, 75°, and 135° clockwise from vertical), two phases for the carrier (0 or π), and two phases for the modulator (sine or cosine phase). We first generated the model responses to each phase condition separately, then averaged them across all phases for each orientation condition.”

      (iii) Third, and again, we apologize for the lack of clarity. Since both modulated gratings have the same spatial frequency, the channel with the largest response should correspond to the spatial frequency of the stimulus. We corrected this by now writing,

      “For the final predicted responses, we chose the subband with maximal responses (the 9th level), which corresponds to the spatial frequency of the stimulus (Roth, Heeger, and Merriam 2018).”
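      To make the model description in points (i) to (iii) concrete (six orientation channels 30° apart, a quadrature phase pair squared and summed into contrast energy, responses averaged across phase conditions, and the subband with the largest response chosen), here is a minimal Python/NumPy sketch. It is not the steerable pyramid used in the manuscript: as a simplifying assumption it swaps in quadrature Gabor filters, and every function name, filter size, sigma, and candidate spatial frequency below is hypothetical.

```python
import numpy as np

# Hypothetical stand-in for a steerable pyramid: quadrature Gabor pairs.
# Filter size, sigma, and spatial frequencies are illustrative assumptions.


def quadrature_pair(size, sf_cpp, theta_deg, sigma):
    """Even (cosine-phase) and odd (sine-phase) Gabor filters.

    theta_deg is the direction along which luminance is modulated;
    sf_cpp is the spatial frequency in cycles per pixel.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    theta = np.deg2rad(theta_deg)
    u = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return (envelope * np.cos(2 * np.pi * sf_cpp * u),
            envelope * np.sin(2 * np.pi * sf_cpp * u))


def circular_filter(image, kernel):
    """Circular (FFT-based) filtering; adequate for periodic test images."""
    padded = np.zeros(image.shape)
    k = kernel.shape[0]
    padded[:k, :k] = kernel
    # Move the kernel centre to pixel (0, 0) so the output is not shifted.
    padded = np.roll(padded, (-(k // 2), -(k // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))


def oriented_energy(image, sf_cpp, n_channels=6, size=31, sigma=6.0):
    """(n_channels, H, W) contrast energy; 6 channels over 180 deg sit 30 deg apart."""
    spacing = 180.0 / n_channels
    bands = []
    for k in range(n_channels):
        even, odd = quadrature_pair(size, sf_cpp, k * spacing, sigma)
        # Square and sum the phase pair -> phase-invariant contrast energy.
        bands.append(circular_filter(image, even) ** 2
                     + circular_filter(image, odd) ** 2)
    return np.stack(bands)


def phase_averaged_energy(make_image, phases, sf_cpp):
    """Model response computed per phase condition, then averaged across phases."""
    return np.mean([oriented_energy(make_image(p), sf_cpp) for p in phases], axis=0)


# Usage: a zero-mean test grating modulated along x at 0.1 cycles/pixel.
yy, xx = np.mgrid[0:100, 0:100]


def grating(phase, f=0.1):
    return np.cos(2 * np.pi * f * xx + phase)


scales = [0.025, 0.05, 0.1, 0.2, 0.4]  # hypothetical subbands, one octave apart
phases = (0.0, np.pi)                  # carrier phases 0 and pi
peak_by_scale = [phase_averaged_energy(grating, phases, sf).mean(axis=(1, 2)).max()
                 for sf in scales]
best_scale = scales[int(np.argmax(peak_by_scale))]  # subband with maximal response
channel_means = phase_averaged_energy(grating, phases, best_scale).mean(axis=(1, 2))
```

      In this toy case the best subband is the one matching the grating's spatial frequency, and the channel tuned to the grating's modulation direction carries the most energy. Because the even and odd outputs are squared and summed, carrier phases 0 and π give identical energy maps.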

      (4) It is not clear from the Methods how the difficulty in the perceptual control task was controlled. How were the levels of task difficulty created?

      Apologies for not being clear. Task difficulty was set by the contrast difference between the two stimuli. The easiest level pairs the first and last contrasts, while the hardest level pairs two adjacent contrasts. We added these sentences:

      “The contrast for each stimulus was generated from a predefined set of 20 contrasts uniformly distributed between 0.5 and 1.0 (0.025 step size). We created 19 levels of task difficulty based on the contrast distance between the two stimuli. Thus, the difficulty ranged from choosing contrast pairs with the largest difference (0.5, easiest) to contrast pairs with the smallest difference (0.025, hardest). Task difficulty level changed based on an adaptive, 1-up-2-down staircase procedure (Levitt 1971) to maintain performance at approximately 70% correct.”
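      The quoted staircase can be sketched as follows. This is a hypothetical illustration, not the authors' code. It assumes that the 20 contrasts start at 0.5 and are spaced 0.025 apart, and that difficulty level 1 is the widest separation and level 19 is two adjacent contrasts. Under those assumptions the widest separation works out to 19 × 0.025 = 0.475, so the exact contrast endpoints in the experiment may differ slightly from this sketch.

```python
from dataclasses import dataclass, field

STEP = 0.025
N_CONTRASTS = 20
# Assumption: 20 contrasts spaced 0.025 apart, starting at 0.5.
CONTRASTS = [round(0.5 + i * STEP, 3) for i in range(N_CONTRASTS)]
N_LEVELS = N_CONTRASTS - 1  # 19 possible contrast separations


def contrast_difference(level):
    """Hypothetical mapping: level 1 = widest separation, level 19 = adjacent contrasts."""
    steps_apart = N_LEVELS - level + 1
    return round(steps_apart * STEP, 3)


@dataclass
class OneUpTwoDown:
    """1-up-2-down staircase: two consecutive correct responses make the task
    harder, one error makes it easier; this rule converges near 70.7% correct
    (Levitt, 1971)."""
    level: int = 1
    correct_streak: int = 0
    history: list = field(default_factory=list)

    def update(self, correct):
        self.history.append(self.level)  # level used on this trial
        if correct:
            self.correct_streak += 1
            if self.correct_streak == 2:
                self.level = min(self.level + 1, N_LEVELS)  # harder
                self.correct_streak = 0
        else:
            self.level = max(self.level - 1, 1)  # easier
            self.correct_streak = 0
        return self.level


staircase = OneUpTwoDown()
for response in [True, True, True, True, False]:
    staircase.update(response)
```

      After two pairs of correct answers the level rises from 1 to 3, and the final error steps it back to 2, i.e., a contrast difference of 18 × 0.025 = 0.45 under the assumed mapping.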

      Recommendations For The Authors

      (Reviewer #1):

      (1) If the black circle (Fig 3A & C) is the stimulus size, and the stimulus (12°) is roughly half the size of the entire screen (24.8°), then how are spatial reconstructions generated for parts of the visual field that fall outside of the screen? I am asking because in Figure 3 the area over which spatial reconstructions are plotted has a diameter at least 3 times the diameter of that black circle (the stimulus). I'm guessing this may be possible when using a very liberal fitting approach to pRFs, where the center of a pRF can be outside of the screen (so you'd fit a circle to an elongated blob, assuming that blob is the edge of a circle, or something). Can you really reliably estimate that far out into visual space, or extrapolate pRFs that exist in a part of the space you did not fully map (because it's outside of the screen)?

      We thank the reviewer for pointing out this confusing issue.

      The spatial reconstruction map has a diameter 3 times the diameter of the stimulus because we included voxels whose pRF eccentricities were within 20° in the reconstruction, the same as Kwak & Curtis, 2022. We did so for several reasons. First, while the height of the screen is 24.8°, the width of the screen is 44°. Thus, it is possible to have voxels whose pRF eccentricities are >20°. Second, for areas outside the height boundaries, there might not be pRF centers, but the whole pRF Gaussian distributions might still cover the area. Moreover, when creating the final map combined across three orientation conditions, we rotated the maps to be centered vertically, which then required a 20x20° square. Finally, inspecting the reconstruction maps, we noticed that the area that was twice the stimulus size (black circle) made very little contribution to the reconstructions. Therefore, the results depicted in Figure 3A&C are justified, but see the next comment and our response.

      (2) Is the quantification in 3B/C justified? The filter line uses a huge part of visual space outside of the stimulus (and even the screen). For the angular modulator in the "perception" condition, this means that there is no peak at -90/90 degrees. But if you were to only use a line that is about the size of the stimulus (a reasonable assumption), it would have a peak at -90/90 degrees.

      This is an excellent question. We completely agree that it is more reasonable to use filter lines that have the same size (12º) as the stimulus instead of the whole map size (40º). Based on the feedback from the Reviewer, we redid the spatial reconstruction analyses and now include the following changes to Figure 3.

      (1) We fitted the lines using pixels only within the stimulus. In Figure 3A and Figure 3C, we now replaced the reconstruction maps.

      (2) We added the color bar in Figure 3A.

      (3) We regenerated the filtered responses and calculated the fidelity results using line filters with the stimulus size. We replaced the filtered responses and fidelity results in Figure 3B and Figure 3D. With the new analysis, as anticipated by the Reviewer, we now found peaks at -90/90 degrees for the angular modulated gratings in the perceptual control task in V1 and V2. Thank you, Reviewer 1!

      (4) We also made corresponding changes in the Supplementary Figure S4 and S5, as well as the statistical results in Table S4 and S5.

      (5) In the “Methods” section, we added “within the stimulus size” for both “fMRI data analysis: Spatial reconstruction” and “Quantification and statistical analysis” subsections.
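      The revised analysis (line filters restricted to the stimulus) can be illustrated with a minimal sketch. It assumes, as a simplification, that the reconstruction is a BOLD-weighted sum of 2D Gaussian pRFs on a ±20° grid, and that a filtered response is the mean map value along a line through fixation sampled only within the 6° stimulus radius. The paper's exact normalization and fidelity metric may differ, and the function names and synthetic voxels below are hypothetical.

```python
import numpy as np


def prf_reconstruction(betas, prf_x, prf_y, prf_sigma, extent=20.0, n_pix=201):
    """Hypothetical: BOLD-weighted sum of 2D Gaussian pRFs on a +/- extent deg grid."""
    coords = np.linspace(-extent, extent, n_pix)
    X, Y = np.meshgrid(coords, coords)  # X varies along columns, Y along rows
    recon = np.zeros_like(X)
    for b, x0, y0, s in zip(betas, prf_x, prf_y, prf_sigma):
        recon += b * np.exp(-((X - x0) ** 2 + (Y - y0) ** 2) / (2 * s ** 2))
    return coords, recon


def line_filter_response(coords, recon, theta_deg, radius=6.0, n_samples=121):
    """Mean map value along a line through fixation at theta (clockwise from
    vertical), sampled only within `radius` deg (the 12-deg stimulus) rather
    than across the whole map."""
    theta = np.deg2rad(theta_deg)
    r = np.linspace(-radius, radius, n_samples)
    xs, ys = r * np.sin(theta), r * np.cos(theta)
    step = coords[1] - coords[0]
    # Nearest-pixel lookup of each sample point.
    cols = np.clip(np.round((xs - coords[0]) / step).astype(int), 0, len(coords) - 1)
    rows = np.clip(np.round((ys - coords[0]) / step).astype(int), 0, len(coords) - 1)
    return recon[rows, cols].mean()


# Usage with made-up voxels whose pRFs lie along a 15-deg line.
r = np.linspace(-10, 10, 41)
true_theta = np.deg2rad(15)
coords, recon = prf_reconstruction(
    betas=np.ones_like(r),
    prf_x=r * np.sin(true_theta),
    prf_y=r * np.cos(true_theta),
    prf_sigma=np.full_like(r, 1.0),
)
thetas = np.arange(0, 180, 5)
responses = np.array([line_filter_response(coords, recon, t) for t in thetas])
peak_theta = int(thetas[np.argmax(responses)])
```

      Limiting the samples to the stimulus radius keeps map regions far outside the stimulus from dominating the filtered response, which is the substance of the reviewer's comment.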

      (3) Figure 4 is nice, but not exactly quantitative. It does not address that the reconstructions from the perceptual task are hugging the stimulus edges much more closely compared to the modeled map. Conversely, the yellow parts of the reconstructions from the delay fan out much further than those of the model. The model also does not seem to dissociate radial/angular stimuli, while in the perceptual data the magnitude of perceptual reconstruction is clearly much weaker for angular compared to radial modulator.

      We thank the reviewer for this question. First, we admit that Figure 4 is more qualitative than quantitative. However, we see no alternative that better depicts the similarity between the model prediction and the fMRI results for the perceptual control and WM tasks. The figure clearly shows the orthogonal aperture bias. Second, we agree that aspects of the observed fMRI results are not perfectly captured by the model. These discrepancies could arise for many reasons, including fMRI noise and individual differences. Importantly, different modulators induce orthogonal aperture biases in the perceptual task but not in the WM task, so these discrepancies do not have a major impact on our conclusions.

      (4) The working memory and perception tasks are rather different. In this case, the perception task does not require the subject to process the carrier orientation (which is largely occluded, and possibly not that obvious without paying attention to it), but attention is paid to contrast. In this scenario, stimulus energy may dominate the signal. In the WM task, subjects have to work out what orientation is shown to do the task. Given that the sensory stimulus in both tasks is brief (1.5s during memory encoding, and 2.5s total in the perceptual task), it would be interesting to look at decoding (and reconstructions) for the WM stimulus epoch. If abstraction (into a line) happens in working memory, then this perceptual part of the task should still be susceptible to aperture biases. It allows the authors to show that it is indeed during memory (and not merely the task or attentional state of the subject) that abstraction occurs.

      We addressed this point in the “Additional context” section of our response to Reviewer 1.

      Recommendations for improving the writing:

      (1) The main text had too little information about the Methods. Of course, some things need not be there, but others are crucial to understanding the basics of what is being shown. For example, the main text does not describe how many orientations are used (well... actually the caption to Figure 1 says there are 2: horizontal and vertical, which is confusing), and I had to deduce from the chance level (1/3) that there must have been 3 orientations. Also, given how important the orthogonality of the carrier and modulator are, it would be good to have this explicit (I would even want an analysis showing that indeed the two are independent). A final example is the use of beta weights, and for delay period decoding only the last 6s (of the 12s delay) are modeled and used for decoding.

      We thank the reviewer for identifying aspects of the manuscript that were confusing. We made several changes to the paper to clarify these details.

      First, we added the information about the orientations we used in the caption for Figure 1 and made it clear that Figure 1C is just an illustration using vertical/horizontal orientations. Second, the carrier and the modulator are different in many ways. For example, the carrier is a grating with orientation and contrast information, while the modulator is the aperture that bounds the grating without these features. Their phases are orthogonal, and we added this in the second paragraph of the “Stimuli” section. Last, in the main text and the captions, we now denote “late delay” when writing about our procedures.

      (2) Right under Figure 3, the text reads "angular modulated gratings produced line-like representations that were orthogonal carrier orientation reflecting the influence of stimulus vignetting", but the quantification (Figure 3D) does not support this (there is no orthogonal "bump" in the filtered responses from V1-V3, and one aligned with the carrier orientation in higher areas).

      This point was addressed in the “recommendations for the authors (Reviewer 1), point 2” above.

      Minor corrections to text and figures:

      (1) Abstract: "are WM codes" should probably be "WM codes are".

      We prefer to keep “are WM codes” as it is grammatically correct.

      (2) Introduction: Second sentence 2nd paragraph: representations can be used to decode representations? Or rather voxel patterns can be used...

      Changed to “On the one hand, WM representations can be decoded from the activity patterns as early as primary visual cortex (V1)...”

      (3) Same paragraph: might be good to add more references to support the correlation between V1 decoding and behavior. There's an Ester paper, and Iamchinina et al. 2021. These are not trial-wise, but trial-wise can also be driven by fluctuating arousal effects, so across-subject correlations help fortify this point.

      We added these two papers as references.

      (4) Last paragraph: "are WM codes" should probably be "WM codes are".

      See (1) above.

      (5) Figure 1B & 2A caption: "stimulus presenting epoch" should probably be "stimulus presentation epoch".

      Changed to “stimulus epoch”.

      (6) Figure 1C: So this is very unclear, to say stimuli are created using vertical and horizontal gratings (when none of the stimuli used in the experiment are either).

      We addressed this point in our response to Reviewer 3, point 2.

      (7) Figure 2B caption "cross" should probably be "across".

      We believe “cross” is fine since cross here means cross-decoding.

      (8) Figure 3A and C are missing a color bar, so it's unclear how these images are generated (are they scaled, or not) and what the BOLD values are in each pixel.

      All values in the map were scaled to be within -1 to 1. We added the color bar in both Figure 3 and Figure 4.

      (9) Figure 3B and D (bottom row) are missing individual subject data.

      We use the SEM to indicate variability across subjects.

      (10) Figure D caption: "early (V1 and V2)" should probably be "early areas (V1 and V2)".

      Corrected.

      (11) Methods, stimuli says "We generated 180 orientations for the carrier grating to cover the whole orientation space." But it looks like only 3 orientations were generated, so this is confusing.

      We addressed this point in our response to Reviewer 3, point 2.

      (12) Further down (fMRI task) "random jitters" is probably "random jitter"

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      […] Weaknesses:

      While there are no glaring weaknesses in this study, it should be noted that a great deal of literature has pinpointed the MPOA (and specifically inhibitory cells in this area) as being critical to sexual behavior, including female mating. However, no study to my knowledge has explored self-paced female mating with such fine control over manipulating and monitoring cellular activity in this region. In addition, this study may act to inspire others to further explore the additional brain regions found to show upregulation of neural activity (Fos) during mating completion in the female using the data sets generated here.

      Reviewer #2 (Public Review):

      […] Weaknesses:

      The authors include an elegant manipulation of ejaculation-activated neurons in the MPOA using DREADD. However, this study was limited to show that activation of previously activated cells was sufficient to reduce approach behavior in a paced mating paradigm and receiving intromissions in a home cage mating paradigm. An inhibition approach using DREADD would have been a great complement to this study as it would have examined if activation of the cells was required. Moreover, additional tests for sexual motivation would have greatly strengthened the overall conclusions.

      Reviewer #3 (Public Review):

      […] Weaknesses:

      (1) Their activity-dependent labeling strategy is not exclusive to mating completion but instead includes all neurons active before, during, and after the social encounter. In the manuscript, the authors did not discuss the time course of Fos activation or the timeframe of the FosTRAP labeling strategy. Fos continues to be expressed and is detectable for hours following neural activation. Therefore, the FosTRAP strategy also labels neurons that were activated 3 hours before the injection of 4-OHT. The original FosTRAP2 paper which is cited in this manuscript (DeNardo et al, 2019) performed a detailed analysis of the labeling window in Supplementary Figure 2 of that paper. Here is quoted text from that paper: "Resultant patterns of tdTomato expression revealed that the majority of TRAPing occurred within a 6-hour window centered around the 4-OHT injection." Thus, the FosTRAP "mating completion" groups throughout this manuscript also include neurons activated 3 hours before mating completion, which includes neurons activated during appetitive and consummatory mating behaviors.

      This makes all of the FosTRAP data very difficult to interpret. Compounding this is the issue that the two groups the authors compare in their experiments are females administered 4-OHT following appetitive investigation behaviors (with the male removed before mating behaviors occurred) and females administered 4-OHT following mating completion. The "appetitive" group labeled neurons activated only during appetitive investigation, but the "completion" group labeled neurons activated during appetitive investigations, consummatory mating bouts, and mating completion. Therefore, in the brain-wide analysis of Figure 2, it is impossible to identify brain regions that were activated exclusively by mating completion and not by consummatory mating behaviors. This could have been achieved if the "completion" group was compared to a group of females that had commenced consummatory mating behaviors but were separated from the male before mating was completed. Then, any neurons labeled by the "completion" FosTRAP but not the "consummatory" FosTRAP would be neurons specifically activated by mating completion. In the current brain-wide analysis experiments, neurons activated by consummatory behaviors and mating completion can not be disassociated.

      This same issue is present in the interpretation of the chemogenetic activation data in Figure 6. In the experiments of Figure 6, the authors are activating neurons naturally activated during consummatory mating behaviors as well as those activated during mating completion.

      We appreciate the reviewer’s comments and concerns about the TRAP method.

      First, we agree that the FosTRAP method lacks the temporal resolution to separate ensembles activated within a short time window. In our preliminary results, injecting 4-OHT after mating completion labeled more tdTomato+ cells in the MPN than injecting it after appetitive or consummatory behavior (Author response image 1).

      To further compare the “consummatory” and “completion” ensembles, we included an additional cohort in which we used TRAP to label cells responding to consummatory behavior. This cohort has been added to Figures 2, 6, S3, S4, S9, S10, and S11. Whole-brain mapping of TRAP-labeled cells showed that many hypothalamic and extended amygdala areas, including the medial preoptic area and the bed nucleus of the stria terminalis, had significantly higher tdTomato+ cell density in the completion group than in the appetitive group, while the consummatory group showed a trend toward higher cell density than the appetitive group. In the Gq-DREADD experiment, the Completion-hM3Dq group, but not the Consummatory-hM3Dq group, showed reduced sexual motivation in the self-paced mating assay (Figure 6). The Completion-hM3Dq group, but not the Consummatory-hM3Dq group, also showed significantly fewer intromission events and a trend toward lower receptivity in the home cage mating assay (Figure S10). Furthermore, post-hoc histological analysis showed that the numbers of c-Fos+ and TRAP-labeled cells in the MPN tended to be larger in the Completion-hM3Dq group than in the Consummatory-hM3Dq group (Figure S9). Together with the in vivo calcium imaging experiments in Figures 3, 4, and 5, these results suggest that the MPN contains male-ejaculation-responsive cells that are distinct from male-mounting-responsive cells and that these cells are sufficient to suppress female sexual motivation.

      However, with the current state of mouse genetic tools, no method with higher temporal resolution is available. We have discussed the limited temporal resolution of the FosTRAP method in the Discussion section.

      Author response image 1.

      Representative image showing TRAP labeling in the MPN after mating completion and intromission

      (2) This study does not definitively show that the female mice used in this study display decreased sexual motivation after the completion of mating. The females exhibit reduced interaction with males that had also just completed mating, but it is unclear if the females would continue to show reduced interaction time if given the choice to interact with a male that was not in the post-ejaculatory refractory period. Perhaps, these females have a natural preference to interact more with sexually motivated males compared to recently mated (not sexually motivated) males. To definitively show that these females exhibit decreased sexual motivation the authors should perform two control experiments: 1) provide the females with access to a fully sexually motivated male after the females have completed mating with a different male to see if interaction time changes, and 2) compare interaction time toward mated and non-mated males using the self-paced mating assay. These controls would show that the reduction in the interaction time is because the females have reduced sexual motivation and not because these females just naturally interact with sexually motivated males more than males in the post-ejaculatory refractory period.

      We greatly appreciate the reviewer’s comments regarding the interpretation of the self-paced mating assay. To address these concerns, we added an experiment in which female subjects were introduced to a novel, sexually motivated male mouse in the self-paced mating assay immediately after receiving ejaculation (Figure S2). We found that, as in the self-paced mating assay with the same male, the female subjects spent significantly more time in the isolation zone on the post-ejaculation day than on the pre-ejaculation day.

      (3) It is unclear how the transient 90-second response of these MPOA neurons following the completion of mating causes the prolonged reduction in female sexual motivation that is at the minutes to hours timeframe. No molecular or cellular mechanism is discussed.

      (4) The authors discuss potential cell types and neural population markers within the MPOA and go into some detail in Figure S3. However, their experiments are performed with only the larger excitatory and inhibitory MPOA neural populations.

      Although the molecular and cellular mechanisms underlying the prolonged activity of MPOA neurons are critical to understanding how sustained MPOA activity suppresses female sexual motivation, they are beyond the scope of the current manuscript and a subject of future studies. We have added a section to the Discussion on the potential molecular mechanisms.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      If the authors haven't already, it would be useful if the authors could make the brain-wide analysis of Fos activity publicly available.

      We have deposited the data in the DANDI Archive (https://dandiarchive.org/).

      I would also make sure the n's are included in each Figure Legend for each panel (some are missing in the Supplementals).

      We appreciate the comment; we have added the number of subjects to Figures 3, 4, and 5.

      It would also be best to provide clearer labels to some of the Figures, for example, Figure 5D, the Types should also be labeled with what behaviors they correspond to.

      We appreciate the comment. Figure 5 focuses on post-ejaculation neural activity. The cell types are categorized based on neural activity after the female experiences male ejaculation, so they do not correspond to any behaviors.

      Reviewer #2 (Recommendations For The Authors):

      (1) A first recommendation is to replace the use of the term "mating completion" with "ejaculation". Male and female rodents display a period of reduced approach behavior following display or experiencing ejaculation, which is referred to as the post-ejaculatory interval. The current studies investigate the neural ensemble that contributes to this post-ejaculatory interval in female mice. In addition, male and female rodents will display a prolonged period of sexual inactivity referred to as satiety, which is typically observed after repeated display or experience of ejaculations. The current studies do not investigate satiety. Moreover, in the current studies, female mice appeared to display approach behavior (time in the interaction zone) even within the 10 minutes following experiencing ejaculation (Fig 1F). Hence, the term "completion" is not accurate and should be replaced by "ejaculation" in all figures and throughout the manuscript. Replacing completion with ejaculation will also clarify what defines "onset of completion", which this reviewer assumes refers to the onset of ejaculatory behavior observed in the male.

      Thank you for the comment. We agree that the term “mating completion” was inappropriate. We have changed the wording to “ejaculation” or “post-ejaculatory period”.

      (2) Likewise, a variety of other terms and descriptions need to be adjusted for consistency and accuracy. For example, "room" when referring to the interaction or isolation zones; "onset of mating completion" when referring to ejaculation; "male intruder" to refer to the introduction of the male mating partner, but using a term typically used for an intruder-resident aggression test. Replacing these terms will aid in reducing confusion for the reader and more accurately describe the behavioral parameters.

      We appreciate the comment. We have updated the terms “male intruder” to “partner”, “room” to “area” or “zone”.

      (3) The use of the paced mating paradigm is a strength of these studies. This paradigm has been widely used and validated to study female sexual behavior in rodents. Please refer to recent reviews and landmark papers using this paradigm in addition to the current cited papers to better reflect the vast wealth of studies that previously reported the behavioral data that were replicated in this study.

      We have added a section discussing the self-paced mating assay, including its merits and caveats (P8).

      (4) In the paced mating test, females can pace the receipt of sexual stimulation, and latencies to withdraw and return to the male-containing chamber are considered indicators of sexual motivation. Female withdrawal will increase with the intensity of the sexual stimulation and latency to return is longer following ejaculation. Paced mating is thus a balance of approach and withdrawal behaviors that increases reward and likelihood of pregnancy for females. Moreover, ejaculation-induced withdrawal and longer latencies to return and approach are altered by hormonal status and by the introduction of a novel male partner. Thus, female sexual behavior is complex and withdrawal behavior (in this paper measured as time spent in an isolation zone) needs to be interpreted with caution and not simply referred to as sexual motivation. I recommend expanding the description of the paradigm to highlight the strengths and limitations of this paradigm and use caution to interpret time spent in the isolation zone as a lack of sexual motivation. I also recommend referring to the period after ejaculation as the post-ejaculatory interval (instead of completion).

      Thank you for the comment. We have revised the wording throughout the manuscript to adjust how it refers to sexual motivation.

      (5) In the current paper, time in the isolation zone and the number of transitions are used as the behavioral measures. Latencies, which are typically included in paced mating studies, were missing from the data. If data are available for latencies to withdraw and return to the interaction zone after mount, intromission, and ejaculation, please add these data. If such data were not collected or are not available, please recognize this caveat.

      Thank you for the comment. In Figure 1, in which all animals experienced male ejaculation, we added a latency analysis (Figures 1I and 1P). Consistent with the literature, female mice took significantly longer to return to the interaction zone after male ejaculation.

      (6) The brain-wide mapping study of cFos expression after ejaculation confirms and extends prior findings, mostly in rats. Please reference prior papers in female rodents showing cFos after ejaculation and discuss how the current data replicate or differ from prior data.

      On P8, L351, we cite Pfaus et al., 1993 to discuss the similarity to the c-Fos expression pattern reported in rats. We have further added descriptions emphasizing the similarity between the two datasets.

      (7) A paragraph describing the specific cell types that are activated in the MPOA is an essential part of the study and is described in detail, but only shown in supplementary figures. Given the emphasis on this particular part of the study, a recommendation is to incorporate these data as a regular figure instead of supplementary material.

      While we greatly appreciate the comment, we consider that the molecular characterization of MPOA neurons is not the main focus of the paper and have decided to keep it in the supplementary figures.

      (8) Calcium imaging studies were performed in the home cage for obvious practical reasons. However, in the home cage testing, the females withdraw from the males using a different approach and do not exit an interaction zone through a division. There may also be differences in the male sexual behavior patterns and thus the stimulation that females receive from the male. Yet, it appears that ejaculation induces similar patterns of neural activation in this paradigm. Thus, it is likely that neuron activation is a result of receiving ejaculation, rather than withdraw behavior. Please briefly discuss the comparisons between the cFos and calcium imaging conclusions in these two different paradigms.

      We have added a section discussing the self-paced mating assay, including its merits and caveats (P8). Withdrawal, latency, and their interpretation are discussed in this section.

      (9) The final study includes the manipulation of ejaculation-activated neurons in the MPOA using DREADD. This study was limited to show that activation of previously activated cells was sufficient to reduce approach behavior in a paced mating paradigm and receiving intromissions in a home cage mating paradigm. An inhibition approach using DREADD would have been a great complement to this study as it would have shown if activation of the cells was required. Moreover, additional tests for sexual motivation, such as partner preference tests would have greatly strengthened the results since a lack of entering an interaction zone can also be explained by impaired sensory processing or locomotor behavior. Finally, CNO also appeared to impact time in the isolation zone for a subset of animals in the ejaculation (completion) control group and the appetitive group. These effects didn't reach statistical significance, but groups also had low sample sizes (n=6-7) and may thus have been underpowered. The recommendation is to include these caveats and shortcomings in the discussion of these results.

      We appreciate the comments. First, we added an inhibition experiment to test whether MPOA neurons are necessary. We found that inhibiting these neurons did not affect behavior in the self-paced mating assay but increased the subjects’ sexual receptivity (Figure S11). Regarding the low sample size, we have added a power analysis to the statistics section.

      (10) The studies utilized ovariectomized females with hormone priming. Since sexual receptivity in females is highly dependent on the hormonal milieu, the authors are encouraged to add an explanation of why ovariectomized females were used and if the results may have differed in cycling females.

      We appreciate the comments. The female subjects used in the TRAP experiment had to experience ejaculation from the male mice twice: once to label the cells and once during reactivation. To avoid pregnancy after the first experience, we ovariectomized the females and controlled their hormonal conditions. This method has been used successfully in other sexual behavior studies (Yang et al., 2013; Ring, 1944). This is described on P11. We have further demonstrated in Figure 1N-T that female mice that were not ovariectomized and were under the natural estrous cycle showed similar suppression of sexual interaction after the completion of mating. The manuscript was updated to discuss that the behavioral change after mating completion does not depend on the ovaries.

      (11) Overall, the paper lacks references to relevant prior studies. For example, many studies have been reported over the past 2-3 decades about the effects of female rodent sexual behavior on activation in the brain and the effects of different vaginocervical stimulation on pregnancy and fertility. It is absolutely the case that much remains unknown about the complex neural circuitries that control behavior during the post-ejaculatory interval and sexual satiety in both male and female rodents, but studies have indicated roles for hypothalamic areas, bed nucleus of the stria terminalis, ventral tegmental area, posterior thalamus, and prefrontal cortex. Hence, the current introduction and discussion do not adequately summarize or acknowledge these prior investigations and therefore do not place these new findings in the context of what was previously known.

      We appreciate the comment and have added references on P2 (L65) and P8 (L355-357) discussing the existing literature on c-Fos mapping after ejaculation or genital stimulation in female rats.

      (12) Finally, sample sizes appear to be modest, ranging n=4-8 (except n=14 in the completion group in Figure S7) and vary between groups within and between studies. Please explain in the methods section how sample sizes were pre-determined and acknowledge if studies may have potentially been underpowered.

      Sample sizes for the behavioral experiments in this study were n = 6-9, predetermined based on previous studies of female sexual behavior (Ishii et al. 2017, Liu et al. 2022, Yin et al. 2022). To further examine the number of animals required, we pooled data from this study (n = 111; control n = 94, stim n = 17) and conducted a power analysis using the variance of the time spent in the isolation zone. Control data were pooled from the control animals in each experiment (e.g., animals injected with a GFP control virus or with saline). The average time in the isolation zone after ejaculation or after reactivating the completion cells was 420 ± 210 seconds, versus 49 ± 91 seconds in the control group (mean ± s.d.). From these values, we found that 5 animals were sufficient to detect the difference (p < 0.05, power = 0.8) with Student’s t-test. We have added this explanation to the supplemental experimental procedures (P18, L817-827).
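
      The power calculation described above can be illustrated with a short sketch. This is not the authors’ analysis script: it assumes a two-sided, two-sample Student’s t-test with equal group sizes, converts the reported summary statistics (420 ± 210 s versus 49 ± 91 s) into Cohen’s d using the pooled SD of two equal-sized groups, and estimates power by Monte Carlo simulation with only the Python standard library. All function names are hypothetical.

```python
import math
import random


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d with the pooled SD of two equal-sized groups (RMS of the SDs)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return abs(mean_a - mean_b) / pooled_sd


def _mean_var(values):
    """Sample mean and unbiased sample variance."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)


def student_t(x, y):
    """Two-sample Student's t statistic (pooled variance, equal variances assumed)."""
    nx, ny = len(x), len(y)
    mx, vx = _mean_var(x)
    my, vy = _mean_var(y)
    sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    return (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))


def simulated_power(d, n, rng, alpha=0.05, sims=20000):
    """Monte Carlo power of a two-sided Student's t-test with n animals per group.

    Data are simulated on a standardized scale (SD = 1), so the effect is a
    shift of d. The critical |t| is also estimated by simulating the null,
    which avoids needing a t-distribution table or SciPy.
    """
    def abs_t(shift):
        x = [rng.gauss(shift, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        return abs(student_t(x, y))

    null_t = sorted(abs_t(0.0) for _ in range(sims))
    critical = null_t[int((1 - alpha) * sims)]  # ~95th percentile of |t| under H0
    return sum(abs_t(d) > critical for _ in range(sims)) / sims


def minimum_n_per_group(d, target_power=0.8, alpha=0.05, n_max=20, seed=0):
    """Smallest n per group whose simulated power reaches target_power."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    for n in range(2, n_max + 1):
        if simulated_power(d, n, rng, alpha) >= target_power:
            return n
    return None


# Summary statistics quoted in the response (time in isolation zone, seconds).
d = cohens_d(420, 210, 49, 91)
n_needed = minimum_n_per_group(d)
print(f"Cohen's d = {d:.2f}; animals per group for 80% power: {n_needed}")
```

      Under these assumptions the effect size is roughly d ≈ 2.3, and the smallest group size reaching 80% power should land at about 5 animals per group. An exact noncentral-t calculation (for example, in G*Power or statsmodels) would remove the simulation noise; the unequal SDs of the two groups would argue for checking a Welch test as well.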

      Reviewer #3 (Recommendations For The Authors):

      The authors should discuss the fact that the FosTRAP2 strategy labels neurons activated 3 hours before the 4-OHT injection. As the manuscript is written, it seems to suggest that the 4-OHT injection given following mating completion only labeled neurons activated during mating completion. This is very misleading. I respect the amount of work and rigor that went into these experiments. The single-cell imaging, implementation of the FosTRAP strategy, and behavioral analysis are all well executed. Novel insights into the neural regulation of female sexual drive can be gleaned from the neural imaging experiments. Unfortunately, the limitations of the FosTRAP strategy make those studies very difficult to interpret, and therefore, a more candid discussion and re-interpretation of the data from the FosTRAP experiments is needed.

      We appreciate the reviewer’s comments and concerns about the TRAP method.

      First, we agree that the FosTRAP method lacks the temporal resolution to separate ensembles activated within a short time window. In our preliminary results, injecting 4-OHT after mating completion labeled more tdTomato+ cells in the MPN than injecting it after appetitive or consummatory behavior (Author response image 1).

      To further compare the “consummatory” and “completion” ensembles, we included an additional cohort in which we used TRAP to label cells responding to consummatory behavior. This cohort has been added to Figures 2, 6, S3, S4, S9, S10, and S11. Whole-brain mapping of TRAP-labeled cells showed that many hypothalamic and extended amygdala areas, including the medial preoptic area and the bed nucleus of the stria terminalis, had significantly higher tdTomato+ cell density in the completion group than in the appetitive group, while the consummatory group showed a trend toward higher cell density than the appetitive group. In the Gq-DREADD experiment, the Completion-hM3Dq group, but not the Consummatory-hM3Dq group, showed reduced sexual motivation in the self-paced mating assay (Figure 6). The Completion-hM3Dq group, but not the Consummatory-hM3Dq group, also showed significantly fewer intromission events and a trend toward lower receptivity in the home cage mating assay (Figure S10). Furthermore, post-hoc histological analysis showed that the numbers of c-Fos+ and TRAP-labeled cells in the MPN tended to be larger in the Completion-hM3Dq group than in the Consummatory-hM3Dq group (Figure S9). Together with the in vivo calcium imaging experiments in Figures 3, 4, and 5, these results suggest that the MPN contains male-ejaculation-responsive cells that are distinct from male-mounting-responsive cells and that these cells are sufficient to suppress female sexual motivation.

      However, with the current state of mouse genetic tools, no method with higher temporal resolution is available. We have discussed the limited temporal resolution of the FosTRAP method in the Discussion section.

      Editor notes:

      Should you choose to revise your manuscript, please include full statistical reporting in the main text, including the test statistic, degrees of freedom, and exact P value.

      Thank you for the comment. The statistical values were added to the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Strengths:

      The authors embarked on an ambitious journey to seek the answer regarding 3D genome changes predisposing to metastatic organotropism. The authors succeeded in assembling a comprehensive panel of breast cancer cell lines and aggregating the 3D genome structure data to conduct a hypothesis-driven computational analysis. The authors also succeeded in including proper controls representing normal non-cancerous epithelium and the end organ of interest. The authors did well in citing relevant references on 3D genome organization and EMT.

      Weaknesses:

      (1) The authors should clearly indicate how they determine the patterns of spread of the breast cancer cell lines being utilized in this manuscript. How did the authors arrive at the conclusion that certain cell lines would be determined as "localized spread" and "metastatic tropism to the lung"? This definition is crucial, and I will explain why.

      It is indeed critical to clearly define and explain what qualifies as metastatic potential to particular organs in our system. Here, we intentionally limited our scope to metastasis that had occurred within the human system. Our cell lines were chosen based on their sites of origin and etiological history in the patients from whom they were derived. For example, the cancer cell line BT474 was classified as “localized” because these cells were derived from a solid tumor in the breast itself. Meanwhile, the MCF7 and T47D cell lines are considered lung metastatic because these cells were collected from pleural effusions from the lung. We therefore model human organotropism from the breast to the lung using cells that originated from infiltrating ductal carcinoma (human breast) but were collected from pleural effusions (human lung). As a comparison, we use a human lung cancer-derived cell line that was itself purified from a pleural effusion. In this way, we can compare the genome structure of a lung cancer cell in the lung environment to that of a breast cancer cell that has metastasized to the lung environment.

      In our revised version, we further clarify this definition in the text as well as in additional annotations in our supplemental table of all cell line information.

      Todd Golub's team from the Broad Institute of MIT and Harvard published "A metastasis map of human cancer cell lines" to exhaustively create a first-generation metastasis map (MetMap) that reveals organ-specific patterns of metastasis. (By the way, this work was not cited in the reference in this manuscript.) The MetMap Explorer (https://depmap.org/metmap/vis-app/index.html) is a public resource that could be openly accessed to visualize the metastatic potential of each cell line as determined by the in vivo barcoding approach as described in the MetMap paper in the format of petal plots. 5 organs were tested in the MetMap paper, including brain, lung, liver, kidney, and bone. The authors would discover that some of the organ-specific metastasis patterns defined in the MetMap Explorer would be different from the authors' classification. For example, the authors defined MCF7 as a line as lung metastatic, and rightly so the MetMap charted a signal towards lung with low penetrance and low metastatic potential. The authors defined ZR751 as a line with localized spread, however, the MetMap charted a signal towards the kidney with low penetrance and low metastatic potential, the signal strength similar to the lung metastasis in MCF7. A similar argument could be made for T47D. The TNBC line MDA-MB-231 is indeed highly metastatic, however, in MetMap data, its metastasis is not only specific to the lung but towards all 5 organs with high penetrance and metastatic potential. The 2 lung cancer cell lines mentioned in this study, A549 and H460, the authors defined them as localized spread to the lung. However, the MetMap data clearly indicated that A549 and H460 are highly metastatic to all 5 organs with high penetrance and high metastatic potential.

      We acknowledge the valuable contributions of animal models in metastatic cancer studies, but we also want to avoid the potentially confounding variable of the animal microenvironment. The MetMap Explorer contains valuable information (and as part of our clarification on this point, we now cite MetMap in the text), but the “metastatic potential of each cell line” for this tool is measured in a mouse environment. Knowing that a particular cell line, which originated from a human lung metastasis, can further metastasize to other organs in a mouse does not necessarily mean that those cells could do so in humans. The microenvironment responses to metastatic colonization recapitulate the events in wound repair, and these can differ among species (https://pubmed.ncbi.nlm.nih.gov/28916657/; https://pubmed.ncbi.nlm.nih.gov/39729995/). Further, the changes a cell needs to make to adapt to a new organ system in a mouse could be confounded by the changes needed to adapt to mouse conditions in general. Finally, migration from a site of ectopic injection may not mimic migration from an initial tumor site. These factors lead to well-known cases where MetMap does not reflect the metastatic potential of cancers in humans. As a classic example, prostate cancer frequently metastasizes to bone in humans, and the PC3 cell line was derived from a bone metastatic prostate cancer. However, MetMap shows no evidence of PC3 being able to metastasize to bone in a mouse.

      We agree that the very best data would come from matched primary and metastatic tumors in the same human patient, but those data do not currently exist and generating them would require future work beyond the scope of this study.

      Since results will vary among experimental models of metastatic organotropism (intracardiac injection was the metastasis model adopted in MetMap), the authors should state more clearly which experimental model system served as the basis for their definition of organ-specific metastasis. In my opinion, this is the most crucial first step for this entire study to be sound and solid.

      Taking all the above into account, in our revision, we have now included further clarification in the main text to more clearly explain how and why we chose the cell lines we did and what the advantages and limitations of this choice are.

      (2) Figure 1b: The authors found that "MDA-MB-231 cells were grouped with the lung carcinoma cells. This implies that the genome organization of this cell line is closer to that of lung cells than to other breast epithelial cell lines.". In fact, another TNBC line BT549 was also clustered under the same clade. So this clade consisted of normal-like and highly metastatic lines. Therefore, the authors should be mindful of the fact that the compartment features might not directly link to metastasis (or even metastatic organotropism).

      In figure 1b, the grouping that includes MDA-MB-231 (lung metastatic breast cancer) connected to A549 and H460 (lung cancer) occurs at a distance of about 0.2. If the clustering tree were cut at a distance of 0.26, 6 separate clusters would result: two clusters of Luminal subtypes (all labeled red), one that includes all healthy epithelial cells (both lung and breast, all labeled green), one that links two localized breast cancers, one that links MDA-MB-231 to lung carcinoma cell lines, and then BT549 by itself. So, while BT549 appears next to MDA-MB-231 along the horizontal axis, this is just a coincidence of the representation: the dendrogram shows it is quite distant from all the other cell lines in this cluster according to compartment profile.
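      The tree-cutting argument can be illustrated with a short sketch. The response does not state the distance metric or linkage method, so the code assumes 1 minus the Pearson correlation between compartment eigenvector profiles and average linkage (UPGMA); the profiles and the helper names are hypothetical. Because average-linkage merge heights never decrease, stopping the agglomeration at the first merge above the threshold gives the same partition as cutting the finished dendrogram at that height.

```python
from itertools import combinations
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def cut_average_linkage(profiles, threshold):
    """Cluster named profiles with average linkage and cut at `threshold`.

    Assumption: distance = 1 - Pearson r, linkage = average (UPGMA).
    Merging stops once the closest pair of clusters is farther apart than
    `threshold`; since UPGMA heights are monotone, this equals a tree cut.
    """
    names = list(profiles)
    dist = {
        frozenset(pair): pearson_distance(profiles[pair[0]], profiles[pair[1]])
        for pair in combinations(names, 2)
    }

    def linkage(c1, c2):
        # Average of all pairwise distances between members of two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [frozenset([name]) for name in names]
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        if linkage(c1, c2) > threshold:
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return sorted(sorted(c) for c in clusters)


# Hypothetical compartment profiles over four genomic bins.
toy = {
    "line_A": [1, 2, 3, 4],
    "line_B": [1, 2, 3, 5],
    "line_C": [4, 3, 2, 1],
    "line_D": [5, 3, 2, 1],
}
print(cut_average_linkage(toy, threshold=0.26))
# [['line_A', 'line_B'], ['line_C', 'line_D']]
```

Raising the threshold merges the two groups; lowering it below the smallest pairwise distance leaves every line on its own, which is why the cut height matters for how the clusters in a dendrogram are read.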

      So, it is only MDA-MB-231 that is very closely linked with the lung cancer cell types.

      It is true that the healthy lung cells (HTBE) are clustered separately and are more similar to normal/non tumorigenic breast epithelial cells (HMEC and MCF10A) than to any cancer cell type. This could suggest that there are aspects of the compartment pattern that represent any healthy epithelium as compared to cancer. What we find in the compartment profile, in both the clustering and the PCA analysis, is that compartment signatures contain information about cell properties on several overlapping levels: there is an aspect of the compartment profile that distinguishes healthy from cancerous cells, an aspect that distinguishes luminal cancers from other subtypes, a part that associates with organotropism, and an aspect that captures EMT status. The final compartment status is a composite of these numerous factors.

      We have clarified the text to indicate that we mean MDA-MB-231 clusters near lung cancer, not necessarily healthy lung cell models.

      (3) Figure 3: In the text, the authors stated, "To further investigate this result, we examined the transcription status of genes that changed compartment across the EMT spectrum and, conversely, the compartment status of genes that changed transcription (Fig. 3b, c, and d)". However, it was not apparent in the figure that the cell lines were arranged according to an EMT spectrum.

      To display these comparisons more clearly, we have now revised figure 3b, c, and d in two ways: First, we have defined the gene and cell line clustering by one set of data (for example, compartment identity in 3b) and then displayed the other data (gene expression) with all genes and cell lines in the same order. Therefore, for each column, genes and cell lines can be compared visually between top and bottom rows. Second, we have colored cell line names from purple to yellow according to their EMT scores as shown in Supplementary Figure 1a. This allows a visual indication of how the clustering separates cell lines by EMT status.

      Also, the clustering heatmaps did not provide sufficient information regarding the genes with concordant/divergent compartments vs transcription changes. It would be more informative if the authors could spend more effort in annotating these genes/pathways.

      We want to clarify that the genes plotted in the heatmaps in Figure 3 are also the genes whose functional enrichment we present in figures 1 and 2. So, the genes that segregate strongly based on A/B compartment (but not gene expression) in figure 3b are the same genes whose GO terms are annotated in Figure 1d. Likewise, the genes that segregate strongly based on gene expression, but not A/B compartment, in figure 3c and d are the same genes whose GO terms are annotated in Figure 2b. We have now made this connection clearer in the text.

      But, we also agree with the reviewer that it is important to explore a bit further the relationship between these divergent sets of genes. Our explorations have led to several observations:

      (1) In some cases, the compartment-segregated genes and the transcription-segregated genes are different members of the same pathways. In Author response image 1 below, for example, we show interactions (according to STRING) for genes from figure 3c that are highly expressed in the epithelial-like cell lines and are annotated as involved in epithelial development (green). We then added to the network genes from figure 3b that are specifically in the A compartment in the epithelial-like cell lines but not mesenchymal cell lines that are also annotated as involved in epithelial development (red). Most of these epithelial development genes that change expression are in the A compartment in all cell lines and therefore do not rely on spatial compartment changes for their regulation. But some additional epithelial development genes, which are interconnected in this same network, are changing compartments across the EMT spectrum. One example, FOXA1, is a key hub in the network and is known to be a pioneer transcription factor involved in development and differentiation. Controlling this gene at the level of spatial genome organization rather than local transcriptional control could be important in the stable cell fate changes that can happen with EMT.

      Author response image 1.

      (2) Overall, the set of genes that change compartments does not have as strong functional enrichment as the transcription change set of genes. This could indicate that some of the compartment changes that occur with EMT are not directly gene regulatory but rather enable an overall conformational change of the chromatin that is needed for the alterations in physical cell state or to accomplish long distance gene regulation changes.

      (3) Related to long distance gene regulation changes, we also see cases in which the gene that changes transcription but not compartment across EMT is adjacent to regions that switch compartments.

      A good example is TFF3 (yellow, Supplementary figure 1C). TFF3 is one of the genes that strongly segregates across EMT by transcription, being more highly expressed in epithelial-like (bottom 4 tracks) than in mesenchymal-like (top 4 tracks) cancers. Despite this differential expression, it is almost always in the A compartment across all cell lines. However, it is adjacent to regions with strong EMT-associated compartment changes. So, even though this specific gene region is not changing compartment, its regulation may be influenced by the entire region being A-associated in epithelial-like cancers while neighboring regions become B-associated in mesenchymal-like cancers.

      TFF3 is expressed in normal breast epithelium and has been implicated as a biomarker for endocrine therapy response in breast cancer.

      Meanwhile, many genes that are in these compartment switching regions (BACE2, DSCAM, PDE9A) are not among the strongest expression signature genes.

      (4) Interestingly, some of the regions (such as the region shown in Supplementary figure 1C) that change compartment across the breast cancer spectrum overlap with regions that we found change compartment in the progression of prostate cancer, as shown in the string.db enrichment analysis below.

      Author response image 2.

      In our revised manuscript, we now include more of these explanations in the text and include the example region with offset compartment and transcription changes, shown above, as panel c of Supplementary Figure 1.

      (4) Figure 4: The title of the subheading of this section was "Lung metastatic breast cancer cell lines acquire lung-like genome architecture". Echoing my comments in point 1, I am a bit hesitant to term it "lung metastatic" rather than "metastatic" in general, since cell lines such as MDA-MB-231 do metastasize to other organs as well. However, I do get the point that the definition of "lung metastasis" is derived from the common metastasis features among the cell lines here (MCF7, T47D, SKBR3, MDA-MB-231). There might be another argument about whether the "lung" carcinoma cell lines can be considered "localized" since they are also capable of metastasizing to other organs.

      Rather than classifying cells on metastatic “potential” (as measured in a mouse), our cell lines are chosen based on their sites of origin and etiological history in the patients from which they were derived. Cancer cell lines called “lung metastasis” were collected from the pleural effusion from the human lung. Likewise, we call a cancer “localized” because it was taken from the tissue where the cancer originated, even if it might, if placed into a different context, be able to metastasize. We would argue that the genome structure features of the “localized” cancers reflect cancers that have not yet metastasized (even if they could in the future) while the “metastatic” cancers have already gone to a certain location (even if they could in theory have gone to a different location).

      In a way, what the authors probably were trying to leverage here is the "tissue" identity of that organ.

      Having said this, in addition to showing the "lung permissive changes", the authors should show the "breast identity conservation" as well. Because this section starts to deal with the concept of "tissue/lineage identity", the authors should also clarify whether the compartment features of these breast cancer cell lines capable of lung metastasis also preserve their original tissue identity (which would most likely be the case).

      This is a great question. We have now more explicitly checked the proportions of genomic regions that change compartments to match lung vs. maintaining breast-specific compartment identity. The graphs in Supplementary Figure 2 begin with all genomic bins that have distinctive compartment identity between non-cancerous breast and lung epithelial cells. Then, the plots show what fraction of these tissue-specific bins change compartment to match lung vs. maintaining breast identity in each breast cancer cell line category. As we have shown in other graphs, particularly for switches to the A compartment, more bins change to match lung in the metastatic vs. primary site cell lines. In most cases, more than 50% of the tissue-specific bins shift to look more like lung.
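      The proportion described above can be sketched as follows, assuming binary A/B calls per genomic bin and defining tissue-specific bins as those where the non-cancerous breast and lung references differ. The function name and the six-bin tracks are hypothetical; with binary calls, every tissue-specific bin either matches lung or keeps breast identity, so the breast-identity fraction is simply 1 minus the returned value.

```python
def lung_match_fraction(breast, lung, cancer, direction=None):
    """Fraction of tissue-specific bins whose cancer compartment matches lung.

    breast, lung, cancer: equal-length sequences of 'A'/'B' calls per bin.
    Tissue-specific bins are those where normal breast and normal lung differ.
    direction='A' keeps only bins that are A in lung (so B in breast);
    direction='B' keeps only bins that are B in lung (so A in breast).
    """
    if not len(breast) == len(lung) == len(cancer):
        raise ValueError("compartment tracks must cover the same bins")
    bins = [
        i for i in range(len(breast))
        if breast[i] != lung[i] and (direction is None or lung[i] == direction)
    ]
    if not bins:
        raise ValueError("no tissue-specific bins for this selection")
    return sum(cancer[i] == lung[i] for i in bins) / len(bins)


# Hypothetical calls for six genomic bins.
breast = "AABBAB"
lung = "ABABBA"
cancer = "ABAAAA"
print(lung_match_fraction(breast, lung, cancer))       # 0.75
print(lung_match_fraction(breast, lung, cancer, "A"))  # 1.0
print(lung_match_fraction(breast, lung, cancer, "B"))  # 0.5
```

Splitting by `direction` mirrors the observation that switches to the A compartment are where the lung-matching shift is most apparent.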

      (5) Rest of the sections: The authors started to claim that the organ-specific metastasis permissive compartmental features mimic the destination organ. The authors utilized additional non-breast cancer cell lines (prostate cancer cell lines LNCaP as localized and DU145 as brain metastatic) to strengthen this claim for brain metastasis. (DU145 in MetMap again is highly metastatic to lung, brain, and kidney.) However, this makes one wonder whether cell lines that are capable of metastasizing to multiple organ sites (e.g., MDA-MB-231, DU145, A549, H460) all acquire the permissive features for all these organs. This scenario is clinically relevant in Stage 4 patients, who often present not with one metastatic lesion in a single organ but with multiple metastatic lesions in more than one organ (e.g., concomitant liver and lung metastasis). Do the authors think that there might be different clones having different tropism-permissive 3D genome features, or that there might be an evolutionary trajectory in this?

      In my opinion, to further prove this point, the authors might need to consider doing in vivo experiments to collect paired primary and organ-specific metastatic samples to look at the 3D genome changes.

      We agree that an ideal experimental follow up to this study would be to collect paired metastatic and primary tumors, either in mouse xenograft or, even better, from patients. This is beyond the scope of what we can do for our current paper, but we have added a statement to the discussion of further experiments that would be required to clarify this point.

      (6) Technically, the study utilized public Hi-C data without generating new Hi-C data. The resolution of the Hi-C data for compartments was set at 250KB as the binning size indicating that the Hi-C data was at lower resolution so it might not be ideal to address other 3D genome architecture changes such as TADs or long-range loops. It is therefore unknown whether there might be permissive TAD/loop changes associated with organotropism and this is the limitation of this study.

      Our decision to focus on A/B compartmentalization rather than TAD or loop structure in this analysis was intentional and biologically motivated, rather than solely being a reflection of data resolution. Both compartments and topologically associated domains (TADs) are key parts of genome organization, and disruption of these structures has the potential to alter downstream gene regulation, as shown by numerous studies. However, compartments have been found, more so than TADs, to be strongly associated with cell type and cell fate. Therefore, in this manuscript, we decided to focus only on the compartment organization changes between different healthy and cancerous cells, as they are more likely to represent the stable alterations of genome organization that accompany malignant transformation.

      (7) In the final sentence of the discussion the authors stated "Overall, our results suggest that genome spatial compartment changes can help encode a cell state that favors metastasis (EMT)". The link implied by "metastasis (EMT)" was in fact not clearly established in the manuscript: the authors did not provide a strong link between metastasis and EMT in their description of the results. It is also unclear whether the EMT-associated compartment identity would also correlate with the organotropic compartment identity.

      We agree that this statement involves too strong of an assumption. The literature on this topic is vast and complex, and while there is abundant evidence that pathways of EMT can play important roles in facilitating metastasis, there are other pathways at play in the metastatic process as well (https://journals.plos.org/Plosbiology/article?id=10.1371/journal.pbio.3002487). We have made a clearer statement about this in the text now.

      To address the question of whether the organotropic changes related to the EMT changes, we calculated the overlap between the genomic bins that strongly segregated cell lines in the compartment principal component analysis (PC1) with those that showed “organotropic” changes. As you can see in supplementary table 3, this overlap is actually very small, where only 3% of bins are important both for the EMT segregation of cell lines and organotropism.
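      The overlap calculation can be sketched with set operations. The response does not say which denominator underlies the 3% figure, so this sketch assumes the shared bins are expressed as a fraction of the union of the two bin sets (a Jaccard index); a different denominator, such as the PC1 bins alone, would give a different number. The bin identifiers and the `bin_overlap` name are hypothetical.

```python
def bin_overlap(pc1_bins, organotropic_bins):
    """Return (number of shared bins, shared fraction of the union of both sets)."""
    pc1, organo = set(pc1_bins), set(organotropic_bins)
    union = pc1 | organo
    if not union:
        return 0, 0.0
    shared = pc1 & organo
    return len(shared), len(shared) / len(union)


# Hypothetical bin indices: PC1-driving bins vs. organotropic-change bins.
shared, fraction = bin_overlap([1, 2, 3, 4], [4, 5, 6])
print(shared, round(fraction, 3))  # 1 0.167
```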

      We have now included this overlap information as supplementary table 3 and have addressed this in the text.

      Reviewer #2 (Public review):

      Summary:

      This work addresses an important question of chromosome architecture changes associated with organotropic metastatic traits, showing important trends in genome reorganization. The most important observation is that 3D genome changes are consistent with adaptation to new microenvironments: lung metastatic breast cancer cells exhibit genome architecture signatures typical of a lung cell-like conformation, and brain metastatic prostate cancer cells show compartment shifts toward a brain-like state.

      Strengths:

      This work presents interesting original results, which will be important for future studies and for the biomedical implications of epigenetic regulation in health and disease.

      Weaknesses:

      The authors used publicly available data for 15 cell types. They should show how many different sources the data were obtained from and demonstrate that obtained results are consistent if the data from different sources were used.

      In our revised version, we have provided a clarified table of information about all the publicly available data used from all the cell lines, indicating the sources of the data. The 17 datasets used come from 8 different studies. So, indeed, the reviewer is correct that many different sources of data were used. To address the question of whether our results would be consistent if data from different sources were used, we created a comparison map of the A/B compartment profiles for data from multiple sources when it was available. You can see below that the Hi-C data from different sources for the same cell lines cluster quite closely and show high correlation and are well separated from different cell lines. So, we do not think that source batch effects play a major role in our results.
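      One way to run the cross-source check described above is to correlate compartment eigenvectors pairwise and compare same-cell-line pairs with different-cell-line pairs. The sketch assumes every profile is on identical bins and already sign-oriented (A compartment positive), since eigenvector signs are otherwise arbitrary. The data are random stand-ins and `source_consistency` is a hypothetical helper, not the authors' pipeline.

```python
import numpy as np


def source_consistency(profiles):
    """Mean Pearson r for same-cell-line pairs vs. different-cell-line pairs.

    profiles: dict mapping (cell_line, source) -> 1D compartment eigenvector,
    all on the same bins and sign-oriented so that A is positive.
    """
    keys = list(profiles)
    corr = np.corrcoef(np.vstack([profiles[k] for k in keys]))
    same, different = [], []
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            bucket = same if keys[i][0] == keys[j][0] else different
            bucket.append(corr[i, j])
    return float(np.mean(same)), float(np.mean(different))


# Hypothetical data: two cell lines, each profiled by two sources with noise.
rng = np.random.default_rng(0)
base = {"line_X": rng.normal(size=500), "line_Y": rng.normal(size=500)}
profiles = {
    (line, src): vec + rng.normal(scale=0.1, size=500)
    for line, vec in base.items()
    for src in ("study_1", "study_2")
}
same_r, diff_r = source_consistency(profiles)
print(f"same line: {same_r:.2f}, different lines: {diff_r:.2f}")
```

If source batch effects dominated, same-line correlations across studies would drop toward the different-line values; a large gap between the two means is the pattern described in the response.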

      Author response image 3.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1a: This figure could be re-formatted without the arrows. Arrows usually indicate upstream-to-downstream relationships along certain processes. Using arrows here would mislead people to think that the cell lines were derived from one another. The same could apply to the supplementary figures.

      We have now edited figure 1a to include lines linking cell lines, indicating conceptual relationships, rather than arrows, which would imply direct derivation.

      (2) Figure 1c: The PCA (PC2 axis) indeed seemed to separate the HER2 status quite well. One concern is MCF7, it is labeled as ERpos/HER2neg in MetMap but seems to be clustered as HER2pos in this study. Are they the same? (This again highlights the importance of cell line definition and annotation).

      It is a good point that MCF7, while generally considered HER2 negative (we indicate this negative status in Supplementary Table 1), falls near HER2 positive cells in PCA space. This indicates that PCA captures tendencies but is not a perfect classifier. In a high dimensional, complex system, it is expected that an unsupervised analysis such as this will not capture just one biological feature in a given principal component, and therefore something like HER2 status may not segregate perfectly. However, this analysis does suggest that MCF7 3D genome structure has features that are more similar to other HER2+ cell lines. This raises the interesting possibility that it may actually behave like HER2+ cells in some ways even while being HER2- itself. We have more clearly stated the MCF7 discrepancy in the text.

      Reviewer #2 (Recommendations for the authors):

      (1) The description of results can be shortened, to make it easier to read and understand.

      In our revision, we have tried to clarify where possible, but it was difficult to shorten without losing important caveats and context (especially to make important points emphasized by reviewer 1).

      (2) "100 most positive and negative eigenvalues for PC1" - please provide the correct description.

      We have altered this to make it clearer and more correct: “using the genes from the regions with the top 100 most positive and 100 most negative eigenvector loadings for this PC1”
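      The selection described in the corrected wording can be sketched as a PCA over a (cell lines × bins) matrix of compartment scores, followed by picking the bins with the most extreme PC1 loadings. This assumes PCA via SVD of the column-centered matrix; mapping the selected bins to genes would additionally need a gene annotation, which is not shown. Function names are hypothetical.

```python
import numpy as np


def pc1_loadings(matrix):
    """PC1 loading of each bin from a (cell lines x bins) compartment matrix.

    Columns are mean-centered; the first right-singular vector of the
    centered matrix holds the PC1 loadings. Its overall sign is arbitrary,
    so "positive" and "negative" are relative to that choice.
    """
    centered = matrix - matrix.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]


def extreme_bins(loadings, k=100):
    """Indices of the k most positive and the k most negative loadings."""
    order = np.argsort(loadings)
    most_positive = order[::-1][:k]
    most_negative = order[:k]
    return most_positive, most_negative


# Toy example with k=2 instead of 100.
pos, neg = extreme_bins(np.array([0.5, -2.0, 3.0, 0.0, -1.0, 2.0]), k=2)
print(list(pos), list(neg))
```

Taking the extremes of the eigenvector loadings, rather than of the eigenvalues, is what distinguishes individual bins: each principal component has a single eigenvalue but one loading per bin.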

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This is an interesting theoretical study examining the viability of the Virtual Circular Genome (VCG) model, a recently proposed scenario of prebiotic replication in which a relatively long sequence is stored as a collection of its shorter subsequences (and their complements). It was previously pointed out that the VCG model is prone to so-called sequence scrambling, which limits the overall length of such a genome. In the present paper, additional limitations are identified. Specifically, it is shown that a VCG is well replicated when the oligomers are elongated by sufficiently short chains from the “feedstock” pool. However, ligation of oligomers from the VCG itself results in a high error rate. I believe the research is of high quality and well written. However, the presentation could be improved and the key messages could be clarified.

      Strengths:

      High-quality theoretical modeling of an important problem is implemented.

      Weaknesses:

      The conclusions are somewhat convoluted and could be presented better.

      (1) It is not clear from the paper whether the observed error has the same nature as sequence scrambling.

      We thank the Reviewer for pointing out that this important point was not clearly explained. The sequence errors observed in our model are indeed of the same nature as sequence scrambling previously identified by Chamanian and Higgs (Chamanian and Higgs, PLoS Comp Biol 2022). The core issue is the ligation of two oligomers representing non-adjacent segments of the genome sequence, leading to the formation of “chimeric” products that are not part of the desired genome.

      Our analysis identifies the ligation of VCG oligomers (V+V reactions) as the primary mechanism driving sequence scrambling. This allowed us to propose two strategies to mitigate sequence scrambling: (i) tuning the length and concentration of the VCG oligomers, and (ii) considering scenarios where only feedstock monomers contribute to elongation (non-reactive VCG oligomers). We modified the Introduction and Results section of our manuscript to convey this connection more clearly.
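      The distinction between a correct ligation product and a scrambled one can be made concrete with a small check: a product is genome-consistent if it reads contiguously along either strand of the circular genome, and chimeric otherwise. This is a minimal sketch assuming an RNA alphabet and an error-free reference genome; the genome and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(seq):
    """Reverse complement of an RNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_genome_consistent(product, genome):
    """True if `product` reads contiguously along either strand of a circular genome.

    A ligation product that fails this test joins segments that are not
    adjacent in the genome, i.e. it is a chimeric ("scrambled") product.
    """
    # Repeating the circular sequence lets products that wrap around the
    # origin (or exceed one genome length) be found by a substring search.
    repeats = len(product) // len(genome) + 2
    return any(
        product in strand * repeats
        for strand in (genome, reverse_complement(genome))
    )


genome = "AACGUC"                             # hypothetical 6-nt circular genome
print(is_genome_consistent("CAA", genome))    # True: wraps around the origin
print(is_genome_consistent("GACG", genome))   # True: on the complementary strand
print(is_genome_consistent("AAGU", genome))   # False: joins "AA" to "GU"
```

Both "AA" and "GU" occur in this genome, but never next to each other, so their ligation yields exactly the kind of chimeric product that V+V reactions can generate.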

      (2) The authors introduce two important lengths, LS1 and LS2, only in the conclusions and do not explain sufficiently why each of them is important. It would make sense to discuss this early in the manuscript.

      We agree with the Reviewer and have followed the suggestion to introduce the two important length scales earlier in the manuscript (in the Model section of the main text). In the updated version, we refer to these length scales as the exhaustive coverage length L<sub>E</sub> (formerly LS1) and the unique subsequence length L<sub>U</sub> (formerly LS2). The exhaustive coverage length L<sub>E</sub> is defined as the maximum motif length for which all possible sequences of that length appear somewhere in the genome. In contrast, the unique subsequence length L<sub>U</sub> is the minimum motif length such that each subsequence of that length occurs only once in the genome, thus giving each motif a unique “address”.
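      The two definitions can be checked by brute force on a small genome. This sketch assumes a circular RNA genome whose windows are read on both strands, so a self-complementary window counts as occurring twice; the helper names and the 4-nt example genome are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def circular_windows(genome, length):
    """All length-`length` windows on both strands of a circular RNA genome."""
    reverse_complement = genome.translate(COMPLEMENT)[::-1]
    windows = []
    for strand in (genome, reverse_complement):
        extended = strand + strand[: length - 1]  # wrap around the origin
        windows.extend(extended[i : i + length] for i in range(len(strand)))
    return windows


def exhaustive_coverage_length(genome):
    """L_E: largest L such that every one of the 4**L motifs occurs somewhere."""
    best, length = 0, 1
    # Beyond 4**L > 2 * L_G there are more motifs than windows, so stop there.
    while length <= len(genome) and 4 ** length <= 2 * len(genome):
        if len(set(circular_windows(genome, length))) == 4 ** length:
            best = length
        length += 1
    return best


def unique_subsequence_length(genome):
    """L_U: smallest L such that every window occurs exactly once (None if none)."""
    for length in range(1, len(genome) + 1):
        counts = Counter(circular_windows(genome, length))
        if max(counts.values()) == 1:
            return length
    return None


print(exhaustive_coverage_length("AACG"), unique_subsequence_length("AACG"))  # 1 3
```

For "AACG" every monomer appears, but not every dimer, so L<sub>E</sub> = 1; the dimer "CG" occurs on both strands, so uniqueness is only reached at L<sub>U</sub> = 3.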

      Generally, a circular genome of length L<sub>G</sub> contains at most 2L<sub>G</sub> distinct subsequences of any given length (L<sub>G</sub> on each strand), implying that L<sub>E</sub> can be at most L<sup>max</sup><sub>E</sub> = ⌊log<sub>4</sub>(2L<sub>G</sub>)⌋, and L<sub>U</sub> must be at least L<sup>min</sup><sub>U</sub> = ⌈log<sub>4</sub>(2L<sub>G</sub>)⌉, where ⌊...⌋ and ⌈...⌉ denote rounding down and up to the nearest integer, respectively. While the previous version of the manuscript focused exclusively on the limiting case L<sub>E</sub> = L<sup>max</sup><sub>E</sub> and L<sub>U</sub> = L<sup>min</sup><sub>U</sub>, we have extended our analysis in the revised manuscript to genomes with a broader range of L<sub>E</sub> and L<sub>U</sub> values.
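      The two limiting values follow from a counting argument. The derivation below is a sketch assuming a circular genome read on both strands over the four-letter alphabet, with N<sub>L</sub> the number of distinct length-L windows; it is a restatement for illustration, not text from the manuscript.

```latex
\begin{aligned}
N_L &\le 2L_G
  && \text{($L_G$ windows of length $L$ on each strand)}\\
4^{L_E} &\le N_{L_E} \le 2L_G
  && \Rightarrow\; L_E \le \lfloor \log_4(2L_G) \rfloor = L_E^{\max}\\
2L_G &= N_{L_U} \le 4^{L_U}
  && \Rightarrow\; L_U \ge \lceil \log_4(2L_G) \rceil = L_U^{\min}\\
\text{e.g. } L_G = 32:\; 2L_G &= 64 = 4^3
  && \Rightarrow\; L_E^{\max} = L_U^{\min} = 3
\end{aligned}
```

The second line uses that exhaustive coverage needs all 4<sup>L</sup> motifs among the windows; the third uses that uniqueness needs all 2L<sub>G</sub> windows to be distinct motifs.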

      This extended analysis reveals that, for accurate and efficient replication, the VCG oligomer length must always exceed L<sub>U</sub>, regardless of the choice of L<sub>E</sub>. The required margin beyond L<sub>U</sub> depends on the distribution of intermediate-length motifs (i.e., with L<sub>E</sub> < L < L<sub>U</sub>), but is typically only a few nucleotides.

      We have integrated these new findings into the Results section of the main text and expanded the discussion of their implications for the prebiotic relevance of the VCG scenario in the Discussion section. Full methodological details are provided in the Supplementary Material (Sections S1 and S8).

      (3) It is not entirely clear why specific length distribution for VCG oligomers has to be assumed rather than emerged from simulations.

      We thank the Reviewer for this insightful question. Our choice to assume specific length distributions for VCG oligomers is motivated by both conceptual and practical considerations. We explain our reasoning more clearly in the revised manuscript, in the beginning of the Model section of the main text.

      Conceptually, our study focuses on the propagation of sequence information by an already-formed VCG, rather than its emergence from a random pool. As discussed by Chamanian and Higgs, the spontaneous formation of a VCG from randomly interacting oligomers is a rare event. Our aim is to understand whether, once formed, such a structure can robustly replicate under prebiotic conditions. This question is best addressed when the genome and the oligomer pool (including their lengths and concentrations) can be systematically controlled.

      From a practical standpoint, working with a controllable pool of oligomers facilitates direct comparison to recent experimental studies that use predefined and well-characterized oligomer pools (Ding et al. JACS 2023). With our current methods and realistic rate constants, simulating the emergence of such pools from simple building blocks (e.g., monomers and dimers) would be computationally prohibitive, due to the low ligation rate. For example, in a system containing monomers (concentration 0.1mM) and octamers (concentration 1µM) in a volume of V = 3.3µm<sup>3</sup>, simulating the interval between two ligation events takes over 300 hours of compute time (see SI Fig. S2). This renders dynamic pool generation infeasible for the scope of our study.
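      For a sense of scale, the stated concentrations can be converted into molecule counts in the simulated volume with N = cVN<sub>A</sub>. The arithmetic below is ours, rounded to two significant figures, and uses 1 µm<sup>3</sup> = 10<sup>-15</sup> L:

```latex
\begin{aligned}
V &= 3.3\,\mu\text{m}^3 = 3.3\times10^{-15}\,\text{L}\\
N_{\text{mono}} &= (10^{-4}\,\text{M})(3.3\times10^{-15}\,\text{L})(6.022\times10^{23}\,\text{mol}^{-1}) \approx 2.0\times10^{5}\\
N_{\text{octa}} &= (10^{-6}\,\text{M})(3.3\times10^{-15}\,\text{L})(6.022\times10^{23}\,\text{mol}^{-1}) \approx 2.0\times10^{3}
\end{aligned}
```

With roughly two hundred thousand monomers and two thousand octamers in play, a stochastic simulation must track very many transient hybridization and dehybridization events for every rare ligation.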

      (4) Furthermore, the problem has another important length, L0, that is never introduced or discussed: a minimal hybridization length with a lifetime longer than the ligation time. From the parameters given, it appears that L0 is sufficiently long (∼ 10 bases). In other words, it appears that the study is done in a somewhat suboptimal regime: most hybridization events do not lead to a ligation. Am I right in this assessment? If that is the case, the authors might want to explore another regime, L_0 < LS_1, by considering a higher ligation rate.

      Indeed, we assume that the ligation rate is smaller than both the hybridization and dehybridization rates for any oligomer typically included in the pool (up to length 10). In terms of effective length scales, this corresponds to L<sub>0</sub> ≈ 10nt, with L<sub>0</sub> defined as stated by the Reviewer, i.e., the hybridization length corresponding to a lifetime comparable to the ligation time. Most of our analysis actually exploits the small ligation rate, by employing an adiabatic approximation in which ligation is assumed to be slower than any hybridization or dehybridization process in the pool irrespective of oligomer length. As the Reviewer states, in this regime most hybridization events are transient, and will not result in ligation, since the complexes typically dissociate before ligation can occur.
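      The Reviewer's definition of L<sub>0</sub> can be turned into a small calculation. This sketch uses a hypothetical model in which the dehybridization rate falls geometrically with duplex length, k<sub>off</sub>(L) = k<sub>off,1</sub> / s<sup>L-1</sup>, and returns the shortest L whose mean lifetime 1/k<sub>off</sub> reaches the ligation time 1/k<sub>lig</sub>. The rates are illustrative values chosen to land near the ~10 nt mentioned above, not the parameters of the model in the manuscript, which uses sequence-dependent energies.

```python
def hybridization_length_scale(k_off_1, stability_per_bp, k_lig, max_length=100):
    """Smallest duplex length L with mean lifetime 1/k_off(L) >= 1/k_lig.

    Hypothetical model: k_off(L) = k_off_1 / stability_per_bp**(L - 1),
    i.e. each extra base pair slows unbinding by a constant factor.
    """
    for length in range(1, max_length + 1):
        k_off = k_off_1 / stability_per_bp ** (length - 1)
        if k_off <= k_lig:  # lifetime 1/k_off is at least the ligation time
            return length
    return None


# Illustrative (not fitted) rates in 1/s.
print(hybridization_length_scale(k_off_1=1e6, stability_per_bp=10.0, k_lig=2e-3))  # 10
```

Raising k<sub>lig</sub> in this toy model lowers L<sub>0</sub>, which is the faster-ligation regime the Reviewer suggests exploring.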

      While we agree that this assumption limits the overall yield of replication, it has a beneficial effect on replication fidelity. Oligomers that hybridize with mismatches tend to unbind more quickly due to the destabilizing effect of mismatches. In the slow-ligation regime, such complexes are likely to dissociate before a ligation can occur, preventing the formation of incorrect products. In contrast, if the ligation rate was comparable to the unbinding rate of mismatched hybrids, these incorrect associations could undergo ligation, thereby lowering the fidelity of replication. We thus view the regime L<sub>0</sub> > L<sub>V</sub> as more favorable for studying the error-suppressing potential of the VCG mechanism, though we acknowledge that exploring the effects of faster ligation rates is an interesting question for future work.
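      The fidelity argument can be illustrated with a simplified kinetic competition, assuming that a bound complex either ligates (rate k<sub>lig</sub>) or dissociates (rate k<sub>off</sub>) in a single first-order step, so the chance of ligating first is k<sub>lig</sub>/(k<sub>lig</sub> + k<sub>off</sub>). The rates below are illustrative, with mismatched duplexes assumed to unbind 100 times faster than matched ones; function names are hypothetical.

```python
def ligation_probability(k_lig, k_off):
    """Chance that a bound complex ligates before it dissociates."""
    return k_lig / (k_lig + k_off)


def discrimination(k_lig, k_off_match, k_off_mismatch):
    """How much more likely a matched complex is to ligate than a mismatched one."""
    return (ligation_probability(k_lig, k_off_match)
            / ligation_probability(k_lig, k_off_mismatch))


# Illustrative rates in 1/s; mismatches unbind 100x faster than matches.
slow = discrimination(k_lig=0.01, k_off_match=1.0, k_off_mismatch=100.0)
fast = discrimination(k_lig=1000.0, k_off_match=1.0, k_off_mismatch=100.0)
print(round(slow, 1), round(fast, 2))  # 99.0 1.1
```

When ligation is slow, the discrimination approaches the ratio of unbinding rates; when ligation outpaces unbinding, both matched and mismatched complexes ligate almost every time and the selectivity is largely lost.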

      Reviewer #2 (Public review):

      Summary:

      This important theoretical and computational study by Burger and Gerland attempts to set environmental, compositional, kinetic, and thermodynamic constraints on the proposed virtual circular genome (VCG) model for the early non-enzymatic replication of RNA. The authors create a solid kinetic model using published kinetic and thermodynamic parameters for non-enzymatic RNA ligation and (de)hybridization, which allows them to test a variety of hypotheses about the VCG. Prominently, the authors find that the length (longer is better) and concentration (intermediate is better) of the VCG oligos have an outsized impact on the fidelity and yield of VCG production, with important implications for future VCG design. They also identify that activation of only RNA monomers, which can be achieved using environmental separation of the activation and replication, can relax the constraints on the concentration of long VCG component oligos by avoiding the error-prone oligo-oligo ligation. Finally, in a complex scenario with multiple VCG oligo lengths, the authors demonstrate a clear bias for the extension of shorter oligos compared to the longer ones. This effect has been observed experimentally (Ding et al., JACS 2023) but had not been rigorously explained until now. Overall, this manuscript will be of interest to scientists studying the origin of life and the behavior of complex nucleic acid systems.

      Strengths:

      • The kinetic model is carefully and realistically created, enabling the authors to probe the VCG thoroughly.

      • Fig. 6 outlines important constraints for scientists studying the origin of life. It supports the claim that the separation of activation and replication chemistry is required for efficient non-enzymatic replication. One could easily imagine a scenario where activation of molecules occurs, followed by their diffusion into another environment containing protocells that encapsulate a VCG. The selective diffusion of activated monomers across protocell membranes would then result in only activated monomers being available to the VCG, which is the constraint outlined in this work. The proposed exclusive replication by monomers also mirrors the modern biological systems, which nearly exclusively replicate by monomer extension.

      • Another strength of the work is that it explains why shorter oligos extend better compared to the long ones in complex VCG mixtures. This point is independent of the activation chemistry used (it simply depends on the kinetics and thermodynamics of RNA base-pairing) so it should be very generalizable.

      We thank the Reviewer for the careful assessment of our work and this concise summary of our main points.

      Weaknesses:

      • Most of the experimental work on the VCG has been performed with the bridged 2-aminoimidazolium dinucleotides, which are not featured in the kinetic model of this work. Other studies by Szostak and colleagues have demonstrated that non-enzymatic RNA extension with bridged dinucleotides has superior kinetics (Walton et al. JACS 2016, Li et al. JACS 2017), fidelity (Duzdevich et al. NAR 2021), and regioselectivity (Giurgiu et al. JACS 2017) compared to activated monomers, establishing the bridged dinucleotides as important for non-enzymatic RNA replication. Therefore, the omission of these species from the kinetic model presented here can be perceived as problematic. The major claim that avoidance of oligo ligations is beneficial for VCGs may be irrelevant if bridged dinucleotides are used as the extending species, because oligo ligations (V + V in this work) are kinetically orders of magnitude slower than monomer extensions (F + V in this work) (Ding et al. NAR 2022). Formally adding the bridged dinucleotides to the kinetic model is likely outside the scope of this work, but perhaps the authors could test whether this should be done in the future by simply increasing the rate of monomer extension (F + V) to match the bridged dinucleotide rate without changing the rate of V + V ligation?

      We thank the Reviewer for this insightful comment. Indeed, we did not design our model to specifically describe the use of bridged 2-aminoimidazolium dinucleotides as feedstock for the VCG scenario. Adding the bridged dinucleotides to our model would require allowing for feedstock that effectively changes its length during the ligation reaction. As anticipated already by the Reviewer, this is outside the scope of our current modeling framework, which was chosen to explore the generic issue of sequence scrambling in the VCG scenario without distinguishing between different types of activation chemistries.

      Along the lines of the Reviewer’s suggestion, we clarified in the revised manuscript that we consider two limiting cases out of a family of models with two different ligation rate constants: k<sub>lig,1</sub> for ligations involving a monomer and k<sub>lig,>1</sub> for ligations involving no monomer, allowing for kinetic discrimination between these processes. The two limiting cases are k<sub>lig,1</sub> = k<sub>lig,>1</sub> and k<sub>lig,>1</sub>/k<sub>lig,1</sub> → 0. The latter case captures the behavior expected from an activation chemistry that enables fast primer extension but slow ligation, thereby suppressing sequence scrambling via V+V ligation events. The corresponding results, presented in Figures 6 and 7, indeed show that the VCG replication efficiency approaches 100% for pools that are rich in VCG oligomers.
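      The two-rate-constant family described above can be illustrated with a minimal sketch. It assumes that the rate constant depends only on whether at least one of the two reacting strands is a monomer; the function name and the ratio parameter r = k_lig,>1/k_lig,1 are hypothetical and are not taken from the authors' code.

```python
def ligation_rate_constant(len_a: int, len_b: int, k_lig_1: float, r: float) -> float:
    """Return the ligation rate constant for two strands adjacent on a template.

    Hypothetical rule: k_lig,1 if at least one strand is a monomer, otherwise
    k_lig,>1 = r * k_lig,1. r = 1 means no kinetic discrimination; r -> 0
    suppresses ligations between two non-monomer strands (e.g. V+V).
    """
    if min(len_a, len_b) == 1:
        return k_lig_1
    return r * k_lig_1


# Limiting case 1: equal rate constants (r = 1).
print(ligation_rate_constant(1, 8, k_lig_1=1.0, r=1.0))  # 1.0
print(ligation_rate_constant(8, 8, k_lig_1=1.0, r=1.0))  # 1.0
# Limiting case 2: oligomer-oligomer ligation switched off (r = 0).
print(ligation_rate_constant(8, 8, k_lig_1=1.0, r=0.0))  # 0.0
```

      Intermediate values of r would interpolate between the two limiting cases considered in the revised manuscript.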

      Our coarse-grained model, which does not explicitly describe the activation chemistry, was sufficient to capture important kinetic and thermodynamic constraints of the VCG scenario, and to qualitatively explain the experimental observation of a preferential extension of short over long VCG oligomers (Fig. 7B). For future work, we plan to extend our model to account for the activation chemistry in detail, to allow for a more quantitative comparison between theory and experiment.

      • The kinetic and thermodynamic parameters for oligo binding appear to be missing two potentially important components. First, base-paired RNA strands that contain gaps where an activated monomer or oligo can bind have been shown to display significantly different kinetics of ligation and binding/unbinding than complexes that do not contain such gaps (see Prywes et al. eLife 2016, Banerjee et al. Nature Nanotechnology 2023, and Todisco et al. JACS 2024). Would inclusion of such parameters alter the overall kinetic model?

      We thank the Reviewer for highlighting these recent studies. Todisco et al. (JACS 2024) report that complexes with gaps are well described by standard nearest-neighbor models, while stacking interactions at nick sites confer additional stability beyond these predictions. Our model is therefore expected to capture the thermodynamics of complexes with gaps accurately, but likely underestimates the stability of complexes containing nicks. In the VCG pool, all productive ligation complexes (F+F, F+V, V+V) inherently contain a nick and thus benefit from this stabilization, whereas unproductive complexes typically do not. The added stability is expected to increase the residence time of oligomers in productive complexes, thereby enhancing overall extension rates. However, since this stabilization applies uniformly across all productive complexes, it does not shift the relative contributions of different ligation pathways (in particular, correct vs. incorrect).

      This reasoning assumes that hybridization and dehybridization occur on timescales faster than ligation or primer extension. It is conceivable that this separation of timescales does not hold, particularly for oligomers binding to templates with gaps, where association is slower due to steric hindrance, while dissociation is further slowed by stabilizing nicks. As a result, the residence time of such complexes can become comparable to (or longer than) the ligation timescale. We now discuss this aspect more thoroughly in the revised Results and Discussion sections. Capturing the resulting effects in our analytical framework would require relaxing the adiabatic assumption, which is beyond the scope of this work. We recognize the relevance of the non-adiabatic regime of the dynamics, and hope to explore this regime in follow-up work.

      • Second, it has been shown that long base-paired RNA can tolerate mismatches to an extent that can result in monomer ligation to such mismatched duplexes (see Todisco et al. NAR 2024). Would inclusion of the parameters published in Todisco et al. NAR 2024 alter the kinetic model significantly?

      In contrast to complexes with nicks and gaps, mismatched complexes (Todisco et al. NAR 2024) will decrease replication fidelity relative to the results presented in our manuscript. Our current model assumes perfect base pairing, such that replication errors arise only from binding events involving regions too short to reliably identify the correct genomic position (sequence scrambling). Allowing mismatches will indeed introduce an additional error mechanism via imperfect yet sufficiently stable duplexes, thereby increasing the rate of incorrect extensions. However, we expect this effect to be limited. Due to the thermodynamic cost of internal loops, mismatched duplexes most often have their mismatches near the ends of the hybridized region, where their destabilizing effect is weakest (Todisco et al. NAR 2024). Terminal mismatches at the 3′ end of the primer have been shown to reduce the primer-extension rate significantly via a stalling effect (Rajamani et al. JACS 2010, Leu et al. JACS 2013). Hence, we would expect errors due to mismatched duplexes to occur primarily for mismatches at the 5′ end. Such errors could be mitigated by a VCG pool consisting only of oligomers that are sufficiently long relative to the unique motif length of the virtual genome.

      We have extended the Discussion section to address this interesting issue.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      • Apostrophes (’) should be replaced by prime symbols (′).

      We thank the Reviewer for spotting this mistake, which we have now corrected.

      • In the Introduction, the section that discusses the fidelity of enzyme-free copying should include a reference to Duzdevich et al. NAR 2021, as that work measured the fidelity experimentally.

      We have included this reference together with other references on the kinetics of hybridization/dehybridization to nicks and gaps in the main text.

      • The term feedstock oligomers may be problematic, because these also include monomers. In the “Templated ligation” section of the Model, the statement “We consider pools in which all oligomers are activated, as well as pools in which only monomers are activated” is imprecise. “All oligomers, including monomers, ...” would be better, so as to avoid confusion in readers accustomed to standard RNA language.

      We thank the Reviewer for this helpful suggestion. In the revised manuscript, we now use the term feedstock (rather than feedstock oligomers) to avoid confusion. We have also revised the sentence in the “Templated ligation” section to read “all oligomers, including monomers, ...” as recommended.

      • The “Experimentally determined association rate constants” cite references 24–26, which measured the rate constants for DNA. Considering that the authors are modeling RNA, I wonder whether Ashwood et al. Biophysical Journal 2023 contains any relevant RNA data that could help refine the model?

      We thank the Reviewer for pointing us to the study by Ashwood et al. We have added this reference to the corresponding paragraph in the revised manuscript. Their RNA association rate constant (∼ 5 × 10<sup>7</sup> M<sup>−1</sup> s<sup>−1</sup>) is larger than the one we used (∼ 1 × 10<sup>6</sup> M<sup>−1</sup> s<sup>−1</sup>). However, a larger association rate constant is in fact beneficial for the validity of our adiabatic approximation and thus would not affect our results, as long as the thermodynamic stability remains the same. This is because, at fixed stability, faster association also implies faster dissociation, so the ratio of the (de)hybridization timescales to the ligation timescale becomes even smaller, which is the regime where the adiabatic approximation made in our analysis is justified.
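      The timescale argument above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming a fixed dissociation constant K_d (so that k_off = k_on × K_d) and a hypothetical ligation timescale; both numerical values are illustrative placeholders, not parameters from the manuscript.

```python
def dissociation_time(k_on: float, K_d: float) -> float:
    """Mean complex lifetime 1/k_off, with k_off = k_on * K_d at fixed stability."""
    return 1.0 / (k_on * K_d)


K_D = 1e-6        # hypothetical dissociation constant (M), held fixed
TAU_LIG = 3600.0  # hypothetical ligation timescale (s)

# Association rate constants (M^-1 s^-1): value used in the model vs. Ashwood et al.
for k_on in (1e6, 5e7):
    tau_off = dissociation_time(k_on, K_D)
    print(f"k_on={k_on:.0e}  tau_off={tau_off:.3g} s  tau_off/tau_lig={tau_off / TAU_LIG:.2e}")
```

      Whatever value is assumed for K_d, raising k_on 50-fold at fixed K_d shortens the complex lifetime 50-fold, pushing the system further into the regime where (de)hybridization is fast compared with ligation.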

      • In “Triplexes of type 1–8 and 1–9...”, the word triplexes will confuse readers with RNA expertise, as a triplex is simply a triple-stranded RNA.

      We thank the Reviewer for pointing out the potentially ambiguous nomenclature. To avoid confusion with triple-stranded RNA structures, we now refer to binary (ternary, ...) complexes instead of duplexes (triplexes, ...) throughout the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors addressed the influence of DKK2 on colorectal cancer (CRC) metastasis to the liver using an orthotopic model transferring AKP-mutant organoids into the spleens of wild-type animals. They found that DKK2 expression in tumor cells led to enhanced liver metastasis and poor survival in mice. Mechanistically, they associate Dkk2 deficiency in donor AKP tumor organoids with reduced Paneth-like cell properties, particularly Lyz1 and Lyz2, and defects in glycolysis. Quantitative gene expression analysis showed no significant changes in Hnf4a1 expression upon Dkk2 deletion. Ingenuity Pathway Analysis of RNA-seq and ATAC-seq data points to a Hnf4a1 motif as a potential target. They also show that HNF4a binds to the promoter region of Sox9, which leads to LYZ expression and upregulation of Paneth-like properties. By analyzing available scRNA data from human CRC, the authors found higher expression of LYZ in metastatic and primary tumor samples compared to normal colonic tissue; reinforcing their proposed link, HNF4a was highly expressed in LYZ+ cancer cells compared to LYZ- cancer cells. 

      Strengths: 

      Overall, this study contributes a novel mechanistic pathway that may be related to metastatic progression in CRC. 

      Weaknesses: 

      The main concerns are related to incremental gains, missing in vivo support for several of their conclusions in murine models, and missing human data analyses. Additionally, methods and statistical analyses require further clarification. 

      Main comments: 

      (1) Novelty 

      The authors previously described the role of DKK2 in primary CRC, correlating increased DKK2 levels to higher Src phosphorylation and HNF4a1 degradation, which in turn enhances LGR5 expression and "stemness" of cancer cells, resulting in tumor progression (PMID: 33997693). A role for DKK2 in metastasis has also been previously described (sarcoma, PMID: 23204234). 

      (2) Mouse data 

      a) The authors analyzed liver mets, but the main differences between AKP and AKP/Dkk2 KO organoids could arise during the initial tumor cell egress from the intestinal tissue (which cannot be addressed in their splenic injection model), or during pre-liver stages, such as endothelial attachment. While the analysis of liver mets is interesting, given that Paneth cells play a role in the intestinal stem cell niche, it is questionable whether a study that does not involve the intestine can appropriately address this pathway in CRC metastasis. 

      We value the reviewer’s comment that the splenic injection model cannot represent metastasis from primary tumors, intravasation, or extravasation. Therefore, we performed orthotopic transplantation of AKP and KO organoids directly into the colon and then tested for metastasis.

      Author response image 1.

      Primary tumor formation and liver metastasis by orthotopic transplantation of AKP or KO colon cancer organoids. 6–8-week-old male C57BL/6J mice were treated with 2.5% DSS dissolved in drinking water for 5 days, followed by regular water for 2 days, to remove the gut epithelium. After recovery on regular water, the colon was flushed with 1000 μl of 0.1% BSA in PBS. Then, 200,000 dissociated organoid cells in 200 μl of 5% Matrigel and 0.1% BSA in PBS were instilled into the colonic lumen. After infusion, the anal verge was sealed with Vaseline. 8 weeks after transplantation, the mice were sacrificed to measure primary tumor formation and liver metastasis.

      As a result, 4 out of 6 mice in the control group successfully formed colorectal primary tumors, whereas only 2 out of 6 mice showed primary tumor formation in the KO group (Author response image 1A). Tumor size was reduced by about half (from 10–12 mm to 5–7 mm). Only one AKP mouse developed metastatic nodules in the liver (Author response image 1B). Next, to measure circulating tumor cells, we harvested at least 500 μl of blood from the portal vein and analyzed tdTomato-positive tumor cells (Author response image 2). Flow cytometry analysis of PBMCs showed the presence of tdTomato<sup>hi</sup>CD45<sup>−</sup> cells as well as tdTomato<sup>mid</sup>CD45<sup>+</sup> cells in 2 out of 6 AKP mice, while no tdTomato-positive cells were observed in the PBMCs of KO organoid-transplanted mice.

      Because only a limited number of mice developed primary and metastatic tumors, we cannot provide a statistical analysis of DKK2-mediated metastasis. However, our revised data indicate a trend that DKK2 KO reduced primary tumor formation, the number of circulating tumor cells, and liver metastasis. This trend is consistent with our previous report in iScience, which showed that DKK2 KO reduced AOM/DSS-induced polyp formation by about 60%, and with the decreased metastasis observed in the splenic injection model in this manuscript. Further studies are necessary to confirm this trend and to elucidate the mechanisms underlying intravasation and extravasation of circulating tumor cells.
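      To illustrate why these counts do not support a statistical claim, the primary-tumor incidence (4 of 6 AKP mice vs. 2 of 6 KO mice) can be put through a two-sided Fisher's exact test using only the Python standard library. This is a sketch added for illustration, not an analysis reported by the authors; the "no more likely than observed" rule is one common two-sided convention.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    row and column totals that is no more likely than the observed table.
    """
    row1 = a + b  # size of the first group
    col1 = a + c  # total number of "yes" outcomes
    n = a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the first group contains x of the "yes" outcomes.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # The small relative tolerance keeps tables tied with the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Rows: AKP, KO; columns: primary tumor formed / not formed.
p_value = fisher_exact_two_sided(4, 2, 2, 4)
print(f"two-sided p = {p_value:.3f}")  # two-sided p = 0.567
```

      With six mice per group, even a twofold difference in incidence is far from significant, which is consistent with presenting these data as a trend.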

      Author response image 2.

      Flow cytometry analysis of tdTomato+ circulating colon tumor cells in PBMCs. PBMCs were harvested via the portal vein after euthanasia. CD45 and tdTomato were analyzed by flow cytometry.

      b) The overall number of Paneth cells found in the scRNA-seq analysis of liver mets was strikingly low (17 cells, Figure 3), and assuming that these cells are driving the differences seems somewhat far-fetched. Adding to this concern is inappropriate gating in the flow plot shown in Figure 6. This should be addressed experimentally and in the interpretation of data. 

      We appreciate the reviewer’s comments on this point. Since the number of LYZ+ cells is low in our scRNA-seq analysis, we performed flow cytometry (Figure 6H), which shows a clear LYZ-expressing population in the same splenic injection model of metastasis. Figure 6H is a representative image of triplicates for each group, and we performed this experiment three times independently. As suggested, we changed the graph format and updated the gating and statistical analysis in Fig. 6H and 6I. This in vivo result confirmed our in vitro data showing that DKK2 KO reduced LYZ+ cells while increasing HNF4α1 protein levels.

      c) Figures 3, 5, and 6 show the individual gene analyses with unclear statistical data. It seems that the p-values were not adjusted, and it is unclear how they reached significance in several graphs. Additionally, it was not stated how many animals per group and cells per animal/group were included in the analyses. 

      In Fig. 3, mouse scRNA-seq data were generated from pooled cancer samples from 5 animals per group. The Wilcoxon signed-rank test was performed for each gene and/or regulon activity. Since multiple testing adjustments were not performed, a p-value adjustment is neither needed nor applicable.
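      Because the reviewer's concern is that per-gene p-values were reported without adjustment, the following minimal sketch shows the Benjamini–Hochberg false-discovery-rate correction, one common choice when many genes or regulons are tested at once (Bonferroni is a stricter alternative). The input p-values are hypothetical and are not data from the manuscript.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values, in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values, for illustration only.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

      In this toy input, a raw p-value of 0.04 becomes about 0.053 after adjustment, showing how a nominally significant gene can fall above a 0.05 threshold once the number of tests is accounted for.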

      In Fig. 5, human data were analyzed. Cells from the same sample are dependent, but differential gene expression (DEG) analysis typically calculates statistics under the assumption that they are independent. This assumption may explain the low p-values observed in our data. To address this issue, we applied pseudobulk DEG analysis to our human single-cell data. Even after correcting for this sample-level dependence, we confirmed that the genes of interest still exhibited significantly different expression patterns (Author response image 3).
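      The core idea of the pseudobulk approach mentioned above is to sum raw counts over all cells from the same sample, so that samples rather than cells become the replicates in the subsequent test. The minimal sketch below shows only this aggregation step, using a hypothetical toy data structure; the authors' actual tools and normalization are not specified here.

```python
from collections import defaultdict


def pseudobulk(cells: list[dict]) -> dict[str, dict[str, int]]:
    """Sum raw counts per gene across all cells from the same sample.

    Each cell is a dict with a 'sample' key and a 'counts' dict mapping
    gene -> count. The result holds one expression profile per sample.
    """
    profiles: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        for gene, count in cell["counts"].items():
            profiles[cell["sample"]][gene] += count
    return {sample: dict(genes) for sample, genes in profiles.items()}


# Hypothetical toy input: three cells from two patients.
cells = [
    {"sample": "patient1", "counts": {"LYZ": 5, "HNF4A": 2}},
    {"sample": "patient1", "counts": {"LYZ": 3, "HNF4A": 0}},
    {"sample": "patient2", "counts": {"LYZ": 1, "HNF4A": 4}},
]
print(pseudobulk(cells))
# {'patient1': {'LYZ': 8, 'HNF4A': 2}, 'patient2': {'LYZ': 1, 'HNF4A': 4}}
```

      The per-sample profiles would then be passed to a bulk DEG method, which is why the effective sample size drops from the number of cells to the number of patients.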

      Author response image 3.

      Pseudobulk DEG analysis confirmed the differential expression genes of interest.

      In Fig. 6H–6I, the number of animals per group is provided in the figure legend.

      d) Figure 6 suggests a signaling cascade in which the absence of DKK2 leads to enhanced HNF4A expression, which in turn results in reduced Sox9 expression and hence reduced expression of Paneth cell properties. It is therefore crucial that the authors perform in vivo (splenic organoid injection) loss-of-function experiments, knockdown of Sox9 expression in AKP organoids, and Sox9 overexpression experiments in AKP/Dkk2 KO organoids to demonstrate Sox9 as the central downstream transcription factor regulating liver CRC metastasis. 

      Sox9 is a well-established marker gene for Paneth cell formation in the gut. Therefore, overexpression or knockout of the Sox9 gene would result in either an increase or decrease in Paneth cells in the organoids. We believe that the suggested experiments fall outside the scope of this manuscript. Instead, we demonstrated the change in the Paneth cell differentiation marker, Sox9, in the presence or absence of DKK2.

      e) Given the previous description of the role of DKK2 in primary CRC, it is important to define the step of liver metastasis affected by Dkk2 deficiency in the metastasis model. Does it affect extravasation, liver survival, etc.? 

      We appreciate the reviewer’s insights and perspectives. Regarding liver survival, it is well known that stem cell niche formation is a critical step for the outgrowth of metastasized cancer cells (Fumagalli et al. 2019, Cell Stem Cell). LYZ+ Paneth cells are recognized as stem cell niche cells in the intestine, and human scRNA-seq data have shown that LYZ+ cancer cells express stem cell niche factors such as Wnt and Notch ligands. To determine whether LYZ+ cancer cells act as stem cell niche cells, we performed confocal microscopy to assess whether LYZ+ cancer cells express WNT3A and DLL4 in AKP organoids (Author response image 4). The results show that LYZ labeling co-localizes with DLL4 and WNT3A expression, while the organoid reporter tdTomato is evenly distributed. Additionally, our in vitro and in vivo data indicate that DKK2 deficiency leads to a reduction of LYZ+ cancer cells, which may contribute to stem cell niche formation. Based on these findings, we propose that DKK2 is an essential factor for stem cell niche formation, which is required for cancer cell survival in the liver during the early stages of metastasis. Although our revised data confirmed the trend that DKK2 deficiency decreases liver metastasis, we have not yet determined whether DKK2 is involved in extravasation. This research topic should be addressed in future studies.

      Author response image 4.

      Confocal microscopy analysis for lysozyme (LYZ) and Paneth cell-derived stem cell niche factors, WNT3A and DLL4 in AKP colon cancer organoids.

      The method is described in the supplemental information. The antibodies used were: DLL4 (delta-like 4) Polyclonal Antibody (Invitrogen, PA5-85931), WNT3A Polyclonal Antibody (Invitrogen, PA5-102317), Goat anti-Rabbit IgG (H+L) Cross-Adsorbed Secondary Antibody, Alexa Fluor™ 488 (Invitrogen, A-11008), Anti-Lysozyme C antibody (H-10, Santa Cruz, sc-518083), Goat anti-Mouse IgM (Heavy chain) Secondary Antibody, Alexa Fluor™ 647 (Invitrogen, A-21238).

      (3) Human data 

      Can the authors address whether the expression of Dkk2 changes in human CRC and whether mutations in Dkk2 are correlated with metastatic disease or CRC stage? 

      The human data were useful in identifying the presence of LYZ+ cancer cells with Paneth cell properties. However, due to the limited number of late-stage patient samples with high DKK2 expression, the results were not statistically significant. Nevertheless, the trend suggests a positive correlation between DKK2 expression and the malignant stage of CRC.

      (4) Bioinformatic analysis 

      The authors did not provide sufficient information on bioinformatic analyses. The authors did not include information about the software, cutoffs, or scripts used to perform their analyses or generate the figures in the manuscript, which hinders the interpretation and assessment of the results. Terms like "Quantitative gene expression analyses" (line 136) and "visualized in a Uniform Approximation and Projection" (line 178) do not explain what was used as input or which analyses were executed. There are multiple ways to align, preprocess, and visualize bulk, single-cell, ATAC, and ChIP-seq data, and the results vary greatly depending on which is used. For example, for the single-cell data, the authors did not report how many cells were sequenced, nor how many cells remained after alignment and quality filtering (RNA count, mt count, etc.), so the Paneth+ to Goblet+ percentages in lines 184 and 185 cannot be evaluated, because they depend on this information. The absence of a clustering cutoff for the single-cell data is concerning, since this greatly affects the resulting number of clusters (https://www.nature.com/articles/s41592-023-01933-9). The authors should provide a comprehensive explanation of all the data analyses and the steps used to obtain those results. 

      We apologize for the insufficient information. Detailed information on the data analyses is now provided in the Methods of the updated Supplemental Information, and the datasets are available in the GEO database (Bulk RNA-seq: GSE157531, ATAC-seq: GSE157529, ChIP-seq: GSE277510).
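      The reviewer's point that reported cell numbers and cell-type percentages depend on quality filtering can be made concrete with a minimal per-cell filter on detected genes and mitochondrial read fraction. The thresholds below are hypothetical defaults chosen for illustration, not the values used by the authors; changing them changes how many cells survive, which is why both counts should be reported.

```python
def passes_qc(n_counts: int, n_genes: int, mt_counts: int,
              min_genes: int = 200, max_genes: int = 6000,
              max_mt_pct: float = 20.0) -> bool:
    """Return True if a cell passes simple quality-control thresholds.

    All thresholds are hypothetical defaults for illustration only.
    """
    if n_counts == 0:
        return False
    mt_pct = 100.0 * mt_counts / n_counts  # mitochondrial read percentage
    return min_genes <= n_genes <= max_genes and mt_pct <= max_mt_pct


# Each tuple: (total counts, detected genes, mitochondrial counts).
cells = [
    (5000, 1500, 300),   # 6% mitochondrial: kept
    (5000, 1500, 2000),  # 40% mitochondrial: removed
    (400, 120, 10),      # too few detected genes: removed
]
kept = [c for c in cells if passes_qc(*c)]
print(f"{len(kept)} of {len(cells)} cells retained")  # 1 of 3 cells retained
```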

      (5) Clarity of methods and experimental approaches 

      The methods were incomplete and they require clarification. 

      We’ve updated our methods as requested by the reviewer.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors propose that DKK2 is necessary for the metastasis of colon cancer organoids. They then claim that DKK2 mediates this effect by permitting the generation of lysozyme-positive Paneth-like cells within the tumor microenvironmental niche. They argue that these lysozyme-positive cells have Paneth-like properties in both mouse and human contexts. They then implicate HNF4A as the causal factor responsive to DKK2 to generate lysozyme-positive cells through Sox9. 

      Strengths: 

      The use of a genetically defined organoid line is state-of-the-art. The data in Figure 1 and the dependence of DKK2 for splenic injection and liver engraftment, as well as the long-term effect on animal survival, are interesting and convincing. The rescue using DKK2 administration for some of their phenotype in vitro is good. The inclusion and analysis of human data sets help explore the role of DKK2 in human cancer and help ground the overall work in a clinical context. 

      Weaknesses: 

      In this work by Shin et al., the authors expand upon prior work regarding the role of Dickkopf-2 in colorectal cancer (CRC) progression and the necessity of a Paneth-like population in driving CRC metastasis. The general topic of metastatic requirements for colon cancer is of general interest. However, much of the work focuses on characterizing cell populations in a mouse model of hepatic outgrowth via splenic transplantation. In particular, the concept of Paneth-like cells is primarily based on transcriptional programs seen in single-cell RNA sequencing data and needs more validation. Although including human samples is important for potential generality, the strength could be improved by doing immunohistochemistry in primary and metastatic lesions for Lyz+ cancer cells. Experiments that further bolster the causal role of Paneth-like CRC cells in metastasis are needed. 

      Recommendations for the Authors:

      Reviewing Editor (Recommendations for the Authors): 

      Here we note several key concerns with regard to the main conclusions of the paper. Additional experiments to directly address these concerns would be required to substantially update the reviewer evaluation. 

      (1) Demonstration of a causal role of Paneth-like cells in CRC metastasis, for example by sorting the Paneth-like cells (either by the markers identified in the subsequent single-cell analysis or by scatter) to establish whether the frequency of the Paneth-like cells in a culture of organoids is directly correlated with tumorigenicity and engraftment. 

      We sincerely appreciate the reviewing editor’s comment. First, as previously reported (Shin et al., iScience 2021), there is no difference in proliferation between WT and KO during in vitro organoid culture or in vivo colitis-induced tumors. However, DKK2 deficiency led to morphological changes, which we analyzed using bulk RNA-seq. As described in the manuscript, Paneth cell marker genes, such as Lysozymes and defensins, were significantly reduced in DKK2 KO AKP organoids.

      Due to the nature of these markers, it is technically challenging to isolate live LYZ+ cancer cells. To address this issue in the future, we plan to develop organoids that express a reporter gene specific for Paneth cells. In this manuscript, we demonstrated a correlation between DKK2 and the formation of LYZ+ cancer cells. In both the splenic injection model (Fig. 1) and the orthotopic transplantation model (Author response images 1 and 2), we observed that transplantation of cancer organoids with reduced numbers of LYZ+ cells (KO organoids) led to decreased metastatic tumor formation. The number of LYZ+ cells in KO-transplanted mice remained low in metastatic liver tumor nodules (Fig. 6H–6I). Immunohistochemistry further confirmed that LYZ+ cancer cells were barely detectable in KO samples (Author response image 5). These data suggest that DKK2 is essential for the formation of LYZ+ cancer cells, which are necessary for outgrowth following metastasis.

      Author response image 5.

      Histology of lysozyme-positive cells in metastatic tumor nodules in the livers of colon cancer organoid-transplanted mice. Immunohistochemistry of lysozyme-positive Paneth-like cells in liver-metastasized colon cancer (upper panels, DAB staining). Identification of tumor nodules by H&E staining (lower panels, scale bar = 100 μm). Magnified tumor nodules are shown in the 2nd and 3rd columns (scale bar = 25 μm). Arrows indicate lysozyme-positive Paneth-like cells among tumor epithelial cells. Infiltration of lysozyme-positive myeloid cells is detected in both AKP and KO tumor nodules. AKP: control colon cancer organoids carrying mutations in the Apc, Kras, and Tp53 genes. KO: Dkk2 knockout colon cancer organoids.

      (2) Further characterization of Lyz+/Paneth-like cells to further the authors' argument for the unique function that they have in their tumor model. Specifically, do the Paneth-like cells secrete Wnt3, EGF, Notch ligands, and Dll4 as normal Paneth cells do? 

      We appreciate the reviewing editor’s comment. In response, we performed confocal microscopy analysis to examine the protein levels of LYZ, Wnt3A, and DLL4 in AKP colon cancer organoids (Author response image 4). The data presented above show that LYZ+ cancer cells express both Wnt3A and DLL4, suggesting that LYZ+ colon cancer cells may function similarly to Paneth cells, which are stem cell niche cells. Furthermore, using the Panglao database, we demonstrated that LYZ+/Paneth-like cells exhibit typical Paneth cell properties in human scRNA-seq data (Fig. 4 and Fig. 5). These findings suggest that LYZ+ colon cancer cells possess Paneth cell properties.

      (3) Experiments to test metastasis, ideally from orthotopic colonic tumors, to ensure phenotypes aren't restricted to the splenic model of hepatic colonization and outgrowth used at present. 

      We are in agreement with the reviewing editor and reviewers, which is why we conducted the orthotopic transplantation experiment. However, we encountered challenges in establishing this model effectively. After multiple trials, we observed that many mice did not form primary tumors, and the variability, particularly in metastasis, was difficult to control. Only a few AKP-transplanted mice developed liver metastasis. The representative revision data have been provided above. Nevertheless, we believe that this model needs further improvement and optimization to reliably study metastasis originating from primary tumors.

      (4) To generalize claims to human cancer, the authors should test whether loss of DKK2 impacts LYZ+ cancer cells in human organoids and affects their engraftment in immunodeficient mice compared to control. Another more correlative way to validate the LYZ+ expression in human colon cancer would be to stain for LYZ in metastatic vs. primary colon cancer, expecting metastatic lesions to be enriched for LYZ+ cells. 

      We agree with your point, and this will be addressed in future studies.

      (5) Clarifying inconsistencies regarding effect of DKK2 loss on HNF4A (Figure 1E vs Figure 6I). 

      In Figure 1E, we measured the mRNA levels of HNF4A in metastatic foci by qPCR, while in Figure 6I, we measured the protein level of HNF4A by flow cytometry. Recent studies, including our previous report, have shown that HNF4A protein levels are regulated by proteasomal degradation mediated by pSrc (Mori-Akiyama et al. 2007, Gastroenterology; Bastide et al. 2007, Journal of Cell Biology; Shin et al. 2021, iScience). Consequently, while the mRNA levels remained unchanged in Fig. 1E, we observed an increase in HNF4A protein levels upon DKK2 KO in Figure 6I.

      (6) Addressing concerns about statistics and reporting as outlined by Reviewer 1. 

      Thank you very much for your assistance in improving our manuscript. The updates have been incorporated as detailed above.

      These are the central reviewer concerns that would require additional experimentation to update the editorial summary. Other concerns should be addressed in a revision response but do not require additional experimentation. 

      Reviewer #1 (Recommendations For The Authors): 

      Specific comments: 

      • Do Dkk2-KO organoids grow normally?

      Yes, in vitro.

      Since the authors reported on the effects of Dkk2 in the induction/maintenance of the Paneth cell niche, changes in AKP organoid numbers or growth rate between Dkk2-WT and KO would be an expected outcome. 

      Disruption of Paneth cell formation in normal organoids is expected to alter growth. However, DKK2 KO colon cancer organoids carrying mutations in the Apc, Kras, and Tp53 genes exhibit growth rates and organoid sizes similar to those of WT AKP controls. In contrast to these in vitro observations, we observed a significant reduction in metastasized tumor growth in vivo. Further analyses of factors derived from LYZ+ cancer cells will help explain this discrepancy between the in vitro and in vivo effects of DKK2 loss.

      • Figure 1: 

      - Panel C: The legend should indicate what c.p. stands for.

      c.p.m. stands for counts per minute in the in vivo imaging analysis. This has been updated in the figure legend.

      - Panel E: Please comment on the possible underlying reasons for the lack of change in HNF4a1 levels. 

      This has been updated in response to the reviewing editor’s comment (5) above.

      - Panel E: Number of mice from which isolated cancer nodules were harvested. 

      There were 5 mice per group. This has been updated in the legend.

      • Figure 2: 

      - Suggestion: Panel A should be presented in Figure 1 since Dkk2 KO organoids are already used in Figure 1. 

      We placed this panel in Fig. 2 to present the rescue of the DKK2 KO phenotype by the addition of recombinant DKK2 protein.

      - Panel B: Please explain why these genes are marked in blue. 

      It has been described in the legend. “Paneth cell marker genes are highlighted as blue circles (AKP=3 and KO=5 biological replicates were analyzed).”

      • Figure 3: 

      - Indicate the number of cells recovered from AKP vs. KO mice (since liver metastasis was already reduced in KO mice). This should be shown in a UMAP. 

      - Panel A: 4th line in the pathways, correct "Singel" typo. 

      We appreciate your correction. It has been fixed.

      - Panel A: There are multiple versions of PanglaoDB with different markers; a list of all markers used to determine cell type should be provided. 

      - Panel C: Bar value for the WNT pathway is not displayed, and there is no legend to indicate the direction of the analysis (that is, AKPvsKO or KOvsAKP). 

      The comparison is KO vs. AKP, as described in the figure legend.

      - Panel C: Ingenuity pathway analysis is not a good tool to look at this type of result because it does not include the gene fold changes in the analysis, so it only provides a Z-score of the presence of that pathway and not the degree it is increased or fold changes; recommend substituting any type of GSEA analysis, such as fgsea.

      - Panel D: the term "Patient" to refer to mice is confusing. Use "Mice" or "Treatment" or "Condition" instead. 

      Corrected

      - Panel D: Information about the number of mice per group, cells per animal (or liver let) used, and additional clarification about the statistical analysis used is required, as differences shown in this panel appear subtle given the standard variation in each group. Box plots need to show individual/raw values. 

      • Figure 4: 

      - Panel E: It would be helpful to show the cutoff lines for the Paneth cell score and Lyz expression in the graphs. 

      It has been updated in response to the reviewer’s request.

      • Figure 5: 

      - Panel B: again, information about the number of "patients" or cells used and clarification about the statistical analysis used is required as the display of data generates concerns about the distribution within groups. Box plots need to show individual/raw values

      It has been updated in response to the reviewer’s request.

      • Figure 6: 

      - Panel A: Add a legend to inform the direction of the process (e.g., red, activation, blue, repression). We noticed the Yap1 bar data had no color. Is there a reason for that? Please explain this point in the revised manuscript. 

      Red color added for the Yap1.

      - Panel A: Ingenuity pathway analysis is not a good tool to look at this type of result because it does not include the gene fold changes in the analysis, so it only provides a Z-score of the presence of that pathway and not the degree to which it is increased. I recommend substituting any type of GSEA analysis, such as fgsea. 

      - Panels A&B: Again, only p-value scores were provided, while fold changes are necessary to define the ratio of presence increase of normal vs. AKP. 

      - Panel D: No raw or pre-processed ChIP-seq data was provided. Additionally, please indicate the exact genome location (the image appears to have been edited from a raw image made in the UCSC Genome Browser); it should be remade with coordinates and other important information (nearby genes, epigenetic marks, etc.). 

      - Panel H/I: Flow cytometry gating is inappropriate, as it captures cells that are negative for LYZ in both AKP and KO cells, resulting in an overestimation of the number of Lyz cells. Gating should specifically select the very few LYZ-positive cells in the top/left quadrant. 

      The updates have been made, and the statistical data have been re-analyzed.

      - Panel J: Information about the number of animals/organoids or cells used and clarification about the statistical analysis used is required, as the display of data generates concerns about the distribution within groups. Box plots need to show individual/raw values. 

      • Overall: 

      - A supplementary table with all the sequenced libraries and their depth, read length/cell count should be provided.

      All of the information is now available in the GEO database. We used previously published human epithelial datasets for human single cell analysis (Joanito*, Wirapati*, Zhao*, Nawaz* et al, Nat Genetics, 2022, PMID: 35773407).

      - The Hallmark gene set used is very broad, and the authors should confirm the results on GO BP. 

      Using Gene Ontology biological processes (GO bp), we observed that glycolysis-related genes were enriched in our newly described cell population, although the adjusted p-value did not exceed 0.05.

      Author response image 6.

      GSEA with GO BP pathways highlighted glycoprotein processes and protein localization to the extracellular region, both of which are related to Paneth cell functions. Paneth cells secrete α-defensins, angiogenin-4, lysozyme and secretory phospholipase A2. The enrichment of glycoprotein processes and protein localization to the extracellular region reflects these characteristics of Paneth cells. 

       

      - qPCR is not a good way to confirm sequencing results; while PCR data is pre-normalized, sequencing is normalized only after quantification, so results on 6 E and F should be shown on the sequencing data. 

      The expression level of Sox9 is relatively low. In our bulk RNA-seq data, the averages for Sox9 in AKP versus DKK2 KO are 28.2 and 25.1, respectively. While there is a similar trend, the difference is not statistically significant in this dataset, and we did not include an experimental group for reconstitution. Therefore, we conducted qPCR experiments for the reconstitution study by adding recombinant DKK2 (rmDKK2) protein to the culture. Furthermore, it is well established that Sox9 is an essential transcription factor for the formation of LYZ+ Paneth cells. Based on this, we assessed the levels of LYZ and Sox9 using qPCR and confocal microscopy in the presence or absence of DKK2.

      • Edits in the text: 

      - There are several typographical errors. Specific suggestions are provided below. 

      - Line 43: "Chromatin immunoprecipitation followed by sequencing analysis": state the analysis of what cells before continuing with "revealed..." 

      - Line 77: Recent findings have identified 

      - Line 138: "were reduced in KO tumor samples" → rephrase to clarify "KO-derived liver tumors" 

      - Line 167: Recombinant mouse DKK2 protein treatment in KO organoids partially rescued this effect. Add "partially" since adding rmDkk2 didn't fully restore Lyz1 and Lyz2 levels. 

      - Line 185-187: the authors should not reference Figure 6 because it has not been introduced yet. 

      - Line 198-199: The authors claimed a correlation between Dkk2 expression and Lgr5 expression; however, the graph presented in Figure 3B does not indicate this. The R-value was 0.11, which does not indicate a correlative expression between these genes. 

      - Line 232-233: the authors need to show any connection to Dkk2 gene expression in human samples in order to draw that conclusion. 

      - Line 294: expression, leading to the formation 

      - Line 347: Wnt ligand (correct Wng typo) 

      We have modified our manuscript in accordance with the reviewer’s suggestions.

      Reviewer #2 (Recommendations For The Authors): 

      Specific criticisms/suggestions: 

      Author claim 1: Dkk2 is necessary for liver metastasis of colon cancer organoids.

      This model is one of hepatic colonization and eventual outgrowth and not metastasis. Metastasis is optimally assessed using autochthonous models of cancer generation, with the concomitant intravasation, extravasation, and growth of cancer cells at the distant site. The authors should inject their various organoids in an orthotopic colonic transplantation assay, which permits the growth of tumors in the colon, and they can then identify metastasis in the liver that results from that primary cancer lesion (i.e., to better model physiologic metastasis from the colon to liver). 

      The orthotopic colonic transplantation data have been provided above (Author response images 1 and 2).

      Author claim 2: DKK2 is required for the formation of lysozyme-positive cells in colon cancer. 

      It would greatly strengthen the authors' claim if supraphysiologic or very high amounts of DKK2 enhanced CRC organoid line engraftment (i.e., the specific experiment being pre-treatment with high levels of DKK2 and immediate transplantation to count the number of outgrowing clones). If DKK2 is causal for the engraftment of the tumors, increased DKK2 should enhance their capacity for engraftment. 

      Paneth cells have physical properties permitting sorting and are readily identifiable on flow cytometry. The authors should demonstrate increased tumorigenicity and engraftment by sorting the Paneth-like cells-either by the markers they identified in the subsequent single cell or by scatter to establish whether the frequency of the Paneth-like cells in a culture of organoids is directly correlated with engraftment potential. 

      Further characterization of the Paneth-like cells would help further the authors' argument for the unique function that they have in their tumor model. Specifically, do the Paneth-like cells secrete Wnt3, EGF, and Notch ligands such as Dll4, as normal Paneth cells do? Immunofluorescence, sorting, or western blots would all be reasonable methods to assess protein levels in the sorted population. 

      This has been performed, and the data are provided above (Author response images 1 and 3).

      Author claim 3: Lysozyme (LYZ)+ cancer cells exhibit Paneth cell properties in both mouse and human systems. 

      For the claim to be general to human cancer, the author should demonstrate that loss of DKK2 impacts LYZ+ cancer cells in human organoids and affects their engraftment in immunodeficient mice compared to control. Another more correlative way to validate the LYZ+ expression in human colon cancer would be to stain for LYZ in metastatic vs. primary colon cancer, expecting metastatic lesions to be enriched for LYZ+ cells. 

      The claims on the metabolic function of Paneth-like cells need more clarification. Do the cancer cells with Paneth features have a distinct metabolic profile compared to the other cell populations? The authors should address this through metabolic characterization of isolated LYZ+ cells with Seahorse or comparison of Dkk2 KO to WT organoids (i.e., +/-LYZ+ cancer cell population). 

      To address this question, we need to develop organoids with a Paneth cell reporter gene. We appreciate the reviewer’s comment, and this should be pursued in future studies.

      Author claim 4: HNF4A mediates the formation of Lysozyme (Lyz)-positive colon cancer cells by DKK2. 

      The authors implicate HNF4A and Sox9 as causal effectors of the Paneth-like cell phenotype and subsequent metastatic potential. There appears to be some discordance regarding the effect of DKK2 loss on HNF4A. In Figure 1E, the authors show that gene expression in metastatic colon cancer cells for HNF4A in DKK2 knockout vs AKP control is insignificant. However, in Figure 6I, there is a highly significant difference in the number of HNF4A positive cells, more than a 3-fold percentage difference, with a p-value of <0.0001. If there is the emergence of a rare but highly expressing HNF4A cell type that on aggregate bulk expression leads to no difference, but sorts differentially, why is it not identified in the single-cell data set? These data together are highly inconsistent with regards to the effect of DKK2 on HNF4A and require clarification. 

      Previous studies have demonstrated that HNF4A is regulated by proteasomal degradation mediated by pSrc. As a result, the mRNA level of HNF4A remains unchanged, while the protein level is significantly reduced in colon cancer cells. DKK2 KO leads to decreased Src phosphorylation, resulting in the recovery of HNF4A protein levels. This explains why HNF4A cannot be detected in scRNA-seq datasets, which measure mRNA. We have shown this in our previous report. In this manuscript, based on ChIP-seq data using an anti-HNF4A monoclonal antibody, as well as confocal microscopy and qPCR data for the Sox9 gene, we propose that HNF4A acts as a regulator of cancer cells exhibiting Paneth cell properties.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript provides novel and important findings regarding the impact of noradrenergic signaling from the locus coeruleus on hippocampal gene expression. The locus coeruleus is the sole source of noradrenaline to the hippocampus, and many rapid molecular changes induced by stress are regulated by noradrenaline. This manuscript provides a rigorous investigation into hippocampal genes uniquely regulated by noradrenaline in the presence or absence of stress. Data were collected and analyses were performed using solid methodology, and the results mostly convincingly support the conclusions, with few weaknesses. The study would benefit from a more comprehensive analysis of sex differences.

      Response: We thank the reviewers and the editors for the positive evaluation of our work and for the constructive feedback. To address some of the key criticisms, we have performed several new experiments and analyses. Importantly, we now provide a much more rigorous comparison of males and females, which strongly suggests that there are no major sex differences in the transcriptomic response to stress and noradrenaline in the hippocampus. We think that these - and other additions discussed below - significantly strengthen the manuscript. We provide detailed responses to all the reviewers comments. We have added numbers to the reviewers’ comments for easier referencing.

      Reviewer #1 (Public Review):

      Comment 1: Privitera et al., provide a comprehensive and rigorous assessment of how noradrenaline (NA) inputs from the locus coeruleus (LC) to the hippocampus regulate stress-induced acute changes in gene expression. They utilize RNA-sequencing with selective activation/inhibition of LC-NA activity using pharmacological, chemogenetic and optogenetic manipulations to identify a great number of reproducible sets of genes impacted by LC activation. It is noteworthy that this study compares transcriptomic changes in the hippocampus induced by stress alone, as compared with selective circuit activation/inhibition. This reveals a small set of genes that were found to be highly reproducible. Further, the publicly available data will be highly useful to the scientific community.

      Response: We are very grateful for this positive evaluation.

      Comment 2: A major strength of the study is the inclusion of both males and females. However, with this aspect of the study also lies the biggest weakness. While the experiments tested males and females, they were not powered for identifying sex differences. There are vast amounts of literature documenting the inherent sex differences, both under resting and stress-evoked conditions, in the LC-NA system and this is a major missed opportunity to better understand if there is an impact of these sex-specific differences at the genetic level in a major LC projection region. There are many instances whereby sex effects are apparent, but do not pass multiple testing correction due to low n's. The authors highlight one of them (Ctla2b) in supplemental figure 6. This gene is only upregulated by stress in females. It is appreciated that the manuscript provides an incredible amount of novel data, making the investigation of sex differences ambitious. Data are publicly available for others to conduct follow up work, and therefore it may be useful if a list of those genes that were different based on targeted interrogation of the dataset be provided with a clear statement that multiple testing corrections failed. This will aid further investigations that are powered to evaluate sex effects.

      Response: The assessment of the reviewers and the editorial feedback encouraged us to look more thoroughly into potential sex differences, because we believe it would indeed be a major additional strength if our manuscript could make a firm statement on this important issue. To this end, we have expanded the manuscript in two major ways:

      (1) To expand the analysis of sex effects also to the dorsal hippocampus, and to increase robustness of the data, we have performed RNA-seq in 32 additional samples of male and female mice exposed to stress (or control) and propranolol (or saline) injection. Figure 1f-h and Supplementary Figure 1d-f have been updated to reflect this new addition, and the results are presented in a new section on Pages 3-4 (pasted below for ease of reviewing). In summary, the new data strongly support our initial observation that the effects of stress on gene expression, as well as the effects of propranolol on blocking stress-induced effects, are highly similar in both sexes.

      (2) To further increase the power for detection of sex-effects, we have performed a small meta-analysis. For this, we combined several RNAseq datasets from the current manuscript and published datasets from our previous work (Floriou-Servou et al., 2018; von Ziegler et al., 2022), which also investigated transcriptomic sex-differences in the hippocampus 45 min after cold swim stress exposure in the same setup as used for the current manuscript. This approach increased our sample size to 51 males and 20 females. In summary, this well-powered approach shows no evidence for sex differences in the transcriptional response to stress, even when more lenient analyses were applied. These results are described in a new section on page 4, and summarized in Supplementary Figures 1f+g. This section is pasted below for ease of reviewing.

      "While blocking β-adrenergic receptors was able to block stress-induced gene expression, we did not test whether propranolol might decrease gene expression already at baseline, independent of stress. Additionally, all tests had thus far been conducted in male mice, raising the question about potential sex differences in NA-mediated transcriptomic responses. To address these two issues, we repeated the experiment in both sexes and included a group that received a propranolol injection but was not exposed to stress (Fig. 1f). Combining the data from both experiments, we repeated the analysis for each region, to identify genes whose response to stress was inhibited by propranolol (Figure 1g). As in the previous experiment, we found that many of the stress-induced gene expression changes were blocked by propranolol injection in both dHC (Figure 1g, left panel) and vHC (Figure 1g, right panel). Importantly, propranolol did not change the expression level of these genes in the absence of stress. We then directly compared the genes sensitive to stress and propranolol treatment in both dHC and vHC. To this end, we plotted the union of genes showing a significant stress:propranolol interaction in either region in one heatmap across both dHC and vHC (Supplementary Figure 1d). This showed again that the stress-induced changes were very similar in dHC and vHC, and that propranolol similarly blocked many of them. Finally, we asked whether the response differs between males and females. Despite clear sex differences in gene expression at baseline (data not shown), we found no significant sex differences in response to stress or propranolol between male and female mice (FDR<0.05; Fig. 1g). To more directly visualize this, we compared females and males by plotting the log2-fold changes of the stress:propranolol interaction across all stress-induced genes that were blocked by propranolol. We find very similar regulation patterns in both sexes (Figure 1h). 
Although none of these sex differences are significant, some genes seem to show quantitative differences, so we plotted the expression patterns of the 5 genes showing the largest difference in interaction term as box-plots, which suggest that these spurious differences are likely due to noisy coefficient estimates (Supplementary Fig. 1e). To address concerns that our analysis of sex differences might not have been sufficiently powered, we performed a meta-analysis of the experiments shown here along with previously published datasets from our lab (Floriou-Servou et al. 2018; von Ziegler et al. 2022). In all these experiments, the vHC of male and female mice was profiled 45 min after exposure to an acute swim stress challenge. This resulted in a sample size of 51 males and 20 females. Despite this high number of independent samples, we could not identify any statistically significant interaction between sex and the stress response. To identify candidates that might not reach significance while discounting differences due to noise in fold-change estimates, we reproduced the same analysis using DESeq2 with Approximate Posterior Estimation for generalized linear model (apeglm) logFC shrinkage (A. Zhu, Ibrahim, and Love 2018). This analysis also did not reveal any sex differences in the stress response (Supplementary Fig. 1f). We then tailored the meta-analysis specifically to the set of stress-responsive genes that were blocked by propranolol, and also for these genes the response to stress was strikingly similar in both sexes (Supplementary Fig. 1g). Altogether, we conclude that there are no major sex differences in the rapid transcriptomic stress response in the hippocampus, and that blocking beta-receptors prevents a large set of stress-induced genes in both females and males."

      To put these findings in context with existing literature, we agree with the reviewer that there are many studies that have reported sex differences in the LC-circuitry as summarized by Bangasser and colleagues (Bangasser et al., 2016, 2019). However, these studies primarily focus on the LC itself, suggesting that female rats have more LC neurons, denser LC-dendrites in the peri-LC region, and that LC neurons are more readily activated by stress in females because of heightened sensitivity to CRF-signaling. A recent study in mice reports, in contrast, that females have fewer TH-positive neurons in the LC, but they also find enhanced excitability of LC neurons in females (Mariscal et al., 2023). Similarly, one study has suggested molecular differences in the makeup of the LC (Mulvey et al., 2018). Our experiments, however, focus on the impact of NA release in a projection region (hippocampus). Further, we use a strong stress induction protocol (swim stress) and various potent modes of direct LC activation, so differences in "LC-excitability" are likely less relevant in this context. We added evidence showing that we trigger powerful NA release in both sexes (Supplementary Figure 2c-h; see response to Reviewer #2, Comment #3 for more details). In addition, we show that the intensity or pattern of LC stimulation does not appear to alter the molecular response (Figure 3a-b), and that various stressors (mild or intense) all trigger the same NA-dependent molecular changes (Figure 4a-b). Therefore, our results suggest that once NA is released (in the hippocampus), the molecular downstream effects on gene expression are very similar - independent of stimulation intensity, sex, or hippocampal subregion (dorsal/ventral). 
This does not mean that there are no sex differences for activation of LC, but rather that the transcriptional response to NA release in the hippocampus is robust across sexes, and that propranolol seems to block NA-dependent effects similarly in both sexes. This does not rule out quantitative differences between sexes that only emerge with targeted analyses of individual genes, or once fluctuations in ovarian hormones are taken into account. We have updated the section in the discussion to summarize these considerations in light of the new results (see pages 20-21, section: "A uniform molecular response to stress and noradrenaline release in both sexes").

      Comment 3: A major finding of the present study is the involvement of noradrenergic transcriptomic changes occurring in astrocytic genes in the hippocampus. Given the stated importance of this finding within the discussion, it seems that some additional dialogue integrating this with current literature about the role of astrocytes in the hippocampus during stress or fear memory would be important.

      Response: We thank the reviewer for giving us an opportunity to add a more detailed discussion about the role of astrocytes and thyroid hormones in the hippocampus during learning and memory formation. We have added these statements to the discussion:

      “Within the hippocampus, astrocytic pathways are emerging as important players for learning and memory processes (Gibbs, Hutchinson, and Hertz 2008; Bohmbach et al. 2022). In fact, it is well-known that NA enhances memory consolidation (Schwabe et al. 2022; McGaugh and Roozendaal 2002), and recent work suggests that these effects are mediated by astrocytic β-adrenergic receptors (Gao et al. 2016; Iqbal et al. 2023). Our transcriptomic screens revealed Dio2 as the most prominent target influenced by LC activity. Dio2 is selectively expressed in astrocytes and encodes for the intracellular type II iodothyronine deiodinase, which converts thyroxine (T4) to the bioactive thyroid hormone 3,3',5-triiodothyronine (T3) and therefore regulates the local availability of T3 in the brain (Bianco et al. 2019). Enzymatic activity of DIO2 has further been shown to be increased by prolonged noradrenergic transmission through desipramine treatment in LC projection areas (Campos-Barros et al. 1994). This suggests that the LC-NA system and its widespread projections could act as a major regulator of brain-derived T3. Notably, T3-signaling plays a role in hippocampal memory formation (Rivas and Naranjo 2007; Sui et al. 2006), raising the possibility that NA-induced Dio2 activity in astrocytes might mediate some of these effects.”

      Comment 4: The comparison of the candidate genes activated by the LC in the present study (swim) with datasets published by Floriou-Servou et al., 2018 (Novelty, swim, restraint, and footshock) is an interesting and important comparison. Were there other stressors identified in this paper or other publications that do not regulate these candidate genes? Further, can references be added to clarify to the reader, that prior studies have identified that novelty, restraint and footshock all activate LC-NA neurons.

      Response: Thank you for the positive feedback. We have only tested the stressors reported in Figure 4a-b (novelty, swim, restraint, and footshock). It is known that all these stressors trigger noradrenaline release; in fact, we are not aware of stressors that do not trigger NA release. This reproducible finding supports the notion that the identified set of genes is indeed highly NA-responsive. As suggested, we have now included references that show increased NA release in response to all these stressors:

      “Therefore, we assessed their expression in a dataset comparing the effect of various stressors on the hippocampal transcriptome (Floriou-Servou et al., 2018). The stressors included restraint, novelty and footshock stress, which have all previously been shown to increase hippocampal NA release (Hajós-Korcsok et al., 2003; Lima et al., 2019; Masatoshi Tanaka et al., 1982).”

      Comment 5: Comparisons are made between chemogenetic studies and yohimbine, stating that fewer genes were activated by chemogenetic activation of LC neurons. There is clear justification for why this may occur, but a caveat may need to be mentioned: neuronal activation in the LC was assessed at 90 min for yohimbine versus 45 min for hM3Dq, and therefore it cannot be ruled out that differences in LC-NA activity levels might also contribute.

      Response: The reviewer raises an important point about some inconsistencies between the time points chosen in our study, an aspect that was also pointed out by Reviewer #2. We have chosen the 45 and 90 min time points for two different reasons. On the one hand, cFos changes on the protein level are known to peak 90 min after neuronal activation, and we wanted to capture the strongest possible cFos signal in the LC. On the other hand, we wanted to measure gene expression changes triggered by NA release, which already occur 45 min after noradrenergic activation (Roszkowski et al., 2016). Thus, when the experimental design allowed separate experiments (e.g. systemic yohimbine injection), we chose to measure gene expression after 45 min, but to validate cFos activation in the LC separately after 90min. In response to DREADD activation, however, we wanted to confirm within the same animal that LC activation was successful, and thus we collected LC and hippocampus simultaneously (Figure 2c,d). While the cFos increase is already very pronounced at the 45min time point (Figure 2g), the quality of IHC is slightly lower because the tissue cannot be perfused in this experimental design. Therefore, we do not think that the time point for cFos sampling matters in this context. However, we agree with the reviewer that it remains unclear whether yohimbine and DREADDs activate the LC with similar potency. To directly compare NA release would require a set of photometry-based experiments to measure NA release using genetically-encoded NA-sensors. While we have added such experiments for LC activation with DREADDs and optogenetics to show rapid NA release indeed occurs in the hippocampus (see Reviewer #2, Comment 3; Supplementary Figure 2c-h), yohimbine interferes with the NA-sensors as explained in detail in response to Reviewer 2, Comment 3. 
Thus, it was too challenging for us to directly compare the release dynamics in response to DREADDs and yohimbine, which was also not the main focus of our work. To explicitly address this caveat, we have extended the corresponding section in the discussion:

      "Finally, our observation that systemic administration of the α2-adrenergic receptor antagonist yohimbine very closely recapitulates the transcriptional response to stress stands in contrast to the much more selective transcriptional changes observed after chemogenetic or optogenetic LC-NA activation. This difference could be due to various factors. First, it remains unclear how strong the LC gets activated by yohimbine versus hM3Dq-DREADDs. However, given the potent LC activation observed after DREADD activation, it seems unlikely that yohimbine would lead to a more pronounced LC activation, thus explaining the stronger transcriptional effects. Second, contrary to LC-specific DREADD-activation, systemic yohimbine injection will also antagonize postsynaptic α2-adrenergic receptors throughout the brain (and periphery). More research is needed to determine whether this could have a more widespread impact on the hippocampus (and other brain regions) than isolated LC-NA activation, further enhancing excitability by preventing α2-mediated inhibition of cAMP production. Finally, systemic yohimbine administration and noradrenergic activity have been shown to induce corticosterone release into the blood (Johnston, Baldwin, and File 1988; Leibowitz et al. 1988; Fink 2016). Thus, yohimbine injection could have broader transcriptional consequences, including corticosteroid-mediated effects on gene expression."

      Comment 6: Please add information about how virus or cannula placement was confirmed in these studies. Were missed placements also analyzed separately?

      Response: Pupillometry recordings were performed in all animals involving optogenetic or chemogenetic manipulations of the LC before subjecting them to stress experiments. These assessments account for both correct optic fiber placement and virus expression (Privitera et al., 2020). If an animal did not show a clear pupil response, it was excluded from the study. To demonstrate correct cannula placement for infusion of isoproterenol into the dorsal hippocampus, we added a representative image of cannula placement in Supplementary Figure 1h.

      Comment 7: Time of day for tissue collection used in genetic analysis should be reported for all studies conducted or reanalyzed.

      Response: Thank you for pointing out this omission. Tissue collection for RNA-seq analysis was always performed between 11am and 5pm during the dark phase of the reversed light-dark cycle. We have added this information to the corresponding method section (“Tissue collection”).

      Reviewer #1 (Recommendations For The Authors):

      Comment 8: This is a well written, comprehensive and rigorous manuscript that will be of great interest to those in the scientific community.

      Response: Thank you for the positive evaluation of our work and for the constructive feedback.

      Reviewer #2 (Public Review):

      Comment 1: The present manuscript investigates the implication of the locus coeruleus-noradrenaline system in the stress-induced transcriptional changes of the dorsal and ventral hippocampus, combining pharmacological, chemogenetic, and optogenetic techniques. The authors have revealed that stress-induced release of noradrenaline from the locus coeruleus plays a modulatory role in the expression of a large number of genes in both the ventral and dorsal hippocampus through activation of β-adrenoreceptors. Similar transcriptional responses were observed after optogenetic and chemogenetic stimulation of the locus coeruleus. Among all the genes analysed, the authors identified the most affected ones in response to locus coeruleus-noradrenaline stimulation as being Dio2, Ppp1r3c, Ppp1r3g, Sik1, and Nr4a1. By comparing their transcriptomic data with publicly available datasets, the authors revealed that these genes were upregulated upon exposure to different stressors. Additionally, the authors found that upregulation of Ppp1r3c, Ppp1r3g, and Dio2 following swim stress was sustained from 90 min up to 2-4 hours after stress and that it was predominantly restricted to hippocampal astrocytes, while Sik1 and Nr4a1 showed a broader cellular expression and a sharp rise and fall in expression within 90 min of stress onset.

      Overall, the paper is well written and provides a useful inventory of dorsal and ventral hippocampal gene expression upregulated by activation of LC-NA system, which can be used as starting point for more functional studies related to the effects of stress-induced physiological and pathological changes.

      Response: We thank the reviewer for the careful assessment of our work.

      Comment 2: However, I believe that the study would have benefited from a more comprehensive analysis of sex differences. Females were included in only one experiment, and the analyses were restricted to the ventral hippocampus.

      Response: In response to the comments by the reviewer, as well as Reviewer #1 and the editors, we have sequenced an additional 32 brain samples to expand the comparison of sex effects in females and males across the dorsal and ventral hippocampus, and we included a new meta-analysis of 3 experimental datasets (51 male and 20 female samples) to thoroughly assess sex differences in the transcriptomic response to stress. We refer the reviewer to our detailed response to Reviewer #1, Comment #2, and to the updated Results section on pages 3-4.

      Comment 3: Although, the experiments were overall sound and the results broadly support the conclusion made, I think some methodological choices should be better explained and rationalized. For instance, the study focuses on identifying transcriptional changes in the hippocampus induced by stress-mediated activation of the LC-NA system, however NA release following stress exposure and pharmacological or optogenetic manipulation was mostly measured in the cortex.

      Response: Because the hippocampus was used for RNA-sequencing, we could not assess NA release in the hippocampus (this would require fiber implants that would interfere with molecular measures, or different tissue processing for HPLC). Nonetheless, we wanted to assess the transcriptional changes in the hippocampus while simultaneously confirming successful stimulation of the LC-NA system in the same animals. To achieve this, we pursued 3 routes: 1) we used pupillometry to confirm functional LC activation; 2) we measured cFos in the LC to directly demonstrate LC activation; 3) we assessed NA release using uHPLC (which requires larger tissue samples), choosing the cortex because both the cortex and hippocampus receive NA predominantly from the LC (Samuels & Szabadi, 2008). Importantly, we had previously shown that chemogenetic LC activation leads to similar NA turnover in the cortex and hippocampus, as measured by uHPLC (Zerbi et al., 2019). The relevant figure from that paper is inserted below to show the striking similarity between hippocampus and cortex.

      Author response image 1.

      Levels of noradrenaline (NE) turnover (MHPG/NE ratio) in the cortex (CTX) and hippocampus (HC), measured in whole tissue with uHPLC 90min after hM3Dq-DREADD activation of the LC (copied and cropped from Zerbi et al, 2019, Neuron).

      In response to the reviewer's comment, we performed additional experiments to directly demonstrate that LC activation with DREADDs as well as optogenetics causes an increase in hippocampal NA release. We recorded NA release in the hippocampus using fiber photometry combined with genetically encoded NA sensors. For DREADD activation, we observed a strong increase in hippocampal noradrenaline that started a few minutes after clozapine administration and was sustained throughout the 21-minute recording (see Supplementary Figure 2c-e). For optogenetic LC activation, we found a rapid, sharp increase in NA levels in the hippocampus (Supplementary Figure 2f-h). These experiments were performed in females and males and triggered similar responses. An adapted and cropped version of Supplementary Figure 2 is pasted below for ease of reading.

      Please note that we could not perform a similar experiment using yohimbine, because the GRABNE sensors are based on the alpha-2 adrenergic receptor, thus yohimbine administration interferes with the photometry recording. However, we believe that it is clear from this response that strong activation of the LC leads to uniform release of NA in the hippocampus and cortex.

      Author response image 2.

      c, Schematic of fiber photometry recording of hippocampal NA during chemogenetic activation of the LC. After 5 min of baseline recording in the home cage, animals were injected with clozapine (0.03 mg/kg, i.p.) and placed in the OFT for 21 min. d, Average ΔF/F traces of GRABNE2m photometry recordings in response to chemogenetic activation of the LC (mean±SEM for hM3DGq+ and hM3DGq- split into females and males, n=3/group/sex). e, Peak ΔF/F response of fiber photometry trace. f, Schematic of fiber photometry recording of hippocampal NA during optogenetic activation of the LC. Animals were lightly anesthetized (1.5% isoflurane) and recorded in a stereotaxic frame. After 1 min of baseline recording, animals were stimulated three times with 5 Hz for 10 s (10 ms pulse width, ~8 mW laser power) and recorded for 2 min post-stimulation. g, Average ΔF/F traces of the NA sensors GRABNE1m and nLightG in response to optogenetic activation of the LC (mean±SEM for females and males; n(females) = 10, n(males) = 5). h, Peak ΔF/F response of fiber photometry trace.

      Comment 4: Furthermore, behavioral changes following systemic pharmacologic or chemogenetic manipulation were observed in the open field task immediately after peripheral injections of yohimbine or CNO, respectively. Is this timing sufficient for both drugs to cross the blood brain barrier and to exert behavioral effects?

      Response: We have previously shown that chemogenetic activation of the LC through clozapine elicits pupil responses within 1-2 minutes after injection (Privitera et al., 2020; Zerbi et al., 2019). This indicates that clozapine rapidly crosses the blood brain barrier and affects LC activity within a few minutes after injection. Our additional experiments using genetically encoded sensors in the hippocampus show this even more directly (Supplementary Figure 2d), see also the response to Comment 3 above.

      Similarly, yohimbine also rapidly crosses the blood brain barrier within the same time frame (Hubbard et al., 1988). These observations are consistent with the rapid behavioral effects that can be detected within a few minutes after injection of clozapine for LC-DREADD activation (Zerbi et al., 2019), and for yohimbine as well (von Ziegler et al., 2023). In response to another comment of this reviewer, we have also re-analyzed the behavior presented in the current manuscript in time-bins of 3 minutes, which also shows the rapid onset of effects in response to yohimbine (within the first 3 min) and DREADDs (within 6 min), see Supplementary Fig. 3.

      Comment 5: Finally, the study shows that activation of noradrenergic hippocampus-projecting LC neurons is sufficient to regulate the expression of several hippocampal genes. Although the necessity of these projections for the observed transcriptional effects has been tested to some extent through systemic blockade of β-adrenoceptors, I believe the study would have benefited from more selective (optogenetic or chemogenetic) necessity experiments.

      Response: We understand the reviewer's point that blocking the LC during stress exposure would be an interesting experiment. However, it is very hard to completely silence the LC during intense stressors. In fact, despite intense efforts, we have not been able to silence the LC during swim stress exposure using DREADDs or other chemogenetic approaches (PSAM/PSEM). We were in fact able to silence the LC with the optogenetic inhibitor JAWS (and others have reported successful LC silencing with GtACR2), but there is a major issue involving the "rebound effect", where more NA is released once the inhibition is stopped. We would thus have had to optogenetically silence the LC for 45-90 min, which would create heat artifacts, and require challenging control experiments to draw firm conclusions. Given all these issues, we reasoned that blocking adrenergic receptors is a simple and elegant solution, which provides clear evidence for the necessity of beta-adrenergic signaling.

      Reviewer #2 (Recommendations For The Authors):

      Major concerns:

      Comment 6: The study focuses on the identification of transcriptional changes in the hippocampus induced by stress-mediated activation of the LC-NA system; however, noradrenaline release following stress exposure or yohimbine injection was measured in the cortex. The authors should consider measuring NA concentrations in the hippocampus after exposure to swim stress or administration of yohimbine, or at least explain in the manuscript their choice to analyse the cortex.

      Response: We have addressed this issue in detail in our response to Reviewer #2, Comment #3, where we provided an overview of the additional data that support our approach. As mentioned there, measuring NA release after yohimbine is not compatible with our GRABNE photometry approach, as the GRAB sensor is based on the alpha2-adrenoceptor. We would like to add that measuring NA release using photometry during swim stress is also challenging, because the vigorous movement (swimming, typically in one direction) creates pressure on the cables and implants. We felt that overcoming these experimental challenges (setup, troubleshooting and controls) would be beyond the scope of the paper, given that this stressor is already known to lead to strong NA release in the hippocampus. We have now included references demonstrating that all the stressors used in our work trigger an NA increase in the hippocampus (see response to Reviewer #1, Comment 3): “Therefore, we assessed their expression in a dataset comparing the effect of various stressors on the hippocampal transcriptome (Floriou-Servou et al., 2018). The stressors included restraint, novelty and footshock stress, which have all previously been shown to increase hippocampal NA release (Hajós-Korcsok et al., 2003; Lima et al., 2019; Masatoshi Tanaka et al., 1982).”

      Comment 7: Concerning the experiment aimed at investigating sex differences in gene expression, it is not clear why the authors decided to restrict their analyses in females to the ventral hippocampus only. The explanation that they did not detect major differences between the dorsal and ventral hippocampus in males is not sufficient, because there could have been different effects in females. Therefore, the authors' conclusion that their "results suggest that the transcriptomic response is independent of sex" is not entirely correct, since sex differences were only evaluated in the ventral hippocampus.

      Response: We appreciate the reviewer's critique. As described above, we have now also sequenced the dorsal hippocampal tissue from the propranolol experiment (males and females, 32 samples) and additionally added an extensive meta-analysis of three large datasets (n=71) to compare transcriptional sex differences in response to stress. A detailed description of these experiments and how they have extended/supported our conclusions have been provided in response to Reviewer #1, Comment #2.

      Comment 8: Besides the effects in females, the same experiment examined whether propranolol by itself (in the absence of stress) would alter gene expression, but such effects were not examined in the dorsal hippocampus. In contrast, in a different experiment, the effects of isoproterenol on gene expression were examined in the dorsal hippocampus only. Furthermore, related to this latter experiment, intra-dorsal hippocampal injection of isoproterenol should presumably mimic the rise in NA observed after stress exposure. Why was gene expression measured 90 min after central isoproterenol injections, while in the other experiments gene expression was determined 45 min after stress, when the authors observe the peak NA concentration?

      Response: We have addressed the reviewer's critique of dorsal vs ventral hippocampus by reanalyzing 32 additional samples from dorsal hippocampus of male and female mice after propranolol (or saline) injection. Please see response to Reviewer #1, comment #2.

      Regarding the time points: We chose the 45 and 90 min time points mainly for two reasons. First, cFos protein changes are known to be strongest 90 min after neuronal activation. Second, because we wanted to capture gene expression changes triggered by NA release, we reasoned that these effects must be fast and should thus be measured at an early transcriptional time point (45 min). However, after performing the time-course experiment after swim stress exposure (Figure 4d,c), we observed that the LC-NA-sensitive genes (e.g. Dio2 and several PP1 subunits) show the strongest changes 90 min after stress exposure. Therefore, in some of our experiments we opted to analyze gene expression changes at 90 min, converging with the time point we typically use for cFos staining. Contrary to the reviewer's statement, peak NA concentrations are not observed 45 min after the various interventions; rather, the peak of the main metabolite (MHPG) is observed then, owing to the temporal dynamics of NA release and breakdown. NA release occurs immediately upon stress exposure (or direct LC activation), which we also show in the new photometry data described above. Thus, rapid NA release triggers intracellular cascades that lead to downstream transcriptional changes, which presumably peak between 45 and 90 min later.

      Comment 9: Behavioral changes following systemic pharmacologic or chemogenetic manipulation were observed in the open field task immediately after peripheral injections of yohimbine or CNO, respectively. Is this timing sufficient for both drugs to cross the blood brain barrier and to exert behavioral effects? It is also not immediately clear why the open field tasks have different durations depending on the experiment, which can also impact the results. The authors might also consider splitting the open field data into 2 or 3 min time bins, to allow a better comparison across the different results.

      Response: We thank the reviewer for the suggestion to plot the behavior data in time bins. We have implemented this change for the yohimbine and DREADD experiments and updated the corresponding figure accordingly (Supplementary Figure 3, pasted below for ease of reading). The new visualization clearly shows that yohimbine injection triggers rapid behavioral effects already in the first three minutes, whereas LC-DREADD activation triggers behavioral changes within 3-6 minutes after injection. Thus, clear drug effects are visible in the first 10 minutes, which is comparable to the standard OFT (10 min testing) shown in response to swim stress exposure (Suppl. Figure 3a). We exposed mice to the OFT for 21 minutes in total because we based our experimental approach on the optogenetic LC stimulation protocol first published by McCall and colleagues (McCall et al., Neuron, 2015), in which the LC is stimulated for 3 min followed by 3 min pauses (see Suppl. Figure 3d). Because of this on-off design, we decided to keep the optogenetic analysis simple and show the overall effect (Supplementary Figure 3d), particularly as we know that NA dynamics do not recover rapidly enough after 3 min of continuous stimulation to justify a binned analysis (unpublished data).

      Author response image 3.

      Effects of acute stress and noradrenergic stimulation on anxiety-like behaviour in the open field test. a, Stress-induced changes in the open field test 45 min after stress onset. Stressed animals show overall reductions in distance traveled (unpaired t-test; t=3.55, df=22, p=0.0018), time in center (Welch unpaired t-test; t=3.50, df=13.61, p=0.0036), supported rears (unpaired t-test; t=3.39, df=22, p=0.0026) and unsupported rears (unpaired t-test; t=5.53, df=22, p = 1.47e-05) compared to controls (Control n = 12; Stress n = 12). These data have been previously published (von Ziegler et al., 2022). b, Yohimbine (3 mg/kg, i.p.) injected animals show reduced distance traveled (unpaired t-test; t=2.39, df=10, p=0.03772), reduced supported rears (unpaired t-test; t=6.56, df=10, p=0.00006) and reduced unsupported rears (Welch unpaired t-test; t=3.69, df=4.4, p = 0.01785) compared to vehicle injected animals (Vehicle n = 6; Yohimbine n = 7). c, Chemogenetic LC activation induced changes in the open field test immediately after clozapine (0.03 mg/kg, i.p.) injection. hM3Dq+ animals show reduced distance traveled (unpaired t-test; t=6.28, df=13, p=0.00003), reduced supported rears (unpaired t-test; t=4.28, df=13, p=0.0009), as well as reduced unsupported rears (Welch unpaired t-test; t=4.28, df=13, p = 0.00437) compared to hM3Dq- animals (hM3Dq- n = 7; hM3Dq+ n = 8). d, Optogenetic 5 Hz LC activation induced changes during the open field test. ChR2+ animals show reduced supported rears (unpaired t-test; t=2.42, df=64, p=0.0185) and reduced unsupported rears (unpaired t-test; t=2.91, df=64, p = 0.00499) compared to ChR2- animals (ChR2- n = 32; ChR2+ n = 36). Data expressed as mean ± SEM. *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001.

      Comment 10: The study shows that activation of noradrenergic hippocampus-projecting LC neurons is sufficient to regulate the expression of several hippocampal genes. I believe the study would have benefited from more selective necessity experiments. The authors might consider adding optogenetic (or chemogenetic) experiments aimed at inhibiting LC-NA hippocampal projections during stress exposure (or, alternatively, performing intrahippocampal pharmacological blockade of β-adrenoreceptors during stress exposure), and determining the effects on gene expression.

      Response: We kindly refer the reviewer to our previous response to Comment #2 above.

      Minor concerns:

      There is a typo in the abstract. Please correct "LN-NA" with "LC-NA"

      Response: Thank you, we have corrected it.

      References

      Bangasser, D. A., Eck, S. R., & Ordoñes Sanchez, E. (2019). Sex differences in stress reactivity in arousal and attention systems. Neuropsychopharmacology: Official Publication of the American College of Neuropsychopharmacology, 44(1), 129–139.

      Bangasser, D. A., Wiersielis, K. R., & Khantsis, S. (2016). Sex differences in the locus coeruleus-norepinephrine system and its regulation by stress. Brain Research, 1641, 177–188.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Mice can learn to associate sensory cues (sound and light) with a reward or activation of dopamine neurons in the ventral tegmental area (VTA), and then anticipate the reward from the sensory cue only. Using this paradigm, Harada et al. showed that after learning, the cue is able to induce dopamine release in the projection targets of the VTA, namely the nucleus accumbens and lateral hypothalamus (LH). Within the LH, dopamine release from VTA neurons (either by presentation of the cue or direct optical stimulation of VTA neurons) activates orexin neurons, measured as an increase in intracellular calcium levels.

      Strengths:

      This study utilized genetically encoded optical tools to selectively stimulate dopamine neurons and to monitor dopamine release in target brain areas and the calcium response of orexin neurons. This allowed a direct assessment of the relationship between the behavioral response of the animals, the release of a key neurotransmitter in select brain areas, and its effect on target cells, with a precision previously not possible. The results shed light on the mechanism underlying reward-related learning and expectation.

      Weaknesses:

      • The Ca increase in orexin neurons in response to optical stimulation of VTA DA neurons is convincing. However, there is an accumulated body of literature indicating that dopamine inhibits orexin neurons through D2 receptors, particularly at high concentrations, both directly and indirectly (PMID 15634779, 16611835, 26036709, 30462527; but note that synaptic effects at low concentrations are excitatory - PMID 30462527, 26036709). There should be a clear acknowledgment of these previous studies and a discussion directly addressing the discrepancy. Furthermore, there are in-vivo studies that investigated the role of dopamine in the LH involving orexin neurons in different behavioral contexts (e.g. PMID 24236888). The statement in the introduction "whether and how dopamine release modulates orexin neuronal activity has not been investigated vigorously" (3rd para of Introduction) understates these previous reports.

      We thank the Reviewer for pointing out that we missed several important citations. We have added the references mentioned, and the discrepancy of concern is now addressed in the Discussion section.

      • Along these lines, previous reports of concentration-dependent bidirectional dopaminergic modulation of orexin neurons suggest that high and low levels of DA would affect orexin neurons differently. Is there any way to estimate the local concentration of DA released by the laser stimulation protocol used in this study? Could there be a dose dependency between the intensity of laser stimulation and the orexin neuron response?

      We agree that this is an interesting point. However, one limitation of our study, and of intensity-based genetically-encoded sensors in general, is that the estimation of the concentration is technically difficult. The sensor effectively reports changes in extra-synaptic levels of neurotransmitters, but to get the absolute value other modalities would be needed such as fast scan voltammetry. This limitation is now included in the discussion section.

      • The transient dip in DA signal during omission sessions in Fig. 2C (approx. 1% decrease from baseline) is similar in amplitude to the decrease seen in non-laser trials shown in Fig. 1C, right panel (although the time course of the latter is unknown as the data are truncated). The authors should clarify whether those dips are a direct effect of the cue itself or indeed a reward prediction error.

      Thanks for raising this important point. Indeed, there is a dip in the signal during non-stimulation trials. On day 1, the delivery of the cue triggered a dip, and on day 10, there was a slight increase in the signal followed by the dip. The data are difficult to interpret, but our hypothesis is that two components trigger this dip. The first is the aversiveness of the cue. Because a relatively loud sound (90 dB) was used for the cue, it would not be surprising if the auditory cue was slightly aversive to the experimental animals. It has been shown that aversive stimuli induce a dip of dopamine in the NAc, although this is specific to NAc subregions. The second component is reward prediction error. Although the non-laser-paired cue never triggered laser stimulation, it is similar to the laser-paired one: both are composed of a loud tone and the same color of visual cue (spatially different). We think it is possible that reward-related neuronal circuits were slightly activated by the non-laser-paired cue. In line with this interpretation, a small increase in the signal was observed on day 10 but not day 1. If our hypothesis is true, since this signal is induced by two components, further analysis is unfortunately difficult.

      • There seem to be orexin-negative-GCaMP6 positive cells (Fig. 4B), suggesting that not all cells were phenotypically orexin+ at the time of imaging. The proportion of GCaMP6 cells that were ORX+ or negative and whether they responded differently to the stimuli should be indicated.

      While we acknowledge the observation of orexin-negative-GCaMP6 positive cells in Figure 4B, it's important to note that this phenomenon is consistent with the characteristics of the hOX-GCaMP virus used in prior experiments. The virus has undergone thorough characterization, and it has been reported to exhibit over 90% specificity, as demonstrated in prior work conducted in the laboratory of one of our contributing authors (PMID: 27546579). To address the concern raised by the reviewer, we have included Supplemental Figure 4 confirming that all mice consistently exhibited qualitatively similar hOX-GCaMP transients upon dopaminergic terminal stimulation. This additional evidence supports the reliability and specificity of our experimental approach.

      • Laser stimulation of DA neurons at the level of cell bodies (in VTA) induces an increase in DA release within the LH (Fig. 3C, D), however, there is no corresponding Ca signal in orexin neurons (Fig.4C).

      We realize that the figures were not clear, which may explain why the reviewer did not see a corresponding Ca signal; however, such a signal is present. We have now added Supplemental Figure 3 to show that there is a Ca signal already on day 1.

      In contrast, stimulating DA terminals within the LH induces a robust, long-lasting Ca signal (> 30s) in orexin neurons (Fig. 5). The initial peak is blocked by raclopride but the majority of Ca signal is insensitive to DA antagonists (please add a positive control or cite references indicating that the dose of antagonists used was sufficient; also the timing of antagonist administration should be indicated).

      This is now included in the discussion section. Also, the timing and dose of the antagonist is now described in the method section.

      Taken together, these results seem to suggest that DA does not directly increase Ca signal in orexin neurons. What could be mediating the remaining component?

      This point has been included in the discussion section.

      • Similarly, there is an elevation of Ca signal in orexin neurons that remains significantly higher after the cue/laser stimulation (Fig. 4F). It appears that it is this sustained component that is missing in omission trials. This can be analyzed further.

      It is true that there is a sustained component in stimulation trials, that is missing in omission trials. Most likely that is evoked by the stimulation of dopamine neurons. We argue that this component is isolated in Fig 5 and analyzed as much as we can.

      • Mice of both sexes were used in this study; it would be interesting to know whether sex differences were observed or not.

      We agree that this is an important point. However, our sample number is not high enough to make a meaningful comparison between male and female.

      Reviewer #2 (Public Review):

      Summary:

      This is an interesting and well-written study assessing the role of dopaminergic inputs from the VTA on orexin cell responses in an opto-pavlovian conditioning task. These data are consistent with a possible role of this system in reward expectation and are surprisingly one of the first demonstrations of a role for dopamine in this phenomenon.

      Strengths:

      The study has used an interesting opto-Pavlovian approach combined with fibre photometry.

      Weaknesses:

      It is unclear what n size was used or analysed, particularly for AUC measures e.g. Figures 1 D/E and 3 G. The number of trials reflected and the animal numbers need clarification.

      The sample size is indicated in the legend section.

      The study focused on opto-stim omissions - this work would be significantly strengthened by a comparison to a real-world examination where animals are trained for a natural reward (food pellet).

      We agree that this would be an important experiment. This experiment has been partially performed in the laboratory of one of the contributing authors (doi.org/10.1101/2022.04.13.488195) and will be part of our follow-up studies.

      Have the authors considered the role of orexin in the opposing situation i.e. a surprise addition of reward?

      That would be an interesting experiment. To do that, a natural reward, not optical stimulation, should be used as the reinforcer. This could be part of our follow-up studies.

      Similarly, there remains some conjecture regarding the role of these systems in reward and aversion - have the authors considered aversive learning paradigms - fear, or fear extinction - to further explore the roles of this system? There are some (important) discussions about the possible role of orexin in negative reinforcement. Further studies to address this could be warranted.

      It is true that dopamine also plays a significant role in aversive learning. Therefore, this would be an interesting experiment. The discussion section now includes this point.

      I think some further discussion of the work by Linehan concerning the interesting bidirectional actions of D1/D2 receptor signalling on glutamatergic transmission onto orexin neurons is worthwhile. While this work is currently cited, its nuance and relevance to D1 and D2 signalling could be contextualised a little more (https://doi.org/10.1152/ajpregu.00150.2018).

      Thanks for the suggestion. The discussion has been expanded.

      Reviewer #3 (Public Review):

      Summary:

      Harada and colleagues describe an interesting set of experiments characterizing the relationship between dopamine cell activity in the ventral tegmental area (VTA) and orexin neuron activity in the lateral hypothalamus (LH). All experiments are conducted in the context of an opto-Pavlovian learning task, in which a cue predicts optogenetic stimulation of VTA dopamine neurons. With training, cues that predict DA stimulation come to elicit dopamine release in LH (a similar effect is seen in accumbens). After training, omission trials (cue followed by no laser) result in a dip (inhibition) of dopamine release in LH, characteristic of reward prediction error observed in the striatum. Across cue training, the activity pattern of orexin neurons in LH mirrors that of LH DA levels. However, unlike the DA signal, orexin neurons do not exhibit a decrease in activity in omission trials. Systemic blockade of D2 but not D1 receptors blocked DA release in LH following VTA DA cell stimulation.

      Strengths: Although much work has been dedicated to examining projections from orexin cells to VTA, less has been done to characterize reciprocal projections and their function. In this way, this paper is a very important addition to the literature. The experiments are technically sound (with some limitations, below) and utilize sophisticated approaches, the manuscript is nicely written, and the conclusions are mostly reasonable based on the data collected.

      Weaknesses:

      I believe the impact of the paper could be enhanced by considering and/or addressing the following:

      Major:

      • I encourage the authors to discuss in the Introduction previous work on DA regulation of orexin neurons. In particular, the authors cite, but do not describe in any detail, the very relevant Linehan paper (2019; Am J Physiol Regul) which shows that DA differentially alters excitatory/inhibitory input onto orexin neurons and that these actions are reversed by D1 vs D2 receptor antagonists. Another paper (Bubser, 2005, EJN) showed that dopamine agonists increase the activity of orexin neurons and that these effects are blocked by D1/D2 antagonists. The current findings should be discussed in the context of these (and any other relevant) papers in the Discussion, too.

      Thanks for the valuable suggestion. This point has been integrated and the introduction and discussion sections have been revised carefully.

      • In the Discussion, the authors provide two (plausible) explanations for why they did not observe a dip in the calcium signal of orexin neurons during omission trials. Is it not possible that these cells do not encode for this type of RPE?

      We completely agree that this is possible. Our current hypothesis is that dopamine in the LH encodes RPE and that this information is transmitted to orexin neurons, which integrate it with other information and encode something else that we call ‘multiplexed cognitive information’. What this means exactly is still an open question. This point is now mentioned in the discussion section.

      • Related to the above - I am curious about the authors' thoughts on why there is such redundancy in the system. i.e. why is dopamine doing the same thing in NAC and LH in the context of cue-reward learning?

      Thank you for the question. This is an important point, indeed. Our current hypothesis is described in the discussion section.

      ‘Our data indicate that dopamine in both the NAc and LH encodes reward prediction error (RPE). One open question is why such a redundant mechanism exists. We hypothesize that dopamine in the LH boosts dopamine release via a positive feedback loop between the orexin and dopamine systems. It has already been established that some orexin neurons project to dopaminergic neurons in the VTA, positively modulating their firing. On the other hand, our data indicate that dopamine in the LH stimulates orexinergic neurons. These collective findings suggest that when either the orexin or the dopamine system is activated, the other system is consequently activated as well. Although the current findings align with this idea, the hypothesis should be carefully challenged and scrutinized.’

      • The data, as they stand, are largely correlative and do not indicate that DA recruitment of orexin neurons is necessary for learning to occur. It would be compelling if blocking the orexin cell recruitment affected some behavioral outcomes of learning. Similarly - does raclopride treatment across training prevent learning?

      We appreciate the insightful comment. It is indeed a limitation of our study that we lack behavioral data. However, given the extensive previous research on the crucial role of orexin in motivated behavior, we argue that establishing dopaminergic regulation of the orexin system itself is a valuable contribution. This perspective is thoroughly discussed in the dedicated section of our paper. It's important to note that the injection of D2 antagonists, including raclopride, is known to induce significant sedation. Due to this sedative effect, combining behavioral experiments with these drugs poses considerable challenges.

      • Only single doses of SCH23390 and raclopride were used. How were these selected? It would be nice to use more of a dose range to show that 1) an effect of D1R blockade was not missed, and 2) the reduction in orexin signal with raclopride was dose-dependent.

      The rationale for the doses has been added to the Discussion. These doses have been reported to block dopamine receptors. Although we agree that a dose-response curve would be informative, we are reluctant to increase the doses in order to avoid adverse effects on the experimental animals. The doses we used effectively induced hypolocomotion (data not shown).

      • Fig 1C, could the effect the authors observed be due to movement?

      We argue this is unlikely. We recorded two channels, one for the control (405 nm) and one for the signal (465 nm), and motion-related artifacts were corrected based on the control channel. An example trace around the laser stimulation is shown below. Note that a typical motion-related artifact is a fast dip in the signal, normally observed in both the 405 and 465 nm channels.
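      For readers unfamiliar with control-channel correction, a minimal sketch follows. It assumes a common approach (a least-squares fit of the 405 nm trace onto the 465 nm trace, with the residual expressed as ΔF/F); the response does not state which fitting method was actually used, and the function names are hypothetical.

```python
from statistics import fmean


def fit_control(control: list[float], signal: list[float]) -> tuple[float, float]:
    """Least-squares fit: signal ~ slope * control + intercept (assumed method)."""
    mx, my = fmean(control), fmean(signal)
    sxx = sum((x - mx) ** 2 for x in control)
    sxy = sum((x - mx) * (y - my) for x, y in zip(control, signal))
    slope = sxy / sxx
    return slope, my - slope * mx


def motion_corrected_dff(control: list[float], signal: list[float]) -> list[float]:
    """Return dF/F of the 465 nm signal relative to the fitted 405 nm control.

    Motion artifacts appear in both channels, so they largely cancel when the
    fitted control is subtracted; sensor-dependent changes appear only in the
    465 nm channel and survive the correction.
    """
    if len(control) != len(signal):
        raise ValueError("channels must have the same length")
    slope, intercept = fit_control(control, signal)
    fitted = [slope * x + intercept for x in control]
    return [(y - f) / f for y, f in zip(signal, fitted)]
```

      In this sketch, a dip present in both channels is removed, whereas a rise present only in the 465 nm channel remains as a positive ΔF/F.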

      Relatedly, what was the behavior like when the cue was on? Did mice orient/approach the cue?

      Although it has been reported that rats approach the cue in a similar task (PMID: 30038277), this was not obvious in our case, possibly because we used both visual and auditory cues. Mice showed a general increase in locomotion during the cue and the stimulation, but the direction of movement was not clear to the experimenter.

      Also, when does the learning about the cue occur? Does it take all 10 days of learning or does this learning/cue-induced increase in dopamine signaling occur in less than 10 days?

      It is hard to say when the learning occurs. Looking at the learning curves in Figures 1, 3 and 4, the response to the cue seems to plateau at day 5, but since we do not have behavioral data, this assessment relies only on the neuronal signal.

      • Also related to the above, could the observed dopamine signal be a result of just the laser turning on? It would seem important to include mice with a control sensor.

      We recorded two channels, at 405 nm and 465 nm. The 405 nm signal did not increase, whereas the 465 nm signal did; an example trace is shown. In addition, the sensor has already been characterized by the corresponding author, so we argue that this is unlikely.

      Author response image 1.

      Fig 1E, the effect seems to be driven by one mouse which looks like it could be a statistical outlier. The inclusion of additional animals would make these data more compelling.

      We agree that adding more mice would make the data more compelling. However, given that dopamine in the accumbens has been investigated extensively and that our data are in line with prior studies, we argue that we have enough data to support our conclusion.

      • For Fig 1C, 3D, 3F, and 4D, could the authors please show the traces for the entire length of laser onset? It would be helpful to see both the rise and the fall of dopamine signals.

      For Fig. 1C, a panel has been added. For Figs. 3 and 4, a supplemental figure was created to show the signal around the laser stimulation.

      • Fig 2C, could the authors comment on how they compared the AUC to baseline? Was this comparison against zero? Because of natural hills and troughs during signals prior to cue (which may not equate to a zero), comparing the omission-induced dip to a zero may not be appropriate. A better baseline might be using the signals prior to the cue.

      The signal immediately before cue onset was used as the baseline and was subtracted from the trace, so in our analysis zero and baseline are equivalent.
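      The baseline convention described here can be illustrated with a short sketch. It assumes that the baseline is the mean of a pre-cue window and that the AUC is computed with the trapezoidal rule; the response specifies neither the baseline length nor the integration method, and the function name is hypothetical. Window edges are parameters, so a condition-specific analysis window (such as the 4-5.3 s window mentioned for Fig. 3H) could be passed in the same way.

```python
from statistics import fmean


def baseline_subtracted_auc(times, trace, cue_onset, baseline_window, auc_window):
    """AUC of a trace after subtracting the mean pre-cue baseline (assumed method).

    times, trace: equally long sequences (seconds, signal units).
    baseline_window: (start, end) in seconds relative to cue onset, half-open.
    auc_window: (start, end) in seconds relative to cue onset, inclusive.
    """
    # Mean of the samples in the pre-cue baseline window [start, end).
    b0, b1 = baseline_window
    baseline = fmean(y for t, y in zip(times, trace) if b0 <= t - cue_onset < b1)
    # Baseline-corrected samples inside the analysis window [start, end].
    a0, a1 = auc_window
    pts = [(t, y - baseline) for t, y in zip(times, trace) if a0 <= t - cue_onset <= a1]
    # Trapezoidal rule between consecutive samples.
    return sum((t2 - t1) * (y1 + y2) / 2 for (t1, y1), (t2, y2) in zip(pts, pts[1:]))
```

      With this convention, a negative AUC indicates a dip below the pre-cue baseline, as expected for omission trials.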

      • Could the authors comment on how they came up with the 4-5.3s window to observe the AUC in Fig 3H?

      Because dopamine kinetics differ between the NAc and the LH, different time windows were used to observe the dopamine dip. The analysis of the kinetics has been added.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Specific feedback to the authors

      • Sample size for each experiment/group could not be found.

      The sample size is now included in the legends.

      • In most figures, the timing of onset for the cue and laser stimulation is unclear. This makes the data interpretation difficult. They should be labeled as in Fig. 3C, for example.

      Panels have been updated to address this point.

      • Please provide the rationale for selecting the time range for the measurement of AUC for different experiments (e.g. Fig. 2C, 3H, 4A, 5F).

      The kinetics of dopamine in the NAc and LH are different. This is now shown in the new Supplemental Figure 2. Based on this difference, different windows were chosen.

      • Fig. 1E, 3G right, 4E right: statistical analysis should use two-way repeated measures ANOVA rather than one-way ANOVA. Fig 1D, 3G left and 4E left panels can also be analyzed by two-way repeated measures ANOVA.

      We realized that those panels were redundant. Some panels have been removed, and the statistical analysis has been revised as suggested.

      Minor comments:

      Fig. 2C can also show non-omission trials as a comparison.

      The panel has been updated.

      • The term "laser cue" is confusing, as the cue itself does not involve a laser.

      ’Laser-paired cue’ is used instead.

      • Color contrast can be improved for some figures, including Fig. 2C right, Fig. 3H right, and green and blue fluorescent fonts.

      The panels have been updated.

      • Figure legends: Tukey's test, rather than Tekey's test.

      This has been fixed.

      • There are some long-winded sentences that are hard to follow.

      Edited.

      • p.2, line 11 from bottom: should read ...the VTA evokes the release of dopamine.

      Edited.

      • p.3, line 9: remove e from release.

      This has been addressed.

      Reviewer #3 (Recommendations For The Authors):

      Minor:

      • When discussing the understudied role of dopamine in brain regions other than the striatum in the Introduction, it might be helpful to cite this article: https://elifesciences.org/articles/81980 where the authors characterize dopamine in the bed nucleus of stria terminalis in associative behaviors and reward prediction error.

      The discussion section has been updated accordingly.

      • In the Discussion, it might be better to refrain from describing the results as 'measuring dopamine release' in the LH. Since there was no direct detection of dopamine release, rather a dopamine binding to the dLight receptors, referring to the detection as dopamine signaling/binding/transients is a better alternative.

      This point has been addressed.

      • In the Discussion, without measuring tonic dopamine release, it is difficult to say that there was a tonic dopamine release in the LH prior to negative RPE. In addition, I wouldn't describe the negative RPE as silencing of dopamine neurons projecting to the LH since this was not directly measured and it is hard to say for sure if the dip in dopamine is caused by silencing of the neurons. There certainly seems to be a reduction in extra-synaptic dopamine signaling in LH, however, what occurs upstream is unknown.

      We respectfully disagree with this point. In our opinion, the dopamine transient is more important than the firing of dopamine neurons, because what matters for downstream neurons is the dopamine concentration. For example, administration of cocaine increases the extrasynaptic dopamine concentration via blockade of DAT, while the firing of dopamine neurons goes down via activation of D2 receptors expressed on dopamine neurons. Administration of cocaine is not known to induce a negative RPE.

      • Typo at multiple places: 'Tekey's multiple comparison test'.

      This has been fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The experimental rigor and design of the nocturnal IOP experiments were weak, with low n values and differing methods of IOP measurement (conscious versus anesthetized). The same method of IOP measurement needs to be used for all measurements to make any conclusions on the circadian patterns of IOP in each condition.

      One of the goals of our study was to confirm the results of the Patel et al. (2021; PMID: 33853948) study, in which nocturnal IOP measurements were conducted in anesthetized mice and diurnal IOP measurements in awake animals, but we agree with both Reviewers that IOP should be measured under identical experimental conditions. Parenthetically, the number of animals per treatment paradigm in the original version (N = 4) was sufficient to produce statistical significance for diurnal control vs diurnal TGFB, and diurnal control vs nocturnal control conditions.

      To address the comment, we generated an additional cohort of TGFB2-expressing mice (N = 6) in which nocturnal and diurnal measurements were performed in awake animals. The results are shown in the revised Figure 6. As in the anesthetized cohort, the diurnal IOP in Lv-TGFB2 mice was statistically indistinguishable from the nocturnal value, indicating that TGFB2-induced OHT is not additive to physiological (circadian) OHT. The TRPV4-dependence of both physiological and pathological ocular hypertension suggests that the channel functions as a final common mechanism for ocular hypertension.

      Reviewer #2 (Public review):

      Figure 1A-C. Often there is a difference between the massage (message?, op. authors) and transcript data. I recommend that the authors confirm the qPCR data with another mode of protein measurement.

      We are not sure we understand the Reviewer’s comment regarding the “difference between the message and transcript data”, but note that the mRNA data shown in panels A & B confirm previously published transcriptomic and proteomic screens (e.g., Fleenor et al., IOVS 2006; Bollinger et al., IOVS 2011; Callaghan et al., Scientific Reports 2022; Li et al., Current Eye Research 2022) and were included to show that the transcriptional response of canonical SMAD and pro-fibrotic genes unfolds as predicted from previous work. With regard to TRPV4 signaling, we complement the transcriptomic data with protein analysis (Western blots) and functional analyses (measurements of TRPV4-mediated current and calcium imaging). Transcriptomic, protein expression, electrophysiological and imaging experiments revealed a remarkable consistency in the TGFB2-dependence of gene (Fig. 1C) and protein expression (Fig. 1D), transmembrane current (Fig. 3C) and intracellular calcium (Fig. 2).

      Parenthetically, we attempted to assess the TGFB2-dependence of Piezo1 protein expression by conducting Western blots with multiple antibodies and experimental conditions. These efforts were unsuccessful, presumably due to the complexity (30-40 TM domains) and large molecular weight (280-300 kDa) of the protein. We note, however, that Piezo1 signaling cannot account for the observed OHT, given that studies by us and others (Yarishkin et al., 2021, PMID: 33226641 and Zhu et al., 2021; PMID: 33532718) associated Piezo1 signaling with facility increases. The revised manuscript reads: “The suppression of outflow facility by Piezo1 inhibitors applied under in vitro and in vivo conditions (39, 81) instead suggests that Piezo1 opposes the hypertensive functions of TRPV4.” The preprint by Redmon et al. (bioRxiv 2024, PMID 39041037) extends the TRPV4-dependence of OHT to microbead-induced, steroid-induced and nocturnal models of OHT, indicating that TRPV4 functions as a universal driver of elevated IOP. We reiterate this in the revised Discussion.

      Does direct TRPV4 activation also induce the expression of these markers? Does inhibition of TRPV4, after TGF-β treatment, prevent the expression of these markers? Is TRPV4 acting downstream of this response?

      An RNA-seq study conducted by us (Rudzitis et al., under review) suggests that the agonist GSK101 has minimal effect on the fibrotic and canonical pathways shown in panels A and B. These data are beyond the scope of the present study and will be published elsewhere; however, we include the data for the genes depicted in panels A and B for the reviewer at the end of this response.

      We conducted an additional series of experiments to test whether TGFB2-induced upregulation of the TRPV4 and Piezo1 genes is itself TRPV4-dependent. As shown in the new SFig. 1, upregulation of the two genes is unaffected by TRPV4 inhibition.

      Figure 1D. Beta tubulin is not a membrane marker. Having staining of b tubulin in membrane fraction shows contamination from the cytoplasm. Does the overall expression also increase?

      β-Tubulin associates with the plasma membrane by binding to integral membrane proteins in the plasma and organellar membranes, through palmitoylation and attachment to linker proteins, and as an integral component of exocytotic vesicles (Wolff, BBA 2009; Hogerheide et al., PNAS 2017). The protein is often used as a loading control for TRPV4 protein (please see https://www.cellsignal.com/products/primary-antibodies/trpv4-antibody/65893; Grove et al., Science Signaling 2019 and Moore et al., PNAS 2013). Parenthetically, our RNA-seq studies did not find modulation of β-tubulin expression by TGFβ2 [CNR and DK, unpublished observations].

      We examined the overall (cytosolic and membrane) TRPV4 expression and observed, similarly to the membrane fraction alone (Figure 2), upregulation following cytokine stimulation:

      Author response image 1.

      Western blot, total protein extract from control and TGFβ2-treated TM cells [Alomone antibody].

      In our estimation, these results do not add to the overall narrative, and they were not included in the paper.

      Figure 4A: it is not very clear. I recommend including a zoomed-in or higher-resolution image.

      We include a whole-page image as the new SFigure 4.

      Figure 5B and 6B. Why is there a difference between groups in the pre-injection panel? In Figure 5A, there is no difference between LV-TGFβ and LV-control at pre-injection, while in 5B there is a significant difference between these groups.

      We revised the Figure Legends to clarify that “pre-injection” in Figures 5B and 6B refers to IOP measurements before the intracameral injection of HC-06, not before injection of the lentiviral constructs.

      Discussion section. Line 279: "TRPV4 channels in cells treated with TGFβ2 are likely to be constitutively active" ... needs to be discussed further.

      We rewrote the paragraph to clarify that TRPV4 is a thermosensitive channel that is expected to be constitutively active at the incubator temperature:

      “The effectiveness of TRPV4 inhibition in suppressing TGFB2-induced contractility (Fig. 4) is consistent with constitutive activation of TRPV4 channels in incubator-cultured cells. TRPV4 is a thermosensitive channel (Q10 ~10). Mouse TRPV4 is activated by physiological temperatures (Chung et al., 2003; Shibasaki et al., 2007) with peak activation between ~34 and 37 °C (Guler et al., 2003). The several-fold increase in functional expression of the channel in TGFB2-treated cells (Fig. 2) would be expected to promote tonic influx of Ca2+ and Ca2+-dependent cellular signaling. The abrogation of the contractile response in the presence of HC-06 indicates that TRPV4-mediated Ca2+ influx represents the principal source of calcium that drives the contractile response. Consistent with this, supplementation with the agonist GSK101 was sufficient to evoke TM contraction (Fig. 4B).”
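      As background to the Q10 value quoted above, the standard definition of the temperature coefficient is given below (textbook material, not an analysis from this study). R1 and R2 are rates measured at temperatures T1 and T2 in °C:

```latex
Q_{10} = \left(\frac{R_2}{R_1}\right)^{10/(T_2 - T_1)}
\qquad\Longleftrightarrow\qquad
\frac{R_2}{R_1} = Q_{10}^{(T_2 - T_1)/10}
```

      With Q10 ≈ 10, a 10 °C rise increases the rate about tenfold, and a 5 °C rise about 10^0.5 ≈ 3.2-fold, illustrating the steep temperature dependence implied by the quoted value.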

      Line 280: "The residual contractility in HC-06-treated cells may reflect TGFβ2-mediated contributions from Piezo1." Piezo1 has a low threshold for mechanosensitivity. How do the authors discuss the observation that, in the presence of Piezo1, TRPV4 has a more prominent mechanosensory function? Is this tied to TGFβ signalling?

      This is an interesting question. Our macroscopic and single-channel recordings of Piezo1 activity in TM cells recapitulate the time course published in the original Coste et al. (2010) study, showing that the channel inactivates within 10-100 ms (Yarishkin et al., 2021). Thus, the channel is likely to be largely inactivated during chronic ocular hypertension. Indeed, it has been suggested that resting membrane tension alone may be sufficient to inactivate Piezo1 (Lewis and Grandl, 2015), with cells grown on stiff substrates (e.g., under our experimental conditions) experiencing almost complete Piezo1 inactivation. We propose that the primary function of Piezo channels may be to sense and transduce transient mechanical loading. The remarkable IOP-lowering effectiveness of TRPV4 antagonists and knockdown indicates that, in contrast to Piezo1, TRPV4 activation is sustained.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The complete strain name for the Trpv4-/- mice is missing.

      Corrected.

      The layout for Figure 6 is confusing as HC-06 was only used in panels B and C but the labels are above panel A.

      Corrected.

      Reviewer #2 (Recommendations for the authors):

      Only two mice were used for the nocturnal IOP experiments. Re-treating the same mice in opposite eyes and counting them as n = 4 is not rigorous or justified.

      The number of mice investigated in the original submission was four. In Week 1, two mice underwent PBS injections and two mice were treated with HC-06. After the baseline was re-established in Week 2, the treatments were reversed.

      We supplemented these numbers with an additional cohort of 6 mice, with identical results re: nocturnal vs diurnal IOP. These data are presented in the revised Figure 6.

      Why are daytime IOPs measured in awake mice but nocturnal IOPs measured in isoflurane-anesthetized mice? Anesthesia is well known to affect IOP, and using two different methods could alter the results, especially when comparing between the groups. This could be why you did not see a nocturnal rise in the TGFB-injected eyes. The same method needs to be used for all measurements to make any conclusions on the circadian patterns of IOP in each condition.

      This is a good point, please see our response above.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      Building on their own prior work, the authors present valuable findings that add to our understanding of cortical astrocytes, which respond to synaptic activity with calcium release in subcellular domains that can proceed to larger calcium waves. The proposed concept of a spatial "threshold" is based on solid evidence from in vivo and ex vivo imaging data and the use of mutant mice. However, details of the specific threshold should be taken with caution and appear incomplete unless supported by additional experiments with higher resolution in space and time.

      We thank the reviewers and editors for the positive assessment of our work as containing valuable findings that add to our understanding of cortical astrocytes. We also appreciate their positive appraisal of the proposed concept of a spatial threshold supported by solid evidence. 

      Regarding their specific comments, we truly appreciate them because they have helped to clarify issues and to improve the study. Point-by-point responses to these comments are provided below. Regarding the general comment on the spatial and temporal resolution of our study, we would like to clarify that the spatial and temporal resolution used in the current study (i.e., 2-5 Hz frame rate using a 25x objective with 1.7x digital zoom, with pixels on the order of 1 µm²) is within the norm in the field and neither compromises the results nor diminishes the main conceptual advancement of the study, namely the existence of a spatial threshold for astrocyte calcium surge.

      We respect the thoughtfulness of the reviewers and editors towards improving the paper.

      Public Reviews:

      Reviewer #1 (Public Review):

      Lines et al. provide evidence for a sequence of events in vivo in adult anesthetized mice that begins with a footshock driving activation of neural projections into layer 2/3 somatosensory cortex, which in turn triggers a rise in calcium in astrocytes within "domains" of their "arbor". The authors segment the astrocyte morphology based on SR101 signal and show that the timing of "arbor" Ca2+ activation precedes somatic activation and that somatic activation only occurs if at least 22.6% of the total segmented astrocyte "arbor" area is active. Thus, the authors frame this ≥22.6% activation as a spatial property (spatial threshold) with certain temporal characteristics (i.e., it must occur before soma and global activation). The authors then elaborate on this spatial threshold by providing evidence for its intrinsic nature: it is not set by the level of neuronal stimulus and depends on whether IP3R2, which drives Ca2+ release from the endoplasmic reticulum (ER) in astrocytes, is expressed. Lastly, the authors suggest a potential physiologic role for this spatial threshold by showing ex vivo how exogenous activation of layer 2/3 astrocytes by ATP application can gate glutamate gliotransmission to layer 2/3 cortical neurons, with a strong correlation between the number of active astrocyte Ca2+ domains and the slow inward current (SIC) frequency recorded from nearby neurons as a readout of glutamatergic gliotransmission. This is interesting and would potentially be of great interest to readers within and outside the glia research community, especially in how the authors have tried to systematically deconstruct some of the steps underlying signal integration and propagation in astrocytes. Many of the conclusions posited by the authors are potentially important but we think their approach needs experimental/analytical refinement and elaboration.

      We thank the reviewer for her/his positive appraisal and comments, which have helped us to improve the study. In response to these insights, we address the key points raised below:

      (1) Sequence of Events: We acknowledge the reviewer's interest in our findings regarding the sequence of events. We have provided a more detailed description of the methods and results to clarify the spatiotemporal relationships among domain activation, spatiotemporal clustering, and centripetal and centrifugal calcium propagation in relation to soma activation.

      (2) Spatial Threshold: The reviewer accurately identifies our characterization of a spatial threshold (≥22.6% activation) with temporal characteristics as a crucial aspect of our study. We have expanded upon this concept by offering a clearer illustration of how this threshold relates to somatic and global activation.

      (3) Intrinsic Nature of Spatial Threshold: The reviewer's insightful observation regarding the intrinsic nature of the spatial threshold, which is not set by the level of neuronal stimulation, is noteworthy. We have provided additional details to substantiate this claim, shedding more light on the fundamental nature of this phenomenon.

      (4) Physiological Implications: The reviewer rightly highlights the potential physiological significance of our findings, particularly in relation to gliotransmission in cortical neurons. We have enhanced our discussion by elaborating on the implications of these observations.
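      To make the threshold concept concrete, a minimal sketch of the decision rule follows. It assumes that the arbor has already been segmented into domains with known areas (e.g., in pixels or µm²) and that each domain has been classified as active or inactive before somatic activation; the rule simply compares the active area fraction with the reported ≥22.6% value. The function names and inputs are hypothetical and do not reproduce the authors' computational tool.

```python
SPATIAL_THRESHOLD = 0.226  # fraction of arbor area; value reported in the study


def active_arbor_fraction(domain_areas, domain_active):
    """Fraction of the segmented arbor area covered by active domains."""
    if len(domain_areas) != len(domain_active):
        raise ValueError("one activity flag per domain is required")
    total = sum(domain_areas)
    active = sum(a for a, on in zip(domain_areas, domain_active) if on)
    return active / total


def predicts_somatic_surge(domain_areas, domain_active, threshold=SPATIAL_THRESHOLD):
    """True if the active arbor fraction reaches the spatial threshold."""
    return active_arbor_fraction(domain_areas, domain_active) >= threshold
```

      Weighting by domain area, rather than counting domains, follows the description of the threshold as a fraction of the total arbor area.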

      The primary issue for us, which we would encourage the authors to address, relates to the low spatiotemporal resolution of their approach. This issue does not necessarily compromise the concept of a spatial threshold, but more refined observations and analyses are likely to provide more reliable quantitative parameters and a more comprehensive view of the mode of Ca2+ signal integration in astrocytes.

      We agree with the reviewer that our spatiotemporal resolution (2-5 Hz frame rate using a 25x objective and 1.7x digital zoom, with pixels on the order of 1 µm) does not compromise the proposed concept of the existence of a spatial threshold for the intracellular calcium expansion.

      For this reason, and because their observations might be perceived as both a conceptual and numerical standard in the field, we believe that the authors should proceed with both experimental and analytical refinement. Notably, we have difficulty with the reported mean delays of astrocyte Ca2+ elevations upon sensory stimulation. The 11 s delay for response onset in the "arbor" and the 13 s delay in the soma are extremely long, and we do not think they represent a true physiologic latency for astrocyte responses to the sensory activity. Indeed, such delays appear to be slower even than those reported in the initial studies of sensory stimulation in anesthetized mice with limited spatiotemporal resolution (Wang et al. Nat Neurosci., 2006), not to mention the more recent and refined ones in awake mice (Stobart et al. Neuron, 2018) that identified even sub-second astrocyte Ca2+ responses, largely preserved in IP3R2KO mice. Thus, we are inclined to believe that the slowness of responses reported here is an indicator of experimental/analytical issues. There can be several explanations for such slowness that the authors may want to consider for improving their approach: (a) The authors apparently use low zoom imaging for acquiring signals from several astrocytes present in the FOV: do all of these astrocytes respond homogeneously in terms of delay from sensory stimulus? Perhaps some are faster responders than others and only this population is directly activated by the stimulus. Others could be slower in activation because they respond secondarily to stimuli. In this case, the authors could focus their analysis specifically on the "fast-responding population". (b) By focusing on individual astrocytes and using higher zoom, the authors could unmask more subtle Ca2+ elevations that precede those reported in the current manuscript. These signals have been reported to occur mainly in regions of the astrocyte that are GCaMP6-positive but SR101-negative and constitute a large percentage of its volume (Bindocci et al., 2017). By restricting analysis to the SR101-positive part of the astrocyte, the authors might miss the fastest components of the astrocyte Ca2+ response, likely representing the primary signals triggered by synaptic activity. It would be important to identify such signals in their records and to establish whether none/few/many of them propagate to the SR101-positive part of the astrocyte. In other words, whether there is only a single spatial threshold, the one the authors reported, or two or more of them along the path of signal propagation towards the cell soma that eventually lead to the transformation of the signal into a global astrocyte Ca2+ surge.

      We thank the reviewer for these excellent and important comments. The concern about the mean delays of astrocyte activation arises from averaging astrocyte responses to a 20 s stimulus. Astrocyte responses are heterogeneous, and many astrocytes respond much more quickly, as can be seen in the example traces in Figs. 1D, 1G, and 3C. As in any biological system, variability exists; here, however, we use averaged responses to identify a general property of astrocyte calcium dynamics: the existence of a spatial threshold for astrocyte calcium surge. We have now included a paragraph on this subject in the Discussion section on P15, L16-22:

      “We were able to discover this general phenomenon of astrocyte physiology through the use of a novel computational tool that allowed us to combine almost 1000 astrocyte responses. Variation is rife in biological systems, and there are sure to be idiosyncrasies in astrocyte calcium responses. Here, we focused on grouped data to better understand what appears to be an intrinsic property of astrocyte physiology. We used different statistical analyses and tested our hypothesis both in vivo and in situ; together, these methods provide a more complete picture of the existence of a spatial threshold for astrocyte calcium surge.”

      The specialized work of Stobart et al. 2018 focused more on the fast activation of microdomain subpopulations than on the later induction of somatic activation. Notably, Stobart et al. 2018 and Wang et al. 2006 also found that somatic responses of astrocytes were delayed by several seconds. Importantly, Wang et al. 2006 describe that astrocyte activation is frequency-dependent: the higher the stimulation frequency, the faster and larger the activation. In the present work, we stimulated at just 2 Hz to better investigate the spatial threshold. Excitingly, the results shown by Stobart et al. 2018 agree with ours and with those of Rupprecht et al. 2024 and Fedotova et al. 2023: there is a sequence of activation from the domains to the soma, which could reflect the time required for the initial microdomain signals to summate and reach a threshold capable of activating the soma. These studies have many similarities with our own but address different underlying scientific questions, which led to diverging methodology. However, while we agree with the reviewers on these points, we want to stress that our methods provide sufficient evidence for the cell-scale phenomenon that we are studying, namely the spatial threshold for astrocyte calcium surge. Finally, we have included an additional figure (new Figure 5) that examines only the calcium dynamics of early-responding cells, and we found no significant difference between the spatial threshold in this population and our original quantification.

      In this context, there is another concept that we encourage the authors to clarify: is the spatial threshold that they describe constituted by the enlargement of a continuous wavefront of Ca2+ elevation, e.g. in a single process, that eventually reaches 22.6% of the segmented astrocyte, or can it also be constituted by several distinct Ca2+ elevations occurring in separate domains of the arbor that together total 22.6% of the segmented surface? Mechanistically, the latter would suggest the presence of a general excitability threshold of the astrocyte, whereas the former would identify a driving-force threshold for the centripetal wavefront. In light of the above points, we think the authors should use caution in presenting and interpreting the experiments in which they use SICs as a readout. Their results might lead some readers to interpret the 22.6% spatial threshold simplistically as the threshold required for the astrocyte to evoke gliotransmitter release. SICs are robust signals recorded somatically from a single neuron and likely integrate the activation of many synapses belonging to that neuron. An astrocyte, on the other hand, contacts a myriad of synapses belonging to several distinct neurons. In our opinion, it is quite possible that more local gliotransmission occurs at lower Ca2+ signal thresholds (see above) that may not be efficiently detected using SICs as a readout; a more sensitive approach, such as a gliotransmitter sensor expressed along the entire astrocyte plasma membrane, could be tested for this purpose.

      The reviewer raises an excellent point. Whether the 22.6% spatial threshold is reached by a continuous wavefront within the segmented astrocyte or by Ca2+ elevations occurring in separate domains of the arbor is an important question, and we address it with a new analysis shown in a new figure (new Figure 5) in the revised manuscript. In this analysis, we demonstrate that the average distance between activated domains is not significantly different between subthreshold activity and the activity that precedes or follows suprathreshold cellular activation. In contrast, the average time between domain activations differs significantly between subthreshold activity and the activity that precedes and follows suprathreshold activation. We go further with a generalized linear model to show that the percent area of active domains and their temporal clustering, but not their spatial clustering, are related to soma activation. This suggests that domain activation does not need to be spatially clustered to induce soma activation and subsequent calcium surge; more importantly, domain activation must exceed the spatial threshold within a constrained time window. This has been added to the Results on P10, L2-40:

      “Our results demonstrate the relationship between the percentage of active domains and soma activation and subsequent calcium surge. Next, we were interested in the spatiotemporal properties of domain activity leading up to and during calcium surge. Because we imaged groups of astrocytes, we were able to constrain our analyses to fast responders (onset < median population onset) in order to evaluate astrocytes that were more likely to respond to neuronal-evoked sensory stimulation than to nearby astrocyte activation (Figure 5A). In this population the spatial threshold was 23.8%, within the 95% confidence interval of [21.2%, 24.0%]. First, we created temporal maps of individual astrocyte calcium responses, in which each domain is labeled with its onset relative to soma activation, to study the spatiotemporal profile of astrocyte calcium surge (Bindocci et al., 2017; Rupprecht et al., 2024) (Figure 5B). Using temporal maps, we quantified the spatial clustering of responding domains by measuring the average distance between active domains. We found that the average distance between active domains in subthreshold astrocyte responses was not significantly different from that in pre-soma suprathreshold activity (16.3 ± 0.4 µm in No-soma cells versus 16.2 ± 0.3 µm in Pre-soma cells, p = 0.75; n = 286 No-soma vs n = 326 Pre-soma, 30 populations and 3 animals; Figure 5C). Following soma activation, astrocyte calcium surge was marked by no significant change in the average distance between active domains (16.0 ± 0.3 µm in Post-soma cells versus 16.3 ± 0.4 µm in No-soma cells, p = 0.57, and 16.2 ± 0.3 µm in Pre-soma cells, p = 0.31; n = 326 soma active and n = 286 no soma active, 30 populations and 3 animals; Figure 5C). Taken together, this suggests that, on average, domain activation occurs in a nonlocal fashion, which may reflect the underlying nonlocal activation of nearby synapses.
Next, we interrogated the temporal patterning of domain activation by quantifying the average time between domain responses, and found that it was significantly shorter in pre-soma suprathreshold activity than in subthreshold activity without subsequent soma activation (9.4 ± 0.3 s in No-soma cells versus 4.4 ± 0.2 s in Pre-soma cells, p < 0.001; n = 326 soma active vs n = 286 not soma active, 30 populations and 3 animals; Figure 5D). The average time between domain activations was even shorter after the soma became active during calcium surge (2.1 ± 0.1 s in Post-soma versus 9.4 ± 0.3 s in No-soma cells, p < 0.001, and 4.4 ± 0.1 s in Pre-soma cells, p < 0.001; n = 326 soma active and n = 286 not soma active, 30 populations and 3 animals; Figure 5D). This corroborates our findings in Figure S2 and highlights the difference in temporal profiles between subthreshold activity and astrocyte calcium surge.

      We then tested the contribution of each of our three variables describing domain activation (percent area, average distance, and average time) to soma activation by fitting a general linear model. Overall, there was a significant relationship between these variables and the soma response (p = 5.5e-114), with the percent area having the largest effect (p = 3.5e-70), followed by the average time (p = 3.6e-7), while the average distance had no significant effect (p = 0.12). Taken together, this suggests that the spatial clustering of active domains has no significant effect on soma activation, whereas the percent area of active domains within a constrained time window has the largest effect.”
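      To make the two clustering measures concrete, the sketch below shows one way they could be computed from a temporal map. It is not the analysis code used for new Figure 5: it assumes, hypothetically, that each active domain is summarized by a centroid in µm and an onset time in seconds relative to soma activation, that the "average distance" is the mean pairwise Euclidean distance between active domains, and that the "average time" is the mean interval between consecutive domain onsets. The `Domain` class and all function names are illustrative.

```python
from dataclasses import dataclass
from itertools import combinations
from math import dist
from statistics import mean


@dataclass
class Domain:
    """One active domain in a temporal map (hypothetical structure)."""
    x_um: float     # centroid x position (µm)
    y_um: float     # centroid y position (µm)
    onset_s: float  # onset relative to soma activation (s); negative = before soma


def average_distance(domains):
    """Mean pairwise Euclidean distance between active domain centroids (µm)."""
    pairs = list(combinations(domains, 2))
    if not pairs:
        return float("nan")
    return mean(dist((a.x_um, a.y_um), (b.x_um, b.y_um)) for a, b in pairs)


def average_time(domains):
    """Mean interval between consecutive domain onsets (s)."""
    onsets = sorted(d.onset_s for d in domains)
    gaps = [later - earlier for earlier, later in zip(onsets, onsets[1:])]
    return mean(gaps) if gaps else float("nan")


def split_by_soma(domains):
    """Split domains into pre-soma (onset < 0) and post-soma (onset >= 0) groups."""
    pre = [d for d in domains if d.onset_s < 0]
    post = [d for d in domains if d.onset_s >= 0]
    return pre, post


# Made-up example astrocyte with two pre-soma and two post-soma domains.
domains = [Domain(0, 0, -6.0), Domain(3, 4, -2.0), Domain(0, 8, 1.0), Domain(6, 8, 2.0)]
pre, post = split_by_soma(domains)
print(average_distance(pre), average_time(pre))    # 5.0 4.0
print(average_distance(post), average_time(post))  # 6.0 1.0
```

      Because `average_time` only uses differences between onsets, the reference time does not affect it; for No-soma cells, which have no soma onset, onsets could instead be referenced to stimulus onset without changing this measure.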

      Regarding the comments on SICs, we fully agree with the reviewer. In the revised manuscript, we have included text in the Discussion to ensure the correct interpretation of the results, i.e., the observed 22.6% spatial threshold for SICs does not necessarily indicate an intrinsic property of gliotransmitter release; rather, since SICs have been shown to be calcium-dependent, it is not surprising that their presence, monitored in whole-cell recordings from the neuronal soma, matches the threshold for the spread of intracellular calcium. We have added to the Discussion on P16, L15-30:

      “Astrocyte calcium activity induces multiple downstream signaling cascades, such as the release of gliotransmitters (Araque et al., 2014; de Ceglia et al., 2023). Using patch-clamp recordings of a single nearby neuron, we showed that calcium surge in a nearby population of astrocytes is also correlated with an increase in slow inward currents (SICs), previously demonstrated to depend on astrocytic vesicular release of glutamate (Araque et al., 2000; Durkee et al., 2019; Fellin et al., 2004). The increase in SICs we observed from patching a single neuron is likely the integration of gliotransmitter release onto synapses from a group of nearby astrocytes. Indeed, subthreshold astrocyte calcium increases alone can trigger activity in contacted dendrites (Di Castro et al., 2011). An exciting avenue of future research would be to observe the impact of a single astrocyte calcium surge on nearby neurons (Refaeli and Goshen, 2022). How many neurons would be affected, and would this singular event be observable through patch clamp of a single neuron? Identifying the output of astrocyte calcium surge is as important for understanding network communication as characterizing the surge itself, as it establishes a biologically relevant effect on nearby neurons. Many downstream signaling mechanisms may be activated following astrocyte calcium surge, and the effects of locally concentrated domain activity versus astrocyte calcium surge on different astrocyte outputs should be studied further.”

      As an additional consideration, the authors propose the following event sequence: stimulus, synaptic drive to L2/3, arbor activation, spatial threshold, soma activation, post-soma activation, gliotransmission. This seems reminiscent of the sequence underlying neuronal spike propagation (from dendrite to soma to axon) and the resulting vesicular release. However, there is no consensus within the glial field about an analogous framework for astrocytes. Thus, "arbor activation", "soma activation", and "post-soma activation" are not established terms of art. Similarly, the way the authors use the term "domain" contrasts with how others have used it (Agarwal et al., 2017; Shigetomi et al., 2013; Di Castro et al., 2011; Grosche et al., 1999) and may produce some confusion. The authors could adopt a more flexible nomenclature or clarify that their terms do not have a defined structural-functional basis, being constructs that they justifiably adapted to deal with the spatial complexity of astrocytes, in line with their past studies (Lines et al., 2020; Lines et al., 2021).

      We agree that there is no consensus within the glial field about this event sequence. One major difference between this sequence of events and neuronal spike propagation is the directionality from dendrite to soma to axon; it is unknown whether the calcium signal in astrocytes has an analogous directionality. However, our finding in Figure 5E suggests centripetal propagation from the arborization to the soma that elicits a calcium surge, which then propagates centrifugally. We have added to the Results on P10-11, L41-8:

      “Recent work studying astrocyte integration has suggested a centripetal model of astrocyte calcium, in which more distal regions of the astrocyte arborization become active first and activation flows towards the soma (Fedotova et al., 2023; Rupprecht et al., 2024). Here, we confirm this finding: activated domains located distal to the soma respond sooner than domains more proximal to the soma (linear correlation: p < 0.05, R² = 0.67; n = 30 populations, 3 animals; Figure 4E). Next, we build upon this result to demonstrate that, following soma activation, astrocyte calcium surge propagates outward in a centrifugal pattern, in which domains proximal to the soma become active before distal domains (linear correlation: p < 0.01, R² = 0.89; n = 30 populations, 3 animals; Figure 4E). Together, these results show that intracellular astrocyte calcium follows a centripetal model until the soma is activated, leading to a calcium surge that flows centrifugally. This suggests that astrocytes can integrate the activity of the nearby local synaptic population and, if this activity exceeds the spatial threshold, produce a whole-cell response that spreads outward.”
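      The centripetal and centrifugal patterns above are reported as linear correlations between domain onset and distance from the soma. A minimal sketch of such a fit follows, assuming onsets are expressed relative to soma activation; with that convention, a negative slope means distal domains activate earlier (centripetal) and a positive slope means they activate later (centrifugal). The function names and example values are hypothetical, and the sketch omits the p-value, which a library routine such as `scipy.stats.linregress` would also report.

```python
def linear_fit(distances_um, onsets_s):
    """Least-squares slope (s per µm) and R² of onset versus distance from the soma."""
    n = len(distances_um)
    if n < 2 or n != len(onsets_s):
        raise ValueError("need at least two paired observations")
    mx = sum(distances_um) / n
    my = sum(onsets_s) / n
    sxx = sum((x - mx) ** 2 for x in distances_um)
    syy = sum((y - my) ** 2 for y in onsets_s)
    sxy = sum((x - mx) * (y - my) for x, y in zip(distances_um, onsets_s))
    if sxx == 0 or syy == 0:
        raise ValueError("distances and onsets must both vary")
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)  # square of Pearson's r
    return slope, r_squared


def propagation_direction(slope):
    """Distal-first (negative slope) is centripetal; proximal-first is centrifugal."""
    if slope < 0:
        return "centripetal"
    if slope > 0:
        return "centrifugal"
    return "no direction"


# Made-up onsets: distal domains lead before soma onset ...
pre_slope, pre_r2 = linear_fit([10, 20, 30], [-1.0, -2.0, -3.0])
# ... and lag behind proximal domains after soma onset.
post_slope, post_r2 = linear_fit([10, 20, 30], [1.0, 2.0, 3.0])
print(propagation_direction(pre_slope), propagation_direction(post_slope))
```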

      And to the Discussion on P15, L3-15:

      “Close examination of the calcium surge uncovered distinct propagation patterns before and after soma activation. First, our analysis found that temporal clustering changed before and after calcium surge, with both exceeding that of subthreshold activity, and that this characteristic was absent when assessing spatial clustering. When comparing the percent area, spatial clustering, and temporal clustering of active domains using a GLM, we found that the percent area was the most significant parameter describing a threshold for soma activation. We then compared the delay of domain activation with its distance from the soma, and reproduced previous results that suggest a centripetal model of astrocytic calcium responses from the distal arborizations to the soma (Fedotova et al., 2023; Rupprecht et al., 2024). Here, we went a step further and discovered that soma activation switches this directionality, so that astrocytic calcium surge propagates outward in a centrifugal manner away from the soma. Taken together, these results demonstrate the integrative potential of astrocyte calcium responses and further characterize how the astrocyte calcium surge relays this integrated signal to other parts of the astrocyte.”

      The term “microdomain” is used in the references above to define distal subcellular domains in contact with synapses; to distinguish our usage from this term, we adopt the nomenclature “domain” to define all subcellular domains in the astrocyte arborization. These items have been discussed and clarified in the revised manuscript on P5, L17-19:

      “The concept of domain, which we use to define all subcellular domains in the astrocyte arborization, should not be confused with the concept of microdomain, which usually refers to the distal subcellular domains in contact with synapses.”

      Our previous points suggest that the paper would be significantly strengthened by new experimental observations focusing on single astrocytes and using acquisitions at higher spatial and temporal resolution. If the authors do not pursue this option, we encourage them to at least improve their analysis and, at the same time, to acknowledge in the text some limitations of their experimental approach as discussed above. We indicate here several levels of possible analytical refinement.

      We believe our spatial (25x objective and 1.7x digital zoom, with pixels on the order of 1 µm) and temporal (2-5 Hz frame rate) resolution is within the range used in the glial field. In any case, the evidence for a spatial threshold for astrocyte calcium surge is not compromised by this imaging resolution.

      The first relates to the selection of astrocytes being analyzed, and the need to focus on a much narrower subpopulation than (for example) 987 astrocytes used for the core data. This selection would take into greater consideration the aspects of structure and latency. With the structural and latency-based criteria for selection, the number of astrocytes to analyze might be reduced by 10-fold or more, making our second analytical recommendation much more feasible.

      We agree that individual differences exist; however, establishing a general concept requires sampling many astrocytes. Nevertheless, we have included a new figure (new Figure 5) that analyzes early responders.

      For structure-based selection - Genetically encoded Ca2+ indicators such as GCaMP6 are in principle expressed throughout an astrocyte, even in regions that are not labelled by SR101. Moreover, astrocytes form independent 3D territories, so one can safely assume that the GCaMP6 signal within an astrocyte volume belongs to that specific astrocyte (this is particularly evident if the neighboring astrocytes are GCaMP6-negative). Therefore, the authors could extend their analysis of Ca2+ signals in individual astrocytes to the regions that are SR101-negative and try to better integrate fast signals into their spatial threshold concept. Even if they decide to be conservative in their methods and stick to astrocyte segmentation based on the SR101 signal, they should acknowledge that SR101 staining quality can vary considerably between individual astrocytes within a FOV: some astrocytes will have much greater structural visibility in the distal processes than others. This means that some astrocytes may have segmented domains extending more distally than others, and we think that the authors should prioritize such astrocytes for analysis. However, the representative astrocytes shown in Figure 4A or Figure S1B have segmented domains localized only to proximal processes near the soma. Accordingly, given the reported timing differences between "arbor" and "soma" activation, one might expect comparable timing differences between domains that are distal vs proximal to the soma as well. Fast signals in peripheral regions of astrocytes in contact with synapses are largely IP3R2-independent (Stobart et al., 2018). However, the quality of SR101 staining has implications for interpreting the IP3R2 KO data. There is evidence that IP3R2 KO may preferentially impact activity near the soma (Srinivasan et al., 2015). Thus, astrocytes with insufficient staining (visible only in the soma and proximal domains) might show a biased effect of IP3R2 KO.
While not necessarily disrupting the core conclusions made by the authors based on their analysis of SR101-segmented astrocytes, we think results would be strengthened if astrocytes with sufficient SR101 staining - i.e. more consistent with previous reports of L2/3 astrocyte area (Lanjakornsiripan et al., 2018) - were only included. This could be achieved by using max or cumulative projections of individual astrocytes in combination with SR101 staining to construct more holistic structural maps (Bindocci et al., 2017).

      We agree with the points concerning SR101, and indeed there could be variability in the origins of the astrocyte calcium signal. Astrocyte territory boundaries can be difficult to discern when neighboring astrocytes both express GCaMP6. Also, SR101-negative domains could encompass an area that is only partially astrocyte territory and also includes extracellular space. Here, we take a conservative approach and constrain ROIs to SR101-positive astrocyte territory outlines, without invading neighboring cells or extracellular space, in order to reduce error in the estimate of the spatial threshold. The finding that IP3R2 KO preferentially impacts activity near the soma is interesting and in line with our conclusions. We agree that findings from SR101-negative pixels would not necessarily disrupt the core conclusions of the study, and that the suggested additional analysis would further strengthen the results. We have since added a discussion of the limitations of the study to the Discussion on P15, L31-37:

      “In this study, we chose to limit our examination to calcium activity within the bounds determined by SR101 staining. Much work has shown that astrocyte territories have a sponge-like morphology, with small microdomains making up the end feet of their distal arborizations (Baldwin et al., 2024). Here, we took a conservative approach, excluding these fine morphological processes and analyzing only SR101-positive pixels in order to reduce the possible error of including a neighboring astrocyte or extracellular space in our analyses. Much work remains to be done to extend these results.”

      For latency-based selection - The authors record calcium activity within a FOV containing at least 20 astrocytes over a period of 60 s, during which a 2 Hz hindpaw stimulation at 2 mA is applied for 20 s. As discussed above, presumably some astrocytes in a FOV are the first to respond to the stimulus series, while others likely respond with longer latency. For the shorter-latency responders (< 3 s), it is easier to attribute their calcium increases to the sensory information projecting to L2/3. In other cases, when "arbor" responses occur at 10 s or later, i.e. only after 20 stimulus events (at 2 Hz), it is likely they are being activated by a more complex and recurrent circuit involving several rounds of neuron-glia crosstalk, which would be mechanistically distinct from astrocytes responding earlier. We suggest that the authors focus more on the shorter-latency astrocytes, as they are more likely to have activity corresponding to the stimulus itself.

      We agree that different timings of astrocyte calcium increases may be due to different mechanisms outside the astrocyte. We believe the spatial threshold is intrinsic to the astrocyte and independent of these external variables; nonetheless, we believe that longer-latency responses are physiological and may carry important information for determining astrocyte calcium responses. We have performed the spatial threshold analysis on early responders (the first half of responding cells) and found that the spatial threshold in that population (23.8%) is within the 95% confidence interval [21.2%, 24.0%]. Additionally, the threshold for slow responders (22.6%) was also within this confidence interval.
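      The latency-based selection described here can be sketched in a few lines. The sketch assumes, hypothetically, that response onsets are available per cell for one imaging population and that "fast responders" are cells whose onset is strictly below that population's median onset; the function names and onset values are illustrative, and the confidence-interval check simply restates the comparison made above.

```python
from statistics import median


def fast_responders(onset_by_cell):
    """Return the IDs of cells whose onset (s) is strictly below the population median."""
    if not onset_by_cell:
        return set()
    cutoff = median(onset_by_cell.values())
    return {cell for cell, onset in onset_by_cell.items() if onset < cutoff}


def within_ci(estimate, ci):
    """True if a threshold estimate (%) lies inside a closed confidence interval."""
    low, high = ci
    return low <= estimate <= high


# Hypothetical onsets for one imaging population (s after stimulus onset).
onsets = {"astro_1": 1.0, "astro_2": 2.5, "astro_3": 8.0, "astro_4": 12.0}
print(sorted(fast_responders(onsets)))
print(within_ci(23.8, (21.2, 24.0)))
```

      Splitting at the median keeps roughly half of the responding cells in each group, which matches the "first half of responding cells" criterion; a fixed latency cut-off such as the reviewer's < 3 s would be an alternative design choice.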

      The second level of analytical refinement we suggest relates specifically to the issue of propagation and timing of the activity within the "arbor", "soma", and "post-soma". Currently, the authors use an ROI-based approach that segments the "arbor" into domains. We suggest that this approach could be supplemented by a more robust temporal analysis. This could, for example, involve temporal maps that take pixels above a certain amplitude and plot their timing relative to stimulus onset or (better) to the first active pixel of the astrocyte. This type of approach is increasingly used (Bindocci et al., 2017; Wang et al., 2019; Rupprecht et al., 2022), and we think it can greatly help to clarify the proposed sequence and to better characterize the spatial threshold. We think this analysis should specifically address several important points:

      We agree that creating temporal maps from our own data would be informative, and we provide the results of the suggested analysis in a new figure (new Figure 5) in the revised manuscript. In this analysis, we show that subthreshold, pre-soma, and post-soma dynamics differ significantly in time. These results from the temporal maps strengthen our claim of a spatial threshold by quantifying the temporal and spatial dynamics of domain activation before and after the spatial threshold is met (i.e. soma activation), and they highlight differences between subthreshold and suprathreshold activity.
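      The temporal maps suggested by the reviewer can be illustrated with a short sketch. It assumes, hypothetically, that each pixel inside the astrocyte mask has a ΔF/F trace sampled at a fixed frame rate, that onset is the first frame exceeding an amplitude threshold, and that onsets are referenced to the first active pixel, one of the two options the reviewer mentions. The threshold, frame rate, and function name are illustrative, not the values or code used for new Figure 5.

```python
def temporal_map(traces, threshold, frame_rate_hz):
    """Map each active pixel to its onset (s) relative to the first active pixel.

    traces: dict mapping (row, col) -> sequence of ΔF/F values, one per frame.
    Pixels whose trace never exceeds `threshold` are left out of the map.
    """
    onsets = {}
    for pixel, trace in traces.items():
        for frame, value in enumerate(trace):
            if value > threshold:
                onsets[pixel] = frame / frame_rate_hz
                break  # keep only the first crossing
    if not onsets:
        return {}
    first = min(onsets.values())
    return {pixel: t - first for pixel, t in onsets.items()}


# Hypothetical 3-pixel example imaged at 2 Hz.
traces = {
    (0, 0): [0.0, 0.0, 0.6, 0.8],
    (0, 1): [0.0, 0.7, 0.9, 0.9],
    (1, 1): [0.1, 0.1, 0.2, 0.1],  # never crosses threshold
}
print(temporal_map(traces, threshold=0.5, frame_rate_hz=2.0))
```

      A single-frame crossing is used here for simplicity; a real analysis would typically require the signal to remain above threshold for several frames, and to be baseline-normalized, to reject noise.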

      (1) Where/when does the astrocyte activation begin? Understanding the beginning is very important, particularly because another potential spatial threshold - preceding the one the authors describe in the paper - could gate the initial activation of more distal processes, as discussed above. This sequentially earlier spatial threshold could (for example) rely on microdomain interaction with synaptic elements and (in contrast) be IP3R2 independent (Srinivasan et al., 2015, Stobart et al., 2018). We would be interested to know whether, in a subset of astrocytes that meet the structure and latency criteria proposed above and can produce global activation, there is an initial local GCaMP6f response of a minimal size that must occur before propagation towards the soma begins. The data associated with varying stimulus parameters could potentially be useful here and reveal stimulus intensity/duration-dependent differences.

      This is a very important point. It is difficult to pinpoint the beginning of the signal, which is why we rely on the average of responses. The additional analysis based on temporal maps (new Figure 5) shows a very interesting result: there is no significant difference in the spatial clustering of, or average distance between, activated domains in subthreshold and pre-soma suprathreshold activity. This result, along with the general linear model, suggests that there is no additional subcellular spatial threshold, as the spatial organization of the activity is not significantly different. Instead, the main difference between domain activity that leads to soma activation and activity that does not is the overall percentage of active domains, and not necessarily how that activity is spatially organized. We have also added this point to the Discussion section to highlight the importance of this result, on P15, L3-8:

      “Close examination of the calcium surge uncovered distinct propagation patterns before and after soma activation. First, our analysis found that temporal clustering changed before and after calcium surge, with both exceeding that of subthreshold activity, and that this characteristic was absent when assessing spatial clustering. When comparing the percent area, spatial clustering, and temporal clustering of active domains using a GLM, we found that the percent area was the most significant parameter describing a threshold for soma activation.”

      (2) Is the propagation in the authors' experimental model centripetal? This is implied throughout the manuscript but never shown. We think establishing whether (or not) the calcium dynamics are centripetal is important because it would clarify whether spatially adjacent domains within the "arbor" need to be sequentially active before reaching the threshold and then reaching the soma. More broadly, visualizing propagation will help to better visualize summation, which is presumably how the threshold is first reached (and overcome).

      The alternative hypothesis of a general excitability threshold, as discussed above, would be challenged here and possibly rejected, thereby clarifying the nature of the Ca2+ process that needs to reach a threshold for further expansion to the soma and other parts of the astrocyte.

      We agree that our model is centripetal when considering activity leading up to soma activation. Indeed, we have found that arborization activity precedes soma activity (Figure 3), soma activity appears to depend on the percent area of domain activity (Figure 4), and pre-soma domain activity comes online earlier in domains distal from the soma (new Figure 5). However, whether this is intrinsic or due to synapses being more likely to occur in the periphery requires further study. Our new results in new Figure 5, demonstrating that subthreshold activity has a spatial organization that is not significantly different from that of pre-soma activity in suprathreshold cases, argue in favor of the general excitability threshold hypothesis. However, we do not see these hypotheses as mutually exclusive. Excitingly, we have also found that following soma activation, calcium surge appears to follow a centrifugal propagation. We have since added the topic of a centripetal-centrifugal model to the Discussion on P15, L8-15:

      “We then compared the delay of domain activation with its distance from the soma, and reproduced previous results that suggest a centripetal model of astrocytic calcium responses from the distal arborizations to the soma (Fedotova et al., 2023; Rupprecht et al., 2024). Here, we went a step further and discovered that soma activation switches this directionality, so that astrocytic calcium surge propagates outward in a centrifugal manner away from the soma. Taken together, these results demonstrate the integrative potential of astrocyte calcium responses and further characterize how the astrocyte calcium surge relays this integrated signal to other parts of the astrocyte.”

      (3) Complementary to the previous point: we understand that the spatial threshold does not per se have a location, but is there some spatial logic underlying the organization of active domains before the soma response occurs? One can easily imagine multiple scenarios of sparse heterogeneous GCaMP6f signal distributions that correspond to ≥22.6% of the arborization but that would not be expected to trigger soma activation. For example, the diagram in Figure 4C showing the astrocyte response to 2 Hz stimulation (which lacks a soma response) underscores this point. It appears to have ≥22.6% activation that is sparsely distributed throughout the arborization. If this activity had an alternative spatial distribution, for example localized primarily to a specific process within the arbor, would it be more likely to trigger a soma response?

      This is an interesting point, and our new spatiotemporal analysis (new Figure 5), described in our responses above, aims to shed some light on it. To our knowledge, there is no mechanism in astrocytes that imposes directionality on calcium propagation, analogous to the role of voltage-gated sodium channels in directional action potential propagation in neurons. We found that the delay of domain activation relative to soma onset is significantly correlated with the distance from the soma (new Figure 5E). In addition, spatial clustering in pre-soma activity is not significantly different from that in non-responders or post-soma activity. Together, this suggests that centripetal propagation may occur throughout the entire cell rather than in a local, clustered way. Our findings also suggest that following soma activation, astrocyte calcium surge follows a mostly centrifugal pattern (new Figure 5E).

      (4) Does "pre-soma" activation predict the location and onset time of "post-soma" activation? For example, are arbor domains that were part of the "pre-soma" response the first to exhibit GCaMP6f signal in the "post-soma" response?

      Please see our responses above.

      Reviewer #2 (Public Review):

      Lines et al. investigated the integration of calcium signals in astrocytes of the primary somatosensory cortex. Their goal was to better characterize the mechanisms that govern the spatial characteristics of calcium signals in astrocytes. In line with previous reports in the field, they found that most events originated and stayed localized within microdomains in distal astrocyte processes, occasionally coinciding with larger events in the soma, referred to as calcium surges. As a single astrocyte communicates with hundreds of thousands of synapses simultaneously, understanding the spatial integration of calcium signals in astrocytes and the mechanisms governing it is of tremendous importance for deepening our understanding of signal processing in the central nervous system. The authors thus aimed to unveil the properties governing the emergence of calcium surges. The main claim of this manuscript is that there is a spatial threshold of ~23% of microdomain activation above which a calcium surge, i.e. a calcium signal that spreads to the soma, is observed. Although the study provides data that are highly valuable for the community, the conclusions of the current version of the manuscript seem a little too assertive and general compared with what can be deduced from the data and methods used.

      The major strength of this study is the experimental approach that allowed the authors to obtain numerous and informative calcium recordings in vivo in the somatosensory cortex in mice in response to sensory stimuli as well as in situ. Notably, they developed an interesting approach to modulating the number of active domains in peripheral astrocyte processes by varying the intensity of peripheral stimulation (its amplitude, frequency, or duration).

      We thank the reviewer for their kind and thoughtful review of our study.

      The major weakness of the manuscript is the method used to analyze and quantify calcium activity, which mostly relies on the analysis of averaged data and overlooks the variability of the signals measured. As a result, the main claims from the manuscript seem to be incompletely supported by the data. The choice of a custom-made semi-automatic ROI-based calcium event detection algorithm rather than established state-of-the-art software, such as the event-based calcium event detection software AQuA (DOI: 10.1038/s41593-019-0492-2), is insufficiently discussed and may bias the analysis. Some references on this matter include: Semyanov et al, Nature Rev Neuro, 2020 (DOI: 10.1038/s41583-020-0361-8); Covelo et al 2022, J Mol Neurosci (DOI: 10.1007/s12031-022-02006-w) & Wang et al, 2019, Nat Neuroscience (DOI: 10.1038/s41593-019-0492-2). Moreover, the ROIs used to quantify calcium activity are based on structural imaging of astrocytes, which may not be functionally relevant.

      Unfortunately, there is no general consensus on calcium analysis in the astrocyte or neuronal field, and many groups use either software developed in-house or published tools such as GECIquant, STARDUST, AQuA or AQuA2. AQuA is event-based calcium detection software; because it does not include inactive SR101-positive domains, it could underestimate the spatial threshold for calcium surge. Our analysis is not based on functional events but on calcium signals constrained by the structure of a single astrocyte. This is crucial to properly determine the ratio of active to inactive pixels within a single astrocyte.

      For the reasons listed above, the manuscript would probably benefit from some rephrasing of the conclusions and a discussion highlighting the advantages and limitations of the methodological approach. The question investigated by this study is of great importance in the field of neuroscience as the mechanisms dictating the spatio-temporal properties of calcium signals in astrocytes are poorly characterized, yet are essential to understand their involvement in the modulation of signal integration within neural circuits.

      We thank the reviewer for their suggestions to improve the conclusions and Discussion. We have now included a paragraph outlining the limitations of the study in the Discussion P15, L23-37:

      “The investigation of the spatial threshold could be improved in the future in a number of ways. One is the use of state-of-the-art imaging in 3D (Bindocci et al., 2017). While the original publication using 3D imaging to study astrocyte physiology does not necessarily imply that there would be different calcium dynamics in one axis over another, the three-dimensional examination of the spatial threshold could refine the findings we present here. To better control the system, mice imaged here were under anesthesia, and this is a method that has been used to characterize many foundational physiological results in the field (Hubel and Wiesel, 1962; Mountcastle et al., 1957). However, assessing the spatial threshold in awake freely moving animals would be the next logical step. In this study, we chose to limit our examinations of calcium activity to activity within the bounds determined by SR101 staining. Much work has shown that astrocyte territories are more akin to sponge-like morphology with small microdomains making up the end feet of their distal arborizations (Baldwin et al., 2024). Here, we took a conservative approach of not incorporating these fine morphological processes and analyzing only SR101-positive pixels, in order to reduce the possible error of including a neighboring astrocyte or extracellular space in our analyses. Much work can be done to extend these results.”

      Reviewer #3 (Public Review):

      Summary:

      The study aims to elucidate the spatial dynamics of subcellular astrocytic calcium signaling. Specifically, they elucidate how subdomain activity above a certain spatial threshold (~23% of domains being active) heralds a calcium surge that also affects the astrocytic soma. Moreover, they demonstrate that processes on average are included earlier than the soma and that IP3R2 is necessary for calcium surges to occur. Finally, they associate calcium surges with slow inward currents.

      Strengths:

      The study addresses an interesting topic that is only partially understood. The study uses multiple methods including in vivo two-photon microscopy, acute brain slices, electrophysiology, pharmacology, and knockout models. The conclusions are strengthened by the same findings in both in vivo anesthetized mice and in brain slices.

      We thank the reviewer for the positive assessment of the study and his/her comments.

      Weaknesses:

      The method that has been used to quantify astrocytic calcium signals only analyzes what seems to be a small proportion of the total astrocytic domain on the example micrographs, where a structure is visible in the SR101 channel (see for instance Reeves et al. J. Neurosci. 2011, demonstrating to what extent SR101 outlines an astrocyte). This would potentially heavily bias the results: from the example illustrations presented it is clear that the calcium increases in what is putatively the same astrocyte goes well beyond what is outlined with automatically placed small ROIs. The smallest astrocytic processes are an order of magnitude smaller than the resolution of optical imaging and would not be outlined by either SR101 or with the segmentation method judged by the ROIs presented in the figures. Completely ignoring these very large parts of the spatial domain of an astrocyte, in particular when making claims about a spatial threshold, seems inappropriate. Several recent methods published use pixel-by-pixel event-based approaches to define calcium signals. The data should have been analyzed using such a method within a complete astrocyte spatial domain in addition to the analyses presented. Also, the authors do not discuss how two-dimensional sampling of calcium signals from an astrocyte that has processes in three dimensions (see Bindocci et al, Science 2017) may affect the results: if subdomain activation is not homogeneously distributed in the three-dimensional space within the astrocyte territory, the assumptions and findings between a correlation between subdomain activation and somatic activation may be affected.

      In order to reduce noise from individual pixels, we chose to segment astrocyte arborizations into domains of several pixels. As pointed out previously, including pixels outside of the SR101-positive territory runs the risk of including a pixel that may belong to a neighboring cell or be mostly comprised of extracellular space, and we chose the conservative approach to avoid this source of error. We agree that the results have limitations from being acquired in 2D instead of 3D, but it is reasonable to assume that the astrocyte is homogeneous in 3D and that the 2D plane is representative of the whole astrocyte. Indeed, no dimensional effects were reported in Bindocci et al, Science 2017. We have included a paragraph in the discussion to address this limitation in our study on P15, L23-27:

      “The investigation of the spatial threshold could be improved in the future in a number of ways. One is the use of state-of-the-art imaging in 3D (Bindocci et al., 2017). While the original publication using 3D imaging to study astrocyte physiology does not necessarily imply that there would be different calcium dynamics in one axis over another, the three-dimensional examination of the spatial threshold could refine the findings we present here.”

      The experiments are performed either in anesthetized mice, or in slices. The study would have come across as much more solid and interesting if at least a small set of experiments were performed also in awake mice (for instance during spontaneous behavior), given the profound effect of anesthesia on astrocytic calcium signaling and the highly invasive nature of preparing acute brain slices. The authors mention the caveat of studying anesthetized mice but claim that the intracellular machinery should remain the same. This explanation appears a bit dismissive as the response of an astrocyte not only depends on the internal machinery of the astrocyte, but also on how the astrocyte is stimulated: for instance synaptic stimulation or sensory input likely would be dependent on brain state and concurrent neuromodulatory signaling which is absent in both experimental paradigms. The discussion would have been more balanced if these aspects were dealt with more thoroughly.

      Yes, we agree that this is a limitation, and we acknowledge this in the Discussion P15, L27-31:

      “To better control the system, mice imaged here were under anesthesia, and this is a method that has been used to characterize many foundational physiological results in the field (Hubel and Wiesel, 1962; Mountcastle et al., 1957). However, assessing the spatial threshold in awake freely moving animals would be the next logical step.”

      The study uses a Heaviside step function to define a spatial 'threshold' for somata either being included or not in a calcium signal. However, Fig 4E and 5D showing how the method separates the signal provide little understanding for the reader. The most informative figure that could support the main finding of the study, namely a ~23% spatial threshold for astrocyte calcium surges reaching the soma, is Fig. 4G, showing the relationship between the percentage of arborizations active and the soma calcium signal. A similar plot should have been presented in Fig 5 as well. Looking at this distribution, though, it is not clear why ~23% would be a clear threshold to separate soma involvement; one can only speculate how the threshold for a soma event would influence this number. Even if the analyses in Fig. 4H and the fact that the same threshold appears in two experimental paradigms strengthen the case, the results would have been more convincing if several types of statistical modeling describing the continuous distribution of values presented in Fig. 4E (in addition to the Heaviside step function) were presented.

      We agree with the reviewer and have added to the Methods section a justification for our use of the Heaviside step function. We chose the Heaviside step function to represent the on/off behavior we observed in the data, which suggested a threshold in the biology. We agree with the reviewer that Fig. 4G is informative; it shows that below 23%, most of the soma fluorescence values are clustered at baseline. We also agree that a different statistical model describing the data would be more convincing. We therefore confirmed the spatial threshold with a confidence interval reported in the text, and we used a general linear model to support percent domains active, rather than other properties such as spatial or temporal clustering, as the basis for this threshold. P18-19, L34-2:

      “Heaviside step function

      The Heaviside step function in equation 4 is used to model mathematically the transition from one state to another and has been used in simple integrate-and-fire models (Bueno-Orovio et al., 2008; Gerstner, 2000).

      $$H(a) = \begin{cases} 0, & a < a_0 \\ 1, & a \ge a_0 \end{cases} \qquad (4)$$

      The Heaviside step function 𝐻(𝑎) is zero for all areas below the threshold area (𝑎₀) and one for all areas at or above it. In the data shown in Figure 4E, each point 𝑆(𝑎) is an individual astrocyte response, where 𝑎 is the percent area of domains active and 𝑆(𝑎) is 1 if the soma was active and 0 if it was not. To determine 𝑎₀ in our data, we iteratively subtracted 𝐻(𝑎) from 𝑆(𝑎) for all possible values of 𝑎₀ to create an error term as a function of 𝑎₀. The area at which this error term was minimal was defined as the threshold area.”
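      The threshold search described in the Methods excerpt above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the candidate thresholds are assumed to be the observed percent-active values, and the error term is assumed to be the sum of absolute differences between 𝑆(𝑎) and 𝐻(𝑎). Because both are binary, this sum equals the number of astrocytes misclassified by the step.

```python
import numpy as np

def heaviside(a, a0):
    """Step function: 0 for a < a0 and 1 for a >= a0 (value at a == a0 is a convention)."""
    return (np.asarray(a, dtype=float) >= a0).astype(float)

def fit_step_threshold(area, soma_active, candidates=None):
    """Hypothetical helper: find the threshold a0 minimizing sum |S(a) - H(a - a0)|.

    area        : percent of arborization domains active, one value per astrocyte
    soma_active : 1 if that astrocyte's soma was active, else 0
    candidates  : thresholds to test (default: the observed area values)
    """
    area = np.asarray(area, dtype=float)
    s = np.asarray(soma_active, dtype=float)
    if candidates is None:
        candidates = np.unique(area)  # sorted unique observed values
    # Error term as a function of the candidate threshold.
    errors = np.array([np.abs(s - heaviside(area, a0)).sum() for a0 in candidates])
    best = int(np.argmin(errors))  # first (smallest) threshold on ties
    return candidates[best], errors
```

      On toy data where somas switch on from 25% domains active onward, the sketch returns 25 with zero error. On real, noisy data, the error curve is rarely zero, and a bootstrap over astrocytes is one way to put a confidence interval on the minimum.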

      The description of methods should have been considerably more thorough throughout. For instance which temperature the acute slice experiments were performed at, and whether slices were prepared in ice-cold solution, are crucial to know as these parameters heavily influence both astrocyte morphology and signaling. Moreover, no monitoring of physiological parameters (oxygen level, CO2, arterial blood gas analyses, temperature etc) of the in vivo anesthetized mice is mentioned. These aspects are critical to control for when working with acute in vivo two-photon microscopy of mice; the physiological parameters rapidly decay within a few hours with anesthesia and following surgery.

      We have increased the thoroughness of our methods section. Especially including that body temperature and respiration were indeed monitored throughout anesthesia.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):

      (1) We think it would improve the paper if the authors provided a frame-by-frame example over (for example) 10-15 frames showing the spatiotemporal evolution of responses, where each frame represents 1s or 2s. This could be included with the temporal maps we proposed above.

      We agree that this is a useful example and have included it in our new figure (new Figure 5, specifically see Figure 5A) that uses temporal maps to analyze the spatiotemporal properties of calcium dynamics (Figure 5B).

      (2) Concerning the evidence in the present manuscript, we are not clear on what "populations" means. Can the authors clarify in methods? It is our understanding that 987 astrocytes from 30 populations from 3 mice were the source for the core data in the paper. What are the 30 populations, and how were the 987 astrocytes distributed across the populations? Are they roughly 10 FOVs per mouse? If so, please clarify roughly how far apart FOVs from the same mouse were, and how much delay between stim protocol application there was when a FOV was changed to a new FOV. Also, if for example, the 10th FOV from mouse 1 "saw" 9 rounds of stimulation before recording the response to the 10th stim round. To this point, was there any indication of response differences in populations that were recorded earlier vs later in the experimental sequence for each mouse?

      Descriptions of data will be included with the uploaded datasets following acceptance.

      (3) The description of the results on page 6 is a bit confusing for us. In lines 1-4, are the authors saying that 57.7% of astrocytes in a FOV exhibited responses within their soma and arborization, while 15.1% had responses only in arborization? If so, this is not clear to us from Figure 2C, where we count ~25 astrocytes in the FOV, maybe 8 or 9 astrocytes with activity in the arborization + soma (after stimulation), and 8 or 9 astrocytes with responses only in arborization. Is there something we do not understand, or is the second panel simply not representative of the group data?

      Figure 2D is representative of the group data and does indeed show that 57.7% of the population responds within the soma and arborization, and that 15.1% of astrocytes respond only in their arborizations. In this image, it is not possible to tell whether an entire arborization is active or whether only one or a few of its domains show increases, because such localized activity may not be large enough to be detected when fluorescence is sampled over the entire arborization.

      (4) In the second part of page 6 - when the authors apply linear regression - are they saying that there is a linear relationship between the amount (area) of activity measured in the arborization versus the soma, where populations of astrocytes with 50% activation of the arborization also tend to have 50% activation in their somas? If so, then this is not apparent by the map provided in Figure 2C, where it looks like soma activation (within the subpopulation) is 100% irrespective of the apparent activity in the arborization. This needs to be clarified. If not, and what they mean is that the probability of finding an active soma is related to the amount of activation within the arborization, this needs to be stated more clearly.

      When testing the linear relationship between somas active vs arborizations active, we find a significant linear correlation (p < 0.001, R2 = 0.90).
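      For readers unfamiliar with how such an R² is obtained, the following minimal Python sketch fits an ordinary least-squares line of percent somas active against percent arborizations active and computes the coefficient of determination. It is not the authors' analysis code; the function name and the assumption that each point is one population are illustrative. The p-value reported above would come from a separate test, for example `scipy.stats.linregress`, which returns it alongside the slope and correlation.

```python
import numpy as np

def linear_fit_r2(x, y):
    """Hypothetical helper: least-squares line y = slope * x + intercept and its R^2.

    x : percent of arborizations active (one value per population, assumed)
    y : percent of somas active in the same populations
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 polynomial fit
    pred = slope * x + intercept
    ss_res = np.sum((y - pred) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares (assumed > 0)
    return slope, intercept, 1.0 - ss_res / ss_tot
```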

      (5) In the experiments where stimulation duration, frequency, and intensity were varied to determine the percentage of domains that were on, it would be helpful to better understand the protocol in terms of sequence. In the methods it seems that hindpaw stimulation intensity was first pseudo-randomly varied at 2Hz for 10s, followed by pseudorandomly varied stimulation frequency and then pseudo-randomly varied duration - both at 2mA for 10s. Is this correct?

      We have since updated the methods section to better describe the experimental protocol.

      (6) In Figure 3E the alignment of the "arbor" to the somatic response is a bit misleading. The signals being averaged for the "arbor" are composed of temporally heterogeneous sources (from distal and proximal domains) and when averaged will produce an artificially slow rise time. In contrast, the averaged somatic signals are composed of much more homogenous sources (arising from a more singular event) and therefore have a sharp rise time. It would make more sense to align their kinetics relative to the stimulus onset. It would also make more sense to compare the somatic response of astrocytes to the "arbor" of astrocytes which respond rapidly vs slowly to the foot-shock.

      Aligning the responses to the stimulus onset would exacerbate the artificially slow rise time for the soma and arborization as not all cells come online at the same time from stimulus onset.

      Reviewer #2 (Recommendations For The Authors):

      Data availability

      It seems that the data is not shared on a public repository, while it appears to be necessary according to eLife's general principles (see https://elife-rp.msubmit.net/html/eliferp_author_instructions.html#dataavailability).

      We will upload raw data to a repository upon acceptance of the manuscript.

      Data analysis

      - Why did the authors choose the Heaviside step function to characterize conditions for somatic event initiation? It seems that this approach is averaging very heterogeneous data (some cells do not display somatic events even with ~50% domains active while some display somatic events with < 5%, it seems).

      Please see our discussion of response variability in the replies to the public reviews. We have since included more discussion on the use of the Heaviside step function in the Methods section.

      - Averaging of the data. It seems that the approach chosen to quantify calcium activity overlooks the variability of the signals measured ("Astrocyte calcium quantifications were averaged over all astrocytes of a single video and these values were used in statistical testing.", l.22-23, page 15). What is the variability of the measured features between different astrocytes? Between different animals? To what extent does this averaging strategy overlook the variability of the signals/how much information do we expect to lose? The manuscript would probably benefit from a more advanced statistical approach to analyze the data.

      Is it possible to extract information from the data that would indicate mechanisms allowing somatic activity when the percentage of domain activation was lower than the threshold? How about the opposite (i.e when no global event was triggered even when the percentage of domain activation was high)?

      We are indeed combining the responses from many different diverse astrocyte responses, and we see this as a strength of the paper. Variation is a hallmark of biology, and we have added this to the discussion. In the rare cases where astrocyte somas do not come online when the percent of arborizations is over threshold, or the opposite when somas activate with little domain activation, we would say this is most likely due to imaging 2D instead of the entire 3D cell. We have also added this into our discussion.

      - Here are a few suggestions for additional analysis that might be of interest to the community:

      - Measuring calcium activity in domains depending on their distance from the soma. This would allow us to better understand the spatial integration of the signals and notably answer the following question: Does the emergence of somatic events depend on the spatial distribution of active domains? (and does a smaller domain-soma distance facilitate the emergence of a calcium surge with a lower percentage of active domains?) These measurements could be visualized with plots of xy position of the domains (domain-soma distance) = f(time) with a colormap reflecting dF/F0, for example, at different times pre- and post-somatic events. Instead of DF/F0, these plots could also display the correlation between domain activities.

      We have performed this analysis, and it is now in the new figure (new Figure 5).

      - Adding temporality to the data analysis. It seems that calcium activity is "concatenated" during the whole duration prior to the somatic event (pre-soma) and after (post-soma). However, it is unclear how long the domains remained active and how many domains were still active at the onset of the somatic event. Adding a finer temporal analysis might help answer questions such as the potential need for some degree of synchronization of domain activity to trigger calcium surges.

      It could notably be interesting to measure the level of synchrony of events as a function of their distance from the soma and to analyze how it correlates with the properties of the somatic event.

      We have now included a temporal analysis of astrocyte calcium surge in our new figure (new Figure 5). While we did see examples of spatially clustered domain activation in our data, those examples usually included other non-clustered domain activity. When including all of the active domains within an astrocyte's arborization, we found no difference in the distance between activated domains before and after soma activation, even when comparing to subthreshold domain activity.

      Experiments

      - Would it be possible to apply different levels of stimulation to a given cell in order to discriminate whether the "no-soma" cells can display somatic events when neuronal activity is enhanced?

      Increased sensory stimulation does increase soma activity (Please see Lines et al., Nature Communications, 2020). An example of increased stimulation leading to somatic activation where it was not present in lower stimuli can be seen in Figure 4A-C.

      - Why choose a stimulation of 2 mA, 2 Hz for 20 sec in the experiments on IP3R2-/- mice?

      Has the same set of various stimulation protocols featured in Figure 4 been applied to IP3R2-/- mice? If so, were more domains activated as stimulation intensity (amplitude; duration, or frequency) increased? Could it trigger somatic events? This information seems necessary to be able to assert that calcium surges rely on the IP3R2 pathway.

      These experiments were not performed.

      -  Adding intermediary values of ATP pulse duration to Figure 6 (e.g. 50 ms and 75 ms) might strengthen the claim that the linear increase of SIC frequency with ATP application duration is only observed above the ~23% threshold.

      Agreed, however these experiments were not performed.

      Minor corrections to the text and figures.

      Methods

      The reader might benefit from a little more detail regarding the analysis of calcium signals. Notably, what was the duration of the calcium recordings? Was it constant across the different conditions tested in the study? Was it different in slice experiments versus in vivo experiments? What were the durations of the pre- and post- soma recordings and their variability? Was the calcium activity normalized for each astrocyte or animal? If not, why not consider normalizing the post-stimulation activity with pre-stimulation baseline activity?

      Similarly, some information on the stimulation protocol seems to be lacking: what was the frequency and intensity of the stimulus in the experiments where stimulus duration varied? Concurrently, what were the duration and intensity when frequency varied? What were the duration and frequency when the intensity varied?

      It might be beneficial to add further information on the algorithm of the Calsee software. What is it performing? How was it tested? Why is it referred to as "semi"-automatic, i.e. what might the user be needing to do manually? The segmentation seems to be omitting some branches connecting distal ROIs to the soma (see e.g. Fig S1.E). How would this influence the analysis and results?

      Results

      - Some assessments in the manuscript seem a bit too assertive/general compared to what can be deduced from the evidence presented in the figures. It could be beneficial to the reader to rephrase the latter. Some examples are listed below:

      - "These results indicate that astrocyte responses occurred initially in the arborizations, which is consistent with the idea that synapses are likely to be accessed at the astrocyte arborization ", l.11-12 page 7. The fact that the time to peak is lower in the arborization does not necessarily mean that signals initiate there. It could be because the kinetics/pathways in those compartments are different or there could be a dilution effect in the soma. Indeed, an influx of the same amount of calcium ions in the soma vs in a small domain will not correspond to the same DF/F0 in those compartments and might thus remain undetected in the soma.

      - "Using transgenic IP3R2-/- mice, we found that the activation of type-2 IP3 receptors is necessary for the generation of astrocyte calcium surge" (page 4, line 1-2), "present data further demonstrate that IP3R2 are necessary for the propagation of astrocyte calcium surge." (l. 18-19 page 13) -> As discussed above, the evidence does not seem to be strong enough to assert that IP3R2 is necessary to trigger somatic events. The results indicate that the IP3R2 pathway seems to facilitate the emergence of somatic events. As astrocytes differ strongly in terms of morphology and expression profiles depending on physiological conditions, the conclusions of this study might only apply to the specific experimental conditions used: region studied, age of the animal, type of sensory stimuli performed, and so on.

      - "These results indicate that spatial threshold of the astrocyte calcium surge has a functional impact on gliotransmission, which have important consequences on the spatial extension of the astrocyte-neuron communication and synaptic regulation", l.41-48 page 11. Figure 6 seems to indicate a correlation between the proportion of astrocyte domains activated and the frequency of SICs. The data seems insufficient to conclude that there is a causal relationship between calcium surge in the astrocyte and gliotransmission or SIC frequency.

      -" These results indicate that, on average, subcellular calcium events located in astrocyte arborizations are related to soma activation.", page 6 l 15-16. It may be more informative to specify the correlation measured: i.e the larger the arborization activity, the larger the percentage of active somas.

      Figures

      Figure 2: Adding more details in the figure legend explaining how the different parameters are calculated might be useful to the reader. Notably, what does soma active (%) refer to?

      Figure 3: Could it be possible to add individual traces of calcium activity in the soma and arborization of individual cells to provide a glimpse of the variability of the signals measured?

      Fig4. B-C: Could it be possible to add in the legend information on the timeline between stimulation and calcium signal recording? (and the duration of the latter).

      Fig4 D-E: Why is the maximum number of active domains in panel D ~50-60% but goes up to ~100% in panel E? Could it be that plotting SEM rather than STD might misrepresent the variability in the percentage of active domains for each stimulus property?

      Fig4F: It seems that the threshold changes with the frequency of the stimulus: e.g. at 10 Hz, the threshold seems larger than 22.6%. What would that mean?

      Fig4G: - Why do some data points display a soma amplitude < 0 DF/F0 ?

      - Why choose a sigmoid fit? What are the statistics associated to the fit? Is it in accordance with the threshold of 23%? Would a linear fit provide a good fit?

      Fig5F: - It seems that a few IP3R2-/- astrocytes displayed somatic events? If so, it might be interesting to mention this in the discussion section and to speculate on why that might be. - It seems that panel 5F displays the average percentage of somas that got activated rather than the probability of somatic events.

      - Is it possible that the effect seen in domains vs arborization is due to statistical effects (as n=2450 vs 112)?

      Fig S1: Panel D legend: double labeling of the radius used for each plot might be useful, notably for colorblind readers as the colors might be hard to see.

      Discussion

      - The discussion section might benefit from a discussion on the similitude between the data presented here and previous reports that reported similar results, i.e that most calcium signals in astrocytes were located in the distal processes, forming microdomains that rarely propagated to the soma. These include Bindocci et al 2017 Science (DOI:10.1126/science.aai8185) and Georgiou et al, Science Advances, 2022 (DOI: 10.1126/sciadv.abe5371).

      Thank you for the suggestions. We have now changed portions of the Methods, Results  and Discussion sections.

      Reviewer #3 (Recommendations For The Authors):

      The text could potentially be improved somewhat.

      Thank you.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Maestri et al. use an integrative framework to study the evolutionary history of coronaviruses. They find that coronaviruses arose recently rather than having undergone ancient codivergences with their mammalian hosts. Furthermore, recent host switching has occurred extensively, but typically between closely related species. Humans have acted as an intermediate host, especially between bats and other mammal species.

      Strengths:

      The study draws on a range of data sources to reconstruct the history of virus-host codivergence and host switching. The analyses include various tests of robustness and evaluations through simulation.

      Weaknesses:

      The analyses are limited to a single genetic marker (RdRp) from coronaviruses, but using other sections of the genome might lead to different conclusions. The genetic marker also lacks resolution for recent divergences, which precludes the detailed examination of recent host switches. Careful and detailed reconstruction of the timescale would be helpful for clarifying the evolutionary history of coronaviruses alongside their hosts.

      The use of a single short genetic marker (the RdRp palmprint region) from coronaviruses is indeed a limitation. However, this marker is the one that is currently used for routinely delimiting operational taxonomic units in RNA viruses and reconstructing their evolutionary history (Edgar et al. 2022, see also the Serratus project; https://serratus.io/); therefore, we took the conscious decision early on to rely on this expertise. Unfortunately, this marker cannot provide robust timescale reconstructions for coronavirus evolution (previous estimates of coronavirus origin range from around 10 thousand years ago to 293 million years ago depending on modeling assumptions). Only future genomic work across Coronaviridae that will characterize multiple genetic regions with different evolutionary rates will allow us to precisely elucidate the timescale of the evolutionary history of coronaviruses alongside their hosts. In the meantime, we show here that, while the RdRp palmprint region cannot by itself resolve the precise timescale of coronavirus evolution, it strongly suggests, when used along with cophylogenetic approaches, a recent evolutionary origin in bats.

      We now further discuss these issues and the perspectives offered by future genomic work on lines 462-485.  

      Reviewer #2 (Public Review):

      Summary:

      In their study titled "Recent evolutionary origin and localized diversity hotspots of mammalian coronaviruses," authors Benoît Perez-Lamarque, Renan Maestri, Anna Zhukova, and Hélène Morlon investigate the complex evolutionary history of coronaviruses, particularly those affecting mammals, including humans. The study focuses on unraveling the evolutionary trajectory of these viruses, which have shown a high propensity for causing pandemics, as evidenced by the SARS-CoV2 outbreak.

      The research addresses a significant gap in our understanding of the evolutionary dynamics of coronaviruses, particularly their history, patterns of host-to-host transmission, and geographical spread. These aspects are important for predicting and managing future pandemic scenarios.

      Historically, studies have employed cophylogenetic tests to explore virus-host relationships within the Coronaviridae family, often suggesting a long history of virus-host codiversification spanning millions of years. However, the team led by Perez-Lamarque proposes a novel phylogenetic framework that contrasts this traditional view. Their approach, which involves adapting gene tree-species tree reconciliation, is designed to robustly test the validity of two competing scenarios: an ancient origination and codiversification versus a more recent emergence and diversification through host switching.

      Upon applying this innovative framework to the study of coronaviruses and their mammalian hosts, the authors' findings challenge the prevailing notion of a deep evolutionary history. Instead, their results strongly support a scenario where coronaviruses have a more recent origin, likely in bat populations, followed by diversification predominantly through host-switching events. This diversification, interestingly, seems to occur preferentially within mammalian orders.

      A critical aspect of their findings is the identification of hotspots of coronavirus diversity, particularly in East Asia and Europe. These regions align with the proposed scenario of a relatively recent origin and subsequent localized host-switching events. The study also highlights the rarity of spillovers from bats to other species, yet underscores the relatively higher likelihood of such spillovers occurring towards humans, suggesting a significant role for humans as an intermediate host in the evolutionary journey of these viruses.

      The research also points out the high rates of host-switching within mammalian orders, including between humans, domesticated animals, and non-flying wild mammals.

      In conclusion, the study by Perez-Lamarque and colleagues presents an important quantitative advance in our understanding of the evolutionary history of mammalian coronaviruses. It suggests that the long-held belief in extensive virus-host codiversification may have been substantially overestimated, paving the way for a reevaluation of how we understand, predict, and potentially control the spread of these viruses.

      Strengths:

      The study is conceptually robust, and its conclusions are convincing.

      Weaknesses:

      Despite the availability of a dated host tree the authors were only able to use the "undated" model in ALE, with the dated method (which only allows time-consistent transfers) failing on their dataset (possibly due to dataset size?). Further exploration of the question would be potentially valuable.

      Our intuition is that the “dated” version of ALE does not necessarily fail on our dataset because of its size: ALE runs, but it provides unrealistic parameter estimates and is unable to output possible reconciliations, as mentioned in our Material and Methods section. We think this issue arises mostly because there is no pattern of codiversification: the coronavirus and mammal trees are so distinct that finding a reconciliation scenario between them with time-consistent switches is very difficult, and ALE fails to estimate an amalgamated likelihood for such an unlikely scenario. We have now run the dated version of ALE independently on the smaller alphacoronavirus and betacoronavirus datasets. It still fails on the betacoronavirus dataset. On the alphacoronavirus dataset, it does output significant reconciliations; however, these reconciliations are dominated by transfer and loss events, confirming that codiversification is unlikely in this clade.

      Reviewer #3 (Public Review):

      Summary:

      This work uses tools and concepts from co-phylogenetic analyses to reconstruct the evolutionary and diversification history of coronaviruses in mammals. It concludes that cross-species transmissions from bats to humans are a relatively common event (compared to bats to other species). Across all mammals, the diversification history of coronaviruses suggests that there is potential for further evolutionary diversification.

      Strengths:

      The article uses an interesting approach based on jointly looking at the extant network of coronaviruses-mammals interactions, and the phylogenetic history of both these organisms. The authors do an impressive job of explaining the challenges of reconstructing evolutionary dynamics for RNA viruses, and this helps readers appraise the relevance of their approach.

      Weaknesses:

      I remain unconvinced by the argument that sampling does not introduce substantial biases in the analyses. As the authors highlight, incomplete knowledge of the extant interactions would lead to a biased reconstruction of the diversification history. In a recent paper (Poisot et al. 2023, Patterns), we look at sampling biases in the virome of mammals and suggest that it is a fairly prominent issue that is, furthermore, structured by taxonomy, space, and phylogenetic position. Case in point, even for betacoronaviruses, there have been many newly confirmed hosts in recent years. For organisms that have received less intense scrutiny, I think a thorough discussion of potential gaps in data would be required (see for example Cohen et al. 2022, Nat. Comms).

      I was also surprised to see little discussion of the differences between alpha and beta coronaviruses - there is evidence that they may differ in their cross-species transmission (see Caraballo et al. 2022 Micr. Spectr.), which could call into question the relevance of treating all coronaviruses as a single, homogeneous group.

      Some of the discussions in this paper also echo previous work by e.g. Geoghegan et al. (see 2017, PLOS Pathogens), which I was surprised to not see discussed, as it is a much earlier investigation of the relative frequencies of co-divergence and host switches for different viral families, with a deep discussion of how this may structure future evolutionary dynamics.

      We totally agree that sampling biases in the virome of mammals are a prominent issue, which is why we conducted a series of sensitivity analyses to test their effect on our main conclusions. We thoroughly tested the effect of (i) the unequal sampling effort across the mammalian species that have been screened and (ii) the unequal screening of mammalian species across the mammalian tree of life, by subsampling the data to correct for the unequal sampling effort (see Supporting Information Text). In both cases, we still found low support for a scenario of codiversification, an origin in bats in East Asia, preferential host switches within mammalian orders, and rare spillovers from bats to humans. The robustness of our findings to sampling biases may be explained by the fact that the cophylogenetic approach we used (ALE) explicitly accounts for undersampling by assuming that all host switches involve unsampled intermediate hosts. To address the reviewer's comment, we now better emphasize the importance of sampling biases in our main text (see Discussion, lines 487-494) with supporting references (note that we did not find the Cohen et al. Nature Comm reference). We also better highlight our sensitivity analyses by moving them from the Supporting Information Text to the main text.

      We agree that distinguishing between alpha and beta coronaviruses provides useful additional insights. We have run separate cophylogenetic analyses for these two sub-clades and now report the results of these additional analyses in the revised manuscript, and put them in context with the existing literature about the two sub-clades.

      We were not aware of the work of Geoghegan et al. (see 2017, PLOS Pathogens), thank you for providing this reference that is now cited. 

      Reviewer #1 (Recommendations For The Authors):

      (1) Overall I found this paper to be quite difficult to follow. The text needs clearer structure, which can be helped by writing in shorter paragraphs and adding section headings. For example, there are some very long paragraphs starting on L83, L176, L215, L511, and L598.

      We have now added section headings and divided these paragraphs into smaller ones.

      (2) It would be helpful to define some of the key terminology relating to the evolutionary interactions between the viruses and their hosts. Some of the terms that are typically used in the context include "coevolution", "cospeciation", "codivergence", and "codiversification". These have different meanings and need to be used carefully. The paper mostly deals with "codivergence" between coronaviruses and their host species.

      We now provide a list of definitions in Box S1. These definitions are as in our recent article clarifying the differences between these patterns/processes (Perez-Lamarque & Morlon 2024).

      Specific comments

      L83-L105: This paragraph can be written more concisely.

      We prefer to keep this paragraph like this as it contains key explanations that are necessary for understanding our approach and results.  

      Figure 1: The timescales of the trees are rather confusing. The different scales are indicated by the gray shading but this is easy to overlook. Maybe stretching or compressing the trees horizontally would help to emphasise the different timescales.

      Done.

      Figure 2: Note that the maximum clade credibility tree is a specific tree sampled from the posterior distribution - it is not a consensus tree. In the figure caption, the meaning of "location" is unclear.

      We have removed the word “consensus”, thank you for noting this. We have replaced “location” by “branching order”. 

      L461: How was the model chosen, and why were different models used in the BEAST and PhyloBayes analyses?

      We did our PhyloBayes analyses first and used the LG model following methodology outlined in previous studies using ALE (e.g. Groussin et al. 2017; Dorrell et al. 2021). Unfortunately, the LG model is not available in the default version of BEAST2 so we had to use a different model (the WAG model). We have now run BEAST2 with the LG model (thanks to the BEAST_CLASSIC package) and we obtained very similar results (see Figure below showing the BEAST consensus trees obtained with the WAG or LG models – they only slightly differ by the branching of the u7351 OTU). We have now added this information in the Methods section. 

      Author response image 1.

      L477: It is not clear to me how the PhyloBayes and BEAST analyses differ. Please expand the explanation of why PhyloBayes was used here.

      We have now clarified this (lines 594-597). 

      L568: Why not test explicitly for recombination?

      We did test for the occurrence of recombination using several approaches, including OpenRDP (https://github.com/PoonLab/OpenRDP), our own custom code, and Gubbins (Croucher et al. 2015). These tests were, however, inconclusive, indicating either the absence or the presence of recombination, which suggests that the palmprint region is too short to infer anything about recombination. We therefore do not exclude the possibility that recombination occurred, and we tested the robustness of our results to recombination by running our analyses on different sub-parts of the palmprint region. We have clarified this in our Material & Methods.

      L618: "DNA sequences" -> "RNA sequences"

      Done.

      The paper contains numerous minor grammatical errors and would benefit from careful proofreading and editing. Please check the use of plurals and apostrophes. Some of the errors are listed below:

      L49: "As several" -> "As with several"

      Done.

      L178: "reconciliates" -> "reconciles"?

      Done.

      L199: "extent" -> "extant"

      Done.

      L289: This sentence needs rephrasing to avoid a triple negative ("cannot ... reject ... not present")

      Done.

      L469: "temporary" -> "temporal"

      Done.

      L470: "neglectable" -> "negligible"

      Done.

      L577: "not only relying" -> "not relying only"

      Done.

      Reviewer #2 (Recommendations For The Authors):

      The study is generally well-constructed and its results are convincing. However, considering the availability of a dated host tree, conducting a dated reconciliation analysis could be beneficial. Creating a smaller sub-dataset and performing a dated reconciliation analysis would likely be a valuable addition to the research.

      We have now run the dated version of ALE on both the alpha and betacoronaviruses subclades. ALE dated still does not output reconciliations on the betacoronaviruses dataset, but it does on the smaller alphacoronaviruses dataset. We found significant reconciliations, indicating that mammal-alphacoronavirus associations are not random with respect to phylogeny, but the reconciliations involved more host switch and loss events (38 switches + 29 losses) than cospeciation events (65), indicating cophylogenetic signal in the absence of phylogenetic congruence (Perez-Lamarque & Morlon 2024). We now present the results on lines 264-282.  

      Reviewer #3 (Recommendations For The Authors):

      I think the results are written in a very speculative way, with many sentence fragments that should really be part of the discussion.

      We have carefully checked our Results section and rephrased or removed formulations that may have been perceived as speculative.

      There are a lot of considerations in this manuscript about spread and future pandemics, but I think this is very far from the topic of this paper. When we quantified the coevolutionary risk of bats-betacovs in a recent paper (Forero et al. 2024, Virus Evol.), we only briefly touched upon this discussion because we compared our outputs with a measure of human population density. I don't think the manuscript needs to talk about epidemiology at all, and it would probably be more useful as a purely evo-bio piece.

      We think that it is useful to discuss the potential implications of our results for future pandemics, even though we agree that this discussion is rather speculative. We have removed the mention of predictions in the Abstract and have softened our wording in the Discussion.  

      References:

      Croucher, N.J., Page, A.J., Connor, T.R., Delaney, A.J., Keane, J.A., Bentley, S.D., et al. (2015). Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins. Nucleic Acids Res., 43, e15.

      Dorrell, R.G., Villain, A., Perez-Lamarque, B., Audren de Kerdrel, G., McCallum, G., Watson, A.K., et al. (2021). Phylogenomic fingerprinting of tempo and functions of horizontal gene transfer within ochrophytes. Proc. Natl. Acad. Sci., 118, e2009974118.

      Edgar, R.C. et al. (2022). Petabase-scale sequence alignment catalyses viral discovery. Nature 602, 142–147.

      Groussin, M., Mazel, F., Sanders, J.G., Smillie, C.S., Lavergne, S., Thuiller, W., et al. (2017). Unraveling the processes shaping mammalian gut microbiomes over evolutionary time. Nat. Commun., 8, 14319.

      Perez-Lamarque, B. & Morlon, H. (2024). Distinguishing cophylogenetic signal from phylogenetic congruence clarifies the interplay between evolutionary history and species interactions. Syst. Biol.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public Review):

      Thank you for the helpful comments. Below, we have quoted the relevant sections from the revised manuscript as we respond to the reviewer’s comments item-by-item.

      Weaknesses:

      While the task design in this study is intentionally stimulus-rich and places a minimal constraint on the animal to preserve naturalistic behavior, this is, unfortunately, a double-edged sword, as it also introduces additional variables that confound some of the neural analysis. Because of this, a general weakness of the study is a lack of clear interpretability of the task variable neural correlates. This is a limitation of the task, which includes many naturally correlated variables - however, I think with some additional analyses, the authors could strengthen some of their core arguments and significantly improve clarity.

      We acknowledge the weakness and have included additional analyses to compensate for it. The details are as follows in our reply to the subsequent comments.  

      For example, the authors argue, based on an ANN decoding analysis (Figure 2b), that PFC neurons encode spatial information - but the spatial coordinate that they decode (the distance to the active foraging zone) is itself confounded by the fact that animals exhibit different behavior in different sections of the arena. From the way the data are presented, it is difficult to tell whether the decoder performance reflects a true neural correlate of distance, or whether it is driven by behavior-associated activity that is evoked by different behaviors in different parts of the arena. The authors' claim that PFC neurons encode spatial information could be substantiated with a more careful analysis of single-neuron responses to supplement the decoder analysis. For example, 1) They could show examples of single neurons that are active at some constant distance away from the foraging site, regardless of animal behavior, and 2) They could quantify how many neurons are significantly spatially modulated, controlling for correlates of behavior events. One possible approach to disambiguate this confound could be to use regression-based models of neuron spiking to quantify variance in neuron activity that is explained by spatial features, behavioral features, or both.

      First of all, we would like to point out that, while the recording was made during naturalistic foraging with minimal behavioral constraints, a well-trained rat displayed an almost fixed sequence of actions within each zone. The behavioral repertoires performed in the three zones differed markedly: exploratory behaviors in the N-zone, navigating back and forth in the F-zone, and licking sucrose while avoiding attacks in the E-zone. Therefore, the arena is divided not only by its geographical features but also by the distinct set of behaviors performed in each zone. This is evident in the data showing a higher decoding accuracy of spatial distance in the F-zone than in the N- or E-zone. In this sense, the heterogeneous encoding reflects the heterogeneous distribution of dominant behaviors (navigation in the F-zone and attack avoidance while foraging in the E-zone) and hence corroborates the reviewer’s comment at a macroscopic scale encompassing the entire arena.

      Having said that, the more critical question is whether the neural activity is more correlated with microscopic behaviors at every moment than with the location decoded in the F-zone. As the reviewer suggested, the first step is to analyze single-neuron activity to identify whether direct neural correlates of location exist. To this end, traditional place maps were constructed for individual neurons. Most neurons did not show cohesive place fields across different regions, indicating little-to-no direct place coding by individual neurons. Only a few neurons displayed recognizable place fields in a consistent manner. However, even these place fields were irregular and patchy, and therefore not comparable to the place cells or grid cells found in the hippocampus or entorhinal cortex. Some example firing maps have been added to Figure 2 and characterized in the text as below.

      “To determine whether location-specific neural activity exists at the single-cell level in our mPFC data, a traditional place map was constructed for individual neurons. Although most neurons did not show cohesive place fields across different regions in the arena, a few neurons modulated their firing rates based on the rat’s current location. However, even these neurons were not comparable to place cells in the hippocampus (O’Keefe & Dostrovsky, 1971) or grid cells in the entorhinal cortex (Hafting et al., 2005) as the place fields were patchy and irregular in some cases (Figure 2B; Units 66 and 125) or too large, spanning the entire zone rather than a discrete location within it (Units 26 and 56). The latter type of neuron has been identified in other studies (e.g., Kaefer et al., 2020).”

      Next, to verify whether the location decoding reflects neuronal activity due to external features or a particular type of action, the predicted location was compared between the two opposite directions within the F-zone, inbound and outbound with reference to the goal area (Lobsterbot). If the encoding were specifically tied to a particular action or environmental stimulus, there should be a discrepancy when the ANN decoder trained on the outbound trajectory is tested on the inbound path, and vice versa. However, the results showed no significant difference between the two trajectories, suggesting that the decoded distance was not simply reflecting neural responses to location-specific activities or environmental cues during navigation.

      “To determine whether the accuracy of the regressor varied depending on the direction of movement, we compared the decoding accuracy of the regressor for outbound (from the N- to E-zone) vs. inbound (from the E- to N- zone) navigation within the F-zone. There was no significant difference in decoding accuracy between outbound vs. inbound trips (paired t-test; t(39) = 1.52, p =.136), indicating that the stability of spatial encoding was maintained regardless of the moving direction or perceived context (Figure 2E).”
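      For readers who want to see what the reported paired comparison computes, here is a minimal sketch, assuming that one mean absolute error per matched pair (outbound vs. inbound) has already been obtained from the decoder. The helper name `paired_t` and its inputs are hypothetical illustrations, not the authors' pipeline, and the numbers in the example are toy values rather than study data.

```python
import math
from statistics import mean, stdev


def paired_t(outbound_mae, inbound_mae):
    """Return (t, df) for a paired t-test on matched decoding errors.

    Hypothetical helper: each list holds one mean absolute error (cm) per
    matched pair, in the same order for both directions.
    """
    if len(outbound_mae) != len(inbound_mae) or len(outbound_mae) < 2:
        raise ValueError("need at least two matched pairs")
    # Work on within-pair differences so between-pair variability cancels out.
    diffs = [o - i for o, i in zip(outbound_mae, inbound_mae)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Toy numbers (not data from the study): three matched pairs.
t, df = paired_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(round(t, 3), df)  # 3.464 2
```

      With 40 matched pairs the degrees of freedom would be 39, as in the reported t(39). In practice, `scipy.stats.ttest_rel(outbound, inbound)` returns the same statistic together with a p-value.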

      Additionally, we applied the same regression analysis to a subset of data recorded while the door to the robot compartment was closed during the Lobsterbot sessions. This makes it possible to test the decoding accuracy when the most salient spatial feature, the Lobsterbot, is blocked from sight. The subset represents an average of 38.92% of the entire session. Interestingly, the decoding accuracy was higher with this subset than with the entire dataset, indicating that the neural activities were not driven by a single salient landmark. This finding supports our conclusion that location information can be decoded from a population of neurons rather than from individual neurons associated with environmental or proprioceptive cues. We have added the following description of results to the manuscript.

      “Previous analyses indicated that the distance regressor performed robustly regardless of movement direction, but there is a possibility that the decoder detects visual cues or behaviors specific to the E-zone. For example, neural activity related to Lobsterbot confrontation or licking behavior might be used by the regressor to decode distance. To rule out this possibility, we analyzed a subset of data collected when the compartment door was closed, preventing visual access to the Lobsterbot and sucrose port and limiting active foraging behavior. The regressor trained on this subset still decoded distance with a MAE of 12.14 (± 3.046) cm (paired t-test; t(39) = 12.17, p <.001). Notably, the regressor's performance was significantly higher with this subset than with the full dataset (paired t-test; t(39) = 9.895, p <.001).”

      As for the comment on “using regression-based models of neuron spiking to quantify variance in neuron activity that is explained by spatial features, behavioral features, or both”, it is difficult to isolate a particular behavioral event, let alone timestamp it, because the rat’s location was monitored during a constantly moving, naturalistic stream of behaviors. However, as mentioned above, a new section entitled “Overlapping populations of mPFC neurons adaptively encode spatial information and defensive decision” argues against a single-neuron-based account by performing a feature importance analysis. The results showed that even when the top 20% most informative neurons were excluded, the remaining neural population could still decode both distance and events. This analysis supports the idea of a population-wide mode shift rather than distinct subgroups of neurons specialized in processing different sensory or motor events. This idea is also expressed in the schematic diagrams featured in Figure 8 of the revision.
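      The exclusion step of the feature importance analysis can be sketched as follows, assuming one importance score per neuron is already available (for example from a permutation-importance run on the trained decoder). The function name and the floor rounding of "top 20%" are assumptions for illustration; the decoder would then be retrained using only the returned neuron columns.

```python
def drop_top_fraction(importances, fraction=0.2):
    """Return indices of neurons kept after removing the most informative ones.

    `importances` holds one feature-importance score per neuron (higher means
    more informative). The number dropped is floor(n * fraction), an assumed
    rounding rule for "top 20%".
    """
    n = len(importances)
    n_drop = int(n * fraction)
    # Rank neurons from most to least important (ties keep original order).
    ranked = sorted(range(n), key=lambda i: importances[i], reverse=True)
    dropped = set(ranked[:n_drop])
    # Preserve the original neuron order for the remaining columns.
    return [i for i in range(n) if i not in dropped]


kept = drop_top_fraction([0.1, 0.9, 0.3, 0.5, 0.2])
print(kept)  # [0, 2, 3, 4]
```

      Keeping the original column order makes it straightforward to slice the same neurons out of both the training and test matrices before refitting.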

      To substantiate the claim that PFC neurons really switch between different coding "modes," the authors could include a version of this analysis where they have regressed out, or otherwise controlled for, these confounds. Otherwise, the claim that the authors have identified "distinctively different states of ensemble activity," as opposed to simple coding of salient task features, seems premature.

      A key argument in our study is that mPFC neurons encode different abstract internal representations (distance and avoidance decision) at the population level. This has been emphasized in the revision with additional analyses and discussions. Most importantly, we performed single-neuron analyses for both spatial encoding (place fields for individual neurons) and avoidance decision (PETHs for head entry and head withdrawal) and contrasted the results with the population analysis. Although some individual neurons displayed fractured, “place cell-like” activity, and others showed modulated firing at the head-entry and head-withdrawal events, ensemble decoding extracted distance information for the animal's current location with much higher accuracy. Furthermore, the PCA identified abstract feature dimensions, especially regarding activity in the E-zone, that cannot be attributed to a small number of sensory- or motor-related neurons.

      To mitigate the possibility that the PCA is driven primarily by a small subset of units responsive to salient behavioral events, we also applied PCA to the dataset after excluding the activity in the 2-second time window surrounding head entry and withdrawal. While this approach does not eliminate all cue- or behavior-related activity within the E-zone, it does remove the neural activity associated with emotionally significant events, such as entry into the E-zone, the first drop of sucrose, head withdrawal, and the attack. Even without these events, the E-zone activity in PC space remained separated from the F-zone and N-zone activity. This result again supports distinct states of ensemble activity formed in accordance with the different categories of behaviors performed in different zones. Finally, the Naïve Bayesian classifier trained on ensemble activity in the E-zone was able to predict the success or failure of avoidance occurring a few seconds later, indicating that the same population of neurons is encoding the avoidance decision rather than the location of the animal.

      Reviewer 1 (Recommendations):

      The authors include an analysis (Figure 4) of population responses using PCA on session-wide data, which they use to support the claim that PFC neurons encode distinctive neural states, particularly between the encounter zone and nesting/foraging zones. However, because the encounter zone contains unique stimulus and task events (sucrose, threat, etc.), and the samples for PCA are drawn from the entire dataset (including during these events), it seems likely that the Euclidean distance measures analyzed in Figure 4b are driven mostly by the neural correlates of these events rather than some more general change in "state" of PFC dynamics. This does not invalidate this analysis but renders it potentially redundant with the single neuron results shown in Figure 5 - and I think the interpretation of this as supporting a state transition in the coding scheme is somewhat misleading. The authors may consider performing a PCA/population vector analysis on the subset of timepoints that do not contain unique behavior events, rather than on session-wide data, or otherwise equalizing samples that correspond to behavioral events in different zones. Observing a difference in PC-projected population vectors drawn from samples that are not contaminated by unique encounter-related events would substantiate the idea that there is a general shift in neural activity that is more related to the change in context or goal state, and less directly to the distinguishing events themselves.

      Thank you for the comments. Indeed, this is a recurring theme in which the reviewers expressed concerns and doubts about the heterogeneous encoding of different functional modes. Based on the systematic presentation of results in the manuscript, from PETHs to the ANN and the Bayesian classifier, we argue that the activity of mPFC neurons is better represented at the population level than as a loose collection of stimulus- or event-related neurons.

      We acknowledge that the PCA results we included as evidence of distinct functional separation might reflect activity driven by a small number of event-coding neurons in different zones. As mentioned in the public review, we conducted the same analysis on a subset of data that excluded neural activity potentially influenced by significant events in the E-zone. The critical times were defined as ±1 second around these events and were excluded from the neural data. Despite these exclusions, the results continued to show population-level differences between zones, reinforcing the notion that neurons encode abstract behavioral states (the decision to avoid or stay) independently of sensory- or motor-related activity. Although this analysis does not completely eliminate all possible confounding factors emerging in different external and internal contexts, it provides additional support for a population-level switch occurring in different zones.
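      The exclusion of ±1 s windows before PCA can be sketched as a simple mask over time bins. This assumes binned activity stored as a bins × neurons matrix with one time stamp per bin; the helper names are hypothetical. The masked matrix could then be passed to any PCA routine, for example `sklearn.decomposition.PCA`, before comparing zone-wise projections.

```python
def outside_event_windows(bin_times, event_times, half_window=1.0):
    """Flag time bins lying more than `half_window` seconds from every event.

    `bin_times` are bin centres (s); `event_times` are timestamps (s) of events
    such as head entry, head withdrawal, or attack. Bins exactly at the window
    edge count as inside the window and are excluded.
    """
    return [all(abs(t - e) > half_window for e in event_times) for t in bin_times]


def select_rows(matrix, keep):
    """Keep only the rows (time bins) of a bins x neurons matrix flagged True."""
    return [row for row, k in zip(matrix, keep) if k]


bins = [0.0, 0.5, 1.0, 1.5, 2.0, 3.5]
keep = outside_event_windows(bins, [1.0])
print(keep)  # [False, False, False, False, False, True]
```

      A window of ±1 s around each event corresponds to the 2-second exclusion window described in the public-review reply.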

      In Figure 7, the authors include a schematic that suggests that the number of neurons representing spatial information increases in the foraging zone, and that they overlap substantially with neurons representing behaviors in the encounter zone, such as withdrawal. They show in Figure 3 that location decoding is better in the foraging zone, but I could not find any explicit analysis of single-neuron correlates of spatial information as suggested in the schematic. Is there a formal analysis that lends support to this idea? It would be simple, and informative, to include a quantification of the fraction of spatial- and behavior-modulated neurons in each zone to see if changes in location coding are really driven by "larger" population representations. Also, the authors could quantify the overlap between spatial- and behavior-modulated neurons in the encounter zone to explicitly test whether neurons "switch" their coding scheme.

      Figure 7 (now Figure 8) has been completely revised. The schematic diagram has been modified to show spatial and avoidance-decision encoding by overlapping populations of mPFC neurons (Figure 8a). Most notably, very few neurons encode location but not the avoidance decision, or vice versa. This is indicated by the differently colored units in the F-zone vs. the E-zone. The model also includes units that are not engaged in any type of encoding or are engaged in only one type of encoding, although they are not the majority.

      We have also added a schematic of hypothetical switching mechanisms (Figure 8b) to describe how a switch in encoding mode might be initiated (sensory-driven vs. arbitrator-driven process).

      “Two main hypotheses could explain this switch. A bottom-up hypothesis suggests that sensory inputs or upstream signals dictate encoding priorities, while a top-down hypothesis proposes that an internal or external “arbitrator” selects the encoding mode and coordinates the relevant information (Figure 8B). Although the current study is only a first step toward identifying the regulatory mechanism behind this switch, our control experiment, in which rats reverted to a simple shuttling task, provides evidence that might favor the top-down hypothesis. The absence of the Lobsterbot degraded spatial encoding rather than enhancing it, indicating that simply reducing the task demand is not sufficient to activate one particular type of encoding mode over another. The arbitrator hypothesis asserts that mPFC neurons are called on to encode heterogeneous information when the task demand is high and requires behavioral coordination beyond automatic, stimulus-driven execution. Future studies incorporating multiple simultaneous tasks and carefully controlling contextual variables could help determine whether these functional shifts are governed by top-down processes involving specific neural arbitrators or by bottom-up signals.”

      Related to this difference in location coding throughout the environment, the authors suggest in Figure 3a-b that location coding is better in the foraging zone compared to the nest or encounter zones, evidenced by better decoder performance (smaller error) in the foraging zone (Figure 3b). The authors use the same proportion of data from the three zones for setting up training/test sets for cross-validation, but it seems likely that overall, there are substantially more samples from the foraging zone compared to the other two zones, as the animal traverses this section frequently, and whenever it moves from the nest into the encounter zone (based on the video). What does the actual heatmap of animal location look like? And, if the data are down-sampled such that each section contributes the same proportion of samples to decoder training, does the error landscape still show better performance in the foraging zone? It is important to disambiguate the effects of uneven sampling from true biological differences in neural activity.

      Thank you for the comment. We agree with the concern regarding uneven data size across sections of the arena. Indeed, as the heatmap below indicates, the rats spent most of their time in two locations: a transition area between the N- and F-zones, and the area near the sucrose port. This imbalance needs to be corrected, and our analysis already included a correction for this sampling bias. In the Results section “Non-navigational behavior reduces the accuracy of decoded location,” we report the following:

      Author response image 1.

      Heatmap of the animal’s position during one example session. (Left) Unprocessed occupancy plot; each dot represents 0.2 seconds. (Right) Smoothed occupancy plot using a Gaussian filter (sigma: 10 pixels, filter size: 1001 pixels). The white line indicates a 10 cm length.

      “To correct for the unequal distribution of location visits (more visits to the F- than to other zones), the regressor was trained using a subset of the original data, which was equalized for the data size per distance range (see Materials and Methods). Despite the correction, there was a significant main effect of the zone (F(1.16, 45.43) = 119.2, p <.001) and the post hoc results showed that the MAEs in the N-zone (19.52 ± 4.46 cm; t(39) = 10.45; p <.001) and the E-zone (26.13 ± 7.57 cm; t(39) = 11.40; p <.001) were significantly higher than in the F-zone (14.10 ± 1.64 cm).”

      Also, in the Methods section, we state:

      “In the dataset adjusted for uneven location visits, we divided distance values into five equally sized bins. Then, a sub-dataset was created that contains an equal number of data points for each of these bins.”
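      The equal-count sub-sampling described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "equally sized" means equal-width bins over the observed distance range and that each bin is randomly down-sampled to the size of the smallest non-empty bin; the function name `equalize_by_distance` is hypothetical.

```python
import random
from collections import defaultdict


def equalize_by_distance(distances, n_bins=5, seed=0):
    """Return sorted indices of a subset with equal sample counts per distance bin.

    Assumption: equal-width bins spanning the observed distance range.
    """
    lo, hi = min(distances), max(distances)
    width = (hi - lo) / n_bins
    bins = defaultdict(list)
    for i, d in enumerate(distances):
        # Clamp the maximum distance into the last bin.
        b = min(int((d - lo) / width), n_bins - 1) if width > 0 else 0
        bins[b].append(i)
    # The smallest non-empty bin sets the per-bin sample count.
    n_per_bin = min(len(idx) for idx in bins.values())
    rng = random.Random(seed)
    keep = []
    for idx in bins.values():
        keep.extend(rng.sample(idx, n_per_bin))
    return sorted(keep)
```

      The returned indices select the time bins (neural features and distances) used to train and test the regressor, so that heavily visited distances no longer dominate the training set.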

      Why do the authors choose to use a multi-layer neural network (Figure 2b-c) to decode the animal's distance to the encounter zone?(…) The authors may consider also showing an analysis using simple regression, or maybe something like an SVM, in addition to the ANN approach.

      We began with a simple linear regression model and progressed to more advanced methods, including SVM and multi-layer neural networks. As shown below, simpler methods could decode distance to some extent, but neural networks and random forest regressors outperformed the others (Neural Network: 16.61 cm ± 3.673; Linear Regression: 19.85 cm ± 2.528; Quadratic Regression: 18.68 cm ± 4.674; SVM: 18.88 cm ± 2.676; Random Forest: 13.59 cm ± 3.174).

      We chose the neural network model for two main reasons: (1) previous studies demonstrated its superior performance compared to Bayesian regressors commonly used for decoding neural ensembles, and (2) its generalizability and robustness against noisy data. Although the random forest regressor achieved the lowest decoding error, we avoided using it due to its tendency to overfit and its limited generalization to unseen data.

      Overall, we expect other regressors to yield qualitatively similar results, differing mainly in decoding accuracy and hence in statistical power. We speculate that the neural network’s use of multiple nodes contributes to robustness against noise in single-unit recordings and enables the network to capture distributed processing within neural ensembles.
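      To make a model comparison of this kind concrete, the sketch below scores any decoder by cross-validated mean absolute error (MAE) on identical folds. It is a hypothetical illustration: the contiguous (unshuffled) folds, the k-nearest-neighbour regressor, the chance-level mean baseline, and all function names are our assumptions and stand in for the models actually compared (neural network, linear, quadratic, SVM, random forest).

```python
import statistics


def kfold_indices(n, k):
    """Split range(n) into k contiguous folds (unshuffled, preserving temporal order)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_mae(X, y, fit, k=5):
    """Mean and SD across folds of a decoder's mean absolute error.

    `fit(X_train, y_train)` must return a function that maps one feature
    vector (e.g. binned firing rates) to a decoded distance.
    """
    fold_errors = []
    for test_idx in kfold_indices(len(y), k):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        fold_errors.append(statistics.fmean(abs(predict(X[i]) - y[i]) for i in test_idx))
    return statistics.fmean(fold_errors), statistics.stdev(fold_errors)


def fit_mean(X_train, y_train):
    """Chance-level baseline: always predict the training mean."""
    m = statistics.fmean(y_train)
    return lambda x: m


def fit_knn(X_train, y_train, n_neighbors=3):
    """k-nearest-neighbour regressor with squared Euclidean distance."""
    def predict(x):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, xt)), yt)
            for xt, yt in zip(X_train, y_train)
        )
        return statistics.fmean(yt for _, yt in dists[:n_neighbors])
    return predict
```

      Any decoder, including a neural network, can be plugged in as long as it follows the same fit/predict contract; evaluating every model on the same folds keeps the error comparison fair.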

      In Figure 6c, the authors show a prediction of withdrawal behavior based on neural activity seconds before the behavior occurs. This is potentially very interesting, as it suggests that something about the state of neural dynamics in PFC is potentially related to the propensity to withdraw, or to the preparation of this behavior. However, another possibility is that the animal behaves differently, in more subtle ways, while it is anticipating threat and preparing withdrawal behavior - since PFC neurons are correlated with behavior, this could explain decoder performance before the withdrawal behavior occurs. To rule out this possibility, it would be useful to analyze how well, and how early, withdrawal success can be decoded only on the basis of behavioral features from the video, and then to compare this with the time course of the neural decoder. Another approach might be to decode the behavior on the basis of video data as well as neural data, and using a model comparison, measure whether inclusion of neural features significantly increases decoder performance.

      We appreciate this important point, as mPFC activity might indeed reflect motor preparation preceding withdrawal behavior. Another reviewer raised a similar concern regarding potential micro-behavioral influences on mPFC activity prior to withdrawal responses. However, our behavioral analysis suggests that highly trained rats lick for sucrose with little variability, regardless of the subsequent behavioral decision. In support of this, 95% of inter-lick intervals were shorter than 0.25 seconds, leaving too little time to perform any additional behavior during encounters.

      Author response image 2.
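      The inter-lick interval criterion above can be computed directly from lick timestamps. A minimal sketch with hypothetical helper names; it assumes the timestamps come from a single encounter, since intervals spanning two encounters would not reflect licking. At the maximum rate of about 6.3 licks per second described for this apparatus, a single interval lasts roughly 0.16 s, well below the 0.25 s threshold.

```python
def inter_lick_intervals(lick_times):
    """Successive differences (s) between sorted lick timestamps of one encounter."""
    times = sorted(lick_times)
    return [b - a for a, b in zip(times, times[1:])]


def fraction_below(lick_times, threshold=0.25):
    """Fraction of inter-lick intervals shorter than `threshold` seconds."""
    intervals = inter_lick_intervals(lick_times)
    return sum(iv < threshold for iv in intervals) / len(intervals)


# At 6.3 licks per second, one inter-lick interval lasts about 0.159 s.
print(round(1 / 6.3, 3))  # prints 0.159
```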

      To further clarify this, we have included an additional video showing both avoidance and escape withdrawals at close range. This video was recorded during the development of the behavioral paradigm; we did not routinely collect this view, as animals consistently exhibited stable licking behavior in the E-zone. As demonstrated in the video, the rat remains highly focused on the lick port with minimal body movement during encounters. Therefore, we believe that the neural ensemble dynamics observed in the mPFC are unlikely to be driven by micro-behavioral changes.

      Reviewer 2 (Public Review):

      Thank you for the positive comment on our behavioral paradigm and for the constructive suggestions for additional analyses. We have come to think that the role of the mPFC is better portrayed as representing, and switching between, different encoding targets in different contexts, which was, in part, more clearly revealed by the naturalistic behavioral paradigm. In the revision, we have tried to convey this message more explicitly and to provide a new perspective on this important aspect of mPFC function.

      It is not clear what proportion of each of the ensembles recorded is necessary for decoding distance from the threat, and whether it is these same neurons that directly 'switch' to responding to head entry or withdrawal in the encounter phase within the total population. The PCA gets closest to answering this question by demonstrating that activity during the encounter is different from activity in the nesting or foraging zones, but in principle this could be achieved by neurons or ensembles that did not encode spatial parameters. The population analyses are focused on neurons sensitive to behaviours relating to the threat encounter, but even before dividing into subtypes etc., this is at most half of the recorded population.

      In our study, the key idea we aim to convey is that mPFC neurons adapt their encoding schemes based on the context or functional needs of the ongoing task. Other reviewers also suggested strengthening the evidence that the same neurons directly switch between encoding two different tasks. The competing hypothesis to "switching functions within the same neurons" posits that there are dedicated subsets of neurons that modulate behavior, either by driving decisions/behaviors themselves or by being driven by computations from other brain regions.

      To test this idea, we added a new subsection to the Results titled “Overlapping populations of mPFC neurons adaptively encode spatial information and defensive decision.” In this section, we directly tested this hypothesis by examining each neuron's contribution to the distance regressor and the event classifier. The histograms of feature importance (each neuron's contribution to each task) are highly skewed toward zero for both decoders, and removing neurons with high feature importance does not impair decoder performance. These findings suggest that (1) there is no clear division between the neurons involved in the two tasks, and (2) information about spatial and defensive behavior is distributed across neurons.
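      The response does not specify how feature importance was computed. One common, model-agnostic option is permutation importance: shuffle one neuron's activity across time bins and measure how much the decoding error grows. The sketch below assumes this method, a decoder exposed as a plain `predict` function, and mean absolute error as the metric; all names are hypothetical.

```python
import random
import statistics


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in absolute decoding error when one neuron's column is shuffled.

    `predict` maps one row of X (firing rates of all neurons in a time bin)
    to a decoded value; X is a list of lists, one row per time bin.
    """
    rng = random.Random(seed)

    def mae(rows):
        return statistics.fmean(abs(predict(r) - t) for r, t in zip(rows, y))

    baseline = mae(X)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between neuron j and the target
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mae(shuffled) - baseline)
        importances.append(statistics.fmean(increases))
    return importances
```

      An ablation check of the kind described in the text would then drop the highest-ranked neurons, retrain the decoder, and compare its error with that of the full model.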

      Furthermore, we tested whether there is a negative correlation between the feature importance of spatial encoding and avoidance encoding. Even if there were no “key neurons” that transmit a significant amount of information about either spatial or defensive behavior, it is still possible that neurons with higher information in the navigation context might carry less information in the active-foraging context, or vice versa. However, we did not observe such a trend, suggesting that mPFC neurons do not exhibit a preference for encoding one type of information over the other.

      Lastly, another reviewer raised the concern that the PCA results, which we used as evidence of functional separation between ensemble functions, might be driven by a small number of event-coding neurons. To address this, we conducted the same analysis on a subset of the data that excluded neural activity potentially influenced by significant events in the E-zone. In the Peri-Event Time Histogram (PETH) analysis, we observed that some neurons exhibit highly modulated activity upon arrival at the E-zone (head entry; HE) and immediately following voluntary departure or attack (head withdrawal; HW). We defined 'critical event times' as ± 1 second around these events and excluded neural data from these periods to determine whether PCA could still differentiate neural activity across zones. Despite these exclusions, the results continued to show population-level differences between zones, reinforcing the notion that neurons adapt their activity according to the context. We acknowledge that this analysis still cannot eliminate all confounding factors associated with the change in context, but we confirmed that excluding the two most prominent events (delivery onset of sucrose and the withdrawal movement) does not alter our result.
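      The exclusion step described above can be sketched as a mask over time bins. This is a minimal illustration under our own assumptions: firing rates are binned, HE and HW timestamps are pooled into one event list, a bin is dropped if it lies within ± 1 second (inclusive) of any event, and the names `keep_bins` and `drop_event_bins` are hypothetical. The retained rows would then be passed to PCA.

```python
import bisect


def keep_bins(bin_times, event_times, window=1.0):
    """True for time bins lying more than `window` seconds from every event."""
    events = sorted(event_times)
    keep = []
    for t in bin_times:
        # First event at or after t - window; the bin is excluded
        # if that event also falls at or before t + window.
        i = bisect.bisect_left(events, t - window)
        keep.append(i == len(events) or events[i] > t + window)
    return keep


def drop_event_bins(rows, bin_times, event_times, window=1.0):
    """Keep only rows (population vectors) outside the critical event windows."""
    mask = keep_bins(bin_times, event_times, window)
    return [row for row, k in zip(rows, mask) if k]
```

      For example, `drop_event_bins(rates, times, he_times + hw_times)` would remove every bin within one second of a head entry or head withdrawal before dimensionality reduction.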

      To summarize, these additional results further support the conclusion that spatial and avoidance information is distributed across the neural population rather than handled by distinct subsets. The analyses revealed no negative correlation between spatial and avoidance encoding, and excluding event-driven neural activity did not alter the observed functional separation, consistent with the view that mPFC neurons dynamically adjust their activity to meet contextual demands.

      A second concern is also illustrated by Fig. 7: in the data presented, separate reward and threat encoding neurons were not shown - in the current study design, it is not possible to dissociate reward and threat responses as the data without the threat present were only used to study spatial encoding integrity.

      Thank you for this valuable feedback. Other reviewers also noted that Figure 7 (now Figure 8) was misleading and contained assertions not supported by our experiments. In response, we have revised the model to reflect our findings more accurately. We have eliminated the distinction between reward-coding and threat-coding neurons, focusing instead on spatial-encoding and avoidance-encoding neurons, so that the updated figure aligns more closely with our findings and claims. The revised legend reads: (A) Distinct functional states (spatial vs. avoidance decision) encoded by the same population of neurons are separable by region (F- vs. E-zone). (B) Hypothetical control models by which mPFC neurons assume different functional states.

      Thirdly, the findings of this work are not mechanistic or functional but are purely correlational. For example, it is claimed that analyzing activity around the withdrawal period allows for ascertaining their functional contributions to decisions. But without a direct manipulation of this activity, it is difficult to make such a claim. The authors later discuss whether the elevated response of Type 2 neurons might simply represent fear or anxiety motivation or threat level, or whether they directly contribute to the decision-making process. As is implicit in the discussion, the current study cannot differentiate between these possibilities. However, the language used throughout does not reflect this. 

      We acknowledge that our experiments are purely correlational and that this is a weakness of the study. Although we chose our wording carefully to avoid deterministic claims, we agree that some of the language might mislead readers into thinking that we found a direct functional contribution. We have therefore changed the following expressions:

      “We then further analyzed the (functional contribution ->)correlation between neural activity and success and failure of avoidance behavior. If the mPFC neurons (encode ->)participate in the avoidance decisions, avoidance withdrawal (AW; withdrawal before the attack) and escape withdrawal (EW; withdrawal after the attack) may be distinguishable from decoded population activity even prior to motor execution.”

      We also added the following to the Discussion to clarify the limitations of the study.

      “Despite this interesting conjecture, any analysis based on recording data is only correlational, mandating further studies with direct manipulation of the subpopulation to confirm its functional specificity.”

      Fourthly, the authors mention the representation of different functions in 'distinct spatiotemporal regions' but the bulk of the analyses, particularly in terms of response to the threat, do not compare recordings from PL and IL although - as the authors mention in the introduction - there is prior evidence of functional separation between these regions.

      Thank you for bringing this to our attention. As we mentioned in the introduction, we acknowledge the functional differences between the PL and IL regions. Although differences in spatial encoding between these two areas were not deeply explored, we anticipated finding differences in event encoding, given the distinct roles of the PL and IL in fear and threat processing. However, our initial analysis revealed no significant differences in event encoding between the regions, and as a result, we did not emphasize these differences in the manuscript. To address this point, we have reanalyzed the data separately for each region and included the following findings in the manuscript.

      “However, we did not observe a difference in decoding accuracy between the PL and IL ensembles, and there were no significant interactions between regressor type (shuffled vs. original) and regions (mixed-effects model; regions: p=.996; interaction: p=.782). These results indicate that the population activity in both the PL and IL contains spatial information (Figure 2D, Video 3).

      […]

      Furthermore, we analyzed whether there is a difference in prediction accuracy between sessions with different recorded regions, the PL and the IL. A two-way repeated-measures ANOVA revealed no significant difference between recorded regions, nor any interaction (regions: F(1, 38) = 0.1828, p = 0.671; interaction: F(1, 38) = 0.1614, p = 0.690).

      […]

      We also examined whether there is a significant difference between the PL and IL in the proportion of Type 1 and Type 2 neurons. In the PL, among 379 recorded units, 143 units (37.73%) were labeled as Type 1, and 75 units (19.79%) were labeled as Type 2. In contrast, in the IL, 156 units (61.66%) and 19 units (7.51%) of 253 recorded units were labeled as Type 1 and Type 2, respectively. A Chi-square analysis revealed that the PL contains a significantly higher proportion of Type 2 neurons (χ²(1, 632) = 18.07, p < .001), while the IL contains a significantly higher proportion of Type 1 neurons than the PL (χ²(1, 632) = 34.85, p < .001).”
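      For readers who wish to check these statistics, the sketch below recomputes a Pearson chi-square test (df = 1) for each 2 × 2 table (region × labeled as the given type or not) from the unit counts above. It assumes no Yates continuity correction, which the text does not specify; the helper name `chi2_2x2` is hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (df = 1, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


PL_UNITS, IL_UNITS = 379, 253
# Rows: PL, IL; columns: labeled as the given type, not labeled.
type1 = chi2_2x2(143, PL_UNITS - 143, 156, IL_UNITS - 156)
type2 = chi2_2x2(75, PL_UNITS - 75, 19, IL_UNITS - 19)
print(f"Type 1: chi2 = {type1[0]:.2f}")  # prints Type 1: chi2 = 34.85
print(f"Type 2: chi2 = {type2[0]:.2f}")  # prints Type 2: chi2 = 18.07
```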

      To summarize our additional results, we did not observe performance differences in distance decoding or event decoding. The only difference we observed was the proportional variation of Type 1 and Type 2 neurons when we separated the analysis by brain region. These results are somewhat counterintuitive, considering the distinct roles of the two regions—particularly the PL in fear expression and the IL in extinction learning. However, since the studies mentioned in the introduction primarily used lesion and infusion methods, this discrepancy may be due to the different approach taken in this study. Considering this, we have added the following section to the discussion.

      “Interestingly, we found no difference between the PL and IL in the decoding accuracy of distance or avoidance decision. This is somewhat surprising considering the distinct roles of these regions in the long line of fear conditioning and extinction studies, where the PL has been linked to fear expression and the IL to fear extinction learning (Burgos-Robles et al., 2009; Dejean et al., 2016; Kim et al., 2013; Quirk et al., 2006; Sierra-Mercado et al., 2011; Vidal-Gonzalez et al., 2006). On the other hand, more Type 2 neurons were found in the PL and more Type 1 neurons were found in the IL. To recap, typical Type 1 neurons increased their activity briefly after the head entry and then remained inhibited, while Type 2 neurons showed a burst of activity during head entry and sustained increased activity. One study employing a context-dependent fear discrimination task (Kim et al., 2013) also identified two distinct types of PL units: short-latency CS-responsive units, which increased firing during the initial 150 ms of tone presentation, and persistently firing units, which maintained firing for up to 30 seconds. Given the temporal dynamics of Type 2 neurons, it is possible that our unsupervised clustering method may have merged the two types of neurons found in Kim et al.’s study.

      While we did not observe decreased IL activity during dynamic foraging, prior studies have shown that IL excitability decreases after fear conditioning (Santini et al., 2008), and increased IL activity is necessary for fear extinction learning. In our paradigm, extinction learning was unlikely, as the threat persisted throughout the experiment. Future studies with direct manipulation of these subpopulations, particularly examining head withdrawal timing after such interventions, could provide insight into how these subpopulations guide behavior.”

      Additionally, we made some changes in the introduction, mainly replacing the PL/IL with mPFC to be consistent with the main body of results and conclusion and also specifying the correlational nature of the recording study.

      “Machine learning-based populational decoding methods, alongside single-cell analyses, were employed to investigate the correlations between neuronal activity and a range of behavioral indices across different sections within the foraging arena.”

      Reviewer 2 (Recommendations):

      The authors consistently use parametric statistical tests throughout the manuscript. Can they please provide evidence that they have checked whether the data are normally distributed? Otherwise, non-parametric alternatives are more appropriate.

      Thank you for raising this important issue. We re-ran normality tests on all our data using the Shapiro-Wilk test (α = .05) and found that the data sets summarized in Author response table 1 below require non-parametric tests. For analyses that did not pass the normality test, we used a non-parametric alternative instead, and we updated the Methods section accordingly. For instance, the repeated-measures ANOVAs for Supplementary Figure S1 and for the PCA results were replaced with the Friedman test followed by Dunn’s multiple comparison test.

      Author response table 1.

      Line 107: it is not clear here or in the methods whether a single drop of sucrose solution is delivered per lick or at some rate during the encounter, both during the habituation or in the final task. This is important information in order to understand how animals might make decisions about whether to stay or leave and how to interpret neural responses during this time period. Or is it a large drop, such that it takes multiple licks to consume? Please clarify.

      The apparatus we used incorporated an IR-beam sensor-controlled solenoid valve. As the beam sensor was located right in front of the pipe, the rat’s tongue activated the sensor. As a result, each lick opened the valve for a brief period, releasing a small amount of liquid, and the rat had to continuously lick to gain access to the sucrose. We carefully regulated the flow of the liquid and installed a small sink connected to a vacuum pump, so any remaining sucrose not consumed by the rat was instantly removed from the port. We clarified how sucrose was delivered in the methods section and also in the results section.

      Method:

      “The sucrose port has an IR sensor that was activated by a single lick. The rat usually stays in front of the lick port and licks continuously, at a rate of up to 6.3 licks per second, to obtain sucrose. Any sucrose droplets that fell into the sink below were immediately removed by negative pressure so that the rat’s behavior remained focused on licking.”

      Result:

      “The lick port was activated by an IR-beam sensor, triggering the solenoid valve when the beam was interrupted. The rat gradually learned to obtain rewards by continuously licking the port.”

      However, I'm not sure I understand the authors' logic in the interpretation: does the S-phase not also consist of goal-directed behaviour? To me, the core difference is that one is mediated by threat and the other by reward. In addition, it would be helpful to visualize the behaviour in the S-phase, particularly the number of approaches. This difference in the amount of 'experience' so to speak might drive some of the decrease in spatial decoding accuracy, even if travel distance is similar (it is also not clear how travel distance is calculated - is this total distance?) Ideally, this would also be included as a predictor in the GLM.

      We agree that the behaviors observed during the shuttling phase can also be considered goal-directed, as the rat moves purposefully toward explicit goals (the sucrose port and the N-zone during the return trip). However, we argue that there is a significant difference in the level of complexity of these goals.

      During the L-phase, the rat not only has to successfully navigate to the E-zone for sucrose but also pay attention to the robots, either to avoid an attack from the robot's forehead or escape the fast-striking motion of the claw. When the rat runs toward the E-zone, it typically takes a side-approaching path, similar to Kim and Choi (2018), and exhibits defensive behaviors such as a stretched posture, which were not observed in the S-phase. This behavioral characteristic differs from the S-phase, where the rat adopted a highly stereotyped navigation pattern fairly quickly (within 3 sessions), evidenced by more than 50 shuttling trajectories per session. In this phase, the rat exhibited more stimulus-response behavior, simply repeating the same actions over time without deliberate optimization.

      In our additional experiment with two different levels of goal complexity (reward-only vs. reward/threat conflict), we used a between-subject design in which both groups experienced both the S-phase and L-phase before surgery and underwent only one type of session afterward. This approach ruled out the possibility of differences in contextual experience. Additionally, since we initially designed the S-phase as extended training, behaviors in the apparatus tended to stabilize after rats completed both the S-phase and L-phase before surgery. As a result, we compared the post-surgery Lobsterbot phase to the post-surgery shuttling phase to investigate how different levels of goal complexity shape spatial encoding strength.

      To clarify our claim, we edited the paragraph below.

      “This absence of spatial correlates may result from a lack of complex goal-oriented navigation behavior, which requires deliberate planning to acquire more rewards and avoid potential threats.

      […]

      After the surgery, unlike the Lob-Exp group, the Ctrl-Exp group returned to the shuttling phase, during which the Lobsterbot was removed. With this protocol, both groups experienced sessions with the Lobsterbot, but the Ctrl-Exp group's task became less complex, as it was reduced to mere reward collection.

      Given these observations, along with the mPFC’s lack of consistency in spatial encoding, it is plausible that the mPFC operates in multiple functional modes, and the spatial encoding mode is preempted when the complexity of the task requires deliberate spatial navigation.”

      Additionally, we added behavioral data from the initial S-phase to Supplementary Figure 1.

      This is a good point: the amount of experience might drive the decrease in spatial decoding accuracy. To test this hypothesis, we added a new variable, the number of Lobsterbot sessions after surgery, to the previous GLM analysis. The updated model predicted the outcome variable significantly (F(4,44) = 10.31, p < .001), with an R² of 0.4838. The regression coefficients were as follows: presence of the Lobsterbot (2.76, standard error [SE] = 1.11, t = 2.42, p = .020), number of recorded cells (-0.43, SE = .08, t = -5.22, p < .001), recording location (0.90, SE = 1.11, p = .424), and number of L sessions (0.002, SE = 0.11, p = .981). These results indicate that the number of exposures to Lobsterbot sessions, as a measure of experience, did not affect spatial decoding accuracy.
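      The covariate check above amounts to refitting the regression with one extra predictor. A minimal sketch, assuming a Gaussian GLM with identity link (whose coefficient estimates coincide with ordinary least squares); the predictor layout, helper names, and synthetic data are hypothetical, and standard errors and F-tests are omitted for brevity.

```python
def ols(X, y):
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend an intercept column
    p = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for k in range(p):
            if k != col:
                factor = A[k][col] / A[col][col]
                A[k] = [a - factor * b for a, b in zip(A[k], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def r_squared(X, y, beta):
    """Coefficient of determination for fitted coefficients `beta`."""
    pred = [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]
    ybar = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, pred))
    ss_tot = sum((t - ybar) ** 2 for t in y)
    return 1 - ss_res / ss_tot


# Hypothetical sessions: [lobsterbot_present, n_cells, region_code, n_L_sessions]
sessions = [[i % 2, 10 + (3 * i) % 7, (i // 2) % 2, i % 5] for i in range(12)]
error_cm = [5 + 2 * s[0] - 0.5 * s[1] + 1 * s[2] for s in sessions]  # synthetic outcome
beta = ols(sessions, error_cm)
print([round(b, 3) for b in beta], round(r_squared(sessions, error_cm, beta), 3))
```

      A near-zero coefficient for the added column, as in this synthetic example, is what one would expect if experience did not contribute to decoding error.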

      As a minor edit, we changed the term to “total travel distance”.

      Relating to the previous point, it should be emphasized in both sections on removing the Lobsterbot and on non-navigational behaviours that the spatial decoding is all in reference to distance from the threat (or reward location). The language in these sections differs from the previous section where 'distance from the goal' is mentioned. If the authors wish to discuss spatial decoding per se, it would be helpful to perform the same analysis but relative to the animals' own location which might have equal accuracy across locations in the arena. Otherwise, it is worth altering the language in e.g. line 258 onwards to state the fact that distance to the goal is only decodable when animals are actively engaged in the task.

      Thank you for this comment. We changed the term to “distance from the conflict zone” or “distance of the rat to the center of the E-zone” to clarify our experimental setup.

      In Fig. 5, why is the number of neurons shown in the PETHs less than the numbers shown in the pie charts?

      The difference in the number of neurons between the PETHs and the pie charts in Figure 5 arises because PETHs are drawn only for 'event-responsive' units. For visualization, we included only neurons that met the criteria described in the Methods section (Behavior-responsive unit analysis). We have updated the caption of Figure 5 as follows to minimize confusion.

      “Multiple subpopulations in the mPFC react differently to head entry and head withdrawal.

      (A) Top: The PETH of head entry-responsive units is color-coded based on the Z-score of activity.

      (C) The PETH of head withdrawal-responsive units is color-coded based on the Z-score of activity.”

      I appreciate the amount of relatively unprocessed data plotted in Figure 5, but it would be great to visualize something similar for AW vs. EW responses within the HW2 population. In other words, what is there that's discernably different within these responses that results in the findings of Fig. 6?

      To visualize the difference in neural activity between AW and EW, we included an additional supplementary figure (Supplementary Figure 5). We divided the neurons into Type 1 and Type 2 and plotted PETHs during Avoidance Withdrawal (AW) and Escape Withdrawal (EW). Consistent with the results shown in Figure 6d, increased activity in Type 2 neurons was visible before the execution of AW compared to EW. However, we did not find a similar pattern in Type 1 neurons.

      On a related note, it would add explanatory power if the authors were able to more tightly link the prediction accuracy of the ensemble (particularly the Type 2 neurons) to the timing of the behaviour. Earlier in the manuscript it would be helpful to show latency to withdraw in AW trials; are animals leaving many seconds before the attack happens, or are they just about anticipating the timing of the attack? And therefore when using ensemble activity to predict the success of the AW, is the degree to which this can be done in advance (as the authors say, up to 6 seconds before withdrawal) also related to how long the animal has been engaged with the threat?

      We agree that the timing of head withdrawal, particularly in AW trials, is a critical factor in describing the rat's strategy toward the task. To test whether the rat used a precise timing strategy (for instance, leaving several seconds before the attack or exploiting the discrete 3- and 6-second attack durations), we plotted all head withdrawal timepoints during the 6-second trials. The distribution was relatively even, without distinguishable peaks (e.g., in the very initial period or at the 3- or 6-second mark), indicating that the rats did not use a precise temporal strategy. We included these data in a supplementary figure (Supplementary Figure 6) and added the following to the Results section.

      “We monitored all head withdrawal timepoints to assess whether rats developed a temporal strategy to differentiate between the 3-second and 6-second attacks. We found no evidence of such a strategy, as the timings of premature head withdrawals during the 6-second attack trials were evenly distributed (see Supplementary Figure S1).”

      As depicted in the new supplementary figure, head withdrawal times during avoidance behavior vary from sub-seconds to the 3- or 6-second attack timepoints. After receiving the reviewer’s comment, we became curious whether there is a decoding accuracy difference depending on how long the animal engaged with the threat. We selected all 6-second attack and avoidance withdrawal trials and checked if correctly classified trials (AW trials classified as AW) had different head withdrawal times—perhaps shorter durations—compared to misclassified trials (AW trials classified as EW). As shown in Author response image 3 below, there was no significant difference between these two types, indicating that the latency of head withdrawal does not affect prediction accuracy.
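      The response does not specify which test was used to compare withdrawal latencies between correctly classified and misclassified AW trials. As a hedged sketch, a rank-based comparison such as the Mann-Whitney U statistic is one reasonable choice; the function names and latencies below are hypothetical, and the code computes only the statistic and its common-language effect size, not a p-value.

```python
from itertools import product


def mann_whitney_u(x, y):
    """U statistic for sample x: number of (x_i, y_j) pairs with x_i > y_j, ties counted as 0.5."""
    u = 0.0
    for xi, yj in product(x, y):
        if xi > yj:
            u += 1.0
        elif xi == yj:
            u += 0.5
    return u


def prob_x_greater(x, y):
    """Common-language effect size: P(random x > random y), with ties split evenly."""
    return mann_whitney_u(x, y) / (len(x) * len(y))


# Hypothetical withdrawal latencies (s), for illustration only; not data from the study.
correct_aw = [1.2, 2.8, 4.1, 0.6, 3.3]
misclassified_aw = [2.0, 0.9, 3.7, 4.5]
print(prob_x_greater(correct_aw, misclassified_aw))
```

      A value near 0.5 means neither group tends to withdraw later; an exact or permutation p-value on the real per-trial latencies would complete the test.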

      Author response image 3.

      Finally, there remain some open questions. One is how much encoding strength - of either space or the decision to leave during the encounter - relates to individual differences in animal performance or behaviour, particularly because this seems so variable at baseline. A second is how stable this encoding is. The authors mention that the distance encoding must be stable to an extent for their regressor to work; I am curious whether this stability is also found during the encounter coding, and also whether it is stable across experience. For example, in a session when an individual has a high proportion of anticipatory withdrawals, is the proportion of Type 2 neurons higher?

      Thank you for these questions. To recap, we used five rats in the Lobsterbot experiments and three rats in the control experiment, in which the Lobsterbot was removed after training. Indeed, there were individual differences in performance (i.e., avoidance success rate), number of recorded units (related to recording quality), and baseline behaviors. These differences are summarized in Author response image 4 below.

      Author response image 4.

      We used a GLM to measure how much of the variation in decoder accuracy was explained by individual differences. Individual differences explained 38.96% of the variance in the distance regressor's performance and 12.14% of the variance in the event classifier's performance. Because recording quality was highly dependent on the animal, the high subject variability in distance regression might be attributed to the number of recorded cells: Rat00, which had the lowest mean absolute error, also had the highest number of recorded cells (18 on average). Compared with distance regression, event classification showed less subject variability. Indeed, the GLM results showed that the number of cells explained only 0.62% of the variability in event classification.
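      The exact GLM specification is given in the manuscript, not here. For intuition only: in a linear model whose sole predictor is subject identity (a one-way design), the fraction of variance explained (R²) reduces to the between-subject sum of squares divided by the total sum of squares. The sketch below assumes that simplified model; the rat labels and error values are hypothetical.

```python
from statistics import fmean


def variance_explained_by_subject(values, subjects):
    """R^2 of a one-way model with subject as the only categorical predictor."""
    grand_mean = fmean(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    by_subject = {}
    for v, s in zip(values, subjects):
        by_subject.setdefault(s, []).append(v)
    # Between-subject sum of squares: each subject's mean vs. the grand mean, weighted by sessions.
    ss_between = sum(len(vs) * (fmean(vs) - grand_mean) ** 2 for vs in by_subject.values())
    return ss_between / ss_total


# Hypothetical per-session decoding errors (cm) from three rats, for illustration only.
errors = [14.0, 15.5, 18.0, 19.5, 17.0, 16.5]
rats = ["rat_a", "rat_a", "rat_b", "rat_b", "rat_c", "rat_c"]
print(round(variance_explained_by_subject(errors, rats), 3))
```

      Separating the contribution of a covariate such as the number of recorded cells from subject identity needs a multi-predictor model; this sketch does not attempt that.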

      The reason we mentioned that "distance encoding must be stable for our regressor to work" is entirely based on the population-level analysis. Because we used neural data and behaviors from entire trials within a session, the regressor or classifier would have low accuracy if encoding dynamics changed within the session. In other words, if the way neurons encode avoidance/escape predictive patterns changed within a training set, the classifier would fail to generate an optimized separation function that works well across all datasets.

      To further investigate whether experience affects event classification over time, we plotted decoding accuracy across sessions in the graph below. Although there are individual and daily fluctuations in decoding accuracy, there was no observable trend across the experiments.

      Author response image 5.

      Regarding the relationship between the proportion of avoidance withdrawals and the proportion of Type 2 neurons, we were also curious and analyzed the data. Across 40 sessions, the correlation was -0.0716; for Type 1 neurons, it was 0.1459. We believe this indicates no significant relationship between the two variables.
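      The response does not state which correlation coefficient was used. Assuming Pearson's r, the usual significance check converts r into a t statistic with n - 2 degrees of freedom; the sketch below applies that conversion to the two reported coefficients with n = 40 sessions.

```python
from math import sqrt
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


for label, r in [("Type 2", -0.0716), ("Type 1", 0.1459)]:
    print(label, round(t_from_r(r, 40), 2))
```

      This gives t of about -0.44 and 0.91, both well below roughly 2.02, the two-sided 5% critical value for 38 degrees of freedom, which is consistent with the conclusion of no significant relationship.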

      Minor points:

      I struggled with the overuse of acronyms in the paper. Some might be helpful but F-zone/N-zone, for example, or HE/HW, AW/EW are a bit of a struggle. After reading the paper a few times I learned them but a naive reader might need to often refer back to when they were first defined (as I frequently had to).

      To increase readability, we removed acronyms that are not often used and changed HE/HW to head-entry/head-withdrawal.

      I have a few questions about Figure 1F: in the text (line 150) it says that 'surgery was performed after three L sessions when the rats displayed a range of 30% to 60% AW'. This doesn't seem consistent with what is plotted, which shows greater variability in the proportion of AW behaviours both before and after surgery. It also appears that several rats only experienced two days of the L1 phase; please make clear if so. And finally, what is the line at 50% indicating? Neither the text nor the legend discuss any sort of thresholding at 50%. Instead, it would be best to make the distinction between pre- and post-surgery behaviour visually clearer.

      Thank you for pointing out this issue. We acknowledge there was an error in the text description. As noted in the Methods section, we proceeded with surgery after three Lobsterbot sessions. We have removed the incorrect part from the Results section and revised the Methods section for clarity.

      “After three days of Lobsterbot sessions, the rats underwent microdrive implant surgery, and recording data were collected from subsequent sessions, either Lobsterbot or shuttling sessions, depending on the experiment. For all post-surgery sessions, those with fewer than 20 approaches in 30 minutes were excluded from further analysis.”

      Among the five rats, Rat2 and Rat3 did not approach the robot during the entire Lob2 session, which is why these two rats have no Lob2 data points. We updated the caption to clarify this.

      Initially, we added a 50% reference line, but we agree it is unnecessary as we do not discuss this reference. We have updated the figure to include the surgery point, as shown in Supplementary Figure 1.

      Fig. 2C: each dot is an ensemble of simultaneously recorded neurons, i.e. a subset of the total 800-odd units if I understand correctly. How many ensembles does each rat contribute? Similarly, is this evenly distributed across PL and IL?

      Yes, each dot represents a single session, with a total of 40 sessions. The five rats contributed 11, 9, 8, 7, and 5 sessions, respectively. Although each rat initially had more than 10 sessions, we discarded sessions with a low unit count (fewer than 10 units; as detailed in Materials and Methods - Data Collection). We collected 25 sessions from the PL and 15 sessions from the IL, aiming for more than 200 units per region.

      Please show individual data points for Fig. 2D.

      We updated the figure with individual data points.

      Is there a reason why the section on removing the Lobsterbot (lines 200 - 215) does not have associated MAE plots? Particularly the critical comparison between Lob-Exp and Ctl-Exp.

      We intentionally removed some graphs to create a more compact figure, but we appreciate your suggestion and have included the graph in Figure 2.

      Some references to supplementary materials are not working, e.g. line 333.

      The submitted version of the manuscript contained broken cross-references. In the current version, we used plain text, and the references are fixed.

      The legend for Supp. Fig. 2B is incorrect.

      We greatly appreciate this point. We changed the caption to match the figure.

      Reviewer 3 (Public Review):

      Thank you for recognizing our efforts in designing an ethologically relevant foraging task to uncover the multiple roles of the mPFC. While we acknowledge certain limitations in our methodology—particularly that we only observed correlations between neural activity and behavior without direct manipulation—we have conducted additional analyses to further strengthen our findings.

      Weakness:

      The primary concern with this study is the absence of direct evidence regarding the role of the mPFC in the foraging behavior of the rats. The ability to predict heterogeneous variables from the population activity of a specific brain area does not necessarily imply that this brain area is computing or using this information. In light of recent reports revealing the distributed nature of neural coding, conducting direct causal experiments would be essential to draw conclusions about the role of the mPFC in spatial encoding and/or threat evaluation. Alternatively, a comparison with the activity from a different brain region could provide valuable insights (or at the very least, a comparison between PL and IL within the mPFC).

      Thank you for the comment. Indeed, the fundamental limitation of a recording study is that it is only correlational, and any causal relationship between neural activity and behavioral indices is speculative. We have made this clearer in the revision and have refrained throughout from expressing speculative ideas suggesting causality. While we did not provide direct evidence that the mPFC computes or uses spatial/foraging information, we based our assertion on previous studies that have directly demonstrated the mPFC's role in complex decision-making tasks (Martin-Fernandez et al., 2023; Orsini et al., 2018; Zeeb et al., 2015) and in certain types of spatial tasks (De Bruin et al., 1994; Sapiurka et al., 2016). We would like to emphasize that, to the best of our knowledge, no previous study has investigated mPFC function while an animal solves multiple heterogeneous problems in a semi-naturalistic environment. Therefore, although our recording study supports only speculative causal inferences, it provides a foundation for investigating mPFC function. Future studies employing more sophisticated, cell-type-specific manipulations could test the hypotheses generated by the current study.

      One of the key questions of this manuscript is how multiple pieces of information are represented in the recorded population of neurons. Most of the studies mentioned above use highly structured experimental designs, which allow researchers to study only one function of the mPFC. In the current study, the semi-naturalistic environment allows rats to freely switch between multiple behavioral sets, and our decoding analysis quantitatively assesses the extent to which spatial/foraging information is embedded during these sets. Our goal is to demonstrate that two different task hyperspaces are co-expressed in the same region and that the degree of this expression varies according to the rat’s current behavior (See Figure 8(b) in the revised manuscript).

      In addition, we performed several new analyses. First, we included a single unit-level analysis of place cell-like properties to contrast with the ensemble decoding. Most neurons did not show well-defined place fields, although there were some indications of place cell-like properties. For example, some neurons displayed fragmented place fields or unusually large place fields only at particular spots in the arena (mostly around the gates). The accuracy of place information at the single-neuron level was much lower than that obtained from population decoding. Likewise, although some neurons showed modulated firing around particular behaviors (head entry and withdrawal), the overall prediction accuracy for avoidance decisions was much higher with the ensemble-based classifier.

      Moreover, given that high-dimensional movement has been shown to be reflected in the neural activity across the entire dorsal cortex, more thorough comparisons between the neural encoding of task variables and movement would help rule out the possibility that the heterogeneous encoding observed in the mPFC is merely a reflection of the rats' movements in different behavioral modes.

      Thanks for the comment. We acknowledge that the neural activity may reflect various movement components across different zones in the arena, and we performed several analyses to test this idea. First, our run-and-stop event analysis provides insight into whether mPFC neurons encode location independently of major motor events. The rats typically move across the F-zone routinely and swiftly (as if they are “running”) to reach the E-zone, where they slow almost to a halt (“stopping”). The PETHs around these critical motor events, however, did not show significant modulation of neural activity, indicating that most of the mPFC neurons we recorded did not respond to movement.

      We added this analysis to demonstrate that these sudden stops did not evoke the characteristic activation of Type 1 and Type 2 neurons observed during head entry into the E-zone. When we isolated these sudden stops outside the E-zone, we did not observe this neural signature (Supplementary Figure 2).

      Second, our PCA results showed that population activity in the E-zone during dynamic foraging behavior was distinct from the activity observed in the N- and F-zones during navigation. However, there is a possibility that the two behaviorally significant events—entry into the E-zone and voluntary or sudden exit—might be driving the differences observed in the PCA results. To account for this, we designated ±1 second from head entry and head withdrawal as "critical event times," excluded the corresponding neural data, and reanalyzed the data. This method removed neural activity associated with sudden movements in specific zones. Despite this exclusion, the PCA still revealed distinct population activity in the E-zone, different from the other zones (Supplementary Figure 4). This result reduces the likelihood that the observed heterogeneous neural activity is merely a reflection of zone-specific movements.

      Lastly, the main claim of the paper is that the mPFC population switches between different functional modes depending on the context. However, no dynamic analysis or switching model has been employed to directly support this hypothesis.

      Thank you for this comment. Since we did not conduct a manipulation experiment, there is a clear limitation in uncovering how switching occurs between the two task contexts. To make the most of our population recording data, we added an additional results section that examines how individual neurons contribute to both the distance regressor and the event classifier. Our findings support the idea that distance and dynamic foraging information are distributed across neurons, with no distinct subpopulations dedicated to each context. This suggests that mPFC neurons adjust their coding schemes based on the current task context, aligning with Duncan’s (2001) adaptive coding model, which posits that mPFC neurons adapt their coding to meet the task's current demands.

      Reviewer 3 (Recommendations):

      The evidence for spatial encoding is relatively weak. In the F-zone (50 x 48 cm), the average error was approximately 17 cm, constituting about a third of the box's width and likely not significantly smaller than the size of a rat's body. The errors in the shuffled data are also not substantially greater than those in the original data. An essential test indicates that spatial decoding accuracy decreases when the Lobsterbot is removed. However, assessing the validity of the results is difficult in the current state. There is no figure illustrating the results, and no statistics are provided regarding the test for matching the number of neurons.

      We acknowledge that the average error (~17 cm) measured in our study is relatively large, even though it is significantly smaller than that of the shuffled control model (22.6 cm). Previous studies reported smaller prediction errors under different experimental conditions: 16 cm in Kaefer et al. (2020) and less than 10 cm in Ma et al. (2023) and Mashhoori et al. (2018). Most notably, the average number of units used in our study (15.8 units per session) is much smaller than in those works, which used 63, 49, and 40 units, respectively. As our GLM results demonstrated, the number of recorded cells significantly influenced decoding accuracy (β = -0.43 cm/neuron), so with a similar number of recorded cells we would expect comparable decoding accuracy. In addition, unlike other studies that employed a dedicated maze such as a virtual track or a figure-8 maze, we exposed rats to a semi-naturalistic environment in which they exhibited a variety of behaviors beyond simple navigation. As argued throughout the manuscript, we believe that the spatial information represented in the mPFC is susceptible to disruption when the animal engages in other activities. A similar phenomenon was reported by Mashhoori et al. (2018), where the decoder, which typically showed a median error of less than 10 cm, exhibited a much higher error (nearly 100 cm) near the feeder location.

      As for the reviewer’s request for comparing spatial decoding without the Lobsterbot, we added a new figure to illustrate the spatial decoding results, including statistical details. We also applied a Generalized Linear Model to regress out the effect of the number of recorded neurons and statistically assess the impact of Lobsterbot removal. This adjustment directly addresses the reviewer's request for a clearer presentation of the results and helps contextualize the decoding performance in relation to the number of recorded neurons.

      As indicated in the public review, drawing conclusions about the role of the mPFC in navigation and avoidance behavior during the foraging task is challenging due to the exclusively correlational nature of the results. The accuracy in AW/EW discrimination increases a few seconds before the response, implying that changes in mPFC activity precede the avoidance/escape response. However, one must question whether this truly reflects the case. Could this phenomenon be attributed to rats modifying their "micro-behavior" (as evidenced by changes in movement observed in the video) before executing the escape response, and subsequently influencing mPFC activity?

      We appreciate the reviewer's thoughtful observation regarding the correlational nature of our results and the potential influence of pre-escape micro-behaviors on mPFC activity. We acknowledge that the increased accuracy in AW/EW discrimination preceding the response could also be correlated with micro-behaviors. However, there is very little room for extraneous behavior other than licking the sucrose delivery port within the E-zone, as the rats are highly trained to perform this stereotypical behavior. To support this, we measured the time delays between licking events (inter-lick intervals). The results show a sharp distribution, with 95% of the intervals falling within a quarter second, indicating that the rats were stable in the E-zone, consistently licking without altering their posture.
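      The inter-lick interval summary above can be sketched as follows, assuming lick timestamps (in seconds) from a single E-zone visit, so that gaps between visits are not counted as intervals, and treating "within a quarter second" as an interval of at most 0.25 s. The function names and timestamps are hypothetical.

```python
def inter_lick_intervals(lick_times):
    """Differences between consecutive lick timestamps within one visit."""
    ts = sorted(lick_times)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def fraction_within(intervals, threshold_s=0.25):
    """Fraction of intervals no longer than threshold_s seconds."""
    return sum(1 for d in intervals if d <= threshold_s) / len(intervals)


# Hypothetical timestamps from one visit (s), for illustration only.
licks = [0.00, 0.14, 0.27, 0.41, 0.55, 1.10, 1.24]
print(fraction_within(inter_lick_intervals(licks)))
```

      Pooling intervals across visits, rather than across sessions, keeps the long between-visit gaps out of the distribution, so the 95% figure reflects licking stability inside the E-zone.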

      To complement the data presented in Author response image 2, a video clip showing a rat engaged in licking behavior was included. We carefully designed the robot compartment and adjusted the distance between the Lobsterbot and the sucrose port to ensure that rats could exhibit only limited behaviors inside the E-zone. The video confirms that no significant micro-behaviors were observed during the rat’s activity in the E-zone.

      If mPFC activity indeed switches mode, the results do not clearly indicate whether individual cells are specifically dedicated to spatial representation and avoidance or if they adapt their function based on the current goal. Figure 7, presented as a schematic illustration, suggests the latter option. However, the proportion of cells in the HE and HW categories that also encode spatial location has not been demonstrated. It has also not been shown how the switch is manifested at the level of the population.

      Thank you for this comment. As the reviewer pointed out, we suggest that mPFC neurons do not diverge based on their functions, but rather adapt their roles according to the current goal. To support this assertion, we added a results section that calculates the feature importance of the decoders. This analysis allows us to quantitatively measure each neuron’s contribution to both the distance regressor and the event decoder. Our results indicate that distance and defensive behavior are not encoded by a small subset of neurons; instead, the information is distributed across the population. Shuffling the neural data of a single neuron resulted in a median increase in decoding error of 0.73 cm for the distance regressor and 0.01% for the event decoder, demonstrating that the decoders do not rely on a specific subset of neurons that exclusively encode spatial and/or defensive behavior.
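      The single-neuron shuffling analysis described above is a form of permutation feature importance. A minimal sketch, assuming a trained decoder exposed as a hypothetical `predict` function that maps a list of population vectors (one firing-rate value per neuron) to predicted distances, with mean absolute error as the score; for the event decoder, the drop in classification accuracy would take the place of the increase in error. The toy decoder and data are hypothetical.

```python
import random
from statistics import fmean, median


def mean_absolute_error(y_true, y_pred):
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MAE when one neuron's column is shuffled across time bins."""
    rng = random.Random(seed)
    baseline = mean_absolute_error(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        deltas = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace neuron j's values with the shuffled column, leaving other neurons intact.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            deltas.append(mean_absolute_error(y, predict(X_perm)) - baseline)
        importances.append(fmean(deltas))
    return importances


# Toy decoder (hypothetical): predicted distance depends on neuron 0 only.
def toy_predict(X):
    return [2.0 * row[0] + 0.0 * row[1] for row in X]


X = [[float(i), float(i % 3)] for i in range(10)]
y = toy_predict(X)
scores = permutation_importance(toy_predict, X, y)
print(scores, median(scores))
```

      Summarizing with the median across neurons, as in the response, indicates whether the decoder leans on a few neurons (a few large values) or on many (uniformly small values).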

      Although we found supporting evidence that mPFC neurons encode two different types of information depending on the current context, we acknowledge that we could not go further in answering how this switch is manifested. One simple explanation is that the function is driven by current contextual information and goals—in other words, a bottom-up mechanism. However, in our control experiment, simplifying the navigation task worsened the encoding of spatial information in the mPFC. Therefore, we speculate that an external or internal arbitrator circuit determines what information to encode. A precise temporal analysis of the timepoint when the switch occurs in more controlled experiments might answer these questions. We have added this discussion to the discussion section.

      PL and IL are two distinct regions; however, there is no comparison between the two areas regarding their functional properties or the representations of the cells. Are the proportions of cell categories (HE vs HW or HE1 vs HE2, spatial encoding vs no spatial encoding) different in IL and PL? Are areas differentially active during the different behaviors?

      Thank you for bringing up this issue. As mentioned in our response to the public review, we included a comparison between the PL and IL regions. While we did not observe any differences in spatial encoding (feature importance scores), the only distinction was in the proportion of Type 1 and Type 2 neurons, as the reviewer suggested. We have incorporated our interpretation of these results into the discussion section.

      The results and interpretations of the cluster analysis appear to be highly dependent on the parameters used to define a cluster. For example, the HE2 category includes cells with activity that precedes events and gradually decreases afterward, as well as cells with activity that only follows the events.

      We strongly agree that dependency on hyperparameters is a crucial point when using unsupervised clustering methods. To eliminate any subjective criteria in defining clusters, we carefully selected our clustering approach, which requires only two hyperparameters: the number of initial clusters (set to 8) and the minimum number of cells required to be considered a valid cluster (cutoff limit, 50). The rationale behind these choices was: 1) a higher number of initial clusters would fail to generalize neural activity, 2) clusters with fewer than 50 neurons would be difficult to analyze, and 3) to prevent the separation of clusters that show noisy responses to the event.

      Author response table 2 shows the differences in the number of cell clusters when we varied these two parameters. As demonstrated, changing these two variables does result in different numbers of clusters. However, when we plotted each cluster type’s activity around head entry (HE) and head withdrawal (HW), an increased number of clusters resulted in the addition of small subsets with low variation in activity around the event, without affecting the general activity patterns of the major clusters.

      The example mentioned by the reviewer (a possible separation of HE2) appears when using a hyperparameter set that results in 4 clusters, not 3. In this result, 83 units that were labeled as HE2 in the 3-cluster hyperparameter set form a new group, HE3 (Group 3). This group shows increased activity after head entry and has characteristics similar to HE2: most of its units with significant head-withdrawal responses were classified as HW2 and maintained high activity until head withdrawal. Among the 83 HE3 units, 36 were further classified as HW2, 44 as non-significant, and 3 as HW1. Therefore, we believe this does not affect our analysis, as we observed the separation of two major groups, Type 1 (HE1-HW1) and Type 2 (HE2-HW2), and focused our analysis on these groups afterward.

      Despite this validation, there remains a strong possibility that our method might not fully capture small yet significant subpopulations of mPFC units. As a result, we have included a sentence in the methods section addressing the rationale and stability of our approach.

      “(Materials and Methods) To compensate for the limited number of neurons recorded per session, the hyperparameter set was chosen to generalize their activity and categorize them into major types, allowing us to focus on neurons that appeared across multiple recording sessions. Although changes in the hyperparameter sets resulted in different numbers of clusters, the major activity types remained consistent (Supplementary Figure S8). However, there is a chance that this method may not differentiate smaller subsets of neurons, particularly those with fewer than 50 recorded neurons.”

      Author response table 2.

      Minor points:

      Line 333: Error! Reference source not found. This was probably the place for citing Figure S2?

      Lines 339, 343: Error! Reference source not found.

      Thank you for mentioning these comments. In the new version, all reference functions from Word have been replaced with plain text.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is a very well written and performed study describing a TOPBP1 separation of function mutation, resulting in defective MSCI maintenance but normal sex body formation. The phenotype differs from that of a previous TOPBP1 null allele, in which both MSCI and sex body formation were defective. Additional defects in CHK phosphorylation and SETX localization are also described.

      Strengths:

      The study is very rigorous, with a remarkably large number of MSCI marks assayed, phosphoproteomics (leading to the interesting SETX discovery) and 10X RNAseq, allowing the MSCI phenotype to be further deconvolved. The approaches in most cases are robust.

      Weaknesses:

      There aren't many; please find list below:

      1) The authors are committed to the idea that maintenance of MSCI is the major defect here. However, based on the data, an alternative would be that some cells achieve sex body formation and MSCI normally, while others do not. It would only take a small percentage of cells exhibiting MSCI failure to kill all the cells in the same germinal epithelium, so this could still explain the complete pachytene block. This isn't a major point...this phenotype is clearly different to the TOPBP1 KO, but a broader discussion of possibilities in the discussion would help. I raise this in the context of both the cytology and 10X analysis:

      a) The assessment that sex body formation is normal is based on cytology in Supp 8 and 9, but a more rigorous approach would be to assess condensation of the XY pair in stage-matched spread cells (maybe they have that data already) by measuring distances between the X and Y centromere, or looking at stage IV of the seminiferous cycle, where all cells should have oval sex bodies but sex body mutants have persistent elongated XY pairs (see work of Namekawa and Turner). The authors do actually mention that gH2AX spreading is defective in many cells....and if this is true, condensation to form a sex body would almost certainly not have taken place in those cells.

      We appreciate the reviewer’s comment and have performed the experiment suggested, counting the number of elongated sex bodies in all sex body-positive cells in seminiferous tubules stained with γH2AX and DAPI (as done by Turner in Hirota et al., 2018). The experiment did not show significant differences between Topbp1+/+ and Topbp1B5/B5 as shown in Author response image 1.

      Author response image 1.

      Topbp1B5/B5 displays normal condensation of the XY-pair. A) Immunostaining of XY condensation in Topbp1+/+ and Topbp1B5/B5 testes sections (γH2AX: green and DAPI: gray). B) Quantification of all sex body-positive cells per tubule (Topbp1+/+ number of cells counted = 781, number of tubules counted = 28, number of mice = 3; Topbp1B5/B5 number of cells counted = 967, number of tubules counted = 28, number of mice = 3). C) Quantification of elongated-sex body cells per tubule (Topbp1+/+ elongated-sex body cells = 19 and normal round/oval-sex body cells = 762, number of tubules counted = 28, number of mice = 3; Topbp1B5/B5 elongated-sex body cells = 45 and normal round/oval-sex body cells = 922, number of tubules counted = 28, number of mice = 3).
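      As a quick arithmetic check on the counts in panels B and C, elongated plus round/oval sex bodies reproduce the totals in panel B (781 and 967). The pooled fractions computed below ignore tubule- and mouse-level structure, so they are descriptive only; the comparison reported above was made per tubule.

```python
def summarize(elongated, round_or_oval):
    """Total sex body-positive cells and the pooled fraction that are elongated."""
    total = elongated + round_or_oval
    return total, elongated / total


counts = {
    "Topbp1+/+": (19, 762),
    "Topbp1B5/B5": (45, 922),
}
for genotype, (elongated, normal) in counts.items():
    total, frac = summarize(elongated, normal)
    print(f"{genotype}: total={total}, elongated={100 * frac:.1f}%")
```

      This prints totals of 781 and 967, with pooled elongated fractions of 2.4% and 4.7%, respectively.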

      b) Regarding the 10X data, the finding that expression of some XY genes is elevated and others are not is also consistent with a "partial" phenotype (some cells have normal XY bodies and MSCI, others fail in both). In Fig 6E, X expression looks to be elevated in B5 vs wt at all stages...if this were a maintenance issue, shouldn't it be equal to that in wt and then elevate later?

      We understand the point raised by the reviewer, however we do not favor the “partial” phenotype model because of the absence of any post-pachytene spermatocytes in the B5 mutant. If some cells had escaped the MSCI defect, we would expect to detect cells progressing further in meiosis. Because we cannot rule out completely the possibility of a subtle disruption in XY silencing initiation, we decided to better emphasize this point in the discussion (lines 391-394).

      In Figure 6E, the X-linked genes were normalized against chromosome 9-linked genes. The normalization against pre-leptotene was done for the results displayed on Figure 7, in which we demonstrate the maintenance issue. Furthermore, for the 10X analysis, while the same number of cells were loaded for wild-type and mutant, the composition of cells varied between these two samples. Despite the fact that very few “spermatocyte 3” cells were detected in the mutant, those cells displayed much higher X-linked gene expression than the wild-type spermatocyte 3 cells.

      2) How is the quantitation showing impaired localization of select markers (e.g. SETX) normalized? How do we know that the antibody staining simply didn't work as well on the mutant slides?

      The quantification showing impaired localization of the selected markers such as SETX was done as described by Sims, et al. 2022 and Adams, et al. 2018. In brief, the green signal was measured along (XY cores) or across (XY DNA loops) the X and Y chromosomes and normalized against the analogous signal on the autosomal chromosomes. The possibility that the antibody simply did not work as well on the mutant is unlikely since multiple biological replicates were performed and we reproducibly followed standard practices in the field for meiotic spreads staining, imaging, and quantification. We also note that our findings published in Sims et al, 2022 show that ATR inhibition strongly impairs SETX localization to the sex body, further substantiating our claim that signaling via ATR-TOPBP1 controls SETX.
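      The normalization described above can be sketched as a simple ratio, assuming intensity profiles have already been extracted along (cores) or across (DNA loops) the X and Y chromosomes and along comparable autosomal traces in the same nucleus. This is a simplified illustration, not the exact pipeline of Sims et al. (2022) or Adams et al. (2018); background subtraction and the choice of autosomes are omitted, and the function name is hypothetical.

```python
from statistics import fmean


def xy_to_autosome_ratio(xy_profile, autosome_profiles):
    """Mean XY signal divided by the mean of per-autosome mean signals."""
    autosome_mean = fmean(fmean(p) for p in autosome_profiles)
    return fmean(xy_profile) / autosome_mean


# Hypothetical line-scan intensities (arbitrary units), for illustration only.
print(xy_to_autosome_ratio([820, 900, 870], [[300, 320], [280, 310, 290]]))
```

      Because the ratio is computed within each nucleus, global differences in staining intensity between slides should largely cancel, which is the design choice that bears on the concern about antibody performance on mutant slides.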

      3) Is testis TOPBP1 protein expression reduced in the B5 mutant?

      Western blotting of whole-testis lysates shows that TOPBP1 protein abundance is reduced in the B5 mutant. However, we did not detect a significant reduction in TOPBP1 signal intensity by immunofluorescence in pachytene spreads of the B5 mutant.

      4) 10X analysis: how were the genes on the y-axis in Supp 24 arranged? Is this by location on the X chromosome?

      The genes are sorted by their position along the X chromosome.

      5) The final analyses in Fig 7: X-genes are subdivided based on their behavior (up, down, unchanged). What isn't clear to me is whether the authors have considered the fact that there are global changes in gene expression during meiosis (very low in lep, zyg and early pach, then ramps up hugely from mid pach). In other words, is this normalized to autosomal gene expression?

      For the final analysis in Figure 7, each gene's expression was normalized to its own expression at the pre-leptotene stage. Moreover, the analysis compares the behavior of X-linked genes between wild-type and B5 mutant cells.
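      The pre-leptotene normalization and the up/down/unchanged grouping discussed for Figure 7 can be sketched as follows. This Python example is illustrative only. The log2 fold-change cutoff, the pseudocount, and the input layout (mean expression per gene at each stage) are assumptions, not the parameters used in the manuscript. Running the function separately on wild-type and mutant data and comparing the label counts mirrors the genotype comparison described above.

```python
import math
from collections import Counter


def classify_vs_preleptotene(pre_lep, later, log2_cutoff=1.0, pseudocount=0.01):
    """Label each gene 'up', 'down', or 'unchanged' by its log2 fold change
    at a later stage relative to its own pre-leptotene expression.

    pre_lep, later: {gene: mean_expression} for one genotype (assumed layout).
    log2_cutoff and pseudocount are illustrative choices, not the authors'.
    """
    labels = {}
    for gene, baseline in pre_lep.items():
        lfc = math.log2((later.get(gene, 0.0) + pseudocount) / (baseline + pseudocount))
        if lfc >= log2_cutoff:
            labels[gene] = "up"
        elif lfc <= -log2_cutoff:
            labels[gene] = "down"
        else:
            labels[gene] = "unchanged"
    return labels


# Hypothetical per-gene means for one genotype.
pre = {"xA": 1.0, "xB": 1.0, "xC": 1.0}
sp3 = {"xA": 4.0, "xB": 0.2, "xC": 1.1}
labels = classify_vs_preleptotene(pre, sp3)
print(labels, Counter(labels.values()))
```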

      6) Again regarding the 10X analysis, my prediction would be that not ALL X and Y genes would increase in pach if MSCI were ablated. We should remember that XY genes have been subject to MSCI for some 160 million years of evolution, and this will mean that many enhancers that originally drove their expression prior to the evolution of MSCI will now be lost. This has been our experience: many XY genes aren't elevated at pach even in mutants in which MSCI is totally defective. I'd urge the authors to consider this possibility when they use XY gene expression patterns to diagnose the severity or timing of the MSCI phenotype. This could be a discussion point.

      We greatly appreciate the reviewer’s suggestion and have added discussion of this point in lines 392-400.

      Reviewer #2 (Public Review):

      Summary:

      This paper describes the role of BRCT repeat 5 of TOPBP1, a DNA damage response protein, in the maintenance of meiotic sex chromosome inactivation (MSCI). By analyzing a Topbp1 mutant mouse with amino acid substitutions in BRCT repeat 5, the authors found reduced phosphorylation of a DNA/RNA helicase, Senataxin, and decreased localization of the protein to the X-Y sex body in pachynema. Moreover, the authors also found decreased repression of several genes on the sex chromosomes in the male mice.

      Strengths:

      The work, including phosphoproteomics and single-cell RNA sequencing with a large amount of data, has been done with great care, and most of the results are convincing.

      Weaknesses:

      One concern is that, although the Topbp1 mutant spermatocytes show very severe defects after late pachynema, the defect in gene silencing in the sex body is relatively weak. It is difficult to explain how such weak misregulation of gene silencing in mice causes the complete loss of cells in the late stage of spermatogenesis.

      We appreciate the reviewer’s comment. We note that even subtle misregulation of XY gene silencing has been reported to lead to significant loss of cells in late prophase I (Ichijima et al., 2011; Modzelewski et al., 2012). Moreover, it is possible that some cells with drastic changes in X-linked gene expression were excluded from the downstream analysis because of high levels of mitochondrial gene expression (cells that were likely dying by apoptosis). Excluding cells with high mitochondrial gene expression is common practice in downstream analysis of scRNA-seq data.
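      The mitochondrial quality-control filter mentioned above is usually a per-cell threshold on the percentage of counts coming from mitochondrial genes. The Python sketch below illustrates that idea under three assumptions: raw counts per cell are stored as a dictionary, mouse mitochondrial genes are identified by the `mt-` symbol prefix, and the cutoff is a hypothetical 10%. The threshold actually applied in this study is not stated here, so the value is only a placeholder.

```python
def passing_cells(counts_by_cell, mito_prefix="mt-", max_mito_pct=10.0):
    """Return barcodes whose percentage of mitochondrial counts is at or
    below max_mito_pct (a hypothetical cutoff). Cells with no counts are dropped."""
    kept = []
    for barcode, counts in counts_by_cell.items():
        total = sum(counts.values())
        if total == 0:
            continue
        mito = sum(n for gene, n in counts.items() if gene.startswith(mito_prefix))
        if 100.0 * mito / total <= max_mito_pct:
            kept.append(barcode)
    return kept


# Toy counts; barcodes are hypothetical.
cells = {
    "cell_1": {"mt-Co1": 5, "Actb": 95},   # 5% mitochondrial: kept
    "cell_2": {"mt-Co1": 40, "Actb": 60},  # 40% mitochondrial: excluded
}
print(passing_cells(cells))
```

      Because the excluded cells are thought to be dying, as noted above, a filter like this can remove the very mutant cells that might carry the strongest X-linked misexpression.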

      Reviewer #3 (Public Review):

      The work presented by Ascencao and coworkers aims to gain deeper insight into meiotic sex chromosome inactivation (MSCI) as a critical factor in the regulation of meiosis progression in male mammals. For this purpose, they have generated a transgenic mouse model in which a specific domain of the TOPBP1 protein has been mutated, hampering the binding of a number of protein partners and interfering with the regulatory cascade initiated by ATR. Through immunolocalization of an impressive number of MSCI markers, phosphoproteomics and single-cell RNA sequencing (scRNAseq), the authors show that despite proper morphological formation of the sex body and the incorporation of most canonical MSCI markers, sex chromosome-linked genes are reactivated at some point during pachytene, and this triggers a breakdown of meiosis progression, likely due to defective phosphorylation of the helicase SETX.

      The manuscript presents a clear advance in the understanding of MSCI and meiosis progression, with two main strengths: first, the generation of a mouse model with a very uncommon phenotype; second, the use of a broad methodological approach. The results are well presented and illustrated. Nevertheless, the discussion could still be refined by including some ideas, and perhaps speculations, that have not been considered.

      We appreciate the reviewer’s comment and have improved the Discussion to address the points raised in the “Recommendations For The Authors”.

      Reviewer #1 (Recommendations For The Authors):

      I don't have any additional points here

      Reviewer #2 (Recommendations For The Authors):

      The paper by Ascencao et al. describes a separation-of-function allele of TOPBP1, a protein critical for the DNA damage response (DDR), that confers a specific defect in XY sex chromosome inactivation during male mouse meiosis. The authors constructed a Topbp1 separation-of-function mouse by introducing amino acid substitutions in BRCT repeat 5 and found that these mice, which have a normal DDR in mitosis and meiosis, are male infertile. Topbp1(B5/B5) mice contain no spermatocytes beyond diplonema and, as a result, have few spermatids or sperm. In these mice, most meiotic events in prophase I, including chromosome synapsis, meiotic recombination, and formation of the sex body, are normal. Detailed proteomic analysis revealed reduced ATR-dependent phosphorylation of a DNA/RNA helicase, Senataxin, and single-cell RNA sequencing showed that some genes on the sex chromosomes are not silenced as well as in the control. The work, with a large amount of data, has been done with great care, and most of the results are convincing. One clear concern is that, although the authors nicely show a defect in sex chromosome gene silencing in Topbp1(B5/B5) mice, how a small defect in gene silencing leads to the complete loss of diplotene spermatocytes remains unaddressed.

      Major points:

      Although the authors show a change in the transcriptome of spermatocytes from Topbp1(B5/B5) male mice, they cannot explain the complete lack of spermatids in this mouse. The transcriptome data do not seem to provide a clue either.

      1) Given that the TOPBP1-B5 protein cannot bind to both 53BP1 and BLM, it is interesting to check the localization of both proteins on meiotic chromosome spreads (in the case of 53BP1, the localization in MEFs with DNA damage).

      We appreciate the reviewer’s comment. We tried to stain BLM in meiotic spreads using several different antibodies; however, we were unable to obtain a specific BLM signal. For 53BP1, we monitored its localization, and it was not significantly different from that in Topbp1+/+ meiotic spreads (Supplemental Figure 11). While we appreciate the reviewer’s suggestion to examine 53BP1 localization in MEFs with DNA damage, we opted not to perform this experiment because we have shown that 53BP1 can still bind BRCT domains 1 and 2 of TOPBP1, as previously described (Bigot et al., 2019; Cescutti et al., 2010; Liu et al., 2017). Additionally, both male and female 53BP1 KO mice are fertile (Ward et al., 2003); thus, the partial disruption of 53BP1 binding that we observed in the TOPBP1 B5 mutant is likely not the cause of the infertility phenotype.

      2) A recent preprint by Fujiwara et al. (doi: https://doi.org/10.1101/2023.04.12.536672) showed the accumulation of R-loops in spermatocyte spreads in Senataxin knockout mice. The authors could check for R-loops on the sex body in Topbp1-B5 mice.

      We thank the reviewer for the suggestion. We have tried several protocols to stain R-loops (including the protocol used in the paper mentioned above) but were not successful.

      3) The authors need to check the protein level (and band shift) of Senataxin in the testis by western blotting analysis.

      We have tried several SETX antibodies, and none worked for western blot analysis.

      4) If possible, the authors could test for a protein interaction between TOPBP1 and Senataxin.

      We appreciate the suggestion, and we will investigate this interaction in future work.

      5) The authors need to check the statistics in the paper.

      (1) It is better to show actual P-values in the case of "ns".

      P-values were added to the respective figure legends.

      (2) For focus counts such as those in Figures 3D, G, H, 4B, D, F, H, 5E, and F (and in the Supplemental Figures), please indicate how many spreads were counted for each mouse. Moreover, focus numbers and fluorescence intensities are not normally distributed. It is better to use a non-parametric method such as the Mann-Whitney U test.

      We appreciate the reviewer’s comment. Upon consulting with a statistician at the Cornell Statistical Consulting Unit (CSCU), we were advised to use a linear mixed-effects model, which accounts for the variability among cells within each mouse when comparing mice between groups (Topbp1+/+ vs Topbp1B5/B5). We then reanalyzed all quantified meiotic spreads using this mixed-effects model, and the p-value, number of mice, and number of cells counted for each group are displayed in the respective figure legends.

      While going through all the quantified meiotic spreads, we found a minor error in one of the previous data points for SETX staining in Topbp1+/+ and have fixed it. With the previous quantification data and the new statistical analysis, the p-values for cores and loops were 0.5598 and 0.0273, respectively. With the corrected values and the new analysis, they are 0.5987 and 0.0452. The correction did not change the conclusion, and the corrected data are displayed in the new Figure 5. We also found a mistake in the ATR quantification that arose when the spreadsheet was moved from Excel to GraphPad. With the previous quantification and the new analysis, the p-values for cores and loops were 0.2451 and 0.8933. With the corrected values, they are 0.4068 and 0.9396. This correction also did not change the conclusion, and the corrected data are displayed in the new Figure 4.

      Finally, we realized that we used n = 8 (n = number of mice) for MDC1 quantification and n = 2 for pCHK1_S345, instead of n = 3 as stated in the preprint version of the manuscript. The corrected values were added to the respective figures and figure legends.

      (3) From Figures 6E, 7B, and 7C, the authors conclude that the expression profiles of wild-type and Topbp1(B5) spermatocytes differ. It would be better to show p-values for these comparisons. In particular, in Figure 7C, Xiap expression kinetics look similar between wild type and the mutant.

      We have added p-values to Figures 6E and 7B (in the figures or their legends).

      In Figure 7C, we now recognize that the Δ could have been misleading. We meant to compare wild-type SP2 to wild-type SP3 and mutant SP2 to mutant SP3, not wild-type SP3 to mutant SP3. Therefore, the Δ was removed from Figure 7C. For the comparisons between SP2 and SP3 expression levels, it is challenging to calculate p-values for a single gene, because these cells have started X-linked gene silencing and expression values are very low. Meaningful p-values for the comparison between wild-type SP3 and mutant SP3 are shown in Figure 7B, where the comparison is based on the number of genes rather than the expression level of each gene.

      Minor comments:

      1) Line 34: SPO11 is NOT a nuclease. Just delete it.

      It has been deleted (see line 34).

      2) Line 71, a protein: Is this protein ATR? If so, please write it. If not, please give the name of the protein.

      In line 71 (now lines 79-80), we refer to TOPBP1-interacting proteins in general, since many of these interactions depend on phosphorylation of the TOPBP1 interactor. This is the case for BLM, 53BP1, FANCJ, and RAD9. ATR interacts with TOPBP1 through TOPBP1’s ATR-activation domain (AAD), and this interaction is not phospho-mediated. We restructured the sentence for clarity.

      3) In the Introduction, the authors cite a review by Cimprich and Cortez (2008) in many places. It would be better to cite original papers or other appropriate reviews.

      We have accepted the reviewer’s suggestion and added original papers when appropriate.

      4) Lines 143-145: The authors generated eight charge-reversal point mutations in BRCT domain 5 of TOPBP1. If possible, it would be helpful to explain the rationale for these substitutions and why BRCT domain 5 was chosen rather than other domains.

      We generated eight charge-reversal point mutations to abrogate all possible phospho-dependent interactions and avoid potential residual interactions. We have also mutated other BRCT domains, and those results will be published separately.

      5) Line 174 (and Figure 2E): RPA should be either RPA2 or RPA32.

      Corrected (it is RPA2).

      6) Figure 5C-F: Please explain in more detail how the authors quantified the SETX signals. Why are the two results different?

      The quantification was done as described by Sims et al. (2022), yielding separate data for XY cores and XY DNA loops. In brief, the green signal was measured along (XY cores) or across (XY DNA loops) the X and Y chromosomes and normalized to the signal on the autosomes.

      Reviewer #3 (Recommendations For The Authors):

      I have no major criticisms, but I include a list of comments and suggestions (some of them conceptual, and disputable) that could help the authors to improve some parts of the manuscript.

      1) Line 52: I realize that the term protein "sequestration" (used in many instances throughout the manuscript) has become widespread in the MSCI literature in recent years. While this might be a convenient way to describe the dynamics of proteins accumulating in the sex body, this reviewer considers the term totally inappropriate. It is confusing and introduces at least two misconceptions about protein accumulation in the sex body. First, it seems to indicate that once trapped in the sex body, proteins are incapable of leaving it, which might be completely wrong (histone replacement refutes this idea). Second, it suggests that DDR proteins are attracted by the sex body and cannot remain associated with autosomes even if DNA repair has not been completed. This has also been shown to be incorrect (see, for example, PMID 19714216). Moreover, DDR proteins can associate de novo with chromosomes if needed, for instance upon DNA damage caused by chemicals or irradiation. Thus, I suggest that the use of "sequestration" be evaluated more critically, considering the misleading ideas that underlie this term. Protein "accumulation" is much more objective and descriptive of the actual facts.

      We thank the reviewer for the suggestion and have addressed it in lines 52, 97 and 324.

      2) Line 88: Out of deference to the original ideas, it would be nice to acknowledge that the inactivation of sex chromosomes and the formation of a sex body in mouse meiosis were described more than 50 years ago (PMID 5833946; 4854664). Likewise, the ideas about the sequential establishment and reinforcement of MSCI during pachytene have been developed over the last 20 years, well before the recent reports cited in the manuscript. Citations to these "old-fashioned" works would be great.

      We appreciate the reviewer’s suggestion and have addressed it in line 86.

      3) Line 90. Please, take into consideration that such a strong effect on meiosis progression occurs mainly in some knockout mice models and that in many other models (including hybrid mice models from natural populations) autosomal regions can remain unsynapsed and accumulate DDR proteins without impairing meiosis. In other mammalian species, meiosis is even more permissive to these MSUC phenomena.

      We appreciate the reviewer’s suggestion and have addressed it at line 88.

      4) Line 211: The differences in the abundance of MLH1 and MLH3 are remarkable. If these two proteins are supposed to form a heterodimer leading to crossover formation, then the increase of only MLH1 might be related to a different process, not leading to crossover (even not class II ones).

      We agree with the reviewer’s comment and have included this point in the discussion (lines 491-497).

      5) Line 217: I have some doubts about the results presented in Supplementary Figure 9. First, it is not clear to me how the cell counts shown were performed. Each dot is supposed to represent the cell counts of a single individual, but how many cells were counted per individual? The proportion of cells could be a better indicator. Second, some B5/B5 individuals' counts were close to those of the wild type. Did mutant animals diverge strongly from one another? It would be great to have each individual's data displayed in a pie chart, and not only the aggregated data.

      We have now addressed this in the legend of the new Supplemental Figure 9. Each dot in the graph represents the total number of cells counted for one individual. We counted cells from 8 mice for each genotype (Topbp1+/+ and Topbp1B5/B5).

      Here we summarize the total cells counted per individual:

      Author response table 1.

      6) Line 222: The data on 53BP1 deserve further attention. On the one side, from the analysis presented in Supplementary Figure 11, it seems that 53BP1 tends to show a lower intensity in Topbp1B5/B5 mice. Since only 2 mice were analyzed, while for most of the other proteins 3-8 animals were studied, I suggest increasing the number of animals analyzed for 53BP1 localization, to test if this slight difference turns significant. This is relevant since: 1) the association of 53BP1 protein in somatic cells was clearly affected, and 2) 53BP1 is one of the last MSCI markers incorporated to the sex body at mid-late pachytene. These results should be moved to the main text and not appear as supplementary data. On the other hand, if no differences were to be found in meiosis, compared to somatic cells, how do authors explain these differences? Would 53BP1 have another partner at the sex body apart from TOPBP1? Could TOPBP1 have other BRCT domains (apart from domain 5) able to bind 53BP1?

      We appreciate the reviewer’s suggestion; however, we had an issue with the 53BP1 antibody. We analyzed 2 mice and then needed to re-order the antibody. It was backordered for almost one year, and when we finally received it, the company had changed the clone, and it no longer worked on meiotic spreads. In somatic cells (HEK-293T), IP-MS and IP-western blot show a partial disruption of 53BP1 binding to TOPBP1 B5. The disruption is only partial because 53BP1 also binds other TOPBP1 domains, such as BRCT 1 and 2 (Bigot et al., 2019; Cescutti et al., 2010; Liu et al., 2017). However, in assays in which we would expect a phenotype caused by impaired 53BP1, such as survival after IR (in mice) and survival after phleomycin challenge (in MEFs), we did not see any effect. Moreover, male and female 53BP1 KO mice are fertile (Ward et al., 2003), so the partial disruption of 53BP1 binding that we observed in the TOPBP1 B5 mutant is likely not the cause of the infertility phenotype.

      7) Line 250: I do not understand what is represented in Figure 5A. Why did the authors combine two different experiments (differences in phosphoprotein abundance in B5/B5 compared to wild type, and the interference of ATR with AZ20)?

      To account for the differences in cell populations observed in whole testis between Topbp1+/+ and Topbp1B5/B5, and to determine which phosphorylation changes were due to disrupted ATR signaling rather than pleiotropic effects, we combined two phosphoproteomes: one comparing Topbp1+/+ and Topbp1B5/B5, and another comparing vehicle-treated and ATR inhibitor-treated mice. With this approach, we only consider hits that were disrupted in both analyses. A similar method was used by Sims et al. (2022).
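      The filtering logic described above, which keeps only phosphosites disrupted in both comparisons, amounts to a set intersection. A minimal Python sketch follows. It assumes each phosphoproteome is summarized as one log2 ratio per site (mutant/wild-type and ATR inhibitor/vehicle) and uses a hypothetical cutoff of -1, which corresponds to a two-fold decrease. Site labels are placeholders. Statistical filtering on replicate data, which a real analysis would include, is omitted.

```python
def shared_decreased_sites(b5_vs_wt, atri_vs_vehicle, log2_cutoff=-1.0):
    """Sites whose log2 ratio is at or below log2_cutoff in BOTH the
    Topbp1 B5/B5 vs wild-type and the ATRi vs vehicle comparisons."""
    down_b5 = {site for site, r in b5_vs_wt.items() if r <= log2_cutoff}
    down_atri = {site for site, r in atri_vs_vehicle.items() if r <= log2_cutoff}
    return sorted(down_b5 & down_atri)


# Hypothetical log2 ratios for placeholder sites.
b5 = {"site_1": -2.0, "site_2": -1.5, "site_3": 0.1}
atri = {"site_1": -1.2, "site_2": 0.3, "site_3": -3.0}
print(shared_decreased_sites(b5, atri))
```

      Requiring agreement between the two datasets trades sensitivity for specificity. A site that is reduced in the mutant only because the mutant testis contains different cell types is unlikely to also drop after acute ATR inhibition.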

      8) It is not clearly explained what is represented in Figure 6B; there is no explanation in the text or the figure legend. Does this represent the difference between scRNAseq in control and Topbp1B5/B5? If so, please clarify.

      We thank the reviewer for the comment and have addressed it in the legend of Figure 6B.

      9) Line 342 and following: The authors describe a decrease in gene silencing. The use of two negative concepts is always confusing, as they amount to a positive one. I suggest simply talking about an increase in gene expression, to make the message clearer.

      We appreciate the reviewer’s point here, but it is important to note that the process disrupted in our mutants is MSCI, which is by definition a gene silencing mechanism. This phenotype is not simply “increased gene expression”; it is the removal of a mechanism that is a key feature of prophase I. Thus, because we are focusing on the mechanism of MSCI, it is crucial to maintain this (albeit unusual) terminology.

      10) As for the classification of spermatocytes into 9 categories, I am curious about which spermatocytes are included in each of these categories. For instance, from cytology it seems that in Topbp1B5/B5 mice, spermatocytes are able to reach mid-late pachytene. However, in the spermatocyte categories established by scRNAseq they only reach class 3. Therefore, which are the populations included in the remaining 6 classes of spermatocytes? Do authors have any morphological correlation to these scRNAseq categories? Is it possible that in this mutant morphological advance of meiosis and gene expression profiles are uncoupled?

      The assignment of cells to a cluster is based on RNA expression, which does not always match cytological features. Moreover, cells with high expression of mitochondrial genes are excluded during the analysis (these are dying cells that do not pass quality control). Thus, while Topbp1B5/B5 spermatocytes reach a mid-late pachytene stage according to cytological analyses, we could detect only one pachytene stage in the single-cell RNA-seq analysis. The remaining 6 spermatocyte categories were classified according to their best-fit gene expression profile, following the classification described by Chen et al., 2018 and Lau et al., 2020: Spermatocytes 3-5 = pachytene, Spermatocytes 6-7 = diplotene, Spermatocytes 8-9 = secondary spermatocytes (metaphase I/II). The gene markers used for this classification are displayed in Author response image 2.

      Author response image 2.

      Genes used as markers of spermatocytes captured in the scRNAseq analysis. Violin plots display the distribution of cells expressing Gm960 (Leptotene marker), Meiob (Leptotene/Zygotene marker), Psma8 (Pachytene marker), Piwil1 (Pachytene marker), Pou5f2 (Diplotene marker), and Ccna1 (Secondary Spermatocytes marker).
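      The best-fit assignment of clusters to stages can be illustrated with the markers listed in Author response image 2. The Python sketch below scores each stage by the mean expression of its marker genes in a cluster's average profile and picks the highest-scoring stage. This scoring rule is a simplification assumed for illustration only. Real analyses typically use larger marker sets and scaled expression values.

```python
from statistics import mean

# Marker genes and stage labels as listed in Author response image 2.
# The scoring rule below is an illustrative assumption.
STAGE_MARKERS = {
    "leptotene": ["Gm960"],
    "leptotene/zygotene": ["Meiob"],
    "pachytene": ["Psma8", "Piwil1"],
    "diplotene": ["Pou5f2"],
    "secondary spermatocyte": ["Ccna1"],
}


def best_fit_stage(cluster_expr, markers=STAGE_MARKERS):
    """Return the stage whose marker genes have the highest mean expression
    in a cluster's average expression profile (missing genes count as 0)."""
    scores = {
        stage: mean(cluster_expr.get(g, 0.0) for g in genes)
        for stage, genes in markers.items()
    }
    return max(scores, key=scores.get)


# Hypothetical average expression for one cluster.
print(best_fit_stage({"Psma8": 3.0, "Piwil1": 2.5, "Meiob": 0.4}))
```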

      11) Figure 6E shows that overexpression of X-linked genes is not a feature of spermatocytes but it is initiated in spermatogonia. This fact has not been properly stated in the text and perhaps not sufficiently highlighted.

      We noticed subtle changes at the spermatogonia stage and have addressed the reviewer’s comment in lines 317-322. However, the downstream analyses of the defect in maintaining X-linked gene silencing, displayed in Figure 7, were based on normalizing each gene's expression to its pre-leptotene level.

      12) Supplementary Figure 24 shows that some X-linked genes are more expressed in Topbp1B5/B5 compared to control mice. In the figure it can be observed that many genes accumulate at the bottom of the graph. Does this have any correlation to the location of these genes along the X chromosome, for instance near or within the PAR? This could correlate with the defects in γH2AX accumulation at this region.

      Genes in the heatmap are ordered by location along the X chromosome. Only the bottom 5 rows are within the PAR, so this accumulation is not specific to the PAR region. The bottom tenth of the genes in the heatmap corresponds to a region of roughly 17 Mb.

      13) The authors only analyzed the overexpression of genes located on the X chromosome. It would be interesting to show the behavior of Y-linked genes as well.

      The coverage of Y-linked genes was not very high and that is why we have not shown the results in the paper. However, the results for Y-linked genes were similar to the X-linked genes and can be visualized in Author response image 3.

      Author response image 3.

      Single cell RNAseq reveals that Topbp1B5/B5 spermatocytes initiate MSCI but fail to promote full silencing of Y chromosome-linked genes. Violin plot displaying the ratio of the average expression of Y chromosome genes by the average expression of chromosome 9 genes at different stages of spermatogenesis for Topbp1+/+ and Topbp1B5/B5 cells.

      14) Line 425: Authors indicate that it is not known if association of TOPBP1 and BLM, 53BP1 or other proteins is disrupted in Topbp1B5/B5 spermatocytes. Could these experiments be performed in the testis, as they were in somatic cells?

      The cellular composition of Topbp1+/+ and Topbp1B5/B5 testes is very different, so such a comparison would not be fair. Although we tried to isolate pachytene cells to perform these experiments, we succeeded only with Topbp1+/+ and not with Topbp1B5/B5, likely because of the extremely small size of the mutant testis.

      15) Line 455 and following. I find that the discussion about the role of SETX is not completely clear. It seems that a failure of SETX function could result in defective or no transcription, as a consequence of the impossibility to resolve RNA-DNA hybrid molecules. Therefore, should impairment of SETX lead to reduced or enhanced transcription? Please clarify. On the other hand, this defect in SETX function should affect the whole genome, and not only sex chromosomes. Do authors have any clues about this broad effect?

      We thank the reviewer for the comment and have expanded the discussion in lines 470-474. We agree with the reviewer that impaired SETX function should affect the whole genome; however, during the pachytene stage, SETX is mostly localized to the sex body. Topbp1B5/B5 shows a specific defect in the maintenance of X and Y silencing during pachytene, so we hypothesized that impaired SETX localization during pachytene would especially affect the X and Y chromosomes.

      16) As a general comment to the discussion section, I think authors could extend into some specific ideas or speculations. It is shocking that sex chromosome-linked genes are able to escape silencing without dismantling the complex (almost complete) MSCI response in the Topbp1 mutant (although perhaps this is not so surprising considering the high number of escapees reported in the inactivated X chromosome in female somatic cells).

      How to explain this paradox? One possibility (which would make a real breakthrough) is that the expression of sex chromosome-linked genes represents a regulated response to meiotic defects, and not just an unfortunate consequence of a defective MSCI. Thus, MSCI might be somehow irrelevant to prevent the execution of this sex chromosome-based program to stop meiosis progression when needed. The fact that this regulated activation was never proposed is perhaps due to the fact that most of the meiosis mutants characterized so far are unable to reach the stage at which MSCI is properly established, which is the most remarkable difference with the Topbp1 mutant studied here.

      Although naïve, the critical point for the activation of this sex chromosome-based program seems to depend simply on the transcription of Zfy1 and Zfy2 (which encode transcription factors). The signaling cascades upstream and downstream of these genes are the real mystery, awaiting further studies.

      We thank the reviewer for raising this very interesting point. Our interpretation of the data is that X and Y silencing is a dynamic process that requires an initiation step and a maintenance step, both driven or controlled by the DDR machinery. In this view, Topbp1B5/B5 shows grossly normal initiation of X and Y silencing but fails to maintain MSCI. Moreover, expression of Zfy1 and Zfy2 has previously been shown to be sufficient to trigger cell death (Royo et al., 2010; Vernet et al., 2016), and Topbp1B5/B5 cells show increased expression of these genes. However, we do not exclude the very interesting possibility, raised by the reviewer, that the expression of XY-linked genes represents a regulated response to meiotic defects that stops meiosis progression, leading to the cell death observed in Topbp1B5/B5. This makes Topbp1B5/B5 a unique model for such studies, as most previously characterized meiosis mutants are unable to reach the stage at which MSCI is properly established. We have added discussion of this exciting point in lines 513-522.

      17) Scale bars are impossible to read in Figures 1I and J, and are missing in all the other image figures. Please, correct.

      We have addressed this in the new Figure 1. For figures displaying meiotic spreads, adding a scale bar is not common practice in the field, because these cells swell during preparation.

      18) Line 828. Since Paula Cohen is an author of the manuscript, it seems weird to acknowledge herself in this section.

      Corrected.

      References

      Adams SR, Maezawa S, Alavattam KG, Abe H, Sakashita A, Shroder M, Broering TJ, Sroga Rios J, Thomas MA, Lin X, Price CM, Barski A, Andreassen PR, Namekawa SH. 2018. RNF8 and SCML2 cooperate to regulate ubiquitination and H3K27 acetylation for escape gene activation on the sex chromosomes. PLoS Genet 14. doi:10.1371/journal.pgen.1007233

      Bigot N, Day M, Baldock RA, Watts FZ, Oliver AW, Pearl LH. 2019. Phosphorylation-mediated interactions with TOPBP1 couple 53BP1 and 9-1-1 to control the G1 DNA damage checkpoint. Elife 8:1–28.

      Cescutti R, Negrini S, Kohzaki M, Halazonetis TD. 2010. TopBP1 functions with 53BP1 in the G1 DNA damage checkpoint. EMBO J 29:3723–3732.

      Chen Y, Zheng Y, Gao Y, Lin Z, Yang S, Wang T, Wang Q, Xie N, Hua R, Liu M, Sha J, Griswold MD, Li J, Tang F, Tong M-H. 2018. Single-cell RNA-seq uncovers dynamic processes and critical regulators in mouse spermatogenesis. Cell Res 28:879–896.

      Hirota T, Blakeley P, Sangrithi MN, Mahadevaiah SK, Encheva V, Snijders AP, ElInati E, Ojarikre OA, de Rooij DG, Niakan KK, Turner JMA. 2018. SETDB1 Links the Meiotic DNA Damage Response to Sex Chromosome Silencing in Mice. Dev Cell 47:645-659.e6.

      Ichijima Y, Ichijima M, Lou Z, Nussenzweig A, Daniel Camerini-Otero R, Chen J, Andreassen PR, Namekawa SH. 2011. MDC1 directs chromosome-wide silencing of the sex chromosomes in male germ cells. Genes and Development 25:959–971.

      Lau X, Munusamy P, Ng MJ, Sangrithi M. 2020. Single-Cell RNA Sequencing of the Cynomolgus Macaque Testis Reveals Conserved Transcriptional Profiles during Mammalian Spermatogenesis. Dev Cell 54:548-566.e7.

      Liu Y, Cussiol JR, Dibitetto D, Sims JR, Twayana S, Weiss RS, Freire R, Marini F, Pellicioli A, Smolka MB. 2017. TOPBP1Dpb11 plays a conserved role in homologous recombination DNA repair through the coordinated recruitment of 53BP1Rad9. J Cell Biol 216:623–639.

      Modzelewski AJ, Holmes RJ, Hilz S, Grimson A, Cohen PE. 2012. AGO4 regulates entry into meiosis and influences silencing of sex chromosomes in the male mouse germline. Dev Cell 23:251–264.

      Royo H, Polikiewicz G, Mahadevaiah SK, Prosser H, Mitchell M, Bradley A, De Rooij DG, Burgoyne PS, Turner JMA. 2010. Evidence that meiotic sex chromosome inactivation is essential for male fertility. Curr Biol 20:2117–2123.

      Sims JR, Faça VM, Pereira C, Ascenção C, Comstock W, Badar J, Arroyo-Martinez GA, Freire R, Cohen PE, Weiss RS, Smolka MB. 2022. Phosphoproteomics of ATR signaling in mouse testes. Elife 11. doi:10.7554/eLife.68648

      Vernet N, Mahadevaiah SK, de Rooij DG, Burgoyne PS, Ellis PJI. 2016. Zfy genes are required for efficient meiotic sex chromosome inactivation (MSCI) in spermatocytes. Hum Mol Genet 25:5300–5310.

      Ward IM, Minn K, van Deursen J, Chen J. 2003. p53 Binding protein 53BP1 is required for DNA damage responses and tumor suppression in mice. Mol Cell Biol 23:2556–2563.

      Yeo AJ, Becherel OJ, Luff JE, Graham ME, Richard D, Lavin MF. 2015. Senataxin controls meiotic silencing through ATR activation and chromatin remodeling. Cell Discovery 1. doi:10.1038/celldisc.2015.25

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewers’ Comments:

      Reviewer #1 (Remarks to the Author):

      Summary:

      Fang Huang et al. found, using clinical samples as well as cell-based and tail vein injection models, that RBM7 deficiency promotes breast cancer metastasis by coordinating an MFGE8 splicing switch with the NF-κB pathway.

      Strengths:

      This study uncovers a previously uncharacterized role of MFGE8 splicing alteration in breast cancer metastasis, and provides evidence supporting RBM7 function in splicing regulation. These findings facilitate the mechanistic understanding of how splicing dysregulation contributes to metastasis in cancer, a direction that has increasingly drawn attention recently, and provides a potentially new prognostic and therapeutic target for breast cancer.

      We thank the reviewer for appreciating the novelty and importance of this study, and have provided new data to address the following concerns raised by the reviewer.

      Weaknesses:

      This study can be strengthened in several aspects by additional experiments or at least by further discussion. First, how RBM7 regulates NF-κB, and how it coordinates splicing with its canonical function as a component of the NEXT complex, should be clarified. Second, although the roles of the MFGE8 splicing isoforms in cell migration and invasion have been demonstrated in transwell and wound healing assays, it would be more convincing to explore their roles in vivo, for example in the tail vein injection model. Third, the clinical significance would be considerably improved if the therapeutic value of targeting MFGE8 splicing could be demonstrated.

      We’re thankful for the constructive suggestions. A preliminary study of the mechanism by which RBM7 regulates the NF-κB pathway is already underway. We found that RBM7 depletion markedly increased the expression of IL-1β, as judged by qPCR and ELISA assays (new Figure S5G-S5I, also see below). IL-1β, a well-known pro-inflammatory cytokine, binds to IL-1R and initiates a multistep enzymatic cascade that activates the NF-κB pathway (Weber et al., 2010; Guo et al., 2024). We therefore speculate that upregulation of IL-1β might be a causal factor in the activation of NF-κB signaling induced by RBM7 depletion. It will be interesting to determine the complete molecular mechanism in future studies. In addition, a co-IP experiment showed that RBM7 can interact with the RNA splicing factor SF3B2, a component of the spliceosomal U2 snRNP complex (new Figure S6B, also see below). Consistent with the regulation of MFGE8 alternative splicing (AS) by RBM7, depletion of SF3B2 also promoted exon7 skipping, implying that the two proteins cooperate in regulating MFGE8 splicing (new Figure S6C-S6D, also see below). This agrees with a previous study showing that the RRM domain of RBM7 binds a proline-rich segment within SF3B2 (Falk et al., 2016). Because this interaction mode closely resembles the RBM7(RRM)–ZCCHC8(proline) interaction in the NEXT complex, SF3B2 and ZCCHC8 likely bind RBM7 in a mutually exclusive manner. Thus, RBM7 appears to play dual, but not conflicting, roles in RNA processing, depending on whether it interacts with the spliceosome or the exosome (see lines 427-437 in the new manuscript).

      Author response image 1.

      The mRNA levels of IL-1β in MDA-MB-231 or BT549 cells with stable RBM7 knockdown or control vector were examined by qRT-PCR approach.

      Author response image 2.

      Supernatants from RBM7-knockdown MDA-MB-231 or BT549 cells were collected and protein expression of IL-1β was measured by ELISA kit.

      Author response image 3.

      The knockdown efficiency of RBM7 in two breast cancer cell lines was determined by qRT-PCR.

      Author response image 4.

      Immunoprecipitation assay was performed in breast cancer cells expressing HA-RBM7 and Flag-SF3B2 or empty vector. The Flag-tagged precipitated complexes and lysates were analyzed through western blotting.

      Author response image 5.

      The splicing shift of MFGE8 upon SF3B2 knockdown in breast cancer cells was examined by RT-PCR approach. The mean ± SD of PSI values derived from three independent replicates is shown.

      Author response image 6.

      The SF3B2 knockdown efficiency was examined by qRT-PCR.

      To further corroborate the roles of the two MFGE8 isoforms in cell invasion, we performed fluorescent gelatin degradation assays to investigate invadopodia formation. Consistent with the transwell assay results, MFGE8-L up-regulation suppressed the invasion of breast cancer cells through a layer of extracellular matrix, whereas breast cancer cells ectopically expressing MFGE8-S acquired an enhanced ability to degrade matrix and invade (new Figure 5B, also see below). In addition, to determine the therapeutic value of targeting MFGE8 splicing, we transfected triple-negative breast cancer cells with ASOs targeting the RBM7-binding motif and examined the impact on cell aggressiveness. Compared with the scrambled negative control ASOs, the splice-targeting ASOs clearly increased the exon7-skipped variant of MFGE8, and the migratory and invasive ability of the treated breast cancer cells was significantly enhanced (new Figure 6B and S5B, also see below). These results further suggest that the increased aggressiveness induced by RBM7 knockdown relies, at least partially, on the MFGE8 splicing switch.

      Author response image 7.

      A gelatin degradation assay was performed to test the effect of RBM7 knockdown on invadopodia function. A total of 10,000 cells were plated onto FITC-gelatin substrates (green) and cultured for 48 h. Representative images are shown (red, Cy3-phalloidin; blue, DAPI), and the degraded areas were quantified using ImageJ software. Scale bars = 50 μm. P values were determined by one-way ANOVA with Tukey's multiple comparison test (n = 3).

      Author response image 8.

      Representative transwell analysis of the migratory/invasive capability of breast cancer cells transfected with 500 nM ASO directed against the RBM7-binding region in MFGE8 pre-mRNA. P values were determined by one-way ANOVA with Tukey's multiple comparison test.

      Author response image 9.

      RT-PCR quantification of two MFGE8 isoforms after transfecting breast cancer cells with 500 nM ASO directed against RBM7-binding region in MFGE8 pre-mRNA. P values were calculated by one-way ANOVA with Tukey's multiple comparison test.

      The minor concerns

      (1) Several figure legends do not match the images, for example Figure 2K, Figure 4, and Figures 7D and 7E, and the description of Figure 7F is missing from the text.

      As suggested by the reviewer, we have checked all of the figure legends carefully and corrected all mismatches.

      (2) The statistical methods for Figure 1A and Figure 1B should be indicated.

      As suggested by the reviewer, we have included the statistical methods for Figures 1A and 1B in the Figure 1 legend. Data in Figures 1A and 1B are presented as means ± SD, and P values were obtained by the Mantel-Cox log-rank test.

      (3) The molecular weight of the proteins in the Western Blot images should be marked.

      As suggested by the reviewer, we have added the molecular weights of the proteins to all of the western blot images.

      (4) The sequences where RBM7 binds on MFGE8 RNA should be clearly indicated.

      We thank the reviewer for this question. We analyzed the sequence of alternative exon 7 and the motifs near its 5’ and 3’ splice sites, and found two potential RBM7-binding motifs positioned proximal to the pseudo 3’ splice site. RT-PCR of the RNA precipitated in RIP assays confirmed that RBM7 binds the upstream sequence containing 5’-UUUCUU-3’ motifs adjacent to the intron6/exon7 junction of the MFGE8 cassette exon, but not another nearby region. To pinpoint the cis-element through which RBM7 regulates AS, we designed antisense oligonucleotides (ASOs) to block the potential RBM7-binding sites (UUUCUU). As shown in revised Figure 4F, the targeting ASOs promoted exclusion of exon7 compared with the scrambled ASO. Additionally, we constructed an exogenous MFGE8 splicing reporter containing exons 6-8 and partial intron sequences to determine the binding site through which RBM7 regulates AS. Depletion of RBM7 still induced a splicing shift of the minigene reporter, elevating the MFGE8-S variant. When the UUUCUU binding motif was removed or mutated, RBM7 failed to affect the splicing outcome of MFGE8 (new Figure S3C, also see below). Given their close proximity to the 3’ splice site, the UUUCUU residues bound by RBM7 are very likely to participate in spliceosome assembly at the 3’ splice site upstream of exon7, which may explain why disrupting the motif led to almost complete exon7 skipping. Together, these data suggest that RBM7 regulates MFGE8 exon skipping by binding to the UUUCUU motif located six nucleotides upstream of the 3’ splice site of exon7.

      Author response image 10.

      Upper: the red line in the diagram indicates the region targeted by the ASOs, which contains UUUCUU. Lower: MCF7 and MDA-MB-231 cells were transfected with ASOs targeting MFGE8 pre-mRNA for 48 h and then analyzed by RT-PCR. P values were determined by one-way ANOVA with Tukey's multiple comparison test.

      Author response image 11.

      Upper: schematic of the MFGE8 minigene splicing reporters carrying mutations in the RBM7-binding site or in a non-specific binding region. Lower: RT-PCR assays were performed to identify the splicing outcomes of the MFGE8 reporters when RBM7 was depleted in breast cancer cells.

      (5) Some typos, graphic errors, and sentences are hard to understand and need to be corrected, such as lines 80-81, 249-250, line 221 "motfs", line 319 "RBM4". Please carefully proofread and revise the entire manuscript.

      As suggested by the reviewer, we have corrected typos and graphic errors mentioned above. In addition, this manuscript was also extensively edited to improve grammar and sentence structure.

      (6) Define the abbreviations when they first appear, such as MFGE8-L, RBM, etc.

      We thank the reviewer for raising this point. We have defined each abbreviation at its first use in the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors reported the biological role of RBM7 deficiency in promoting metastasis of breast cancer. They further used a combination of genomic and molecular biology approaches to discover a novel role of RBM7 in controlling alternative splicing of many genes in cell migration and invasion, which is responsible for the RBM7 activity in suppressing metastasis. They conducted an in-depth mechanistic study on one of the main targets of RBM7, MFGE8, and established a regulatory pathway between RBM7, MFGE8-L/MFGE8-S splicing switch, and NF-κB signaling cascade. This link between RBM7 and cancer pathology was further supported by analysis of clinical data.

      Strengths:

      Overall, this is a very comprehensive study with lots of data, and the evidence is consistent and convincing. Their main conclusion was supported by many lines of evidence, and the results in animal models are pretty impressive.

      Weaknesses:

      However, there are some controls missing, and the data presentation needs to be improved. The writing of the manuscript needs some grammatical improvements because some of the wording might be confusing.

      We thank the reviewer for the positive comments on this work, and have addressed all the concerns raised by the reviewer.

      Specific comments:

      (1) Figure 2. The figure legend is missing for Figure 2C, which caused many mislabels in the rest of the panels. The labels in the main text are correct, but the authors should check the figure legend more carefully. Also in Figure 2C, it is not clear why the authors choose to examine the expression of this subset of genes. The authors only refer to them as "a series of metastasis-related genes", but it is not clear what criteria they used to select these genes for expression analysis.

      We thank the reviewer for raising this question. We have included the figure legend for Figure 2C and improved the other figure legends throughout the article. Regarding the second question, gene ontology analysis of RNA-seq data from RBM7-depleted breast cancer cells showed that a series of differentially expressed genes were enriched in metastasis-associated processes. We therefore examined the expression of this subset of genes in breast cancer cells with or without RBM7 depletion by qRT-PCR and displayed the differences as a heatmap. To clarify this point and address the reviewer’s concern, we have improved the relevant description in this part (see lines 174-180 in the new manuscript).

      (2) Line 218-220. The comparison of PSI changes in different types of AS events is misleading. Because these AS events are regulated in different mechanisms, they cannot draw the conclusion that "the presence of RBM7 may promote the usage of alternative splice sites". For example, the regulators of SE and IR may even be opposite, and thus they should discuss this in different contexts. If they want to conclude this point, they should specifically discuss the SE and A5SS rather than draw an overall conclusion.

      We are thankful for the reviewer’s valuable comment. As suggested, we have removed the overall conclusion and now discuss SE and A5SS events separately.

      (3) In the section starting at line 243, they first referred to the gene and isoforms as "EFG-E8" or "EFG-E8-L", but later used "EFGE8" and "EFGE8-L". Please be consistent here. In addition, it will be more informative if the authors add a diagram of the difference between two EFGE8 isoforms in terms of protein structure or domain configuration.

      As suggested by the reviewer, we now consistently use the name “MFGE8-L” for the canonical MFGE8 isoform and “MFGE8-S” for the truncated isoform throughout the manuscript. In addition, to clarify the structural basis for the different invasion-related functions of the two MFGE8 isoforms, we have included a diagram of their domain configuration in new Figure S4F and their predicted protein structures in new Figure S4G. The details are given below:

      Author response image 12.

      Schematic diagram of the domain composition of the two MFGE8 isoforms. Upper: the full-length variant, with exon7 indicated by a yellow square; lower: the truncated variant with exon7 skipped.

      Author response image 13.

      The structures of the two MFGE8 isoforms were modeled using SWISS-MODEL. The F5/8 type C2 domain excluded from the MFGE8-S variant is marked in red.

      (4) Figure 7B and 7C. The figures need quantification of the inclusion of MFGE8 exon7 (PSI value) in addition to the RT-PCR gel. The difference seems to be small for some patients.

      As suggested by the reviewer, we have included the relative quantification of PSI for endogenous MFGE8 in breast cancer patients and found an increased proportion of exon7 exclusion in most tumor samples compared with normal tissues (case#1: 86:94; case#2: 84:86; case#3: 79:85; case#4: 63:75; case#5: 69:93; case#6: 71:80) (new Figure 7B, also see below). In addition, we have expanded the number of metastatic breast cancer cases and quantified the AS events within MFGE8 by analyzing PSI values. Lymph node metastases contain a higher proportion of the MFGE8 variant with skipped exon7 than paired primary tumor tissues (case#1: 80:95; case#2: 86:97; case#3: 84:90; case#4: 70:78; case#5: 83:89) (Figure 7C). This is consistent with the decreased RBM7 expression found in breast cancers with lymph node metastasis.
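      As a reading aid for the PSI figures above: PSI ("percent spliced in") for a cassette exon is the share of transcripts that include the exon. The sketch below assumes PSI is computed from two background-subtracted RT-PCR band intensities as inclusion / (inclusion + skipping) × 100, with no correction for amplicon length; that formula and the function names are illustrative assumptions, not the authors' stated ImageJ workflow. It then computes per-case ΔPSI from the pairs reported above, assuming the first number of each pair is the tumor (or metastasis) value, which matches the direction described in the text.

```python
def psi(inclusion_band: float, skipping_band: float) -> float:
    """Percent spliced in for a cassette exon from two RT-PCR band intensities.

    Assumes background-subtracted intensities and no amplicon-length correction.
    """
    total = inclusion_band + skipping_band
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return 100.0 * inclusion_band / total


# PSI pairs copied from the response above, read (by assumption) as
# (tumor, adjacent normal) and (lymph node metastasis, paired primary tumor).
tumor_vs_normal = {1: (86, 94), 2: (84, 86), 3: (79, 85), 4: (63, 75), 5: (69, 93), 6: (71, 80)}
metastasis_vs_primary = {1: (80, 95), 2: (86, 97), 3: (84, 90), 4: (70, 78), 5: (83, 89)}


def delta_psi(pairs):
    """Per-case delta PSI (first minus second); a negative value means more exon7 skipping."""
    return {case: first - second for case, (first, second) in pairs.items()}


if __name__ == "__main__":
    print(psi(3.0, 1.0))  # 75.0
    print(delta_psi(tumor_vs_normal))
    print(delta_psi(metastasis_vs_primary))
```

      Read this way, every listed case has a negative ΔPSI, in line with the reported shift toward the exon7-skipped MFGE8-S variant.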

      Author response image 14.

      The splicing alteration of MFGE8 in 6 pairs of primary breast cancer tissues and adjacent normal tissues was examined using RT-PCR. PSI values were quantified from relative band intensities using ImageJ software.

      Author response image 15.

      The splicing alteration of MFGE8 in primary breast cancer tissues and corresponding lymph node metastases was identified by RT-PCR assays. PSI values were quantified using ImageJ software.

      Minor comments:

      The writing in many places is a little odd or somewhat confusing. I am listing some examples, but the authors need to polish the whole manuscript further to improve the writing.

      1. Line 169-170, "...followed by profiling high-throughput transcriptome by RNA sequencing", should be "followed by high-throughput transcriptome profiling with RNA sequencing".

      2. Line 170, "displayed a wide of RBM7-regulated genes were enriched...", they should add a "that" after the "displayed" as the sentence is very long.

      3. Line 213, "PSI (percent splicing inclusion)" is not correct; PSI stands for "percent spliced in".

      4. Line 216-217, the sentence is long and fragmented; they should break it into two sentences.

      5. Line 224, the "tethering" should be changed to "recognizing". There is a subtle difference in the mechanistic implication between these two words.

      6. Line 250, should be changed to "...in the ratio of two MFGE8 isoforms".

      We thank the reviewer for these detailed comments. The points mentioned above have been addressed one by one, and the manuscript was also extensively edited to improve grammar and sentence structure.

      References

      Weber A, Wasiliew P, Kracht M. 2010. Interleukin-1 (IL-1) pathway. Sci Signal.

      Guo Q, Jin Y, Chen X, Ye X, Shen X, Lin M, Zeng C, Zhou T, Zhang J. 2024. NF-κB in biology and targeted therapy: new insights and translational implications. Signal Transduct Target Ther.

      Falk S, Finogenova K, Melko M, Benda C, Lykke-Andersen S, Jensen TH, Conti E. 2016. Structure of the RBM7–ZCCHC8 core of the NEXT complex reveals connections to splicing factors. Nat Commun.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      Balasubramanian et al. characterized the cell types comprising mouse Schlemm's canal (SC) using bulk and single-cell RNA sequencing (scRNA-seq). The results identify expression patterns that delineate the SC inner and outer wall cells and two inner wall 'states'. Further analysis demonstrates expression patterns of glaucoma-associated genes and receptor-ligand pairs between SECs and the neighboring trabecular meshwork. 

      Strengths: 

      While mouse SC has been profiled in previous scRNA-seq studies (van Zyl et al 2020, Thomson et al 2021), these data provide higher resolution of SC cell types, particularly endothelial cell (SEC) populations. SC is an important regulator of anterior chamber outflow and has important consequences for glaucoma. 

      We thank the reviewers for their thorough reading of our manuscript and their insightful comments.

      Weaknesses: 

      (1) Since SC has previously been characterized in mouse, human, and other species by scRNA-seq in other studies, this study would benefit from more direct comparisons to published datasets. For example, Table 4 could be expanded to list the SC cell numbers profiled in each study. Expression patterns highlighted in this study could be independently verified by plotting in publicly available mouse SC datasets. Further, a comparison to human expression patterns would assess whether type-specific expression patterns are conserved. Alternatively, an integrated analysis could be performed. Indeed, the authors mention that an integrated analysis was attempted but the data is not shown. It is unclear if this was because of a lack of agreement between datasets or other reasons.

      Table 4 now includes an expanded list of SC cell numbers in each study. We profiled the expression of Npnt, Selp, and Ccl21a in the Thomson et al., 2021 dataset and have included the concurring results in Figure S5. We were unable to perform a similar analysis using the van Zyl et al., 2020 dataset because of its small number of SC cells. As previously mentioned, differences such as read depth, strain of animals used (including pigmented vs albino), method of cell isolation (including drug exposure), and number of cells profiled pose a significant impediment to integration with previously published datasets. A comparison with human atlases is a focus of future work.

      (2) Figure 1 presents bulk RNA seq results comparing SEC, BEC, and LEC expression patterns. These populations were isolated using cell surface markers and enrichment by FACS. Since each EC population is derived from the same sample, the accuracy of this data hinges on the purity of enrichment. However, a reference is not given for this method and it is not clear how purity was validated. The authors later note that marker Emcn, which was used to identify BECs, is also expressed in SECs and LECs at lower levels. It should be demonstrated that these populations are clearly separated by flow cytometry. 

      We have added the following clarifying text to the methods section: Forward and side scatter gates were first used to eliminate events with low scatter, which include debris, cell fragments and pyknotic cells. Propidium iodide-positive dead cells were then gated out. Further gating was applied to the viable cells such that distinct populations were isolated: a) SECs: GFP+ Lyve1-, b) LECs: GFP+ Lyve1+, c) BECs: GFP- Endomucin+.
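      The gating order described in this methods text can be expressed as a simple decision procedure. This is a minimal sketch under stated assumptions: each marker is reduced to a positive/negative call (real gates are intensity thresholds set on the cytometer), and the function name `classify_event` is hypothetical. Note that the SEC and LEC gates do not depend on Endomucin, which is consistent with the reviewer's observation that Emcn is also expressed at lower levels in SECs and LECs.

```python
from typing import Optional


def classify_event(gfp_pos: bool, lyve1_pos: bool, endomucin_pos: bool,
                   passes_scatter: bool = True, pi_pos: bool = False) -> Optional[str]:
    """Assign one sorted event to an endothelial population (hypothetical helper).

    Marker calls are assumed to be binary; returns None for excluded events.
    """
    # Step 1: forward/side scatter gate removes debris, fragments and pyknotic cells.
    if not passes_scatter:
        return None
    # Step 2: propidium iodide-positive events are dead cells and are excluded.
    if pi_pos:
        return None
    # Step 3: marker gates applied to viable cells only.
    if gfp_pos and not lyve1_pos:
        return "SEC"  # GFP+ Lyve1-
    if gfp_pos and lyve1_pos:
        return "LEC"  # GFP+ Lyve1+
    if not gfp_pos and endomucin_pos:
        return "BEC"  # GFP- Endomucin+
    return None  # viable, but outside all three gates
```

      For example, a viable GFP+ Lyve1- event is called an SEC regardless of its Endomucin status, whereas a GFP- Endomucin+ event is called a BEC.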

      Below is a representative flow sort showing the clear separation of SECs and LECs.

      Author response image 1.

      Flow-sorted SECs and LECs. We obtained two distinct populations: 1. SECs (GFP+ LYVE1-, blue); 2. LECs (GFP+ LYVE1+, red). Note that eFluor 660 emission was collected using the Alexa647 (A647) setting of the flow cytometer. Additionally, SEC marker expression from bulk RNA-seq aligns with the signature gene expression of SECs in single-cell RNA-seq (Figure S3).

      (3) Bulk RNA-seq analysis infers similarity from the number of DEGs between samples, however, this is not a robust indicator. A correlation analysis should be run to verify conclusions. 

      We have provided a heatmap with hierarchical clustering based on Euclidean distance of the EC subtypes (Figure 1B) analyzed by bulk RNA seq in addition to number of DE genes between subtypes.

      (4) Figures 2-4 present three different datasets targeting the same tissue: 1) C57bl/6j scRNA-seq, 2) C57bl/6j snRNA-seq, 3) 129/sj scRNA-seq. Integrated analysis comparing datasets #1 to #2 and #3 is also presented. Integration methods are not described beyond 'normalization for cell numbers'. It is unclear if additional alignment methods were used. Integration across each of these datasets needs careful consideration, especially since different filtering methods were used (e.g. <20% mito in scRNA-seq and <5% in snRNA-seq). Improper integration could affect the ability to cluster or exaggerate differences between cell/types and states. It would be useful to demonstrate the contribution of different samples and datasets to each cell type/state to verify that these are not driven by batch effects, mouse strain, or collection platform. 

      We agree that integration should be performed with careful consideration of confounding factors. We now show the contribution of different samples and datasets (new panels in Figures 3C and 4C), demonstrating that our datasets integrated well and that the contributions to each cell type/state were uniformly distributed across methods (C57BL/6J single-cell and single-nucleus) and backgrounds (C57BL/6J and 129/Sj); thus, these cell types/states are not a result of integration.
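      Two checks in this exchange lend themselves to a short sketch: the modality-specific mitochondrial-read filters quoted by the reviewer (<20% for scRNA-seq, <5% for snRNA-seq) and a per-cluster tabulation of how much each dataset contributes, which is the kind of summary the new panels show. The code below is a minimal, illustrative pure-Python version under stated assumptions: per-cell counts are held as plain dicts keyed by gene symbol, mouse mitochondrial genes carry the "mt-" prefix, and the function names and toy labels are hypothetical. It is not the authors' pipeline, which would normally operate on count matrices in a dedicated single-cell toolkit.

```python
from collections import Counter, defaultdict

# Mitochondrial-read cutoffs quoted in the reviewer's comment, keyed by modality.
MITO_CUTOFF_PERCENT = {"scRNA-seq": 20.0, "snRNA-seq": 5.0}


def percent_mito(gene_counts, mito_prefix="mt-"):
    """Percentage of a cell's counts that come from mitochondrial genes."""
    total = sum(gene_counts.values())
    if total == 0:
        return 0.0
    mito = sum(n for gene, n in gene_counts.items() if gene.startswith(mito_prefix))
    return 100.0 * mito / total


def passes_mito_qc(gene_counts, modality):
    """Keep a cell only if its mitochondrial fraction is below the modality cutoff."""
    return percent_mito(gene_counts) < MITO_CUTOFF_PERCENT[modality]


def cluster_composition(cells):
    """Fraction of each cluster contributed by each dataset.

    `cells` is an iterable of (cluster_label, dataset_label) pairs, one per cell.
    """
    counts = defaultdict(Counter)
    for cluster, dataset in cells:
        counts[cluster][dataset] += 1
    return {
        cluster: {dataset: n / sum(per_dataset.values()) for dataset, n in per_dataset.items()}
        for cluster, per_dataset in counts.items()
    }
```

      With these toy definitions, a cell with 10 of 100 counts from "mt-" genes passes the scRNA-seq filter but fails the snRNA-seq one. A cluster whose composition roughly mirrors the overall dataset proportions is less likely to be a batch artifact than one drawn almost entirely from a single dataset.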

      (5) IW1 and IW2 are not well separated, and it is unclear if these represent truly different cell states. Figure 5b shows the staining of CCL21A and describes expression in the 'posterior portion' but in the image there are no DAPI+ nuclei in the anterior portion, suggesting the sampling in this section is different from Figure 5a. This would be improved by co-staining NPNT and CCL21A to demonstrate specificity. 

      Because both of our antibodies were raised in the same species (goat), co-labeling was not possible. To be prudent, we used adjacent sections, flat-mounts, and RNAscope, and we provide further evidence of the anterior/posterior “bias” in the supplemental figures.

      (6) The substructures observed within clusters in sc/snRNA-seq data suggest that overall profiling may still not be comprehensive. This should be noted in the discussion. 

      We agree and have added this note in the discussion: “With greater sampling and deeper transcriptomic depth, it is likely that additional SEC cell states/types will be identified.”

      Reviewer #2 (Public Review):

      Summary: 

      This article has characterized the mouse Schlemm's canal expression profile using a comprehensive approach based on sorted SEC, LEC, and BEC total RNA-Seq, scRNA-Seq, and snRNA-Seq to enrich the selection of SECs. The study has successfully profiled genome-wide gene expression using sorted SECs, demonstrating that SECs have a closer similarity to LECs than BECs. The combined scRNA- and snRNA-Seq data with deep coverage of gene expression led to the successful identification of many novel biomarkers for inner wall SECs, outer wall SECs, collector channel ECs, and pericytes. In addition, the study also identified two novel states of inner wall SECs separated by new markers. The study provides significant novel information about the biology and expression profile of SECs in the inner and outer walls. It is of great significance to have this novel, convincing, and comprehensive study led by leading researchers published in this journal. 

      Strengths: 

      This is a comprehensive study using various data to support the expression characterization of mouse SECs. First, the study profiled genome-wide expression using sorted SECs, LECs, and BECs from the same tissue/organ to identify the similarities and differences among the three types of cells. Second, snRNA-Seq was applied to enrich the number of SECs from mouse ocular tissues significantly. Increased sampling of SECs and other cells led to more comprehensive coverage and characterization of cells, including pericytes. Third, the combined scRNA- and snRNA-Seq data analyses increase the power to further characterize the subtle differences within SECs, leading to identifying the expression markers of Inner and Outer wall SECs, collector channel ECs, and distal region cells. Fourth, the identified unique markers were validated for RNA and protein expression in mouse ocular tissues. Fifth, the study explored how the IOP- and glaucoma-associated genes are expressed in the ScRNA- and snRNA-Seq data, providing potential connections of these GWAS genes with IOP and glaucoma. Sixth, the initial pathway and network analyses generated exciting hypotheses that could be tested in other independent studies. 

      We thank the reviewer for their comments on the strengths of this study.

      Weaknesses: 

      A few minor weaknesses have been noted. First, since snRNA-Seq and scRNA-Seq generated different coverage of expressed genes in the cells, how did the combined analyses balance the un-equal sequencing coverage and missing data points in the snRNA-Seq data? Second, the RNA/protein validation of selected SEC molecular markers was done using mouse anterior segment tissues. It would be more helpful to examine whether these molecular markers for SECs could work well in human SECs. Third, the effort to characterize the GWAS-identified IOP- and glaucoma-associated genes is exciting but with limited new information. Additional work could be performed to prioritize these genes.

      Integration of scRNA-seq and snRNA-seq data: We have addressed a similar integration question from Reviewer 1 and have now included a plot showing the distribution of cells after integration. Integration methods are not perfect and generally result in some loss of data, especially when datasets with unequal sequencing coverage are integrated. However, we did not observe any obvious differences between the original (unintegrated) and integrated datasets. We also noted that the contributions to each cell type/state were similarly distributed across methods (C57BL/6J single-cell and single-nucleus) and backgrounds (C57BL/6J and 129/Sj), and that the clustering was not a result of batch effects.

      We agree about the human relevance of SEC markers, and this will be a focus of future work.

      Another focus of our future work is to understand how GWAS identified IOP and glaucoma genes change in disease states.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Minor: 

      (1) Figure 5- DAPI should be listed in the legend. 

      (2) Figure 5- It would be helpful to label IW1 and IW2 regions in the UMAPs. 

      We have incorporated the suggestions in Figure 5 and legend.

      Reviewer #2 (Recommendations For The Authors): 

      (1) The study has validated RNA/protein expression of the selected biomarkers for IW/OW SECs in mouse eyes. It would be more helpful to confirm that these newly identified molecular biomarkers for SECs could apply to human eyes. This could be examined through available human scRNA-/snRNA-Seq data or targeted RNA and protein staining experiments. The additional validation in human SECs would make the current discovery more convincing. 

      We agree that validation in human samples is important; this is within the scope of future work.

      (2) The combination of scRNA-Seq and snRNA-Seq from three batches of experiments increased the statistical power to identify subtypes of SECs. It would be helpful to include more details on how the qc, missing data, and normalization across different batches were dealt with. 

      We have incorporated more details in the methods section of the paper.

      (3) The authors explored the underlying molecular connection between the newly identified IOP/glaucoma-associated genes using the newly generated SEC-targeted scRNA/snRNA-Seq data. Many of these associated genes were present in the same SEC cells. It would be interesting to see how many of these genes' expression levels are correlated with each other via a network. These potential correlation networks across SECs could lead to identifying novel upstream regulators or network hubs, which could target many IOP-associated genes for future studies. 

      We agree with the importance of a correlation network analysis, but this is a focus of future work, especially in normal and disease states.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper examines patterns of diversity and divergence in two closely related sub-species of Zea mays. While the data are interesting and the authors have tried to exclude multiple confounding factors, many patterns cannot clearly be ascribed to one cause or another.

      Strengths:

      The paper presents interesting data from sets of sympatric populations of the two sub-species, maize and teosinte. This sampling offers unique insights into the diversity and divergence between the two, as well as the geographic structure of each. Many analyses and simulations to check analyses have been carried out.

      Weaknesses:

      The strength of conclusions that can be drawn from the analyses was low, partly because there are many strange patterns. The authors have done a good job of adding caveats, but clearly, these species do not meet many assumptions of our methods.

      Thank you for the comments. We appreciate the multiple rounds of revision the manuscript has undergone, and the work has improved as a consequence. Overall, we disagree that the patterns are strange, and we have made considerable efforts, in the text and in our responses, to explain why the patterns make sense given what previous research has shown about the history of Zea mays. We agree that currently available methods cannot adequately answer all of the questions we pose. This reflects both limitations of the data available for these populations (i.e., phenotypes and spatially explicit sampling) and limitations of the methods available for the questions at hand (e.g., spatially explicit inference of the range over which an allele is adaptive). We have made considerable effort to point out where our inferences are likely to have low accuracy or limited resolution. These limitations are in many ways inherent to all inference-based science and should not be considered a weakness specific to this work, nor do they detract from the fundamental conclusions, which have changed quantitatively but not qualitatively over the course of peer review.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      -The manuscript should say something about the fact that range-wide PSMC does not show a decline.

      We did not use PSMC methods but instead mushi as described in the methods. On line 356 we described how the lower sample size and strong regularization are the most likely explanations for the lack of a population size decline in the rangewide samples.

      - The manuscript should explain how rdmc was run and what "overlapping" means.

      We describe how sweep intervals were inferred starting on line 823 (Methods subsection “Identifying Selective Sweeps”). Sweep regions were defined by the outermost coordinates of the sweep intervals from all populations whose respective intervals shared any overlap. The details of how we ran rdmc, including all of the parameters, are described starting on line 895 (Methods subsection “Inferring modes of convergent adaptation”).
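      The merging rule described above can be sketched as follows. This is a minimal illustration, not the authors’ pipeline: it assumes each population’s sweep calls are (start, end) base-pair intervals on one chromosome, and the function names, population labels, and the 50,000 bp within-population gap used in the example are hypothetical. Overlapping intervals from different populations are pooled into one region spanning their outermost coordinates, and the populations contributing to each region are recorded.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # (start, end) in bp on a single chromosome (assumption)


def merge_within_population(intervals: List[Interval], max_gap: int) -> List[Interval]:
    """Merge one population's outlier intervals lying within max_gap bp of each other."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + max_gap:
            # Close enough to the previous interval: extend it to the outermost end.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def shared_sweep_regions(calls: Dict[str, List[Interval]]) -> List[Tuple[Interval, List[str]]]:
    """Pool intervals from all populations; any overlap joins them into one region
    defined by the outermost coordinates, tagged with the contributing populations."""
    tagged = sorted((s, e, pop) for pop, ivs in calls.items() for s, e in ivs)
    regions: List[Tuple[int, int, Set[str]]] = []
    for start, end, pop in tagged:
        if regions and start <= regions[-1][1]:
            prev_start, prev_end, pops = regions[-1]
            regions[-1] = (prev_start, max(prev_end, end), pops | {pop})
        else:
            regions.append((start, end, {pop}))
    return [((s, e), sorted(pops)) for s, e, pops in regions]


if __name__ == "__main__":
    print(merge_within_population([(100_000, 150_000), (190_000, 220_000), (400_000, 450_000)], 50_000))
    # [(100000, 220000), (400000, 450000)]
    print(shared_sweep_regions({"maize_1": [(100, 200)], "teosinte_1": [(150, 300)], "teosinte_2": [(500, 600)]}))
    # [((100, 300), ['maize_1', 'teosinte_1']), ((500, 600), ['teosinte_2'])]
```

      Keeping the outermost coordinates makes a shared region conservative in the sense that it contains every population’s interval, at the cost of regions that can grow wider than any single population’s call.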

      - Figure 4: "Negative log10" is messed up

      Thank you. This has been fixed for the Version Of Record.

      - Line 318: "accruacy"

      Thank you. We have edited this typo for the Version Of Record.

      - New Table S3: why don't the proportions add to 1?

      These values represent the proportion of fixed differences at 0-fold sites that are unique to each population. The denominator is the total number of fixed differences in each population separately, so each proportion is computed for its own population and the values should not sum to one across populations. The table caption has been reworded for clarity in the Version Of Record.
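      As a sketch of why such values need not sum to one, the hypothetical function below computes, for each population, the share of its own fixed differences that no other population carries. The site sets and population labels are made up for illustration; because each proportion uses that population’s own total as the denominator, the values are not parts of a single whole.

```python
from typing import Dict, Set


def unique_fixed_proportions(fixed: Dict[str, Set[int]]) -> Dict[str, float]:
    """Fraction of each population's fixed differences (e.g. at 0-fold sites)
    that are not fixed in any other population."""
    props: Dict[str, float] = {}
    for pop, sites in fixed.items():
        if not sites:
            continue  # no fixed differences: proportion undefined, so skip
        others: Set[int] = set()
        for other_pop, other_sites in fixed.items():
            if other_pop != pop:
                others |= other_sites
        # Denominator is this population's own total, not a pooled total.
        props[pop] = len(sites - others) / len(sites)
    return props


if __name__ == "__main__":
    example = {"pop_A": {1, 2, 3, 4}, "pop_B": {3, 4, 5}, "pop_C": {4, 6}}
    props = unique_fixed_proportions(example)
    print(props)                # {'pop_A': 0.5, 'pop_B': 0.3333333333333333, 'pop_C': 0.5}
    print(sum(props.values()))  # greater than 1
```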


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper examines patterns of diversity and divergence in two closely related sub-species of Zea mays. While the patterns are interesting, the strength of evidence in support of the conclusions drawn from these patterns is weak overall. Most of the main conclusions are not supported by convincing analyses.

      Strengths:

      The paper presents interesting data from sets of sympatric populations of the two sub-species, maize and teosinte. This sampling offers unique insights into the diversity and divergence between the two, as well as the geographic structure of each.

      Weaknesses:

      There were issues with many parts of the paper, especially with the strength of conclusions that can be drawn from the analyses. I list the major issues in the order in which they appear in the paper.

      (1) Gene flow and demography.

      The f4 tests of introgression (Figure 1E) are not independent of one another. So how should we interpret these: as gene flow everywhere, or just one event in an ancestral population? More importantly, almost all the significant points involve one population (Crucero Lagunitas), which suggests that the results do not simply represent gene flow between the sub-species. There was also no signal of increased migration between sympatric pairs of populations. Overall, the evidence for gene flow presented here is not convincing. Can some kind of supporting evidence be presented?

      We agree that the standard approach to f4 tests that we employed here is not without limitations, namely that the tests are conducted independently while the true evolutionary history is not. While a joint demographic inference across all populations would be useful, it did not seem tractable with currently available methods given the number of populations analyzed, nor would it directly address the question of interest. Our purpose in including the f4 tests was to ask whether there was more gene flow between sympatric pairs than in other comparisons (we have made this point clearer in the text near line 174). As described in the text, the distribution of Z scores is generated by pairing focal populations with all other non-focal populations across both subspecies, which means the gene flow signal of interest is marginalized over the effects of gene flow in the other non-focal populations. This is not nearly as rich as inferring the full history, but it gives us some sense of the average amount of gene flow experienced between populations and allows us to address one of our primary questions when conceiving this paper: do sympatric pairs show more gene flow than other pairs? We agree with the reviewer that the answer is largely no, and the writing reflects this.
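      For readers less familiar with the statistic, an f4 test of this kind can be sketched with the standard definition f4(A, B; C, D) = mean over SNPs of (pA − pB)(pC − pD), together with a delete-one-block jackknife for the standard error and Z score. This is a minimal standard-library illustration under stated assumptions (equal-size blocks of consecutive SNPs; allele frequencies already polarized and aligned across populations), not the software used in the manuscript; the function name, block size, and frequencies are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def f4_block_jackknife(
    pA: Sequence[float], pB: Sequence[float],
    pC: Sequence[float], pD: Sequence[float],
    block_size: int,
) -> Tuple[float, float, float]:
    """Return (f4 estimate, jackknife standard error, Z score) for f4(A, B; C, D)."""
    # Per-SNP contribution (pA - pB) * (pC - pD).
    vals: List[float] = [(a - b) * (c - d) for a, b, c, d in zip(pA, pB, pC, pD)]
    n_blocks = len(vals) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two complete blocks")
    vals = vals[: n_blocks * block_size]  # drop trailing SNPs so blocks are equal-sized
    n = len(vals)
    total = sum(vals)
    estimate = total / n

    # Leave-one-block-out estimates of the mean.
    block_sums = [sum(vals[i * block_size:(i + 1) * block_size]) for i in range(n_blocks)]
    loo = [(total - bs) / (n - block_size) for bs in block_sums]
    loo_mean = sum(loo) / n_blocks

    # Delete-one jackknife variance for equal-size blocks.
    var = (n_blocks - 1) / n_blocks * sum((x - loo_mean) ** 2 for x in loo)
    se = math.sqrt(var)
    z = estimate / se if se > 0 else float("nan")
    return estimate, se, z


if __name__ == "__main__":
    pA = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    pB = [0.1] * 6
    pC = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    pD = [0.5] * 6
    print(f4_block_jackknife(pA, pB, pC, pD, block_size=2))
```

      Repeating such a test with a fixed focal pair and each non-focal population in turn would yield a distribution of Z scores like the one described above; real analyses use genomic blocks large enough to span linkage, which this toy example does not attempt.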

      Overall, we think both points raised by the reviewer here (that most but not all significant tests involved Crucero Lagunitas maize, and that sympatric pairs do not show higher gene flow) contribute nicely to the overall theme of the paper: the history of both subspecies is idiosyncratic and has been shaped by humans in ways that we did not anticipate and that do not reflect geographic proximity (see expectations near line 110). We have emphasized the connection between the f4 tests and the revised rdmc results near line 653.

      The paper also estimates demographic histories (changes in effective population sizes) for each population, and each sub-species together. The text (lines 191-194) says that "all histories estimated a bottleneck that started approximately 10 thousand generations ago" but I do not see this. Figure 2C (not 2E, as cited in the text) shows that teosinte had declines in all populations 10,000 generations ago, but some of these declines were very minimal. Maize has a similar pattern that started more recently, but the overall species history shows no change in effective size at all. There's not a lot of signal in these figures overall.

      I am also curious: how does the demographic model inferred by mushi address inbreeding and homozygosity by descent (lines 197-202)? In other words, why does a change in Ne necessarily affect inbreeding, especially when all effective population sizes are above 10,000?

      All maize populations show a decline beginning 10,000 generations ago; the smallest decline for maize is from 100,000 to 30,000. All teosinte populations also show a reduction in population size; the smallest of these is a drop of roughly 70%, from around 300,000 to 100,000. Three of the teosinte populations showed a reduction from ~10^5 to ~10^3, which is well below 10,000. Thus all populations show declines.

      These large reductions should lead to inbreeding and increased homozygosity by descent (HBD). The mushi model does not explicitly account for these features of the data, yet, as we show, simulations under the model estimated by mushi matched the observed HBD levels fairly well (Figure 2D).

      The rangewide sample does not show declines, likely because there is enough isolation between populations that the loss of variation at any given locus is not shared, and that variation is maintained in the populations that did not experience the decline.

      (2) Proportion of adaptive mutations.

      The paper estimates alpha, the proportion of nonsynonymous substitutions fixed by positive selection, using two different sampling schemes for polymorphism. One uses range-wide polymorphism data and one uses each of the single populations. Because the estimates using these two approaches are similar, the authors conclude that there is little local adaptation. However, this conclusion is not justified.

      There is little information as to how the McDonald-Kreitman test is carried out, but it appears that polymorphism within either teosinte or maize (using either sampling scheme) is compared to fixed differences with an outgroup. These species might be Z. luxurians or Z. diploperennis, as both are mentioned as outgroups. Regardless of which is used, this sampling means that almost all the fixed differences in the MK test will be along the ancestral branch leading to the ancestor of maize or teosinte, and on the branch leading to the outgroup. Therefore, it should not be surprising that alpha does not change based on the sampling scheme, as this should barely change the number of fixed differences (no numbers are reported).

      The lack of differences in results has little to do with range-wide vs restricted adaptation, and much more to do with how MK tests are constructed. Should we expect an excess of fixed amino acid differences on very short internal branches of each sub-species tree? It makes sense that there is more variation in alpha in teosinte than maize, as these branches are longer, but they all seem quite short (it is hard to know precisely, as no Fst values or similar are reported).

      The Methods section “Genetic Diversity” provides details about how Z. luxurians and Z. diploperennis were used as outgroups. The Methods section “Estimating the Rate of Positive Selection, α” includes the definition of α, the full joint non-linear regression equation, the software used to estimate it (brms), and the citations crediting the authors of the original method. Some of the relevant information about SFS construction is provided in the preceding section, “Genetic Diversity”; we have added a reference to this in the Results near line 800.
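      As background for the α discussion, the basic McDonald–Kreitman estimator and its per-frequency version (the quantity that the asymptotic approach of Messer and Petrov (2013) extrapolates as derived-allele frequency approaches one) can be sketched as below. This is an illustration only, not the brms regression used in the manuscript: it assumes counts of nonsynonymous (e.g. 0-fold) and synonymous (e.g. 4-fold) fixed differences and polymorphisms have already been tallied, and the function names and counts are hypothetical.

```python
from typing import List, Sequence


def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Proportion of nonsynonymous substitutions fixed by positive selection:
    alpha = 1 - (Ds * Pn) / (Dn * Ps)."""
    if dn == 0 or ps == 0:
        raise ValueError("Dn and Ps must be positive")
    return 1.0 - (ds * pn) / (dn * ps)


def alpha_by_frequency(dn: int, ds: int, pn_sfs: Sequence[int], ps_sfs: Sequence[int]) -> List[float]:
    """alpha(x) = 1 - (Ds / Dn) * (Pn(x) / Ps(x)) for each derived-allele frequency bin x.
    An asymptotic analysis would fit a curve to these values and extrapolate toward
    x = 1; that fitting step is not shown here. Bins with Ps(x) = 0 return NaN."""
    return [1.0 - (ds / dn) * (pn / ps) if ps > 0 else float("nan")
            for pn, ps in zip(pn_sfs, ps_sfs)]


if __name__ == "__main__":
    print(mk_alpha(dn=100, ds=200, pn=40, ps=100))  # 0.2, up to floating-point rounding
    print(alpha_by_frequency(100, 200, [30, 8, 2], [60, 20, 10]))
```

      In the toy counts, α(x) rises with frequency because low-frequency bins contain relatively more nonsynonymous polymorphism, the pattern that weakly deleterious variants produce and that motivates extrapolating to high frequency rather than using the pooled estimate.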

      While we appreciate the concern that “almost all the fixed differences in the MK test will be along the ancestral branch leading to the ancestor of maize or teosinte”, this is only a problem if there are not enough fixed differences that are unshared between populations. This is more of a concern for maize than for teosinte, as we already note as a caveat in several places in the manuscript. The variation in α among teosinte populations is itself evidence that these counts differ among populations. As the population trees in Figure 1 show, there is a considerable amount of terminal branch length for all populations. Indeed, consider the number of fixed differences at 0-fold sites across populations:

      The variation in the number of fixed differences, particularly across teosinte, means that many of them cannot be shared between populations. Estimating the fixed differences unique to each population (and the total count) shows that, in general, there are a large number of substitutions unique to each population. This is good evidence that the rangewide estimates do not reflect a lack of variation within populations, at least for teosinte. This is now included in the supplement (Table S3).

      Finally, we note that the branches leading to outgroups are likely not substantially longer than those among populations. Given our estimates of Ne, the coalescent within maize and teosinte should be relatively deep (with Ne of 30K it should be ~120K years). The divergence time between Zea mays and these outgroup taxa has been estimated at ~150K years (Chen et al. 2022). This is now mentioned in the text on line 407.
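      The coalescent arithmetic behind this comparison can be made explicit with the standard neutral result for the expected time to the most recent common ancestor of n sampled sequences from a diploid population of effective size N_e. Converting generations to years assumes roughly one generation per year, which is an assumption of this illustration rather than a value stated above.

```latex
\mathbb{E}\left[T_{\mathrm{MRCA}}\right] = 4N_e\left(1 - \frac{1}{n}\right)\ \text{generations}
\;\approx\; 4 \times 30{,}000 = 1.2 \times 10^{5}\ \text{generations} \quad (n \gg 1)
```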

      We have added a caveat about the reviewer’s concern regarding the non-independence of fixed differences in maize near line 386.

      (3) Shared and private sweeps.

      In order to make biological inferences from the number of shared and private sweeps, there are a number of issues that must be addressed.

      One issue is false negatives and false positives. If sweeps occur but are missed, then they will appear to be less shared than they really are. Table S3 reports very high false negative rates across much of the parameter space considered, but is not mentioned in the main text. How can we make strong conclusions about the scale of local adaptation given this? Conversely, while there is information about the false positive rate provided, this information doesn't tell us whether it's higher for population-specific events. It certainly seems likely that it would be. In either case, we should be cautious saying that some sweeps are "locally restricted" if they can be missed more than 85% of the time in a second population or falsely identified more than 25% of the time in a single population.

      The reviewer brings up a worthwhile point. The simulation results indeed call into question how many of the sweeps we claim are exclusive to one population actually are. This caveat was already made, but we now state more clearly the reviewer’s concern regarding the high false negative rate (near line 299). If anything, however, this suggests that sweeps are shared even more often than reported. One of the major takeaways from the paper is that convergent adaptation is more common than we expected. The most interesting aspect of the unique sweeps is the comparison between maize and teosinte. While the true proportions may vary, the relatively higher proportion of sweeps exclusive to one population in teosinte compared with maize is unlikely to be driven by false negatives, since the accuracy of sweep identification is fairly similar across subspecies (with some possible exceptions for the populations with stronger bottlenecks).

      Further, these criticisms are specific to the raisd results. All sweeps shared across multiple populations were analyzed using rdmc. After adjusting the number of proposed selected sites (see response below), there is good agreement between the raisd and rdmc results: the regions we proposed as selective sweeps with raisd all show evidence of convergence using rdmc. Recall, too, that rdmc takes quite a different approach to inference: all populations are used jointly, labeling those that did and did not experience the sweep. If sweeps were present in populations labeled as neutral (or vice versa), this would weaken the power to infer selection at the locus. Much of the parameter space we explored involves quite weak selection, and the simulation analysis shows we are likely to miss those instances, often entirely. For strong sweeps, however, our simulations show appreciable accuracy.

      Together, there is reason to be optimistic about our detection of strong shared sweeps and that the main conclusions we make are sound.

      Finally, we note that we are unaware of any other empirical study that has similarly estimated the accuracy of sweep calling in its data (as opposed to using simulations). We thus see these analyses as a significant contribution towards transparency that is completely lacking from most papers.

      A second, opposite, issue is shared ancestral events. Maize populations are much more closely related than teosinte (Figure 2B). Because of this, a single, completed sweep in the ancestor of all populations could much more readily show a signal in multiple descendant populations. This is consistent with the data showing more shared events (and possibly more events overall). There also appear to be some very closely (phylogenetically) related teosinte populations. What if there's selection in their shared ancestor? For instance, Los Guajes and Palmar Chico are the two most closely related populations of teosinte and have the fewest unique sweeps (Figure 4B). How do these kinds of ancestrally shared selective events fit into the framework here?

      The reviewer brings up another interesting point and one that likely impacts some of our results.

      As the reviewer describes, this issue is of greater concern for the more closely related populations and is less likely to explain results across subspecies. We have added this as a caveat (near line 456). As is clear in the writing, sharing across subspecies is our primary interest for the rdmc results.

      These analyses of shared sweeps are followed by an analysis of sweeps shared by sympatric pairs of teosinte and maize. Because there are not more events shared by these pairs than expected, the paper concludes that geography and local environment are not important. But wouldn't it be better to test for shared sweeps according to the geographic proximity of populations of the same sub-species? A comparison of the two sub-species does not directly address the scale of adaptation of one organism to its environment, and therefore it is hard to know what to conclude from this analysis.

      We did not intend to conclude that local adaptation is unimportant. Especially for teosinte, we report and interpret evidence that many sweeps occur exclusively in one population, which is consistent with the action of local adaptation and with some of our expectations.

      More directly, this is another instance of our having clear hypotheses going into the paper and constructing specific analyses to test them. As we explain in the paper, we expected the scale of local adaptation to be very small, such that subspecies growing next to each other would have more opportunities to exchange alleles locally adapted to their shared environment. The analysis we conducted makes sense in light of this expectation. We considered conducting tests based on geographic proximity, but power is limited with the number of populations we have within each subspecies, and the meaning of such tests is unclear if all populations of both subspecies are naively included together. This analysis shows that, at least for sweeps and fixations, the scale of adaptation is larger than a single location. While it may not be a complete description on its own, this work does provide information about the scale of adaptation and is useful to the overall claims and objectives of the paper. As mentioned in the paper, the story might be very different if viewed through the lens of polygenic adaptation. We now also mention in several places in the discussion where broader sampling could improve inference.

      (4) Convergent adaptation

      My biggest concern involves the apparent main conclusion of the paper about the sources of "convergent adaptations". I believe the authors are misapplying the method of Lee and Coop (2017), and have not seriously considered the confounding factors of this method as applied. I am unconvinced by the conclusions that are made from these analyses.

      The method of Lee and Coop (referred to as rdmc) is intended to be applied to a single locus (or very tightly linked loci) that shows adaptation to the same environmental factor in different populations. From their paper: "Geographically separated populations can convergently adapt to the same selection pressure. Convergent evolution at the level of a gene may arise via three distinct modes." However, in the current paper, we are not considering such a restricted case. Instead, genome-wide scans for sweep regions have been made, without regard to similar selection pressures or to whether events are occurring in the same gene. Instead, the method is applied to large genomic regions not associated with known phenotypes or selective pressures.

      I think the larger worry here is whether we are truly considering the "same gene" in these analyses. The methods applied here attempt to find shared sweep regions, not shared genes (or mutations). Even then, there are no details that I could find as to what constitutes a shared sweep. The only relevant text (lines 802-803) describes how a single region is called: "We merged outlier regions within 50,000 Kb of one another and treated as a single sweep region." (It probably doesn't mean "50,000 kb", which would be 50 million bases.) However, no information is given about how to identify overlap between populations or sub-species, nor how likely it is that the shared target of selection would be included in anything identified as a shared sweep. Is there a way to gauge whether we are truly identifying the same target of selection in two populations?

      The question then is, what does rdmc conclude if we are simply looking at a region that happened to be a sweep in two populations, but was not due to shared selection or similar genes? There is little testing of this application here, especially its accuracy. Testing in Lee and Coop (2017) is all carried out assuming the location of the selected site is known, and even then there is quite a lot of difficulty distinguishing among several of the non-neutral models. This was especially true when standing variation was only polymorphic for a short time, as is estimated here for many cases, and would be confused for migration (see Lee and Coop 2017). Furthermore, the model of Lee and Coop (2017) does not seem to consider a completed ancestral sweep that has signals that persist into current populations (see point 3 above). How would rdmc interpret such a scenario?

      Overall, there simply doesn't seem to be enough testing of this method, nor are many caveats raised in relation to the strange distributions of standing variation times (bimodal) or migration rates (opposite between maize and teosinte). It is not clear what inferences can be made with confidence, and certainly the Discussion (and Abstract) makes conclusions about the spread of beneficial alleles via introgression that seem to outstrip the results.

      We have fixed the “50,000 Kb” typo.

      There are several important points the reviewer makes here worth considering. First and most importantly, the method of Lee and Coop (2017) does in fact include the position of the selected site as part of the composite likelihood calculation. For computational feasibility, we initially considered 20 positions (20 different positions along the input sequence were proposed as the site of the shared beneficial mutation). To further address the reviewer’s concern about adaptive mutations at distinct loci, we have increased the number of proposed selected sites to 200. This should greatly diminish the concern that we are picking up independent sweeps that occurred at different nucleotide positions in the same region: evidence for a beneficial mutation must be shared by the selected populations at a proposed site. As the revisions show, this has modified the results of our paper in a number of ways, including changing all of the previously neutral regions to shared via standing variation or migration. Despite these changes, our previous conclusions are intact, including the pattern that migration rates are high when maize populations share the sweep. Relatedly, we disagree with the reviewer’s characterization of the migration results. The pattern is quite clear and makes sense: when a maize population is involved in the sweep, the migration rate is inferred to be high. Sweeps exclusive to teosinte are rarer and are inferred to have a low migration rate. This relates directly to the idea that humans have moved maize relatively rapidly across the landscape.

      We have now included a plot (Figure S8) showing how the difference in composite likelihood between the maximum composite likelihood estimate (CLE) site and the next-highest CLE site varies across our inferences. This strongly suggests that patterns are not muddled across multiple loci but are centered on a focal region where the beneficial allele is inferred to be located. While there are too many sweeps to show them all in the manuscript, here is an illustrative example of what inference looks like for one of the proposed sweep regions.

      Author response image 1.

      Furthermore, the situation the reviewer describes would be selection acting on independent mutations (mutations at different loci), which would not increase allele frequency covariance beyond what would be expected from drift under the migration and standing variation models.

      We also note that we are not alone in applying this approach to shared outlier signals in the absence of known genes; indeed the authors of the DMC method have applied it to regions of shared outlier signal themselves (e.g. https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008593).

      Reviewer #2 (Public Review):

      Summary:

      The authors sampled multiple populations of maize and teosinte across Mexico, aiming to characterise the geographic scale of local adaptation, patterns of selective sweeps, and modes of convergent evolution between populations and subspecies.

      Strengths & Weaknesses:

      The population genomic methods are standard and appropriate, including Fst, Tajima's D, α, and selective sweep scans. The whole genome sequencing data seems high quality. However, limitations exist regarding limited sampling, potential high false-positive sweep detection rates, and weak evidence for some conclusions, like the role of migration in teosinte adaptation.

      Aims & Conclusions:

      The results are interesting in supporting local adaptation at intermediate geographic scales, widespread convergence between populations, and standing variation/gene flow facilitating adaptation. However, more rigorous assessments of method performance would strengthen confidence. Connecting genetic patterns to phenotypic differences would also help validate associations with local adaptation.

      Impact & Utility:

      This work provides some of the first genomic insights into local adaptation and convergence in maize and teosinte. However, the limited sampling and need for better method validation currently temper the utility and impact. Broader sampling and connecting results to phenotypes would make this a more impactful study and valuable resource. The population genomic data itself provides a helpful resource for the community.

      Additional Context:

      Previous work has found population structure and phenotypic differences consistent with local adaptation in maize and teosinte. However, genomic insights have been lacking. This paper takes initial steps to characterise genomic patterns but is limited by sampling and validation. Additional work building on this foundation could contribute to understanding local adaptation in these agriculturally vital species.

      We appreciate the reviewer’s thoughtful reading of the paper and scrutiny. We hope that the added caveats made in response to reviewer 1 (as well as the previous rounds of peer review) will provide readers with the proper amount of skepticism in the accuracy of some of our initial sweep results, while also demonstrating that many of our conclusions are robust to the concerns raised over the various stages of review.

      We agree with the reviewer that better sampling and the incorporation of phenotypic data would be excellent additions, but that information is not available for the studied populations and is outside the scope of this paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      - Sometimes alpha is described as a rate, and sometimes as a proportion. The latter is correct.

      We have updated this. Thanks.

      - Line 79: are they really "discrete" populations?

      The teosinte populations sampled are all clearly separated from each other and are physically discrete. The maize population samples came from individual farmer fields. Traditional maize is grown as open-pollinated (outcrossing) populations, and farmers save seed for subsequent generations. An individual farmer’s field thus behaves as a discrete population for our purposes, impacted of course by gene flow, selection, and other evolutionary processes.

      - Lines 418-420: "Large genomes may lead to more soft sweeps, where no single mutation driving adaptive evolution would fix (Mei et al. 2018)." I'm not sure I understand this statement. Why is this a property of genome size?

      Mei et al. 2018 lay out the logic, but essentially they present data arguing that the total number of functionally relevant base pairs increases with genome size (less than linearly). If true, genomes with a large number of potentially functional bp are more likely to undergo soft sweeps (see theory by Hermisson and Pennings cited in Mei et al. 2018).

      - Lines 500-1: selection does not cause one to underestimate effective population sizes. Selection directly affects Ne. I'm not sure what biases the sentences on lines 502-508 are trying to explain.

      We have simplified this section. Not accounting for linked selection (especially positive selection) results in a biased inference of demographic history. See Marsh and Johri (2024) for another example. https://doi.org/10.1093/molbev/msae118

      - Line 511-3: does Uricchio et al. (2019) show any difference in the estimate of alpha from Messer and Petrov (2013) when taking background selection into account?

      What we initially wrote was incorrect. The aMK method of Messer and Petrov (2013) accounts for weakly deleterious polymorphisms, but it does not account for positively selected ones. We have updated this text and suggested our method may underestimate alpha if positively selected segregating alleles are common (near line 539).

      - Lines 598-599: "which would limit the rate of new and beneficial mutations." I don't understand this - shouldn't a bottleneck only affect standing variation? Why would a bottleneck affect new mutations?

      This is simply to say that during the low-Ne period of a bottleneck, fewer total mutations (and therefore fewer beneficial mutations) are generated, since there are fewer individuals in which mutations can occur. We have changed “rate” to “amount” to clarify that we do not mean the mutation rate itself.
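      The point can be illustrated with the expected number of new mutations M arising per generation in a diploid population of size N, with per-site mutation rate μ over L sites. Here μ and L are left symbolic and held constant, and the two population sizes are the approximate values reported above for the most strongly bottlenecked teosinte populations; the comparison is illustrative only.

```latex
M = 2N\mu L, \qquad
\frac{M(N = 10^{5})}{M(N = 10^{3})} = \frac{2 \cdot 10^{5}\,\mu L}{2 \cdot 10^{3}\,\mu L} = 100
```

      The per-site rate μ is unchanged, but a population reduced from ~10^5 to ~10^3 individuals supplies roughly 100-fold fewer new mutations, and hence fewer beneficial ones, each generation.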

      Reviewer #2 (Recommendations For The Authors):

      Experiments/Analyses:

      (1) Consider simulating polygenic adaptation in addition to hard and soft sweeps to see if this improves the power to detect adaptive signatures shared between populations. This could involve simulating the coordinated change in allele frequencies across many loci to match a specified shift in trait value due to selection. The ability to detect shared polygenic adaptation between population replicates could be assessed using methods tailored to polygenic signals, such as the Polygenic Selection Score approach. Comparing the power to detect shared polygenic adaptation versus shared hard and soft sweeps would provide further insight into what adaptive modes current methods can uncover. If the power to detect shared polygenic adaptation is very low, the extent of shared adaptation between populations may be even more common than currently inferred. Adding simulations of polygenic adaptation would strengthen the study.

      While this would be a worthwhile undertaking in general, it would be a considerable amount of work outside of the scope and aims of this paper.

      (2) Explore machine learning approaches such as S/HIC, which may improve power over summary-statistic methods.

      We in fact put considerable effort into applying diploS/HIC before switching to RAiSD for this project. While predictions on simulations had good power to detect sweeps, applying the method to our actual data classified a dubious number of windows as sweeps (e.g. >90% of the genome), which we believed to be false positives. We speculated that this may reflect sensitivity to demographic or other types of misspecification in the simulations, such as our choice of window sizes relative to local recombination rates. It would likely be fruitful to put further effort into using machine learning methods for maize and teosinte, but a deeper exploration of the right hyperparameters and simulation choices is likely needed to apply them effectively.

      (3) Increase geographic sampling density, if possible, especially near population pairs showing high differentiation, to better understand the scale of local adaptation.

      We agree this would be valuable research. Hopefully this work inspires further efforts on the question of the spatial and temporal scales of local adaptation, with more ambitious spatial sampling designed from the outset.

      Writing/Presentation:

      (1) Provide more intuition about the biological interpretation of the migration rates inferred under the migration model of convergence. What do the rates imply about the amount or timing of gene flow?

      We have expanded the discussion sections (starting near line 653) to elaborate on the migration results and connect the rdmc and f4 tests more explicitly. The timing of gene flow is more challenging to address directly with the approaches we used, but we agree it would be interesting to explore more in future papers.

      (2a) Expand the discussion of power limitations and the need for simulation tests. Consider adding ROC curves for sweep detection on simulated data. The relatively low proportion of shared selective sweeps between population replicates highlights limitations in the power to detect sweeps, especially incomplete or soft sweeps. I think it would be a good idea to expand the discussion of the power tradeoffs shown in the simulation analyses. In particular, the ROC curves in Figure S4 clearly show how power declines for weaker selection coefficients across the different sweep types. I suggest making these ROC curves part of the main figures to feature the issue of power limitations more prominently.

      (2b) The discussion would benefit from commenting on how power changes across the sweep simulation scenarios. Adding a summary figure to visualise the effects of sweep type, selection strength, and frequency on detectability could further clarify the power constraints. Stating the proportion of sweeps likely missed strengthens the argument that sharing adaptive alleles is likely even more common than inferred. Discussing power will also motivate the need for developing methods with improved abilities to uncover incomplete and soft sweeps.

      While these are useful suggestions (2a and 2b), the aim of this paper at its core is empirical, and it was not intended to give an exhaustive analysis of the power to detect sweeps. We report which parts of the analysis may be impacted by low power and which aspects of our inferences have higher uncertainty due to power. We agree that there is more work to be done to improve methods to detect selection given our findings (see also our response to point (2) above concerning our efforts to use machine learning). While we do not highlight this in the paper, we also note that ours is one of extremely few empirical studies that actually perform power analyses on real data (as opposed to simulations). We think this extra transparency by itself is of substantial utility to the community in demonstrating that the results from simulation studies performed in publications describing a method do not necessarily translate well to empirical data.

      (3) Improve clarity in describing f4 test results. Consider visualising results on a map to show spatial patterns.

      We have expanded the discussion concerning f4 tests (see several comments to reviewer 1). We are not clear on how to effectively visualize f4 spatially, but hope the updates have made the results more clear.

      Minor:

      -  Increase the font size of figure axis labels for improved readability.

      We have looked over the figures and increased font sizes where possible.

      -  Add units to selection coefficient axis labels in Figure 5.

      Selection coefficients are derived in Lee and Coop (2017) from classical population genetics theory. They do not have units, but denote the relative fitness advantage of the heterozygous genotype carrying the beneficial mutation of interest.

      -  Fix the typo 'cophenetic' in Figure S3 caption.

      Fixed. Thank you.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study provides convincing evidence on the infraslow oscillation of DG cells during NREM sleep, and how serotonergic innervation modulates hippocampal activity pattern during sleep and memory.

      Strengths and Weaknesses:

      The authors used state-of-the-art techniques to carry out these experiments. Given that the functional role of infraslow rhythm still remains to be studied, this study provides convincing evidence of the role of DG cells in regulating infraslow rhythm, sleep microarchitecture, and memory.

      I have a few minor comments.

      (1) Decreased infraslow rhythm during NREMs in the 5ht1a KO mice is striking. It would be helpful to know whether sleep-wake states, MAs, and transitions to REMs are changed.

      We agree with the reviewer that serotonin receptors may be involved in sleep regulation; therefore, it is important to analyze the effect of their manipulation. We would also like to bring to the reviewer's attention that in this case we restricted the 5ht1a manipulation to the hippocampus, which has no known impact on sleep-wake regulation. The analysis of our recorded dataset from these mice confirmed this notion, because we did not see any changes in sleep metrics (see supplementary figure 6A).

      (2) It would be interesting to discuss whether the magnitude in changes of infraslow rhythm strength is correlated with memory performance (Figure 6).

      We agree with the reviewer that this could be an interesting point. In our experiments we wanted to minimize the impact of the surgical procedures on behavior, so we used separate cohorts to record the photometry and to carry out the behavior experiments; therefore, we are unable to correlate behavior and infraslow oscillatory amplitudes in our dataset.

      However, a similar experiment was carried out in a recent paper in which the authors discovered that the norepinephrine system also displays infraslow oscillatory cycles during NREM sleep (Kjaerby et al 2022). The authors of that paper gradually decreased the magnitude of the NE pulses during NREM by optogenetic manipulation of the locus coeruleus, which led to a fragmented sleep phenotype characterized by increased microarousal occurrence, decreased REM and reduced spindle activity. They also tested the memory performance of the mice in a novel object recognition task and found diminished performance in the opto group. Serotonin has multiple roles in the brain, many of which overlap with proposed functions of the noradrenergic system, including regulation of plasticity and signaling of rewarding or fearful stimuli. Therefore, we speculate that the modification of serotonin dynamics during sleep will most likely interfere with memory performance.

      We inserted this paragraph in the discussion part of our paper.

      (3) The authors should cite the Oikonomou Neuron paper that describes slow oscillatory activity of DRN SERT neurons during NREM sleep.

      Thank you for the suggestion; we have cited this paper in the manuscript.

      (4) The authors should clarify how they define the phasic pattern of the photometry signal.

      We have added the details in the Methods.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated DG neuronal activity at the population and single-cell level across sleep/wake periods. They found an infraslow oscillation (0.01-0.03 Hz) in both granule cells (GC) and mossy cells (MC) during NREM sleep.

      The important findings are:

      (1) The antiparallel temporal dynamics of DG neuron activities and serotonin neuron activities/extracellular serotonin levels during NREM sleep, and

      (2) The Htr1a-mediated infraslow oscillation in GCs.

      Strengths:

      (1) The combination of polysomnography, Ca-fiber photometry, two-photon microscopy, and gene depletion is technically sound. The coincidence of microarousals and dips in DG population activity is convincing. The dip in activity in upregulated cells is responsible for the dip at the population level.

      (2) DG GCs express excitatory Htr4 and Htr7 in addition to inhibitory Htr1a, but deletion of Htr1a is sufficient to disrupt DG GC infraslow oscillation, supporting the importance of Htr1a in DG activity during NREM sleep.

      Weaknesses:

      (1) The current data set and analysis are insufficient to interpret the observation correctly.

      a. In Figure 1A, during NREM, the peaks and troughs of GC population activities seem to gradually decrease over time. Please address this point.

      Thank you for the suggestion. We have analyzed and compared the magnitude of the oscillatory signals in the first and last minute of the NREM sleep epochs in Dock10-Cre mice and found no significant difference. However, we did observe that the ISO amplitude is smaller in the early stage of the first NREM epochs, defined as those preceded by wakefulness lasting longer than 5 minutes (new supplementary figure 1).

      b. In Figure 1F, about 30% of Ca dips coincided with MA (EMG increase) and 60% of Ca dips did not coincide with EMG increase. If this is true, readers should be able to find 8 Ca dips in Figure 1E that are not associated with MAs. If MAs were clustered, please describe this properly.

      We did not find evidence that MAs were clustered in our dataset (see a representative example in supplementary figure 1A). We replaced the example trace with a new one which shows calcium dips with and without MAs. We believe this new trace better represents the data.

      c. In Figure 1F, the legend stated the percentage during NREM. If the authors want to include the percentage of wake and REM, please show the traces with Ca dips during wake and REM. This concern applies to all pie charts provided by the authors.

      Figure 1F (and all other pie charts) shows the brain-state outcome following a calcium-dip episode. That is, we found that Ca-dips during NREM were followed by MAs in 30% of cases and by maintenance of NREM (no MAs) in 59% of cases, while in 2% and 9% of cases we detected either REM sleep or awakening of the animal. These numbers correspond very well with a similar analysis in a recent paper that examined the infraslow oscillatory behavior of the norepinephrine system during NREM sleep (Kjaerby et al 2022). We apologize if the wording in the manuscript was misleading; we have modified the figure legends to clarify what the pie charts represent.

      d. In Figure 1C, please provide line plots connecting the same session. This request applies to all related figures.

      We have replaced the dot plots in all related figures with the line plots. 

      e. In Figure 2C, the significant increase during REM and the same level during NREM are not convincing. In Figure 2A, the several EMG increasing bouts do not appear to be MA, but rather wakefulness, because the duration of the EMG increase is greater than 15 seconds. Therefore, it is possible that the wake bouts were mixed with NREM bouts, leading to the decrease of Ca activity during NREM. In fact, in Figure 2E, the 4th MA bout seems to be a wake bout because the EMG increase lasts more than 15 seconds.

      We have replaced the Figure 2C with line plots as suggested above. It is clear that MC activity during REM sleep is higher, compared to that in NREM sleep, whereas the overall difference between wake and NREM is not significant (some increased, some decreased). Regarding the MAs, we have added a trace of averaged EMG signals in Figure 2G, showing that the averaged EMG bursts during MA are shorter than 5 seconds.

      f. Figure 5D REM data are interesting because the DRN activity is stably silenced during REM. The varied correlation means the varied DG activity during REM. The authors need to address it.

      We thank the reviewer for this suggestion. We have added this point to the discussion. We speculate that inputs from the supramammillary nucleus or entorhinal cortex to the DG during REM sleep may both contribute to this variability.

      g. In Figure 6, the authors should show the impact of DG Htr1a knockdown on sleep/wake structure including the frequency of MAs. I agree with the impact of Htr1a on DG ISO, but possible changes in sleep bout may induce the DG ISO disturbance.

      As suggested, we have performed sleep analysis in the Htr1a knockdown experiments, including MA quantification. We found no significant difference between Htr1a-knockdown and control mice in any of the sleep metrics (see supplemental figure 6). Our interpretation is that the lack of changes in sleep/wake cycles is likely due to the hippocampus not being directly involved in regulating these brain states.

      (2) It is acceptable that DG Htr1a KO induces the reduced freezing in the CFC test (Figure 6E, F), but it is too much of a stretch that the disruption of DG ISO causes impaired fear memory. There should be a correlation.

      We have modified the discussion accordingly.

      (3) It is necessary to describe the extent of AAV-Cre infection. The authors injected AAV into the dorsal DG (AP -1.9 mm), but the histology shows the ventral DG (Supplementary Figure 4), which reduces the reliability of this study.

      The histology image shown in the manuscript was taken from the -2.5 mm anteroposterior level, which we still consider to be part of the dorsal DG. For additional clarity, we have replaced the figure with new histology images taken at a slightly more anterior position (AP ~ -2.0 mm).

      Reviewer #3 (Public review):

      Summary:

      The authors employ a series of well-conceived and well-executed experiments involving photometric imaging of the dentate gyrus and raphe nucleus, as well as cell-type specific genetic manipulations of serotonergic receptors, that together serve to directly implicate serotonergic regulation of dentate gyrus (DG) granule cell (GC) and mossy cell (MC) activity in association with an infraslow oscillation (ISO) of neural activity that has previously been linked to general cortical regulation during NREM sleep and microarousals.

      Strengths:

      There are a number of novel and important results, including the modulation of dentate granule cell activity by the infraslow oscillation during NREM sleep, the selective association of different subpopulations of granule cells with microarousals (MA), and the anticorrelation of raphe activity with infraslow dentate activity.

      The discussion includes a general survey of ISOs and recent work relating to their expression in other brain areas and other potential neuromodulatory system involvement, as well as possible connections with infraslow oscillations, micro-arousals, and sensory sensitivity.

      Weaknesses:

      (1) The behavioral results showing contextual memory impairment resulting from 5-HT1a knockdown are fine but are over-interpreted. The term memory consolidation is used several times, as well as references to sleep-dependence. This is not what was tested. The receptor was knocked down, and then 2 weeks later animals were found to have fear conditioning deficits. They can certainly describe this result as indicating a connection between 5-HT1a receptor function and memory performance, but the connection to sleep and consolidation would just be speculation. The fact that 5-HT1a knockdown also impacted DG ISOs does not establish dependency. Some examples of this are:

      a. The final conclusion asserts "Together, our study highlights the role of neuromodulation in organizing neuronal activity during sleep and sleep-dependent brain functions, such as memory.". However, the reported memory effects (impairment of fear conditioning) were not shown to be explicitly sleep-dependent.

      We thank the reviewer for this comment. We have revised the sentence.

      b. Earlier in the discussion it mentions "Finally, we showed that local genetic ablation of 5-HT1a receptors in GCs impaired the ISO and memory consolidation". The effect shown was on general memory performance - consolidation was not specifically implicated.

      We have revised the sentence.

      (2) The assertion on page 9 that the results demonstrate "that the 5-HT is directly acting in the DG to gate the oscillations" is a bit strong given the magnitude of effect shown in Figure 6D, and the absence of demonstration of negative effect on cortical areas that also show ISO activity and could impact DG activity (see requested cortical sigma power analysis).

      We have revised the sentence.

      (3) Recent work has shown that abnormal DG GC activity can result from the use of the specific Ca indicator being used (GCaMP6s). (Teng, S., Wang, W., Wen, J.J.J. et al. Expression of GCaMP6s in the dentate gyrus induces tonic-clonic seizures. Sci Rep 14, 8104 (2024). https://doi.org/10.1038/s41598-024-58819-9). The authors of that study found that the effect seemed to be specific to GCaMP6s and that GCaMP6f did not lead to abnormal excitability. Note this is of particular concern given similar infraslow variation of cortical excitability in epilepsy (cf Vanhatalo et al. PNAS 2004). While I don't think that the experiments need to be repeated with a different indicator to address this concern, you should be able to use the 2p GCaMP7 experiments that have already been done to provide additional validation by repeating the analyses done for the GCaMP6s photometry experiments. This should be done anyway to allow appropriate comparison of the 2p and photometry results.

      We would like to thank the reviewer for this comment. We also analyzed the two-photon data in the same manner as the photometry data. However, the only supportive evidence that might be related to ISO in the two-photon data, recorded at the somatic level, was decreased fluorescence during MAs in the NREM-upregulated cell group (see Figure 3 D, E). We are unsure why this discrepancy exists, but we have discussed it in the manuscript and offered some alternative explanations. One hypothesis we are currently exploring relates to the different subcellular compartments sampled by the two imaging techniques. The photometry probe was implanted above the dentate gyrus, and since light collection efficiency declines sharply with distance from the probe tip (Pisano et al., 2019), we hypothesize that ISO is stronger at the dendritic level, which directly receives input from the entorhinal cortex and is closest to the probe's tip. We are now conducting multiplane two-photon imaging experiments in our labs to test this hypothesis.

      (4) While the discussion mentions previous work that has linked ISOs during sleep with regulation of cortical oscillations in the sigma band, oddly no such analysis is performed in the current work even though it is presumably available and would be highly relevant to the interpretation of a number of primary results, including the relationship between the ISOs and MAs observed in the DG and similar results reported in other areas, as well as the selective impact of DG 5-HT1a knockdown on DG ISOs. For example, in the initial results describing the cross-correlation of calcium activity and EMG/EEG with MA episodes (paragraph 1, page 4), similar results relating brief arousals to the infraslow fluctuation in sleep spindles (sigma band) have also been reported at 0.02 Hz, associated with variation in sensory arousability (cf. Cardis et al., "Cortico-autonomic local arousals and heightened somatosensory arousability during NREMS of mice in neuropathic pain", eLife 2021). It would be important to know whether the current results show similar cortical sigma band correlations. Also, in the results on ISO attenuation following 5-HT1a knockdown on page 7 (Figure 6), how is cortical EEG affected? Is ISO still seen in EEG but attenuated in DG?

      Thank you for this valuable comment. We performed the analysis and found a positive correlation between cortical sigma band activity and DG activity during NREM sleep (see supplementary figure 1C-1E). Additionally, we conducted further analyses using the local 5-HT1a KO mouse model but did not observe significant changes in sleep architecture or MA frequency (see supplementary figure 6A). It is also important to note that ISO was only analyzed using calcium signals, not EEG signals. The standard filtering settings in our EEG data collection (0.5-500 Hz) do not allow us to analyze signals in such a low-frequency range.

      (5) The illustrations of the effect of 5-HT1a knockdown shown in Figure 6 are somewhat misleading. The examples in panels B and C show an effect that is much more dramatic than the overall effect shown in panel D. Panels B and C do not appear to be representative examples. Which of the sample points in panel D are illustrated in panels B and C? It is not appropriate to arbitrarily select two points from different animals for comparison, or worse, to take points from the extremes of the distributions. If the intent is to illustrate what the effect shown in D looks like in the raw data, then you need to select examples that reflect the means shown in panel D. It is also important to show the effect on cortical EEG, particularly in sigma band to see if the effects are restricted to the DG ISOs. It would also be helpful to show that MAs and their correlations as shown in Figure 1 or G as well as broader sleep architecture are not affected.

      We agree with the reviewer that the chosen example may appear somewhat exaggerated. However, we must point out that visually assessing missing or downregulated frequency components can be challenging. To provide a more objective presentation, we included Supplementary Figure 6B-C, in which we performed an analysis similar to that in Fig 1G in the 5-HT1a KO mice. These figures show a significant decrease in ISO amplitude, though the blockade is not complete, owing to the incomplete nature of genetic manipulation by viral injection (see Suppl Fig 5). Furthermore, recent studies (Dong et al., 2023; Zhang et al., 2024; Kjaerby et al., 2022) have identified several other neuromodulatory and peptidergic systems that might affect DG activity during MAs.

      To explore this further, we conducted pharmacological experiments. We administered 8-hydroxy-DPAT, a 5-HT1a agonist (i.p. 1 mg/kg), to Dock10-Cre mice injected with AAV-FLEX-GCaMP6s in the DG. Since 5-HT1a receptors act as autoreceptors on raphe 5-HT neurons, this treatment effectively silences the serotonergic system, thereby “removing” 5-HT signaling from the brain. The results, shown in Author response image 1, indicate that pharmacological suppression of 5-HT dampens the ISO in the DG during subsequent sleep intervals, with ISO recovering after the drug is washed out. These findings are consistent with the results obtained with the more specific local genetic manipulation. We have not included this result in the manuscript because we believe that the local downregulation is a cleaner experiment whose interpretation is more straightforward.

      Author response image 1.

      Finally, we also performed sleep analysis in 5-HT1a KO mice, showing that the local downregulation of 5-HT1a receptors had no significant effect on sleep metrics (Suppl Fig 6A). The hippocampus is not typically involved in regulating sleep-wake cycles, so we believe this result is consistent with that understanding.

      (6) On page 9 of the results it states that GCs and MCs are upregulated during NREM and their activity is abruptly terminated by MAs through a 5-HT mediated mechanism. I didn't see anything showing the 5-HT dependence of the MA activity correlation. The results indicate a reduction in ISO modulation of GC activity but not the MA-correlated activity. I would like to see the equivalent of Figure 1,2 G panels with the 5-HT1a manipulation.

      We agree with the reviewer on this point. We did not conduct any pharmacological or genetic manipulation in 2-photon calcium imaging experiments. We have removed that statement. As for the suggested analysis, please see our explanation above (Suppl Fig 6B-C).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Since the authors did not monitor DG neuronal activity with an electrophysiological tool, please rephrase the following sentence: "In this study, we investigated the neuronal activity of the dentate gyrus (DG) with electrophysiological and optical imaging tools during sleep-wake cycles." in the Abstract.

      We have rephrased the sentence as suggested.

      (2) Since the authors did not manipulate the serotonin release during sleep to investigate whether serotonin release modulates DG ISO, please edit the following sentence: "Further experiments revealed that the infraslow oscillation in the DG is modulated by rhythmic serotonin release during sleep" in the Abstract.

      We have rephrased the sentence as suggested.

      (3) Single-cell recording in DG with two-photon microscopy may address the issue raised in the 4th paragraph of the Discussion. In addition, in Fig 6C, the photometry has only captured the diminished oscillation in Htr1a KO, but cannot distinguish whether the activity levels of GC remain at high or low, which is a clear disadvantage of photometry.

      We agree with the reviewer, and have added text to the discussion.

      Reviewer #3 (Recommendations for the authors):

      (1) Some of the figures are missing labels in the spectrogram panels (e.g. no freq units in Figures 4 and 6).

      We have added the missing labels to those figures.

      (2) Missing specific locations for EEG electrodes/screws. The text states "we predrilled 2 holes on the right side of the skull (1.5 mm posterior of the Bregma) for implanting recording electrodes". 2 holes on the right side of the skull are pretty vague.

      We have added this information in the Methods.

      (3) Some additional work that could be cited particularly when discussing the serotonergic impact on hippocampal function as it might relate to sleep and memory would include work linking mesopontine activity (both serotonergic and non-serotonergic) to memory-associated hippocampal sharp-wave ripple activity (e.g. Jelitai et al. Front. Neural Circ. 2021, Wang et al Nat. Neuro. 2015).

      We have cited these papers.

      (4) The work cited at the beginning of the Results describing higher population calcium activity during sleep states (15,18,30) is generally appropriate but not explicitly related to GCaMP imaging. Pilz et al. "Functional Imaging of Dentate Granule Cells in the Adult Mouse Hippocampus", J.Neurosci. 2016 might be a more relevant citation.

      We have added the citation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the three reviewers for their positive comments and useful suggestions. We have implemented most of the reviewers’ recommendations and hope the manuscript is clearer now.

      The main modifications are:

      - A revision of the introduction to better explain what Transitional Probabilities are and clarify the rationale of the experimental design

      - A revision of the discussion

      - To tone down and better explain the interpretation of the different responses between duplets after a stream with phonetic or voice regularities (possibly an N400).

      - To better clarify the framing of statistical learning as a universal learning mechanism that might share computational principles across features (or domains).

      Below, we provide detailed answers to each reviewer's point.

      Response to Reviewer 1:

      There are no significant weaknesses to signal in the manuscript. However, in order to fully conclude that there is no obvious advantage for the linguistic dimension in neonates, it would have been most useful to test a third condition in which the two dimensions were pitted against each other, that is, in which they provide conflicting information as to the boundaries of the words comprised in the artificial language.

      This last condition would have allowed us to determine whether statistical learning weighs linguistic and non-linguistic features equally, or whether phonetic content is preferentially processed.

      We appreciate the reviewer's suggestion that a stream with conflicting information would provide valuable insights. In the present study, we started with a simpler case involving two orthogonal features (i.e., phonemes and voices), with one feature being informative and the other uninformative, and we found similar learning capacities for both. Future work should explore whether infants, and humans more broadly, can simultaneously track regularities in multiple speech features. However, creating a stream with two conflicting statistical structures is challenging. To use neural entrainment, the two features must lead to segmentation at different chunk sizes so that their effects lead to changes in power/PLV at different frequencies, for instance, using duplets for the voice dimension and triplets for the linguistic dimension (or vice versa). Consequently, the two dimensions would not be directly comparable within the same participant in terms of the number of distinguishable syllables/voices, memory demand, or SNR, given the 1/f decrease in amplitude of background EEG activity. This would involve comparisons between two distinct groups, counterbalancing chunk size and linguistic versus non-linguistic dimension. Considering the test phase, words for one dimension would have been part-words for the other dimension. As we are measuring differences and not preferences, interpreting the results would also have been difficult. Additionally, it may be difficult to find a sufficient number of clearly discriminable voices for such a design (triplets imply 12 voices). Therefore, an entirely different experimental paradigm would need to be developed.

      If such a design were tested, one possibility is that the regularities for the two dimensions are calculated in parallel, in line with the idea that the calculation of statistical regularities is a ubiquitous implicit mechanism (see Benjamin et al., 2024, for a proposed neural mechanism). Yet, similar to our present study, possibly only phonetic features would be used as word candidates. Another possibility is that only one informative feature would be explicitly processed at a time due to the serial nature of perceptual awareness, which may prioritise one feature over the other.

      We added one sentence in the discussion stating that more research is needed to understand whether infants can track both regularities simultaneously (p.13, l.270 “Future work could explore whether they can simultaneously track multiple regularities.”).

      Note: The reviewer’s summary contains a typo: the syllabic rate is 4 Hz (not 2 Hz), and the word rate is 2 Hz (not 4 Hz).

      Response to Reviewer 2:

      N400: I am skeptical regarding the interpretation of the phoneme-specific ERP effect as a precursor of the N400 and would suggest toning it down. While the authors are correct that infant ERP components are typically slower and more posterior compared to adult components, and the observed pattern is hence consistent with an adult N400, it could at the same time also be a lot of other things. On a functional level, I can't follow the authors' argument as to why a violation in phoneme regularity should elicit an N400, since there is no evidence for any semantic processing involved. In sum, I think there is just not enough evidence from the present paradigm to confidently call it an N400.

      The reviewer is correct that we cannot definitively determine the type of processing reflected by the ERP component that appears when neonates hear a duplet after exposure to a stream with phonetic regularities. We interpreted this component as a precursor to the N400, based on prior findings in speech segmentation tasks without semantic content, where a ~400 ms component emerged when adult participants recognised pseudowords (Sander et al., 2002) or during structured streams of syllables (Cunillera et al., 2006, 2009). Additionally, the component we observed had a similar topography and timing to those labelled as N400 in infant studies, where semantic processing was involved (Parise et al., 2010; Friedrich & Friederici, 2011).

      Given our experimental design, the difference we observed must be related to the type of regularity during familiarisation (either phonemes or voices). Thus, we interpreted this component as reflecting lexical search, a process which could be triggered by a linguistic structure but which would not be relevant to a non-linguistic regularity such as voices. However, we are open to alternative interpretations. In any case, this difference between the two streams reveals that computing regularities based on phonemes versus voices does not lead to the same processes.

      We revised the abstract (p.2, l.33) and the discussion of this result (p.15, l.299), toning them down. We hope the rationale of the interpretation is clearer now, as is the fact that it is just one possible interpretation of the results.

      Female and male voices: Why did the authors choose to include male and female voices? While using both female and male stimuli of course leads to higher generalizability, it also introduces a second dimension for one feature that is not present for the other (i.e., phoneme for Experiment 1 and voice identity plus gender for Experiment 2). Hence, couldn't it also be that the infants extracted the regularity with which one gender voice followed the other? For instance, in List B, in the words, one gender is always followed by the other (M-F or F-M), while in 2/3 of the part-words, the gender is repeated (F-F and M-M). Wouldn't you expect the same pattern of results if infants learned regularities based on gender rather than identity?

      We used three female and three male voices to maximise acoustic variability. The streams were synthesised using MBROLA, which provides a limited set of artificial voices. Indeed, there were not enough French voices of acceptable quality, so we also used two Italian voices (the phonemes used existed in both Italian and French).

      Voices differ in timbre, and female voices tend to be higher pitched. However, it is sometimes difficult to categorise low-pitched female voices and high-pitched male voices. Given that gender may be an important factor in infants' speech perception (newborns, for instance, prefer female voices at birth), we conducted tests to assess whether this dimension could have influenced our results.

      We report these analyses in the SI and refer to them in the methods section (p.25, l.468 “We performed post-hoc tests to ensure that the results were not driven by a perception of two voices: female and male (see SI).”).

      We first quantified the transitional probability matrices during the structured stream of Experiment 2, assuming that there are only two types of voices: female and male.

      For List A, all transition probabilities are equal to 0.5 (P(M|F), P(F|M), P(M|M), P(F|F)), resulting in flat TPs throughout the stream (see Author response image 1, top). Therefore, we would not expect neural entrainment at the word rate (2 Hz), nor would we anticipate ERP differences between the presented duplets in the test phase.

      For List B, P(M|F)=P(F|M)=0.66 while P(M|M)=P(F|F)=0.33. However, this does not produce a regular pattern of TP drops throughout the stream (see Author response image 1, bottom). As a result, strong neural entrainment at 2 Hz was unlikely, although some degree of entrainment might occasionally have occurred because some drops happened to fall at a 2 Hz rate. Regarding the test phase, all three Words and only one Part-word presented alternating patterns (TP=0.66). Therefore, the difference in the ERPs between Words and Part-words in List B might be attributed to gender alternation.
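      The gender analysis above amounts to relabelling each voice by its gender and recomputing first-order transitional probabilities on the relabelled stream. A minimal Python sketch of that step follows; the voice labels, their gender assignment, and the short stream are hypothetical and do not reproduce Lists A or B.

```python
from collections import Counter
from fractions import Fraction


def transitional_probabilities(seq):
    """First-order TPs, P(next = b | current = a), for every observed pair."""
    pair_counts = Counter(zip(seq, seq[1:]))
    first_counts = Counter(seq[:-1])  # the last item has no successor
    return {(a, b): Fraction(n, first_counts[a]) for (a, b), n in pair_counts.items()}


def collapse_to_gender(voices, gender_of):
    """Replace each voice label with its gender label ('F' or 'M')."""
    return [gender_of[v] for v in voices]


# Hypothetical labels: voices v1-v3 female, v4-v6 male (not the actual assignment).
gender_of = {"v1": "F", "v2": "F", "v3": "F", "v4": "M", "v5": "M", "v6": "M"}
voice_stream = ["v1", "v4", "v2", "v5", "v3", "v6", "v1", "v2"]
gender_stream = collapse_to_gender(voice_stream, gender_of)
tps = transitional_probabilities(gender_stream)
print(gender_stream)
print(tps)
```

      Using exact fractions makes it easy to see when the gender-level TPs are flat (all 0.5, as in List A) versus biased towards alternation (as in List B).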

      However, it seems unlikely that gender alternation alone explains the entire pattern of results, as the effect is inconsistent and appears in only one of the lists. To rule out this possibility, we analysed the effects in each list separately.

      Author response image 1.

      Transition probabilities (TPs) across the structured stream in Experiment 2, considering voices processed by gender (Female or Male). Top: List A. Bottom: List B.

      We computed the mean activation within the time windows and electrodes of interest and compared the effects of word type and list using a two-way ANOVA. For the difference between Words and Part-words over the positive cluster, we observed a main effect of word type (F(1,31) = 5.902, p = 0.021), with no effects of list or interactions (p > 0.1). Over the negative cluster, we again observed a main effect of word type (F(1,31) = 10.916, p = 0.0016), with no effects of list or interactions (p > 0.1). See Author response image 2.

      Author response image 2:

      Difference in ERP voltage (Words – Part-words) for the two lists (A and B); W = Words; P = Part-words.

      We conducted a similar analysis for neural entrainment during the structured stream on voices. A comparison of entrainment at 2 Hz between participants who completed List A and List B showed no significant differences (t(30) = -0.27, p = 0.79). A test against zero for each list indicated significant entrainment in both cases (List A: t(17) = 4.44, p = 0.00036; List B: t(13) = 3.16, p = 0.0075). See Author response image 3.
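      Assuming these are standard Student's t-tests, the reported degrees of freedom follow directly from the group sizes: t(17) and t(13) for one-sample tests imply 18 and 14 participants, and t(30) for the between-list comparison equals 18 + 14 − 2. A minimal standard-library Python sketch of both statistics is below; the function names are illustrative and the toy inputs are not study data.

```python
import math
from statistics import mean, stdev


def one_sample_t(x, mu=0.0):
    """Student's one-sample t statistic against mu, with df = n - 1."""
    n = len(x)
    t = (mean(x) - mu) / (stdev(x) / math.sqrt(n))
    return t, n - 1


def independent_t(x, y):
    """Student's two-sample t statistic (pooled variance), with df = n_x + n_y - 2."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se, nx + ny - 2


# Toy inputs only: each returns a (t, df) pair.
print(one_sample_t([1, 2, 3]))
print(independent_t([1, 2, 3], [4, 5, 6]))
```

      With 18 and 14 entrainment values, the same functions would return df = 17, 13, and 30, matching the values reported above.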

      Author response image 3.

      Neural entrainment at 2 Hz during the structured stream of Experiment 2 for Lists A and B.

      Words entrainment over occipital electrodes: Do you have any idea why the duplet entrainment effect occurs over the electrodes it does, in particular over the occipital electrodes (which seems a bit unintuitive given that this is a purely auditory experiment with sleeping neonates)?

      Neural entrainment might be considered a succession of evoked responses induced by the stream. After applying an average reference in high-density EEG recordings, the auditory ERP in neonates typically consists of a central positivity and a posterior negativity, with a source located at the electrical zero in a single-dipole model (i.e., approximately in the superior temporal region; Dehaene-Lambertz & Dehaene, 1994). In adults, because of the average reference (i.e., the sum of voltages is equal to zero at each time point) and because the electrodes cannot capture the negative pole of the auditory response, the negativity is distributed around the head. In infants, however, the brain sits higher within the skull, allowing for a more accurate recording of the negative pole of the auditory ERP (see Figure 4 for the location of electrodes in an infant head model).
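      The zero-sum property of the average reference described above can be shown with a minimal NumPy sketch. The function name `average_reference` and the eight-channel toy montage are hypothetical; real pipelines also handle channel selection and bad channels.

```python
import numpy as np


def average_reference(data: np.ndarray) -> np.ndarray:
    """Re-reference EEG to the common average.

    data: array of shape (n_channels, n_samples). The mean across channels is
    subtracted at every time point, so the channels sum to zero afterwards.
    """
    return data - data.mean(axis=0, keepdims=True)


# Hypothetical toy: 4 channels record a positive deflection and 4 record nothing,
# e.g. because the negative pole falls outside the recorded montage.
signal = np.zeros((8, 5))
signal[:4, :] = 1.0
reref = average_reference(signal)
print(reref[:, 0])  # the first 4 channels become +0.5, the other 4 become -0.5
```

      The toy shows why, when the true negative pole is not sampled, the average reference spreads an apparent negativity over the remaining electrodes.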

      Besides the posterior electrodes, we can see some entrainment on more anterior electrodes that probably corresponds to the positive pole of the auditory ERP.

      We added a phrase in the discussion to explain why we can expect phase-locked activity in posterior electrodes (p.14, l.277: “Auditory ERPs, after reference-averaged, typically consist of a central positivity and posterior negativity”).

      Author response image 4:

      International 10–20 sensors' location on the skull of an infant template, with the underlying 3-D reconstruction of the grey-white matter interface and projection of each electrode to the cortex. Computed across 16 infants (from Kabdebon et al., NeuroImage, 2014). The O1, O2, T5, and T6 electrodes project lower than in adults.

      Response to Reviewer 3:

      (1) While it's true that voice is not essential for language (i.e., sign languages are implemented over gestures; voices are also used to produce non-linguistic sounds, like laughter), it is a feature of spoken languages. Thus, I'm not sure if we can really consider this study as a comparison between linguistic and non-linguistic dimensions. In turn, I'm not sure that these results show that statistical learning at birth operates on non-linguistic features, since voice is a linguistic dimension, at least in spoken languages. I'd like to hear the authors' opinions on this.

      On one hand, it has been shown that statistical learning (SL) operates across multiple modalities and domains in human adults and animals. On the other hand, SL is considered essential for infants to begin parsing speech. Therefore, we aimed to investigate whether SL capacities at birth are more effective on linguistic dimensions of speech, potentially as a way to promote language learning.

      We agree with the reviewer that voices play an important role in communication (e.g., for identifying who is speaking); however, they do not contribute to language structure or meaning, and listeners are expected to normalize across voices to accurately perceive phonemes and words. Thus, voices are speech features but not linguistic features. Additionally, in natural speech, there are no abrupt voice changes within a word as in our experiment; instead, voice changes typically occur on a longer timescale and involve only a limited number of voices, such as in a dialogue. Therefore, computing regularities based on voice changes would not be useful in real-life language learning. We considered that contrasting syllables and voices was an elegant way to test SL beyond its linguistic dimension, as the experimental paradigm is identical in both experiments.

      We have rephrased the introduction to make this point clearer. See p.5, l.88-92: “To test this, we have taken advantage of the fact that syllables convey two important pieces of information for humans: what is being said and who is speaking, i.e. linguistic content and speaker’s identity. While statistical learning…”.

      Along the same line, in the Discussion section, the present results are interpreted within a theoretical framework showing statistical learning in auditory non-linguistic (strings of tones, music) and visual domains, as well as in other animal species. I'm not sure if that theoretical framework is the right fit for the present results.

      (2) I'm not sure whether the fact that we see parallel and independent tracking of statistics in the two dimensions of speech at birth indicates that newborns would be able to do so in all the other dimensions of speech. If so, what other dimensions are the authors referring to?

      The reviewer is correct that demonstrating the universality of SL requires testing additional modalities and acoustic dimensions. However, we postulate that SL is grounded in a basic mechanism of long-term associative learning, as proposed in Benjamin et al. (2024), which relies on a slow decay in the representation of a given event. This simple mechanism, capable of operating on any representational output, accounts for many types of sequence learning reported in the literature (Benjamin et al., in preparation).

      We have revised the discussion to clarify this theoretical framework.

      In p.13, l.264: “This mechanism might be rooted in associative learning processes relying on the co-existence of event representations driven by slow activation decays (Benjamin et al., 2024).”

      In p., l. 364: “Altogether, our results show that statistical learning works similarly on different speech features in human neonates with no clear advantage for computing linguistically relevant regularities in speech. This supports the idea that statistical learning is a general learning mechanism, probably operating on common computational principles across neural networks (Benjamin et al., 2024)…”.

      (3) Lines 341-345: Statistical learning is an evolutionarily ancient learning mechanism, but I do not think that the present results show it. This is a study on human neonates and adults; no other animal species are involved, so I do not see a connection with the evolutionary history of statistical learning. It would be much more interesting to make claims on the ontogeny (rather than phylogeny) of statistical learning, and on what regularities newborns are able to detect right after birth. I believe that this is one of the strengths of this work.

      We did not intend to make claims about the phylogeny of SL. Since SL appears to be a learning mechanism shared across species, we use it as a framework to suggest that SL may arise from general operational principles applicable to diverse neural networks. Thus, while it is highly useful for language acquisition, it is not specific to it.

      We have removed the sentence “Statistical learning is an evolutionary ancient learning mechanism.”, and replaced it by (p.18, l.364) “Altogether, our results show that statistical learning works similarly on different speech features in human neonates with no clear advantage for computing linguistically relevant regularities in speech.” We now emphasise in the discussion that infants compute regularities on both features and propose that SL might be a universal learning mechanism sharing computational principles (Benjamin et al., 2024) (see point 2).

      (4) The description of the stimuli in Lines 110-113 is a bit confusing. In Experiment 1, e.g., "pe" and "tu" are both uttered by the same voice, correct? ("random voice each time" is confusing). Whereas in Experiment 2, e.g., "pe" and "tu" are uttered by different voices, for example, "pe" by the yellow voice and "tu" by the red voice. If this is correct, then I recommend that the authors rephrase this section to make it clearer.

      To clarify, in Experiment 1, the voices were randomly assigned to each syllable, with the constraint that no voice was repeated consecutively. This means that syllables within the same word were spoken by different voices, and each syllable was heard with various voices throughout the stream. As a result, neonates had to retrieve the words based solely on syllabic patterns, without relying on consistent voice associations or specific voice relationships.

      In Experiment 2, the design was orthogonal: while the syllables were presented in a random order, the voices followed a structured pattern. As in Experiment 1, each syllable (e.g., “pe” and “tu”) was spoken by different voices. The key difference is that in Experiment 2, the structured regularities were applied to the voices rather than the syllables. For example, the “green” voice was always followed by the “red” voice, but the two voices uttered different syllables at each occurrence.

      We have revised the description of the stimuli and the legend of Figure 1 to clarify these important points.

      See p.6, l. 113: “The structure consisted of the random concatenation of three duplets (i.e., two-syllable units) defined only by one of the two dimensions. For example, in Experiment 1, one duplet could be petu with each syllable uttered by a random voice each time they appear in the stream (e.g pe is produced by voice1 and tu by voice6 in one instance and in another instance pe is produced by voice3 and tu by voice2). In contrast, in Experiment 2, one duplet could be the combination [voice1-voice6], each uttering randomly any of the syllables.”

      p.20, l. 390 (Figure 1 legend): “For example, the two syllables of the word “petu” were produced by different voices, which randomly changed at each presentation of the word (e.g. “yellow” voice and “green” voice for the first instance, “blue” and “purple” voice for the second instance, etc.). In Experiment 2, the statistical structure was based on voices (TPs alternated between 1 and 0.5), while the syllables changed randomly (uniform TPs of 0.2). For example, the “green” voice was always followed by the “red” voice, but they were randomly saying different syllables “boda” in the first instance, “tupe” in the second instance, etc.”

      (5) Line 114: the sentence "they should compute a 36 x 36 TPs matrix relating each acoustic signal, with TPs alternating between 1/6 within words and 1/12 between words" is confusing as it seems like there are different acoustic signals. Can the authors clarify this point?

      Thank you for highlighting this point. To clarify, our suggestion is that neonates might not track regularities between phonemes and voices as separate features. Instead, they may treat each syllable-voice combination as a distinct item—for example, "pe" spoken by the "yellow" voice is one item, while "pe" spoken by the "red" voice is another. Under this scenario, there would be a total of 36 unique items (6 syllables × 6 voices), and infants would need to track regularities between these 36 combinations.

      We have modified this sentence in the manuscript to make it clearer.

      See p.7, l. 120: “If infants at birth compute regularities based on a neural representation of the syllable as a whole, i.e. comprising both phonetic and voice content, this would require computing a 36 × 36 TPs matrix relating each token.”
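      The 36 × 36 token scenario reduces to simple arithmetic, sketched below in Python. The sketch assumes, as the 1/6 and 1/12 figures imply, that each syllable's voice is drawn uniformly from all six voices and that each duplet is followed by one of the other two duplets with equal probability; the function name `token_tps` is illustrative.

```python
from fractions import Fraction


def token_tps(n_syllables=6, n_voices=6, n_words=3):
    """Expected TPs when every syllable-voice pairing is treated as a distinct token.

    Assumes each syllable's voice is drawn uniformly from n_voices, and that a
    duplet is followed by one of the other n_words - 1 duplets with equal probability.
    """
    n_tokens = n_syllables * n_voices
    within_word = Fraction(1, n_voices)                    # next syllable fixed, voice free
    between_words = Fraction(1, (n_words - 1) * n_voices)  # next duplet and voice both free
    return n_tokens, within_word, between_words


n_tokens, within, between = token_tps()
print(n_tokens, within, between)
```

      At the syllable level alone, the same stream would instead give TPs of 1 within words and 1/2 between words; tracking whole tokens dilutes both values sixfold.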

      Reviewer #1 (Recommendations for the authors):

      (1) The acronym TP should be spelled out, and a brief description of the fact that dips in TPs signal boundaries while high TPs signal a cohesive unit could be useful for non-specialist readers.

      We have added this at the beginning of the introduction (lines 52-60).
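      For non-specialist readers, the idea that dips in TPs mark word boundaries can be sketched in a few lines of Python. The sketch uses a fixed threshold as a stand-in for dip detection (an assumption; infants are not thought to apply an explicit threshold). The syllables "pe", "tu", "bo" and "da" appear in the text, while "ki", "go" and the duplet order are invented for illustration.

```python
from collections import Counter


def transitional_probabilities(seq):
    """P(next = b | current = a) for each adjacent pair in the sequence."""
    pair_counts = Counter(zip(seq, seq[1:]))
    first_counts = Counter(seq[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


def segment(seq, threshold=0.75):
    """Place a boundary wherever the TP into the next syllable falls below threshold."""
    tps = transitional_probabilities(seq)
    chunks, current = [], [seq[0]]
    for a, b in zip(seq, seq[1:]):
        if tps[(a, b)] < threshold:  # TP dip: likely word boundary
            chunks.append("".join(current))
            current = []
        current.append(b)
    chunks.append("".join(current))
    return chunks


# Hypothetical duplets, concatenated without immediate repetition.
words = [("pe", "tu"), ("bo", "da"), ("ki", "go")]
order = [0, 1, 2, 0, 2, 1, 0, 1, 2, 1, 0, 2]
stream = [syllable for i in order for syllable in words[i]]
print(segment(stream))
```

      Within-word TPs are 1 here, while TPs across word boundaries are at most 2/3, so the threshold recovers every duplet.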

      (2) p.5, l.76: "Here, we aimed to further characterise the characteristics of this mechanism...". I suggest this is rephrased as "to further characterise this mechanism".

      We have changed it as suggested by the reviewer (now p.5, l.81)

      (3) p.9, l.172: "[...] this contribution is unlikely since the electrodes differ from the electrodes, showing enhanced word-rate activity at 2 Hz."

      It is unclear which electrodes differ from which electrodes. I figure that the authors mean that the electrodes showing stronger activity at 2 Hz differ from those showing it at 4 Hz, but the sentence could use rephrasing.

      This part has been rephrased (p.9, l.177-181)

      (4) p.10, l.182: "[...] the entrainment during the first minute of the structure stream [… ]".

      Structured stream.

      It has been corrected (p.10, l.190)

      (5) p.12, l.234: "we compared STATISTICAL LEARNING"

      Why the use of capitals?

      This was an error and it was corrected (p.12, l.242).

      (6) p.15, l.298: "[...] suggesting that such negativity might be related to semantic."

      The sentence feels incomplete. To semantics? To the processing of semantic information?

      The phrase has been corrected (p.15, l.314). Additionally, the discussion of the posterior negativity observed for duplets after familiarisation with a stream with regularities over phonemes has been rephrased (p.15, l.)

      (7) Same page, l.301: "3-mo-olds" 3-month-olds.

      It has been corrected (now in p.16, l.333)

      (8) Same page, l.307: "(see also (Bergelson and Aslin, 2017)" (see also Bergelson and Aslin, 2017).

      It has been corrected (now in p.17, l.340)

      (9) Same page, l.310: "[...] would be considered as possible candidate" As possible candidates.

      This has been rephrased and corrected (now in p.17, l.343)

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 2: The authors mention a "thick orange line", which I think should be a "thick black line".

      We are sorry for this. It has been corrected.

      (2) Ln 166: Should be Figure 2C rather than 3C.

      It has been corrected (now in p.9, l.173)

      (3) Figure 4 is not referenced in the manuscript.

      We now refer to it on p.12, l.236.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors set out to define the molecular basis for LPs as the origin of BRCA1-deficient breast cancers. They showed that LPs have the highest level of replicative stress and hypothesise that this may account for their tendency to transform. They went on to identify ELF3 as a candidate driver of LP transformation and showed that ELF3 expression is up-regulated in response to replicative stress as well as BRCA1 deficiency. They further showed that ELF3 inactivation led to a higher level of DNA damage, which may result from compromised replicative stress responses.

      While the manuscript supports the interesting idea that ELF3 may fuel LP cell transformation, it remains unclear how ELF3 promotes cell tolerance to DNA damage. Interestingly, the authors propose that ELF3 suppresses excessive genomic instability, but in my opinion, I do not see any evidence that supports this claim. In fact, one might think that genomic instability is key to cell transformation.

      We greatly appreciate your thorough review and insightful comments on our manuscript. We have taken your feedback seriously and have made several key revisions to address your concerns.

      Regarding your primary point about how ELF3 helps cells tolerate DNA damage, we have expanded our discussion to clarify the role of ELF3 in the context of BRCA1 deficiency and high replicative stress. We clarified that while ELF3 may not directly suppress excessive genomic instability, it helps maintain a balance that prevents catastrophic damage in BRCA1-deficient cells. Both BRCA1 deficiency and increased replication stress induce up-regulation of ELF3, a transcription factor whose up-regulation increases the expression of a variety of DNA replication-associated proteins that help maintain homeostasis in the DNA replication process (Figure 5 E and F). Conversely, ELF3 deficiency disrupts the DNA replication process (Figure 5 G-I). While ELF3 cannot completely eliminate genomic instability, it essentially keeps genomic instability within a dangerous yet non-lethal range: higher than in normal cells, but not so high as to cause cell death.

      This precarious balance can facilitate the transformation of LPs into a malignant state, as you pointed out.

      In the revised manuscript, we emphasized that in cells with inherently low replicative stress, such as other non-LP mammary cells, the ELF3-associated mechanism might help cells endure the high replicative stress caused by BRCA1 deficiency without leading to cancerous changes. However, in LP cells, which naturally experience higher replicative stress, this ELF3-related mechanism may make them more susceptible to transformation into cancer cells. This supports our hypothesis that the combination of high replicative stress and BRCA1 deficiency specifically predisposes LP cells to tumorigenesis.

      We have modified the working model to make it clearer.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript focuses on a persistent question of why germline mutations in BRCA1, which impair homology-directed repair of DNA double-strand breaks, predispose primarily to breast and ovarian cancers but not to cancers of other tissues. The authors propose that replication stress is elevated in luminal progenitor (LP) cells; they apply the gene signature from Dreyer et al. as a measure of replication stress in cell populations previously selected by FACS (published by Lim et al.) and suggest an enrichment of replication stress among LP cells. This is followed by single-cell RNA-seq data from breast tissues of a small number of BRCA1 mutation carriers, but the pathogenic variants are not listed. The authors perform an elegant analysis of the effects of BRCA1 knockdown in MCF10A cells, but these cells are not considered a model of LP cells.

      Overall, the manuscript suffers from significant gaps and leaps in logic among the datasets used. The connection to luminal progenitor cells is not adequately established because the models used are not representative of this population of cells. Therefore, the central hypothesis is not sufficiently justified.

      Strengths:

      The inducible knockdown of BRCA1 provided compelling data pointing to an upregulation of ELF3 in this setting, as well as of a small number of other genes. It would be useful to discuss the other genes for completeness and explain the logic for focusing on ELF3. Nonetheless, the connection with ELF3 is reasonable. The authors provide significant data showing a role for ELF3 in breast epithelial cells and in cell survival.

      Weaknesses:

      The initial observations in primary breast cells have small sample sizes. The mutations in BRCA1 seem to be presumed to be all the same, but we know that pathogenic variants differ among individuals and range from missense mutations affecting interactions with one critical partner to large-scale truncations of the protein.

      The figure legends are missing critical details that make it difficult for the reader to evaluate the data. The data support the notion that ELF3 may participate in relieving replication stress, but this role does not appear to be limited to LP cells as proposed in the hypothesis.

      We would like to sincerely thank you for your thorough review and constructive feedback on our manuscript. Your insightful comments and suggestions have been invaluable in guiding our revisions.

      (1) Acknowledgment of Data Set Limitations and Additional Analyses:

      We fully acknowledge the importance of the concerns raised regarding the datasets used in our study. We have supplemented our manuscript with the missing information you pointed out and conducted additional analyses as suggested. These efforts have

      (2) Challenges in LP Cell Experiments:

      One of the most critical issues you raised was the lack of validation in LP cells, particularly concerning the role of ELF3 in these cells. We are acutely aware of the significance of this point. Following your review, we made extensive efforts to isolate and culture LP cells from both BRCA1-proficient and BRCA1-deficient patient samples. We tried various methods and invested substantial resources, including time, manpower, and materials, to establish a reliable protocol for isolating and cultivating LP cells in vitro. Unfortunately, despite our best efforts, we were unable to obtain a sufficient number of high-quality cells to generate solid and reproducible results.

      The challenges we faced included the limited availability of patient tissues and the technical difficulties in consistently obtaining viable LP cells. Given the already extended timeline for the revision of this manuscript, we regretfully decided to forgo further attempts to perform these critical experiments with LP cells. In the revised manuscript, we have explicitly addressed the limitations of our cell models and provided a detailed discussion of the challenges faced in isolating LP cells. Despite these limitations, we believe that the consistency between our results and LP cell sequencing data provides valuable insights and a solid foundation for future studies.

      (3) Data Presentation Improvements:

      In response to your feedback, we have also made significant improvements to the data presentation in our manuscript. We updated and optimized figure legends and narrative sections to ensure that the data are clearly and accurately conveyed. These changes aim to enhance the readability and comprehensibility of our findings.

      We greatly appreciate your valuable feedback, which has significantly contributed to the improvement of our manuscript. Your suggestions have helped us refine our arguments and present a more robust and nuanced interpretation of our data. 

      Thank you once again for your critical and constructive review. We look forward to your feedback on our revised manuscript.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):  

      As such, in addition to consolidating the role of ELF3 in promoting cell tolerance to replicative stress (or in suppressing genomic instability), I have a few comments the authors should consider to improve their manuscript.  

      (1) I am not sure how cells have gained a growth advantage if they were arrested (lines 105-106). Perhaps the authors can elaborate.

      Thanks for pointing this out, and we are sorry for the misleading statement. We have revised the manuscript and would like to clarify that “survival advantage” may be more accurate than “growth advantage”. Long-term DOX treatment decreased cell survival, as indicated by the reduced number of colonies in Supplemental Fig. S1D, so many cells died during DOX treatment. Therefore, the cells that survived throughout DOX treatment and were collected for sequencing may have gained a survival advantage over their counterparts that failed to survive.

      (2) Figure 3D - From Western blotting of ELF3, forced expression of E2F6 does not appear to "block" HU-induced ELF3 up-regulation, but merely down-regulate basal level of ELF3, with the effect of HU still notable.

      Thanks for the comment. We agree that E2F6 down-regulates baseline ELF3 expression and does not fully block ELF3 up-regulation. After calculating the fold change following E2F6 overexpression, we confirmed that E2F6 overexpression still partially blocks HU-induced ELF3 up-regulation, reducing the fold change from 3.32 to 2.40, which supports our conclusion that HU-induced ELF3 up-regulation is regulated by the ATR-Chk1-E2F axis. However, it cannot be excluded that E2F6 also regulates ELF3 expression in other, replication stress-independent ways, and we have revised the manuscript accordingly.

      (3) Figure 3J & K - In my opinion, if BRCA1 knockdown were more efficient it remains formally possible that co-depletion of BRCA1 and GATA3 may exhibit additive effects in up-regulating ELF3 mRNA level.

      Thank you for the comment. The BRCA1 knockdown efficiency in Figure 3J is shown in Supplemental Fig. S3B. Notably, the knockdown of both BRCA1 and GATA3 was numerically more efficient in the double-knockdown group than in the respective single-knockdown groups. Thus, the greater ELF3 up-regulation in the double-knockdown group in Figure 3J could be caused by the superior knockdown efficiency of both BRCA1 and GATA3. Nonetheless, we agree that BRCA1 and GATA3 may still have separate functions in this experimental setting and that a marginal additive effect may exist, and the manuscript was revised accordingly.

      (4) Figure 4 - Perhaps the authors can change its title to better summarise the findings. Cell sensitivity assays and xenograft experiments may not necessarily relate to genomic instability.

      Thank you for the great suggestion. To summarize the results more accurately, we have revised the title as “ELF3 can help cells tolerate replication stress and sustain cell survival”.

      (5) Figure 5B&C - It would be important to document the time-dependent resolution of HU-induced DNA lesions by including additional time-points before, during, and after HU treatment.

      We appreciate the suggestion to include additional time points to document the time-dependent resolution of HU-induced DNA lesions. In our experiments, we observed that ELF3 knockdown leads to genomic instability both in the presence and absence of HU treatment. Specifically, Figure 5A and Figure S5 demonstrate that ELF3 knockdown increases genomic instability without HU treatment, indicating its role in maintaining genomic stability under normal conditions. On the other hand, Figure 5B, 5C, and 5D show that ELF3 knockdown under HU-induced replication stress further exacerbates genomic instability. This observation aligns with our finding that ELF3 expression increases in response to replication stress, suggesting its critical role in maintaining replication homeostasis under such conditions. 

      (6) Figure 5F&I - Which ELF3 siRNA was used in these experiments? Since the authors did not exclude off-target effects, it may be worthwhile to include both ELF3 siRNAs for Panel F.

      Thanks for your advice. The qPCR (Figure 5F) and DNA fiber assay (Figure 5I) experiments used siELF3-4 siRNA. We have repeated the qPCR experiments for Panel F using siELF3-5 siRNA (Supplementary Fig. S5B).

      We sincerely thank you for your thoughtful feedback and constructive suggestions. Addressing these points has strengthened our manuscript, and we are grateful for the opportunity to refine and clarify our work. We appreciate your critical evaluation and look forward to further constructive dialogue.

      Reviewer #2 (Recommendations For The Authors):  

      (1) The data driving the hypothesis uses gene expression signatures as an indirect measure of replication stress. This is a critical concern.

      a. At this time, numerous gene expression signatures have been reported to be biomarkers of replication stress. Therefore, it would be valuable to apply additional gene expression signatures to examine the performance and the overlap in the results.

      The recent work by Takahashi et al., 2022 (https://pubmed.ncbi.nlm.nih.gov/36381660/) provides a signature that was derived independently and offers one that can be used to assess the performance of the signatures and stability of the conclusions.

      Thank you for the valuable suggestion. We have evaluated replication stress in the mammary cell subgroups using the Repstress score developed in the work you mentioned. LP cells showed a trend toward higher replication stress compared with the other subgroups, although the difference was not statistically significant. This result is consistent with our previous analysis and indicates a trend toward higher replication stress in LP cells. We have added these data as the last line of Figure 1A in the revised version.

      Author response image 1.

      Replication stress pathway scores of different human normal mammary cell  populations. The gene expression data were from Lim et al. (3).

      b. A direct measure of replication stress in LP cells would be important to confirm the gene expression signature. Therefore, performing immunostaining for markers of replication stress (eg gamma-H2AX foci, DNA fiber assays) would provide more direct data to support the assertions.

      Thank you for this suggestion and we totally agree that experiments revealing replication stress levels by investigating common markers, e.g., gamma-H2AX foci, DNA fiber assays, will provide vital evidence for our hypothesis. However, since our last response, we have been diligently trying to obtain LP cells for these experiments but encountered technical challenges while attempting to isolate and culture LP cells in vitro. 

      In the discussion part, we have revised the manuscript to emphasize that the data obtained from MCF10A should be interpreted with caution and there are certain gaps between the cell models and LP cells.

      (2) The depth of single-cell sequencing can often be limiting. Therefore, a supplementary table should list the genes used for the replication stress signature and the frequency with which they are observed in the single-cell sequencing data. This is needed to ensure that the replication stress score does not reflect a small subset of the replication stress signature genes.

      Thank you very much for this valuable suggestion. We have provided an expression matrix of the replication stress signature genes in the revised version (Supplementary Table S1), and we also calculated the average expression level of each gene across cells. As shown in Author response image 2, these genes are expressed at relatively low levels in single cells (counts ≤ 10), and the expression differences among genes are relatively small. Thus, we excluded the possibility that a few highly expressed genes substantially drive the replication stress score.
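      The check described above, that no small subset of genes dominates a summed signature score, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes per-cell counts are available as a mapping from gene to a list of counts, and the gene names are hypothetical placeholders.

```python
from statistics import fmean

def mean_counts_per_gene(counts_by_gene):
    """Average count of each signature gene across all cells."""
    return {gene: fmean(values) for gene, values in counts_by_gene.items()}

def top_gene_share(counts_by_gene, k=1):
    """Fraction of the summed mean counts contributed by the k highest-mean genes.

    A share close to 1 would mean a few genes drive the score; a share close
    to k / n_genes means the contribution is spread evenly across genes.
    """
    means = mean_counts_per_gene(counts_by_gene)
    total = sum(means.values())
    top = sorted(means.values(), reverse=True)[:k]
    return sum(top) / total if total else 0.0

# Hypothetical toy data: three signature genes measured in three cells.
counts = {"GENE_A": [1, 2, 3], "GENE_B": [2, 2, 2], "GENE_C": [0, 1, 2]}
print(top_gene_share(counts, k=1))  # 0.4
```

      With a real signature, the same share could be reported for, say, the top 5 or top 10 genes.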

      Author response image 2.

      Average counts of Top 50 genes for the replication stress signature

      (3) As only 4 BRCA mutation carriers are analyzed, it is critical that the mutations be reported for these individuals because pathogenic variants differ in their effects and interactions with the DNA repair machinery in cells.

      Thanks for the suggestion. Information on the 4 BRCA1 mutation carriers has been added to Supplemental Table S2.

      (4) The figures throughout lack critical details making it difficult to evaluate. Figure 1A states that these are "replication stress pathway scores..." but there is no evaluation of levels of statistical differences. The heat map has what appears to be a log unit score between +2 and -2 but it is unclear whether it is log2 or log10 or some other unit. In 1B, the replication stress scores are visualized as relative values between 0 and 0.1, but there is no indication of what this means or whether there is a statistically significant difference in the levels among the populations. As tumors are composed of multiple cell types, it should be stated how the "tumor cells" are uniquely identified in the figure legend. The lack of critical information is common across many of the figures making review frustratingly difficult.

      Thanks for the suggestion. We have added the statistical analysis and scale to the Figure 1A legend. For Figure 1B, the replication stress score was calculated as the sum of the expression of the replication stress genes and is presented as a natural logarithm (ln) value. We have provided a quantitative figure and statistical tests (Mann-Whitney test) of replication stress scores for the various cell types (Supplementary Figure 1A). 
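      A minimal sketch of the score and test described above, assuming the score is the natural log of the summed signature counts per cell. The pseudocount of 1 (via `math.log1p`), which keeps cells with zero signature counts finite, is an assumption, as are the function names and the toy data. The Mann-Whitney U test is written out with a normal approximation and without a tie correction; in practice a statistics library would normally be used.

```python
import math

def replication_stress_score(cell_counts, signature):
    """ln(1 + summed counts of signature genes) for one cell; the +1 pseudocount is an assumption."""
    total = sum(cell_counts.get(gene, 0) for gene in signature)
    return math.log1p(total)

def _midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test, normal approximation without tie correction."""
    n1, n2 = len(x), len(y)
    ranks = _midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical per-cell counts for two cell populations.
group_a = [{"GENE_A": 8, "GENE_B": 5}, {"GENE_A": 6, "GENE_B": 7}, {"GENE_A": 9, "GENE_B": 4}]
group_b = [{"GENE_A": 2, "GENE_B": 1}, {"GENE_A": 3, "GENE_B": 0}, {"GENE_A": 1, "GENE_B": 2}]
signature = ["GENE_A", "GENE_B"]
u, p = mann_whitney_u([replication_stress_score(c, signature) for c in group_a],
                      [replication_stress_score(c, signature) for c in group_b])
```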

      In addition, we added details on the identification of tumor cells to the Methods section of the revised manuscript. Briefly, the adjacent normal breast sample served as a control to filter out the various types of normal cells from the tumor samples. The normal cells from the tumor sample were merged with the same types of normal cells from the adjacent normal breast samples, leaving cell clusters generated only by the tumor sample. These tumor-specific clusters were considered malignant cell populations. We further found that the malignant cell populations showed higher UMI counts than the normal cell populations, consistent with active metabolism in malignant cells. More importantly, the ER, PR, and HER2 expression of the malignant cells in each case exactly matched the clinical records. Finally, we used InferCNV to validate the malignant cell subset, as higher levels of copy number alterations (CNAs) were detected in the malignant cells than in normal cells.
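      The cluster-level filtering step described above can be sketched as follows: clusters are kept as candidate malignant populations only if their cells come from the tumor sample. The input format and the `min_tumor_fraction` parameter are hypothetical. With the default of 1.0, the sketch keeps only clusters made up exclusively of tumor-derived cells. The subsequent UMI, receptor-status, and InferCNV checks are not shown.

```python
from collections import Counter

def tumor_specific_clusters(cells, min_tumor_fraction=1.0):
    """Return cluster ids whose cells come (almost) only from the tumor sample.

    cells: iterable of (cluster_id, sample_origin) pairs, origin "tumor" or "normal".
    min_tumor_fraction: hypothetical threshold; 1.0 means exclusively tumor-derived.
    """
    total = Counter()
    tumor = Counter()
    for cluster, origin in cells:
        total[cluster] += 1
        if origin == "tumor":
            tumor[cluster] += 1
    return sorted(c for c in total if tumor[c] / total[c] >= min_tumor_fraction)

# Cluster 0 is mixed, cluster 1 is tumor-only, cluster 2 is normal-only.
cells = [(0, "tumor"), (0, "normal"), (1, "tumor"), (1, "tumor"), (2, "normal")]
candidates = tumor_specific_clusters(cells)
```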

      (5) The hypothesis states that the LP cells are uniquely sensitive to deficiency in BRCA1 compared to other cells. However, the authors use knockdown of BRCA1 in MCF10A cells which are generally considered to be basal cells and not LP cells.

      Thanks for the comment. We fully agree that MCF10A cells do not reflect LP features; they were mainly used as a normal mammary cell line model. We have tried to obtain human LP cells to perform some experiments, but all attempts failed because these cells are vulnerable and difficult to passage in vitro. The gap between MCF10A and LP cells is stressed in the Discussion.

      (6) Figure 2, the number of samples being compared is not listed for most of the panels. It appears that ELF3 is enriched in subsets of breast cancers, but much of the data is not focused on BRCA1-deficient tumors. Therefore, the data appears to show that ELF3 expression is more of a generalized feature of TNBCs (which has been reported previously) and dilutes the support for the hypothesis. Therefore, panels C-G raise concerns regarding the overall hypothesis that LP cells are the cell type that is affected.

      Thanks for the suggestion. We have added the number of samples in Figure 2 legends.

      Our analysis focuses on the basal subtype because of the well-known relationship between BRCA1 deficiency and this subtype. Our results demonstrate an association between ELF3 expression and the basal, TNBC, and HER2+ subtypes, consistent with previous reports. Since TNBC also has high replication stress levels (NPJ Breast Cancer. 2020 Sep 7;6:40.), ELF3 up-regulation in this subtype may not be solely due to BRCA1 deficiency, and we fully agree that this analysis may dilute the relationship between ELF3 and BRCA1. We have revised the Discussion to be more precise on this point. 

      (7) Figure 3 provides experimental support for the hypothesis. While panel A is of interest, the legend lacks any description beyond "normal mammary tissue" and that there are non-carriers and carriers of BRCA1 mutations. Is this from bulk RNAseq data or single-cell RNAseq data? How many carriers and how many noncarriers? Panel E is ENCODE data from MCF7 cells that are ER+ luminal subtype so it is unclear if this is relevant to the LP cells that are the focus of the hypothesis.

      Thanks for the comments. Figure 3A is based on single-cell RNA-seq data from 3 BRCA1 WT patients and 4 BRCA1 mutant patients. All cells (normal and tumor cells) were included, and ELF3 expression was normalized to the reads in each cell. We have added this information to the figure legend. 

      It has been difficult to obtain ENCODE data for LP cells. The effect of E2F1 on ELF3 regulation was validated experimentally in MCF10A cells and is consistent with the MCF7 ENCODE data. We therefore suggest that this effect may be conserved in mammary cells, but further confirmation in LP cells is needed. We have revised the manuscript to note this.

      (8) In Figure 4, the authors use BRCA1-deficient breast cancer cells to show the reliance on ELF3 and suggest that this is specific to this genetic lesion and not other subtypes. However, there is no data to show that this is not observed using ER+ cells or TNBC that are not BRCA1-deficient cell lines or models.

      Thanks for pointing this out. As ELF3 knockdown in MCF10A cells resulted in increased genomic instability (Supplement Fig S5) and a reduced capability to resolve replication stress (Figure 5B), we believe that ELF3 helps cells deal with replication stress not only in BRCA1-mutant cells but also in normal mammary cells and in multiple cell lines with distinct backgrounds, as suggested in Figure 4G, 4H and Supplement Fig S4G. The special link between ELF3 and BRCA1 is reflected by the significant up-regulation of ELF3 upon BRCA1 deficiency, rather than by the downstream functions of ELF3. 

      (9) Figure 5 provides the first direct evaluation of biomarkers of replication stress (gamma H2AX, 53BP1). DNA fiber assays provide the most direct evaluation of replication fork kinetics, and therefore, replication stress. The knockdown of BRCA1 and ELF3 appear to phenocopy one another in the HCC1937, but there is no other cell type to show whether this is specific for BRCA1-deficient cells. For example, the MCF7 cells show E2F1 binding to ELF3 (Figure 3E) and may show replication stress upon knockdown of ELF3. Without testing this, the authors cannot suggest that the effect is linked to BRCA1 status. The authors do not identify the BRCA1 mutation in these cells and whether there is homozygous loss. Similarly, the mutational status in the SUM149PT cells should also be stated. These need to be added to aid interpretation of the results.

      Thank you for the constructive advice. We have added information regarding the BRCA1 status of HCC1937 and SUM149PT cells. As discussed above, the results in Figure 4G and 4H suggest that ELF3 expression is associated with sensitivity to replication stress-inducing drugs across many cell lines. Thus, the ability of ELF3 to maintain the stability of DNA replication is not specific to BRCA1-deficient cells. The reliance on ELF3 in BRCA1 deficiency that we propose rests mainly on the fact that ELF3 is up-regulated under BRCA1-deficient conditions and may help cells tolerate replication stress during transformation. The resulting tumor cells (that is, BRCA1-deficient breast cancer cells) may therefore be more sensitive to the loss of ELF3 expression.

      (10) While the data in Figure 6 are valuable extensions of the gene signature derived from the MCF10A cells with BRCA1 knockdown, only 2 BRCA1 carriers are reported. As carriers bear heterozygous mutations in BRCA1, haplo-insufficiency would be necessary to generate the signature. The authors do note the publication by Pathania et al, but there are relatively few examples of haploinsufficiency. It should be noted that Sedic et al., 2015 also suggested haploinsufficiency in breast epithelial cell cultures from BRCA1 heterozygotes, which appears to cause premature senescence, possibly via replication stress. However, this was observed in the basal epithelial cells. Therefore, this appears to be a feature of the breast epithelium more generally and is not enriched in or limited to the LP cells.

      Thanks very much for your valuable suggestion. We have revised the Discussion to include this important work, and we fully agree that BRCA1 deficiency can cause replication stress that is not limited to LP cells. However, the point we would like to address in Figure 6 is that, in normal mammary cells, BRCA1 deficiency modulates the transcription profile towards that of LP-like cells rather than cells of other subtypes. We observed a surprisingly similar profile between BRCA1-deficient cells and LP cells, suggesting that BRCA1 may have an inherent function in mediating the transcription of LP genes. Furthermore, the data indicate that ELF3 has a tighter association with LP genes than other recognized LP-specific transcription factors such as ELF5 and EHF, which belong to the same family as ELF3. This result is intriguing since ELF3 can be up-regulated by BRCA1 deficiency and replication stress. We hypothesize that ELF3 could be a transcriptional node downstream of BRCA1 deficiency that modulates LP gene expression, and that this process might be limited to LP cells since ELF3 has the highest expression levels in LP cells. Nonetheless, this hypothesis also needs to be validated experimentally in LP cells. 

      We would like to express our deepest gratitude to the reviewers for their thorough and constructive feedback. Their insightful comments have been invaluable in guiding the revisions of our manuscript, helping us to clarify our hypotheses and strengthen the presentation of our findings. While we encountered some challenges, particularly with the isolation and culturing of LP cells, we made significant efforts to address the reviewers' concerns to the best of our ability. We have updated our manuscript accordingly, ensuring that all issues raised have been addressed comprehensively. We believe that these revisions have substantially improved the quality and clarity of our work, and we are excited to share our findings with the scientific community. Thank you once again for the opportunity to revise our manuscript, and we look forward to your feedback on the updated version.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This paper by Poverlein et al reports the substantial membrane deformation around the oxidative phosphorylation super complex, proposing that this deformation is a key part of super complex formation. I found the paper interesting and well-written but identified a number of technical issues that I suggest should be addressed:

      We thank Reviewer 1 for finding our work interesting. We have addressed the technical issues below.

      (1) Neither the acyl chain chemical makeup nor the protonation state of CDL are specified. The acyl chain is likely 18:2/18:2/18:2/18:2, but the choice of the protonation state is not straightforward.

      We thank the Reviewer for highlighting this missing information. We have now added this information in the Materials and Methods section:

      "…were performed in a POPC:POPE:cardiolipin (2:2:1) membrane containing 5 mol% QH<sub>2</sub> / Q (1:1 ratio). Cardiolipin was modeled as tetraoleoyl cardiolipin (18:1/18:1/18:1/18:1) with a headgroup modeled in a singly protonated state (with Q<sub>tot</sub>=-1)."

      (2) The analysis of the bilayer deformation lacks membrane mechanical expertise. Here I am not ridiculing the authors - the presentation is very conservative: they find a deformed bilayer, do not say what the energy is, but rather try a range of energies in their Monte Carlo model - a good strategy for a group that focuses on protein simulations. The bending modulus and area compressibility modulus are part of the standard model for quantifying the energy of a deformed membrane. I suppose in theory these might be computed by looking at the per-lipid distribution in thickness fluctuations, but this route is extremely perilous on a per-molecule basis. Instead, the fluctuation in the projected area of a lipid patch is used to imply the modulus [see Venable et al "Mechanical properties of lipid bilayers from molecular dynamics simulation" 2015 and citations within]. Variations in the local thickness of the membrane imply local variations of the leaflet normal vector (the vector perpendicular to the leaflet surface), which is curvature. With curvature and thickness, the deformation energy is analyzed.

      See:

      Two papers: "Gramicidin A Channel Formation Induces Local Lipid Redistribution" by Olaf Andersen and colleagues. Here the formation of a short peptide dimer is experimentally linked to hydrophobic mismatch. The presence of a short lipid reduces the influence of the mismatch. See below regarding their model cardiolipin, which they claim is shorter than the surrounding lipid matrix.

      Also, see:

      Faraldo-Gomez lab "Membrane transporter dimerization driven by differential lipid solvation energetics of dissociated and associated states", 2021. Mondal et al "Membrane Driven Spatial Organization of GPCRs" 2013 and many citations within these papers.

      While I strongly recommend putting the membrane deformation into standard model terms, I believe the authors should retain the basic conservative approach that the membrane is strongly deformed around the proteins and that making the SC reduces the deformation, then exploring the consequences with their discrete model.

      We thank the Reviewer for the suggestions and for pointing out the additional references, which are now cited in the revised manuscript. The analysis is indeed significantly more complex for large multi-million atom supercomplexes in comparison to small peptides (gramicidin A) or model systems of lipid membranes. However, in the revised manuscript, we have conducted further analysis on the membrane curvature effects based on the suggestions. We were able to estimate the energetic contribution of the changes in local membrane thickness and curvature, which are now summarized in Table 1, and described in the main text and SI. We find that both the curvature and local thickness contribute to the increased stability of SC.

      We have now extensively modified the Results section to properly differentiate between the different components of membrane strain:

      "We observe a local decrease in the membrane thickness at the protein-lipid interface (Fig. 2G, Fig S2A,D,E), likely arising from the thinner hydrophobic belt region of the OXPHOS proteins (ca. 30 Å, Fig. S1A) relative to the lipid membrane (40.5 Å, Fig. S1). We further observe ∼30% accumulation of cardiolipin at the thinner hydrophobic belt regions (Fig. 2H, Fig. S2B,F,G), with an inhomogeneous distribution around the OXPHOS complexes. While specific interactions between CDL and protein residues may contribute to this enrichment (Fig. 2N), CDL prefers thermodynamically thinner membranes (∼38 Å, Fig. S1B, Fig. S5F). These changes are further reflected in the reduced end-to-end distance of lipid chains in the local membrane belt (see Methods, Fig. S6; cf. also Refs. 41-44). In addition to the perturbations in the local membrane thickness, the OXPHOS proteins also induce a subtle inward curvature towards the protein-lipid interface (Fig. S5G), which could modulate the accessibility of the Q/QH<sub>2</sub> substrate to the active sites of CI and CIII<sub>2</sub> (see below, section Discussion). This curvature is accompanied by a distortion of the local membrane plane itself (Fig. 2A-F, Fig. S4A-C, Fig. S7), with perpendicular leaflet displacements reaching up to ~2 nm relative to the average leaflet plane.

      To quantify the membrane strain effects, we analyzed the cgMD trajectories by projecting the membrane surface onto a 2-dimensional grid and calculating the local membrane height and thickness at each grid point. From these values, we quantified the local membrane curvature (Fig. S5H) and estimated the energetic cost of deforming the membrane from a flat geometry (ΔG<sub>curv</sub>). We also computed the energetics associated with changes in the membrane thickness, assessed from the deviations from an ideal local membrane in the absence of embedded proteins (ΔG<sub>thick</sub>, see Supporting Information for technical details). Our analysis suggests that both contributions are substantially reduced upon formation of the SC, with the curvature penalty decreasing by 19.8 ± 1.3 kcal mol-1 and the thickness penalty by 2.8 ± 2.0 kcal mol-1 (Table 1). These results indicate a significant thermodynamic advantage for SC formation, as it minimizes lipid deformation and stabilizes the membrane environment surrounding Complexes I and III.”
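      The projection step can be sketched as follows for a single frame: phosphate (P-atom) positions are binned on a 2D grid, the midplane height Z is taken as the mean of the two leaflet heights, and the thickness d as their difference. This is an illustrative sketch, not the authors' code. The leaflet labels "upper"/"lower" and the input format are assumptions, and averaging over frames and occupancy weighting are omitted. The 0.5 nm default spacing follows the grid resolution stated in the Supporting Information excerpt.

```python
import math
from collections import defaultdict

def project_to_grid(p_atoms, spacing=0.5):
    """Bin P-atom positions on a 2D grid; return per-cell midplane height and thickness.

    p_atoms: iterable of (x, y, z, leaflet) in nm, leaflet "upper" or "lower" (assumed labels).
    Returns {(i, j): (Z, d)} only for cells containing atoms from both leaflets.
    """
    sums = defaultdict(lambda: {"upper": [0.0, 0], "lower": [0.0, 0]})
    for x, y, z, leaflet in p_atoms:
        cell = (math.floor(x / spacing), math.floor(y / spacing))
        acc = sums[cell][leaflet]
        acc[0] += z  # running sum of heights
        acc[1] += 1  # number of atoms in this cell and leaflet
    grid = {}
    for cell, acc in sums.items():
        (zu, nu), (zl, nl) = acc["upper"], acc["lower"]
        if nu and nl:
            upper, lower = zu / nu, zl / nl
            grid[cell] = ((upper + lower) / 2, upper - lower)  # (midplane Z, thickness d)
    return grid

atoms = [(0.1, 0.1, 2.0, "upper"), (0.2, 0.2, -2.0, "lower"), (0.7, 0.1, 2.2, "upper")]
grid = project_to_grid(atoms)  # the cell at (1, 0) has only one leaflet and is dropped
```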

      […]

      “Taken together, the analysis suggests that the OXPHOS complexes affect the mechanical properties of the membranes by inducing a small inwards curvature towards the protein-lipid interface (Fig. S5), resulting in a membrane deformation effect, while the SC formation releases some deformation energy relative to the isolated OXPHOS complexes. The localization of specific lipids around the membrane proteins, as well as local membrane perturbation effects, is also supported by simulations of other membrane proteins (45, 46), suggesting that the effects could arise from general protein-membrane interactions.”

      Our Supporting Information section now provides additional information about the membrane curvature.

      (41) R. M. Venable, F. L. H. Brown, R. W. Pastor, Mechanical properties of lipid bilayers from molecular dynamics simulation. Chemistry and Physics of Lipids 192, 60-74 (2015).

      (42) R. Chadda et al., Membrane transporter dimerization driven by differential lipid solvation energetics of dissociated and associated states. eLife 10, e63288 (2021).

      (43) S. Mondal et al., Membrane Driven Spatial Organization of GPCRs. Scientific Reports 3, 2909 (2013).

      (44) J. A. Lundbæk, S. A. Collingwood, H. I. Ingólfsson, R. Kapoor, O. S. Andersen, Lipid bilayer regulation of membrane protein function: gramicidin channels as molecular force probes. Journal of The Royal Society Interface 7, 373-395 (2009).

      We also expanded our SI Method section to account for the new calculations:

      “Analysis of lipid chain end-to-end length

      To probe the protein-induced deformation of the membrane, the membrane curvature (H) and the end-to-end distance of the lipid chains were computed from the aMD and cgMD simulations. The lipid chain length was computed from simulations A1-A6 and C1 based on the first and last carbon atoms of each lipid chain. For example, the end-to-end length of a cardiolipin chain was determined as the distance between atoms “CA1” and “CA18”.
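      The chain-length measure described above reduces to a distance between two named atoms. A minimal sketch, assuming per-lipid coordinates are available as a mapping from atom name to (x, y, z); the default atom names follow the cardiolipin example in the text, and other lipid types would need their own first and last carbon names.

```python
import math
from statistics import fmean

def end_to_end(coords, first="CA1", last="CA18"):
    """Distance between the first and last carbon atoms of one lipid chain."""
    return math.dist(coords[first], coords[last])

def mean_end_to_end(chains, first="CA1", last="CA18"):
    """Average end-to-end length over a collection of chains (e.g. one frame)."""
    return fmean(end_to_end(c, first, last) for c in chains)

chains = [{"CA1": (0, 0, 0), "CA18": (3, 4, 0)}, {"CA1": (0, 0, 0), "CA18": (0, 0, 7)}]
avg_length = mean_end_to_end(chains)
```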

      “Membrane Curvature and Deformation Energy

      The local mean curvature of the membrane midplane was computed by approximating the membrane surface as a height function Z(x,y), defined as the average location of the N-side and P-side leaflets at each grid point. Based on this, the mean curvature H(x,y) was calculated as,

      where the derivatives are defined as .

      The thickness deformation energy was computed from the local thickness d(x,y) relative to a reference thickness distribution F(d), derived from membrane-only simulations, and converted to a free energy profile via Boltzmann inversion. At each grid point, the F(d) was summed over the grid,

      The bending deformation energy was computed from the mean curvature field H(x,y), assuming a constant bilayer bending modulus κ (taken as 20 kJ mol-1 = 4.78 kcal mol-1):

      where ΔA is the area of a grid cell.
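      Because the curvature and bending-energy equations themselves are not reproduced above, the sketch below uses one standard choice. It should be read as an assumption rather than the authors' exact expressions: the Monge-gauge mean curvature of the height field Z(x, y), evaluated with central finite differences on interior grid points, and a Helfrich-type bending energy with zero spontaneous curvature, G = Σ (κ/2)(2H)² ΔA. Other conventions (for example, the small-gradient approximation H ≈ ½∇²Z, or a prefactor without the factor of 2 in 2H) change the result.

```python
def mean_curvature(Z, dx):
    """Monge-gauge mean curvature H at interior points of a height field.

    Z[i][j] is the midplane height at x = i*dx, y = j*dx (same length units as dx).
    Returns {(i, j): H}; boundary points are skipped (no central differences there).
    """
    nx, ny = len(Z), len(Z[0])
    H = {}
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            zx = (Z[i + 1][j] - Z[i - 1][j]) / (2 * dx)
            zy = (Z[i][j + 1] - Z[i][j - 1]) / (2 * dx)
            zxx = (Z[i + 1][j] - 2 * Z[i][j] + Z[i - 1][j]) / dx**2
            zyy = (Z[i][j + 1] - 2 * Z[i][j] + Z[i][j - 1]) / dx**2
            zxy = (Z[i + 1][j + 1] - Z[i + 1][j - 1]
                   - Z[i - 1][j + 1] + Z[i - 1][j - 1]) / (4 * dx**2)
            num = (1 + zy**2) * zxx - 2 * zx * zy * zxy + (1 + zx**2) * zyy
            H[(i, j)] = num / (2 * (1 + zx**2 + zy**2) ** 1.5)
    return H

def bending_energy(H, kappa, cell_area):
    """Helfrich bending energy, zero spontaneous curvature: sum of (kappa/2)(2H)^2 dA."""
    return sum(0.5 * kappa * (2 * h) ** 2 * cell_area for h in H.values())

# Usage: a paraboloid Z = a (x^2 + y^2) / 2 has H = a at its apex.
a, dx = 0.1, 0.5
Z = [[a * (((i - 2) * dx) ** 2 + ((j - 2) * dx) ** 2) / 2 for j in range(5)] for i in range(5)]
H = mean_curvature(Z, dx)
energy = bending_energy(H, kappa=4.78, cell_area=dx**2)  # kappa in kcal/mol, as in the text
```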

      The thickness and curvature fields were obtained by projecting the coarse-grained MD trajectories (one frame per ns) onto a 2D grid with a resolution of 0.5 nm. Grid points with low occupancy were down-weighted to mitigate noise. More specifically, points with counts below 50% of the median grid count were scaled linearly by their relative count value. To focus the analysis on the region around the protein–membrane interface, only grid points within a radius of 20 nm from the center of the complex were included in the energy calculations. Energies were normalized to an effective membrane area of 1000 nm<sup>2</sup> to facilitate the comparison between systems. Bootstrapping with resampling over frames was performed to estimate the standard deviations of G<sub>thick</sub> and G<sub>curv</sub>.
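      The thickness term and the grid treatment can be sketched in the same spirit. Several details are assumptions rather than statements from the text: the reference distribution is histogrammed with a hypothetical bin width, F(d) = −kT ln P(d) is shifted so that its minimum is zero, cells whose thickness never occurs in the reference are skipped, the "relative count value" used for down-weighting is read as count divided by the median count, and kT ≈ 0.596 kcal/mol corresponds to roughly 300 K. The 50%-of-median threshold, the 20 nm radius, and the 1000 nm² normalization follow the text.

```python
import math
from collections import Counter

KT = 0.596  # kcal/mol at ~300 K (assumed temperature)

def boltzmann_inversion(ref_thicknesses, bin_width=0.1, kT=KT):
    """F(d) = -kT ln P(d) per thickness bin, from a protein-free reference membrane."""
    counts = Counter(math.floor(d / bin_width) for d in ref_thicknesses)
    n = sum(counts.values())
    F = {b: -kT * math.log(c / n) for b, c in counts.items()}
    f_min = min(F.values())
    return {b: f - f_min for b, f in F.items()}  # zero at the most probable thickness

def occupancy_weight(count, median_count):
    """Linear down-weighting of cells sampled less than 50% of the median count."""
    ratio = count / median_count
    return ratio if ratio < 0.5 else 1.0

def thickness_energy(grid, F, median_count, bin_width=0.1, cell_area=0.25,
                     r_max=20.0, center=(0.0, 0.0), norm_area=1000.0):
    """Weighted sum of F(d) over cells within r_max (nm), scaled to norm_area (nm^2).

    grid: {(x, y): (d, count)} with cell-centre coordinates and mean thickness in nm.
    bin_width must match the one used to build F.
    """
    total, area = 0.0, 0.0
    for (x, y), (d, count) in grid.items():
        if math.hypot(x - center[0], y - center[1]) > r_max:
            continue
        b = math.floor(d / bin_width)
        if b not in F:
            continue  # thickness not sampled in the reference; skipped (assumption)
        total += occupancy_weight(count, median_count) * F[b]
        area += cell_area
    return total * norm_area / area if area else 0.0

F = boltzmann_inversion([4.0, 4.0, 4.0, 3.0], bin_width=1.0)
grid = {(0.0, 0.0): (3.5, 100), (30.0, 0.0): (3.5, 100)}  # second cell lies outside 20 nm
g_thick = thickness_energy(grid, F, median_count=100, bin_width=1.0)
```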

      We find that G<sub>curv</sub> converges slowly due to its sensitivity to local derivatives and the small grid size required to resolve the curvature contribution near the protein. Consequently, tens of microseconds of simulation were necessary to obtain well-converged estimates of the curvature energy.”
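      The bootstrap estimate of the standard deviation mentioned above can be sketched generically: frames are resampled with replacement, and the energy estimator is re-evaluated on each resample. Here `estimator` is a placeholder for any function that maps a list of frames to an energy (for example, averaging per-frame grids and then applying the thickness or curvature sum). The number of resamples and the seed are arbitrary choices. Because consecutive MD frames are correlated, resampling individual frames tends to underestimate the spread; a block bootstrap over contiguous time windows is a common, more conservative alternative and is included for comparison.

```python
import random
import statistics

def bootstrap_sd(frames, estimator=statistics.fmean, n_boot=1000, seed=0):
    """Standard deviation of estimator(frames) under resampling of frames with replacement."""
    rng = random.Random(seed)
    n = len(frames)
    estimates = [estimator(rng.choices(frames, k=n)) for _ in range(n_boot)]
    return statistics.stdev(estimates)

def block_bootstrap_sd(frames, block, estimator=statistics.fmean, n_boot=1000, seed=0):
    """Same idea, but resampling contiguous blocks of `block` frames to respect time correlation."""
    rng = random.Random(seed)
    blocks = [frames[i:i + block] for i in range(0, len(frames) - block + 1, block)]
    estimates = []
    for _ in range(n_boot):
        sample = [f for b in rng.choices(blocks, k=len(blocks)) for f in b]
        estimates.append(estimator(sample))
    return statistics.stdev(estimates)

# Hypothetical per-frame energies (kcal/mol) with a slow drift.
per_frame = [float(i % 25) for i in range(100)]
sd_frames = bootstrap_sd(per_frame)
sd_blocks = block_bootstrap_sd(per_frame, block=10)
```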

      (1) If CDL matches the hydrophobic thickness of the protein it would disrupt SC formation, not favor it. The authors' hypothesis is that the SC stabilizes the deformed membrane around the separated elements. Lipids that are compatible with the monomer deformed region stabilize the monomer, similarly to a surfactant. That is, if CDL prefers the interface because the interface is thin and their CDL is thin, CDL should prevent SC formation. A simpler hypothesis is that CDL's unique electrostatics are part of the glue.

      We rephrased the corresponding paragraph in the Discussion section to reflect the role of electrostatics for the behavior of cardiolipin.

      "…supporting the involvement of CDL as a "SC glue". In this regard, electrostatic effects arising from the negatively charged cardiolipin headgroup could play an important role in the interaction of the OXPHOS complexes."

      Generally, our simulations suggest that CDL prefers thinner membranes, which could rationalize these findings.

      "We find that CDL prefers thinner membranes relative to the neutral phospholipids (PE/PC, Fig. S5F),[…]”

      (2) Error bars for lipid and Q* enrichments should be computed averaging over multi-lipid regions of the protein interface, e.g., dividing the protein-lipid interface into six to ten domains, in particular functionally relevant regions. Anionic lipids may have long, >500 ns residence times, which makes lipid enrichment large and characterization of error bars challenging in short simulations. Smaller regions will be noisy. The plots depicted in, for example, Figure S2 are noisy.

      It is indeed challenging to capture lipid movements on the timescales accessible to atomistic MD, and hence the data in Figure S2 contain some noise. In this regard, for the cgMD data presented in the revised Fig. S2H,I, the concentration data were averaged over six domains of the protein-lipid interface.
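      The domain-averaging approach can be sketched as follows: per-site enrichment values ordered along the protein-lipid interface are split into contiguous domains (six by default, matching the number of domains mentioned above), and the spread between domain means gives a standard error. Splitting the interface into equal contiguous segments and treating domains as independent replicates are simplifying assumptions of this sketch; functionally defined regions could be used instead.

```python
import statistics

def domain_enrichment(site_values, n_domains=6):
    """Return per-domain mean enrichment, the mean over domains, and its standard error."""
    n = len(site_values)
    if n < n_domains or n_domains < 2:
        raise ValueError("need at least two domains and one site per domain")
    # Integer floor division guarantees every domain gets at least one site.
    bounds = [k * n // n_domains for k in range(n_domains + 1)]
    means = [statistics.fmean(site_values[bounds[k]:bounds[k + 1]]) for k in range(n_domains)]
    sem = statistics.stdev(means) / n_domains ** 0.5
    return means, statistics.fmean(means), sem

# Hypothetical enrichment values for six interface sites grouped into three domains.
means, overall, sem = domain_enrichment([1, 1, 2, 2, 3, 3], n_domains=3)
```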

      (3) The membrane deformation is repeatedly referred to as "entropic" without justification. The bilayer has significant entropic and enthalpic terms just like any biomolecule, why are the authors singling out entropy? The standard "Helfrich" energetic Hamiltonian is a free energy model in that it implicitly integrates over many lipid degrees of freedom.

      We apologize for the unclear message. Our intention was not to claim that the effects are purely entropic, but rather that they could arise from a combination of entropic and enthalpic effects. We hope that this is now better clarified in the revised manuscript. We also agree that it is difficult to separate entropic and enthalpic effects. However, we wish to point out that, e.g., the temperature dependence of SC formation suggests that entropic contributions also affect the process.

      Regarding the Helfrich Hamiltonian, we note that the standard model assumes a homogeneous fluid-like sheet. We therefore find it difficult to use this model to capture the local effects.

      Revisions / clarifications in the main manuscript:

      "SC formation is affected by both enthalpic and entropic effects."

      "We have shown here that the respiratory chain complexes perturb the IMM by affecting the local membrane dynamics. The perturbed thickness and alteration in the lipid dynamics lead to an energetic penalty, which can be related to molecular strain effects, as suggested by the changes in both the internal energy of the lipids and their interactions with the surroundings (Fig. S2, S5, S6), which are likely to be of enthalpic origin. However, lipid binding to the OXPHOS complex also results in a reduction in the translational and rotational motion of the lipids and quinone (Fig. S8-S9), which could result in entropic changes. The strain effects are therefore likely to arise from a combination of enthalpic and entropic effects."

      (4) Figure S7 shows the surface area per lipid and leaflet height. This appears to show a result that is central to the interpretation of SC formation but which makes very little sense. One simply does not increase both the height and area of a lipid. This is a change in the lipid volume! The bulk compressibility of most anything is much higher than its Young's modulus [similar to area compressibility]. Instead, something else has happened. My guess is that there is *bilayer* curvature around these proteins and that it has been misinterpreted as area/thickness changes with opposite signs of the two leaflets. If a leaflet gets thin, its area expands. If the manuscript had more details regarding how they computed thickness I could help more. Perhaps they measured the height of a specific atom of the lipid above the average mid-plane normal? The mid-plane of a highly curved membrane would deflect from zero locally and could be misinterpreted as a thickness change.

      We thank the Reviewer for this insightful comment. We chose to define the membrane thickness based on the height of the lipid P-atoms above the average midplane normal. The Reviewer is correct that this measurement gives a changing thickness for a highly curved membrane. However, in this scenario, the thickness would always be overestimated [d<sub>true</sub> = d<sub>measured</sub> · cos(angle between global mid-plane normal and local mid-plane normal)]. Therefore, since we observe a smaller thickness at the protein-lipid interface, the effect is not likely to result from an artifact. For further clarification, see Fig. S4I, which shows the averaged local position of the P-atoms in the cgMD simulations and further supports a local deformation of the lipids.
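      The geometric argument can be checked with two parallel leaflet planes separated by d along their own normal: if that normal is tilted by θ from the global z-axis, their separation measured along z is d / cos θ, which is never smaller than d. A minimal sketch (function names are illustrative):

```python
import math

def z_projected_thickness(d_true, tilt_deg):
    """Separation along the global z-axis of two parallel planes that are d_true apart
    along their own normal, when that normal is tilted by tilt_deg from z."""
    return d_true / math.cos(math.radians(tilt_deg))

def true_thickness(d_measured, tilt_deg):
    """Invert the projection: the thickness along the local normal."""
    return d_measured * math.cos(math.radians(tilt_deg))

# A 4 nm bilayer tilted by 30 degrees appears thicker when measured along z.
apparent = z_projected_thickness(4.0, 30)
```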

      The changes in the local membrane thickness are also supported by our analysis of the membrane thickness (Fig. S2A) and by the lipid chain length distributions (Fig. S6).
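      The geometric argument can be checked with a short sketch. It assumes two locally parallel leaflet surfaces tilted by a uniform angle relative to the global membrane normal; the 40 Å thickness is an illustrative value, not a simulation result. A z-based measurement returns d<sub>true</sub>/cos θ, so it can only overestimate the perpendicular thickness.

```python
import math


def z_based_thickness(d_true: float, tilt_deg: float) -> float:
    """Thickness measured along the global normal (z) for two parallel leaflet
    surfaces a perpendicular distance d_true apart, tilted by tilt_deg."""
    return d_true / math.cos(math.radians(tilt_deg))


def corrected_thickness(d_measured: float, tilt_deg: float) -> float:
    """Invert the relation above: d_true = d_measured * cos(tilt)."""
    return d_measured * math.cos(math.radians(tilt_deg))


# Illustrative 40 Å bilayer: the z-based value never drops below d_true
for tilt in (0, 10, 20, 30):
    print(f"tilt {tilt:2d} deg -> z-based thickness {z_based_thickness(40.0, tilt):.2f} Å")
```

      Because curvature can only inflate a z-based value, a locally reduced thickness cannot be produced by this artifact alone.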

      (5) The authors write expertly about how conformational changes are interpreted in terms of function but the language is repeatedly suggestive. Can they put their findings into a more quantitative form with statistical analysis? "The EDA thus suggests that the dynamics of CI and CIII2 are allosterically coupled."

      We extended our analysis on the allosteric effects, which is now described in the revised main text, the SI and the Methods section:

      "In this regard, our graph theoretical analysis (Fig. S11C,D) further indicates that ligand binding to Complex I induces a dynamic crosstalk between NDUFA5 and NDUFA10, consistent with previous work (50, 51), and also affects the motion of UQCRC2 relative to its surroundings. Taken together, these effects suggest that the dynamics of CI and CIII<sub>2</sub> show some correlation that could result in allosteric effects, as also indicated by cryo-EM analysis (40)."

      “Extended Methods

      Allosteric Network Analysis. Interactions between amino acid residues were modeled as an interaction graph, where each residue was represented by a vertex. Two nodes were connected by an edge if the Cα atoms of the corresponding amino acid residues were closer than 7.5 Å for more than 50% of the frames of simulations S1-S6 (time step of frames: 1 ns). (7) This analysis was carried out for the aMD simulations of the supercomplex, analyzing differences between the Q-bound and apo states (simulations A1+A2+A3 vs. A4+A5+A6).”
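      A minimal sketch of the contact-graph construction described above, assuming the Cα coordinates have already been extracted for every frame. The function names are hypothetical and this is not the authors' analysis code; comparing edge sets from the Q-bound and apo ensembles gives the contacts gained or lost on ligand binding.

```python
import itertools
import math


def contact_graph(frames, cutoff=7.5, min_fraction=0.5):
    """Residue contact graph from C-alpha coordinates.

    frames: list of frames; each frame is a list of (x, y, z) C-alpha positions
    in Å, one per residue, in the same residue order in every frame.
    Returns the set of edges (i, j), i < j, for residue pairs closer than
    `cutoff` in more than `min_fraction` of the frames.
    """
    n_res = len(frames[0])
    counts = {}
    for frame in frames:
        for i, j in itertools.combinations(range(n_res), 2):
            if math.dist(frame[i], frame[j]) < cutoff:
                counts[(i, j)] = counts.get((i, j), 0) + 1
    return {edge for edge, c in counts.items() if c / len(frames) > min_fraction}


def edge_changes(graph_bound, graph_apo):
    """Contacts present only in the bound ensemble, and only in the apo ensemble."""
    return graph_bound - graph_apo, graph_apo - graph_bound


# Toy three-residue example (coordinates in Å, hypothetical)
bound = [[(0, 0, 0), (5, 0, 0), (20, 0, 0)]] * 3
apo = [[(0, 0, 0), (9, 0, 0), (14, 0, 0)]] * 3
print(edge_changes(contact_graph(bound), contact_graph(apo)))
```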

      (6) The authors write "We find that an increase in the lipid tail length decreases the relative stability of the SC (Figure S5C)" This is a critical point but I could not interpret Figure S5C consistently with this sentence. Can the authors explain this?

      We apologize for this oversight. This sentence should refer to Fig. S5F, which has now been corrected. We have additionally updated the figure to provide an improved estimation of the thickness contribution based on the lipid tail length.

      "We find that an increase in the lipid tail length decreases the relative stability of the SC (Fig. S5F)"

      (7) The authors use a 6x6 and 15x15 lattice to analyze SC formation. The SC assembly has 6 units of E_strain favoring its assembly, which they take up to 4 kT. At 3 kT, the SC should be favored by 18 kT, or a Boltzmann factor of 10^8. With only 225 sites, specific and non-specific complex formation should be robust. Can the authors please check their numbers or provide a qualitative guide to the data that would make clear what I'm missing?

      In the revised manuscript, we have now clarified the definition of the lattice model and the respective energies in the Methods section.
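      The arithmetic behind the reviewer's estimate can be reproduced directly, assuming, as the reviewer does, six strain units released on SC assembly. At 3 kT per unit the factor is e<sup>18</sup> ≈ 6.6 × 10<sup>7</sup>, i.e. of order 10<sup>8</sup>.

```python
import math


def assembly_boltzmann_factor(n_strain_units: int, e_strain_kT: float) -> float:
    """Boltzmann factor exp(n * E_strain / kT) for releasing n strain units on assembly."""
    return math.exp(n_strain_units * e_strain_kT)


for e in (1.0, 2.0, 3.0, 4.0):
    print(f"E_strain = {e:.0f} kT: factor {assembly_boltzmann_factor(6, e):.1e}")
```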

      In summary, the qualitative data presented are interesting (especially the combination of molecular modeling with simpler Monte Carlo modeling aiding broader interpretation of the results) ... but confusing in terms of the non-standard presentation of membrane mechanics and the difficulty this reviewer had in interpreting some of the underlying figures: especially the thickness of the leaflets around the protein and the relative thickness of cardiolipin. Resolving the quantitative interpretation of the bilayer deformation would greatly enhance the significance of their Monte Carlo model of SC formation.

      We thank the Reviewer for the helpful suggestion. We hope that the revisions help to clarify the non-standard presentation and connect to concepts used in the lipid membrane community.

      Reviewer #2 (Public review):

      Summary:

      The authors have used large-scale atomistic and coarse-grained molecular dynamics simulations on the respiratory chain complex and investigated the effect of the complex on the inner mitochondrial membrane. They have also used a simple phenomenological model to establish that supercomplex (SC) assembly and stabilisation are driven by the interplay between the "entropic" forces due to strain energy and the enthalpic forces (specific and non-specific) between lipid and protein domains. The authors also show that the SC in the membrane leads to thinning and that there is preferential localisation of certain lipids (cardiolipin) in the annular region of the complex. The data report that SC assembly affects the conformational dynamics of the individual proteins making up the assembled complex and that they undergo "allosteric crosstalk" to maintain the stable functional complex. From their conformational analyses of the proteins (individually and within the complex), membrane "structural" properties (such as thinning and lateral organisation), as well as the output of their phenomenological lattice model, the authors propose possible functional implications and molecular origins for aspects such as charge currents in the inner mitochondrial membrane, proton transport activity, and ATP synthesis.

      Strengths:

      The work is bold in terms of undertaking modelling and simulation of such a large complex, which requires simulations of about a million atoms for long time scales. This requires technical acumen and resources. The effort to make connections to experimental readouts also has to be appreciated (though it is difficult to connect functional pathways with limited (additive force field) simulations).

      We thank the Reviewer for recognizing the challenge in simulating multimillion-atom membrane protein systems. We also thank the Reviewer for recognizing the connections we have made to different experiments. Our work indeed relies on atomistic and coarse-grained molecular simulations, which are widely recognized to provide accurate models of membrane proteins.

      Weakness:

      There are several weaknesses in the paper (please see the list below). Claims such as "entropic effect", "membrane strain energy" and "allosteric cross talks" are not properly supported by evidence and seem far-fetched at times. There are other weaknesses as well. Please see the list below.

      We thank the Reviewer for pointing out that key concepts needed further clarification. Please see answers to specific questions below:

      (i) Membrane "strain energy" has been loosely used and no effort is made to explain what the authors mean by the term and how they would quantify it. If the membrane is simulated in stress-free conditions, where are strains building up from?

      We thank the Reviewer for this important question. In the revised manuscript, we have toned down the assignment of the effects into pure entropic or enthalpic effects. We have also provided further clarification of the effects observed in the membrane.

      Example of revisions / clarifications in the main text:

      "SC formation is affected by both enthalpic and entropic effects."

      "We have shown here that the respiratory chain complexes perturb the IMM by affecting the local membrane dynamics. The perturbed thickness and alteration in the lipid dynamics lead to an energetic penalty, which can be related to molecular strain effects, as suggested by the changes in both the internal energy of the lipids and their interactions with the surroundings (Fig. S2, S5, S6), which are likely to be of enthalpic origin. However, lipid binding to the OXPHOS complex also results in a reduction in the translational and rotational motion of the lipids and quinone (Fig. S8-S9), which could result in entropic changes. The strain effects are therefore likely to arise from a combination of enthalpic and entropic effects."

      We have also revised the result section, where we now have explicitly defined and clarified the different contributions to membrane strain, observed in our simulations:

      In the following, we define membrane strain as the local perturbations of the lipid bilayer induced by protein-membrane interactions. These include changes in (i) membrane thickness, (ii) the local membrane composition, (iii) lipid chain configurations, and (iv) local curvature of the membrane plane relative to an undisturbed, protein-free bilayer. Together, these phenomena reflect the thermodynamic effects associated with accommodating large protein complexes within the membrane.

      We now also provide a more quantitative estimation of the membrane strain based on the contribution of changes in local thickness and curvature, summarized in Table 1.
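      One common way to estimate a curvature contribution of this kind is the Helfrich bending energy in the small-slope (Monge) limit, G<sub>curv</sub> = (κ/2) ∫ (∇²h)² dA. The sketch below assumes that form, zero spontaneous curvature, and a periodic grid of midplane heights h; it is not necessarily the functional or the κ value used for Table 1, and the undulation in the usage lines is hypothetical.

```python
import math


def bending_energy(h, spacing, kappa):
    """Helfrich bending energy of a periodic height field (small-slope limit).

    G_curv = (kappa / 2) * sum over grid cells of (laplacian h)^2 * spacing^2,
    using a five-point finite-difference Laplacian with periodic boundaries.
    h: 2D list of midplane heights (nm); spacing: grid spacing (nm);
    kappa: bending modulus (e.g. in kT, which then sets the energy unit).
    """
    ny, nx = len(h), len(h[0])
    total = 0.0
    for i in range(ny):
        for j in range(nx):
            lap = (h[(i + 1) % ny][j] + h[(i - 1) % ny][j]
                   + h[i][(j + 1) % nx] + h[i][(j - 1) % nx]
                   - 4.0 * h[i][j]) / spacing ** 2
            total += lap ** 2 * spacing ** 2
    return 0.5 * kappa * total


# Hypothetical undulation: amplitude 1 nm, wavelength 64 nm, kappa = 20 kT
nx, ny = 64, 4
h = [[math.sin(2 * math.pi * j / nx) for j in range(nx)] for _ in range(ny)]
print(f"G_curv ≈ {bending_energy(h, 1.0, 20.0):.3f} kT")
```

      For a single sinusoidal mode the result approaches the continuum value (κ/2)(2π/λ)<sup>4</sup>(A²/2)·Area as the grid is refined, which is a convenient sanity check.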

      (ii) In result #1 (Protein membrane interaction modulates the lipid dynamics ....), I strongly feel that the readouts from simulations are overinterpreted. Membrane lateral organization in terms of lipids having preferential localisation is not new (see doi: 10.1021/acscentsci.8b00143) nor membrane thinning and implications to function (https://doi.org/10.1091/mbc.E20-12-0794). The distortions that are visible could be due to a mismatch in the number of lipids that need to be there between the upper and lower leaflets after the protein complex is incorporated. Also, the physiological membrane will have several chemically different lipids that will minimise such distortions as well as would be asymmetric across the leaflets - none of which has been considered. Connecting chain length to strain energy is also not well supported - are the authors trying to correlate membrane order (Lo vs Ld) with strain energy?

      We thank the Reviewer for the suggestions. The role of the membrane in driving supercomplex formation has not, to our knowledge, been suggested before. There are certainly many important related studies, which are now better highlighted in the revised manuscript. In this context, we now also cite the papers by Srivastava and coworkers and by Tieleman and coworkers.

      “The localization of specific lipids around the membrane proteins, as well as local membrane perturbation effects, are also supported by simulations of other membrane proteins (45, 46), suggesting that the effects could arise from general protein-membrane interactions.”

      (45) V. Corradi et al., Lipid–Protein Interactions Are Unique Fingerprints for Membrane Proteins. ACS Central Science 4 (June 13, 2018).

      (46) K. Baratam, K. Jha, A. Srivastava, Flexible pivoting of dynamin pleckstrin homology domain catalyzes fission: insights into molecular degrees of freedom. Molecular Biology of the Cell 32 (2021 Jul 1).

      Physiological membrane will have several chemically different lipids that will minimise such distortions as well as would be asymmetric across the leaflets

      We agree with this point. As shown in Figs. 2H,N, S6, and S13, we suggest that cardiolipin functions as a buffer molecule. However, very little is known experimentally about the asymmetric distribution of lipids in the IMM. Therefore, modelling the effect of asymmetry across the leaflets is outside the scope of this study. Moreover, as now better clarified in the revised manuscript, we agree that it is difficult to unambiguously divide the effect into enthalpic and entropic contributions.

      To address the main concern of the Reviewer, we have updated the main text and Supporting Information to clearly state the different ways in which the protein-membrane interactions induce perturbations of the lipid bilayer. We define these effects as membrane strain. We now use the changes in local thickness and local curvature to quantify the effect of membrane strain on the stability of the respiratory SC.

      (iii) Entropic effect: What is the evidence for the entropic effect? If strain energy is entropic, the authors first need to establish that. They discuss enthalpy-entropy compensation, but there is no clear data or evidence to support that argument. The lipids will rearrange themselves or have a preference to be close to certain regions of the protein, and that generally arises for enthalpic reasons (see the body of work done by Carol Robinson with mass spectrometry, where certain lipids prefer proteins in the gas phase; certainly there is no entropy at play there). I find the claims of entropic effects very unconvincing.

      We agree that it is difficult to distinguish the entropic vs. enthalpic contributions. In the revised manuscript, we better clarify that both effects are likely to be involved.

      The native MS work by Robinson and coworkers and others supports that many lipids are strongly bound to membrane proteins, as also reflected in the local binding of certain lipid molecules, such as CDL, to the SC (Figs. S2, S6, S13).

      We suggest that the accumulation of cardiolipin at the protein-lipid interface involves a combination of entropic and enthalpic effects, arising from the reduction of the lipid mobility (entropy) as indicated by lowered diffusion (Fig. S9), and formation of noncovalent bonds between the lipid and the OXPHOS protein (Fig. S14).

      We added further clarification to the Discussion section.

      “Taken together, our combined findings suggest that the SC formation is affected by thermodynamic effects that reduce the molecular strain in the lipid membrane, whilst the perturbed micro-environment also affects the lipid and Q dynamics, as well as the dynamics of the OXPHOS proteins (see below).”

      (iv) The changes in conformations dynamics are subtle as reported by the authors and the allosteric arguments are made based on normal mode analyses. In the complex, there are large overlapping regions between the CI, CIII2, and SCI/III2. I am not sure how the allosteric crosstalk claim is established in this work - some more analyses and data would be useful. Normal mode analyses (EDA) suggest that the motions are coupled and correlated - I am not convinced that it suggests that there is allosteric cross-talk.

      Our analysis suggests that the SC changes the dynamics of the system. Although it is difficult to assign how these effects result in activity modulation of the system, we note that these changes relate to sites that are central for the charge transfer reactions. We thank the Reviewer for suggesting that we extend the analysis, which further suggests that regions of the proteins could be allosterically coupled.

      (v) The lattice model should be described better and the rationale for choosing the equation needs to be established. Specific interactions look unfavourable in the equation as compared to non-specific interactions.

      We have now provided further clarification of the lattice model in the Methods section. Addition to the main text:

      “Lattice model of SC formation. A lattice model of the CI and CIII<sub>2</sub> was constructed (Fig. 4A,B) by modeling the OXPHOS proteins in unique grid positions on a 2D N×N lattice. Depending on the relative orientation, the protein-protein interaction was described by specific interactions (giving rise to the energetic contribution E<sub>specific</sub> < 0) and non-specific interactions (E<sub>non-specific</sub> > 0). The membrane-protein interaction determined the strain energy of the membrane (E<sub>strain</sub>), based on the number of neighboring "lipid" occupied grids that are in contact with proteins (Fig. 4A). The interaction between the lipids was indirectly accounted for by the background energy of the model. The proteins could occupy four unique orientations on a grid ([North, East, South, West]). The states and their respective energies that the system can visit are summarized in Table S6.”

      “The conformational landscape was sampled by Monte Carlo (MC) using 10<sup>7</sup> MC iterations with 100 replicas. Temperature effects were modeled by varying β, and the effect of different protein-to-lipid ratios by increasing the grid area. The simulation details can be found in Table S7.”
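      To make the sampling procedure concrete, here is a minimal Metropolis sketch of a lattice model of this type, reduced to a single CI and a single CIII<sub>2</sub> on a periodic N×N grid. The rule for a "correct" orientation (each complex facing the other), the strain count (one unit of E<sub>strain</sub> per protein-lipid lattice contact), and all parameter values are hypothetical simplifications for illustration, not the states and energies of Tables S6 and S7; only the sign conventions (E<sub>specific</sub> < 0, E<sub>non-specific</sub> > 0) follow the text above.

```python
import math
import random

# Orientation index -> lattice direction a complex "faces": North, East, South, West
DIRS = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def neighbours(site, n):
    """Four neighbours of a site on a periodic n x n grid (n >= 3), in DIRS order."""
    i, j = site
    return [((i + di) % n, (j + dj) % n) for di, dj in DIRS]


def pair_term(state, n):
    """'specific', 'nonspecific' or None; hypothetical rule: specific only if both face each other."""
    (site_a, o_a), (site_b, o_b) = state
    if site_b not in neighbours(site_a, n):
        return None
    if neighbours(site_a, n)[o_a] == site_b and neighbours(site_b, n)[o_b] == site_a:
        return "specific"
    return "nonspecific"


def energy(state, n, e_specific, e_nonspecific, e_strain):
    """Strain per protein-lipid lattice contact plus the protein-protein pair term."""
    occupied = {site for site, _ in state}
    contacts = sum(nb not in occupied for site, _ in state for nb in neighbours(site, n))
    e = e_strain * contacts
    term = pair_term(state, n)
    if term == "specific":
        e += e_specific
    elif term == "nonspecific":
        e += e_nonspecific
    return e


def sc_fraction(n=6, beta=1.0, e_specific=-3.0, e_nonspecific=1.0, e_strain=1.0,
                steps=100_000, seed=0):
    """Metropolis MC; returns the fraction of steps spent in the specific SC state."""
    rng = random.Random(seed)
    state = [((0, 0), 0), ((n // 2, n // 2), 0)]          # CI and CIII2 start apart
    e_old = energy(state, n, e_specific, e_nonspecific, e_strain)
    n_sc = 0
    for _ in range(steps):
        k = rng.randrange(2)                               # pick CI or CIII2
        site, o = state[k]
        if rng.random() < 0.5:
            new = (rng.choice(neighbours(site, n)), o)     # translate by one site
        else:
            new = (site, rng.randrange(4))                 # re-orient
        if new[0] != state[1 - k][0]:                      # sites cannot overlap
            trial = list(state)
            trial[k] = new
            e_new = energy(trial, n, e_specific, e_nonspecific, e_strain)
            if e_new <= e_old or rng.random() < math.exp(-beta * (e_new - e_old)):
                state, e_old = trial, e_new
        n_sc += pair_term(state, n) == "specific"
    return n_sc / steps


for beta in (0.5, 1.0, 2.0):
    print(f"beta = {beta}: SC fraction {sc_fraction(beta=beta, steps=20_000):.3f}")
```

      Translations and re-orientations are proposed symmetrically, so the plain Metropolis criterion satisfies detailed balance; lowering β (raising temperature) should reduce the time spent in the specific SC arrangement.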

      Reviewer #3 (Public review):

      Summary:

      In this contribution, the authors report atomistic, coarse-grained, and lattice simulations to analyze the mechanism of supercomplex (SC) formation in mitochondria. The results highlight the importance of membrane deformation as one of the major driving forces for SC formation, which is not entirely surprising given prior work on membrane protein assembly, but certainly of major mechanistic significance for the specific systems of interest.

      Strengths:

      The combination of complementary approaches, including an interesting (re)analysis of cryo-EM data, is particularly powerful and might be applicable to the analysis of related systems. The calculations also revealed that SC formation has interesting impacts on the structural and dynamical (motional correlation) properties of the individual protein components, suggesting further functional relevance of SC formation. Overall, the study is rather thorough and highly creative, and the impact on the field is expected to be significant.

      Weaknesses:

      In general, I don't think the work contains any obvious weaknesses, although I was left with some questions.

      We thank the Reviewer for acknowledging that our work is thorough and creative, and that it is likely to have a significant impact on the field.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Diffusion is quantified in speed units (Figure S8). The authors should explain why they have used an apparently incorrect model for quantifying diffusion. The variance of the distribution of a diffusing molecule is linear with time, not its standard deviation (as I suppose I would use for computing effective molecular speed). Perhaps they are quantifying residence times, in which molecules near a wall (protein) will appear to have half the movements of a bulk molecule. This is confusing.

      We thank the Reviewer for the comment. The data shown in the previous version of Figure S8 corresponded to the effective molecular velocity, which is now clarified in the revised figure (now Fig. S9). This measure was used to reflect the average residence time of the groups in the vicinity of the sites.

      However, as suggested by the Reviewer, we have now also analyzed the position-dependent diffusion of the quinone in the new Figure S9.
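      The Reviewer's point can be illustrated with a short sketch on a synthetic 2D random walk (not simulation data): the mean-squared displacement (MSD) grows linearly with lag time, so D follows from its slope (MSD = 4Dt in two dimensions), whereas a displacement-per-time "speed" depends on the lag at which it is evaluated. Function names are hypothetical.

```python
import math
import random


def make_walk(n_steps, d_true, dt, seed=0):
    """Synthetic 2D Brownian trajectory: each coordinate step ~ N(0, 2 D dt)."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * d_true * dt)
    x = y = 0.0
    traj = [(x, y)]
    for _ in range(n_steps):
        x += rng.gauss(0.0, sigma)
        y += rng.gauss(0.0, sigma)
        traj.append((x, y))
    return traj


def msd(traj, lag):
    """Time-averaged mean-squared lateral displacement at a lag (in frames)."""
    sq = [(traj[t + lag][0] - traj[t][0]) ** 2 + (traj[t + lag][1] - traj[t][1]) ** 2
          for t in range(len(traj) - lag)]
    return sum(sq) / len(sq)


def diffusion_coefficient_2d(traj, dt, lags):
    """Least-squares slope of MSD vs time through the origin; in 2D, MSD = 4 D t."""
    ts = [lag * dt for lag in lags]
    ms = [msd(traj, lag) for lag in lags]
    slope = sum(t * m for t, m in zip(ts, ms)) / sum(t * t for t in ts)
    return slope / 4.0


def apparent_speed(traj, dt, lag):
    """RMS displacement / elapsed time: scales as lag**-0.5, so it is not a rate constant."""
    return math.sqrt(msd(traj, lag)) / (lag * dt)


traj = make_walk(20_000, d_true=1.0, dt=1.0, seed=1)
print("D estimate:", diffusion_coefficient_2d(traj, 1.0, range(1, 11)))
print("speed at lag 1 vs 16:", apparent_speed(traj, 1.0, 1), apparent_speed(traj, 1.0, 16))
```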

      (2) With a highly charged bilayer, a large water layer is necessary to verify that the salt concentration plateaus at 150 mM at the box edge. 45 Å appears to be the default in CHARMM-GUI, but this default guidance is not based on the charge of the bilayer. I suggest the authors plot the average concentration of both anions and cations in mM units along the z coordinate of the simulation cell.

      We thank the Reviewer for the suggestion. We have now provided an analysis of the average ion concentrations along the z coordinate, supporting that the salt concentration plateaus at 150 mM at the box edge.
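      A sketch of this kind of analysis, assuming ion z-coordinates have already been extracted from the trajectory (ideally recentred on the bilayer midplane); it converts slab counts to millimolar concentrations. Function and argument names are hypothetical.

```python
AVOGADRO = 6.02214076e23   # 1/mol
A3_PER_LITRE = 1e27        # 1 L = 1e27 Å^3


def concentration_profile_mM(z_positions, n_frames, box_area_A2, z_min, z_max, n_bins):
    """Average concentration (mM) of one ion species in slabs along z.

    z_positions: z coordinates (Å) of all ions of that species, pooled over
    n_frames frames. box_area_A2: lateral box area (Å^2); under NPT, use the
    trajectory average. Returns (slab centres in Å, concentrations in mM).
    """
    width = (z_max - z_min) / n_bins
    counts = [0] * n_bins
    for z in z_positions:
        k = int((z - z_min) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    slab_litres = box_area_A2 * width / A3_PER_LITRE
    centres = [z_min + (k + 0.5) * width for k in range(n_bins)]
    conc = [c / n_frames / AVOGADRO / slab_litres * 1e3 for c in counts]
    return centres, conc
```

      Applied separately to cations and anions, both profiles should flatten near the nominal 150 mM in the slabs closest to the box edge if the water layer is thick enough.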

      Typos:

      SI: "POPC/POPE or CLD" should be CDL

      We apologize for the mistake. We have corrected the typos:

      "of the membrane thickness in a POPC/POPE/CDL/QH2 membrane and a CDL membrane."

      "a pure CDL membrane"

      Reviewer #2 (Recommendations for the authors):

      (1) Suggestion regarding membrane strain energy claims:

      Changes in area per lipid and membrane thinning are surely not akin to membrane strain energy changes. At best, the authors should calculate the area compressibility (both in bilayers with and without proteins) and then make comments. In general, if they are talking about the in-plane properties (the bilayer being a 2D liquid), I do not see how they can discuss membrane strain energy when simulating in the NPT ensemble against a 1 atm barostat reservoir. At least they can try to plot the membrane lateral pressures in various conditions and then start making such comments. If it were a closed vesicle, I would expect some tension in the membrane due to the closed surface, but in the conditions in which the simulations are run, I do not see how strain is so important. If the authors want to be more rigorous, they can calculate "atomic virial" values by doing a tessellation and showing the data to make their point. Strain energy would mean that there is an in-plane modulus. The bending modulus would surely change with membrane thinning and area compressibility changes (simple plate theory), but linear strain must be defined well before making claims from it.

      Our work shows that the OXPHOS proteins alter the local membrane thickness and curvature, and we now quantify the associated deformation penalty (Table 1). As stated above, we now provide a better definition and description of 'membrane strain' and the observed effect, which is likely to contain both enthalpic and entropic contributions.

      As suggested by the Reviewer, we have computed the lateral pressure profiles around the OXPHOS proteins, further supporting that there are energetic effects related to the "solvation" of the membrane proteins in the IMM. To this end, Figs. S2D,E, S4I, and S5G,H show the membrane distortion effect, while Fig. S5A supports that the 'internal energy' of the lipids changes as a result of SC formation, further justifying the assignment of these effects as 'strain effects'. The analysis has also been extended by computing the end-to-end distances, shown in Fig. S6.

      Unfortunately, it is technically unfeasible to accurately estimate the area compressibility, bending modulus, or the atomic virial for the present multi-million-atom membrane protein simulations.
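      For a protein-free reference bilayer, where it is tractable, the area compressibility suggested by the Reviewer is commonly estimated from area fluctuations in a tensionless NPT simulation, K<sub>A</sub> = k<sub>B</sub>T⟨A⟩/⟨δA²⟩. A minimal sketch, assuming a time series of the total lateral box area is available; the function name is hypothetical and the toy input only checks the units.

```python
import statistics

KB = 1.380649e-23  # Boltzmann constant, J/K


def area_compressibility(areas_nm2, temperature_k):
    """K_A = kB T <A> / var(A), returned in mN/m (equivalently mJ/m^2).

    areas_nm2: time series of the total lateral box area (nm^2) from a
    tensionless NPT run; the series must be long enough for var(A) to converge.
    """
    mean_a = statistics.fmean(areas_nm2) * 1e-18        # nm^2 -> m^2
    var_a = statistics.pvariance(areas_nm2) * 1e-36     # nm^4 -> m^4
    return KB * temperature_k * mean_a / var_a * 1e3     # N/m -> mN/m


print(f"K_A = {area_compressibility([99.0, 101.0], 310.0):.1f} mN/m")
```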

      Summary of Revisions/Additions:

      Fig. S2 [...] (D, E) Difference in the membrane thickness around the SC relative to CI (left) or relative to CIII<sub>2</sub> (right) from (D) aMD and (E) cgMD.

      Fig. S4. [...] (I) Visualization of the membrane distortion effect.

      Fig. S5. Analysis of membrane-induced distortion effects. (A) Strain effect relative to a lipid membrane from atomistic MD simulations of the SCI/III<sub>2</sub>, CI, and CIII<sub>2</sub>, suggesting a reduction of the membrane strain (blue patches) in the SC surroundings. The figure shows the non-bonded energies relative to the average non-bonded energies from membrane simulations (simulation M4, Table S1). (B) The lipid strain contribution for different lipids, calculated from non-bonded interaction energies of the lipids relative to the average lipid interaction in an IMM membrane model (simulation M4). The figure shows the relative strain contribution for nearby lipids (r < 2 Å; in color, from panel (C)) and for lipids >5 Å from the OXPHOS proteins. (C) Selection of lipids (< 2 Å) interacting with the OXPHOS proteins. (D) Potential of mean force (PMF) of membrane thickness derived from thickness distributions from cgMD simulations of a membrane, the SCI/III<sub>2</sub>, CI, and CIII<sub>2</sub>. (E) Membrane thickness as a function of CDL concentration from cgMD simulations. (F) ΔG<sub>thick</sub> of the SC as a function of membrane thickness based on cgMD simulations. (G) Membrane curvature around the SCI/III<sub>2</sub> (left), CI (middle), and CIII<sub>2</sub> (right) from atomistic simulations. (H) Squared membrane curvature obtained from cgMD simulations, within a 20 nm radius around the center of the system. These maps correspond to the curvature field used in the calculation of the bending deformation energy term (G<sub>curv</sub>).

      Fig. S6. Analysis of lipid end-to-end distance from aMD simulations of (A) SC, (B) CI, (C) CIII<sub>2</sub>.
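      Panel (D) of Fig. S5 derives a PMF from thickness distributions. The usual Boltzmann inversion, W(d) = −k<sub>B</sub>T ln P(d), is sketched below under the assumption of well-sampled equilibrium histograms; the bin width and the toy samples are illustrative, and this is not necessarily the exact procedure used for the figure.

```python
import math
from collections import Counter


def pmf_from_samples(samples, bin_width, kT=1.0):
    """Potential of mean force W(d) = -kT ln P(d), shifted so that min W = 0.

    samples: thickness values (e.g. in Å) pooled over frames and positions.
    Returns {bin centre: W}; empty bins are omitted (W would be infinite there).
    """
    counts = Counter(math.floor(x / bin_width) for x in samples)
    total = sum(counts.values())
    w = {(k + 0.5) * bin_width: -kT * math.log(c / total) for k, c in counts.items()}
    w_min = min(w.values())
    return {d: v - w_min for d, v in sorted(w.items())}


print(pmf_from_samples([3.2] * 4 + [4.5] * 2, bin_width=1.0))
```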

      (2) Membrane distortions:

      Membrane distortions can arise due to a mismatch in area between the upper and lower leaflets, especially when a protein is embedded. The authors can carefully choose the lipid numbers to keep the membrane stable.

      We have further clarified in the revised manuscript that the membranes are stable in all simulation setups. When building the simulation setups, it was carefully considered that no leaflet introduced higher lipid densities that could result in artificial displacements. Our results on the local changes in lipid dynamics and structure around the OXPHOS complexes are independently supported by both our atomistic and coarse-grained simulations, the latter containing significantly larger membranes. Moreover, as discussed in our work, the local membrane distortion is also experimentally supported by cryoEM analysis as well as recent in situ cryoTEM data, showing that the OXPHOS proteins indeed affect the local membrane properties.

      Clarifications/Additions to the main text:

      “We find that the individual OXPHOS complexes, CI and CIII<sub>2</sub>, induce pronounced membrane strain effects, supported by both our aMD (Fig. S2A) and cgMD simulations with a large surrounding membrane (Fig. 2G).”

      “The localization of specific lipids around the membrane proteins, as well as local membrane perturbation effects, are also supported by simulations of other membrane proteins (45, 46), suggesting that the effects could arise from general protein-membrane interactions.”

      "During construction of the simulation setups, it was carefully considered that no leaflet introduced higher lipid densities that could result in artificial displacement effects."

      (3) Strain energy as an entropic effect:

      Please establish that the strain energy (if at all present) can be called an entropic effect.

      We have now better clarified that the SC formation results from combined enthalpic and entropic effects. We apologize that the previous version of the text was unclear in this respect.

      To further probe the involvement of entropic effects, we derived entropic and enthalpic contributions from our lattice model. The model supports that increased strain contributions also alter the entropic contributions, further supporting the coupling between the effects.

      We have also clarified our definition of the effects:

      "The perturbed thickness and alteration in the lipid dynamics lead to an energetic penalty, which can be related to molecular strain effects, as suggested by the changes in both the internal energy of the lipids and their interactions with the surroundings (Fig. S2, S5, S6), which are likely to be of enthalpic origin. However, lipid binding to the OXPHOS complex also results in a reduction in the translational and rotational motion of the lipids and quinone (Fig. S8-S9), which could result in entropic changes. The strain effects are therefore likely to arise from a combination of enthalpic and entropic effects."

      (4) Allosteric cross-talk:

      A thorough network analysis (looking at aspects like the graph Laplacian, edge weights, eigenvector centrality, changes in characteristic path length, etc.) can be undertaken to establish allostery (see https://doi.org/10.1093/glycob/cwad094 and papers by Ruth Nussinov and Ivet Bahar).

      We have expanded the network analysis as suggested by the Reviewer. In particular, we computed the covariance matrix, further supporting that the SC could involve correlated protein dynamics. We observe a prominent change, especially with respect to the ligand state of Complex I, indicative of some degree of allostery, while the apo state of Complex I leads to a slight uncoupling of the motion between CI and CIII<sub>2</sub>.

      Additions in the main text:

      In this regard, our graph theoretical analysis (Fig. S11) further indicates that ligand binding to Complex I induces a dynamic crosstalk between NDUFA5 and NDUFA10, consistent with previous work (48, 49), and also affects the motion of UQCRC2 relative to its surroundings. Taken together, these effects suggest that the dynamics of CI and CIII<sub>2</sub> show some correlation that could result in allosteric effects, as also indicated by the cryoEM analysis.
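      A minimal sketch of a covariance-based correlation analysis of the kind described above, computing the normalized (dynamic) cross-correlation matrix from aligned coordinates. This is a generic formulation, not necessarily the exact normalization behind Fig. S11; it assumes the frames are already superposed and that no atom is completely static.

```python
import math


def cross_correlation(traj):
    """Normalized cross-correlation C_ij = <dr_i . dr_j> / sqrt(<dr_i^2> <dr_j^2>).

    traj: list of frames; each frame is a list of (x, y, z) positions (e.g. C-alpha)
    already superposed onto a common reference. The numerator is the (3D-summed)
    covariance matrix; values lie in [-1, 1]. Atoms that do not move at all give
    a zero denominator and must be excluded beforehand.
    """
    n_frames, n_atoms = len(traj), len(traj[0])
    mean = [[sum(f[i][k] for f in traj) / n_frames for k in range(3)]
            for i in range(n_atoms)]
    dev = [[[f[i][k] - mean[i][k] for k in range(3)] for i in range(n_atoms)]
           for f in traj]
    cov = [[sum(sum(d[i][k] * d[j][k] for k in range(3)) for d in dev) / n_frames
            for j in range(n_atoms)] for i in range(n_atoms)]
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n_atoms)]
            for i in range(n_atoms)]


# Toy trajectory: atoms 0 and 1 move together, atom 2 moves in the opposite direction
toy = [[(t, 0.0, 0.0), (t + 5.0, 0.0, 0.0), (-t, 0.0, 0.0)] for t in (0.0, 1.0, 2.0, 3.0)]
for row in cross_correlation(toy):
    print([round(c, 2) for c in row])
```

      Differences between such matrices for the Q-bound and apo ensembles highlight residue pairs whose correlated motion changes with the ligand state.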

      (5) Lattice model:

      The equation needs to be rationalised. For example, the specific interaction (g_i g_j) favours separation (lower energy when i and j are not next to each other), and the non-specific interaction favours proximity. Why is that? Also, the notation for the degeneracy in the partition function and the notation for the lattice points should be explained. It is mentioned that the "interaction between the lipids was indirectly accounted for by the background energy of the model". If the packing/thinning etc. are so important to the molecular simulations, will the background energy not change with changing lipid organisation during complex formation?

      We have further expanded the technical discussion of the energy terms in our lattice model.

      For example, specific interaction (g_i g_j) favours separation (lower energy when i and j are not next to each other), and non-specific interaction favours proximity. Why is that?

      "The g<sub>i</sub>g<sub>j</sub> -term assigns a specific energy contribution when the OXPHOS complexes are in adjacent lattice points only in a correct orientation (modeling a specific non-covalent interaction between the complexes such as the Arg29<sup>FB4</sup>-Asp260<sup>C1</sup>/Glu259<sup>C1</sup> interaction between CI and CIII<sub>2</sub>). The d<sub>i</sub>d<sub>j</sub> -term assigns a non-specific interaction for the OXPHOS complexes when they are in adjacent lattice points, but in a "wrong" orientation relative to each other to form a specific interaction. The term introduces a strain into all lattice points surrounding an OXPHOS complex, mimicking the local membrane perturbation effects observed in our molecular simulations.

      This leads to the partition function,

      where w<sub>i</sub> is the degeneracy of the state, modeling that the SC and OXPHOS proteins can reside at any lattice position of the membrane, and where β=1/k<sub>B</sub>T (k<sub>B</sub>, Boltzmann's constant; T, temperature). The probability of a given state i was calculated as,

      with the free energy (G) defined as,

      This discussion has been included in the Methods section to ensure that our work remains readable for the biological community studying supercomplexes from biochemical, metabolic, and physiological perspectives.
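      In the standard Boltzmann form consistent with the definitions above (degeneracy w<sub>i</sub>, β = 1/k<sub>B</sub>T), the partition function is Z = Σ<sub>i</sub> w<sub>i</sub> exp(−βE<sub>i</sub>), the state probability is p<sub>i</sub> = w<sub>i</sub> exp(−βE<sub>i</sub>)/Z, and a state free energy can be written G<sub>i</sub> = −k<sub>B</sub>T ln p<sub>i</sub>. The sketch below assumes exactly these forms, which may differ in detail from the expressions used in the manuscript; the state labels and energies in the usage lines are hypothetical.

```python
import math


def boltzmann_analysis(states, beta):
    """Partition function, state probabilities and state free energies.

    states: dict mapping a state label to (degeneracy w_i, energy E_i), with
    energies in the same units as 1/beta (e.g. kT with beta = 1).
    Assumed forms: Z = sum_i w_i exp(-beta E_i); p_i = w_i exp(-beta E_i) / Z;
    G_i = -(1/beta) ln p_i. Requires beta > 0.
    """
    weights = {s: w * math.exp(-beta * e) for s, (w, e) in states.items()}
    z = sum(weights.values())
    probs = {s: wt / z for s, wt in weights.items()}
    free = {s: -math.log(p) / beta for s, p in probs.items()}
    return z, probs, free


# Hypothetical two-state example: one bound arrangement vs three unbound ones
z, probs, free = boltzmann_analysis({"SC": (1, -2.0), "separate": (3, 0.0)}, beta=1.0)
print(probs)
```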

      (6) This is a minor issue but the paper is poorly organised and can be fixed readily. The figures are not referenced in order. For example, Figure 2G is discussed before discussing Figures 2A-2F (never discussed). Figure S2 is referenced before Figure S1.

      Answer: We thank the Reviewer for pointing this out. The order of the figures was revised.

      Reviewer #3 (Recommendations for the authors):

      A few minor questions/suggestions, not necessarily in the order of importance:

      (1) The discussion of the timescale of simulations is a bit misleading. For example, the discussion cites a timescale of 0.3 ms of CG simulations. The value is actually the sum of multiple CG simulations on the order of 50-75 microseconds. These are already very impressive lengths of CG simulations, there is no need to use the aggregated time to claim even longer time scales.

      We thank the Reviewer for the suggestion on this important clarification. We have now modified the text and tables accordingly:

      "(0.3 ms in cumulative simulation time, 50-75 μs/cgMD simulation)"

      (2) The observation of cardiolipin (CDL) accumulation is interesting. How close are the head groups, relative to the electrostatic screening length at the interface? Should one worry about the potential change of protonation state coupled with the CDL redistribution?

      Answer: We thank the Reviewer for this excellent comment, which has also been on our minds. The CDL molecules indeed form contacts with various functional groups at the protein interface (as shown in Fig. S13), as well as with bulk ions (sodium) that could tune the pK<sub>a</sub> of the CDLs and result in a protonation change. We have clarified these effects in the revised manuscript:

      "While CDL was modeled here in the singly anionic charged state (but cf. Fig. S5E), we note that the local electrostatic environment could tune its pK<sub>a</sub> and result in protonation changes of the lipid, consistent with its function as a proton-collecting antenna (62)."

      (3) The authors refer to the membrane strain effect as entropic. Since membrane bending implicates a free energy change that includes both enthalpic and entropic components, I wonder how the authors reached the conclusion that the effect is largely entropic in nature.

      We agree with the Reviewer that the effects are likely to comprise both enthalpic and entropic contributions, which are difficult to separate in practice. To reflect this, we have now better clarified why we consider that both contributions are involved. We apologize that our previous version of the manuscript was unclear in this respect. Clarifications in the main text:

      “The perturbed thickness and alteration in the lipid dynamics lead to an energetic penalty, which can be related to molecular strain effects, as suggested by the changes in both the internal energy of the lipids and their interaction with the surroundings (Fig. S2, S5, S6), which are likely to be of enthalpic origin. However, lipid binding to the OXPHOS complex also reduces the translational and rotational motion of the lipids and quinone (Fig. S8-S9), which could result in entropic changes. The strain effects are therefore likely to arise from a combination of enthalpic and entropic effects.”

      (4) The authors refer to the computed dielectric constant as epsilon_perpendicular. Did the authors really distinguish the parallel and perpendicular component of the dielectric tensor, as was done by, for example, R. Netz and co-workers for planar surfaces?

      We have extracted the perpendicular dielectric constant from the total dielectric profiles. We clarify that this differs from the formal definition by Netz and co-workers.

      “The calculations were performed by averaging the total M over fixed z values from the membrane plane. Note that this treatment differs from extraction of radial and axial contributions of the dielectric tensor, as developed by Netz and co-workers (cf. Ref. (3) and refs therein) that requires a more elaborate treatment, which is outside the scope of the present work.”

      (3) P. Loche, C. Ayaz, A. Schlaich, Y. Uematsu, R.R. Netz. Giant Axial Dielectric Response in Water-Filled Nanotubes and Effective Electrostatic Ion-Ion Interactions from a Tensorial Dielectric Model. J Phys Chem B 123, 10850-10857 (2019).

      (5) Regarding the effect of SC formation on protein structure and dynamics, especially allosteric effects, most of the discussions are rather qualitative in nature. More quantitative analysis would be valuable. For example, the authors did compute covariance matrix although it appears that they chose not to discuss the results in depth. Is the convergence of concern and therefore no thorough discussion is given?

      We have now expanded the analysis by computing the covariance matrix, further supporting that the SC could involve correlated protein dynamics. We observe a prominent change, especially with respect to the ligand state of Complex I, indicative of some degree of allostery, while we find that the apo state of Complex I leads to a slight uncoupling of the motion between CI and CIII₂.

      Additions in the main text:

      “In this regard, our graph theoretical analysis (Fig. S11) further indicates that ligand binding to Complex I induces a dynamic crosstalk between NDUFA5 and NDUFA10, consistent with previous work (48, 49), and also affects the motion of UQCRC2 with respect to its surroundings. Taken together, these effects suggest that the dynamics of CI and CIII₂ show some correlation that could result in allosteric effects, as also indicated by the cryo-EM analysis (40).”

      (6) The discussion of quinone diffusion is interesting, although I'm a bit intrigued by the unit of the diffusion constant cited in the discussion. Perhaps a simple typo?

      The plot showed the molecular velocity, which roughly corresponds to the residence times. However, as suggested by the Reviewer, we have now also analyzed the position-dependent diffusion of the quinone in the new Figure S9:

    1. Author Response

      Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “The cation channel mechanisms of subthreshold inward depolarizing currents in the VTA dopaminergic neurons and their roles in the depression-like behavior”. These comments are constructive and very helpful for improving our manuscript. We have studied the comments carefully and have made provisional revisions, which we hope meet with approval. We respond to the reviewers’ comments point by point as follows.

      Reviewer #1 (Public Review):

      Comment 1:

      The pharmacological tools used in this study are highly non-selective. Gd3+, used here to block NALCN is actually more commonly used to block TRP channels. 2-APB inhibits not only TRPC channels, but also TRPM and IP3 receptors while stimulating TRPV channels (Bon and Beech, 2013), while FFA actually stimulates TRPC6 channels while inhibiting other TRPCs (Foster et al., 2009).

      We agree with the reviewer that the substances mentioned are not specific. In addition to the shRNA experiments against NALCN and TRPC6, we used more specific pharmacological modulators of these two channels: L703,606 (an NALCN antagonist) [1] and larixyl acetate (a potent TRPC6 inhibitor) [2]. The results are shown in Figure 3E, F and Figure 4C, E.

      Comment 2:

      -The multimodal approach including shRNA knockdown experiments alleviates much of the concern about the non-specific pharmacological agents. Therefore, the author's claim that NALCN is involved in VTA dopaminergic neuron pacemaking is well-supported.

      -However, the claim that TRPC6 is the key TRPC channel in VTA spontaneous firing is somewhat, but not completely supported. As with NALCN above, the pharmacology alone is much too non-specific to support the claim that TRPC6 is the TRP channel responsible for pacemaking. However, unlike the NALCN condition, there is an issue with interpreting the shRNA knockdown experiments. The issue is that TRPC channels often form heteromers with TRPC channels of other types (Goel, Sinkins and Schilling, 2002; Strübing et al., 2003). Therefore, it is possible that knocking down TRPC6 is interfering with the normal function of another TRPC channel, such as TRPC7 or TRPC4.

      Our single-cell RNA-seq results show that, unlike TRPC6, TRPC7 and TRPC4 are not broadly expressed in VTA DA neurons (Fig. 2). Consistent with this, single-cell PCR (sFig. 9A) showed that only a very small proportion of TRPC6-positive DA (DAT+) cells expressed TRPC4 (sFig. 9Bi) or TRPC7 (sFig. 9Bii). Therefore, knocking down TRPC6 may not interfere with the normal function of other TRPC channels, such as TRPC7 or TRPC4.

      Comment 3:

      The claim that TRPC6 channels in the VTA are involved in the depressive-like symptoms of CMUS is supported.

      • However, the connection between the mPFC-projecting VTA neurons, TRPC6 channels, and the chronic unpredictable stress model (CMUS) of depression is not well supported. In Figure 2, it appears that the mPFC-projecting VTA neurons have very low TRPC6 expression compared to VTA neurons projecting to other targets. However, in figure 6, the authors focus on the mPFC-projecting neurons in their CMUS model and show that it is these neurons that are no longer sensitive to pharmacological agents non-specifically blocking TRPC channels (2-APB, see above comment). Finally, in figure 7, the authors show that shRNA knockdown of TRPC6 channels (in all VTA dopaminergic neurons) results in depressive-like symptoms in CMUS mice. Due to the low expression of TRPC6 in mPFC-projecting VTA neurons, the authors' claim of "broad and strong expression of TRPC6 channels across VTA DA neurons" is not fully supported. Because of the messy pharmacological tools used, it cannot be claimed that TRPC6 in the mPFC-projecting VTA neurons is altered after CMUS. And because the knockdown experiments are not specific to mPFC-projecting VTA neurons, it cannot be claimed that reducing TRPC6 in these specific neurons is causing depressive symptoms.

      We focused on the mPFC-projecting VTA DA neurons because this pathway is implicated in depressive-like behaviors in the CMUS model [3-5]. Although mPFC-projecting VTA DA neurons appear to have lower levels of TRPC6, we reason that the channel is still functional in these cells. However, we agree with the reviewer that the statement “broad and strong expression of TRPC6 channels across VTA DA neurons” is not fully supported, and we have changed the statements following the reviewer's suggestion. Furthermore, we selectively knocked down TRPC6 in the mPFC-projecting VTA DA neurons and then studied the behavior (Fig. 8).

      Comment 4:

      It is important to note that the experiments presented in Figure 1 have all been previously performed in VTA dopaminergic neurons (Khaliq and Bean, 2010) including showing that low calcium increases VTA neuron spontaneous firing frequency and that replacement of sodium with NMDG hyperpolarizes the membrane potential.

      We agree with the reviewer that similar experiments have been performed previously [6]; we have retained them for the flow of our manuscript and for general readers.

      Comment 5:

      -The authors' explanation for the increase in firing frequency in 0 calcium conditions is that calcium-activated potassium channels would no longer be activated. However, there is a highly relevant finding that low calcium enhances the NALCN conductance through the calcium sensing receptor from Dejian Ren's lab (Lu et al., 2010) which is not cited in this paper. This increase in NALCN conductance with low calcium has been shown in SNc dopaminergic neurons (Philippart and Khaliq, 2018), and is likely a factor contributing to the low-calcium-mediated increase in spontaneous VTA neuron firing.

      We agree with the reviewer and thank them for the suggestion. A discussion of this point has been added.

      Comment 6:

      -One of the only demonstrations of the expression and physiological significance of TRPCs in VTA DA neurons was published by (Rasmus et al., 2011; Klipec et al., 2016) which are not cited in this paper. In their study, TRPC4 expression was detected in a uniformly distributed subset of VTA DA neurons, and TRPC4 KO rats showed decreased VTA DA neuron tonic firing and deficits in cocaine reward and social behaviors.

      We thank the reviewer for the suggestion. The references and a discussion of them have been added.

      Comment 7:

      • Out of all seven TRPCs, TRPC5 is the only one reported to have basal/constitutive activity in heterologous expression systems (Schaefer et al., 2000; Jeon et al., 2012). Other TRPCs such as TRPC6 are typically activated by Gq-coupled GPCRs. Why would TRPC6 be spontaneously/constitutively active in VTA DA neurons?

      In the complex neuronal environment in which VTA DA neurons are located, multiple modulatory factors, including GPCRs, could be dynamically active; this could lead to the activation of TRP channels, including TRPC6.

      Comment 8:

      A new paper from the group of Myoung Kyu Park (Hahn et al., 2023) shows in great detail the interactions between NALCN and TRPC3 channels in pacemaking of SNc DA neurons.

      The reference mentioned has been added. We thank the reviewer.

      Reviewer #2 (Public Review):

      Comment 1:

      These results do not show that TRPC6 mediates stress effects on depression-like behavior. As stated by the authors in the first sentence of the final paragraph, "downregulation of TRPC6 proteins was correlated with reduced firing activity of the VTA DA neurons, the depression-like behaviors, and that knocking down of TRPC6 in the VTA DA neurons confer the mice with depression behaviors." Therefore, the results show associations between TRPC6 downregulation and stress effects on behavior, occlusion of the effects of one by the other on some outcome measures, and cell manipulation effects that resemble stress effects. There is no experiment that shows reversal of stress effects with cell/circuit-specific TRPC6 manipulations. Please adjust the title, abstract and interpretation accordingly.

      We agree with the reviewer’s suggestion. The title was changed to “The cation channel mechanisms of subthreshold inward depolarizing currents in the VTA dopaminergic neurons and their roles in the chronic stress-induced depression-like behavior”, and the abstract and interpretation were adjusted accordingly.

      Comment 2:

      Statistical tests and results are unclear throughout. For all analyses, please report specific tests used, factors/groups, test statistic and p-value for all data analyses reported. In some cases, the chosen test is not appropriate. For example, in Figure 6E, it is not clear how an experiment with 2 factors (stress and drug) can be analyzed with a 1-way RM ANOVA. The potential impact of inappropriate statistical tests on results makes it difficult to assess the accuracy of data interpretation.

      We have redone the statistical analysis as suggested by the reviewer and added specific tests used, factors/groups, test statistic and p-value for all data analyses into the figure legends of the revised manuscript.

      Comment 3:

      Why were only male mice used? Please justify and discuss in the manuscript. Also, change the title to reflect this.

      Although most similar previous studies used male mice or rats [7, 8], we agree with the reviewer that female animals should also be tested, given the possible role of sex hormones. We have therefore repeated some key experiments in female mice (sFig. 1, 6, 8 and 13).

      Comment 4:

      Number of recorded cells is very low in Figure 1. Where in VTA did recordings occur? Given the heterogeneity in this brain region, this n may be insufficient. Additional information (e.g., location within VTA, criteria used to identify neurons) should be included. Report the number of mice (i.e., n = 6 cells from X mice) in all figures.

      Indeed, the number was low. We performed additional experiments to increase the N/n numbers. The location of the recorded cells within the VTA and the number of mice used are now shown in all figures, and the criteria used to identify neurons are stated in the Methods (Identification of DA neurons and electrophysiological recordings). At the end of the electrophysiological recordings, the recorded VTA neurons were collected for single-cell PCR, and VTA DA neurons were identified by the presence of TH and DAT.

      Comment 5:

      Authors refer to VTA DA neurons as those that are DAT+ in line 276, although TH expression is considered the standard of DAergic identity, and studies (e.g., Lammel et al, 2008) have shown that a subset of VTA DA neurons have low levels of DAT expression. Authors should reword/clarify that these are DAT-expressing VTA DA neurons.

      The study published by Lammel et al. [9] in 2015 showed the low dopamine specificity of transgene expression in the ventral midbrain of TH-Cre mice; by contrast, DAT-Cre mice exhibit dopamine-specific Cre expression patterns, although they are likely to have their own limitations (for example, low DAT expression in mesocortical DA neurons may make it difficult to target this subpopulation; see Lammel et al., 2008 [10]). Hence, in our study, DAT was used as the criterion to identify DAT-expressing DA neurons. In addition, both TH and DAT were tested by single-cell PCR to determine whether the recorded cells were DA neurons.

      Comment 6:

      Neuronal subtype proportions should be quantified and reported (Fig. 1Aii).

      Neuronal subtype proportions are now quantified and reported in Fig. 1Aii.

      Comment 7:

      In addition to reporting projection specificity of neurons expressing specific channels, it would be ideal to report these data according to spatial location in VTA.

      The spatial location of the recorded cells within the VTA is now shown in all figures.

      Comment 8:

      The authors state that there are a small number of Glut neurons in VTA, then they state that a "significant proportion" of VTA neurons are glutamatergic.

      Thanks, “a significant proportion of neurons” has been changed to “less than half of sequenced DA neurons”.

      Comment 9:

      It is an overstatement that VTA DA neurons are the key determinant of abnormal behaviors in affective disorders.

      Thanks, we have amended the statement to that “Dopaminergic (DA) neurons in the ventral tegmental area (VTA) play an important role in mood, reward and emotion-related behaviors”.

      Reviewer #3 (Public Review):

      Comment 1:

      The authors of this study have examined which cation channels specifically confer to ventral tegmental area dopaminergic neurons their autonomic (spontaneous) firing properties. Having brought evidence for the key role played by NALCN and TRPC6 channels therein, the authors aimed at measuring whether these channels play some role in so-called depression-like (but see below) behaviors triggered by chronic exposure to different stressors. Following evidence for a down-regulation of TRPC6 protein expression in ventral tegmental area dopaminergic cells of stressed animals, the authors provide evidence through viral expression protocols for a causal link between such a down-regulation and so-called depression-like behaviors. The main strength of this study lies in a comprehensive bottom-up approach ranging from patch-clamp recordings to behavioral tasks. However, the interpretation of the results gathered from these behavioral tasks might also be considered one main weakness of the abovementioned approach. Thus, the authors show a confusion (widely observed in numerous publications) regarding the use of paradigms (forced swim test, tail suspension test) initially aimed (and hence validated) at detecting the antidepressant effects of drugs and which by no means provide clues on "depression" in their subjects. Indeed, in their hands, the authors report that stress elicits changes in these tests which are opposed to those theoretically seen after antidepressant medication. However, these results do not imply that these changes reflect "depression" but rather that the individuals under scrutiny simply show different responses from those seen in nonstressed animals. These limits are even more valid in nonstressed animals injected with TRPC6 shRNAs (how can 5-min tests be compared to a complex and chronic pathological state such as depression?). 
With regard to anxiety, as investigated with the elevated plus-maze and the open field, the data, as reported, do not allow the reader to check the authors' interpretation, as anxiety indices are either not correctly provided (e.g. absolute open arm data instead of percentages of open arm visits, without mention of closed arm behaviors) or subject to possible biases (lack of distinction between central and peripheral components of the apparatus).

      We agree with the reviewer that it is debatable whether the behavioral tests used here reflect a genuine depressive state; this is an open question that can be discussed from different perspectives. Because these tests (forced swimming and tail suspension) are, as the reviewer noted, “widely observed in numerous publications”, we used them as seemingly the only available options to reflect a “depression-like” state. One could argue that, since these tests were initially used (and “validated”) for testing antidepressants, with decreased immobility time indicating antidepressant effects, an increased immobility time could likewise reflect a “depression-like” state. As for the anxiety tests, the elevated plus-maze data have been revised following the reviewer’s suggestion.

      Recommendations for the authors: please note that you control which revisions, if any, to undertake

      Reviewer #1 (Recommendations For The Authors):

      Recommendation 1 for improving the paper:

      -The paper needs extensive editing for both overall structural clarity and for the high number of typos and grammatical errors.

      We thank the reviewer for the suggestion. The revised manuscript has been edited extensively.

      Recommendation 2 for improving the paper:

      -Retrobeads are often toxic to cells and build up with increasing time. It is surprising that the authors wait 14-21 days for retrobead expression in their target cells. It is also a problem that the mPFC projecting cells have a longer time with the retrobeads than the other projection-targeting cells because the toxicity could be more extensive with the longer wait time thus confounding the results. The authors should repeat some mPFC experiments at the 14 day time point to confirm that the longer time with the beads is not influencing the differential effects in these cells.

      Following the methods published by Stephan Lammel and Jochen Roeper (“For sufficient labeling, survival periods for retrograde tracer transport depended on respective injection areas: DS and NAc lateral shell, 7 days; NAc core, NAc medial shell, and BLA, 14 days; and mPFC, 21 days” [10]), we performed the experiments on mPFC-projecting cells at the 21-day time point. Consistent with this, labeling of mPFC-projecting cells at the 14-day time point was insufficient compared with that at the 21-day time point, as shown below.

      Author response image 1.

      Confocal images showing the anatomical distribution of mPFC-projecting DA neurons labelled with retrobeads (red) in the VTA after DAT immunofluorescence (green) staining at different time points after retrobead injection (A, 14 d; B, 21 d). Scale bars = 10 μm.

      Recommendation 3 for improving the paper:

      -The experiment with FFA in Figure 4E seems weird. Why is there no baseline before the FFA application? And why is the baseline trending downward immediately? The authors should explain why this example experiment is presented differently from all the others.

      We apologize; this example time course is not typical. Because FFA is not a specific TRPC6 antagonist and actually stimulates TRPC6 channels, we repeated the experiments with a more specific pharmacological modulator of TRPC6, larixyl acetate (LA); the results are shown in Figure 4C and 4F.

      Recommendation 4 for improving the paper:

      -It would be much more useful to see exact p values in the text, as it aids in interpreting the 'insignificance' of specific comparisons. Specifically, in Figure 5F, the 2-APB looks like it is having a small effect, and the already low firing rate (due to the TRPC6 knockdown) makes a big effect less likely. It would be useful to know what the actual p value is here (and everywhere).

      OK. We now report all P values in the figure legends of the revised version.

      Recommendation 5 for improving the paper:

      -In the results, it should be explained that the "RMP" of VTA DA neurons was obtained by treating the cells with TTX.

      A sentence indicating that TTX was present when measuring the “RMP” has been added to the Results section of the revised version.

      Recommendation 6 for improving the paper:

      -The spacing of the panels in the figures is somewhat odd. The figures could be more compact.

      Thanks, we have re-arranged all figures.

      Recommendation 7 for improving the paper:

      The paper is difficult to read because of significant grammatical errors. Here are some examples by line number, but this list is not at all exhaustive.

      We thank the reviewer for pointing out grammatical errors and we corrected them.

      Reviewer #2 (Recommendations For The Authors):

      Recommendation 1 for improving the paper:

      Fix typos: e.g., change HCH to HCN, change EMP to EPM, "these finding", "compact par" should read "pars compacta", "substantial" in line 475 should read "substantia", Incomplete sentences on line 73 and line 107, etc. Also, what is meant by "autonomic" firing activity? What is meant by "expression files"? Change "depression behaviors" to depression-like behaviors. "The HCN" as written in line 69 is a bit misleading, as HCN channels in the heart and brain are different members of a family of channels, although as written in the text, it seems that they are identical. In Figure 2, rearrange order of brain regions (e.g., from "BLA-VTA" to "VTA-BLA"), because as written, it seems that the focus is on projections into the VTA from each brain region, rather than VTA neurons that project to each respective region.

      We thank the reviewer for pointing out these errors, and we have corrected them. “Autonomic firing activity” has been changed to “spontaneous firing activity”, and “expression files” has been changed to “expression levels”. All instances of “depression behavior” have been changed to “depression-like behaviors”. In Figure 2, all “xx-VTA” labels have been changed to “VTA-xx”.

      Reviewer #3 (Recommendations For The Authors):

      Recommendation 1 for improving the paper:

      Methodology: unlike sFig. 8, where the order in which mice were repeatedly tested is specified, this key information is lacking in Fig. 6 as well as in the Methods section (for example, when is such a traumatic stressor as forced swimming applied relative to the other tests?). Relevant to this point is the possible bias introduced by such sequential testing, as exposure to the forced swim test likely affects the behaviors recorded in the other tests. Furthermore, the way this test is conducted raises concerns, as the water depth is reported to be 10 cm, which is quite low given that immobility scores might be affected by the ability of mice to stand on their tails.

      With regard to the elevated plus-maze, data are erroneously provided. Absolute values regarding open arm behaviors should be provided as percentages of the number of visits (or time spent therein) over the total (open + closed) number of arm visits. Indeed, closed arm visits should also be provided. This variable, also considered an index of locomotor activity, would allow the reader to exclude any effect of locomotion on the exploration in the open field.

      As they stand, data in the open field seem to indicate parallel changes at the center (center time) and the periphery (total distance), hence suggesting locomotor effects rather than anxiogenic effects. Data related to the center and the periphery should be clearly distinguished. Lastly, the number of weeks allowed for the mice to recover from surgeries aimed at delivering viruses is not mentioned. This is important as it could have affected the amplitude of the sensitivity to the stressors.
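      For reference, the normalization requested above corresponds to the conventional elevated plus-maze indices sketched below. The symbols N (number of arm entries) and T (time spent in the arms) are introduced here for illustration only and are not taken from the manuscript:

      $$\%\,\text{open-arm entries} = 100 \times \frac{N_{\text{open}}}{N_{\text{open}} + N_{\text{closed}}}, \qquad \%\,\text{open-arm time} = 100 \times \frac{T_{\text{open}}}{T_{\text{open}} + T_{\text{closed}}}$$

      Reporting N_closed alongside these percentages lets readers check whether a change in open-arm exploration simply tracks overall locomotor activity.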

      We thank the reviewer for the suggestion. The missing information in Figure 6 and the Methods has now been supplied. We apologize for the incorrect value of “10 cm” in the forced swimming test; this has been corrected. The elevated plus-maze data have also been revised following the reviewer’s suggestion. To assess a possible locomotor effect, we tested the mice on the rotarod test; there was no difference in locomotor activity between control and depression-like mice (sFig. 10G, sFig. 12I and sFig. 13G). We modified the experimental timeline in Figure 6, and in the Methods section (AAV for gene knockdown or overexpression and viral construct and injection) we added “Mice were singly housed with enough food and water to recover for 4-5 weeks after injection of virus, before behavior tests and electrophysiological recordings.” to report the recovery period allowed after the virus-delivery surgeries.

      Recommendation 2 for improving the paper:

      Results/conclusions: as already mentioned, the authors show some confusion in the interpretation of their tail suspension tests and forced swimming tests. I acknowledge that such a confusion is frequent, but it is important to note that the tests used by the authors were INITIALLY aimed at detecting the antidepressant effects of drugs under investigation. However, it is not because a test reveals such antidepressant properties that it also provides indices of depression. The authors will surely agree that it is unlikely that a 5-min test provides a model of a chronic pathology accounted for by a complex interplay between genetic and environmental factors. I would suggest that the authors read, for example, Molendijk and De Kloet (Eur J Neurosci 2022). I think that the authors should just neutrally mention their results without any interpretation related to depression. On the other hand, what could have been interesting is to test whether the so-called "depressive-like" responses recorded in the study were sensitive to chronic antidepressant treatments. This would have allowed the authors to further suggest some relevance (if any) with depression-like pathologies.

      As discussed above, we again agree with the reviewer’s concern. However, if, as stated by the reviewer, “However, it is not because a test reveals such antidepressant properties that they also provide indices of depression”, then the experiments suggested by the reviewer “….. to test whether the so-called "depressive-like" responses recorded in the study were sensitive to chronic antidepressant treatments”

      Recommendation 3 for improving the paper:

      A close examination of the responses to CMUS or chronic restraint suggests that indeed two populations of animals were detected, possibly sensitive and resilient to these stressors. Did the authors try to examine this possibility?

      Based on the results of the behavioral tests in the CMUS and CRS models, the animals might be divided into two populations: highly sensitive and moderately sensitive ones.

      Recommendation 4 for improving the paper:

      There are some text changes that need to be performed:

      Page 2 line 46: ref 4 uses a social stress model which brings no clear-cut evidence for it being a "depression" model. Indeed, this model can also be suggested to be a model of chronic anxiety (Kalueff et al., Science 2006; Chaouloff, Cell Tissue Res 2013), hence indicating that VTA dopaminergic neurons might also be involved in anxiety.

      page 11, line 329: the references supporting the hypothesis that VTA DA neurons are linked to depression cannot be found in the reference list (10-15 do not correspond to the appropriate references).

      page 11, line 341: reference 47 does not fit with the authors' assertion as it did not include any behavior.

      Fig. S8: body weight data are likely provided as changes rather than absolute values (e.g. 8 g)

      We agree with the reviewer’s comments. In line 46, “……such as depression states” has been changed to “such as depression- or anxiety-related states”. We have also corrected the references in lines 329 and 341. Finally, body weight is now reported as the change in body weight.

      References:

      1. Um, K.B., et al., TRPC3 and NALCN channels drive pacemaking in substantia nigra dopaminergic neurons. Elife, 2021. 10.

      2. Urban, N., et al., Identification and Validation of Larixyl Acetate as a Potent TRPC6 Inhibitor. Mol Pharmacol, 2016. 89(1): p. 197-213.

      3. Zhong, P., et al., HCN2 channels in the ventral tegmental area regulate behavioral responses to chronic stress. Elife, 2018. 7.

      4. Liu, D., et al., Brain-derived neurotrophic factor-mediated projection-specific regulation of depressive-like and nociceptive behaviors in the mesolimbic reward circuitry. Pain, 2018. 159(1): p. 175.

      5. Walsh, J.J. and M.H. Han, The Heterogeneity of Ventral Tegmental Area Neurons: Projection Functions in a Mood-Related Context. Neuroscience, 2014. 282: p. 101-108.

      6. Khaliq, Z.M. and B.P. Bean, Pacemaking in dopaminergic ventral tegmental area neurons: depolarizing drive from background and voltage-dependent sodium conductances. J Neurosci, 2010. 30(21): p. 7401-13.

      7. Li, L., et al., Selective targeting of M-type potassium K(v) 7.4 channels demonstrates their key role in the regulation of dopaminergic neuronal excitability and depression-like behaviour. Br J Pharmacol, 2017. 174(23): p. 4277-4294.

      8. Friedman, A.K., et al., Enhancing depression mechanisms in midbrain dopamine neurons achieves homeostatic resilience. Science, 2014. 344(6181): p. 313-9.

      9. Lammel, S., et al., Diversity of transgenic mouse models for selective targeting of midbrain dopamine neurons. Neuron, 2015. 85(2): p. 429-38.

      10. Lammel, S., et al., Unique properties of mesoprefrontal neurons within a dual mesocorticolimbic dopamine system. Neuron, 2008. 57(5): p. 760-73.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:  

      Reviewer #1 (Public Review): 

      Summary: 

      In their manuscript entitled 'The domesticated transposon protein L1TD1 associates with its ancestor L1 ORF1p to promote LINE-1 retrotransposition', Kavaklıoğlu and colleagues delve into the role of L1TD1, an RNA binding protein (RBP) derived from a LINE1 transposon. L1TD1 proves crucial for maintaining pluripotency in embryonic stem cells and is linked to cancer progression in germ cell tumors, yet its precise molecular function remains elusive. Here, the authors uncover an intriguing interaction between L1TD1 and its ancestral LINE-1 retrotransposon. 

      The authors delete the DNA methyltransferase DNMT1 in a haploid human cell line (HAP1), inducing widespread DNA hypo-methylation. This hypomethylation prompts abnormal expression of L1TD1. To scrutinize L1TD1's function in a DNMT1 knock-out setting, the authors create DNMT1/L1TD1 double knock-out cell lines (DKO). Curiously, while the loss of global DNA methylation doesn't impede proliferation, additional depletion of L1TD1 leads to DNA damage and apoptosis.  

      To unravel the molecular mechanism underpinning L1TD1's protective role in the absence of DNA methylation, the authors dissect L1TD1 complexes in terms of protein and RNA composition. They unveil an association with the LINE-1 transposon protein L1-ORF1 and LINE-1 transcripts, among others.  

      Surprisingly, the authors note fewer LINE-1 retro-transposition events in DKO cells than in DNMT1 KO alone.  

      Strengths: 

      The authors present compelling data suggesting the interplay of a transposon-derived human RNA binding protein with its ancestral transposable element. Their findings spur interesting questions for cancer types, where LINE1 and L1TD1 are aberrantly expressed.  

      Weaknesses: 

      Suggestions for refinement:  

      The initial experiment, inducing global hypo-methylation by eliminating DNMT1 in HAP1 cells, is intriguing and warrants a more detailed description. How many genes experience misregulation or aberrant expression? What phenotypic changes occur in these cells? 

      This is an excellent suggestion. We have gene expression data on WT versus DNMT1 KO HAP1 cells and have now included them as Suppl. Figure S1. The transcriptome analysis of DNMT1 KO cells showed hundreds of deregulated genes upon DNMT1 ablation. As expected, the majority were up-regulated, and gene ontology analysis revealed that the most strongly up-regulated genes included gene clusters with functions in “regulation of transcription from RNA polymerase II promoter” and “cell differentiation”, as well as genes encoding proteins with KRAB domains. In addition, the de novo methyltransferases DNMT3A and DNMT3B were up-regulated in DNMT1 KO cells, suggesting the establishment of compensatory mechanisms in these cells.

      Why did the authors focus on L1TD1? Providing some of this data would be helpful to understand the rationale behind the thorough analysis of L1TD1. 

      We have previously discovered that conditional deletion of the maintenance DNA methyltransferase DNMT1 in the murine epidermis results not only in the up-regulation of mobile elements, such as IAPs, but also in the induced expression of L1TD1 ([1], Suppl. Table 1 and Author response image 1). Similarly, L1TD1 expression was induced by treatment of primary human keratinocytes or squamous cell carcinoma cells with the DNMT inhibitor aza-deoxycytidine (Author response images 2 and 3). These findings are in accordance with the observation that inhibition of DNA methyltransferase activity by aza-deoxycytidine in human non-small cell lung cancer cells (NSCLCs) results in up-regulation of L1TD1 [2]. Our interest in L1TD1 was further fueled by reports on a potential function of L1TD1 as a prognostic tumor marker. We have included this information in the last paragraph of the Introduction in the revised manuscript.

      Author response image 1. RT-qPCR of L1TD1 expression in cultured murine control and Dnmt1 Δ/Δker keratinocytes. mRNA levels of L1td1 were analyzed in keratinocytes isolated at P5 from conditional Dnmt1 knockout mice [1]. Hprt expression was used for normalization of mRNA levels and wildtype control was set to 1. Data represent means ±s.d. with n=4. **P < 0.01 (paired t-test). 

      Author response image 2. RT-qPCR analysis of L1TD1 expression in primary human keratinocytes. Cells were treated with 5-aza-2'-deoxycytidine for 24 hours or 48 hours, with PBS for 48 hours, or were left untreated. 18S rRNA expression was used for normalization of mRNA levels and the PBS control was set to 1. Data represent means ±s.d. with n=3. **P < 0.01 (paired t-test).

      Author response image 3. Induced L1TD1 expression upon DNMT inhibition in squamous cell carcinoma cell lines SCC9 and SCCO12. Cells were treated with 5-aza-2'-deoxycytidine for 24 hours, 48 hours or 6 days. (A) Western blot analysis of L1TD1 protein levels using beta-actin as loading control. (B) Indirect immunofluorescence microscopy analysis of L1TD1 expression in SCC9 cells. Nuclear DNA was stained with DAPI. Scale bar: 10 µm. (C) RT-qPCR analysis of L1TD1 expression in primary human keratinocytes. Cells were treated with 5-aza-2'-deoxycytidine for 24 hours or 48 hours, with PBS for 48 hours, or were left untreated. 18S rRNA expression was used for normalization of mRNA levels and the PBS control was set to 1. Data represent means ±s.d. with n=3. *P < 0.05, **P < 0.01 (paired t-test).

      The finding that L1TD1/DNMT1 DKO cells exhibit increased apoptosis and DNA damage but decreased L1 retro-transposition is unexpected. Considering the DNA damage associated with retro-transposition and the DNA damage and apoptosis observed in L1TD1/DNMT1 DKO cells, one would anticipate the opposite outcome. Could it be that the observation of fewer transposition-positive colonies stems from the demise of the most transposition-positive colonies? Further exploration of this phenomenon would be intriguing. 

      This is an important point and we were aware of this potential problem. Therefore, we calibrated the retrotransposition assay by transfection with a blasticidin resistance gene vector to take into account potential differences in cell viability and blasticidin sensitivity. Thus, the observed reduction in L1 retrotransposition efficiency is not an indirect effect of reduced cell viability. We have added a corresponding clarification in the Results section on page 8, last paragraph. 

      Based on previous studies with hESCs and germ cell tumors [3], it is likely that, in addition to its role in retrotransposition, L1TD1 has further functions in the regulation of cell proliferation and differentiation. L1TD1 might therefore attenuate the effect of DNMT1 loss in KO cells, generating an intermediate phenotype (as pointed out by Reviewer 2), whereas simultaneous loss of both L1TD1 and DNMT1 results in more pronounced effects on cell viability. This is in agreement with the observation that a subset of L1TD1-associated transcripts encode proteins involved in the control of cell division and the cell cycle. It is possible that subtle changes in the expression of these proteins, which were not detected by our mass spectrometry approach, contribute to the antiproliferative effect of L1TD1 depletion, as discussed in the Discussion section of the revised manuscript.

      Reviewer #2 (Public Review):           

      In this study, Kavaklıoğlu et al. investigated and presented evidence for the role of domesticated transposon protein L1TD1 in enabling its ancestral relative, L1 ORF1p, to retrotranspose in HAP1 human tumor cells. The authors provided insight into the molecular function of L1TD1 and shed some clarifying light on previous studies that showed somewhat contradictory outcomes surrounding L1TD1 expression. Here, L1TD1 expression was correlated with L1 activation in a hypomethylation-dependent manner, due to DNMT1 deletion in the HAP1 cell line. The authors then identified L1TD1-associated RNAs using RIP-Seq, which displays a disconnect between transcript and protein abundance (via Tandem Mass Tag multiplex mass spectrometry analysis). The one exception was L1TD1 itself, which is consistent with a model in which the RNA transcripts associated with L1TD1 are not directly regulated at the translation level. Instead, the authors found the L1TD1 protein associated with L1-RNPs, and this interaction is associated with increased L1 retrotransposition, at least in the context of HAP1 cells. Overall, these results support a model in which L1TD1 is restrained by DNA methylation, but in the absence of this repressive mark, L1TD1 is expressed and collaborates with L1 ORF1p (either directly or through interaction with L1 RNA, which remains unclear based on current results), leading to enhanced L1 retrotransposition. These results establish the feasibility of this relationship existing in vivo in development, disease, or both.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):        

      Major 

      (1) The study only used one knockout (KO) cell line generated by CRISPR/Cas9. Considering the possibility of an off-target effect, I suggest the authors attempt one or both of these suggestions. 

      A) Generate or acquire a similar DNMT1 deletion that uses distinct sgRNAs, so that the likelihood of off-targets is negligible. A few simple experiments such as qRT-PCR would be sufficient to suggest the same phenotype.

      B) Confirm the DNMT1 depletion also by siRNA/ASO KD to phenocopy the KO effect.

      (2) In addition to the strategies to demonstrate reproducibility, a rescue experiment restoring DNMT1 to the KO or KD cells would be more convincing. (Partial rescue would suffice in this case, as exact endogenous expression levels may be hard to replicate.)

      We have undertaken several approaches to study the effect of DNMT1 loss or inactivation. As described above, we have generated a conditional KO mouse with ablation of DNMT1 in the epidermis. DNMT1-deficient keratinocytes isolated from these mice show a significant increase in L1TD1 expression. In addition, treatment of primary human keratinocytes and two squamous cell carcinoma cell lines with the DNMT inhibitor aza-deoxycytidine led to up-regulation of L1TD1 expression. Thus, the derepression of L1TD1 upon loss of DNMT1 expression or activity is not a clonal effect. Also, the spectrum of RNAs identified in RIP experiments as L1TD1-associated transcripts in HAP1 DNMT1 KO cells showed a strong overlap with the RNAs isolated by a related yet different method in human embryonic stem cells. Regarding the effect of L1TD1 on L1 retrotransposition, a recent study has reported a similar effect of L1TD1 upon overexpression in HeLa cells [4].

      All of these points together convince us that our findings with HAP1 DNMT1 KO cells are in agreement with results obtained in various other cell systems and are therefore not due to off-target effects. With that in mind, we chose instead to pursue the suggestion of Reviewer 1 to analyze the effects of DNA hypomethylation upon DNMT1 ablation.

      (3) As stated in the introduction, L1TD1 and ORF1p share "sequence resemblance" (Martin 2006). Is the L1TD1 antibody specific or do we see L1 ORF1p if Fig 1C were uncropped?  (6) Is it possible the L1TD1 antibody binds L1 ORF1p? This could make Figure 2D somewhat difficult to interpret. Some validation of the specificity of the L1TD1 antibody would remove this concern (see minor concern below).  

      This is a relevant question. We are convinced that the L1TD1 antibody does not cross-react with L1 ORF1p for the following reasons. Firstly, the antibody does not recognize L1 ORF1p (40 kDa) in the uncropped Western blot for Figure 1C (Author response image 4A). Secondly, the L1TD1 antibody gives only background signals in DKO cells in the indirect immunofluorescence experiment shown in Figure 1E of the manuscript.

      Thirdly, we checked the immunogen sequence of L1TD1, which determines the specificity of the antibody, in the antibody data sheet from Sigma Aldrich. The corresponding epitope is not present in the L1 ORF1p sequence. Finally, we have shown that the ORF1p antibody does not cross-react with L1TD1 (Author response image 4B).

      Author response image 4. (A) Uncropped L1TD1 Western blot shown in Figure 1C. An unspecific band is indicated by an asterisk. (B) Western blot analysis of WT, KO and DKO cells with the L1 ORF1p antibody.

      (4) In the abstract (P2), the authors mention that L1TD1 works as an RNA chaperone, but in the Results section (P13), they show that L1TD1 associates with L1 ORF1p in an RNA-independent manner. These conclusions appear contradictory. Clarification or revision is required.

      Our findings that both proteins bind L1 RNA, and that L1TD1 interacts with ORF1p are compatible with a scenario where L1TD1/ORF1p heteromultimers bind to L1 RNA. The additional presence of L1TD1 might thereby enhance the RNA chaperone function of ORF1p. This model is visualized now in Suppl. Figure S7C. 

      (5) Figure 2C fold enrichment for L1TD1 and ARMC1 is a bit difficult to fully appreciate. A 100- to 200-fold enrichment does not seem physiological. This appears to be a "divide by zero" type of result, as the Ct for these genes was likely near 40 or undetectable. Another qRT-PCR-based approach (absolute quantification) would be a more revealing experiment.

      This is the validation of the RIP experiments, and the presentation mode was specifically developed for the quantification of RIP assays (Sigma Aldrich RIP-qRT-PCR: Data Analysis Calculation Shell). The unspecific binding of the transcript in the absence of L1TD1, in DNMT1/L1TD1 DKO cells, is set to 1, and the value in KO cells represents the specific binding relative to the unspecific binding. The calculation also corrects for potential differences in the abundance of the respective transcript in the two cell lines. This is therefore not a physiological value but a quantification of the specific binding of transcripts to L1TD1. GAPDH as a negative control shows no enrichment, whereas specifically associated transcripts show strong enrichment. We have explained the details of the RIP-qRT-PCR evaluation in Materials and Methods (page 14) and in the legend of Figure 2C in the revised manuscript.
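      To illustrate the kind of normalization described in the reply on Figure 2C, the following is a minimal Python sketch of a common input-normalized fold-enrichment calculation for RIP-qRT-PCR (a ΔΔCt scheme with the DKO IP as the unspecific-binding reference). It is an assumption that this mirrors the Sigma Aldrich calculation shell; the function names, the 10 % input fraction and the Ct values are hypothetical illustrations, not data from this study.

```python
import math


def normalized_delta_ct(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Ct of an IP sample normalized to its own input (hypothetical helper).

    The input Ct is adjusted to 100 % of the lysate by subtracting
    log2(1 / input_fraction), e.g. log2(10) for a 10 % input.
    """
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return ct_ip - adjusted_input_ct


def fold_enrichment(ct_ip_ko: float, ct_input_ko: float,
                    ct_ip_dko: float, ct_input_dko: float,
                    input_fraction: float = 0.1) -> float:
    """Specific binding in KO cells relative to unspecific binding in DKO cells.

    Normalizing each IP to its own input corrects for differences in
    transcript abundance between the two cell lines; the DKO value is
    the reference, so identical normalized Ct values give 1.0.
    """
    d_ko = normalized_delta_ct(ct_ip_ko, ct_input_ko, input_fraction)
    d_dko = normalized_delta_ct(ct_ip_dko, ct_input_dko, input_fraction)
    return 2.0 ** -(d_ko - d_dko)


# Hypothetical Ct values for one transcript (not data from this study).
print(fold_enrichment(ct_ip_ko=24.0, ct_input_ko=22.0, ct_ip_dko=31.0, ct_input_dko=22.0))
```

      With these illustrative numbers, the KO IP signal appears 7 cycles earlier than the DKO IP signal after input normalization, so the printed fold enrichment is approximately 2^7 = 128. This shows how a transcript with near-background binding in DKO cells can yield a 100-fold or larger value without implying a physiological ratio, whereas a transcript with equal normalized Ct values in both IPs (as expected for GAPDH) gives a value near 1. Because the same input fraction is used for both cell lines, the log2 input adjustment cancels in the ratio; only the input Ct values themselves correct for differences in transcript abundance.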

      (6) Is it possible the L1TD1 antibody binds L1 ORF1p? This could make Figure 2D somewhat difficult to interpret. Some validation of the specificity of the L1TD1 antibody would remove this concern (see minor concern below).            

      See response to (3).  

      (7) Figure S4A and S4B: There appear to be a few unusual aspects of these figures that should be pointed out and addressed. First, there doesn't seem to be any ORF1p in the Input (if there is, the exposure is too low). Second, there might be some L1TD1 in the DKO (lane 2) and lane 3. This could be non-specific, but the size is concerning. Overexposure would help see this.

      The ORF1p IP gives rise to strong ORF1p signals in the immunoprecipitated complexes even after short exposure. Under these conditions, ORF1p is hardly detectable in the input. The faint band in DKO HAP1 cells might be due to a technical problem during Western blot loading. Therefore, the input samples were loaded again on a Western blot, analyzed for the presence of ORF1p, L1TD1 and beta-actin (as loading control), and are shown as a separate panel in Suppl. Figure S4A.

      (8) Figure S4C: This is related to our previous concerns involving antibody cross-reactivity. Figure 3E partially addresses this, where it looks like the L1TD1 "speckles" outnumber the ORF1p puncta, but overlap with all of them. This might be consistent with the antibody cross-reacting. The Western blot (Figure 3C) suggests an upregulation of ORF1p by at least 2-3x in the DKO, but in the IF image in 3E it is hard to tell if this is the case (slightly more signal, but fewer foci). Can you return to the images and confirm that the contrast settings are comparable? Can you massively overexpose the red channel in 3E to see if there is residual overlap?

      In Figure 3E, the L1TD1 antibody gives no signal in DNMT1/L1TD1 DKO cells, confirming that it does not recognize ORF1p. In agreement with the Western blot in Figure 3C, the L1 ORF1p signal in Figure 3E is stronger in DKO cells. In DNMT1 KO cells, the L1 ORF1p antibody does not recognize all L1TD1 speckles. This result is in agreement with the Western blot shown in Author response image 4B and indicates that the L1 ORF1p antibody does not recognize the L1TD1 protein. The contrast settings are comparable, and after overexposure there are still L1TD1-specific speckles. This might be due to differences in the abundance of the two proteins.

      (9) The choice of ARMC1 and YY2 is unclear. What are the criteria for the selection?

      ARMC1 was one of the top hits in a pilot RIP-seq experiment (IP versus input and IP versus IgG IP). In the actual RIP-seq experiment, which used DKO HAP1 cells instead of IgG IP as a negative control, we found ARMC1 as an enriched hit, although it was not among the top 5 hits. The results from the second RIP-seq further confirmed the validity of ARMC1 as an L1TD1-interacting transcript. YY2 was of potential biological relevance as an L1TD1 target because it is a processed pseudogene that originated from YY1 mRNA through retrotransposition. This is mentioned on page 6 of the revised manuscript.

      (10) (P16) L1 is the only protein-coding transposon that is active in humans. This is perhaps too generalized of a statement as written. Other examples are readily found in the literature. Please clarify.  

      We will tone down this statement in the revised manuscript. 

      (11) In both the abstract and last sentence in the discussion section (P17), embryogenesis is mentioned, but this is not addressed at all in the manuscript. Please refrain from implying normal biological functions based on the results of this study unless appropriate samples are used to support them.

      Much of the published data on L1TD1 function are related to embryonic stem cells [3-7]. Therefore, it is important to discuss our findings in the context of previous reports.

      (12) Figure 3E: The format of Figures 1A and 3E are internally inconsistent. Please present similar data/images in a cohesive way throughout the manuscript.  

      We now show consistent IF figures in the revised manuscript.

      Minor: 

      (1) Intro:           

      - Is L1Td1 in mice and Humans? How "conserved" is it and does this suggest function?  

      Murine and human L1TD1 proteins share 44% identity on the amino acid level and it was suggested that the corresponding genes were under positive selection during evolution with functions in transposon control and maintenance of pluripotency [8].  

      - Why HAP1? (Haploid?) The importance of this cell line is not clear.          

      HAP1 is a nearly haploid human cancer cell line derived from the KBM-7 chronic myelogenous leukemia (CML) cell line [9, 10]. Owing to its haploidy, it is ideally suited to, and widely used for, loss-of-function screens and gene editing. After gene editing, cells can be used in the nearly haploid or in the diploid state. We usually perform all experiments with diploid HAP1 cell lines. Importantly, in contrast to other human tumor cell lines, this cell line tolerates ablation of DNMT1. We have included a corresponding explanation in the revised manuscript on page 5, first paragraph.

      - Global methylation status in DNMT1 KO? (Methylations near L1 insertions, for example?) 

      The HAP1 DNMT1 KO cell line with a 20 bp deletion in exon 4 used in our study was validated in the study by Smits et al. [11]. The authors report a significant reduction in overall DNA methylation. However, we are not aware of a DNA methylome study on this cell line. We now show data on the methylation of L1 elements in HAP1 cells and upon DNMT1 deletion in Suppl. Figure S1B of the revised manuscript.

      (2) Figure 1:  

      - Figure 1C. Why is LMNB used instead of Actin (Fig1D)?  

      We now show beta-actin as loading control in the revised manuscript.

      - Figure 1G shows increased Caspase 3 in KO, while the matching sentence in the result section skips over this. It might be more accurate to mention this and suggest that the single KO has perhaps an intermediate phenotype (Figure 1F shows a slight but not significant trend). 

      We fully agree with the reviewer and have changed the sentence on page 6, 2nd paragraph accordingly.  

      - Would 96 hrs trend closer to significance? An interpretation is that L1TD1 loss could speed up this negative consequence. 

      We thank the reviewer for the suggestion. We have performed a time-course experiment with 6 biological replicates for each time point up to 96 hours and found a significant reduction in viability upon loss of DNMT1 and a further significant reduction upon additional loss of L1TD1 (shown in Figure 1F). These data suggest that, as expected, loss of DNMT1 leads to a significant reduction in viability and that additional ablation of L1TD1 further enhances this effect.

      - What are the "stringent conditions" used to remove non-specific binders and artifacts (negative control subtraction?) 

      Yes, we considered only hits from both analyses, L1TD1 IP in KO versus input and L1TD1 IP in KO versus L1TD1 IP in DKO. This is now explained in more detail in the revised manuscript on page 6, 3rd paragraph.  

      (3) Figure 2:  

      - Figure 2A is a bit too small to read when printed. 

      We have changed this in the revised manuscript.

      - Since WT and DKO lack detectable L1TD1, would you expect any difference in RIP-Seq results between these two?

      Due to the lack of DNMT1 and the resulting DNA hypomethylation, DKO cells are more similar to KO cells than WT cells with respect to the expressed transcripts.

      - Legend says selected dots are in green (it appears blue to me). 

      We have changed this in the revised manuscript.           

      - Would you recover L1 ORF1p and its binding partners in the KO? (Is the antibody specific in the absence of L1TD1 or can it recognize L1?) I noticed an increase in ORF1p in the KO in Figure 3C.  

      Thank you for the suggestion. Yes, L1 ORF1p shows slightly increased expression in the proteome analysis and we have marked the corresponding dot in the Volcano plot (Figure 3A).

      - Should the figure panel reference near the (Rosspopoff & Trono) reference instead be Sup S1C as well? Otherwise, I don't think S1C is mentioned at all. 

      - What are the red vs. green dots in 2D? Can you highlight ERV and ALU with different colors? 

      We added the reference to Suppl. Figure S1C (now S3C) in the revised manuscript. In Figure 2D L1 elements are highlighted in green, ERV elements in yellow, and other associated transposon transcripts in red.     

      - Which L1 subfamily from Figure 2D is represented in the qRT-PCR in 2E "LINE-1"? Do the primers match a specific L1 subfamily? If so, which? 

      We used primers specific for the human L1.2 subfamily. 

      - Pulling down SINE element transcripts makes some sense, as many insertions "borrow" L1 sequences for non-autonomous retro transposition, but can you speculate as to why ERVs are recovered? There should be essentially no overlap in sequence. 

      In the L1TD1 evolution paper [8], a potential link between L1TD1 and ERV elements was discussed: 

      "Alternatively, L1TD1 in sigmodonts could play a role in genome defense against another element active in these genomes. Indeed, the sigmodontine rodents have a highly active family of ERVs, the mysTR elements [46]. Expansion of this family preceded the death of L1s, but these elements are very active, with 3500 to 7000 species-specific insertions in the L1-extinct species examined [47]. This recent ERV amplification in Sigmodontinae contrasts with the megabats (where L1TD1 has been lost in many species); there are apparently no highly active DNA or RNA elements in megabats [48]. If L1TD1 can suppress retroelements other than L1s, this could explain why the gene is retained in sigmodontine rodents but not in megabats." 

      Furthermore, Jin et al. report the binding of L1TD1 to repetitive sequences in transcripts [12]. It is possible that some of these sequences are also present in ERV RNAs.

      - Is S2B a screenshot? (the red underline). 

      No, it is a PowerPoint figure, and we have removed the red underline.

      (4) Figure 3: 

      - Text refers to Figure 3B as a western blot. Figure 3B shows a volcano plot. This is likely 3C but would still be out of order (3A>3C>3B referencing). I think this error is repeated in the last result section. 

      - Figure and legends fail to mention what gene was used for ddCT method (actin, gapdh, etc.). 

      - In general, the supplemental legends feel underwritten and could benefit from additional explanations. (Main figures are appropriate but please double-check that all statistical tests have been mentioned correctly).

      Thank you for pointing this out. We have corrected these errors in the revised manuscript.

      (5) Discussion: 

      - The AluY connection is interesting. Is there an "Alu retrotransposition reporter assay" to test whether L1TD1 enhances this as well?

      Thank you for the suggestion. There is indeed an Alu retrotransposition reporter assay reported by Dewannieux et al. [13]. The assay is based on a Neo selection marker. We have previously tested a Neo selection-based L1 retrotransposition reporter assay, but this system failed to work properly in HAP1 cells; therefore, we switched to a blasticidin-based L1 retrotransposition reporter assay. A corresponding blasticidin-based Alu retrotransposition reporter assay might be interesting for future studies (mentioned in the Discussion, page 11, paragraph 4 of the revised manuscript).

      (6) Materials and Methods:

      - The number of typos in the materials and methods is too numerous to list. Instead, please refer to the next section that broadly describes the issues seen throughout the manuscript. 

      Writing style  

      (1) Keep a consistent style throughout the manuscript: for example, L1 or LINE-1 (also L1 ORF1p or LINE-1 ORF1p); per or "/"; knockout or knock-out; min or minute; 3 times or three times; media or medium. Additionally, as TE naming conventions are not uniform, it is important to maintain internal consistency so as to not accidentally establish an imprecise version. 

      (2) There's a period between "et al" and the comma, and "et al." should be italic. 

      (3) The authors should explain what the key jargon is when it is first used in the manuscript, such as "retrotransposon" and "retrotransposition".    

      (4) The authors should show the full spelling of some acronyms when they use it for the first time, such as RNA Immunoprecipitation (RIP).  

      (5) Use a space between numbers and alphabets, such as 5 µg.  

      (6) 2.0 × 10⁵ cells, that's not an "x".

      (7) Numbers in the reference section are lacking (hard to parse).  

      (8) In general, there are a significant number of typos in this draft which at times becomes distracting. For example, (P3) Introduction: Yet, co-option of TEs thorough (not thorough, it should be through) evolution has created so-called domesticated genes beneficial to the gene network in a wide range of organisms. Please carefully revise the entire manuscript for these minor issues that collectively erode the quality of this submission.  

      Thank you for pointing out these mistakes. We have corrected them in the revised manuscript. A native speaker from our research group has carefully checked the paper. In summary, we have added Supplementary Figure S7C and have changed Figures 1C, 1E, 1F, 2A, 2D, 3A, 4B, S3A-D, S4B and S6A based on these comments. 

      REFERENCES

      (1) Beck, M.A., et al., DNA hypomethylation leads to cGAS-induced autoinflammation in the epidermis. EMBO J, 2021. 40(22): p. e108234.

      (2) Altenberger, C., et al., SPAG6 and L1TD1 are transcriptionally regulated by DNA methylation in non-small cell lung cancers. Mol Cancer, 2017. 16(1): p. 1.

      (3) Narva, E., et al., RNA-binding protein L1TD1 interacts with LIN28 via RNA and is required for human embryonic stem cell self-renewal and cancer cell proliferation. Stem Cells, 2012. 30(3): p. 452-60.

      (4) Jin, S.W., et al., Dissolution of ribonucleoprotein condensates by the embryonic stem cell protein L1TD1. Nucleic Acids Res, 2024. 52(6): p. 3310-3326.

      (5) Emani, M.R., et al., The L1TD1 protein interactome reveals the importance of posttranscriptional regulation in human pluripotency. Stem Cell Reports, 2015. 4(3): p. 519-28.

      (6) Santos, M.C., et al., Embryonic Stem Cell-Related Protein L1TD1 Is Required for Cell Viability, Neurosphere Formation, and Chemoresistance in Medulloblastoma. Stem Cells Dev, 2015. 24(22): p. 2700-8.

      (7) Wong, R.C., et al., L1TD1 is a marker for undifferentiated human embryonic stem cells. PLoS One, 2011. 6(4): p. e19355.

      (8) McLaughlin, R.N., Jr., et al., Positive selection and multiple losses of the LINE-1-derived L1TD1 gene in mammals suggest a dual role in genome defense and pluripotency. PLoS Genet, 2014. 10(9): p. e1004531.

      (9) Andersson, B.S., et al., Ph-positive chronic myeloid leukemia with near-haploid conversion in vivo and establishment of a continuously growing cell line with similar cytogenetic pattern. Cancer Genet Cytogenet, 1987. 24(2): p. 335-43.

      (10) Carette, J.E., et al., Ebola virus entry requires the cholesterol transporter Niemann-Pick C1. Nature, 2011. 477(7364): p. 340-3.

      (11) Smits, A.H., et al., Biological plasticity rescues target activity in CRISPR knock outs. Nat Methods, 2019. 16(11): p. 1087-1093.

      (12) Jin, S.W., et al., Dissolution of ribonucleoprotein condensates by the embryonic stem cell protein L1TD1. Nucleic Acids Res, 2024.

      (13) Dewannieux, M., C. Esnault, and T. Heidmann, LINE-mediated retrotransposition of marked Alu sequences. Nat Genet, 2003. 35(1): p. 41-8.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Deng et al reports single-cell expression analysis of developing mouse hearts and examines the requirements for cardiac fibroblasts in heart maturation. Much of this work is overlapping with previous studies, but the single-cell gene expression data may be useful to investigators in the field. The significance and scope of new findings are limited and major conclusions are largely based on correlative data.

      Strengths:

      The strengths of the manuscript are the new single-cell datasets and comprehensive approach to ablating cardiac fibroblasts in pre and postnatal development in mice.

      Weaknesses:

      There are several major weaknesses in the analysis and interpretation of the results.

      (1) The major conclusions regarding collagen signaling and heart maturation are based on gene expression patterns and are not functionally validated. The potential downstream signaling pathways were not examined and known structural contributions of fibrillar collagen to heart maturation are not discussed.

      We thank the reviewer for the comment. In this study, we mainly focused on the functional analysis of fibroblasts in heart development at embryonic and neonatal stages by using a cell ablation system and single-cell mRNA sequencing analysis. Further functional analysis of the collagen pathway is interesting but beyond the scope of this study. We will continue this line of research and share the results in the future. Moreover, through the analysis of single-cell mRNA sequencing data, we have predicted downstream genes that may be regulated by the collagen pathway (Fig. 5C). We have also added text highlighting the structural role of collagen in these heart developmental processes.

      (2) The heterogeneity of fibroblast populations and contributions to multiple structures in the developing heart are not well-considered in the analysis. The developmental targeting of fibroblasts will likely affect multiple structures in the embryonic heart and other organs. Lethality is described in some of these studies, but additional analysis is needed to determine the effects on heart morphogenesis or other organs beyond the focus on cardiomyocyte maturation being reported. In particular, the endocardial cushions and developing valves are likely to be affected in the prenatal ablations, but these structures are not included in the analyses.

      We thank the reviewer for the comment. We have included a new figure presenting fibroblast heterogeneity in developing hearts (Fig. S3). We have also compared valve structure between control and ablated hearts at E18.5 (Fig. S11).

      (3) ECM complexity and extensive previous work on specific ECM proteins in heart development and maturation are not incorporated into the current study. Different types of collagen (basement membrane Col4, filamentous Col6, and fibrillar Col1) are known to be expressed in fibroblast populations in the developing heart and have been studied extensively. Much also has been reported for other ECM components mentioned in the current work.

      We thank the reviewer for the comment. We agree that the ECM is complex, and the functions of many of its components have been previously reported, as mentioned in the introduction. In this study, our focus is to analyze the spatial and temporal expression patterns of various ECM genes in fibroblasts throughout developmental progression (Fig. S5–7). To further acknowledge previous work, we have added additional sentences and cited relevant literature on the role of collagen genes in developing hearts (page 4).

      Reviewer #2 (Public review):

      This study aims to elucidate the role of fibroblasts in regulating myocardium and vascular development through signaling to cardiomyocytes and endothelial cells. This focus is significant, given that fibroblasts, cardiomyocytes, and vascular endothelial cells are the three primary cell types in the heart. The authors employed a Pdgfra-CreER-controlled diphtheria toxin A (DTA) system to ablate fibroblasts at various embryonic and postnatal stages, characterizing the resulting cardiac defects, particularly in myocardium and vasculature development. scRNA-seq analysis of the ablated hearts identified collagen as a crucial signaling molecule from fibroblasts that influences the development of cardiomyocytes and vascular endothelial cells. This is an interesting manuscript; however, there are several major issues, including an over-reliance on the scRNA-seq data, which shows inconsistencies between replicates. Some of the major issues are described below.

      These comments are the same as those in “Recommendations for the authors”. Please see the responses below.

      Reviewer #3 (Public review):

      The authors investigated fibroblasts' communication with key cell types in developing and neonatal hearts, with a focus on the critical roles of fibroblast-cardiomyocyte and fibroblast-endothelial cell networks in cardiac morphogenesis. They tried to map the spatial distribution of these cell types and reported the major pathways and signaling molecules driving the communication. They also used a Cre-DTA system to ablate Pdgfra-labeled cells and observed myocardial and endothelial cell defects during development. They screened the pathways and genes using sequencing data of ablated hearts. Lastly, they reported compensatory collagen expression in long-term ablated neonatal hearts. Overall, this study provides important insight into fibroblasts' roles in cardiac development and will be a powerful resource for collagen- and ECM-focused research.

      Strengths:

      The authors utilized good analytical tools to investigate multiple databases of single-cell sequencing and Multiseq. They identified significant pathways and cellular and molecular interactions of fibroblasts. Additionally, they compared some of their analytic findings with a human database and identified several groups of ECM genes with varying roles in mice.

      Weaknesses:

      This study is based mainly on sequencing data analysis. At the bench, they used a very stringent technique to study fibroblast functions by ablating one of the major cell populations of the heart. Considering the importance of the fibroblast population, more intriguing in vivo findings were expected. Also, they analyzed the downstream genes in ablated hearts but did not perform experimental validation for any of the targets.

      Recommendations for the authors:

      Reviewing Editor Comments:

      All three reviewers found the large amount of scRNA-seq data compelling and valuable, and they noted that the study's conclusions based on scRNA-seq and fibroblast ablation align closely with previously published studies. Therefore, a more thorough discussion and integration of the current findings with prior studies are recommended. Each reviewer provided specific feedback to improve the manuscript, correct errors, and strengthen the overall presentation; please edit the manuscript accordingly. Additionally, further validation of the scRNA-seq data through more data analysis, reference comparisons, or additional experiments is encouraged.

      Reviewer #1 (Recommendations for the authors):

      (1) The heterogeneity of fibroblasts and ECM components in the developing heart needs to be considered in the analysis and description of results. There are extensive reports in both of these areas that would inform the gene expression and ablation studies being reported.

      We thank the reviewer for the comment. We have added a supplemental figure (Fig. S3) analyzing the heterogeneity of fibroblasts during development and described the results on pages 3 and 4. Through the analysis of single-cell mRNA sequencing data, we identified four distinct populations of fibroblasts and further performed RNAscope to examine their spatial locations. Additionally, we agree with the reviewer that there are many types of ECM components, which we have addressed in the introduction (page 2). Furthermore, we have conducted a detailed analysis of the spatial and temporal expression patterns of ECM genes throughout developmental progression (Figs. S5–7).

      (2) One of the novel aspects of the work is the prenatal ablation of cardiac fibroblasts. Embryonic lethality was observed in some cases, but the specific cardiac structural anomalies or potential vascular effects were not described. The contributing role of cardiac fibroblasts to valvuloseptal development, which was likely affected in these studies, was not described.

      We thank the reviewer for the comment. Since the heart sections were not initially prepared to compare valve differences between control and ablation conditions, most sections do not include valve structures. However, in the small subset of sections that do contain valves, we have compared valve structures in control and ablated hearts at E18.5 following three doses of tamoxifen treatment from E15.5 to E17.5. In mutants, the valves appear shorter than in controls. Specifically, we observed that in control hearts the mitral valve was already connected to the papillary muscle, whereas in ablated hearts the valve leaflet at a similar position was not. We have included these images as a new supplemental figure (Fig. S11). Regarding vascular defects, we have described them in Fig. 3C and 3F.

      (3) The major conclusions regarding collagen signaling and heart development are based on correlations in gene expression and are not validated by functional data. What are the downstream signaling pathways affected and are they affected during development or with ablation? The main conclusions of the study do not take into account well-known structural functions of collagen in the developing heart.

      We thank the reviewer for the comment. Through regulatory prediction analysis, we identified the collagen ligands Col1a1, Col5a1, and Col4a1 from the collagen family (Fig. 5C), which regulate multiple genes in cardiomyocytes, including Masp1. Masp1 is a member of the lectin complement pathway and potentially regulates cardiomyocyte migration during development. These collagen ligands also regulate multiple mitochondria-related genes, such as Etfa, Ndufb10, Ndufs6, and Slc25a4, which are potentially important for cardiomyocyte development and maturation. Moreover, we agree with the reviewer that collagen is an important structural ECM protein, and its deletion or reduction could cause heart developmental defects due to its structural role. We have added a discussion on this possibility (page 8).

      (4) The postnatal ablation studies are very similar to studies with the same mouse lines reported by Kurabara et al 2022 in JMCC (PMID 35569524) which came to similar conclusions and was not cited in the current work.

      We thank the reviewer for the comment and apologize for overlooking this study. We have now included the citation on page 8.

      (5) The discussion of a regenerative response with DTA ablation of fibroblasts is confusing. Proliferation was examined in cardiomyocytes which lose their regenerative capacity after birth in mice. However, cardiac fibroblasts can proliferate in response to injury throughout life which is not really a regenerative process.

      We appreciate the reviewer’s comment. To avoid confusion, we have replaced the term "regeneration" with "response to cell loss" and "compensation."

      (6) Some of the descriptions of single-cell expression data are overstated (Page 7). Regulatory interactions, signaling pathway activation, or function cannot be determined from gene expression data alone.

      We thank the reviewer for the comment. We agree that these conclusions would require results from multiple assays. We have softened the description of the analysis by emphasizing that the findings are predictions from scRNA-seq analysis.

      (7) In the last paragraph of the discussion "data not shown" should be shown or this information should be deleted. As written, the discussion does not present a clear description of what major new findings are being reported or why they are significant. The new insights into heart development are not specified.

      We thank the reviewer for the comment. We have added the data as a supplemental figure (Fig. S19). We have kept this paragraph in the Discussion because the results are not yet conclusive, and further research is required to explore the potential protective role of fibroblast ablation in neonatal hearts.

      Minor comments.

      (1) Figure legends are missing information needed to understand what is being shown. For example, in Figure 2, collagen is visualized using CHP staining.

      Thanks. We have gone through all figure legends to ensure that all necessary information has been provided.

      (2) The hearts in Figure S15 are upside down.

      Thanks. We have updated the figure.

      (3) In Figure S16A, "brian" should be "brain".

      Thanks. We have updated it.

      Reviewer #2 (Recommendations for the authors):

      This is an interesting manuscript; however, there are several major issues, including an overreliance on the scRNA-seq data, which shows inconsistencies between replicates. Some of the major issues are described below.

      (1) The CD31 immunostaining data (Figures 3B-G) indicate a reduction in endothelial cell numbers following fibroblast deletion using PdgfraCreER+/-; RosaDTA+/- mice. However, the scRNA-seq data show no percentage change in the endothelial cell population (Figure 4D). Furthermore, while the percentage of Vas_ECs decreased in ablated samples at E16.5, the results at E18.5 were inconsistent, showing an increase in one replicate and a decrease in another, raising concerns about the reliability of the RNA-seq findings.

      We thank the reviewer for the comment. We believe that cell proportions estimated from scRNA-seq data are sensitive to sampling and require a high number of total and target cells, similar to other cell counting assays such as FACS. As the reviewer pointed out, the proportions of Vas_EC in the E18.5 replicates are inconsistent. Specifically, Col_4 at E18.5 showed a relatively low proportion of Vas_EC. Upon examining the cell numbers in each sample, we found that Col_4 had the lowest number of recovered cells, approximately 760 in total, whereas each of the other samples had more than 920 cells. Additionally, since immunofluorescence staining for CD31 marks both Vas_EC and Endo_EC, we combined these two cell types to increase the number of targeted cells. This analysis consistently showed that the ablated samples had lower proportions. However, given that the quantifications also produced inconsistent results for other cell types, such as Ven_CM, as mentioned in the reviewer’s next question, we have decided to delete this plot to avoid confusion.

      Author response image 1.

      (2) Similarly, while the percentage of Ven_CMs increased at E18.5, it exhibited differing trends at E16.5 (Figure 4E), further highlighting the inconsistency of the scRNA-seq analysis with the other data.

      We thank the reviewer for the comment. Please see the response above.

      (3) Furthermore, the authors noted that the ablated samples had slightly higher percentages of cardiomyocytes in the G1 phase compared to controls (Figures 4H, S11D), which aligns with the enrichment of pathways related to heart development, sarcomere organization, heart tube morphogenesis, and cell proliferation. However, it is unclear how this correlates with heart development, given that the hearts of ablated mice are significantly smaller than those of controls (Figure 3E). Additionally, the heart sections from ablated samples used for CD31/DAPI staining in Figure 3F appear much larger than those of the controls, raising further inconsistencies in the manuscript.

      We thank the reviewer for the comment. We observed changes in G1-phase cardiomyocytes at both E16.5 and E18.5, with pathway enrichment primarily identified in E16.5 cardiomyocytes. At E16.5, the ablated hearts exhibited myocardial defects, including an increased trabecular-to-compact myocardium ratio and reduced vascular density. By E18.5, the ablated embryos had smaller hearts with reduced vascular density, although the trabecular-to-compact myocardium ratio showed no obvious changes. Two factors contribute to the larger section size in the ablated hearts compared with the control hearts. First, the control and ablated heart sections had different scale bars: the ablated heart section was shown at higher magnification than the control section. Second, heart sections vary in size depending on their position; sections taken from the middle of the heart are larger than those from the edges. In our initial comparison, we used an edge-positioned section from the control hearts and a middle-positioned section from the ablated hearts. To avoid confusion, we have replaced the control section with one whose position more closely matches that of the ablated section and used the same scale bar in both images (Fig. 3F).

      (4) The manuscript relies heavily on the scRNA-seq dataset, which shows inconsistencies between the two replicates. Furthermore, the morphological and histological analyses do not align with the scRNA-seq findings.

      We respectfully disagree with this comment from the reviewer. As shown in Figure 4B, the scRNA-seq data from the two replicates are highly consistent. For inconsistencies in cell proportions and tissue section sizes, please refer to our responses above.

      (5) There is a lack of mechanistic insight into how collagen, as a key signaling molecule from fibroblasts, affects the development of cardiomyocytes and vascular endothelial cells.

      We thank the reviewer for the comment. In this study, we primarily focused on analyzing fibroblast function in heart development using cell ablation and single-cell mRNA sequencing. While further mechanistic analysis of the collagen pathway is intriguing, it falls outside the scope of this study. Additionally, our scRNA-seq analysis identified multiple collagen ligands derived from fibroblasts that may regulate gene expression in Ven_CM and influence their development, as shown in Figure 5C. Although validating these predictions would be valuable, it is beyond the scope of this study. We will continue this line of research and share our findings in the future.

      (6) In Figure 1B, Col1a1 expression is observed in the epicardial cells (Figure 1A, E11.5), but this is not represented in the accompanying cartoon.

      We thank the reviewer for the comment. As stated in the main text (page 3), based on scRNA-seq and IF staining results, we observed that Col1a1 is also expressed in epicardial cells. In the cartoon, we depicted the pattern of fibroblasts rather than Col1a1-positive cells, which is why we did not include epicardial cells.

      (7) What is the genotype of the control animals used in the study?

      We thank the reviewer for the comment. We have added the genotype information for the control embryos in the legends of the relevant figures.

      (8) Do the PdgfraCreER+/-; RosaDTA+/- mice survive after birth when induced at E15.5, and do they exhibit any cardiac defects?

      We thank the reviewer for the comment. This is an interesting question; however, we did not perform the experiment because administering tamoxifen to pregnant mice from E15.5 to E18.5 causes delivery complications, as reported in the literature (PMID: 23139287). Unfortunately, this prevents us from exploring this question further.

      Reviewer #3 (Recommendations for the authors):

      Overall, this is a comprehensive study substantiated by the evidence the authors provided in their findings. However, I have a few concerns to be addressed.

      (1) The claim by the authors that "at E17.5 and P3, each FB was in contact with approximately one Vas_EC and four CMs at both stages" is not fully convincing. RNAscope images for Actn2 are not clear enough to support the quantification (RNAscope images for Cdh5 look better). I suggest imaging at higher magnification with Z-stacks to provide a better understanding of their localization. Also, why are there no changes in the numbers of cells (CMs and ECs) adjacent to FBs at P3 compared with E17.5? Any thoughts on the explanation?

      We thank the reviewer for the comment. We imaged the staining results using a confocal microscope at 20X magnification. We also considered imaging them at 40X; however, because large areas of these sections needed to be imaged, this was challenging. Additionally, we identified each CM based on Actn2 and DAPI staining and are confident in the accuracy of our quantification results. Moreover, since each FB interacts with multiple CMs and Vas_ECs in three dimensions, but our calculations are based only on 2D imaging sections, there may be discrepancies compared with a true 3D environment. We have added a sentence to address this limitation (page 9). Regarding the similar number of interactions observed at E17.5 and P3, we think there are two possibilities. First, the three cell types may proliferate in a synchronized manner, maintaining a consistent number of interactions. Second, these cell types may exhibit minimal proliferation during late embryonic and early neonatal stages; instead, heart growth primarily occurs through CM hypertrophy, which does not significantly alter the number of interactions.

      (2) Fix the capitalization of the RNA marker names in Figure S2.

      Thanks. We have updated them.

      (3) I appreciate the visualization of ligand-receptor interactions in collagen network comparison between FB to CM and FB to EC, and predictive analysis on the FB ligands that regulate differentially expressed genes in ablated heart CM and ECs.

      We thank the reviewer for the comment.

      (4) The authors depleted Pdgfra-Cre cells at E10.5 and reported 100% lethality of DTA+ embryos after 3 days. Induction at E13.5 to ablate Pdgfra-Cre cells resulted in survival at least up to E16.5. What do the authors think are the possible reasons for embryonic lethality when induction occurs at E10.5? Did the authors analyze the expression of Pdgfra from E10.5 to E13.5 using a Pdgfra antibody, Pdgfra-Cre labeling, or the scRNA-seq data?

      We thank the reviewer for the comment. The expression pattern of Pdgfra at E10.5 has been previously reported (PMID: 18297729) and shown to be highly expressed in the atrioventricular region, consistent with the Col1a1 expression pattern we profiled in this study. Therefore, we believe the embryonic lethality observed in the ablated embryos at E10.5 was likely due to the disruption of the atrioventricular structure. However, since Pdgfra is also expressed in other tissues at this stage, we cannot rule out the possibility that the ablation of non-cardiac tissues also contributed to the lethality.

      (5) In terms of the findings on the trabeculation and compaction defects, please provide the images of the ventricles with markers to indicate the compact and trabecular zones and their defects.

      Thanks! We have included images that illustrate the quantification of compact and trabecular myocardium thickness in control and ablated hearts (Fig. S10C).

      (6) Did the authors check the expression of any other vascular marker in addition to CD31 to assess the effects of FB ablation on coronary vasculature development?

      We thank the reviewer for the comment. We analyzed only CD31 to assess the effects of fibroblast ablation on the overall endothelial cell population. We did not separately examine the subpopulations, but this would be an interesting direction for future studies.

      (7) Can the authors explain how the PHH3 proliferation findings account for the thinner compact myocardium and thicker trabeculae in ablated hearts?

      We thank the reviewer for the comment and apologize for the misinterpretation of the results. We observed that the ablated hearts have a thinner compact myocardium, while the thickness of the trabecular myocardium remains unchanged, leading to an increased trabecular-to-compact myocardium ratio (Fig. 3D). We have corrected the description in the manuscript accordingly. Moreover, since the compact myocardium has a higher proliferation rate than the trabecular myocardium, a reduction in overall cell proliferation is expected to have a more pronounced impact on the compact myocardium. Inhibition of compact myocardium proliferation has been reported to lead to a thinner compact myocardium and non-compaction defects (PMID: 31342111).

      (8) The authors did not perform experiments to identify the downstream targets that cause the compaction and endothelial cell density defects upon FB ablation. Based on your sequencing analysis, which potential downstream targets would you test if you were to perform bench-side experiments?

      We thank the reviewer for the comment. We believe that the regulatory predictions in Figures 5C and D from the scRNA-seq data analysis provide a set of downstream candidates for validation. We could select some of the ligands, such as the collagen ligands Col1a1, Col4a1, and Col5a1, to treat the ablated embryos in vivo and assess whether they partially rescue the myocardial defects. Additionally, we could conduct ex vivo experiments co-culturing CMs and FBs, comparing them with CMs alone and CMs treated with the identified ligands. This would allow us to evaluate CM proliferation and the expression of downstream genes identified in the prediction results. However, as the reviewer suggested, these experiments are planned for future studies.

      (9) Please provide echocardiographic M-mode images with a comparable number of cardiac cycles in control and ablated mice (Fig. 6H). Also, the heart rate of the ablated mice is too low to compare other parameters with the controls. If the heart rate could be stabilized at values comparable to those of control mice, the EF and FS values might change substantially.

      We thank the reviewer for the comment. As the echocardiographic analysis was performed on conscious mice, the lower heart rate in the ablated mice is a phenotype associated with the ablation. Unfortunately, we are unable to adjust it to match that of the control mice.

      (10) Can you provide a numerical dataset for any one of the CellChat figures? For example, for Figure 2A, supporting the claim "However, in terms of interaction strength, FB exhibited the highest values compared to those of other cell types (Fig. 2A)".

      Yes, we have added a supplemental table (Table S2) containing the numerical interaction weights. As shown in the table, the interactions between FB and other cell types have the highest values.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Chen and Phillips describe the dynamic appearance of cytoplasmic granules during embryogenesis analogous to SIMR germ granules, and distinct from CSR-1-containing granules, in the C. elegans germline. They show that the nuclear Argonaute NRDE-3, when mutated to abrogate small RNA binding, or in specific genetic mutants, partially colocalizes to these granules along with other RNAi factors, such as SIMR-1, ENRI-2, RDE-3, and RRF-1. Furthermore, NRDE-3 RIP-seq analysis in early vs. late embryos is used to conclude that NRDE-3 binds CSR-1-dependent 22G RNAs in early embryos and ERGO-1-dependent 22G RNAs in late embryos. These data lead to their model that NRDE-3 undergoes small RNA substrate "switching" that occurs in these embryonic SIMR granules and functions to silence two distinct sets of target transcripts: maternal, CSR-1-targeted mRNAs in early embryos and duplicated genes and repeat elements in late embryos.

      Strengths:

      The identification and function of small RNA-related granules during embryogenesis is a poorly understood area and this study will provide the impetus for future studies on the identification and potential functional compartmentalization of small RNA pathways and machinery during embryogenesis.

      Weaknesses:

      (1) While the authors acknowledge the following issue, their finding that loss of SIMR granules has no apparent impact on NRDE-3 small RNA loading puts the functional relevance of these structures into question. As they note in their Discussion, it is entirely possible that these embryonic granules may be "incidental condensates." It would be very welcomed if the authors could include some evidence that these SIMR granules have some function; for example, does the loss of these SIMR granules have an effect on CSR-1 targets in early embryos and ERGO-1-dependent targets in late embryos?

      We appreciate Reviewer 1’s concern that we did not provide enough evidence for the function of the SIMR granules. As suggested, we examined the NRDE-3-bound small RNAs more deeply, and we do observe a slight but significant increase in CSR-class 22G-RNAs bound to NRDE-3 in late embryos of simr-1 and enri-2 mutants (see below, right). We hypothesize that this result could be due to a slower switch from CSR to ERGO 22G-RNAs in the absence of SIMR granules. We added these data to Figure 6G.

      (2) The analysis of small RNA class "switching" requires some clarification. The authors re-define ERGO-1-dependent targets in this study to arrive at a very limited set of genes, and their justification for doing this is not convincing. What happens if the published set of ERGO-1 targets is used?

      As we mentioned in the manuscript, we initially attempted to use the previously defined ERGO targets. However, the major concern is that fewer than half the genes classified as ERGO targets by Manage et al. and Fischer et al. overlap with one another (Figure 6—figure supplement 1D and below). We reason this might be because the gene sets were defined as genes that lose small RNAs in various ERGO pathway mutants, and because different criteria were used to define the lists, as discussed in the manuscript (lines 471-476). As a result, some of the previously defined ERGO target genes may actually be indirect targets of the pathway. Here we focus on genes targeted by small RNAs enriched in an ERGO pathway Argonaute IP, which should be more specific.

      In this manuscript, we are interested specifically in the ERGO targets bound by NRDE-3; thus, we used the IP small RNA sequencing data from young adult animals (Seroussi et al., 2023) to define a new ERGO list. We are confident about this list because 1) most of our new ERGO genes fall within the overlap between the ERGO-Manage and ERGO-Fischer lists (see Figure 6—figure supplement 1D in our manuscript and below), and 2) we observed the most significant decrease in small RNA levels and increase in mRNA levels in the nrde-3 mutants using our newly defined list (see Figure 6—figure supplement 1E-F in our manuscript).

      To further address Reviewer 1’s concern about whether the data would look significantly different when using the ERGO-Manage and ERGO-Fischer lists, we made new scatter plots, shown in Author response image 1 panels A-C below (ERGO-Manage, purple; ERGO-Fischer, yellow; overlap, yellow with purple ring). We found that the small RNA switching pattern of NRDE-3 is consistent with our newly defined list, particularly if we look at the overlap of the ERGO-Manage and ERGO-Fischer lists (Author response image 1 panels D-F below, red).

      Author response image 1.

      Further, the NRDE-3 RIP-seq data is used to conclude that NRDE-3 predominantly binds CSR-1 class 22G RNAs in early embryos, while ERGO-1-dependent 22G RNAs are enriched in late embryos. a) The relative ratios of each class of small RNAs are given in terms of unique targets. What is the total abundance of sequenced reads of each class in the NRDE-3 IPs? 

      To address the reviewer’s question about the total abundance of sequenced reads of each class in the NRDE-3 IPs: Author response image 2 panel A-B below show the total RPM of CSR and ERGO class sRNAs in inputs and IPs at different stages. Focusing on late embryos, the total abundance of ERGO-dependent sRNAs is similar to CSR-class sRNAs in input, while much higher in IP, indicating an enrichment of ERGO-dependent 22G-RNAs in NRDE-3 consistent with our log2FC (IP vs input) in Figure 6B. This data supports our conclusion that NRDE-3 preferentially binds to ERGO targets in late embryos.
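      The argument above rests on two standard quantities: reads per million (RPM), which scales the reads for a small RNA class by library size, and the log2 fold change of IP over input. The Python sketch below is a minimal illustration assuming these textbook definitions; all read counts are hypothetical, and the exact normalization behind Figure 6B (which total read count, whether a pseudocount was used) is not stated in this response.

```python
import math


def rpm(class_reads: int, total_reads: int) -> float:
    """Reads per million: reads for one small RNA class scaled to library size."""
    return class_reads * 1_000_000 / total_reads


def log2_fold_change(ip_rpm: float, input_rpm: float, pseudocount: float = 0.0) -> float:
    """log2(IP / input); the optional pseudocount is an assumption, not the authors' setting."""
    return math.log2((ip_rpm + pseudocount) / (input_rpm + pseudocount))


# Hypothetical libraries, for illustration only (not data from this study).
INPUT_TOTAL, IP_TOTAL = 10_000_000, 5_000_000

# Two classes with the same abundance in the input ...
ergo_input = rpm(20_000, INPUT_TOTAL)  # 2000.0 RPM
csr_input = rpm(20_000, INPUT_TOTAL)   # 2000.0 RPM

# ... but different abundance in the IP.
ergo_ip = rpm(40_000, IP_TOTAL)  # 8000.0 RPM
csr_ip = rpm(5_000, IP_TOTAL)    # 1000.0 RPM

print(f"ERGO log2FC (IP vs input): {log2_fold_change(ergo_ip, ergo_input):.2f}")
print(f"CSR log2FC (IP vs input): {log2_fold_change(csr_ip, csr_input):.2f}")
```

      A class that is similarly abundant in the input but much more abundant in the IP gets a positive log2FC, which is the pattern described for ERGO-dependent 22G-RNAs in late embryos. With no pseudocount, a class absent from the input would raise a division error, which is why pseudocounts are often added.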

      Author response image 2.

      b) The "switching" model is problematic given that, even in late embryos, the majority of 22G RNAs bound by NRDE-3 belong to the CSR-1 class (Figure 5D). 

      It is important to keep in mind the difference in the total number of CSR target genes (3834) and ERGO target genes (119). The pie charts in Figure 6D depict the proportion of genes enriched in the NRDE-3 IP that are CSR or ERGO targets. For the NRDE-3 IP in late embryos, 70/119 (58.8%) of ERGO targets are enriched, whereas only 172/3834 (4.5%) of CSR targets are enriched. These data are also supported by the RPM graphs in panels A-B of Author response image 2 above, which show that the majority of the small RNAs bound by NRDE-3 in late embryos target ERGO genes. Nonetheless, NRDE-3 still binds to some CSR targets, as shown in Figure 6D and panel B, possibly because the amount of CSR-class 22G-RNAs decreases gradually across embryonic development as maternally deposited NRDE-3 loaded with CSR-class 22G-RNAs is diluted by newly transcribed NRDE-3 loaded with ERGO-dependent 22G-RNAs (lines 857-862). 
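      The percentages above are computed within each target class (enriched genes divided by all genes in that class). This is why the much larger CSR class contributes more enriched genes in absolute terms (172 versus 70) while being enriched at a far lower rate. A minimal Python sketch of this arithmetic, using the gene counts stated above (the function name is hypothetical):

```python
def percent_enriched(enriched_genes: int, class_size: int) -> float:
    """Percentage of a target class that is enriched in the IP, rounded to 0.1%."""
    if class_size <= 0:
        raise ValueError("class_size must be positive")
    return round(100 * enriched_genes / class_size, 1)


ergo_pct = percent_enriched(70, 119)   # ERGO targets: 70 of 119 genes
csr_pct = percent_enriched(172, 3834)  # CSR targets: 172 of 3834 genes
print(ergo_pct, csr_pct)  # 58.8 4.5

# Ratio of the unrounded fractions: ERGO targets are about 13-fold more
# likely than CSR targets to be enriched in the late-embryo NRDE-3 IP.
print(round((70 / 119) / (172 / 3834), 1))  # 13.1
```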

      c) A major difference between NRDE-3 small RNA binding in eri-1 and simr-1 mutants appears to be that NRDE-3 robustly binds CSR-1 22G RNAs in eri-1 but not in simr-1 in late embryos. This result should be better discussed.

      In the eri-1 mutant, we hypothesize that NRDE-3 robustly binds CSR-class 22G-RNAs because ERGO-class 22G-RNAs are not synthesized during mid-embryogenesis, so NRDE-3 is either unloaded (in granules at the 100-cell stage in Figure 2A) or mis-loaded with CSR-class 22G-RNAs (in the nucleus at the 100-cell stage in Figure 2A). We do not have a robust method to measure the proportion of loaded versus unloaded NRDE-3, so it is difficult to determine the degree to which NRDE-3 is mis-loaded in the eri-1 mutant. In the simr-1 mutant, both classes of small RNAs are present and NRDE-3 is still preferentially loaded with ERGO-dependent 22G-RNAs, though we do see a subtle increase in association with CSR-class 22G-RNAs. These data could suggest less efficient loading of NRDE-3 with ERGO-dependent 22G-RNAs, but more precise methods would be needed to address the loading dynamics in the simr-1 mutant.

      (3) Ultimately, if the switching is functionally important, then its impact should be observed in the expression of their targets. RNA-seq or RT-qPCR of select CSR-1 and ERGO-1 targets should be assessed in nrde-3 mutants during early vs late embryogenesis.

      The function of NRDE-3 at ERGO targets has been well studied (Guang et al, 2008) and is also assessed in our H3K9me3 ChIP-seq analysis in Figure 7E, where, in mixed-stage embryos, H3K9me3 levels on ERGO targets (labeled as ‘NRDE-3 targets in young adults’) are significantly reduced in the nrde-3 mutant.

      To understand the function of NRDE-3 binding to CSR targets in early embryos, we attempted RT-qPCR, smFISH, and anti-H3K9me3 CUT&Tag-seq on early embryos, but we either failed to obtain enough signal or failed to detect any significant difference (data not shown). We additionally tested the possibility that NRDE-3 functions with CSR-class 22G-RNAs in oocytes. We present new data showing that NRDE-3 represses RNA Pol II in oocytes to promote global transcriptional repression at the oocyte-to-embryo transition; these data are now included in Figure 8. 

      Reviewer #2 (Public review):

      Summary:

      NRDE-3 is a nuclear WAGO-clade Argonaute that, in somatic cells, binds small RNAs amplified in response to the ERGO-class 26G RNAs that target repetitive sequences. This manuscript reports that, in the germline and early embryos, NRDE-3 interacts with a different set of small RNAs that target mRNAs. This class of small RNAs was previously shown to bind to a different WAGO-clade Argonaute called CSR-1, which is cytoplasmic, unlike nuclear NRDE-3. The switch in NRDE-3 specificity parallels recent findings in Ascaris, where the Ascaris NRDE homolog was shown to switch from sRNAs that target repetitive sequences to CSR-class sRNAs that target mRNAs.

      The manuscript also correlates the change in NRDE-3 specificity with the appearance in embryos of cytoplasmic condensates that accumulate SIMR-1, a scaffolding protein that the authors previously implicated in sRNA loading for a different nuclear Argonaute, HRDE-1. By analogy, and through a set of correlative evidence, the authors argue that SIMR foci arise in embryogenesis to facilitate the change in NRDE-3 small RNA repertoire. The paper presents lots of data that beautifully documents the appearance and composition of the embryonic SIMR-1 foci, including evidence that a mutated NRDE-3 that cannot bind sRNAs accumulates in SIMR-1 foci in a SIMR-1-dependent fashion.

      Weaknesses:

      The genetic evidence, however, does not support a requirement for SIMR-1 foci: the authors detected no defect in NRDE-3 sRNA loading in simr-1 mutants. Although the authors acknowledge this negative result in the discussion, they still argue for a model (Figure 7) that is not supported by genetic data. My main suggestion is that the authors give equal consideration to other models - see below for specifics.

      We appreciate reviewer 2’s comments on the genetic evidence for the function of SIMR foci. A similar concern was also raised by reviewer 1. By re-examining our sequencing data, we found a modest but significant increase in NRDE-3 association with CSR-class sRNAs in simr-1 and enri-2 mutants in late embryos. We believe these data support our model that SIMR-1 and ENRI-2 are required for an efficient switch of NRDE-3-bound small RNAs. Please refer to our response to reviewer 1, point (1), and to Figure 6G in the updated manuscript. 

      Reviewer #3 (Public review):

      Summary:

      Chen and Phillips present intriguing work that extends our view on the C. elegans small RNA network significantly. While the precise findings are rather C. elegans specific there are also messages for the broader field, most notably the switching of small RNA populations bound to an argonaute, and RNA granules behavior depending on developmental stage. The work also starts to shed more light on the still poorly understood role of the CSR-1 argonaute protein and supports its role in the decay of maternal transcripts. Overall, the work is of excellent quality, and the messages have a significant impact.

      Strengths:

      Compelling evidence for a major shift in the activities of an argonaute protein during development, and implications for how small RNAs affect early development. Very balanced and thoughtful discussion.

      Weaknesses:

      Claims on co-localization of specific 'granules' are not well supported by quantitative data.

      We have now included zoomed images of individual granules to better show the colocalization in Figure 4 and Figure 4—figure supplement 1, and performed Pearson’s colocalization analysis between different sets of proteins in Figure 4B. 

      Reviewer #2 (Recommendations for the authors):

      - The manuscript is very dense and the gene names are not helpful. For example, the authors mention ERGO-1 without clarifying the type of protein, etc. I suggest the authors include a figure to go with the introduction that describes the different classes of primary and secondary sRNAs, associated Argonautes, and other accessory proteins. Also include a table listing relevant gene names, protein classes, main localizations, and proposed functions for easy reference by the readers.

      We agree that the gene names in different small RNA pathways are easily confused. We added a diagram and table in Figure 1—figure supplement 1 depicting the ERGO/NRDE and CSR pathways, and added clarification about the ERGO/NRDE-3 pathway in the text (lines 126-128).

      - Line 424 - the wording here and elsewhere seems to imply that SIMR-1 and ENRI-2, although not essential, contribute to NRDE-3 sRNA loading. The sequencing data, however, do not support this - the authors should be clearer on this. If the authors believe there are subtle but significant differences, they should show them perhaps by adding a panel in Figure 5 that directly compares the NRDE-3 IPs in wildtype versus simr-1 mutants. Figure 5H however does not support such a requirement.

      As raised by reviewer 1, we do not see a difference in binding of ERGO-dependent sRNAs in the simr-1 mutant in late embryos. We do, however, see a modest but significant increase in CSR-class sRNAs bound by NRDE-3 in simr-1 and enri-2 mutants, which we hypothesize could be due to less efficient loading of ERGO-dependent 22G-RNAs by NRDE-3. The updated data are now in Figure 6G. We have also edited the text and model figure to soften these conclusions.

      - Condensates of PGL proteins appear at a similar time and place (somatic cells of early embryos) as the embryonic SIMR-1 foci. The PGL foci correspond to autophagy bodies that degrade PGL proteins. Is it possible that SIMR-1 foci also correspond to degradative structures? The possibility that SIMR-1 foci are targeted for autophagy and not functional would fit with the finding that simr-1 mutants do not affect NRDE-3 loading in embryos.

      We appreciate reviewer 2’s comments on the possibility of SIMR granules acting as sites for degradation of SIMR-1 and NRDE-3. We think this is not the case for the following reasons. 1) If SIMR granules were sites of autophagic degradation, we would expect embryonic SIMR granules in somatic cells, like PGL granules, to be observed only in autophagy mutants; however, we see them in wild-type embryos. 2) We would not expect a functional Tudor domain to be required for granule localization; however, in Figure 1—figure supplement 2B, we show that a point mutation in the Tudor domain of SIMR-1 abrogates SIMR granule formation. 3) If NRDE-3(HK-AA) were recruited to SIMR granules for degradation while wild-type NRDE-3 is cytoplasmic, then NRDE-3(HK-AA) should show a significantly reduced protein level compared to wild-type NRDE-3. In the western blot in Figure 2—figure supplement 1B, NRDE-3 and NRDE-3(HK-AA) protein levels are similar, indicating that NRDE-3(HK-AA) is not degraded despite being unloaded. This is in contrast to what we have observed previously for HRDE-1, which is degraded in its unloaded state. If SIMR-1 played a direct role in promoting degradation of NRDE-3(HK-AA), we would similarly expect to see a change in NRDE-3 or NRDE-3(HK-AA) expression in a simr-1 mutant. We performed a western blot and did not observe a significant change in NRDE-3 protein expression (Figure 3—figure supplement 1A). 

      Although SIMR granules do not appear to be sites of autophagic degradation under wild-type conditions, upon RNAi of lgg-1 (which encodes an autophagy protein), we found that SIMR-1, as well as many other germ granule- and embryonic granule-localized proteins, increase in abundance in late embryos. These data demonstrate that ZNFX-1, CSR-1, SIMR-1, MUT-2/RDE-3, RRF-1, and unloaded NRDE-3 are removed by autophagic degradation, similar to what has been shown previously for PGL-1 (Zhang et al, 2009, Cell). We added these data to Figure 5. It is important to emphasize, however, that the timing of degradation differs for each granule assayed (lines 447-450), indicating that there must be multiple waves of autophagy to selectively degrade subsets of proteins when they are no longer needed by the embryo.

      - The observation that an NRDE-3 mutant that cannot load sRNAs localizes to SIMR-1 foci does not necessarily imply that wild-type unloaded NRDE-3 would also localize there. Unless the authors have additional data to support this idea, the authors should acknowledge that this hypothesis is speculative. In fact, why does cytoplasmic NRDE-3 not localize to granules in the rde-3;ego-1degron strain shown in Figure 6B?? Is it possible that the NRDE-3 mutant accumulates in SIMR-1 foci because it is unfolded and needs to be degraded?

      We believe that wild-type NRDE-3 also localizes to SIMR foci when unloaded. This is supported by the localization of wild-type NRDE-3 in eri-1 and rde-3 mutants, in which a subset of small RNAs is depleted. Wild-type NRDE-3 localizes to both somatic SIMR-1 granules and the nucleus, depending on embryo stage (Figure 2A, Figure 2—figure supplement 1C). Granule numbers in eri-1 and rde-3 mutants are lower than in the nrde-3(HK-AA) mutant, consistent with the imaging data showing that NRDE-3 only partially localizes to somatic granules (Figure 2A, 100-cell stage).

      In the rde-3; ego-1 double mutant, the embryos have severe developmental defects: they cannot divide properly after the 4- to 8-cell stage and exhibit morphological defects after that stage. In wild-type embryos, SIMR foci do not appear until around the 8- to 28-cell stage (Figure 1C), so we believe that cytoplasmic NRDE-3 fails to localize to foci in the double mutant because of this timing.

      - The authors propose that NRDE-3 functions in nuclei to target mRNAs also targeted in the cytoplasm by CSR-1. If so, how do they propose that NRDE-3 might do this since little transcription occurs in oocytes/early embryos?? Are the authors suggesting that NRDE-3 targets germline genes for silencing specifically at the times that zygotic transcription comes back on, or already in maturing oocytes? Is the transcription of most CSR-1 targets silenced in early embryos??

      We appreciate the suggestion to examine the function of NRDE-3 in oocytes. We tested this possibility and found it to be correct: NRDE-3 functions in oocytes for transcriptional repression by inhibiting RNA Pol II elongation. We added these data to Figure 8. We also attempted RT-qPCR, smFISH, and anti-H3K9me3 CUT&Tag-seq on early embryos to further test the hypothesis that NRDE-3 acts with CSR-class 22G-RNAs in early embryos, but we either failed to obtain enough signal or failed to detect any significant difference (data not shown). Therefore, we think that the primary role of NRDE-3 bound to CSR-class 22G-RNAs may be global transcriptional repression in oocytes prior to fertilization.

      - Line 684-686: "In summary, this work investigating the role of SIMR granules in embryos, together with our previous study of SIMR foci in the germline (Chen and Phillips 2024), has identified a new mechanism for small RNA loading of nuclear Argonaute proteins in C. elegans". This statement appears overstated/incorrect since there is no evidence that SIMR-1 foci are required for sRNA loading of NRDE3. The authors should emphasize other models, as suggested above.

      We have revised the text in lines 869-871 to emphasize that SIMR granules regulate the localization of nuclear Argonaute proteins, rather than suggesting a direct role in controlling small RNA loading. We also edited the title, text, and legend of our model in Figure 9. 

      Reviewer #3 (Recommendations for the authors):

      Issues to be addressed:

      - The authors show a switch in 22G RNA binding by NRDE-3 during embryogenesis. While the data is convincing, it would be great if it could be tested if the preferred NRDE-3 replacement model is indeed correct. This could be done relatively easily by giving NRDE-3 a Dendra tag, allowing one to colour-switch the maternal WAGO-3 pool before the zygotic pool comes up. Such data would significantly enhance the manuscript, as this would allow the authors to follow the fate of maternal NRDE-3 more precisely, perhaps identifying a period of sharp decline of maternal NRDE-3.

      We think the NRDE-3 Dendra tag experiment suggested by the reviewer is a clever approach, and we will consider generating this strain in the future. However, we feel that optimization of the color-switching tag between the maternal germline and the developing embryo is beyond the scope of this manuscript. To partially address the question of NRDE-3 fate during embryogenesis, we examined single-cell sequencing data from C. elegans embryos from the 1-cell to the 16-cell stage (Tintori et al, 2016, Dev Cell; visualization tool from the John I Murray lab). As shown in panel A of Author response image 3 below, the NRDE-3 transcript level increases as the embryo develops, indicating that zygotic NRDE-3 is actively expressed starting very early in development. We hypothesize that maternal NRDE-3 is either diluted as the embryo develops or actively degraded during early embryogenesis. 

      Author response image 3.

      - Figure 3A: * should mark PGCs, but this seems incorrect. At the 8-cell stage there is still only one PGC (P4), not two, and at 100 cells there are only two, not three germ cells. Also, the identification of PGCs with a marker (PGL for instance) would be much more convincing.

      We apologize for the confusion in Figure 3A. We changed the figure legend to clarify that the asterisks indicate nuclear NRDE-3 localization in somatic cells of 8- and 100-cell stage embryos rather than germ cells.

      - Overall, the authors should address colocalization more robustly. In the current manuscript, just one image is provided, and often rather zoomed-out. How robust are the claims on colocalization, or lack thereof? With the current data, this cannot be assessed. Pearson correlation, combined with line-scans through a multitude of granules in different embryos will be required to make strong claims on colocalization. This applies to all figures (main and supplement) where claims on different granules are derived from.

      We thank reviewer 3 for this important suggestion. To better address the colocalization, we included insets of individual granules in Figure 2D and Figure 4. We also performed colocalization analysis by calculating the Pearson’s R value between different groups of proteins in Figure 4B, to highlight that SIMR-1 colocalizes with ENRI-2, NRDE-3(HK-AA), RDE-3, and RRF-1, while CSR-1 colocalizes with EGO-1.
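      For reference, Pearson’s R for colocalization is the correlation coefficient between the paired pixel intensities of two channels within the same region: values near 1 indicate that the two signals rise and fall together, values near 0 indicate no linear relationship, and negative values indicate anti-correlated signals. The minimal Python sketch below assumes that the two channels have already been background-subtracted and flattened into equal-length lists of per-pixel intensities; it illustrates the statistic only and is not the image-analysis software used for Figure 4B. The channel names and values are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(ch1: Sequence[float], ch2: Sequence[float]) -> float:
    """Pearson correlation between paired per-pixel intensities of two channels."""
    if len(ch1) != len(ch2) or len(ch1) < 2:
        raise ValueError("channels must be paired and contain at least 2 pixels")
    n = len(ch1)
    mean1 = sum(ch1) / n
    mean2 = sum(ch2) / n
    # Covariance and variances, all left unnormalized (the 1/n factors cancel).
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(ch1, ch2))
    var1 = sum((a - mean1) ** 2 for a in ch1)
    var2 = sum((b - mean2) ** 2 for b in ch2)
    if var1 == 0 or var2 == 0:
        raise ValueError("a channel with constant intensity has undefined correlation")
    return cov / math.sqrt(var1 * var2)


# Hypothetical intensities for the same five pixels in two channels.
channel_a = [10.0, 40.0, 35.0, 5.0, 50.0]
channel_b = [12.0, 38.0, 30.0, 8.0, 47.0]
print(round(pearson_r(channel_a, channel_b), 3))
```

      Because R is computed over all pixels in the region, it is usually reported together with line scans or zoomed insets, which show whether high correlation reflects overlap at individual granules.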

      For the proteins that lack colocalization in Figure 4—figure supplement 1, we also added insets of individual granules. Additionally, we included a new set of panels showing SIMR-1 localization compared to tubulin::GFP (Figure 4—figure supplement 1I) in response to a recent preprint (Jin et al, 2024, BioRxiv) that finds NRDE-3 (expressed under a mex-5 promoter) associated with pericentrosomal foci and the spindle in early embryos. We do not see SIMR-1 (or NRDE-3, data not shown) at centrosomes or spindles in wild-type conditions, but we made a similar observation for SIMR-1 in a mut-16 mutant (Figure 4E). Each localization pattern was examined in at least 5 individual 100-cell stage embryos, all of which showed the same pattern.

      - Figure 7: Its title is: Function of cytoplasmic granules. This is a much stronger statement than provided in the nicely balanced discussion. The role of the granules remains unclear, and they may well be just a reflection of activity, not a driver. While this is nicely discussed in the text, figure 7 misses this nuance. For instance, the title suggests function, and also the legend uses phrases like 'recruited to granule X'. If granules are the results of activity, 'recruitment' is really not the right way to express the findings. The nuance that is so nicely worded in the discussion should come out fully in this figure and its legend as well.

      We have changed the title of Figure 7 (now Figure 9) to “Model for temporally- and developmentally-regulated NRDE-3 function” to deemphasize the role of the granules and to highlight the different functions of NRDE-3. Similarly, we have rephrased the text in the figure and legend and added some details about our new results.

      Minor:

      Typo: line 663 Acaris

      We corrected the typo.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: 

      The authors demonstrated that carbon depletion triggers the autophagy-dependent formation of Rubisco-containing bodies (RCBs), which contain chloroplast stroma material but exclude thylakoids. The authors show that RCBs bud directly from the main body of chloroplasts rather than from stromules and that their formation is not dependent on the chloroplast fission factor DRP5. The authors also observed a transient engulfment of the RCBs by the tonoplast during delivery to the vacuolar lumen.

      Strengths: 

      The authors demonstrate that autophagy-related protein 8 (ATG8) co-localizes to the chloroplast demarking the place for RCB budding. The authors provide good-quality time-lapse images and co-localization of the markers corroborating previous observations that RCBs contain only stroma material and do not include thylakoid. The text is very well written and easy to follow. 

      Weaknesses: 

      A significant portion of the results presented in the study comes across as a corroboration of previous findings made under different stress conditions: autophagy-dependent formation of RCBs was reported by Ishida et al. in 2009. Furthermore, some of the included results are not particularly relevant to the study's aim. For example, the importance of the role of SA in the formation of stromules, which do not serve as an origin for the RCBs, is unclear. Similarly, the significance of the transient engulfment of RCBs by the tonoplast remains elusive. Although it is indeed a curious observation, previously reported for peroxisomes, its presentation should include an adequate discussion, perhaps suggesting the mechanism involved. Finally, some conclusions are not fully supported by the data: the suggested timing of events aligns poorly between and even within experiments, mostly due to high variation and a low number of replicates. Most importantly, the discussion does not place the findings of this study into the context of current knowledge on chlorophagy, does not propose the significance of piecemeal versus complete organelle sequestration into the vacuole under the conditions used, and does not dwell on the early localization of ATG8 to the future budding site on the chloroplast. 

      We performed additional experiments with biological replicates that involved quantification. The results of these experiments validate the findings of this study. We also revised the Discussion section, which now includes a discussion of the interplay between piecemeal-type and entire-organelle-type chloroplast autophagy and the relevance of autophagy adaptor and receptor proteins to the localization of ATG8 on the chloroplast surface. Accordingly, the first subheading section in the Discussion became too long. Therefore, we divided it into two subheading sections. We believe that the revisions successfully address the weaknesses pointed out by the reviewer and enhance the importance of the current study. Below is a detailed description of the improvements made to our manuscript in response to the reviewer comments.

      Reviewer #1 (Recommendations For The Authors): 

      It would be great if the authors kindly used numbered lines to facilitate the review process. 

      We have added line numbers to the text of the revised version of the manuscript.  

      The authors use the words "budding", "protrusion" and "stromule formation" interchangeably in some parts of the text. For the sake of clarity, it would be best to be consistent in the terminology and possibly elaborate on the exact differences between these structure types and the criteria by which they were identified. 

      We have checked all of the text and improved the consistency of the terminology. An important finding of this study is that chloroplasts form budding structures at the site associated with ATG8. These structures then divide to become a type of autophagic cargo termed a Rubisco-containing body. We therefore mainly use the terms “bud” and “budding” throughout the text. In the experiments shown in Figure 5, we considered the possibility that chloroplast protrusions accumulate in leaves of atg mutants and do not divide because the mutants cannot create autophagosomes. Therefore, the word “protrusion” was used to describe the results shown in Figure 5 in which the proportion of chloroplasts forming protrusions was scored. In the revised text, the word “protrusion” is only used in descriptions of Figure 5. Previous reports define stromules as thin, tubular, extended structures (less than 1 µm in diameter) of the plastid stroma (Hanson and Sattarzadeh, 2011; Brunkard et al., 2015). In the revised text, the word “stromules” is used to describe the structures defined in these previous reports. We have added definitions of each term to the Introduction, Methods and Results sections where appropriate (lines 57–58, 160–162, 247–249, 313–316, 655–658, 668–670).

      Pages 3-4: the authors observed budding of the chloroplasts within a few minutes - it would be helpful to specify that time was probably counted from the first observation of budding, not from the start of the dark treatment, and also specify the exact treatment duration for each of the experiments. 

      The time scales in the figures do not represent the time from the start of the dark treatment. Instead, they describe the duration from the start of the time-lapse videos that were used to generate the still images. Therefore, the indicated time scales are almost the same as the duration from the start of the observations of each target structure (chloroplast buds or GFP-ATG8a-labeled structures). As described in the Methods section, leaves were incubated in darkness for 5 to 24 h to induce sugar starvation. Such sugar-starved leaves were subjected to live-cell monitoring for the target structures. Since Arabidopsis leaves accumulate starch as a stored sugar source (Smith and Stitt, 2007; Usadel et al., 2008), dark treatment lasting several minutes is not sufficient for the starch to be consumed and sugar starvation to be induced. To avoid confusion, we have added definitions of the time scales to the legends of figures containing the results of time-lapse imaging. We have also specified the durations of dark treatments used to obtain the respective results in the legends. 

      Figure 6: the time scale for complete autophagosome formation is in the range of 100-120 sec, how do these results align with the results shown in Figures 3B and C, where complete autophagosomes are suggested to be released into the vacuole after 73.8 sec. Furthermore, another structure is suggested to be formed within 50 sec. Such experiments possibly require a large number of replicates to estimate representative timing. 

      As mentioned in the previous response, the time scales in still frames represent the duration from the start of the corresponding video. Leaves incubated in darkness for 5 to 24 h were subjected to live-cell imaging. When we identified the target structures, e.g., GFP-ATG8a-labeled structures on the surfaces of chloroplasts (Figure 6) or chloroplast budding structures (Figure 3), we began to track these structures. Therefore, the time scales in the figures do not align to a common time axis. We revised the descriptions of Figure 3 and Figure 6 in the Results section to clearly explain that the time points in each experiment merely indicate the time of one observation.

      The authors might want to consider using arrows to indicate structures of interest in all movies and figures.

      We have added arrows to indicate the structures of interest in the starting frames of all videos. We hesitate to add arrows to highlight RCBs accumulating in the vacuole (Figure 1-figure supplement 1, Figure 5 and Figure 8) and stromules (Figure 7) because many arrows would be required, which would obscure large portions of the images. We believe that the images without arrows clearly represent the appearance of RCBs or stromules and that their quantification (Figure 1-figure supplement 1C, Figure 5B, Figure 5-figure supplement 1B, Figure 7B, 7D, 7F, and Figure 8B) well supports the results.   

      Figure 7 Supplement 1: do the authors detect complete chloroplasts in the vacuole of atg7 and sid2/atg7? 

      We did not observe the vacuolar transport of whole chloroplasts in atg7 or atg7 sid2 plants under our experimental conditions. The figure below (Author response image 1) shows images of mesophyll cells from a leaf (third rosette leaf of a 20-d-old plant) of atg7 accumulating chloroplast stroma–targeted GFP (CT-GFP); this is from the previous version of Figure 7–figure supplement 1. Indeed, some GFP bodies exhibiting strong stromal GFP (CT-GFP) signals appeared in the central area of the cell (arrowheads in A). However, such bodies were chloroplasts in epidermal cells. The 3D images (B) and cross-section image (x to z axis) of the region highlighted by the blue dotted line (C) indicate that such GFP bodies are the edges of chloroplasts that localize on the abaxial side of the observed region. Because CT-GFP expression was driven by the 35S promoter, strong GFP signals appeared in chloroplasts in epidermal cells in addition to chloroplasts in mesophyll cells. Previous studies using the same transgenic lines also showed that chloroplasts in epidermal cells exhibit strong GFP signals (Kohler et al., 1997; Caplan et al., 2015; Lee et al., 2023). RBCS-mRFP or GFP driven by the RBCS2B promoter does not label the chloroplasts in epidermal cells (new Figure 7-figure supplement 1). Additionally, because the borders between the mesophyll cell layer and the epidermal cell layer are not even, chloroplasts in epidermal cells are sometimes visible during observations of mesophyll cells. Such detection occurs more frequently during the acquisition of z-stack images. This point was more precisely demonstrated in our previous study with the aid of Calcofluor white staining of cell walls (Nakamura et al., 2018). Please see Supplemental Figure S3 in our previous report. 
To avoid any misunderstanding, we replaced the image of the leaf from atg7 in the revised figure, which is now Figure 7-figure supplement 2, with an image of another region to more precisely visualize mesophyll cells in this plant line.

      Author response image 1.

      Mesophyll cells in a leaf of atg7 accumulating stromal CT-GFP, reconstructed from the data shown in the previous version of Figure 7–figure supplement 1. (A) Individual channel images (CT-GFP and chlorophyll) from the merged orthogonal projection image shown in the previous version of Figure 7–figure supplement 1. The right panel shows the enhanced chlorophyll signal to clearly visualize the chloroplasts in epidermal cells. Green, CT-GFP; magenta, chlorophyll fluorescence. Scale bar, 20 µm. (B) 3D structure of the merged image shown in (A). (C) Images of the cross section indicated by the blue dotted line (a to b) in B. Arrowheads indicate the edges of chloroplasts in epidermal cells.

      Figure 8: it would be interesting to hear the authors' opinion on why they observed a significant increase in RCB number in the drp5b mutant background.

      We have added a discussion of this issue to the revised manuscript (lines 445–459). We now have two hypotheses to explain this issue. One hypothesis is that the impaired chloroplast division due to the drp5b mutation reduces energy availability and thus activates chloroplast autophagy. The other hypothesis is that the drp5b mutation impairs the type of chlorophagy that degrades whole chloroplasts, and thus piecemeal-type chloroplast autophagy via Rubisco-containing bodies is activated. However, we do not have any experimental evidence supporting either hypothesis.

      Reviewer #2 (Public Review): 

      This manuscript proposed a new link between the formation of chloroplast budding vesicles (Rubisco-containing bodies [RCBs]) and the development of chloroplast-associated autophagosomes. The authors' previous work demonstrated two types of autophagy pathways involved in chloroplast degradation, including piecemeal degradation of partial chloroplast and whole chloroplast degradation. However, the mechanisms underlying piecemeal degradation are largely unknown, particularly regarding the initiation and release of the budding structures. Here, the authors investigated the progression of piecemeal-type chloroplast trafficking by visualizing it with a high-resolution time-lapse microscope. They provide evidence that autophagosome formation is required for the initiation of chloroplast budding, and that stromule formation is not correlated with this process. In addition, the authors also demonstrated that the release of chloroplast-associated autophagosome is independent of a chloroplast division factor, DRP5b. 

      Overall, the findings are interesting, and in general, the experiments are very well executed. Although the mechanism of how Rubisco-containing bodies are processed is still unclear, this study suggests that a novel chloroplast division machinery exists to facilitate chloroplast autophagy, which will be valuable to investigate in the future. 

      Reviewer #2 (Recommendations For The Authors): 

      Below are some specific comments. 

      (1) In Supplement Figure 1B, there is no chloroplast stromule in RBCS-mRFP x atg7-2 plants under dark treatment with ConA, but in Figure 7A, there are stromules in CT-GFP x atg7-2 plants. How to explain such a discrepancy? Did the authors check the chloroplast morphology of RBCS-mRFP x atg7-2 plants in different developmental stages? Will it behave the same as CT-GFP x atg7-2 under the same condition as in Figure 7A?

      As described in the text, the ages and conditions of the leaves shown in Figure 1–figure supplement 1 and Figure 7 are different. In Figure 1–figure supplement 1, second rosette leaves from 21-d-old plants were incubated in the dark with concanamycin A for 1 d. In Figure 7E and 7F, we explored the conditions under which mesophyll chloroplasts in atg leaves actively form stromules to assess how a deficiency in autophagy is related to stromule formation. We found that late senescing leaves (third rosette leaves from 36-d-old plants) of atg5 and atg7 plants accumulated many stromules without additional treatment (Figure 7). It is not surprising that the chloroplast morphologies shown in Figures 1 and 7 are different, because the leaf ages and conditions differ substantially.

      However, we agree that the differences in chloroplast stroma–targeted GFP and RBCS-mRFP might influence the visualization of stromules. For instance, fluorescent protein–labeled RBCS proteins are incorporated into the Rubisco holoenzyme, comprising eight RBCS and eight RBCL proteins (Ishida et al., 2008; Ono et al., 2013). Such a large protein complex might not accumulate in stromules. Therefore, we examined the chloroplast morphology in late senescing leaves (third rosette leaves from 36-d-old plants) from WT, atg5, and atg7 plants harboring ProRBCS:RBCS-mRFP, as you suggested. Mesophyll chloroplasts formed many stromules in atg5 and atg7 leaves but not in WT leaves (Figure 7–figure supplement 1). These results indicate that RBCS-mRFP can be used to visualize stromules and that the differences in chloroplast morphology between Figure 1-figure supplement 1 and Figure 7 cannot be attributed to the different marker proteins used. A previous study also indicated that Rubisco is present in plastid stromules (Kwok and Hanson, 2004).

      (2) In Figure 2, the author showed that the outer envelope marker Toc64 was colocalized with chloroplast buds. How about proteins in the inner envelope membrane of chloroplasts? 

      We generated Arabidopsis plants expressing red fluorescent protein–tagged K+ EFFLUX ANTIPORTER 1 (KEA1), a chloroplast inner envelope membrane protein (Kunz et al., 2014; Boelter et al., 2020). We found that the chloroplast buds visualized by RBCS-GFP were also marked by KEA1-mRFP (Figure 2–figure supplement 1B). We observed the transport of such buds (Figure 2–figure supplement 2). These results strengthen our claim that autophagy degrades chloroplast stroma and envelope components as a type of specific cargo termed a Rubisco-containing body. The descriptions of this additional experiment are in lines 181–187.

      (3) In Figure 3, how many RCBs were tracked for the trafficking analysis to reach the conclusion that the vesicle was released into the vacuole around 73.8 s?

      We apologize for our confusing explanation in the previous version of the manuscript. The time point “73.8 s” merely indicates the time of one observation, as shown in Figure 3. This time does not represent the common timing of vacuolar release of a Rubisco-containing body. As we explained in the response to the comments from reviewer 1, we subjected leaves that were incubated in the dark for several hours to live-cell imaging assays to observe chloroplast morphology in sugar-starved leaves. The time stamp of each still frame represents the time from the start of the corresponding video. Therefore, the time points in the respective figures do not align to a common time axis, and the number “73.8 s” is not important. We attempted to emphasize that the type of movement of Rubisco-containing bodies changes during their tracking shown in Figure 3. Based on this finding, we hypothesized that the Rubisco-containing bodies are released into the vacuolar lumen when they initiate random movement. We therefore expected that the interaction between the Rubisco-containing bodies and the vacuolar membrane could be captured, and we turned our attention to the dynamics of the vacuolar membrane in subsequent experiments. Accordingly, our observations of the vacuolar membrane allowed us to visualize the release of the Rubisco-containing body into the vacuole (Figure 4). We rephrased these sentences (lines 212–219) to avoid confusion and to explain this idea accurately. We also performed tracking experiments of Rubisco-containing bodies to strengthen the finding that the type of movement of the bodies changes during tracking (Figure 3-figure supplement 1, Videos 8 and 9).

      (4) I do believe the conclusion that vacuolar membranes incorporate RCBs into the vacuole in Figure 4. However, it will be more convincing if images of higher quality are provided. 

      We tried to acquire images that more clearly show the morphology of the vacuolar membrane during the incorporation of the Rubisco-containing body. We obtained the images in Figure 4A using a standard type of confocal microscope, the LSM 800 (Carl Zeiss), and obtained the images in Figure 4B using the Airyscan Fast acquisition mode, a hyper-resolution microscope mode, in the LSM 880 system (Carl Zeiss). We performed additional experiments with another type of confocal microscope, the SP8 (Leica; Figure 4-figure supplement 1A to 1C, Videos 12–14). The quality of the images from these experiments was as high as possible under the experimental conditions (equipment and plant materials). In general, increasing the image resolution during time-lapse imaging with a confocal microscope requires reducing the time resolution. However, the transport of a Rubisco-containing body occurs relatively quickly: its engulfment by the vacuolar membrane takes just a few seconds (Figure 4, Figure 4-figure supplement 1). We therefore could not reduce the time resolution further to better capture the morphology of the vacuolar membrane.

      (5) In Figure 7G, the authors concluded that SA and ROS might be the cause of the extensive formation of stromules. How about the H2O2 level in NahG and atg5 NahG plants? Compared with sid2, NahG appeared to completely inhibit stromule formation in atg5. Will this be related to ROS levels?

      We measured the hydrogen peroxide (H2O2) contents in NahG atg5 plants and atg5 single mutant plants and found that their leaves accumulate more H2O2 than those of wild-type or NahG plants (Figure 7-figure supplement 3). Since we have only maintained fresh seeds of NahG atg5 plants harboring the 35S promoter–driven chloroplast stroma–targeted GFP (Pro35S:CT-GFP) construct, we first confirmed that CT-GFP accumulation does not affect the measurement of H2O2 content. H2O2 levels were similar between wild-type leaves and CT-GFP-expressing leaves. A comparison among Pro35S:CT-GFP expressing lines in the wild-type, atg5, NahG, and NahG atg5 backgrounds revealed enhanced accumulation of H2O2 in the atg5 and NahG atg5 genotypes compared with the wild-type and NahG genotypes. This finding is consistent with the results of histological staining of H2O2 using 3,3′-diaminobenzidine (DAB) in a previous study (Yoshimoto et al., 2009).

      It is unclear why NahG expression inhibited stromule formation more strongly than the sid2 mutation in the atg5 mutant background, as you pointed out (Figure 7A–D). NahG catabolizes salicylic acid (SA), whereas sid2 mutants are knockout mutants of ISOCHORISMATE SYNTHASE1 (ICS1), a gene required for SA biosynthesis. Plants have two metabolic routes for SA biosynthesis: the isochorismate synthase (ICS) pathway and the phenylalanine ammonia-lyase (PAL) pathway. Furthermore, Arabidopsis plants contain two ICS homologs: ICS1 and ICS2. Previous studies have revealed that ICS1 (SID2) is the main player for SA biosynthesis in response to pathogen infection (Delaney et al., 1994). Another study revealed drastically lower SA contents in the leaves of both sid2 single mutants and NahG-expressing plants compared with those of wild-type plants (Abreu and Munné-Bosch, 2009). Therefore, it is clear that the sid2 single mutation sufficiently inhibits SA accumulation in Arabidopsis leaves. However, low levels of SA biosynthesis through ICS1-independent routes might influence stromule formation in leaves of sid2 atg5 and sid2 atg7. Because a previous study demonstrated that the sid2 single mutation sufficiently suppresses the SA hyperaccumulation–related phenotypes of atg plants (Yoshimoto et al., 2009), we believe that the use of the sid2 mutation was adequate to assess the effects of SA on stromule formation that actively occurs in the atg plants examined in this study.

      (6) In Supplement Figure 7, I have noticed that there are still some CT-GFP signals (green dots) in the vacuoles of the atg7 mutant, are they RCBs? If so, how can this phenomenon be explained? 

      As we explained in the response to the comment from Reviewer 1, CT-GFP-labeled bodies are chloroplasts in the epidermal cell layer. Please see our response to Reviewer 1’s comment about Figure 7 and the associated figure (Figure 1 for Response to reviewers). The CT-GFP-labeled dots (arrowheads) are the edges of chloroplasts and localize on the abaxial side of the observed region. The dots have faint chlorophyll signals. This phenomenon is much clearer in the image with enhanced brightness (right panel in A). Since the bodies are merely the edges of epidermal chloroplasts, their chlorophyll signals are faint. Therefore, these bodies are not Rubisco-containing bodies but are instead simply the edges of chloroplasts in the epidermal cell layer.

      (7) On page 24, the second paragraph, lines 12-14, the authors claim that no receptors similar to those involved in mitophagy that bind to LC3 (ATG8) have been established in chloroplasts. Actually, it has been reported that a homologue of mitophagy receptor, NBR1, acts as an autophagy receptor to regulate chloroplast protein degradation (Lee et al, 2023, Elife; Wan et al, 2023, EMBO Journal). Although I do think NBR1 is not involved in RCBs based on these reports, these findings should be discussed here. 

      Thank you for this good suggestion. We have added a discussion about this important point to the Discussion section, along with the relevant citations (lines 482–502).

      (8) In the figure legend, the details of the experiments will be better provided, such as leaves stages (Figure 1, Figure 5...), the number of chloroplasts analyzed (Figure 7...). This can help the readers to follow. 

      Thank you for highlighting this. We have checked all of the figure legends and added descriptions of the leaf stages and experimental conditions.  

      Reviewer #3 (Public Review):

      Summary: 

      Regulated chloroplast breakdown allows plants to modulate these energy-producing organelles, for example during leaf aging, or during changing light conditions. This manuscript investigates how chloroplasts are broken down during light-limiting conditions. 

      The authors present very nice time-lapse imaging of multiple proteins as buds form on the surface of chloroplasts and pinch away, then associate with the vacuole. They use mutant analysis and autophagy markers to demonstrate that this process requires the ATG machinery, but not dynamin-related proteins that are required for chloroplast division. The manuscript concludes with a discussion of an internally-consistent model that summarizes the results. 

      Strengths: 

      The main strength of the manuscript is the high-quality microscopy data. The authors use multiple markers and high-resolution time-lapse imaging to track chloroplast dynamics under light-limiting conditions. 

      Weaknesses: 

      The main weakness of the manuscript is the lack of quantitative data. Quantification of multiple events is required to support the authors' claims, for example, claims about which parts of the plastid bud, about the dynamics of the events, about the colocalization between ATG8 and the plastid stroma buds, and the dynamics of this association. Without understanding how often these events occur and how frequently events follow the manner observed by the authors (in the 1 or 2 examples presented in each figure) it is difficult to appreciate the significance of these findings. 

      We have performed several additional experiments, including the quantification of multiple chloroplast buds or GFP-ATG8-labeled structures from individual plants. The results strengthen our claims and thus improve the significance of the current study. Please see the responses below for details.

      Reviewer #3 (Recommendations For The Authors):

      Overall, the live-cell imaging in this paper is high quality and rigorously conducted. However, without quantification of these events, it is difficult to judge whether this is an occasional contributor to plastid breakdown, or the primary mechanism for this process. 

      - For Figure 1, the authors could estimate the importance of this mechanism for chloroplast breakdown by calculating the volume change in chloroplasts over time during light-limiting conditions, then comparing this to the volume of the puncta that bud off of plastids and the frequency of these events. That is, what percentage of chloroplast volume loss can be accounted for by puncta that bud from chloroplasts? Are there likely other mechanisms contributing to chloroplast breakdown, or is this the primary mechanism? 

      We measured the volumes of chloroplast stroma when the leaves from wild-type (WT) and atg7 plants accumulating RBCS-mRFP were subjected to extended darkness for 1 d (Figure 1-figure supplement 2). The volume of the chloroplast stroma in dark-treated leaves of WT plants was 70% that in leaves before treatment, whereas the volume of the chloroplast stroma in dark-treated atg7 leaves was 86% that in leaves before treatment. The transport of Rubisco-containing bodies into the vacuole did not occur in atg7 leaves (Figure 1-figure supplement 1). These results suggest that the release of chloroplast buds as Rubisco-containing bodies contributes to the decrease in chloroplast stroma volume during dark treatment. These results also suggest that autophagy-independent systems contribute to the decrease in chloroplast volume. It is difficult to monitor the volume or frequency of budding off of puncta from chloroplasts during dark treatment because the budding and transport of the puncta occur relatively quickly and are completed within minutes, and the puncta frequently move away from the plane of focus. Additionally, continuous monitoring of chloroplast morphology over the dark treatment period requires the long-term exposure of leaves to repeated laser excitation, and such treatment might cause unexpected stress. We believe that the evaluation of chloroplast stroma volume after 1 d of dark treatment is important for estimating the contribution of the mechanism described in this study. The descriptions of this additional experiment are in lines 163–174.

      - The claim that structures budding from the plastid "specifically contains stroma material...without any chlorophyll signal" (p. 6 and Figure 2) should be supported by quantitative analysis of many such buds in multiple cells from multiple independent plants. 

      We performed additional experiments (Figure 2-figure supplement 1) to measure the fluorescence intensity ratios of the stroma marker RBCS-GFP and chlorophyll between chloroplast budding structures and their neighboring chloroplasts in Arabidopsis plants expressing the stromal marker RBCS-GFP along with TOC64-mRFP (a chloroplast outer envelope membrane protein), KEA1-mRFP (a chloroplast inner envelope membrane protein), or ATPC1-tagRFP (a thylakoid membrane protein). The results indicated that chloroplast buds contain chloroplast stroma without chlorophyll signals. The descriptions of this experiment are in lines 175–199. In these experiments, we observed 30 to 33 chloroplast buds from eight individual plants.  

      - Claims about the dynamics of these events in Figures 2 & 3 should be supported by quantitative analysis of many buds in multiple cells from multiple independent plants and appropriate summary statistics (e.g. mean, standard deviation), and claims about the coordination of events should be supported by statistical comparison of these measurements between different markers. 

      As mentioned in the response to the above comments, quantification of fluorescence intensities (Figure 2-figure supplement 1) revealed that the chloroplast budding structures contained TOC64-mRFP and KEA1-mRFP signals but no ATPC1-tagRFP signal. These results support the claim that chloroplast buds contain chloroplast stroma and envelope components without thylakoid membranes.

      It is not easy to quantify the dynamics of chloroplast buds since the puncta sometimes move away from the plane of focus. We therefore added data from individual time-lapse observations showing that the type of movement exhibited by the puncta changes during tracking (Figure 3-figure supplement 1A and 1B, Videos 8 and 9) to strengthen the notion that such a phenomenon was observed repeatedly. 

      - Data in Figure 4 should be supported by quantification of the proportion of plastid-derived puncta that end up inside the vacuole (compared to those that do not) in multiple cells from multiple independent plants. 

      Although we performed additional observations of the destinations of chloroplast-derived puncta, we encountered some difficulty in correctly calculating the proportion of plastid-derived puncta that ended up inside the vacuole. This problem is similar to the difficulty in tracking Rubisco-containing bodies mentioned in the response to the previous comments. During time-lapse imaging, puncta sometimes move out of the plane of focus toward the deeper side (abaxial side) or the nearer side (adaxial side), causing us to lose track of a number of puncta. Therefore, we could not determine the destinations of all puncta to calculate the proportion of puncta that ended up in the vacuolar lumen.

      Instead, we added the results of three experiments (Figure 4-figure supplement 1, Videos 12–14) examining how the vacuolar membrane engulfs the chloroplast-derived puncta to incorporate them inside the vacuole. The data support the notion that such a phenomenon occurs repeatedly in sugar-starved leaves. All results were obtained from individual plants.

      - Data in Figure 6 should also be supported by quantitative analysis of many buds in multiple cells from multiple independent plants, to determine whether ATG8 associates with all RBCS-containing buds, and vice versa.

      To address this issue, we performed additional experiments on plants expressing GFP-ATG8a and RBCS-mRFP (Figure 6-figure supplements 3 and 4). First, we observed 58 chloroplast buds from eight individual plants and evaluated the proportion of GFP-ATG8a-labeled chloroplast buds. We determined that 64% of chloroplast buds were at least autophagy-associated structures (Figure 6-figure supplement 3A–3C). This result also suggests that chloroplasts can form autophagy-independent budding structures, which might be associated with stromule-related structures or the autophagy-independent vesiculation machinery. We also evaluated the number of GFP-ATG8a-labeled chloroplast buds (Figure 6-figure supplement 3D and 3E). The formation of such structures increased in response to dark treatment (Figure 6-figure supplement 3D), but they did not appear in atg7 plants exposed to the dark (Figure 6-figure supplement 3E). These results support the notion that the formation of chloroplast buds to be released as Rubisco-containing bodies requires the core ATG machinery. 

      Furthermore, we observed 157 GFP-ATG8a-labeled structures from thirteen individual plants and evaluated the proportion of chloroplast-associated isolation membranes (Figure 6-figure supplement 4). We also classified the chloroplast-associated, GFP-ATG8a-labeled structures into two categories: the chloroplast surface type (Figure 7-figure supplement 4A) and the chloroplast bud type (Figure 7-figure supplement 4B). This experiment suggested that 43% of the isolation membranes labeled by GFP-ATG8a were involved in chloroplast degradation during an early phase of sugar starvation (extended darkness for 5 to 9 h from the end of night) in mesophyll cells. We believe that these results indicate that autophagy contributes substantially to chloroplast degradation via the morphological changes observed in this study. The descriptions of these experiments are in lines 284–300 in the Results section and in lines 426–444 in the Discussion section.

      - Which parts of the plastid bud (Fig 2), about the dynamics of the events (Fig 3), about the colocalization between ATG8 and the plastid stroma buds, and the dynamics of this association (Fig 6). 

      We performed multiple quantitative studies to address the issues listed above. We believe that these additional experiments strengthened our findings.

      - I suggest that the authors avoid using the term "vesicles" to describe the plastid-derived puncta, since it doesn't seem like coat proteins are required for their formation. I suggest "puncta" or similar terms. 

      We replaced the term “vesicles” with “puncta” or other suitable terms, as suggested.

      References for response to reviewers

      Abreu ME, Munné-Bosch S (2009) Salicylic acid deficiency in NahG transgenic lines and sid2 mutants increases seed yield in the annual plant Arabidopsis thaliana. J Exp Bot 60: 1261-1271.

      Boelter B, Mitterreiter MJ, Schwenkert S, Finkemeier I, Kunz HH (2020) The topology of plastid inner envelope potassium cation efflux antiporter KEA1 provides new insights into its regulatory features. Photosynth Res 145: 43-54.

      Brunkard JO, Runkel AM, Zambryski PC (2015) Chloroplasts extend stromules independently and in response to internal redox signals. Proc Natl Acad Sci U S A 112: 10044-10049.

      Caplan JL, Kumar AS, Park E, Padmanabhan MS, Hoban K, Modla S, Czymmek K, Dinesh-Kumar SP (2015) Chloroplast stromules function during innate immunity. Dev Cell 34: 45-57.

      Delaney TP, Uknes S, Vernooij B, Friedrich L, Weymann K, Negrotto D, Gaffney T, Gutrella M, Kessmann H, Ward E, Ryals J (1994) A Central Role of Salicylic-Acid in Plant-Disease Resistance. Science 266: 1247-1250.

      Hanson MR, Sattarzadeh A (2011) Stromules: Recent Insights into a Long Neglected Feature of Plastid Morphology and Function. Plant Physiol 155: 1486-1492.

      Ishida H, Yoshimoto K, Izumi M, Reisen D, Yano Y, Makino A, Ohsumi Y, Hanson MR, Mae T (2008) Mobilization of rubisco and stroma-localized fluorescent proteins of chloroplasts to the vacuole by an ATG gene-dependent autophagic process. Plant Physiol 148: 142-155.

      Kohler RH, Cao J, Zipfel WR, Webb WW, Hanson MR (1997) Exchange of protein molecules through connections between higher plant plastids. Science 276: 2039-2042.

      Kunz HH, Gierth M, Herdean A, Satoh-Cruz M, Kramer DM, Spetea C, Schroeder JI (2014) Plastidial transporters KEA1, -2, and -3 are essential for chloroplast osmoregulation, integrity, and pH regulation in Arabidopsis. Proc Natl Acad Sci U S A 111: 7480-7485.

      Lee HN, Chacko JV, Solis AG, Chen KE, Barros JA, Signorelli S, Millar AH, Vierstra RD, Eliceiri KW, Otegui MS, Benitez-Alfonso Y (2023) The autophagy receptor NBR1 directs the clearance of photodamaged chloroplasts. Elife 12: e86030.

      Ono Y, Wada S, Izumi M, Makino A, Ishida H (2013) Evidence for contribution of autophagy to rubisco degradation during leaf senescence in Arabidopsis thaliana. Plant Cell Environ 36: 1147-1159.

      Smith AM, Stitt M (2007) Coordination of carbon supply and plant growth. Plant Cell Environ 30: 1126-1149.

      Usadel B, Blasing OE, Gibon Y, Retzlaff K, Hoehne M, Gunther M, Stitt M (2008) Global transcript levels respond to small changes of the carbon status during progressive exhaustion of carbohydrates in Arabidopsis rosettes. Plant Physiol 146: 1834-1861.

      Yoshimoto K, Jikumaru Y, Kamiya Y, Kusano M, Consonni C, Panstruga R, Ohsumi Y, Shirasu K (2009) Autophagy negatively regulates cell death by controlling NPR1-dependent salicylic acid signaling during senescence and the innate immune response in Arabidopsis. Plant Cell 21: 2914-2927.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this detailed study, Cohen and Ben-Shaul characterized the AOB cell responses to various conspecific urine samples in female mice across the estrous cycle. The authors found that AOB cell responses vary with the strains and sexes of the samples. Between estrous and non-estrous females, no clear or consistent difference in responses was found. The cell response patterns, as measured by the distance between pairs of stimuli, are largely stable. When some changes do occur, they are not consistent across strains or male status. The authors concluded that AOB detects the signals without interpreting them. Overall, this study will provide useful information for scientists in the field of olfaction.

      Strengths:

      The study uses electrophysiological recording to characterize the responses of AOB cells to various urines in female mice. AOB recording is not trivial, as it requires activation of the VNO pump. The team uses a unique preparation to activate the VNO pump with electrical stimulation, allowing them to record AOB cell responses to urines in anesthetized animals. The study comprehensively described the AOB cell responses to social stimuli and how the responses vary (or not) with features of the urine source and the reproductive state of the recording females. The dataset could be a valuable resource for scientists in the field of olfaction.

      Weaknesses:

      (1) The figures could be better labeled.

      We revised all figures (except the model figure, Fig. 8), and among other improvements (many of which were suggested by the reviewers in other comments), added more labelling and annotation within the figures.

      (2) For Figure 2E, please plot the error bar. Are there any statistics performed to compare the mean responses?

      We added error bars (standard errors of the mean). We had not originally performed statistical comparisons between the stimuli, but we have now done so. The analysis of response strength now appears in a new table (Table 1).

      (3) For Figure 2D, it will be more informative to plot the percentage of responsive units.

      Done.

      (4) Could the similarity in response be explained by the similarity in urine composition? The study will be significantly strengthened by understanding the "distance" of chemical composition in different urine.

      We agree. As we wrote in the Discussion: “Ultimately, lacking knowledge of the chemical space associated with each of the stimuli, this and all the other ideas developed here remain speculative.” We note however, that chemical distance (which in itself is hard to define) will provide only part of the picture. The other part is the “projection” of chemical space on the receptor array. This is an idea that we develop in the Discussion and in Figure 8. Specifically, that it is the combination of stimulus composition, and receptor tuning properties that will determine stimulus distances in neuronal space.

      That said, a better understanding of the chemical distance is an important aspect that we are working to include in our future studies. Unfortunately, we have no such data for this dataset.

      (5) If it is not possible for the authors to obtain these data first-hand, published data on MUPs and chemicals found in these urines may provide some clues.

      This comment is directly related to the previous one. Measurements about some classes of molecules may be found for some of the stimuli that we used here, but not for all. We are not aware of any single dataset that contains this information for any type of molecule across the entire stimulus set that we have used and pooling results from different studies has limited validity because of the biological and technical variability across studies. In order to reliably interpret our current recordings, it would be necessary to measure the urinary content of the very same samples that were used for stimulation. Unfortunately, we are not able to conduct this analysis at this stage.

      (6) It is not very clear to me whether the female overrepresentation is because there are truly more AOB cells that respond to females than males or because there are only two female samples but 9 male samples.

      The definitive answer to this comment is given in our response to the next one.

      Nevertheless, we agree that this is an important point. It is true that the number of neurons fulfilling each of the patterns depends on the number of individual stimuli that define it (and on the frequency of neurons that respond to those stimuli). However, our measure of “overrepresentation” was designed to overcome this bias, by using bootstrapping to reveal whether the observed number of patterns is larger than expected by chance. The higher frequency of responses to female, as compared to male, stimuli has been observed in other studies, both by others and by us, also when the number of male and female stimuli is matched (e.g., Bansal et al BMC Biol 2021, Ben-Shaul et al, PNAS 2010, Hendrickson et al, JNS, 2008). However, here, by overrepresentation, we do not refer to the higher frequency of female-responding neurons, but rather to the fact that, given the number of responding neurons, the female pattern is more common than expected by chance.

      (7) If the authors only select two male samples, let's say ICR Naïve and ICR DOM, combine them with responses to two female samples, and do the same analysis as in Figure 3, will the female response still be overrepresented?

      Following this suggestion, we have performed this analysis, and we were glad to see that the result is the one we had anticipated. Below, we provide an image of the results, following the same approach that we applied before, and showed in Figure 3C. Here, we defined a female pattern (using the two female samples) and compared it to a male pattern (using the ICR naïve and ICR DOM as suggested). It is as if we had only four stimuli in our set. As in the article, we calculated the expected distribution with 100,000 shuffles. We denoted this pattern as F/M ICR. The results are shown below.

      Under the present conditions, the distribution of the number of female-selective patterns is larger (i.e., shifted to the right; compare to the female category in Figure 3C). This is expected, since the criterion is now more permissive. Specifically, to qualify as a “female pattern”, the two responses to female urine now need only be stronger than the responses to the two male stimuli included in this analysis (and to all other responses). Notably, although the null distribution shifted to the right, the actual number of neurons fulfilling this pattern is also larger, so the actual number remains significantly larger than expected by chance. This is also true for the reverse category (as is the case for the ~female category in Figure 3C). Thus, we conclude that the overrepresentation of the female pattern is not a trivial consequence of the number of male and female stimuli.

      Author response image 1.
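      The shuffle-based test of overrepresentation can be sketched as follows. This is a minimal illustration, not the analysis code used for Figure 3C: it assumes responses are stored as a neurons × stimuli array, counts a neuron as fulfilling a pattern when its weakest response to the pattern stimuli exceeds its strongest response to every other stimulus, and builds the null distribution by permuting stimulus labels independently within each neuron. The function names, the data layout, and the synthetic data are hypothetical.

```python
import numpy as np

def fulfills_pattern(responses, pattern_idx):
    """Per neuron: True if every response to the pattern stimuli exceeds every other response."""
    mask = np.zeros(responses.shape[1], dtype=bool)
    mask[pattern_idx] = True
    weakest_inside = responses[:, mask].min(axis=1)
    strongest_outside = responses[:, ~mask].max(axis=1)
    return weakest_inside > strongest_outside

def overrepresentation_test(responses, pattern_idx, n_shuffles=10_000, seed=0):
    """Observed pattern count, null counts from label shuffles, and a one-sided p-value."""
    rng = np.random.default_rng(seed)
    observed = int(fulfills_pattern(responses, pattern_idx).sum())
    null = np.empty(n_shuffles, dtype=int)
    for k in range(n_shuffles):
        # Shuffle the stimulus labels independently within each neuron (row).
        shuffled = rng.permuted(responses, axis=1)
        null[k] = fulfills_pattern(shuffled, pattern_idx).sum()
    # Add-one correction so that the p-value is never exactly zero.
    p_value = (np.sum(null >= observed) + 1) / (n_shuffles + 1)
    return observed, null, p_value

# Hypothetical data: 4 stimuli (columns 0-1 "female", 2-3 "male"), 200 neurons.
rng = np.random.default_rng(1)
resp = rng.normal(size=(200, 4))
resp[:60, :2] += 3.0  # a synthetic female-selective subpopulation
obs, null, p = overrepresentation_test(resp, [0, 1], n_shuffles=2_000)
print(obs, null.mean(), p)
```

      With two pattern stimuli out of four, a shuffled neuron fulfils the pattern with probability 1/6, so the null distribution centres near 200/6 ≈ 33 neurons. Making a pattern less restrictive shifts both the null distribution and the observed count, which is why the comparison between them, not the raw count, carries the result.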

      (8) In Figure 4B and 4C, the pairwise distance during non-estrus is generally higher than that during estrus, although they are highly correlated. Does it mean that the cells respond to different urines more distinctively during diestrus than in estrus?

      This is an important observation (!), and one that we had originally overlooked. It is true that larger distances (as observed during non-estrus) imply more distinct population-level responses and hence better discrimination among stimuli. However, this is inconsistent with all our other analyses, which do not point to enhanced selectivity or discrimination in either state. If anything, we find somewhat higher sparseness in estrus. Yet, there may be technical explanations for the differences.

      For Euclidean distances, the explanation may be trivial. The distance depends on the number of dimensions (i.e., units), and since our sample contains more neurons recorded during non-estrus, the larger distance is expected.
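      The dimensionality argument can be checked with a few lines of simulation. As an illustrative assumption only (not a model of the recorded data), each unit's response to each of two stimuli is drawn independently from a standard normal distribution; the expected squared Euclidean distance between the two population vectors is then 2n for n units, so the distance grows roughly as √(2n). The neuron counts are those reported for the two states.

```python
import numpy as np

def mean_euclidean_distance(n_units, n_tests=1000, seed=0):
    """Mean distance between two random population vectors with n_units dimensions."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n_tests, n_units))
    b = rng.standard_normal((n_tests, n_units))
    return float(np.linalg.norm(a - b, axis=1).mean())

for n in (241, 305):  # estrus and non-estrus sample sizes
    print(n, round(mean_euclidean_distance(n), 2), round(float(np.sqrt(2 * n)), 2))
```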

      In fact, there is a similar dependence on sample size for the correlation distance. Smaller samples are associated with higher (spurious) correlations, and hence larger samples are associated with larger distances. To demonstrate this, we conducted a simple simulation in which we calculated the absolute correlation coefficients of random samples from standard normal distributions (using the MATLAB function randn) while varying the sample size. For each sample size, we conducted 1000 tests. We considered sample sizes from 10 to 100000, including 200 and 300 (which are similar to our sample sizes). The results are shown in the figure below. Note that the absolute value of the correlation coefficient decreases with sample size, while the mean p-value for the observed correlation is stable at ~0.5.

      While this is not a rigorous analysis of this issue, and while it does not exactly reflect the scenario in our data, where correlations are generally positive, it shows that the observed correlation (and hence correlation distance) is also affected by sample size.
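      A Python analogue of the simulation described above (the original used MATLAB's randn) might look like the sketch below. It draws pairs of independent standard normal samples of size n, records |r| and a two-sided p-value for each pair, and averages over the tests. To avoid dependencies beyond NumPy, the p-value uses the Fisher z approximation rather than the exact t-distribution; that substitution is ours.

```python
import math
import numpy as np

def abs_corr_vs_sample_size(sizes, n_tests=1000, seed=0):
    """Mean |r| and mean p-value for uncorrelated normal samples of each size."""
    rng = np.random.default_rng(seed)
    results = {}
    for n in sizes:
        abs_r, p_values = [], []
        for _ in range(n_tests):
            x = rng.standard_normal(n)
            y = rng.standard_normal(n)
            r = float(np.corrcoef(x, y)[0, 1])
            # Fisher z approximation to the two-sided p-value of r.
            z = math.atanh(r) * math.sqrt(n - 3)
            p_values.append(math.erfc(abs(z) / math.sqrt(2)))
            abs_r.append(abs(r))
        results[n] = (float(np.mean(abs_r)), float(np.mean(p_values)))
    return results

for n, (mean_abs_r, mean_p) in abs_corr_vs_sample_size([10, 200, 300, 10_000]).items():
    print(n, round(mean_abs_r, 3), round(mean_p, 2))
```

      Because p-values are uniformly distributed when the null hypothesis holds, their mean stays near 0.5 at every sample size, while the mean |r| shrinks roughly as 1/√n.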

      For these reasons, we focus on comparing the relative distances across states, rather than on the absolute values of the correlation distances.

      Author response image 2.

      Following this comment, we now write in the manuscript:

      “We first note that distances are generally larger during non-estrus, suggesting enhanced discrimination during this stage. However, further analyses of sparseness and selectivity do not support this idea (see below). Furthermore, we note that both Euclidean and correlation distances generally depend on sample size. In both cases, distances are expected to increase as a function of sample size, which, in our dataset, is larger for the non-estrus (n = 305) than for the estrus (n = 241) neurons. Because of this factor, we focus here on the similarity of the relative within-state distances across the states (and not on their absolute magnitudes). Specifically, we find a positive and significant correlation among pairwise population distances under the two states. Thus, at the population level, representational space remains broadly stable across the estrus cycle. Nevertheless, several points in Fig. 4D, E clearly diverge from a linear relationship, implying that representational space differs under the two states. We next examine such state-dependent changes in more detail.”
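      The across-state comparison in this passage can be sketched as follows: compute the correlation distance between every pair of stimulus columns (population response vectors) separately for each state, then correlate the two sets of pairwise distances. The sketch assumes mean-response matrices of shape neurons × stimuli for each state (the neurons differ between states, but the 11 stimuli are shared); using Pearson correlation to compare the distance sets is our assumption, and the example data are synthetic.

```python
import numpy as np

def correlation_distance_matrix(responses):
    """1 - Pearson r between every pair of stimulus columns (input: neurons x stimuli)."""
    return 1.0 - np.corrcoef(responses, rowvar=False)

def compare_states(resp_a, resp_b):
    """Correlate the pairwise stimulus distances obtained in two states."""
    d_a = correlation_distance_matrix(resp_a)
    d_b = correlation_distance_matrix(resp_b)
    iu = np.triu_indices_from(d_a, k=1)  # each stimulus pair once, diagonal excluded
    return float(np.corrcoef(d_a[iu], d_b[iu])[0, 1])

# Synthetic example: both states share a hypothetical low-dimensional stimulus structure.
rng = np.random.default_rng(0)
latent = rng.standard_normal((3, 11))
est = rng.standard_normal((241, 3)) @ latent + 0.5 * rng.standard_normal((241, 11))
non = rng.standard_normal((305, 3)) @ latent + 0.5 * rng.standard_normal((305, 11))
print(round(compare_states(est, non), 2))
```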

      (9) The correlation analysis is not entirely intuitive when just looking at the figures. Some sample heatmaps showing the response differences between estrous states will be helpful.

      If we understand correctly, the idea is to show the correlation matrices from which the values in 4B and 4C are taken. The relevant images are now included in Figure 4B, C and are referenced in the main text.

      Reviewer #2 (Public review):

      Summary:

      Many aspects of the study are carefully done, and in the grand scheme this is a solid contribution. I have no "big-picture" concerns about the approach or methodology. However, in numerous places the manuscript is unnecessarily vague, ambiguous, or confusing. Tightening up the presentation will magnify their impact.

      We have reviewed the text and made substantial editing changes. Together with the changes made in response to the other specific comments by both reviewers, we hope that these edits improve the presentation.

      Strengths:

      (1) The study includes urine donors from males of three strains each with three social states, as well as females in two states. This diversity significantly enhances their ability to interpret their results.

      (2) Several distinct analyses are used to explore the question of whether AOB MCs are biased towards specific states or different between estrus and non-estrus females. The results of these different analyses are self-reinforcing about the main conclusions of the study.

      (3) The presentation maintains a neutral perspective throughout while touching on topics of widespread interest.

      Weaknesses:

      (1) Introduction:

      The discussion of the role of the VNS and preferences for different male stimuli should perhaps include Wysocki and Lepri 1991

      We assume that the reviewer is referring to “Consequences of removing the vomeronasal organ” by Wysocki CJ, Lepri JJ, a review article in J Steroid Biochem from 1991. We were not familiar with this specific article and have now read it. The article discusses various male behaviors, and some effects on female behavior and physiology (e.g., puberty acceleration, maternal behaviors, ovulation), but we could not find any mention of preference in female mice in this article. We also expanded our search to all PubMed articles authored by Wysocki and Lepri, and then to all articles by Wysocki (with the keyword Vomeronasal). Despite our best intentions to give due credit, we found nothing that seems directly related to this statement. Please correct us if we have missed anything.

      (2) Results:

      a) Given the 20s gap between them, the distinction between sample application and sympathetic nerve trunk stimulation needs to be made crystal clear; in many places, "stimulus application" is used in places where this reviewer suspects they actually mean sympathetic nerve trunk stimulation.

      We realize that this is confusing, and we also agree that in at least one place we were not sufficiently clear about the distinction. To clarify, we distinguish between stimulus application (physical application of the stimulus to the nostril) and stimulation (which refers to SNT stimulation, which typically induces VNO suction). The general term stimulus presentation refers to the entire process. As explained in the text, in our analysis we consider the entire window starting at application and ending 40 s after stimulation. This is because we sometimes observe immediate responses following application. One such response is seen in Figure 2D, and this is directly related to a detailed comment made below (on Figure 1D, part c). Indeed, for this figure time 0 indicates stimulus application. This was indicated previously, but we have now rearranged the order of the panels to make the distinction between this response and the others clearer. We have also revised the figure caption and the text to clarify this issue.

      b) There appears to be a mismatch between the discussion of Figure 3 and its contents. Specifically, there is an example of an "adjusted" pattern in 3A, not 3B.

      True. We have revised the text to correctly refer to the figure. Thanks.

      c) The discussion of patterns neglects to mention whether it's possible for a neuron to belong to more than one pattern. For example, it would seem possible for a neuron to simultaneously fit the "ICR pattern" and the "dominant adjusted pattern" if, e.g., all ICR responses are stronger than all others, but if simultaneously within each strain the dominant male causes the largest response.

      This is true. In the legend to Figure 3B, we actually wrote: “A neuron may fulfill more than one pattern and thus may appear in more than one row.”, but we now also write in the main text:

      “We note that criteria for adjusted patterns are less stringent than for the standard patterns defined above. Furthermore, some patterns are not mutually exclusive, and thus, a neuron may fulfil more than a single pattern.”
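      To make the non-exclusivity concrete, here is a hypothetical neuron that satisfies both a standard ICR pattern and a dominant-adjusted pattern. The strain labels "S2" and "S3", the third social-state label "other", the response values, and the simplified adjusted criterion (within each strain, the dominant male evokes a stronger response than that strain's other males) are placeholders for illustration, not the manuscript's formal definitions.

```python
STRAINS = ("ICR", "S2", "S3")       # S2, S3: placeholder strain names
STATES = ("naive", "dom", "other")  # "other": placeholder for the third social state

def standard_pattern(resp, selected):
    """All responses to the `selected` stimuli exceed all responses to the remaining stimuli."""
    inside = [v for key, v in resp.items() if key in selected]
    outside = [v for key, v in resp.items() if key not in selected]
    return min(inside) > max(outside)

def dominant_adjusted(resp):
    """Within every strain, the dominant male evokes the strongest response (all must hold)."""
    return all(
        resp[(strain, "dom")] > max(resp[(strain, st)] for st in STATES if st != "dom")
        for strain in STRAINS
    )

neuron = {
    ("ICR", "naive"): 8.0, ("ICR", "dom"): 10.0, ("ICR", "other"): 7.0,
    ("S2", "naive"): 2.0, ("S2", "dom"): 4.0, ("S2", "other"): 1.0,
    ("S3", "naive"): 3.0, ("S3", "dom"): 5.0, ("S3", "other"): 2.0,
    ("F", "estrus"): 1.0, ("F", "non-estrus"): 0.5,
}
icr_keys = {("ICR", state) for state in STATES}
print(standard_pattern(neuron, icr_keys), dominant_adjusted(neuron))  # prints: True True
```

      Raising the S3 naïve response above the S3 dominant response would break the adjusted pattern while leaving the ICR pattern intact, showing that the two criteria are evaluated independently.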

      (3) Discussion:

      a) The discussion of chemical specificity in urine focuses on volatiles and MUPs (citation #47), but many important molecules for the VNS are small, nonvolatile ligands. For such molecules, the corresponding study is Fu et al 2015.

      Agreed. We now cite this work and several others that were not included before in the context of chemical and electrophysiological analyses.

      b) "Following our line of reasoning, this scarcity may represent an optimal allocation of resources to separate dominant from naïve males": 1 unit out of 215 is roughly consistent with a single receptor. Surely little would be lost if there could be more computational capacity devoted to this important axis than that? It seems more likely that dominance is computed from multiple neuronal types with mixed encoding.

      We fully agree, and we are not claiming that dominance, or any other feature, is derived from dedicated feature-selective neurons. Our discussion of resource allocation is inevitably speculative. Our main point in this context is that a lack of overrepresentation does not imply that a feature is not important. As a note, we do not think that there is good reason to suppose that AOB neurons reflect the activity of single receptors.

      To prevent this potential confusion, we have now added the following sentences to the Discussion subsection titled “Response patterns of AOB-MCs”:

      “We stress that we do not suggest that features such as physiological state are encoded by the activity of single neurons. In fact, we believe that most ethologically relevant features are encoded by the activity of multiple neurons. Nevertheless, such population level representations ultimately depend on the response properties of individual neurons, and we thus ask: what can we learn from our analysis of response pattern frequency?”

      (4) Methods:

      a) Male status, "were unambiguous in most cases": is it possible to put numerical estimates on this? 55% and 99% are both "most," yet they differ substantially in interpretive uncertainty.

      Upon reexamination, we realized that this sentence is incorrect. Ambiguous cases were not considered as dominant for urine collection. We only classified mice as dominant if they “won” the tube test and exhibited dominant behavior in the subsequent observation period in the cage. The phrasing has now been corrected in the manuscript (Methods section).

      b) Surgical procedures and electrode positioning: important details of probes are missing (electrode recording area, spacing, etc).

      This information has been added to the Methods subsection “Surgical procedures and electrode positioning”

      c) Stimulus presentation procedure: Are stimuli manually pipetted or delivered by apparatus with precise timing?

      They are delivered manually. This has now been clarified in the text.

      d) Data analysis, "we applied more permissive criteria involving response magnitude": it's not clear whether this is what's spelled out in the next paragraph, or whether that's left unspecified. In either case, the next paragraph appears to be about establishing a noise floor on pattern membership, not a "permissive criterion."

      True, the next paragraph is not the explanation for the more permissive criteria. The more permissive criteria involving response magnitude are actually those described in Figure 3A and 3B. The sentence that was quoted above merely states that before applying those criteria, we had also searched for patterns defined by binary designation of neurons as responsive, or not responsive, to each of the stimuli (this is directly related to the next comment below). Using those binary definitions, we obtained a very small number of neurons for each pattern and thus decided to apply the approach actually used and described in the manuscript.

      To resolve this confusion, we thoroughly revised the description in this paragraph and the beginning of the next one in the Methods section.

      e) Data analysis, method for assessing significance: there's a lot to like about the use of pooling to estimate the baseline and the use of an ANOVA-like test to assess unit responsiveness.

      But:

      i) for a specific stimulus, at 4 trials (the minimum specified in "Stimulus presentation procedure") kruskalwallis is questionable. They state that most trials use 5, however, and that should be okay.

      The exact values are now given in the text. The mean number of repeated presentations per stimulus was 5.1 ± 0.9 (mean ± SD). In 72% of the cases, stimuli were presented 5 or more times; otherwise, they were presented 4 times. In the context of the statistical test, we note that we are not comparing 5 (or 4) values with another set of 5 (or 4) values, but with a much larger sample (~44-55 baseline trials, given 11 stimuli and 4-5 repeats of each). Under this scenario, we think that the statistical approach is sound. However, the more important consideration, in our opinion, is given below.
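      For readers who want to see the test in the two-group setting described here (a handful of stimulus trials against a larger pool of baseline trials), the following dependency-free sketch computes the Kruskal-Wallis H statistic with tie correction. With two groups H has one degree of freedom, so its chi-square p-value reduces to erfc(√(H/2)). The authors used MATLAB's kruskalwallis; this Python version and the example firing rates are illustrative assumptions.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_two_groups(stim, baseline):
    """Kruskal-Wallis H and p-value (chi-square, df = 1) for stimulus vs baseline trials."""
    data = list(stim) + list(baseline)
    n = len(data)
    ranks = average_ranks(data)
    r_stim, r_base = sum(ranks[: len(stim)]), sum(ranks[len(stim):])
    h = 12 / (n * (n + 1)) * (r_stim**2 / len(stim) + r_base**2 / len(baseline)) - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    counts = {}
    for v in data:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t**3 - t for t in counts.values()) / (n**3 - n)
    if correction == 0:  # all values tied: no evidence of a difference
        return 0.0, 1.0
    h /= correction
    return h, math.erfc(math.sqrt(h / 2))  # chi-square survival function for df = 1

stim_rates = [12.0, 15.0, 11.0, 14.0, 13.0]      # hypothetical responses, 5 repeats
baseline_rates = [3.0, 4.0, 5.0, 6.0, 7.0] * 11  # hypothetical pooled baseline, 55 trials
h, p = kruskal_two_groups(stim_rates, baseline_rates)
print(round(h, 2), p < 0.05)
```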

      ii) the methods statement suggests they are running kruskalwallis individually for each neuron/stimulus, rather than once per neuron across all stimuli. With 11 stimuli, there is a substantial chance of a false-positive if they used p < 0.05 to assess significance. (The actual threshold was unstated.) Were there any multiple comparison corrections performed? Or did they run kruskalwallis on the neuron, and then if significant assess individual stimuli? (Which is a form of multiple-comparisons correction.)

      First, we indeed failed to mention that our criterion was p < 0.05. This has been corrected by adding the information to the Results and Methods sections. No, we did not apply any multiple comparison corrections. We consider each neuron-stimulus pair as an independent entity, and we are aware that this leads to a higher false positive rate. On the other hand, applying multiple comparison corrections would be problematic, as the number of stimuli varies across studies. Such corrections would thus lead to different response criteria across studies, which would be very problematic. This raises the almost philosophical question regarding the use of multiple comparison corrections (as well as one- and two-tailed tests), but practically, most, if not all, of our conclusions involve comparisons across conditions. For this purpose, we think that our procedure is valid. More generally, while selection of responses according to significance has some obvious advantages, the decision to use any particular criterion is entirely arbitrary. Therefore, we do not attach any special meaning to the significance threshold used here. Rather, we think of it as a simple criterion that allows us to exclude weakly responding or non-responsive neurons, and to compare frequencies of neurons that fulfill this criterion under different conditions and contexts.
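      The reviewer's point about 11 separate tests per neuron can be made concrete with a short calculation. Assuming, for simplicity, that the m = 11 tests for a given neuron are independent and that the neuron responds to none of the stimuli, the probability that at least one test reaches p < α, and the corresponding Bonferroni threshold, are:

```latex
P(\text{at least one false positive}) = 1 - (1 - \alpha)^{m}
  = 1 - 0.95^{11} \approx 0.43,
\qquad
\alpha_{\text{Bonf}} = \frac{\alpha}{m} = \frac{0.05}{11} \approx 0.0045 .
```

      The expected false-positive rate per neuron-stimulus pair remains α = 0.05. The independence assumption is a simplification, since tests that share the same baseline pool are correlated.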

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Results:

      "are represented more than represented by chance" seems to have a misplaced word

      True. Thanks. Corrected.

      Figure 1D:

      a) Indicate the meaning of the number that appears in the top left for each unit (10, 5, 40, 5, 5) (I'm guessing it's the vertical scale for the PSTH, but best to spell it out explicitly.)

      This information has been added.

      b) "The red vertical line indicates stimulus application": is it the application of the chemical stimulus or SNT shock?

      Please see our answer to c

      c) "For unit 2, time 0 indicate stimulus application, as in this case, responses began after stimulus application, prior to stimulation." First, the meaning of time 0 for the other units is not clearly specified (we infer that unit 2 is an exception, but we don't know what most of them mean). Second, it seems as if the response (?) to ICR naive begins even before stimulus application.

      This issue was also mentioned above as the 2nd weakness raised by this reviewer. To explain the meaning of the red lines, and resolve this confusion, we revised the figure caption text to indicate that for all units (except the former unit 2) time 0 indicates SNT stimulation. We also changed the order of the unit examples, placing the former unit 2 in the rightmost position. It is true that for this unit, there is a firing rate change prior to stimulus application, which actually appears as rate attenuation following stimulus application. In this specific case, we consider this activity to be “noise”, and note that this neuron-stimulus combination would not be classified as a response (since there is no consistent change across stimulus presentations).

      As a note, while reviewing this figure, we noted an error. We had previously written that the ITI was 10 s, whereas it was actually 18 s long. This has been corrected in the figure and in the text.

      Figure 2B:

      "The mean error due to the reduced 2-D representation is 0.29 (arbitrary units)." This is unclear. MDS is often described in terms of % of variance explained, is that what this means? If so, the units are not arbitrary; otherwise, it's unclear whether specifying a value with arbitrary units adds any value.

      This is a very good point, and we thank the reviewer for identifying this mistake. The units are not arbitrary! They are units of correlation distance. We have now added a scale bar (a square) to panel 2B to indicate a distance of 0.1. Following this comment, we also calculated the mean of the original distances, and noted the ratio between the mean absolute error (due to considering only two dimensions) and the mean original distance. We also now report the values of the first two eigenvalues. Specifically, we now write:

      “Note that, like all dimensionally reduced representations, the representation in Fig. 2B is an approximation. Here, the first two eigenvalues account for 44.6% of the variance of the original distances (30.4% and 14.2% for the first and second dimensions, respectively). Another way to evaluate the representation is via the mean error due to the reduced 2-D representation. Here, it is 0.29, whereas the mean of the original distances is 0.73.”
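      The quantities reported in this passage (the share of variance carried by the first two eigenvalues, and the mean absolute error introduced by keeping two dimensions) can be computed from a distance matrix with classical (Torgerson) MDS. This is a sketch under assumptions: the response does not state which MDS variant was used, the variance share here is taken relative to the sum of the positive eigenvalues, and the example points are synthetic.

```python
import numpy as np

def classical_mds(dist, n_dims=2):
    """Classical MDS: coordinates, eigenvalue shares, and mean absolute distance error."""
    n = dist.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    b = -0.5 * j @ (dist ** 2) @ j               # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    coords = eigvecs[:, :n_dims] * np.sqrt(np.maximum(eigvals[:n_dims], 0))
    shares = eigvals[:n_dims] / eigvals[eigvals > 0].sum()
    # Compare distances in the reduced space with the original distances, pair by pair.
    reduced = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    iu = np.triu_indices(n, k=1)
    mean_abs_error = float(np.abs(reduced[iu] - dist[iu]).mean())
    return coords, shares, mean_abs_error

# Hypothetical input: Euclidean distances among 11 random 3-D points.
rng = np.random.default_rng(0)
pts = rng.normal(size=(11, 3))
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
coords, shares, err = classical_mds(dist)
print(np.round(shares, 3), round(err, 3), round(float(dist[np.triu_indices(11, 1)].mean()), 3))
```

      For correlation distances, the same function would receive the matrix of 1 − r values, and the ratio of the mean error to the mean original distance can be read off the last two printed numbers.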

      Figure 3A:

      a) There is a truncated label (or something) above the panel letter.

      Thanks. Corrected. This was part of the “Figure” label.

      b) The graphic for the "adjusted pattern" also fits the criterion of the "pattern": for example, in the top row the activity for ICR is still higher than for any other stimulus, thus fulfilling the criterion of a "pattern" and not just an "adjusted pattern."

      That was not our intention. An adjusted pattern does not necessarily fulfill the (non-adjusted) “pattern” (while the opposite is true). We have now revised the rightmost panel in Figure 3A, adding two “&” symbols to indicate that all three conditions must be fulfilled, and, in an attempt to provide a more intuitive representation, applied a different background to denote stimuli whose responses are irrelevant. We also changed the terms in the legend within the panel to make them more accurate (thus, “strong activity” was changed to “stronger responses”). In addition, we revised the text and figure legends to better clarify these definitions.

      Figure 3B:

      I'm assuming that the columns of the heatmap correspond to different urine stimuli, and that the color is normalized firing rate. But readers should not have to guess.

      True, and agreed. We added legends to clarify this.

      Figure 4B:

      The caption should mention that the pairwise measures are between the stimulus columns of panel A.

      We revised the caption to indicate this. Note that we also added two additional panels to this figure.

      Figure 5A&B:

      Instead of a multiple-comparisons correction, it seems likely to be better to use a 2-way ANOVA. At a minimum, the nature of the multiple-comparisons correction needs to be specified (many are conservative, but they differ in the extent of how conservative they are).

      We now write in the text that we used a Bonferroni correction (this information previously appeared only in the caption). We also found an error in the caption. We previously wrote that we used a binomial exact test for both panels A and B. However, only the data in panel A was calculated with a binomial exact test. The data in panel B was calculated with a one-way ANOVA.

      We now also applied a 2-way ANOVA to response magnitudes (i.e., panel B). We find a main effect of stimulus, but not of state, and no interaction between the two. This is consistent with our previous analyses. This analysis is now included in the text. We thank the reviewer for this suggestion.
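      For completeness, a dependency-free sketch of a Bonferroni-corrected exact binomial test, the combination used for Figure 5A, is given below. What exactly is tested in that panel is not restated in this response, so the null proportion p0 and the counts in the example are hypothetical placeholders. The two-sided p-value follows the common convention of summing the probabilities of all outcomes that are no more likely than the observed one.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_test_two_sided(k, n, p0):
    """Exact two-sided binomial test (sum of outcomes no more likely than the observed one)."""
    observed = binom_pmf(k, n, p0)
    tolerance = 1 + 1e-7  # treat near-equal probabilities as ties
    total = sum(
        binom_pmf(i, n, p0) for i in range(n + 1)
        if binom_pmf(i, n, p0) <= observed * tolerance
    )
    return min(1.0, total)

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical: responsive-neuron counts for three stimuli out of 241 neurons,
# each tested against a hypothetical reference proportion of 0.2.
raw = [binom_test_two_sided(k, 241, 0.2) for k in (30, 48, 70)]
print([round(p, 4) for p in bonferroni(raw)])
```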

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the Reviewers for their thorough reading and thoughtful feedback. Below, we address each of the concerns raised in the public reviews, and outline our revisions that aim to further clarify and strengthen the manuscript.

      In our response, we clarify our conceptualization of elasticity as a dimension of controllability, formalize it within an information-theoretic framework, and demonstrate that controllability and its elasticity are partially dissociable. Furthermore, we provide clarifications and additional modeling results showing that our experimental design and modeling approach are well-suited to dissociating elasticity inference from more general learning processes, and are not inherently biased toward overestimating elasticity. Finally, we clarify the advantages and disadvantages of our canonical correlation analysis (CCA) approach for identifying latent relationships between multidimensional data sets, and provide additional analyses that strengthen the link between elasticity estimation biases and a specific psychopathology profile.

      Public Reviews:

      Reviewer 1 (Public review): 

      This research takes a novel theoretical and methodological approach to understanding how people estimate the level of control they have over their environment, and how they adjust their actions accordingly. The task is innovative and both it and the findings are well-described (with excellent visuals). They also offer thorough validation for the particular model they develop. The research has the potential to theoretically inform the understanding of control across domains, which is a topic of great importance.

      We thank the Reviewer for their favorable appraisal and valuable suggestions, which have helped clarify and strengthen the study’s conclusions.

      An overarching concern is that this paper is framed as addressing resource investments across domains that include time, money, and effort, and the introductory examples focus heavily on effort-based resources (e.g., exercising, studying, practicing). The experiments, though, focus entirely on the equivalent of monetary resources - participants make discrete actions based on the number of points they want to use on a given turn. While the same ideas might generalize to decisions about other kinds of resources (e.g., if participants were having to invest the effort to reach a goal), this seems like the kind of speculation that would be better reserved for the Discussion section rather than using effort investment as a means of introducing a new concept (elasticity of control) that the paper will go on to test.

      We thank the Reviewer for pointing out a lack of clarity regarding the kinds of resources tested in the present experiment. Investing additional resources in the form of extra tickets did not only require participants to pay more money. It also required them to invest additional time (since each additional ticket meant making another attempt to board the vehicle, extending the duration of the trial) and attentional effort (since every attempt required precisely timing a spacebar press as the vehicle crossed the screen). Given this involvement of money, time, and effort, we believe it would be imprecise to present the study as concerning monetary resources in particular. That said, we agree with the Reviewer that results might differ depending on the resource type that the experiment emphasizes or that the participant weighs most. Thus, we now clarify the kinds of resources the experiment involved (lines 87-97):

      “To investigate how people learn the elasticity of control, we allowed participants to invest different amounts of resources in attempting to board their preferred vehicle. Participants could purchase one (40 coins), two (60 coins), or three tickets (80 coins) or otherwise walk for free to the nearest location. Participants were informed that a single ticket allowed them to board only if the vehicle stopped at the station, while additional tickets provided extra chances to board even after the vehicle had left the platform. For each additional ticket, the chosen vehicle appeared moving from left to right across the screen, and participants could attempt to board it by pressing the spacebar when it reached the center of the screen. Thus, each additional ticket could increase the chance of boarding but also required a greater investment of resources—decreasing earnings, extending the trial duration, and demanding attentional effort to precisely time a button press when attempting to board.”

      In addition, in the revised discussion, we now highlight the open question of whether inferences concerning the elasticity of control generalize across different resource domains (lines 341-348):

      “Another interesting possibility is that individual elasticity biases vary across different resource types (e.g., money, time, effort). For instance, a given individual may assume that controllability tends to be highly elastic to money but inelastic to effort. Although the task incorporated multiple resource types (money, time, and attentional effort), the results may differ depending on the type of resources on which the participant focuses. Future studies could explore this possibility by developing tasks that separately manipulate elasticity with respect to different resource types. This would clarify whether elasticity biases are domain-specific or domain-general, and thus elucidate their impact on everyday decision-making.”

      Setting aside the framing of the core concepts, my understanding of the task is that it effectively captures people's estimates of the likelihood of achieving their goal (Pr(success)) conditional on a given investment of resources. The ground truth across the different environments varies such that this function is sometimes flat (low controllability), sometimes increases linearly (elastic controllability), and sometimes increases as a step function (inelastic controllability). If this is accurate, then it raises two questions.

      First, on the modeling front, I wonder if a suitable alternative to the current model would be to assume that the participants are simply considering different continuous functions like these and, within a Bayesian framework, evaluating the probabilistic evidence for each function based on each trial's outcome. This would give participants an estimate of the marginal increase in Pr(success) for each ticket, and they could then weigh the expected value of that ticket choice (Pr(success)*150 points) against the marginal increase in point cost for each ticket. This should yield similar predictions for optimal performance (e.g., opt-out for lower controllability environments, i.e., flatter functions), and the continuous nature of this form of function approximation also has the benefit of enabling tests of generalization to predict changes in behavior if there was, for instance, changes in available tickets for purchase (e.g., up to 4 or 5) or changes in ticket prices. Such a model would of course also maintain a critical role for priors based on one's experience within the task as well as over longer timescales, and could be meaningfully interpreted as such (e.g., priors related to the likelihood of success/failure and whether one's actions influence these). It could also potentially reduce the complexity of the model by replacing controllability-specific parameters with multiple candidate functions (presumably learned through past experience, and/or tuned by experience in this task environment), each of which is being updated simultaneously.

      We thank the Reviewer for suggesting this interesting alternative modeling approach. We agree that a Bayesian framework evaluating different continuous functions could offer advantages, particularly in its ability to generalize to other ticket quantities and prices. To test the Reviewer's suggestion, we implemented a Bayesian model where participants continuously estimate both controllability and its elasticity as a mixture of three archetypal functions mapping ticket quantities to success probabilities. The flat function provides no control regardless of how many tickets are purchased (corresponding to low controllability). The step function provides the same level of control as long as at least one ticket is purchased (inelastic controllability). The linear function increases control proportionally with each additional ticket (elastic controllability). The model computes the likelihood that each of the functions produced each new observation, and accordingly updates its beliefs. Using these beliefs, the model estimates the probability of success for purchasing each number of tickets, allowing participants to weigh expected control against increasing ticket costs. Despite its theoretical advantages for generalization to different ticket quantities, this continuous function approximation model performed significantly worse than our elastic controllability model (log Bayes Factor > 4100 on combined datasets). We surmise that the main advantage of the elastic controllability model is that it does not assume a linear increase in control as a function of resource investment. Even though this linear relationship was actually true in our experiment, and is required for generalizing to other ticket quantities, it likely does not match what participants were doing. We present these findings in a new section ‘Testing alternative methods’ (lines 686-701):

      “We next examined whether participant behavior would be better characterized as a continuous function approximation rather than the discrete inferences in our model. To test this, we implemented a Bayesian model where participants continuously estimate both controllability and its elasticity as a mixture of three archetypal functions mapping ticket quantities to success probabilities. The flat function provides no control regardless of how many tickets are purchased (corresponding to low controllability). The step function provides full control as long as at least one ticket is purchased (inelastic controllability). The linear function linearly increases control with the number of extra tickets (i.e., 0%, 50%, and 100% control for 1, 2, and 3 tickets, respectively; elastic controllability). The model computes the likelihood that each of the functions produced each new observation, and accordingly updates its beliefs. Using these beliefs, the model estimates the probability of success for purchasing each number of tickets, allowing participants to weigh expected control against increasing ticket costs. Despite its theoretical advantages for generalization to different ticket quantities, this continuous function approximation model performed significantly worse than the elastic controllability model (log Bayes Factor > 4100 on combined datasets), suggesting that participants did not assume that control increases linearly with resource investment.”
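      A minimal sketch of this function-approximation alternative follows. It keeps a posterior over the three archetypal functions, updates it after each boarding attempt with a Bernoulli likelihood, and picks the option (walk, or 1-3 tickets) with the highest expected net value. The ticket prices and the 0%/50%/100% linear profile come from the text above; the reward value, the payoff for walking, and the probability floor added to every function (so that no outcome is impossible) are hypothetical placeholders, and this is not the fitted model reported in the manuscript.

```python
import numpy as np

TICKET_COST = {1: 40, 2: 60, 3: 80}  # from the task description
REWARD = 150                          # hypothetical payoff for reaching the goal
WALK_VALUE = 0.0                      # hypothetical payoff for opting out
EPS = 0.05                            # hypothetical floor keeping probabilities in (0, 1)

# Control delivered by each archetype for 1, 2, and 3 tickets.
ARCHETYPES = {
    "flat": np.array([0.0, 0.0, 0.0]),    # low controllability
    "step": np.array([1.0, 1.0, 1.0]),    # inelastic controllability
    "linear": np.array([0.0, 0.5, 1.0]),  # elastic controllability
}

def success_prob(control):
    return EPS + (1 - 2 * EPS) * control

def update(posterior, tickets, success):
    """Bayes' rule over the archetypes after one outcome with `tickets` tickets."""
    new = {}
    for name, control in ARCHETYPES.items():
        p = success_prob(control[tickets - 1])
        new[name] = posterior[name] * (p if success else 1 - p)
    total = sum(new.values())
    return {name: v / total for name, v in new.items()}

def choose(posterior):
    """Return the option with the highest expected net value (0 = walk) and all values."""
    values = {0: WALK_VALUE}
    for tickets, cost in TICKET_COST.items():
        p = sum(posterior[n] * success_prob(c[tickets - 1]) for n, c in ARCHETYPES.items())
        values[tickets] = p * REWARD - cost
    return max(values, key=values.get), values

posterior = {name: 1 / 3 for name in ARCHETYPES}
for tickets, success in [(1, False), (3, True), (2, True)]:  # hypothetical outcomes
    posterior = update(posterior, tickets, success)
best, values = choose(posterior)
print({k: round(float(v), 3) for k, v in posterior.items()}, best)
```

      Because every archetype is evaluated at each ticket count, the same posterior could in principle be queried for ticket counts or prices not seen in the task, which is the generalization advantage the Reviewer describes.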

      We also refer to this analysis in our updated discussion (lines 326-339):

      “Second, future models could enable generalization to levels of resource investment not previously experienced. For example, controllability and its elasticity could be jointly estimated via function approximation that considers control as a function of invested resources. Although our implementation of this model did not fit participants’ choices well (see Methods), other modeling assumptions or experimental designs may offer a better test of this idea.”

      Second, if the reframing above is apt (regardless of the best model for implementing it), it seems like the taxonomy being offered by the authors risks a form of "jangle fallacy," in particular by positing distinct constructs (controllability and elasticity) for processes that ultimately comprise aspects of the same process (estimation of the relationship between investment and outcome likelihood). Which of these two frames is used doesn't bear on the rigor of the approach or the strength of the findings, but it does bear on how readers will digest and draw inferences from this work. It is ultimately up to the authors which of these they choose to favor, but I think the paper would benefit from some discussion of a common-process alternative, at least to prevent too strong of inferences about separate processes/modes that may not exist. I personally think the approach and findings in this paper would also be easier to digest under a common-construct approach rather than forcing new terminology but, again, I defer to the authors on this.

      We acknowledge the Reviewer's important point about avoiding a potential "jangle fallacy." We entirely agree with the Reviewer that elasticity and controllability inferences are not distinct processes. Specifically, we view resource elasticity as a dimension of controllability, hence the name of our ‘elastic controllability’ model. In response to this and other Reviewers’ comments, in the revised manuscript, we now offer a formal definition of elasticity as the reduction in uncertainty about controllability due to knowing the amount of resources available to the agent (lines 16-20; see further details in response to Reviewer 3 below).  

      With respect to how this conceptualization is expressed in the modeling, we note that the representation in our model of maximum controllability and its elasticity via different variables is analogous to how a distribution may be represented by separate mean and variance parameters. Even the model suggested by the Reviewer required a dedicated variable representing elastic controllability, namely the probability of the linear controllability function. More generally, a single-process account allows different aspects of that process to be differently biased (e.g., one can have an accurate estimate of the mean of a distribution but overestimate its variance). Therefore, our characterization of distinct elasticity and controllability biases (or, to put it more accurately, 'elasticity of controllability bias' and 'maximum controllability bias') is consistent with a common construct account.

      To avoid misunderstandings, we have now modified the text to clarify that we view elasticity as a dimension of controllability that can only be estimated in conjunction with controllability. Here are a few examples:

      Lines 21-28: “While only controllable environments can be elastic, the inverse is not necessarily true – controllability can be high, yet inelastic to invested resources – for example, choosing between bus routes affords equal control over commute time to anyone who can afford the basic fare (Figure 1; Supplementary Note 1). That said, since all actions require some resource investment, no controllable environment is completely inelastic when considering the full spectrum of possible agents, including those with insufficient resources to act (e.g., those unable to purchase a bus fare or pay for a fixed-price meal).”

      Lines 45-47: “Experimental paradigms to date have conflated overall controllability and its elasticity, such that controllability was either low or elastic[16-20]. The elasticity of control, however, must be dissociated from overall controllability to accurately diagnose mismanagement of resources.”

      Lines 70-72: “These findings establish elasticity as a crucial dimension of controllability that guides adaptive behavior, and a computational marker of control-related psychopathology.”

      Lines 87-88: “To investigate how people learn the elasticity of control, we allowed participants to invest different amounts of resources in attempting to board their preferred vehicle.”

      Reviewer 2 (Public review):

      This research investigates how people might value different factors that contribute to controllability in a creative and thorough way. The authors use computational modeling to try to dissociate "elasticity" from "overall controllability," and find some differential associations with psychopathology. This was a convincing justification for using modeling above and beyond behavioral output and yielded interesting results. Interestingly, the authors conclude that these findings suggest that biased elasticity could distort agency beliefs via maladaptive resource allocation. Overall, this paper reveals some important findings about how people consider components of controllability.

      We appreciate the Reviewer's positive assessment of our findings and computational approach to dissociating elasticity and overall controllability.

      The primary weakness of this research is that it is not entirely clear what is meant by "elastic" and "inelastic" and how these constructs differ from existing considerations of various factors/calculations that contribute to perceptions of and decisions about controllability. I think this weakness is primarily an issue of framing, where it's not clear whether elasticity is, in fact, theoretically dissociable from controllability. Instead, it seems that the elements that make up "elasticity" are simply some of the many calculations that contribute to controllability. In other words, an "elastic" environment is inherently more controllable than an "inelastic" one, since both environments might have the same level of predictability, but in an "elastic" environment, one can also partake in additional actions to have additional control over achieving the goal (i.e., expend effort, money, time).

      We thank the Reviewer for highlighting the lack of clarity about the concept of elasticity. We first clarify that elasticity cannot be entirely dissociated from controllability because it is a dimension of controllability. If no controllability is afforded, then there cannot be elasticity or inelasticity. This is why in describing the experimental environments, we only label high-controllability, but not low-controllability, environments as ‘elastic’ or ‘inelastic’. For further details on this conceptualization of elasticity, and associated revisions of the text, see our response above to Reviewer 1. 

      Second, we now clarify that controllability can also be computed without knowing the amount of resources the agent is able and willing to invest, for instance by assuming infinite resources available or a particular distribution of resource availabilities. However, knowing the agent’s available resources often reduces uncertainty concerning controllability. This reduction in uncertainty is what we define as elasticity. Since any action requires some resources, this means that no controllable environment is entirely inelastic if we also consider agents that do not have enough resources to commit any action. However, even in this case, environments can differ in the degree to which they are elastic. For further details on this formal definition, and associated revisions of the text, see our response to Reviewer 3.

      Importantly, whether an environment is more or less elastic does not fully determine whether it is more or less controllable. In particular, environments can be more controllable yet less elastic. This is true even if we allow that investing different levels of resources (i.e., purchasing 0, 1, 2, or 3 tickets) constitute different actions, in conjunction with participants’ vehicle choices. Below, we show this using two existing definitions of controllability. 

      Definition 1, reward-based controllability[1]: If control is defined as the fraction of available reward that is controllably achievable, and we assume all participants are in principle willing and able to invest 3 tickets, controllability can be computed in the present task as:

      where P(S' = goal ∣ 𝑆, 𝐴, 𝐶) is the probability of reaching the treasure from present state 𝑆 when taking action 𝐴 and investing 𝐶 resources in executing the action. In any of the task environments, the probability of reaching the goal is maximized by purchasing 3 tickets (𝐶 = 3) and choosing the vehicle that leads to the goal (𝐴 = correct vehicle). Conversely, the probability of reaching the goal is minimized by purchasing 3 tickets (𝐶 = 3) and choosing the vehicle that does not lead to the goal (𝐴 = wrong vehicle). This calculation is thus entirely independent of elasticity, since it only considers what would be achieved by maximal resource investment, whereas elasticity consists of the reduction in controllability that would arise if the maximal available 𝐶 is reduced. Consequently, any environment where the maximum available control is higher yet varies less with resource investment would be more controllable and less elastic.

      Note that if we also account for ticket costs in calculating reward, this will only reduce the fraction of achievable reward and thus the calculated control in elastic environments.   

      Definition 2, information-theoretic controllability[2]: Here controllability is defined as the reduction in outcome entropy due to knowing which action is taken:

      I(S'; A, C | S) = H(S'|S) – H(S'|S, A, C)

      where H(S'|S) is the conditional entropy of the distribution of outcomes S' given the present state S, and H(S'|S, A, C) is the conditional entropy of the outcome given the present state, action, and resource investment. 

      To compare controllability, we consider two environments with the same maximum control:

      • Inelastic environment: If the correct vehicle is chosen, there is a 100% chance of reaching the goal state with 1, 2, or 3 tickets. Thus, out of 7 possible action-resource investment combinations, three deterministically lead to the goal state (≥1 tickets and correct vehicle choice), three never lead to it (≥1 tickets and wrong vehicle choice), and one (0 tickets) leads to it 20% of the time (since walking leads to the treasure on 20% of trials).

      • Elastic environment: If the correct vehicle is chosen, the probability of boarding it is 0% with 1 ticket, 50% with 2 tickets, and 100% with 3 tickets. Thus, out of 7 possible action-resource investment combinations, one deterministically leads to the goal state (3 tickets and correct vehicle choice), one never leads to it (3 tickets and wrong vehicle choice), one leads to it 60% of the time (2 tickets and correct vehicle choice: 50% boarding + 50% × 20% when failing to board), one leads to it 10% of the time (2 tickets and wrong vehicle choice: 50% × 20% when failing to board), and three lead to it 20% of the time (0-1 tickets).

      Here we assume a uniform prior over actions, which renders the information-theoretic definition of controllability equal to another definition termed ‘instrumental divergence’[3,4]. We note that changing the uniform prior assumption would change the results for the two environments, but that would not change the general conclusion that there can be environments that are more controllable yet less elastic. 

      Step 1: Calculating H(S'|S)

      For the inelastic environment:

      P(goal) = (3 × 100% + 3 × 0% + 1 × 20%)/7 = .46, P(non-goal) = .54

      H(S'|S) = – [.46 × log<sub>2</sub>(.46) + .54 × log<sub>2</sub>(.54)] = 1 bit

      For the elastic environment:

      P(goal) = (1 × 100% + 1 × 0% + 1 × 60% + 1 × 10% + 3 × 20%)/7 = .33, P(non-goal) = .67

      H(S'|S) = – [.33 × log<sub>2</sub>(.33) + .67 × log<sub>2</sub>(.67)] = .91 bits

      Step 2: Calculating H(S'|S, A, C)

      Inelastic environment: Six action-resource investment combinations have deterministic outcomes entailing zero entropy, whereas investing 0 tickets has a probabilistic outcome (20%). The entropy for 0 tickets is: H(S'|C = 0) = – [.2 × log<sub>2</sub> (.2) + .8 × log<sub>2</sub> (.8)] = .72 bits. Since this action-resource investment combination is chosen with probability 1/7, the total conditional entropy is approximately .10 bits.

      Elastic environment: Two action-resource investment combinations have deterministic outcomes (3 tickets with correct/wrong vehicle), whereas the other five have probabilistic outcomes:

      2 tickets and correct vehicle (60% success):

      H(S'|A = correct, C = 2) = – [.6 × log<sub>2</sub> (.6) + .4 × log<sub>2</sub> (.4)] = .97 bits

      2 tickets and wrong vehicle (10% success):

      H(S'|A = wrong, C = 2) = – [.1 × log<sub>2</sub> (.1) + .9 × log<sub>2</sub> (.9)] = .47 bits

      0-1 tickets (20% success):

      H(S'|C = 0-1) = – [.2 × log<sub>2</sub> (.2) + .8 × log<sub>2</sub> (.8)] = .72 bits

      Thus the total conditional entropy of the elastic environment is: H(S'|S, A, C) = (1/7) × .97 + (1/7) × .47 + (3/7) × .72 = .52 bits

      Step 3: Calculating I(S'; A, C | S)

      Inelastic environment: I(S'; A, C | S) = H(S'|S) – H(S'|S, A, C) = 1 – 0.1 = .9 bits 

      Elastic environment: I(S'; A, C | S) = H(S'|S) – H(S'|S, A, C) = .91 – .52 = .39 bits

      Thus, the inelastic environment offers higher information-theoretic controllability (.9 bits) compared to the elastic environment (.39 bits). 

      Of note, even if each combination of cost and success/failure to reach the goal is defined as a distinct outcome, then information-theoretic controllability is higher for the inelastic (2.81 bits) than for the elastic (2.30 bits) environment. These calculations are now included in the Supplementary materials (Supplementary Note 1). 
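The arithmetic in Steps 1-3, and the variant where each cost-outcome pair is a distinct outcome, can be checked with a short Python script. As in the text, it assumes a uniform prior over the seven action-resource investment combinations and the goal probabilities listed for each environment. The tuple encoding of the environments and the function names are illustrative only.

```python
from math import log2

# (tickets, vehicle, P(goal)) for each of the 7 action-resource combinations.
INELASTIC = [
    (0, None, 0.2),  # walking reaches the treasure 20% of the time
    (1, "correct", 1.0), (2, "correct", 1.0), (3, "correct", 1.0),
    (1, "wrong", 0.0), (2, "wrong", 0.0), (3, "wrong", 0.0),
]
ELASTIC = [
    (0, None, 0.2),
    (1, "correct", 0.2), (1, "wrong", 0.2),  # 0% boarding, so walking odds
    (2, "correct", 0.6), (2, "wrong", 0.1),  # 50% boarding
    (3, "correct", 1.0), (3, "wrong", 0.0),  # 100% boarding
]


def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * log2(p) for p in probs if p > 0)


def controllability(env, outcome_includes_cost=False):
    """I(S'; A, C | S) = H(S'|S) - H(S'|S, A, C) under a uniform prior."""
    n = len(env)
    marginal = {}
    for tickets, _vehicle, p_goal in env:
        for reached, p in ((True, p_goal), (False, 1 - p_goal)):
            # Optionally treat each (cost, success/failure) pair as its own outcome.
            key = (tickets, reached) if outcome_includes_cost else reached
            marginal[key] = marginal.get(key, 0.0) + p / n
    h_marginal = entropy(marginal.values())
    # Cost is fixed by C, so only success/failure is uncertain given (A, C).
    h_conditional = sum(entropy([p, 1 - p]) for _, _, p in env) / n
    return h_marginal - h_conditional


if __name__ == "__main__":
    for name, env in (("inelastic", INELASTIC), ("elastic", ELASTIC)):
        print(name,
              round(controllability(env), 2),
              round(controllability(env, outcome_includes_cost=True), 2))
```

Carrying full precision gives about .89 bits for the inelastic environment and .40 bits for the elastic one. The .39 bits in Step 3 reflects rounding of the intermediate entropies. The cost-outcome variant reproduces the 2.81 and 2.30 bits given above.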

      In sum, for both definitions of controllability, we see that environments can be more elastic yet less controllable. We have also revised the manuscript to clarify this distinction (lines 21-28):

      “While only controllable environments can be elastic, the inverse is not necessarily true – controllability can be high, yet inelastic to invested resources – for example, choosing between bus routes affords equal control over commute time to anyone who can afford the basic fare (Figure 1; Supplementary Note 1). That said, since all actions require some resource investment, no controllable environment is completely inelastic when considering the full spectrum of possible agents, including those with insufficient resources to act (e.g., those unable to purchase a bus fare or pay for a fixed-price meal).”

      Reviewer 3 (Public review):

      A bias in how people infer the amount of control they have over their environment is widely believed to be a key component of several mental illnesses including depression, anxiety, and addiction. Accordingly, this bias has been a major focus in computational models of those disorders. However, all of these models treat control as a unidimensional property, roughly, how strongly outcomes depend on action. This paper proposes---correctly, I think---that the intuitive notion of "control" captures multiple dimensions in the relationship between action and outcome. In particular, the authors propose a dimension capturing the degree to which outcomes depend on how much *effort* we exert, calling this dimension the "elasticity of control". They additionally propose that this dimension (rather than the more holistic notion of controllability) may be specifically impaired in certain types of psychopathology. This idea thus has the potential to change how we think about mental disorders in a substantial way, and could even help us better understand how healthy people navigate challenging decision-making problems.

      Unfortunately, my view is that neither the theoretical nor empirical aspects of the paper really deliver on that promise. In particular, most (perhaps all) of the interesting claims in the paper have weak empirical support.

      We appreciate the Reviewer's thoughtful engagement with our research and recognition of the potential significance of distinguishing between different dimensions of control in understanding psychopathology. We believe that all the Reviewer’s comments can be addressed with clarifications or additional analyses, as detailed below.  

      Starting with theory, the elasticity idea does not truly "extend" the standard control model in the way the authors suggest. The reason is that effort is simply one dimension of action. Thus, the proposed model ultimately grounds out in how strongly our outcomes depend on our actions (as in the standard model). Contrary to the authors' claims, the elasticity of control is still a fixed property of the environment. Consistent with this, the computational model proposed here is a learning model of this fixed environmental property. The idea is still valuable, however, because it identifies a key dimension of action (namely, effort) that is particularly relevant to the notion of perceived control. Expressing the elasticity idea in this way might support a more general theoretical formulation of the idea that could be applied in other contexts. See Huys & Dayan (2009), Zorowitz, Momennejad, & Daw (2018), and Gagne & Dayan (2022) for examples of generalizable formulations of perceived control.

      We thank the Reviewer for the suggestion that we formalize our concept of elasticity to resource investment, which we agree is a dimension of action. We first note that we have not argued against the claim that elasticity is a fixed property of the environment. We surmise the Reviewer might have misread our statement that “controllability is not a fixed property of the environment”. The latter statement is motivated by the observation that controllability is often higher for agents that can invest more resources (e.g., a richer person can buy more things). We clarify this in our revision of the manuscript in lines 8-15 (changes in bold): 

      “The degree of control we possess over our environment, however, may itself depend on the resources we are willing and able to invest. For example, the control a biker has over their commute time depends on the power they are willing and able to invest in pedaling. In this respect, a highly trained biker would typically have more control than a novice. Likewise, the control a diner in a restaurant has over their meal may depend on how much money they have to spend. In such situations, controllability is not fixed but rather elastic to available resources (i.e., in the same sense that supply and demand may be elastic to changing prices[14]).”

      To formalize elasticity, we build on Huys & Dayan’s definition of controllability[1] as the fraction of reward that is controllably achievable, 𝜒 (though using information-theoretic definitions[2,3] would work as well). To the extent that this fraction depends on the amount of resources the agent is able and willing to invest (max 𝐶), this formulation can be probabilistically computed without information about the particular agent involved, specifically, by assuming a certain distribution of agents with different amounts of available resources. This would result in a probability distribution over 𝜒. Elasticity can thus be defined as the amount of information obtained about controllability due to knowing the amount of resources available to the agent: I(𝜒; max 𝐶). We have added this formal definition to the manuscript (lines 15-20):

      “To formalize how elasticity relates to control, we build on an established definition of controllability as the fraction of reward that is controllably achievable[15], 𝜒. Uncertainty about this fraction could result from uncertainty about the amount of resources that the agent is able and willing to invest, 𝑚𝑎𝑥 𝐶. Elasticity can thus be defined as the amount of information obtained about controllability by knowing the amount of available resources: 𝐼(𝜒; 𝑚𝑎𝑥 𝐶).”
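To make this definition concrete, the following Python sketch computes 𝐼(𝜒; 𝑚𝑎𝑥 𝐶) for the elastic and inelastic environments described in our response to Reviewer 2. Two simplifications are assumptions made only for this illustration. First, agents' maximum investment is uniformly distributed over 1, 2, or 3 tickets. Second, 𝜒 for a given budget is proxied by the difference between the goal probabilities of the correct and wrong vehicle when the full budget is spent. This proxy replaces the exact fraction-of-reward formula. All function names are hypothetical.

```python
from math import log2

WALK_SUCCESS = 0.2  # chance of reaching the treasure without boarding


def chi_for_budget(boarding_prob, max_c):
    """Hypothetical proxy for controllability given a budget of max_c tickets:
    P(goal | correct vehicle) - P(goal | wrong vehicle)."""
    b = boarding_prob[max_c]
    p_correct = b + (1 - b) * WALK_SUCCESS
    p_wrong = (1 - b) * WALK_SUCCESS
    return round(p_correct - p_wrong, 10)  # round so equal values share a key


def mutual_information(joint):
    """I(X; Y) in bits from a dict {(x, y): p(x, y)}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


def elasticity(boarding_prob, budget_dist):
    """I(chi; max C) for an assumed distribution of agents' budgets."""
    joint = {}
    for max_c, p in budget_dist.items():
        key = (chi_for_budget(boarding_prob, max_c), max_c)
        joint[key] = joint.get(key, 0.0) + p
    return mutual_information(joint)


if __name__ == "__main__":
    budgets = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}  # assumed uniform
    elastic = {1: 0.0, 2: 0.5, 3: 1.0}  # boarding probability per ticket count
    inelastic = {1: 1.0, 2: 1.0, 3: 1.0}
    print(elasticity(elastic, budgets), elasticity(inelastic, budgets))
```

In this toy setting the elastic environment yields log<sub>2</sub>3 ≈ 1.58 bits, because each budget implies a different level of control. The inelastic environment yields 0 bits, because every budget of at least one ticket affords the same control. Adding agents who cannot afford a ticket (a budget of 0 with boarding probability 0) would make the inelastic environment slightly elastic too. This matches the point that no controllable environment is completely inelastic across all possible agents.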

      Turning to experiment, the authors make two key claims: (1) people infer the elasticity of control, and (2) individual differences in how people make this inference are importantly related to psychopathology. Starting with claim 1, there are three sub-claims here; implicitly, the authors make all three. (1A) People's behavior is sensitive to differences in elasticity, (1B) people actually represent/track something like elasticity, and (1C) people do so naturally as they go about their daily lives. The results clearly support 1A. However, 1B and 1C are not supported. Starting with 1B, the experiment cannot support the claim that people represent or track elasticity because the effort is the only dimension over which participants can engage in any meaningful decision-making (the other dimension, selecting which destination to visit, simply amounts to selecting the location where you were just told the treasure lies). Thus, any adaptive behavior will necessarily come out in a sensitivity to how outcomes depend on effort. More concretely, any model that captures the fact that you are more likely to succeed in two attempts than one will produce the observed behavior. The null models do not make this basic assumption and thus do not provide a useful comparison.

      We appreciate the Reviewer's critical analysis of our claims regarding elasticity inference, which, as detailed below, has led to an important new analysis that strengthens the study’s conclusions. However, we respectfully disagree with two of the Reviewer’s arguments. First, resource investment was not the only meaningful decision dimension in our task, since participants also needed to choose the correct vehicle to get to the right destination. That this was not trivial is evidenced by our exclusion of over 8% of participants who made incorrect vehicle choices more than 10% of the time. Included participants also occasionally erred in this choice (mean error rate = 3%, range [0-10%]; now specified in lines 363-366).

      Second, the experimental task cannot be solved well by a model that simply tracks how outcomes depend on effort because 20% of the time participants reached the treasure despite failing to board their vehicle of choice. In such cases, reward outcomes and control were decoupled. Participants could identify when this was the case by observing the starting location (since depending on the starting location, the treasure location could have been automatically reached by walking), which was revealed together with the outcome. To determine whether participants distinguished between control-related and non-control-related reward, we have now fitted a variant of our model to the data that allows learning from each of these kinds of outcomes by means of a different free parameter. The results show that participants learned considerably more from control-related outcomes. They were thus not merely tracking outcomes, but specifically inferred when outcomes can be attributed to control. We now include this new analysis in the revised manuscript (Methods lines 648-661):

      “To ascertain that participants were truly learning latent estimates of controllability rather than simpler associations, we conducted two complementary analyses.

      First, we implemented a simple Q-learning model that directly maps ticket quantities to expected values based on reward prediction errors, without representing latent controllability. This associative model performed substantially worse than even our simple controllability model (log Bayes Factor ≥ 1854 on the combined datasets). Second, we fitted a variant of the elastic controllability model that compared learning from control-related versus chance outcomes via separate parameters (instead of assuming no learning from chance outcomes). Chance outcomes were observed by participants in the 20% of trials where reward and control were decoupled, in the sense that participants reached the treasure regardless of whether they boarded their vehicle of choice. Results showed that participants learned considerably more from control-related, as compared to chance, outcomes (mean learning ratio=1.90, CI= [1.83, 1.97]). Together, these analyses show that participants were forming latent controllability estimates rather than direct action-outcome associations.”
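An associative learner of the kind described in the first analysis can be sketched in Python as a generic delta-rule (Q-learning) update over ticket quantities. This is not the exact model fitted in the manuscript. The learning rate `alpha`, the softmax inverse temperature `beta`, and treating the outcome as reward net of ticket cost are illustrative assumptions. The learner stores one value per ticket quantity and never represents controllability or separates control-related from chance outcomes.

```python
import math

TICKET_OPTIONS = (0, 1, 2, 3)


def update_q(q, tickets, outcome, alpha=0.3):
    """Move the value of the chosen ticket quantity toward the observed outcome."""
    updated = dict(q)
    prediction_error = outcome - updated[tickets]
    updated[tickets] += alpha * prediction_error
    return updated


def choice_probabilities(q, beta=5.0):
    """Softmax over ticket quantities (max subtracted for numerical stability)."""
    top = max(q.values())
    weights = {n: math.exp(beta * (v - top)) for n, v in q.items()}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


if __name__ == "__main__":
    q = {n: 0.0 for n in TICKET_OPTIONS}
    # Hypothetical trial: 2 tickets bought, treasure reached, net outcome 1 - 0.2.
    q = update_q(q, tickets=2, outcome=0.8)
    print(q, choice_probabilities(q))
```

Because such a learner credits every rewarded trial equally, it cannot discount the 20% of trials where reward arrived without boarding. The variant model with separate learning parameters for control-related and chance outcomes is designed to detect exactly that distinction.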

      Controllability inference by itself, however, still does not suffice to explain the observed behavior. This is shown by our ‘controllability’ model, which learns to invest more resources to improve control, yet still fails to capture key features of participants’ behavior, as detailed in the manuscript. This means that explaining participants’ behavior requires a model that not only infers controllability (beyond merely outcome probability) but also assumes a priori that increased effort could enhance control. Building this a priori assumption into the model amounts to embedding within it an understanding of elasticity – the idea that control over the environment may be increased by greater resource investment.

      That being said, we acknowledge the value in considering alternative computational formulations of adaptation to elasticity, as now expressed in the revised discussion (lines 326-333; reproduced below in response to the Reviewer’s comment on updating controllability beliefs when losing with less than 3 tickets).

      For 1C, the claim that people infer elasticity outside of the experimental task cannot be supported because the authors explicitly tell people about the two notions of control as part of the training phase: "To reinforce participants' understanding of how elasticity and controllability were manifested in each planet, [participants] were informed of the planet type they had visited after every 15 trips." (line 384).

      We thank the Reviewer for highlighting this point. We agree that our experimental design does not test whether people infer elasticity spontaneously. However, our research question was whether people can distinguish between elastic and inelastic controllability. The results strongly support that they can, and this does have potential implications for behavior outside of the experimental task. Specifically, to the extent that people are aware that in some contexts additional resource investment improves control, whereas in other contexts it does not, then our results indicate that they would be able to distinguish between these two kinds of contexts through trial-and-error learning. That said, we agree that investigating whether and how people spontaneously infer elasticity is an interesting direction for future work. We have now added this to the discussion of future directions (lines 287-295):

      “Additionally, real life typically doesn’t offer the streamlined recurrence of homogenized experiences that makes learning easier in experimental tasks, nor are people systematically instructed and trained about elastic and inelastic control in each environment. These complexities introduce substantial additional uncertainty into inferences of elasticity in naturalistic settings, thus allowing more room for prior biases to exert their influences. The elasticity biases observed in the present studies are therefore likely to be amplified in real-life behavior. Future research should examine how these complexities affect judgments about the elasticity of control to better understand how people allocate resources in real life.”

      Finally, I turn to claim 2, that individual differences in how people infer elasticity are importantly related to psychopathology. There is much to say about the decision to treat psychopathology as a unidimensional construct. However, I will keep it concrete and simply note that CCA (by design) obscures the relationship between any two variables. Thus, as suggestive as Figure 6B is, we cannot conclude that there is a strong relationship between Sense of Agency and the elasticity bias---this result is consistent with any possible relationship (even a negative one). The fact that the direct relationship between these two variables is not shown or reported leads me to infer that they do not have a significant or strong relationship in the data.

      We agree that CCA is not designed to reveal the relationship between any two variables. However, the advantage of this analysis is that it pulls together information from multiple variables. Doing so does not treat psychopathology as unidimensional. Rather, it seeks a particular dimension that most strongly correlates with different aspects of task performance.

      This is especially useful for multidimensional psychopathology data because such data are often dominated by strong correlations between dimensions, whereas the research seeks to explain the distinctions between the dimensions. Similar considerations apply to the multidimensional task parameters, which although less correlated, may still jointly predict the relevant psychopathological profile better than each parameter does in isolation. Thus, the CCA enabled us to identify a general relationship between task performance and psychopathology that accounts for different symptom measures and aspects of controllability inference. 

      Using CCA can thus reveal relationships that do not readily show up in two-variable analyses. Indeed, the direct correlation between Sense of Agency (SOA) and elasticity bias was not significant – a result that, for completeness, we now report in Supplementary Figure 3 along with all other direct correlations. We note, however, that the CCA analysis was preregistered and its results were replicated. Additionally, participants scoring higher on the psychopathology profile also overinvested resources in inelastic environments but did not futilely invest in uncontrollable environments (Figure 6A), providing external validation to the conclusion that the CCA captured meaningful variance specific to elasticity inference. Most importantly, an auxiliary analysis specifically confirmed the contributions of both elasticity bias (Figure 6D, middle plot) and, although not reported in the original paper, of the Sense of Agency score (SOA; p=.03 permutation test; see updated Figure 6D, bottom plot) to the observed canonical correlation. The results thus enable us to safely conclude that differences in elasticity inferences are significantly associated with a profile of control-related psychopathology to which SOA contributed significantly. We now report this when presenting the CCA results (lines 255-257): 

      “Loadings on the side of psychopathology were dominated by an impaired sense of agency (SOA; contribution to canonical correlation: p=.03, Figure 6D, bottom plot), along with obsessive compulsive symptoms (OCD), and social anxiety (LSAS) – all symptoms that have been linked to an impaired sense of control[22-25].”

      Finally, whereas interpretation of individual CCA loadings that were not specifically tested remains speculative, we note that the pattern of loadings largely replicated across the initial and replication studies (see Figure 6B), and aligns with prior findings. For instance, the positive loadings of SOA and OCD match prior suggestions that a lower sense of control leads to greater compensatory effort[7], whereas the negative loading for depression scores matches prior work showing reduced resource investment in depression[5-6].

      We have now revised the manuscript to clarify the justification for our analytical approach (lines 236-248):

      “To examine whether the individual biases in controllability and elasticity inference have psychopathological ramifications, we assayed participants on a range of self-report measures of psychopathologies previously linked to a distorted sense of control (see Methods, pg. 24). Examining the direct correlations between model parameters and psychopathology measures (reported in Supplementary Figure 3) does not account for the substantial variance that is typically shared among different forms of psychopathology. For this reason, we instead used a canonical correlation analysis (CCA) to identify particular dimensions within the parameter and psychopathology spaces that most strongly correlate with one another.”
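As a generic illustration of what a CCA computes, the Python sketch below finds the first canonical correlation between two variable sets and assesses it with a row-permutation test. The two sets could be, for example, model parameters and questionnaire scores. It is a textbook implementation run on synthetic data, not the preregistered analysis pipeline. The ridge term `reg`, the number of permutations, and the simulated data are illustrative assumptions. The per-variable contribution tests reported for Figure 6D are not reproduced here.

```python
import numpy as np


def first_canonical_correlation(X, Y, reg=1e-8):
    """Largest canonical correlation between column sets X (n x p) and Y (n x q)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)
    # Whiten each block with its Cholesky factor; the singular values of the
    # whitened cross-covariance are the canonical correlations.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    M = np.linalg.solve(Lx, Sxy)    # Lx^-1 Sxy
    M = np.linalg.solve(Ly, M.T).T  # Lx^-1 Sxy Ly^-T
    return np.linalg.svd(M, compute_uv=False)[0]


def permutation_test(X, Y, n_perm=200, seed=0):
    """Compare the observed correlation with correlations after shuffling Y's rows."""
    rng = np.random.default_rng(seed)
    observed = first_canonical_correlation(X, Y)
    exceed = sum(
        first_canonical_correlation(X, Y[rng.permutation(len(Y))]) >= observed
        for _ in range(n_perm)
    )
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))  # stand-in for model parameters
    Y = np.column_stack([
        X[:, 0] + 0.5 * rng.normal(size=200),  # one synthetic score tracks X[:, 0]
        rng.normal(size=200),
        rng.normal(size=200),
    ])
    print(permutation_test(X, Y))
```

Because the canonical weights are chosen to maximize the joint correlation, a single variable can contribute to a strong canonical correlation even when its own pairwise correlations are weak. This is why pairwise and canonical results can diverge.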

      We also now include a cautionary note in the discussion (lines 309-315):

      “Whereas our pre-registered CCA effectively identified associations between task parameters and a psychopathological profile, this analysis method does not directly reveal relationships between individual variables. Auxiliary analyses confirmed significant contributions of both elasticity bias and sense of agency to the observed canonical correlation, but the contribution of other measures remains to be determined by future work. Such work could employ other established measures of agency, including both behavioral indices and subjective self-reports, to better understand how these constructs relate across different contexts and populations.”

      There is also a feature of the task that limits our ability to draw strong conclusions about individual differences in elasticity inference. As the authors clearly acknowledge, the task was designed "to be especially sensitive to overestimation of elasticity" (line 287). A straightforward consequence of this is that the resulting *empirical* estimate of estimation bias (i.e., the gamma_elasticity parameter) is itself biased. This immediately undermines any claim that references the directionality of the elasticity bias (e.g. in the abstract). Concretely, an undirected deficit such as slower learning of elasticity would appear as a directed overestimation bias. When we further consider that elasticity inference is the only meaningful learning/decision-making problem in the task (argued above), the situation becomes much worse. Many general deficits in learning or decision-making would be captured by the elasticity bias parameter. Thus, a conservative interpretation of the results is simply that psychopathology is associated with impaired learning and decision-making.

      We apologize for our imprecise statement that the task was ‘especially sensitive to overestimation of elasticity’, which justifiably led to the Reviewer’s concern that slower elasticity learning can be mistaken for elasticity bias. To make sure this was not the case, we made use of the fact that our computational model explicitly separates bias direction (𝜆) from the rate of learning through two distinct parameters, which initialize the prior concentration and mean of the model’s initial beliefs concerning elasticity (see Methods pg. 23). The higher the concentration of the initial beliefs (𝜖), the slower the learning. Parameter recovery tests confirmed that our task enables acceptable recovery of both the bias λ<sub>elasticity</sub> (r=.81) and the concentration 𝜖<sub>elasticity</sub> (r=.59) parameters. Importantly, the level of confusion between the parameters was low (confusion of 0.15 for 𝜖<sub>elasticity</sub> → λ<sub>elasticity</sub> and 0.04 for λ<sub>elasticity</sub> → 𝜖<sub>elasticity</sub>). This result confirms that our task enables dissociating elasticity biases from the rate of elasticity learning. 
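
      The dissociation between bias direction and learning rate can be illustrated with a conjugate Beta prior parameterized by its mean (λ) and concentration (𝜖). This is a minimal sketch assuming that parameterization (the exact form is given in the Methods, pg. 23); the function names and numbers are hypothetical. Two priors with the same mean but different concentrations receive identical disconfirming evidence, and the more concentrated prior moves much less, so concentration governs the speed of learning while the mean sets its starting direction.

```python
def beta_prior(mean, concentration):
    """Map a prior mean (lambda, in (0, 1)) and concentration (epsilon > 0) to Beta(a, b)."""
    return mean * concentration, (1.0 - mean) * concentration

def posterior_mean(mean, concentration, successes, failures):
    """Posterior mean of the Beta prior after a run of binary outcomes (conjugate update)."""
    a, b = beta_prior(mean, concentration)
    return (a + successes) / (a + b + successes + failures)

# Two hypothetical participants share the same bias direction (lambda = 0.7) but differ
# in concentration; each observes ten outcomes that all contradict the bias.
weak = posterior_mean(0.7, 2.0, successes=0, failures=10)     # ~0.117: belief revised quickly
strong = posterior_mean(0.7, 20.0, successes=0, failures=10)  # ~0.467: belief revised slowly
```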

      Moreover, to validate that the minimal level of confusion existing between bias and the rate of learning did not drive our psychopathology results, we re-ran the CCA while separating concentration from bias parameters. The results (figure below) demonstrate that differences in learning rate (𝜖) had virtually no contribution to our CCA results, whereas the contribution of the pure bias (𝜆) was preserved. 

      We now report on this additional analysis in the text (lines 617-627):

      “To capture prior biases that planets are controllable and elastic, we introduced parameters γ<sub>controllability</sub> and γ<sub>elasticity</sub>, each computed by multiplying the direction (λ – 0.5) and strength (ϵ) of individuals’ prior belief. λ<sub>controllability</sub> and λ<sub>elasticity</sub> range between 0 and 1, with values above 0.5 indicating a bias towards high controllability or elasticity, and values below 0.5 indicating a bias towards low controllability or elasticity. 𝜖<sub>controllability</sub> and 𝜖<sub>elasticity</sub> are positively valued parameters capturing confidence in the bias. Parameter recovery analyses confirmed both good recoverability (see S2 Table) and low confusion between bias direction and strength (𝜖<sub>controllability</sub> → λ<sub>controllability</sub> = −.07, λ<sub>controllability</sub> → 𝜖<sub>controllability</sub> = .16, 𝜖<sub>elasticity</sub> → λ<sub>elasticity</sub> = .15, λ<sub>elasticity</sub> → 𝜖<sub>elasticity</sub> = .04), ensuring that observed biases and their relation to psychopathology do not merely reflect slower learning (Supplementary Figure 4), which can result from changes in bias strength but not direction.”
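
      As a small worked example of the composite bias in the quoted Methods text, that is, direction (λ − 0.5) multiplied by strength (𝜖), which the Reviewer refers to as gamma_elasticity: two hypothetical participants can share the same composite value while differing in direction and strength. This is why separating the two parameters, as in the re-run CCA, is informative. The function name and input values are illustrative assumptions.

```python
def composite_bias(direction: float, strength: float) -> float:
    """Composite prior bias = (direction - 0.5) * strength.

    direction (lambda) lies in [0, 1]; values above 0.5 favor high controllability
    or elasticity. strength (epsilon) is positive and scales the bias.
    """
    if not 0.0 <= direction <= 1.0:
        raise ValueError("direction must lie in [0, 1]")
    if strength <= 0.0:
        raise ValueError("strength must be positive")
    return (direction - 0.5) * strength

# Hypothetical participants: a mild but confident bias vs. a strong but uncertain one.
mild_confident = composite_bias(0.6, 10.0)   # approximately 1.0
strong_uncertain = composite_bias(0.9, 2.5)  # 1.0
```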

      We also more precisely articulate the impact of providing participants with three free tickets at their initial visits to each planet.

      Showing that a model parameter correlates with the data it was fit to does not provide any new information, and cannot support claims like "a prior assumption that control is likely available was reflected in a futile investment of resources in uncontrollable environments." To make that claim, one must collect independent measures of the assumption and the investment.

      We apologize if this and related statements seemed to be describing independent findings. They were meant to describe the relationship between model parameters and model-independent measures of task performance. It is inaccurate, though, to say that they provide no new information, since results could have been otherwise. For instance, whether a higher controllability bias maps onto resource misallocation in uncontrollable environments (as we observed) depends on the range of this parameter in our population sample. Had the range been more negative, a higher controllability bias could have instead manifested as optimal allocation in controllable environments. Additionally, these analyses serve two other purposes: as a validity check, confirming that our computational model effectively captured observed individual differences, and as an aid to help readers understand what each parameter in our model represents in terms of observable behavior. We now better clarify the descriptive purposes of these regressions (lines 214-220, 231-235): 

      “To clarify how fitted model parameters related to observable behavior, we regressed participants’ opt-in rates and extra ticket purchases on the parameters (Figure 6A) ...”

      “... In sum, the model parameters captured meaningful individual differences in how participants allocated their resources across environments, with the controllability parameter primarily explaining variance in resource allocation in uncontrollable environments, and the elasticity parameter primarily explaining variance in resource allocation in environments where control was inelastic.”

      Did participants always make two attempts when purchasing tickets? This seems to violate the intuitive model, in which you would sometimes succeed on the first jump. If so, why was this choice made? Relatedly, it is not clear to me after a close reading how the outcome of each trial was actually determined.

      We thank the Reviewer for highlighting the need to clarify these aspects of the task in the revised manuscript. 

      When participants purchased two extra tickets, they attempted both jumps, and were never informed about whether either of them succeeded. Instead, after choosing a vehicle and attempting both jumps, participants were notified where they arrived at. This outcome was determined based on the cumulative probability of either of the two jumps succeeding. Success meant that participants arrived at where their chosen vehicle goes, whereas failure meant they walked to the nearest location (as determined by where they started from). 
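
      A minimal sketch of the outcome rule described above, assuming independent jumps with known per-jump success probabilities. The planet-specific probabilities are defined in the Methods and are not reproduced here; all names and values below are hypothetical. The probability of boarding is the probability that at least one jump succeeds, and the participant sees only the resulting location.

```python
import random

def boarding_probability(jump_probabilities):
    """Probability that at least one jump succeeds, assuming independent jumps."""
    p_all_fail = 1.0
    for p in jump_probabilities:
        p_all_fail *= 1.0 - p
    return 1.0 - p_all_fail

def trial_outcome(jump_probabilities, vehicle_destination, walking_destination, rng=random):
    """Return only the final location; no feedback is given about individual jumps."""
    if rng.random() < boarding_probability(jump_probabilities):
        return vehicle_destination
    return walking_destination

print(boarding_probability([0.5, 0.5]))  # 0.75
```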

      Though it is unintuitive to attempt a second jump before seeing whether the first succeeded, this design choice ensured two key objectives. First, that participants would consistently need to invest not only more money but also more effort and time in planets with high elastic controllability. Second, that the task could potentially generalize to the many real-world situations where the amount of invested effort has to be determined prior to seeing any outcome, for instance, preparing for an exam or a job interview. We now explicitly state these details when describing the experimental task (lines 393-395):

      “When participants purchased multiple tickets, they made all boarding attempts in sequence without intermediate feedback, only learning whether they successfully boarded upon reaching their final destination. This served two purposes. First, to ensure that participants would consistently need to invest not only more money but also more effort and time in planets with high elastic controllability. Second, to ensure that results could potentially generalize to the many real-world situations where the amount of invested effort has to be determined prior to seeing any outcome (e.g., preparing for an exam or a job interview).”

      It should be noted that the model is heuristically defined and does not reflect Bayesian updating. In particular, it overestimates control by not using losses with less than 3 tickets (intuitively, the inference here depends on your beliefs about elasticity). I wonder if the forced three-ticket trials in the task might be historically related to this modeling choice.

      We apologize for not making this clear, but in fact losing with fewer than 3 tickets does reduce the model’s estimate of available control. It does so by increasing the elasticity estimates (a<sub>elastic≥1</sub>, a<sub>elastic2</sub> parameters), signifying that more tickets are needed to obtain the maximum available level of control, thereby reducing the average controllability estimate across ticket investment options. We note this now in the presentation of the computational model (caption Figure 4):

      “A failure to board does not change estimated maximum controllability, but rather suggests that 1 ticket might not suffice to obtain control (a<sub>elastic≥1</sub> + 1; light green diminished). As a result, the model’s estimate of average controllability across ticket options is reduced.”
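
      To make the update concrete, here is a hypothetical reconstruction in Python. It assumes that expected control with n tickets equals the mean of the maximum-controllability belief (a<sub>control</sub>, b<sub>control</sub>) multiplied by the believed probability that n tickets suffice, derived from the two elasticity beliefs. The field names mirror a<sub>elastic≥1</sub> and a<sub>elastic2</sub>, but this way of combining them is an assumption, not the authors' code. A loss with one ticket increments a<sub>elastic≥1</sub>, leaving expected control with 3 tickets unchanged while lowering the average across ticket options.

```python
from dataclasses import dataclass

@dataclass
class ElasticControlBeliefs:
    """Hypothetical Beta pseudo-counts for one planet (a fresh instance per planet)."""
    a_control: float = 1.0      # evidence that boarding is possible at all
    b_control: float = 1.0      # evidence that boarding is impossible
    a_elastic_ge1: float = 1.0  # evidence that at least one extra ticket is needed
    b_elastic_ge1: float = 1.0  # evidence that one ticket suffices
    a_elastic2: float = 1.0     # evidence that two extra tickets are needed (given one is)
    b_elastic2: float = 1.0     # evidence that one extra ticket suffices (given one is)

    @staticmethod
    def _mean(a: float, b: float) -> float:
        return a / (a + b)

    def expected_control(self, n_tickets: int) -> float:
        """Expected boarding probability when buying 1, 2 or 3 tickets."""
        max_control = self._mean(self.a_control, self.b_control)
        p_needs_extra = self._mean(self.a_elastic_ge1, self.b_elastic_ge1)
        p_needs_two = self._mean(self.a_elastic2, self.b_elastic2)
        if n_tickets == 1:
            return max_control * (1.0 - p_needs_extra)
        if n_tickets == 2:
            return max_control * (1.0 - p_needs_extra * p_needs_two)
        if n_tickets == 3:
            return max_control
        raise ValueError("n_tickets must be 1, 2 or 3")

    def average_control(self) -> float:
        return sum(self.expected_control(n) for n in (1, 2, 3)) / 3.0

    def record_loss_with_one_ticket(self) -> None:
        # Read as "one ticket may not suffice": only the elasticity belief changes,
        # so expected_control(3), the estimate of maximum control, is untouched.
        self.a_elastic_ge1 += 1.0

beliefs = ElasticControlBeliefs()
before = beliefs.average_control()
beliefs.record_loss_with_one_ticket()
after = beliefs.average_control()
```

      With all pseudo-counts at 1, the average drops from 0.375 to about 0.333 after a single one-ticket loss, while expected control with 3 tickets stays at 0.5.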

      It would be interesting to further develop the model such that losing with less than 3 tickets would also impact inferences concerning the maximum available control, depending on present beliefs concerning elasticity, but the forced three-ticket purchases already expose participants to the maximum available control, and thus, the present data may not be best suited to test such a model. These trials were implemented to minimize individual differences concerning inferences of maximum available control, thereby focusing differences on elasticity inferences. We now explicitly address these considerations in the revised discussion (lines 326-333) with the following: 

      “Future research could explore alternative models for implementing elasticity inference that extend beyond our current paradigm. First, further investigation is warranted concerning how uncertainty about controllability and its elasticity interact. In the present study, we minimized individual differences in the estimation of maximum available control by providing participants with three free tickets at their initial visits to each planet. We made this design choice to isolate differences in the estimation of elasticity, as opposed to maximum controllability. To study how these two types of estimations interact, future work could benefit from modifying this aspect of our experimental design.”

      Furthermore, we have now tested a Bayesian model suggested by Reviewer 1, but we found that this model fitted participants’ choices worse (see details in the response to Reviewer 1’s comments). 

      Recommendations for the authors:

      Reviewer 1 (Recommendations for the authors):

      In the introduction, the definition of controllability and elasticity, and the scope of "resources" investigated in the current study were unclear. If I understand correctly, controllability is defined as "the degree to which actions influence the probability of obtaining a reward", and elasticity is defined as the change in controllability based on invested resources. This would define the controllability of the environment and the elasticity of controllability of the environment. However, phrases such as "elastic environment" seem to imply that elasticity can directly attach to an environment, instead of attaching to the controllability of the environment.

      We thank the Reviewer for highlighting the need to clarify our conceptualization of elasticity and controllability. We now provide formal definitions of both, with controllability defined as the fraction of controllably achievable reward[1], and elasticity as the reduction in uncertainty about controllability due to knowing the amount of resources the agent is willing and able to invest (see further details in the response to Reviewer 3’s public comments). In the revised manuscript, we now use more precise language to clarify that elasticity is a property of controllability, not of environments themselves. In addition, we now clarify that the current study manipulated monetary, attentional effort, and time costs together (see further details in the response to Reviewer 1’s public comments).   
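
      One way to operationalize “reduction in uncertainty about controllability due to knowing the amount of resources” is sketched below, assuming uncertainty is measured as Shannon entropy, which makes elasticity the mutual information between invested resources and controllability. The formal definition the authors refer to appears in their response to Reviewer 3 and may differ; the distributions and names are hypothetical. When controllability depends on resources, knowing the resources removes the uncertainty (1 bit here); when it does not, nothing is gained.

```python
from collections import defaultdict
from math import log2

def entropy(probabilities):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in probabilities if p > 0)

def elasticity_as_uncertainty_reduction(joint):
    """H(C) - H(C | R) for a joint distribution {(resources, controllability): probability}."""
    p_c = defaultdict(float)
    p_r = defaultdict(float)
    for (r, c), p in joint.items():
        p_c[c] += p
        p_r[r] += p
    h_c = entropy(p_c.values())
    h_c_given_r = 0.0
    for r, pr in p_r.items():
        conditional = [p / pr for (r2, _), p in joint.items() if r2 == r]
        h_c_given_r += pr * entropy(conditional)
    return h_c - h_c_given_r

# Hypothetical agent equally likely to invest 1 or 3 tickets.
elastic = {(1, 0.2): 0.5, (3, 0.9): 0.5}    # controllability depends on tickets
inelastic = {(1, 0.9): 0.5, (3, 0.9): 0.5}  # same controllability either way
```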

      (2) Some of the real-world examples were confusing. For example, the authors mention that investing additional effort due to the belief that this leads to better outcomes in OCD patients is overestimated elasticity, but exercising due to the belief that this can make one taller is overestimated controllability. What's the distinction between the examples? The example of the chess expert practicing to win against a novice, because the amount of effort they invest would not change their level of control over the outcome is also unclear. If the control over the outcome depends on their skill set, wouldn't practicing influence the control over the outcome? In the case of the meeting time example, wouldn't the bus routes differ in their time investments even though they are the same price? In addition to focusing the introductory examples around monetary resources, I would also generally recommend tightening the link between those examples and the experimental task.

      We thank the Reviewer for highlighting the need to clarify the examples used to illustrate elasticity and controllability. We have now revised these examples to more clearly distinguish between the concepts and to strengthen their connection to the experimental task.

      Regarding the OCD example, the possibility that OCD patients overestimate elasticity comes from research suggesting they experience low perceived control but nevertheless engage in excessive resource investment[2], reflecting a belief that only through repeated and intense effort can they achieve sufficient control over outcomes. As an example, consider an OCD patient investing unnecessary effort in repeatedly locking their door. This behavior cannot result from an overestimation of controllability because controllability truly is close to maximal. It also cannot result from an underestimation of the maximum attainable control, since in that case investing more effort would seem futile. Such behavior, however, can result from an overestimation of the degree to which controllability requires effort (i.e., overestimation of elasticity). 

      Similarly, with regards to the chess expert, we intended to illustrate a situation where given their current level, the chess expert is already virtually guaranteed to win, such that additional practice time does not improve their chances. Conversely, the height example illustrates overestimated controllability because the outcome (becoming taller through exercise) is in fact not amenable to control through any amount of resource investment.

      Finally, the meeting time example was meant to illustrate that if the desired outcome is reaching a meeting in time, then different bus routes that cost the same provide equal control over this outcome to anyone who can afford the basic fare. This demonstrates inelastic controllability with respect to money, as spending more on transportation doesn't increase the probability of reaching the meeting on time. The Reviewer correctly notes that time investment may differ between routes. However, investing more time does not improve the expected outcome. This illustrates that inelastic controllability does not preclude agents from investing more resources, but such investment does not increase the fraction of controllably achievable reward (i.e., the probability of reaching the meeting in time).

      In the revised manuscript, we’ve refined each of the above examples to better clarify the specific resources being considered, the outcomes they influence, and their precise relationship to both elasticity and controllability: 

      OCD (lines 40-43): Conversely, the repetitive and unusual amount of effort invested by people with obsessive-compulsive disorder in attempts to exert control[23,24] could indicate an overestimation of elasticity, that is, a belief that adequate control can only be achieved through excessive and repeated resource investment[25].  

      Chess expert (lines 54-57): Alternatively, they may do so because they overestimate the elasticity of control – for example, a chess expert practicing unnecessarily hard to win against a novice, when their existing skill level already ensures control over the match's outcome.

      Height (lines 53-54): A given individual, for instance, may tend to overinvest resources because they overestimate controllability – for example, exercising due to a misguided belief that this can make one taller, when in fact height cannot be controlled. 

      Meeting time (lines 26-28): Choosing between bus routes affords equal control over commute time to anyone who can afford the basic fare (Figure 1).

      Methods

      (1) In the elastic controllability model definition, controllability is defined as "the belief that boarding is possible" (with any number of tickets). The definition again is different from in the task description where controllability is defined as "the probability of the chosen vehicle stopping at the platform if purchasing a single ticket."

      We clarify that "the probability of the chosen vehicle stopping at the platform if purchasing a single ticket" is our definition for inelastic controllability, as opposed to overall/maximum controllability, as stated here (lines 101-103):

      "We defined inelastic controllability as the probability that even one ticket would lead to successfully boarding the vehicle, and elastic controllability as the degree to which two extra tickets would increase that probability."

      Overall controllability is the summation of the two. This summation is referred to in the elastic controllability model definition as "the belief that boarding is possible". We now clarify this in the caption to Figure 4:

      Elastic Controllability model: Represents beliefs about maximum controllability (black outline) and the degree to which one or two extra tickets are necessary to obtain it. These beliefs are used to calculate the expected control when purchasing 1 ticket (inelastic controllability) and the additional control afforded by 2 and 3 tickets (elastic controllability).    

      We also clarify this in the methods when describing the parameterization of the model (lines 529-531): 

      The expected value of one beta distribution (defined by a<sub>control</sub>, b<sub>control</sub>) represents the belief that boarding is possible (controllability) with any number of tickets. 
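
      The decomposition described above (inelastic controllability as the boarding probability with one ticket, elastic controllability as the increase from two extra tickets, and overall controllability as their sum) can be written as a few lines of Python. The boarding probabilities for the three planet types are hypothetical placeholders, not the task's actual values.

```python
def decompose_controllability(p_board_one_ticket, p_board_three_tickets):
    """Return (inelastic, elastic, overall) controllability for one planet."""
    inelastic = p_board_one_ticket
    elastic = p_board_three_tickets - p_board_one_ticket  # gain from two extra tickets
    return inelastic, elastic, inelastic + elastic       # overall = boarding with 3 tickets

# Hypothetical boarding probabilities (1 ticket, 3 tickets) for the three planet types.
planets = {
    "low controllability": (0.1, 0.1),
    "high inelastic controllability": (0.9, 0.9),
    "high elastic controllability": (0.1, 0.9),
}
for name, (p1, p3) in planets.items():
    inelastic, elastic, overall = decompose_controllability(p1, p3)
    print(f"{name}: inelastic={inelastic:.2f}, elastic={elastic:.2f}, overall={overall:.2f}")
```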

      (2) The free parameter K is confusing. What is the psychological meaning of this parameter? Is it there just to account for the fact that failure with 3 tickets made participants favor 3 tickets or is there meaning attached to including this parameter?

      This parameter captures how participants update their beliefs about resource requirements after failing to board with maximum resource investment. Our psychological interpretation is that participants who experience failure despite maximum investment (3 tickets) prioritize resolving uncertainty about whether control is fundamentally possible (before exploring whether control is elastic), which can only be determined by continuing to invest maximum resources. 

      We now clarify this in the methods (lines 555-559):

      To account for our finding that failure with 3 tickets made participants favor 3 tickets over 1 or 2 tickets, we introduced a modified elastic controllability* model, wherein purchasing extra tickets is also favored upon receiving evidence of low controllability (loss with 3 tickets). This effect was modulated by a free parameter 𝜅, which reflects a tendency to prioritize resolving uncertainty about whether control is at all possible by investing maximum resources.

      This interpretation is supported by our analysis of 3-ticket choice trajectories (Supplementary Figure 2 presented in response to Reviewer 2). As shown in the figure, participants who win less than 50% of their 3-ticket attempts persistently purchase 3 tickets over the first 10 trials, despite frequent failures. This persistence gradually declines as participants accumulate evidence about their limited control, corresponding with an increase in opt-out rates.

      (3) Some additional details about the task design would be helpful. It seems that participants first completed 90 practice trials and were informed of the planet type every 15 trials (6 times during practice). What message is given to the participants about the planets? Did the authors analyze the last 15 trials of each condition in the regression analysis, and all 30 trials in the modeling analysis? How does the computational model (especially the prior beliefs parameters) reset when the planet changes? How do points accumulate over the session and/or are participants motivated to budget the points? Is it possible for participants to accumulate many points and then switch to a heuristic of purchasing 3 tickets on each trial?

      We apologize for not previously clarifying these details of the experimental design.

      During practice blocks, participants received explicit feedback about each planet's controllability characteristics, to help them understand when additional resources would or would not improve their boarding success. For high inelastic controllability planets, the message read: "Your ride actually would stop for you with 1 ticket! So purchasing extra tickets, since they do cost money, is a WASTE." For low controllability planets: "Doesn't seem like the vehicle stops for you nor does purchasing extra tickets help." Lastly, for high elastic controllability planets: "Hopefully by now it's clear that only by purchasing 3 tickets (LOADING AREA) are you consistently successful in catching your ride." We now include these messages in the methods section describing the task (lines 453-458).

      We indeed analyzed the last 15 trials of each condition in the regression analysis, and all 30 trials in the modeling analysis. Whereas the modeling attempted to explain participants’ learning process, the regression focused on explaining the resultant behavior, which, in our pilot data (N=19), manifested fairly stably in the last 15 trials (ticket choices SD = 0.33 compared to 0.63 in the first 15 trials). The former is already stated in the text (lines 409-415), and we now also clarify the latter when discussing the model fitting procedure (line 695): 

      Reinforcement-learning models were fitted to all choices made by participants via an expectation maximization approach used in previous work.

      The computational model was initialized with the same prior parameters for all planets. When a participant moved to a new planet, the model's beliefs were reset to these prior values, capturing how participants would approach each new environment with their characteristic expectations about controllability and elasticity. We now clarify this in the methods (line 628): 

      For each new planet participants encountered, these parameters were used to initialize the beta distributions representing participants’ beliefs.

      Points accumulated across all planets throughout the session, with participants explicitly motivated to maximize their total points as this directly determined their monetary bonus payment. To address the Reviewer's question about changes in ticket purchasing behavior, we conducted a mixed probit regression examining whether accumulated points influenced participants’ decisions to purchase extra tickets. We did not find such an effect (β<sub>coins accumulated</sub> = .01, p = .87), indicating that participants did not switch to simple heuristic strategies after accumulating enough coins. We now report this analysis in the methods (lines 421-427):

      Points accumulated across all planets throughout the session, with participants explicitly motivated to maximize their total points as this directly determined their monetary bonus payment. To ensure that accumulated gains did not lead participants to adopt a simple heuristic strategy of always purchasing 3 tickets, we conducted a mixed probit regression examining whether the number of accumulated coins influenced participants' decisions to purchase extra tickets. We did not find such an effect (β<sub>coins accumulated</sub> = .01, p = .87), ruling out the potential strategy shift.

      Following the modeling section, it may be helpful to have a table of the fitted models, the parameters of each model, and the meaning/interpretation of each parameter.

      We thank the Reviewer for this suggestion. We have now added a table (Supplementary Table 3) that summarizes all fitted models, their parameters, and the meaning/interpretation of each parameter.

      (1) The conclusions from regressing the task choices (opt-in rates and ticket purchases) on the fitted parameters seem confusing given that the model parameters were fitted on the task behavior, and the relationship between these variables seems circular. For example, the authors found that preferences for purchasing 2 or 3 tickets (a2 and a3; computational parameters) were associated with purchasing more tickets (task behavior). But wouldn't this type of task behavior be what the parameters are explaining? It's not clear whether these correlation analyses are about how individuals allocate their resources or about the validity check of the parameters. Perhaps analyses on individual deviation from the optimal strategy and parameter associations with such deviation are better suited for the questions about whether individual biases lead to resource misallocation.

      We thank the Reviewer for highlighting this seeming confusion. These regressions were meant to describe the relationship between model parameters and model-independent measures of task performance. This serves three purposes. First, a validity check, confirming that our computational model effectively captured observed individual differences. Second, to help readers understand what each parameter in our model represents in terms of observable behavior. Third, to examine in greater detail how parameter values specifically mapped onto observable behavior. For instance, whether a higher controllability bias maps onto resource misallocation in uncontrollable environments (as we observed) depends on the range of this parameter in our population sample. Had the range been more negative, a higher controllability bias could have instead manifested as optimal allocation in controllable environments. We now better clarify the descriptive purposes of these regressions (lines 214-220, 231-235): 

      To clarify how fitted model parameters related to observable behavior, we regressed participants’ opt-in rates and extra ticket purchases on the parameters (Figure 6A) ... 

      ... In sum, the model parameters captured meaningful individual differences in how participants allocated their resources across environments, with the controllability parameter primarily explaining variance in resource allocation in uncontrollable environments, and the elasticity parameter primarily explaining variance in resource allocation in environments where control was inelastic.  

      Regarding the suggestion to analyze deviation from optimal strategy, this corresponds with our present approach in that opting in is always optimal in high controllability environments and always non-optimal in low controllability environments, and similarly, purchasing extra tickets is always optimal in elastic controllability environments and always non-optimal elsewhere. Thus, positive or negative coefficients can be directly translated into closer or farther from optimal, depending on the planet type, as indicated in the figure by color. We now clarify this mapping in the figure legend:

      (2) Minor: The legend of Figure 6A is difficult to read. It might be helpful to label the colors as their planet types (low controllability, high elastic controllability, high inelastic controllability).

      We thank the Reviewer for this helpful suggestion. We have revised the figure accordingly.

      Reviewer 2 (Recommendations for the authors):

      As noted above, I'm not sure I agree with (or perhaps don't fully understand) the claims the authors make about the distinctions between their "elastic" and "inelastic" experimental conditions. Let's take the travel example from Figure 1 - is this not just an example of “hierarchical” controllability calculations? In other words, in the elastic example, my choice is between going one speed or another (i.e., exerting more or less effort), and in the inelastic example, my choice is first, which route to take (also a consideration of speed, but with lower effort costs than the elastic scenario), and second, an estimate of the time cost (not within my direct control, but could be estimated). In the elastic scenarios, additional value considerations vary between options, and in others (inelastic), they don't, with control over the first choice point (which bus route to choose, or which lunch option to take), but not over the price. I wonder if the paper would be better framed (or emphasized) as exploring the influences of effort and related "costs" of control. There isn't really such a thing as controllability that does not have any costs associated with it (whether that be action costs, effort, money, or simply scenario complexity).

      We thank the Reviewer for highlighting the need to clarify our distinction between elastic and inelastic controllability as it manifests in our examples. We first clarify that elasticity concerns how controllability varies with resources, not costs. Though resource investment and costs are often tightly linked, that is not always the case, especially not when comparing between agents. For example, it may be equally difficult (i.e., costly) for a professional biker to pedal at a high speed as it is for a novice to pedal at a medium speed, simply because the biker’s muscles are better trained. This resource advantage increases the biker’s control over his commute time without incurring additional costs as compared to the novice. We now clarify this distinction in the text by revising our example to (lines 9-11): 

      “For example, the control a biker has over their commute time depends on the power they are willing and able to invest in pedaling. In this respect, a highly trained biker would typically have more control than a novice.”

      Second, whereas in our examples additional value considerations indeed vary in elastic environments, that does not have to be the case, and indeed, that is not the case in our experiment. In our experimental task, participants are given the option to purchase as many tickets as they wish regardless of whether they are in an elastic or an inelastic environment.  

      We agree that elastic environments often raise considerations regarding the cost of control (for instance, whether it is worth it to pedal harder to get to the destination in time). To consider this cost against potential payoffs, however, the agent must first determine what are the potential payoffs – that is, it must determine the degree to which controllability is elastic to invested resources. It is this antecedent inference that our experiment studies. We uniquely study this inference using environments where control may not only be low or high, but also, where high control may or may not require additional resource investments. We now clarify this point in Figure 1’s caption:

      “In all situations, agents must infer the degree to which controllability is elastic to be able to determine whether the potential gains in control outweigh the costs of investing additional resources (e.g., physical exertion, money spent, time invested).”

      For a formal definition of the elasticity of control, see our response to Reviewer 3’s public comments. 

      Relatedly, another issue I have with the distinctions between inelastic/elastic is that a high/elastic condition has inherently ‘more’ controllability than a high/inelastic condition, no matter what. For example, in the lunch option scenario, I always have more control in the elastic situation because I have two opportunities to exert choice (food option ‘and’ cost). Is there really a significant difference, then, between calling these distinctions "elastic/inelastic" vs. "higher/lower controllability?" Not that it's uninteresting to test behavioral differences between these two types of scenarios, just that it seems unnecessary to refer to these as conceptually distinct.

      As noted in the response above, control over costs may be higher in elastic environments, but it does not have to be so, as exemplified by the elastic environments in our experimental task. For a fuller explanation of why higher elasticity does not imply higher controllability, see our response to Reviewer 2’s public comments. 

      I also wonder whether it's actually the case that people purchased more tickets in the high control elastic condition simply because this is the optimal solution to achieve the desired outcome, not due to a preference for elastic control. To test this, you would need to include a condition in which people opted to spend more money/effort to have high elastic control in an instance where it was not beneficial to do so.

      We appreciate the Reviewer's question about potential preferences for elastic control. We first clarify that participants did not choose which environment type they encountered, so if control was low or inelastic, investing extra resources did not give them more control. Furthermore, our results show that the average participant did not prefer a priori to purchase more tickets. This is evidenced by participants’ successful adaptation to inelastic environments wherein they purchased significantly fewer tickets (see Figure 2B and 2C), and by participants’ parameter fits, which reveal an a priori bias to assume that controllability is inelastic ($\lambda_{\text{elasticity}} = .16 \pm .19$), as well as a fixed preference against purchasing the full number of tickets ($\alpha_3 = -.74 \pm .37$). 

      We now clarify these findings by including a table of all parameter fits in the revised manuscript (see response to Reviewer 1). 

      It was interesting that the authors found that failure with 3 tickets made people more likely to continue to try 3 tickets, however, there is another possible interpretation. Could it be that this is simply evidence of a general controllability bias, where people just think that it is expected that you should be able to exert more money/effort/time to gain control, and if this initially fails, it is an unusual outcome, and they should try again? Did you look at this trajectory over time? i.e., whether repeated tries with 3 tickets immediately followed a failure with 3 tickets? Relatedly, does the perseveration parameter from the model also correlate with psychopathology?

      We thank the Reviewer for this suggestion. Our model accounts for a general controllability bias through the $\lambda_{\text{controllability}}$ parameter, which represents a prior belief that planets are controllable. It also accounts, through the $\lambda_{\text{elasticity}}$ parameter, for the prior belief that you should be able to exert more money/effort/time to gain control. Now, our addition of $\kappa$ to the model captures the observation that failures with 3 tickets made participants more likely to purchase 3 tickets when they opted in. If this observation was due to participants not accepting that the planet is not controllable, then we would expect the increase in 3-ticket purchases when opting in to be coupled with a diminished reduction in opting in. To determine whether this was the case, we tested a variant of our model where $\kappa$ not only increases the elasticity estimate but also reduces the controllability update after failures with 3 tickets (using $\beta_{\text{control}} + (1 - \kappa)$ instead of $\beta_{\text{control}} + 1$). However, implementing this coupling diminished the model's fit to the data, as compared to allowing both effects to occur independently, indicating that the increase in 3-ticket purchases upon failing with 3 tickets did not result from participants not accepting that controllability is in fact low. Thus, we maintain our original interpretation that failure with 3 tickets increases uncertainty about whether control is possible at all, leading participants who continue to opt in to invest maximum resources to resolve this uncertainty. We now report these results in the revised text (lines 662-674). 
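      The coupling test can be made concrete with a Beta–Bernoulli belief about controllability. The sketch below is hypothetical and is not the authors' model code (which also tracks elasticity and choice): it assumes the controllability estimate is the mean of a Beta(α, β_control) belief, that a failure normally adds 1 to β_control, and that in the coupled variant a failure with 3 tickets adds only 1 − κ. The function names and κ = 0.5 are illustrative.

```python
def update_controllability(alpha, beta, success, kappa=0.0, coupled=False, three_tickets=False):
    """Return updated Beta pseudo-counts (alpha, beta) after one trial.

    Hypothetical sketch: a success adds 1 to alpha; a failure adds 1 to beta,
    except that in the coupled variant a failure with 3 tickets adds only 1 - kappa.
    """
    if success:
        return alpha + 1.0, beta
    if coupled and three_tickets:
        return alpha, beta + (1.0 - kappa)
    return alpha, beta + 1.0


def controllability_estimate(alpha, beta):
    # Posterior mean of a Beta(alpha, beta) belief that the planet is controllable.
    return alpha / (alpha + beta)


# Three failures with 3 tickets, starting from a flat Beta(1, 1) prior.
for coupled in (False, True):
    a, b = 1.0, 1.0
    for _ in range(3):
        a, b = update_controllability(a, b, success=False, kappa=0.5,
                                      coupled=coupled, three_tickets=True)
    print("coupled" if coupled else "independent", round(controllability_estimate(a, b), 3))
```

      With these illustrative values the independent update lowers the estimate to 0.2, whereas the coupled update lowers it only to about 0.286; that slower drop is what would produce the "diminished reduction in opting in" the coupled variant was designed to test.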

      The trajectory over time is consistent with this interpretation (new Supplementary Figure 2). Specifically, we see that under low controllability (0-50%, orange line), over the first 10 trials participants show higher persistence with 3 tickets after failing, despite experiencing frequent failures, but also a higher opt-out probability. As these participants accumulate evidence about their limited control, we observe a gradual decrease in 3-ticket selections that corresponds directly with a further increase in opting out (right panel, orange line). This pattern qualitatively corresponds with the behavior of our computational model (empty circles). We present the results of the new analysis in lines 180-190: 

      “In fact, failure with 3 tickets even made participants favor 3, over 1 and 2, tickets. This favoring of 3 tickets continued until participants accumulated sufficient evidence about their limited control to opt out (Supplementary Figure 2). Presumably, the initial failures with 3 tickets resulted in an increased uncertainty about whether it is at all possible to control one’s destination. Consequently, participants who nevertheless opted in invested maximum resources to resolve this uncertainty before exploring whether control is elastic.”

      Regarding correlations between the perseveration parameter and psychopathology, we have now conducted a comprehensive exploratory analysis of all two-way relationships between parameters and psychopathology scores (new Supplementary Figure 3). Although we observed modest correlations with social anxiety (LSAS, r = −0.13), cyclothymic temperament (r = 0.13), and alcohol use (AUDIT, r = −0.13), none reached statistical significance after FDR correction for multiple comparisons. 
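      For readers unfamiliar with FDR correction, a minimal Benjamini–Hochberg step-up sketch follows. The response does not state which FDR procedure was used, so Benjamini–Hochberg is an assumption here, and the p-values are toy inputs, not values from the study.

```python
import numpy as np


def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean array marking which hypotheses are rejected at FDR level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * q.
    below = ranked <= (np.arange(1, m + 1) / m) * q
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True  # reject every hypothesis up to and including rank k
    return reject


toy_p = [0.01, 0.04, 0.03, 0.20]  # toy p-values for illustration only
print(benjamini_hochberg(toy_p))
```

      In this toy case only the smallest p-value survives; 0.03 and 0.04 would pass an uncorrected 0.05 threshold but not the rank-adjusted ones (0.025 and 0.0375).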

      Regarding the modeling, I also wondered whether a better alternative model than the controllability model would be a simple associative learning model, where a number of tickets are mapped to outcomes, regardless of elasticity.

      We thank the Reviewer for suggesting this alternative model. Following this suggestion, we implemented a simple associative learning model that directly maps each option to its expected value, without a latent representation of elasticity or controllability. Unlike our controllability model which learns the probability of reaching the goal state for each ticket quantity, this associative learning model simply updates option values based on reward prediction errors.

      We found that this simple Q-learning model performed worse than even the controllability model at explaining participant data (log Bayes Factor ≥ 1854 on the combined datasets), further supporting our hypothesis that participants are learning latent estimates of control rather than simply associating options with outcomes. We present the results of this analysis in lines 662-664:

      “We implemented a simple Q-learning model that directly maps ticket quantities to expected values based on reward prediction errors, without representing latent controllability. This associative model performed substantially worse than even our simple controllability model (log Bayes Factor ≥ 1854 on the combined datasets).”
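      The associative alternative can be sketched as a delta-rule (Rescorla–Wagner style) value update with softmax choice. This is an illustrative sketch, not the authors' implementation: the option labels, learning rate, inverse temperature and 0/1 reward coding are hypothetical, and ticket costs are ignored.

```python
import numpy as np

OPTIONS = ["opt_out", "1_ticket", "2_tickets", "3_tickets"]  # hypothetical option labels


def softmax(values, inverse_temperature):
    # Subtract the max before exponentiating for numerical stability.
    z = inverse_temperature * (values - np.max(values))
    e = np.exp(z)
    return e / e.sum()


def q_update(q, choice, reward, learning_rate):
    """Delta-rule update: move only the chosen option's value toward the obtained reward."""
    q = q.copy()
    prediction_error = reward - q[choice]
    q[choice] += learning_rate * prediction_error
    return q


q = np.zeros(len(OPTIONS))
q = q_update(q, choice=3, reward=1.0, learning_rate=0.5)  # rewarded 3-ticket purchase
q = q_update(q, choice=3, reward=0.0, learning_rate=0.5)  # unrewarded 3-ticket purchase
print(dict(zip(OPTIONS, q)))
print(softmax(q, inverse_temperature=2.0))
```

      Because each option's value is learned separately, a failure with 3 tickets carries no information about 1 or 2 tickets here; in a model with latent controllability and elasticity, the same failure can update beliefs about all options at once.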

      Reviewer 3 (Recommendations for the authors):

      Please make all materials available, including code (analysis and experiment) and data. Please also provide a link to the task or a video of a few trials of the main task.

      We thank the reviewer for this important suggestion. All requested materials are now available at https://github.com/lsolomyak/human_inference_of_elastic_control. This includes all experiment code, analysis code, processed data, and a video showing multiple sample trials of the main task.

      References

      (1) Huys, Q. J. M., & Dayan, P. (2009). A Bayesian formulation of behavioral control. Cognition, 113(3), 314–328.

      (2) Ligneul, R. (2021). Prediction or causation? Towards a redefinition of task controllability. Trends in Cognitive Sciences, 25(6), 431–433.

      (3) Mistry, P., & Liljeholm, M. (2016). Instrumental divergence and the value of control. Scientific Reports, 6, 36295.

      (4) Lin, J. (1991). Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1), 145–151.

      (5) Cohen, R. M., Weingartner, H., Smallberg, S. A., Pickar, D., & Murphy, D. L. (1982). Effort and cognition in depression. Archives of General Psychiatry, 39(5), 593–597. doi: 10.1001/archpsyc.1982.04290050061012. PMID: 7092490.

      (6) Bi, R., Dong, W., Zheng, Z., Li, S., & Zhang, D. (2022). Altered motivation of effortful decision-making for self and others in subthreshold depression. Depression and Anxiety, 39(8-9), 633–645. doi: 10.1002/da.23267. PMID: 35657301; PMCID: PMC9543190.

      (7) Tapal, A., Oren, E., Dar, R., & Eitam, B. (2017). The Sense of Agency Scale: A measure of consciously perceived control over one's mind, body, and the immediate environment. Frontiers in Psychology, 8, 1552.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      There has been intense controversy over the generality of Hamilton's inclusive fitness rule for how evolution works on social behaviors. All generally agree that relatedness can be a game changer, for example allowing for otherwise unselectable altruistic behaviors when 𝑐 < 𝑟𝑏, where 𝑐 is the fitness cost to the altruism, 𝑏 is the fitness benefit to another, and 𝑟 their relatedness. Many complications have been successfully incorporated into the theory, including different reproductive values and viscous population structures.

      I agree, especially if by incorporating viscous population structures, the reviewer means the discovery of the cancellation effect (Wilson, Pollock, and Dugatkin, 1992, Taylor, 1992).

      The controversy has centered on another dimension; Hamilton's original model was for additive fitness, but how does his result hold when fitnesses are non-additive? One approach has been not to worry about a general result but just find results for particular cases. A consistent finding is that the results depend on the frequency of the social allele - nonadditivity causes frequency dependence that was absent in Hamilton's approach.

      Just to be extra precise: Hamilton’s (1964) original model did not use the Price equation nor the regression approach to define costs and benefits, and it did indeed simply presuppose fixed, additive fitness effects.

      Also, for extra precision on terminology: many researchers will describe all fitnesses in social evolution as frequency dependent. The reason they do is that, with or without additivity, both the fitness of cooperators (with the social allele) and the fitness of defectors (without the social allele) typically increase in the frequency of cooperators in the population; the more cooperators there are, the more individuals run into them, which increases average fitness. I take “the result depending on the frequency” to mean that which of those two fitnesses is larger flips at a certain frequency, which automatically implies that the difference between them depends on the frequency of the social allele. This is indeed the result of non-additivity. We will return to this in more detail in the response to Reviewer #3. Also, at the end of Appendix B, I have added a bit to be extra precise regarding frequency dependence.

      Two other approaches derive from Queller via the Price equation. Queller 1 is to find forms like Hamilton's rule, but with additional terms that deal with non-additive interaction, each with an r-like population structure variable multiplied by a b-like fitness effect (Queller, 1985). Queller 2 redefines the fitness effects c and b as partial regressions of fitness on the actor's and recipient's genes. This leaves Hamilton's rule intact, just with new definitions of c and b that depend on frequency (Queller, 1992a).

      Queller 2 is the version that has been most adopted by the inclusive fitness community, along with assertions that Hamilton's rule is completely general. In this paper, van Veelen argues that Queller 1 is the correct approach. He derives a general form that Queller only hinted at. He does so within a more rigorous framework that puts both Price's equation and Hamilton's rule on firmer statistical ground. Within that framework, the Queller 2 approach is seen to be a statistical misspecification: it employs a model without interaction in cases that actually do have interaction. If we accept that this is a fatal flaw, the original version of Hamilton's rule is limited to linear fitness models, which might not be common.

      I totally agree.

      Strengths:

      While the approach is not entirely new, this paper provides a more rigorous approach and a more general result. It shows that both Queller 1 and Queller 2 are identities and give accurate results, because both are derived from the Price equation, which is an identity. So why prefer Queller 1? It identifies the misspecification issue with the Queller 2 approach and points out its consequences. For example, it will not give the minimum squared differences between the model and data. It does not separate the behavioral effects of the individuals from the population state (𝑏 and 𝑐 become dependent on 𝑟 and the population frequency).
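      The contrast between the two approaches can be reproduced in a small model. This is a sketch under stated assumptions, not the paper's own example: haploid genotypes coded 0/1, pairwise interactions, a partner genotype q that matches the focal genotype p with relatedness r, and a hypothetical non-additive fitness w = 1 − c·p + b·q + d·p·q with c = 1, b = 3, d = 2. Population-frequency weights make the weighted regression an exact population regression.

```python
import numpy as np


def population(x, r):
    """Return the four (p, q) types and their frequencies for cooperator frequency x and relatedness r."""
    types = np.array([[1, 1], [1, 0], [0, 1], [0, 0]], dtype=float)
    q_given_c = r + (1 - r) * x  # P(partner is a cooperator | focal is a cooperator)
    q_given_d = (1 - r) * x      # P(partner is a cooperator | focal is a defector)
    freqs = np.array([x * q_given_c, x * (1 - q_given_c),
                      (1 - x) * q_given_d, (1 - x) * (1 - q_given_d)])
    return types, freqs


def weighted_ols(X, y, weights):
    # Weighted least squares via rescaling rows by sqrt(weight).
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef


def fit_both(x, r, c=1.0, b=3.0, d=2.0):
    types, freqs = population(x, r)
    p, q = types[:, 0], types[:, 1]
    w = 1.0 - c * p + b * q + d * p * q  # true, non-additive fitness (hypothetical)
    ones = np.ones_like(p)
    queller1 = weighted_ols(np.column_stack([ones, p, q, p * q]), w, freqs)  # with interaction term
    queller2 = weighted_ols(np.column_stack([ones, p, q]), w, freqs)         # without interaction term
    return queller1, queller2


for x in (0.1, 0.5, 0.9):
    q1, q2 = fit_both(x, r=0.25)
    print(x, "Queller 1 (-c, b, d):", np.round(q1[1:], 3), " Queller 2 (-c, b):", np.round(q2[1:], 3))
```

      With r = 0.25, the Queller 1 coefficients stay at −1, 3 and 2 for every frequency, while the Queller 2 "cost" coefficient moves from −0.48 at x = 0.1 to +0.48 at x = 0.9 (and its "benefit" from 3.52 to 4.48): the misspecified regression absorbs the interaction into frequency-dependent b and c.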

      Just to be precise on a detail: in the data domain, as long as the number of parameters in a statistical model is lower than the number of data points, adding parameters typically (generically) lowers the sum of squared errors. That is to say, for an underspecified statistical model, the sum of squared errors goes down if a parameter is added, but for an already overspecified statistical model, the same is still true (although the size of the reduction will typically differ). The model specification task for a statistician includes knowing when to keep adding parameters, because the data suggest that the model is still underspecified, and when to stop adding parameters, because the model is well-specified, even if adding parameters still reduces the sum of squared errors.

      In a modeling context, on the other hand, one can say that the sum of squared differences will stop decreasing at the point where the statistical model is well-specified, that is, when it matches the model we are considering.
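      The two domains can be contrasted numerically. In this sketch (assumptions: continuous p-scores drawn uniformly, a hypothetical generating model w = 1 − p + 3q + 2pq, and Gaussian noise for the data domain), nested OLS models are fitted to the same points with and without noise.

```python
import numpy as np


def sse(X, y):
    """Sum of squared residuals of the OLS fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


rng = np.random.default_rng(0)
n = 50
p = rng.uniform(0, 1, n)  # hypothetical p-scores of focal individuals
q = rng.uniform(0, 1, n)  # hypothetical p-scores of their partners
ones = np.ones(n)

designs = {
    "constant + p":          np.column_stack([ones, p]),
    "+ q":                   np.column_stack([ones, p, q]),
    "+ pq (true model)":     np.column_stack([ones, p, q, p * q]),
    "+ p^2 (overspecified)": np.column_stack([ones, p, q, p * q, p ** 2]),
}

w_model = 1.0 - p + 3.0 * q + 2.0 * p * q   # modeling domain: fitness generated exactly
w_data = w_model + rng.normal(0, 0.5, n)    # data domain: same process plus noise

sse_model = [sse(X, w_model) for X in designs.values()]
sse_data = [sse(X, w_data) for X in designs.values()]
for name, sm, sd in zip(designs, sse_model, sse_data):
    print(f"{name:24s} model SSE = {sm:.2e}   data SSE = {sd:.3f}")
```

      Without noise, the SSE drops to (numerically) zero at the true model and stays there when p² is added; with noise, the overspecified model still lowers the SSE, which is why in the data domain the SSE alone cannot tell us where to stop.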

      The paper also shows how the same problems can apply to non-social traits. Epistasis is the non-additivity of effects of two genes within the individual. (So one wonders why have we not had a similarly fierce controversy over how we should treat epistasis?)

      The paper is clearly written. Though somewhat repetitive, particularly in the long supplement, most of that repetition has the purpose of underscoring how the same points apply equally to a variety of different models.

      Finally, this may be a big step towards reconciliation in the inclusive fitness wars. Van Veelen has been one of the harshest critics of inclusive fitness, and now he is proposing a version of it.

      I am very happy to hear this, because I am indeed hopeful for reconciliation. I would like to add a comment, though. The debate on Hamilton’s rule/inclusive fitness is regularly thought of as a battle between two partisan camps, where both sides care at least as much about winning as they do about getting things right. This is totally understandable, because to some degree that is true. Also, I agree that it is fair to position me in the camp that is critical of the inclusive fitness literature. However, I would like to think that I have not been taking random shots at Hamilton’s rule. I have pointed to problems with the typical use of the Price equation and Hamilton’s rule, and I think I did for very good reasons. I am obviously very happy that finding the Generalized Price equation, and the general version of Hamilton’s rule, allowed me to go beyond this, and (finally) offer a correct alternative, and I totally appreciate that this opens the door for reconciliation, as this reviewer points out. But I would not describe this as a road-to-Damascus moment. In order to illustrate the continuity in my work, I would like to point to three papers.

      In van Veelen (2007), I pointed to the missing link between the central result in Hamilton’s (1964) famous paper (which states that selection dynamics take the population to a state where mean inclusive fitness is maximized), and Hamilton’s actual rule (which states that selection will lead to individuals maximizing their individual inclusive fitness). My repair stated the additional assumptions that were necessary to make the latter follow from the former. I would say that this can hardly be characterized as an attack on Hamilton’s rule. Reading Hamilton (1964) with enough care to notice something is missing, and then repairing it, I think is a sign of respect, and not an attack.

      Van Veelen (2011) is about the replicator dynamics for n-player games, with the possibility of assortment. This puts the paper in a domain that does not assume weak selection, and that is typically not much oriented towards inclusive fitness. I included a theorem that implies that, under the condition of linearity, inclusive fitness not only gets the direction of selection right, but 𝑟𝑏 − 𝑐 becomes a parameter that also determines the speed of selection. This I think is representative, in the sense that in many of my papers, I carefully stake out when the classic version of Hamilton’s rule does work.
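      The linear case can be illustrated with the simplest instance, a two-player donation game with assortment r (a simplification of the n-player setting in the cited paper; b, c and r values below are hypothetical). A cooperator meets a cooperator with probability r + (1 − r)x and a defector does so with probability (1 − r)x, so the payoff difference is rb − c at every frequency x and the replicator equation becomes dx/dt = x(1 − x)(rb − c).

```python
import math


def payoff_difference(x, r, b, c):
    """Cooperator minus defector payoff in a two-player donation game with assortment r."""
    f_cooperator = b * (r + (1 - r) * x) - c
    f_defector = b * (1 - r) * x
    return f_cooperator - f_defector  # algebraically equal to r*b - c for every x


def time_to_reach(x0, x1, r, b, c, dt=1e-3):
    """Euler-integrate dx/dt = x(1 - x)(f_C - f_D) from x0 until x reaches x1; return elapsed time."""
    if payoff_difference(x0, r, b, c) <= 0:
        raise ValueError("cooperation does not spread when r*b - c <= 0")
    x, t = x0, 0.0
    while x < x1:
        x += dt * x * (1 - x) * payoff_difference(x, r, b, c)
        t += dt
    return t


# Two hypothetical parameter sets: r*b - c = 0.1 and r*b - c = 0.2.
slow = time_to_reach(0.1, 0.9, r=0.5, b=1.2, c=0.5)
fast = time_to_reach(0.1, 0.9, r=0.5, b=1.4, c=0.5)
exact = (math.log(0.9 / 0.1) - math.log(0.1 / 0.9)) / 0.1  # logistic solution for rb - c = 0.1
print(round(slow, 2), round(fast, 2), round(exact, 2))
```

      Doubling rb − c halves the time needed to go from 10% to 90% cooperators, matching the logistic solution logit x(t) = logit x(0) + (rb − c)t; without linearity the payoff difference would depend on x and this single parameter would no longer set the speed.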

      In Akdeniz and van Veelen (2020), we moreover take a totally standard inclusive fitness approach in a model of the cancellation effect at the group level.

      I would say that this does not line up with the image of a harsh critic that takes random shots at Hamilton’s rule or inclusive fitness.

      Weaknesses:

      Van Veelen argues that the field essentially abandoned the Queller 1 approach after its publication. I think this is putting it too strongly; there have been a number of theoretical studies that incorporate extra terms with higher-order relatednesses. It is probably accurate to say that there has been relative neglect. But perhaps this is partly due to a perception that this approach is difficult to apply.

      I can imagine that the perceived difficulty in application may have played a role in the neglect of the Queller 1 approach. What has certainly played a role, and I would think a much bigger one, is that the literature has been pretty outspoken that the Queller 1 approach is the wrong way to go. The main text cites a number of papers that hold this position very emphatically. (The first one of those was a News and Views by Alan Grafen (1985) that accompanied the paper in which Queller presented his Queller 1 approach. I am very happy that Appendix B shows on how many levels this News and Views was wrong.) There is only a handful of papers that follow the Queller 1 example.

      The model in this paper is quite elegant and helps clarify conceptual issues, but I wonder how practical it will turn out to be. In terms of modeling complicated cases, I suspect most practitioners will continue doing what they have been doing, for example using population genetics or adaptive dynamics, without worrying about neatly separating out a series of terms multiplying fitness coefficients and population structure coefficients.

      I am not sure I see what the reviewer expects practitioners who use population genetics to keep on doing. I would think that the Generalized Price equation in regression form is a description of population genetic dynamics, and therefore, if practitioners do not make an effort to “neatly separate out a series of terms multiplying fitness coefficients and population structure coefficients”, then all I can say is that they should. I cannot do more than explain why, if they do not, they are at risk of mischaracterizing what gets selected and why.

      Regarding those that use adaptive dynamics, I would say that this is a whole different approach. Within this approach, one can also apply inclusive fitness; see Section 6 and Appendix D of van Veelen et al. (2017). Appendix D is full of deep technical results and was done by Benjamin Allen.

      For empirical studies, it is going to be hard to even try to estimate all those additional parameters. In reality, even the standard Hamilton's rule is rarely tested by trying to estimate all its parameters. Instead, it is commonly tested more indirectly, for example by comparative tests of the importance of relatedness. That of course would not distinguish between additive and non-additive models that both depend on relatedness, but it does test the core idea of kin selection. It will be interesting to see if van Veelen's approach stimulates new ways of exploring the real world.

      Regarding the impact on empirical studies, there are a few things that I would like to say. The first is that I would just like to repeat, maybe a bit more elaborately, what I wrote at the end of the main text. Given that the generalized version of Hamilton’s rule produces a host of Hamilton-like rules, and given the fact that all of them by construction indicate the direction of selection accurately, the question whether or not Hamilton’s rule holds turns out to be ill-posed. That means that we can stop doing empirical tests of Hamilton’s rule, which are predicated on the idea that Hamilton’s rule, with benefits and costs being determined by the regression method, could be violated, which it cannot. (Side note: it is possible to violate Hamilton’s rule if costs and benefits are defined according to the counterfactual method; see van Veelen et al. (2017) and van Veelen (2018). This way of defining costs and benefits is less common, although there are authors that find this definition natural enough to assume that this is the way in which everybody defines costs and benefits (Karlin and Matessi, 1983, Matessi and Karlin, 1984).) Instead, we should do empirical studies to find out which version of Hamilton’s rule applies to which behaviour in which species.
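      Why the regression-based rule cannot be violated can be checked numerically. The sketch assumes haploid genotypes coded 0/1, pairwise partners, and no transmission bias, so that w̄Δp̄ = Cov(w, p). Defining −c and b as the partial regression coefficients of fitness on own genotype p and partner genotype q, and r = Cov(p, q)/Var(p), gives w̄Δp̄ = Var(p)(rb − c) for any fitness function, because OLS residuals are uncorrelated with p. The fitness function below is made up and deliberately non-additive.

```python
import numpy as np


def regression_hamilton(p, q, w):
    """Return (r, b, c) with -c and b from the OLS regression w ~ 1 + p + q (the regression method)."""
    X = np.column_stack([np.ones_like(p), p, q])
    coef, *_ = np.linalg.lstsq(X, w, rcond=None)
    c, b = -coef[1], coef[2]
    r = np.mean((p - p.mean()) * (q - q.mean())) / np.var(p)  # population covariance / variance
    return r, b, c


def check_population(seed, n=200):
    rng = np.random.default_rng(seed)
    p = rng.integers(0, 2, n).astype(float)                                # focal genotypes
    q = np.where(rng.random(n) < 0.4, p, rng.integers(0, 2, n)).astype(float)  # positively assorted partners
    s1, s2 = rng.normal(0, 1, 2)
    w = np.exp(s1 * p * q - s2 * p) + rng.random(n)                         # arbitrary, non-additive fitness
    delta_p = (w * p).sum() / w.sum() - p.mean()                            # change in mean p, no transmission bias
    r, b, c = regression_hamilton(p, q, w)
    return w.mean() * delta_p, np.var(p) * (r * b - c), delta_p, r * b - c


for seed in range(5):
    lhs, rhs, delta_p, hamilton = check_population(seed)
    print(seed, np.isclose(lhs, rhs), np.sign(delta_p) == np.sign(hamilton))
```

      Whatever fitness function is plugged in, rb − c has the same sign as the change in the mean p-score, so an empirical "test" of this version of the rule has no possible outcome under which it fails.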

      I would like not to understate what a step forward this is. The size of the step forward is of course also due to the dismal point of departure. As theorists, we have failed our empiricists, because all 12 studies included in the review by Bourke (2014) of papers that explicitly test Hamilton’s rule are based on the misguided idea that the traditional Hamilton’s rule, with costs and benefits defined according to the regression method, can be violated. While the field does sometimes have disdain for mathematical nit-picking, this is a point where a little more attention to detail would have really helped. If the hypothesis is that Hamilton’s rule holds, and the null is that it does not, then trying to specify how the empirical quantity that reflects inclusive fitness would be distributed under the null hypothesis (in order to do the right statistical tests) would have forced researchers to do something with the information that this quantity is not distributed at all, because Hamilton’s rule is general (in the sense that it holds for any way in which the world works). If one would prefer to reverse the null and the alternative hypothesis, one would run into similar problems. Understanding that the question is ill-posed therefore is a big step forward from the terrible state of statistics and the waste of research time, attention and money on the empirical side of this field (see also Section 8 of van Veelen et al., 2017).

      I would agree that doing comparative statics may not be much affected by this. Section 5 of van Veelen et al. (2017) indicates that there can be a large set of circumstances under which the general idea “relatedness up → cooperation up” still applies. But that may be a bit unambitious, and Section 8 of van Veelen et al. (2017) and the final section of van Veelen (2018) contain some reflections on empirical testing that may allow us to go beyond that. As long as there is change happening in the Generalized Price equation, the population is not in equilibrium. For empirical tests, one can either aim to capture selection as it happens, or assume that what we observe reflects properties of an equilibrium. This leads to interesting reflections on how to do empirics, which may differ between traits that are continuous and traits that are discrete (again, see van Veelen et al., 2017, and van Veelen, 2018).

      Reviewer #2 (Public review):

      Summary:

      This manuscript reconsiders the "general form" of Hamilton's rule, in which "benefit" and "cost" are defined as regression coefficients. It points out that there is no reason to insist on Hamilton's rule of the form −𝑐 + 𝑏𝑟 > 0, and that, in fact, arbitrarily many terms (i.e. higher-order regression coefficients) can be added to Hamilton's rule to reflect nonlinear interactions. Furthermore, it argues that insisting on a rule of the form −𝑐 + 𝑏𝑟 > 0 can result in conditions that are true but meaningless and that statistical considerations should be employed to determine which form of Hamilton's rule is meaningful for a given dataset or model.

      Totally right. I cannot help wanting to be extra precise, though, by distinguishing between the data domain and the modeling domain. In the data domain, statistical considerations apply in order to avoid misspecification. In this domain, avoiding misspecification can be complicated, because we do not know the underlying data generating process, and we depend on noisy data to make a best guess. In the modeling domain, however, there is no excuse for misspecification, as the model is postulated by the modeler. I therefore would think that in this domain, it does not really require “statistical considerations” to minimize the probability of misspecification; we can get the probability of misspecification all the way down to 0 by just choosing not to do it.

      Strengths:

      The point is an important one. While it is not entirely novel (the idea of adding extra terms to Hamilton's rule has arisen sporadically: Queller, 1985, 2011; Fletcher et al., 2006; van Veelen et al., 2017), it is very useful to have a systematic treatment of this point. I think the manuscript can make an important contribution by helping to clarify a number of debates in the literature. I particularly appreciate the heterozygote advantage example in the SI.

      Me too, and I really hope the readers make it this far! I have thought of putting it in the main text, but did not know where that would fit.

      Weaknesses:

      Although the mathematical analysis is rigorously done and I largely agree with the conclusions, I feel there are some issues regarding terminology, some regarding the state of the field, and the practice of statistics that need to be clarified if the manuscript is truly to resolve the outstanding issues of the field. Otherwise, I worry that it will in some ways add to the confusion.

      (1) The "generalized" Price equation: I agree that the equations labeled (PE.C) and (GPE.C) are different in a subtle yet meaningful way. But I do not see any way in which (GPE.C) is more general than (PE.C). That is, I cannot envision any circumstance in which (GPE.C) applies but (PE.C) does not. A term other than "generalized" should be used.

      This is a great point! Just to make sure that those that read the reports online understand this point, let me add some detail. The equation labeled (PE.C), which is short for Price equation in covariance form, is

      $$\bar{w}\,\Delta\bar{p} = \operatorname{Cov}(w, p) \qquad \text{(PE.C)}$$

      The derivation in Appendix A then assumes that we have a statistical model that includes a constant and a linear term for the p-score. It then defines the model-estimated fitness of individual $i$ as $\hat{w}_i = w_i - \varepsilon_i$, where $w_i$ is the realized number of offspring of individual $i$, and $\varepsilon_i$ is the error term; it is the sum over all individuals of these squared error terms that is minimized. The vector of model-estimated fitnesses $\hat{w}$ will typically be different for different choices of the statistical model. Appendix A then goes on to show that, whatever statistical model is used, $\operatorname{Cov}(\hat{w}, p) = \operatorname{Cov}(w, p)$, as long as the statistical model includes a constant and a linear term for the p-score. That means that we can rewrite (PE.C) as

      $$\bar{w}\,\Delta\bar{p} = \operatorname{Cov}(\hat{w}, p) \qquad \text{(GPE.C)}$$

      The point that the reviewer is making is that this is not really a generalization. For a given dataset (or, more generally, for a given population transition, whether empirical or in a model), $\operatorname{Cov}(w, p)$ is just a number, and it happens to be the case that $\operatorname{Cov}(\hat{w}, p)$ returns the same number, whatever statistical model we use for determining what the model-estimated fitnesses $\hat{w}_i$ are (as long as the statistical model includes a constant and a linear term for the p-score). In other words, (PE.C) is not really nested in (GPE.C), so (GPE.C) is not a proper generalization of (PE.C).
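      The identity behind (GPE.C) is easy to check numerically: OLS residuals are orthogonal to every regressor, so whenever the statistical model contains a constant and a linear term in p, Cov(ε, p) = 0 and hence Cov(ŵ, p) = Cov(w, p). The sketch assumes continuous p-scores for focal individuals and partners and made-up Poisson offspring numbers; the last model deliberately drops the linear p term to show that this condition matters.

```python
import numpy as np


def model_estimated_fitness(X, w):
    """OLS fit of w on the columns of X; returns w_hat = w - residuals."""
    coef, *_ = np.linalg.lstsq(X, w, rcond=None)
    return X @ coef


def pop_cov(a, b):
    # Population covariance (divide by n), as in the Price equation.
    return np.mean((a - a.mean()) * (b - b.mean()))


rng = np.random.default_rng(2)
n = 100
p = rng.random(n)                               # p-scores of focal individuals (hypothetical)
q = rng.random(n)                               # p-scores of partners (hypothetical)
w = rng.poisson(1 + 2 * p * q).astype(float)    # realized offspring numbers (made up)
ones = np.ones(n)

models = {
    "1 + p":             np.column_stack([ones, p]),
    "1 + p + q":         np.column_stack([ones, p, q]),
    "1 + p + q + pq":    np.column_stack([ones, p, q, p * q]),
    "1 + q (no p term)": np.column_stack([ones, q]),
}
for name, X in models.items():
    w_hat = model_estimated_fitness(X, w)
    print(f"{name:18s} Cov(w_hat, p) = {pop_cov(w_hat, p):.6f}   Cov(w, p) = {pop_cov(w, p):.6f}")
```

      The first three models give exactly the same covariance as the realized fitnesses even though their ŵ vectors differ, which is the sense in which the equation is general without being a generalization.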

      This is a totally correct point, and I had actually struggled a bit with the question of what terminology to use here. Equation (GPE.C) is definitely general, in the sense that we can change the statistical model, and thereby change the vector of model-estimated fitnesses $\hat{w}$, but as long as we keep the constant and the linear term in the statistical model, the equation still applies. But it is not a generalization of (PE.C).

      I do however have a hard time coming up with a better label. The General Price equation may be a bit better, but it still suggests generalization. The Statistical Model-based Price equation does not suggest or imply generalization, but it does not convey how general it is, and it suggests that it could be an alternative to the normal Price equation that one may or may not choose to use – while this version really is the one we should use. It may moreover create the impression that this is only for doing statistics, and one might use the traditional Price equation for anything that is not statistics. I cannot really think of other good alternatives, but I am of course open to suggestions.

      So, for lack of a better label, I called this the Generalized Price equation in covariance form. Though clearly imperfect, this label still has a few good things going for it. The first is that, as mentioned above, this equation is general, in the sense that it holds regardless of the statistical model. The second is that this is Step 1 in a sequence of three steps, the other two of which do produce proper generalizations. Step 2 goes from this equation in covariance form to the Generalized Price Equation in regression form, which is a proper generalization of the traditional Price equation in regression form. Step 3 goes from the Generalized Price Equation in regression form to the general version of Hamilton’s rule, which is also a proper generalization of the classical Hamilton’s rule. Since I would suggest that Step 1 on its own is of little use, and therefore Step 1 and Step 2 will typically come as a package, I would be tempted to think that this justifies the abuse of terminology for the Price equation in covariance form. I did, however, add the observation made by the reviewer at the point where the Generalized Price equation (in both forms) is derived, so I hope this at least partly addresses this concern.

      (2) Regression vs covariance forms of the Price equation: I think the author uses "generalized" in reference to what Price called the "regression form" of his equation. But to almost everyone in the field, the "Price Equation" refers to the covariance form. For this reason, it is very confusing when the manuscript refers to the regression form as simply "the Price Equation".

      As an example, in the box on p. 15, the manuscript states "The Price equation can be generalized, in the sense that one can write a variety of Price-like equations for a variety of possible true models, that may have generated the data." But it is not the Price equation (covariance form) that is being generalized here. It is only the regression that Price used that is being generalized.

      To be consistent with the field, I suggest the term "Price Equation" be used only to refer to the covariance form unless it is otherwise specified as in "regression form of the Price equation".

      I am not sure about the level of confusion induced here, but I totally see that it can be helpful to avoid all ambiguity. I therefore went over everything, and whenever I wrote “Price equation”, I tried to make sure it comes either with “in covariance form” or with “in regression form”. At some places, it is a bit over the top to keep repeating “in regression form”, when it is abundantly clear which form is being discussed. Also, I added no qualifiers if a statement is true for both forms of the Price equation, or if the claim refers to the whole package of going through Step 1 and Step 2 mentioned above.

      (3) Sample covariance: The author refers to the covariance in the Price equation as “sample covariance”. This is not correct, since sample covariance has a denominator of N-1 rather than N (Bessel’s correction). The correct term, when summing over an entire population, is “population covariance”. Price (1972) was clear about this: “In this paper we will be concerned with population functions and make no use of sample functions”. This point is elaborated on by Frank (2012), in the subsection “Interpretation of Covariance”.

      I totally agree. On page 418 of van Veelen (2005), I wrote:

      “Another possibility is that we think of $z_i$ and $q_i$, $i = 1,\dots,N$ as realizations of a jointly distributed random variable. […] In that case the expression between square brackets is a good approximation for what statisticians […] call a sample covariance. A sample covariance is defined as $\frac{1}{N-1}\sum_{i=1}^{N}(z_i - \bar{z})(q_i - \bar{q})$ but in large samples it is OK to replace 𝑁 − 1 by 𝑁, and then this formula reduces to Price’s 𝐶𝑜𝑣(𝑧, 𝑞).”

      In van Veelen et al. (2012), I slipped a little, because in Box 1 on page 66, I wrote that Price’s covariance term, with denominator 𝑁, is the sample covariance, and only in footnote 1 on the same page did I include Bessel’s correction, when I wrote:

      “To be perfectly precise, the sample covariance is defined as $\frac{1}{N-1}\sum_{i=1}^{N}(z_i - \bar{z})(q_i - \bar{q})$.”

      In this manuscript, I slid a little further, and left Bessel’s correction out altogether. I am happy that the reviewer pointed this out, so I can make this maximally precise again.

      The reviewer also quotes Price (1972), page 485:

      “In this paper we will be concerned with population functions and make no use of sample functions”.

      Below, the reviewer will return to the issue of distinguishing between the sample covariance with Bessel’s correction, and the sample covariance without Bessel’s correction, where the latter is regularly also referred to as the population covariance. A natural interpretation of the quote from Price (1972), if we read a bit around this quote in the paper, is that the difference between his “population functions” and his “sample functions” is indeed Bessel’s correction.

      The reviewer also states that Frank (2012) elaborates on this in the subsection “Interpretation of Covariance”. What is interesting, though, is that, when Frank (2012) writes, on page 1017,

      “It is important to distinguish between population measures and sample measures”,

      the difference between those is not that one does and the other does not include Bessel’s correction. The difference between “population measures” and “sample measures” in Frank (2012), page 1017, is that

      “In many statistical applications, one only has data on a subset of the full population, that subset forming a sample.”

      The distinction between a population covariance and a sample covariance in Frank (2012) therefore is that they are “covariances” of different things (where the word covariances is in quotation marks, because, again, they are not really covariances). Besides making clear that Price (1972) and Frank (2012) do not use these terms in the same way, this also perfectly illustrates the mix-up between statistical populations (or data generating processes) and biological populations that I discuss on pages 8 and 9 of Appendix A. I will return to this below, when I explain why I want to avoid using the word “population covariance” for the sample covariance without Bessel’s correction.

      Of course, the difference is negligible when the population is large. However, the author applies the covariance formula to populations as small as 𝑁 = 2, for which the correction factor is significant.

      Absolutely right.

      The author objects to using the term "population covariance" (SI, pp. 8-9) on the grounds that it might be misleading if the covariance, regression coefficients, etc. are used for inference because in this case, what is being inferred is not a population statistic but an underlying relationship. However, I am not convinced that statistical inference is or should be the primary use of the Price equation (see next point). At any rate, avoiding potential confusion is not a sufficient reason to use incorrect terminology.

      There are a few related, but separate issues. One is what to call the 𝐶𝑜𝑣(𝑤, 𝑝)-term. Another, somewhat broader, is to avoid mixing up statistical populations and biological populations. A third is what the primary use of the Price equation is. The third issue I will respond to below, where it reappears. Here I will focus on the first two, which can be discussed without addressing the third.

      In a data context, I now call the 𝐶𝑜𝑣(𝑤, 𝑝)-term “$\frac{N-1}{N}$ times the sample covariance, or, in other words, the sample covariance without Bessel’s correction”. This should be unambiguous. In a modeling context I refer to the 𝐶𝑜𝑣(𝑤, 𝑝)-term as “the 𝐶𝑜𝑣(𝑤, 𝑝)-term” and describe it as a summary statistic or a notational convention. There are two reasons for this choice.
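      To illustrate how much the factor $\frac{N-1}{N}$ matters for small populations, here is a small sketch using NumPy’s `np.cov`, whose `ddof` argument switches between denominator 𝑁 (`ddof=0`, the 𝐶𝑜𝑣(𝑤, 𝑝)-term) and 𝑁 − 1 (`ddof=1`, the sample covariance). The populations and the helper name `bessel_ratio` are invented for illustration.

```python
import numpy as np

def bessel_ratio(z, q):
    """Ratio of the covariance without Bessel's correction (ddof=0) to the sample covariance (ddof=1)."""
    no_bessel = np.cov(z, q, ddof=0)[0, 1]
    sample = np.cov(z, q, ddof=1)[0, 1]
    return no_bessel / sample

# Hypothetical two-individual population (N = 2): the ratio (N - 1)/N equals 1/2
print(bessel_ratio([0.0, 1.0], [1.0, 3.0]))

# Hypothetical larger population (N = 1000): the ratio (N - 1)/N is close to 1
rng = np.random.default_rng(0)
z = rng.random(1000)
q_large = z + rng.random(1000)
print(bessel_ratio(z, q_large))
```

      With 𝑁 = 2 the two quantities differ by a factor of two, which is why the distinction cannot be waved away in the small-population examples.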

      The first is that neither of these use the word “population”. I like this, because there is a persistent scope for confusion between statistical populations and biological populations (as exemplified by Frank, 2012). This leads to an incorrect, but widespread intuition that if we “know the entire (biological) population” in a data context, there is nothing that can be estimated. This is what pages 8 and 9 of Appendix A are all about.

      The second reason is that by using two labels, I also differentiate between the data context and the modeling context. This is important for reasons I will return to later.

      Relatedly, I suggest avoiding using 𝐸 for the second term in the Price equation, since (as the ms points out), it is not the expectation of any random variable. It is a population mean. There is no reason not to use something like Avg or bar notation to indicate population mean. Price (1972) uses "ave" for average.

      I totally agree that the second term in the Price equation is not an expectation. I made this point in van Veelen (2005), and I repeated this in the manuscript. This remark by the reviewer prompted me to spell this out a bit more emphatically in Appendix A. That still leaves me with the choice of what notation to use.

      I therefore looked up all contributions to the Theme issue “Fifty years of the Price equation” in the Philosophical Transactions of the Royal Society B, and found that almost all contributions use 𝐸, sometimes saying that this refers to an expectation or an average. Of course, this is wrong. However (and this is another argument), it is just as wrong as using 𝐶𝑜𝑣 or 𝑉𝑎𝑟. The terms abbreviated as 𝐶𝑜𝑣 and 𝑉𝑎𝑟 are no more a covariance and a variance than the term abbreviated as 𝐸 is an expectation. So I would think that there are a few reasons for sticking with 𝐸 here: 1) consistency with the literature; 2) consistency with the treatment of other terms; and 3) the fact that this term is not really of any importance in this manuscript. I do, however, fully understand the reviewer’s reasons, which I suppose include that for 𝐸 there are relatively unproblematic alternatives (ave or an overbar) that are not available for the other terms. I hope, therefore, that being a bit more emphatic in the manuscript about 𝐸 not being an expectation at least partly addresses this concern.

      I should add, however, that the distinction between population statistics vs sample statistics goes away for regression coefficients (e.g. b, c, and r in Hamilton's rule) since in this case, Bessel's correction cancels out.

      Totally correct.
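      The cancellation can be checked numerically: the factor $\frac{N-1}{N}$ appears in both 𝐶𝑜𝑣(𝑤, 𝑝) and 𝑉𝑎𝑟(𝑝), so their ratio does not depend on `ddof`, and it coincides with the least-squares slope. The small dataset below is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 5                                            # small hypothetical population
p = rng.random(N)                                # hypothetical p-scores
w = 1.0 + 2.0 * p + rng.normal(0.0, 0.1, N)      # hypothetical fitnesses

slopes = {}
for ddof in (0, 1):
    C = np.cov(w, p, ddof=ddof)                  # 2x2 matrix; rows/columns are (w, p)
    slopes[ddof] = C[0, 1] / C[1, 1]             # Cov(w, p) / Var(p)

ols_slope = np.polyfit(p, w, 1)[0]               # least-squares slope, for comparison
print(slopes, ols_slope)
```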

      (4) Descriptive vs. inferential statistics: When discussing the statistical quantities in the Price Equation, the author appears to treat them all as inferential statistics. That is, he takes the position that the population data are all generated by some probabilistic model and that the goal of computing the statistical quantities in the Price Equation is to correctly infer this model.

      Before I respond to this, I would like to point out that this literature went off the rails right from the very beginning. One of the initial construction errors was to use the ungeneralized Price equation in regression form. The other one is that the paper in which Price (1970) presented his equation is inconsistent, and suggests that the equation can be used for constructing hypotheses and for testing them at the same time (see van Veelen (2005), page 416). That, of course, is not possible; the first happens in the theory/modeling domain, and the second in the empirical testing/statistics domain, and they are separate exercises.

      These construction errors have warped the literature based on it, and have resulted in a lot of mental gymnastics and esoteric statements, which are needed if we are not willing to consider the possibility that there could be anything amiss with the original paper by Price (1970).

      In this paper, I undo both of these construction errors. Undoing the second one means exploring both domains separately. In Sections 2-4 of Appendix A I explore the possibility that the Price equation is applied to data. In Section 5 of Appendix A I explore the possibility that it is used in a modelling context. The primary effort here is just to do it right, and I have not read anything to suggest that I did not succeed in doing this. Secondarily, of course, I also want to contrast this to what happens in the existing literature. That is what this point by the reviewer is about. It is therefore important to be aware that seeing the contrast accurately is complicated by the apologetic warp in the existing literature.

      As a first effort to unwarp, I would like to point to the fact that I am not taking any position on what the Price equation should be used for. All I do here is explore (and find) possibilities, both in the statistical inference domain and in the modeling domain. I also find that there is scope for misspecification in both, and that, in both domains, we should want to avoid misspecification. The thing that I criticize in the existing literature therefore is not the choice of domain. The thing that I criticize is the insistence on, and celebration of, what is most accurately described as misspecification. This typically happens in the modeling domain.

      It is worth pointing out that those who argue in favor of the Price Equation do not see it this way: "it is a mistake to assume that it must be the evolutionary theorist, writing out covariances, who is performing the equivalent of a statistical analysis." (Gardner, West, and Wild, 2011); "Neither data nor inferences are considered here" (Rousset, 2015). From what I can tell, to the supporters of the Price equation and the regression form of Hamilton's rule, the statistical quantities involved are either population-level *descriptive* statistics (in an empirical context), or else are statistics of random variables (in a stochastic modeling context).

      Again, this description of the friction between my paper and the existing literature is predicated on the suggestion that I have only one domain in mind where the Price equation can be applied. That is not the case; I consider both.

      In the previous paragraph, the reviewer states that I “treat statistical quantities as inferential statistics”, and in this paragraph the reviewer contrasts that with the supporters of the (ungeneralized) Price equation that supposedly treat the same quantities as “descriptive statistics”. This is also beside the point, but it will take some effort to sort out the spaghetti of entangled arguments (where the spaghetti is the result of the history in this field, as indicated earlier).

      First of all, it is not unimportant to point out that the way most people use the terms “inferential statistics” and “descriptive statistics” is that the first refers to an activity, and the second to a function of a bunch of numbers, typically data. Inferential statistics is a combination of parameter estimation and model specification (those are activities). Descriptive statistics are for instance the average values of variables of interest (which makes them a function of a set of numbers). When doing inferential statistics (or statistical inference), looking at the descriptive statistics of the dataset is just a routine before the real work begins. It is important to remember that.

      Now I suppose that this reviewer uses these words a little differently. When he or she writes that I “treat statistical quantities as inferential statistics”, I assume that the reviewer means that I want to use regression coefficients like the one in the Price equation in regression form for doing statistical inference, or that, when I want to interpret such a term, I include considerations typical of statistical inference. Within the data domain, that is totally correct. In the paper I argue that there are very good reasons for this. We would like to know what the data can tell us about the actual fitness function, and if we do our statistical inference right, and choose our Price-like equation accordingly, then we are able to give a meaningful interpretation to such a term. It also means that we then have an equation that describes the genetic population dynamics accurately.

      When the reviewer states that other papers treat them as “population level descriptive statistics” in an empirical context, I have a hard time coming up with papers for which that is the case. Most papers apply the Price equation in the modeling domain (That is to say: this is true in evolution. In ecology the Price equation is often applied to data; see Pillai and Gouhier (2019) and Bourrat et al. (2023)). But even if there are researchers that apply the Price equation to data, then considering these statistical quantities as “descriptive statistics” would not make sense. Looking at the descriptive statistics alone is not an empirical exercise; it is just a routine that happens before the actual statistical inference starts. In a data context, saying that considerations that are standard in statistical inference do not apply, because one is just not doing statistical inference, is the equivalent of an admission of guilt. If you do not consider statistical significance, and never mention that sample size could matter, because you are using these terms as “descriptive statistics, not inferential statistics”, then you’re basically admitting to not doing a serious empirical study.

      Besides treating statistical quantities as descriptive statistics in a data context, the reviewer also states that, in a stochastic modeling context, other researchers treat the same statistical quantities as “statistics of random variables”. First of all, this is very generous to the existing literature. I suspect that the reviewer has in mind a modeling exercise where, for instance, the covariance between two variables is postulated. A theory exercise would then take that as a starting point for the derivation of some theoretical result. This, however, is not what happens in most of the literature.

      There are two things that I would like to point out. First of all, postulating covariances and deriving results from assumptions regarding those covariances is not an activity that requires using the Price equation. There are many stochastic models that function perfectly fine without the Price equation. This is maybe a detail, but it is important to realize that what the reviewer probably thinks of as a legitimate theoretical exercise may be something that can very well be done without the Price equation.

      Secondly, I would like to repeat something that I have pointed out before, which is that the Price equation can be written for any transition, whether this transition is likely or unlikely, given a model, and even for transitions that are impossible. For all of those transitions, one can write the (ungeneralized) Price equation, and for all of those, the Price equation will be an identity, and it will contain the things that the reviewer refers to as “statistical quantities”. It is important to realize that these “statistical quantities”, therefore, are properties of a transition, and that every transition comes with its own “statistical quantity”. That implies that they are not properties of random variables; they reflect something regarding one transition. What one could imagine, though, is the following. To fix ideas, let’s take the Price equation in regression form, and focus on $\beta = \mathrm{Cov}(w, p)/\mathrm{Var}(p)$. A meaningful modeling exercise starts with assumptions about the likelihood of all different transitions, and therefore the likelihood of different values of $\beta$ materializing, or it starts with assumptions that imply those probabilities. In a theoretical exercise, one could then derive statements about the expectation and variance of those “statistical quantities”. For instance, one can calculate the expected value $E[\beta] = E[\mathrm{Cov}(w, p)/\mathrm{Var}(p)]$ and the variance $\mathrm{Var}[\beta] = \mathrm{Var}[\mathrm{Cov}(w, p)/\mathrm{Var}(p)]$, where this expectation is a proper expectation (taken over the probabilities with which these transitions materialize) and this variance is a proper variance, for the same reason.
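      A minimal Monte Carlo sketch of this idea, under assumptions that are entirely illustrative and not taken from the manuscript: a fixed parent population in which half the individuals have p-score 1 and half have p-score 0, Poisson offspring numbers with mean 1 + 𝑠𝑝, and a hypothetical 𝑠 = 0.2. Each simulated transition comes with its own $\beta$; only the average and the variance across many transitions approximate a proper expectation and a proper variance. With this linear fitness function, $E[\beta]$ should equal 𝑠.

```python
import numpy as np

def beta(w, p):
    """Cov(w, p) / Var(p) for one realized transition (both with denominator N)."""
    return np.cov(w, p, ddof=0)[0, 1] / np.var(p)

rng = np.random.default_rng(3)
N, s = 100, 0.2                            # hypothetical population size and selection coefficient
p = np.repeat([0.0, 1.0], N // 2)          # fixed parent p-scores
expected_w = 1.0 + s * p                   # hypothetical fitness function

# Each draw is one possible transition out of the same parent state; each has its own beta.
betas = np.array([beta(rng.poisson(expected_w), p) for _ in range(20_000)])
print("E[beta] approx.", betas.mean(), "  Var[beta] approx.", betas.var())
```

      For balanced binary p-scores, $\beta$ reduces to the difference in mean offspring number between the two groups, so its variance here should be close to (1.2 + 1.0)/50 = 0.044.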

      This is what I do on page 416 of van Veelen (2005) and in Section 5 of Appendix A. I think something like this is what the reviewer may have in mind, but it is worth pointing out that this still does not mean that the $\beta$ from the Price equation for any given transition is now a property of a random variable. Much of the literature, however, is not at the level of sophistication that I imagine the reviewer has in mind, although there are papers that are; see the discussion below of Rousset and Billiard (2000) and Van Cleve (2015).

      In the appendix to this reply, I will address the quotes from Gardner, West, and Wild (2011) and Rousset (2015). This takes up some space, so that is why it is at the end of this reply.

      In short, the manuscript seems to argue that Price equation users are performing statistical inference incorrectly, whereas the users insist that they are not doing statistical inference at all.

      That is not what the manuscript argues, but I am happy to clarify. The manuscript explores both the use of the Price equation when applied to data (and therefore for statistical inference) and when applied to transitions in a model. The criticism on the existing literature is not that it performs statistical inference incorrectly. The criticism is that the literature insists on misspecification, which typically happens in a modelling context.

      The problem (and here I think the author would agree with me) arises when users of the Price equation go on to make predictive or causal claims that would require the kind of statistical analysis they claim not to be doing. Claims of the form "Hamilton's rule predicts.." or use of terms like "benefit" and "cost" suggest that one has inferred a predictive or causal relationship in the given data, while somehow bypassing the entire theory of statistical inference.

      I do not really know how to interpret this paragraph. The use of the word “data” suggests that this pertains to a data context, but I do not know what would qualify as a “predictive claim” in that domain, or how any study would go from data to a claim of the form “Hamilton’s rule predicts …”. Again, I do not really know papers that apply the Price equation to data. None of the empirical papers reviewed in Bourke (2014) for instance do. I would however agree that it is close to obvious that an approach that does indeed bypass the entire theory of statistical inference cannot identify causal relations in datasets. I think the examples in Section 2 of Appendix A also clearly illustrate that a literature in which the word “sample size” is absent, cannot be doing statistical inference.

      There is also a third way to use the Price equation which is entirely unobjectionable: as a way to express the relationship between individual-level fitness and population-level gene frequency change in a form that is convenient for further algebraic manipulation. I suspect that this is actually the most common use of the Price equation in practice.

      I am not sure if I understand what it means for the Price equation to “express the relationship between individual-level fitness and population-level gene frequency change”. That is a bit reminiscent of how John Maynard Smith saw the Price equation (Okasha, 2005), but he also emphasized that he was unable to follow George Price and his equation. For sure, it cannot be that one side of the Price equation reflects something at the individual level and the other something at the population level, because both sides of the Price equation are equally aggregated over the population. Just to be safe, and to avoid unwarranted associative thinking, I would therefore choose to be minimalistic, and say that the Price equation is an identity for a transition between a parent population and an offspring population.
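      That the Price equation in covariance form, $\bar{w}\Delta\bar{p} = \mathrm{Cov}(w, p) + E(w\Delta p)$, is an identity for any transition can be checked numerically. The sketch below uses arbitrary, randomly drawn numbers (neither a model nor data from the manuscript) for parental p-scores, offspring numbers, and the average p-score of each parent’s offspring, with 𝐶𝑜𝑣 and 𝐸 computed as plain averages over the 𝑁 parents. The two sides agree whether such a transition is likely, unlikely, or even impossible under any given model.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 8
p = rng.random(N)                                # parent p-scores (arbitrary)
w = rng.integers(0, 5, size=N).astype(float)     # offspring numbers (arbitrary transition)
w[0] = max(w[0], 1.0)                            # ensure the offspring population is not empty
p_off = rng.random(N)                            # average p-score of each parent's offspring (arbitrary)
dp = p_off - p                                   # Delta p_i

w_bar, p_bar = w.mean(), p.mean()
p_bar_off = np.sum(w * p_off) / np.sum(w)        # average p-score in the offspring population
lhs = w_bar * (p_bar_off - p_bar)                # w_bar * Delta p_bar
cov_term = np.mean((w - w_bar) * (p - p_bar))    # Cov(w, p), denominator N
e_term = np.mean(w * dp)                         # E(w Delta p): an average, not an expectation
print(lhs, cov_term + e_term)
```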

      Regardless of the words we choose, however, the question how harmless or objectionable the use of the Price equation is in the literature is absolutely relevant. In earlier papers I have tried to cover a spectrum of examples of different ways to use (or misuse) the Price equation. In van Veelen (2005) I cover Grafen (1985a), Taylor (1989), Price (1972), and Sober and Wilson (2007). The main paper that is discussed in van Veelen et al. (2012) is Queller (1992b), but Section 7 of that paper also discusses the way the Price equation is used in Rousset and Billiard (2000), Taylor (1989), Queller (1985), and Page and Nowak (2002). These discussions also come with a description of how much it takes to repair them, and this varies all the way from nothing, or a bit of minor rewording, to being beyond repair.

      It is worth observing that the papers in which the use of the Price equation is least problematic are also the papers in which nothing really changes if the reference to the Price equation is taken out. These are papers that start with a model, or a collection of models, and that, at some point in the derivation of their results, point to a step that can, but does not have to, be described as using the Price equation. An example of this is Rousset and Billiard (2000); see the detailed description in Section 7 of van Veelen et al. (2012).

      I am happy to point to a few more papers on the no harm, no foul end of the spectrum here.

      Allen and Tarnita (2012) discuss properties of the dynamics in a well-defined set of models.

      Towards the end of the paper, a version of the Price equation more or less naturally appears. This is more of an interesting aside, though, and does not really play a role in the derivation of the core results of the paper. Van Cleve (2015) is similar to Rousset and Billiard (2000), in that the “application of the Price equation” there is a minor ingredient of the derivation of the results. (A detail that this reviewer may find worth mentioning, given earlier comments, is that Van Cleve (2015) writes the left-hand side of the Price equation as 𝐸(𝑤Δ𝑝|𝐩), instead of $\bar{w}\Delta\bar{p}$. First, two very unimportant things. Van Cleve (2015) uses 𝑤 for mean fitness, for which $\bar{w}$ is a more common symbol. Another detail of lesser importance is that it includes the vector of parent p-scores in the notation, which in their notation is 𝐩. More important, however, is that Van Cleve (2015) writes 𝐸(Δ𝑝) for $\Delta\bar{p}$, which extends the (mis)use of the symbol 𝐸 for what really is just an average. This is consistent within the Price equation, in the sense that it now denotes the average with 𝐸 on both the right-hand side and the left-hand side of the Price equation. It can, however, be a little bit confusing, because when Rousset and Billiard (2000) write an expectation of the change in the average p-score, this is a proper expectation. In their case, it summarizes all possible transitions out of a given state and weights them by their probabilities of happening, given a state summarized by 𝑝.) I am also happy to extend the spectrum a bit here. Some papers on inclusive fitness do not use the Price equation at all, even though one could imagine places where it could be inserted. A nice example of such a paper is Taylor et al. (2007).

      In this paper, I hope I can be excused from taking a complete inventory of this literature, and from counting how many papers fall into the different categories. Such a count would help assess whether the reviewer’s suspicion is correct, namely that the most common use of the Price equation is entirely unobjectionable, but I just do not have the time. I would, however, not want to underestimate the aggregate damage done in this field. The spectrum spanned in my earlier papers does include a fair amount of nonsense results. This typically happens in papers that do not study a specific model or set of models, but that take the Price equation as their point of departure for their theorizing. There also seems to be a positive correlation between how exalted and venerating the language is that is used when describing the wonders and depths of the Price equation, and how little sense the claims make that are “derived” with it.

      We also should not set the bar too low. This is a literature that, at the starting point, has a few construction errors in it, as described in the paper. That is reason for concern. Moreover, one of the main end products of this literature is what we send our empiricists to the field with. As Section 8 of van Veelen et al. (2017) indicates, what we have supplied to our empiricists to work with is nothing short of terrible. I would therefore want to maintain that the damage done is enormous, and if there are also a few papers around that may use the ungeneralized Price equation in an innocuous way, then that is not enough redemption for my taste. We are still facing a literature in which, at every instance where the Price equation is used, we still need to check in which category it falls.

      For a paper that aims to clarify these thorny concepts in the literature, I think it is worth pointing out these different interpretations of statistical quantities in the Price equation (descriptive statistics vs inferential statistics vs algebraic manipulation). One can then critique the conclusions that are inappropriately drawn from the Price equation, which would require rigorous statistical inference to draw. Without these clarifications, supporters of the Price equation will again argue that this manuscript has misunderstood the purpose of the equation and that they never claimed to do inference in the first place.

      I would like to return to the point that I made at the beginning of my response to point (4), which is that the “thorniness” of these concepts is the result of the warp in the literature, resulting from the construction errors in Price (1970). If people want to understand how to apply the Price equation correctly, I think that reading Appendices A and B would work just fine. Again, I have not read anything that suggests that there is anything incorrect in there, so if the literature contains “thorny” concepts, it might just be that this is the result of the mental gymnastics necessitated by the unwillingness to accept that there might be something not completely right with Price (1970). Moreover, given my experiences in the field, I am not sure that there is anything that I could say that would convince the supporters of the ungeneralized Price equation.

      (5) "True" models: Even if one accepts that the statistical quantities in the Price equation are inferential in nature, the author appears to go a step further by asserting that, even in empirical populations, there is a specific "true" model which it is our goal to infer. This assumption manifests at many points in the SI when the author refers to the "true model" or "true, underlying population structure" in the context of an empirical population.

      Again, in Appendix A I explore both a data context and a modeling context. In the modeling context none of this applies, because in such a context, there is only the model that we postulate. In the part in which I explore what the Price equation can do in a data context, I do indeed use words like “true model” or "true underlying population structure".  

      I do not think it is necessary or appropriate, in empirical contexts, to posit the existence of a Platonic "true" model that is generating the data. Real populations are not governed by mathematical models. Moreover, the goal of statistical inference is not to determine the "true model" for given data but to say whether a given statistical model is justified based on this data. Fitting a linear model, for example, does not rule out the possibility there may be higher-order interactions - it just means we do not have a statistical basis to infer these higher-order interactions from the data (say, because their p-scores are insignificant), and so we leave them out.

      This remark suggests that the statistical approach in Sections 2-4 of Appendix A is more naïve than it should be, and that I would overlook the possibility of, for instance, interaction effects that are really nonzero but not statistically significant. First of all, at a superficial level, this strikes me as somewhat inconsistent. In earlier remarks, the reviewer seems to excuse those who use the Price equation on data without any statistical considerations whatsoever. The reason the reviewer gives them a pass is that they are “just not doing statistical inference”. Instead, they are doing this whole other thing with, you know, descriptive statistics. As I indicated above, that is just a fancy way of saying that they are not doing serious statistics – or serious empirics, for that matter.

      In this comment, on the other hand, the reviewer also suggests that the statistics with which I replace the total absence of any statistical considerations are not quite up to snuff. Below, I will indicate why that is not the case at all, but I think it is also worth registering a touch of irony there.

      In order to address this issue, it is worth first observing that the whole of classical statistics is based on probability theory in the following sense. We always ask ourselves: if the data-generating process works like this, what would the likelihood of certain outcomes (datasets) be; and if the data-generating process works some other way (sometimes: the complement of whatever “this” is), what would the likelihood of the same outcomes be then? By comparing those likelihoods, we draw inferences about the underlying data-generating process (a term suggestive of a “Platonic” world view that the reviewer seems to reject). Therefore, if one were to ban Platonic phrases like “true data-generating process”, “actual fitness function”, or “the population structure that is out there”, it would be impossible to teach any course in statistics, basic or advanced. It would also be impossible to practice, and talk about, applied statistics.
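As a minimal illustration of the likelihood comparison just described, the following Python sketch evaluates how probable one hypothetical dataset is under two candidate data-generating processes. The dataset (62 successes in 100 independent trials) and the two success probabilities are invented for illustration; this is a textbook binomial example, not an analysis from the manuscript.

```python
import math

def binomial_likelihood(successes: int, trials: int, theta: float) -> float:
    """Probability of `successes` out of `trials` if each trial succeeds with probability theta."""
    return math.comb(trials, successes) * theta**successes * (1 - theta) ** (trials - successes)

# Hypothetical observed dataset (illustrative numbers only).
successes, trials = 62, 100

# Two candidate data-generating processes.
lik_h0 = binomial_likelihood(successes, trials, 0.5)
lik_h1 = binomial_likelihood(successes, trials, 0.6)

# How many times more probable the same data are under theta = 0.6 than under theta = 0.5.
likelihood_ratio = lik_h1 / lik_h0
print(f"L(0.5) = {lik_h0:.4g}, L(0.6) = {lik_h1:.4g}, ratio = {likelihood_ratio:.2f}")
```

Here the data are roughly 17 times more probable under the second process than under the first; a test or a likelihood-based inference then turns comparisons like this into statements about the data-generating process.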

      Now the reviewer claims that “Real populations are not governed by mathematical models”. I do not really know if I agree or disagree with that statement, but the example that the reviewer gives does not fit that claim. The reviewer suggests that if we find a higher-order term not to be statistically significant (and therefore do not reject the hypothesis that it is zero), then that would not necessarily mean that it is not there. That is totally true, and statisticians tend to be fully aware of that. But that does not imply that there is no true data-generating process; the whole premise of this example is that there is one, but that the sample size is not large enough to determine it in enough detail to include this interaction effect, which is apparently too small to be detected with this sample size.
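A minimal simulation illustrates how the sample size limits what can be recovered from a data-generating process. The Python sketch below assumes, for illustration only, that the data are generated by the linear model w = α + βp + ε with Gaussian noise and that the statistical model fitted to them is the same linear model; all parameter values and the seed are hypothetical. As the number of observations grows, the least-squares estimate of β settles on the value used to generate the data.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (model with an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def estimate_from_sample(n, alpha=1.0, beta=0.3, sigma=0.5, seed=1):
    """Draw n observations from w = alpha + beta * p + noise and return the OLS slope."""
    rng = random.Random(seed)
    ps = [rng.random() for _ in range(n)]                        # p-scores, uniform on [0, 1)
    ws = [alpha + beta * p + rng.gauss(0.0, sigma) for p in ps]  # Gaussian noise
    return ols_slope(ps, ws)

TRUE_BETA = 0.3
estimates = {n: estimate_from_sample(n) for n in (20, 2_000, 200_000)}
for n, b in estimates.items():
    print(f"n = {n:>7,}: beta_hat = {b:.4f} (true value {TRUE_BETA})")
```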

      The third thing to reflect on here is that the reviewer seems to suggest that the Generalized Price equation in regression form, as presented in my paper, comes with a specific statistical approach, which he or she classifies as philosophically naïve or unsophisticated. That, however, is not the case, and I am very grateful that this remark allows me to make a point that I think shines a light on how the Generalized Price equation puts the train that started going off the rails in 1970 back on track, and reconnects it with the statistics it borrows its terminology from. To see that, it is good to be aware that statistics never gives certainty. The whole discipline is built around the awareness that it is possible to draw the wrong inference, and the aim is to determine, minimize, and balance the likelihoods of making different wrong inferences. So, statistics produces statements about the confidence with which one can say that something works one way or the other. In some instances, the data are not enough to say anything with any confidence. In other cases, the data are rich enough that it is really unlikely that we incorrectly infer that, for instance, a certain gene matters for fitness.

      The nice thing about the setup with the Generalized Price equation is that those statistical considerations translate one-to-one into considerations regarding which Price-like equation to choose. If the data do not allow us to pick any model with confidence, then we should be equally agnostic about which Price-like equation describes the population genetic dynamics accurately. If the statistics give us high confidence that a certain model matches the data, then we should pick the matching Price-like equation with the same confidence. This also carries over to higher-level statistical considerations.

      Terms that would be statistically significant if we gathered a gargantuan amount of data, but that are very small, are what economists call statistically significant but economically insignificant. When a term turns out not to be statistically significant in a dataset that is not gargantuan, statisticians are aware that terms with a true effect of zero and terms with a very small true effect are dropped by the same statistical test, and that we should be fine with that. All such considerations carry over to how we think about the choice of a Price-like equation to describe the population genetic dynamics. Even if people disagree about whether or not to include a term that is statistically significant but relatively small, such a disagreement can still happen within this setup, and it simply translates into a disagreement about which Price-like equation to choose.
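To make the role of sample size concrete, here is a rough Python sketch under stated assumptions: a linear model with one regressor, homoskedastic noise with standard deviation `sigma`, and a regressor with population standard deviation `sd_p`. Under these assumptions the standard error of the slope is approximately sigma / (sd_p * sqrt(n)), and the function returns the sample size at which a true effect of size `beta` would, on average, sit at the conventional two-sided 5% threshold (|t| of about 1.96). The function name and all parameter values are hypothetical.

```python
import math

def n_at_threshold(beta, sigma, sd_p, t_crit=1.96):
    """Sample size at which the expected t-statistic of a true slope `beta` reaches `t_crit`.

    Uses the approximation se(beta_hat) ~= sigma / (sd_p * sqrt(n)), so the
    expected t-statistic is roughly beta * sd_p * sqrt(n) / sigma.
    """
    return (t_crit * sigma / (beta * sd_p)) ** 2

# Hypothetical values: a binary p-score with frequency 0.5 has sd_p = 0.5.
for beta in (0.5, 0.05, 0.01):
    print(f"beta = {beta}: n ~ {math.ceil(n_at_threshold(beta, sigma=1.0, sd_p=0.5)):,}")
```

Because the required sample size scales with 1/beta², shrinking an effect by a factor of 10 multiplies the data needed to detect it by 100, which is why very small terms are routinely dropped by tests on datasets of ordinary size.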

      Similarly, people could also disagree about whether it is justified to use polynomials to characterize a fitness function. If we decide that we can, because of Taylor expansions, then the core result of the paper implies that the population genetic dynamics can be summarized by a generalized Hamilton’s rule (as long as the fitness function includes a constant and a term that is linear in the p-score). On the other hand, if we do not believe this is justified, and prefer to use an altogether different family of fitness functions, then we can no longer do this. All of this leaves space for all kinds of statistical considerations and disagreements, which simply carry over to the choice of one or another Price-like equation as an accurate description of the population genetic dynamics. Or, if one does not believe polynomials should be used, then this leads to not picking any Price-like equation at all.

      So, this is a long way of saying that the Generalized Price equation creates space for all statistical considerations to regain their place, and does not hinge on one approach to statistics or another.

      What we can say is that if we apply the statistical model to data generated by a probabilistic model, and if these models match, then as the number of observations grows to infinity, the estimators in the statistical model converge to the parameters of the data-generating one.

      But this is a mathematical statement, not a statement about real-world populations.

      Again, I do not know if I agree or disagree with the last sentence. However, that does not really matter, because either option only has implications for how we are to think of the relation between a Price-like equation describing a population genetic dynamics and real-world populations. It is not relevant for the question which Price-like equation to pick, or whether to pick one at all.

      A resolution I suggest to points 3, 4, and 5 above is:

      *A priori, the statistical quantities in the Price Equation are descriptive statistics, pertaining only to the specific population data given.

      *If one wishes to impute any predictive power, generalizability, or causal meaning to these statistics, all the standard considerations of inferential statistics apply. In particular, one must choose a statistical model that is justified based on the given data. In this case, one is not guaranteed to obtain the standard (linear) Hamilton's rule and may obtain any of an infinite family of rules.

      *If one uses a model that is not justified based on the given data, the results will still be correct for the given population data but will lack any meaning or generalizability beyond that.

      *In particular, if one considers data generated by a probabilistic model, and applies a statistical model that does not match the data-generating one, the results will be misleading, and will not generalize beyond the randomly generated realization one uses.

      Of course, the author may propose a different resolution to points 3-5, but they should be resolved somehow. Otherwise, the terminology in the manuscript will be incorrect and the ms will not resolve confusion in the field.

      I have outlined my solutions extensively above. I really appreciate that Reviewers #1 and #2 have spent time and attention on the manuscript and on the long appendices.  

      Appendix to the response to reviewer #2: Some remarks on Gardner, West & Wild (2011), Frank (2012), and Rousset (2015)

      An accurate response to the quote from Gardner, West, and Wild (2011) in the review report takes up considerable space, so I have put it in an appendix to the response to reviewer #2. I also include a few paragraphs on Frank (2012) and Rousset (2015), both of which are also mentioned by reviewer #2. All of this might also be of interest to readers who are curious about how what I find in my paper relates to the existing literature.

      Gardner, West & Wild (2011). The quote I am responding to is “it is a mistake to assume that it must be the evolutionary theorist, writing out covariances, who is performing the equivalent of a statistical analysis”. I want to put that into context, so I will go over the whole paragraph that surrounds the quote. The paragraph is headed “Statistics and Evolutionary Theory” and can be found on page 1038 of the paper. It is worth pointing out that it is not easy to respond to their somewhat impressionistic collages of words and formulas. I will therefore cut the paragraph up into a few smaller bits and try to make sense of it bit by bit. The paragraph begins with:

      “Our account of the general theory of kin selection has been framed in statistical terms.” Based on what they write two sentences later, the best match between those words and what they do in the paper would be: “our account uses words like ‘covariance’, ‘variance’ and ‘expectation’ for things that are not what ‘covariance’, ‘variance’ and ‘expectation’ mean in probability theory and statistics.” I would be totally open to an argument for why that is nonetheless OK to do, but the way Gardner, West, and Wild (2011) phrase it obscures the fact that this needs any justification or reflection at all. “Framing something in statistical terms” is unspecific enough to sound completely harmless.

      “The use of statistical methods in the mathematical development of Darwinian theory has itself been subjected to recent criticism (van Veelen, 2005; Nowak et al., 2010b), so we address this criticism here.”

      Here, too, specifics would be helpful. The “use of statistical methods” sounds like it is more than just using terms from statistics, so this might refer to the minimizing of the sum of squared differences, which is also mentioned a sentence later in Gardner, West, and Wild (2011). If it does, then it is worth observing that in statistics, the minimizing of the sum of squared differences (or residuals, or errors) comes with theorems that point very clearly to what is being achieved by doing this. The Gauss–Markov theorem states that the ordinary least squares (OLS) estimator has the lowest variance within the class of linear unbiased estimators. This implies that minimizing the sum of squared errors helps answer a well-defined question in statistics: under certain conditions, an OLS estimator is our best shot at uncovering an unknown relation between variables. Also minimizing a sum of squared differences, but now in the modeling domain, qualifies as “use of statistical methods” only in a very shallow way: it means that a similar minimization is performed. Without an equivalent of the Gauss–Markov theorem that would shine a light on what is being achieved by doing so, it does not carry the same weight as it does in the statistics domain; in fact, it does not carry any weight at all.
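What the Gauss–Markov theorem delivers in the statistics domain can be seen in a small simulation. The Python sketch below compares two estimators of a slope that are both linear in the data and unbiased: the OLS slope and the slope through the first and last observation. The design points, parameters, number of replicates and seed are illustrative assumptions, and the errors are drawn to satisfy the theorem's conditions (independent, mean zero, equal variance).

```python
import random
import statistics

XS = list(range(10))                 # fixed design points (hypothetical)
ALPHA, BETA, SIGMA = 1.0, 2.0, 1.0   # hypothetical data-generating parameters

def ols_slope(xs, ys):
    """OLS slope with intercept: linear in ys and unbiased for BETA."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def endpoint_slope(xs, ys):
    """Slope through the first and last points: also linear and unbiased, but ignores the rest."""
    return (ys[-1] - ys[0]) / (xs[-1] - xs[0])

rng = random.Random(42)
ols_draws, endpoint_draws = [], []
for _ in range(5_000):
    # Independent, mean-zero, equal-variance errors: the Gauss-Markov conditions.
    ys = [ALPHA + BETA * x + rng.gauss(0.0, SIGMA) for x in XS]
    ols_draws.append(ols_slope(XS, ys))
    endpoint_draws.append(endpoint_slope(XS, ys))

print("mean OLS / endpoint:", statistics.fmean(ols_draws), statistics.fmean(endpoint_draws))
print("var  OLS / endpoint:", statistics.pvariance(ols_draws), statistics.pvariance(endpoint_draws))
```

Both estimators are centred on the true slope, but for this design the theoretical variances are σ²/82.5 (about 0.012) for OLS and 2σ²/81 (about 0.025) for the endpoint slope. Nothing comparable is at stake when a sum of squares is minimized against a model whose parameters are already known.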

      “The concern is that statistical terms – such as covariances and least-squares regressions – should properly be reserved for conventional statistical analyses, where hypotheses are tested against explicit data, and that they are out of place in the foundations of evolutionary theory (van Veelen, 2005; Nowak et al., 2010b).”

      Again, a few things are a bit vague. What are “explicit data”? Are there data that are not explicit? Why the generic “foundations of evolutionary theory”, instead of a more specific description of what these statistical terms are used for? Either way, this is a misrepresentation of what I wrote in van Veelen (2005). I did not suggest that statistical terms be “reserved for conventional statistical analysis” just because. As I do in the current paper, what I did there was explore the possibilities for the Price equation to help with what I then called Type I and Type II questions. Type I questions belong to the modeling domain and Type II questions belong to the statistical domain. I was not arguing for a ban on applying statistical concepts outside of the domain of statistical inference. All that I said is that, in its current practice, it does not really help answer questions of either type.

      “However, this concern is misplaced. First, natural selection is a statistical process, and it is therefore natural that this should be defined in terms of aggregate statistics, even if only strictly by analogy (Frank, 1997a, 1998).”

      This is a vague non-argument. Almost nothing is well-defined here. What does it mean for natural selection to be a statistical process? Is that just an unusual term for a random process? If so, then I suppose I agree, but that has nothing to do with what I state or claim. And what does it mean to be defined in terms of aggregate statistics? What is the alternative? I have no idea how any of this relates to anything that I claim or state in my papers.

      “Second, Fisher (1930, p198) coined the term ‘covariance’ in the context of his exposition of the genetical theory of natural selection, so the evolutionary usage of this term has precedent over the way the term is used in other fields.”

      This is what I would call a “historic fallacy”. The fact that Fisher coined the term “covariance” in a book on genetics and natural selection does not mean that any “evolutionary usage” of the term “covariance”, however nonsensical, now has precedent over the way the term is used in other fields. Irrespective of the path that the history of science, genetics, or statistics took, right now we are in a place where just about every student at every university anywhere in the world who takes a course in probability theory and/or statistics learns that covariance is a property of a pair of random variables, that is, of their joint distribution (see also Wikipedia). And they do so for a very good reason; it is essential in recognizing the relation between probability theory on the one hand and statistics on the other. Being curious how this “evolutionary usage” of the term covariance works, if covariance turns out not to be a property of random variables, is therefore perfectly justified, and “Fisher coined the term” is not a safe word that exempts it from scrutiny.

      “Third, it is a mistake to assume that it must be the evolutionary theorist, writing out covariances, who is performing the equivalent of a statistical analysis.”

      Again, that is just not what anyone is saying. Nobody is suggesting that an evolutionary theorist should perform the equivalent of statistical analysis. All I did was point to how little is being achieved by transferring formulas from statistics to a modeling context.

      “A better analogy is to regard Mother Nature in the role of statistician, analysing fitness effects of genes by the method of least-squares, and driving genetic change according to the results of her analyses (cf. Crow, 2008).”

      I have no idea what any of this means. Mother Nature is a personification of something that is not a person, and that does not have cognition. Without sentience, “Mother Nature” cannot assume the role of statistician, and cannot analyse fitness effects.

      “More generally, analogy is the basis of all understanding, so when isomorphisms arise unexpectedly between different branches of mathematics (in this case, theoretical population genetics and statistical least-squares analysis) this represents an opportunity for advancing scientific progress and not an anomaly that is to be avoided.”

      This is a strawman argument, puffed up with platitudes. Nobody is arguing against analogies. But what is the analogy supposed to be here? Just taking least squares from statistical inference and performing it in a modeling context does not make it an analogy. The Gauss–Markov theorem, which is the basis for why least squares helps answer questions in statistics, just does not mean anything in a modeling context. OLS in modeling is just willful misspecification, and nothing that it does in statistics translates to anything meaningful in modeling. Again, declaring it an analogy, or an isomorphism, does not make it one.

      Frank (2012). Because the reviewer also mentions Frank (2012), I would like to include a small remark on this paper too. “Natural Selection. IV. The Price equation” by Frank (2012) is partly a response to my earlier criticism of the use of the Price equation. Much like Gardner, West, and Wild (2011), I would describe this paper as what is called a “flight forwards” in Dutch. While the questions I ask are relatively prosaic (such as: how does the Price equation help derive a prediction from model assumptions?), Frank (2012) pivots to suggesting that there is a profound philosophy-of-science disagreement in which I am on the wrong side. It is close to impossible to respond to Frank (2012), because it is a labyrinth of arguments that sound deep and impressive, but that are just not specific enough to know how they relate to the points that I made – or even just what they mean in general. To pick a random paragraph:

      “Is there some reorientation for the expression of natural selection that may provide subtle perspective, from which we can understand our subject more deeply and analyse our problems with greater ease and greater insight? My answer is, as I have mentioned, that the Price equation provides that sort of reorientation. To argue the point, I will have to keep at the distinction between the concrete and the abstract, and the relative roles of those two endpoints in mature theoretical understanding.”

      For many of those terms, I have no real idea what they mean, and reading the rest of the paper does not help me understand what this has to do with the more prosaic questions that are waiting for an answer. What is “reorientation”? What does “concrete” versus “abstract” have to do with the question of what is being achieved by doing least-squares regressions in modeling? What would be an example of a mature and an immature theoretical understanding?

      Rousset (2015) is also mentioned by the reviewer. This paper is not esoteric. It states, as reviewer #2 points out, that “neither data nor inferences are considered”. This paper therefore finds itself in the modeling domain, and not in the data domain. It does, however, still dodge the question of what the benefits of misspecification in the modeling domain are. As a matter of fact, it denies that there is misspecification at all.

      “In the presence of synergies, the residuals have zero mean and are uncorrelated to the predictors. No further assumption is made about the distribution of the residuals. Thus, there is no sense in which the regression is misspecified.”

      This is a remarkable quote, and testament to the lasting impact of the construction errors in Price (1970). Misspecification is literally defined as getting the model wrong. In statistics, avoiding misspecification can be complicated, because of the noise in the data. The real data-generating process is unknown, and because of the noise, there is always the possibility that data that are generated by one model look like they could also have been generated by another. The challenge is to reduce the odds of getting the model wrong to acceptable proportions, which is what statistical tests are for. But in modeling, we know what the model is; it is postulated by the modeler. Therefore, misspecification can be avoided by simply not replacing it with a different model.

      What is being discussed in this part of Rousset (2015) is replacing what in this manuscript is called Model 3, $w_i = \alpha + \beta_{1,0} p_i + \beta_{0,1} q_i + \beta_{1,1} p_i q_i + \varepsilon_i$, with Model 2, $w_i = \alpha + \beta_{1,0} p_i + \beta_{0,1} q_i + \varepsilon_i$, and choosing the parameters in Model 2 so that it is as close as it can be to Model 3. This is just the definition of misspecification. That is to say: the misspecification lies in choosing Model 2 as a reference model. Minimizing the sum of squared residuals could then be considered as minimizing the damage.

      While Rousset (2015) finds itself in the modeling domain, it does nonetheless point to the field of statistics here, by stating that “the residuals have zero mean and are uncorrelated to the predictors”. From this, the paper concludes that “there is no sense in which the regression is misspecified”. That is just plain wrong. Minimizing the sum of squared residuals guarantees that the residuals are uncorrelated with the variables that are included in the reference model with respect to which the sum of squared residuals is minimized. The criterion that Rousset (2015) uses is that the model is well-specified if there is no correlation between the residuals (here: $\varepsilon_i$) and the variables included in the reference model (here: $p_i$ and $q_i$). But according to this criterion, all models would always be well-specified, and no model could ever be misspecified. The correct criterion, however, also requires that the residuals are not correlated with variables that are not included in the reference model. And here, the residuals are in fact correlated with $p_i q_i$, which is the variable that is included in Model 3 but not in Model 2. Therefore, according to the correct version of this criterion, this model is in fact misspecified – as it should be, because getting the model wrong is the definition of misspecification.
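The point can be checked numerically. The Python sketch below uses a small, made-up population of pairs $(p_i, q_i)$ with positive assortment; the pair frequencies and the parameter values for $\alpha$, $\beta_{1,0}$, $\beta_{0,1}$ and $\beta_{1,1}$ are assumptions chosen for illustration. It generates fitness exactly from Model 3, $w_i = \alpha + \beta_{1,0} p_i + \beta_{0,1} q_i + \beta_{1,1} p_i q_i$, fits the reference model without interaction term ($\alpha + \beta_{1,0} p_i + \beta_{0,1} q_i$) by least squares, and then computes the covariances of the residuals with $p_i$, $q_i$ and $p_i q_i$.

```python
# Hypothetical population of pairs (p_i, q_i): p is the focal individual's p-score,
# q is its partner's; positive assortment makes equal pairs more common.
PAIRS = [(0, 0)] * 3 + [(0, 1)] + [(1, 0)] + [(1, 1)] * 3
ALPHA, B10, B01, B11 = 1.0, -0.3, 1.0, 0.8   # illustrative Model 3 parameters, no noise

p = [float(a) for a, _ in PAIRS]
q = [float(b) for _, b in PAIRS]
pq = [a * b for a, b in zip(p, q)]
w = [ALPHA + B10 * a + B01 * b + B11 * ab for a, b, ab in zip(p, q, pq)]

def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [v] for row, v in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def cov(xs, ys):
    """Population covariance (divides by n)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

# Least-squares fit of the reference model without interaction (columns: 1, p, q).
X = [[1.0, a, b] for a, b in zip(p, q)]
XtX = [[sum(row[j] * row[k] for row in X) for k in range(3)] for j in range(3)]
Xtw = [sum(row[j] * wi for row, wi in zip(X, w)) for j in range(3)]
coef = solve(XtX, Xtw)
resid = [wi - sum(c * x for c, x in zip(coef, row)) for row, wi in zip(X, w)]

mean_resid = sum(resid) / len(resid)
print("mean residual:     ", round(mean_resid, 12))
print("Cov(residual, p):  ", round(cov(resid, p), 12))
print("Cov(residual, q):  ", round(cov(resid, q), 12))
print("Cov(residual, p*q):", round(cov(resid, pq), 12))
```

The first three quantities are zero by construction of least squares, which is all that the criterion of “zero mean and uncorrelated with the predictors” checks. The covariance with $p_i q_i$ is not zero (0.0375 for these illustrative values), which is exactly the signature of the omitted interaction term.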

      In order to make sure that there can be no misunderstanding, I have added subsections at the end of Section 2 and Section 4 of Appendix A, and at the end of Section 2 of Appendix B. These subsections show that the algebra of minimizing the sum of squared errors implies that there is no correlation between the errors, or the residuals, and the variables that are included in the model. This is by no means something new; it is the reason why we do OLS to begin with. For additional details about misspecification, I would refer to Section 1b (viii) in van Veelen (2020).
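For readers who want the algebra in compact form, the standard argument runs as follows. This is a generic derivation in matrix notation introduced here for illustration (it may differ from the notation in the appendices): $X$ is the $n \times k$ matrix whose first column consists of ones and whose other columns are the variables in the reference model, $w$ is the vector of fitnesses, and $e$ is the vector of residuals.

```latex
\hat{b} = \arg\min_{b}\,(w - Xb)^{\top}(w - Xb), \qquad e = w - X\hat{b}

\frac{\partial}{\partial b}\,(w - Xb)^{\top}(w - Xb)\Big|_{b=\hat{b}} = -2\,X^{\top}e = 0
\quad\Longrightarrow\quad X^{\top}e = 0

\text{column of ones:}\quad \sum_i e_i = 0 \;\Longrightarrow\; \bar{e} = 0

\text{any included column } x_j\text{:}\quad \sum_i x_{ij}\,e_i = 0 \;\Longrightarrow\;
\operatorname{Cov}(x_j, e) = \frac{1}{n}\sum_i x_{ij}\,e_i - \bar{x}_j\,\bar{e} = 0
```

Nothing in these first-order conditions constrains $\sum_i z_i e_i$ for a variable $z$ that is not a column of $X$, which is why the residuals can remain correlated with an omitted variable such as $p_i q_i$.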

      Finally, there is a detail worth noticing. In the main text, as well as in Appendix B, I use an analogy (and, unlike what Gardner, West, and Wild, 2011, refer to as an analogy, this actually is one). This is an analogy between two choices. On the one hand, there is the choice between Price-like equation 1 (based on Model 1 as a reference model) and Price-like equation 2 (based on Model 2 as a reference model) both applied to Model 2. On the other hand, there is the choice between Price-like equation 2 (based on Model 2 as a reference model) and Price-like equation 3 (based on Model 3 as a reference model) both applied to Model 3. Model 1 is the non-social model, Model 2 is the social model without interaction term, and Model 3 is the social model with interaction term. That makes the first choice a choice between treating a social model as a social model, or as a non-social model. The second choice is between treating a social model with interaction term as a social model with interaction term, or as a social model without interaction term. The power of this analogy is that every argument against treating the social model as if it is a non-social model is also an argument against treating the social model with interaction term as if it is a social model without interaction term.

      This ties in with the incorrect criterion from Rousset (2015) for when a model is well-specified, as follows. His criterion (that there should be no correlation between the residuals and the variables in the model) declares the social model without interaction term well-specified as a reference model when we are considering a social model with interaction term. According to the same criterion, however, the non-social model would also have to be declared well-specified as a reference model when the model we are considering is a social model. The reason is that here, too, there is no correlation between the residuals and the variables that are included in the reference model. This is clearly not what anyone is advocating, and for good reasons. The residuals here would, after all, be correlated with the p-score of the partner, which is a variable that is not included in the non-social model. This is a good indication that we should not use the non-social model for a social trait.
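The same check applies to this pair of models. In the Python sketch below, the pair frequencies and parameters are illustrative assumptions. Fitness is generated by the social model without interaction term, and the non-social model (written here as $w_i = \alpha + \beta_{1,0} p_i + \varepsilon_i$) is fitted by least squares. The residuals pass the “zero mean, uncorrelated with the predictors” test with respect to $p_i$, yet remain correlated with the partner's p-score $q_i$.

```python
# Hypothetical population of pairs (p_i, q_i) with positive assortment.
PAIRS = [(0, 0)] * 3 + [(0, 1)] + [(1, 0)] + [(1, 1)] * 3
ALPHA, B10, B01 = 1.0, -0.3, 1.0   # illustrative social model: cost to self, benefit from partner

p = [float(a) for a, _ in PAIRS]
q = [float(b) for _, b in PAIRS]
w = [ALPHA + B10 * a + B01 * b for a, b in zip(p, q)]

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Population covariance (divides by n)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Least-squares fit of the non-social model (intercept and p only).
slope = cov(w, p) / cov(p, p)
intercept = mean(w) - slope * mean(p)
resid = [wi - intercept - slope * pi for wi, pi in zip(w, p)]

print("fitted slope on p:", slope)
print("mean residual:    ", mean(resid))
print("Cov(residual, p): ", cov(resid, p))
print("Cov(residual, q): ", cov(resid, q))
```

With these values the residual covariance with $q_i$ is 0.1875, even though the residuals have zero mean and zero covariance with $p_i$.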

      Reviewer #3 (Public review):

      Before responding to this review, I would like to express that I appreciate the fact that the reviews and the responses are public at eLife. Besides just being useful in general, this also allows readers to get a behind-the-scenes glimpse into the state of the field, and the level of the reviewing. While the reports by Reviewers #1 and #2 show openness and an interest in getting things right, the report by Reviewer #3 is representative of the many review reports that I have received from the inclusive fitness community in the past. These reports tend to be rhetorically strong, and to those who do not have the time to dig deeper into the details, these reports are probably also convincing. I will therefore go through this review line by line to show how little there is behind the confident off-hand dismissal.

      There is an interesting mathematical connection - an "isomorphism" - between Price's equation and least-squares linear regression.

      This is esoteric and needlessly vague. Why is the word “isomorphism” used? In mathematics, an isomorphism is a structure-preserving mapping. The Price equation is an equation, or an identity, which makes it a bit difficult to imagine what the set of objects is on one end of the mapping. Least-squares linear regression can perhaps be seen as a function of a dataset, which would make it a single object (one function). This complicates things at the other end of the mapping too, if that set is a singleton set. The only isomorphism that I can think of is a trivial isomorphism where one equation is mapped onto one function and vice versa. It seems unlikely that this is what the reviewer means. The word isomorphism moreover is in quotes, so maybe this is supposed to be figurative. But what would it be that is being suggested here by this figure of speech? Just saying that there is, as the reviewer puts it, an “interesting mathematical connection”, does not make it so. It would already be a start to just specify what the mathematical connection is, because I have a hard time seeing what that would be. Is it just that, if you divide the Cov(𝑤, 𝑝)-term by the Var(𝑝)-term, then you get a regression coefficient? If that is what the reviewer has in mind, that would be a rather shallow observation.
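For concreteness, the observation referred to above (dividing the covariance term by the variance term yields the least-squares slope) can be checked in a few lines. The Python sketch below uses made-up data for five individuals, solves the normal equations for a regression of w on p with intercept (via Cramer's rule), and compares the resulting slope with Cov(w, p)/Var(p).

```python
# Made-up data: p-scores and fitnesses of five individuals.
p = [0.0, 0.0, 1.0, 1.0, 1.0]
w = [1.0, 1.2, 1.5, 1.9, 1.4]
n = len(p)

# Slope from the normal equations of w = a + b * p (Cramer's rule).
sum_p, sum_w = sum(p), sum(w)
sum_pp = sum(x * x for x in p)
sum_pw = sum(x * y for x, y in zip(p, w))
slope_normal_eq = (n * sum_pw - sum_p * sum_w) / (n * sum_pp - sum_p ** 2)

# The same number as a ratio of (population) covariance to variance.
mean_p, mean_w = sum_p / n, sum_w / n
cov_wp = sum_pw / n - mean_p * mean_w
var_p = sum_pp / n - mean_p ** 2
slope_cov_ratio = cov_wp / var_p

print(slope_normal_eq, slope_cov_ratio)
```

Both expressions give a slope of 0.5 for these data; the identity holds for any dataset in which p is not constant.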

      Some people have misinterpreted this connection as meaning that there is a generality-limiting assumption of linearity within Price's equation, and hence that Hamilton's rule - which is derived from Price's equation - provides only an approximation of the action of natural selection.

      Here, the reviewer pulls a switcheroo. The use of the word “general”, or “generality”, here refers to the fact that the classical Price equation is an identity for all possible transitions between a parent and an offspring population. This is the sense in which the inclusive fitness literature uses the word general, and so do I in the relevant places in the manuscript. When I do, I make sure to add phrases like “in the sense that whatever the true model is, it always gets the direction of selection right”. As a consequence, the classical Hamilton’s rule is also totally general, in the same sense.

      One of the core points of the paper is that this is not unique to the classical Price equation. As a matter of fact, there is a large set of Price-like equations and Hamilton-like rules that are just as much identities, and just as general (in the sense that they get the direction of selection right for all possible transitions). Being an identity and being completely general (in this sense) therefore cannot be a decisive criterion in favour of the classical Price equation and the classical Hamilton’s rule.

      On the other hand, the way in which my Generalized Price equation and my generalized version of Hamilton’s rule are general is that they do not restrict the statistical model with respect to which errors are squared, summed and minimized to one linear statistical model. This generalization generates the variety of Price-like equations and Hamilton-like rules mentioned above (all of which are general in the sense of always getting the direction of selection right), and it gives us the flexibility to pick one that separates terms that reflect the fitness function from terms that reflect the population state.

      In response to my generalizing the Price equation and Hamilton’s rule in this second sense, the criticism of the reviewer comes down to saying that the Price equation and Hamilton’s rule do not need generalizing, because they already are general – the switcheroo being that this refers to generality in the first sense. That makes it sound like this could be an honest mistake, confusing one way in which these can be described as general with another. However, I really hammered this point home in the manuscript. Even a cursory reading of the manuscript reveals that I am fully aware that the classical Price equation and the classical Hamilton’s rule are general in the first sense.

      It is also not helpful that, as a description of what I supposedly claim, this is impressionistic, and lacks specificity. The Price equation is an equation, or an identity. What does it mean for there to be an “assumption of linearity” within it? For the classical Price equation in covariance form (which Reviewer #2 argues is what most people think of as “the Price equation”) there is no way in which one can transform this into a meaningful statement. There is just nothing in there to which the adjective “linear” can be applied. Linearity only becomes a thing when we ask ourselves how we can interpret the regression coefficient in the classical Price equation in regression form. That would be the linearity of the statistical model the differences with which are squared, summed and minimized in the regression.

      This is in contrast to the majority view that Hamilton's rule is a fully general and exact result.

      Again, in this manuscript, I write, time and again, that the classical Hamilton’s rule is fully general (in the sense that it applies to any transition), and exact (if that means that it always gets the direction of selection right). So, this is clearly not where the contrast with the majority view lies. The contrast with the majority view is that the majority insist on misspecification, and I suggest not to do that.

      To briefly give some mathematical details: Price's equation defines the action of natural selection in relation to a trait of interest as the covariance between fitness 𝑤 and the genetic breeding value 𝑔 for the trait, i.e. Cov(𝑤, 𝑔);

      The Price equation is an identity, not a definition. When deciding on a definition, there is some freedom. We can choose to define ⊂ so that 𝐴 ⊂ 𝐵 means that 𝐴 is a strict subset of 𝐵; or we can choose to define ⊂ so that 𝐴 ⊂ 𝐵 means that 𝐴 is a (not necessarily strict) subset of 𝐵. The Price equation does not “define the action of natural selection”, because it is an identity. There is no freedom to “define” any other way.

      The more serious reason why this is conceptually also a little dangerous is the following. Imagine a locus with two alleles. Both of them are non-coding bits of DNA. Selection therefore does not act on either of them. Now imagine a parent population with an average p-score of 0.5, or, in other words, the frequency of these alleles in the parent population is 50-50. That makes the expected value of the p-score in the offspring population 0.5 too. In finite populations, however, randomness can make the p-score grow a bit larger or a bit smaller than 0.5. If the parent population is small, the variance (the expected squared deviation from 0.5) can actually be sizeable. If the p-score in the offspring population lands above 0.5, then the Price equation has Δ𝑝̄ > 0 and 𝐶𝑜𝑣(𝑤, 𝑝) > 0. Describing the Price equation as “defining the action of natural selection” now suggests that higher p-scores have been selected for (or, in other words, that “the action of natural selection in relation to a trait of interest” is positive). With equal probability, however, Δ𝑝̄ < 0 and therefore also 𝐶𝑜𝑣(𝑤, 𝑝) < 0, and this would then make us draw the opposite conclusion, that natural selection has acted to lower the p-scores in the population. Both of those would be wrong, because in this situation, it would have been randomness that changed the average p-score.
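      The drift scenario above can be checked numerically. The sketch below rests on illustrative assumptions (neutral Wright-Fisher reproduction, 20 parents, exact inheritance of the p-score, and the hypothetical helper names `neutral_generation` and `price_terms`). It confirms that w̄Δp̄ = Cov(w, p) holds in every replicate, while Cov(w, p) comes out positive and negative about equally often even though no selection acts.

```python
import random
import statistics


def neutral_generation(parents, rng):
    """One generation of neutral Wright-Fisher reproduction (an illustrative assumption).

    Every offspring picks a parent uniformly at random and inherits its p-score
    unchanged: no selection and no transmission bias. Returns the offspring
    p-scores and w, the number of offspring of each parent.
    """
    n = len(parents)
    w = [0] * n
    offspring = []
    for _ in range(n):
        j = rng.randrange(n)
        w[j] += 1
        offspring.append(parents[j])
    return offspring, w


def price_terms(parents, offspring, w):
    """Return (w_bar * delta_p_bar, Cov(w, p)), using the population covariance."""
    w_bar = statistics.fmean(w)
    p_bar = statistics.fmean(parents)
    delta_p_bar = statistics.fmean(offspring) - p_bar
    cov_wp = statistics.fmean((wi - w_bar) * (pi - p_bar) for wi, pi in zip(w, parents))
    return w_bar * delta_p_bar, cov_wp


rng = random.Random(2024)
parents = [0] * 10 + [1] * 10  # 20 parents, the two alleles at 50-50
up = down = 0
for _ in range(10_000):
    offspring, w = neutral_generation(parents, rng)
    lhs, cov_wp = price_terms(parents, offspring, w)
    assert abs(lhs - cov_wp) < 1e-12  # the Price identity holds in every replicate
    if cov_wp > 0:
        up += 1
    elif cov_wp < 0:
        down += 1

print(f"Cov(w, p) > 0 in {up} runs, < 0 in {down} runs")
```

The identity holds by construction, yet its sign here is decided purely by sampling noise.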

      this is a fully general result that applies exactly to any arbitrary set of (𝑔, 𝑤) data; without any loss of generality this covariance can be expressed as the product of genetic variance Var(𝑔) and a coefficient 𝑏(𝑔, 𝑤), the coefficient simply being defined as 𝑏(𝑔, 𝑤) = Cov(𝑤, 𝑔)/Var(𝑔) for all Var(𝑔) > 0; it happens that if one fits a straight line to the same (𝑔, 𝑤) data by means of least-squares regression then the slope of that line is equal to 𝑏(𝑔, 𝑤).

      Why this needs to be explained is a bit of a mystery. These “mathematical details” are in almost all Price equation papers, and they are the point of departure of my Appendix A (it is on page 7 of a set of appendices that is more than 90 pages long). Seeing the need to explain this suggests that the reviewer thinks that there is a chance that I or anyone reading this paper would have missed this. I have not, and, more importantly, none of this invalidates the point I make in the paper.
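      The identity in the quoted passage, Cov(w, g) = Var(g)·b(g, w), with b(g, w) equal to the least-squares slope, can be checked in a few lines. This is a minimal sketch assuming population (divide-by-n) moments and hypothetical, deliberately non-linear data. The function names are illustrative only.

```python
import math


def population_cov(xs, ys):
    """Population covariance (divide by n), as used in the Price equation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def price_coefficient(g, w):
    """b(g, w) = Cov(w, g) / Var(g); only defined when Var(g) > 0."""
    var_g = population_cov(g, g)
    if var_g <= 0:
        raise ValueError("b(g, w) is undefined when Var(g) = 0")
    return population_cov(g, w) / var_g


def ols_slope(g, w):
    """Slope of the least-squares line w = a + b*g, from the normal equations."""
    n = len(g)
    sg, sw = sum(g), sum(w)
    sgg = sum(x * x for x in g)
    sgw = sum(x * y for x, y in zip(g, w))
    return (n * sgw - sg * sw) / (n * sgg - sg * sg)


# Hypothetical data with a deliberately non-linear relation between g and w.
g = [0.0, 0.25, 0.5, 0.75, 1.0, 0.5]
w = [math.exp(2 * x) for x in g]

b = price_coefficient(g, w)
print(b, ols_slope(g, w))  # the same number, up to rounding
print(population_cov(g, w), population_cov(g, g) * b)  # Cov(w, g) = Var(g) * b(g, w)
```

Nothing here depends on the data being linear, which is exactly why the identity is general and, on its own, uninformative about which statistical model is appropriate.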

      All of this has already been discussed, repeatedly, in the literature.

      All of this has already been discussed, repeatedly, in the literature indeed. It is just that it does not engage with anything I write in the manuscript, or that I wrote in my other papers.

      Now turn to the present paper: the first sentence of the Abstract says "The generality of Hamilton's rule is much debated", and then the next sentence says "In this paper, I show that this debate can be resolved by constructing a general version of Hamilton's rule".

      This is correct.

      But immediately it's clear that this isn't really resolving the debate, what this paper is actually doing is asserting the correctness of the minority view (i.e. that Hamilton's rule as it currently stands is not a general result)

      It seems to me that the reason why this is “immediately clear” to this reviewer is that the reviewer has not processed the contents of the paper. I am not sure if I have to repeat this, but I am not saying that “Hamilton’s rule as it currently stands” is not general (in the sense that it always gets the direction of selection right). It is, and I say that it is a bunch of times. But so are other rules.

      and then attempting to build a more general form of Hamilton's rule upon that shaky foundation.

      I am not just “attempting to build a more general form of Hamilton's rule”. I did in fact build a more general form of Hamilton’s rule (where the generality refers to the richer set of reference statistical models).

      Predictably, the paper erroneously interprets the standard formulation of Hamilton's rule as a linear approximation and develops non-linear extensions to improve the goodness of fit for a result that is already exactly correct.

      Nowhere in the paper or the appendices do I describe the standard formulation of Hamilton’s rule (or, for that matter, any formulation of Hamilton’s rule) as an “approximation”. It is just not a word that has anything to do with this. If we are doing statistical inference, and the sum of squared errors that is minimized decreases by adding a variable to the statistical model with regard to which the sum of squared errors is minimized, then that will typically improve the goodness of fit. In statistics, this is not described as an improvement in how well the statistical model “approximates” the data, or whatever it is that the reviewer would suggest is being approximated here.
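      To make the point about nested statistical models concrete, here is a minimal sketch. It assumes Model 3 has the form w_i = α + β_{1,0}p_i + β_{0,1}q_i + β_{1,1}p_iq_i with α = 1, β_{1,0} = −1, β_{0,1} = 1, β_{1,1} = 1, uses a hypothetical population of (p_i, q_i) pairs, and solves plain normal equations with hypothetical helper names rather than any statistics library. Because a smaller model is the special case of a larger one with some coefficients set to zero, the minimized sum of squared errors (SSE) can only stay equal or fall as variables are added.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(rows, y):
    """Ordinary least squares via the normal equations; returns (coefficients, SSE)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    sse = sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    return beta, sse


# Hypothetical population of (p_i, q_i) pairs with fitness from the assumed Model 3.
pairs = [(0, 0)] * 3 + [(0, 1)] * 2 + [(1, 0)] * 2 + [(1, 1)] * 3
w = [1 - p + q + p * q for p, q in pairs]

_, sse_own = least_squares([(1, p) for p, q in pairs], w)
_, sse_linear = least_squares([(1, p, q) for p, q in pairs], w)
beta_full, sse_full = least_squares([(1, p, q, p * q) for p, q in pairs], w)

print(sse_own, sse_linear, sse_full)  # non-increasing as variables are added
print(beta_full)  # recovers approximately [1, -1, 1, 1]
```

The fit with the interaction term is exact here, while the model with only p and q leaves a positive SSE; none of these fits is an "approximation" of a rule, they are different reference models.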

      This is not a convincing contribution. It will not change minds or improve understanding of the topic.

      There is indeed plenty of scope for this not to change minds or improve understanding of the topic. It will not change the minds or improve the understanding of those that are not really interested in getting this right. Obviously, it will also not convince those that do not read it.

      Nor is it particularly novel. Smith et al (2010, "A generalisation of Hamilton's rule for the evolution of microbial cooperation" Science 328, 1700-1703) similarly interpreted Hamilton's rule as a linear model and provided a corresponding polynomial expansion - usefully fitting the model to microbial data so as to learn something about the costs and benefits of cooperation in an empirical setting. It's odd that this paper isn't cited here.

      Let me begin by pointing to what I agree with. Given that Smith et al. (2010) and my manuscript are both in the business of generalizing Hamilton’s rule, it would be helpful to the reader if my paper included more information about how the two efforts relate. I will discuss the relation below, and I will also include that in Appendix B, and point to it in the main text. Before I do, however, I would like to point to two details in the review report that fit a pattern.

      The first is that the reviewer describes what Smith et al. (2010) do as “useful”, and seems to think of fitting polynomial expansions as a legitimate way to “learn something about the costs and benefits of cooperation in an empirical setting”. That sounds quite positive. My paper, in which I supposedly repeat this, however, is characterized as misguided. This fits a pattern; all of the reviews I received from the inclusive fitness community include a “done before”, and regularly the done before is described approvingly, while my paper is described as fundamentally flawed.

      Also customary is the lack of detail. What would be really useful here is something like “equation A.14 in this manuscript is the same as equation 6 in Smith et al. (2010) if we choose . This kind of statement would pin down the way in which what I do has been done before. That, however, would require going into detail, at the risk of finding out that what is done in my manuscript is actually quite different from what happens in Smith et al. (2010). That is also a recurrent thing. When I look up the done before, I typically find something that is not quite the same.

      Now on to the paper. What Smith et al. (2010) try to do is something that I wholeheartedly support. It is an empirical study that tries to capture non-linearity. A first point of order is that it is worth asking ourselves: linear or non-linear in what? For that, I would like to go back to the setup of my manuscript. Model 2 from the Main Text is

      $$w_i = \alpha + \beta_{1,0}\,p_i + \beta_{0,1}\,q_i$$

      In this fitness function, 𝑝<sub>i</sub> is the p-score of individual 𝑖 and 𝑞<sub>i</sub> is the p-score of the partner that individual 𝑖 is matched with. This is a standard model of social behaviour if 𝛽<sub>1,0</sub> < 0 and 𝛽<sub>0,1</sub> > 0. Such choices for 𝛽<sub>1,0</sub> and 𝛽<sub>0,1</sub> indicate that having a higher p-score decreases the fitness of individual 𝑖 and increases the fitness of its partner. Here we assume that 𝛼 = 1, 𝛽<sub>1,0</sub> = −1, and 𝛽<sub>0,1</sub> = 2. We assume that p-scores can only be 0 or 1, or, in other words, we assume that there are only cooperators and defectors in the population (or, in terms of Smith et al., 2010: cooperators and cheaters).

      For a well-mixed population, where the likelihood of being matched with a cooperator is the same for cooperators and defectors (it is equal to the frequency of cooperators for both), we can now plot the fitnesses of cooperators (red) and defectors (blue) as a function of the frequency of cooperators (Appendix 1-figure 6 left).

      We can do the same for a population with relatedness, where the probability of being matched with a cooperator is 𝑟 + (1 − 𝑟)𝑓<sub>c</sub> for cooperators and (1 − 𝑟)𝑓<sub>c</sub> for defectors, where 𝑓<sub>c</sub> is the frequency of cooperators (Appendix 1-figure 6 right). For relatedness 𝑟 = 0, and for the positive relatedness shown in the right panel, cooperation is selected against at every frequency.

      Increasing relatedness further, we would find that for 𝑟 = 1/2 the lines coincide, which implies that at every frequency, cooperation is neither selected for nor against. For 𝑟 > 1/2, cooperation will be selected for at every frequency. This pattern implies that, as we have seen in the manuscript, the classical Hamilton’s rule works perfectly fine for Model 2; with 𝑐 = −𝛽<sub>1,0</sub> = 1 and 𝑏 = 𝛽<sub>0,1</sub> = 2, cooperation is selected for if and only if 𝑟𝑏 > 𝑐. The fitnesses of cooperators and defectors as functions of the frequency of cooperators, moreover, are always parallel lines, regardless of relatedness.
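      A minimal numerical check of the parallel-lines claim for Model 2 follows. It assumes the matching rule in which a cooperator's partner cooperates with probability r + (1 − r)f_c and a defector's with probability (1 − r)f_c; the function names are illustrative. Under that assumption, the fitness gap between cooperators and defectors equals rb − c at every frequency.

```python
def partner_coop_prob(cooperator, f_c, r):
    """Assumed matching rule: probability that an individual's partner cooperates."""
    return r + (1 - r) * f_c if cooperator else (1 - r) * f_c


def fitness_model2(cooperator, f_c, r, alpha=1.0, beta_10=-1.0, beta_01=2.0):
    """Expected w_i = alpha + beta_10 * p_i + beta_01 * q_i, with p_i in {0, 1}."""
    p = 1 if cooperator else 0
    return alpha + beta_10 * p + beta_01 * partner_coop_prob(cooperator, f_c, r)


c, b = 1.0, 2.0  # c = -beta_10, b = beta_01
for r in (0.0, 0.25, 0.5, 0.75):
    gaps = [fitness_model2(True, f, r) - fitness_model2(False, f, r) for f in (0.1, 0.5, 0.9)]
    # The gap is r*b - c at every frequency, so the two fitness lines are parallel.
    print(r, [round(x, 12) for x in gaps], "rb > c:", r * b > c)
```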

      Model 3 in the main text extends Model 2 by adding an interaction term:

      $$w_i = \alpha + \beta_{1,0}\,p_i + \beta_{0,1}\,q_i + \beta_{1,1}\,p_i q_i$$

      Now we choose 𝛼 = 1, 𝛽<sub>1,0</sub> = −1, 𝛽<sub>0,1</sub> = 1, and 𝛽<sub>1,1</sub> = 1. We again draw the fitnesses of cooperators and defectors, both at relatedness 𝑟 = 0 (Appendix 1-figure 7 left) and at a positive relatedness (Appendix 1-figure 7 right). In the manuscript, I argue that the appropriate version of Hamilton’s rule here is Queller’s rule: 𝑟<sub>0,1</sub>𝑏<sub>0,1</sub> + 𝑟<sub>1,1</sub>𝑏<sub>1,1</sub> > 𝑐 with 𝑐 = −𝛽<sub>1,0</sub> = 1, 𝑏<sub>0,1</sub> = 𝛽<sub>0,1</sub> = 1, and 𝑏<sub>1,1</sub> = 𝛽<sub>1,1</sub> = 1. The fitnesses of cooperators and defectors as functions of the frequency of cooperators are still straight lines, but they are no longer parallel.

      The first thing to observe, therefore, is that a model with synergy, in which the classic version of Hamilton’s rule would be misspecified, and Queller’s rule would be well-specified, does not require the fitnesses as functions of the frequencies of cooperators to be non-linear. All that changes with the addition of the interaction term, is that they stop being parallel.
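      The same check can be run for Model 3. This sketch assumes w_i = α + β_{1,0}p_i + β_{0,1}q_i + β_{1,1}p_iq_i with α = 1, β_{1,0} = −1, β_{0,1} = 1, β_{1,1} = 1, and the same assumed matching rule as for Model 2 (cooperators meet cooperators with probability r + (1 − r)f_c, defectors with probability (1 − r)f_c). Expected fitnesses are then straight lines in f_c, but with slopes 2(1 − r) for cooperators and (1 − r) for defectors. Function names are hypothetical.

```python
def partner_coop_prob(cooperator, f_c, r):
    """Assumed matching rule: probability that an individual's partner cooperates."""
    return r + (1 - r) * f_c if cooperator else (1 - r) * f_c


def fitness_model3(cooperator, f_c, r, alpha=1.0, b10=-1.0, b01=1.0, b11=1.0):
    """Expected w_i = alpha + b10*p_i + b01*q_i + b11*p_i*q_i with p_i in {0, 1}.

    Because w_i is linear in q_i, its expectation only needs E[q_i].
    """
    p = 1 if cooperator else 0
    q_bar = partner_coop_prob(cooperator, f_c, r)
    return alpha + b10 * p + b01 * q_bar + b11 * p * q_bar


def slope_in_frequency(cooperator, r, h=0.5):
    """Slope of expected fitness in f_c; a two-point difference is exact for a straight line."""
    return (fitness_model3(cooperator, h, r) - fitness_model3(cooperator, 0.0, r)) / h


for r in (0.0, 0.25):
    # Different slopes for cooperators and defectors: the lines are not parallel.
    print(r, slope_in_frequency(True, r), slope_in_frequency(False, r))
```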

      The paper by Smith et al. (2010) is an effort to capture non-linearities in the way fitnesses depend on the frequency of cooperators. That, therefore, goes beyond the step from Model 2 to Model 3. We will come back shortly to whether it uses the right method to capture those non-linearities, but it is important to realize that, even without these non-linearities, the classic version of Hamilton’s rule can be too limiting to accurately describe selection. (Here, I should add that this implies that we were wrong in Wu et al. (2013), when we suggested that “for this experiment, it seems unnecessary to use the generalized Hamilton’s rule, if instead the Malthusian fitness is adopted. In other words, the Wrightian fitness approach calls for a generalization of Hamilton’s rule, whereas the Malthusian fitness approach does not (or at least not in a drastic way, as Malthusian fitnesses are almost linear in the frequency of cooperators).” Using Malthusian fitnesses, the functions were close to linear, but not close to parallel, and therefore, here too, Hamilton’s rule needs generalizing, albeit in a different way than Smith et al. (2010) did).

      The cooperation that is observed in the Myxococcus xanthus studied by Smith et al. (2010) is not a good match with a model where individuals are matched in pairs for an interaction that determines their fitnesses. These microbes cooperate in large groups, and a better match would therefore be the n-player public goods games studied in van Veelen (2018). There, we see that simple, straightforward ways to describe synergies (or anti-synergies) can easily lead to fitnesses not being linear in the frequency of cooperators.

      The way Smith et al. (2010) try to capture those non-linearities, however, is not free of complications. We addressed those in Wu et al. (2013), and I summarized them briefly in van Veelen (2018). One of the issues is that most of the non-linearity Smith et al. (2010) pick up is the result of considering Wrightian fitness rather than Malthusian fitness. In a continuous time model with a constant growth rate, the population size at time 𝑡 is 𝑁(𝑡) = 𝑒<sup>mt</sup>𝑁(0), where 𝑚 is the Malthusian fitness. In a discrete time model with a constant average number of offspring per individual, the population at time 𝑡 is 𝑁(𝑡) = 𝑤<sup>t</sup>𝑁(0), where 𝑤 is the Wrightian fitness. If we take 𝑚 = ln 𝑤, these are the same, and if 𝑤 is close to 1, then 𝑚 can be approximated by 𝑤 − 1. That also implies that if 𝑤 is close to 1 (or, equivalently, if 𝑚 is close to 0) one is locally linear if the other is too. However, in the experiment by Smith et al. (2010) the aggregate fitness effects are not small, and what is highly nonlinear in terms of Wrightian fitness is close to linear in Malthusian fitness.
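      The relation between the two fitness scales can be sketched as follows. This is an illustration only, and the parameters `m0` and `s` are hypothetical. The sketch checks that 𝑤<sup>t</sup> and 𝑒<sup>mt</sup> give the same growth when 𝑚 = ln 𝑤, shows how ln 𝑤 departs from 𝑤 − 1 as 𝑤 moves away from 1, and shows that a fitness that is exactly linear in the frequency of cooperators on the Malthusian scale is curved on the Wrightian scale.

```python
import math


def malthusian(w):
    """Malthusian fitness m = ln(w) for a Wrightian fitness w > 0."""
    return math.log(w)


# N(t) = w**t * N(0) and N(t) = exp(m * t) * N(0) agree when m = ln(w).
n0, t = 100.0, 5
for w in (0.99, 1.01, 1.1, 1.5, 3.0):
    m = malthusian(w)
    same_growth = math.isclose(w ** t * n0, math.exp(m * t) * n0, rel_tol=1e-12)
    print(f"w={w:<5} m={m:+.4f} w-1={w - 1:+.4f} same N(t): {same_growth}")

# Hypothetical example: m(f) = m0 + s*f is linear in the cooperator frequency f.
m0, s = 0.0, 2.0


def wrightian_at(f):
    """Wrightian fitness exp(m(f)) implied by the linear Malthusian fitness m(f)."""
    return math.exp(m0 + s * f)


# A second difference is zero for a linear function; here it is clearly not.
curvature = wrightian_at(0.0) - 2 * wrightian_at(0.5) + wrightian_at(1.0)
print(f"second difference of w(f): {curvature:.3f}")
```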

      Another complication is that the Taylor coefficients that Smith et al. (2010) find are the result of a combination of the data and the functional form they chose to apply to their data first. That means that a different choice of functional form would have given different Taylor coefficients, while the in-between transformation can also be skipped. Also, the number of Taylor coefficients is larger than the dimensionality of the data, which are based on averages for 6 frequencies. For more details on these complications, I would like to refer to Wu et al. (2013) and van Veelen (2018). A nice detail is that if we consider the way the fitnesses of cooperators and defectors compare when using Malthusian fitnesses, then a comparison of the slopes actually suggests anti-synergies, which leads to a stable mix of cooperators and cheaters, already in the absence of population structure. This matches what is suggested by Archetti and Scheuring (2011, 2012) and Archetti (2018).
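      Why having more coefficients than data points is a problem can be shown with a small sketch. With averages at six frequencies (the values below are hypothetical), any degree-6 polynomial (seven coefficients) can be changed by adding a multiple of the polynomial that vanishes at all six frequencies, without changing its fit to the data. This is a generic illustration, not a reanalysis of the Smith et al. (2010) data, and the helper names are illustrative.

```python
def poly_eval(coeffs, x):
    """Evaluate sum(coeffs[k] * x**k) with Horner's rule (lowest degree first)."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def vanishing_poly(roots):
    """Coefficients (lowest degree first) of prod(x - r) over the given roots."""
    coeffs = [1.0]
    for r in roots:
        shifted = [0.0] + coeffs                   # x * current polynomial
        scaled = [-r * c for c in coeffs] + [0.0]  # -r * current polynomial
        coeffs = [a + b for a, b in zip(shifted, scaled)]
    return coeffs


# Six hypothetical frequencies at which averaged fitness data are observed.
freqs = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
fit_a = [1.0, 0.5, -0.2, 0.0, 0.0, 0.0, 0.0]  # 7 coefficients, degree 6
z = vanishing_poly(freqs)                     # also 7 coefficients, zero at every freq
fit_b = [a + 3.0 * zk for a, zk in zip(fit_a, z)]

print([round(poly_eval(fit_a, f) - poly_eval(fit_b, f), 12) for f in freqs])  # all (near) zero
print(poly_eval(fit_a, 0.1), poly_eval(fit_b, 0.1))  # yet the curves differ in between
print(fit_a[1], fit_b[1])  # and so do the first-order coefficients
```

Both coefficient vectors reproduce the six data points equally well, so the data alone cannot single out one set of Taylor coefficients.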

      Besides these technical complications, Smith et al. (2010) is also different, in the sense that it is an empirical paper. It does not contain the Generalized Price equation, it contains no insights regarding how to derive population genetic dynamics from the Generalized Price equation, or how to derive the appropriate rules from those, and it has a very different approach to separating fitness effects and population structure.

      To end on a positive note, I would like to quote a bit out of Wu et al. (2013):

      “While we criticise these mathematical issues, we are convinced that Smith et al. (2010) aim into the right direction: to incorporate the nonlinearities characteristic of biology into social evolution, we may have to extend and generalize the approach of inclusive fitness. It would be beautiful if such a generalization would ultimately include Hamilton’s original rule as a special case […].”

      I like to think that this is exactly what I have done in this paper.

      References

      Akdeniz, A., & van Veelen, M. (2020). The cancellation effect at the group level. Evolution, 74(7), 1246–1254. doi: 10.1111/evo.13995

      Allen, B., & Tarnita, C. E. (2012). Measures of success in a class of evolutionary models with fixed population size and structure. Journal of Mathematical Biology, 68, 109–143. doi: 10.1007/s00285-012-0622-x

      Archetti, M. (2018). How to Analyze Models of Nonlinear Public Goods. Games, 9(2), 17. doi: 10.3390/g9020017

      Archetti, M., & Scheuring, I. (2011). Coexistence of cooperation and defection in public goods games. Evolution, 65(4), 1140–1148. doi: 10.1111/j.1558-5646.2010.01185.x

      Archetti, M., & Scheuring, I. (2012). Review: Game theory of public goods in one-shot social dilemmas without assortment. Journal of Theoretical Biology, 299, 9–20. doi: 10.1016/j.jtbi.2011.06.018

      Bourke, A. F. G. (2014). Hamilton’s rule and the causes of social evolution. Philosophical Transactions of the Royal Society B: Biological Sciences, 369(1642), 20130362. doi: 10.1098/rstb.2013.0362

      Bourrat, P., Godsoe, W., Pillai, P., Gouhier, T. C., Ulrich, W., Gotelli, N. J., & van Veelen, M. (2023). What is the price of using the Price equation in ecology? Oikos, 2023(8). doi: 10.1111/oik.10024

      Crow, J. F. (2008). Commentary: Haldane and beanbag genetics. International Journal of Epidemiology, 37(3), 442–445. doi: 10.1093/ije/dyn048

      Fisher, R. (1930). The genetical theory of natural selection. Retrieved from https://www.cabidigitallibrary.org/doi/full/10.5555/19601600934

      Fletcher, J. A., & Zwick, M. (2006). Unifying the theories of inclusive fitness and reciprocal altruism. American Naturalist, 168(2), 252–262. doi: 10.1086/506529

      Frank, S. A. (1997). The Price equation, Fisher’s fundamental theorem, kin selection, and causal analysis. Evolution, 51(6), 1712–1729. doi: 10.1111/j.1558-5646.1997.tb05096.x

      Frank, S. A. (1998). Foundations of social evolution. Princeton: Princeton University Press.

      Frank, S. A. (2012). Natural selection. IV. The Price equation. Journal of Evolutionary Biology, 25(6), 1002–1019. doi: 10.1111/j.1420-9101.2012.02498.x

      Gardner, A., West, S. A., & Wild, G. (2011). The genetical theory of kin selection. Journal of Evolutionary Biology, 24(5), 1020–1043. doi: 10.1111/j.1420-9101.2011.02236.x

      Grafen, A. (1985a). A geometric view of relatedness. Oxford Surveys in Evolutionary Biology, 2(2), 28-89.

      Grafen, A. (1985b). News and Views. Evolutionary theory: Hamilton’s rule OK. Nature, 318(6044), 310–311. doi: 10.1038/318310a0

      Hamilton, W. D. (1964). The genetical evolution of social behaviour. I. Journal of Theoretical Biology, 7(1), 1–16. doi: 10.1016/0022-5193(64)90038-4

      Karlin, S., & Matessi, C. (1983). The eleventh R. A. Fisher Memorial Lecture - Kin selection and altruism. Proceedings of the Royal Society of London. Series B. Biological Sciences, 219(1216), 327–353. doi: 10.1098/rspb.1983.0077

      Matessi, C., & Karlin, S. (1984). On the evolution of altruism by kin selection. Proceedings of the National Academy of Sciences, 81(6), 1754–1758. doi: 10.1073/pnas.81.6.1754

      Nowak, M. A., Tarnita, C. E., & Wilson, E. O. (2010). The evolution of eusociality. Nature, 466(7310), 1057–1062. doi: 10.1038/nature09205

      Okasha, S. (2005). Maynard Smith on the levels of selection question. Biology and Philosophy, 20(5), 989–1010. doi: 10.1007/s10539-005-9019-1

      Page, K. M., & Nowak, M. A. (2002). Unifying evolutionary dynamics. Journal of Theoretical Biology, 219(1). doi: 10.1016/S0022-5193(02)93112-7

      Pillai, P., & Gouhier, T. C. (2019). Not even wrong: the spurious measurement of biodiversity’s effects on ecosystem functioning. Ecology, 100(7), e02645. doi: 10.1002/ecy.2645

      Price, G. R. (1970). Selection and Covariance. Nature, 227(5257), 520–521. doi: 10.1038/227520a0

      Price, G. R. (1972). Extension of covariance selection mathematics. Annals of Human Genetics, 35(4), 485-490.

      Queller, D. C. (1985). Kinship, reciprocity and synergism in the evolution of social behaviour. Nature, 318(6044), 366–367. doi: 10.1038/318366a0

      Queller, D. C. (1992a). A general model for kin selection. Evolution, 46(2), 376–380. doi: 10.1111/j.1558-5646.1992.tb02045.x

      Queller, D. C. (1992b). Quantitative Genetics, Inclusive Fitness, and Group Selection. The American Naturalist, 139(3), 540–558. doi: 10.1086/285343

      Queller, D. C. (2011). Expanded social fitness and Hamilton’s rule for kin, kith, and kind. Proceedings of the National Academy of Sciences, 108(supplement_2), 10792–10799. doi: 10.1073/pnas.1100298108

      Rousset, F., & Billiard, S. (2000). A theoretical basis for measures of kin selection in subdivided populations: Finite populations and localized dispersal. Journal of Evolutionary Biology, 13(5). doi: 10.1046/j.1420-9101.2000.00219.x

      Rousset, F. (2015). Regression, least squares, and the general version of inclusive fitness. Evolution, 69(11), 2963–2970. doi: 10.1111/evo.12791

      Smith, J., Van Dyken, J. D., & Zee, P. C. (2010). A generalization of Hamilton’s rule for the evolution of microbial cooperation. Science, 328(5986), 1700–1703. doi: 10.1126/science.1189675

      Sober, E., & Wilson, D. S. (2007). Unto others: The evolution and psychology of unselfish behavior. Harvard University Press. Retrieved from https://www.hup.harvard.edu/books/9780674930476

      Taylor, P. D. (1992). Altruism in viscous populations - an inclusive fitness model. Evolutionary Ecology, 6(4), 352–356. doi: 10.1007/bf02270971

      Taylor, P. D. (1989). Evolutionary stability in one-parameter models under weak selection. Theoretical Population Biology, 36(2), 125–143. doi: 10.1016/0040-5809(89)90025-7

      Taylor, P. D., Day, T., & Wild, G. (2007). Evolution of cooperation in a finite homogeneous graph. Nature, 447(7143), 469–472. doi: 10.1038/nature05784

      Van Cleve, J. (2015). Social evolution and genetic interactions in the short and long term. Theoretical Population Biology, 103. doi: 10.1016/j.tpb.2015.05.002

      van Veelen, M. (2005). On the use of the Price equation. Journal of Theoretical Biology, 237(4). doi: 10.1016/j.jtbi.2005.04.026

      van Veelen, M. (2007). Hamilton’s missing link. Journal of Theoretical Biology, 246(3). doi: 10.1016/j.jtbi.2007.01.001

      van Veelen, M. (2011). The replicator dynamics with n players and population structure. Journal of Theoretical Biology, 276(1). doi: 10.1016/j.jtbi.2011.01.044

      van Veelen, M. (2018). Can Hamilton’s rule be violated? ELife, 7. doi: 10.7554/eLife.41901

      van Veelen, M. (2020). The problem with the Price equation. Philosophical Transactions of the Royal Society B: Biological Sciences, 375(1797), 20190355. doi: 10.1098/rstb.2019.0355

      van Veelen, M., Allen, B., Hoffman, M., Simon, B., & Veller, C. (2017). Hamilton’s rule. Journal of Theoretical Biology, 414. doi: 10.1016/j.jtbi.2016.08.019

      van Veelen, M., García, J., Sabelis, M. W., & Egas, M. (2012). Group selection and inclusive fitness are not equivalent; the Price equation vs. models and statistics. Journal of Theoretical Biology, 299. doi: 10.1016/j.jtbi.2011.07.025

      Wilson, D. S., Pollock, G. B., & Dugatkin, L. A. (1992). Can altruism evolve in purely viscous populations? Evolutionary Ecology, 6(4), 331–341. doi: 10.1007/bf02270969

      Wu, B., Gokhale, C. S., van Veelen, M., Wang, L., & Traulsen, A. (2013). Interpretations arising from Wrightian and Malthusian fitness under strong frequency dependent selection. Ecology and Evolution, 3(5). doi: 10.1002/ece3.500

    1. Author response:

      The following is the authors’ response to the original reviews.

      In this letter, we respond to each of the reviewers’ comments. We support responses by referring to the revised manuscript and, where necessary, by including additional descriptions and analyses that we consider extrinsic to the manuscript itself. In this letter, all changes to the manuscript are shown in blue. As noted, the displayed figures have been added to the manuscript or the SI. We believe that we have successfully addressed all comments and that the quality of our paper has improved significantly.

      Comment 1: In addition to the technical comments by the reviewers, I would encourage the authors to discuss the dependency of their observations, e.g. emergence of microphase separation, not only on the sequence of the polypeptides, but also on the solution conditions. Similarly, the distributions of ions in the condensate bulk, interphase, and diluted phase, and hence the interfacial free energy, are significantly affected both by the chemical composition of the condensate and the salt concentration itself, see: https://pubs.acs.org/doi/10.1021/acs.nanolett.1c03138

      We thank the editor for this suggestion. Here, we have focused on the effect of sequence on condensate organization. We agree that how changes in solution conditions affect condensates, including the microphase separation of ELPs, is potentially interesting as well. We note this as a possible future direction at multiple places in the revised Conclusions and Discussion:

      “The simulations successfully reproduced condensate stability variation upon amino acid substitution. While our study is performed at set salt concentration and temperature to isolate the contributions of amino acid hydrophobicity to condensate organization, future studies may consider implementing temperature [cite] or salt [cite] dependent models to explore how solution conditions affect the organization of ELP condensates.”

      “Such a microenvironment arises from the collective behavior of many proteins, can deviate from that of individual chains, and is likely sensitive to the solution conditions,[cite] which are held constant in our study. Future work on systems with double amino acid substitutions or changes to salt concentration or temperature could elucidate the generality of the mean field interpretation and the additivity of individual contributions.”

      Response to referee 1

      Comment 0: This is an interesting, informative, and well-designed study that combines theoretical and experimental methodologies to tackle the phenomenon of higher-resolution structures/substructures in model biomolecular condensates. The results should be published. However, there is significant room for improvement in the presentation and interpretation of the results. As it stands, the precise definition of “frustration,” which is a main theme of this manuscript (as emphasized in the title), is not sufficiently well articulated. This situation should be rectified to avoid “frustration” becoming a “catch-all” term without a clear perimeter of applicability rather than a precise, informative description of the physical state of affairs. There are also a few other concerns, e.g., regarding interpretation of correlation of phase-separation critical temperature and transfer free energy of amino acid residues as well as the difference between critical temperature and onset temperature, and the way the simulated configurations are similar to those of gyroids.

      We want to thank the reviewers for their insightful comments. We revised the manuscript extensively to improve its clarity and to address the reviewers’ concerns. In the following, we provide point-by-point responses to all the comments.

      Comment 1: It is accurately pointed out on p.4 that elastin-like polypeptides (ELPs) undergo heat-induced phase separation and therefore exhibit lower critical solution temperatures (LCSTs). But it is not entirely clear how this feature is reproduced by the authors’ simulation. A relationship between simulated surface tension and “transition temperature” is provided in Fig.1C; but is the “transition temperature” (authors cited ref.41 by Urry) the same as critical temperature? Apparently, Urry’s Tt is the “critical onset temperature”, the temperature at which phase separation happens at a given polymer concentration. This is different from the (global) critical temperature LCST, though the two may or may not be correlated, depending on the shape of the phase boundary. Moreover, is the MOFF coarse-grained forcefield (first step in the multi-scale simulation), by itself, capable of reproducing heat-induced phase separation in a way similar to the forcefield of Dignon et al., ACS Cent Sci 5, 821–830 (2019)? Or is this temperature-dependent effect appearing only subsequently, after the implementation of the MARTINI and/or all-atom steps? Clarification is needed. To afford a more informative context for the authors’ introductory discussion, the aforementioned Dignon et al. work and the review by Cinar et al. [Chem Eur J 25, 13049-13069 (2019)], both touching upon the physical underpinning of the LCST feature of elastin, should also be cited along with refs.41-43.

      We thank the reviewer for their comment. First, we apologize for the lack of clarity between the global lower critical solution temperature, Tc, and the transition temperature, Tt. We have modified the manuscript to be more explicit that the transition temperature we utilize is dependent on the solution conditions, instead of the global lower critical solution temperature.

      Author response image 1.

      Tt as a function of concentration for ELP[V5A2G3] constructs of different chain lengths. Logarithmic fits to the data for each construct using Eq. 1 are also shown. It is evident that the different curves converge to the critical temperature Tc at the critical concentration Cc. Figure reproduced from ref.[2] CC BY 4.0.

      However, as shown by Chilkoti and coworkers [1, 2] and in Author response image 1, the critical temperature of ELPs Tc is indeed linearly related to Tt with the following relationship:

      $$T_t = T_c + \frac{k}{\text{length}}\,\ln\!\left(\frac{C_c}{\text{conc}}\right) \qquad (1)$$

      The above equation highlights the dependence of Tt on the chain length (length) and polymer concentration (conc). The parameter Cc is the corresponding theoretical polypeptide concentration that would be required to achieve Tc, and k is the proportionality constant. Instead of making computationally expensive predictions of condensate critical temperatures, we focused on the surface tension, which can be more readily determined from single constant-temperature simulations as detailed in the Methods section. This decision was made to make it computationally feasible to systematically probe the properties of all 20 amino acids in diblock ELPs in our multiscale model. Furthermore, an expected relationship between the critical temperature and the surface tension can be inferred based on the Flory–Huggins theory. In particular, relationships between the Flory–Huggins parameter, χ, and interfacial tension (τ) have been investigated, and the relationship can be approximated as

      where α is a positive constant, whose exact value depends on the proximity of χ to the critical value of χ necessary for phase separation (χC).[3, 4] As detailed in new Supplemental Theory of the Supporting Information, for systems undergoing LCST,

      with Therefore, we have

      Several conclusions can be drawn from Eq. 4. First, for α = 1, τ is linearly proportional to Tc. Second, τ decreases at larger values of Tc, a trend that is consistent with the results presented in Figure 1 of the main text. Finally, as detailed in the Supplemental Theory, this inverse relationship between τ and Tc is expected only for systems exhibiting LCSTs; for systems with a UCST, τ increases at larger Tc. Therefore, reproducing the correct trend supports the model’s ability to capture the temperature-dependent effects specific to the ELP system.
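      Eq. 1, used for the logarithmic fits in Author response image 1, can be evaluated directly. The sketch below assumes the form reported by Chilkoti and coworkers, Tt = Tc + (k/length)·ln(Cc/conc); the parameter values are hypothetical placeholders, not fitted values from this work. It illustrates the two features discussed above: every chain length returns Tc at conc = Cc, and longer chains depend more weakly on concentration.

```python
import math


def transition_temperature(length, conc, t_c, k, c_c):
    """Transition temperature from the assumed Eq. 1 form.

    Tt = Tc + (k / length) * ln(Cc / conc)

    length : chain length (e.g. number of pentapeptide repeats)
    conc   : polypeptide concentration (same units as c_c)
    t_c    : critical temperature Tc
    k      : fitted proportionality constant
    c_c    : critical concentration Cc
    """
    if length <= 0 or conc <= 0 or c_c <= 0:
        raise ValueError("length and concentrations must be positive")
    return t_c + (k / length) * math.log(c_c / conc)


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only (not fitted to any data).
    T_C, K, C_C = 20.0, 50.0, 1000.0  # degC, degC x repeats, uM
    for length in (20, 40, 80):
        temps = [transition_temperature(length, c, T_C, K, C_C) for c in (10.0, 100.0, C_C)]
        # All curves meet at Tc when conc == Cc; longer chains sit closer to Tc.
        print(length, [round(t, 2) for t in temps])
```

      Below Cc, Tt rises as concentration falls, and the 1/length prefactor is what makes the curves for different chain lengths converge at (Cc, Tc).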

      We modified the text to define the physical meaning of Tt more explicitly. Furthermore, we added a new section in the Supporting Information titled Supplemental Theory to detail the relationship between Tt, Tc, the Flory-Huggins parameter χ, and the surface tension τ. The updated text now reads:

      “Utilizing the simulated condensate conformations, we computed various quantities to benchmark against experimental measurements. While the critical temperature has been widely used as a measure of condensate stability, determining it computationally is expensive. As an alternative, we computed the surface tension, τ, using 100-µs-long MARTINI simulations performed in the NPNAT ensemble.[cite] As detailed in the Supplemental Theory in the Supporting Information, an inverse relationship is expected between τ and the critical temperature, Tc, for systems exhibiting LCSTs. We further approximate Tc with the transition temperatures (Tt) of ELP sequences,[cite] which are the temperatures at which ELPs undergo an LCST transition under specified solution conditions. Tt was shown to be linearly related to Tc.[cite] As expected, a negative correlation can be readily seen between the computed surface tension and the experimental Tt (Fig. 1C). This negative correlation between Tt and τ supports the simulation approach’s accuracy in reproducing the sequence-dependent changes in ELP phase behavior.”

      The reviewer is correct that MOFF does not explicitly account for temperature-dependent effects in its interaction parameters. But as mentioned above and indicated by the reviewer, the following steps with explicit solvent simulations in the multiscale strategy succeed in capturing sequence-dependent differences in ELP systems, which are evident in both transition temperature and surface tension.

      We cited the two references suggested by the reviewer in the introduction. We further added the following text in the discussion section to suggest explicitly exploring temperature-dependent effects as an interesting future direction.

      “While our study is performed at a fixed salt concentration and temperature to isolate the contributions of amino acid hydrophobicity to condensate organization, future studies may consider implementing temperature-dependent[cite] or salt-dependent[cite] models to explore how solution conditions affect the organization of ELP condensates.”

      Comment 2: “Frustration” and “frustrated” are used prominently in the manuscript to characterize certain observed molecular configurations (11 times in total, including in the title and the abstract). Apparently, it is the most significant conceptual pronouncement of this work, hence its precise meaning is of central importance to the authors’ thesis. Whereas one should recognize that the theoretical and experimental observations are striking without invoking the “frustration” terminology, usage of the term can be useful if it offers a unifying conceptual framework. However, as it stands, a clear definition of the term “frustration” is lacking, leaving readers to wonder which molecular configurations are considered “frustrated” and which are not (i.e., is the claim of observation of frustration falsifiable?). For instance, “frustrated microphase separation” appears in both the title and the abstract. A logical question one may ask is: “Are all microphase separations frustrated?” If the answer is in the affirmative, does invocation of the term “frustration” add anything to our physical insight? If the answer is not in the affirmative, then how does one distinguish between microphase separations that are frustrated and those that are not? Presumably all simulated and experimental molecular configurations in the present study are those of lowest free energy for the given temperature. In other words, they are what they are. In the discussion about frustrated phase separation on p.13, for example, the authors appear to refer to the fact that chain connectivity prevents hydrophobic residues from coming together to achieve the most favorable interactions they would form if there were no chain connectivity (one may imagine that in that case all the hydrophobic residues would form a large cluster without microphase separation). Is this what the authors mean by “frustration”? If so, isn’t that merely stating the obvious, at least for the observed microphase separation? In general, does “frustration” always mean a deviation of actual, physical molecular configurations from certain imagined/hypothetical/reference molecular configurations, and is it therefore dependent upon the choice of the imagined reference configuration? If this is how the authors apply the term “frustration” in the present work, what is the zero-frustration reference state/configuration for microphase separation? Similarly, what is the zero-frustration reference state/configuration when frustrated EPS-water interactions are discussed (p.14-p.15, Fig. 5)? What do non-frustrated water-protein interactions look like? Is the classic clathrate-like organization of water hydrogen bonds around a small nonpolar solute “frustrated”?

      We thank the reviewer for their insightful comment, and agree that the concept of “frustration” is both important to our conclusions and, upon review, is too vague in our previous draft of the manuscript.

      For conceptual simplicity and to maximize transferability to real biological systems, we will focus our discussion of frustration on one specific type, which we term “chain frustration.” Chain frustration occurs in states where tertiary interactions between chemically distinct polymer blocks favor phase separation, while chain connectivity prevents macroscopic phase separation from occurring.[5] This frustration leads to microphase separation with microdomains of different monomers.

      We agree with the reviewer that all microphase separations are frustrated in this sense, and have therefore revised the title to

      “Microphase Separation Produces Interfacial Environment within Diblock Biomolecular Condensates”

      Furthermore, we also removed frustration from the abstract to read

      “The interspersion of hydrophilic and hydrophobic residues and a lack of secondary structure formation result in an interfacial environment, which explains both the strong correlation between ELP condensate stability and interfacial hydrophobicity scales, as well as the prevalence of protein-water hydrogen bonds.”

      We have limited our discussion of frustration to the incomplete separation of hydrophobic and hydrophilic groups. As pointed out by the reviewer, in this case frustration refers to the fact that chain connectivity prevents hydrophobic residues from coming together to achieve the most favorable interactions they would form in the absence of chain connectivity. The reference state would be a complete macroscopic phase separation that partitions hydrophobic from hydrophilic groups.

      While the frustration from chain connectivity is well understood for block copolymers[5], its effect on producing the interfacial solvation environment, to the best of our knowledge, has not been emphasized before. We have revised the text at the point where we mention frustration to clearly define its meaning.

      “Therefore, while microphase separation occurs in ELP condensates, frustration remains in the system. Hydrophilic residues cannot completely separate from hydrophobic ones due to constraints imposed by the amino acid sequence, creating unique microenvironments.”

      When discussing the interactions between ELPs and water, we used hydrogen bond analysis to emphasize the interfacial environment. For example, hydrophobic residues tend to “repel” water molecules, reducing the hydrogen bond density, whereas hydrophilic residues and the backbone retain water molecules. This difference results in the positive and negative correlations with Tt shown in Fig. 5C. The behavior of water molecules inside the condensate is, therefore, inhomogeneous. We expect water molecules to become frustrated owing to their simultaneous contact with both hydrophobic and hydrophilic chemical groups, and a perfect reference state would be the pure water environment. However, since this point is not central to our study, we have avoided mentioning frustration here to prevent confusion, and revised the text to read:

      “The water hydrogen bond density also highlights an interfacial environment of blended hydrophobic and hydrophilic regions.”

      After revising the text, frustration only appears three times in the manuscript.

      Comment 3: In the discussion about the correlation of various transfer free energy scales for amino acids and Urry’s critical onset temperature (ref.41) on p.11 and Fig.4, is there any theoretical relationship to be expected between the interactions among amino acids of ELPs and their critical onset temperatures? While a certain correlation may be intuitively expected if the free energy scale “is working”, is there any theoretical insight into the mathematical form of this relationship? A clarifying discussion is needed because it bears logically on whether the observed correlation or lack thereof for different transfer energy scales is a good indication of the adequacy of the energy scales in describing the actual physical interactions at play. This question requires some prior knowledge of the expected mathematical relationship between interaction parameters and onset temperature.

      We thank the reviewer for their comment. The relationship between the interactions among amino acids and their transition temperature can be understood in terms of Flory-Huggins theory, which describes the thermodynamics of polymer mixtures using a lattice model. The chemical composition of the mixture is built into the polymer-solvent interaction parameter

      $$\chi = \frac{z}{k_B T}\left[\epsilon_{ps} - \frac{1}{2}\left(\epsilon_{pp} + \epsilon_{ss}\right)\right] \equiv \frac{z\,\Delta\epsilon}{k_B T}$$

      where z is the coordination number, T is the temperature, kB is the Boltzmann constant, and {ϵpp, ϵss, ϵps} are the polymer-polymer, solvent-solvent, and polymer-solvent contact energies, respectively.[6]

      From the original derivation of Flory-Huggins theory, phase separation occurs when χ exceeds its critical value, χC. Setting χ = χC, we can derive the critical temperature as

      $$T_c = \frac{z\,\Delta\epsilon}{k_B\,\chi_C}$$

      Δϵ can indeed be interpreted as the free energy cost of transferring a polymer bead from a solution phase to a polymer phase. It corresponds to the change of energy from a mixed state, with contacts between polymer and solvent (ϵps), to the demixed state with only polymer-polymer (ϵpp) and solvent-solvent (ϵss) contacts.

      Therefore, the transfer free energy, and the interactions among amino acids of ELPs, are expected to correlate with the critical temperature. The above discussion has been incorporated into the new section Supplemental Theory in the Supporting Information. There, we also discuss the more general scenario where Δϵ is temperature dependent, which is essential for giving rise to LCST.
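      The Flory-Huggins relations discussed above can be sketched in a few lines. This assumes the textbook lattice forms, χ = zΔϵ/(kBT) with Δϵ = ϵps − (ϵpp + ϵss)/2, and the standard critical value χC = ½(1 + 1/√N)² for chains of N segments, in reduced units with kB = 1; the energies in the usage check are hypothetical. Because Δϵ is held constant here, χ falls as T rises, which is UCST-type behavior; the LCST case requires the temperature-dependent Δϵ discussed in the Supplemental Theory.

```python
def delta_epsilon(eps_pp, eps_ss, eps_ps):
    """Exchange energy eps_ps - (eps_pp + eps_ss) / 2 (contact energies, attractive < 0)."""
    return eps_ps - 0.5 * (eps_pp + eps_ss)


def flory_huggins_chi(temperature, z, eps_pp, eps_ss, eps_ps, k_b=1.0):
    """chi = z * delta_eps / (kB * T) on a lattice with coordination number z."""
    return z * delta_epsilon(eps_pp, eps_ss, eps_ps) / (k_b * temperature)


def critical_chi(n_segments):
    """chi_C = 0.5 * (1 + 1/sqrt(N))**2 for an N-segment chain in a one-site solvent."""
    return 0.5 * (1.0 + n_segments ** -0.5) ** 2


def critical_temperature(z, eps_pp, eps_ss, eps_ps, n_segments, k_b=1.0):
    """Solve chi(Tc) = chi_C, assuming delta_eps does not depend on temperature."""
    d_eps = delta_epsilon(eps_pp, eps_ss, eps_ps)
    if d_eps <= 0:
        raise ValueError("delta_eps <= 0: chi never reaches chi_C, no critical point")
    return z * d_eps / (k_b * critical_chi(n_segments))


if __name__ == "__main__":
    # Hypothetical reduced-unit energies: a larger delta_eps mimics a more hydrophobic bead.
    for eps_ps in (-0.9, -0.8, -0.7):
        tc = critical_temperature(z=6, eps_pp=-1.0, eps_ss=-1.0, eps_ps=eps_ps, n_segments=100)
        print(f"eps_ps={eps_ps:+.1f}  Tc={tc:.3f}")
```

      The printed Tc grows linearly with Δϵ, which is the sense in which a transfer free energy scale is expected to track the critical temperature.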

      We have modified the main text in the discussion of Figure 4 to better explain these mathematical relationships and the assumptions they require, to help interpret our simulations. Here is an excerpt from where we discuss Figure 4:

      “The strong dependence of molecular organization on amino acid hydrophobicity suggests that the solvation environment of individual residues might be a determining factor for condensate stability. Indeed, as shown in the Supplemental Theory of the Supporting Information, the critical temperature is closely related to the free energy cost of transferring polymer beads from a solution state to a polymer-only environment. This transfer free energy is often used to quantify the hydrophobicity of amino acids [cite]. To explore their relationship more quantitatively, we compared the transition temperature for ELP condensates measured by Urry [cite] to several hydrophobicity scales.”

      Comment 4: To provide a more comprehensive context for the present study, it is useful to compare the microphase separation seen in the authors’ simulation with the micelle-like structures observed in recent simulated condensed/aggregated states of hydrophobic-polar (HP) model sequences in Statt et al., J Chem Phys 152, 075101 (2020) [see esp. Fig.6] and Wessén et al., J Phys Chem B 126, 9222-9245 (2022) [see, e.g., Fig.10].

      We thank the reviewer for this suggestion. The results of Statt et al. and Wessén et al. indeed provide a nice comparison to our results. While we capture some of the same behavior they observe, the full array of chemical space in our model seems to give some additional morphologies as well.

      First, as predicted by self-consistent field theory, block copolymers are expected to form primarily lamellar-like structures that clearly separate the dense and dilute phases when the volume fraction, f, is 0.5 (Response to Comment 5). This prediction is indeed consistent with results from simulations with the HP model, and with our simulations when the substituted amino acid, X, is sufficiently polar.

      However, this observation is only one of several behaviors we observe. In particular, our simulations also produce gyroid-like structures, which are predicted to emerge at small differences in volume fraction, i.e., f ≈ 0.4 or f ≈ 0.6. These different configurations likely emerge due to the more realistic representation of amino acids in our model, which introduces more frustration than the HP model. In particular, the backbone atoms are inherently hydrophilic and cannot separate from the hydrophobic side chains. Therefore, under microphase separation, it is inherently difficult to separate the different chemical groups into lamellar or micelle-like structures. This produces a condensate interior with interfacial properties that may not be captured by the HP model.

      We make note of the micelle-like topologies predicted by HP models in the revised text, citing both Statt et al. and Wessén et al.:

      “Surprisingly, microphase separation did not produce lamellar morphology as expected for block copolymers with equal volume fractions of the two blocks (Fig. S3 in the Supporting Information) [cite]. In particular, the condensates appear to form gyroid-like structures (Fig. S4 in the Supporting Information), in which the V and X blocks form two interpenetrating networks. This morphology also differs from micelle-like structures seen in simplified hydrophobic-polar (HP) polymers [cite]. It promotes interfacial contacts while maintaining substantial self-interactions as well. Weak interfacial tension between different ELP blocks has also been noted by Hassouneh et al.[cite]”

      Comment 5: “Gyroid-like morphology” is mentioned several times in the manuscript (p.4, p.8, p.17, Fig.S3). This is apparently an interesting observation, but a clear explanation is lacking. A more detailed and specific discussion, perhaps with additional graphical presentations, should be provided to demonstrate why the simulated condensed-phase ELP configurations are similar to the classical description of the gyroid as in, e.g., Terrones & Mackay, Chem Phys Lett 207, 45-50 (1993) and Lambert et al., Phil Trans R Soc A 354, 2009-2023 (1996).

      We thank the reviewer for their comment. Gyroids are canonical structures for diblock copolymers.[5, 7, 8, 9] Their stability is predicted using self-consistent field theory (SCFT) and arises from the balance among the volume fraction of polymer block A (fA), the length of the polymer (N), and the Flory-Huggins interaction parameter (χ).[8, 9] SCFT predicts that gyroids occur at smaller values of χN and at values of fA near, but not equal to, 0.5 (Author response image 2).[10] We hypothesize that these configurations emerge at equal molar fractions of V and X amino acids because of small differences in solvation volume between the two halves of the polymer chain.

      Our support for gyroid-like structures is mainly from observations of two interpenetrating networks formed by the two ELP blocks. We have revised Figure S4 to clearly highlight the two networks as shown in Author response image 3.

      We have revised the main text to clearly define the gyroid-like structures as interpenetrating networks, and added the theoretical phase diagram of diblock copolymers predicted by SCFT as Figure S3 in the Supporting Information.

      “In particular, the condensates appear to form gyroid-like structures (Fig. S4 in the Supporting Information), in which the V and X blocks form two interpenetrating networks. This morphology also differs from micelle-like structures seen in simplified hydrophobic-polar (HP) polymers [cite]. It promotes interfacial contacts while maintaining substantial self-interactions as well. Weak interfacial tension between different ELP blocks has also been noted by Hassouneh et al.[cite]”

      We note, however, that proving that our observations are indeed gyroid structures requires more sophisticated mathematical analysis that is beyond the scope of the study. It is also possible that these structures are metastable in our simulations. We emphasize these caveats in the updated Discussion Section.

      “Further studies on the thermodynamic stability of these morphologies and comparing them with predictions from the self-consistent field theory shall provide more insights into the driving forces for their emergence [cite].”

      Author response image 2.

      Theoretical phase diagram[8] and corresponding morphologies for diblock copolymers. The phases are labeled as: body centered cubic (BCC), hexagonal cylinders (HEX), gyroid (GYR), and lamellar (LAM). fA is the volume fraction of a single polymer block, denoted A, χ is the Flory-Huggins interaction parameter, and N is the total degree of polymerisation. Figure reproduced from ref.[10] CC BY 4.0.

      Author response image 3.

      Representative configurations of (A) V5F5 and (B) V5L5 condensates from MARTINI simulations. The valine-substituted half of the chain is colored blue (V5) and the X-substituted half red (X5). To highlight the interpenetrating networks formed by the two halves, only the X-substituted half of the chain is shown on the left. Simulation interfaces are repeated once periodically in the positive x and positive y dimensions for clarity. High-density regions formed by multiple X-substituted chain halves are highlighted with yellow circles, with one of the chains shown in green.

      Response to referee 2

      Comment 1: The experimental characterization relies on BODIPY and SBD reporting, respectively, on viscosity and polarity. The fluorescent signal of these dyes can possibly depend on many other factors, including quenching. Additional controls are required, or a more extensive discussion with additional references, and a mention of potential limitations of this approach.

      We agree with the reviewer that the fluorescence lifetime signal can be affected by many factors. Compared with fluorescence intensity, fluorescence lifetime depends mainly on the intrinsic properties of the dye and on its local environment. BODIPY and SBD have been used in biological systems to detect the microviscosity and micropolarity of condensates. Our group used the same SBD and BODIPY fluorophores in previous work to quantify the microenvironment of protein aggregates and condensates. The extended data in these studies (ChemBioChem 20:1078–1087. doi: 10.1002/cbic.201800782; Aggregate 4:e301. doi:10.1002/agt2.301; Nat Chem Biol 1–9. doi:10.1038/s41589-023-01477-1) show that BODIPY is sensitive only to viscosity and SBD only to polarity, and that neither is sensitive to other environmental factors. As for quenching, fluorophores with extended π-rich structures display aggregation-caused quenching (ACQ) at high probe concentrations, which lowers both the fluorescence lifetime and the intensity. We typically labeled ELPs with NHS-ester fluorophores at a 20% dye:ELP molar ratio to obtain stock solutions; owing to the limited labeling efficiency, the actual labeling ratio is much lower than 20%. The labeled ELP stock solution is then mixed with unlabeled ELP to obtain ELP solutions with low labeling fractions. We measured ELPs labeled with different fractions of dye. The results show that only BODIPY exhibits slight ACQ at a high

      Author response image 4.

      FLIM images of ELP condensates labeled with different fractions of dyes. A) FLIM images of V30A30 condensates with 5%, 2.5%, and 1% BODIPY labels. B) FLIM images of V30A30 condensates with 5%, 2.5%, and 1% SBD labels. Droplets were formed with a final concentration of 70 µM ELP labeled with different fractions of BODIPY or SBD in 2 M NaCl solution. Scale bar: 5 µm.

      To minimize the potential ACQ effect while retaining sufficient fluorescence signal, we used ELPs labeled with lower fractions of dye, 1% for BODIPY and 2.5% for SBD, for the FLIM experiments. The data in Figure 3 will be updated with the following data.

      Author response image 5.

      Structures of NHS-BODIPY and NHS-SBD, and representative FLIM images of V30A30, A30V30, V30G30 and G30V30 labeled with respective fluorophores. The fluorescence lifetime of each image is the average acquired from three independent experiments. Scale bar: 5 µm.

      We revised the text in the section Microphase separation of ELP condensates as follows: “To experimentally test the microphase separation behavior uncovered in simulations, we studied the micro-physicochemical properties of the V-end and X-end of the peptides. We constructed diblock peptides combining 30 pentameric repeats of a V block and an X (A or G) block, namely V30A30 and V30G30 (Experimental Sequences Section in the Supporting Information). The amino-termini of the V30A30 and V30G30 sequences were subsequently labeled with environmentally sensitive BODIPY or SBD fluorophores [cite], whose lifetimes could be measured to quantify the viscosity or polarity of the V-end (Fig. 3A, left panel) [cite]. These probes have been reported to be sensitive to only a single physicochemical property.[cite] To avoid artifacts induced by fluorophore labeling, we used ELPs labeled with a low fraction of dye. We also constructed A30V30 and G30V30 diblock peptides, in which the viscosity or polarity of the A-end or the G-end could be measured by fluorophores attached at the amino-terminus (Fig. 3A, right panel). Using FLIM, we found that the lifetime of BODIPY for the V-end (5.43 ns) was longer than that for the A-end (4.35 ns), suggesting that the V-end indeed has a higher microviscosity than the A-end (ηV = 2233.54 cP vs ηA = 969.57 cP). Accordingly, the lifetime of SBD was longer for the V-end (8.75 ns) than for the A-end (7.00 ns), indicating that the micropolarity of the V-end was lower than that of the A-end (ϵV = 13.25 vs ϵA = 18.97). These observations can be largely attributed to the greater extent of dehydration at the V-end due to its higher local peptide density. We further showed that the observed differences do not result from possible artifacts arising from subtle distinctions between the two sequences V30A30 and A30V30 (Experimental Characterization of ELP Condensates Section and Fig. S8-S9 in the Supporting Information). Similar results were observed using the V-G sequences. FLIM experiments revealed that the V-end was more viscous than the G-end (ηV = 2972.72 cP vs ηG = 1958.60 cP) and less polar than the G-end (ϵV = 9.14 vs ϵG = 27.50). These experimental observations provided the first line of evidence supporting the microphase separation suggested by the simulation results.”
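      The text does not state how BODIPY lifetimes were converted to viscosities. A common choice for molecular rotors is the Förster-Hoffmann relation, log10 τ = C + x·log10 η. Assuming that form, the sketch below only solves for the two constants implied by the V-end and A-end pairs quoted above (5.43 ns with 2233.54 cP, 4.35 ns with 969.57 cP) and then inverts the relation. It is an arithmetic illustration, not the authors' calibration; a real calibration would typically use reference solvents of known viscosity.

```python
import math


def fit_forster_hoffmann(tau1, eta1, tau2, eta2):
    """Return (C, x) so that log10(tau) = C + x * log10(eta) passes through both points."""
    x = math.log10(tau1 / tau2) / math.log10(eta1 / eta2)
    c = math.log10(tau1) - x * math.log10(eta1)
    return c, x


def viscosity_from_lifetime(tau, c, x):
    """Invert the Forster-Hoffmann relation: eta = 10 ** ((log10(tau) - C) / x)."""
    return 10 ** ((math.log10(tau) - c) / x)


if __name__ == "__main__":
    # Two (lifetime in ns, viscosity in cP) pairs quoted for the V-end and A-end.
    c, x = fit_forster_hoffmann(5.43, 2233.54, 4.35, 969.57)
    print(f"C = {c:.4f}, x = {x:.4f}")
    # Reproduces the V-end value by construction.
    print(round(viscosity_from_lifetime(5.43, c, x), 2))
```

      Because the relation is a power law, a modest lifetime ratio (about 1.25 here) maps onto a much larger viscosity ratio (about 2.3), which is why small lifetime differences can report sizeable microviscosity contrasts.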

      We revised the text in the section Experimental methods as follows

      “The proteins of interest were labeled with NHS-ester fluorophores. To avoid artifacts induced by the fluorophores, we used ELPs with 1% BODIPY labels or 2.5% SBD labels to form condensates. Droplets were formed at a final concentration of 70 µM ELP in 2 M NaCl for the V-A diblocks and in 1.5 M (NH4)2SO4 for the V-G diblocks. A drop of the droplet-containing solution was placed on a 0.17 mm coverslip with a 500 µm spacer. Images were acquired on a Leica Falcon fluorescence microscope equipped with a WLL pulsed laser and a 63X/0.12 oil-immersion objective. BODIPY was excited at 488 nm and SBD at 448 nm. Fluorescence lifetime fitting and image analysis were performed in LAS X and ImageJ.”
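      The dilution step described earlier (a labeled stock mixed with unlabeled ELP) reduces to one line of arithmetic. In the sketch below, the stock's actual dye:ELP ratio is a hypothetical measured input, since the text notes only that it is well below the nominal 20%; the 70 µM total and the 1% and 2.5% targets match the conditions quoted above.

```python
def mixing_plan(total_um, target_fraction, stock_label_ratio):
    """Split a total ELP concentration into labeled-stock and unlabeled contributions.

    total_um          : final ELP concentration in the droplet mix (uM)
    target_fraction   : desired dye:ELP molar ratio in the mix (e.g. 0.01)
    stock_label_ratio : measured dye:ELP ratio of the labeled stock (hypothetical input)
    Returns (uM of ELP taken from the labeled stock, uM of unlabeled ELP to add).
    """
    if not 0 < target_fraction <= stock_label_ratio <= 1:
        raise ValueError("need 0 < target_fraction <= stock_label_ratio <= 1")
    # Dye is conserved: from_stock * stock_label_ratio == total_um * target_fraction.
    from_stock = total_um * target_fraction / stock_label_ratio
    return from_stock, total_um - from_stock


if __name__ == "__main__":
    # Hypothetical stock labeled at 10% (the real ratio must be measured).
    for label, target in (("BODIPY", 0.01), ("SBD", 0.025)):
        stock, unlabeled = mixing_plan(70.0, target, 0.10)
        print(f"{label}: {stock:.2f} uM from labeled stock + {unlabeled:.2f} uM unlabeled")
```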

      We also used a lower concentration of free dye to remeasure the properties of the ELP condensates, and the Figure S9 data are corrected as follows. The slight differences between the results are attributable to experimental error and do not affect the conclusion.

      Author response image 6.

      FLIM images of unlabeled ELP condensates. A) Chemical structures of the free fluorophores, which can measure the physicochemical properties of condensates without covalent labeling. B) Representative FLIM images of V30A30 and A30V30. The mix is the mixture of V30A30 (35 µM) and A30V30 (35 µM). Droplets were formed with a final concentration of 70 µM ELP in 2 M NaCl solution with 1 µM fluorophore. C) Representative FLIM images of V30G30 and G30V30. Droplets were formed with a final concentration of 70 µM ELP in 1.5 M (NH4)2SO4 solution with 1 µM fluorophore. The mix is the mixture of V30G30 (35 µM) and G30V30 (35 µM). Scale bar, 5 µm. The fluorescence lifetime of each image is the average from three independent measurements.

      We also revised the Sequence dependence of micro-viscosity and polarity section of the Supporting Information as follows

      “Since we used V30X30 and X30V30 to quantify the V- and X-end of the V-X blocks, it is possible that the observed differences arose from innate properties of the V30X30 and X30V30 sequences. To rule out this artifact, we formed ELP condensates with V30X30, X30V30, or a mixture of V30X30 and X30V30. The condensates were subsequently treated with aldehyde-BODIPY and methyl-ester SBD fluorophores lacking the NHS-ester reactive warhead (Fig. S9A in the Supporting Information). After brief incubation, the aldehyde-BODIPY and methyl-ester SBD fluorophores were recruited into, and homogeneously distributed in, the ELP condensates. The fluorescence lifetime of aldehyde-BODIPY was the same for V30A30 (4.96 ns), A30V30 (4.99 ns), and their mixture (4.98 ns) (Fig. S9B in the Supporting Information, upper panel). Interestingly, this value is around the average (4.89 ns) of the NHS-BODIPY lifetimes at the A-end (4.35 ns) and the V-end (5.43 ns). For the SBD measurement, methyl-ester SBD gave almost identical lifetimes for V30A30 (8.25 ns), A30V30 (8.27 ns), and their mixture (8.28 ns) (Fig. S9B in the Supporting Information, lower panel), again around the average (7.88 ns) of the NHS-SBD lifetimes at the A-end (7.00 ns) and the V-end (8.75 ns). In addition to the V-A blocks, similar observations were made for the V-G blocks with the V30G30 and G30V30 sequences (Fig. S9C in the Supporting Information). The slight differences between the results are attributed to experimental error. Because these fluorophores do not covalently label the amino-terminus of the ELP peptides, their lifetime reports on the averaged property of the condensates rather than on the microscopic property of the V-end or the X-end, provided that the number of molecules is sufficient and the molecules are distributed without preference.

      Our results reveal that the V30X30 and X30V30 condensates exhibited similar macroscopic viscosity or polarity, suggesting that the previously observed different viscosity or polarity of V30X30 and X30V30 could be attributed to the microscopic property of the V-end or X-end.”
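      The averaging argument above can be checked with the V-A lifetimes quoted in the Supporting Information text. The short script below only redoes that arithmetic: it compares the midpoint of the two end-labeled NHS lifetimes with the mean of the three free-dye lifetimes.

```python
from statistics import mean

# Lifetimes (ns) quoted in the Supporting Information text above (V-A system).
lifetimes = {
    "BODIPY": {"ends": (4.35, 5.43), "free": (4.96, 4.99, 4.98)},
    "SBD": {"ends": (7.00, 8.75), "free": (8.25, 8.27, 8.28)},
}


def compare(ends, free):
    """Return (midpoint of end-labeled lifetimes, mean free-dye lifetime, difference)."""
    mid = mean(ends)
    free_mean = mean(free)
    return mid, free_mean, free_mean - mid


if __name__ == "__main__":
    for dye, data in lifetimes.items():
        mid, free_mean, diff = compare(data["ends"], data["free"])
        print(f"{dye}: end midpoint {mid:.3f} ns, free-dye mean {free_mean:.3f} ns, diff {diff:+.3f} ns")
```

      The BODIPY free-dye mean lies within about 0.1 ns of the end midpoint, while the SBD free-dye mean lies about 0.4 ns above it, so "around the average" is a closer match for BODIPY than for SBD.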

      The FLIM technique combined with environment-sensitive fluorophores is a powerful tool for investigating the physicochemical properties of the microenvironment within condensates. However, this method has some limitations. Because the fluorophore is attached to the protein, we can only detect the microenvironment immediately surrounding the probe (likely at the ångström level). The fluorescence signal we obtain is a statistical average over complex microenvironments and is determined by the sampling position, orientation, and number of fluorescent probes. The quantified values can therefore be compared relatively, but they cannot accurately describe the physical or chemical states of different systems in absolute terms. In addition, the spatial resolution of FLIM is not sufficient to directly resolve the microstructure in condensates.

      Comment 2: It is unclear if, after the application of stretching, the micro-structure will eventually return to the original configuration or not. Overall, the point of this experiment remains somewhat unclear.

      We thank the reviewer for this comment. The ELP condensates are viscous fluids, and they can coalesce into larger droplets within seconds. Because of their high viscosity, ELP condensates show slow fluorescence recovery after photobleaching. When the condensates are stretched, their microstructure changes in response to the external force, and the fluorophores may be pulled out of their microenvironment. For such a dynamic system, we speculate that the microstructure will return to its original state once the condensate re-equilibrates, which may be a long process. However, it is hard to determine whether these microstructures have completely returned to their original configurations. The purpose of this experiment is to show the microenvironment properties of each terminus from another perspective. The experiment also provides evidence that the microenvironment around the V terminus is denser than that around the A terminus.

      Comment 3: The title is too generic and does not reflect the content of the work. There is no analysis of biological condensates. The results are specific to di-block polypeptides with specific sequences. This should be clearly specified in the text and title.

      We have revised the title to “Microphase Separation Produces Interfacial Environment within Diblock Biomolecular Condensates”

      Comment 4: MD is outside the expertise of this reviewer. However, when looking at the density profiles (Figure S2), the simulation does not seem to be fully converged. The densities fluctuate inconsistently along the Z direction. The authors should comment on assessing simulation convergence. In many cases, the section used for the density values in the plot (i.e., below 0.06 box lengths away from the condensate center) does not seem representative of the dense phase. It should be justified why these simulations can still be used for density/hydrogen bonding analysis.

      We thank the reviewer for their comment, and agree that convergence of MD simulations is simultaneously important and difficult to control for. To demonstrate the convergence of our simulations, we have taken an example system (V5F5) and reproduced the density profile in 4 unique time windows of 50 ns each (Author response image 7A-D). We find that all distributions are nearly identical, indicating that further extending these simulations is unlikely to change our findings.
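      The time-window check described above can be sketched as follows. Per-frame mass profiles along |z| (in 0.01-box-length bins, as in the figure caption) are averaged within four contiguous blocks of frames, and the largest bin-wise difference between blocks summarizes how much the profile drifts. The input format (per-frame z coordinates measured from the condensate center, plus bead masses) is an assumption, and the synthetic trajectory is stationary by construction, so it illustrates the bookkeeping rather than any real convergence result.

```python
import random
from statistics import mean


def density_profile(z_values, masses, box_length, bin_width=0.01):
    """Normalized mass profile vs |z| / box_length, with z measured from the condensate center."""
    n_bins = int(round(0.5 / bin_width))
    profile = [0.0] * n_bins
    for z, m in zip(z_values, masses):
        idx = int(abs(z) / box_length / bin_width)
        if idx < n_bins:
            profile[idx] += m
    total = sum(profile)
    return [p / total for p in profile] if total > 0 else profile


def window_profiles(frames, box_length, n_windows=4, bin_width=0.01):
    """Average the per-frame profiles inside n_windows contiguous blocks of frames."""
    size = len(frames) // n_windows
    if size == 0:
        raise ValueError("fewer frames than windows")
    averaged = []
    for w in range(n_windows):
        block = frames[w * size:(w + 1) * size]
        per_frame = [density_profile(z, m, box_length, bin_width) for z, m in block]
        averaged.append([mean(column) for column in zip(*per_frame)])
    return averaged


def max_window_spread(profiles):
    """Largest bin-wise difference between any two window-averaged profiles."""
    return max(max(column) - min(column) for column in zip(*profiles))


if __name__ == "__main__":
    # Synthetic, stationary "trajectory": 200 frames of 300 beads with Gaussian z (nm).
    rng = random.Random(0)
    box = 20.0
    frames = [([rng.gauss(0.0, 1.5) for _ in range(300)], [1.0] * 300) for _ in range(200)]
    profiles = window_profiles(frames, box)
    print(f"max spread between windows: {max_window_spread(profiles):.4f}")
```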

      While we agree that the choice of 0.06 box lengths is arbitrary, it was chosen as an approximation for the interior of the condensate, where the more hydrophobic half of the protein chain tends to be at higher concentration. However, this choice is not important to our overall conclusion. Halving (Author response image 7E) or doubling (Author response image 7F) the cutoff maintains the inverse correlation between the protein density of the X5 half of the condensate and experimental transition temperature.
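      The cutoff-sensitivity test above can be sketched as follows, assuming each system's X5-half density (or mass fraction) profile is sampled on a shared grid of Z positions expressed in box lengths from the condensate center. The function names are hypothetical; the sketch averages the profile inside |Z| < cutoff and computes the Pearson coefficient against Tt with `np.corrcoef`, so rerunning it at several cutoffs shows whether the sign of the correlation depends on that choice.

```python
import numpy as np


def interior_mean(z, rho, cutoff):
    """Mean of rho over bins with |z| < cutoff (z in box lengths from the condensate center)."""
    z = np.asarray(z, dtype=float)
    rho = np.asarray(rho, dtype=float)
    mask = np.abs(z) < cutoff
    if not mask.any():
        raise ValueError("no bins fall inside the cutoff")
    return float(rho[mask].mean())


def pearson_vs_tt(z, profiles, tt, cutoff):
    """Pearson r between each system's interior X5-half density and its transition temperature."""
    interior = [interior_mean(z, rho, cutoff) for rho in profiles]
    return float(np.corrcoef(interior, tt)[0, 1])
```

      For example, looping `pearson_vs_tt` over cutoffs of 0.03, 0.06 and 0.12 box lengths mirrors the halving and doubling described in the response.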

      Finally, in our multiscale simulation approach, the all-atom portion of the simulation is mostly used to examine water structure and protein solvation. Dividing the simulation into four independent time windows does not substantially change these properties, resulting in low standard deviations in Figure 5 and Figure 6. Similarly, our previous work on the dielectric properties of ELP condensates has shown that choosing different starting structures from MARTINI simulations is unlikely to affect the estimates of similar quantities.[11]

      Author response image 7.

      Checking convergence of all-atom simulations of ELP condensates. (A-D) The relative mass density along the Z-distance from the condensate center is shown for the V-substituted and X-substituted halves of V5F5 in four independent time windows of 50 ns each. The Z-axis is defined as the direction perpendicular to the condensate-water interface. The dashed line marks a Z-distance of 0.06 box lengths from the condensate center, which was the original cutoff for the correlation analysis. (E-F) Correlation between the mass fraction of the X5 half of the condensate and the transition temperature (Tt) from Urry.[12] The condensate is defined as the region within a Z-distance of 0.03 box lengths (E) or 0.12 box lengths (F) of the condensate center. ρ is the Pearson correlation coefficient between the two data sets, and the dashed diagonal line is the best-fit line. Error bars represent standard deviations of the mean taken over box-length intervals of 0.01.

      References

      (1) McDaniel JR, Radford DC, Chilkoti A (2013) A unified model for de novo design of elastin-like polypeptides with tunable inverse transition temperatures. Biomacromolecules 14:2866–2872.

      (2) Meyer DE, Chilkoti A (2004) Quantification of the effects of chain length and concentration on the thermal behavior of elastin-like polypeptides. Biomacromolecules 5:846–851.

      (3) Helfand E, Tagami Y (1972) Theory of the interface between immiscible polymers. J. Chem. Phys. 56:3592.

      (4) Roe RJ (1975) Theory of the interface between polymers or polymer solutions. I. Two components system. J. Chem. Phys. 62:490–499.

      (5) Shi AC (2021) Frustration in block copolymer assemblies. J. Phys. Condens. Matter 33.

      (6) Flory PJ (1942) Thermodynamics of high polymer solutions. J. Chem. Phys. 10:51.

      (7) Grason GM (2006) The packing of soft materials: Molecular asymmetry, geometric frustration and optimal lattices in block copolymer melts. Phys. Rep. 433:1–64.

      (8) Matsen MW, Bates FS (1996) Unifying weak- and strong-segregation block copolymer theories. Macromolecules 29:1091–1098.

      (9) Matsen MW, Schick M (1994) Stable and unstable phases of a diblock copolymer melt. Phys. Rev. Lett. 72:2660–2663.

      (10) Swann JM, Topham PD (2010) Design and application of nanoscale actuators using block-copolymers. Polymers 2:454–469.

      (11) Ye S et al. (2023) Micropolarity governs the structural organization of biomolecular condensates. Nat. Chem. Biol. pp 1–9.

      (12) Urry DW (1997) Physical chemistry of biological free energy transduction as demonstrated by elastic protein-based polymers. J. Phys. Chem. B 101:11007–11028.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment:

      The manuscript establishes a sophisticated mouse model for acute retinal artery occlusion (RAO) by combining unilateral pterygopalatine ophthalmic artery occlusion (UPOAO) with a silicone wire embolus and carotid artery ligation, generating ischemia-reperfusion injury upon removal of the embolus. This clinically relevant model is useful for studying the cellular and molecular mechanisms of RAO. The data overall are solid, presenting a novel tool for screening pathogenic genes and promoting further therapeutic research in RAO.

      Thank you for recognizing the sophistication and clinical relevance of our mouse model for acute retinal artery occlusion. We are grateful for your supportive feedback.

      Public reviews:

      (1) Response to Reviewer #1: 

      Summary:

      Wang, Y. et al. used a silicone wire embolus to definitively and acutely clot the pterygopalatine ophthalmic artery in addition to carotid artery ligation to completely block the blood supply to the mouse inner retina, which mimics clinical acute retinal artery occlusion. A detailed characterization of this mouse model determined the time course of inner retina degeneration and associated functional deficits, which closely mimic human patients. Whole retina transcriptome profiling and comparison revealed distinct features associated with ischemia, reperfusion, and different model mechanisms. Interestingly and importantly, this team found a sequential event including reperfusion-induced leukocyte infiltration from blood vessels, residual microglial activation, and neuroinflammation that may lead to neuronal cell death.

      Strengths:

      Clear demonstration of the surgery procedure with informative illustrations, images, and superb surgical videos.

      Two time points of ischemia and reperfusion were studied with convincing histological and in vivo data to demonstrate the time course of various changes in retinal neuronal cell survival, ERG functions, and inner/outer retina thickness.

      The transcriptome comparison among different retinal artery occlusion models provides informative evidence to differentiate these models.

      The potential applications of the in vivo retinal ischemia-reperfusion model and relevant readouts demonstrated by this study will certainly inspire further investigation of the dynamic morphological and functional changes of retinal neurons and glial cell responses during disease progression and before and after treatments.

      We sincerely appreciate your detailed and positive feedback. These evaluations are invaluable in highlighting the significance and impact of our work. Thank you for your thoughtful and supportive review.

      Weaknesses:

      It would be beneficial to the manuscript and the readers if the authors could improve the English of this manuscript by correcting obvious grammar errors, eliminating many of the acronyms that are not commonly used by the field, and providing a reason why this complicated but clever surgery procedure was designed and a summary table with the time course of all the morphological, functional, cellular, and transcriptome changes associated with this model.

      Thank you for your thorough review of the manuscript. We sincerely apologize for any grammatical errors resulting from our English language proficiency and have taken the necessary steps to polish the article. Additionally, we have heeded your advice and reduced the use of field-specific acronyms to enhance readability for both the manuscript and its readers.

      Regarding the rationale behind the design of the UPOAO model, we have provided a description in the Introduction section. Our group focuses on research into the pathogenesis and clinical treatment of RAO. The absence of an accurate mouse model simulating the retinal ischemic process has hampered progress in developing neuroprotective agents for RAO. To better simulate the retinal ischemic process and possible ischemia-reperfusion injury following RAO, we developed a novel vascular-associated mouse model called the unilateral pterygopalatine ophthalmic artery occlusion (UPOAO) model. We drew inspiration from the middle cerebral artery occlusion (MCAO) model, which is widely used in cerebral ischemic injury research and guided the development of the UPOAO model.

      We appreciate your valuable suggestion regarding the inclusion of a summary table outlining the time course of morphological, functional, cellular, and transcriptome changes associated with this model. To address this, we intend to include a supplementary table at the end of the article (Table S2, Summary Table), which will offer a comprehensive overview of the experimental results, thereby aiding in clarity and interpretation.

      Once again, we thank you for your insightful comments and suggestions, which have greatly contributed to the improvement of our manuscript.

      (2) Response to Reviewer #2: 

      Summary:

      The authors of this manuscript aim to develop a novel animal model to accurately simulate the retinal ischemic process in retinal artery occlusion (RAO). A unilateral pterygopalatine ophthalmic artery occlusion (UPOAO) mouse model was established using silicone wire embolization combined with carotid artery ligation. This manuscript provided data to show the changes in major classes of retinal neural cells and visual dysfunction following various durations of ischemia (30 minutes and 60 minutes) and reperfusion (3 days and 7 days) after UPOAO. Additionally, transcriptomics was utilized to investigate the transcriptional changes and elucidate changes in the pathophysiological process in the UPOAO model post-ischemia and reperfusion. Furthermore, the authors compared transcriptomic differences between the UPOAO model and other retinal ischemic-reperfusion models, including HIOP and UCCAO, and revealed unique pathological processes.

      Strengths:

      The UPOAO model represents a novel approach to studying retinal artery occlusion. The study is very comprehensive.

      We greatly appreciate your positive assessment of our work and are encouraged by your recognition of its significance.

      Weaknesses:

      Some statements are incorrect and confusing. It would be helpful to review and clarify these to ensure accuracy and improve readability.

      We sincerely appreciate your meticulous review of the manuscript. Taking into account your valuable feedback, we will thoroughly address the inaccuracies identified in the revised version. Additionally, we will commit to polishing the article to ensure improved readability. We apologize for any confusion caused by these inaccuracies and genuinely thank you for bringing them to our attention.

      Recommendations For The Authors:

      Reviewer #1:

      (1) Response to comment:

      The conclusions of this paper are mostly well supported by clear images and convincing data analysis, but some aspects of image presentation and additional data analysis may be needed to strengthen the manuscript.

      We sincerely appreciate your positive assessment of our work and your recognition of the clear images and convincing data analysis supporting our conclusions. Your constructive feedback on enhancing the clarity of our manuscript's image presentation and additional data analysis is highly valued. In response to your suggestions, we have taken steps to improve readability by removing or correcting uncommon acronyms from certain images. We have also conducted further data analysis to provide more comprehensive insights. Thank you for your guidance in improving the quality of our manuscript.

      (2) Response to recommendation (1):

      In Results 3.1 or in Method 2.2: please explain why this combination of silicone wire embolization and carotid artery ligation was chosen to replace previous models such as UCCAO? What are the advantages? And why the silicone wire embolus was inserted through ECA instead of inserting into CCA directly? The cleverly designed surgical procedure is very impressive but the reasoning behind it is not obvious and needs more explanation.

      Thank you for your valuable feedback.

      In the Introduction, we briefly describe the rationale for developing the UPOAO model to simulate acute ischemia-reperfusion in retinal artery occlusion (RAO). Commonly used retinal ischemia models have certain shortcomings. For example, in the HIOP model, which is often used to simulate glaucoma, the effect of interrupted retinal blood flow may be amplified by the dual effects of IOP-induced mechanical stress [1, 2] and vascular ischemia caused by normal saline perfusion of the anterior chamber. In the UCCAO model, recanalization is performed after ligation of the carotid artery, and because the retinal circulation communicates with the cerebral vasculature, the result is retinal hypoperfusion. Retinal ischemia in the UCCAO model is a chronic process; for example, the retina became thinner at weeks 10 and 15 [3], whereas RAO is an acute, total retinal ischemic disease. Therefore, it is critically important to develop a simple mouse model that can simulate acute retinal ischemia and reperfusion injury in RAO patients.

      Various models have been developed for ischemic stroke research, with the endoluminal suture model being the most widely employed method for middle cerebral artery occlusion (MCAO). In this model, filaments are introduced through either the external or internal carotid artery and advanced into the middle cerebral artery, blocking blood flow temporarily for a specific duration. This method has been used extensively in studies involving transient occlusion [4]. Among the MCAO models, the Koizumi method (occlusion from the common carotid artery (CCA) to the middle cerebral artery (MCA)) and the Longa method (occlusion from the external carotid artery (ECA) to the MCA) are frequently used. Of the two, the Longa method is more widely used in research and has a much lower postoperative mortality rate (26%) than the Koizumi method (44%) [5]. The MCAO model induces substantial infarct areas and has contributed significantly to advances in stroke research, including investigations into blood-brain barrier disruption and inflammatory responses to ischemia.

      RAO is considered a form of ocular stroke. Inspired by the MCAO model, we have employed a silicone wire embolus to induce acute interruption of blood flow to the retina. This approach enables the investigation of pathophysiological processes associated with RAO, providing valuable insights into the understanding of this condition. We have clarified these points in the revised manuscript (line 129).

      The reasoning behind inserting the silicone wire embolus through the ECA instead of directly into the CCA is twofold:

      (1) Convenience and avoidance of heavy bleeding and mortality. Inserting the silicone wire embolus requires opening the artery, which must then be ligated at both ends after the embolus is removed to prevent excessive bleeding. Because the ECA can be folded to form a straight line with the ICA, entry and removal of the embolus are more convenient through the ECA. After the embolus is removed from the ECA, blood flow through the CCA is restored, so the blood supply to the brain through the CCA is not affected.

      (2) Preservation of reperfusion process. If the silicone wire embolus were inserted directly into the CCA, the ends of the CCA opening would need to be ligated after the silicone wire embolus is removed. This would result in a lack of reperfusion process after retinal ischemia. To enable the reperfusion process, the decision was made to open the ECA instead.

      We have clarified these points in the revised manuscript to better explain the rationale behind our methodology (line 139). Thank you for prompting this important clarification, which we believe will enhance the understanding of our readers.

      (3) Response to recommendation (2):

      Did the UPOAO actually block OA, including both the retinal (CRA) and choroidal (SPCA and LPCA) blood supply? If so, why does it seem only the inner retina was affected but not the outer retina?

      Thank you for your question. We agree that the UPOAO model blocks the OA, including both its retinal and choroidal branches. Our experimental results primarily indicate damage to the inner retina within 7 days of reperfusion. For example, OCT and HE staining showed significant thinning of the inner retina after 60 minutes of ischemia followed by 7 days of reperfusion (Figure 4). At the same time, the b-wave amplitudes decreased, which usually indicates damage to the inner retina. However, based on the OCT, HE, and immunofluorescence results, the outer retina did not appear to be affected by 60 minutes of ischemia.

      The inner retina is known to be the most sensitive to hypoxic challenge [6], whereas the outer retina is more resistant to hypoxic stress [7]. A possible explanation for our results is that outer retinal cells such as photoreceptors are more tolerant of ischemia than inner retinal cells. Previous studies of retinal ischemia-reperfusion models support this assumption. In the UCCAO model, the b-wave was more affected than the a-wave: decreases in the amplitudes of OPs, scotopic b-wave, and photopic b-wave were consistently observed 4 weeks after UCCAO, while the amplitude of the scotopic a-wave did not change dramatically [8]. Prolonged ischemia, such as permanent ischemia, led to photoreceptor degeneration, as in Stevens et al.'s report of photoreceptor loss 3 months after permanent ligation of both common carotid arteries in bilateral common carotid artery occlusion (BCCAO) [9]. In the HIOP model, the GCL and INL reacted sensitively to ischemia, with significant thinning of the GCL as early as 6 hours after 60 minutes of ischemia [10]. Horizontal cells and photoreceptors remained mostly unaffected, whereas most RGCs and several amacrine cell subtypes disappeared [11, 12].

      Our study characterized the changes that occurred after up to 60 minutes of ischemia and during the first 7 days of reperfusion in the UPOAO model. One possibility is that the ischemia duration in our model was not long enough to affect the outer retinal cells. In addition, the reperfusion observation period may not have been long enough to reveal structural damage and visual dysfunction in the outer retina. As explained in the manuscript, further work is needed to understand the changes induced by longer ischemia and reperfusion periods. Characterizing the damage to retinal structure and function after longer ischemia will be a focus of our future research.

      (4) Response to recommendation (3):

      Better to only use well-accepted acronyms and remove those that are rarely seen in other publications, such as IMRL, MRL, HIOP, TRT, etc.

      Thank you for your valuable feedback. In our manuscript, we used the Spectralis HRA+OCT device (Heidelberg) to capture the retinal images. However, the resulting images did not clearly distinguish each retinal layer. To address this limitation, we followed a clinical OCT stratification approach used in RVO and divided the retina into inner, middle, and outer layers [16]. We acknowledge that this hierarchical description is not commonly used and have therefore followed your recommendation to remove these rare acronyms and instead use the standard layer abbreviations joined with a plus sign. The methods and results have been revised accordingly (line 213, line 368, Figure 4 and Figure S2).

      In addition, the HIOP model is also referred to as the IR or RIRI model [17-19], and the term retinal ischemia-reperfusion injury (IRI) is often used to denote this type of anterior chamber perfusion model. To avoid confusion between the pathophysiological process of ischemia-reperfusion studied in this paper and the common high intraocular pressure model, we consistently refer to the latter as the HIOP model, an abbreviation used in many references [20-22].

      Thanks again for the suggestion. We apologize for any confusion caused by the use of abbreviations and have made the necessary corrections in the manuscript. We have also strengthened the details of OCT layering in the images to enhance readability for our audience.

      (5) Response to recommendation (4):

      Figure 3F, G: What do the OP changes mean? What retina cell dysfunction leads to OP changes? Is there RGC-relevant visual function readout to correlate with RGC death?

      Oscillatory potentials (OPs) are important components of the electroretinogram (ERG). While the precise origin of OPs remains unclear, they are generally believed to be generated from the inner retinal layer, specifically involving bipolar cells, amacrine cells and ganglion cells [23]. OPs are sensitive indicators of retinal ischemic effects and can detect dysfunction before alterations in the b-waves occur [24-26] (We have added these statements at line 358). In this research, the reduction of OPs indicated dysfunction in the inner retinal layer and retinal ischemia.

      The function of RGCs can be assessed non-invasively using various ERG techniques that emphasize the activity of inner retinal neurons, including OPs of multifocal ERG (mfERG), the photopic negative response (PhNR) in mfERG, the pattern electroretinogram (PERG), and the negative scotopic threshold response (nSTR) [27]. Among these indicators, the PERG appears to be the most specifically related to the presence of functional RGCs. However, the complexity of the electrophysiological sources and species-specific differences in RGC characteristics should also be considered. In addition, visual evoked potentials (VEP) can assess signaling along the whole visual pathway, from RGC axons to the visual cortex [28, 29]. Unfortunately, because the equipment required for evaluating RGC function was unavailable to us, we could not conduct a comprehensive assessment in this study. This limitation highlights the importance of future studies that incorporate RGC evaluation, using indicators such as PERG and PhNR, to provide a more comprehensive understanding of visual pathway function.

      Thank you for your careful review and insightful questions.

      (6) Response to recommendation (5):

      Figure 4B: RNFL/GCL/IPL normally called GCC (ganglion cell complex).

      We appreciate your helpful recommendation regarding the abbreviation GCC (ganglion cell complex) for the combination of RNFL, GCL, and IPL. We have updated this terminology in the revised manuscript (line 213 and Figure 4).

      (7) Response to recommendation (6):

      Figure 4 A-F: Normally a circular OCT image surrounding the optic nerve head is preferred to measure retina thickness. If in these figures, all the OCT images are from the same location, it may be acceptable, but need to provide imaging details on how these OCT planes are selected and what has been done to make sure the same locations were selected for comparison.

      We agree that OCT images of the retina are usually captured around the optic nerve head. In this study, our goal was to assess the thickness of both the peripheral retina and the retina near the optic nerve head. To achieve this, we used the optic nerve head as the apex of the selected field of view (upper left region of panel A in Figure 4). For each mouse, we obtained OCT images of the superior nasal (SN), superior temporal (ST), inferior nasal (IN), and inferior temporal (IT) fields around the optic nerve head and averaged the thicknesses across these four fields. In each field, we measured and statistically evaluated the retinal thickness at 1.5, 3, and 4.5 papilla diameters (PD) from the optic nerve head.

      This approach allowed us to ensure that the same locations were selected for comparison and provided a comprehensive assessment of retinal thickness across different regions. We have detailed this methodology in the revised manuscript to clarify the imaging process and the consistency of the selected locations.
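      The averaging step described above can be illustrated with a short sketch. It assumes one thickness value per field (SN, ST, IN, IT) at each of the three distances (1.5, 3 and 4.5 PD); the data layout and the function name `average_thickness` are hypothetical and only illustrate averaging across the four fields at each distance.

```python
from statistics import mean

FIELDS = ("SN", "ST", "IN", "IT")  # fields around the optic nerve head
DISTANCES_PD = (1.5, 3.0, 4.5)  # measurement points, in papilla diameters


def average_thickness(measurements):
    """measurements[field][i] is the thickness at DISTANCES_PD[i].

    Returns {distance: mean thickness across the four fields}.
    """
    missing = [f for f in FIELDS if f not in measurements]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    return {d: mean(measurements[f][i] for f in FIELDS)
            for i, d in enumerate(DISTANCES_PD)}
```

      Keeping the field and distance labels fixed in one place ensures that the same locations are compared across animals and groups.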

      Thank you for your insightful feedback.

      Reviewer #2:

      Addressing the following concerns is necessary to improve the manuscript.

      (1) Response to recommendation (1):

      The manuscript contains many grammatical errors and should be carefully reviewed for corrections. For example: In the title, "Silicone Wire Embolization-induced Acute Retinal Artery Ischemia and Reperfusion Model in Mouse: Gene Expression Provide Insight into Pathological Processes". It should be "Provides" instead of "Provide". In the Abstract, "The resident microglia within the retina and peripheral leukocytes which access to the retina were pronounced increased on reperfusion periods." It should be "pronouncedly" or "markedly" instead of " pronounced".

      Thank you for your careful reading and pointing out the grammatical errors in the manuscript. We apologize for these mistakes and have since revised and polished the article with the assistance of native English speakers. Ensuring accurate and clear language usage in scientific writing is crucial, and we appreciate your help in improving the quality of our manuscript. Thank you for bringing these errors to our attention.

      (2) Response to recommendation (2):

      Video 2: the video content from "30s-47s" and "50s-67s" is repeatedly shown.

      Thank you for your careful review of the video. In the process of preparing the external carotid artery for silicone wire embolus insertion, we first ligated the distal end with a square knot and then tied a loose knot at the proximal end. In the video content from "30s-47s" and "50s-67s", we are tying a square knot. We apologize for any confusion caused by these repeated video clips.

      (3) Response to recommendation (3):

      Figure 1: The ConA staining (H-I) and FFA (J-K) were performed before the removal of silicone wire embolus. It would be beneficial to clarify this in the figure legend too. Additionally, the label 'Post. Sup. Alveolar art.: Posterior superior alveolar artery' is not present in Figure 1L.

      Thank you for your thorough review of the manuscript and the valuable suggestions regarding Figure 1. We have updated the figure legend of Figure 1 to clarify that ConA staining (H-I) and FFA (J-K) were performed before the removal of the silicone wire embolus (line 868 and line 873). Additionally, we have included the label 'Post. Sup. Alveolar art' in Figure 1L as you pointed out. We appreciate your careful attention to detail, and we have ensured that these omissions have been rectified in the revised version of the manuscript.

      (4) Response to recommendation (4):

      Figure 2: only representative images of RGCs at the peripheral retina were shown. It is not clear if only RGCs in the peripheral retina were quantified. Is there RGC loss in the central and middle retina in the UPOAO model as well? How many fields of RGCs were quantified for each retina?

      Thank you for your meticulous review of the manuscript. The quantification method of RGCs is described in detail as follows:

      Four radial incisions were made in the retina and flattened on a glass slide to create a "four-leaf clover" shape. Retina was photographed using a fluorescence microscope (BX63, Olympus, Japan). We captured images from three different regions of each retinal quadrant: 0.1 mm-0.5 mm (central region, field numbers: 1, 4, 7, 10), 0.9 mm-1.3 mm (middle region, field numbers: 2, 5, 8, 11), and 1.7 mm-2.1 mm (peripheral region, field numbers: 3, 6, 9, 12) from the optic nerve head, respectively, as shown in Author response image 1.

      Because the changes were most noticeable in the peripheral fields, we used a Leica SP8 confocal microscope (20X) to capture representative peripheral-field images of RGCs (Figure 2A, C, E, G). RGCs were counted in all twelve fields of each retina, and the average RGC density across the twelve fields per retina is shown in Figure 2B, D, F, K. RGC counts in the central (field numbers: 1, 4, 7, 10), middle (field numbers: 2, 5, 8, 11), and peripheral (field numbers: 3, 6, 9, 12) fields are shown in Author response tables 1-4. We have included this detailed methodology in the revised manuscript to clarify the quantification process and to address RGC loss in the central and middle retina in the UPOAO model. Thank you for pointing out the need for this clarification.
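      The grouping of the twelve fields into central, middle, and peripheral regions can be summarized in a short sketch. It assumes one RGC count per field, keyed by the field numbers given above; the function name and the optional `field_area_mm2` conversion to cells/mm² are hypothetical, since the field area is not stated in this response.

```python
from statistics import mean

# Field numbering from the response: four quadrants, three fields each,
# ordered from the optic nerve head toward the periphery.
REGIONS = {
    "central": (1, 4, 7, 10),
    "middle": (2, 5, 8, 11),
    "peripheral": (3, 6, 9, 12),
}


def region_means(counts, field_area_mm2=None):
    """counts maps field number (1-12) to RGC count; returns per-region and overall means.

    If field_area_mm2 (a hypothetical parameter) is given, values become cells per mm^2.
    """
    if sorted(counts) != list(range(1, 13)):
        raise ValueError("expected counts for fields 1-12")
    scale = 1.0 if field_area_mm2 is None else 1.0 / field_area_mm2
    result = {name: mean(counts[f] for f in fields) * scale
              for name, fields in REGIONS.items()}
    result["all"] = mean(counts.values()) * scale  # twelve-field average per retina
    return result
```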

      Author response image 1.

      Schematic diagram of field selection. Scale bar=1.4 mm. Each retinal petal has three distinct visual fields (the area circled by the green line) that radiate from the optic nerve head to the periphery, in that order, the central, middle, and peripheral visual fields.

      Author response table 1.

      RGCs counts in each field of each retina (30-minute ischemia and 3-day reperfusion)

      Author response table 2.

      RGCs counts in each field of each retina (30-minute ischemia and 7-day reperfusion)

      Author response table 3.

      RGCs counts in each field of each retina (60-minute ischemia and 3-day reperfusion)

      Author response table 4.

      RGCs counts in each field of each retina (60-minute ischemia and 7-day reperfusion)

      (5) Response to recommendation (5):

      Figure 3: The representative wave lines in panels A (60min_3d, 60min_7d) and F do not reflect the statistical analysis presented in panels D, E, and G, especially for the amplitudes of b waves and OPs.

      Thank you for your careful review of the manuscript. We have added labels for the a-waves and b-waves and improved the presentation of the OPs to make the amplitude details more visible (Figure 3). In the previous version, because of incorrect settings, we did not adjust the ordinate spacing when plotting the representative waveforms of the four groups, so the curves were compressed vertically to the same height. The curves are now plotted on the same scale (scale bar shown in the bottom right corner of Figure 3A). In addition, we removed the baseline of the OP waveform and adjusted the abscissa scale to highlight the N and P waves for easier reading (Figure 3F).

      (6) Response to recommendation (6):

      There are two different Supplementary Figure 1 and no Supplementary Figure 3, resulting in misaligned references to Supplementary Figures 1, 2, and 3 in the text.

      Thank you for your careful review of the manuscript. We have reviewed the manuscript again and identified errors in uploading the supplementary figures, which resulted in duplicate Supplementary Figure 1 and the absence of Supplementary Figure 3. We have corrected these issues and realigned the references to Supplementary Figures 1, 2, and 3 in the text to ensure consistency. We appreciate your attention to detail and your reminder to address this issue.

      (7) Response to recommendation (7):

      There is confusion about the definition of ORL (outer retina layer). In Lines 208-209, ORL was defined as the combined thickness of the rest to the retinal pigment epithelium (RPE). It seems the ONL is included in ORL. But in lines 358-359, 907-908, "the ORL encompassed the region from the inner segment/outer segment (IS/OS) to the RPE". Please make the definition consistent. In addition, it is hard to distinguish the regions marked by the green lines in Fig. 4A (sham image) after Line 902.

      Thank you for your careful review of the manuscript. We have addressed the confusion regarding the definition of the outer retinal layer (ORL). The Heidelberg OCT device does not distinguish the layers of the mouse retina well, so we divided it into three broader layers:

      (1) Ganglion Cell Complex (GCC) layer, which encompasses RNFL+GCL+IPL.

      (2) Middle Retinal Layer, which includes INL+OPL.

      (3) Outer Retinal Layer (ORL), which includes ONL+IS/OS+RPE.

      We apologize for the inconsistency and have revised the errors in the manuscript and figure legends accordingly. Additionally, we have removed rare domain-specific acronyms and replaced them with more commonly understood abbreviations, as suggested, to avoid confusion.

      Furthermore, we have enlarged parts of the OCT images to better display the layers, which we hope will improve clarity for readers. Thank you for your valuable feedback.

      (8) Response to recommendation (8):

      Figure 4 (Panels H-J, L-M) incorporated with the text (Line 902) differs from the high-resolution version of Figure 4 included later in the manuscript. In Figure 4 (Panels H-J, L-M) merged with the text (Line 902), the quantification of the IPL and INL thickness is incorrect, and the scale bar is inaccurate. However in the high-resolution version of Figure 4 provided later, the thickness of the RNFL+GCL is incorrect.

      Thank you for your careful review of the manuscript. The quantification of the IPL and INL thickness in Figure 4 (Panels H-J, L-M) incorporated with the text has been revised to ensure accurate measurements and scale bars (Figure 4 and line 924). The high-resolution version of Figure 4 provided later has been updated to correct the thickness measurements of the RNFL+GCL. We have ensured that the ordinate in the high-resolution version of Figure 4 now correctly represents length units, consistent with the equal proportional conversion used in the integrated text figures.

      Thank you for your valuable feedback and for pointing out these errors. We have made the necessary corrections to align the figures accurately with the manuscript.

      (9) Response to recommendation (9):

      Line 384-386: the statement "Notably, a-waves in ERG and the thickness of the outer retinal layers in both OCT and HE remained unchanged." is not accurate, since a-waves in ERG is not changed in 3 days but changed in 7 days, and the thickness of the outer retinal layers in HE is either not measured or not shown in Figure 4.

      Thank you for your careful review of the manuscript. We apologize for this error and have revised it.

      We aimed to convey that the amplitude of the a-waves, which reflects photoreceptor function, does not vary markedly, which is consistent with the thickness of the outer retinal layer observed in OCT and HE images. Our results indicated that at 7 days post-injury, the a-wave amplitude in ERG was statistically different only at stimulus light intensities of 0.3, 3.0 and 10.0 cd·s/m², whereas the b-wave amplitude was reduced by half compared with sham eyes at almost all stimulus light intensities. At the same time, immunofluorescence staining of photoreceptor cells showed no significant change at 7 days. Therefore, we consider the change in a-wave amplitude to be minor compared with the marked decrease in b-wave amplitude. We have clarified this in the revised manuscript.

      We also analyzed the thickness of the outer retinal layers in HE and found it to be consistent with the OCT results, showing no significant changes (Author response image 2 below).

      Thank you for your valuable feedback, which has helped improve the accuracy and clarity of our manuscript.

      Author response image 2.

      Thickness of OPL, ONL, IS/OS+RPE in HE staining. n=3; ns: no significance (p>0.05).

      (10) Response to recommendation (10):

      Figure 5 and Figure S3: Quantification data from different sections of the same retina should be averaged to represent one single sample (one data point) for statistical analysis. * in images of Fig. 5E, F, I, J is not defined in the figure legend. It would be easier for readers to follow if the GCL, IPL, INL, and OPL were labeled in retinal sections.

      Thank you for your careful review of the manuscript and for this recommendation. We have repeated the statistical analysis and updated the results in Figure 5 and Figure S3. In the UPOAO experimental eyes, no significant change in the number of HCs (Calbindin) was observed during the 3-day reperfusion period, whereas a notable reduction was observed after 7 days (Figure 5). Additionally, we have added the definition of the asterisks (*) to the figure legend. We have also labeled the retinal layers, including the GCL, IPL, INL, OPL, and ONL, in the images to make the data easier to follow.

      Thank you for helping us improve the clarity and accuracy of our manuscript.
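      The reviewer's point about statistical units can be illustrated with a short Python sketch (not the authors' analysis code): section-level measurements are averaged within each retina, so each retina contributes exactly one data point to the group comparison. The `(retina_id, value)` input format and the `per_retina_means` name are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def per_retina_means(measurements):
    """Collapse section-level values to one mean per retina.

    measurements: iterable of (retina_id, value) pairs, one pair per
    retinal section. Returns {retina_id: mean value over its sections}.
    """
    groups = defaultdict(list)
    for retina_id, value in measurements:
        groups[retina_id].append(value)
    return {rid: mean(vals) for rid, vals in groups.items()}

# Hypothetical cell counts: three sections from retina r1, two from r2
sections = [("r1", 10), ("r1", 14), ("r1", 12), ("r2", 8), ("r2", 9)]
print(per_retina_means(sections))  # {'r1': 12, 'r2': 8.5}
```

      The resulting per-retina means, not the individual sections, would then enter the statistical test, so n equals the number of animals.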

      (11) Response to recommendation (11):

      Lines 407-409, the statement "which aligns with the a-waves observed in ERG (Figure 3D, E) and the changes seen in the outer retinal layers in OCT (Fig S2C, D)" is confusing. No changes were observed by OCT in Fig S2D.

      Thank you for your review, and we are sorry for the confusion. The overall a-wave amplitude in ERG did not change significantly at 7 days, which is consistent with the immunofluorescence staining results for photoreceptor cells. Based on these observations, we had considered the change in a-wave amplitude not to be significant. However, as you pointed out in recommendation 9, a-waves in ERG did change at 7 days at stimulus light intensities of 0.3, 3.0 and 10.0 cd·s/m², so our description of the a-waves at 7 days was not accurate. We have clarified this point in the revised manuscript to ensure it accurately reflects the data presented.

      (12) Response to recommendation (12):

      In Figure S4, panel C shows lymphocyte-mediated immunity, and panel D shows leukocyte-mediated immunity. Please adjust the figure legend accordingly to reflect the figures.

      Thank you for your careful review of the manuscript. We have modified the figure legend of Figure S4.

      (13) Response to recommendation (13):

      Lines 440-442 state "These results suggested early ischemic processions such as cell migration and potential collateral vessel formation." It is not clear why and how "potential collateral vessel formation" is suggested by Figure 6 and Figure S4. Please clarify this in the text.

      Thank you for your careful review of the manuscript. Because the evidence was insufficient, we have replaced this sentence with: "These results suggested that in the early stage of retinal ischemic injury, leukocytes from the microvasculature may infiltrate retinal tissue. More experimental validation will be performed to confirm this hypothesis." (line 448). We will be more cautious in drawing conclusions in the future. Thank you for your reminder.

      (14) Response to recommendation (14):

      For the figure legend of Figure 6 "In each heatmap, upper box showed the top 10 up-regulated genes, and the below one showed the top 10 down-regulated genes." Is this correct? It appears that the upper box shows the top 10 down-regulated genes, and the lower box shows the top 10 up-regulated genes.

      Thank you for your careful review of the manuscript. We have modified the figure legend of Figure 6: in the heatmaps, the upper box shows the top 10 down-regulated genes, and the lower box shows the top 10 up-regulated genes (line 977).

      (15) Response to recommendation (15):

      For the figure legend of Figure 7, the statement 'Data points are from retinal sections of four animals' is incorrect, as these data were obtained from whole retinas instead of retinal sections. Please revise the legend to reflect this accurately. The scale bar was absent in the images of Figure 7. Asterisk in Figure 7H and 7I was not defined.

      Thank you for your careful review of the manuscript; we have corrected these errors. We have added the scale bar (Figure 7D). The white asterisks in Figure 7H and 7I indicate activated microglial cells, and we have added this definition to the legend of Figure 7 (line 981).

      (16) Response to recommendation (16):

      It would be better to switch the order of Figure S7 and Figure S8 to align with their descriptions in the text.

      Thank you for your recommendation and we have switched the order of Figure S7 and Figure S8.

      (17) Response to recommendation (17):

      The gene names in Figure S8 should be written consistently with those listed in Table S1.

      Thank you for your recommendation and we have corrected the gene names.

      (18) Response to recommendation (18):

      In Figure 9, it is not clear why amacrine cells were not included in the UPOAO model, as amacrine cells were also injured as shown in Figure 5I-L.

      Thank you for your careful review of the manuscript and we have added amacrine cells in Figure 9.

      References

      (1) Yang, H., et al., The connective tissue phenotype of glaucomatous cupping in the monkey eye - Clinical and research implications. Prog Retin Eye Res, 2017. 59: p. 1-52.

      (2) Pavlatos, E., et al., Regional Deformation of the Optic Nerve Head and Peripapillary Sclera During IOP Elevation. Invest Ophthalmol Vis Sci, 2018. 59(8): p. 3779-3788.

      (3) Lee, D., et al., A mouse model of retinal hypoperfusion injury induced by unilateral common carotid artery occlusion. Experimental Eye Research, 2020. 201: p. 108275.

      (4) Barthels, D. and H. Das, Current advances in ischemic stroke research and therapies. Biochim Biophys Acta Mol Basis Dis, 2020. 1866(4): p. 165260.

      (5) Smith, H.K., et al., Critical differences between two classical surgical approaches for middle cerebral artery occlusion-induced stroke in mice. J Neurosci Methods, 2015. 249: p. 99-105.

      (6) Janáky, M., et al., Hypobaric hypoxia reduces the amplitude of oscillatory potentials in the human ERG. Doc Ophthalmol, 2007. 114(1): p. 45-51.

      (7) Tinjust, D., H. Kergoat, and J.V. Lovasik, Neuroretinal function during mild systemic hypoxia. Aviat Space Environ Med, 2002. 73(12): p. 1189-94.

      (8) Lee, D., et al., Retinal Degeneration in a Murine Model of Retinal Ischemia by Unilateral Common Carotid Artery Occlusion. Biomed Res Int, 2021. 2021: p. 7727648.

      (9) Yamamoto, H., et al., Complex neurodegeneration in retina following moderate ischemia induced by bilateral common carotid artery occlusion in Wistar rats. Exp Eye Res, 2006. 82(5): p. 767-79.

      (10) Palmhof, M., et al., From Ganglion Cell to Photoreceptor Layer: Timeline of Deterioration in a Rat Ischemia/Reperfusion Model. Front Cell Neurosci, 2019. 13: p. 174.

      (11) Adachi, M., et al., High intraocular pressure-induced ischemia and reperfusion injury in the optic nerve and retina in rats. Graefes Arch Clin Exp Ophthalmol, 1996. 234(7): p. 445-51.

      (12) Jehle, T., et al., Quantification of ischemic damage in the rat retina: a comparative study using evoked potentials, electroretinography, and histology. Invest Ophthalmol Vis Sci, 2008. 49(3): p. 1056-64.

      (13) Hayreh, S.S., H.E. Kolder, and T.A. Weingeist, Central retinal artery occlusion and retinal tolerance time. Ophthalmology, 1980. 87(1): p. 75-8.

      (14) Luo, X., et al., Hypoglycemia induces general neuronal death, whereas hypoxia and glutamate transport blockade lead to selective retinal ganglion cell death in vitro. Invest Ophthalmol Vis Sci, 2001. 42(11): p. 2695-705.

      (15) Schmid, H., et al., Loss of inner retinal neurons after retinal ischemia in rats. Invest Ophthalmol Vis Sci, 2014. 55(4): p. 2777-87.

      (16) Furashova, O. and E. Matthè, Hyperreflectivity of Inner Retinal Layers as a Quantitative Parameter of Ischemic Damage in Acute Retinal Vein Occlusion (RVO): An Optical Coherence Tomography Study. Clin Ophthalmol, 2020. 14: p. 2453-2462.

      (17) Pang, Y., et al., CD38 Deficiency Protects Mouse Retinal Ganglion Cells Through Activating the NAD+/Sirt1 Pathway in Ischemia-Reperfusion and Optic Nerve Crush Models. Invest Ophthalmol Vis Sci, 2024. 65(5): p. 36.

      (18) Feng, Y., et al., GSK840 Alleviates Retinal Neuronal Injury by Inhibiting RIPK3/MLKL-Mediated RGC Necroptosis After Ischemia/Reperfusion. Invest Ophthalmol Vis Sci, 2023. 64(14): p. 42.

      (19) Zeng, S., et al., CREG Protects Retinal Ganglion Cells loss and Retinal Function Impairment Against ischemia-reperfusion Injury in mice via Akt Signaling Pathway. Mol Neurobiol, 2023. 60(10): p. 6018-6028.

      (20) Rosenbaum, D.M., et al., The role of the p53 protein in the selective vulnerability of the inner retina to transient ischemia. Invest Ophthalmol Vis Sci, 1998. 39(11): p. 2132-9.

      (21) Zhang, Y., et al., Melatonin Alleviates Pyroptosis of Retinal Neurons Following Acute Intraocular Hypertension. CNS Neurol Disord Drug Targets, 2021. 20(3): p. 285-297.

      (22) Zhu, J., et al., Protective effects of Erigeron breviscapus Hand.-Mazz. (EBHM) extract in retinal neurodegeneration models. Mol Vis, 2018. 24: p. 315-325.

      (23) Wachtmeister, L., Oscillatory potentials in the retina: what do they reveal. Prog Retin Eye Res, 1998. 17(4): p. 485-521.

      (24) Cao, W., et al., Dextromethorphan attenuates the effects of ischemia on rabbit electroretinographic oscillatory potentials. Documenta Ophthalmologica, 1993. 84(3): p. 247-256.

      (25) Xu, J., et al., Pregabalin Mediates Retinal Ganglion Cell Survival From Retinal Ischemia/Reperfusion Injury Via the Akt/GSK3β/β-Catenin Signaling Pathway. Invest Ophthalmol Vis Sci, 2022. 63(12): p. 7.

      (26) Takács, B., et al., Electroretinographical Analysis of the Effect of BGP-15 in Eyedrops for Compensating Global Ischemia-Reperfusion in the Eyes of Sprague Dawley Rats. Biomedicines, 2024. 12(3).

      (27) Porciatti, V., Electrophysiological assessment of retinal ganglion cell function. Exp Eye Res, 2015. 141: p. 164-70.

      (28) Ridder, W.H. and S. Nusinowitz, The visual evoked potential in the mouse—Origins and response characteristics. Vision Research, 2006. 46(6): p. 902-913.

      (29) Liu, S., et al., An optimized procedure to record visual evoked potential in mice. Exp Eye Res, 2022. 218: p. 109011.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Song, Shi, and Lin use an existing deep learning-based sequence model to derive a score for each haplotype within a genomic region, and then perform association tests between these scores and phenotypes of interest. The authors then perform some downstream analyses (fine-mapping, various enrichment analyses, and building polygenic scores) to ensure that these associations are meaningful. The authors find that their approach allows them to find additional associations, the associations have biologically interpretable enrichments in terms of tissues and pathways, and can slightly improve polygenic scores when combined with standard SNP-based PRS.

      Strengths:

      • I found the central idea of the paper to be conceptually straightforward and an appealing way to use the power of sequence models in an association testing framework.

      • The findings are largely biologically interpretable, and it seems like this could be a promising approach to boost power for some downstream applications.

      Weaknesses:

      • The methods used to generate polygenic scores were difficult to follow. In particular, a fully connected neural network with linear activations predicting a single output should be equivalent to linear regression (all intermediate layers of the network can be collapsed using matrix-multiplication, so the output is just the inner product of the input with some vector). Using the last hidden layer of such a network for downstream tasks should also be equivalent to projecting the input down to a lower dimensional space with some essentially randomly chosen projection. As such, I am surprised that the neural network approach performs so well, and it would be nice if the authors could compare it to other linear approaches (e.g., LASSO or ridge regression for prediction; PCA or an auto-encoder for converting the input to a lower dimensional representation).

      Response: We thank the reviewer for the recognition and valuable suggestion on our work. As the reviewer pointed out, our polygenic prediction procedure is equivalent to a linear transformation, and in this revision we found that it was indeed unnecessary to replace the linear model with a neural network framework. Both our results and previous work indicated that linear models fit polygenic traits better than non-linear ones, which was also the reason we chose linear activations for the neural network in the original manuscript.
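      The reviewer's argument can be checked numerically. The NumPy sketch below (layer sizes and random weights are arbitrary assumptions, not the authors' network) composes several layers with identity activations and shows that they reduce to a single weight vector plus an intercept, which is the functional form of linear regression; truncating the stack at a hidden layer likewise yields a fixed linear projection.

```python
import numpy as np

def forward(x, weights, biases):
    """Run x through a stack of layers with identity (linear) activations."""
    h = x
    for w, b in zip(weights, biases):
        h = w @ h + b
    return h

def collapse(weights, biases):
    """Fold the stack into a single affine map x -> w_total @ x + b_total."""
    n_in = weights[0].shape[1]
    w_total = np.eye(n_in)
    b_total = np.zeros(n_in)
    for w, b in zip(weights, biases):
        w_total = w @ w_total
        b_total = w @ b_total + b
    return w_total, b_total

rng = np.random.default_rng(0)
# Hypothetical layer sizes: 50 inputs -> 16 -> 8 -> 1 output
shapes = [(16, 50), (8, 16), (1, 8)]
weights = [rng.normal(size=s) for s in shapes]
biases = [rng.normal(size=s[0]) for s in shapes]

x = rng.normal(size=50)
w_total, b_total = collapse(weights, biases)
print(w_total.shape)  # (1, 50): one weight vector, as in linear regression
print(np.allclose(forward(x, weights, biases), w_total @ x + b_total))  # True
```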

      In this revision, we followed the reviewer’s suggestion and applied a more straightforward linear framework for polygenic prediction. We first calculated the weighted sum of HFS for each block (1,361 independent blocks in total); then, in each target ancestry, we used LASSO regression to integrate these block scores with the SNP PRS into one final score. We also conducted a comparative analysis in the British European test set and found that LASSO, ridge and elastic net gave similar results, with LASSO performing slightly better. By applying this straightforward framework and the sliding window strategy, we moderately improved prediction performance.

      Line 349: “Using height as a representative trait, we first estimated the proportion of variance captured by top loci, and found that HFS of loci with PIP>0.4 (n=5,101) captured roughly 80% of the variance explained by all genome-wide loci (n=1,200,024 under the sliding-window strategy; Figure 5A). We then calculated HFS+LDAK in the non-British European (NBE), South Asian (SAS), East Asian (EAS) and African (AFR) populations in UK Biobank, and observed 17.5%, 16.1%, 17.2% and 39.8% improvement over LDAK alone (p=3.21×10⁻¹⁶, 0.0001, 0.002 and 0.001, respectively; Figure 5C).”

      Author response image 1.
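      A minimal NumPy sketch of the two-step scheme described above, under stated assumptions: per-locus weights (for example, association effect sizes) are taken as given, block membership is a precomputed integer array, and a small coordinate-descent solver stands in for whatever LASSO implementation the authors used. It minimises (1/2n)·||y − Xβ||² + λ·||β||₁, the same objective scikit-learn's `Lasso` uses, so in practice a library solver would be the more robust choice. All function names are hypothetical.

```python
import numpy as np

def block_scores(hfs, weights, block_of_locus, n_blocks):
    """Weighted sum of per-locus HFS within each LD block.

    hfs: (n_individuals, n_loci) haplotype function scores
    weights: (n_loci,) per-locus weights
    block_of_locus: (n_loci,) integer block index of each locus
    """
    out = np.zeros((hfs.shape[0], n_blocks))
    for b in range(n_blocks):
        mask = block_of_locus == b
        out[:, b] = hfs[:, mask] @ weights[mask]
    return out

def lasso_cd(X, y, lam, n_iter=200):
    """LASSO by cyclic coordinate descent on centred data (no intercept)."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # assumes no constant columns
    resid = y.copy()                   # y - X @ beta with beta = 0
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-thresholding update
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

# Simulated example: block 0 and the SNP PRS carry signal, block 1 does not
rng = np.random.default_rng(1)
n = 500
hfs = rng.normal(size=(n, 4))
blocks = block_scores(hfs, np.array([1.0, -0.5, 0.8, 0.2]),
                      np.array([0, 0, 1, 1]), 2)
snp_prs = rng.normal(size=n)
X = np.column_stack([blocks, snp_prs])
y = 0.3 * blocks[:, 0] + 1.0 * snp_prs + rng.normal(scale=0.5, size=n)
beta = lasso_cd(X, y, lam=0.02)
print(np.round(beta, 2))
```

      Because LASSO can set weak coefficients exactly to zero, blocks whose HFS adds little beyond the SNP PRS in a given ancestry may drop out of that ancestry's final score.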

      • A very interesting point of the paper was the low R^2 between the HFS scores in adjacent windows, but the explanation of this was unclear to me. Since the HFS scores are just deterministic functions of the SNPs, it feels like if the SNPs are in LD then the HFS scores should be and vice versa. It would be nice to compare the LD between adjacent windows to the average LD of pairs of SNPs from the two windows to see if this is driven by the fact that SNPs are being separated into windows, or if sei is somehow upweighting the importance of SNPs that are less linked to other SNPs (e.g., rare variants).

      Response: We thank the reviewer for the suggestion on understanding the mechanism behind the low LD. In this revision, we used chromosome 1 as an example and calculated the pairwise LD among all SNPs within two adjacent loci. As shown in Figure S1 (below), although HFS-based LD is still significantly lower than the median SNP-based LD (paired Wilcoxon test p=1.76e-5), the median SNP LD between loci was still lower than what is typically observed between adjacent SNPs in GWAS (histogram on the x axis; median=0.06). We reasoned that dividing SNPs into blocks is one reason that HFS suffers less from LD than standard GWAS, but not the whole story.

      Author response image 2.

      We agree with the reviewer that rare variants could also play an important role. In fact, the sei authors also found that rare variants tended to have larger sei-predicted effects. We conducted an approximate analysis that removed all rare variants and repeated the HFS calculation. Indeed, HFS LD rose substantially, to a median of 0.14, indicating that including rare variants was vital for the low LD.

      Author response image 3.

      Line 123: “Further evaluation indicated that this low LD was driven by two factors: the integration of rare variant effects and segmentation. First, excluding rare variants from HFS raised the median LD to 0.14 (Method; Figure S2C). Second, the median LD of SNPs from adjacent loci was 0.06, which was significantly higher than HFS LD (paired Wilcoxon p=1.76×10⁻⁵) but significantly lower than HFS LD without rare variants (paired Wilcoxon p<2.2×10⁻¹⁶).”
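      The comparison described above, HFS r² between adjacent loci versus the median r² of SNP pairs spanning the same two loci, can be sketched as follows. This is an illustrative NumPy sketch only: r² is computed as the squared Pearson correlation of per-individual values, and the toy "HFS" is a linear combination of dosages, whereas real HFS values come from sei and are nonlinear in the genotypes.

```python
import numpy as np

def r2(a, b):
    """Squared Pearson correlation between two per-individual vectors."""
    r = np.corrcoef(a, b)[0, 1]
    return r * r

def median_cross_snp_r2(geno_a, geno_b):
    """Median r^2 over SNP pairs taking one SNP from each of two loci.

    geno_a, geno_b: (n_individuals, n_snps) dosage matrices. Monomorphic
    SNPs should be dropped first, since their r^2 is undefined (NaN).
    """
    vals = [r2(geno_a[:, i], geno_b[:, j])
            for i in range(geno_a.shape[1])
            for j in range(geno_b.shape[1])]
    return float(np.median(vals))

rng = np.random.default_rng(2)
geno_a = rng.integers(0, 3, size=(300, 5)).astype(float)
geno_b = rng.integers(0, 3, size=(300, 4)).astype(float)
hfs_a = geno_a @ rng.normal(size=5)  # toy stand-in for a sei-derived HFS
hfs_b = geno_b @ rng.normal(size=4)
print("HFS r2:", r2(hfs_a, hfs_b))
print("median cross-locus SNP r2:", median_cross_snp_r2(geno_a, geno_b))
```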

      • There were also a number of robustness checks that would have been good to include in the paper. For instance, do the findings change if the windows are shifted? Do the findings change if the sequence is reverse-complemented?

      Response: Following the reviewer’s suggestion, we conducted a sliding window analysis in which all loci were shifted by 2,048 bp, thereby doubling the total number of loci. In fine-mapping analysis, nearly 90% of the causal loci were reproduced in the sliding window analysis, either by themselves or by an overlapping locus:

      Line 207: “29.4% of causal loci (PIP>0.95) in the original analysis were still causal in the sliding window analysis. For a further 31.1% and 29.3% of causal loci, the overlapping locus on the 5’ or 3’ side, respectively, had PIP>0.95 in the sliding window analysis, although the original locus itself was no longer causal.”
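      The sliding window design amounts to adding a second tiling of 4,096-bp windows offset by half a window (2,048 bp), so that every original locus overlaps one shifted window on each side. A pure-Python sketch, assuming half-open 0-based coordinates and that partial windows at the chromosome end are dropped (the response does not say how chromosome ends were handled); the function names are hypothetical.

```python
def tile_loci(chrom_len, size=4096, shift=2048):
    """Sorted half-open (start, end) windows: the original non-overlapping
    tiling plus a copy shifted by `shift` bp. Windows that would run past
    the chromosome end are dropped."""
    original = [(s, s + size) for s in range(0, chrom_len - size + 1, size)]
    shifted = [(s, s + size) for s in range(shift, chrom_len - size + 1, size)]
    return sorted(original + shifted)

def overlapping_neighbours(locus, shift=2048):
    """The 5' and 3' shifted windows that each overlap half of `locus`."""
    start, end = locus
    return (start - shift, end - shift), (start + shift, end + shift)

loci = tile_loci(16384)
print([start for start, _ in loci])
# [0, 2048, 4096, 6144, 8192, 10240, 12288]
print(overlapping_neighbours((4096, 8192)))
# ((2048, 6144), (6144, 10240))
```

      Under this layout, "reproduced by an overlapping locus" means the causal signal moved to one of the two windows returned by `overlapping_neighbours`.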

      In polygenic prediction analysis, sliding window strategy significantly improved prediction accuracy, as we discussed in question 1.

      As for the reverse complement, sei's input layer encodes both strands symmetrically, so the output is the same for either strand. We also ran sei on the reverse-complemented sequences (generated with seqkit seq -r -p) and verified that the original and reverse-complemented sequences give the same output.
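      The strand check described above can be sketched in plain Python. Here `predict` is a placeholder for any function that maps a sequence to a list of scores (a wrapper around sei would be one such function, but no sei API is called or assumed); `reverse_complement` mirrors what `seqkit seq -r -p` does (reverse, then complement), and the two toy "models" exist only to show a passing and a failing case.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Complement each base and reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def strand_symmetric(predict, seq, tol=1e-6):
    """True if predict() gives (near-)identical scores on both strands."""
    fwd = predict(seq)
    rev = predict(reverse_complement(seq))
    if len(fwd) != len(rev):
        return False
    return all(abs(a - b) <= tol for a, b in zip(fwd, rev))

def gc_fraction(seq):
    """Toy strand-symmetric 'model': GC fraction as a one-element list."""
    s = seq.upper()
    return [(s.count("G") + s.count("C")) / len(s)]

def a_fraction(seq):
    """Toy strand-asymmetric 'model': fraction of A on the given strand."""
    return [seq.upper().count("A") / len(seq)]

print(reverse_complement("ACGTN"))               # NACGT
print(strand_symmetric(gc_fraction, "AAAGCG"))   # True
print(strand_symmetric(a_fraction, "AAAT"))      # False
```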

      Response: Following the reviewer’s suggestion, we added a new discussion paragraph on how well sequence models capture interindividual variation. In brief, although the lack of cross-individual training sets is a real drawback and future improvement is necessary, we suggest that chromatin changes can be predicted better than gene expression. This is because predicting gene expression requires information on long-range interactions, which vary among genes and are difficult to capture when the reference genome is used as the training set. We made a schematic to clarify this:

      Author response image 4.

      We also noticed a few recent studies that directly validated sei predictions experimentally and found them to be accurate, such as https://doi.org/10.1016/j.neuron.2022.12.026. Taken together, while we agree that sequence models need more cross-individual training samples, the current state-of-the-art model sei can still provide unique value to our study.

      Line 423: “The challenge of using sequence-based deep learning (DL) models in HFS applications is further compounded by their difficulty in predicting variations between individuals. Recent studies(Huang et al., 2023; Sasse et al., 2023) indicate that DL models, trained on the reference human genome, demonstrate limited accuracy in predicting gene expression levels across different individuals. This limitation is likely due to the models' inability to account for long-range regulatory patterns, which are crucial for understanding the impact of variants on gene expression and vary across genes. In contrast, our study leveraged sequence-determined functional genomic profiles in association studies, which mitigates this issue to an extent. For instance, although sei cannot identify the specific gene regulated by a given input sequence, it can predict changes in the sequence's functional activity. Future improvements in DL models' ability to predict interindividual differences could be achieved by incorporating cross-individual data in the training process. An example of such data is the EN-TEX(Rozowsky et al., 2023) dataset, which aligns functional genomic peaks with the specific individuals and haplotypes they correspond to.”

      Reviewer #2 (Public Review):

      Summary:

      In this work, Song et al. propose a locus-based framework for performing GWAS and related downstream analyses including finemapping and polygenic risk score (PRS) estimation. GWAS are not sufficiently powered to detect phenotype associations with low-frequency variants. To overcome this limitation, the manuscript proposes a method to aggregate variant impacts on chromatin and transcription across a 4096 base pair (bp) loci in the form of a haplotype function score (HFS). At each locus, an association is computed between the HFS and trait. Computing associations at the level of imputed functional genomic scores should enable the integration of information across variants spanning the allele frequency spectrum and bolster the power of GWAS.

      The HFS for each locus is derived from a sequence-based predictive model. Sei. Sei predicts 21,907 chromatin and TF binding tracks, which can be projected onto 40 pre-defined sequence classes ( representing promoters, enhancers, etc.). For each 4096 bp haplotype in their UKB cohort, the proposed method uses the Sei sequence class scores to derive the haplotype function score (HFS). The authors apply their method to 14 polygenic traits, identifying ~16,500 HFS-trait associations. They finemap these trait-associated loci with SuSie, as well as perform target gene/pathway discovery and PRS estimation.

      Strengths:

      Sequence-based deep learning predictors of chromatin status and TF binding have become increasingly accurate over the past few years. Imputing aggregated variant impact using Sei, and then performing an HFS-trait association is, therefore, an interesting approach to bolster power in GWAS discovery. The manuscript demonstrates that associations can be identified at the level of an aggregated functional score. The finemapping and pathway identification analyses suggest that HFS-based associations identify relevant causal pathways and genes from an association study. Identifying associations at the level of functional genomics increases the portability of PRSs across populations. Imputing functional genomic predictions using a sequence-based deep learning model does not suffer from the limitation of TWAS where gene expression is imputed from a limited-size reference panel such as GTEx.

      However, there are several major limitations that need to be addressed.

      Major concerns/weaknesses:

      (1) There is limited characterization of the locus-level associations to SNP-level associations. How does the set of HFS-based associations differ from SNP-level associations?

      Response: We thank the reviewer for the recognition and the valuable suggestion on our manuscript. Following the reviewer’s suggestion, in this revision we added a paragraph to compare the basic characteristics between HFS-based and SNP-based association study. These comparisons suggested that HFS had no advantage in testing marginal association, but performed better in detecting causal associations.

      Line 144: “When comparing HFS association with standard SNP-based GWAS on the same data, we found that 98% of significant HFS loci also harbored a significant SNP. In a few cases (n=0~5), significant HFS loci did not harbor even a marginal SNP association (GWAS p>0.01), owing to the lack of common SNPs in these loci. The HFS association p value was higher than the GWAS p value in 95% of significant loci, suggesting that HFS did not improve power to detect marginal effects. The genomic control inflation factor (λGC) for the HFS association test varied between 0.99 for asthma and 1.50 for height, closely resembling the SNP GWAS (Pearson Correlation Coefficient [PCC]=0.91, paired t-test p=0.16; Method and Figure S3). We concluded that HFS-based association tests had adequate power and did not introduce additional p-value inflation.”
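      λGC, as used in the quoted passage, is conventionally the median of the observed 1-df chi-square association statistics divided by the median expected under the null (about 0.455). A standard-library Python sketch, assuming two-sided p-values as input; it illustrates the definition and is not the authors' code.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom,
# i.e. (Phi^-1(0.75))^2, approximately 0.4549
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2

def lambda_gc(pvalues):
    """Genomic control inflation factor from two-sided p-values.

    Each p is converted to a 1-df chi-square statistic z^2 with
    z = Phi^-1(p / 2); using the lower tail keeps very small p-values
    accurate. Every p must lie in (0, 1].
    """
    chi2 = [NormalDist().inv_cdf(p / 2) ** 2 for p in pvalues]
    return median(chi2) / CHI2_1DF_MEDIAN

# Under the null, p-values are uniform, so lambda_GC should be close to 1
n = 1001
null_p = [(i - 0.5) / n for i in range(1, n + 1)]
print(round(lambda_gc(null_p), 6))  # 1.0
```

      Values well above 1 (such as the 1.50 reported for height) can reflect either confounding or genuine polygenic signal, which is why the comparison with the SNP GWAS on the same data is informative.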

      (2) A clear advantage of performing HFS-trait associations is that the HFS score is imputed by considering variants across the allele frequency spectrum. However, no evidence is provided demonstrating that rare variants contribute to associations derived by the model. Similarly, do the authors find evidence that allelic heterogeneity is leveraged by the HFS-based association model? It would be useful to do simulations here to characterize the model behavior in the presence of trait-associated rare variants.

      Response: Following the reviewer’s suggestion, we conducted a sensitivity analysis that removed all rare (MAF<0.01) variants and repeated the HFS analysis (HFScommon) on chromosome 1. In linear association analysis, we found that 10.6% of HFS signals (p<5×10⁻⁸) were missed by HFScommon. In fine-mapping, 55.3% of HFS causal signals (PIP>0.95) were missed by HFScommon. We concluded that rare variants played an important role in the performance of HFS, especially its advantages in fine-mapping.

      Line 175: “We also found that rare variants played an important role in the good fine-mapping performance of HFS: when variants with MAF<0.01 were removed, 55.3% of the causal signals would be missed in HFS+SUSIE analysis.”
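      The HFScommon sensitivity analysis can be pictured as rebuilding each haplotype sequence with rare alleles reverted to the reference before scoring it with sei. The sketch below assumes SNVs only (no indels), 0-based positions, and a precomputed MAF per variant; the function names and the revert-to-reference convention are illustrative assumptions rather than the authors' pipeline.

```python
def minor_allele_frequency(alt_count, n_haplotypes):
    """MAF from the number of haplotypes carrying the alternative allele."""
    f = alt_count / n_haplotypes
    return min(f, 1.0 - f)

def build_haplotype(ref_seq, locus_start, snvs, min_maf=0.01):
    """Apply the SNVs carried by one haplotype to the reference window.

    ref_seq: reference sequence of the locus, beginning at locus_start
    snvs: list of (pos, ref, alt, maf) for variants on this haplotype
    Variants with maf < min_maf are skipped (reference allele kept).
    """
    seq = list(ref_seq)
    for pos, ref, alt, maf in snvs:
        if maf < min_maf:
            continue  # HFScommon: rare allele reverted to reference
        i = pos - locus_start
        if seq[i] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        seq[i] = alt
    return "".join(seq)

ref = "ACGTACGT"
carried = [(101, "C", "T", 0.20),
           (105, "C", "G", minor_allele_frequency(1, 200))]  # MAF 0.005
print(build_haplotype(ref, 100, carried))             # ATGTACGT
print(build_haplotype(ref, 100, carried, min_maf=0))  # ATGTAGGT
```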

      We then attempted a simulation analysis in which rare variants were causal for the phenotype, with association statistics matched to the real GWAS of height. However, this simulation did not seem to reflect the real scenario properly: no matter how we changed the association between rare variants and the phenotype, the HFS association p-value hardly ever reached the significance level of the SNP association. We propose that this is because the simulation could not properly reflect how variants affect functional genomics: a randomly selected rare variant is likely to have no impact on functional genomics, so its HFS would be close to zero. When such a variant was set as causal (which is unlikely in the real scenario), HFS could not properly capture the association. We reasoned that HFS may be difficult to evaluate by simulation, since the nonlinear relationships between SNPs and HFS, and among SNPs, are difficult to simulate properly.

      Author response image 5.

      (3) Sei predicts chromatin status / ChIP-seq peaks in the center of a 4kb region. It would therefore be more relevant to predict HFS using overlapping sequence windows that tile the genome as opposed to using non-overlapping windows for computing HFS scores. Specifically, in line 482, the authors state that "the HFS score represents overall activity of the entire sequence, not only the few bp at the center", but this would not hold given that Sei is predicting activity at the center for any sequence.

      Response: We thank the reviewer for the suggestion on the sliding window design. In this revision, we shifted all loci by 2,048 bp to double the number of loci and repeated the fine-mapping and polygenic prediction analyses. For fine-mapping, we found that the result was generally robust to the sliding window procedure, and the majority of the causal associations were retained:

      Line 207: “29.4% of causal loci (PIP>0.95) in the original analysis were still causal in the sliding window analysis. For a further 31.1% and 29.3% of causal loci, the overlapping locus on the 5’ or 3’ side, respectively, had PIP>0.95 in the sliding window analysis, although the original locus itself was no longer causal.”

      In polygenic prediction, the sliding window analysis provided significantly improved performance compared with the previous analysis of non-overlapping loci.

      However, since this revision includes several updates to the polygenic prediction procedure, it was difficult to quantify how much of the improvement was due to the sliding window design. Thus, we show the new result directly in Figure 5 without comparing it with the original result.

      We also modified the previously imprecise statement to:

      Line 490: “…it integrated information of the entire sequence, not only the few bp at the center.”

      (4) Is the HFS-based association going to miss coding variation and several regulatory variants such as splicing variants? There are also going to be cases where there's an association driven by a variant that is correlated with a Sei prediction in a neighboring window. These would represent false positives for the method, it would be useful to identify or characterize these cases.

      Response: As the reviewer noted, sei captures only functional genomic features and is therefore not expected to perform well when causal variants alter protein sequences. In this revision, we characterized this limitation by focusing on causal exonic variants (SNP PIP>0.95):

      Line 322: “On the other hand, HFS performs worse than SNP-based fine-mapping in exonic regions. Taking height as an example, PolyFun detected 125 causal SNPs (PIP>0.95) in exonic regions, but only 16% (20) of the loci harboring them also reached PIP>0.5 (11 reached PIP>0.95) in the HFS+SUSIE analysis. Among the 105 loci that missed such signals (HFS PIP<0.5), 12 had a nearby locus (within 10 kb) with HFS PIP>0.95, which likely reflects false positives driven by LD. Thus, SNP-based analysis should be prioritized over HFS in coding regions.”

      Additional minor concerns:

      (1) It's not clear whether SuSie-based finemapping is appropriate at the locus level, when there is limited LD between neighboring HFS bins. How does the choice of the number of causal loci and the size of the segment being finemapped affect the results and is SuSie a good fit in this scenario?

      Response: Following the reviewer’s suggestion, we reran SUSIE with different predefined numbers of causal loci (L=2 to 10) and found that the identified causal loci were consistent.

      Author response image 6.

      Line 211: “HFS+SUSIE was also robust to changes in the predefined number of causal loci (L=2 to 10): the number of detected loci did not change.”

      As for segment size, we split each predefined segment (independent blocks detected by LDetect) into two halves and reran SUSIE; three additional causal loci emerged in one half. This suggests that overly small segments may increase the false-positive rate. Conversely, because LDetect defines approximately independent blocks with little LD between them, there is no need to use longer blocks.

      Author response image 7.

      Line 133: “Simulation analysis revealed that when a non-reference sequence class score was associated with the trait, the reference class score could still capture a median of 70% of the HFS-trait association R².”

      (2) It is not clear how a single score is chosen from the 117 values predicted by Sei for each locus. SuSie is run assuming a single causal signal per locus, an assumption which may not hold at ~4kb resolution (several classes could be associated with the trait of interest). It's not clear whether SuSie, run in this parameter setting, is a good choice for variable selection here.

      Response: As discussed below (question 3), in this revision we no longer use SUSIE to select one sequence class score per locus, because of overfitting; instead, we use the reference sequence class uniformly for all loci. As the reviewer suggested, we used simulations to evaluate how this procedure influences HFS performance, especially when multiple sequence classes at the same locus are causal for the phenotype. The reference sequence class score captured a median of 69.1% of the phenotypic R² when the causal sequence class was not the reference class, and a median of 59.2% when there were 2 to 5 non-reference causal classes. We conclude that the loss from skipping sequence class selection is mild, and that skipping it is necessary given the risk of overfitting.
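      The logic of this simulation can be illustrated with a toy Gaussian model, which is an assumption and not the authors' simulation design: a non-reference sequence class score drives the trait, and the reference class score is correlated with it at `rho`. In this model the fraction of the causal score's R² recovered by the reference score is approximately rho², so the loss from using the reference class depends on how strongly the two class scores are correlated. `captured_r2_fraction` and its parameters are hypothetical.

```python
import numpy as np


def captured_r2_fraction(rho, n=100_000, variance_explained=0.5, seed=0):
    """Toy simulation (assumed model, not the authors' design).

    A non-reference class score drives the trait; the reference class score
    is correlated with it at `rho`. Returns the fraction of the causal score's
    R^2 that the reference score recovers in a single-predictor fit.
    """
    rng = np.random.default_rng(seed)
    causal = rng.standard_normal(n)  # non-reference (causal) class score
    reference = rho * causal + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    trait = (np.sqrt(variance_explained) * causal
             + np.sqrt(1 - variance_explained) * rng.standard_normal(n))

    # For a single predictor, R^2 equals the squared Pearson correlation.
    r2_causal = np.corrcoef(causal, trait)[0, 1] ** 2
    r2_reference = np.corrcoef(reference, trait)[0, 1] ** 2
    return r2_reference / r2_causal
```

      For example, `captured_r2_fraction(0.8)` comes out close to 0.64, and the fraction is exactly 1 when the two scores are identical (`rho=1`).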

      Author response image 8.

      (3) A single HFS score is being chosen from amongst multiple tracks at each locus independently. Does this require additional multiple-hypothesis correction?

      Response: We agree with the reviewer that choosing a sequence class for each locus constitutes multiple testing, and additional experiments indeed revealed some evidence of overfitting in this procedure. Thus, in this revision, we no longer apply per-locus feature selection and instead use the sequence class corresponding to the reference (hg38) sequence, which avoids the need for additional multiple-testing correction. We acknowledge that this simplification loses some information, but, as noted above, the loss is moderate, and the simplification is necessary to ensure statistical robustness and reduce false positives. In fact, with this simplification we achieved better control of the inflation factor in the HFS GWAS and better portability in polygenic prediction.

      (4) The results show that a larger number of loci are identified with HFS-based finemapping & that causal loci are enriched for causal SNPs. However, it is not clear how the number of causal loci should relate to the number of SNPs. It would be really nice to see examples of cases where a previously unresolved association is resolved when using HFS-based GWAS + finemapping.

      Response: In this revision, we did not observe a clear relationship between the number of causal loci and the number of causal genes. The only trend is that SNP-based fine-mapping appeared to perform better in coding regions, consistent with HFS capturing functional genomic signals. We also added new interpretations highlighting examples where HFS resolves previously unresolved association signals. For example,

      Line 287: “Specifically, in the 1q32.1 region, HFS+SUSIE identified two loci with PIP>0.9 (Figure 4B). SNP-based association analysis also found a significant association in this region, but SNP fine-mapping (Weissbrod et al., 2020) could not resolve this signal and found only seven signals with PIP between 0.1 and 0.5.”

      (5) Sequence-based deep learning model predictions can be miscalibrated for insertions and deletions (INDELs) as compared to SNPs. Scaling INDEL predictions would likely improve the downstream modeling.

      Response: Following the reviewer’s suggestion, we conducted a sensitivity analysis that removed all indels on chromosome 1 and repeated the HFS analysis. Removing indels increased the number of significant associations (p<5×10⁻⁸) by 9%, but also slightly increased the inflation factor (paired Wilcoxon test p=0.0001). In the fine-mapping analysis, removing indels caused a 4.7% decrease in the number of detected causal associations (PIP>0.95). We reason that potential miscalibration of indel predictions does affect the statistical power of HFS, but that the proper way to control this effect is not straightforward and requires further optimization. In this revision, we therefore kept all indels in the analysis, because we consider the power of fine-mapping more important than the power of marginal association.

      Line 213: “Lastly, removing insertions and deletions revealed 9% more significant associations (p<5×10⁻⁸) but 4.7% fewer causal associations (PIP>0.95), and slightly increased the inflation factor (Wilcoxon p=0.0001, Figure S4).”

      Author response image 9.

      Reviewer #1 (Recommendations For The Authors):

      It was unclear to me why the sei output was rounded to two decimal places to "avoid influence of sei prediction noise". Wouldn't rounding introduce additional noise?

      Response: We thank the reviewer for pointing out our inadequate description. The rounding procedure masks small values that likely do not reflect any real change. Even if a variant does not cause any functional change, sei still outputs a very small HFS value that is close to, but not exactly, zero. Rounding sets such small values to zero, removing this noise. We have added this rationale to the Methods section:

      Line 529: “This is because, even if a variant has no impact on functional genomics, sei still outputs a value that is close to, but not equal to, the reference sequence class score. Rounding sets such HFS values to zero and removes this random variation from the sei output.”
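      The effect of rounding to two decimal places can be seen in a short sketch: any HFS whose absolute value is below roughly 0.005 becomes exactly zero, while larger values are kept to two decimals. The helper name `denoise_hfs` and the example values are hypothetical; only the two-decimal rounding comes from the response.

```python
def denoise_hfs(raw_scores, decimals=2):
    """Round HFS values so that near-zero sei noise collapses to zero.

    Hypothetical helper. Adding 0.0 turns the -0.0 produced by rounding a
    small negative value into a plain 0.0.
    """
    return [round(score, decimals) + 0.0 for score in raw_scores]


cleaned = denoise_hfs([0.0031, -0.0042, 0.237, -1.5])
```

      Here `cleaned` is `[0.0, 0.0, 0.24, -1.5]`: the two near-zero values are masked, and the others survive at two-decimal precision. Note that this acts as a hard threshold, so a genuine but very small effect would also be set to zero.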

      Minor comments / typos:

      • There are many typos in the abstract.

      Response: We have revised the typo and grammar issues in the abstract in this revision.

      • I believe "Arachnoid acid-intelligence" should be "Arachidonic acid-intelligence".

      • Consistently there is no space between text and parenthetical citations. For example, "sei(Chen et al., 2022)" should be "sei (Chen et al., 2022)".

      • Line 110: "at least one non-reference haplotypes" --> "at least one non-reference haplotype".

      • Line 155: "data-based method" --> "data-based methods".

      • Lines 165-166: "functionally importance" --> "functional importance".

      Response: We have made these revisions accordingly.

      • Line 210: the sentence containing "this annotation on conditioned of a set of baseline annotations" is unclear.

      Response: We have revised this sentence as “…regressed the PIP against this annotation, with a set of baseline annotations included as covariates, similar to the LDSC framework.”
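      The regression in this revised sentence can be sketched as ordinary least squares of per-locus PIP on the annotation of interest, with the baseline annotations as additional columns of the design matrix; the coefficient on the annotation is then read off as its enrichment adjusted for the baseline. This is a minimal illustration under that assumption, not the authors' exact model and not the LDSC estimator itself; `annotation_enrichment` and the simulated inputs are hypothetical.

```python
import numpy as np


def annotation_enrichment(pip, annotation, baseline):
    """Regress PIP on one annotation, adjusting for baseline annotations.

    pip, annotation: 1-D arrays with one value per locus.
    baseline: 2-D array (loci x baseline annotations) used as covariates.
    Returns the OLS coefficient on `annotation`.
    """
    n = len(pip)
    # Columns: intercept, annotation of interest, then baseline covariates.
    design = np.column_stack([np.ones(n), annotation, baseline])
    coef, *_ = np.linalg.lstsq(design, pip, rcond=None)
    return coef[1]
```

      With noise-free toy data generated as `pip = 0.1 + 0.3 * annotation + 0.2 * baseline[:, 0]`, the function recovers a coefficient of 0.3 regardless of the baseline effect.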

      • Line 213: "association" --> "associations".

      • Line 219: "association" --> "associations".

      • Line 251: "result" --> "results".

      • Line 269: "result" --> "results".

      • Line 289: "known to involved" --> "known to be involved".

      • Line 356: "LDAK along" --> "LDAK alone".

      • Line 362: "BOLT-LMM along" --> "BOLT-LMM alone".

      • Supplement: "Hihglighted" --> "Highlighted".

      Response: We have made these revisions accordingly.

      • Line 444: Were "British ancestry Caucasians" defined as individuals that self-identified as "white British"? If so, then they should be described as "self-identified "white British"".

      Response: As the reviewer pointed out, we have changed the description to “self-identified British ancestry Caucasians”.

      Reviewer #2 (Recommendations For The Authors):

      (1) A 2022 cistrome-wide association study (CWAS) computed associations between genetically-predicted chromatin activity and phenotypes. Adding a reference to this paper would be helpful. https://pubmed.ncbi.nlm.nih.gov/36071171/

      Response: Following the reviewer’s suggestion, we now discuss the similarity between CWAS and our study:

      Line 89: “In line with this notion, a recent similar strategy, the cistrome-wide association study (CWAS), integrated variant-chromatin activity and variant-phenotype associations to boost the power of genetic studies of cancer (Baca et al., 2022).”

      (2) Line 487 states: "We applied sei to predict 21,906 functional genomic tracks for each sequence, without normalizing for histone mark." It's not clear what normalization is being referred to here.

      Response: We have revised the sentence to:

      Line 495: “We applied sei to predict 21,906 functional genomic tracks for each sequence, without normalizing for histone marks (i.e., dividing each track score by the sum of histone mark scores), as suggested by the sei authors.”

      (3) The figures are extremely low resolution, they need to be updated.

      Response: In this revision, we uploaded a separate PDF file for each figure to provide high-resolution graphs.

      (4) The results section was difficult to follow and would benefit from being written more clearly.

      Response: In this revision, we rearranged parts of the Results section to clarify the main ideas. We moved statistical results into parentheses and focused the main text on interpretation. For example,

      Line 123: “Further evaluation indicated that this low LD was driven by two factors: the integration of rare-variant effects and segmentation. First, excluding rare variants from HFS raised the median LD to 0.14 (Method; Figure S2C). Second, the median LD of SNPs from adjacent loci was 0.06, significantly higher than HFS LD (paired Wilcoxon p=1.76×10⁻⁵) but significantly lower than HFS LD without rare variants (paired Wilcoxon p<2.2×10⁻¹⁶).”

      (5) "Along" is used several times in the final results section (PRS estimation), this should be "alone".

      Response: We have replaced all misused instances of “along” with “alone” in this revision.

      (6) Instead of using notation identifying genomic location, it might be clearer to provide gene names when illustrating examples of trait-associated promoters.

      Response: In this revision, we added the gene names of the corresponding promoters to the main text to clarify the findings.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This paper describes technically-impressive measurements of calcium signals near synaptic ribbons in goldfish bipolar cells. The data presented provides high spatial and temporal resolution information about calcium concentrations along the ribbon at various distances from the site of entry at the plasma membrane. This is important information. Important gaps in the data presented mean that the evidence for the main conclusions is currently inadequate.

      Thank you very much for this positive evaluation of our work. We would like to respectfully point out to the Reviewer that our current study was conducted using zebrafish as a model and not goldfish. We have revised the paper to eliminate any gaps in the data presentation.

      Strengths

      (1) The technical aspects of the measurements are impressive. The authors use calcium indicators bound to the ribbon and high-speed line scans to resolve changes with a spatial resolution of ~250 nm and a temporal resolution of less than 10 ms. These spatial and temporal scales are much closer to those relevant for vesicle release than previous measurements.

      (2) The use of calcium indicators with very different affinities and different intracellular calcium buffers helps provide confirmation of key results.

      Thank you very much for this positive evaluation of our work.

      Weaknesses

      (1) Multiple key points of the paper lack statistical tests or summary data from populations of cells. For example, the text states that the proximal and distal calcium kinetics in Figure 2A differ. This is not clear from the inset to Figure 2A - where the traces look like scaled versions of each other. Values for time to half-maximal peak fluorescence are given for one example cell but no statistics or summary are provided. Figure 8 shows examples from one cell with no summary data. This issue comes up in other places as well.

      Thank you for this feedback. We have addressed this in our revised manuscript where possible. We now include the results of paired t-tests comparing the amplitudes of proximal vs. distal calcium signals shown in Fig. 2A & C, Fig. 3C & D, Fig. 4C & D, Fig. 5A-D, and Fig. 8E & F. Because proximal and distal calcium signals were obtained from the same ribbons, within 500 nm of each other, the traces, as the Reviewer pointed out, “look like scaled versions of each other”. For comparisons across cells or across calcium indicators, shown in Fig. 3E & F, Fig. 5E, and Fig. 8B & C, we now include the results of unpaired t-tests. The t-test statistics are reported in the respective figure legends of the revised version.

      Regarding the Reviewer’s concern that “values for time to half-maximal peak fluorescence are given for one example cell, but no statistics or summary are provided,” we estimated the fluorescence rise times by fitting only the average traces, in order to compare the overall qualitative behavior of the corresponding calcium indicators. We did attempt to estimate the uncertainty of the rise times, but simultaneously fitting the rise and decay of a time trace is notoriously sensitive to noise; a much higher signal-to-noise ratio would be required to obtain reliable uncertainty estimates for the rise-time and decay-time characteristics. This is now explicitly explained in the corresponding Methods subsection.
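      As a reference point for the metric under discussion, the sketch below reads the time to half-maximal peak fluorescence directly from a sampled trace by linear interpolation. This threshold-crossing readout is a simpler, swapped-in technique, not the rise/decay curve fitting the authors describe; the function name and toy traces are hypothetical. It is also sensitive to noise, since a single noisy sample near half-maximum shifts the crossing, which is why it is shown here on noise-free traces.

```python
def time_to_half_max(times, values, baseline=0.0):
    """Return the first time a trace reaches half its peak above `baseline`.

    Uses linear interpolation between the two samples that bracket the
    half-maximum level. Raises ValueError if the trace never exceeds baseline.
    """
    peak = max(values)
    if peak <= baseline:
        raise ValueError("trace never rises above baseline")
    half = baseline + 0.5 * (peak - baseline)
    if values[0] >= half:
        return times[0]  # trace already starts at or above half-maximum
    # values[0] < half <= peak, so an upward crossing must exist.
    for i in range(1, len(values)):
        v0, v1 = values[i - 1], values[i]
        if v0 < half <= v1:
            t0, t1 = times[i - 1], times[i]
            return t0 + (half - v0) * (t1 - t0) / (v1 - v0)
```

      For the toy trace `times=[0, 1, 2, 3]`, `values=[0, 1, 3, 8]`, the half-maximum level is 4 and the function returns 2.2 (in whatever time units the samples use).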

      In Figure 8, we now show example fluorescence traces from one cell at the bottom of the A and D panels, and the summary data is described in B-C and E-F, with statistics provided in the figure legends.

      (2) Figure 5 is confusing. The figure caption describes red, green, and blue traces, but the figure itself has only two traces in each panel and none are red, green, or blue. It's not possible currently to evaluate this figure.

      Thank you for pointing out this oversight. The figure shows the proximal and distal calcium signals, not the cytoplasmic ones. The figure caption was adjusted to correctly reflect what is shown in the figure.

      (3) The rise time measurements in Figure 2 are very different for low and high-affinity indicators, but no explanation is given for this difference. Similarly, the measurements of peak calcium concentration in Figure 4 are very different from the two indicators. That might suggest that the high-affinity indicator is strongly saturated, which raises concerns about whether that is impacting the kinetic measurements.

      We agree with the Reviewer. As noted in the text, we believe that the high-affinity version of the dye is at least partially saturated, which will be especially problematic for strong depolarizations and for signals near the membrane. We slightly revised the corresponding description of the results on page 6 to acknowledge this point: “However, it should be noted that Cal520HA will be at least partially saturated at the Ca2+ levels expected in Ca2+ microdomains relevant for vesicle exocytosis, affecting both the amplitude and the kinetics of the fluorescence signal”.

      Recommendations:

      (1) It would be good to describe the location of calcium channels relative to the ribbon in the introduction.

      We have provided this information in the discussion (please see p. 19: “The faster, smaller, and more spatially confined Ca<sup>2+</sup> signals that are insensitive to the application of high concentrations of exogenous Ca<sup>2+</sup> buffers, referred to here as ribbon proximal Ca<sup>2+</sup> signals, could be due to Ca<sup>2+</sup> influx through Cav channel clusters beneath the synaptic ribbon”). We have now provided this information in the last paragraph of the introduction as well. 

      (2) The introduction is quite technical and would benefit from a more complete description of the findings of the paper (e.g. expanding the last sentence to a full paragraph).

      We have updated the last paragraph of the introduction as per the reviewer’s advice.

      (3) It is not clear that the capacitance measurements in Figure 1 are needed (I did not see them used anywhere else in the paper).

      We have removed the capacitance measurements from the figure.

      (4) Please add legends in the figures themselves defining different line colors and weights so that a reader does not need to search for them in the figure caption.

      We agree that such figure improvements facilitate reading. We have added legends in the figures themselves, where appropriate.

      (5) The insets with the expanded traces in many cases are too small - e.g. Figure 1F.

      We have enlarged the insets in applicable figures as much as possible to facilitate visualization. These changes can be seen in Figures 1, 2, 3, 4, 5, and 8, as well as Supplementary Figure 3.

      (6) Page 5, statistics for amplitude of calcium changes. Is p < 0.001 really correct here? The SEMs indicate an overlap of the two distributions of mean amplitudes - and later data for which you give p = 0.001 has much less overlap.

      Because the two data sets in question come from paired recordings with a high Pearson correlation coefficient (0.93), the p-values are in fact correct despite the substantial overlap. We conducted paired t-tests to compare proximal vs. distal calcium signals obtained with a single calcium indicator (Fig. 2A & C, Fig. 3C & D, Fig. 4C & D, Fig. 5A-D, and Fig. 8E & F). For comparisons across cells or across calcium indicators (Fig. 3E & F, Fig. 5E, and Fig. 8B & C), we performed unpaired t-tests. In response to the Reviewer’s comment, we now provide the t-statistics in the respective figure legends of the revised version.
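      Why a paired test can give a very small p-value despite overlapping SEMs can be seen in a toy calculation. The amplitudes below are made up for illustration and are not data from the study: each distal value sits slightly below its proximal partner, so the between-ribbon spread dominates the group SEMs, while the within-pair differences are small and consistent. The paired t statistic uses only those differences; the unpaired (Welch) statistic treats the groups as independent. With equal group sizes, Welch's and Student's unpaired t statistics coincide.

```python
import math
import statistics


def paired_t(x, y):
    """Paired t statistic: mean within-pair difference over its standard error."""
    diffs = [a - b for a, b in zip(x, y)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def welch_t(x, y):
    """Unpaired (Welch) t statistic, which ignores the pairing."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


# Hypothetical amplitudes from six ribbons: distal closely tracks proximal,
# so the group SEMs overlap even though every ribbon shows proximal > distal.
proximal = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
distal = [0.75, 1.35, 1.8, 2.3, 2.75, 3.3]

t_paired = paired_t(proximal, distal)
t_unpaired = welch_t(proximal, distal)
```

      Here the paired statistic is roughly 13.6, whereas the unpaired one is only about 0.39, illustrating how strong within-pair correlation yields a highly significant paired test from overlapping group distributions.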

      (7) The text on page 6 describing Figure 3 appears to repeat several technical aspects of the measurements that have already been described in Figure 1. I would reduce that overlap as it is confusing for a reader.

      Because Fig. 1 describes calcium measurements with a free calcium indicator, whereas Fig. 3 describes a ribbon-bound calcium indicator, we would prefer to keep this information for the sake of completeness, despite some repetition.

      (8) Figure 4A needs to be described in more detail.

      We have provided the vesicle pool details in Supplementary Fig. 1.

      (9) The text in Figure 7 is too small.

      We have redone Fig. 7 and Supplemental Fig. 4 to ensure that the tick labels and other text are sufficiently large.

      (10) Are the units (nM) in Figure 8 correct?

      Thank you for pointing that out. The units were supposed to be µM and have been corrected in the figure.

      Reviewer #2 (Public review):

      Summary:

      The study introduces new tools for measuring intracellular Ca2+ concentration gradients around retinal rod bipolar cell (rbc) synaptic ribbons. This is done by comparing the Ca2+ profiles measured with mobile Ca2+ indicator dyes versus ribbon-tethered (immobile) Ca2+ indicator dyes. The Ca2+ imaging results provide a straightforward demonstration of Ca2+ gradients around the ribbon and validate their experimental strategy. This experimental work is complemented by a coherent, open-source, computational model that successfully describes changes in Ca2+ domains as a function of Ca2+ buffering. In addition, the authors try to demonstrate that there is heterogeneity among synaptic ribbons within an individual rbc terminal.

      Strengths:

      The study introduces a new set of tools for estimating Ca2+ concentration gradients at ribbon AZs, and the experimental results are accompanied by an open-source, computational model that nicely describes Ca2+ buffering at the rbc synaptic ribbon. In addition, the dissociated retinal preparation remains a valuable approach for studying ribbon synapses. Lastly, excellent EM.

      Thank you very much for this appreciation of our work.

      Weaknesses:

      Heterogeneity in the spatiotemporal dynamics of Ca2+ influx was not convincingly related to ribbon size, nor was the functional relevance of Ca2+ dynamics to rod bipolars demonstrated (e.g., exocytosis to different postsynaptic targets). In addition, the study would benefit from the inclusion of the Ca2+ currents that were recorded in parallel with the Ca2+ imaging.

      Thank you for this critique. We agree that our data do not establish the relationship between ribbon size and Ca<sup>2+</sup> signal. By analogy to the hair cell literature, we believe that it is a reasonable hypothesis, but more studies will be necessary to definitively determine whether the signal relates to ribbon size or synaptic signaling. This will be addressed in future experiments.

      We have included the calcium current recorded in parallel with calcium imaging in Fig. 1, where we show a single example, and we now do the same for the individual examples shown at the bottom of Fig. 8A and D. Because the calcium imaging data shown in Figs. 2-5 and Supp. Fig. 3 are average traces, we provide the averages of the peak calcium currents along with statistics. Because some ribbons in Figure 8D-F have only one recording, we did not perform statistical analysis in that case.

      Recommendations:

      The major conclusion of the work is that within bipolar cells, heterogeneity exists between Ca2+ microdomains formed at synaptic ribbons, which is supported by the results; however, what causes this is not clear. Most of the comments below are suggestions that hopefully help the authors strengthen the association of Ca2+ domain heterogeneity with features of ribbon AZs or at least offer additional options for the authors to communicate their work.

      (1) In the current study, anatomical segregation of SRs by size does not appear to exist across the ZF rod bipolar terminal, nor has this been reported for mouse rod bipolars. In the absence of this, the current study lacks the fortuitous attributes, and thus reasoning, utilized in the hair cell (HC) studies (those cited in the current MS). Namely, the HC studies utilized the following anatomical features to compare EM, IF, and physio results: a) identified differences in ribbon synapses along a tonotopic gradient (basal to apical cochlea), b) compared ribbons on different sides of an inner HC (pillar vs. modiolar), or c) examined age-dependent changes in HC ribbons.

      Thank you for this comment. We agree that our data do not show any systematic relationship between ribbon size and cell position or other large-scale morphological features. We added text on page 19 to stress this (“However, in comparing our findings with studies of ribbon size heterogeneity in hair cell…”). Nevertheless, to our knowledge, diversity in ribbon size has not previously been reported in bipolar cells.

      (2) In the absence of intrinsic topographical segregation in ribbon size within rod bipolars, then a) the imaging data attained from dissoc cells needs to be internally as sound as possible, and b) the parameters used to define ribbon dimensions in light (LM) and electron microscopy should be as communicative/interchangeable as possible.

      Thank you for this comment. Our confocal images show a moderate correlation between ribbon size, measured as ribeye-binding peptide fluorescence, and calcium hot spots. Similarly, SBF-SEM images show a moderate correlation between ribbon active zone length and width. We have summarized these findings in Figure 11. Thus, as the Reviewer pointed out, our confocal and SBF-SEM findings support each other.

      (3) It is not entirely clear how the authors distinguish rod bipolars (a subset of On-bipolars) from all other ON-bipolars? The two different preparations: dissoc or intact retina, present distinct challenges. In the example presented in Supplementary Figure 2B, the PKCalpha stained bipolar has an axon that is approx. 25 um long, but the expected length should be approx. 50um based on ZF retinal anatomy and recent study on rbc1/2 (Hellevik et al BioRxiv 2023). One could argue rather that the enzymatic treatment or mechanical shear forces caused the axon to shrink. If that is the line of reasoning, then present a low mag field of view with an assortment of dissoc bipolars stained for PKCalpha, zoom in, and describe cell morphologies and their assignment as PKCa + or -. Then you can summarize how axon terminal size, axon length, and PKC staining are or aren't correlated. Based on the results, one might have to perform IF on each dissoc cell that was assayed under LM (Ca2+ imaging) and ephys to verify it's a rod bipolar. In the case of the EM, the authors refer to the terminals analyzed as rbcs because they have larger terminals and less branching than the cbs. Since these are really nice EM images, data-rich, with better resolution than I have ever seen for retinal SBF-EM, do due diligence by tracing the terminals of neighboring bcs (ignoring details within terminals just outline terminals) and make a visual presentation that illustrates that those you selected as rbs have larger terminals than cbs (this can also give of sense of the density distribution of terminal types). Is there a published ephysio on the ZF rbcs which has been correlated with morphology? The Hellevik et al BioRxiv 2023 study shows light responses but not necessary rbcs distinguished from other On-bcs.

      We have quantified the number of rod bipolar cells obtained from our isolation procedure using two approaches: (1) fixing the isolated bipolar cells and performing immunofluorescence for PKC alpha; and (2) isolating bipolar cells from Tg(vsx1: memCerulean)<sup>q19</sup> transgenic zebrafish, in which rod bipolar cell type 1 (RBC1) is labeled, a line we recently obtained from Dr. Yoshimatsu (Hellevik et al., 2024). Of note, the circuitry of RBC1 has been shown to be similar to the mammalian rod bipolar cell pathway (Hellevik et al., 2024). Our findings are listed below:

      The average terminal size of fixed bipolar cells labeled with PKC alpha was 5.9 ± 0.2 µm, whereas the freshly isolated living bipolar cells used for our physiology experiments had an average terminal size of 6.3 ± 0.2 µm, and rod bipolar cells from the Tg(vsx1: memCerulean)<sup>q19</sup> line had an average terminal size of 6.9 ± 0.2 µm. We also measured terminal size for fixed bipolar cells not labeled by PKC alpha (3.3 ± 0.2 µm) and for unlabeled cells from the Tg(vsx1: memCerulean)<sup>q19</sup> line (4.0 ± 0.2 µm).

      In addition, we take soma shape and dendrites into account, as the primary dendrite of the RBC is thick and short. Connaughton and Nelson performed a thorough morphological classification (https://onlinelibrary.wiley.com/doi/10.1002/cne.20261), but did not report measurements. Since axon length is not preserved during the isolation procedure, we do not use it as an identification marker for rod bipolar cells in our experiments.

      We re-imaged vsx1 cells in the DIC channel to compare the sizes of fluorescently labeled RBC1 terminals with those of other BPC terminals. The images below give a sense of the density distribution of terminal types and of the measurements.

      Author response image 1.

      Tracing all neighboring terminals in SBF-SEM is laborious and beyond the scope of this manuscript, but we will do full reconstructions in a future publication.

      (4) How to strengthen the description of heterogeneity within the dissoc measurements? There are two places in the LM data where heterogeneity may be relevant. The first point here is that Ribbon size (TAMRA- Ribeye binding peptide) and active zone size (Cal520HA/LA-RBP) measurements depend on labelling the ribbon/Ribeye; thus, Ribbon size and AZ size should be correlated on this basis alone. I would expect Pearson's r value to show a stronger association (r > 0.7) than what is reported in Figure 11B/C (r: 0.52 or 0.32). I would interpret a moderate to weak correlation (r < 0.5 to 0.3) as an indication that ribbons are heterogeneous (variability in Ca influx per unit ribbon size). Now to the second point, in Figure 8 and Supplementary Figure 5 there is time-signal amplitude heterogeneity. >>> My curiosity is whether signal amplitude is heterogeneous in space (ribbon size, my speculation) and in time (complex, but compare ribeye bound and free Ca2+ indicator)? It seems like the data in Figure 8 and 11 should cross over and possibly offer the authors more to say.

      We appreciate the Reviewer’s insightful observation and have added a sentence at the very end of the Results section reflecting this argument (“we note that a large correlation between the inferred ribbon size and active zone size…”).

      The Reviewer’s second point about the connection between heterogeneity of signal amplitude in space and in time is an interesting one as well and could be grounds for an additional investigation in the future.

      (5) As the authors know, a very powerful tool for exploring Ca microdomain dynamics is to exploit the Voltage dependence of Cavs (as exemplified in the numerous HC studies that are cited). An I-V protocol would provide a valuable means to illustrate different rates of saturating the LA and HA Ca indicators. More generally, the Ca currents and associated patch clamp parameters (Gm, leak...) can tell us much about the health of the cell and provide an added metric to assess normal variability between cells. A few places in the MS currents are mentioned yet this data is missing (Figure S5 , last line: Amplitude variability between two cells with similar Ca currents.).

      Thank you for the valuable suggestion. We will include an I-V protocol across several ribbons in future experiments. We have now included the calcium currents for all the calcium transient traces, together with statistics comparing these currents across conditions.

      Technical comments

      (6) Since the Ribeye-Ca2+ indicator covers the entire ribbon, it will contribute to a signal gradient. The proximal signal is assumed to be closest to the base of the ribbon where presumably the Cav channels are located, and the distal signal will originate from the top (apex) of the ribbon some 200 nm from the base of the ribbon. Have you tried to measure "ribbon lengths and widths" with the HA and LA Ca indicators? My guess would be that the LA will show a gradient, and give you a better indication of the base of the ribbon; whereas the HA signal will have dimensions similar to the TAMRA-peptide.

      Because of the point spread function limitation of light microscopy, we obtained all ribbon dimension measurements from the SBF-SEM images only.

      As a surrogate for ribbon size in light microscopy, we used ribbon fluorescence, which we expect to scale with the number of ribeye molecules in the ribbon (Figure 11B).

      (7) Normalize proximal and distal LM data to highlight kinetic differences (Figs 2-5 and 8), and when describing temporal heterogeneity, please use a better description that includes time, such as time-to-peak, decay 1, decay 2, etc.

      In the current manuscript, we focus only on the amplitude, as it provides information about the number of calcium channels. We used rise-time measurements to compare the time to reach peak amplitude at the proximal vs. distal locations, demonstrating that proximal calcium signals reach their peak faster because the calcium channels are located beneath the ribbon.

      We attempted to fit the individual traces, but they are too noisy to reveal true kinetic differences between ribbons; we would need to average several traces from each ribbon. In a future paper, we plan to apply the high-resolution approach established here to longer stimuli and to perform the fits as the Reviewer advises.

      We now describe the two decay components of the data in Figs. 2 and 3 on pages 6-7.

      (8) Why not measure ribbon length in EM as done in confocal, and then compare lengths from LM and EM? In Figure S8, you have made a nice presentation of AZ area from EM. Make similar plots for EM ribbon length (and width?), and compare the distributions to the Figure 11 LM data. Maybe use other statistical descriptions, such as the coefficient of variation, or look for different populations by using multi-distribution fits. If the differences in length or area (EM data) can be segregated into short and long distances, then a similar feature might arise from the LM data. If no such morphological segregation exists, then the heterogeneity in Ca microdomains may arise from variable Cav channel density or gating, Ca buffering, etc.

      Because of the point spread function limitation of light microscopy, ribbon dimensions cannot be reliably measured with this method. As a surrogate, we used the total fluorescence of the ribbon, which should correlate with the number of ribeye molecules in the ribbon. To obtain ribbon dimensions, we used measurements from the SBF-SEM images only. We summarize the distributions of ribbon width and length in Figures 11C and 11D. The distribution of active zone size is summarized in Supplementary Figure 8. Pearson’s correlation coefficients are positive but indicate only a weak correlation, suggesting, as the Reviewer pointed out, that multiple mechanisms likely contribute to heterogeneity in the local calcium signals.

      (9) Again, the quality of the EM data is great and sufficient to assign SVs to different pools, as you have done in Fig S1. My only complaint is that the ultrafast pool, as indicated in the schematic of S1A, seems to be misassigned with respect to the green SV that is 15 nm from the PM. In the original Mennerick and Matthews 1996 study, the UF pool emptied in ~1 ms. The morphological correlate for the UF pool has been assumed to be SVs touching the plasma membrane. An SV 15 nm away is about 14 nm too far to be in the UF pool.

      Thank you for pointing that out. We have updated the vesicle labeling in Supplementary Figure 1 and Main Figure 4.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors have developed a new Ca indicator conjugated to the peptide, which likely recognizes synaptic ribbons, and have measured microdomain Ca near synaptic ribbons at retinal bipolar cells. This interesting approach allows one to measure Ca close to transmitter release sites, which may be relevant for synaptic vesicle fusion and replenishment. Though microdomain Ca at the active zone of ribbon synapses has been measured by Hudspeth and Moser, the new study uses the peptide recognizing synaptic ribbons, potentially measuring the Ca concentration relatively proximal to the release sites.

      Thank you very much for this positive evaluation of our work.

      Strengths:

      The study is, in principle, technically well done, and the peptide approach, which allows one to image Ca near particular protein complexes, is technically interesting. The approach is potentially applicable to other types of imaging.

      Thank you very much for this appreciation.

      Weaknesses:

      Peptides may not be entirely specific, and the genetic approach tagging particular active zone proteins with fluorescent Ca indicator proteins may well be more specific. I also feel that "Nano-physiology" is overselling, because the measured Ca is most likely the local average surrounding synaptic ribbons. With this approach, nobody knows about the real release site Ca or the Ca relevant for synaptic vesicle replenishment. It is rather "microdomain physiology" which measures the local Ca near synaptic ribbons, relatively large structures responsible for fusion, replenishment, and recycling of synaptic vesicles.

      The peptide approach has been used fairly extensively in the ribbon synapse field, and the evidence that it efficiently labels the ribbon is well established; however, we acknowledge that the peptide is in equilibrium with a cytoplasmic pool. Thus, some of the signal arises from this cytoplasmic pool. The alternative, a genetically encoded Ca indicator concatenated to a ribbon protein, would not have this problem, but would offer less flexibility in changing calcium indicators. We believe both approaches have their merits, each with separate advantages and disadvantages.

      As for the nano vs. micro argument, we certainly do not want to suggest that we are measuring the same nanodomains, on the spatial scale of 10s of nanometers, that drive neurotransmitter release, but we do believe we are in the sub-micrometer (100s of nm) range. We chose the term based on its usage by other authors to describe similar measurements (Neef et al., 2018; https://doi.org/10.1038/s41467-017-02612-y), but we see the reviewer’s point.

      Recommendations:

      I have no recommendation for additional experiments. However, the term "nanophysiology" is an overstatement, and the authors should tone down the manuscript, acknowledging some caveats.

      As we mention above, we chose the term based on its usage by other authors to describe similar measurements, and we do believe that we achieve a resolution of a few hundred nanometers; we would therefore prefer to keep the current title of the manuscript. For example, Figure 5E shows that, with the ribeye-bound low-affinity calcium indicator, the proximal calcium signals were preserved in the presence of BAPTA, rising and decaying abruptly, as expected for a nanodomain Ca<sup>2+</sup> elevation. Thus, we believe that this measurement in particular describes a nanodomain-scale signal. However, we acknowledge that we are not currently able to resolve the spatial distribution of Ca<sup>2+</sup> signals with a spatial resolution of 10s of nanometers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      This study delineates an important set of uninjured and injured periosteal snRNAseq data that provides an overview of periosteal cell responses to fracture healing. The authors also took additional steps to validate some of the findings using immunohistochemistry and transplantation assays. This study will provide a valuable publicly accessible dataset to reexamine the expression of the reported periosteal stem and progenitor cell markers.

      Strengths: 

      (1) This is the first single-nuclei atlas of periosteal cells that are obtained without enzymatic cell dissociation or targeted cell purification by FACS. This integrated snRNAseq dataset will provide additional opportunities for the community to revisit the expression of many periosteal cell markers that have been reported to date.

      (2) The authors delved further into the dataset using cutting-edge algorithms, including CytoTrace, SCENIC, Monocle, STRING, and CellChat, to define the potential roles of identified cell populations in the context of fracture healing. These additional computation analyses generate many new hypotheses regarding periosteal cell reactions.

      (3) The authors also sought to validate some of the computational findings using immunohistochemistry and transplantation assays to support the conclusion.

      Weaknesses: 

      (1) The current snRNAseq datasets contain only a small number of nuclei (1,189 nuclei at day 0; 6,213 nuclei at days 0-7 combined). It is unclear whether this number is sufficient to discern subtle biological processes such as stem cell differentiation.

      We analyzed a total of 6,213 nuclei from uninjured periosteum and fracture calluses at 3 stages of bone healing. We were able to describe 11 distinct cell populations, revealing the diversity of cell populations in uninjured periosteum and post-injury, including rare cell types in the fracture environment such as Schwann cells, adipocytes and pericytes. The number of nuclei was sufficient to perform extensive analysis using a combination of cutting-edge algorithms. We agree that more nuclei would allow more in-depth analyses of cell fate transitions and rare populations, such as pericytes and Schwann cells. However, we concentrated here on SSPC/fibrogenic cells, which are well represented in our dataset. The robustness of our study is also reinforced by the analysis of 4 successive time points to define the SSPC/fibrogenic cell trajectories. Our validations using immunohistochemistry and transplantation assays also confirmed that our dataset is sufficient to define cell trajectories. There is no clear consensus on the number of cells needed to perform sc/snRNAseq analyses, as it depends on the cell types analyzed and the fold changes in gene expression. Previously reported single-cell datasets containing fewer cells have supported major conclusions, including SSPC identification, cell differentiation trajectories and differential gene expression (658 cells in Debnath et al. 2018; 300 in Ambrosi et al. 2021; around 175 in Remark et al. 2023).

      (2) The authors' designation of Sca1+CD34+ cells as SSPCs is not sufficiently supported by experimental evidence. It will be essential to demonstrate stem/progenitor properties of Sca1+CD34+ cells using independent biological approaches such as CFU-F assays. In addition, the putative lineage trajectory of SSPCs toward IIFCs, osteoblasts, and chondrocytes remains highly speculative without concrete supporting data. 

      We performed additional analyses to further support that Sca1+ SSPCs display stem/progenitor properties. We performed CFU assays with Prx1-GFP+ SCA1+ and Prx1-GFP+ SCA1- periosteal cells (Figure 2F-G) and showed that Prx1-GFP+ SCA1+ cells display significantly increased CFU potential compared to Prx1-GFP+ SCA1- cells. In addition, we isolated and transplanted Prx1-GFP+ Sca1+ and Prx1-GFP+ Sca1- periosteal cells at the fracture site of wild-type mice (Figure 2H). Only Sca1+ cells contributed to callus formation, reinforcing that Sca1+ cells are the SSPC population mediating bone repair.

      The differentiation trajectory of SSPCs presented in our study is supported by a combination of bioinformatic analyses and in vivo validation:

      - snRNAseq allowed us to identify the different populations in the uninjured periosteum. In silico, in vitro and in vivo analyses all point to Sca1+ cells as the SSPC population (Fig 2E-G).

      - At day 3 post-fracture, we did not detect Sca1+ cells in the callus (Fig 4 – Supplementary figure 2). Instead, we observed the appearance of a new population, IIFCs. This population clustered alongside SSPCs, and pseudotime analyses indicate that SSPCs can differentiate into IIFCs (Fig 5B). We confirmed the ability of Sca1+ pSSPCs to form IIFCs by grafting them in the fracture callus and assessing their fibrogenic fate at day 5 post-fracture (Fig 6B).

      - In silico, we observed that IIFCs clustered alongside osteogenic and chondrogenic cells. The pseudotime trajectory suggests that IIFCs can differentiate into both lineages (Fig 5B-C). This is consistent with the progressive expression of osteochondrogenic genes observed in IIFCs (Fig 5C, Fig 8A, C, E). In vivo, we observed the progressive expression of Runx2 and Sox9 by IIFCs undergoing differentiation (Fig 6A). We now show that IIFCs are not undergoing apoptosis, indicating that these cells further differentiate (Fig 7 – Supplementary figure 2). To functionally assess the osteochondrogenic potential of IIFCs, we used a transplantation assay and showed that Prx1-GFP+ IIFCs isolated at day 3 post-fracture form cartilage and bone when transplanted at the fracture site of wild-type mice (Fig 6C).

      We would like to emphasize the robustness of the bioinformatic analyses performed in our study. First, we used datasets from different time points post-fracture to capture the true temporal progression of cell populations in the fracture callus. We used a broad combination of tools shown to be reliable in many studies (Julien et al. 2021; Matsushita et al. 2020; Debnath et al. 2018; Baccin et al. 2020; Junyue Cao et al. 2019; Zhong et al. 2020), and all tools converge on the same trajectory. To further show the relevance of pseudotime in our model, we illustrated the distribution of the cell populations by time point (Fig. 5D). We observe a parallel between the time points and the pseudotime, reinforcing that the pseudotime trajectory reflects the timing of SSPC differentiation. Overall, the combined in silico, in vitro and in vivo analyses support that Sca1+ Pi16+ cells are the periosteal SSPC population, specifically represented in the uninjured dataset. In response to bone fracture, these SSPCs give rise to IIFCs that are specifically represented in the intermediate stages (days 3 and 5) prior to osteochondrogenic differentiation.

      (3) The designation of POSTN+ clusters as injury-induced fibrogenic cells (IIFCs) is not fully supported by the presented data. The authors' snRNAseq datasets (Figure 1d) demonstrate that there are many POSTN+ cells prior to injury, indicating that POSTN+ cells are not specifically induced in response to injury. It has been widely recognized that POSTN is expressed in the periosteum without fracture. This raises a possibility that the main responder of fracture healing is POSTN+ cells, not SSPCs as they postulate. The authors cannot exclude the possibility that Sca1+CD34+ cells are mere bystanders and do not participate in fracture healing. 

      IIFCs are a population of cells that express high levels of ECM-related genes, including Postn, Aspn and collagens. We did not claim that Postn expression is specific to IIFCs. While Postn is detected in the uninjured periosteum, snRNAseq analyses and RNAscope experiments showed that Postn expression is limited to a small number of cells in the cambium layer of the periosteum (Fig 4B, Figure 4 – Supplementary figure 1B). These Postn-expressing cells in the uninjured periosteum are not SSPCs, as they neither co-express Pi16 and Sca1 nor co-localize with the Pi16+ and Sca1+ cells detected in the fibrous layer (Fig 4, Figure 4 – Supplementary figure 1A, Figure 6 – Supplementary figure 1). These Postn-expressing cells are undergoing osteogenic differentiation, as shown by the correlation between Runx2 and Postn expression (Fig. 4 – Supplementary Figure 1C). After fracture, we observed a strong increase in ECM-related gene expression, specifically in the IIFC population. We now show the strong increase in Postn expression after injury (Fig. 4 – Supplementary Figure 1D-E, Figure 6 – Supplementary figure 1E).

      As mentioned in our response above, we now show that SCA1+ cells form cartilage and bone after fracture, while SCA1- cells (including the POSTN+ population) from the uninjured periosteum did not contribute. These data reveal that Sca1+ CD34+ cells are the main SSPC population mediating bone healing and that POSTN+ IIFCs are a transient stage of SSPC differentiation. We added the following text to the Results section: “Pi16-expressing SSPCs are located within the fibrous layer, while we observed few POSTN+ cells in the cambium layer (Fig. 4 – Supplementary Fig. 1A). Postn expression is weak in uninjured periosteum and is limited to differentiating cells. Postn expression is strongly increased in response to fracture, specifically in IIFCs (Fig. 4 – Supplementary Fig. 1B-E).”

      (4) Detailed spatial organization of Sca1+CD34+ cells and POSTN+ cells in the uninjured periosteum with respect to the cambium layer and the fibrous layer is not demonstrated. 

      We performed RNAscope experiments to locate Pi16-expressing and Postn-expressing cells in the uninjured periosteum. We observed that Pi16-expressing cells are in the outer fibrous layer of the periosteum, while Postn-expressing cells are located along the cortex in the cambium layer. The data have been added to Fig 4B and Fig. 4 – Supplementary Figure 1 and are described in the Results section: “Pi16-expressing SSPCs were located within the fibrous layer, while Postn-expressing cells were found in the cambium layer and corresponded to Runx2-expressing osteogenic cells (Fig. 4 – Supplementary Fig. 1A-C).”

      (5) Interpretation of the transplantation experiments in Figure 5 is not straightforward, as the authors did not demonstrate that Prx1Cre-GFP+SCA1+ cells and Prx1Cre-GFP+CD146- cells are pure populations of pSSPCs and IIFCs, respectively. It is possible that these populations contain a much broader range of cell types beyond SSPCs or IIFCs.

      We agree with the reviewer that our methodology for cell transplantation required more justification and validation. We used a transgenic mouse line to be able to trace the cells in vivo after grafting. Prx1 marks limb mesenchyme during development, and the Prx1Cre mouse model allows labeling of all SSPCs contributing to callus formation. Therefore, we used Prx1Cre; R26mTmG mice as donors for SSPC and IIFC isolation (Duchamp de Lageneste et al. 2018; Logan et al. 2002). Prx1 does not mark immune and endothelial cells but can label pericytes and fibroblastic populations (Duchamp de Lageneste et al. 2018; Logan et al. 2002; Julien et al. 2021). In the uninjured periosteum, Sca1 (Ly6a) is only expressed by SSPCs and endothelial cells (Fig 3 – Supplementary figure 2, Fig 6 – Supplementary figure 1). We sorted GFP+ Sca1+ cells from the uninjured periosteum of Prx1Cre; R26mTmG mice to isolate only SSPCs, excluding endothelial cells and pericytes. For IIFCs, we isolated cells at day 3 post-fracture because, in our snRNAseq data, we detected IIFCs but no SSPCs, chondrocytes or osteoblasts at this stage of repair. To eliminate Prx1-derived pericytes, we sorted GFP+ CD146- cells, as CD146 is specifically expressed by pericytes. We added Figure 6 – Supplementary Figure 1 to better illustrate the expression of Prx1, SCA1 (Ly6a) and CD146 (Mcam) in the uninjured and day 3 post-fracture datasets. We further demonstrated the purity of the isolated SSPCs and IIFCs by qPCR on sorted GFP+ Sca1+ cells from uninjured periosteum and GFP+ CD146- cells from day 3 post-fracture periosteum and hematoma, and confirmed the absence of contamination by other cell populations (Figure 6 – Supplementary figure 1E).
      We made the following changes in the text: “To functionally validate the steps of pSSPC activation, we isolated SCA1+ GFP+ pSSPCs from Prx1Cre; R26mTmG mice, excluding endothelial cells, and grafted them at the fracture site of wild-type hosts” and “we isolated GFP+ CD146- cells from the fracture callus of Prx1Cre; R26mTmG mice at day 3 post-fracture, which correspond to IIFCs without contamination by pericytes (CD146+ cells) (Fig. 6C, Figure 6 – Supplementary Fig. 1).”

      Reviewer #2 (Public Review):

      Summary: 

      The authors describe cell type mapping conducted for both WT and fracture conditions. Through this, unique cell populations specific to fracture conditions were identified. To determine these, the most undifferentiated cells were initially targeted using stemness-related markers and CytoTrace scoring. This led to the identification of SSPCs differentiating into fibroblasts. It was observed that the fibroblast cell type significantly increased under fracture conditions, followed by subsequent increases in chondrocytes and osteoblasts.

      Strengths: 

      This study presented the injury-induced fibrogenic cell (IIFC) as a characteristic cell type appearing in the bone regeneration process and proposed that the IIFC is a progenitor undergoing osteochondrogenic differentiation. 

      Weaknesses: 

      This study endeavored to elucidate the role of IIFC through snRNAseq analysis and in vivo observation. However, such validation alone is insufficient to confirm that IIFC is an osteochondrogenic progenitor, and additional data presentation is required.  

      As mentioned in the response to Reviewer 1, the differentiation trajectory of SSPCs presented in our study is supported by a combination of bioinformatic analyses and in vivo validation:

      - snRNAseq allowed us to identify the different populations in the uninjured periosteum. In silico, in vitro and in vivo analyses altogether showed that Sca1+ cells are the SSPC population (Fig 2E-G).

      - At day 3 post-fracture, we did not detect Sca1+ cells in the callus (Fig 4 – Supplementary figure 2). Instead, we observed the appearance of a new population, IIFCs. This population clustered alongside SSPCs, and pseudotime analyses indicate that SSPCs can differentiate into IIFCs (Fig 5B). We confirmed the ability of Sca1+ SSPCs to form IIFCs by grafting them in the fracture callus and assessing their fate at day 5 post-fracture (Fig 6B).

      - In silico, we observed that IIFCs clustered alongside osteogenic and chondrogenic cells. The pseudotime trajectory suggests that IIFCs can differentiate into both lineages (Fig 5B-C). This is consistent with the progressive expression of osteochondrogenic genes observed in IIFCs (Fig 5C, Fig 8A, C, E). In vivo, we observed the progressive expression of Runx2 and Sox9 by IIFCs undergoing differentiation (Fig 6A). We now show that IIFCs are not undergoing apoptosis, indicating that these cells further differentiate (Fig 7 – Supplementary figure 2). To functionally assess the osteochondrogenic potential of IIFCs, we used a transplantation assay and showed that Prx1-GFP+ IIFCs from day 3 post-fracture form cartilage and bone when transplanted at the fracture site of wild-type mice (Fig 6C).

      We would like to emphasize the robustness of the bioinformatic analyses performed in our study. First, we used datasets from different time points post-fracture to capture the true temporal progression of cell populations in the fracture callus. We used a broad combination of tools shown to be reliable in many studies (Julien et al. 2021; Matsushita et al. 2020; Debnath et al. 2018; Baccin et al. 2020; Junyue Cao et al. 2019; Zhong et al. 2020), and all tools converge on the same trajectory. To further show the relevance of pseudotime in our model, we illustrate the distribution of the cell populations by time point (Fig. 5D). We observe a parallel between the time points and the pseudotime, reinforcing that the pseudotime trajectory reflects the timing of SSPC differentiation. Overall, the combined in silico, in vitro and in vivo analyses strongly support that Sca1+ Pi16+ cells are the periosteal SSPC population, specifically represented in the uninjured dataset. In response to bone fracture, these SSPCs give rise to IIFCs that are specifically represented in the intermediate stages (days 3 and 5) prior to osteochondrogenic differentiation.

      We made the following changes in the text:

      - Line 81-87: “We performed in vitro CFU assays with sorted GFP+ SCA1+ and GFP+ SCA1- cells isolated from the periosteum of Prx1Cre; R26mTmG mice, as Prx1 labels all SSPCs contributing to the callus formation<sup>1</sup>. Prx1-GFP+ SCA1+ cells showed increased CFU potential, confirming their stem/progenitor properties (Fig 2F-G). Then, we grafted Prx1-GFP+ SCA1+ and Prx1-GFP+ SCA1- periosteal cells at the fracture site of wild-type mice. Only SCA1+ cells formed cartilage and bone after fracture, indicating that SCA1+ cells correspond to periosteal SSPCs with osteochondrogenic potential (Fig 2H).”

      - Line 120-122: “We did not detect Pi16-expressing SSPCs, consistent with the absence of cells expressing SSPC markers in the day 3 snRNAseq dataset compared to uninjured periosteum (Fig. 4 – Supplementary Figure 2).”

      - Line 170-172: “Only a small subset of IIFCs undergo apoptosis, further supporting that IIFCs are maintained in the fracture environment giving rise to osteoblasts and chondrocytes (Fig. 7 – Supplementary Figure 2).”

      - Line 277-278: “Following this unique fibrogenic step, IIFCs do not undergo cell death but undergo either osteogenesis or chondrogenesis”

      - Line 281-283: “During bone repair, this initial fibrogenic process is an integral part of the SSPC differentiation process, and a transitional step prior to osteogenesis and chondrogenesis.”

      Reviewer #3 (Public Review): 

      In this manuscript, the authors explored the transcriptional heterogeneity of the periosteum with single-nuclei RNA sequencing. Without prior enrichment of specific populations, this dataset serves as an unbiased representation of the cellular components potentially relevant to bone regeneration. By describing single-cell cluster profiles, the authors characterized over 10 different populations in combined steady-state and post-fracture periosteum, including stem cells (SSPCs), fibroblasts, osteoblasts, chondrocytes, immune cells, and so on. Specifically, a developmental trajectory was computationally inferred using the continuum of gene expression to connect SSPCs, injury-induced fibrogenic cells (IIFCs), chondrocytes, and osteoblasts, showcasing the bipotency of periosteal SSPCs during injury repair. Additional computational pipelines were applied to describe the possible gene regulatory network and the expected pathways involved in bone regeneration. Overall, the authors provided valuable insights into the cell state transitions during bone repair and proposed sets of genes with possible involvement in the injury response.

      While the highlights of the manuscript are the unbiased characterization of periosteal composition, and the trajectory of SSPC response in bone fracture response, many of the conclusions can be more strongly supported with additional clarifications or extensions of the analysis.  

      (1) As described in the Methods section, both the steady-state data and the full dataset underwent integration before dimensional reduction and clustering. It would be appreciated if the authors could compare the post-integration landscapes of uninjured cells between the steady-state and full dataset analyses. Specifically, fibroblasts were shown in Figures 1C and 1E, but this annotation does not appear in Figure 2B. Is it possible that the original 'fibroblasts' were part of the IIFC population?

      As suggested, we have now identified the fibroblast population from the uninjured periosteum in the integration of datasets from all time points (Figure 5B and Fig. 5 – Supplementary Figure 2). We identified 4 fibroblast populations in the uninjured periosteum: Luzp2+, Cldn1+, Hsd11b1+ and Csmd1+ fibroblasts. Luzp2+ and Cldn1+ fibroblasts cluster distinctly from the other populations in the integrated dataset. Hsd11b1+ fibroblasts blend with SSPCs and IIFCs in the integrated dataset, probably due to their low cell number. Finally, Csmd1+ fibroblasts cluster at the interface between SSPCs and IIFCs, likely because they correspond to differentiating cells both in the uninjured periosteum and in response to fracture. We modified the clustering resolution of our subset dataset in order to represent Luzp2+ and Cldn1+ fibroblasts as an isolated cluster (Figure 5B, cluster 10). In addition, both pseudotime (Fig. 5B) and gene regulatory network analyses (Fig. 7D) show that the fibroblast populations are distinct from the activation trajectory of SSPCs. We added the following sentence to the text: “Fibroblasts from uninjured periosteum (Hsd11b1+, Cldn1+ and Luzp2+ cells corresponding to cluster 10 of Fig. 5B) clustered separately from the other populations, suggesting the absence of their contribution to bone healing.”

      (2) According to Figure 2, immune cells account for a significant proportion of the dataset, specifically at days 3 and 5 post-fracture. It would be interesting to see the potential roles that immune cells play during bone repair. For example, what are the biological annotations of the immune clusters (B, T, NK, myeloid cells)? Are any inflammatory genes or related signals upregulated in these immune cells? Do they interact with SSPCs or IIFCs during the transition?

      In this manuscript, we report the overall dataset and focus our analyses on the response of SSPCs to injury and their differentiation trajectories. We did not include detailed analyses of the immune cell populations, which are beyond the scope of this manuscript and are part of another study (Hachemi et al., bioRxiv, 2024).

      (3) The conclusion regarding Notch and Wnt signaling in the IIFC transition was not sufficiently supported by the analysis presented in the manuscript, which was based on computational inferences. It would be helpful to add references supporting these claims or to provide experimental validation examining selected members of these pathways.

      The roles of Wnt and Notch in bone repair have been widely studied, and both signaling pathways are known to regulate SSPC differentiation (Lee et al. 2021; Matthews et al. 2014; Novak et al. 2020; Wang et al. 2016; Kraus et al. 2022; Dishowitz et al. 2012; Junjie Cao et al. 2017; Matsushita et al. 2020; Steven Minear et al. 2010; Steve Minear et al. 2010; Kang et al. 2007; Komatsu et al. 2010). It was previously shown that Notch inactivation at early stages of repair leads to bone non-union, while Notch inactivation in chondrocytes and osteoblasts does not significantly affect healing, confirming its role in SSPC differentiation before osteochondral commitment (Wang et al. 2016). Wnt was shown to be a critical driver of osteogenesis (Matsushita et al. 2020; Steve Minear et al. 2010; Steven Minear et al. 2010; Kang et al. 2007; Komatsu et al. 2010), as Wnt inhibition alters bone formation and Wnt overactivation increases bone formation (Pinzone et al. 2009; Balemans and Van Hul 2007). The role of Wnt is specific to osteogenic engagement, as Wnt inhibition promotes chondrogenesis (Hsieh et al. 2023; C.-L. Wu et al. 2021; Ruscitto et al. 2023). A study by Lee et al. recently confirmed the successive activation and crosstalk of Notch and Wnt pathways during osteogenic differentiation of SSPCs during bone healing (Lee et al. 2021). They showed a peak of Notch activation at day 3 post-injury followed by a progressive decrease that parallels an increase in Wnt signaling inducing osteogenic differentiation. These studies are consistent with the sequential activation of Notch and Wnt observed in our snRNAseq analyses. Our analyses now reveal how this sequential activation of Notch and Wnt relates to the fibrogenic and osteogenic phases of SSPC differentiation, respectively. We clarified this in the Discussion and added the references above to support our claims.

      Recommendations for the authors: 

      Reviewer #1 (Recommendations For The Authors): 

      (1) The manuscript is well-written overall. However, the authors often oversimplify outcomes and overstate the results. Some of the statements (delineated below) need to be recalibrated to be in line with the presented data. 

      In addition to the suggested corrections, we also toned down the following statements to avoid overstating our results:

      Line 24: suggesting a crucial paracrine role of this transient IIFC population

      Line 227: suggesting their central role in mediating cell interactions after fracture

      Line 243: IIFCs produce paracrine factors that can regulate SSPCs

      - Line 77 (86): The authors should add "might" before "correspond to". 

      We provided new data sets, including CFU experiments and a transplantation assay, to reinforce our conclusion. We replaced “correspond to” with “encompass”.

      - Line 102: SSPCs are obviously not "absent" in day 3 snRNAseq (Figure 2d). The percentage dropped (only) 75%, according to Figure 2e, which is far from disappearance. Overall, immunohistochemical staining is often dichotomous with snRNAseq designations. The authors should more carefully describe the results. 

      We agree that this statement did not accurately reflect the data shown. We observe a strong decrease in the percentage of cells in the SSPC clusters but still detect a few cells in these clusters. However, when we examined the presence of Sca1+ Pi16+ cells at different time points, we confirmed the absence of cells expressing SSPC signature genes (Sca1, Pi16, Cd34) at day 3 post-injury. Owing to the clustering resolution of the combined integration, some cells in the SSPC clusters may not be Sca1+ Pi16+. We now show these results in Fig. 4 – Supplementary Figure 2. We changed the text accordingly (line 120): “We did not detect Pi16-expressing SSPCs, consistent with the absence of cells expressing SSPC markers in the day 3 snRNAseq dataset compared to uninjured periosteum (Fig. 4 – Supplementary Figure 2)”.

      - Line 134: The authors need to clearly state that GFP+IIFCs were isolated based on Prx1CreGFP+CD146-. The authors did not clearly demonstrate the relationship between POSTN+ cells and CD146- cells, which poses concerns about the interpretation of transplantation experiments. 

      As mentioned above in our response to Reviewer #1's public review, we have clarified and provided additional information on our strategy to isolate SSPCs and IIFCs. We used Prx1Cre; R26mTmG mice to label all SSPCs and their derivatives with the GFP reporter in order to trace these populations after cell grafting. In the uninjured periosteum, Sca1 (Ly6a) is expressed only by SSPCs and endothelial cells, so we sorted GFP+ Sca1+ cells to exclude endothelial cells. For IIFCs, we isolated cells at day 3 post-fracture. At this time point, our snRNAseq data contain IIFCs but no SSPCs, chondrocytes or osteoblasts. However, we also detected pericytes that can be Prx1-derived. To eliminate potential pericyte contamination, we sorted GFP+ CD146- cells, as CD146 is specifically expressed by pericytes. We added Figure 6 – Supplementary Figure 1 to better illustrate the expression of Prx1, SCA1 (Ly6a) and CD146 (Mcam) in the uninjured and day 3 post-fracture datasets. We further confirmed the purity of the isolated SSPCs and IIFCs by qPCR on sorted GFP+ Sca1+ cells from uninjured periosteum and GFP+ CD146- cells from day 3 post-fracture periosteum and hematoma, which showed no contamination by other cell populations (Figure 6 – Supplementary Figure 1E). We made the following changes in the text (line 153): “To functionally validate the steps of pSSPC activation, we isolated SCA1+ GFP+ pSSPCs from Prx1Cre; R26mTmG mice, excluding endothelial cells, and grafted them at the fracture site of wild-type hosts” and “we isolated GFP+ CD146- cells from the fracture callus of Prx1Cre; R26mTmG mice at day 3 post-fracture, which correspond to IIFCs without contamination by pericytes (CD146+ cells) (Fig. 6C, Figure 6 – Supplementary Fig. 1)”.

      - Line 211: It is obvious from Figure 8F that ligand expression was not "specific" to the IIFC phase.

      The data show only a slight enrichment in the ligand score.

      We corrected the text by “ligand expression was increased during the IIFC phase”.

      (2) Some of the computational predictions are incongruent with the known lineage trajectory. For example, in vivo lineage tracing experiments, including but not limited to, PLoS Genet. 2014. 10:e1004820, demonstrate that some of the chondrocytes within fracture callus can differentiate into osteoblasts. This is incompatible with the authors' conclusion that osteoblasts and chondrocytes represent two different terminal stages of cell differentiation in fracture healing. How do the authors reconcile this apparent inconsistency? 

      In this manuscript, we generated datasets corresponding to the initial stages of bone repair, up to day 7 post-injury. Our analyses therefore encompass the SSPC activation stages and engagement into osteogenesis and chondrogenesis. The results show that a portion of the osteoblasts in the fracture callus differentiate directly from IIFCs via intramembranous ossification. The reviewer is correct that osteoblasts have also been shown to derive from transdifferentiation of chondrocytes. This occurs at later stages of repair, during the active phase of endochondral ossification (Julien et al. 2020; Aghajanian and Mohan 2018; Zhou et al. 2014; Hu et al. 2017). This process of chondrocyte-to-osteoblast transdifferentiation is not represented in our integrated dataset, and capturing it may require adding later time points. However, when we analyzed the day 5 and day 7 datasets independently of days 0 and 3, we identified a cluster of hypertrophic chondrocytes (expressing Col10a1) connecting the chondrocyte and osteoblast clusters. This suggests that hypertrophic chondrocytes in this cluster are undergoing transdifferentiation into osteoblasts, as shown in Author response image 1. Additional time points will be needed in a future study to analyze chondrocyte transdifferentiation in depth.

      Author response image 1.

      Periosteum-derived chondrocytes undergo cartilage-to-bone transformation. A. UMAP projection of the subset of SSPCs, IIFCs, osteoblasts and chondrocytes in the integration of days 5 and 7 post-fracture datasets. B. Feature plots of Acan, Col10a1 and Ibsp expression. C. UMAP projection separated by time point. D. Percentage of cells in the hypertrophic/differentiating chondrocyte cluster.

      (3) The authors did not cite some of the studies that described the roles of Notch signaling in fracture healing, for example, J Bone Miner Res. 2014. 29:1283-94. The authors should test the specificity of Notch signaling activities to IIFCs (POSTN+ cells) in vivo. 

      The role of Notch in the activation of SSPCs during bone repair has been investigated in several studies (Lee et al. 2021; Matthews et al. 2014; Novak et al. 2020; Wang et al. 2016; Kraus et al. 2022; Dishowitz et al. 2012; Junjie Cao et al. 2017). Notch dynamics were previously described, with a peak at day 3 post-injury followed by a reduction when cells engage in osteogenesis and chondrogenesis (Lee et al. 2021; Dishowitz et al. 2012; Matthews et al. 2014). Notch acts in the early steps of SSPC activation prior to osteochondral differentiation, as Notch inactivation in chondrocytes and osteoblasts does not affect bone repair (Wang et al. 2016). We added the references listed above to emphasize the agreement between our results and previous reports on the role of Notch, and revised the discussion accordingly.

      Reviewer #2 (Recommendations For The Authors): 

      Suggestions 

      (1) This research utilized snRNA seq for the basic hypothesis formation; however, the number of nuclei acquired was quite limited. Therefore, please explain the rationale for employing snRNA seq instead of scRNA seq, which includes cytoplasm, and additionally provide the markers used for cell type mapping in the scRNA analysis.  

      As mentioned in our response to Reviewer #1 above, we analyzed a total of 6,213 nuclei from uninjured periosteum and fracture calluses at 3 stages of bone healing. We were able to describe 11 distinct cell populations, including rare cell types in the fracture environment such as Schwann cells, adipocytes and pericytes. The number of nuclei was sufficient to perform extensive analyses using a combination of cutting-edge algorithms. We agree that more nuclei would allow more in-depth analyses of cell fate transitions and rare populations, such as pericytes and Schwann cells. However, we concentrated here on SSPCs/fibrogenic cells, which are well represented in our dataset. The robustness of our study is also reinforced by the analysis of 4 successive time points to define the SSPC/fibrogenic cell trajectories. Our validations using immunohistochemistry and transplantation assays also confirmed that our dataset is sufficient to define cell trajectories. There is no clear consensus on the number of cells needed to perform scRNAseq analyses, as it depends on the cell types analyzed and the fold changes in gene expression. Previously reported single-cell datasets containing fewer cells reached major conclusions, including SSPC identification, cell differentiation trajectories and differential gene expression (658 cells in Debnath et al. 2018, 300 in Ambrosi et al. 2021, and around 175 in Remark et al. 2023).

      Several studies have shown that snRNAseq provides data quality equivalent to scRNAseq in terms of cell type identification, number of detected genes and downstream analyses (Selewa et al. 2020; Wen et al. 2022; Ding et al. 2020; H. Wu et al. 2019; Machado et al. 2021). While snRNAseq does not capture cytoplasmic RNA, the technique offers several advantages:

      (1) Better representation of cell types. scRNAseq requires an enzymatic digestion step. This usually leads to an overrepresentation of cell types loosely attached to the ECM (immune cells, endothelial cells) and an underrepresentation of cell types strongly attached to the ECM, such as chondrocytes and osteoblasts. In addition, large or multinucleated cells, such as hypertrophic chondrocytes and osteoclasts, are too large to be sorted and encapsulated using 10X technology. Here, we optimized a protocol to mechanically isolate nuclei from dissected tissues, which allows us to capture the diversity of cell types in the periosteum and fracture callus.

      (2) Higher recovery of nuclei. We performed both cell and nucleus isolation from the periosteum in our study and observed that nuclei extraction is the most efficient way to sample cells from the periosteum and the fracture callus.

      (3) Reduced isolation time and cell stress. Previous studies showed that enzymatic digestion causes cell stress and induces stem cell activation (Machado et al. 2021; van den Brink et al. 2017). We therefore performed snRNAseq to analyze the transcriptome of the intact periosteum without digestion-induced bias.

      We added this sentence in the result section: “Single nuclei transcriptomics was shown to provide results equivalent to single cell transcriptomics, but with better cell type representation and reduced digestion-induced stress response (Selewa et al. 2020; Wen et al. 2022; Ding et al. 2020; H. Wu et al. 2019; Machado et al. 2021)”.

      The list of genes used for cell type mapping is presented in Figure 3 – Supplementary Figure 1. We added a detailed dot plot as Figure 3 – Supplementary Figure 2.

      (2) During the fracture healing process of long bones, the influx of fibroblasts is a relatively common occurrence, and the fibrous callus that forms during bone repair and regeneration is reported to disappear over time. Therefore, inferring that IIFC differentiates into osteo- and chondrogenic cells based solely on their simultaneous appearance in the same time and space is challenging. More detailed validation is necessary, beyond what is supported by bioinformatics analysis. 

      The first step of bone repair is the formation of a fibrous callus, before cartilage and bone formation. There are no data in the literature demonstrating that an influx of fibroblasts occurs at the fracture site. Several studies now show that cells involved in callus formation are recruited locally (i.e. from the bone marrow, the periosteum and the skeletal muscle surrounding the fracture site) (Duchamp de Lageneste et al. 2018; Julien et al. 2021; Colnot 2009; Jeffery et al. 2022; Debnath et al. 2018; Matsushita et al. 2020; Julien et al. 2022; Matthews et al. 2021). The contribution of locally activated SSPCs to the fibrous callus is less well understood. Lineage tracing shows that GFP+ cell populations traced in Prx1Cre-GFP mice include SSPCs, IIFCs, chondrocytes and osteoblasts.

      The timing of the cell trajectories observed in our dataset matches the timing of callus formation previously described in the literature. The day 3 post-fracture callus mostly contains IIFCs, while chondrocytes and osteoblasts appear from day 5 post-fracture. We conclude that IIFCs differentiate into osteochondrogenic cells based on multiple lines of evidence besides their simultaneous appearance in time and space:

      - In silico trajectory analyses identify a trajectory from SSPCs to osteochondrogenic cells via IIFCs. We added an analysis showing that our pseudotime trajectory parallels the time points of the dataset, confirming that the differentiation trajectory follows the timing of cell differentiation (Figure 5D).

      - We show that IIFCs start to express chondrogenic and osteogenic genes prior to engaging in chondrogenesis and osteogenesis. In addition, we detected activation of osteogenic and chondrogenic transcription factors in IIFCs. This reveals a differentiation continuum between SSPCs, IIFCs and osteochondrogenic cells (Figures 6-8).

      - Using a transplantation assay, we showed that IIFCs form cartilage and bone, reinforcing the osteochondrogenic potential of this population (Figure 6B).

      - IIFCs largely do not undergo apoptosis. We assessed the expression of apoptosis-related genes by IIFCs and detected it in only a small fraction of cells. This was confirmed by cleaved caspase 3 immunostaining, which showed that a very low percentage of cells in the early fibrotic tissue undergo apoptosis.

      Therefore, the idea that the initial fibrous callus is replaced by a new influx of SSPCs or committed progenitors is not supported by recent literature and is not observed in our dataset containing all cell types from the periosteum and fracture site. Overall, our bioinformatic analyses combined with our in vivo validation strongly support that IIFCs are differentiating into chondrocytes and osteoblasts during bone repair. Additional in vivo functional studies will aim to further validate the trajectory and investigate the critical factors regulating this process.

      (3) The influx of most osteogenic progenitors to the bone fracture site typically occurs after day 7 post-fracture. It is essential to ascertain whether the osteogenic cells observed at the time points of this study differentiated from IIFCs or migrated from surrounding mesenchymal stem cells.

      As mentioned above, there is no clear evidence in the literature of an influx of osteoprogenitors. Cells involved in callus formation are recruited locally and predominantly from the periosteum (Duchamp de Lageneste et al. 2018; Julien et al. 2021; Colnot 2009; Jeffery et al. 2022; Debnath et al. 2018; Matsushita et al. 2020; Matthews et al. 2021; Julien et al. 2022). Our datasets therefore include all cell populations that form the callus. Other sources of SSPCs include the surrounding muscle, which contributes mostly to cartilage, and the bone marrow, which contributes a low percentage of the callus osteoblasts in the medullary cavity (Julien et al. 2021; Jeffery et al. 2022). Our bioinformatic analyses and in vivo transplantation assay (listed in the response above) provide evidence that IIFCs give rise to osteogenic cells. As indicated in our response to Reviewer #1, the steps leading to osteogenic differentiation observed in our dataset reflect the first step of callus ossification and correspond to intramembranous ossification (up to day 7 post-injury). Endochondral ossification also contributes osteoblasts, including through transdifferentiation of chondrocytes into osteoblasts (Julien et al. 2020; Zhou et al. 2014; Hu et al. 2017). Although this process mostly occurs around day 14 post-fracture, we begin to detect this transition in our integrated day 5-day 7 dataset, as shown in Author response image 1.

      (4) It's crucial to determine whether the IIFCs appearing at the fracture site contribute to the formation of the callus matrix or undergo apoptosis during the fracture healing process.

      In the early steps of bone repair, the callus is mostly composed of extracellular matrix (ECM). IIFCs express high levels of ECM genes, including Postn, Aspn and collagens (Col3a1, Col5a1, Col8a1, Col12a1) (Figure 3 – Supplementary Figures 1-2 and Fig. 7 – Supplementary Figure 1B). IIFCs express the highest levels of matrix-related genes of all cell types in the fracture environment (e.g. immune cells, endothelial cells, Schwann cells, pericytes), as now shown in Fig. 7 – Supplementary Figure 1A. Therefore, IIFCs are the main contributors to the callus matrix.

      We investigated whether IIFCs undergo apoptosis. Only a low percentage of IIFCs express apoptosis-related genes and are positive for cleaved caspase 3 immunostaining at days 3, 5 and 7 of bone repair. This indicates that IIFCs largely do not undergo apoptosis and reinforces our model in which IIFCs further differentiate into osteoblasts and chondrocytes. We added these data in Fig. 7 – Supplementary Figure 2 and added the following sentence to the results section: “Only a small subset of IIFCs undergo apoptosis, further supporting that IIFCs are maintained in the fracture environment giving rise to osteoblasts and chondrocytes (Fig. 7 – Supplementary Figure 2).”

      (5) Results from the snRNA seq highlight the paracrine role of IIFC, and verification is needed to ensure that the effect this has on surrounding osteogenic lineages is not misinterpreted.  

      To assess cell-cell interactions, we used tools such as Connectome and CellChat to infer and quantify intercellular communication networks between cell types. Studies combining these tools with in vivo validation have shown their robustness (Sinha et al. 2022; Alečković et al. 2022; Li et al. 2023). Here we used these tools to illustrate the paracrine profile of IIFCs. Assessing the requirement for individual paracrine factors would need in vivo validation by gene inactivation. In another study, we performed extensive analyses of the crosstalk between immune cells and SSPCs using this dataset and validated them in vivo, showing the robustness of both the tool and the dataset (Hachemi et al. 2024). We adjusted our conclusions to reflect our analyses: “suggesting a crucial paracrine role of this transient IIFC population during fracture healing”, “suggesting their central role in mediating cell interactions after fracture”, “suggesting that SSPCs can receive signals from IIFC”.

      References

      Aghajanian, Patrick, and Subburaman Mohan. 2018. “The Art of Building Bone: Emerging Role of Chondrocyte-to-Osteoblast Transdifferentiation in Endochondral Ossification”. Bone Research 6 (1): 19. https://doi.org/10.1038/s41413-018-0021-z.

      Alečković, Maša, Simona Cristea, Carlos R. Gil Del Alcazar, Pengze Yan, Lina Ding, Ethan D. Krop, Nicholas W. Harper, et al. 2022. “Breast Cancer Prevention by Short-Term Inhibition of TGFβ Signaling”. Nature Communications 13 (1): 7558. https://doi.org/10.1038/s41467-022-35043-5.

      Ambrosi, Thomas H., Owen Marecic, Adrian McArdle, Rahul Sinha, Gunsagar S. Gulati, Xinming Tong, Yuting Wang, et al. 2021. “Aged Skeletal Stem Cells Generate an Inflammatory Degenerative Niche”. Nature 597 (7875): 256‑62. https://doi.org/10.1038/s41586-021-03795-7.

      Baccin, Chiara, Jude Al-Sabah, Lars Velten, Patrick M. Helbling, Florian Grünschläger, Pablo Hernández-Malmierca, César Nombela-Arrieta, Lars M. Steinmetz, Andreas Trumpp, and Simon Haas. 2020. “Combined Single-Cell and Spatial Transcriptomics Reveal the Molecular, Cellular and Spatial Bone Marrow Niche Organization”. Nature Cell Biology 22 (1): 38‑48. https://doi.org/10.1038/s41556-019-0439-6.

      Balemans, Wendy, and Wim Van Hul. 2007. “The Genetics of Low-Density Lipoprotein Receptor-Related Protein 5 in Bone: A Story of Extremes”. Endocrinology 148 (6): 2622‑29. https://doi.org/10.1210/en.2006-1352.

      Brink, Susanne C van den, Fanny Sage, Ábel Vértesy, Bastiaan Spanjaard, Josi Peterson-Maduro, Chloé S Baron, Catherine Robin, and Alexander van Oudenaarden. 2017. “Single-Cell Sequencing Reveals Dissociation-Induced Gene Expression in Tissue Subpopulations”. Nature Methods 14 (10): 935‑36. https://doi.org/10.1038/nmeth.4437.

      Cao, Junjie, Yalin Wei, Jing Lian, Lunyun Yang, Xiaoyan Zhang, Jiaying Xie, Qiang Liu, Jinyong Luo, Baicheng He, and Min Tang. 2017. “Notch Signaling Pathway Promotes Osteogenic Differentiation of Mesenchymal Stem Cells by Enhancing BMP9/Smad Signaling”. International Journal of Molecular Medicine 40 (2): 378‑88. https://doi.org/10.3892/ijmm.2017.3037.

      Cao, Junyue, Malte Spielmann, Xiaojie Qiu, Xingfan Huang, Daniel M. Ibrahim, Andrew J. Hill, Fan Zhang, et al. 2019. “The Single-Cell Transcriptional Landscape of Mammalian Organogenesis”. Nature 566 (7745): 496‑502. https://doi.org/10.1038/s41586-019-0969-x.

      Colnot, Céline. 2009. “Skeletal Cell Fate Decisions Within Periosteum and Bone Marrow During Bone Regeneration”. Journal of Bone and Mineral Research 24 (2): 274‑82. https://doi.org/10.1359/jbmr.081003.

      Debnath, Shawon, Alisha R. Yallowitz, Jason McCormick, Sarfaraz Lalani, Tuo Zhang, Ren Xu, Na Li, et al. 2018. “Discovery of a Periosteal Stem Cell Mediating Intramembranous Bone Formation”. Nature 562 (7725): 133‑39. https://doi.org/10.1038/s41586-018-0554-8.

      Ding, Jiarui, Xian Adiconis, Sean K. Simmons, Monika S. Kowalczyk, Cynthia C. Hession, Nemanja D. Marjanovic, Travis K. Hughes, et al. 2020. “Systematic Comparison of Single-Cell and Single-Nucleus RNA-Sequencing Methods”. Nature Biotechnology 38 (6): 737‑46. https://doi.org/10.1038/s41587-020-0465-8.

      Dishowitz, Michael I., Shawn P. Terkhorn, Sandra A. Bostic, and Kurt D. Hankenson. 2012. “Notch Signaling Components Are Upregulated during Both Endochondral and Intramembranous Bone Regeneration”. Journal of Orthopaedic Research 30 (2): 296‑303. https://doi.org/10.1002/jor.21518.

      Duchamp de Lageneste, Oriane, Anaïs Julien, Rana Abou-Khalil, Giulia Frangi, Caroline Carvalho, Nicolas Cagnard, Corinne Cordier, Simon J. Conway, and Céline Colnot. 2018. “Periosteum Contains Skeletal Stem Cells with High Bone Regenerative Potential Controlled by Periostin”. Nature Communications 9 (1): 773. https://doi.org/10.1038/s41467-018-03124-z.

      Hachemi, Yasmine, Simon Perrin, Maria Ethel, Anais Julien, Julia Vettese, Blandine Geisler, Christian Göritz, and Céline Colnot. 2024. “Multimodal Analyses of Immune Cells during Bone Repair Identify Macrophages as a Therapeutic Target in Musculoskeletal Trauma”. https://doi.org/10.1101/2024.04.29.591608.

      Hsieh, Chen-Chan, B. Linju Yen, Chia-Chi Chang, Pei-Ju Hsu, Yu-Wei Lee, Men-Luh Yen, Shaw-Fang Yet, and Linyi Chen. 2023. “Wnt Antagonism without TGFβ Induces Rapid MSC Chondrogenesis via Increasing AJ Interactions and Restricting Lineage Commitment”. iScience 26 (1): 105713. https://doi.org/10.1016/j.isci.2022.105713.

      Hu, Diane P., Federico Ferro, Frank Yang, Aaron J. Taylor, Wenhan Chang, Theodore Miclau, Ralph S. Marcucio, and Chelsea S. Bahney. 2017. “Cartilage to Bone Transformation during Fracture Healing Is Coordinated by the Invading Vasculature and Induction of the Core Pluripotency Genes”. Development 144 (2): 221‑34. https://doi.org/10.1242/dev.130807.

      Jeffery, Elise C., Terry L.A. Mann, Jade A. Pool, Zhiyu Zhao, and Sean J. Morrison. 2022. “Bone Marrow and Periosteal Skeletal Stem/Progenitor Cells Make Distinct Contributions to Bone Maintenance and Repair”. Cell Stem Cell 29 (11): 1547-1561.e6. https://doi.org/10.1016/j.stem.2022.10.002.

      Julien, Anais, Anuya Kanagalingam, Ester Martínez-Sarrà, Jérome Megret, Marine Luka, Mickaël Ménager, Frédéric Relaix, and Céline Colnot. 2021. “Direct Contribution of Skeletal Muscle Mesenchymal Progenitors to Bone Repair”. Nature Communications 12 (1): 2860. https://doi.org/10.1038/s41467-021-22842-5.

      Julien, Anais, Simon Perrin, Oriane Duchamp de Lageneste, Caroline Carvalho, Morad Bensidhoum, Laurence Legeai-Mallet, and Céline Colnot. 2020. “FGFR3 in Periosteal Cells Drives Cartilage-to-Bone Transformation in Bone Repair”. Stem Cell Reports 15 (4): 955‑67. https://doi.org/10.1016/j.stemcr.2020.08.005.

      Julien, Anais, Simon Perrin, Ester Martínez-Sarrà, Anuya Kanagalingam, Caroline Carvalho, Marine Luka, Mickaël Ménager, and Céline Colnot. 2022. “Skeletal Stem/Progenitor Cells in Periosteum and Skeletal Muscle Share a Common Molecular Response to Bone Injury”. Journal of Bone and Mineral Research, June, jbmr.4616. https://doi.org/10.1002/jbmr.4616.

      Kang, Sona, Christina N. Bennett, Isabelle Gerin, Lauren A. Rapp, Kurt D. Hankenson, and Ormond A. MacDougald. 2007. “Wnt Signaling Stimulates Osteoblastogenesis of Mesenchymal Precursors by Suppressing CCAAT/Enhancer-Binding Protein α and Peroxisome Proliferator-Activated Receptor γ”. Journal of Biological Chemistry 282 (19): 14515‑24. https://doi.org/10.1074/jbc.M700030200.

      Komatsu, David E., Michelle N. Mary, Robert Jason Schroeder, Alex G. Robling, Charles H. Turner, and Stuart J. Warden. 2010. “Modulation of Wnt Signaling Influences Fracture Repair”. Journal of Orthopaedic Research 28 (7): 928‑36. https://doi.org/10.1002/jor.21078.

      Kraus, Jessica M., Dion Giovannone, Renata Rydzik, Jeremy L. Balsbaugh, Isaac L. Moss, Jennifer L. Schwedler, Julien Y. Bertrand, et al. 2022. “Notch Signaling Enhances Bone Regeneration in the Zebrafish Mandible”. Development 149 (5): dev199995. https://doi.org/10.1242/dev.199995.

      Lee, S., L. H. Remark, A. M. Josephson, K. Leclerc, E. Muiños Lopez, D. J. Kirby, Devan Mehta, et al. 2021. “Notch-Wnt Signal Crosstalk Regulates Proliferation and Differentiation of Osteoprogenitor Cells during Intramembranous Bone Healing”. Npj Regenerative Medicine 6 (1): 29. https://doi.org/10.1038/s41536-021-00139-x.

      Li, Jiaoduan, Dongyan Cao, Lixin Jiang, Yiwen Zheng, Siyuan Shao, Ai Zhuang, and Dongxi Xiang. 2023. “ITGB2-ICAM1 Axis Promotes Liver Metastasis in BAP1-Mutated Uveal Melanoma with Retained Hypoxia and ECM Signatures”. Cellular Oncology (Dordrecht), December. https://doi.org/10.1007/s13402-023-00908-4.

      Logan, Malcolm, James F. Martin, Andras Nagy, Corrinne Lobe, Eric N. Olson, and Clifford J. Tabin. 2002. “Expression of Cre Recombinase in the Developing Mouse Limb Bud Driven by a Prxl Enhancer”. Genesis 33 (2): 77‑80. https://doi.org/10.1002/gene.10092.

      Machado, Léo, Perla Geara, Jordi Camps, Matthieu Dos Santos, Fatima Teixeira-Clerc, Jens Van Herck, Hugo Varet, et al. 2021. “Tissue Damage Induces a Conserved Stress Response That Initiates Quiescent Muscle Stem Cell Activation”. Cell Stem Cell 28 (6): 1125-1135.e7. https://doi.org/10.1016/j.stem.2021.01.017.

      Matsushita, Yuki, Mizuki Nagata, Kenneth M. Kozloff, Joshua D. Welch, Koji Mizuhashi, Nicha Tokavanich, Shawn A. Hallett, et al. 2020. “A Wnt-Mediated Transformation of the Bone Marrow Stromal Cell Identity Orchestrates Skeletal Regeneration”. Nature Communications 11 (1): 332. https://doi.org/10.1038/s41467-019-14029-w.

      Matthews, Brya G, Danka Grcevic, Liping Wang, Yusuke Hagiwara, Hrvoje Roguljic, Pujan Joshi, Dong-Guk Shin, Douglas J Adams, and Ivo Kalajzic. 2014. “Analysis of αSMA-Labeled Progenitor Cell Commitment Identifies Notch Signaling as an Important Pathway in Fracture Healing”. Journal of Bone and Mineral Research 29 (5): 1283‑94. https://doi.org/10.1002/jbmr.2140.

      Matthews, Brya G, Sanja Novak, Francesca V Sbrana, Jessica L Funnell, Ye Cao, Emma J Buckels, Danka Grcevic, and Ivo Kalajzic. 2021. “Heterogeneity of Murine Periosteum Progenitors Involved in Fracture Healing”. eLife 10 (February): e58534. https://doi.org/10.7554/eLife.58534.

      Minear, Steve, Philipp Leucht, Samara Miller, and Jill A Helms. 2010. “rBMP Represses Wnt Signaling and Influences Skeletal Progenitor Cell Fate Specification during Bone Repair”. Journal of Bone and Mineral Research 25 (6): 1196‑1207. https://doi.org/10.1002/jbmr.29.

      Minear, Steven, Philipp Leucht, Jie Jiang, Bo Liu, Arial Zeng, Christophe Fuerer, Roel Nusse, and Jill A. Helms. 2010. “Wnt Proteins Promote Bone Regeneration”. Science Translational Medicine 2 (29). https://doi.org/10.1126/scitranslmed.3000231.

      Novak, Sanja, Emilie Roeder, Benjamin P. Sinder, Douglas J. Adams, Chris W. Siebel, Danka Grcevic, Kurt D. Hankenson, Brya G. Matthews, and Ivo Kalajzic. 2020. “Modulation of Notch1 Signaling Regulates Bone Fracture Healing”. Journal of Orthopaedic Research 38 (11): 2350‑61. https://doi.org/10.1002/jor.24650.

      Pinzone, Joseph J., Brett M. Hall, Nanda K. Thudi, Martin Vonau, Ya-Wei Qiang, Thomas J. Rosol, and John D. Shaughnessy. 2009. “The Role of Dickkopf-1 in Bone Development, Homeostasis, and Disease”. Blood 113 (3): 517‑25. https://doi.org/10.1182/blood-2008-03-145169.

      Remark, Lindsey H., Kevin Leclerc, Malissa Ramsukh, Ziyan Lin, Sooyeon Lee, Backialakshmi Dharmalingam, Lauren Gillinov, et al. 2023. “Loss of Notch Signaling in Skeletal Stem Cells Enhances Bone Formation with Aging”. Bone Research 11 (1): 50. https://doi.org/10.1038/s41413-023-00283-8.

      Ruscitto, Angela, Peng Chen, Ikue Tosa, Ziyi Wang, Gan Zhou, Ingrid Safina, Ran Wei, et al. 2023. “Lgr5-Expressing Secretory Cells Form a Wnt Inhibitory Niche in Cartilage Critical for Chondrocyte Identity”. Cell Stem Cell 30 (9): 1179-1198.e7. https://doi.org/10.1016/j.stem.2023.08.004.

      Selewa, Alan, Ryan Dohn, Heather Eckart, Stephanie Lozano, Bingqing Xie, Eric Gauchat, Reem Elorbany, et al. 2020. “Systematic Comparison of High-Throughput Single-Cell and Single-Nucleus Transcriptomes during Cardiomyocyte Differentiation”. Scientific Reports 10 (1): 1535. https://doi.org/10.1038/s41598-020-58327-6.

      Sinha, Sarthak, Holly D. Sparks, Elodie Labit, Hayley N. Robbins, Kevin Gowing, Arzina Jaffer, Eren Kutluberk, et al. 2022. “Fibroblast Inflammatory Priming Determines Regenerative versus Fibrotic Skin Repair in Reindeer”. Cell 185 (25): 4717-4736.e25. https://doi.org/10.1016/j.cell.2022.11.004.

      Wang, Cuicui, Jason A. Inzana, Anthony J. Mirando, Yinshi Ren, Zhaoyang Liu, Jie Shen, Regis J. O’Keefe, Hani A. Awad, and Matthew J. Hilton. 2016. “NOTCH Signaling in Skeletal Progenitors Is Critical for Fracture Repair”. The Journal of Clinical Investigation 126 (4): 1471‑81. https://doi.org/10.1172/JCI80672.

      Wen, Fei, Xiaojie Tang, Lin Xu, and Haixia Qu. 2022. “Comparison of Single‑nucleus and Single‑cell Transcriptomes in Hepatocellular Carcinoma Tissue”. Molecular Medicine Reports 26 (5): 339. https://doi.org/10.3892/mmr.2022.12855.

      Wu, Chia-Lung, Amanda Dicks, Nancy Steward, Ruhang Tang, Dakota B. Katz, Yun-Rak Choi, and Farshid Guilak. 2021. “Single Cell Transcriptomic Analysis of Human Pluripotent Stem Cell Chondrogenesis”. Nature Communications 12 (1): 362. https://doi.org/10.1038/s41467-020-20598-y.

      Wu, Haojia, Yuhei Kirita, Erinn L. Donnelly, and Benjamin D. Humphreys. 2019. “Advantages of Single-Nucleus over Single-Cell RNA Sequencing of Adult Kidney: Rare Cell Types and Novel Cell States Revealed in Fibrosis”. Journal of the American Society of Nephrology 30 (1): 23‑32. https://doi.org/10.1681/ASN.2018090912.

      Zhong, Leilei, Lutian Yao, Robert J. Tower, Yulong Wei, Zhen Miao, Jihwan Park, Rojesh Shrestha, et al. 2020. “Single Cell Transcriptomics Identifies a Unique Adipose Lineage Cell Population That Regulates Bone Marrow Environment”. eLife 9 (April): e54695. https://doi.org/10.7554/eLife.54695.

      Zhou, Xin, Klaus von der Mark, Stephen Henry, William Norton, Henry Adams, and Benoit de Crombrugghe. 2014. “Chondrocytes Transdifferentiate into Osteoblasts in Endochondral Bone during Development, Postnatal Growth and Fracture Healing in Mice”. Edited by Matthew L. Warman. PLoS Genetics 10 (12): e1004820. https://doi.org/10.1371/journal.pgen.1004820.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive comments on our manuscript and their appreciation of the results. We provide point-by-point responses below. For your convenience, we highlight the main changes to the manuscript here.

      · More descriptive terminology for the contextual cues (Ctx.A / Ctx.noA is now referred to as LIGHT / DARK).

      · A schematic of the experiment timeline highlighting the exclusion of non-discriminators after the initial acquisition period. This explains the absence of baseline sex differences after acquisition and clears up a misconception about a lack of replicability.

      · New data (time in port pre-CS) showing that a prior reward does not cause continued presence in the port.

      · Several text edits addressing all the points raised by the reviewers.

      We hope that the editors and reviewers will be satisfied with this revised version and find the strength of the evidence more convincing.

      Reviewer #1 (Recommendations For The Authors):

      In relation to weaknesses points 1-4 in the public review:

      (1) With regard to the claim (page 4 of the pdf) "Only Ctx-dep. O1 engages context-gated reward predictions", I think I can see what the authors are getting at: the same reward is available in each context, and the animal must use contextual information to determine which cue will be rewarded. In other words, the context has a discriminative purpose. In Ctx-dep. O1/O2, however, although the context does not serve a discriminative purpose in the sense that one cue will always earn a unique outcome, regardless of context, the fact that these cues are differentially rewarded in the different contexts means that animals may well form context-gated cue-outcome associations (e.g., CtxA-(CS1-O1), CtxnoA-(CS2-O2)). Moreover, the context is informative in this group in telling the animal which cue will be rewarded, even prior to outcome delivery, such that I don't think contextual information will fade into the background of the association, with attention lost to it in the way that, say, Mackintosh (1975) might predict. Therefore, I don't think this statement is correct.

      I suggest that the authors refine the statement to be more accurate.

      We agree with the reviewer: the context is absolutely relevant for rats trained in the Ctx-dep. O1/O2 task. We have edited the text in several places to make this clear. The question is how (by what mechanism) the context participates in the control of behavior in this group. The reviewer correctly points out that, just like rats trained in the Ctx-dep. O1 task, rats trained in the Ctx-dep. O1/O2 task might have formed context-gated cue-outcome associations. We now clearly acknowledge this in the text.

      However, because in this group the two outcomes are always encountered in different contexts, we argue that these rats could also have formed a direct association between each context and its outcome. In other words, each context might directly evoke the expectation of a distinct reward outcome (prepare to drink, or prepare to eat). On a given trial, if the cue and context both tend to activate the same outcome representation, the converging cue and context excitation can add up. This would produce a context-sensitive response, but not via a hierarchical modulation process (unlike in Ctx-dep. O1). Arguably, this associative mechanism is much simpler and might explain why almost all rats in the Ctx-dep. O1/O2 group learned the discrimination, and at a much faster rate.

      Therefore, while rats trained in Ctx-dep. O1/O2 might engage a combination of associative processes to achieve context-sensitive behavior (including hierarchical associations), only rats in Ctx-dep. O1 critically and unambiguously rely on hierarchical associations to achieve context-sensitive behavior.
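
      The summation account for Ctx-dep. O1/O2 described in this response can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not a model used in the study: it assumes purely additive, outcome-specific associative strengths, a single response threshold, and the design described by the reviewer (cue X rewarded with O1 in Light, cue Y rewarded with O2 in Dark, each cue non-rewarded in the other context). All numerical values, and the names STRENGTHS, THRESHOLD, outcome_expectations, and responds, are hypothetical.

```python
# Hypothetical associative strengths (arbitrary units), not fitted to any data.
# Each stimulus is assumed to be directly associated with a single outcome.
STRENGTHS = {
    ("Light", "O1"): 0.4,  # context -> outcome O1
    ("Dark", "O2"): 0.4,   # context -> outcome O2
    ("X", "O1"): 0.4,      # target cue X -> outcome O1
    ("Y", "O2"): 0.4,      # target cue Y -> outcome O2
}
THRESHOLD = 0.6  # hypothetical response threshold


def outcome_expectations(context, cue):
    """Add context and cue strengths separately for each outcome."""
    return {
        outcome: STRENGTHS.get((context, outcome), 0.0)
        + STRENGTHS.get((cue, outcome), 0.0)
        for outcome in ("O1", "O2")
    }


def responds(context, cue):
    """Respond only if some single outcome expectation exceeds the threshold."""
    return max(outcome_expectations(context, cue).values()) > THRESHOLD


for context, cue in [("Light", "X"), ("Dark", "X"), ("Dark", "Y"), ("Light", "Y")]:
    print(context, cue, outcome_expectations(context, cue), responds(context, cue))
```

      Converging excitation on the same outcome (Light:X, Dark:Y) crosses the threshold, while split excitation (Dark:X, Light:Y) does not, so context-sensitive responding emerges without hierarchical modulation. The same additive scheme cannot solve Ctx-dep. O1 (Light:X+, Dark:X-, Light:Y-, Dark:Y+, all with O1): responding on both rewarded trials requires (Light + X) + (Dark + Y) > 2θ, while withholding responses on both non-rewarded trials requires (Dark + X) + (Light + Y) < 2θ. Both sums contain the same four terms, so no combination of weights satisfies both.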

      (2) I think the results shown in Figure 1 are very interesting, and well supported by the statistics. It's so nice to see a significant interaction, as so many papers try to report these types of effects without it. However, I do wonder how specific the results are to contextual modulation. That is, should a discriminative discrete cue be used instead of each context (e.g. CS1 indicates CS2 earns O1, CS3 indicates CS4 earns O1), would female rats still be as slow to learn the discrimination?

      I am just curious as to whether the authors have thoughts on this.

      We have not tested this and are not aware of a paper that examined this question specifically.

      However, we would like to point out that in the suggested design (CS1→[CS2→O1]; CS3→[CS4→O1]) the discriminative cues (CS1 and CS3) would almost certainly also acquire substantial reward-predictive value, either through their direct association with the reward or via second-order conditioning. This would complicate the interpretation of the results in terms of hierarchical associations. Incorporating non-rewarded presentations of CS1 and CS3 alone (i.e., extinguishing those cues, as is sometimes done in occasion-setting experiments) would be one way to reduce the reward expectation evoked by those cues, but this approach has some limitations. Indeed, as noted by Rescorla (2006), “During extinction, the net associative strength of a stimulus declines to the level of [a response] threshold, but further decrement stops at that point”. So while extinguished CS1 and CS3 might no longer evoke overt behavioral responses, these cues could retain a non-negligible subthreshold excitatory connection with the US. Individually, these cues might fail to evoke responding, but they could nonetheless increase responding during CS1→CS2 trials (or CS3→CS4 trials) via simple summation (Rescorla, 2006: “the compound of two [extinguished] stimuli has a strength that exceeds the threshold and so evokes responding”).

      This type of consideration is precisely why we opted for the behavioral task used in the study. In Ctx-dep. O1, the discriminative stimuli exert opposite effects on the two target cues, which rules out summation effects as a mechanism for context-sensitive behavior.
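
      The threshold argument from Rescorla (2006) can be made concrete with a toy calculation. This sketch assumes, for illustration only, a single response threshold θ, a graded response equal to whatever summed associative strength exceeds θ, and that the serial CS1→CS2 trial can be treated as a compound in which CS1's strength is still present. It is a simplification of Rescorla's verbal account; the function name and all strength values are hypothetical.

```python
THRESHOLD = 0.3  # hypothetical response threshold (arbitrary units)


def response_strength(*strengths):
    """Graded response: only the summed strength above the threshold is expressed."""
    return max(0.0, sum(strengths) - THRESHOLD)


# Hypothetical values: extinction is assumed to stop exactly at the threshold.
v_cs1_extinguished = 0.3  # discriminative cue after extinction
v_cs2 = 0.5               # target cue

print(response_strength(v_cs1_extinguished))         # CS1 alone: no overt response
print(response_strength(v_cs2))                      # CS2 alone
print(response_strength(v_cs1_extinguished, v_cs2))  # CS1 -> CS2: summation boosts responding
```

      The extinguished CS1 produces no overt response on its own, yet it still raises responding to CS2. This is the kind of non-hierarchical contribution that the Ctx-dep. O1 design is intended to rule out.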

      (3) On pages 8-9 of the pdf, where the biological basis of the delayed acquisition of contextual control in females is considered, I find the text to be written from a place of assuming that what is observed in the males is the default behaviour. That is, although the estrous cycle and its effects on synaptic plasticity/physiology may well account for the results, is there not a similar argument to be made for androgens in males? Perhaps androgens also somehow alter synaptic plasticity/physiology, leading to the males' faster speed, reduced performance stability, and increased susceptibility to stress.

      I would like the argument that female behaviour might be the default, and male behaviour the deviation, to be considered in the discussion in addition to those already stated.

      We regret if we gave the impression that male behavior was the default. The paper is intended to report sex differences but we don’t view either sex as the default. To correct this impression, we have added a few sentences in the discussion to highlight male-hormonal factors as well as non-gonadal genetic factors that might have contributed to the observed sex differences.

      (4) In addition, the OFC - which is the brain region found to have differential expression of c-fos in males and females in Figure 5 - is not explicitly discussed with regard to the biological mechanisms of differences, which seems odd.

      I suggest OFC be discussed with regard to biological mechanisms of differences.

      We added a few sentences in the discussion to i) highlight the parallel between our study and human fMRI studies showing greater OFC activation in females during the regulation of emotional responses, ii) suggest a potential relationship between the reported sex differences (speed of acquisition, robustness of performance, and OFC activation in context-gated reward prediction), and iii) acknowledge our ignorance of the root causes of these sex differences.

      We wish we could offer a better answer. We have attempted to offer possible proximal explanations for the observed sex differences, but ultimately our work did not address the root causes of these behavioral and neural sex differences. Therefore we feel that further attempts to explain these differences would be too speculative.

      (5) I did wonder if the authors were aware that in the Rescorla-Wagner model, contextual stimuli are thought to summate with discrete cues to enter into the association with the outcome (i.e., the error term is the difference between λ and ΣV, with ΣV the 'summation' of all stimuli present on a trial, including contextual stimuli). Typically, this is not considered much, because the cue itself is so salient and more consistently paired with reward (whereas the ever-present context is often paired with no reward), but nevertheless, it is a part of the association. I'm not sure it's wrong to say that the background circumstances under which events occur are thought to play little role (as in the second sentence of the introduction), but I was wondering if the authors were aware of this fact when they wrote that.

      This sentence in the introduction was meant to introduce the distinction between eliciting stimuli and modulating contexts. Admittedly, it paints a naive picture, which we now acknowledge (we hope that the rest of the paper provides more nuance). As pointed out by this reviewer, the context is also a stimulus and, just like any other stimulus, it is eligible for direct association with an outcome. The possibility of a direct context→outcome association is precisely the rationale for the Ctx-dep. O1/O2 group.
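
      The reviewer's point follows from the Rescorla-Wagner update, ΔV_i = α_i β (λ − ΣV), applied to every stimulus i present on a trial, context included. The sketch below is illustrative only: the saliences, learning rate, number of epochs, and both training schedules are hypothetical. It compares a schedule in which the context appears only on rewarded compound trials with one in which the context is also present, unrewarded, on its own (as during intertrial intervals).

```python
def rescorla_wagner(schedule, alphas, beta=0.3, n_epochs=200):
    """Apply the Rescorla-Wagner rule over a repeating trial schedule.

    schedule: list of (present_stimuli, lambda_) pairs, repeated each epoch.
    alphas: salience (alpha) of each stimulus.
    Returns the final associative strength V of each stimulus.
    """
    V = {stimulus: 0.0 for stimulus in alphas}
    for _ in range(n_epochs):
        for present, lambda_ in schedule:
            # The prediction error is shared: it uses the summed strength of
            # every stimulus present on the trial, context included.
            error = lambda_ - sum(V[s] for s in present)
            for s in present:
                V[s] += alphas[s] * beta * error
    return V


alphas = {"cue": 0.5, "context": 0.1}  # hypothetical saliences

# Context only ever present together with the rewarded cue.
compound_only = [({"cue", "context"}, 1.0)]
# Context also present alone without reward (e.g., intertrial intervals).
with_context_alone = [({"cue", "context"}, 1.0), ({"context"}, 0.0)]

print(rescorla_wagner(compound_only, alphas))
print(rescorla_wagner(with_context_alone, alphas))
```

      In the first schedule, cue and context share the asymptote λ in proportion to their saliences (here V_context/V_cue = 0.1/0.5). In the second, the unrewarded context-alone exposures drive V_context toward zero, and the cue absorbs nearly all of the associative strength. This is one way to see why the context is usually treated as contributing little, even though it formally enters ΣV.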

      (6) Context-noA - Seems a little confusing for a name, why not just call it context B? NoA appears to imply that nothing happens in A or no outcome is available, whereas this is not always the case.

      We debated which terminology to use. We felt that “Context A vs. Context B” should perhaps be reserved for situations where the global context changes (e.g., two different conditioning boxes with different odors, floor textures, etc., with proper counterbalancing procedures). We felt that “Context A vs. noA” might be more appropriate here, as we are manipulating the local context by introducing (or removing) a single stimulus (the houselight). In this revised version, we followed this reviewer’s advice and adopted a more descriptive, and hopefully less confusing, terminology: “Light vs Dark”.

      (7) Why is it that in the text the Ctx-dep O1/O2 is explained before simple and no discrimination, but in the Figure Ctx-dep O1/O2 is shown last? These should be consistent.

      Thanks for pointing that out. We have switched the order of task description to be consistent with the figures.

      (8) Page 6 (of pdf) - could the authors elaborate a little on why or how (or both) the delivery of reward can interfere with the expression of context-dependent discrimination? Do they just mean the performance of discrimination (e.g., animals will sit at the food port longer if there is food there because they are sitting there and eating it, which does not necessarily reflect the expectation of food based on cue presentations?), in which case it is not the discrimination itself that is being interfered with, just the measure of it. Perhaps the authors could elaborate by just inserting a sentence.

      We have added a few sentences to discuss this effect.

      The first clarification we can make is that the reduced discrimination performance following reward is not simply due to the animals’ continued presence in the reward port. We have added the time in port pre-cue to Fig. 3 B-F. This measure is not affected by previous reward history, showing that rats leave the port between trials.

      So what is driving this effect? At this stage, we are agnostic about the mechanism(s) for this effect. Kuchibhotla et al. (2019), who first reported a similar effect, proposed a model in which recent rewards modify the threshold for behavioral responses (i.e., performance). In this model, a cue might evoke a weak reward prediction but still evoke a strong behavioral response if presented after a reward. Additionally, we believe that learning factors might also contribute to the effect reported here. Indeed, the behavioral response on a given trial likely reflects the balance of hierarchical (context-dependent) associations vs. direct associations (Bradfield and Balleine, 2013). Naturally, this balance is dynamic and influenced by trial history. For instance, a Light:X+ trial might increase the value of cue X and promote responding during the following Dark:X- trial. The same logic could be applied to the influence of the context (e.g., a Light:X+ trial might promote responding to a subsequent Light:Y- trial). We are currently working on a computational model that captures the dynamic interplay between hierarchical associations and direct associations. We hope that this model will provide some insight into the learning/performance mechanism for the effects reported here. However, this computational work is still in its early stages and beyond the scope of the present study.

      (9) The lack of effect in the Ctx-dep O1/O2 groups in Figure 4 could be due to a lack of power - the group sizes are a lot smaller for this group than for Ctx-dep O1 where an interaction was detected. I think this should be at least addressed in the discussion (i.e., that this lack of effect is possibly due to less power here, as the effects are in the same direction).

      Good point. We now acknowledge this limitation in the text.

      Reviewer #2 (Recommendations For The Authors):

      (1) Please comment on the failure to replicate the sex differences across experiments. Perhaps this is due to some change in the training procedure that is briefly mentioned in the methods (a reduction in the number of rewarded trials) but it is unclear.

      The reviewer correctly observed that Fig. 3-5 do not show sex differences in the baseline condition. This is not because of a replication failure, but because non-discriminating subjects were excluded from the experiment at the end of the acquisition period (after 72 training sessions). We now clarify this in the Methods and Results sections. We also added a schematic of the experiment timeline that highlights the exclusion of non-discriminators at the end of the acquisition period (Fig. 1).

      On the topic of replicability, the data for Ctx-dep. O1 were collected from 3 cohorts (over the course of 2 years) and the sex-difference pattern was consistent. For instance, the proportion of discriminators vs. non-discriminators for males and females trained in Ctx-dep. O1 showed similar patterns across cohorts (see below).

      Author response table 1.

      (2) The design of this experiment makes it possible to analyse whether there is a differential outcome effect (DOE). The DOE would indeed predict better discrimination in group Ctx-dep. O1/O2 versus Ctx-dep. O1, which seems to be exactly what the authors observe, although between-group statistics are not reported. Inspection of Figure 1 suggests that there may be a DOE in females but not in males. I wonder if the authors might consider reanalysing the data to check this.

      Indeed, there is clearly a differential outcome effect. We now point out this DOE in relation to the latency to achieve the discrimination criterion (Fig. 2 C-D). Rats in the Ctx-dep. O1/O2 group acquired the discrimination (reached criterion) much faster than rats in the Ctx-dep. O1 group.

      Following the reviewer’s suggestion, we provide here the results of targeted ANOVAs (focusing exclusively on Ctx-dep. O1 and Ctx-dep. O1/O2) to investigate a potential sex-dependent DOE (i.e., Sex × Task interactions); see the figure below. A three-way ANOVA (Sex × Task × Session) conducted on the discrimination index revealed a main effect of Task (F(1, 86) = 173.560, P < 0.001), a main effect of Session (F(2.678, 230.329) = 140.479, P < 0.001), and a marginal effect of Sex (F(1, 86) = 3.929, P = 0.051), but, critically, no Task × Sex or Task × Sex × Session interaction (P ≥ 0.504). A two-way ANOVA (Sex × Task) conducted on the sessions to criterion revealed a main effect of both factors (Sex: F(1, 63) = 9.52, P = 0.003; Task: F(1, 62) = 184.143, P < 0.001) but, critically, no Sex × Task interaction (P = 0.233). These results indicate that the use of two different outcomes clearly facilitated the acquisition of context-dependent discrimination (a DOE), but this effect benefited both sexes equally. We thank the reviewer for recommending this analysis.

      Author response image 1.

      Differential outcome effect (DOE) affects males and females equally. A. Discrimination ratio over the acquisition period. B. Trials to criterion. Compared to animals trained with a single outcome (Ctx-dep. O1), introducing dissociable outcomes for the two types of rewarded trials (Ctx-dep. O1/O2) profoundly facilitated the acquisition of discriminated behavior. This effect benefited both sexes equally.

      (3) Some minor points for clarification that the authors may also wish to address:

      - Figure 3: is data presented from sessions 71-80 only or for all sessions? I didn't fully follow the explanation offered in the results section.

      That’s right. The data presented in Fig. 3 include only sessions 71-80, in discriminator rats, when performance is globally stable. We have edited the text to make this clearer. These 10 sessions represent a total of 800 trials (10 sessions × 80 trials). The first trial of each session was not included in the analysis since it was not preceded by any trial. For the remaining 790 trials (10 sessions × 79 trials), we examined how the outcome of the previous trial (rewarded or non-rewarded) influenced responding on the next trial. This large sample size (790 trials per rat) was required to ensure that enough data were collected for each possible trial-history scenario.
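
      The trial-history split described above can be sketched as follows. This is a hypothetical illustration, not the analysis code behind Fig. 3. The data layout (a list of sessions, each a list of (rewarded, time_in_port) tuples), the function name split_by_previous_outcome, and the synthetic schedule are all assumptions. The 1-in-4 reward rate mirrors the 25% reinforcement stated in the Procedure responses, but the ordering of rewarded trials is made up.

```python
from collections import defaultdict


def split_by_previous_outcome(sessions):
    """Group trials by whether the preceding trial in the same session was rewarded.

    sessions: list of sessions; each session is a list of trial records in
    presentation order, where each record is a (rewarded, time_in_port) tuple.
    The first trial of a session has no predecessor and is therefore skipped.
    """
    groups = defaultdict(list)
    for trials in sessions:
        # Pair each trial with the trial immediately before it.
        for previous, current in zip(trials, trials[1:]):
            key = "after_reward" if previous[0] else "after_nonreward"
            groups[key].append(current)
    return dict(groups)


# Synthetic example: 10 sessions x 80 trials, 1 in 4 trials rewarded.
# The rewarded positions and the response values are placeholders, not data.
sessions = [
    [(trial % 4 == 0, 0.0) for trial in range(80)]
    for _ in range(10)
]
groups = split_by_previous_outcome(sessions)
print({key: len(records) for key, records in groups.items()})
```

      With this synthetic schedule, the two groups together contain 790 trials, matching the 10 × 79 count given above. Keeping the full record of the current trial allows a further split by current trial type (e.g., rewarded vs. non-rewarded cue-context combinations).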

      - The authors argue that females are protected from the disrupting effect of stress. It might be useful if the authors offer further explanation as to what they mean by "protected".

      By “protected”, we simply mean “less sensitive”. We have reworded this sentence in that way. We do not claim to have an understanding of the precise mechanism for this sex dependent effect (although our data point to a possible role of the OFC).

      - The authors state that "delivery of reward, while critical for learning, can also interfere with the expression of context-dependent discrimination". This statement should be explained in further detail. For instance, why should reward delivery specifically impair context-dependent discrimination but not other forms of discrimination?

      We have reworded this sentence to be more inclusive. Indeed, delivery of reward also interferes with other forms of discrimination, particularly when discrimination performance is not yet optimal. We have also added a paragraph to discuss the possible mechanisms by which reward might interfere with discrimination performance in our task.   

      Reviewer #3 (Recommendations For The Authors):

      I do not suggest additional experiments, but I do hope you continue the behavioral work to characterize what is being learned in the task. I think the approach is promising. I would suggest reporting the % time in port and port entries for the entire CS. There is no justification for only analyzing the response in the last 5s.

      We thank the reviewer for the encouragement.

      We opted to focus on the time in port for two main reasons:

      (1) This measure is relatively consistent across the two different reward outcomes (unlike the rate of port entries). Indeed, consistent with prior studies (Delamater et al., 2017), we observed that the type of reward (solid or liquid) influences the topography of the anticipatory magazine-directed behavior. Specifically, cues paired with pellets elicited significantly more port entries than cues paired with chocolate milk. The opposite pattern was observed for time in port: cues paired with chocolate milk elicited more sustained time in port than cues paired with pellets (see figure below). While these measures (port entries and time in port) show opposite biases for the two possible outcomes, the size of this bias is much smaller for time in port (Cohen’s d effect size: port entries: 1.41; time in port: 0.62). As a result, the discrimination ratio calculated from time in port is consistent across the two outcomes (P = 0.078; effect size: 0.07), which is not the case for the discrimination ratio calculated from port entries (P = 0.007; effect size: 0.32; see figure below).

      (2) Unlike the rate of port entries, the time in port shows a monotonic increase during training in these tasks. Indeed, we observed here and in past work (Keiflin et al., 2019) that the rate of port entries initially increases with training but then slightly decreases, particularly for cues paired with liquid reward. In contrast, the time in port continues to increase, or remains high, with extended training. This is easy to understand if we consider the extreme case of a hypothetical rat that enters the port once upon cue presentation and remains in the port for the whole cue duration. This rat would have a relatively low rate of port entries (a single port entry per trial) but a high time in port.

      This is not to say that the rate of port entries is not a valid measure overall (we have used, and continue to use, this metric in other preparations). However, for the reasons explained above, we believe that the time in port is a better metric for reward anticipation in this specific study.

      Moreover, we chose to focus our analysis on the last 5 s of the cue because that is when anticipatory food cup behavior is more reliably observed (in our preparation, more than two-thirds of the total time in port occurs during the last 5 s of the cue) and less contaminated by orienting behaviors (Holland, 1977, 1980, 2000). For these reasons, analysis of the last portion of the cue is relatively common in Pavlovian anticipatory approach preparations (El-Amamy and Holland, 2007; Olshavsky et al., 2013; Esber et al., 2015; Holland, 2016a, 2016b; Schiffino and Holland, 2016; Gardner et al., 2017; Sharpe et al., 2021; Maes et al., 2020; Sharpe et al., 2020; Siemian et al., 2021; Kang et al., 2021). Reporting time in port during the same cue epoch facilitates comparisons with these studies.

      We have edited the text in the Method section to provide a brief justification for focusing our analyses on this cue epoch.

      Author response image 2.

      Outcome identity influences the topography of the conditioned response. A-B: Conditioned responding expressed as the number of port entries per trial (A) or time in port per trial (B) for rats trained in the simple discrimination task with a chocolate milk reward (n = 19) or a sucrose pellet reward (n = 16). Data show the average of the last three sessions. Compared to chocolate milk, pellets tend to produce more port entries. Conversely, chocolate milk tends to produce more time in port. However, the magnitude of this bias is smaller for time in port. C-D: Discrimination ratio calculated from the number of port entries (C) or the time in port (D); the latter is not affected by outcome identity. *P < 0.05; **P < 0.01; ***P < 0.001, t tests.
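
      To illustrate why port entries and time in port can diverge, and how restricting the analysis to the last 5 s of the cue changes them, here is a hedged sketch. It assumes that port visits are recorded as (entry, exit) times in seconds relative to cue onset, that an entry is counted in a window only if it begins inside that window, and a cue duration of 10 s. The cue duration, the counting convention, the function name port_metrics, and both example rats are hypothetical.

```python
def port_metrics(visits, window_start, window_end):
    """Count port entries and time in port within [window_start, window_end).

    visits: list of (entry_time, exit_time) pairs in seconds from cue onset.
    An entry is counted only if it begins inside the window; time in port is
    the overlap of each visit with the window.
    """
    entries = sum(1 for entry, _ in visits if window_start <= entry < window_end)
    time_in_port = sum(
        max(0.0, min(exit_time, window_end) - max(entry, window_start))
        for entry, exit_time in visits
    )
    return entries, time_in_port


CUE_DURATION = 10.0  # hypothetical cue length in seconds
LAST_5_S = (CUE_DURATION - 5.0, CUE_DURATION)

# Hypothetical rat that enters once early and stays until cue offset.
steady_rat = [(0.5, 10.0)]
# Hypothetical rat that makes several brief entries late in the cue.
bobbing_rat = [(5.5, 6.0), (6.5, 7.0), (7.5, 8.0), (8.5, 9.0)]

for name, visits in [("steady", steady_rat), ("bobbing", bobbing_rat)]:
    print(name, "whole cue:", port_metrics(visits, 0.0, CUE_DURATION),
          "last 5 s:", port_metrics(visits, *LAST_5_S))
```

      The steady rat has the lower entry count but the higher time in port, which is the extreme case described in point (2). Under this particular counting convention, it even registers zero entries in the last 5 s despite occupying the port throughout that epoch.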

      The inconsistent use of terms is distracting throughout the paper. Is it discriminated or context-gated? Please provide a definition of your terms and then use them consistently. Is it a discriminative stimulus, a context, or an occasion setter? These all imply slightly different things and it would help the reader if you just used one term throughout the paper.

      Thanks for pointing that out. We have added a definition for “context-gated” and edited the text to keep the terminology consistent where appropriate. The words “discrimination”/“discriminated” still appear in the manuscript but without implying a mechanism (all tasks are variations of Pavlovian discrimination, in which the rats discriminate between rewarded and non-rewarded trials).

      As mentioned by this reviewer, the terms “context” and “occasion setter” are not synonymous. Therefore these terms still appear in the manuscript to refer to different concepts (e.g. in our task the visual stimulus is a context for all rats; this context acts as an occasion setter only for some rats).

      Minor:

      Intro, 2nd PP: "autism". This is abbreviated in the abstract but spelled out here. I suggest not abbreviating in the abstract and introducing abbreviations here, as you do with PTSD.

      Fixed as suggested

      Have deficits in contextual modulation been distinguished from potential deficits in binary associative learning in autism, PTSD, and substance use disorders? This is implied, but there are no citations provided.

      We provide a list of references showing deficits in contextual modulation in these disorders.

      This does not mean that these disorders are reducible to deficits in contextual modulation, and it does not exclude other forms of deficits in those disorders, including alterations in certain aspects of binary associative learning.

      "In positive occasion-setting, animals learn that a target cue (X) results in a reward outcome (+) only when that cue is accompanied by a contextual feature (A); the same cue presented in absence of this contextual feature remains without consequence (A:X+ / X-)." - there are words missing in this sentence.

      We apologize, but we fail to identify the missing word(s). Perhaps the reviewer could be more specific, and we will be happy to edit the sentence as needed.

      What is a contextual feature, is this redundant or can you provide a specific definition?

      We use the terms “feature” and “target” as these are the standard terms in the description of occasion-setting preparations (one stimulus, “the feature”, sets the occasion for responding, or not responding, to the “target” cue). By contextual feature, we meant that in this specific example the context was the feature. We have clarified this in the text. We believe that these terms are not redundant. Indeed, the context is not always a feature, and a feature is not necessarily a context (phasic cues can serve as “features”).

      Can you provide some background on studies of sex differences in simple associative learning? You imply these have been much more thoroughly studied than conditional discriminations.

      We added a few references as suggested.

      What is the rationale for studying stress?

      Stressful life events exacerbate several mental illnesses, potentially by impacting cognitive functions.

      Although the (sex-dependent) effects of stress on some cognitive functions are well established (e.g., working memory, selective attention, spatial navigation), the effect of stress on contextual modulation (a core dysfunction in certain mental illnesses), and the possible sex differences in this effect, had not been formally tested. We added a few sentences in the Results section (at the beginning of the stress section) to remind the reader why we tested the effect of stress in this task.

      Method/Results:

      Cues are not counterbalanced; the feature is visual and targets are auditory - this should be noted as a limitation in the discussion section.

      We now acknowledge this limitation in the discussion. Moreover, we believe that the new terminology for the context (Light vs Dark, instead of A vs. noA in the original version) makes it abundantly clear that the “context” in this study was always visual.

      Summation is invoked to describe the discrimination with different outcomes, how is summation happening? This is not described. Perhaps incorporate the literature on conditional discriminations with differential outcomes (the "differential outcomes effect").

      We have edited the Result + Discussion section to clarify how summation might contribute to discrimination with different outcomes. We have also added references for the DOE in this task.

      The stress effect is confounded with test order; comparing stress vs. baseline.

      Sorry we don’t understand this point. The “baseline” refers to the animal’s performance on the last training session before the acute stress manipulation (we have edited the text to make this clear). Animals are first trained in the task and then we examine how stress alters their performance in this learned task. We don’t see how this could induce a test order confound.

      Throughout the results section, it would be helpful to have the number of animals reported for each analysis.

      The number of animals for each part of the experiment is now reported in the text, as well as in the figures.

      Discussion:

      "For Ctx-dep. O1, context is an occasion-setter, i.e. a stimulus that hierarchically modulates the associative strength between a target cue and its outcome." This is inaccurate. Occasion setters do not change or modulate the associative strength of a target cue. They modulate whether excitation or inhibition is expressed.

      We reworded the sentence as suggested: “For Ctx-dep. O1, context is an occasion-setter, i.e. a stimulus that modulates the response to a target cue”.

      "Together, these results indicate that the sex differences observed here are not attributable to simple associative, motivational, working-memory, or attentional processes, but are specific to the neurocomputational operations required for the hierarchical, contextual control of behavior." It should be noted here that the difference is one of degree, a quantitative difference, but not a difference in the qualitative features of the process.

      "Regardless of the precise mechanism, our results indicate that, compared to male rats, females ultimately achieved more stable contextual control over cued reward-seeking; their behavior remained context-regulated under stress or after recent rewards." Again this is a matter of degree.

      We absolutely agree. All the sex differences reported here are a matter of degree. In the framework of McCarthy et al. (2012), the reported effects are type 2 or type 3 sex differences, not type 1 sexual dimorphism. We made a few edits in the Discussion to clarify this point.

      Procedure:

      Please clarify the percentage of trials that were reinforced in the No Discrimination group.

      From session 1-32 (acquisition period), 50% of the trials were reinforced. Following this acquisition period, only 25% of the trials were reinforced to match all the other groups. We have edited the method section to clarify this point.

      Please provide the dimensions of the restraint tubes and the model number if available.

      This information is now included.

      References

      Bradfield LA, Balleine BW (2013) Hierarchical and binary associations compete for behavioral control during instrumental biconditional discrimination. J Exp Psychol Anim Behav Process 39:2–13.

      Delamater AR, Garr E, Lawrence S, Whitlow JW (2017) Elemental, configural, and occasion setting mechanisms in biconditional and patterning discriminations. Behav Processes 137:40–52.

      El-Amamy H, Holland PC (2007) Dissociable effects of disconnecting amygdala central nucleus from the ventral tegmental area or substantia nigra on learned orienting and incentive motivation. Eur J Neurosci 25:1557–1567.

      Esber GR, Torres-Tristani K, Holland PC (2015) Amygdalo-striatal interaction in the enhancement of stimulus salience in associative learning. Behav Neurosci 129:87–95.

      Gardner MPH, Conroy JS, Shaham MH, Styer CV, Schoenbaum G (2017) Lateral Orbitofrontal Inactivation Dissociates Devaluation-Sensitive Behavior and Economic Choice. Neuron 96:1192–1203.e4.

      Holland PC (1977) Conditioned stimulus as a determinant of the form of the Pavlovian conditioned response. J Exp Psychol Anim Behav Process 3:77–104.

      Holland PC (1980) CS-US interval as a determinant of the form of Pavlovian appetitive conditioned responses. J Exp Psychol Anim Behav Process 6:155–174.

      Holland PC (2000) Trial and intertrial durations in appetitive conditioning in rats. Anim Learn Behav 28:121–135.

      Holland PC (2016a) Enhancing second-order conditioning with lesions of the basolateral amygdala. Behav Neurosci 130:176–181.

      Holland PC (2016b) Effects of amygdala lesions on overexpectation phenomena in food cup approach and autoshaping procedures. Behav Neurosci 130:357–375.

      Kang M, Reverte I, Volz S, Kaufman K, Fevola S, Matarazzo A, Alhazmi FH, Marquez I, Iordanova MD, Esber GR (2021) Agency rescues competition for credit assignment among predictive cues from adverse learning conditions. Sci Rep 11:16187.

      Keiflin R, Pribut HJ, Shah NB, Janak PH (2019) Ventral tegmental dopamine neurons participate in reward identity predictions. Curr Biol 29:93–103.e3.

      Kuchibhotla KV, Hindmarsh Sten T, Papadoyannis ES, Elnozahy S, Fogelson KA, Kumar R, Boubenec Y, Holland PC, Ostojic S, Froemke RC (2019) Dissociating task acquisition from expression during learning reveals latent knowledge. Nat Commun 10:2151.

      Maes EJP, Sharpe MJ, Usypchuk AA, Lozzi M, Chang CY, Gardner MPH, Schoenbaum G, Iordanova MD (2020) Causal evidence supporting the proposal that dopamine transients function as temporal difference prediction errors. Nat Neurosci 23:176–178.

      McCarthy MM, Arnold AP, Ball GF, Blaustein JD, De Vries GJ (2012) Sex differences in the brain: the not so inconvenient truth. J Neurosci 32:2241–2247.

      Olshavsky ME, Song BJ, Powell DJ, Jones CE, Monfils M-H, Lee HJ (2013) Updating appetitive memory during reconsolidation window: critical role of cue-directed behavior and amygdala central nucleus. Front Behav Neurosci 7:186.

      Rescorla RA (2006) Deepened extinction from compound stimulus presentation. J Exp Psychol Anim Behav Process 32:135–144.

      Schiffino FL, Holland PC (2016) Secondary visual cortex is critical to the expression of surprise-induced enhancements in cue associability in rats. Eur J Neurosci 44:1870–1877.

      Sharpe MJ, Batchelor HM, Mueller LE, Gardner MPH, Schoenbaum G (2021) Past experience shapes the neural circuits recruited for future learning. Nat Neurosci 24:391–400.

      Sharpe MJ, Batchelor HM, Mueller LE, Yun Chang C, Maes EJP, Niv Y, Schoenbaum G (2020) Dopamine transients do not act as model-free prediction errors during associative learning. Nat Commun 11:106.

      Siemian JN, Arenivar MA, Sarsfield S, Borja CB, Russell CN, Aponte Y (2021) Lateral hypothalamic LEPR neurons drive appetitive but not consummatory behaviors. Cell Rep 36:109615.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      This paper described the dynamics of the nuclear substructure called PML Nucleolar Association (PNA) in response to DNA damage on ribosomal DNA (rDNA) repeats. The authors showed that the PNA with rDNA repeats is induced by the inhibition of topoisomerases and RNA polymerase I and that the PNA formation is modulated by RAD51, thus homologous recombination. Artificially induced DNA double-strand breaks (DSBs) in rDNA repeats stimulate the formation of PNA with DSB markers. This DSB-triggered PNA formation is regulated by DSB repair pathways. 

      Strengths: 

      This paper illustrates a unique DNA damage-induced sub-nuclear structure containing the PML body, which is specifically associated with the nucleolus. Moreover, the dynamics of this PML Nucleolar Association (PNA) require topoisomerases and RNA polymerase I and are modulated by RAD51-mediated homologous recombination and non-homologous end-joining. This study describes a unique mode of regulation of DSB repair at rDNA repeats, associated with this membrane-less subnuclear structure. 

      Weaknesses: 

      Although the PNA formation on rDNA repeat is nicely shown by cytological analysis, the biological significance of PNA in DSB repair is not fully addressed.

      We appreciate the succinct summary and thank the reviewer for this insightful comment. Our data show that PML, through its dynamic interaction with nucleolar caps, can recognize damaged rDNA and sequester it from the reactivated nucleolus. We propose that, through this process, the actively transcribed intact rDNA is protected from potentially detrimental interactions with the defective, PNA-sequestered rDNA. Most likely, this prevents the harmful intra- and inter-chromosomal recombination events that would otherwise occur during recombinational repair of the damaged rDNA, given that the rDNA arrays present on five chromosomes are highly repetitive. Thus, this novel sorting mechanism might help sustain the integrity of the repetitive rDNA loci.

      Our data also indicate that the emergence of PNAs coincided with cell cycle arrest and preceded the establishment of cellular senescence. The senescent response to rDNA damage can primarily protect the genome from the instability of rDNA loci in a manner broadly analogous to that described for protecting the telomeric loci. This notion is supported by the lack of PNA formation in most cancer cells. In the broader context of the biological significance of cellular senescence at the organismal level, such robust response to hazardous rDNA damage in the individual affected cells may limit/prevent the sporadic occurrence of early cancerous lesions, at the expense of potential tissue adverse effects accumulating over time and thereby eventually contributing to organismal aging.

      Reviewer #2 (Public Review): 

      In this manuscript, the authors aim to study PML-nucleolar associations (PNAs) induced by different genotoxic stresses and to determine the underlying molecular mechanisms. 

      First, from a diverse set of genotoxic stress conditions (topoisomerases, RNA Pol I, rRNA processing, and DNA replication stress), the authors found that inhibition of topoisomerases and RNA Polymerase I induces the highest PNA formation, associated with p53 stabilization, gamma-H2AX, and PAF49 segregation. It was further demonstrated that the Rad51-mediated HR pathway, but not the NHEJ pathway, is associated with PNA formation. Immuno-FISH assays show that doxorubicin induces DSBs (53BP1 foci) in rDNA and PNA interactions with rDNA/DJ regions. Furthermore, the endonuclease I-PpoI induced DSBs at a defined location in rDNA and led to PNAs. 

      Most claims by the authors are supported by the data provided. However, the weaknesses/concerns below may need to be addressed to improve the quality of the study. 

      (1) Top2B toxin doxorubicin had the highest degree of elevating PNAs; however, Top2B-knockdown had almost no noticeable effects on PNAs. How to reconcile the different phenotypes targeting Top2B? 

      We thank the reviewer for this comment and believe we can reconcile the results from doxorubicin treatments and the downregulation of TOP2A and B. 

      The different phenotypes may reflect the fact that doxorubicin targets both human TOP2 isoforms, TOP2A and TOP2B. This treatment therefore eliminates any potential functional redundancy between the two isoforms, whereas such redundancy can still compensate when only one isoform is depleted genetically. At the same time, it is crucial to note that these isoforms are not fully functionally redundant: each has a characteristic expression pattern and distinct yet overlapping functions (e.g. Nitiss J 2009, doi.org/10.1038/nrc2608, or Uusküla-Reimand, 10.1126/sciadv.add4920). Thus, doxorubicin treatment or TOP2A KD can, in contrast to TOP2B KD, trigger the formation of PNAs.

      Additionally, besides topoisomerase inhibition and poisoning, doxorubicin intercalates DNA and elevates oxidative stress. Therefore, the observed effect of doxorubicin may also reflect, to some extent, its broader damaging impact on (r)DNA. On the other hand, the downregulation of individual topoisomerase isoforms shows how the restriction of their respective specific function/s may evoke (r)DNA damage.

      (2) To test the role of Rad51 and DNA-PKcs in PNA formation, the Rad51 inhibitor B02 and the DNA-PKcs inhibitor NU-7441 were used in the study. To further exclude possible off-target effects of B02 and NU-7441, siRNA-mediated knockdown of Rad51 and DNA-PKcs would be an appropriate complement to the pharmacological inhibitor approach. 

      We followed this stimulating suggestion: in the revised manuscript, we used pools of siRNAs (esiRNA) targeting the mRNA of RAD51 or ligase IV (LIG4) to mimic the RAD51 chemical inhibitor B02 and the NHEJ (DNA-PK) inhibitor NU-7441, respectively. The relevant new data are presented in Figures 5F-I, 6E, and 6F, Supplementary Figure 5D-H, and Supplementary Figure 6C-E. Notably, the results on rDNA damage-triggered PNA formation obtained using chemical inhibition of the repair pathways and using the genetic approach (knockdown) were largely consistent, thereby supporting our original conclusions. There was one interesting partial difference between the B02 RAD51 inhibitor and RAD51 knockdown, which we comment on below; we suggest a plausible explanation reflecting the fact (known for other DDR proteins such as PARP1) that functional inhibition of an expressed protein (here RAD51, by B02) does not necessarily recapitulate the phenotype caused by the absence of that protein (here RAD51 knockdown). Overall, we agree that this was a very important set of control experiments, which we additionally extended to cell cycle phase analysis.

      First, LIG4 knockdown affected I-PpoI-induced PNA formation following the same trend as the NHEJ pathway inhibitor NU-7441, namely an increased frequency of PNA formation when NHEJ was impaired (Figures 5E and 5I). This was expected based on what we know about PNA formation: the NHEJ pathway is active throughout the cell cycle, and when this repair mode is not available in the nucleolus, more rDNA breaks remain unrepaired and must be transported to the nucleolar caps to be processed by the HR pathway, thereby also leading to more PNA structures under such conditions. In terms of cell cycle phases, the observed increase of I-PpoI-induced PNAs in cells depleted of LIG4 was more pronounced in S/G2 cells, when the PNA-promoting, cap-associated HR pathway is more active. Furthermore, the enhanced occurrence of I-PpoI-induced PNAs in cells depleted of LIG4 was counteracted (partly ‘rescued/prevented’) by concomitant treatment with the RAD51 inhibitor B02 (Figures 5E and 5I; compare cells with esiLIG4 alone versus esiLIG4 + B02), overall consistent with the notion that the cap-associated HR pathway facilitates PNA formation.

      Second, in the analogous comparison of the RAD51 chemical inhibitor (B02) with siRNA-mediated knockdown of RAD51, the observed trends in the resulting frequencies of I-PpoI-induced PNAs were also largely consistent, in that both strategies of interfering with RAD51 resulted in fewer PNAs than in cells deficient in NHEJ. On the other hand, we must stress that after RAD51 knockdown, we did not observe the decline of PNAs relative to control cells that was detected after B02 treatment (Figures 5E and 5I). However, when the cell cycle position of individual cells was specifically considered, these new analyses again revealed important similarities between the knockdown and the chemical inhibition of RAD51 (Figure 6E, Supplementary Figure 6E).

      Before discussing the partial, cell-cycle-related difference between the impact of RAD51 chemical inhibition and that of knockdown, it is important to consider the PNA patterns seen in cells with activated I-PpoI that are proficient in both NHEJ and HR. In these cells, the overall frequency of I-PpoI-induced PNA formation was higher in G1 than in S/G2. Considering that persistent rDNA DSBs trigger the formation of PNAs, this result may reflect the very limited HDR during G1 phase, in contrast to the more efficient repair of I-PpoI-induced rDNA DSBs in S/G2, the cell cycle phases in which NHEJ and HDR operate in parallel, the latter pathway offering a safer, error-free mechanism of DSB repair.

      Notably, when comparing the frequency of PNA formation in cells treated with the RAD51 chemical inhibitor (B02) or subjected to RAD51 knockdown, we observed, strikingly, that the decrease of I-PpoI-induced PNA formation upon RAD51 knockdown was apparent only for cells in G1 (Figure 6E and Supplementary Figure 6E). We believe that the distinct impact of RAD51 knockdown compared with that of the RAD51 inhibitor (mainly seen when S/G2 cells were analyzed separately) might reflect one or a combination of several factors, including the following: 

      i) The knockdown-induced absence of RAD51 protein may allow other, alternative repair proteins to access the persistent DSB lesions (such as the RAD52-mediated repair reported in diverse pathophysiological circumstances, including in cells undergoing senescence, a scenario very relevant to our present study). Such altered stoichiometry of proteins interacting with the persistent rDNA DSBs may give rise to a pattern of PNA formation distinct from that seen in the presence of RAD51; 

      ii) Another difference that we observe is the somewhat enhanced frequency of ‘spontaneous’ (i.e., even without activating the I-PpoI) PNAs formation when RAD51 is depleted, a phenomenon not seen when control non-targeting siRNA is transfected or when RAD51 is acutely inhibited by B02 (Figure 5H). Such spontaneous baseline PNA formation likely reflects the enhanced persistence of unrepaired endogenously occurring DNA lesions that are already suboptimally processed during the period following the esiRNA transfection, i.e., under stepwise depletion of the RAD51 protein which is normally required to deal with such omnipresent endogenous lesions occurring during e.g. DNA replication or some oxidative/metabolic processes; 

      iii) The knockdown approach, while clearly robustly depleting RAD51 protein levels (see Supplementary Figure 5D) may nevertheless leave a small residual fraction of the RAD51 protein present in the cells, thereby possibly inhibiting the HDR pathway to a slightly lesser degree than the B02 inhibitor;

      iv) Additionally, it should be noted that the baseline levels of I-PpoI-induced PNA formation are somewhat higher in the transfection experiments (i.e. when using any siRNA, even the non-targeting control siRNA) than in the less ‘invasive’ experiments of simply adding a drug/solvent to the cell culture medium. This is consistent with the above-baseline transient stress commonly seen (over decades, by us and many others) in cells exposed to transfection, which often causes even a moderate transient DNA damage response. Specifically, in control experiments, the level of I-PpoI-induced PNAs was around 15% in cells transfected with non-targeting siRNA, whereas in the comparable experiment with I-PpoI induction alone under non-transfection conditions it was around 10%. In other words, the somewhat higher baseline counts of I-PpoI-induced PNAs in the knockdown experiments compared with the chemical inhibitor experiments partly reflect a shift of the total readout due to the different baselines. This, however, does not alter the overall trends, which are consistent across both types of experiments.

      While the potential interpretation(s) of the above results are presented in the Discussion section of the revised manuscript, the full mechanistic elucidation of the impact of various experimental manipulations on the PNA formation during the cell cycle would require a dedicated follow-up study.

      (3) Several previous studies have shown the activation of the nucleolar ATM-mediated DNA damage response pathway by I-PpoI-induced DSBs in rDNA. What is the role of nucleolar ATM in the regulation of PNAs?

      We agree that this is an important issue, the resolution of which (explained below) strengthens the mechanistic insights provided in our revised manuscript, and we are grateful to the reviewer for raising this question. To address this point, and to extend the scope from ATM also to ATR, we employed two small-molecule inhibitors of ATM (KU-60019 and KU-55933) and one inhibitor of ATR (VE-822), at concentrations commonly used in analogous studies in the DNA damage response field, to examine their impact on rDNA damage and PNA formation induced by I-PpoI. The new data are shown in Figures 5A and B. We found that inhibition of either kinase alone robustly reduced the number of nuclei with PNAs, indicating that the activity of each of these two DNA damage signaling kinases is required for the formation of I-PpoI-induced PNAs in response to rDNA damage. Future experiments should elucidate precisely which of the very wide range of ATM/ATR substrates, and which specific protein domains and amino acid residues, are instrumental in this rDNA damage signaling pathway leading to PNA formation.

      Reviewer #3 (Public Review): 

      Summary: 

      Hornofova et al. examined interactions between the nucleolus and promyelocytic leukemia nuclear bodies (PML-NBs), termed PML-nucleolar associations (PNAs). PNAs are found in a minor subset of cells, exist within distinct morphological subcategories, and are induced by cellular stressors including genotoxic damage. A systematic pharmacological investigation identified that compounds that inhibit RNA Polymerase 1 (RNAPI) and/or topoisomerase 1 or 2A caused the greatest proportion of cells with PNAs. A specific RAD51 inhibitor (B02) affected the number of cells exhibiting PNAs and PNA morphology. Genetic double-strand break (DSB) induction within the rDNA locus also induced PNA structures that were more prevalent when non-homologous end joining (NHEJ) was inhibited. 

      Strengths: 

      PNAs are morphologically distinct and readily visualized. The imaging data are of high quality, and rDNA is amenable to studying nuclear dynamics. Specific induction of rDNA damage is a strong addition to the non-specific pharmacological damage characterized early in the manuscript. These data nicely demonstrate that rDNA double-strand breaks underlie PNA formation. Figure 1 is a comprehensive examination and presents a compelling argument that RNAPI and/or TOP1, TOP2A inhibition promote PNA structures. 

      Weaknesses: 

      (1) The data are limited to fixed fluorescent microscopy of structures present in a minority of cells. Data are occasionally qualitative and/or based upon interpretation of dynamic events extrapolated from fixed imaging. This study would benefit from live imaging that captures PNA dynamics. 

      We fully agree with the reviewer that live-cell imaging is critical to adequately capture the dynamics of PNA formation and evolution. While the data presented in this manuscript are based on quantifications of fixed-cell images, all these analyses build on the detailed live-cell imaging examination of the dynamic behavior of PNAs that we reported in our original study of PNA formation as a biological phenomenon (Imrichova et al., doi: 10.18632/aging.102248, Epub 2019 Sep 7). 

      In the revised version of our present manuscript, we highlight this live-cell imaging study more prominently in the Introduction and point out that the previous dynamic study was based on imaging of human cells ectopically expressing PML-EGFP and B23-RFP. Last but not least, to help readers understand the dynamics of PNA evolution, we have also added an improved schematic figure that better illustrates the temporal dynamics of PNA stage transitions (Figure 1A).

      (2) Cell cycle and cell division are not considered. Double-strand break repair is cell cycle dependent, and most experiments occur over days of treatment and recovery. It is unclear if the cultures are proliferating, or which cell cycle phase the cells are in at the time of analysis. It is also unclear if PNAs are repeatedly dissociating and reforming each cell division. 

      We agree that this is an important point. We previously published (Imrichova et al., doi: 10.18632/aging.102248) that exposure of RPE-1 hTERT cells to doxorubicin caused cell cycle arrest and cellular senescence. In the revised manuscript, we added an analysis of how I-PpoI-induced rDNA DSBs affect cell fate (Supplementary Figure 4J-N). Importantly, we found that most cells developed cellular senescence after I-PpoI-induced rDNA DSBs, and only 1–3% of cells eventually recovered from such rDNA stress to the extent that they were able to form colonies in a colony-forming assay. Thus, at the time of analysis, most of the cells were non-proliferating. 

      Additionally, in the revised manuscript, we included an analysis of the dependence of PNA formation on specific cell cycle phases (see Figures 6E–I and Supplementary Figure 6C–E). Generally, we found that PNAs can be present in G1, S, and G2. Nevertheless, the probability of their occurrence in a particular cell cycle phase depends on the type of treatment. For example, after I-PpoI-induced rDNA damage, PNAs are primarily present in G1. In contrast, after knockdown of RAD51 or TOP2A alone, PNAs are more likely to be present in S/G2. 

      (3) The relationship of PNA morphologies (bowl, funnel, balloon, and PML-NDS) also remains unclear. It is possible that PNAs mature/progress through the distinct morphologies, and that morphological presentation is a readout of repair or damage in the rDNA locus. However, this is not formally addressed.  

      The reviewer is indeed correct in his/her interpretation of the PNA morphologies as a readout of the dynamic fate of the rDNA lesion. As mentioned in our response to the previous point no. 2 raised by this reviewer (see above), we described the dynamic structural PNA transitions in our previous article (Imrichova et al., doi: 10.18632/aging.102248).

      PNAs progress through distinct structures, and our results indicate that individual PNA subtypes are tied to specific processes. The bowl-type PNA is linked to the recognition of rDNA damage at the nucleolar periphery. The funnel-type PNA clusters several damaged rDNA loci from the nucleolus into the PML-NDS, which is the ultimate structure that sequesters unrepaired rDNA away from the reactivated nucleolus.

      The formation of bowls, funnels, and balloons is linked to the inhibition of RNA polymerase I during the formation of nucleolar caps. In contrast, the later stage of PML-NDS is linked to RNA polymerase I reactivation. 

      We should mention that after the I-PpoI treatment, the ‘bowls’ and ‘funnels’ (observed originally in response to topoisomerase inhibitory drugs) are missing, and only PML-NDSs are formed. The apparent absence of the preceding stages of PNAs may reflect the lower extent of rDNA damage induced by I-PpoI treatment, without causing the pan-nucleolar RNA polymerase I inhibition that was observed for other treatments, such as doxorubicin.  

      (4) An I-PpoI-targeted sequence within the rDNA locus suggests 3D structural rearrangement following damage. An orthogonal approach measuring rDNA 3D architecture would benefit comprehension.

      This is a very inspiring idea. Given the demanding nature of the required 3D analyses and the fact that this aspect is somewhat outside the scope of the present study, we plan to follow up on this issue in our future work, along with our efforts to localize the individual NORs using immuno-FISH after introducing rDNA damage by I-PpoI.

      (5) Following I-PpoI induction, it is possible that cells arrest in a G1 state. This may explain why targeting NHEJ has a greater impact on the number of 53BP1 foci and should be investigated.

      We fully agree with the Reviewer. Indeed, our results showed that after 24 hours of I-PpoI induction, most cells (about 90%) are in the G1 phase of the cell cycle, consistent with the activation of ATM/ATR checkpoint signaling and p53 that we observed. This cell cycle effect can therefore explain why targeting NHEJ has a greater impact and causes higher numbers of 53BP1 foci (and also γH2AX foci). 

      (6) Conclusions: PNAs are a phenomenon of biological significance, and understanding that significance is of value. More work is required to advance knowledge in this area. The authors may wish to examine the literature on APBs (ALT-associated PML-NBs), which are similar structures where telomeres associate with PML-NBs in a specific subset of cancers. It is possible that APBs and PNAs share similar biology, and prior efforts on APBs may help guide future PNA studies.  

      We are very grateful for this stimulating suggestion. In the Discussion of the revised manuscript, we now address the possible analogy between the APBs under ALT on the one hand, and the PNA formation on rDNA damage studied here, on the other. The following is the quote of the relevant paragraph of the revised Discussion: 

      “There are several similarities between PNAs and APBs. The interaction partner of PML located on both the telomeres and rDNA must be sumoylated, as the PML-SIM domain is essential for the formation of both APBs and PNAs (37,93). The PML IV isoform most efficiently forms APBs and also PNAs (16,37). PML clusters damaged telomeres into APBs, and we observe that several NORs converge in one PNA structure; thus, the PML-dependent clustering of damaged NORs is plausible. On the other hand, there is one critical difference between the otherwise broadly analogous APBs and PNAs. The process of ALT operates in transformed cancer cells that do not express the telomerase, thus enabling telomere maintenance, cell proliferation, and immortalization (94,95). The PNAs, on the other hand, were primarily detected in non-transformed cells, and their formation is linked to cell cycle arrest and establishment of senescence (31,36). It remains to be determined whether the formation of PNAs is positively involved in rDNA repair, resulting in a return of at least some PNA-forming cells to the cell cycle, or if they play a role in blocking the repair of DNA double-stranded breaks on rDNA, broadly analogous to the shelterin complex on telomeres during replicative senescence (96). We propose that the pro-senescent role of PNAs may contribute to the maintenance of rDNA stability, thereby limiting the potential of hazardous genomic instability and, hence, the risk of cellular transformation. Analogous to checkpoint responses and oncogene-induced senescence (97,98) the PNA-associated senescence might provide one aspect of the multifaceted cell-autonomous anti-cancer barrier, in this case guarding the integrity of the most vulnerable repetitive rDNA loci, possibly at the expense of accumulated cellular senescence-associated decline of functional tissues during aging.”

      Our responses to recommendations from the Editors:

      (1) Since this paper does not provide mechanistic insight into how the different PNAs form after DNA damage and Pol I inhibition, such as doxorubicin (DOXO) treatment, and how HR modulates PNA formation, it is very important to provide some experimental data on these points. For example, as reviewer #3 suggested, time-lapse analysis of PML and an rDNA marker after DOXO treatment and recovery, together with morphological analysis, would be beneficial. 

      We fully agree that live-cell imaging is essential for a better understanding of the evolution and function of PNAs. The requested time-lapse analysis of the dynamics of the PNA morphological stages after DOXO treatment and recovery is available to the Reviewers and readers in our previously published article that reported the PNA phenomenon, including the basic live-cell imaging data after doxorubicin treatment using ectopically expressed PML-GFP and B23-RFP (Imrichova et al.; doi: 10.18632/aging.102248). In our present revised manuscript, we now refer to this work in the Introduction and stress that those data were based on live-cell imaging, to better highlight this point as recommended by the Reviewers. We have also added an improved scheme that better explains the temporal dynamics of PNA transitions (Figure 1A).

      (2) In the same line as point #1, it is very important to show what kind of signaling pathway is necessary for PNA formation upon DSB formation with PolI inhibition. For example, as the #2 reviewer advised, the role of ATM or ATR could be tested by adding their inhibitor during the PNA formation. 

      Again, we fully agree that clarification of the signaling pathway required for PNA formation is crucial, and we are grateful for this stimulating recommendation. While Reviewer #2 (in his/her Public comments) asked only about the role of ATM, the Editors rightly requested that we use distinct inhibitors to test the respective roles of not only ATM but also ATR. As recommended, we tested the importance of ATM and ATR kinase activities by inhibiting them during PNA formation. These newly generated data clearly showed that the activity of either kinase is essential for efficient PNA formation, thereby providing a significant new mechanistic insight in the revised dataset. In the manuscript, these new results are shown in Figures 5A and B. We also addressed this issue in the Public Review (Reviewer #2, point 3).

      (3) Given that the association of the PML body with telomeres in ALT cells (ALT-associated PML Body, APB) is well established in the field, the authors need to mention this in the Introduction and also clearly compare in the Discussion how PNAs are similar to or different from APBs.

      We have followed this conceptually important recommendation exactly as suggested: i) we now mention the ALT-associated PML Body (APB) in the Introduction section (end of the second paragraph), and ii) in much more detail, we now compare the similarities and differences between PNAs and APBs in the revised Discussion. We also address this issue in the Response to Public Review (Reviewer #3, point 6). Indeed, we agree that this comparison is very fitting in the context of our dataset and informative for a broad audience.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Major points. 

      (1) None of the treatments shown in Figures 1B and 1C induced PNAs in most cells; the maximum was around 20%. The time point(s) the authors checked should be stated clearly in the main text or the legend. The authors need to mention the kinetics of the different PNA classes and/or dose-response effects, at least for doxorubicin and BMH-21. Alternatively, a cell-cycle stage effect should be analyzed and/or discussed, given that HR operates mainly in the S and G2 phases. 

      Thank you for pointing this out. We have now clarified the dose effects and also both analyzed and discussed PNA formation vis-à-vis cell cycle stages, as recommended by this insightful reviewer.

      First, we have now added an experimental scheme to the Figures for better clarity regarding the time points examined, as suggested.

      Second, our results show that drug doses indeed affect the number and subtype of PNAs that form after such treatments. We show PNAs (types and number) after 0.5 – 5 – 50 µM camptothecin, topotecan, and etoposide (Supplementary Figure 1G and H) and after 0.375 – 0.56 – 0.75 µM doxorubicin (Figure 2A-D and Supplementary Figure 2E-G).  

      The very first detailed analysis of PNA evolution was presented in Imrichova et al. (doi: 10.18632/aging.102248.), where we used live-cell imaging to describe the relationship between the individual doxorubicin-induced PNA types, their transitions, and their dynamics. We found that the highest number of nuclei with PNAs was present between 24 and 48 h after treatment initiation. Thus, we selected this time window for PNA detection after the treatments presented in Figure 1B.  

      We have now also added the distribution of nuclei based on the presence of specific PNA types into Supplementary Figure 1F.

      We included the analysis of the dependence of PNA formation on specific cell cycle phases (see Figures 6E–I). A very detailed explanation of the observed cell cycle effects is presented in the document Responses to Public Review, re. Reviewer nr. 2, point 2, so please kindly read our response there.

      (2) Although the induction of PNA by DSBs at rDNA repeats is clearly shown in the paper and is modulated by DSB repair pathways, the biological significance of this sub-nuclear structure has not been addressed at all. Is the PNA required for efficient DSB repair per se, or for pathway choice? Moreover, the PNA kinetics are peculiar. Once formed, the PNA did not show any turnover even after the DNA-damaging agents were washed away (Figure 4H). This structure is inherited by the next generation after cell division. Such dynamics of the PNA should be carefully addressed. 

      The reviewer is correct that the fate of the PNA and the potential biological significance of this phenomenon required a better explanation. The majority (≈97%) of cells undergo cellular senescence after I-PpoI induction, and therefore we suppose that the PNA structures are not passed into the next cell cycle, as the bulk of the cells do not proliferate/cycle after such treatments. In this regard, it should be noted that PNAs (PML-NDS) are associated with replicative senescence of human mesenchymal stem cells (our earlier publication: Janderova-Rossmeislova 2007; doi: 10.1016/j.jsb.2007.02.008). In answer to this reviewer's comment, we have in fact never observed cells with PNAs entering mitosis. Based on these findings, we suggest that damage to the repetitive rDNA loci, such as the DSBs in our experiments, could commonly result in unsuccessful repair attempts leading to cellular senescence due to rDNA damage signaling. This is consistent with our new experiments highlighting the key role of signaling mediated by the major DNA damage response kinases ATM and ATR, including in PNA formation. For more details, please see also our response to Point 2 raised by the Editors, on page 1 of this document, as well as our Public Review response to Reviewer #2, points 2 and 3.

      From a broader perspective, relevant to the biological function of PNAs in this unorthodox cellular stress response, we showed that doxorubicin-induced PML-NDSs separate/sequester persistent rDNA DSBs from the regions of active pre-rRNA transcription. Again, the purpose of this process is not entirely clear at present. However, such separation of unrepaired rDNA from the rest of the genome could have a protective function, thereby limiting the risk of aberrant homologous recombination among hundreds of the repetitive, recombination-prone rDNA copies spread across five chromosomes. It should be stressed that PNAs are rarely seen in cancer cells, and their absence might be linked to the rDNA instability commonly seen in transformed cells. 

      As published in our previous study (Imrichova et al.; doi: 10.18632/aging.102248.), we followed the fate of individual PML-NDS (the last stage of PNA) after recovery from doxorubicin treatment using live-cell imaging. We observed that the fate of this structure could be diverse. Some PML-NDS persisted in the nucleus for many hours, but a portion of them disappeared. Their disappearance may be a manifestation of successful rDNA repair. However, what remains unresolved is why these cells do not re-enter the cell cycle and instead develop a senescent phenotype, possibly reflecting paracrine effects of the cocktail of diverse cytokines and chemokines secreted by neighboring cells, a phenomenon well established in the senescence field as SASP (senescence-associated secretory phenotype). 

      Notably, during the recovery phase from I-PpoI insult, some of the PML-NDS, in fact, increase in size over time (please refer to the graph in Author response image 1). This enlargement suggests ongoing processes within these structures. Additionally, the sequential accumulation of DHX9 (a multifunctional DNA/RNA helicase) in PNAs during recovery from the I-PpoI insult (as shown in Figure 4G and Supplementary Figure 4H in the revised manuscript) supports the hypothesis that PNAs are associated with as-yet poorly understood process(es). 

      Author response image 1.

      A scatter plot shows the changes in PNA diameters during the recovery phase from a 24-hour-long expression of I-PpoI.

      Last but not least, again relevant for the potential biological role of PNAs, we now also discuss the partial analogy of these structures with the PML-association with telomeres in cells that maintain their telomeres by the ALT recombinational process, as suggested by Referee no. 3 in the public review process. As this consideration addresses also the biological significance of the diverse PML associations and particularly our thoughts about the PNA, we copy/paste this paragraph from the Discussion section of our revised manuscript here, for the convenience of the Reviewer:

      “There are several similarities between PNAs and APBs. The interaction partner of PML located on both the telomeres and rDNA must be sumoylated, as the PML-SIM domain is essential for the formation of both APBs and PNAs (37,93). The PML IV isoform most efficiently forms APBs and also PNAs (16,37). PML clusters damaged telomeres into APBs, and we observe that several NORs converge in one PNA structure; thus, the PML-dependent clustering of damaged NORs is plausible. On the other hand, there is one critical difference between the otherwise broadly analogous APBs and PNAs. The process of ALT operates in transformed cancer cells that do not express the telomerase, thus enabling telomere maintenance, cell proliferation, and immortalization (94,95). The PNAs, on the other hand, were primarily detected in non-transformed cells, and their formation is linked to cell cycle arrest and establishment of senescence (31,36). It remains to be determined whether the formation of PNAs is positively involved in rDNA repair, resulting in a return of at least some PNA-forming cells to the cell cycle, or if they play a role in blocking the repair of DNA double-stranded breaks on rDNA, broadly analogous to the shelterin complex on telomeres during replicative senescence (96). We propose that the pro-senescent role of PNAs may contribute to the maintenance of rDNA stability, thereby limiting the potential of hazardous genomic instability and, hence, the risk of cellular transformation. Analogous to checkpoint responses and oncogene-induced senescence (97,98) the PNA-associated senescence might provide one aspect of the multifaceted cell-autonomous anti-cancer barrier, in this case guarding the integrity of the most vulnerable repetitive rDNA loci, possibly at the expense of accumulated cellular senescence-associated decline of functional tissues during aging.”

      (3) The association of PNA with DSB repair is shown by the colocalization with 53BP1 (Figures 3-5) and the kinetics of DSB repair were assessed by 53BP1 kinetics (Figure 5B). The authors need to check the colocalization of other DSB repair factors in homologous recombination (RPA and RAD51) and nonhomologous end joining (KU) and the kinetics of these DSB repair foci. 

      We are grateful for this very relevant suggestion. In response to this recommendation, we have examined additional markers, linked to homologous recombination. In Figures 6A—D and Supplementary Figures 6A and B, we now show also the localization of RAD51 and RPA32 (pS33), along the lines recommended by this Reviewer.

      (4) In Figure 5B, 53BP1 foci in the "nucleolus" should be shown with that in the nucleus. 

      In the revised manuscript, we show histograms with a count of 53BP1 foci per nucleus.

      (5) The authors often used the terms "difficult-to-repair" and "easy-to-repair" DNA lesions. However, without characterizing the nature of these DNA lesions, it is too early to distinguish them in this way. So, the authors should avoid these terms in the title, abstract, results, and figure legends. In the Discussion, they may be used with a logical explanation. 

      Thank you for the recommendation. We have now changed the term “difficult-to-repair” to “persistent rDNA damage”, as this term better describes, at face value, the scenario encountered in these experiments. In the new version of the manuscript, we now emphasize that PNAs are formed as a late response to rDNA damage. We added the observation that PNAs colocalized with rDNA lesions accumulated in the nucleolar cap (periphery of the nucleolus), which are probably incompatible with the NHEJ-mediated repair that otherwise occurs within the nucleolus. These persistent lesions contained phospho-RPA, a marker of resected DNA. However, RAD51 was not detected in such late lesions, indicating that the canonical RAD51-dependent HDR pathway is also restricted. Finally, we included a section defining such persistent DNA damage in the revised Discussion.

      Minor points: 

      (1) Page 5, second paragraph, line 6: "expression of PML". 

      (2) Page 5, line 6 from the bottom and Figure 1B: Actinomycin D is not a "specific" RNA polymerase I inhibitor. 

      (3) Page 6, first paragraph, last line: "DNA DSB" should be "DSB". 

      (4) Page 6, second paragraph, lines 6-7: What is the evidence of RNA polymerase I is active (need to explain to the readers)? 

      (5)  Figure 1D and main text: Please mention DOXO is the abbreviation of doxorubicin. 

      We are grateful for these points, which have now all been corrected in the revised version of the manuscript.

      (6) Page 6, third paragraph, line 4 and Figure 1D: What is "esi"TOP1, as opposed to "si"TOP1? 

      In the revised manuscript, we explained what ‘esiRNA’ means; in fact, it is a pool of biologically prepared siRNAs targeting the mRNA of the protein being knocked down.

      (7) Figures 2A and 2B: The effect of B02 alone on PNA should be shown as a control.

      As recommended, the effect of B02 alone is now presented in Supplementary Figures 2A and B. 

      (8) Page 7, first paragraph, last three lines: It is hard to follow how the authors concluded that inhibition of RAD51 suppressed RNAPI activity. If so, please check the incorporation of 5FU. 

      Thank you for pointing out this confusing formulation. We have now removed from the revised manuscript the part of that original sentence: “which are predominantly associated with RNAPI inhibition”. 

      We observed that PML ‘balloons’ wrapped the nucleolus with the concomitantly observed complete inhibition of RNAPI in the nucleolus (Imrichova et al.; doi: 10.18632/aging.102248.). Nevertheless, we removed the original phrase from the revised version of the manuscript, as we agree with the reviewer that the causative relationship is so far lacking.

      (9) Page 7, second paragraph: It is critical to clarify at what time B02 was added: after DOXO removal, during DOXO treatment, or both. 

      We agree: In response we have now added the experimental scheme showing all these temporal details.

      (10) Figure 2H: The experiment lacks control with siTDP2 without etoposide treatment. 

      We did not include this control, unfortunately.

      (11) Page 8, third paragraph, line 3 from the bottom; "besides of rDNA probe, we also utilized probes" is better. 

      We changed this sentence in the revised manuscript, as recommended. 

      (12) Figure 3B: In these multi-color images, it is hard to see blue and gray in merged ones. It is better to show images with a single color. 

      We agree that grayscale is better to follow. However, this type of presentation would significantly increase the number of images, a circumstance we wished to avoid in this already rather image-heavy dataset. Instead, when it was possible, we elevated the intensity of fluorescence in colored images. The list of images with this adjustment is present in the public review. 

      We also inserted the example of the image in greyscale here as Author response image 2. 

      Author response image 2.

      The representative images of nucleoli show the localization of 53BP1 (red; a marker of DNA DSBs), PML (green; a marker of PML-NBs or PNAs), rDNA (blue), and DJ (white; a marker of the acrocentric chromosomes) after doxorubicin treatment (2 days) or in the recovery phase (1 and 4 days). The merge of all channels is shown together with the individual channels in greyscale. Scale bar, 5 µm.  

      (13) Figure 4E: Please add values at D0. 

      We did not analyze the 53BP1 foci before adding Shield1 and doxycycline to induce the expression of I-PpoI (D0). However, as a control, we analyzed the 53BP1 foci in the cells treated for 24 h with the corresponding amount of DMSO as a mock treatment scenario (black line; NT).

      Reviewer #2 (Recommendations For The Authors): 

      (1) The data provided in this manuscript do not explicitly compare easy-to-repair vs difficult-to-repair DNA lesions in rDNA, or at least lack quantitative measures with statistical analysis. Therefore, the title may need to be revised accordingly. 

      We agree, and the title has now been revised to better capture the persistent nature of the rDNA damage that evokes PNA formation. Please see the response to Reviewer #1, Major point 5, presented above in this document.

      Reviewer #3 (Recommendations For The Authors): 

      (1) Live imaging is paramount to understanding the dynamic nature of PNAs.  

      We agree that live-cell imaging is important. We have addressed this issue in detail in our Response to this Reviewer's Public Review comments, as well as in the first point of this document in response to the Editors. In short, although the data presented in this manuscript are based on quantifications of fixed-cell images, all these analyses benefit from the detailed live-cell imaging data we previously reported, describing a careful examination of the dynamic behavior of PNAs in the study by Imrichova et al. (doi: 10.18632/aging.102248). To better illustrate the dynamic behavior of PNAs for the convenience of this reviewer, we include some data from our original article on this topic (referred to above): please see Author response image 3.

      Author response image 3.

      This Figure shows data published in Imrichova et al. (doi: 10.18632/aging.102248.). PML IV-EGFP was ectopically expressed in RPE-1hTERT cells. The localization of PML was followed using live-cell imaging. (A) The bowl (in this work named cap) originates from the accumulation of diffuse PML. (B) The transition between bowl (named cap), funnel (named fork), and balloon (named circle). (C + D) PML IV-EGFP (green) and B23-RFP (red) were ectopically expressed in RPE-1hTERT cells. The localization of both proteins was followed by live-cell imaging. C – The formation of PML-NDS from the funnel is shown; D – The entire PNA cycle is shown (the PML-bowl formed on the border of the nucleolus, then transformed into the PML-funnel, and finally into PML-NDS). 

      (2) The authors should consider cell cycle and cell proliferation in their analyses. 

      We are grateful for this recommendation, which echoes your own comment no. 2 in the Public Reviews document. In short, as we explained in the response to the Public Review, proliferation of PNA-containing cells is severely limited, as the vast majority of such cells enter long-term arrest and cellular senescence. Furthermore, inspired by this comment, we have newly performed a series of experiments to address the frequency of PNA formation vis-à-vis the cell cycle phase of individual cells with rDNA damage. In the revised manuscript, we now include the data from these analyses: see Figures 6E–I and Supplementary Figures 6C–E. Our response in the Public Review provides a detailed description of these results.

      (3) Merged fluorescent micrographs in red and green are potentially not discernible to individuals with colour-vision deficiencies. Consider re-colouring into schemes that are more accessible. 

      We agree that some readers may have different preferences about fluorescence micrographs. Here, we used the classical combination of green and red, commonly employed in the field.

      (4) Single-colour fluorescent micrographs are easier to visualize in grey-scale. Whenever a single colour is shown, it will help reader comprehension if the images are shown in this manner. 

      As recommended, we have changed Figures 4C, F, and G from a single-color presentation to a greyscale. 

      (5) There are many long paragraphs that are difficult to digest. I suggest where possible breaking this text into smaller portions (e.g. Page 10, pages 13-14, page 16-17). 

      Thank you for pointing this out. We have now broken the text into smaller portions (in several places), as recommended.

      (6) The B02 and NU7441 data would be bolstered by genetic confirmation (depleting RAD51, BRCA2 or PALB2 for HR, DNA-PK or LIG4 for NHEJ).

      As recommended, we downregulated Rad51 and LIG4 by RNA interference. New data are presented in Figures 5F–I, 6E, and F, Supplementary Figures 5D, E, F–H, and Supplementary Figures 6C–E. The Public Review provides a detailed description of these results and the ensuing conclusions.

      (7) Microscopy results are often qualitative (Fig S1I, S2L, S3A) and need to be bolstered with quantitative data. 

      We appreciate this recommendation and have implemented quantifications for several important microscopy results, as follows:

      S1I: The quantification of the number of cells with types of PNAs after esiTOP1 is present in Supplementary Figure 1L

      S2L: The quantification (% of nuclei with PNAs) is in Figure 2H

      S3A: In this immuno-FISH figure, we captured nuclei with and without PNAs. Using SQUASSH analysis, we identified size-based colocalization between rDNA–PML and DJ–PML, presented in Supplementary Figure 3C.

      (8) Stats or error bars are missing (Fig 1D, 2H, S1C-E, S1F, S2A S2D-G, S3E, S4E).

      We apologize for those omissions and we have amended this aspect of the study in the revised manuscript as much as possible:

      Figure 1D: For the AMD plus doxorubicin and CX-5461 plus doxorubicin treatments, three and two biological replicates, respectively, are shown separately in the same graph. For AMD and the knockdown of TOP1, the mean from three biological replicates is shown. All these results indicate an elevated number of PNAs when RNAPI is inhibited.

      Figure 2H: The error bars are present. For siTDP2, the percentage of cells was identical (4%) in all replicates; therefore, the error bar is not visible.

      Supplementary Figure 1C-E: Unfortunately, only one replicate (for all treatments) was analyzed by western blotting.

      Supplementary Figure 1F (in revised manuscript SF1G): The error bars are present. By this graph, we mainly wanted to present the variation in PNAs types. 

      Supplementary Figure 2A (in revised manuscript SF2C): We now include whiskers (10th to 90th percentile) and a t-test.

      Supplementary Figure 2D-G (in revised manuscript SF2F-I): The error bars are present in all graphs. The changes in SF2F and G are not significant.

      Supplementary Figure 3E: This scheme shows the overlaps between rDNA and PML and between rDNA and 53BP1. The column graph based on these data is shown in Figure 3F.

      Supplementary Figure 4E: The plot profiles representing the mean fluorescence of PML and B23 are shown for different time points. 

      (9) PNA characteristics remind this reviewer of the well-described ALT-associated PML nuclear bodies (APBs) found in immortalized cells lacking telomerase (i.e. Alternative lengthening of telomeres). I recommend the authors look to published data on APBs to help guide how to approach their research within a framework of the cell cycle.

      We fully agree with this insightful comment, and have addressed this point in the Discussion section of the revised manuscript, quoted the relevant studies also in the Introduction, and indeed explained the parallels and also differences of PNA versus APB (see also our response to point 3 highlighted also by the Editors, early in this rebuttal document).  We have also addressed this issue in the Public Review (Reviewer #3 point 6). We agree with the reviewer that this comparison will be of wide interest to readers, given the potential insights into the biological roles of APBs and PNAs.

      For convenience, we copy/paste the relevant new paragraph of the Discussion here:

      “There are several similarities between PNAs and APBs. The interaction partner of PML located on both the telomeres and rDNA must be sumoylated, as the PML-SIM domain is essential for the formation of both APBs and PNAs (37,93). The PML IV isoform most efficiently forms APBs and also PNAs (16,37). PML clusters damaged telomeres into APBs, and we observe that several NORs converge in one PNA structure; thus, the PML-dependent clustering of damaged NORs is plausible. On the other hand, there is one critical difference between the otherwise broadly analogous APBs and PNAs. The process of ALT operates in transformed cancer cells that do not express the telomerase, thus enabling telomere maintenance, cell proliferation, and immortalization (94,95). The PNAs, on the other hand, were primarily detected in non-transformed cells, and their formation is linked to cell cycle arrest and establishment of senescence (31,36). It remains to be determined whether the formation of PNAs is positively involved in rDNA repair, resulting in a return of at least some PNA-forming cells to the cell cycle, or if they play a role in blocking the repair of DNA double-stranded breaks on rDNA, broadly analogous to the shelterin complex on telomeres during replicative senescence (96). We propose that the pro-senescent role of PNAs may contribute to the maintenance of rDNA stability, thereby limiting the potential of hazardous genomic instability and, hence, the risk of cellular transformation. Analogous to checkpoint responses and oncogene-induced senescence (97,98) the PNA-associated senescence might provide one aspect of the multifaceted cell-autonomous anti-cancer barrier, in this case guarding the integrity of the most vulnerable repetitive rDNA loci, possibly at the expense of accumulated cellular senescence-associated decline of functional tissues during aging.” 

      (10) Do PNAs mature/progress through the four distinct structures: bowl, to funnel, to balloon, and finally to PML-NDS? If true, this serves as a phenotypic read-out of damage induction (bowl) and repair (PML-NDS). It would suggest that persistent unrepairable damage (0.56 or 0.75 µM doxorubicin) prevents repair, leading to the formation of all the PNA structures except PML-NDS, whereas a lower dose of doxorubicin (0.375 µM) allows repair to occur, facilitating progression to the PML-NDS state, which is then inhibited by B02. 

      Again, this is a very insightful comment. Indeed, as the Reviewer suggests and as we explained e.g., in our response to point 1 raised by this reviewer, PNA progresses through four distinct structures/maturation stages. Our results indicate that individual PNA subtypes are tied to specific processes. PNA bowl-type is linked to the recognition of rDNA damage on the nucleolar surface. The PNA of the funnel-type clusters several rDNA loci from the nucleolus into PML-NDS, which is the ultimate structure sequestering unrepaired rDNA away from the reactivated nucleolus.

      There is a negative correlation between doxorubicin dose and the occurrence of PML-NDS, and, indeed, blocking HDR with B02 combined with a lower doxorubicin dose results in a higher occurrence of all PNAs, including PML-NDS, emerging in the recovery phase. These findings indicate that more severe rDNA damage, which is associated with inhibition of RNAPI activity, is linked to the PNA types associated with RNAPI inhibition (originally published in Imrichova et al., doi: 10.18632/aging.102248). In contrast, a milder degree of rDNA damage induces the formation of PML-NDS.

      Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Participants in this study completed three visits. In the first, participants received experimental thermal stimulations which were calibrated to elicit three specific pain responses (30, 50, 70) on a 0-100 visual analogue scale (VAS). Experimental pressure stimulations were also calibrated to elicit the same three pain intensity responses. In the subsequent two visits, participants completed another pre-calibration check (Visit 2 of 3 only). Then, prior to the exercise, NALOXONE or a SALINE placebo-control was administered intravenously. Participants then completed 1 of 4 blocks of HIGH (100%) or LOW (55%) intensity cycling, tailored according to a functional threshold power (FTP) test completed in Visit 1. After each 10-minute block of cycling, participants entered an MRI scanner and were stimulated with the same thermal and pressure stimulations that corresponded to 30, 50, and 70 pain intensity ratings from the calibration stage. Therefore, this study ultimately sought to investigate whether aerobic exercise does indeed induce a hypoalgesia effect. More specifically, the researchers tested the validity of the proposed endogenous pain modulation mechanism. Whether exercise intensity affected pain and the neurological activation of pain-related brain centres was also explored.

      Results show that in the experimental visits (Visits 2 and 3), participants exercised at two distinct intensities as intended. Power output, heart rate, and perceived effort ratings were higher during the HIGH versus LOW-intensity cycling. In particular, HIGH intensity exercise was perceived as "hard" / ~15 on the Borg (1974, 1998) scale, whereas LOW intensity exercise was perceived as "very light" / ~9 on the same scale.

      The fMRI data from Figure 1 indicates that the anterior insula, dorsal posterior insula, and middle cingulate cortex show pronounced activation as stimulation intensity and subsequent pain responses increased, thus linking these brain regions with pain intensity and corroborating what many studies have shown before.

      Results also showed that participants rated a higher pain intensity in the NALOXONE condition at all three stimulation intensities compared to the SALINE condition. Therefore, the expected effect of NALOXONE in this study seemed to occur, whereby opioid receptors were "blocked", resulting in higher pain ratings compared to the SALINE condition, where opioid receptors were "not blocked". When accounting for participant sex, NALOXONE had negligible effects at lower experimental nociceptive stimulations for females, compared to males, who showed a hyperalgesia effect to NALOXONE at all stimulation intensities (peak effect at 50 VAS). Females did show a hyperalgesia effect at stimulation intensities corresponding to 50 and 70 VAS pain ratings. The fMRI data showed that the periaqueductal gray (PAG) showed increased activation in the NALOXONE versus SALINE condition at higher thermal stimulation intensities. The PAG is well linked to endogenous pain modulation.

      When assessing the effects of NALOXONE and SALINE after exercise, results showed no significant differences in subsequent pain intensity ratings.

      When assessing the effect of aerobic exercise intensity on subsequent pain intensity ratings, the authors suggested that aerobic exercise in the form of continuous cycling tailored to an individual's FTP is not effective at eliciting an exercise-induced hypoalgesia response, irrespective of exercise intensity. This is because pain responses did not differ significantly between HIGH and LOW intensity exercise with (NALOXONE) and without (SALINE) an opioid antagonist. Therefore, the authors have also questioned the mechanism (endogenous opioids) behind this effect.

      Strengths:

      Altogether, the paper is a great piece of work that has provided some truly useful insight into the neurological and perceptual mechanisms associated with pain and exercise-induced hypoalgesia. The authors have gone to great lengths to delve into their research question(s) and their methodological approach is relatively sound. The study has incorporated effective pseudo-randomisation and conducted a rigorous set of statistical analyses to account for as many confounds as possible. I will particularly credit the authors on their analysis which explores the impact of sex and female participants' stage of menses on the study outcomes. It would be particularly interesting for future work to pursue some of these lines of research which investigate the differences in the endogenous opioid mechanism between sexes and the added interaction of stage of menses or training status.

      There are certainly many other areas to which this article contributes, owing to the depth of methods the research team has used. For example, the authors provide much insight into: the impact of exercise intensity on the exercise-induced hypoalgesia effect; the impact of sex on the endogenous opioid modulation mechanism; and the impact of exercise intensity on the neurological indices associated with endogenous pain modulation and pain processing. For all of this, the researchers should be credited, given the time and effort they have spent completing this study. Indeed, their in-depth analysis of many of these areas provides ample support for the claims they make in relation to these specific questions. As such, I consider their evidence concerning the fMRI data to be very convincing (and interesting).

      Weaknesses:

      Although the authors have their own view of their results, I do however, have a slightly different take on what the post-exercise pain ratings seem to show and its implications for judging whether an exercise-induced hypoalgesia effect is present or not. From what I have read, I cannot seem to find whether the authors have compared the post-exercise pain ratings against any data that was collected pre-exercise/at rest or as part of the calibration. Instead, I believe the authors have only compared post-exercise pain ratings against one another (i.e., HIGH versus LOW, NALOXONE versus SALINE). In doing so, I think the authors cannot fully assume that there is no exercise-induced hypoalgesia effect as there is no true control comparison (a no-exercise condition).

      In more detail, Figure 6A appears to show an average of all pain ratings combined per participant (is this correct?). As participants were exposed to stimulations expected to elicit a 30, 50, or 70 VAS rating based on pre-calibration values, the average rating would be expected to be around 50. What Figure 6A shows is that in the SALINE condition, average pain ratings are in fact ~10-15 units lower (~35), and in the NALOXONE condition, average pain ratings are ~5 units lower (~45) for both exercise intensities. From this, I would surmise the following:

      It appears there is an exercise-induced hypoalgesia effect as average pain ratings are ~30% lower than pre-calibrated/resting pain ratings within the SALINE condition at the same temperature of stimulation (it would also be interesting to see if this effect occurred for the pressure pain).

      It appears there is evidence for the endogenous opioid mechanism as the NALOXONE condition demonstrates a minimal hypoalgesia effect after exercise. I.e., NALOXONE indeed blocked the opioid receptors, and such inhibition prevented the endogenous opioid system from taking effect.

      It appears there is no effect of exercise intensity on the exercise-induced hypoalgesia effect. That is, participants can cycle at a moderate intensity (55% FTP) and incur the same hypoalgesia benefits as cycling at an intensity that demarcates the boundary between heavy and severe intensity exercise (100% FTP). This is a great finding in my mind, as anyone wishing to reduce pain can do so without having to engage in exercise that is too effortful/intense and therefore aversive - great news! This likely has many applications within the field of public health.

      I will very slightly caveat my summaries with the fact that a more ideal comparison here would be a control condition whereby participants did the same experimental visit but without any exercise prior to entering the MRI scanner. I consider the overall strength of the evidence to be solid, with the answer to the primary research question still a little ambiguous.
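      The arithmetic behind this reading can be made explicit. The Python sketch below assumes the three calibrated intensities were presented equally often, and it uses the approximate group means read off Figure 6A by eye (~35 for SALINE, ~45 for NALOXONE) rather than reported values; the function names are illustrative only.

```python
# Sketch of the back-of-envelope reasoning about Figure 6A.
# Assumptions: equal numbers of trials at each calibrated VAS target, and
# observed means (~35 SALINE, ~45 NALOXONE) read approximately off the figure.

CALIBRATED_TARGETS = [30, 50, 70]  # VAS targets set during calibration


def expected_mean(targets):
    """Mean rating expected if each stimulus were rated exactly at its target."""
    return sum(targets) / len(targets)


def percent_below(observed, expected):
    """Relative reduction (%) of an observed mean rating versus the expected mean."""
    return 100 * (expected - observed) / expected


expected = expected_mean(CALIBRATED_TARGETS)
saline_drop = percent_below(35, expected)    # approximate SALINE mean
naloxone_drop = percent_below(45, expected)  # approximate NALOXONE mean
print(expected, saline_drop, naloxone_drop)  # 50.0 30.0 10.0
```

      As the caveat above notes, this comparison is against calibration rather than a no-exercise condition, so the arithmetic alone cannot separate an exercise effect from other sources of lower ratings.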

      Reviewer #2 (Public review):

      Summary:

      This interesting study compared two different intensities of aerobic exercise (low-intensity, high-intensity) and their efficacy in inducing a hypoalgesic reaction (i.e. exercise-induced hypoalgesia; EIH). fMRI was used to identify signal changes in the brain, with the infusion of naloxone used to identify hypoalgesia mechanisms. No differences were found in post-exercise pain perception between the high-intensity and low-intensity conditions, with naloxone infusion causing increased pain perception across both conditions, which was mirrored by activation in the medial frontal cortex (identified by fMRI). However, the primary conclusion made in this manuscript (i.e. that aerobic exercise has no overall effect on pain in a mixed population sample) cannot be supported by this study design, because the methodology did not include a baseline (i.e. pain perception following no exercise) to compare high/low-intensity exercise against. Therefore, some of the statements/implications of the findings made in this manuscript need to be very carefully assessed.

      Strengths:

      (1) The use of fMRI and naloxone provides a strong approach by which to identify possible mechanisms of EIH.

      (2) The infusion of naloxone to maintain a stable concentration helps to ensure a consistent effect and that the time course of the protocol won't affect the consistency of changes in pain perception.

      (3) The manipulation checks (differences in intensity of exercise, appropriate pain induction) are approached in a systematic way.

      (4) Whilst the exploratory analyses relating to the interactions for fitness level and sex were not reported in the study pre-registration, they do provide some interesting findings which should be explored further.

      Weaknesses:

      (1) Given that there is no baseline/control condition, it cannot be concluded that aerobic exercise has no effect on pain modulation because that comparison has not been made (i.e. pain perception at 'baseline' has not been compared with pain perception after high/low-intensity exercise). Some of the primary findings/conclusions throughout the manuscript state that there is 'No overall effect of aerobic exercise on pain modulation', but this cannot be concluded.

      (2) Across the manuscript, a number of terms are used interchangeably (and, it seems, incorrectly), which makes the interpretation of the manuscript difficult (e.g. how the authors use the term 'exercise-induced pain').

      (3) There is a lack of clarity on the interventions used in the methods; for example, it is not clear exactly when and in which order the exercise tasks were implemented.

      (4) The exercise test (functional threshold power) used to set the intensity of the low/high exercise bouts is not an accurate means of demarcating steady state and non-steady state exercise. As a result, at the intensity selected for the high-intensity exercise in this study, it is likely that the challenge presented for the high-intensity exercise would have been very different between participants (e.g. some would have been in the 'heavy' domain, whereas others would be in the 'severe' domain).

      (5) It is likely that participants did not properly understand how to use the 6-20 Borg scale to rate their perceived effort, and so caution must be taken in how this RPE data is used/interpreted.

      (6) Although interesting, the secondary analyses (relating to the interaction effects of fitness level and sex) were not included in the study pre-registration, and so the study was not designed to undertake this analysis. These findings should be taken with caution.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Participants in this study completed three visits. In the first, participants received experimental thermal stimulations which were calibrated to elicit three specific pain responses (30, 50, 70) on a visual analogue scale (VAS). Experimental pressure stimulations were also calibrated to the same three pain intensity responses. In the subsequent two visits, participants completed another pre-calibration check (Visit 2 of 3 only). Then, prior to exercise, NALOXONE or a SALINE placebo control was administered intravenously. Participants then completed 1 of 4 blocks of HIGH (100%) or LOW (55%) intensity cycling, tailored according to a functional threshold power (FTP) test completed in Visit 1. After each 10-minute block of cycling, participants entered an MRI scanner and were stimulated with the same thermal and pressure stimulations that corresponded to 30, 50, and 70 pain intensity ratings from the calibration stage. This study therefore sought to investigate whether aerobic exercise does indeed incur a hypoalgesia effect. More specifically, the researchers tested the validity of the proposed endogenous pain modulation mechanism.

      Further investigation into whether the intensity of exercise had an effect on pain and the neurological activation of pain-related brain centres was also explored.

      Results show that in the experimental visits (Visits 2 and 3), participants exercised at two distinct intensities as intended. Power output, heart rate, and perceived effort ratings were higher during HIGH versus LOW-intensity cycling. In particular, HIGH intensity exercise was perceived as "hard" / ~15 on the Borg (1974) scale, whereas LOW intensity exercise was perceived as "very light" / ~9 on the Borg (1974) scale.

      The fMRI data from Figure 1 indicates that the anterior insula, dorsal posterior insula, and middle cingulate cortex show pronounced activation as stimulation intensity and subsequent pain responses increase, thus linking these brain regions with the percept of pain intensity and corroborating what many studies have shown before.

      Results also showed that participants rated a higher pain intensity in the NALOXONE condition at all three stimulation intensities compared to the SALINE condition. Therefore, the expected effect of NALOXONE in this study seemed to occur, whereby opioid receptors were "blocked", resulting in higher pain ratings compared to a SALINE condition where opioid receptors were "not blocked". When accounting for participant sex, NALOXONE had negligible effects in females at lower stimulation intensities, whereas males showed a hyperalgesia effect to NALOXONE at all stimulation intensities (peak effect at 50 VAS). Females did show a hyperalgesia effect at stimulation intensities corresponding to 50 and 70 VAS pain ratings. The fMRI data showed increased activation of the periaqueductal gray (PAG) in the NALOXONE versus SALINE condition at higher thermal stimulation intensities. The PAG is well linked to endogenous pain modulation.

      When assessing the effects of NALOXONE and SALINE after exercise, results showed no significant differences in subsequent pain intensity ratings.

      When assessing the effect of aerobic exercise intensity on subsequent pain intensity ratings, the authors suggested that aerobic exercise in the form of continuous cycling tailored to an individual's FTP is not effective at eliciting an exercise-induced hypoalgesia response, irrespective of exercise intensity. This is because pain responses did not differ significantly between HIGH and LOW-intensity exercise with (NALOXONE) and without (SALINE) an opioid antagonist. The authors have therefore also questioned the mechanisms (endogenous opioids) behind this effect.

      Altogether, the paper is a great piece of work that has provided some truly useful insight into the neurological and perceptual mechanisms associated with pain and exercise-induced hypoalgesia. The authors have gone to great lengths to delve into their research question(s) and their methodological approach is relatively sound. Although the authors have their own view of their results, I do, however, have a slightly different take on what the post-exercise pain ratings seem to show and their implications for judging whether an exercise-induced hypoalgesia effect is present. From what I have read, I cannot find whether the authors have compared the post-exercise pain ratings against any data collected pre-exercise, at rest, or as part of the calibration. Instead, I believe the authors have only compared post-exercise pain ratings against one another (i.e., HIGH versus LOW, NALOXONE versus SALINE). In doing so, I think the authors cannot fully question whether there is an exercise-induced hypoalgesia effect, as there is no true control comparison (a no-exercise condition). Nevertheless, there are certainly many other areas in which this article contributes to the literature, owing to the depth of methods the research team has used. For example, the authors provide much insight into: the impact of exercise intensity on the exercise-induced hypoalgesia effect; the impact of sex on the endogenous opioid modulation mechanism; and the impact of exercise intensity on the neurological indices associated with endogenous pain modulation and pain processing. The researchers should be credited for all of this, given the time and effort they have spent completing this study.

      I have provided some specific comments for the authors to consider. They are organised to correspond to each section as it is presented, and I have denoted the line I am referring to each time.

      To conclude, thank you to the authors for their work, and thank you to the editor for the opportunity to contribute to the review of this paper. I hope my comments are seen as useful and I look forward to seeing the authors' responses.

      We sincerely appreciate the reviewer's insightful comments, which highlight the strengths of our study. In response to the concerns raised, we have made several key revisions to the original manuscript to address the reviewers’ comments. As for the lack of a resting control condition, we acknowledge that our study was not designed to test the overall effect of exercise versus no exercise. However, our primary objective was to compare different exercise intensities, hypothesising that low-intensity (LI) exercise would induce less pain modulation as compared to high-intensity (HI) exercise. By exploring this, we aimed to enhance understanding of the dose-response relationship between exercise and pain modulation. To better reflect this focus, we have revised the misleading phrasing regarding the ‘overall’ effect of exercise to clearly emphasize our primary aim: comparing HI and LI exercise.

      This reviewer suggests an interesting interpretation of the data: exercise-induced hypoalgesia might have occurred for both exercise intensities, since the pain ratings provided were lower than the anticipated intensities determined by the calibration. That this difference is smaller in the naloxone (NLX) condition could provide evidence of opioidergic mechanisms underlying this effect. Unfortunately, the current study is not designed to answer this question comprehensively, since there was no resting control condition. In particular, the lower pain ratings under SAL (Figure 6) could be due to exercise triggering the descending pain modulatory system (DPMS), but equally due to the default activation of the DPMS. Only an additional “no exercise” condition could disentangle this. Furthermore, habituation to noxious stimuli can influence pain ratings, resulting in lower pain ratings during the experiment compared to the calibration. We have now provided a more detailed overview of the pain ratings at different stimulus intensities after HI and LI exercise in both drug treatment conditions for heat and pressure pain ratings. We elaborate on the specific comments raised in the following sections.

      Specific Comments

      (1) Abstract

      Line 25 - "we were unable to"... personal preference but this wording is a little 'weighted' in my view. I personally do not think researchers search to prove hypotheses correct, rather we search to prove hypotheses wrong, and therefore only through repeated attempts of falsification can we surmise that something holds true.

      We agree with the reviewer that the chosen wording can be perceived as weighted and have rephrased the sentence.

      Line 33 to 35 - the "...but individual factors... might play a role" is a crucial caveat to this sentence for me. Whilst I can understand that the results of the authors' study indicate that prior assumptions about exercise-induced hypoalgesia and its opioidergic mechanisms may be questioned, I think a little more evidence is needed to finally decide whether aerobic exercise has no overall effect on experimental pain responses. (see more in the Results comments below).

      We thank the reviewer for their comment. We agree that no claims can be made regarding the effect of aerobic exercise per se on pain modulation compared to no exercise based on the current data. Furthermore, we agree that more research is needed to further advance our understanding of (non-)opioidergic mechanisms in exercise-induced pain modulation. However, based on the data presented in this study we propose that the involvement of endogenous opioids in exercise-induced hypoalgesia could be influenced by sex and fitness levels since we could show differences in opioidergic involvement between males and females of different fitness levels. Future studies should account for the fitness levels and sex of the sample investigated.

      (2) Introduction

      Line 48 - please predefine anterior cingulate cortex here.

      We thank the reviewer for detecting this and have introduced the abbreviation for the anterior cingulate cortex in the referenced line.

      Line 49 - please predefine periaqueductal gray here instead of line 52.

      We have introduced the abbreviation for periaqueductal grey in the referenced line.

      Line 47 to 54 - when discussing the descending pain modulatory systems, authors seem to be relating specifically to the intensity/magnitude of pain experiences. However, the different brain regions that are mentioned may have varying "roles" according to which dimension of pain is of focus.

      Hofbauer et al. (2001) - https://doi.org/10.1152/jn.2001.86.1.402

      Rainville et al. (1997) - https://doi.org/10.1126/science.277.5328.968

      The two above studies provide some nice earlier findings on the brain regions (some of which are mentioned by the authors in this section) associated with the processing of pain quality in addition to the intensity of pain. I simply attach them here in case they are of interest to the authors.

      The studies by Hofbauer et al. (2001) and Rainville et al. (1997) provide interesting findings on the effect of hypnotic suggestions on pain affect and the perceived intensity of a painful stimulus. However, these studies did not investigate exercise-induced changes in brain regions of the DPMS. The studies referenced in the relevant section of the manuscript are (one of the few) imaging studies that have indeed investigated brain structures of the DPMS in the context of exercise and pain modulation and, thus, were included in this paragraph to focus on the findings of these studies as well as emphasise the scarcity of imaging studies investigating exercise-induced pain modulation. Given these divergent research topics of the proposed studies, we suggest not including them in this paragraph to maintain a clearer line of argument and focus on exercise-induced pain modulation in brain regions of the DPMS.

      L59 to 61 - a minor comment about the phrasing within this sentence and a recommended change is provided below for the flow of the sentence/paragraph.

      "...there are instances where administration of µ-opioid antagonists has decreased exercise-induced pain modulation (Droste et al. 1988; etc.) whereas in others there has been little effect (Droste et al. 1988; etc.)."

      We have altered the sentence based on the reviewers' suggestions to improve the flow and coherence of the sentence.

      L56 to 72 - Whilst the current version of this paragraph scans well enough, I find that the narrative flits between the mechanisms being discussed and the rationale/shortcomings of current research. I think that the original content of this paragraph can be structured into:

      A- The endogenous opioid system is a likely candidate to explain how exercise elicits a hypoalgesia response.

      B- Citation(s) of the imaging studies (Boecker et al., 2008, etc.) and earlier literature which support A (e.g., Janal et al. 1984).

      C- Further support of this theory as µ-opioid antagonists like naloxone seem to counteract the endogenous opioid effect (Haier et al., 1981).

      D- Introduction of the caveats of previous research such as the studies that observed that µ-opioids did not impact the endogenous pain modulation system during exercise (e.g., Droste et al., 1991, etc.) and the range of different interventions and exercise modalities which make it difficult to draw clear conclusions of the pain modulation effect.

      To me, this structure would set out the details you have already put together in a more orderly and systematic way and also will lead nicely into your ensuing paragraph (Line 74 onwards).

      We appreciate the reviewer's constructive comments on structuring this paragraph. We agree that the proposed version eases the readability and comprehension of the paragraph and have, thus, restructured the paragraph according to the reviewer’s suggestion.

      L75 - Why are single-arm pre-post measures and designs an issue? If you can elaborate a little more this would be very insightful for a reader.

      Single-arm pre-post measurement studies involve participants being assigned to a single experimental condition, with pain assessments conducted only once before and once following an intervention. This study design presents some limitations, particularly in the context of examining exercise-induced modulation of pain (Vaegter and Jones, 2020). Such designs are potentially confounded by the effects of habituation to noxious stimuli, as highlighted by Vaegter and Jones (2020). Incorporating randomised controlled trials with multiple measurement blocks not only mitigates these limitations but also provides a clearer understanding of how individual bouts of exercise influence pain perception. We have now added this to the paper.
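      The habituation confound can be illustrated with a deliberately simple, noise-free simulation. Everything in this Python sketch is hypothetical: ratings are assumed to drift down linearly with repeated stimulation, and the per-trial habituation rate, trial counts and intervention effect are invented. It shows that a single-arm pre-post comparison registers an apparent reduction even when the intervention does nothing, whereas a contrast against a control arm measured at the same time points recovers only the intervention effect.

```python
# Hypothetical, noise-free illustration of habituation in pre-post designs.
# All parameter values are invented for illustration.

TRIALS_PER_ASSESSMENT = 10


def rating_at(trial_index, baseline=50.0, habituation_per_trial=0.5,
              intervention_effect=0.0):
    """Pain rating on a given trial: a baseline that drifts down linearly with
    repeated stimulation, minus any (assumed additive) intervention effect."""
    return baseline - habituation_per_trial * trial_index - intervention_effect


def mean(values):
    return sum(values) / len(values)


pre_trials = range(0, TRIALS_PER_ASSESSMENT)
post_trials = range(TRIALS_PER_ASSESSMENT, 2 * TRIALS_PER_ASSESSMENT)

# Single-arm design: the intervention has NO effect, yet post < pre.
pre = [rating_at(i) for i in pre_trials]
post = [rating_at(i) for i in post_trials]
apparent_effect = mean(post) - mean(pre)

# Controlled design: both arms share the same habituation drift, so the
# between-arm difference at the same time points isolates the true effect.
treated_post = [rating_at(i, intervention_effect=3.0) for i in post_trials]
control_post = [rating_at(i) for i in post_trials]
controlled_effect = mean(treated_post) - mean(control_post)

print(apparent_effect, controlled_effect)  # -5.0 -3.0
```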

      L80 - The reference for the functional threshold power assessment is provided as a number. Please could the authors change to reflect which study/studies they are referring to here (I presume it is the Borszcz and/or the McGrath studies?).

      We apologise for this oversight and have now updated the reference to be displayed correctly. The reviewer is correct in assuming that Borszcz et al. (2018) is the referenced study here.

      L88 - Did participants also receive pressure pain stimulations in addition to the thermal stimuli, as the figure suggests?

      Note: having since read on to L102-104, I understand why pressure pain was included but not mentioned further, given its results. However, I would still recommend including pressure pain stimulations in this line, if possible, to be consistent with what Figure 1 shows and with later text in the Methods section.

      We thank the reviewer for their suggestion to mention pressure pain at the referenced line to increase the clarity and consistency of the experimental paradigm. Pressure and heat pain were applied in alternating fashion during scanning. Whilst the results of pressure pain are not included in this study we agree with the reviewer that it should be mentioned again as part of the methods and have added this.

      L94 - I really like Figure 1. Great job.

      Could the authors please define the inter-trial interval (ITI) in the legend? And please could the authors clarify what unit the 30, 50, and 70 figures in the "18 trials per block" section refer to.

      We thank the reviewer for their positive feedback. We have now included a definition of inter-trial interval (ITI) in the figure legend. Furthermore, we adapted Figure 1 so that the units of the stimulus intensities (30, 50, 70) on the Visual Analog Scale (VAS) are included in the figure, allowing for clearer identification.

      (3) Results

      General comment for figures: is there a specific reason the authors chose to represent error bars with SE values as opposed to SD values?

      The reason I ask is that participant responses seem to vary (See Figure 2A and 2E-G as an example). Error bars showing SD values would perhaps do justice to the variability in participant response(s), whereas the SE may be a better representation of the variability in responses due to the assessor's methods of collection. Whilst the SE error bars are narrow (great job on that!), the individual responses are clearly varied which I speculate could be because of the interventions that have been implemented (i.e., exercise intensity).

      The use of Standard Error (SE) is more common in the cognitive neuroscience literature.

      However, as this reviewer noted, we have also included individual data points alongside the SE, thereby providing a comprehensive view that allows for a thorough interpretation of the data distribution.
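      The distinction discussed here follows from SE = SD / √n: the SD describes how much individual participants differ from one another, whereas the SE describes how precisely the group mean is estimated, so it narrows as the sample grows. A minimal Python sketch with invented per-participant ratings (not study data):

```python
import math
import statistics

# Invented per-participant mean ratings (VAS 0-100), for illustration only.
ratings = [20, 35, 40, 55, 30, 45, 25, 50]

sd = statistics.stdev(ratings)     # sample SD: spread between participants
se = sd / math.sqrt(len(ratings))  # SE of the mean: precision of the group mean

print(round(sd, 2), round(se, 2))  # 12.25 4.33
```

      Because the SE divides by √n, a larger sample narrows the SE bars even when the spread of individual participants, and hence the SD, stays roughly the same; plotting individual data points alongside the SE conveys both.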

      L102 to 104 - In fact, it is interesting that exercise did not impact the pressure pain ratings whereas the same cannot be said for thermal pain. In line with some of my comments below about the impact of exercise on pain intensity responses, I would be intrigued to see the results of the pressure pain ratings in more detail.

      Another note on this... Whilst the results for the pressure pain may be beyond the scope of this paper and will be reported separately, knowing of this data is tantalising for a reader. I would suggest to: A) either mention the pressure pain and include the analysis of the data; or B) not mention the pressure pain altogether and save it for the subsequent paper. Either way, I look forward to seeing further discussion on this in future work.

      We have now summarised the behavioural results of exercise on pressure pain ratings below in Supplemental Figure S1.

      There was no hypoalgesic effect evident in the behavioural pain ratings comparing HI to LI exercise in the saline (SAL) condition (β = 0.57, CI [-1.73, 2.86], SE = 1.17, t(1354) = 0.48, P = 0.63; Supplemental Figure S1A, blue bars) as well as no interaction of drug treatment and exercise intensity on pressure pain ratings (β = -1.43, CI [-4.87, 2.01], SE = 1.75, t(2756.02) = -0.82, P = 0.42; Supplemental Figure S1). Post-hoc paired t-tests (Bonferroni-corrected) confirmed there to be no significant differences between the drug treatment conditions at LI (P = 0.18) or HI (P = 0.85) and no significant difference between the exercise intensities in the SAL (P = 0.65) and NLX (P = 0.48) conditions, confirming no significant differences in drug treatment between the exercise intensities.
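      For readers unfamiliar with the correction, a Bonferroni adjustment multiplies each raw p-value by the number of comparisons m (capping the result at 1), which is equivalent to testing each raw p-value against α / m. The Python sketch below uses invented raw p-values; the family size of four comparisons is an assumption for illustration, since the text does not state how the post-hoc tests were grouped.

```python
# Bonferroni adjustment of post-hoc p-values (illustrative values only).

def bonferroni(p_values):
    """Multiply each raw p-value by the number of comparisons, capped at 1.
    Equivalent to comparing each raw p-value against alpha / m."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


ALPHA = 0.05
raw = [0.01, 0.02, 0.30, 0.40]  # hypothetical raw p-values, four comparisons
adjusted = bonferroni(raw)
significant = [p < ALPHA for p in adjusted]
print(adjusted)     # [0.04, 0.08, 1.0, 1.0]
print(significant)  # [True, False, False, False]
```

      Note how the second comparison (raw p = 0.02) would pass an uncorrected threshold of 0.05 but not the corrected one.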

      Furthermore, there was no significant effect of fitness level on differences in pain ratings (LI – HI exercise) in the SAL condition (β = 3.16, CI [-1.64, 7.97], SE = 2.37, t(38) = 1.34, P = 0.19; Supplemental Figure S1B) and no significant correlation between fitness level and difference pain ratings (r = 0.25, P = 0.13). Finally, there was no significant interaction of drug treatment, exercise intensity, and sex on difference pain ratings (β =-7.97, CI [-18.67, 2.73], SE = 5.51, t(190) = -1.45, P = 0.15; Supplemental Figure S1C-D).

      Exercise did not appear to affect pressure pain ratings, and we have now added this to the discussion and to the methods section. However, we think that the figure is best placed in the supplementary material.

      L112 to 113 - Fantastic work for including this analysis in your study. Great job.

      We appreciate the reviewer’s positive feedback on conducting these crucial analyses when investigating sex and gender differences in pain.

      L186 to 189 - It is fascinating that there appears to be no effect of NALOXONE on pain ratings within female participants at a VAS rating of 30 for thermal pain as well as a much diminished hyperalgesia effect at a VAS rating of 50 compared to males. Meanwhile, at higher intensity stimulations corresponding to a VAS rating of 70, females in fact demonstrate a more pronounced hyperalgesia effect compared to males. In addition, the hyperalgesia effect of NALOXONE for males seems to "peak" at a VAS rating of 50. The mechanisms behind these findings alone would be incredibly exciting to explore... but maybe in another study.

      We agree with the reviewer that the differences in males and females are fascinating results and concur that this may hint at varying degrees of opioidergic involvement at different stimulus intensities. This finding is intriguing and potentially clinically relevant, warranting further investigation in future research, although it lies beyond the scope of the current paper.

      L189 - To double check... Figures 4A and 4B refer to the entire cohort (male and female responses combined) whereas C-E are separated by sex?

      In addition, as there are no annotations at the top of Figures 4C-E, were no significant differences observed between saline and naloxone conditions for each stimulus intensity? i.e., similar tests to those shown in Table S6 but separated for each sex.

      Without getting too carried away, there may be something here that indicates a difference between sexes concerning the opioid-driven pain modulation response on a neurological level (i.e., brain region activation).

      The reviewer is correct in assuming that Figures 4A and 4B refer to the entire cohort whilst Fig. 4C – 4E are split for males and females. The full output of the analyses for Fig. 4A and 4B are reported in Supplemental Tables S5 – S7. Furthermore, the full output of the LMER analyses for Fig. 4E is reported in Supplemental Table S10. We agree with the reviewer that additional annotations in Fig. 4C – Fig. 4E ease interpretation and have, thus, added them to the respective figures, denoting the significance of the interaction term stimulus intensity and drug treatment for females (Fig. 4C) and males (Fig. 4D), respectively. For completeness, we now report the post-hoc paired samples t-tests for females and males in the Supplemental Tables S8 and S9, respectively.

      L254 to 258 - "we could not establish an overall hypoalgesia effect of exercise...". Do the results of the exercise intensity x drug treatment analysis provide an answer for this exact hypothesis? After checking the methods section, I cannot find whether the statistical analysis has involved a comparison of the pain ratings after the high (alone), low (alone), or high and low (combined) exercise with ratings from a control condition or from pre-calibration (i.e., pain ratings in a rested state before any exercise was completed).

      We concur with the reviewer's assessment that the study design and statistical analyses cannot address the ‘overall’ effect of exercise compared to no exercise. Please refer back to our general response before comment 1, where we have addressed this point.

      As it seems that the analysis assesses the differences between high and low-intensity exercise, to me, the results of the exercise intensity x drug treatment analysis do not assess whether there is an exercise-induced hypoalgesia effect or not. Instead, it seems to assess whether the intensity of exercise is a differentiating factor in the expected exercise-induced hypoalgesia effect on subsequent pain intensity ratings during experimental pain stimulation. For the authors to judge whether aerobic exercise does or does not have a hypoalgesia effect, the exercise conditions (either combined or standalone) would have to be compared to a control condition or a data set that involved pain ratings from a pre-exercise timepoint.

      We thank the reviewer for their comment. We would like to point out that we concluded there to be no hypoalgesic effect between the LI and HI exercise based on the LMER model comparing the behavioural pain ratings between the exercise conditions in the SAL condition (β = 1.19, CI [-1.85, 4.22], SE = 1.55, t(1354) = 0.77, P = 0.44; Figure 6A, blue bars and Table S9). The statistical model investigating the interaction of exercise intensity and drug treatment served to show that NLX did not modulate pain differently between the LI and HI exercise conditions.

      Given that our experiment involved different exercise levels in a randomized order, a simple pre- vs post-exercise analysis is not straightforward. Nevertheless, we have set up a model that takes into account the rating time point (pain ratings provided before each exercise block (pre-exercise pain ratings) and following each exercise block (post-exercise pain ratings)) at each stimulus intensity (VAS 30, 50, 70) and exercise intensity (LI and HI). The model also takes into account the exercise intensity performed in the previous block, the overall block number, and varying subject intercepts. The analysis was completed for heat (Author response image 1A) and pressure (Author response image 1B) pain ratings in the SAL condition to establish whether there was a significant effect of exercise intensity on the changes from pre- to post-exercise pain ratings. The model for heat pain yielded a significant main effect of stimulus intensity (β = 1.43, CI [1.34, 1.52], SE = 0.05, t(2054.95) = 31.61, P < 0.001) but no significant interaction of exercise intensity, rating time point, and stimulus intensity (P = 0.14). The model for pressure pain in the SAL condition yielded a significant main effect of stimulus intensity (β = 1.00, CI [0.92, 1.08], SE = 0.04, t(2054.99) = 24.68, P < 0.001) and block number (β = 1.14, CI [0.35, 1.94], SE = 0.41, t(2055.98) = 2.80, P = 0.005) but no interaction of exercise intensity, rating time point, and stimulus intensity (P = 0.38).

      Author response image 1.

      Heat (A) and Pressure (B) pain ratings in the saline (SAL) condition for pre (purple) and post (turquoise) exercise pain ratings at LI and HI exercise and all stimulus intensities (VAS 30, 50, 70). The bars depict the mean pain rating pre and post-exercise and the dots depict the subject-specific mean ratings. The error bars depict the SEM.

      Another point of consideration is that Figure 6A appears to show an average of all pain ratings combined per participant (is this correct?). As participants were exposed to stimulations expected to elicit a 30, 50, or 70 VAS rating based on pre-calibration values, the average rating would be expected to be around 50. What Figure 6A shows is that in the SALINE condition, average pain ratings are in fact ~10-15 units lower (~35), and in the NALOXONE condition, average pain ratings are ~5 units lower (~45) for both exercise intensities. From this, I would surmise the following:

      • It appears there is an exercise-induced hypoalgesia effect as average pain ratings are ~30% lower than pre-calibrated/resting pain ratings within the SALINE condition at the same temperature of stimulation (it would also be interesting to see if this effect occurred for the pressure pain).

      • It appears there is evidence for the endogenous opioid mechanism as the NALOXONE condition demonstrates a minimal hypoalgesia effect after exercise. I.e., NALOXONE indeed blocked the opioid receptors, and such inhibition prevented the endogenous opioid system from taking effect.

      • It appears there is no effect of exercise intensity on the exercise-induced hypoalgesia effect. That is, participants can cycle at a moderate intensity (55% FTP) and incur the same hypoalgesia benefits as cycling at an intensity that demarcates the boundary between heavy and severe intensity exercise (100% FTP). This is a winner in my mind. Anyone wishing to reduce pain can do so without having to engage in exercise that is too effortful and therefore aversive - great news!

      I will very slightly caveat my summaries with the fact that a more ideal comparison here would be a control condition whereby participants did the same experimental visit but without any exercise prior to entering the MRI scanner.

      As a result of this interpretation of your findings, I do not think that aerobic exercise as a means to cause subsequent hypoalgesia to experimental thermal nociception can be fully discounted. On the contrary, I think your results shown in Figure 6A are evidence for it.

      The reviewer is correct in assuming that Figure 6A shows the averaged pain ratings across all stimulus intensities (VAS 30, 50, and 70) for each subject. To provide more details, we have split Figure 6A by stimulus intensity, now depicting the pain ratings for LI and HI exercise and treatment condition (SAL and NLX) at VAS 30, 50, and 70 (Supplemental Fig. S8). The LMER was extended to include the stimulus intensity and yielded a significant main effect of stimulus intensity (β = 1.39, CI [1.31, 1.47], SE = 0.04, t(2753.12) = -34.082, P < 0.001) and a significant interaction of stimulus intensity and drug treatment (β = 0.12, CI [0.01, 0.24], SE = 0.06, t(2751) = 2.13, P = 0.03) but no significant interaction of exercise intensity, drug treatment, and stimulus intensity (β = -0.05, CI [-0.20, 0.11], SE = 0.08, t(2751) = -0.56, P = 0.58).

      The reviewer further suggests that the average pain ratings in the SAL condition are lower than the anticipated stimulus intensity, thus indicating exercise-induced hypoalgesia. While this interpretation is one possibility, there is an alternative explanation: the lower pain ratings may stem from habituation to heat pain (Greffrath et al., 2007; Jepma et al., 2014; May et al., 2012). To support this perspective, we have visualised data from other studies in our lab that were conducted with the same thermode head and device (TSA-2), using the same calibration procedure and aiming for the same stimulus intensities (VAS 30, 50, and 70). In both of these studies (Author response image 2A: Study 1, behavioural sample; Author response image 2B: Study 2, fMRI sample; for comparison, Author response image 2C shows the original exercise study), participants did not engage in an exercise task, and the pain ratings at VAS 30 and VAS 50 were lower than the anticipated intensities (VAS 30: 11.1/13.4; VAS 50: 35.0/35.9). Furthermore, in a previous study, Wittkamp et al. (2024) showed that, despite calibrating the heat stimuli at VAS 60, participants rated the pain stimuli with M = 48.58 (SD = 13.79).

      This discrepancy between calibrated intensities and the ratings provided could be attributable to habituation effects, especially for low-intensity stimuli. Moreover, we would like to point the reviewer to the highest stimulus intensity at VAS 70 (Author response image 2C), where no habituation was observed in any of the three data sets (including the current study). This consistency suggests that exercise-induced hypoalgesia may not be present in our findings or may be confounded by habituation effects.

      Author response image 2.

      Heat pain ratings at different intensities (30, 50, and 70 VAS) in different study samples. Bars depict the mean ratings in the saline (SAL) condition. Individual data points depict subject-specific mean pain ratings. Error bars depict the SEM.

      The reviewer further suggests that there is evidence for endogenous opioidergic modulation since the pain ratings in the NLX condition are lower than the anticipated intensities. We fully agree but, again, would argue that the descending pain modulatory system (DPMS) can exert its effects on painful stimuli in a default manner, i.e., irrespective of any exercise effect.

      We concur with the reviewer’s interpretation that there is no effect of exercise intensity on exercise-induced hypoalgesia since the ratings between both exercise intensities are not significantly different.

      Finally, we agree that our data does not allow for the interpretation of an ‘overall’ effect of exercise-induced hypoalgesia and would like to point out that we did not aim to claim this. Rather, the data suggests there to be no effect of LI vs. HI aerobic exercise on pain modulation. We acknowledge, however, that the phrasing involving ‘overall’ can be misleading and have revised this to focus on the comparison between LI and HI exercise, thereby enhancing precision and clarity.

      Note: this is also where it would be really interesting to see the pressure pain data if it were to be included, mainly to see whether it coheres with what the thermal stimulation data show.

      We have provided the pressure pain ratings in the SAL condition below (Author response image 3).

      Author response image 3.

      Pressure pain ratings in the saline (SAL) condition at each stimulus intensity (VAS 30, 50, and 70). Bars depict the mean ratings. Individual data points depict subject-specific mean pain ratings. Error bars depict the SEM.

      L259 - As mentioned in the comment above. Could the authors distinguish what is being shown in Figure 6A? Are the data presented as the pooled mean for all stimulation intensities? If not, what data is displayed per bar/column?

      We thank the reviewer for their comment. The reviewer is correct in assuming that the bars in Figure 6A depict the pooled means across all stimulus intensities (VAS 30, 50, 70) for each drug treatment condition and exercise intensity. To allow for a more detailed comprehension of the data, we have split Figure 6A by stimulus intensity, now depicting the pain ratings for LI and HI exercise and treatment condition (SAL and NLX) at VAS 30, 50, and 70 (Supplemental Figure S8). The LMER was extended to include the stimulus intensity and yielded a significant main effect of stimulus intensity (β = 1.39, CI [1.31, 1.47], SE = 0.04, t(2753.12) = -34.082, P < 0.001) and a significant interaction of stimulus intensity and drug treatment (β = 0.12, CI [0.01, 0.24], SE = 0.06, t(2751) = 2.13, P = 0.03) but no significant interaction of exercise intensity, drug treatment, and stimulus intensity (β = -0.05, CI [-0.20, 0.11], SE = 0.08, t(2751) = -0.56, P = 0.58).

      L278 - Can the authors please provide a reference that explains how W.kg-1 at FTP is a measure of fitness level?

      We thank the reviewer for their comment. The obtained FTP value was corrected for the weight of each participant (Watt/kg), yielding a weight-corrected fitness measure that allows for better comparison between subjects. We denoted this in the figures as W*kg-1, which is the equivalent term.

      L296 - Take the line away from Figure 7A... Does the individual data show a positive relation between pain rating changes and W.kg-1? Besides the three data points (1 on the far right of the figure and the two on the far left), I find it hard to see any real trend.

      We acknowledge the reviewer’s concern regarding the regression line and the visual clarity of the individual data points. However, it is important to note that the significant main effect of fitness level on differences in pain ratings in the SAL condition (β = 6.45, CI [1.25, 11.65], SE = 2.56, t(38) = 2.52, P = 0.02) supports the assertion that higher fitness levels are associated with greater hypoalgesia following HI exercise compared to LI exercise. While the trend may not be visible for all data points, the statistical analysis provides a robust basis for the observed relationship (r = 0.33, P = 0.038).

      We have conducted an additional LMER model in which we excluded the subjects with the highest and lowest FTP values (sub-28 with 3.19 W/kg and sub-06 with 0.76 W/kg, respectively). The LMER still yields a significant main effect of fitness level (β = 6.82, CI [1.25, 11.65], SE = 3.18, t(34) = 2.14, P = 0.039; Author response image 4) and a positive correlation between the difference ratings and fitness level approaching significance (r = 0.32, P = 0.057).

      Author response image 4.

      Fitness level on difference pain ratings (LI-HI exercise) without subjects with highest and lowest FTP (N = 37). (A) Subject-specific differences in heat pain ratings (dots) between LI and HI exercise conditions (LI – HI exercise pain ratings) and corresponding regression line pooled across all stimulus intensities in the SAL condition. Fitness level (FTP) showed a significant positive relation to heat pain ratings with a significant main effect of FTP (P = 0.039) on difference ratings.

      (4) Discussion

      L356 to 358 - Exactly. What you write here, I agree with. Your testing allowed you to judge whether there is an effect of aerobic exercise intensity on pain modulation. However, I think this has been a little conflated with the idea that there is "no overall effect of aerobic exercise on pain modulation" in other areas of the article (L358-361, Results, and Abstract). As per my previous comment, I am not sure this (no overall effect) is true.

      We agree with the reviewer and have adapted the manuscript so that the misleading phrase including ‘overall’ is removed.

      L358 to 365 - One addition to this debate about whether this is a hypoalgesia effect of aerobic exercise. In 358 - 361 (particularly the end of 361) there is a strong conclusion that there is no direct involvement of the endogenous opioid system. Then glance onto L364 to 365 and there is then an almost conflicting summary that a hypoalgesia effect driven by opioidergic regions of the brain (and ergo endogenous opioids) is in effect. If there were no direct endogenous opioid involvement, then differences between NALOXONE (blockade of the opioid mechanism) and SALINE conditions would not exist.

      We thank the reviewer for their comment. The structure of this paragraph aimed to guide the reader towards a more nuanced understanding of the possible mechanisms and caveats in exercise-induced pain modulation. Whilst our data suggest an effect of NLX on pain ratings, with significantly higher pain ratings in the NLX condition than in the SAL condition, we could not identify an interaction between treatment and exercise intensity. This suggests that there is no significant difference in opioidergic involvement between HI and LI exercise. Our exploratory analyses, however, show an involvement of endogenous opioids as an underlying mechanism dependent on sex and fitness level.

      My perspective is that an exercise-induced hypoalgesia effect has occurred (based on the data in Figure 6A) but that this effect is certainly caveated by the sex and fitness levels that this study has observed (and kudos for it).

      As mentioned above, based on the current data we cannot untangle whether the reduced pain ratings in the SAL condition are due to habituation to noxious stimuli or an actual hypoalgesic effect of exercise (or potentially a mix of both). However, we fully agree with the reviewer that exercise-induced pain modulation is influenced by fitness level and sex.

      L390 - "endogenous pain modulation through μ-opioid receptors increases with increasing pain intensity". Aside from the general discussion about whether aerobic exercise causes a post-exercise hypoalgesia effect. This finding is also interesting for the pain incurred during exercise in the form of naturally occurring muscle pain and may also be clinically relevant as it could be that the endogenous pain modulation "system" could be primed through repeated exercise as your results show that the fitness level (i.e., a close correlate of how much someone has engaged in exercise and therefore 'activated' the endogenous pain modulation system) is associated with a more pronounced post-exercise hypoalgesia effect.

      This is an interesting aspect. With regard to the pain induced by exercise itself (i.e., muscle pain), we did not gather any data on this type of pain, and interpreting this would be mere speculation. However, whether the pain induced by exercise is influenced by the endogenous opioid system is an interesting hypothesis for future studies. We agree with the reviewer’s interpretation that repeated exercise might prime the endogenous opioid system, especially in fitter individuals who engage more frequently in exercise and, thus, ‘train’ the endogenous opioid system. We have included this line of interpretation in the original manuscript, where we suggest that the mFC, a brain region with high µ-opioid receptor density, might be ‘trained’ by repeated exercise and, therefore, shows increased activation in fitter individuals after short bouts of exercise.

      L404 to 405 - "a resting baseline does not control for unspecific factors such as attentional load or distraction (Brooks et al., 2017; Sprenger et al., 2012) through exercise." I am not sure I agree. A control condition allows one to truly deduce whether exercise causes a hypoalgesia effect or not. The attentional load may be a factor, but I would argue this is distinct from endogenous pain modulation - unless there is a study that shows cognitive load alone can elicit endogenous opioids like exercise. Distraction would be relevant if the pain measures were taken during the exercise. However, as the pain measures in the MRI were taken post-exercise, when no exercise-related distraction was present anymore, I do not think any added effect of distraction due to the exercise is relevant to the post-exercise pain measure.

      We agree with the reviewer that a resting baseline condition in the context of exercise-induced pain modulation would allow for the investigation of a potential hypoalgesic effect of exercise compared to no exercise. It is important to note that both studies (Brooks et al., 2017; Sprenger et al., 2012) have indeed shown that the effect of cognitive pain modulation is mediated by endogenous opioids.

      L406 - I do not think a low-intensity exercise is a true "control" condition. It certainly does allow the study to compare the dose-response relationship but as the individual is exercising (even at a moderate physiological intensity) then comparison of HIGH vs LOW does not tell us whether exercise does or does not cause hypoalgesia. In contrast, the results from Figure 6A seem to show that even LOW intensity exercise has a hypoalgesia effect and this is a good thing for those who cannot exercise at high intensities (e.g., chronic populations).

      Please refer back to our general response before comment 1, where we have addressed this point.

      L410 - A small digression in relation to the exercise intensities:

      The intensity domains (moderate - heavy - severe) are not truly controlled within this study (mainly for the LOW condition), and therefore some participants could have exercised within different exercise intensity domains than others. To explain, the exercise intensity domains are distinguishable by the physiological responses associated with the boundaries of each of these domains. The FTP is believed to be a demarcation point between the heavy and severe intensity domains (though kinesiologists debate the validity of this). Other concepts similar to FTP are Critical Power or the Respiratory Compensation Point. Ultimately, the boundary between the heavy and severe intensity domains is characterised by the highest possible intensity at which a steady state in oxygen uptake (V̇O2) kinetics occurs (Burnley & Jones, 2018). If this is expressed as a power output (Watts) and a percentage of this power output is then used to prescribe exercise intensity, the physiological response is not always as expected. The reason is that the gas exchange threshold (the demarcation point between the moderate and heavy intensity domains) does not fall at the same percentage between rest and FTP/Critical Power/Respiratory Compensation Point for every person. As a result, some individuals who are prescribed an intensity of 55% FTP/Critical Power/Respiratory Compensation Point may exercise within the moderate intensity domain (most people did, based on the heart rate and RPE responses), whilst others might actually exercise within the heavy intensity domain. A quick check of Figures 3B-C could indicate that this might have been the case for two or three participants, but that is inference and speculation, as we cannot truly know unless gas parameters were measured (and it is perfectly understandable that they were not, because this study has done so much else).

      However, the importance of this for this study is that if some participants did indeed exercise at a slightly higher physiological intensity, this undermines the LOW condition as a "control", as the physiological stimulus between conditions would be less distinct than intended (Brownstein et al., 2023). It means that the proposed differences in endogenous opioids (Vaegter et al., 2015; 2019) between exercise intensities may not have been present, and therefore concluding that there is no exercise-induced hypoalgesia effect is slightly confounded. This is one factor contributing to my scepticism about the conclusion that there is a lack of an exercise-induced hypoalgesia response.

      We thank the reviewer for their comment, as it touches upon the challenges of estimating exercise intensities precisely. It is, indeed, crucial to consider the boundaries between the moderate, heavy, and severe intensity domains, as delineated by physiological markers such as the Functional Threshold Power (FTP), Critical Power, and the Respiratory Compensation Point (Burnley & Jones, 2018). Previous research has shown that the FTP and FTP20 tests are reliable and convenient methods to estimate approximate measures of VO2max (Denham et al., 2020) and that the FTP test is a useful test for performance prediction in moderately trained cyclists (Sørensen et al., 2019).

      We acknowledge that without direct measurements of VO2max, it is challenging to determine the precise intensity domain in which each participant was operating. While the RPE and HR might suggest that some participants performed in the moderate intensity domain in the LI exercise condition, we could still confirm a significant difference in relative power (%FTP), heart rate (HR), and rating of perceived exertion (RPE) between the LI and HI exercise conditions. In the overall sample, the consistency of relative power, heart rate, and RPE responses among participants suggests that the exercise doses were effectively communicated and adhered to; therefore, the validity of the LI exercise condition remains robust.

      While we did not include metabolic assessments in our protocol, our study focused on providing a comprehensive analysis of the exercise-induced hypoalgesia phenomenon across two distinct exercise intensities. Additionally, the rationale for selecting specific exercise intensities was grounded in the existing literature, which indicates significant differences in the hypoalgesic response between exercise intensity levels (Jones et al., 2019; Vaegter et al., 2014).

      According to the reviewer, a potential lack of physiological difference between the exercise conditions might explain why there was no difference in endogenous opioid release and, thus, no difference in pain ratings between the exercise conditions. However, our data still suggest an influence of endogenous opioids in the HI exercise condition in males with higher fitness levels. Together with recent findings on the association of µ-opioid receptor activation and fitness levels in men (Saanijoki et al., 2022), as well as the difference in µ-opioid receptor availability between high and moderate aerobic exercise (Saanijoki et al., 2018), we would hypothesise that the release of endogenous opioids after short HI bouts of exercise depends on fitness level (and potentially sex).

      Finally, we propose that discussing exercise intensity domains within the context of our study enriches the understanding of exercise-induced hypoalgesia without undermining the integrity of our findings. We have, therefore, included this in the discussion of the manuscript.

      L417 - For some reason I am doubting this value (r = 0.61). Could this be checked? I think it is higher in their study. r = 0.88?

      Also, as someone with a kinesiology background, I would argue this is a given anyway. The maximum power one can cycle for 20 minutes is related to the maximum power one can cycle for 60 minutes, this is expected. (That is no slight on the authors of this study, more a remark that readers could look and figure that for themselves if they needed to know).

      We thank the reviewer for their comment. We have carefully re-checked the correlation coefficient between the FTP20 and FTP60 tests in the study by Borszcz et al. (2018) and have corrected it to r = 0.88. We thank the reviewer for detecting this. Whilst we agree that it seems somewhat intuitive that FTP20 and FTP60 should correlate highly, we wanted to provide the reader with a better understanding of where the FTP20 test originated and why it is suitable for assessing aerobic fitness levels without having to maintain a steady power output for 60 minutes.

      L428 - Kudos to the authors for taking a standardised approach to this. Hopefully, my comment earlier might provide some extra food for thought about exercise intensity. I think there are several other ways future research could prescribe exercise without the need for expensive and cumbersome bits of equipment to know how hard people are exercising.

      We strongly agree with the reviewer and hope that our study can inspire future research to implement more convenient and inexpensive ways to establish aerobic (and anaerobic) fitness levels.

      L456 to 458 - Would it be possible to revisit this and check whether the pooled mean of all stimulation intensities for pain intensity ratings after pressure pain is lower than 50? If so, I think it can also be assumed that there is a slight hypoalgesia effect occurring for pressure pain too.

      We have revisited the pressure pain ratings pooled across all stimulus intensities (VAS 30, 50, and 70). Indeed, the ratings are below 50 VAS (Supplemental Figure S1A) in the SAL and NLX conditions. As mentioned before, lower pain ratings after LI exercise cannot be taken as evidence for exercise-induced analgesia.

      L495 to L499 - I find this fascinating. Great finding.

      We thank the reviewer for their positive feedback.

      (5) Methods

      L650 - "Watts"

      We have changed the sentence accordingly.

      L651 - beats per minute can also be represented as b.min-1 and cadence as revolutions.min-1.

      To allow for easier interpretation of the results by a broader readership, we propose to maintain the original abbreviations.

      L678 - Just to check what the authors mean by "on the second experimental day", they are actually referring to Visit 2 of 3 (first experimental visit of 2) as it is shown in Figure 1?

      We apologise for the lack of clarity. Indeed, the second experimental day refers to the third visit in the study. We have added this to the sentence to increase clarity.

      L708 - would change the end of the sentence to "and remained blinded throughout the study"

      We have changed the sentence accordingly.

      L742 - comma after "in one participant".

      We have added the missing comma.

      L746 - slight mistype... RPE in brackets instead of PRE

      We have changed the abbreviation to RPE.

      L747 - In case the authors are interested in affective measures in future studies... Hardy and Rejeski (1989) have a 9-point Likert scale rating affective valence which might be useful to check out.

      Thank you. The scale by Hardy and Rejeski (1989) is a very relevant measure of affective valence during exercise, and we will consider this in future studies.

      L755 - Four squares for the thermode to be applied were drawn on the arm but through the methods I can only seem to see that the thermode was applied to the second square during calibration. During the MRI scan, did someone move the thermode to different squares for different stimulations?

      We appreciate the reviewer's question. Indeed, the heat calibration and recalibration on the first and second day, respectively, were always completed on the same skin patch (patch 2) to allow for comparability of calibration across days. During the experimental sessions, the thermode head was repositioned before each block in an order randomised across participants (e.g., skin patches 1-4-3-2). This was done manually before the MRI block commenced. The order of thermode head positions was kept constant within participants across experimental days (day 2 and day 3).

      L764 - ITI predefined?

      We thank the reviewer for their comment and would like to point to line 130 in the revised manuscript where the abbreviation for inter-trial-interval (ITI) was first introduced.

      (6) Other Sections + Supplementary Materials

      L891 - I apologise in advance for this comment as it is the most trivial comment you will ever receive, but there is an extra "." On this line after J.N. initials for methodology.

      We have changed the punctuation accordingly.

      Table S1 - Strictly speaking, some of the intensity denominations in this table are not exactly an "intensity".

      Iannetta et al. (2020) - https://doi.org/10.1249/mss.0000000000002147 provides a commentary on intensity domains as well as Burnley and Jones (2018) - https://doi.org/10.1080/17461391.2016.1249524

      Likewise in this table - the term "without fatigue" in the description column is not strictly true as participants will naturally fatigue but authors are referring more to a "steady state".

      We have changed the name of the column to ‘Description’ to describe the test phase as proposed by Allen and Coggan (2012) and previously implemented by McGrath et al. (2019), rather than the ‘intensity domains’ (as specified by Iannetta et al. (2020)). Further, we have refined the wording in Table S1 and replaced the term ‘without fatigue’ with ‘steady state’.

      Once again, thank you to the authors for their great work on this project and to the editor for the chance to review this paper.

      We would like to thank this reviewer for their very insightful and important comments and for pointing out the strengths of the manuscript. We believe the suggestions will help to improve the quality of the manuscript.

      Reviewer #2 (Recommendations for the authors):

      Summary:

      This interesting study compared two different intensities of aerobic exercise (low-intensity, high-intensity) and their efficacy in inducing a hypoalgesic reaction (i.e. exercise-induced hypoalgesia; EIH). fMRI was used to identify signal changes in the brain, with the infusion of naloxone used to identify hypoalgesia mechanisms. No differences were found in postexercise pain perception between the high-intensity and low-intensity conditions, with naloxone infusion causing increased pain perception across both conditions which was mirrored by activation in the medial frontal cortex (identified by fMRI). However, the primary conclusion made in this manuscript (i.e. that aerobic exercise has no overall effect on pain in a mixed population sample) cannot be supported by this study design, because the methodology did not include a baseline (i.e. pain perception following no exercise) to compare high/low-intensity exercise against. Therefore, some of the statements/implications of the findings made in this manuscript need to be very carefully assessed.

      Strengths:

      (1) The use of fMRI and naloxone provides a strong approach by which to identify possible mechanisms of EIH.

      (2) The infusion of naloxone to maintain a stable concentration helps to ensure a consistent effect and that the time course of the protocol won't affect the consistency of changes in pain perception.

      (3) The manipulation checks (differences in intensity of exercise, appropriate pain induction) are approached in a systematic way.

      (4) Whilst the exploratory analyses relating to the interactions for fitness level and sex were not reported in the study pre-registration, they do provide some interesting findings which should be explored further.

      Weaknesses:

      (1) Given that there is no baseline/control condition, it cannot be concluded that aerobic exercise has no effect on pain modulation because that comparison has not been made (i.e. pain perception at 'baseline' has not been compared with pain perception after high/low intensity exercise). Some of the primary findings/conclusions throughout the manuscript state that there is 'No overall effect of aerobic exercise on pain modulation', but this cannot be concluded.

      (2) Across the manuscript, a number of terms are used interchangeably (and applied, it seems, incorrectly), which makes the interpretation of the manuscript difficult (e.g., how the authors use the term 'exercise-induced pain').

      (3) There is a lack of clarity on the interventions used in the methods, for example, it is not exactly clear the time and order in which the exercise tasks were implemented.

      (4) The exercise test (functional threshold power) used to set the intensity of the low/high exercise bouts is not an accurate means of demarcating steady state and non-steady state exercise. As a result, at the intensity selected for the high-intensity exercise in this study, it is likely that the challenge presented for the high-intensity exercise would have been very different between participants (e.g. some would have been in the 'heavy' domain, whereas others would be in the 'severe' domain).

      (5) It is likely that participants did not properly understand how to use the 6-20 Borg scale to rate their perceived effort, and so caution must be taken in how this RPE data is used/interpreted.

      (6) Although interesting, the secondary analyses (relating to the interaction effects of fitness level and sex) were not included in the study pre-registration, and so the study was not designed to undertake this analysis. These findings should be taken with caution.

      We thank the reviewer for their insightful comments, which have helped to improve the quality of the manuscript. In response to the identified weaknesses, we have made key revisions to enhance clarity and rigor. Regarding the lack of a resting control condition, we acknowledge that our study does not assess the overall effect of exercise versus no exercise. Our primary objective was to compare the effects of high- (HI) and low-intensity (LI) exercise on pain modulation, hypothesizing that lower intensities would have minimal effects. We revised the manuscript to eliminate misleading phrases about an "overall" effect, clearly emphasizing our aim to investigate the comparative effects of different exercise intensities. To address terminology inconsistencies, we have adopted "exercise-induced pain modulation," reflecting existing literature that recognizes both hypoalgesia and hyperalgesia associated with exercise (Vaegter and Jones, 2020). We clarified this terminology in the introduction and specified the pain modalities used in our study. We also improved methodological transparency by better describing the timing and order of the exercise and drug treatment interventions. Concerning exercise intensity estimation, we acknowledge the complexities in classifying the moderate, heavy, and severe intensity domains. We added the study by Wong et al. (2023) to discuss the potential limitations of the FTP estimation protocol. Although direct measures of VO2max or blood lactate are absent in our study, our findings, including rating of perceived exertion (RPE) scores and relative power data, indicate that participants were primarily in the heavy-intensity domain during HI exercise. To clarify the RPE ratings, we adjusted their presentation to align with the Borg scale's intended anchor points, ensuring greater accuracy in reported exertion levels. Statistical analyses confirm significant differences in RPE between exercise intensities. These revisions aim to clarify our intent and methodology, ultimately strengthening the contribution of our research to understanding exercise-induced pain modulation.

      (1) Lines 27-33 - please present some data and accompanying statistical output in the results section of the abstract.

      We thank the reviewer for their comment. In the results section of the abstract, we report whether the findings are (not) significant using the general threshold of P < 0.05. However, we prefer not to include more detailed data and statistical outputs here, as these are thoroughly presented in the results section and do not contribute to the abstract’s primary purpose of providing a concise summary.

      (2) Line 29 - please indicate how fitness level was quantified.

      The functional threshold power (FTP) adjusted for weight served as an indication of cardiovascular fitness level. We have now included this in the abstract.

      (3) Line 35 - please include a sentence detailing the implications of your findings.

      We have now included a sentence on the implications of our findings in the abstract.

      (4) Introduction general - I appreciate that it was an exploratory analysis; however, the introduction does not lay the groundwork for it (e.g., the influence of fitness level, sex, etc.). Please include some background in the introduction to establish the role of fitness level/exercise/training/physical activity in pain modulation.

      A paragraph detailing the role of fitness level and sex in the context of exercise-induced pain modulation and endogenous opioid release was part of the introduction of our manuscript but has been removed as per the reviewing editor’s request (as the inclusion of sex and fitness level was not part of the preregistration). We have now re-included a shortened version of this paragraph to provide some background on these potentially crucial factors in exercise-induced pain modulation.

      (5) Lines 40-41 - reference needed.

      We thank the reviewer for detecting this and have now included references concerning the release of endogenous opioids and the term exercise-induced hypoalgesia.

      (6) Lines 48-49 - please provide the full terms for ACC and PAG (PAG has been provided on line 52, but should be presented earlier).

      We thank the reviewer for detecting this. We now introduce the abbreviations for the periaqueductal grey (PAG) and anterior cingulate cortex (ACC) in the correct lines.

      (7) Line 49 - the term exercise-induced pain is often used interchangeably (incorrectly) with many different types of pain experienced during/after exercise (e.g. muscle burn/ache, DOMS, injury etc.). Please see O'Malley et al 2024 (doi: 10.1113/EP091687).

      We thank the reviewer for their comment. Despite the distinction between different types of pain induced by exercise being important, this is less relevant for the current study. We would like to point out that the full term used is exercise-induced pain modulation, referring to the modulation of (experimental) pain through exercise. We have deliberately chosen this term as it summarises exercise-induced hypoalgesia as well as hyperalgesia. Therefore, we did not refer to pain induced by exercise and would disagree that this term has been used interchangeably with different types of pain in the current manuscript.

      (8) Line 57 - neither of these studies looked at exercise-induced pain, rather they examined experimentally induced pain (e.g. cold pressor test) or chronic pain and how exercise might exacerbate it. This leads back to the previous comment - it is important to define what is meant by exercise-induced pain (EIP) from the offset, and then remain consistent in the reference to this.

      We agree with the reviewer and have cited the studies accordingly. We would like to point out that the current study does not investigate exercise-induced pain but the modulation of experimental pain through exercise and have used the term exercise-induced pain modulation consistently in the manuscript to describe this.

      (9) Line 61 - Droste et al and Olausson et al are missing from the reference list.

      We apologise for this oversight and have now updated the reference list to include the studies by Droste et al. (1991) and Olaussen et al. (1986).

      (10) Line 61 - Do you mean exercise-induced hypoalgesia, or modulation of exercise-induced pain? It is not clear. EIH is introduced in Line 40 and is consistent with what the Koltyn study explored. Conversely, Koltyn induced pain using heat and pressure, rather than exercise.

      In this manuscript, we have opted for the term ‘exercise-induced pain modulation’ since previous research has shown that exercise can elicit hypoalgesia as well as hyperalgesia (for review see Vaegter and Jones (2020)). Thus, the term refers to the modulation of pain through exercise. We have now included a sentence detailing the use of the term ‘exercise-induced pain modulation’ in the first passage of the introduction. Corresponding to Koltyn et al. (2014), we have used heat and pressure stimuli to induce pain and investigate the modulating effect of different exercise intensities on these pain modalities.

      (11) Line 62 and 64 - Both the Janal study and Haier study are missing from the reference list.

      We apologise for this oversight and have now updated the reference list to include the studies by Janal et al. (1984) and Haier et al. (1981).

      (12) Line 62 and 64 - define long/short distance/duration.

      We have revised the terminology from "short-duration" to "short-distance" to facilitate a more precise comparison of the exercise protocols employed in the studies by Janal et al. (1984) and Haier et al. (1981). Specifically, the long-distance run conducted by Janal et al. (1984) spanned 6.3 miles (approximately 10.1 km), while the short-distance run executed by Haier et al. (1981) covered 1 mile (1.6 km).

      (13) Line 62 - what type of pain?

      Janal et al. (1984) implemented thermal, ischemic, and cold pressor pain in their study and observed a hypoalgesic effect in response to thermal and ischemic pain that was reversed under NLX administration. We have now specified this in the text.

      (14) Line 67 - please place "i.e., the insula, ACC and prefrontal regions" in parentheses.

      Done.

      (15) Lines 67-69 - please provide clarity on the nature of the interventions being employed. For example, are you referring to interventions to reduce/overcome pain? Or are you referring to approaches to experimentally induce or increase pain during exercise? In either case, please be specific about the interventions employed, and explain why this variation in approach may make it challenging to draw a conclusion.

      The interventions employed by several studies aimed to investigate the pharmacological underpinnings of the pain modulatory effect of exercise and were, thus, pharmacological interventions. The primary objective of these interventions is usually not to reduce or increase pain but to block a specific receptor type in order to infer its role in pain modulation through exercise. In the context of exercise and pain specifically, the most frequently used pharmacological intervention consists of administering a µ-opioid receptor antagonist (naltrexone/naloxone (NLX)). Depending on which µ-opioid receptor antagonist is used, different administration protocols are employed (e.g., oral or intravenous administration, different doses, a bolus only without continuous infusion). This variability in administration protocols can account for different findings on the extent of opioidergic involvement in exercise-induced pain modulation. We have now refined the corresponding section to increase the precision and clarity of the description of the interventions used.

      (16) Line 69 - administration of what?

      This passage refers to the variability in the administration of µ-opioid receptor antagonists such as naloxone (NLX) or naltrexone. We have now specified this in the corresponding line.

      (17) Line 74 - EIH?

      As described above, we have chosen the term 'exercise-induced pain modulation' as an umbrella term for both exercise-induced hypoalgesia and hyperalgesia. However, the reviewer is correct that specifically studies investigating exercise-induced hypoalgesia have been criticised. Still, the proposed criticism also applies to studies detecting hyperalgesia and we would, thus, argue to retain the term ‘exercise-induced pain modulation’ here for the sake of consistency.

      (18) Line 75 - please define "single-arm pre-post measurements"

      We appreciate the reviewer's comment. Single-arm pre-post measurement studies assign participants to a single experimental condition, with pain assessed only once before and once after the intervention. This study design presents several limitations, particularly for examining exercise-induced modulation of pain (Vaegter and Jones, 2020). In particular, such designs do not account for habituation to noxious stimuli (Vaegter and Jones, 2020). Consequently, with only one pre- and one post-intervention assessment, a reduction in post-intervention pain ratings might erroneously be credited to the exercise intervention itself rather than to habituation to the repeated noxious stimuli. Randomised controlled designs with multiple measurement blocks not only mitigate these limitations but also provide a clearer understanding of how individual bouts of exercise influence pain perception.
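
      To make the habituation argument concrete, the toy model below (a sketch for illustration, not the authors' analysis) assumes that pain ratings fall by a fixed amount per measurement block regardless of condition. Under that assumption, a single pre-post comparison reports the habituation drop as an apparent exercise effect, whereas a hypothetical HI-LI-LI-HI (ABBA) block order cancels a purely linear drift when the condition means are compared. The baseline, the per-block habituation and the ABBA order are illustrative assumptions; the study itself used a pseudorandomised block order.

```python
from statistics import mean


def rating(block: int, baseline: float, habituation: float, effect: float = 0.0) -> float:
    """Pain rating at a measurement block under a toy model.

    Ratings drift down linearly by `habituation` per block (habituation to
    repeated noxious stimuli); `effect` is a true condition-specific reduction.
    """
    return baseline - habituation * block - effect


def single_arm_prepost_change(baseline: float, habituation: float) -> float:
    """Post minus pre rating when exercise has NO true effect (block 0 vs block 1)."""
    return rating(1, baseline, habituation) - rating(0, baseline, habituation)


def abba_condition_difference(baseline: float, habituation: float,
                              true_hi_effect: float = 0.0) -> float:
    """Mean HI rating minus mean LI rating for a hypothetical HI-LI-LI-HI block order."""
    order = ["HI", "LI", "LI", "HI"]  # illustrative counterbalanced order (assumption)
    ratings = {"HI": [], "LI": []}
    for block, condition in enumerate(order):
        effect = true_hi_effect if condition == "HI" else 0.0
        ratings[condition].append(rating(block, baseline, habituation, effect))
    return mean(ratings["HI"]) - mean(ratings["LI"])


if __name__ == "__main__":
    # Habituation alone looks like a 3-point reduction in a single pre-post design...
    print(single_arm_prepost_change(baseline=60, habituation=3))   # -3.0
    # ...but cancels in the ABBA comparison, which recovers only the true effect.
    print(abba_condition_difference(baseline=60, habituation=3))   # 0.0
    print(abba_condition_difference(60, 3, true_hi_effect=5))      # -5.0
```

      An ABBA order only removes a linear drift; nonlinear habituation or carry-over between blocks would still need additional handling in the statistical model.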

      (19) Line 84 - is (40) a reference?

      We apologise for this oversight and have now updated the reference by Borszcz et al. (2018) to be displayed correctly.

      (20) Line 86 - is that 10 min per block (i.e. 40 min exercise time), or 10 min in total? If the former please include "per block" at the end of the sentence (Line 87).

      The reviewer is correct in assuming that we employed 10 min of cycling per block, resulting in a total of 40 minutes of cycling. We have updated the sentence now including ‘per block’ as suggested by the reviewer.

      (21) Line 89 - when you refer to "painfulness" are you referring to the intensity of pain experienced? If so, I think "pain intensity" would be more appropriate.

      In the current study, participants were asked about the ‘painfulness’ of each stimulus based on previous studies (Horing et al., 2019; Horing & Büchel, 2022; Tinnermann et al., 2022). The term ‘painfulness’ is a composite measure of ‘pain intensity’ (sensory dimension) and ‘pain unpleasantness’ (affective dimension) (Talbot et al., 2019). Since unpleasantness is also a definitional criterion of pain (‘Terminology | International Association for the Study of Pain’, n.d.) and previous research shows a high correlation between ‘pain unpleasantness’ and ‘pain intensity’ (Granot et al., 2008; Talbot et al., 2019) we have opted for the term ‘painfulness’ as a more comprehensive measure. Inherently, these two measures are highly correlated.

      (22) Line 91-93 - the way this is written could be suggestive of this being separate to the cycling blocks. Please rephrase to confirm that this was administered prior to the commencement of the cycling blocks.

      We have refined the sentence to make it clearer that the drug treatment was administered before the cycling blocks commenced on each of the experimental days. We would like to further specify that, whilst the bolus dose of the treatment was administered prior to the experiment, a constant intravenous supply of SAL/NLX was maintained throughout the experiment using an infusion pump.

      (23) Methods general - why only 10 min of exercise? It is likely that there is a 'dose effect' of exercise on EIH, whereby the intensity of exercise and the duration of the exercise are important. Short-duration but high-intensity exercise can induce EIH, as can moderate duration low-intensity exercise. But, for this protocol, was the intensity high enough or long enough to meet the 'dose' needed?

      We thank the reviewer for their question. Our decision to employ 10-minute exercise blocks was rooted in both scientific evidence on exercise-induced hypoalgesia and the (clinical) applicability of the findings. Research has shown that exercise durations ranging from 8 minutes to 2 hours of aerobic exercise can induce hypoalgesia (for review see Koltyn (2002)). Specifically, several studies induce hypoalgesia at 10-15 minutes of aerobic exercise (Gomolka et al., 2019; Gurevich et al., 1994; Haier et al., 1981; Jones et al., 2019; Sternberg et al., 2001; Vaegter et al., 2015). Furthermore, many prior studies have employed exercise durations that are tailored to professional or amateur athletes which may not be practical for healthy individuals with lower fitness levels who may find it challenging to engage in longer sessions, such as an hour of running. When considering applying these findings to the clinical chronic pain population it is crucial to assess the manageability of proposed exercise protocols. We believe that 10 minutes of exercise, whilst being a relatively brief exercise duration, may still be sufficient to elicit exercise-induced hypoalgesia.

      (24) Methods general - what was the time gap between each round (i.e. after the fMRI, how long before the participant started the next cycling block?).

      After each fMRI run, the participants were taken out of the MR scanner. Heart rate (HR) and SpO2 were measured, and participants were given the chance to go to the restroom before being positioned on the bike to start the next block. In total, the interval between the end of the fMRI scan and the start of the next block ranged from 5 to 10 minutes. We have now included this specification in the methods section.

      (25) Methods general - there is some evidence to show that the EIH effect is less consistently shown when heat is used to induce pain - was there a reason heat was used as the pain induction method here?

      We thank the reviewer for their comment. Indeed, a previous meta-analysis by Naugle et al. (2012) reported larger effect sizes for pressure pain (Cohen’s d = 0.69) closely followed by heat pain (d = 0.59). In light of this evidence, we included both pain modalities in the current study. Notably, we found no significant differences in pressure pain responses between LI and HI exercise. It is important to emphasise that the term "pressure pain" predominantly encompasses studies employing handheld pressure algometry, whereas our investigation utilised a pressure cuff. This methodological variation raises the possibility that our findings, and the corresponding effect sizes, may not be directly comparable to prior pressure pain studies.

      (26) Methods general - please be consistent in the use of terminology. In some areas, you use the phrase "cycling block" whereas in other areas it is referred to as a "cycling run".

      We have revised the methods section to be more precise with the terms ‘run’ and ‘block’.

      (27) Line 571-573 - Please detail how participants were excluded based on scores from STAI and BDI-II.

      We apologise for the misspelling, as it should be that one participant was excluded based on a BMI (body mass index) below 18. No participant had to be excluded based on the STAI or BDI-II score in the current study. We have corrected this in the manuscript.

      (28) Line 636-651 - the FTP20 test has been shown not to be a valid marker of the separation between the heavy and severe exercise intensity domains (see Wong et al 2023 - https://doi.org/10.1080/02640414.2023.2176045). Given that participants completed the high intensity cycle in 'zone 4' (91-106% of FTP), it is probable that participants could have completed this 10 min in either the heavy or the severe exercise intensity domains, with significant implications for the relative challenge this 10 min of exercise. Why was zone 4 used? What are the implications of this? Please discuss and include this as a limitation.

      We thank the reviewer for their comment as it touches upon the challenges of accurately estimating exercise intensities. It is indeed crucial to consider the boundaries between moderate, heavy, and severe intensity domains, as delineated by physiological markers.

      The study by Wong et al. (2023) is interesting; it assesses blood lactate and VO2 levels at FTP and FTP+15 Watts. Despite being highly relevant for the field, some of its findings should be interpreted with caution due to the low sample size of 13 participants (11 male and only 2 female cyclists), which may limit generalisability. Additionally, the testing protocol implemented in that study to determine participants' FTP consisted of 5 minutes of self-paced pedalling at 100 Watts followed by a 20-minute maximal, self-paced time trial. This differs from the FTP20 test as implemented in the current study (see Supplemental Table S1) or by other studies (McGrath et al., 2019). The finding in Wong et al. (2023) that participants were only able to sustain cycling at FTP for an average of 33 minutes suggests that this deviating protocol overestimates FTP. Mackey and Horner (2021) propose that the validity of the FTP20 test might depend on the warm-up used before FTP20 testing and on the training status of the athletes.

      However, we acknowledge that without direct measurements of VO2max or blood lactate levels, it is challenging to determine the precise intensity domain in which each participant was operating in the current study. Still, the RPE (low: M = 8.59, SD = 1.32; high: M = 14.92, SD = 1.98) suggests that participants operated in the heavy-intensity domain in the HI exercise condition. This is further supported by the relative power (%FTP) maintained in the HI (M = 105; SD = 0.05; Author response image 5, purple) and LI (M = 58; SD = 0.06; Author response image 5, green) exercise conditions (difference: t(37) = 44.58, P < 2.2e-16, d = 6.46), confirming the accuracy of the implemented FTP test as well as the power maintained throughout the cycling blocks. Thus, we would argue that participants in the current study predominantly exercised in the heavy domain during the HI exercise condition. We have included the relative power in Figure 3A, replacing the absolute power.
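
      The relative-power figures in the paragraph above are simple ratios of block power to FTP, and the weight-adjusted FTP used as the fitness index is FTP divided by body mass. A minimal sketch, assuming the widely used FTP20 convention of taking 95% of the 20-minute mean power (the authors' exact protocol is given in their Supplemental Table S1 and may differ) and using the zone 4 bounds quoted by the reviewer (91-106% of FTP); all rider values are hypothetical.

```python
from statistics import mean

# ASSUMPTION: FTP estimated as 95% of the mean power of a 20-min test
# (a common FTP20 convention, not taken from the manuscript).
FTP20_FACTOR = 0.95

# Zone 4 bounds as quoted in the reviewer's comment (91-106% of FTP).
ZONE4_BOUNDS = (0.91, 1.06)


def estimate_ftp(mean_power_20min_w: float, factor: float = FTP20_FACTOR) -> float:
    """Estimate FTP (W) from the mean power (W) of a 20-min test."""
    return factor * mean_power_20min_w


def relative_power(block_powers_w: list[float], ftp_w: float) -> float:
    """Mean power across cycling blocks as a fraction of FTP (1.0 = 100% FTP)."""
    return mean(block_powers_w) / ftp_w


def ftp_per_kg(ftp_w: float, body_mass_kg: float) -> float:
    """Weight-adjusted FTP (W/kg), the fitness index described in the response."""
    return ftp_w / body_mass_kg


def in_zone4(rel_power: float, bounds: tuple[float, float] = ZONE4_BOUNDS) -> bool:
    """True if a relative power (fraction of FTP) lies within the zone 4 bounds."""
    low, high = bounds
    return low <= rel_power <= high


if __name__ == "__main__":
    ftp = estimate_ftp(200.0)                  # hypothetical rider: about 190 W
    hi = relative_power([199.0, 200.0], ftp)   # about 1.05, i.e. 105% FTP
    li = relative_power([110.0, 111.0], ftp)   # about 0.58, i.e. 58% FTP
    print(f"FTP {ftp:.0f} W, {ftp_per_kg(ftp, 70.0):.2f} W/kg")
    print(f"HI {hi:.0%} FTP, zone 4: {in_zone4(hi)}")
    print(f"LI {li:.0%} FTP, zone 4: {in_zone4(li)}")
```

      Expressing power relative to each participant's own FTP is what makes the HI and LI conditions comparable across riders of different absolute fitness; as the reviewer notes, it does not by itself guarantee the same physiological intensity domain.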

      Finally, we propose that discussing exercise intensity domains within the context of our study enriches the understanding of exercise-induced hypoalgesia without undermining the integrity of our findings. We have now included a discussion of the validity of the FTP20 test as a demarcation point concerning the intensity domains.

      Author response image 5.

      Raincloud plot of relative power (%FTP) during low (green) and high (purple) intensity exercise. Individual data points depict subject-specific averages across blocks.

      (29) Line 676 - please provide further information on each cycling run/block. Did each participant complete a total of 4 runs (i.e., a total of 40 minutes of exercise), with 2 runs completed at a high intensity and 2 runs completed at a low intensity in a randomised order (e.g., for one participant this could be 10 minutes at low, followed by 10 minutes at high, followed by 10 minutes at low, followed by 10 minutes at high)? Figure 1 details this nicely; however, it would be helpful to read this in the text.

      The reviewer is correct in assuming that there were a total of 4 blocks on each experimental day. Participants completed cycling in 2 blocks at HI and in 2 blocks at LI in a pseudorandomised order. This order was kept constant across experimental days (i.e. completing the same block order on Day 2 and Day 3). We have detailed this further in the Methods section.

      (30) Discussion general - it is possible that EIH could be induced via different mechanisms and that these mechanisms are at least in part due to exercise intensity. For example, EIH from higher-intensity exercise might have some contribution from CPM.

      We thank the reviewer for their comment. Previous research aimed to disentangle the two seemingly similar mechanisms of exercise-induced hypoalgesia (EIH) and conditioned pain modulation (CPM) (Ellingson et al., 2014; Rice et al., 2019; Samuelly-Leichtag et al., 2018; Vaegter et al., 2014). CPM is typically induced by applying a tonic noxious stimulus that decreases pain sensitivity to another noxious stimulus applied simultaneously or shortly after at a distant body part (Graven-Nielsen & Arendt-Nielsen, 2010). Despite EIH and CPM showing distinct mechanisms, it cannot be completely ruled out that there are at least partially overlapping mechanisms driving the two phenomena (Rice et al., 2019). Due to our study design, where the time difference between cycling blocks and the applied pain was on average five minutes, it is unlikely that CPM is the driving pain modulatory mechanism in our study setup.

      (31) Line 101 - as this was preregistered, should the study design be followed and then reported?

      We have conducted the study adhering to the preregistered study design and now report the results for pressure pain (Supplemental Figure S1). Some of the preregistered analyses (i.e. directly comparing heat and pressure pain) were beyond the scope of the current study and will be reported separately.

      (32) Line 110 - please provide some data on the fitness levels and how this is classified as high/low.

      The FTP (relative to body weight) was used as an estimate of cardiovascular and endurance fitness (Valenzuela et al., 2018). We refrained from classifying the fitness levels dichotomously as low or high since this is a subjective measure in a sample of healthy individuals of diverse fitness levels. Instead, we utilised the FTP as a more nuanced metric for comparison.

      (33) Lines 159-160 - in the context of the difference in intensity between the sessions. But, it is likely that the high-intensity exercise would have posed quite different relative challenge between participants.

      We thank the reviewer for their comment. As described above, we did not obtain direct measurements of VO2max or blood lactate levels, making it challenging to determine the precise intensity domain in which each participant was operating in the current study. However, all participants received the same instructions for the Borg rating scale, ensuring the comparability of RPE across participants to a certain extent.

      (34) Figure 3C - what instructions and familiarisation were given to participants regarding the 6-20 Borg scale? In Figure 3C it looks as though several participants rated the low exercise intensity at 6. This would/should be equivalent to sitting quietly, so it looks as though at least several participants did not understand how to use the RPE - please discuss.

      Indeed, three participants rated the LI exercise condition at 6 due to an error in the translation of the scale instructions. Participants were instructed that the lower anchor point of the scale (6) referred to ‘extremely light’ instead of ‘no exertion’. We have therefore rescaled the RPE ratings so that a rating of 6 now corresponds to a 7 (‘extremely light’) on the Borg scale and recalculated the paired t-test. There is still a significant difference in RPE between exercise intensities (t(38) = 19.65, P < 2.2e-16, d = 3.69; Author response image 6). We have corrected this in the manuscript accordingly and updated Figure 3C.
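
      The recoding and re-test described above can be sketched as follows. This assumes that only raw ratings of exactly 6 were recoded (before averaging across blocks) and reports Cohen's d_z (mean difference divided by the SD of the differences) as one common paired effect size; the response does not state which standardiser was used for d, so d_z here need not reproduce the reported value. The ratings are hypothetical.

```python
import math
from statistics import mean, stdev


def rescale_rpe(rating: float) -> float:
    """Recode the mistranslated lower anchor: a rating of 6 becomes 7 ('extremely light').

    ASSUMPTION: applied to raw ratings before averaging; all other values are kept.
    """
    return 7 if rating == 6 else rating


def paired_t(low: list[float], high: list[float]) -> tuple[float, int, float]:
    """Paired t statistic, degrees of freedom, and Cohen's d_z for high - low."""
    if len(low) != len(high) or len(low) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [b - a for a, b in zip(low, high)]
    m, s, n = mean(diffs), stdev(diffs), len(diffs)
    t = m / (s / math.sqrt(n))
    return t, n - 1, m / s  # d_z = mean difference / SD of differences


if __name__ == "__main__":
    rpe_low = [6, 8, 9, 10]      # hypothetical per-participant ratings
    rpe_high = [14, 15, 16, 15]
    t, df, dz = paired_t([rescale_rpe(r) for r in rpe_low], rpe_high)
    print(f"t({df}) = {t:.2f}, d_z = {dz:.2f}")  # t(3) = 13.00, d_z = 6.50
```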

      Author response image 6.

      Raincloud plot of rating of perceived exertion (RPE) on the Borg scale during low (green) and high (purple) intensity exercise. Individual data points depict subject-specific averages across blocks. A rating of 6 reflects ‘no exertion’ and 20 reflects ‘maximal exertion’.

      (35) Line 171 - is (37, 38) a reference?

      We apologise for this oversight and have now updated the references to be displayed correctly.

      (36) Line 176-18 - is this interaction sufficiently powered? Differences between sexes are not mentioned in the pre-registered study.

      We have conducted an additional post-hoc power analysis for the interaction of drug, fitness level, and sex on differential heat pain ratings. We used simulation-based power analysis for mixed models as implemented in the powerCurve function of the R package simr, with 1000 simulations. This revealed that a sample size of n = 27 would have been sufficient to detect this effect with a power of 0.8 (Author response image 7). Despite not having preregistered the factor ‘sex’, we believe that the observed results provide valuable insights that contribute to a deeper understanding of the data. We have labelled these analyses as exploratory, emphasising the need for caution in their interpretation. However, we feel it is essential to report these findings to inform future studies, ensuring that such factors are adequately considered.
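
      For readers unfamiliar with simulation-based power analysis: roughly, simr repeatedly simulates new responses from a fitted mixed model, refits the model, and counts how often the effect of interest is significant; powerCurve repeats this across a range of sample sizes. The sketch below shows the same logic on a deliberately simpler design (a paired difference with a hypothetical standardised effect size), not the authors' drug × fitness level × sex mixed model. The effect size, α = 0.05 and the target power of 0.8 (that is, 1 − β) are illustrative assumptions, and the normal critical value is a simplification of the t distribution.

```python
import math
import random
from statistics import NormalDist
from typing import Optional


def simulated_power(effect_size: float, n: int, n_sims: int = 2000,
                    alpha: float = 0.05, seed: int = 0) -> float:
    """Monte Carlo power of a two-sided paired-difference (one-sample) t-test.

    Each simulated data set draws n differences from N(effect_size, 1).
    For simplicity the critical value is taken from the standard normal
    instead of the t distribution, which slightly inflates alpha at small n.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        diffs = [rng.gauss(effect_size, 1.0) for _ in range(n)]
        m = sum(diffs) / n
        sd = math.sqrt(sum((d - m) ** 2 for d in diffs) / (n - 1))
        t = m / (sd / math.sqrt(n))
        if abs(t) > crit:
            hits += 1
    return hits / n_sims


def smallest_n_for_power(effect_size: float, target: float = 0.8,
                         n_values=range(5, 101), **sim_kwargs) -> Optional[int]:
    """Walk a power curve and return the first n whose simulated power reaches target."""
    for n in n_values:
        if simulated_power(effect_size, n, **sim_kwargs) >= target:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical standardised effect size of 0.5, target power 0.8, alpha 0.05.
    print(smallest_n_for_power(0.5, n_sims=1000))
```

      A post-hoc analysis of this kind answers "how large a sample would detect an effect of this assumed size", so its result is only as informative as the effect size fed into the simulation.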

      Author response image 7.

      Post-hoc power analysis for behavioural effects from the linear mixed effects (LMER) model with the interaction of drug, fitness level, and sex, using the powerCurve function of the R package simr with a target power of 0.8 and 1000 simulations.

      (37) Line 227 - this is not what this analysis shows. The comparison is low vs high-intensity exercise on pain modulation, not exercise vs. no exercise. You cannot conclude that aerobic exercise has no effect on pain modulation because you did not do that comparison (i.e. no baseline (without exercise) for pain).

      We agree with the reviewer and have rephrased the sub-headline accordingly to reflect that there is no difference in exercise-induced hypoalgesia between HI and LI aerobic exercise.

      (38) Methods General - why was a control condition not used, or at least a baseline pain response, so that low/high-intensity exercise could be compared to a baseline? Given this, I'm not sure I agree with the study conclusions (abstract: 'These results indicate that aerobic exercise has no overall effect on pain in a mixed population sample') because you have compared high vs low-intensity exercise, not exercise vs. no exercise.

      As for the lack of a resting control condition, we acknowledge that our study was not designed to test the overall effect of exercise versus no exercise. Our primary objective was to compare different exercise intensities, hypothesising that low-intensity (LI) exercise would induce less pain modulation than high-intensity (HI) exercise. By exploring this, we aimed to enhance understanding of the dose-response relationship between exercise and pain modulation. To better reflect this focus, we have revised the misleading phrasing regarding the ‘overall’ effect of exercise to clearly emphasise our primary aim: comparing HI and LI exercise. The reviewer suggests an interesting interpretation of the data, namely that exercise-induced hypoalgesia might have occurred at both exercise intensities, since the pain ratings provided were lower than the intensities anticipated from the calibration. That this difference is smaller in the naloxone (NLX) condition could provide evidence of opioidergic mechanisms underlying this effect.

      Unfortunately, the current study is not designed to comprehensively answer this question since there was no resting control condition. In particular, the lower pain ratings under SAL (Figure 6) could be due to exercise triggering the descending pain modulatory system (DPMS), but equally due to the default activation of the DPMS. Only an additional “no exercise” condition could disentangle this. Furthermore, habituation to noxious stimuli can influence pain ratings, resulting in lower pain ratings during the experiment as compared to the calibration.

      (39) Line 285 - or that better-trained individuals have a greater EIH response to higher intensity exercise, but both those of low and high fitness have established EIH after low intensity exercise. Given there isn't a 'no exercise' baseline, it is hard to make conclusions about EIH effect generally, only comparisons between high/low exercise intensity.

      We thank the reviewer for their comment. We agree that we cannot establish whether all participants showed a hypoalgesic response to the LI exercise with the current study design. However, our results show that participants with higher fitness levels showed increased hypoalgesia after HI exercise compared to those with lower fitness levels. We have refined the sentence accordingly.

      (40) Figure 7A - the regression line here is not that convincing.

      We acknowledge the reviewer’s concern regarding the regression line. However, it is important to note that the significant main effect of fitness level on differences in pain ratings in the SAL condition (β = 6.45, CI [1.25, 11.65], SE = 2.56, t(38) = 2.52, P = 0.02) supports the assertion that higher fitness levels are associated with greater hypoalgesia following HI exercise compared to LI exercise. While the trend may not be visible for all data points, the statistical analysis provides a robust basis for the observed relationship (r = 0.33, P = 0.038).

      (41) Line 354 - the NLX infusion was double-blind, but what are the implications of participants knowing that they completed high/low-intensity exercise - this cannot be blinded.

      The reviewer is correct that the exercise intensities cannot be blinded. To account for potential expectation effects of exercise on several psychological and physiological domains (including pain), participants completed a questionnaire on the calibration day in which they indicated to what extent they expected acute exercise to affect several domains (Lindheimer et al., 2019). They rated each domain on a Likert scale ranging from ‘large decrease’ (-3) to ‘large increase’ (3), with 0 denoting ‘no effect’. This format was chosen to measure both the direction and magnitude of expectation effects and to avoid being directive or suggestive (Lindheimer et al., 2019). Although the questionnaire also covered other psychological and physiological domains (i.e., stress, anxiety, energy, memory), we focused on the specific pain domains (muscle pain, joint pain, and whole-body pain) to establish participants’ expectations regarding the effect of acute exercise on pain. We tested whether the expectation ratings for each pain type differed significantly from 0 (no effect) using one-sample t-tests.
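
      A minimal sketch of the one-sample t-test described above, written in Python only for illustration. The ratings are hypothetical, not study data, and the function name `one_sample_t` is illustrative. The test compares the mean Likert rating against 0 (‘no effect’), using the sample standard deviation so that SE = SD/√n.

```python
import math
import statistics

def one_sample_t(ratings, mu0=0.0):
    """Return (t, df, mean, se) for a one-sample t-test of ratings against mu0."""
    n = len(ratings)
    mean = statistics.fmean(ratings)
    sd = statistics.stdev(ratings)      # sample SD (n - 1 denominator)
    se = sd / math.sqrt(n)              # standard error of the mean
    t = (mean - mu0) / se
    return t, n - 1, mean, se

# Hypothetical Likert ratings (-3 = large decrease ... +3 = large increase)
ratings = [0, 1, -1, 0, 2, 0, 1, -1, 0, 1]
t, df, mean, se = one_sample_t(ratings)
print(f"t({df}) = {t:.2f}, M = {mean:.2f}, SE = {se:.2f}")
# prints: t(9) = 1.00, M = 0.30, SE = 0.30
```

      The two-sided P value additionally requires the t-distribution CDF, which `scipy.stats.ttest_1samp(ratings, 0)` provides.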

      There was no significant effect for muscle pain (t(38) = 1.78, P = 0.08, M = 0.39, SE = 0.12), joint pain (t(38) = -0.12, P = 0.90, M = -0.03, SE = 0.11), or whole-body pain (t(38) = -1.05, P = 0.30, M = -0.21, SE = 0.12), suggesting no expectation effect on these pain domains in the overall sample (Supplemental Figure S10A). Because the ratings varied across participants, we calculated the correlation of the expectation ratings in the different pain domains with the difference score between the pain ratings in the SAL condition (LI – HI rating; Supplemental Figure S10B). This analysis yielded no significant correlation in any of the pain domains (joint pain: r = 0.11, P = 0.49; muscle pain: r = -0.07, P = 0.68; whole-body pain: r = 0.07, P = 0.68).
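
      The correlation analysis above can be sketched as follows: a per-participant difference score (LI rating minus HI rating) is correlated with the expectation rating using Pearson’s r. This is an illustrative sketch with hypothetical data for four participants; the function names are not from the study’s analysis code.

```python
import math

def difference_scores(li_ratings, hi_ratings):
    """Per-participant difference score: LI rating minus HI rating (SAL condition)."""
    return [li - hi for li, hi in zip(li_ratings, hi_ratings)]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data (not study data)
li = [50, 55, 60, 45]        # pain ratings after LI exercise
hi = [48, 50, 52, 44]        # pain ratings after HI exercise
expectation = [0, 1, 2, 0]   # expected effect of exercise on pain (-3..+3)

diffs = difference_scores(li, hi)   # [2, 5, 8, 1]
r = pearson_r(expectation, diffs)   # strongly positive for this toy data
```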

      Moreover, given that we did not find a difference in pain modulation between the exercise intensities, expectation effects are unlikely to have contributed to this null effect.

      (42) Line 356-358 - and this comparison (and primary hypothesis) is not blinded.

      While we agree with the reviewer that this comparison is not – and potentially cannot be – blinded, we would like to reiterate the results from the previous paragraph, which indicate that expectation effects of exercise on pain were not present in the sample and thus did not seem to have influenced the results. Note that the double-blind design of our study pertains specifically to the pharmacological intervention.

      (43) Line 358-360 - this could be explained by both types of exercise inducing EIH via the same mechanism (which is disrupted by NLX).

      We thank the reviewer for their comment and would like to refer back to the reviewer's comment number 38 for a response to this.

      (44) Line 360-361 - this conclusion cannot be drawn, because you have only compared high vs low intensity exercise. So, the conclusion should be 'These results suggest that there is no difference between high and low aerobic exercise intensity on heat-induced pain'.

      We agree with the reviewer and have rephrased the sentence to reflect the claim accurately.

      (45) Line 396 - as previously discussed, this conclusion cannot be drawn through this study design.

      We agree with the reviewer and have rephrased the subheading to reflect that there is no difference in exercise-induced hypoalgesia between HI and LI aerobic exercise.

      (46) Line 399 - please expand on this point - it is critical to the hypothesis and should also be included in the introduction. What intensities/duration/dose of aerobic exercise is generally established to cause EIH?

      We thank the reviewer and agree that this is a crucial aspect that requires further specification. Below we have expanded on the duration/intensities shown to elicit exercise-induced hypoalgesia and included a concise version of this detailed paragraph in the manuscript introduction.

      For aerobic exercise, different methods have been employed to determine exercise intensity levels i.e., through the VO2max, age-predicted HRmax, or incremental intensities (Koltyn, 2002). Most studies using VO2max as a measure of exercise intensity (Koltyn et al., 1996; Micalos & Arendt-Nielsen, 2016; Vaegter et al., 2014) were able to induce hypoalgesia with HI levels ranging between 65%-75% VO2max. When using the HRmax as a measure of determining exercise intensities, HI exercise at 70%-75% of the HRmax has been shown to produce greater hypoalgesia compared to moderate intensity at 50% HRmax (Naugle et al., 2014; Vaegter et al., 2014). Furthermore, previous research has suggested that HI exercise produces greater hypoalgesia compared to LI exercise (60-70% HRmax vs. light activity: M. D. Jones et al., 2019; 70% vs. 50% HRmax: Naugle et al., 2014; 75% vs. 50% VO2max: Vaegter et al., 2014).
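
      To make the %HRmax intensities above concrete, the following sketch converts them into target heart rates. It assumes the widely used age-predicted estimate HRmax = 220 − age. Other published estimates give different values, and the studies cited may have used other formulas. The participant age is hypothetical.

```python
def age_predicted_hrmax(age_years):
    """Age-predicted maximal heart rate (bpm).
    Assumption: the common 220 - age estimate; other formulas exist."""
    return 220 - age_years

def target_heart_rate(age_years, fraction_of_hrmax):
    """Target heart rate (bpm) at a given fraction of age-predicted HRmax."""
    return fraction_of_hrmax * age_predicted_hrmax(age_years)

# Hypothetical 30-year-old participant (HRmax estimate: 190 bpm)
hi_target = target_heart_rate(30, 0.70)   # about 133 bpm (70% HRmax)
li_target = target_heart_rate(30, 0.50)   # 95 bpm (50% HRmax)
```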

      Furthermore, a range of durations can be regarded as suitable, with durations between 8 minutes and 2 hours of aerobic exercise having been shown to induce hypoalgesia (for review see Koltyn (2002)). Hoffman et al. (2004) showed a hypoalgesic response after 30 minutes but not after 10 minutes of cycling at 75% VO2max. In contrast, other studies were able to induce hypoalgesia with 10-15 minutes of HI aerobic exercise (75% VO2max: Gomolka et al., 2019; 63% VO2max: Gurevich et al., 1994; self-paced: Haier et al., 1981; 60-70% HRmax: Jones et al., 2019; 85% HRmax: Sternberg et al., 2001; 75% VO2max: Vaegter et al., 2015).

      (47) Line 400-401 - please define high intensity.

      We thank the reviewer for their comment. The referenced studies by Vaegter et al. (2014) and Jones et al. (2019) based the estimation of HI and LI exercise on an age-related target heart rate corresponding to VO2max and HRmax, respectively. In Vaegter et al. (2014), the HI condition corresponded to 75% VO2max, while the LI to 50% VO2max. In Jones et al. (2019), the HI exercise condition corresponded to 60% and 70% of HRmax, while the LI condition was defined as pedalling slowly against a light resistance of 0.5 kg of force to maintain a rating of perceived exertion (RPE) not above resting. We have included this clarification in the relevant section to elucidate the intensities of the chosen exercise conditions.

      (48) Line 403-405 - I'm not sure I follow (perhaps I have misunderstood) - pain induction was completed after exercise in the MRI scanner, so there was no distraction effect of exercise in either condition. A baseline could have been established in the same way and there would be exactly the same conditions, just without prior exercise.

      We agree with the reviewer that a resting baseline condition in the context of exercise-induced pain modulation would allow investigation of a potential hypoalgesic effect of exercise compared to no exercise. Nevertheless, previous studies (Brooks et al., 2017; Sprenger et al., 2012) have shown that cognitive pain modulation is mediated by endogenous opioids. Therefore, tasks with different attentional loads could influence post-task pain ratings. Although we agree with the reviewer that the effect of distraction or attentional load would be minimal in the MR scanner, exercise and no exercise could still impose different cognitive loads. We therefore focused on investigating the dose-response relationship between different exercise intensities, where an ‘active’ control condition may contribute to a more nuanced understanding of exercise-induced pain modulation.

      (49) Line 403-411 - this is fine (although I do not agree that this was the best methodological decision), however, it does limit the conclusions that can be drawn (as previously mentioned). That is, you cannot conclude that no EIH occurred, only that there was no difference between low and high-intensity exercise in post-exercise pain response.

      We agree with the reviewer that the comparison of HI vs. LI exercise does not allow for an interpretation of the overall effect of exercise as opposed to no exercise on pain modulation. The comparison of HI and LI exercise allows the investigation of a dose-response relationship of these distinct exercise intensities. While LI exercise might not be a 'pure' control condition in the traditional sense, it is valuable for exploring the complexities of exercise and pain interaction.

      (50) Line 419-422 - sorry I do not follow - you say that moderate intensity exercise most reliably induces EIH but then select exercise intensities that are likely to be in the heavy or severe intensity domain? Please also include in this discussion the limitations of FTP20 as a threshold marker (see Wong et al) and the implications on the results/conclusions.

      We thank the reviewer for their comment. In the referenced sentence, we have defined the HI exercise as described in the reviews. Specifically, Wewege and Jones (2020) reported hypoalgesia to be greater after higher-intensity exercise, although the intensity was not further specified. Naugle et al. (2012) noted that HI exercise (i.e., 75% of VO2max) produced greater hypoalgesia, while Koltyn (2002) indicated that hypoalgesia occurs at intensities ranging from 60% to 75% of VO2max but more reliably at 75% VO2max or higher. Consequently, we have removed the term ‘moderate’, as it does not accurately reflect what has been reported in the reviews and could be misleading. Moreover, we have clarified the specific criteria for what is considered high (or higher) intensity exercise in the referenced reviews.

      We kindly ask the reviewer to refer back to the previous comment (reviewer comment number 28) regarding the discussion of the intensity domains and the FTP20 test as a demarcation point for these intensity domains.

      (51) Line 422-425 - indeed, pacing is an important element of this test, which inexperienced cyclists have difficulty with when they are not provided with proper familiarisation.

      We agree with the reviewer that the FTP20 test has mainly been validated and employed in experienced cyclists and requires further validation in non-athletes of both sexes. However, since we used an extensive warm-up period, several paced steps (intervals, 5-minute time-trial) and recovery periods (Supplemental Table S1) based on McGrath et al. (2019), we propose that participants were thoroughly familiarised with pacing before the FTP was estimated in the 20-minute test. On average, participants showed a variation of M = 21.80 Watts (SE = 1.44 Watts) during the 20-minute paced FTP20 test (Supplemental Figure S11A). Interestingly, our data suggest that participants with a higher FTP showed greater variation in power output (Watts) during the 20-minute FTP test than individuals with lower fitness levels (Supplemental Figure S11B).

      (52) Line 425-427 - please remove this, the RPE difference between exercise bouts is not evidence that participants cycled at FTP.

      We thank the reviewer for their comment. However, we would propose to include the rating of perceived exertion (RPE) since it shows that the exercise intensities have been perceived as significantly different by the participants. This behavioural measure of exertion is potentially important for a broader audience to understand the exercise implementation beyond physiological markers.

      (53) Line 432 - high vs. low-intensity aerobic exercise

      We have changed the sentence accordingly to support the claim of the study that there was no difference in exercise-induced pain modulation between HI and LI aerobic exercise.

      (54) Line 447-449 - this seems contradictory to the first line of this paragraph (430-432) - i.e. that the heterogenous sample may have caused the null finding. Why deliberately select a participant sample that is likely to lead to a null effect?

      In the current study, we aimed to include participants of diverse fitness levels and both sexes to verify the findings on exercise-induced pain modulation in a broader population. We consider this important concerning translational aspects of EIH. Indeed, our heterogeneous sample may have ‘caused’ the observed null effect, but at the same time, it suggests that more homogenous (sometimes composed solely of male athletes) samples employed in many earlier studies might have skewed the understanding of exercise-induced pain modulation and thus unintentionally suggested a (non-existing) generalisation of this effect to the general population.

      (55) Line 532-456 - although Koltyn found electrical pain to have the greatest effect?

      The review by Naugle et al. (2012) reported effect sizes for heat (Cohen's d = 0.59) and pressure pain intensity (d = 0.69) following aerobic exercise but did not provide effect sizes for electrical pain intensity. They noted that the effect size for electrical pain intensity after isometric exercise was d = 0.40, which is lower than that for heat and pressure pain. While Koltyn (2002) stated that electrical and pressure stimuli induce exercise-induced hypoalgesia more consistently than thermal pain, the review did not clarify whether this applies to pain threshold, intensity, or tolerance, nor did it provide effect sizes. Given that electrical, pressure, and heat pain are the most commonly used methods to induce quantifiable pain in the context of exercise studies (Vaegter and Jones, 2020), we based our decision to use heat and pressure pain primarily on Naugle et al.'s findings.

      (56) Line 468-469 - why leave out content that was pre-registered (i.e. the difference between pressure and heat pain) but include analyses that were not (i.e. sex differences)? If a study is going to be pre-registered, then isn't it important to follow that design?

      We thank the reviewer for this comment. We have conducted the study adhering to the preregistered study design and now report the results for pressure pain (Supplemental Figure S1). Some of the preregistered analyses (i.e. directly comparing heat and pressure pain) were beyond the scope of the current study and will be reported separately.

      (57) Line 532-525 - and how could this have been accounted for?

      We apologise for any confusion, as we are unsure about the specific reference the reviewer is making based on the provided line numbers. We believe the question relates to how the potential effects of endocannabinoids were considered in the current study design, which we address here. In human studies, it is not possible to centrally block endocannabinoids, which makes it difficult to directly estimate their role in exercise-induced pain modulation in humans. Measuring endocannabinoids in the blood might not adequately capture changes in endocannabinoid levels in the brain across the different exercise intensity conditions. Despite these limitations, exploring the role of endocannabinoids in exercise-induced pain modulation is a promising avenue for future research that could enhance our understanding of pain mechanisms and improve pain management strategies.

      (58) Limitations (general) - please include the other limitations discussed in this review.

      Done.

      (59) Line 530 - please amend this conclusion, in line with previous comments.

      Done.

      We would like to thank the reviewer for critically evaluating the manuscript and providing insightful comments. We appreciate the reviewer recognising the strengths of our work and believe that their suggestions will contribute to improving the quality of the manuscript.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review): 

      Devakinandan et al. present a revised version of their manuscript. Their scRNA-seq data is a valuable resource to the community, and they further validate their findings via in situ hybridizations and electron microscopy. Overall, they have addressed my major concerns. I only have two minor comments. 

      (1) The authors note in Figures 4I and 4K that because the number of C2 V2Rs or H2-Mv receptors increased while the normalized expression of Gnao1 remained constant (and likewise for V1Rs and Gnai2 in Figure 4-S4C), their results are unlikely to be capturing doublets. I'm not sure that this is the case. If the authors added together two V2R cells, the total count of every gene might double, but the normalized expression of Gnao1 would remain the same. To address this concern, the authors should also show the raw counts for Gnao1 as well as the total number of UMIs for these cells. 

      In Figure 4I, 4K and Figure 4-Figure supplement 4C, on Y-axis, we plotted the sum of normalized counts of all V1R/V2R/H2-Mv genes expressed in each cell along with the normalized expression value of Gnao1/Gnai2. Both VR/H2-Mv and Gnao1/Gnai2 are normalized values, with normalization based on LogNormalize (mentioned in methods). We show here plots of total expression calculated from raw counts corresponding to the same Figure. Raw counts of VRs/H2-Mv, Gnao1/Gnai2 are plotted separately due to difference in scale. The overall trend matches normalized counts, with minor fluctuations in Gnao1/Gnai2.     
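
      Seurat’s LogNormalize divides each gene’s count by the cell’s total count, multiplies by a scale factor (10,000 by default) and applies the natural log(1 + x). The minimal Python sketch below implements that formula with hypothetical counts to illustrate the reviewer’s point: if every count in a cell doubles, as in an idealized doublet of two similar cells, the raw counts and total UMIs double while the normalized value does not change. Real doublets combine two different expression profiles, so this is a simplification.

```python
import math

def log_normalize(counts, scale_factor=10_000):
    """LogNormalize-style transform: ln(1 + count / total_counts * scale_factor)."""
    total = sum(counts.values())
    return {gene: math.log1p(c / total * scale_factor) for gene, c in counts.items()}

# Hypothetical single VSN (illustrative counts, not from the dataset)
cell = {"Gnao1": 20, "Vmn2r1": 15, "Other": 965}
# Idealized doublet of two near-identical cells: every count doubles
doublet = {g: 2 * c for g, c in cell.items()}

raw_ratio = doublet["Gnao1"] / cell["Gnao1"]           # 2.0: raw counts double
umi_ratio = sum(doublet.values()) / sum(cell.values()) # 2.0: total UMIs double
norm_cell = log_normalize(cell)["Gnao1"]
norm_doublet = log_normalize(doublet)["Gnao1"]         # equal to norm_cell
```

      This is why raw counts and total UMIs per cell, rather than normalized values alone, are informative about possible doublets.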

      Author response image 1.

      As mentioned in our response to the version-1 reviews and in our manuscript, doublets are generally a random combination of two cells, and the probability that a combinatorial pattern is due to a doublet is proportional to the abundance of cells expressing those genes. It is possible that some of the family-C V2R combinations represented by 2 cells are doublets because of their widespread expression. The frequency of combinatorial expression patterns above a set threshold of 2 cells that we observed for family-ABD V2Rs or V1Rs (supplementary tables 7, 8) indicates co-expression and is unlikely to arise from random doublets. For instance, 134 cells express two V1Rs, of which 44 cells express Vmn1r85+Vmn1r86, 21 cells express Vmn1r184+Vmn1r185, 13 express Vmn1r56+Vmn1r57, and 6 express Vmn1r168+Vmn1r177. Some of the co-expression combinations we reported were also identified and verified experimentally in Lee et al., 2019 and Hills et al., 2024.
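
      The abundance argument can be made concrete with a simple model. If doublets form by random, independent pairing of cells, the probability that a doublet contains one cell of population A and one of a different population B is 2·fA·fB, where fA and fB are the fractions of cells in each population. The doublet count and fractions below are hypothetical, and the function name is illustrative.

```python
def expected_heterotypic_doublets(n_doublets, frac_a, frac_b):
    """Expected number of doublets pairing one cell of population A with one of a
    different population B, assuming random, independent pairing of cells."""
    return n_doublets * 2 * frac_a * frac_b

# Hypothetical: 500 doublets in a dataset
rare_pair = expected_heterotypic_doublets(500, 0.01, 0.01)   # ~0.1 expected doublets
common_pair = expected_heterotypic_doublets(500, 0.3, 0.3)   # ~90 expected doublets
```

      Under this model, combinations of rarely expressed receptors seen in many cells are hard to explain as doublets, whereas combinations of widely expressed genes (such as family-C V2Rs) can plausibly arise from doublets.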

      The co-expression of multiple family-C2 V2Rs (Vmn2r2-Vmn2r7) along with ABD V2Rs per cell as shown in our data, has been shown experimentally in earlier studies.      

      (2) As requested, the authors have now added a colorbar to the pseudocolored images in Figures 7. However, this colorbar still doesn't have any units. Can the authors add some units, or clarify in the methods how the raw data relates to the colors (e.g. is it mapped linearly, at a logscale, with gamma or other adjustments, etc.)? Moreover, it's also unclear what the dots in the backgrounds of plots like Figure 7E mean. Are they pixels? Showing the individual lines, the average for each animal, or omitting them entirely, might make more sense. 

      We used the Fire LUT with a linear scale in Fiji/ImageJ to assign the color scale to the pseudo-colored images in Figure 7. We will include this description in our methods and thank the reviewer for pointing it out. As stated in the Figure 7 legend, the dots in the background are fluorescence intensity values normalized to a 0-1 scale and color coded for each antibody. The trendline was fitted to these values.  
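
      The response does not state how intensities were scaled to 0-1. The sketch below assumes min-max scaling followed by a linear mapping (no gamma or log adjustment) onto a 256-entry lookup table such as Fiji’s Fire LUT. Both function names are hypothetical.

```python
def minmax_normalize(values):
    """Scale intensities linearly to [0, 1] (assumption: min-max scaling)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

def lut_index(norm_value, n_colors=256):
    """Linearly map a value in [0, 1] onto LUT entries 0..n_colors-1 (no gamma)."""
    return min(int(norm_value * n_colors), n_colors - 1)

# Hypothetical raw fluorescence intensities
norm = minmax_normalize([10, 20, 30])        # [0.0, 0.5, 1.0]
indices = [lut_index(v) for v in norm]       # [0, 128, 255]
```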

      Reviewer #2 (Public review): 

      Summary: 

      The study focuses on the vomeronasal organ, the peripheral chemosensory organ of the accessory olfactory system, by employing single-cell transcriptomics. The author analyzed the mouse vomeronasal organ, identifying diverse cell types through their unique gene expression patterns. Developmental gene expression analysis revealed that two classes of sensory neurons diverge in their maturation from common progenitors, marked by specific transient and persistent transcription factors. A comparative study between major neuronal subtypes, which differ in their G-protein sensory receptor families and G-protein subunits (Gnai2 and Gnao1, respectively), highlighted a higher expression of endoplasmic reticulum (ER) associated genes in Gnao1 neurons. Moreover, distinct differences in ER content and ultrastructure suggest some intriguing roles of ER in Gnao1-positive vomeronasal neurons. This work is likely to provide useful data for the community and is conceptually novel with the unique role of ER in a subset of vomeronasal neurons. This reviewer has some minor concerns and some suggestions to improve the manuscript. 

      Strengths: 

      (1) The study identified diverse cell types based on unique gene expression patterns, using single-cell transcriptomics. 

      (2) The analysis suggests that two classes of sensory neurons diverge during maturation from common progenitors, characterized by specific transient and persistent transcription factors. 

      (3) A comparative study highlighted differences in Gnai2- and Gnao1-positive sensory neurons. 

      (4) Higher expression of endoplasmic reticulum (ER) associated genes in Gnao1 neurons. 

      (5) Distinct differences in ER content and ultrastructure suggest unique roles of ER in Gnao1-positive vomeronasal neurons. 

      (6) The research provides conceptually novel insights into the unique role of ER in a subset of vomeronasal neurons, which are valuable to the community. 

      Reviewer #3 (Public review): 

      Summary: 

      In this manuscript, Devakinandan and colleagues have undertaken a thorough characterization of the cell types of the mouse vomeronasal organ, focusing on the vomeronasal sensory neurons (VSNs). VSNs are known to arise from a common pool of progenitors that differentiate into two distinct populations characterized by the expression of either the G protein subunit Gnao1 or Gnai2. Using single-cell RNA sequencing followed by unsupervised clustering of the transcriptome data, the authors identified three Gnai2+ VSN subtypes and a single Gnao1+ VSN type. To study VSN developmental trajectories, Devakinandan and colleagues took advantage of the constant renewal of the neuronal VSN pool, which allowed them to harvest all maturation states. All neurons were re-clustered and a pseudotime analysis was performed. The analysis revealed the emergence of two pools of Gap43+ clusters from a common lineage, which differentiate into many subclusters of mature Gnao1+ and Gnai2+ VSNs. By comparing the transcriptomes of these two pools of immature VSNs, the authors identified a number of differentially expressed transcription factors in addition to known markers. Next, by comparing the transcriptomes of mature Gnao1+ and Gnai2+ VSNs, the authors report an enrichment of ER-related genes in Gnao1+ VSNs. Using electron microscopy, they found that this enrichment was associated with specific ER morphology in Gnao1+ neurons. Finally, the authors characterized chemosensory receptor expression and co-expression (as well as H2-Mv proteins) in mature VSNs, which recapitulated known patterns. 

      Strengths: 

      The data presented here provide new and interesting perspectives on the distinguishing features between Gnao1+ and Gnai2+ VSNs. These features include newly identified markers, such as transcription factors, as well as an unsuspected ER-related peculiarity in Gnao1+ neurons, consisting in a hypertrophic ER and an enrichment in ER-related genes. In addition, the authors provide a comprehensive picture of specific co-expression patterns of V2R chemoreceptors and H2-Mv genes. 

      Importantly, the authors provide a browser (scVNOexplorer) for anyone to explore the data, including gene expression and co-expression, number and proportion of cells, with a variety of graphical tools (violin plots, feature plots, dot plots, ...). 


      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Devakinandan and colleagues present a manuscript analyzing single-cell RNA-sequencing data from the mouse vomeronasal organ. The main advances in this manuscript are to identify and verify the differential expression of genes that distinguish apical and basal vomeronasal neurons. The authors also identify the enriched expression of ER-related genes in Gnao1 neurons, which they verify with in situ hybridizations and immunostaining, and also explore via electron microscopy. Finally, the results of this manuscript are presented in an online R shiny app. Overall, these data are a useful resource to the community. I have a few concerns about the manuscript, which I've listed below. 

      General Concerns: 

      (1) The authors mention that they were unable to identify the cells in cluster 13. This cluster looks similar to the "secretory VSN" subtype described in a recent preprint from C. Ron Yu's lab (10.1101/2024.02.22.581574). The authors could try comparing or integrating their data with this dataset (or that in Katreddi et al. 2022) to see if this is a common cell type across datasets (or arises from a specific type of cell doublets). In situ hybridizations for some of the marker genes for this cluster could also highlight where in the VNO these cells reside. 

      Cluster-13 (Obp2a+) cells identified in our study have gene expression markers similar to those of the “putative secretory” cells described in Hills et al. When that manuscript became publicly available, our manuscript had already been submitted. We have now performed RNA-ISH for Obp2a, the top marker identified for this cluster, and found it to be expressed in cells of the glandular tissue on the non-sensory side. Some of the other markers associated with this cluster, such as Obp2b and Lcn3, belong to the lipocalin family of proteins. Hence, in our estimate, these markers collectively represent non-sensory glandular tissue. We have added the Obp2a RNA-ISH to Figure 2-figure supplement-1A and to the Results section of our revised manuscript. Cluster-13 also contains cells expressing Vmn1r37, which is typically expressed in neuronal cells. However, we do not detect Obp2a mRNA in the sensory epithelium. It is possible that cluster-13 comprises a heterogeneous mixture of cells, some of which are clearly non-sensory cells from glandular tissue co-clustered with other cell types; it is also possible that Obp2a is expressed in neurons below the detection level of our assay, which will require further experiments. As we have no basis to confidently assign this cluster as a neuronal cell type, we excluded it from the downstream analysis of neurons. 

      We used the data from Hills et al., to compare co-expression characteristic of V2Rs, which is added as Figure 3-figure supplement 3. 

      (2) I found the UMAPs for the neurons somewhat difficult to interpret. Unlike Katreddi et al. 2022 or Hills et al. 2024, it's tricky to follow the developmental trajectories of the cells in the UMAP space. Perhaps the authors could try re-embedding the data using gene sets that don't include the receptors? It would also be interesting to see if the neuron clusters still cluster by receptor-type even when the receptors are excluded from the gene sets used for clustering. Plots relating the original clusters to the neuronal clusters, or dot plots showing marker gene expression for the neuronal clusters might both be useful. For example, right now it's difficult to interpret clusters like n8-13. 

      a) We have revised the UMAP in Figure 3A, and labeled mature, immature, progenitor neurons so that it is easier to follow the developmental trajectory. 

      b) In our revised text we have explicitly drawn equivalence between neuronal clusters from Figure 1 to re-clustered neurons in subsequent figures (Figure 3 and 4 in revised submission). For developmental analysis, we merged mature Gnao1, Gnai2 neuronal subclusters to two major clusters that are equivalent to original neuronal clusters in Figure 1. As UMAP is an arbitrary representation of cells, we also show expression of markers for major neuronal cell types in Figure 1C and Figure 3-figure supplement 1B, helpful in making the connection.  

      c) The purpose of re-clustering at higher resolution was to identify sub-populations within Gnao1 and Gnai2 neurons. This was useful for interpreting mature Gnao1 neurons, where family-C Vmn2r and H2-Mv expression maps onto distinct subclusters. Along with the neuronal subclusters, revised Figure 3-figure supplement-1 includes a dot plot of gene expression markers. 

      d) In Figure 3-figure supplement-2, we show a comparison of neuronal clusters with and without VRs. Exclusion of VRs did not substantially alter mature neuron dichotomy into Gnao1/Gnai2. Only Gnao1 subclusters n1/n3 whose organization is dependent on family-C Vmn2r expression were affected, as well as redistribution of subcluster n8 from Gnai2 neurons. VR expression does not seem to be the primary determinant of VSN cluster identity.
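
      One way to run the comparison in (d) is to remove chemoreceptor genes from the feature list before dimensionality reduction and clustering. This Python sketch assumes that V1R and V2R genes follow the `Vmn1r<number>` / `Vmn2r<number>` naming. Pseudogene symbols with other suffixes would need their own pattern. The gene list and function name are hypothetical.

```python
import re

# Assumption: V1R/V2R genes are named Vmn1r<number> or Vmn2r<number>
VR_PATTERN = re.compile(r"^Vmn[12]r\d+")

def drop_receptor_genes(features):
    """Return the feature list without V1R/V2R chemoreceptor genes."""
    return [gene for gene in features if not VR_PATTERN.match(gene)]

# Hypothetical variable-feature list
features = ["Gnao1", "Vmn2r1", "Vmn1r85", "Gnai2", "Vmn2r7", "Obp2a"]
clustering_features = drop_receptor_genes(features)
# ['Gnao1', 'Gnai2', 'Obp2a']
```

      In Seurat, a filtered list like this could be passed to the `features` argument of `RunPCA` before neighbor finding and clustering.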

      Reviewer #2 (Public Review): 

      Summary: 

      The study focuses on the vomeronasal organ, the peripheral chemosensory organ of the accessory olfactory system, by employing single-cell transcriptomics. The author analyzed the mouse vomeronasal organ, identifying diverse cell types through their unique gene expression patterns. Developmental gene expression analysis revealed that two classes of sensory neurons diverge in their maturation from common progenitors, marked by specific transient and persistent transcription factors. A comparative study between major neuronal subtypes, which differ in their G-protein sensory receptor families and G-protein subunits (Gnai2 and Gnao1, respectively), highlighted a higher expression of endoplasmic reticulum (ER) associated genes in Gnao1 neurons. Moreover, distinct differences in ER content and ultrastructure suggest some intriguing roles of ER in Gnao1-positive vomeronasal neurons. This work is likely to provide useful data for the community and is conceptually novel with the unique role of ER in a subset of vomeronasal neurons. This reviewer has some minor concerns and some suggestions to improve the manuscript. 

      Strengths: 

      (1) The study identified diverse cell types based on unique gene expression patterns, using single-cell transcriptomics. 

      (2) The analysis suggests that two classes of sensory neurons diverge during maturation from common progenitors, characterized by specific transient and persistent transcription factors. 

      (3) A comparative study highlighted differences in Gnai2- and Gnao1-positive sensory neurons. 

      (4) Higher expression of endoplasmic reticulum (ER) associated genes in Gnao1 neurons. 

      (5) Distinct differences in ER content and ultrastructure suggest unique roles of ER in Gnao1-positive vomeronasal neurons. 

      (6) The research provides conceptually novel insights into the unique role of ER in a subset of vomeronasal neurons, which are valuable to the community. 

      Weaknesses: 

      (1) The connection between observations from sc RNA-seq and EM is unclear.

      (2) The lack of quantification for the ER phenotype is a concern. 

      We have extensively quantified the ER phenotype as shown in Figure 7, Figure 7-figure supplement-1 in our revised version. We would like to point out that the connection between scRNA-seq and EM was made due to our observations in the same figures, that levels of a number of ER luminal and ER membrane proteins were higher in Gnao1 compared to Gnai2 neurons. This led us to hypothesize a differential ER content or ultrastructure, which was verified by EM.

      Reviewer #3 (Public Review): 

      Summary: 

      In this manuscript, Devakinandan and colleagues have undertaken a thorough characterization of the cell types of the mouse vomeronasal organ, focusing on the vomeronasal sensory neurons (VSNs). VSNs are known to arise from a common pool of progenitors that differentiate into two distinct populations characterized by the expression of either the G protein subunit Gnao1 or Gnai2. Using single-cell RNA sequencing followed by unsupervised clustering of the transcriptome data, the authors identified three Gnai2+ VSN subtypes and a single Gnao1+ VSN type. To study VSN developmental trajectories, Devakinandan and colleagues took advantage of the constant renewal of the neuronal VSN pool, which allowed them to harvest all maturation states. All neurons were re-clustered and a pseudotime analysis was performed. The analysis revealed the emergence of two pools of Gap43+ clusters from a common lineage, which differentiate into many subclusters of mature Gnao1+ and Gnai2+ VSNs. By comparing the transcriptomes of these two pools of immature VSNs, the authors identified a number of differentially expressed transcription factors in addition to known markers. Next, by comparing the transcriptomes of mature Gnao1+ and Gnai2+ VSNs, the authors report the enrichment of ER-related genes in Gnao1+ VSNs. Using electron microscopy, they found that this enrichment was associated with specific ER morphology in Gnao1+ neurons. Finally, the authors characterized chemosensory receptor expression and coexpression (as well as H2-Mv proteins) in mature VSNs, which recapitulated known patterns. 

      Strengths: 

      The data presented here provide new and interesting perspectives on the distinguishing features between Gnao1+ and Gnai2+ VSNs. These features include newly identified markers, such as transcription factors, as well as an unsuspected ER-related peculiarity in Gnao1+ neurons, consisting of a hypertrophic ER and an enrichment in ER-related genes. In addition, the authors provide a comprehensive picture of specific co-expression patterns of V2R chemoreceptors and H2-Mv genes. 

      Importantly, the authors provide a browser (scVNOexplorer) for anyone to explore the data, including gene expression and co-expression, number and proportion of cells, with a variety of graphical tools (violin plots, feature plots, dot plots, ...). 

      Weaknesses: 

      The study still requires refined analyses of the data and rigorous quantification to support the main claims. 

      The method description for filtering and clustering single-cell RNA-sequencing data is incomplete. The Seurat package has many available pipelines for single-cell RNA-seq analysis, with a significant impact on the output data. How did the authors pre-process and normalize the data? Was the pipeline used with default settings? What batch correction method was applied to the data to mitigate possible sampling or technical effects? Moreover, the authors do not describe how cell and gene filtering was performed. The data in Figure 7-Supplement 3 show that one-sixth of the V1R-type neurons do not express any chemoreceptor, while over a hundred cells express more than one chemoreceptor. Do these cells have unusually high or low numbers of genes or counts? To exclude the possibility of a technical artifact in these observations, the authors should describe how they dealt with putative doublet cells or debris. Surprisingly, some clusters are characterized by the expression of specific chemoreceptors (VRs). Have these been used for clustering? If so, clustering should be repeated after excluding these receptors. 

      The identification of the VSN types should be consistent across the different analyses and validated. The data presented in Figure 1 lists four mature VSN types, whereas the re-clustering of neurons presented in Figure 3 leads to a different subdivision. At present, it remains unclear whether these clusters reflect the biology of the system or are due to over-clustering of the data, and therefore correspond to either noise or arbitrary splitting of continua. Clusters should be merged if they do not correspond to discrete categories of cells, and correspondence should be established between the different clustering analyses. To validate the detected clusters as cell types, markers characteristic of each of these populations can be evaluated by ISH or IHC. 

      There is a lack of quantification of imaging data, which provides little support for the ER-related main claim. Quantification of co-expression and statistics on labeling intensity or coverage would greatly strengthen the conclusions and the title of the paper. 

      a) scRNA-seq data analysis methods: Our revised submission has expanded the methods section with details of the parameters, filtering criteria and software used.

      b) Inclusion/exclusion of VRs: Figure 3-Figure supplement-2 of our revised submission shows a comparison of neuronal sub-clusters with and without VRs. Overall, sub-cluster identities were not affected by VR exclusion, except for Gnao1 sub-clusters n1/n3, which are governed by family-C Vmn2r1/Vmn2r2, and the redistribution of Gnai2 cluster n8. The minimal effect of VRs on Gnai2 sub-clustering is also confirmed by the absence of V1Rs from the dot plot showing markers of neuronal clusters. 

      c) Neuronal clusters and potential over-clustering: We pooled neuronal cells from Figure 1 and re-clustered them to identify sub-populations within Gnao1 and Gnai2 neurons. Several of the neuronal sub-clusters we identified, including progenitors, immature neurons and mature neurons, are validated by previous studies with well-known markers. Among the mature neurons, the biological basis of the four Gnao1 sub-clusters (n1-n4) is discussed in our co-expression section (Figure 4A-E), and these sub-clusters are also validated by previous experimental studies. These Gnao1 clusters are organized according to the expression of family-C V2Rs (Vmn2r1 or Vmn2r2) as well as H2-Mv genes. Within the Gnai2 sub-clusters, n12 and n13 exclusively express markers that distinguish them from n8-n11, as described in our revised version. However, n8-n11 do not have definitive markers, and whether these sub-clusters form part of a continuum or result from over-clustering will require further extensive experiments and analysis. We prefer to show all sub-clusters, including the Gnai2 sub-clusters, in Figure 3-Figure supplement-1, along with a dot plot of sub-cluster gene expression, so that these data are available for future experiments and analysis. We share the concern that some Gnai2 sub-clusters may not have an obvious biological basis at this time. Hence, in our revised submission, we have merged the mature Gnao1 and mature Gnai2 sub-clusters for the developmental analysis shown in Figure 3A. 

      d) Quantification of the ER phenotype: In our revised submission, we provide extensive quantification of the ER phenotype in Figure 7 and Figure 7-figure supplement-1.

      e) We think that the cells expressing zero as well as two V1Rs are real and cannot be attributed to debris or doublets for the following reasons:

      i) Cells expressing no V1Rs are not necessarily debris, because they express other neuronal markers at the same level as cells that express one or two V1Rs. For instance, Gnai2 expression is the same across cells expressing 0, 1 or 2 V1Rs, which we have included in Figure 4-figure supplement 4-C of our revised submission. The higher expression threshold used in our analysis may have somewhat increased the proportion of cells with zero V1Rs. Similarly, Gnao1 levels remain the same across cells expressing multiple V2Rs and H2-Mv genes, indicating that these cells are unlikely to be doublets (Figure 4 I-K). The frequency of each co-expression combination (Supplementary Tables 7 and 8) itself indicates whether that combination reflects genuine single cells or an artifact.

      ii) Cells co-expressing V1R genes: We listed the frequency of cells co-expressing V1R gene combinations in Supplementary Table 8. Among 134 cells that express two V1Rs, 44 express Vmn1r85+Vmn1r86, 21 express Vmn1r184+Vmn1r185, 13 express Vmn1r56+Vmn1r57, 6 express Vmn1r168+Vmn1r177, and so on. Doublets are generally random combinations of two cells. Here, each specific co-expression combination is represented by multiple cells, which is highly unlikely to arise by chance. Some of the co-expression combinations we reported were also identified and verified experimentally by Lee et al., 2019 and Hills et al., 2024.
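      The doublet argument can be made quantitative. The sketch below is an illustration only, not the analysis used in the manuscript: it assumes that, if two-receptor cells were doublets, each would be formed by pairing two single-V1R cells at random, and it computes the expected number of cells showing one specific receptor pair. The function name and the receptor frequencies in the example are hypothetical placeholders; real frequencies would come from the single-receptor counts underlying Supplementary Table 8.

```python
def expected_pair_count(single_counts, n_two_receptor_cells, a, b):
    """Expected number of cells showing the unordered receptor pair (a, b)
    if every two-receptor cell were a doublet of two randomly drawn
    single-receptor cells (draws carrying the same receptor are excluded).

    single_counts: receptor name -> number of single-receptor cells.
    """
    total = sum(single_counts.values())
    freq = {r: n / total for r, n in single_counts.items()}
    # Probability that two random draws carry different receptors.
    p_distinct = 1.0 - sum(f * f for f in freq.values())
    # The unordered pair {a, b} arises from the draws (a, b) or (b, a).
    p_pair = 2.0 * freq[a] * freq[b] / p_distinct
    return n_two_receptor_cells * p_pair


# Hypothetical example: 100 receptors, each detected in 50 single-receptor cells.
counts = {f"Vmn1r_hyp{i}": 50 for i in range(100)}
expected = expected_pair_count(counts, 134, "Vmn1r_hyp0", "Vmn1r_hyp1")
print(f"expected cells with this pair: {expected:.3f}")
```

      Under these toy assumptions the expected count for any one pair among 134 random doublets is about 0.03, far below the 44 cells observed for Vmn1r85+Vmn1r86. Skewed receptor frequencies would raise the expectation for common receptors, so a real comparison should use the measured frequencies.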

      Recommendations for the authors:

      Reviewing Editor (Recommendations for the Authors): 

      The editor had a query about the analysis of FPRs, which are a third family of sensory receptors in the rodent VNO. 

      In our study, FPRs were found to be expressed in subsets of Gnai2 and Gnao1 neurons as well as in non-neuronal cells. These can be easily searched in www.scvnoexplorer.com. For instance, Fpr1 and Fpr2 are expressed in immune cell clusters 2, 6, 8 and 10, whereas Fpr3 is expressed in Gnao1 sub-cluster n1. Consistent with earlier reports (10.1073/pnas.0904464106, 10.1038/nature08029), expression of Fpr-rs3, Fpr-rs4, Fpr-rs6 and Fpr-rs7 is restricted to Gnai2 neurons, of which Fpr-rs3 and Fpr-rs4 are limited to Tmbim1+ Gnai2 neurons.  

      Reviewer #1 (Recommendations For The Authors):

      (1) The reference to "genders" on page 3 should be changed to "sexes". 

      We have modified the text.   

      (2) Did the authors identify any Ascl1+ GBCs in their data? 

      Ascl1+ GBCs were identified and are now marked in Figure 3-figure supplement 1B of our revised version.

      (3) The plots in Figures 1B and 2B say they're depicting gene "Expression", but it looks like the gene expression was z-scored. If so, the authors should describe how the expression was scaled. 

      We have modified the legend title to ‘scaled expression’ and described the basis of scaling in the methods section of our revised version. 
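      For readers unfamiliar with scaled dot plots, a minimal sketch of one common scaling scheme follows. It assumes that each gene's mean expression per cluster is z-scored across clusters using the sample standard deviation and then clipped to a symmetric range; the clip bound of 2.5 and the function name are assumptions for illustration, and the exact procedure used for Figures 1B and 2B is the one described in the revised methods.

```python
from statistics import mean, stdev


def scale_across_clusters(avg_expr, clip=2.5):
    """Z-score one gene's per-cluster mean expression across clusters.

    avg_expr maps cluster name -> mean expression of the gene in that cluster.
    Values are centred on the across-cluster mean, divided by the sample
    standard deviation, and clipped to [-clip, clip] (clip is an assumption).
    """
    values = list(avg_expr.values())
    mu = mean(values)
    sd = stdev(values) if len(values) > 1 else 0.0
    scaled = {}
    for cluster, x in avg_expr.items():
        # A gene with identical expression in every cluster scales to 0.
        z = (x - mu) / sd if sd > 0 else 0.0
        scaled[cluster] = max(-clip, min(clip, z))
    return scaled


print(scale_across_clusters({"c1": 1.0, "c2": 2.0, "c3": 3.0}))
# → {'c1': -1.0, 'c2': 0.0, 'c3': 1.0}
```

      Because the values are relative across clusters, a scaled value of 0 means average expression for that gene, not absent expression.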

      (4) The main text mentions Figure 2C, but maybe this refers to the right part of Figure 2B?

      Panel 2C was mistakenly not marked in the figure. We have now marked it in revised Figure 2.    

      (5) The authors should attempt to describe the other branch points in the trajectory shown in Figure 3A. If they don't seem biologically plausible, then the authors might want to reconsider using Slingshot for their analyses.

      We do not seek to claim additional branch points within mature Gnao1 or Gnai2 neurons from our analysis. Whether additional branch points lead to subcategories within mature neurons requires extensive experimental investigation. Hence, in our revised submission, we have merged the mature Gnai2 and Gnao1 sub-clusters for the pseudotime developmental analysis, keeping our analysis focused on the single branch point at the immature-neuron stage.    

      (6) The most significantly enriched gene in Figure 3B in immature Gnao1+ neurons is Cnpy1, which is also an ER protein. It could also be interesting to look at its expression or speculate on its function in immature neurons. 

      Multiple ER genes were found to be enriched in Gnao1 neurons. We would not be comfortable speculating on the function of individual genes without a proper study, which is beyond the scope of this manuscript.      

      (7) For figures with pseudo-colored expressions, it would be useful to have color bars. I'm also not sure the pseudocolors are necessary; presenting the data in grayscale or a single color like green might also be sufficient. 

      We used pseudocolor in the IHC images of ER proteins because, for various proteins, fluorescence signal intensity varies widely along the apical-basal axis. In some cases, grayscale images could give the false impression that there is no signal in apical Gnai2 neurons, whereas pseudocolor reveals the low fluorescence level in these neurons. We have added intensity scale bars to the figures in our revised version.  

      (8) For in situ images with two colors it would be more colorblind-friendly to use green and magenta rather than green and red.

      Since no single color palette can help readers with all types of colorblindness, we decided to rely on readers' operating systems, which can render images in a color palette adapted to their type of colorblindness. We believe this is a better option, as described here: https://markusmeister.com/2021/07/26/figure-design-for-colorblindreaders-is-outdated/

      (9) The heatmap in Figure 7E would likely look more accurate without interpolation/aliasing/smoothing. 

      We have not performed smoothing on any of the heatmaps. We have noticed that heatmaps sometimes take time to load in some software (such as Adobe Acrobat), which can give the impression of smoothing. Changing the zoom level or reopening the file may fix this.     

      (10) Rather than just citing the literature on the unfolded protein response in the MOE, it could be useful to cite work on ATF5 expression and the UPR in the VNO (e.g. 10.1101/239830v1 or 10.12688/f1000research.13659.1).

      We have cited and commented on the ATF5 VNO expression in our discussion. 

      (11) I might try to condense the discussion. Additionally, in the discussion, the section on receptor co-expression comes before that on the VNO ER, so I might consider reorganizing the figures and results to present all of the scRNA-seq analyses (including the receptor co-expression figure) first before the figures on the ER. 

      We welcome this suggestion and have reorganized figures and results such that the scRNA-seq analysis flow is maintained before ER results.   

      Reviewer #2 (Recommendations For The Authors): 

      (1) Upregulation of ER-related mRNAs and expanded ER lumen in Gnao1-positive neurons is interesting, but the connection between these observations is unclear. The authors can strengthen the link by adding immunohistochemistry of representative ER proteins to test if the upregulation of mRNAs related to ER results in increased levels of these proteins in the ER of these neurons.

      The connection between the scRNA-seq and EM data was based on our observation that levels of a number of ER luminal and membrane proteins were higher in Gnao1 than in Gnai2 neurons (Figure 7, Figure 7-figure supplement-1 in our revised submission). This led us to hypothesize a difference in ER content or ultrastructure, which was verified by EM. We have also addressed whether upregulation of ER-related mRNAs results in increased protein levels (Figure 7-figure supplement-2). In some cases, for example Hspa5 (BiP), both mRNA and protein levels are upregulated in Gnao1 neurons (see Figure 3A volcano plot, Figure 5-figure supplement-1 RNA-ISH, Figure 7-figure supplement-1 comparison of mRNA levels, Figure 7F immunofluorescence). However, for other genes in the same figures, mRNA levels are not upregulated, yet protein levels are higher in Gnao1 neurons. As mentioned in our text and discussion, both upregulated mRNA levels and post-transcriptional mechanisms are likely to contribute to the higher ER protein levels in Gnao1 neurons.       

      (2) In Figure 3, the authors seemed to exclude cluster 13 from Figure1 in the pseudotime analysis without justification. 

      Cluster 13 has markers such as Obp2a, Obp2b and Lcn3. We confirmed via RNA-ISH (Figure 2-figure supplement-1A in our revised submission) that Obp2a maps to cells from glandular tissue on the non-sensory side. Cluster 13 also contains cells expressing Vmn1r37, which is typically expressed in neuronal cells. However, we do not see Obp2a mRNA in the sensory epithelium. Cluster 13 may comprise a heterogeneous mixture of cells in which non-sensory glandular cells are co-clustered with other cell types; alternatively, Obp2a may be expressed in neurons below the detection level of our assay. Further experiments will be required to distinguish between these possibilities. Because we have no basis to confidently assign this cluster as a neuronal cell type, it was excluded from the downstream analysis of neurons.

      (3) In Figure 3, the line appears to suggest that Gnao1-positive cells can be progenitors of Gnai2-positive cells. Please clarify. 

      We thank the reviewer for pointing this out. We did not intend to imply that Gnao1 cells can be progenitors of Gnai2 cells. This impression may have arisen from the placement of dots along the trajectory and from the UMAP layout itself. We have modified the pseudotime trajectory in our revised version to make it more intuitive. 

      (4) Figure 3: Please label pseudotime lineage cluster identities. 

      Cluster identities are now labeled in Figure 3A pseudotime lineage as well as in Figure 3-figure supplement-1 dot plot.     

      (5) Figure 4: Please label the genes used for in situ hybridization in the volcano plot. 

      Genes used for RNA-ISH are labeled (bold font) in the volcano plot in Figure 5A.  

      (6) Figure 4: Please clarify which genes shown in the in situ hybridization figures correspond to which GO terms. 

      We have added supplementary table-10 containing gene ontology terms associated with genes for which RNA-ISH was performed. 

      (7) The EM shown in Figure 5 makes this work unique and intriguing. However, the lack of quantification for the ER phenotype is a concern. For example, does the ER area of a given cell correlate with the relative position of the cells along the apical-basal axis of the vomeronasal organ? What about the ER morphology in the progenitor cells? 

      We show here a quantification of the ER area from the low-magnification EM image shown in Figure 8A. The ER area increases towards the basal side of the cross-section. However, this quantification is complicated by the following factors: a) processing for EM results in some shrinkage of the tissue, and b) Gnao1 neurons follow an invaginating pattern in cross-sections. For these reasons, some Gnao1 neurons can lie very close to, and at times adjacent to, Gnai2 neurons in EM cross-sections. Owing to a lack of contrast, it is also harder to identify the ER within a cell at low magnification, especially in the apical zone. The plot shown here does indicate that, roughly, the ER area of a cell correlates with its position along the apical-basal axis. In our revised submission, we have quantified the fluorescence intensities of various ER proteins along the apical-basal axis from confocal images (Figure 7, Figure 7-figure supplement-1).    

      Author response image 2.

      ROIs (yellow) are manually drawn in the sensory epithelium wherever the ER can be identified without ambiguity. The area and centroid of each ROI are calculated, and the x coordinate of each centroid is used to position the ER area along the apical-basal axis, as shown in the plot below.
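      The ROI measurement described in this legend amounts to computing each polygon's area and centroid. The following minimal sketch uses the shoelace formula; it assumes ROIs are stored as ordered lists of (x, y) vertex coordinates, that each polygon is simple and non-degenerate, and that the image x axis runs along the apical-basal axis, as the legend implies. The function names are hypothetical and do not correspond to any specific image-analysis tool.

```python
def polygon_area_centroid(vertices):
    """Return (area, (cx, cy)) of a simple, non-degenerate polygon
    via the shoelace formula.

    vertices: ordered (x, y) pairs; the polygon is closed implicitly.
    Works for both clockwise and counter-clockwise vertex order.
    """
    n = len(vertices)
    signed_twice_area = 0.0
    cx_acc = 0.0
    cy_acc = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        signed_twice_area += cross
        cx_acc += (x0 + x1) * cross
        cy_acc += (y0 + y1) * cross
    signed_area = signed_twice_area / 2.0
    # The centroid uses the signed area, so vertex orientation cancels out.
    cx = cx_acc / (6.0 * signed_area)
    cy = cy_acc / (6.0 * signed_area)
    return abs(signed_area), (cx, cy)


def er_area_along_axis(rois):
    """Pair each ROI's centroid x coordinate with its area, sorted by x."""
    points = []
    for roi in rois:
        area, (cx, _cy) = polygon_area_centroid(roi)
        points.append((cx, area))
    return sorted(points)


# Hypothetical ROIs in pixel coordinates.
rois = [
    [(10, 0), (14, 0), (14, 2), (10, 2)],  # area 8, centroid x = 12
    [(0, 0), (2, 0), (2, 1), (0, 1)],      # area 2, centroid x = 1
]
print(er_area_along_axis(rois))
# → [(1.0, 2.0), (12.0, 8.0)]
```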

      Establishing the ER ultrastructure in progenitor or immature cells, as well as unambiguously quantifying ER area in mature neurons, requires identifying these cells in cross-sections using fluorescent molecular markers, followed by correlative light and electron microscopy (CLEM). This procedure is technically challenging and beyond the scope of our manuscript.      

      Reviewer #3 (Recommendations For The Authors): 

      (1) The main claim is about ER differences between Gnao1+ and Gnai2+ VSN. The ISH, IHC, and EM microscopy images are not quantified and, therefore, poorly support this main claim.

      In our revised submission, we provide extensive quantification of the ER phenotype in Figure 7 and Figure 7-Figure supplement-1. Quantification of ER area from EM images is challenging, as described above in our response to Reviewer #2, recommendation 7.

      (2) The annotation of VSN subclusters should be more rigorous, consistent throughout the paper (VSN clusters are inconsistent between Figure 1 and Figure 3, and the multiplication of subclusters in Figure 3 is not discussed), and verified (using ISH or IHC) that they reflect discrete, actual cell types. The authors should provide a list of differentiating marker genes for the clusters in Figure 3. At present, it remains unclear whether these clusters are the result of over-clustering of cells (and therefore represent either noise or arbitrary splits of continua) or whether they reflect the biology of the system. Subsequent characterization of these curated VSN subtypes (as done in Figure 4) would add value to the study.

      We pooled neuronal cells from Figure 1 and re-clustered them at higher resolution to identify subtypes. Several of the neuronal sub-clusters we identified, including progenitors, immature neurons and mature neurons, are validated by previous studies with well-known markers. Among the mature neurons, the biological basis of the four Gnao1 sub-clusters (n1-n4) is discussed in our analysis, and these sub-clusters are also validated by previous experimental studies. These Gnao1 clusters are organized according to the expression of family-C V2Rs (Vmn2r1 or Vmn2r2) as well as H2-Mv genes. Within the Gnai2 sub-clusters, n12 and n13 exclusively express markers that distinguish them from n8-n11, as described in our revised version. However, Gnai2 n8-n11 do not have definitive markers, and whether these sub-clusters form part of a continuum or result from over-clustering will require further extensive experiments and analysis. We prefer to show all sub-clusters, including the Gnai2 sub-clusters, in Figure 3-Figure supplement-1, along with a dot plot of sub-cluster gene expression, so that these data are available for future experiments and analysis. We share the concern that some Gnai2 sub-clusters may not have an obvious biological basis at this time. Hence, in our revised submission, we have merged the mature Gnao1 and mature Gnai2 sub-clusters for the developmental analysis shown in Figure 3A.

      (3) Some clusters are characterized by the expression of specific chemoreceptors (VRs). Have these been used for clustering? If so, clustering should be repeated after excluding these receptors.

      Figure 3-Figure supplement-2 of our revised submission shows a comparison of neuron clusters with and without VRs. We also describe in the results, specific clusters that are affected by exclusion of VRs.  

      (4) Given the title and the data, the paper should be structured around its main claim (i.e. differential ER environment between VSN types). For example, Figure 7, which deals with the characterization of receptor expression and co-expression in VSNs, is sandwiched between the validation of ER substructure (Figure 6) and the timing of coexpression of ER chaperone genes (Figure 8). The data presented in Figure 7 would fit better if used as a validation of the dataset prior to the investigation presented in the current Figure 4. In addition, we suggest that expression and co-expression diagnostics should be used to filter cells for subsequent analyses.

      We appreciate this suggestion and have reorganized the figures in our revised version. Our subsequent analysis showing enrichment of ER-related genes at the RNA and protein levels covers all Gnao1 neurons and is not restricted to a specific subset. This is reflected in the ISH and IHC of ER genes. 

      (5) Figure 7-Supplement 3 suggests the presence of co-expressed V1Rs in VSNs. It is unclear from the data presented whether these co-expressing cells are artifactual cell doublets and should be removed from the analysis or whether the expression of the co-expressed receptors reflects a reality. To better address this observation, one may want to see the expression levels of the individual co-expressed V1Rs in Figure 7-Supplement 3 rather than the sum of V1R expression. I am also concerned about the unusually high frequency of "empty" neurons (i.e. without expressed VRs). Could these be debris? 

      We think that the cells expressing zero as well as two V1Rs are real and cannot be attributed to debris or doublets for the following reasons:

      i) Cells expressing no V1Rs are not necessarily debris, because they express other neuronal markers at the same level as cells that express one or two V1Rs. For instance, Gnai2 expression is the same across cells expressing 0, 1 or 2 V1Rs, which we have included in Figure 4-figure supplement 4-C of our revised submission. The higher expression thresholds used in our analysis may have somewhat increased the proportion of cells with zero V1Rs. Similarly, Gnao1 levels remain the same across cells expressing multiple V2Rs and H2-Mv genes, indicating that these cells are unlikely to be doublets (Figure 4 I-K). As doublets form randomly, the frequency of each co-expression combination (Supplementary Tables 7 and 8) itself indicates whether that combination reflects genuine single cells or an artifact.

      ii) Cells co-expressing V1R genes: All cells used for co-expression analysis were filtered with an expression threshold (Figure 4-figure supplement 1D), which eliminates cells with low V1R counts. We listed the frequency of cells co-expressing V1R gene combinations in Supplementary Table 8. Among 134 cells that express two V1Rs, 44 express Vmn1r85+Vmn1r86, 21 express Vmn1r184+Vmn1r185, 13 express Vmn1r56+Vmn1r57, 6 express Vmn1r168+Vmn1r177, and so on. Doublets are generally random combinations of two cells. Here, each specific co-expression combination is represented by multiple cells, which is highly unlikely to arise by chance.

      iii) Some of the co-expression combinations we report were identified earlier and verified experimentally by Lee et al., 2019, who used FACS-based single-cell collection into 96-well plates with the CEL-Seq2 protocol (which carries a very low chance of doublets), and by Hills et al., 2024.
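      The two checks above (counting V1Rs per cell after an expression threshold, then comparing a neuronal marker such as Gnai2 across the 0-, 1- and 2-V1R groups) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: cells are represented as dictionaries of normalized expression, V1R genes are identified by the `Vmn1r` gene-name prefix, and the threshold value, function names and toy cells are all hypothetical.

```python
from statistics import median

V1R_PREFIX = "Vmn1r"  # assumption: V1R genes identified by name prefix


def n_v1rs(cell, threshold):
    """Number of V1R genes whose expression in this cell exceeds threshold."""
    return sum(1 for gene, x in cell.items()
               if gene.startswith(V1R_PREFIX) and x > threshold)


def marker_by_v1r_count(cells, marker, threshold):
    """Median expression of `marker` for cells grouped by V1R count.

    Returns {v1r_count: median_marker_expression}, ordered by count.
    A marker missing from a cell is treated as 0.
    """
    groups = {}
    for cell in cells:
        k = n_v1rs(cell, threshold)
        groups.setdefault(k, []).append(cell.get(marker, 0.0))
    return {k: median(v) for k, v in sorted(groups.items())}


# Hypothetical normalized expression values for four cells.
cells = [
    {"Gnai2": 2.1, "Vmn1r85": 3.0},
    {"Gnai2": 2.0, "Vmn1r85": 2.5, "Vmn1r86": 2.7},
    {"Gnai2": 2.2, "Vmn1r56": 0.3},  # below threshold: counted as 0 V1Rs
    {"Gnai2": 1.9, "Vmn1r184": 3.1},
]
print(marker_by_v1r_count(cells, "Gnai2", threshold=1.0))
```

      If zero-V1R cells were debris, their marker level would be expected to fall well below that of the one- and two-V1R groups; similar medians across groups are consistent with the argument in (i).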

      (6) The authors use either dot plots or scatter plots to show gene expression in cell clusters. It looks nice, but it is very difficult to deduce population levels of expression from these plots. Could we see the distribution of gene expression across clusters using more quantitative visualizations such as violin or box plots?

      Dot plots are mainly used in our manuscript to show markers of cell clusters in Figure 1, Figure 2 and Figure 3-figure supplement 1. We wanted to show at least five marker genes for each cluster that are important for identifying the cell type. Using violin or bar plots for this would make the panel extremely large and overwhelming, especially with 16 clusters in Figure 1 and 13 clusters in Figure 3-figure supplement 1, or would make the bars or violins too small to interpret. Hence, for simplicity, we used dot plots to give readers a bird's-eye view of gene expression differences across clusters. Scatter plots were used when we wanted to compare the expression levels of genes between male and female samples, or to show the expression of two genes (VRs) simultaneously in single cells, which cannot be achieved with violin or box plots. However, we have made our dataset available at scvnoexplorer.com so that expression patterns across cell clusters can be explored with different visualization options, including violin and box plots.  

      (7) To investigate whether sex might bias clustering, the authors calculated the Pearson coefficient of gene expression between sexes for each cluster. Given the high coefficient observed across all clusters (although no threshold is used), the authors conclude that there was no bias. While the overall effect may show a strong similarity in gene expression in each cluster between the sexes, this overlooks all the genes that are significantly differentially expressed. It would be worth investigating and discussing these differences. Relatedly, what batch correction method was applied to the data (to mitigate any possible sampling or technical effect)?

      We chose the Pearson coefficient as a representative parameter to show that there is no bias. In addition, we performed differential expression analysis for each cluster; the results are in Supplementary Table 1. Apart from known sexually dimorphic genes, no genes are significantly differentially expressed (adjusted p-values > 0.05). This was also shown by earlier studies using bulk RNA-seq (doi.org/10.1371/journal.pgen.1004593, doi.org/10.1186/s12864-017-4364-4). We used depth normalization to integrate samples and have described this in the methods section of our revised version.
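      A minimal sketch of the per-cluster comparison between sexes follows. It assumes a simple depth normalization (counts scaled to a fixed total per cell, here a hypothetical 10,000, followed by log1p), averages each gene within one cluster separately for male and female cells, and correlates the two mean profiles with Pearson's r. The function names and counts are hypothetical, and this is not presented as the authors' pipeline. As the reviewer notes, a high r summarizes global similarity and does not exclude individual differentially expressed genes, which is what the per-cluster differential expression results in Supplementary Table 1 address.

```python
import math


def log_normalize(counts, scale=10_000):
    """Depth-normalize one cell: counts / total * scale, then log1p."""
    total = sum(counts.values())
    return {g: math.log1p(c / total * scale) for g, c in counts.items()}


def mean_profile(cells, genes):
    """Mean normalized expression of each gene across cells (missing = 0)."""
    return [sum(cell.get(g, 0.0) for cell in cells) / len(cells) for g in genes]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sex_correlation(male_counts, female_counts):
    """Pearson r between male and female mean profiles within one cluster."""
    male = [log_normalize(c) for c in male_counts]
    female = [log_normalize(c) for c in female_counts]
    genes = sorted(set().union(*male, *female))  # all genes seen in either sex
    return pearson(mean_profile(male, genes), mean_profile(female, genes))


# Hypothetical raw counts for two cells per sex in one cluster.
male_cells = [{"Gnai2": 40, "Vmn1r85": 12, "Xist": 0}, {"Gnai2": 35, "Vmn1r85": 9}]
female_cells = [{"Gnai2": 38, "Vmn1r85": 11, "Xist": 6},
                {"Gnai2": 30, "Vmn1r85": 10, "Xist": 4}]
print(f"r = {sex_correlation(male_cells, female_cells):.3f}")
```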

      (8) We found the method description to be incomplete for the single-cell RNA sequencing analyses. The method section should include a detailed explanation of the code used by the authors to analyze the data. The Seurat package has many available pipelines for single-cell RNA-seq analysis, which have a major impact on the output data. It is therefore imperative to describe which of these pipelines were used and whether the pipeline was run with default settings. 

      Our revised submission has expanded the methods section with details of the parameters, filtering criteria and software used.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study uses a variety of approaches to explore the role of the cerebellum, and in particular Purkinje cells (PCs), in the development of postural control in larval zebrafish. A chemogenetic approach is used to either ablate PCs or disrupt their normal activity and a powerful, high-throughput behavioural tracking system then enables quantitative assessment of swim kinematics. Using this strategy, convincing evidence is presented that PCs are required for normal postural control in the pitch axis. Calcium imaging further shows that PCs encode tilt direction. Evidence is also presented that suggests the role of the cerebellum changes over the course of early development, although this claim is rather less robust in the current version of the paper. Finally, the authors build on their prior work showing that both axial muscles and pectoral fins contribute to "climbs" and show evidence that suggests PCs are required for correct engagement of the fins during this behaviour. Overall, establishing a role for the cerebellum in postural control is not very surprising. However, a clear motivation of this study was to establish a robust experimental platform to investigate the changing role of cerebellar circuits in the development of postural control in the highly experimentally accessible zebrafish larvae, and in this regard, the authors have certainly succeeded.

      Overall, I consider this an excellent paper, with some room for improvement in aspects of presentation, discussion, and some aspects of the data analysis.

      We thank the reviewer for their kind comments and support. In the revision we have addressed their concerns regarding data presentation and analysis. Additionally, we have expanded our introduction and discussion to address questions of presentation.  

      Reviewer #2 (Public Review):

      Summary:

      Franziska Auer et al. investigate the role of cerebellar Purkinje cells in controlling posture in larval zebrafish using the chemogenetic tool TRPV1/capsaicin to bidirectionally manipulate (i.e., activate or ablate) these cells. This tool has been developed for zebrafish previously but has not been applied to Purkinje cells.

      High-throughput behavioral experiments are presented to monitor how body posture is affected by these perturbations. The analysis of postural control focuses on a specific subaspect of posture: the body tilt-angle relative to horizontal just before a swim bout is executed, quantified separately for pre-ascent and pre-dive bouts. They report a broad bimodal distribution of pre-ascent bout posture ranging from -20 to +40 degrees, while the pre-dive bout posture was more Gaussian, ranging between -40 and 0 degrees. The treatment effect is quantified as the change in the median of these distributions.

      Purkinje cell activation and ablation in 7 days post-fertilization (dpf) fish shifted the median of the ascending bout posture distributions to positive values. The authors hypothesize that the stochastic nature of the activation process might desynchronize Purkinje cell activity, thus abolishing Purkinje cells' role in postural control, similar to ablation. However, this does not explain why dive bout posture decreased upon activation but was unaffected by ablation. 

      To test whether the role of Purkinje cells in postural control matures over development, the authors repeated the ablation experiments at 14 dpf. They state that "at 14 dpf, the effects of Purkinje cell lesions on posture were more widespread than at 7 dpf." However, this effect size is comparable to that observed at 7 dpf, suggesting no further maturation of the role of Purkinje cells in pre-ascending bout postural control. The median pre-dive bout posture decreased at 14 dpf, contrasting with no effect at 7 dpf, yet this change was comparable in effect size to the activation effect on Purkinje cells at 7 dpf. The current data breadth may not be sufficient to conclude that signatures of emerging cerebellar control of posture across early development were uncovered.

      The study's exploration of activating Purkinje cells in freely swimming fish using TRPV1/capsaicin is of special interest, but the practicability of this method is unclear from the current presentation. It would be beneficial to present the distribution of the percentage of activatable Purkinje cells across animals and time points to provide insight into the method's efficiency. Discussing this limitation and potential improvements would aid in evaluating the method, especially since the authors report that the activation experiments were labor-intensive, limiting repeat experiments. This may explain why the activation experiment at 7 dpf is the only data presented with cell activation, with other analyses performed using the cell ablation capabilities of the TRPV1/capsaicin method.

      Another data point at 14 dpf would significantly strengthen the conclusions.

      The authors analyze Purkinje cell-controlled fin-trunk coordination by examining ascending bout posture across different swim bout speeds. They make the important finding that pectoral fin movements contribute significant lift for medium and fast swim bouts but not for slow ones, and that Purkinje cell ablation disrupts lift generation at all speeds.

      Finally, the authors examined whether Purkinje cell activity encodes postural tilt angle by performing calcium imaging on 31 cells from 8 fish using their Tilt In Place Microscope (TIPM). They report that they could decode the tilt angle from individual neurons with highly tuned responses, and also from neurons that were not obviously tuned when pooling them and analyzing the population response. However, because the recordings were not simultaneous across animals, conclusions about population-level encoding should be drawn cautiously; it might be better to suggest potential population encoding that needs confirmation with more targeted experiments involving simultaneous recordings.

      Strengths:

      - The study introduces a novel application of the chemogenetic tool TRPV1/capsaicin to study cerebellar function in zebrafish.

      - High-throughput behavioral experiments provide detailed analysis of postural control.

      - The further investigation of Purkinje cell-controlled fin-trunk coordination offers new insights into motor control mechanisms.

      - The use of calcium imaging to decode postural tilt-angle from Purkinje cell activity presents interesting preliminary results on neuronal population encoding.

      Weaknesses:

      - The term "disruption" for postural control effects may lead to misleading expectations.

      - The supporting data show only subtle median shifts in postural angle, raising questions about the significance of observed effects. Statistical methods that account for the hierarchical structure of the data might be required to support the conclusions.

      - The study's data breadth may not be sufficient to conclude emerging cerebellar postural control across early development.

      - The current presentation does not adequately detail the practicability and efficiency of the TRPV1/capsaicin method for activating Purkinje cells, and the labor-intensive nature of these experiments constrains the ability to replicate and validate the findings.

      - Non-simultaneous recordings in calcium imaging necessitate cautious interpretation of population-level encoding results.

      We appreciate the reviewer's thoughtful and detailed feedback. In response, we have made several changes to highlight key points in our manuscript. We have adjusted our wording to more accurately reflect the scope of our findings. Finally, we have clarified and expanded the methods used.

      Reviewer #3 (Public Review):

      Summary:

      This paper uses a new chemogenetic tool to investigate the role of cerebellar Purkinje cells in postural control. Using a high-throughput behavioral assay, they show that activation or ablation of Purkinje cells affects various aspects of postural control in zebrafish larvae during spontaneous swimming and that the effects are more pronounced at later developmental time points, where the Purkinje cell number is much greater. Using a sophisticated imaging assay, they record Purkinje cell activity in response to the tilt of the fish and show that some Purkinje cells are tuned to tilt direction and that the direction can even be decoded from untuned neurons.

      Strengths:

      Overall the study is nice, using a range of tools to address a fundamental question about the role of the cerebellum in postural control in fish.

      Weaknesses:

      (1) The data in Figure 1 that establishes the method seems to be based on a very small number of experiments and lacks some statistical analysis.

      (2) The choice and presentation of the statistical and analysis methods used in Figures 2-5 could be improved.

      We thank the reviewer for their comments. We have added statistical analyses for the activation experiments and improved the data presentation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Overall I think this is a great paper.

      * Introduction and Discussion.

      The Introduction (and Discussion) do little to explain what is understood about cerebellar control of posture and what major outstanding questions remain. The first paragraph of the Introduction seems to argue that the role of the cerebellum in control of posture is well established and line 24 attempts to motivate the present study by virtue of the fact that terrestrial locomotion is "complex". This might be true but is not necessarily a major obstacle given the suite of powerful approaches available in rodent neuroscience. What are the major challenges that are hard to tackle in rodents and what specific questions can the larval zebrafish help to answer? What about development (which gets no mention at all)? I'm not suggesting a comprehensive review of every aspect of cerebellar physiology, but I think the Introduction should attempt to outline the current hypotheses in a little more detail and highlight what we still need to understand.

      We take the Reviewer’s point that there is more to say in the Introduction. We feel that multi-dimensional limb biomechanics and proprioception are two aspects of terrestrial locomotion that support our use of the word “complexity.” However, we don’t dwell on this point because, as the reviewer correctly states, the suite of tools for rodent neuroscience and behavior is expansive and, in our opinion, not a limiting factor. Instead, we said what we felt we could regarding the potential contribution of the larval zebrafish in the last paragraph of the Discussion. In the revision, we have added details about the development of the cerebellum to the Introduction (though this, of course, is an expansive topic and well beyond the scope of the Introduction), highlighted some of the historical limitations in rodent posture analysis, and set up the .

      * Figure 2: 'Arrows denote the shift towards more nose-up postures'. I think the distribution is quite easy to interpret without these arrows; I suggest removing them.

      We have removed the arrows.  

      * IQR is sometimes stated as a single number and sometimes as a range. It should be consistent and unless eLife has guidance to the contrary, I suggest that it be the latter.

      Thank you for pointing that out. We now report all IQRs as the range from the 25th to the 75th percentile.

      * Figure S2: For 14 dpf fish the axes are labelled PC2/3 - is this an error?

      We have changed it to a 3-dimensional plot for both the 7 and 14 dpf data to show comparable plots for both ages (now Figure S5F and G). For the 14 dpf fish, the clearest separation was in the space defined by the 2nd and 3rd principal components.

      * In the methods, there is insufficient detail given about fluorescent imaging.

      We have added information on how the fluorescent imaging was performed to the ‘Confocal imaging’ and ‘Functional imaging’ sections.

      * Abstract

      In my opinion, the statement "Here, we used a powerful chemogenetic tool (TRPV1/capsaicin) to *define the role of Purkinje cells*..." is too strong. Whilst the evidence that PCs are required for postural control is certainly strong, what exactly these cells do in the service of postural control is far from clear (as the authors indeed acknowledge in the Discussion). As such, I wouldn't say their role has been "defined".

      We have changed the word to “describe” to better reflect our findings.

      * aldoca transgenic.

      This appears to be a beautiful transgenic line but the data showing the extent of its expression and evidence that in the cerebellum it exclusively labels PCs isn't clear enough.

      (i) Ideally Figure 1A would show an image of a whole animal to provide an overview of transgene expression but instead it seems to be (the legend is unclear) a cartoon with a confocal projection of part of the brain overlaid.

      We have updated the figure legend to make clearer that we show a cartoon of a larval zebrafish with the confocal image overlaid. The aldoca promoter has been previously described and exclusively labels Purkinje cells (10.1523/JNEUROSCI.3352-10.2010).

      (ii) Figure 1B shows expression in the cerebellum, but how are we to understand that all the labelled cells are PCs? Are all PCs labelled, or only a subset? Perhaps a double labelling with a PC in situ marker could be done to demonstrate colocalisation?

      As above, the aldoca promoter has been previously described; to the best of our knowledge, in the Hibi lab’s hands (and ours) it labels Purkinje cells exclusively, and it labels all of them (10.1523/JNEUROSCI.3352-10.2010).

      * Chemogenetic validation.

      Overall, the chemogenetic approach to abrogate PC function looks to be very powerful. The authors state in several places that a contribution of this paper is in its "establishing the validity of TRPV1/capsaicin-mediated perturbations". However, the data in Figure 1, along with various comments in other parts of the paper raise some questions:

      (i) For experiments depolarising PCs with 1 µM Csn, the sample size is tiny: two transgenic animals and one control. Moreover, it is stated 'in one fish ... we observed a small number of neurons at the 9h timepoint with bright, speckled fluorescence suggestive of cell death'. Was this one out of two transgenics?! In the Discussion, I didn't understand the statement "ensure adequate brightness levels *to achieve sufficient depolarization without excitotoxicity*". Does this "excitotoxicity" relate to the speckled fluorescence observation?

      Overall, the very small sample size and comments about excitotoxicity and cell death raise concerns about the approach that I think warrant clearer treatment in the results (including information about the assessment of transgene expression, % embryos judged to have suitable expression), especially as this paper is seeking to establish the validity of the method.

      We note first that the method has been previously validated (https://doi.org/10.1038/nmeth.3691) and that we build on this work. For the experiment described, the point was to identify an acceptable exposure duration. To that end, we analyzed 6 animals for up to 6 h (including the washout experiments in Figure S1B) and never observed any speckled fluorescence; we limited our behavioral experiments to 6 h accordingly. We thought it worth including the observation of speckled fluorescence at the 9 h timepoint for future reference. To directly address the comment, we have increased the number of analyzed cells and fish for the 1 µM capsaicin experiments and added statistical analysis (lines 65-67).

      When screening for transgene expression, we selected fish with clearly visible but not overly bright expression, and used the same criteria when screening fish for GCaMP imaging and for behavior. Around a quarter of the fish with aldoca:TRPV1-tagRFP expression had a usable expression level for the activation experiment. We have added this information to the Results (line 62) and Methods (lines 369-372).

      (ii) The authors note "capsaicin could sporadically activate subsets of Purkinje cells" and further speculate about PC activity and synchrony in the discussion. Figure 1 seems to rely on single images at widely spaced time points but given that they are set up to do 2-photon calcium imaging, why didn't they collect continuous time series data and analyse the temporal patterns of activity across the transgenic PC population?

      We have added time series data for calcium imaging after 1 µM capsaicin in TRPV1- and TRPV1+ cells to Supplementary Figure S1A. Here too we see sporadic increases in calcium levels at similar rates: 0% of TRPV1- and 15-19% of TRPV1+ cells (see also the Figure S1 legend).

      (iii) The axonopathy and cell death resulting from 10 µM Csn are quite dramatic. However, here the authors do not appear to have included a TRPV1-negative control (although oddly they did for the 1 µM treatment), so it is currently unclear whether a high concentration of Csn alone might be cytotoxic.

      Chen et al (https://doi.org/10.1038/nmeth.3691) have established the TRPV1/capsaicin method in zebrafish with broad neuronal label and did not see any effect with high doses of capsaicin in TRPV1 negative fish.  

      * Behavioural assessment - stats

      Overall, the disruption of postural stability after PC manipulations is convincing.

      However, I have a few queries about the statistics:

      (i) In this section, the statistical unit was not clear. The tables, which are otherwise very useful, give no indication of N. The legend text does report "8 repeats/149 control fish" and "across experimental repeats" suggesting the statistical unit might be the repeats rather than animals, but this should be clarified. In Figure 2G, individual data points should be plotted if N=8, or a representation of the distribution (eg violin or box and whisker plots) if N = 149.

      We apologize for the confusion. Given the variable numbers of bouts, a single experimental repeat does not allow for an accurate estimate of expected value. Below we simulated how accurately the median can be estimated based on increasing sample sizes (Author response image 1). Given that large numbers of bouts are necessary to accurately estimate the median we pool the data for all experiments and use resampling statistics to estimate bias in our estimate.

      Author response image 1.

      Median estimation based on increasing sample size
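
      The undersampling argument can be illustrated with a minimal sketch of this kind of simulation, assuming bouts are drawn with replacement from a pooled distribution: for each sample size n, many samples of n bouts are drawn and the typical deviation of their median from the pooled median is recorded. The function name and the bimodal "posture" data are hypothetical and synthetic, not the authors' code or data.

```python
import numpy as np


def median_error_by_sample_size(pooled, sizes, n_draws=1000, seed=0):
    """For each sample size n, draw n_draws samples of n bouts (with
    replacement) from the pooled data and return the median absolute
    deviation of the sample medians from the pooled median."""
    rng = np.random.default_rng(seed)
    target = np.median(pooled)
    errors = {}
    for n in sizes:
        samples = rng.choice(pooled, size=(n_draws, n), replace=True)
        medians = np.median(samples, axis=1)
        errors[n] = float(np.median(np.abs(medians - target)))
    return errors


# Hypothetical bimodal pitch distribution (degrees); synthetic, not real data.
rng = np.random.default_rng(1)
pooled = np.concatenate([rng.normal(-5.0, 8.0, 6000), rng.normal(15.0, 6.0, 4000)])
errs = median_error_by_sample_size(pooled, sizes=[20, 100, 500, 2500])
for n, e in errs.items():
    print(f"n = {n:5d} bouts: typical error of the median = {e:.2f} deg")
```

      The error shrinks roughly with the square root of the number of bouts, which is why a single repeat with few bouts gives a noisy median.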

      (ii) Related to the above, I hope it might be easier to interpret the unexpected change in climb posture in ablation controls once the data for individual repeats is shown.

      When we analyze the data as single repeats, we see considerable variability between repeats due to undersampling. We tested the medians of the single repeats for outliers to ensure that the shift is not due to a single repeat skewing the distribution, and detected no outliers in either the pre-lesion or the post-lesion control group. (Outliers were defined as values deviating from the median by more than 3 times the scaled median absolute deviation (MAD); a scaling factor of 1.4826 makes the MAD-based criterion consistent with Z-score-based methods for normally distributed data.) We added this information to lines 133-134 and to the Statistics section of the Methods.
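
      The outlier criterion described above can be written as a short sketch; the function name and the example repeat medians are hypothetical, while the 3-MAD threshold and the 1.4826 scaling factor come from the text.

```python
import numpy as np

MAD_SCALE = 1.4826  # makes the MAD a consistent estimator of the SD for normal data


def mad_outliers(values, threshold=3.0):
    """Return a boolean mask of values lying more than `threshold`
    scaled MADs away from the median."""
    x = np.asarray(values, dtype=float)
    med = np.median(x)
    scaled_mad = MAD_SCALE * np.median(np.abs(x - med))
    return np.abs(x - med) > threshold * scaled_mad


# Hypothetical per-repeat medians (degrees); the last value is a planted outlier.
repeat_medians = [3.1, 2.8, 3.5, 2.9, 3.3, 3.0, 3.2, 9.0]
print(mad_outliers(repeat_medians))
```

      Here only the planted value of 9.0 is flagged.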

      (iii) In some parts of this section, including the Tables, the authors report the 95% CI of the median, rather than IQR. In this case, they should report the z-value used for 95% CI estimation.

      Because we use resampling to estimate the 95% confidence interval of the median, there is no z-value as in a traditional confidence interval based on the normal distribution. Instead, we take the 2.5th and 97.5th percentiles of the bootstrapped distribution of the median, which bound its middle 95% and define the 95% confidence interval.
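
      A minimal sketch of the percentile bootstrap described above, assuming resampling of pooled bouts with replacement; the function name, the number of resamples and the synthetic data are illustrative assumptions. Setting `ci=50` instead returns the 25th and 75th percentiles of the resampled medians.

```python
import numpy as np


def bootstrap_median_ci(data, n_boot=10000, ci=95.0, seed=0):
    """Percentile bootstrap CI for the median: resample with replacement,
    take the median of each resample, and read off the lower and upper
    percentiles of those medians (2.5th and 97.5th for ci=95).
    No z-value is involved."""
    rng = np.random.default_rng(seed)
    x = np.asarray(data, dtype=float)
    boot_medians = np.empty(n_boot)
    for i in range(n_boot):
        boot_medians[i] = np.median(rng.choice(x, size=x.size, replace=True))
    tail = (100.0 - ci) / 2.0
    lo, hi = np.percentile(boot_medians, [tail, 100.0 - tail])
    return float(np.median(x)), (float(lo), float(hi))


# Hypothetical pooled pre-bout postures (degrees); synthetic data.
rng = np.random.default_rng(2)
postures = rng.normal(10.0, 5.0, 500)
med, (lo95, hi95) = bootstrap_median_ci(postures, n_boot=2000)
med, (q25, q75) = bootstrap_median_ci(postures, n_boot=2000, ci=50.0)
print(f"median {med:.2f}, 95% CI [{lo95:.2f}, {hi95:.2f}], IQR band [{q25:.2f}, {q75:.2f}]")
```

      A loop is used instead of one large resampling matrix so that memory stays modest when tens of thousands of bouts are pooled.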

      * It is stated that "fish adopted more nose-up postures before *and throughout* climb bouts". Figure 2F seems to show posture before the climb, but where is the "throughout" data? It would be useful if Figure 2E, J could be extended to make a bit clearer these two phases of postural assessment.

      We removed the phrase ‘throughout climb bouts’ because we do not show posture throughout the bout and to avoid overcomplicating the interpretation.

      * Why were PCs not activated at 14 dpf (eg using 1 µM Csn)?

      Due to shifts in priorities, the first author will not be continuing this series of experiments, so this additional experiment will have to wait for someone to pick up this line of inquiry.

      * The authors appear to claim that the difference in phenotype in 7 versus 14 dpf animals following high conc Csn treatment is indicative of a changing role for cerebellar PCs over this developmental period. For instance, in reference to the 14 dpf ablation phenotype, the authors write "reveals the functional emergence of Purkinje cell control of dives" and in the abstract they talk about "emerging control of posture across early development". However, can they rule out that the phenotypic differences might instead reflect differential sensitivity of the relevant PC (sub)populations to CSn at the two ages? If this caveat cannot be discounted then I suggest it is acknowledged e.g. in the discussion.

      As previously established, all Purkinje cells are labeled in the aldoca line (10.1523/JNEUROSCI.3352-10.2010). Fluorescence is brighter at 14 dpf than at 7 dpf, suggesting higher levels of TRPV1. We therefore assume that at 14 dpf the high concentration of Csn is sufficient to ablate Purkinje cells. At 14 dpf, cerebellar damage is visible under a standard dissecting microscope. The preponderance of evidence therefore speaks against a previously undiscovered subpopulation of TRPV1-expressing Purkinje cells that are, by mechanisms yet unknown, resistant to high doses of capsaicin.

      * Fin-body "coordination"

      The ideas and data around fin-body coordination are very intriguing.

      (i) The statement "fin engagement is speed-dependent" would benefit from a stats test to show this is indeed significant. The data in Figure 4B suggest a rather high degree of variance.

      This is an important point; we appreciate the Reviewer’s attention. We have added statistics showing that this relationship is speed-dependent (lines 167-169) and show the corresponding plot in Figure S4: "Here, we observed that fin engagement is speed-dependent, with faster bouts producing greater lift for a given axial rotation (Spearman correlation coefficient: control 0.2193; 10 µM capsaicin: 0.0397; Z-test after z-transformation: p < 0.001)."
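
      The comparison of the two correlation coefficients can be sketched as follows, assuming two independent groups of bouts and the standard Fisher z-transformation (atanh) with variance 1/(n-3). The function names are hypothetical, and the bout counts in the example are invented placeholders because the text does not report n; only the two coefficients come from the response.

```python
import math

import numpy as np
from scipy.stats import norm, spearmanr


def fisher_z_compare(r1, n1, r2, n2, var_factor=1.0):
    """Two-sided z-test for the difference between two independent
    correlation coefficients after Fisher's z-transformation (atanh).
    var_factor=1.0 gives the usual 1/(n-3) variance; 1.06 is a commonly
    used refinement for Spearman coefficients."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(var_factor / (n1 - 3) + var_factor / (n2 - 3))
    z = (z1 - z2) / se
    p = 2.0 * norm.sf(abs(z))
    return z, float(p)


def compare_spearman(x1, y1, x2, y2, var_factor=1.0):
    """Spearman rho within each group, then the Fisher z comparison."""
    r1, _ = spearmanr(x1, y1)
    r2, _ = spearmanr(x2, y2)
    z, p = fisher_z_compare(float(r1), len(x1), float(r2), len(x2), var_factor)
    return float(r1), float(r2), z, p


# Reported coefficients with HYPOTHETICAL bout counts (n is not given in the text).
z, p = fisher_z_compare(0.2193, 2000, 0.0397, 2000)
print(f"z = {z:.2f}, p = {p:.2g}")
```

      With real data, `compare_spearman` would take the per-bout axial rotation and lift values of each group directly.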

      (ii) The statement "After capsaicin exposure, the slopes of the medium fast speed bins were significantly lower (Figure 4C), reflecting *a loss of speed-dependent modulation*" is not convincing. The slope is likely a function of both speed and Csn treatment, and the comparisons in Figure 4C appear to be testing the latter, not the former.

      We understand the reviewer’s point. However, the slope for the slow bouts remains unchanged. We therefore conclude that the reduction in fin-body slope is speed-dependent, not a speed-independent overall reduction of slope.

      We have made this clearer by adding Supplementary Figure S4 and changing the text in lines 177-179.

      (iii) I'd like to understand more about the phenotype of the fin-amputated animals. Were any "bout" parameters changed? Did the animals still attempt climbs and was the distribution of the upward rotation parameter similar to controls? The text states "the slope of the relationship between upward rotation and lift was indistinguishable from zero" but the stats reported in the text are comparisons between groups while Table 5 shows 95% CIs that don't span zero. Some clarification would be useful here.

      We appreciate the Reviewer’s interest. We have studied climbing in fin-amputated animals at length here: https://doi.org/10.7554/eLife.45839 and here: https://doi.org/10.1016/j.celrep.2023.112573, and have added these references at line 183.

      (iv) The authors repeatedly refer to fin-body *coordination* but it is not clear whether the loss of lift after PC ablation is a result of an explicit coordination defect (i.e. changes in the relative timing and/or kinematics between fins and axial motion components), versus a simple reduction in pectoral fin engagement. Either result could be interesting, but this should be clarified.

      Thank you for pointing that out. In the fastest speed bin, we observed an increase in upward rotation and a decrease in average fin lift. In contrast, the medium speed bin showed no significant changes in average fin lift or upward rotation (see Author response image 2 and Tables 4 and 5), yet already displayed coordination deficits. Based on these observations, we argue that Purkinje cell lesions primarily affect coordination, rather than simply reducing one specific parameter such as lift or rotation (lines 293-298).

      We have added fin lift and rotation values from Author response image 2 for all speed bins to tables 4 and 5.  

      Author response image 2.

      Fin lift and rotation for slow, medium and fast bouts

      * PC activity and decoding of pitch direction.

      The clever TIPM method is used to collect calcium data that convincingly shows that individual PCs can encode pitch-tilt direction. However, a population of "not tuned" cells are also identified, and here I found the analysis of their responses and the argument that they encode pitch direction at a population level difficult to follow.

      (i) First, although the naming of the cells implies that individual neurons do not encode pitch direction, I did not find this convincing. Figures 5F/G suggest that several "not tuned" cells in fact show quite consistent differences in activity across trial types and indeed in terms of their average responses sit as far from the unity line as do several "tuned" cells.

      The Reviewer’s comment helped us clarify some key points. First, tuned and untuned cells were categorized using a Directionality Index threshold of 0.35. Some untuned cells might look tuned in Figures 5F/G, but Purkinje cell responses are highly variable, so overall there was no consistent tuning. We have clarified this in the text (lines 203-207). Below we have plotted the Up versus Down responses for the 10 least tuned cells (sorted by directionality index). While some cells respond more strongly on average to one direction, we think the variability makes it difficult to support a claim of “tuning.” We have also tested the support vector machine on the least tuned cells to confirm that the chosen tuned/untuned cutoff does not affect our claim that untuned cells can encode position (see also Author response image 4).
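
      A sketch of how cells might be split by a directionality-index cutoff and how the least tuned cells could be selected for decoding. The response gives only the 0.35 threshold; the index definition used here, |mean(up) - mean(down)| / (mean(up) + mean(down)) on non-negative response amplitudes, is an assumption, as are the function names and the toy responses.

```python
import numpy as np

DI_THRESHOLD = 0.35  # cutoff stated in the response


def directionality_index(up_responses, down_responses):
    """HYPOTHETICAL DI definition: |mean(up) - mean(down)| / (mean(up) + mean(down)).
    Assumes non-negative response amplitudes (e.g. peak dF/F per trial)."""
    up = float(np.mean(up_responses))
    down = float(np.mean(down_responses))
    return abs(up - down) / (up + down)


def classify_cells(up, down, threshold=DI_THRESHOLD):
    """up, down: arrays of shape (n_cells, n_trials).
    Returns the DI of each cell and a boolean 'tuned' mask."""
    di = np.array([directionality_index(u, d) for u, d in zip(up, down)])
    return di, di >= threshold


def least_tuned(di, k):
    """Indices of the k cells with the smallest DI (e.g. for 3/5/7-cell decoders)."""
    return np.argsort(di)[:k]


# Toy responses for 3 cells x 3 trials per direction (synthetic).
up = np.array([[1.0, 1.2, 0.8], [0.5, 0.6, 0.4], [1.0, 1.1, 0.9]])
down = np.array([[0.2, 0.3, 0.1], [0.5, 0.4, 0.6], [0.8, 0.9, 1.0]])
di, tuned = classify_cells(up, down)
print(di.round(3), tuned)
print(least_tuned(di, 2))
```

      Only the first toy cell passes the 0.35 cutoff; the two least tuned cells are the second and third.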

      Author response image 3.

      Trial-by-trial variability

      (ii) It is therefore not very surprising that PCA (and the SVM decoder) distinguishes trial type. I would guess that PCA assigns the largest weights to these most tuned of the "not tuned" cells, and the 3-5 cell decoders do well when these cells happen to be sampled.

      Author response image 4.

      Decoding accuracy of the 3/5/7 least tuned cells

      This was an interesting idea. To rule out that only the most tuned cells carry the information, we tested the decoder on the 3/5/7 least tuned cells; here too, decoders using 5 or more cells decode the direction more accurately. We have added the decoding accuracy to the text in lines 221-224.

      (iii) As I understand the analysis, Figure 5G shows responses for "not tuned" cells over 21 trials (of each type) but these are not the same trials for the different cells? How then is population coding being assessed?

      We have updated the text and refer to this data as a “pseudo-population” in lines 216 and 218 for all experiments where we combined cells from different fish. For technical reasons, when we perform TIPM at eccentric angles we must use sparsely labelled fish to ensure that we can find the same cells over a 60 degree range. We have repeated our analyses for TIPM centered at the horizon, where we can record from entire populations from a single fish.  

      (iv) Furthermore, Figure S2 shows a somewhat different analysis with decoding accuracy measured on a fish-by-fish basis. In this case, are these decoders for simultaneously imaged neurons? Is this a cross-validated measure of decoding accuracy?

      Yes. As above, Figure S4 (formerly S2) examines decoding on a fish-by-fish basis using simultaneously recorded neurons, and decoding accuracy was 5-fold cross-validated. We have updated the text in lines 490-494.

      Reviewer #2 (Recommendations For The Authors):

      - Postural control involves various aspects such as balance, coordination, relative body part orientations, and stability. Discussing these and presenting in this context the specific subaspect characterized in this study would help clarify which aspect of postural control the work focuses on.

      The Reviewer makes an interesting point, but we think their description of what constitutes postural control is overly broad. Specifically, control of “relative body part orientations in space” by definition requires coordination, and subserves balance and stability. We acknowledge, of course, that different aspects can be and often are treated independently. While interesting, a full treatment of what comprises “postural control” is beyond the scope of the paper, as it would require reconciling the terms across taxa, effectors, environments and well over a century of experiments.

      We contend that posture — particularly underwater — is best defined as the relative orientation of body parts in space. For fish, those parts consist of predominantly axial muscles and secondarily fins. We present these definitions in the Introduction and thank the Reviewer for encouraging us to more clearly shape our findings.

      - Disruption of posture or postural control: The use of the word "disruption" could lead to misleading expectations. While it may not be incorrect, it suggests a significant loss of equilibrium, an obvious increase in postural variability, or at least a noticeable effect when observing an individual animal's behavior. However, the supporting data show only a subtle median shift in postural angle within a very broad distribution averaged over many individuals. This effect was only significant when comparing fish with a control group, not when comparing fish posture before and after the treatment.

      Replacing "disruption" with "modification" would be more cautious.

      We take the Reviewer’s point and have adjusted our wording to “modifies postural control” in lines 137, 266, and 283.

      - Statistical significance: Consider aligning the asterisk notation with conventional standards (e.g., * for p < 0.05, ** for p < 0.01, *** for p < 0.001) to enhance clarity for readers. On the other hand, the individual measurements might not be independent (e.g., measurements from the same fish, or the same tank are likely to be correlated), so using the Wilcoxon rank-sum test (Mann-Whitney U test) on pooled data might lead to incorrect conclusions. Methods that account for the hierarchical structure of the data might be required to support the conclusions.

      We take the Reviewer’s point about the importance of conventions, however we have never found “more stars = more significant” to be all that helpful in evaluating claims. Instead, we’ve opted to have both a significance and effect size criteria; a “star” here reflects our considered confidence in the difference we observe. 

      We agree that the hierarchical nature of pooled data is worth considering/presenting.

      We performed a two-way analysis of variance (ANOVA) on the interquartile ranges (IQRs) of the single experimental repeats for the 7 days post-fertilization (dpf) activation, 7 dpf lesion, and 14 dpf lesion experiments. The ANOVA revealed no significant main effects, supporting the strategy of pooling experimental repeats to estimate distributions.
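
      A minimal sketch of this kind of check under stated assumptions: one IQR is computed per experimental repeat, and the IQRs are entered into a balanced two-way ANOVA with interaction. The response does not name the two factors, so they appear generically as A and B (genotype and pre/post treatment would be one hypothetical choice); the function names and synthetic data are likewise illustrative.

```python
import numpy as np
from scipy.stats import f as f_dist


def repeat_iqr(postures):
    """Interquartile range (75th minus 25th percentile) of one repeat."""
    q25, q75 = np.percentile(postures, [25, 75])
    return float(q75 - q25)


def two_way_anova_balanced(y):
    """Two-way ANOVA with interaction for a balanced design.
    y has shape (a, b, n): levels of factor A x levels of factor B x
    n experimental repeats per cell. Returns {effect: (F, p)}."""
    y = np.asarray(y, dtype=float)
    a, b, n = y.shape
    grand = y.mean()
    cell = y.mean(axis=2)             # cell means, shape (a, b)
    mean_a = y.mean(axis=(1, 2))      # marginal means of A, shape (a,)
    mean_b = y.mean(axis=(0, 2))      # marginal means of B, shape (b,)
    ss = {
        "A": b * n * np.sum((mean_a - grand) ** 2),
        "B": a * n * np.sum((mean_b - grand) ** 2),
        "AxB": n * np.sum((cell - mean_a[:, None] - mean_b[None, :] + grand) ** 2),
    }
    df = {"A": a - 1, "B": b - 1, "AxB": (a - 1) * (b - 1)}
    df_err = a * b * (n - 1)
    ms_err = np.sum((y - cell[:, :, None]) ** 2) / df_err
    results = {}
    for effect in ("A", "B", "AxB"):
        f_stat = (ss[effect] / df[effect]) / ms_err
        results[effect] = (float(f_stat), float(f_dist.sf(f_stat, df[effect], df_err)))
    return results


# Synthetic example: 2 x 2 design, 8 repeats per cell, 300 bouts per repeat.
rng = np.random.default_rng(0)
iqrs = np.array([[[repeat_iqr(rng.normal(0.0, 10.0, 300)) for _ in range(8)]
                  for _ in range(2)] for _ in range(2)])
print(two_way_anova_balanced(iqrs))
```

      An unbalanced design (different numbers of repeats per group) would need a regression-based ANOVA instead of these closed-form sums of squares.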

      The results of the ANOVA, along with the IQRs for all experimental repeats, are presented in Tables 6-11. We have also clarified this in the methods section in lines 505-509.

      - Data representation: All data of postural angles should be represented in the form of violin plots to show the underlying distributions of the postural angles, especially given that the effect size is small relative to the dispersion of the distribution of the postural angle and that this distribution is also not Gaussian but bimodal, and different before and after the treatments.

      We take the Reviewer’s point that seeing the full distribution can be useful. We have added plots of the raw distributions for the data in Figure 3 as supplemental Figure S3.

      - Showing the distributions will provide the necessary information for the reader to evaluate the importance of the effect. For all data shown in Table 1, the distributions should be presented in the supplementary information.

      As requested, we have added the distributions of the data in Table 1 to the supplement (Figure S2)

      - Roll posture: A statement about whether roll posture is perturbed by Purkinje cell manipulation would be a piece of important additional information helping to understand how strong the 'disruption' of posture is.

      We haven’t assessed roll posture, as this is not practical in the current version of the SAMPL apparatus. We have added this limitation to the results (line 116) but also note that as our manipulations are bilateral, we don’t anticipate any systematic changes to roll.   

      - Comparison with other methods: Add a discussion of how the TRPV1/capsaicin method compares with other methods, such as nitroreductase (Ntr)-mediated pharmacogenetic ablation of cells by treatment with metronidazole, or ablation of Purkinje cells with KillerRed, as the authors' lab has done previously. Both methods have been applied to ablate Purkinje cells in larval zebrafish. Setting aside the activation possibility, what are the advantages of the TRPV1 method compared to these?

      Thank you for that suggestion, we have added a section to the discussion where we compare the TRPV1/capsaicin lesion to other lesion methods (lines 334-336)

      - Describe the decoding algorithm: The decoding algorithm used could be described more in detail in the methods section.

      We have described the decoding algorithm in more detail in the Methods under ‘Functional GCaMP imaging in Purkinje cells’ (line 488 onward).

      We used a support vector machine (SVM) with a linear kernel. The SVM model was trained using k-fold cross-validation, which splits the data into k subsets (folds). At each iteration, the model was trained on k-1 folds and tested on the remaining fold, ensuring that the model performance was evaluated on unseen data in each fold. Permutations were performed on randomized trial identity as a null hypothesis (5-fold cross-validation; 100 shuffles for randomization). Accuracy was calculated as 1 minus the classification loss.  
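
      The decoding procedure described above can be sketched without any toolbox. This version is an illustration under stated assumptions, not the authors' code: it trains a linear SVM by full-batch subgradient descent on the hinge loss (rather than whatever solver the authors used), z-scores features using training-fold statistics, reports accuracy as 1 minus the misclassification rate, and builds a null distribution by shuffling trial labels. All names, hyperparameters and the synthetic pseudo-population are hypothetical.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, lr=0.1, n_iter=300):
    """Full-batch subgradient descent on the primal linear-SVM objective
    (lam/2)*||w||^2 + mean(max(0, 1 - y*(X@w + b))), with y in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        active = y * (X @ w + b) < 1          # trials violating the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def cv_accuracy(X, labels, k=5, seed=0, **svm_kw):
    """k-fold cross-validated accuracy (1 - misclassification rate).
    labels: binary array, e.g. 0 = nose-down tilt, 1 = nose-up tilt."""
    X = np.asarray(X, dtype=float)
    y = 2.0 * np.asarray(labels, dtype=float) - 1.0
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    n_wrong = 0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        mu = X[train_idx].mean(axis=0)
        sd = X[train_idx].std(axis=0) + 1e-12   # z-score with training-fold stats only
        w, b = train_linear_svm((X[train_idx] - mu) / sd, y[train_idx], **svm_kw)
        pred = np.where(((X[test_idx] - mu) / sd) @ w + b >= 0, 1.0, -1.0)
        n_wrong += int(np.sum(pred != y[test_idx]))
    return 1.0 - n_wrong / len(y)


def permutation_null(X, labels, n_shuffles=100, k=5, seed=0, **svm_kw):
    """Null distribution of CV accuracy with trial identity shuffled."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    return np.array([
        cv_accuracy(X, rng.permutation(labels), k=k, seed=seed + i + 1, **svm_kw)
        for i in range(n_shuffles)
    ])


# Hypothetical pseudo-population: 21 nose-up and 21 nose-down trials, 5 cells.
rng = np.random.default_rng(5)
labels = np.repeat([0, 1], 21)
X = rng.normal(size=(42, 5)) + 2.0 * labels[:, None]   # synthetic direction effect
acc = cv_accuracy(X, labels)
null = permutation_null(X, labels, n_shuffles=20)
p_perm = (1 + np.sum(null >= acc)) / (1 + null.size)
print(f"accuracy = {acc:.2f}, permutation p = {p_perm:.3f}")
```

      The permutation p-value uses the common (1 + count) / (1 + shuffles) correction so it is never exactly zero; the example uses 20 shuffles for speed rather than the 100 described in the text.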

      - Availability of code: The link to the data and code repository is not working.

      Thank you for pointing that out; we have fixed it. In the lower right of the page you can see the history of all changes to the repository, including the entry on 2023-09-08 where the corresponding author set it to “public.” When we checked, thanks to your comment, it had been set to “private,” without any record of when or why. We reset it on 2024-10-17. We will continue to check it periodically and apologize in advance if it becomes unavailable; this is the first time we have seen that happen.

      - Electrophysiological Control: Including an electrophysiological characterization of the activation of Purkinje cells by the TRPV1/capsaicin would significantly strengthen the validity of the method.

      We take the Reviewer’s point that electrophysiological characterization is a way to strengthen the validity of the method. However, Chen et al. (https://doi.org/10.1038/nmeth.3691) performed electrophysiology during neuronal activation and concluded that TRPV1 activation with capsaicin indeed increases neuronal activity and firing rates. Our calcium imaging and lesion experiments amply demonstrate that Purkinje cells are sensitive to TRPV1-mediated currents. We therefore do not believe that the additional information gained by arduous electrophysiological evaluation is merited here.

      - Describe more in detail how climb and dive bouts are defined. The height difference between consecutive bouts measured 250ms before the bout of executions.

      Climb and dive bouts are distinguished by the angle of their trajectory: if the fish moves upward (i.e. the trajectory angle is greater than 0°), the bout is a climb bout, and if it moves downward, a dive bout. Fish initiate a bout roughly 250 ms before it reaches maximum speed, so we measure the pre-bout posture at this time point. The time course of bouts is dissected extensively in Zhu et al. 2023. We have added a definition of climb and dive bouts to the Methods section under ‘Behavior analysis’ (lines 453 and 454).
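      The climb/dive rule and the 250 ms pre-bout window can be sketched as below. The function name, its inputs, and the choice to measure the trajectory from the pre-bout frame to the end of the recorded window are assumptions for illustration; the definitions in the Methods are authoritative.

```python
import math

# Hypothetical sketch: label a bout "climb" or "dive" from the sign of its
# trajectory angle and read the pre-bout posture ~250 ms before peak speed.


def classify_bout(x, z, speed, pitch, fs_hz, pre_ms=250.0):
    """x, z: horizontal and vertical position per frame (z increases upward);
    speed: per-frame swim speed; pitch: per-frame body angle in degrees;
    fs_hz: frame rate. Returns (label, trajectory_deg, pre_bout_pitch)."""
    peak = max(range(len(speed)), key=lambda i: speed[i])  # frame of peak speed
    onset = max(0, peak - int(round(pre_ms / 1000.0 * fs_hz)))  # ~bout initiation
    # Assumed trajectory: displacement from the pre-bout frame to the last frame
    dx = x[-1] - x[onset]
    dz = z[-1] - z[onset]
    trajectory_deg = math.degrees(math.atan2(dz, abs(dx)))
    # Upward trajectory (> 0 deg) = climb; a perfectly flat one is lumped with dives here
    label = "climb" if trajectory_deg > 0 else "dive"
    return label, trajectory_deg, pitch[onset]
```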

      - Figure 1H: Why can't you ablate all Purkinje cells but only about 80%?

      This is an excellent question. We opted for an extremely conservative count and included everything that still resembled a cell, even if it might no longer be functional or was already dying. Our counts are therefore likely an underestimate of the percentage of cells that were lost. We have added this point to the text (lines 393-395).

      - Figure 2C: The method is not fully clear. At 8dpf 0.1uM capsaicin is added to the chamber. At what time after the application of capsaicin did the behavioral recording start?

      Recording began about 10-15 min after we added 1 µM capsaicin (Csn) to the chambers. The fish were fed after 6 h in capsaicin. We have added this information to the Methods section (lines 404-408).

      - Figure 2F: What indicates the shown confidence interval? Also median with a 95% confidence interval calculated over the experiments in parallel?

      The distributions shown in Figure 2F pool data from all experiments. We use resampling methods to determine the variability of our estimates. The distribution plots show the median and the 25th and 75th percentiles of the resampled distribution. We have added this information to the figure legends.
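      A sketch of this kind of resampling summary, assuming a nonparametric bootstrap (resampling the pooled values with replacement); the function name and `n_boot` are hypothetical.

```python
import random
import statistics


def bootstrap_median(values, n_boot=1000, seed=0):
    """Resample the pooled data with replacement n_boot times and return the
    median of the bootstrap medians plus the 25th and 75th percentiles of
    that resampled distribution."""
    rng = random.Random(seed)
    n = len(values)
    boot = sorted(statistics.median(rng.choices(values, k=n)) for _ in range(n_boot))
    q = statistics.quantiles(boot, n=4)  # cut points: [25th, 50th, 75th]
    return statistics.median(boot), q[0], q[2]
```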

      - Figure 3: Subtitles on panel D and E indicating <climb bout posture> and would facilitate reading.

      We have added the subtitles to those panels.

      - Figure 4: Describe in the methods how recordings from individual fish were mapped onto each other to superimpose the Purkinje cell locations recorded from the 8 fish.

      We have added the respective section to the Methods (lines 481-483):

      “To map the anatomical locations of the recorded cells, we imaged overview stacks for each fish. These stacks were manually aligned in Illustrator, and the cells included in the analysis were reidentified and color-coded according to their tuning properties.”

      Reviewer #3 (Recommendations For The Authors):

      Major points:

      (1) Lines 74-81. The data presented here and in later experiments to argue for an effect of capsaicin on neural activity lacks statistical rigor because of the apparently very small numbers of animals/cells assessed. For example, the control appears to involve 4 cells assessed from 1 animal, and the experimental group is just 2 animals. Given that the interpretation of the paper depends upon this result, it is worthwhile to show the result more clearly, and with some statistical analysis. They argue in the discussion that "Our imaging assay established that 1 µM of capsaicin would stochastically activate subsets of Purkinje cells" which seems a stretch from the data as presented.

      We appreciate this point, which was shared by Reviewer 1. We have added more data and performed statistical analysis (lines 63-67 and Figure S1A).

      (2) I found the practice of sorting effects by a mixture of effect size and p-value to be a little arbitrary, although in this case, it seems likely that it identified the most relevant effects. I would have preferred to see some attempt to correct for multiple comparisons (e.g. by resampling with the identities of fish shuffled to estimate the distribution of each measurement for this population size), followed by filtering for effect size after establishing a corrected threshold for significance.

      We take the Reviewer’s point, though we note that critical values for effect size and p-value are inevitably “a little arbitrary.” We cannot perform the exact analysis the Reviewer suggests because we do not measure data from individual fish in these experiments. However, we did calculate new critical p-values (added to the Tables) that account for multiple comparisons using Šidák’s method.
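      For reference, Šidák's correction sets the per-comparison critical p-value to 1 - (1 - α)^(1/m) for m comparisons at family-wise level α, assuming the tests are independent. A minimal sketch; the number of comparisons m behind each Table is not stated here and must be supplied.

```python
def sidak_alpha(alpha, m):
    """Per-comparison critical p-value that keeps the family-wise error rate
    at alpha across m independent comparisons (Šidák): 1 - (1 - alpha)**(1/m)."""
    if not 0 < alpha < 1 or m < 1:
        raise ValueError("need 0 < alpha < 1 and m >= 1")
    return 1.0 - (1.0 - alpha) ** (1.0 / m)
```

      The threshold is slightly less strict than Bonferroni's alpha/m; for alpha = 0.05 and m = 10 it is about 0.00512 rather than 0.005.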

      (3) Figure 4. The data here is a little strange in that the slope in the control condition for medium speed is given as much larger than for slow, but the data in the two cases appears largely overlapping for most of the range of behavior, only diverging for the most extreme rotations. It seems perhaps that the measurement of slope is strongly dependent on these most extreme values. The authors might want to consider the use of robust regression methods which might mitigate these effects.

      This is an interesting observation and we appreciate the Reviewer’s thoughtful suggestion. We now use a robust regression method (bisquare weighting of residuals).

      We have updated all values in lines 175-177 and described the regression method in the Methods section (line 520).

      (4) Figure 5. The 'principal component analysis' description is extremely unclear. The text says that PCA 'showed near-complete segregation of trial types' but it is not explained how this was achieved with PCA or how this was quantified. Figure panels show the data plotted using different pairs of PCs showing visual evidence of segregation. In the methods, it is stated that "We performed principal component analysis" and that "cells were used for principal component analysis and subsequent support vector machine decoding analysis". What is meant exactly by 'performed PCA'? Was PCA used in a dimensionality reduction step? And if so, how many and which PCs were chosen and why? For visualization of the separation, the authors show arbitrary pairs of PCs. Could it be better to use a method more suited to that purpose such as linear discriminant analysis?

      PCA was used to define a subspace in which to evaluate qualitatively whether different trial types could be separated. Once it was clear that they could, we trained a binary decoder on the complete dataset (i.e. without dimensionality reduction). We did not perform linear discriminant analysis because the unsupervised PCA already showed separation of trial types. We have made this clearer in lines 212-214.
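      For illustration, the projection used for this kind of qualitative check can be sketched with power iteration on the covariance matrix (an assumed method; any SVD or eigendecomposition routine gives the same axes up to sign). Rows are trials and columns are cells; the decoder itself is trained on the full, unprojected data. The function name is hypothetical.

```python
def top_principal_components(X, k=2, n_iter=500):
    """Return (components, scores): the top-k principal axes of X (rows =
    trials, columns = cells) by power iteration with deflation, and each
    trial's projection onto those axes."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d)
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    comps = []
    for _ in range(k):
        v = [1.0 + 0.1 * j for j in range(d)]  # arbitrary non-degenerate start
        for _ in range(n_iter):
            Cv = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(c * c for c in Cv) ** 0.5
            if norm == 0:
                break
            v = [c / norm for c in Cv]
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Deflate: remove the variance captured by this component
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    scores = [[sum(r[j] * v[j] for j in range(d)) for v in comps] for r in Xc]
    return comps, scores
```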

      (5) Why does the decoding analysis use only untuned cells? Isn't it equally, or more, interesting to know how well tilt can be encoded using all cells? It is unclear to me what we learn by selecting only untuned cells for this analysis (although I agree it is interesting that this does work).

      We focused exclusively on untuned cells because including even a single highly tuned cell in the population decoder would yield excellent results. By using only untuned cells, we test whether there is directional information that is not visible from the up/down responses of single cells alone. We have made this clear in lines 217-218.

      Minor points and corrections:

      (1) Maybe consider losing the words 'powerful' (I think it is overused and not well defined) and 'reagent'. Reagent is normally used for something that participates in a reaction. It is a bit odd to use it to refer to a transgenic animal. Later it is called a 'tool' which seems better.

      We have changed the wording and refer to it as a “tool” throughout the paper.

      (2) Figure 1D. Please use a color bar to indicate the scale.

      We have added a color scale to the panel.

      (3) Saying that 'posture' increases is confusing, although the meaning can be inferred from the overall context and the definitions in the Methods - could Posture be capitalized to indicate a specific definition is being used rather than the general meaning?

      This suggestion agrees with those made by Reviewer 2. We have changed the wording to “postural angle.” 

      (4) The arrowheads in Figure 2FHK are unnecessary and confusing (why are some horizontal and some vertical?).

      Thank you for that suggestion, we have removed the arrowheads.

      (5) Figure 3 The legend should indicate that the image is shown with an inverted lookup table.

      We have updated the legend.

      (6) Figure 3 D and E Titles would be helpful, so it is not necessary to refer to the legend to understand the difference.

      We have added titles to the figure panels.

      (7) The dwell time for the 2-photon experiments is given in the manuscript, but I think the authors meant microseconds?

      Thank you for pointing that out. We have corrected it to microseconds.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We performed multiple new experiments and analyses in response to the reviewers’ concerns and incorporated the results in the main text and in multiple substantially revised or new figures. Before embarking on a point-by-point reply to the reviewers’ concerns, we briefly summarize our most important revisions here.

      First, we addressed a concern shared by Reviewers #1-3 about a lack of information about our DNA sequences. To this end, we redesigned multiple figures (Figures 3, 4, 5, S8, S9, S10, S11, and S12) to include the DNA sequences of each tested promoter, the specific mutations that occurred in it, the resulting changes in position-weight-matrix (PWM) scores, and the spacing between promoter motifs. Second, Reviewers #1 and #2 raised concerns about a lack of validation of our computational predictions and the resulting incompleteness of the manuscript. To address this issue, we engineered 27 reporter constructs harboring specific mutations, and experimentally validated our computational predictions with them. Third, we expanded our analysis to study how a more complete repertoire of other sigma 70 promoter motifs such as the UP-element and the extended -10 / TGn motif affects gene expression driven by the promoters we study. Fourth, we addressed concerns by Reviewer #3 about the role of the Histone-like nucleoid-structuring protein (H-NS) in promoter emergence and evolution. We did this by performing both experiments and computational analyses, which are now shown in the newly added Figure 5. Fifth, to satisfy Reviewer #3’s concerns about missing details in the Discussion, we have rewritten this section, adding additional details and references. 

      We next describe these and many other changes in a point-by-point reply to each reviewer’s comments. In addition, we append a detailed list of changes to each section and figure to the end of this document.

      Reviewer #1 (Public Review):

      Summary:

      This study by Fuqua et al. studies the emergence of sigma70 promoters in bacterial genomes. While there have been several studies to explore how mutations lead to promoter activity, this is the first to explore this phenomenon in a wide variety of backgrounds, which notably contain a diverse assortment of local sigma70 motifs in variable configurations. By exploring how mutations affect promoter activity in such diverse backgrounds, they are able to identify a variety of anecdotal examples of gain/loss of promoter activity and propose several mechanisms for how these mutations interact within the local motif landscape. Ultimately, they show how different sequences have different probabilities of gaining/losing promoter activity and may do so through a variety of mechanisms.

      We thank Reviewer #1 for taking the time to read and provide critical feedback on our manuscript. Their summary is fundamentally correct.

      Major strengths and weaknesses of the methods and results:

      This study uses Sort-Seq to characterize promoter activity, which has been adopted by multiple groups and shown to be robust. Furthermore, they use a slightly altered protocol that allows measurements of bi-directional promoter activity. This combined with their pooling strategy allows them to characterize expressions of many different backgrounds in both directions in extremely high throughput which is impressive! A second key approach this study relies on is the identification of promoter motifs using position weight matrices (PWMs). While these methods are prone to false positives, the authors implement a systematic approach which is standard in the field. However, drawing these types of binary definitions (is this a motif? yes/no) should always come with the caveat that gene expression is a quantitative trait that we oversimplify when drawing boundaries.

      The point is well-taken. To clarify this and other issues, we have added a section on the limitations of our work to the Discussion. Within this section we include the following sentences (lines 675-680):

      “Additionally, future studies will be necessary to address the limitations of our own work. First, we use binary thresholding to determine i) the presence or absence of a motif, ii) whether a sequence has promoter activity or not, and iii) whether a part of a sequence is a hotspot or not. While chosen systematically, the thresholds we use for these decisions may cause us to miss subtle but important aspects of promoter evolution and emergence.”

      Their approach to randomly mutagenizing promoters allowed them to find many anecdotal examples of different types of evolutions that may occur to increase or decrease promoter activity. However, the lack of validation of these phenomena in more controlled backgrounds may require us to further scrutinize their results. That is, their explanations for why certain mutations lead or obviate promoter activity may be due to interactions with other elements in the 'messy' backgrounds, rather than what is proposed.

      Thank you for raising this important point. To address it, we have conducted extensive new validation experiments for the newest version of this manuscript. For the “anecdotal” examples you described, we created 27 reporter constructs harboring the precise mutation that leads to the loss or gain of gene expression, and validated its ability to drive gene expression. The results from these experiments are in Figures 3, 4, 5, and Supplemental Figures S8-S11, and are labeled with a ′ (prime) symbol.

      These experiments not only confirm the increases and decreases in fluorescence that our analysis had predicted, but also demonstrate that, with the exception of two false-positive discoveries (out of 27), background mutations do not confound our analysis. We mention these two exceptions (lines 364-367):

      “In two of these hotspots, our validation experiments revealed no substantial difference in gene expression as a result of the hotspot mutation (Fig S8F′ and Fig S8J′). In both of these false positives, new -10 boxes emerge in locations without an upstream -35 box.”

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      The authors express a key finding that the specific landscape of promoter motifs in a sequence affects the likelihood that local mutations create or destroy regulatory elements. The authors have described many examples, including several that are non-obvious, and show convincingly that different sequence backgrounds have different probabilities for gaining or losing promoter activity. While this overarching conclusion is supported by the manuscript, the proposed mechanisms for explaining changes in promoter activity are not sufficiently validated to be taken for absolute truth. There is not sufficient description of the strength of emergent promoter motifs or their specific spacings from existing motifs within the sequence. Furthermore, they do not define a systematic process by which mutations are assigned to different categories (e.g. box shifting, tandem motifs, etc.) which may imply that the specific examples are assigned based on which is most convenient for the narrative.

      To summarize, Reviewer #1 criticizes the following three aspects of our work in this comment. 1) The mechanisms we proposed are not sufficiently validated. 2) The description of motifs, spacing, and PWM scores are not shown. 3) How mutations are classified into different categories (i.e. box-shifting, tandem motifs, etc.) is not systematically defined. 

      These are all valid criticisms. In response, we performed an extensive set of follow-up experiments and analyses, and redesigned the majority of the figures. Here is a more detailed response to each criticism:

      (1) Proposed mechanisms for explaining changes in promoter activity are not sufficiently validated. We engineered 27 reporter constructs harboring the specific mutations in the parents that we had predicted to change promoter activity. For each, we compared their fluorescence levels with their wild-type counterpart. The results from these experiments are in Figures 3, 4, and 5, and Supplemental Figures S8, S9, S10, S11, and S12, and are labeled with a ′ (prime) symbol.

      (2) No sufficient description of the strength of emergent promoter motifs or their specific spacings. We redesigned the figures to include the DNA sequences of the parent sequences, as well as the degenerate consensus sequences for each mutation. We additionally now highlight the specific motif sequences, their respective PWM scores, and by how much the score changes upon mutation. Finally, we annotated the spacing of motifs. These changes are in Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, S11, and S12.

      We note that in many cases, high-scoring PWM hits for the same motif can overlap (i.e. two -10 motifs or two -35 motifs overlap). Additionally, the proximity of a -35 and a -10 box does not guarantee that the two boxes interact. Together, these two facts can make the spacer size between two boxes ambiguous. To avoid any reporting bias, we therefore often report spacer sizes as a range (see Figure panels 4F, S8D, S8F-L, S9A, S9H, S10A, and S10E). The smallest spacer we annotate is 10 bp (Figure 4F), and the largest is 26 bp (Figure S8D). More “extreme” distances are not annotated; we leave it to the reader to decide whether an interaction is present.

      (3) No systematic process by which mutations are assigned to different categories such as box shifting, tandem motifs, etc. We opted to reformulate these categories completely, because the phenotypic effects of a previously mentioned “tandem motif” was actually a byproduct of H-NS repression (see the newly added Figure S12). 

      We also agree that the categories were ambiguous. We now introduce two terms: homo-gain and hetero-gain of -10 and -35 boxes. The manuscript now clearly defines these terms, and the relevant passage now reads as follows (lines 430-435): 

      “We found that these mutations frequently create new boxes overlapping those we had identified as part of a promoter (Fig S9). This occurs when mutations create a -10 box overlapping a -10 box, a -35 box overlapping a -35 box, a -10 box overlapping a -35 box, or a -35 box overlapping a -10 box. We call the resulting event a “homo-gain” when the new box is of the same type as the one it overlaps, and otherwise a “hetero-gain”. In either case, the creation of the new box does not always destroy the original box.”

      Impact of the work on the field, and the utility of the methods and data to the community:

      From this study, we are more aware of different types of ways promoters can evolve and devolve, but do not have a better ability to predict when mutations will lead to these effects. Recent work in the field of bacterial gene regulation has raised interest in bidirectional promoter regions. While the authors do not discuss how mutations that raise expression in one direction may affect another, they have created an expansive dataset that may enable other groups to study this interesting phenomenon. Also, their variation of the Sort-Seq protocol will be a valuable example for other groups who may be interested in studying bidirectional expression. Lastly, this study may be of interest to groups studying eukaryotic regulation as it can inform how the evolution of transcription factor binding sites influences short-range interactions with local regulator elements.

      Any additional context to understand the significance of the work:

      The task of computationally predicting whether a sequence drives promoter activity is difficult. By learning what types of mutations create or destroy promoters from this study, we are better equipped for this task.

      We thank Reviewer #1 again for their time and their thoughtful comments.

      Reviewer #2 (Public Review):

      Summary:

      Fuqua et al investigated the relationship between prokaryotic box motifs and the activation of promoter activity using a mutagenesis sequencing approach. From generating thousands of mutant daughter sequences from both active and non-active promoter sequences they were able to produce a fantastic dataset to investigate potential mechanisms for promoter activation. From these large numbers of mutated sequences, they were able to generate mutual information with gene expression to identify key mutations relating to the activation of promoter island sequences.

      We thank Reviewer #2 for reading and providing a thorough review of our manuscript. 

      Strengths:

      The data generated from this paper is an important resource to address this question of promoter activation. Being able to link the activation of gene expression to mutational changes in previously nonactive promoter regions is exciting and allows the potential to investigate evolutionary processes relating to gene regulation in a statistically robust manner. Alongside this, the method of identifying key mutations using mutual information in this paper is well done and should be standard in future studies for identifying regions of interest.

      Thank you for your kind words.

      Weaknesses:

      While the generation of the data is superb the focus only on these mutational hotspots removes a lot of the information available to the authors to generate robust conclusions. For instance.

      (1) The linear regression in S5 used to demonstrate that the number of mutational hotspots correlates with the likelihood of a mutation causing promoter activation is driven by three extreme points.

      A fair criticism. In response, we have chosen to remove the analysis of this trend from the manuscript entirely. (Additionally, Pnew and mutual information calculations both relied on the fluorescence scores of daughter sequences, so the finding was circular in its logic.)

      (2) Many of the arguments also rely on the number of mutational hotspots being located near box motifs. The context-dependent likelihood of this occurring is not taken into account given that these sequences are inherently box motif rich. So, something like an enrichment test to identify how likely these hot spots are to form in or next to motifs.

      Another good point. To address it, we carried out a computational analysis in which we randomly scrambled the nucleotides of each parent sequence while keeping the coordinates of each mutual information “hotspot” fixed. This scrambling results in significantly less overlap between hotspots and boxes. This analysis is now depicted in Figure 2C and described in lines 272-296.
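      The scrambling control can be sketched as follows: shuffling preserves each parent's base composition while the hotspot coordinates stay fixed, and the hotspot-box overlap is recomputed for every shuffle. The mismatch-based `find_boxes` is a hypothetical stand-in for the PWM scan, and `max_mismatch`, the motifs, and `n_shuffles` are assumptions.

```python
import random


def overlap_fraction(hotspots, boxes):
    """Fraction of hotspot positions that fall inside any [start, end) box."""
    covered = set()
    for start, end in boxes:
        covered.update(range(start, end))
    return sum(h in covered for h in hotspots) / len(hotspots)


def find_boxes(seq, motif, max_mismatch=1):
    """Hypothetical stand-in for a PWM scan: windows within max_mismatch of motif."""
    L = len(motif)
    return [(i, i + L) for i in range(len(seq) - L + 1)
            if sum(a != b for a, b in zip(seq[i:i + L], motif)) <= max_mismatch]


def scrambled_null(seq, hotspots, motifs, n_shuffles=1000, seed=0):
    """Shuffle nucleotides (same base composition), keep hotspot coordinates
    fixed, and recompute the hotspot-box overlap for each shuffle."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_shuffles):
        bases = list(seq)
        rng.shuffle(bases)
        shuffled = "".join(bases)
        boxes = [b for m in motifs for b in find_boxes(shuffled, m)]
        null.append(overlap_fraction(hotspots, boxes))
    return null
```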

      (3) The link between changes in expression and mutations in surrounding motifs is assessed with two-sided Mann Whitney U tests. This method assumes that the sequence motifs are independent of one another, but the hotspots of interest occur either in 0, 3, 4, or 5s in sequences. There is therefore no sequence where these hotspots can be independent and the correlation causation argument for motif change on expression is weakened.

      This is a fair criticism and a limitation of the MWU test. To better support our reasoning, we engineered 27 reporter constructs harboring the specific mutations in the parents that we had predicted to change promoter activity. For each, we compared their fluorescence levels with their wild-type counterpart. The results from these experiments are in Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, S11, and S12 and are labeled with a ′ (prime) symbol.

      These experiments not only confirm the increases and decreases in fluorescence that our analysis had predicted, but also demonstrate that, with the exception of two false-positive discoveries (out of 27), background mutations do not confound our analysis. We mention these two exceptions (lines 364-367):

      “In two of these hotspots, our validation experiments revealed no substantial difference in gene expression as a result of the hotspot mutation (Fig S8F′ and Fig S8J′). In both of these false positives, new -10 boxes emerge in locations without an upstream -35 box.”

      (4) The distance between -10 and -35 was mentioned briefly but not taken into account in the analysis.

      We have now included these spacer distances where appropriate. These changes are in Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, S11, and S12.

      We note that in many cases, high-scoring PWM hits for the same motif can overlap (i.e. two -10 motifs or two -35 motifs overlap). Additionally, the proximity of a -35 and a -10 box does not guarantee that the two boxes interact. Together, these two facts can make the spacer size between two boxes ambiguous. To avoid any reporting bias, we therefore often report spacer sizes as a range (see Figure panels 4F, S8D, S8F-L, S9A, S9H, S10A, and S10E). The smallest spacer we annotate is 10 bp (Figure 4F), and the largest is 26 bp (Figure S8D). More “extreme” distances are not annotated; we leave it to the reader to decide whether an interaction is present.
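      The range reporting can be sketched as follows: with overlapping hits, every -35/-10 pair inside an annotation window yields a candidate spacer, and the minimum and maximum are reported. The 10-26 bp window mirrors the smallest and largest annotated spacers; treating it as a hard filter, the 6 bp box length, and the function name are assumptions.

```python
def spacer_range(starts_35, starts_10, box_len=6, min_spacer=10, max_spacer=26):
    """Spacer = number of bases between the end of a -35 hexamer and the start
    of a downstream -10 hexamer. Overlapping hits can give several candidate
    spacers, so the (min, max) range is returned, or None if no pair falls
    inside the (hypothetical) annotation window."""
    spacers = []
    for s35 in starts_35:
        for s10 in starts_10:
            gap = s10 - (s35 + box_len)
            if min_spacer <= gap <= max_spacer:
                spacers.append(gap)
    return (min(spacers), max(spacers)) if spacers else None
```

      For example, two overlapping -35 hits starting at positions 0 and 2 and one -10 hit starting at position 23 give spacers of 17 and 15 bp, reported as the range 15-17 bp.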

      The authors propose mechanisms of promoter activation based on a few observations that are treated independently but occur concurrently. To address this using complementary approaches such as analysis focusing on identifying important motifs, using something like a glm lasso regression to identify significant motifs, and then combining with mutational hotspot information would be more robust.

      This is a great idea, and we pursued it as part of the revision. For each parent sequence, we mapped the locations of all -10 and -35 box motifs in the daughters, then reduced each sequence to a binary representation indicating whether or not it encodes each of these motifs, also referred to as a “one-hot encoded matrix.” We subsequently performed a Lasso regression between the one-hot encoded matrices and the fluorescence scores of the daughter sequences. The regression outputs a “weight” for each motif in the daughters: the larger a motif’s weight, the more the motif influences promoter activity. Author response image 1 describes our workflow.

      Author response image 1.
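      The workflow in Author response image 1 can be sketched as below: each daughter becomes a binary vector over (box type, position) features, and a Lasso is fitted by cyclic coordinate descent on the objective (1/2n)·||y - Xw||² + λ·||w||₁. The feature labels, λ, and the number of sweeps are hypothetical; this is an illustration of the technique, not a reproduction of the analysis.

```python
def one_hot_motifs(daughters_motifs, features):
    """daughters_motifs: one set of motif labels (e.g. '-10@20') per daughter.
    Returns a binary matrix: 1.0 if the daughter carries the motif, else 0.0."""
    return [[1.0 if f in motifs else 0.0 for f in features] for motifs in daughters_motifs]


def soft_threshold(z, gamma):
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Lasso by cyclic coordinate descent on centered data, minimizing
    (1/2n)||y - Xw||^2 + lam*||w||_1 (intercept absorbed by centering)."""
    n, d = len(X), len(X[0])
    xm = [sum(r[j] for r in X) / n for j in range(d)]
    ym = sum(y) / n
    Xc = [[r[j] - xm[j] for j in range(d)] for r in X]
    resid = [v - ym for v in y]  # residual y - Xw with w = 0
    w = [0.0] * d
    for _ in range(n_sweeps):
        for j in range(d):
            z = sum(r[j] ** 2 for r in Xc) / n
            if z == 0:
                continue  # a constant feature carries no information
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / z
            for i in range(n):
                resid[i] -= Xc[i][j] * (new - w[j])  # keep residual in sync
            w[j] = new
    return w
```

      On a balanced design where fluorescence depends on one motif only, the Lasso keeps that motif (shrunk by λ/z) and sets the others exactly to zero; with correlated, sparse motif features the weights become far less stable, which is consistent with the difficulties described below.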

      We really wanted this analysis to work, but unfortunately the model does not behave robustly, even when we test multiple values of the hyperparameter lambda (λ), which controls the trade-off between model bias and variance.

      The regression assigns strong weights almost exclusively to -10 boxes, and weak or even negative weights to -35 boxes. While initially exciting, these weights do not consistently align with the results from the 27 constructs with individual mutations that we tested experimentally. This ultimately suggests that the regression is overfitting the data.

      We do think a LASSO-regression approach can be applied to explore how individual motifs contribute to promoter activity. However, effectively implementing such a method would require a substantially more complex analysis. We respectfully believe that such an approach would distract from the current narrative, and would be more appropriate for a computational journal in a future study. 

      Because this analysis was inconclusive, we have not made it part of the revised manuscript. However, we hope that our 27 experimentally validated new constructs with individual mutations are sufficient to address the reviewer’s concerns regarding independent verification of our computational predictions.

      Other elements known to be involved in promoter activation including TGn or UP elements were not investigated or discussed.

      Thank you for highlighting this potentially important oversight. In response, we have performed two independent analyses to explore the role of TGn in promoter emergence and evolution. First, we computationally searched the parent sequences for -10 boxes with the bases TGn immediately upstream of them, and found 18 of these “extended -10 boxes” (lines 143-145):

      “On average, each parent sequence contains ~5.32 -10 boxes and ~7.04 -35 boxes (Fig S1). 18 of these -10 boxes also include the TGn motif upstream of the hexamer.”

      However, only 20% of the parents carrying these extended -10 boxes have promoter activity (lines 182-185):

      “We also note that 30% (15/50) of parents have the TGn motif upstream of a -10 box, but only 20% (3/15) of these parents have promoter activity (underlined with promoter activity: P4-RFP, P6-RFP, P8-RFP, P9-RFP, P10-RFP, P11-GFP, P12-GFP, P17-GFP, P18-GFP, P18-RFP, P19-RFP, P22-RFP, P24-GFP, P25-GFP, P25-RFP).”

      Second, we computationally searched through all of the daughter sequences to identify new -10 boxes with TGn immediately upstream. We found 114 -10 boxes with the bases TGn upstream. However, only 5 new -10 boxes (2 with TGn) were associated with increasing fluorescence (lines 338-345):

      “On average, 39.5 and 39.4 new -10 and -35 boxes emerged at unique positions within the daughter sequences of each mutagenized parent (Fig 3A,B), with 1’562 and 1’576 new locations for -10 boxes and -35 boxes, respectively. ~22% (684/3’138) of these new boxes are spaced 15-20 bp away from their cognate box, and ~7.3% (114/1’562) of the new -10 boxes have the TGn motif upstream of them. However, only a mere five of the new -10 boxes and four of the new -35 boxes are significantly associated with increasing fluorescence by more than +0.5 a.u. (Fig 3C,D).”
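      The TGn search can be sketched as follows: a -10 hexamer counts as extended if the dinucleotide TG sits immediately upstream of it, separated from the hexamer by one arbitrary base (the n). Matching within one mismatch of TATAAT is a hypothetical stand-in for the PWM threshold, and the function names are assumptions.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def extended_minus10_boxes(seq, consensus="TATAAT", max_mismatch=1):
    """Start positions of -10 hexamers (within max_mismatch of the consensus)
    that carry TGn immediately upstream, i.e. seq[start-3:start-1] == 'TG'."""
    L = len(consensus)
    hits = []
    for start in range(3, len(seq) - L + 1):  # need 3 upstream bases for TGn
        hexamer = seq[start:start + L]
        if hamming(hexamer, consensus) <= max_mismatch and seq[start - 3:start - 1] == "TG":
            hits.append(start)
    return hits
```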

      In addition, we now study the role of UP elements. This analysis showed that the UP element plays a negligible role in promoter emergence within our dataset.  It is discussed in a new subsection of the results (lines 591-608).

      Collectively, these additional analyses suggest that the presence of TGn plus a -10 box is insufficient to create promoter activity, and that the UP element does not play a significant role in promoter emergence or evolution.

      Reviewer #3 (Public Review):

      Summary:

      Like many papers in the last 5-10 years, this work brings a computational approach to the study of promoters and transcription, but unfortunately disregards or misrepresents much of the existing literature and makes unwarranted claims of novelty. My main concerns with the current paper are outlined below although the problems are deeply embedded.

      We thank Reviewer #3 for taking the time to review this manuscript. We have made extensive changes to address their concerns about our work.

      Strengths:

      The data could be useful if interpreted properly, taking into account i) the role of translation ii) other promoter elements, and iii) the relevant literature.

      Weaknesses:

      (1) Incorrect assumptions and oversimplification of promoters.

      - There is a critical error on line 68 and Figure 1A. It is well established that the -35 element consensus is TTGACA but the authors state TTGAAA, which is also the sequence represented by the sequence logo shown and so presumably the PWM used. It is essential that the authors use the correct -35 motif/PWM/consensus. Likely, the authors have made this mistake because they have looked at DNA sequence logos generated from promoter alignments anchored by either the position of the -10 element or transcription start site (TSS), most likely the latter. The distance between the TSS and -10 varies. Fewer than half of E. coli promoters have the optimal 7 bp separation with distances of 8, 6, and 5 bp not being uncommon (PMID: 35241653). Furthermore, the distance between the -10 and -35 elements is also variable (16,17, and 18 bp spacings are all frequently found, PMID: 6310517). This means that alignments, used to generate sequence logos, have misaligned -35 hexamers. Consequently, the true consensus is not represented. If the alignment discrepancies are corrected, the true consensus emerges. This problem seems to permeate the whole study since this obviously incorrect consensus/motif has been used throughout to identify sequences that resemble -35 hexamers.

      We respectfully but strongly disagree that our analysis has misrepresented the true nature of -35 boxes. First, accounting for more A’s at position 5 in the PWM is not going to lead to a “critical error.” This is because positions 4-6 of the motif barely have any information content (bits) compared to positions 1-3 (see Fig 1A). This assertion is not just based on our own PWM, but based on ample precedent in the literature. In PMID 14529615, TTG is present in 38% of all -35 boxes, but ACA only in 8%. In PMID 29388765, with the -10 instance TATAAT, the -35 instance TTGCAA yields stronger promoters compared to the -35 instance TTGACA (See their Figure 3B).

      In PMID 29745856 (Figure 2), the most information content lies in positions 1-3, with the A and C at position 5 both nearly equally represented, as in our PWM. In PMID 33958766 (Figure 1) an experimentally-derived -35 box is even reduced to a “partial” -35 box which only includes positions 1 and 2, with consensus: TTnnnn.

      In addition, we did not derive the PWMs as the reviewer describes. The PWMs we use are based on computational predictions that are in excellent agreement with experimental results. Specifically, the PWMs we use are from PMID 29728462, which acquired 145 -10 and -35 box sequences from the top 3.3% of computationally predicted boxes from RegulonDB. See PMID 14529615 for the computational pipeline that was used to derive the PWMs, which independently aligns the -10 and -35 boxes to create the consensus sequences. The -35 PWM significantly and strongly correlates with an experimentally derived -35 box (see Figure S4 in the Supporting Information of Belliveau et al., PNAS 2017; Pearson correlation coefficient = 0.89). Within the 145 -35 boxes, the exact consensus sequence (TTGACA) that Reviewer #3 is concerned about is present 6 times in our matrix, and has a PWM score above the significance threshold. In other words, TTGACA is classified as a -35 box in our dataset.

      We now provide DNA sequences for each of the figures to improve accessibility and reproducibility. A reader can now use any PWM or method they wish to interpret the data.
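      Because the disagreement above turns on how a PWM scores individual hexamers, the following minimal sketch shows one common way to build a log-odds PWM from site counts and scan a sequence with it. The count matrix, the pseudocount of 1, the uniform background, and the 5-bit threshold are all hypothetical placeholders for illustration; they are not the matrix from PMID 29728462 or the study's significance threshold. The toy matrix deliberately puts most information in positions 1-3, so TTGACA still scores far above the threshold even though A is the most frequent base at position 5.

```python
import math

# Hypothetical count matrix for a 6-bp -35 box (145 aligned sites).
# Keys are bases, list entries are positions 1-6. Illustrative only,
# NOT the published matrix.
PFM = {
    "A": [5, 2, 10, 60, 45, 70],
    "C": [5, 5, 5, 20, 40, 15],
    "G": [10, 3, 120, 30, 20, 20],
    "T": [125, 135, 10, 35, 40, 40],
}


def log_odds_matrix(pfm, pseudocount=1.0, background=None):
    """Convert counts to log2(observed frequency / background) per base and position."""
    background = background or {base: 0.25 for base in "ACGT"}  # assumed uniform
    width = len(pfm["A"])
    totals = [sum(pfm[b][i] for b in "ACGT") + 4 * pseudocount for i in range(width)]
    return {
        b: [math.log2((pfm[b][i] + pseudocount) / totals[i] / background[b])
            for i in range(width)]
        for b in "ACGT"
    }


def score(pwm, kmer):
    """Sum the per-position log-odds scores of a k-mer."""
    return sum(pwm[base][i] for i, base in enumerate(kmer))


def scan(seq, pwm, threshold):
    """Return (start, k-mer, score) for each forward-strand window at or above threshold."""
    width = len(pwm["A"])
    hits = []
    for start in range(len(seq) - width + 1):
        kmer = seq[start:start + width]
        if set(kmer) <= set("ACGT"):  # skip windows containing N or other symbols
            s = score(pwm, kmer)
            if s >= threshold:
                hits.append((start, kmer, s))
    return hits


pwm = log_odds_matrix(PFM)
for start, kmer, s in scan("GCGCTTGACAGCGC", pwm, threshold=5.0):
    print(start, kmer, round(s, 2))
```

      With real data one would substitute the published counts, a background model matched to the AT-rich parents, and a scan of the reverse complement as well.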

      - An uninformed person reading this paper would be led to believe that prokaryotic promoters have only two sequence elements: the -10 and -35 hexamers. This is because the authors completely ignore the role of the TG motif, UP element, and spacer region sequence. All of these can compensate for the lack of a strong -35 hexamer and it's known that appending such elements to a lone -10 sequence can create an active promoter (e.g. PMIDs 15118087, 21398630, 12907708, 16626282, 32297955). Very likely, some of the mutations, classified as not corresponding to a -10 or -35 element in Figure 2, target some of these other promoter motifs.

      Thank you for bringing this oversight to our attention. We have performed two independent analyses to explore the role of TGn in promoter emergence and evolution. First, we computationally searched for -10 boxes with the bases TGn immediately upstream of them in the parent sequences, and found 18 of these “extended -10 boxes” in the parents (lines 143-145):

      “On average, each parent sequence contains ~5.32 -10 boxes and ~7.04 -35 boxes (Fig S1). 18 of these -10 boxes also include the TGn motif upstream of the hexamer.”

      However, only 20% of the parents carrying these boxes have promoter activity (lines 182-185):

      “We also note that 30% (15/50) of parents have the TGn motif upstream of a -10 box, but only 20% (3/15) of these parents have promoter activity (underlined with promoter activity: P4-RFP, P6-RFP, P8-RFP, P9-RFP, P10-RFP, P11-GFP, P12-GFP, P17-GFP, P18-GFP, P18-RFP, P19-RFP, P22-RFP, P24-GFP, P25-GFP, P25-RFP).”

      Second, we computationally searched through all of the daughter sequences to identify new -10 boxes with TGn immediately upstream. We found 114 -10 boxes with the bases TGn upstream. However, only 5 new -10 boxes (2 with TGn) were associated with increasing fluorescence (lines 338-345):

      “On average, 39.5 and 39.4 new -10 and -35 boxes emerged at unique positions within the daughter sequences of each mutagenized parent (Fig 3A,B), with 1’562 and 1’576 new locations for -10 boxes and -35 boxes, respectively. ~22% (684/3’138) of these new boxes are spaced 15-20 bp away from their cognate box, and ~7.3% (114/1’562) of the new -10 boxes have the TGn motif upstream of them. However, only five of the new -10 boxes and four of the new -35 boxes are significantly associated with increasing fluorescence by more than +0.5 a.u. (Fig 3C,D).”
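      The two box-level criteria quoted above (a TGn motif immediately upstream of a -10 box, and a 15-20 bp separation from a cognate -35 box) can be checked mechanically once box positions are known. The sketch below covers new -10 boxes only and assumes 0-based start coordinates of 6-bp hexamers on the same strand. It measures spacing as the number of base pairs between the 3′ end of the -35 hexamer and the 5′ end of the -10 hexamer; the study's exact spacing convention is not stated here, so that definition and all function names are hypothetical.

```python
HEXAMER = 6  # -10 and -35 boxes treated as 6-bp hexamers


def has_tgn(seq, ten_start):
    """True if the 3 bp immediately 5' of the -10 hexamer read T, G, then any base."""
    return ten_start >= 3 and seq[ten_start - 3:ten_start - 1] == "TG"


def spacer_length(thirtyfive_start, ten_start, width=HEXAMER):
    """bp between the 3' end of the -35 hexamer and the 5' end of the -10 hexamer."""
    return ten_start - (thirtyfive_start + width)


def classify_new_ten_boxes(seq, new_ten_starts, thirtyfive_starts, spacer_range=(15, 20)):
    """Flag each new -10 box for an upstream TGn and for a -35 box within the spacer range."""
    lo, hi = spacer_range
    result = {}
    for t in new_ten_starts:
        spaced = any(lo <= spacer_length(s, t) <= hi for s in thirtyfive_starts)
        result[t] = {"TGn": has_tgn(seq, t), "spaced": spaced}
    return result


# Toy sequence: -35 box at 0, 17-bp spacer ending in "TGA", -10 box at 23.
toy = "TTGACA" + "A" * 14 + "TGA" + "TATAAT" + "GC"
print(classify_new_ten_boxes(toy, [23], [0]))
```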

      In addition, we now study the role of UP elements. This analysis showed that the UP element plays a negligible role in promoter emergence within our dataset. It is discussed in a new subsection of the results (lines 591-608) and in the newly added Figure S13.

      Collectively, these additional analyses suggest that the presence of TGn plus a -10 box is insufficient to create promoter activity, and that the UP element does not play a significant role in promoter emergence or evolution.

      - The model in Figure 4C is highly unlikely. There is no evidence in the literature that RNAP can hang on with one "arm" in this way. In particular, structural work has shown that sequence-specific interactions with the -10 element can only occur after the DNA has been unwound (PMID: 22136875). Further, -10 elements alone, even if a perfect match to the consensus, are non-functional for transcription. This is because RNAP needs to be directed to the -10 by other promoter elements, or transcription factors. Only once correctly positioned can RNAP stabilise DNA opening and make sequence-specific contacts with the -10 hexamer. This makes the notion that RNAP may interact with the -10 alone, using only domain 2 of sigma, extremely unlikely.

      This is a valid criticism, and we thank the reviewer for catching this problem. In response, we have removed the model and pertinent figures throughout the entire manuscript.

      (2) Reinventing the language used to describe promoters and binding sites for regulators.

      - The authors needlessly complicate the narrative by using non-standard language. For example, on page 1 they define a motif as "a DNA sequence computationally predicted to be compatible with TF binding". They distinguish this from a binding site "because binding sites refer to a location where a TF binds the genome, rather than a DNA sequence". First, these definitions are needlessly complicated; why not just say "putative binding sites" and "known binding sites", respectively? Second, there is an obvious problem with the definitions: many "motifs" will also be "binding sites". In fact, by the time the authors state their definitions, they have already fallen foul of this conflation; in the prior paragraph they stated: "controlled by DNA sequences that encode motifs for TFs to bind". The same issue reappears throughout the paper.

      We agree that this was needlessly complicated. We now just refer to every sequence we study as a motif. A -10 box is a motif, a -35 box is a motif, a putative H-NS binding site is an H-NS motif, etc. The word “binding site” no longer occurs in the manuscript.

      - The authors also use the terms "regulatory" and "non-regulatory" DNA. These terms are not defined by the authors and make little sense. For instance, I assume the authors would describe promoter islands lacking transcriptional activity (itself an incorrect assumption, see below) as non-regulatory. However, as horizontally acquired sections of AT-rich DNA, these will all be bound by H-NS and subject to gene silencing, affecting both promoters for mRNA synthesis and spurious promoters inside genes that create untranslated RNAs. Hence, regulation is occurring.

      Another fair point. We have thus changed the terminology throughout to “promoter” and “non-promoter.”

      - Line 63: "In prokaryotes, the primary regulatory sequences are called promoters". Promoters are not generally considered regulatory. Rather, it is adjacent or overlapping sites for TFs that are regulatory. There is a good discussion of the topic here (PMID: 32665585). 

      We have rewritten this. The sentence now reads (lines 67-69):

      “A canonical prokaryotic promoter recruits the RNA polymerase subunit σ70 to transcribe downstream sequences (Burgess et al., 1969; Huerta and Collado-Vides, 2003; Paget and Helmann, 2003; van Hijum et al., 2009).”

      (3) The authors ignore the role of translation.

      - The authors' assay does not measure promoter activity alone, this can only be tested by measuring the amount of RNA produced. Rather, the assay used measures the combined outputs of transcription and translation. If the DNA fragments they have cloned contain promoters with no appropriately positioned Shine-Dalgarno sequence then the authors will not detect GFP or RFP production, even though the promoter could be making an RNA (likely to be prematurely terminated by Rho, due to a lack of translation). This is known for promoters in promoter islands (e.g. Figure 1 in PMID: 33958766).

      We agree that this is definitely a limitation of our study, which we had not discussed sufficiently. In response, we now discuss this limitation in a new section of the discussion (lines 680-686):

      “Second, we measure protein expression through fluorescence as a readout for promoter activity. This readout combines transcription and translation. This means that we cannot differentiate between transcriptional and post-transcriptional regulation, including phenomena such as premature RNA termination (Song et al., 2022; Uptain and Chamberlin, 1997), post-transcriptional modifications (Mohanty and Kushner, 2006), and RNA-folding from riboswitch-like sequences (Mandal and Breaker, 2004).”

      - In Figure S6 it appears that there is a strong bias for mutations resulting in RFP expression to be close to the 3' end of the fragment. Very likely, this occurs because this places the promoter closer to RFP and there are fewer opportunities for premature termination by Rho.

      The reviewer raises a very interesting possibility. To validate it, we have performed the following analysis. We took the RFP expression values from the 9’934 daughters with single mutations in all 25 parent sequences (P1-RFP, P2-RFP, … P25-RFP), and plotted the location of the single mutation (horizontal axis) against RFP expression (vertical axis) in Author response image 2. 

      Author response image 2.

      The distribution is uniform across the sequences, showing that distance from the RBS is not likely the reason for this observation. Since this analysis was uninformative with respect to distance from the RBS, we chose not to include it in the manuscript.
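      The uniformity judgement above rests on visual inspection of Author response image 2. As a hypothetical complement (not an analysis the authors report), a Spearman rank correlation between the position of each single mutation and the resulting RFP expression would summarize any monotonic position effect in one number, with values near 0 indicating none. The sketch below is plain Python so it needs no external libraries; the function names are illustrative.

```python
import math


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for m in range(i, j + 1):
            r[order[m]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: mutation positions (bp from the 5' end) and RFP scores.
positions = [12, 45, 78, 101, 130, 150]
rfp = [1.1, 0.9, 1.3, 1.0, 1.2, 0.95]
print(round(spearman(positions, rfp), 3))
```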

      (4) Ignoring or misrepresenting the literature.

      - As alluded to above, promoter islands are large sections of horizontally acquired, high AT-content DNA. It is well known that such sequences are i) packed with promoters driving the expression of RNAs that aren't translated, ii) silenced, albeit incompletely, by H-NS, and iii) targeted by Rho, which terminates untranslated RNA synthesis (PMIDs: 24449106, 28067866, 18487194). None of this is taken into account anywhere in the paper, and it is highly likely that most, if not all, of the DNA sequences the authors have used contain promoters generating untranslated RNAs.

      Thank you for pointing out that our original submission was incomplete in this regard. We address these concerns with new analyses, including some new experiments. First, Rho-dependent termination is associated with the RUT motif, which is very rich in cytosines (PMID: 30845912). Given that our sequences have an AT content of 65%-78%, canonical Rho-dependent termination is unlikely. Nevertheless, we computationally searched for Rho-dependent terminators using the available code from PMID: 30845912, but the algorithm did not identify any putative RUTs. Because this analysis was not informative, we did not include it in the paper.
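      The AT-content argument above can be made concrete with a crude screen. Rut sites are C-rich and G-poor, so one simple proxy is to flag sequence windows whose C:G ratio exceeds a cutoff. The sketch below is only that proxy; it is not the algorithm from PMID 30845912, and the 78-nt window and the 1.5 C:G cutoff are hypothetical parameters chosen for illustration.

```python
def at_content(seq):
    """Fraction of A + T in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


def c_rich_windows(seq, window=78, min_c_to_g=1.5):
    """Start positions of windows that contain C and have a C:G ratio >= min_c_to_g.

    A crude, hypothetical proxy for rut-like C-rich/G-poor stretches.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        w = seq[start:start + window]
        c, g = w.count("C"), w.count("G")
        if c > 0 and (g == 0 or c / g >= min_c_to_g):
            hits.append(start)
    return hits


# Toy example: a 100-bp AT-rich sequence with few cytosines.
toy = ("AATTATAGCT" * 10)
print(round(at_content(toy), 2), c_rich_windows(toy))
```

      In sequences that are 65%-78% A+T, few windows have enough cytosine to pass such a screen, which is consistent with the reasoning in the response.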

      We analyzed the role of H-NS in promoter emergence and evolution within our dataset using both experimental and computational approaches. These additional analyses are now shown in the newly added Figure 5 and the newly added Figure S12. We found that H-NS represses P22-GFP and P12-RFP and affects the bidirectionality of P20. More specifically, to analyze the effects of H-NS, we first compared the fluorescence levels of parent sequences in a Δhns background vs the wild-type (DH5α) background in Figure 5A. We found 6 candidate H-NS targets, with P22-GFP and P12-RFP exhibiting the largest changes in fluorescence (lines 496-506):

      “We plot the fluorescence changes in Fig 5A as distributions for the 50 parents, where positive and negative values correspond to an increase or decrease in fluorescence in the Δhns background, respectively. Based on the null hypothesis that the parents are not regulated by H-NS, we classified outliers in these distributions (1.5 × the interquartile range) as H-NS-target candidates. We refer to these outliers as “candidates” because the fluorescence changes could also result from indirect trans-effects from the knockout (Mattioli et al., 2020; Metzger et al., 2016). This approach identified 6 candidates for H-NS targets (P2-GFP, P19-GFP, P20-GFP, P22-GFP, P12-RFP, and P20-RFP). For GFP, the largest change occurs in P22-GFP, increasing fluorescence ~1.6-fold in the mutant background (two-tailed t-test, p=1.16×10⁻⁸) (Fig 5B). For RFP, the largest change occurs in P12-RFP, increasing fluorescence ~0.5-fold in the mutant background (two-tailed t-test, p=4.33×10⁻¹⁰) (Fig 5B).”
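      The outlier rule quoted above (1.5 × the interquartile range) is Tukey's fence. A minimal sketch follows, assuming one fluorescence-change value per parent (Δhns relative to wild type). It uses the "inclusive" quartile method of Python's `statistics.quantiles`, which may differ slightly from the quartile convention the authors used; the parent labels and values in the example are hypothetical.

```python
import statistics


def tukey_outliers(changes, k=1.5):
    """Return the names whose value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(list(changes.values()), n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sorted(name for name, d in changes.items() if d < low or d > high)


# Hypothetical fluorescence changes (a.u.) for six parents.
toy_changes = {"P1": 0.0, "P2": 0.1, "P3": -0.1, "P4": 0.05, "P5": -0.05, "P6": 2.0}
print(tukey_outliers(toy_changes))
```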

      We also observed that the Δhns background affected the bidirectionality of P20 (lines 507-511):

      “We note that for template P20, which is a bidirectional promoter, GFP expression increases ~2.6-fold in the Δhns background (two-tailed t-test, p=1.59×10⁻⁶). Simultaneously, RFP expression decreases ~0.42-fold in the Δhns background (two-tailed t-test, p=4.77×10⁻⁴) (Fig S12A). These findings suggest that H-NS also modulates the directionality of P20’s bidirectional promoter through either cis- or trans-effects.”

      We then searched for regions where losing H-NS motifs in hotspots significantly changed fluorescence. We identified 3 motifs in P12-RFP and P22-GFP (lines 522-528):

      “For P22-GFP, an H-NS motif lies 77 bp upstream of the mapped promoter. Mutations which destroy this motif significantly increase fluorescence by +0.52 a.u. (two-tailed MWU test, q=1.07×10⁻³) (Fig 5E). For P12-RFP, one H-NS motif lies upstream of the mapped promoter’s -35 box, and the other upstream of the mapped promoter’s -10 box. Mutations that destroy these H-NS motifs significantly increase fluorescence by +0.53 and +0.51 a.u., respectively (two-tailed MWU test, q=3.28×10⁻⁴⁰ and q=4.42×10⁻⁵⁰) (Fig 5F,G). Based on these findings, we conclude that these motifs are bound by H-NS.”
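      The q-values above accompany two-tailed Mann-Whitney U tests, which suggests p-values corrected for testing many motifs. Assuming a Benjamini-Hochberg false-discovery-rate correction (the text here does not name the procedure), the adjustment can be sketched as follows. The per-motif p-values would come from a Mann-Whitney U test such as `scipy.stats.mannwhitneyu`, comparing daughters that destroy a motif with those that keep it; the function name below is illustrative.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum so that
    # q-values never increase as p-values decrease.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    q = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # rank of pvalues[i] in ascending order (1 = smallest)
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical p-values for four motifs.
print(bh_qvalues([0.01, 0.04, 0.03, 0.2]))
```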

      We are grateful for the suggestion to look at the role of H-NS in our dataset. Our analysis revealed a more plausible explanation for what we formerly referred to as a “Tandem Motif” in the original submission. Previously, we had shown that in P12-RFP, when a -35 box is created next to the promoter’s -35 box, or a -10 box next to the promoter’s -10 box, expression decreases. These new -10 and -35 boxes, however, also overlap with the two H-NS motifs in P12-RFP. We tested these exact point mutations in reporter plasmids and in the Δhns background, and found that the Δhns background rescues this loss in expression (see Figure S12). This analysis is in the newly added subsection “The binding of H-NS changes when new -10 and -35 boxes are gained” and can be found at lines 529-563. We summarize the findings in a final paragraph of the section (lines 556-563):

      “To summarize, we present evidence that H-NS represses both P22-GFP and P12-RFP in cis. H-NS also modulates the bidirectionality of P20-GFP/RFP in cis or trans. In P22-GFP, the strongest H-NS motif lies upstream of the promoter. In P12-RFP, the strongest H-NS motifs lie upstream of the -10 and -35 boxes of the promoter. We note that there are 16 additional H-NS motifs surrounding the promoter in P12-RFP that may also regulate P12-RFP (Fig S12G). Mutations in two of these H-NS motifs can create additional -10 and -35 boxes that appear to lower expression. However, the effects of these mutations are insignificant in the absence of H-NS, suggesting that these mutations actually modulate H-NS binding.”

      We also agree that the majority of these sequences are likely driving the expression of many untranslated RNAs (see Purtov et al., 2014). We thus now define a promoter more carefully as follows (lines 113-119):

      “In this study, we define a promoter as a DNA sequence that drives the expression of a (fluorescent) protein whose expression level, measured by its fluorescence, is greater than a defined threshold. We use a threshold of 1.5 arbitrary units (a.u.) of fluorescence. This definition does not distinguish between transcription and translation. We chose it because protein expression is usually more important than RNA expression whenever natural selection acts on gene expression, because it is the primary phenotype visible to natural selection (Jiang et al., 2023).” 
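      Under the 1.5 a.u. definition quoted above, classifying a sequence and computing the fraction of a parent's daughters that are active (the quantity called Pnew elsewhere in this response) reduces to counting scores above the cutoff. A minimal sketch, with hypothetical function names; the toy scores are invented, while the two Pnew values in the summary example are those reported for P25-GFP and P8-RFP.

```python
import statistics

THRESHOLD_AU = 1.5  # promoter cutoff in arbitrary fluorescence units, from the text


def is_active(score, threshold=THRESHOLD_AU):
    """A sequence counts as a promoter if its fluorescence exceeds the threshold."""
    return score > threshold


def p_new(daughter_scores, threshold=THRESHOLD_AU):
    """Fraction of a parent's daughter sequences classified as active."""
    if not daughter_scores:
        raise ValueError("need at least one daughter score")
    return sum(is_active(s, threshold) for s in daughter_scores) / len(daughter_scores)


def summarize(p_by_parent):
    """Median Pnew across parents and the max/min fold difference (min must be > 0)."""
    values = list(p_by_parent.values())
    return {
        "median": statistics.median(values),
        "fold_range": max(values) / min(values),
    }


print(p_new([1.0, 1.6, 2.0, 1.4]))  # 0.5
print(summarize({"P25-GFP": 0.002, "P8-RFP": 0.41, "hypothetical": 0.046}))
```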

      We also state this as a limitation of our study in the Discussion (lines 680-686):

      “Second, we measure protein expression through fluorescence as a readout for promoter activity. This readout combines transcription and translation. This means that we cannot differentiate between transcriptional and post-transcriptional regulation, including phenomena such as premature RNA termination (Song et al., 2022; Uptain and Chamberlin, 1997), post-transcriptional modifications (Mohanty and Kushner, 2006), and RNA-folding from riboswitch-like sequences (Mandal and Breaker, 2004).”

      - The authors state that GC content does not correlate with the emergence of new promoters. It is known that GC content does correlate with the emergence of new promoters because promoters are themselves AT-rich DNA sequences (e.g. see Figure 1 of PMID: 32297955). There are two reasons the authors see no correlation in this work. First, the DNA sequences they have used are already very AT-rich (between 65 % and 78 % AT-content). Second, they have only examined a narrow range of AT content (i.e. between 65 % and 78 %). The effect of AT content on promoter emergence is most clearly seen between around 40 % and 60 % AT content. Above that level, the strong positive correlation plateaus.

      We respectfully disagree that the reviewer’s point is pertinent. The reviewer is referring to the likelihood that a sequence is a promoter, which indeed increases with AT content, whereas we focus on the likelihood that a sequence becomes a promoter through DNA mutation. We note that if a DNA sequence is more AT-rich, then it is more likely to have -10 and -35 boxes, because their consensus sequences are also AT-rich. However, H-NS and other transcriptional repressors also bind to AT-rich sequences. This could also explain the saturation observed above 60% AT-content in PMID 32297955. Perhaps we can address this trend in future work.

      - Once these authors better include and connect their results to the previous literature, they can also add some discussion of how previous papers in recent years may have also missed some of this important context.

      We apologize for this oversight. We have rewritten the Discussion section to include the following points below. Many of the newly added references come from the group of David Grainger, who works on H-NS repression, bidirectional promoters, promoter emergence, promoter motifs, and spurious transcription in E. coli. More specifically:

      (1) The role of pervasive transcription and the likelihood of promoter emergence (lines 614-621):

      “Instead, we present evidence that promoter emergence is best predicted by the level of background transcription each non-promoter parent produces, a phenomenon also referred to as “pervasive transcription” (Kapranov et al., 2007).

      From an evolutionary perspective, this would suggest that sequences that produce such pervasive transcripts – including the promoter islands (Panyukov and Ozoline, 2013) and the antisense strand of existing promoters (Dornenburg et al., 2010; Warman et al., 2021), may have a proclivity for evolving de-novo promoters compared to other sequences (Kapranov et al., 2007; Wade and Grainger, 2014).”

      (2) How our results contradict the findings from Bykov et al., 2020 (lines 622-640):

      “A previous study randomly mutagenized the appY promoter island upstream of a GFP reporter, and isolated variants with increased and decreased GFP expression. The authors found that variants with higher GFP expression acquired mutations that 1) improve a -10 box to better match its consensus, and simultaneously 2) destroy other -10 and -35 boxes (Bykov et al., 2020). The authors concluded that additional -10 and -35 boxes repress expression driven by promoter islands. Our data challenge this conclusion in several ways. 

      First, we find that only ~13% of -10 and -35 boxes in promoter islands actually contribute to promoter activity. Extrapolating this percentage to the appY promoter island, ~87% (100% - 13%) of the motifs would not be contributing to its activity. Assuming the appY promoter island is not an outlier, this would suggest that during random mutagenesis, these inert motifs might have accumulated mutations that do not change fluorescence. Indeed, Bykov et al. (2020) also found that a similar frequency of -10 and -35 boxes were destroyed in variants selected for lower GFP expression, which supports this argument. Second, we find no evidence that creating a -10 or -35 box lowers promoter activity in any of our 50 parent sequences. Third, we also find no evidence that destruction of a -10 or -35 box increases promoter activity without plausible alternative explanations, i.e. overlap of the destroyed box with an H-NS site, destruction of the promoter, or simultaneous creation of another motif as a result of the destruction. In sum, -10 and -35 boxes are not likely to repress promoter activity.”

      (3) How other sequence features besides the -10 and -35 boxes may influence promoter emergence and activity (lines 661-671):

      “These findings suggest that we are still underestimating the complexity of promoters. For instance, the -10 and -35 boxes, extended -10, and the UP-element may be only some of many components underlying promoter architecture. Other components may include flanking sequences (Mitchell et al., 2003), which have been observed to play an important role in eukaryotic transcriptional regulation (Afek et al., 2014; Chiu et al., 2022; Farley et al., 2015; Gordân et al., 2013). Recent studies on E. coli promoters even characterize an AT-rich motif within the spacer sequence (Warman et al., 2020), and other studies use longer -10 and -35 box consensus sequences (Lagator et al., 2022). Another possibility is that there is much more transcriptional repression in the genome than anticipated (Singh et al., 2014). This would also coincide with the observed H-NS repression of P22-GFP and P12-RFP, and accounts of H-NS repression in the full promoter island sequences (Purtov et al., 2014).”

      (4) The limits of our experimental methodology (lines 675-686):

      “Additionally, future studies will be necessary to address the limitations of our own work. First, we use binary thresholding to determine i) the presence or absence of a motif, ii) whether a sequence has promoter activity or not, and iii) whether a part of a sequence is a hotspot or not. While chosen systematically, the thresholds we use for these decisions may cause us to miss subtle but important aspects of promoter evolution and emergence. Second, we measure protein expression through fluorescence as a readout for promoter activity. This readout combines transcription and translation. This means that we cannot differentiate between transcriptional and post-transcriptional regulation, including phenomena such as premature RNA termination (Song et al., 2022; Uptain and Chamberlin, 1997), post-transcriptional modifications (Mohanty and Kushner, 2006), and RNA-folding from riboswitch-like sequences (Mandal and Breaker, 2004).”

      (5) An updated take-home message (lines 687-694):

      “Overall, our study demonstrates that -10 and -35 boxes neither prevent existing promoters from driving expression, nor do they prevent new promoters from emerging by mutation. It shows how mutations can create new -10 and -35 boxes near or on top of preexisting ones to modulate expression. However, randomly creating a new -10 or -35 box will rarely create a new promoter, even if the new box is appropriately spaced upstream or downstream of a cognate box. Ultimately our study demonstrates that promoter models need to be further scrutinized, and that using mutagenesis to create de-novo promoters can provide new insights into promoter regulatory logic.”

      (5) Lack of information about sequences used and mutations.

      - To properly assess the work any reader will need access to the sequences cloned at the start of the work, where known TSSs are within these sequences (ideally +/- H-NS, which will silence transcription in the chromosomal context but may not when the sequences are removed from their natural context and placed in a plasmid). Without this information, it is impossible to assess the validity of the authors' work.

      Thank you for raising this point. Please see Data S1 for the 25 template sequences (P1-P25) used in this study, and Data S2 for all of the daughter sequences.

      For brevity, we have addressed the reviewer’s request to look at the role of H-NS in their comment (4) “Ignoring or misrepresenting the literature.”

      We do not have information about the predicted transcription start sites (TSS) for the parent sequences because the program which identified them (Platprom) is no longer available. Regardless, having TSS coordinates would not validate or invalidate our findings, since we already know that the promoter islands produce short transcripts throughout their sequences, and we are primarily interested in promoters which can produce complete transcripts.

      - The authors do not account for the possibility that DNA sequences in the plasmid, on either side of the cloned DNA fragment, could resemble promoter elements. If this is the case, then mutations in the cloned DNA will create promoters by "pairing up" with the plasmid sequences. There is insufficient information about the DNA sequences cloned, the mutations identified, or the plasmid, to determine if this is the case. It is possible that this also accounts for mutational hotspots described in the paper.

      We agree that these are important points. To address the criticism that we provided insufficient information, we have now redesigned all our figures to provide this information. Specifically, the figures now include the DNA sequences, their PWM predictions, and the exact mutations that lead to promoter activity. The figures with these changes are Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, S11, and S12. We now also provide more details about pMR1 in a new section of the methods (lines 740-748):

      “Plasmid MR1 (pMR1)

      The plasmid MR1 (pMR1) is a variant of the plasmid RV2 (pRV2) in which the kan resistance gene has been swapped with the cm resistance gene (Guazzaroni and Silva-Rocha, 2014). Plasmid pMR1 encodes the BBa_J34801 ribosomal binding site (RBS, AAAGAGGAGAAA) 6 bp upstream of the start codon for GFP(LVA). The plasmid also encodes a putative RBS (AAGGGAGG) (Cazemier et al., 1999) 5 bp upstream of the start codon for mCherry on the opposite strand.

      The plasmid additionally contains the low-to-medium copy number origin of replication p15A (Westmann et al., 2018).

      A map of the plasmid is available on the Github repository: https://github.com/tfuqua95/promoter_islands

      The reviewer also makes a valid point about promoter elements of the plasmid itself. We addressed it with the following new analyses. First we re-examined each of the examples where new -10 and -35 boxes are gained or lost, to see if any of these hotspots occur on the flanking ends of the parent sequences. We looked specifically at the ends because they could potentially interact with -10 and -35 box-like sequences on the plasmid to form a promoter. 

      Only one of these hotspots (out of 27) occurred at the end of the cloned sequences, and is thus a candidate for the phenomenon the reviewer hypothesized. This hotspot occurs in P9-GFP, where gaining a -10 box at the left flank increases expression (see Figure S8E-F’). There is indeed a -35 box 22-23 bp upstream of this -10 box on the plasmid, which could potentially affect promoter activity. 

      We tested the GFP expression of a construct harboring the point mutation which creates this -10 box on the left flank of P9-GFP. However, there was no significant difference in fluorescence between this construct and the wild-type P9-GFP (see Figure S8E-F’). Thus, this -35 box on pMR1 is not likely creating a new promoter.

      (6) Overselling the conclusions.

      Line 420: The paper claims to have generated important new insights into promoters. At the same time, the main conclusion is that "Our study demonstrates that mutations to -10 and -35 boxes motifs are the primary paths to create new promoters and to modulate the activity of existing promoters". This isn't new or unexpected. People have been doing experiments showing this for decades. Of course, mutations that make or destroy promoter elements create and destroy promoters. How could it be any other way?

      In hindsight, we agree that the original conclusion was not very novel. Our new conclusion is that -10 and -35 boxes do not repress transcription, and that our current promoter models, even with the additional motifs like the UP-element and the extended -10, are insufficient to understand promoters (lines 687-694):

      “Overall, our study demonstrates that -10 and -35 boxes neither prevent existing promoters from driving expression, nor do they prevent new promoters from emerging by mutation. It shows how mutations can create new -10 and -35 boxes near or on top of preexisting ones to modulate expression. However, randomly creating a new -10 or -35 box will rarely create a new promoter, even if the new box is appropriately spaced upstream or downstream of a cognate box. Ultimately our study demonstrates that promoter models need to be further scrutinized, and that using mutagenesis to create de-novo promoters can provide new insights into promoter regulatory logic.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I would like to start by thanking the authors for presenting an interesting and well-written article for review. This paper is a welcome addition to the field, addressing modern questions in the longstanding area of bacterial gene regulation. It is both enlightening and inspiring. While I do have suggestions, I hope these are not perceived as a lack of optimism for the work.

      Thank you for your kind words and suggestions, and for providing an astute and constructive review. We feel that the manuscript has greatly improved with your suggested changes.

      ABSTRACT:

      Line 11: The sentence "It is possible that these motifs influence..." could be rewritten to be clearer, as it is the most important point of the manuscript. It is not obvious that you're talking about how the local landscape of motifs affects the probability of promoters evolving/devolving in this location.

      We have changed the sentence to read, “Here, we ask whether the presence of such motifs in different genetic sequences influences promoter evolution and emergence.”

      INTRODUCTION:

      Line 68: Is the -35 consensus motif not TTGACA? Here it is listed as TTGAAA.

      Corrected from TTGAAA to TTGACA.

      RESULTS:

      Lines 92-94: The main takeaway from this work is that different sequences have different likelihoods of mutations creating promoters, and so I believe this claim could be explored deeper with more quantitative information. For example, could you look at whether there is a correlation between the baseline expression of a parent sequence and Pnew? I expect even the inactive sequences to have some variability in measured expression.

      Thank you for this great idea. We followed up on it by plotting the baseline parent sequence fluorescence scores against Pnew. You are indeed correct, i.e., Pnew increases with baseline expression following a sigmoid function, which is now shown in Figure 1E. To report our new observations, we have added the following section to the Results (lines 219-232):

      “Although mutating each of the 40 non-promoter parent sequences could create promoter activity, the likelihood Pnew that a mutant has promoter activity varies dramatically among parents. For each non-promoter parent, Fig 1D shows the percentage of active daughter sequences. The median Pnew is 0.046 (std. ± 0.078), meaning that ~4.6% of all mutants have promoter activity. The lowest Pnew is 0.002 (P25-GFP) and the highest 0.41 (P8-RFP), a 205-fold difference.

      We hypothesized that these large differences in Pnew could be explained by minute differences in the fluorescence scores of each parent, particularly if its score was below 1.5 a.u. Plotting the fluorescence scores of each parent (N=50) and their respective Pnew values as a scatterplot (Fig 1E), we can fit these values to a sigmoid curve (see methods). This finding helps to explain why P8-RFP has a high Pnew (0.41) and P25-GFP a low Pnew (0.002), as their fluorescence scores are 1.380 and 1.009 a.u., respectively. The fact that the inflection point of the fitted curve is at 1.51 a.u. further justifies our use of 1.5 a.u. as a cutoff for promoter and non-promoter activity.”
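      A minimal sketch of how a sigmoid (logistic) curve can be fitted to parent fluorescence scores and Pnew values, and how its inflection point is read off. The actual fitting procedure is the one described in the paper's methods; the grid search, the function names, and the synthetic data below are illustrative assumptions chosen so the example runs with the Python standard library alone. For a logistic curve top / (1 + exp(-k (x - x0))), the inflection point is x0.

```python
import math


def logistic(x: float, top: float, k: float, x0: float) -> float:
    """Logistic curve; its inflection point lies at x = x0."""
    return top / (1.0 + math.exp(-k * (x - x0)))


def fit_logistic(xs, ys, x0_grid, k_grid):
    """Grid search over (x0, k). For each pair, the plateau 'top' has a
    closed-form least-squares solution because the model is linear in it."""
    best = None
    for x0 in x0_grid:
        for k in k_grid:
            shape = [logistic(x, 1.0, k, x0) for x in xs]
            denom = sum(s * s for s in shape)
            if denom == 0.0:
                continue
            top = sum(y * s for y, s in zip(ys, shape)) / denom
            sse = sum((y - top * s) ** 2 for y, s in zip(ys, shape))
            if best is None or sse < best[0]:
                best = (sse, top, k, x0)
    _, top, k, x0 = best
    return top, k, x0


# Synthetic (not real) data: scores from 1.0 to 2.0 a.u. with noiseless Pnew values
xs = [1.0 + 0.05 * i for i in range(21)]
ys = [logistic(x, 0.5, 20.0, 1.5) for x in xs]

x0_grid = [1.0 + 0.01 * i for i in range(101)]  # candidate inflection points (a.u.)
k_grid = [0.5 * i for i in range(1, 101)]       # candidate steepness values
top, k, x0 = fit_logistic(xs, ys, x0_grid, k_grid)
print(f"top={top:.3f}, k={k}, inflection x0={x0:.2f} a.u.")
```

      With measured data, the precision of x0 in this sketch is limited by the grid spacing (0.01 a.u.); a dedicated nonlinear least-squares routine such as scipy.optimize.curve_fit is the more usual choice.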

      Another potentially interesting analysis would be to see if k-mer content is correlated with Pnew. That is, determine the abundance of all hexamers in the sequence and see if Pnew is correlated with the number of hexamers present that are one nucleotide away from the consensus motifs (such as TcGACA or TAcAAT).

      We performed the suggested analysis by searching for k-mers that correlate with Pnew and found that no k-mer significantly correlates with Pnew (lines 240-248):

      “We then asked whether any k-mers ranging from 1-6 bp correlated with the non-promoter Pnew values (5,460 possible k-mers). 718 of these 1-6 bp k-mers are present 3 or more times in at least one non-promoter parent. We calculated a linear regression between the frequency of each of these 718 k-mers and the Pnew values, and adjusted the p-values to respective q-values (Benjamini-Hochberg correction, FDR=0.05). This analysis revealed six k-mers that correlate with Pnew: CTTC, GTTG, ACTTC, GTTGA, AACTTC, and TAACTT. However, these correlations are heavily influenced by an outlying Pnew value of 0.41 (P8-RFP) (Fig S5C-H), and upon removing P8-RFP from the analysis, no k-mer significantly correlates with Pnew (data not shown).”

      Line 152-157: How did you define the thresholds for 'active' or 'inactive'? It is not clear in the methods how this distinction was made.

      We have more clearly defined these thresholds in the text. A sequence with promoter activity has a fluorescence score greater than 1.5 a.u. (lines 168-172):

      “We declared a daughter sequence to have promoter activity or to be a promoter if its score was greater than or equal to 1.5 a.u., as this score lies at the boundary between no fluorescence and weak fluorescence based on the sort-seq bins (methods). Otherwise, we refer to a daughter sequence as having no promoter activity or being a non-promoter.”
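      The 1.5 a.u. rule, and the Pnew statistic built on it, can be expressed in a few lines. This is a minimal sketch; the function names are hypothetical and the daughter scores are invented placeholders, not values from Data S2.

```python
PROMOTER_THRESHOLD = 1.5  # a.u.; cutoff stated in the text


def is_promoter(score: float, threshold: float = PROMOTER_THRESHOLD) -> bool:
    """A sequence has promoter activity if its score is >= the threshold."""
    return score >= threshold


def p_new(daughter_scores: list[float], threshold: float = PROMOTER_THRESHOLD) -> float:
    """Fraction of a parent's daughter sequences with promoter activity."""
    if not daughter_scores:
        raise ValueError("need at least one daughter score")
    active = sum(1 for s in daughter_scores if is_promoter(s, threshold))
    return active / len(daughter_scores)


# Hypothetical daughters of one non-promoter parent
scores = [1.2, 1.4, 1.6, 1.3, 2.1, 1.1, 1.5, 1.0]
print(p_new(scores))  # 0.375 (3 of 8 daughters score >= 1.5)
```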

      Lines 152-157: In trying to find the parent expression levels, I found no figure showing the distribution of parent expression levels. Furthermore, in looking at Data S2 and filtering for sequences with distance 0 from the parent, I found that the most active sequences did not match the sequences described as active in this section (e.g., P19 and P20 have a higher top-strand mean than P22, yet are not listed as active top-strand sequences).

      We really appreciate you taking the time to examine the supplemental data. We previously listed the parents that had only GFP activity but no RFP activity (P22), and only RFP activity but no GFP activity (P6, P12, P13, P18, P21). We then said that P19 and P20 were bidirectional promoters, because they showed both GFP and RFP activity. In hindsight, we realize that our wording was confusing. We thus rewrote the affected paragraph, such that the bidirectional promoters are now in both lists of GFP/RFP active parents. We also now make the distinction between “templates” which comprise our 25 promoter island fragments, and “parents”, where we treat both strands separately (50 parents total). The paragraph in question now reads (lines 173-187):

      “Because some sequences in our library are unmutated parent sequences, we determined that 10/50 of the parent sequences already encode promoter activity before mutagenesis. Specifically, three parents drove expression on the top strand (P19-GFP, P20-GFP, P22-GFP), and seven did on the bottom strand (P6-RFP, P12-RFP, P13-RFP, P18-RFP, P19-RFP, P20-RFP, P21-RFP). Two templates harbor bidirectional promoters (P19 and P20). The remaining 40 parent sequences are non-promoters, with an average fluorescence score of 1.39 a.u. We note that some of these parents have fluorescence scores just below 1.50 a.u., such as P8-RFP (1.38 a.u.), P16-RFP (1.39 a.u.), P9-GFP (1.49 a.u.), and P1-GFP (1.47 a.u.). Whether these are truly “promoters” or not depends solely on our threshold value of 1.5 a.u. We also note that 30% (15/50) of parents have the TGn motif upstream of a -10 box, but only 20% (3/15) of these parents have promoter activity (underlined with promoter activity: P4-RFP, P6-RFP, P8-RFP, P9-RFP, P10-RFP, P11-GFP, P12-GFP, P17-GFP, P18-GFP, P18-RFP, P19-RFP, P22-RFP, P24-GFP, P25-GFP, P25-RFP). See Fig S4 for fluorescence score distributions for each parent and its daughters, and Data S2 for all daughter sequence fluorescence scores.”

      Please include a supplementary figure showing the different parent expression levels (GFP mean +/- sd). Also, please explain the discrepancy in the 'active sequences' compared to Data S2 or correct my misunderstanding.

      We have added this plot to Figure S4B. The discrepancy arose because we listed the parents that had only GFP activity but no RFP activity (P22), and only RFP activity but no GFP activity (P6, P12, P13, P18, P21). We then said that P19 and P20 were bidirectional promoters, because they showed both GFP and RFP activity. Please see our previous response regarding the ambiguity.

      Line 182: I do not see 'Fuqua and Wagner 2023' in the references (though I am familiar with the preprint).

      We have added Fuqua and Wagner, bioRxiv 2023 to the references.

      Lines 197 - 200: The distribution of hotspot locations should be compared to the distribution of mutations in the library. e.g. It is not notable that 17% of mutations are in -10 motifs if 17% of all mutations are in -10 motifs.

      Thank you for raising this point. To address it, we carried out a computational analysis where we randomly scrambled the nucleotides of each parent sequence while maintaining the coordinates for each mutual information “hotspot.” This scrambling results in significantly less overlap between hotspots and boxes. This analysis is now depicted in Figure 2C and written in lines 272-296.
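      The scrambling control can be sketched as a permutation test: shuffle the parent's nucleotides (which preserves base composition), keep the hotspot coordinates fixed, and count how often boxes in the shuffled sequences overlap the hotspots. As a simplifying assumption, a "box" here is any hexamer within one mismatch of the TATAAT consensus rather than a PWM hit, and the sequence, hotspot coordinates, and function names are hypothetical.

```python
import random

CONSENSUS_10 = "TATAAT"


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def box_starts(seq: str, consensus: str = CONSENSUS_10, max_mismatch: int = 1) -> list[int]:
    """Start positions of hexamers within max_mismatch of the consensus."""
    k = len(consensus)
    return [i for i in range(len(seq) - k + 1) if hamming(seq[i:i + k], consensus) <= max_mismatch]


def hotspot_overlaps(seq: str, hotspots: list[tuple[int, int]], k: int = len(CONSENSUS_10)) -> int:
    """Number of boxes whose [start, start + k) window overlaps any hotspot [a, b)."""
    return sum(
        1 for s in box_starts(seq)
        if any(s < b and a < s + k for a, b in hotspots)
    )


def scramble_test(seq, hotspots, n_perm=1000, seed=0):
    """Observed overlap, mean overlap in shuffled sequences, empirical p-value."""
    rng = random.Random(seed)
    observed = hotspot_overlaps(seq, hotspots)
    null = []
    for _ in range(n_perm):
        shuffled = "".join(rng.sample(seq, len(seq)))  # same base composition
        null.append(hotspot_overlaps(shuffled, hotspots))
    p = (1 + sum(v >= observed for v in null)) / (n_perm + 1)
    return observed, sum(null) / n_perm, p


seq = "GCGCGCGCGC" + "TATAAT" + "GCGCGCGCGC"  # hypothetical parent
print(scramble_test(seq, hotspots=[(10, 16)]))
```

      Adding 1 to both numerator and denominator keeps the empirical p-value strictly positive, which is the usual convention for permutation tests.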

      Lines 253-264: Examples 3B, 3D, and 3F should indicate the spacing between the new and existing motifs. Are these close to the 15-19 bp spacer lengths preferred by sigma70?

      Point well taken. We now annotate the spacing of motifs in Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, and S11. We note that in many cases, high-scoring PWM hits for the same motif can overlap (i.e. two -10 motifs or two -35 motifs overlap). Additionally, the proximity of a -35 and a -10 box does not guarantee that the two boxes are interacting. Together, these two facts can result in an ambiguity of the spacer size between two boxes. To avoid any reporting bias, we thus often report spacer sizes as a range (see Figure panels 4F, S8D, S8F-L, S9A, S9H, S10A, and S10E). The smallest spacer we annotate is in Figure 4F with 10 bp, and the largest is in Figure S8D with 26 bp. More “extreme” distances are not annotated, and we leave it to the reader to decide whether an interaction is present.
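      Because overlapping hits make the spacer ambiguous, one way to report it is as a range over all -35/-10 hit pairs. A minimal sketch, assuming 0-based start coordinates, 6 bp boxes, and the 10-26 bp window that brackets the spacers annotated in the figures; the function names and coordinates are hypothetical.

```python
def spacer(start_35: int, start_10: int, box_len: int = 6) -> int:
    """Bases between the 3' end of a -35 box and the 5' end of a -10 box."""
    return start_10 - (start_35 + box_len)


def spacer_range(starts_35, starts_10, lo=10, hi=26, box_len=6):
    """(min, max) spacer over all hit pairs within [lo, hi], or None."""
    values = [
        spacer(a, b, box_len)
        for a in starts_35
        for b in starts_10
        if lo <= spacer(a, b, box_len) <= hi
    ]
    return (min(values), max(values)) if values else None


# Two overlapping -35 hits and one -10 hit (hypothetical coordinates)
print(spacer_range([10, 11], [33]))  # (16, 17)
```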

      Line 255: While fun, I am concerned about the 'Shiko' analogy. My understanding is the prevailing theory is that -35 recognition occurs before -10 recognition (https://doi.org/10.1073/pnas.94.17.9022, 10.1101/sqb.1998.63.141). Given this, the 'Shiko -35' concept in 3H is a bit awkward as it suggests that sigma70 stops at -10 motifs before planting down on the -35. Considering the cited paper is still in the preprint stages (and did not observe these Shiko -35 emergences), I am concerned about how this particular example will be received by the community. Perhaps more care could be taken to verify that this example is consistent with generally accepted mechanisms of promoter recognition, or a short clarification could be added to clarify the extent of the analogy.

      Thank you for raising this point. We decided to remove the Shiko analogy, because several readers assumed that it relates to the physical binding of RNA polymerase, rather than being an evolutionary mechanism of mutations forming complementary motifs in a stepwise manner.

      Lines 323-326: It would be helpful to describe a more systematic approach to defining emergence events into different categories. A clear definition of each category in the methods or main text would help others consistently refer to these concepts in the future. This could be helped by showing the actual parent vs daughter sequences as a supplementary figure to figures 4B, 4D, & 4G.

      We agree this could have been more clearly communicated. We have addressed this by 1) simplifying the nomenclature of these categories, 2) clearly defining these categories, and 3) showing the actual parent vs daughter sequences in Figure 4 and Supplemental Figures S9, S10, S11, and S12. More specifically:

      (1) Simplifying the nomenclature. We highlight events where gaining new -10 and -35 boxes can modify the promoter activity of parent sequences that already have promoter activity. This occurs when a new -10 or -35 box appears that partially overlaps with the -10 or -35 box of the actual promoter. Thus, we now use two terms, hetero-gain and homo-gain, shown in Figure 4B.

      (2) We clearly define these categories (lines 430-435):

      “We found that these mutations frequently create new boxes overlapping those we had identified as part of a promoter (Fig S9). This occurs when mutations create a -10 box overlapping a -10 box, a -35 box overlapping a -35 box, a -10 box overlapping a -35 box, or a -35 box overlapping a -10 box. We call the resulting event a “homo-gain” when the new box is of the same type as the one it overlaps, and otherwise a “hetero-gain”. In either case, the creation of the new box does not always destroy the original box.”
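      The homo-gain/hetero-gain rule reduces to an interval-overlap check plus a comparison of box types. A minimal sketch with hypothetical names and coordinates; where a new box overlaps more than one promoter box, this version simply reports the first overlap it finds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    kind: str       # "-10" or "-35"
    start: int      # 0-based start coordinate
    length: int = 6

    @property
    def end(self) -> int:
        return self.start + self.length


def overlaps(a: Box, b: Box) -> bool:
    """Half-open intervals [start, end) share at least one base."""
    return a.start < b.end and b.start < a.end


def classify_gain(new_box: Box, promoter_boxes: list[Box]) -> Optional[str]:
    for old in promoter_boxes:
        if overlaps(new_box, old):
            return "homo-gain" if new_box.kind == old.kind else "hetero-gain"
    return None  # the new box does not overlap any of the promoter's boxes


promoter = [Box("-35", 10), Box("-10", 33)]
print(classify_gain(Box("-10", 35), promoter))  # homo-gain
print(classify_gain(Box("-35", 31), promoter))  # hetero-gain
```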

      In the original manuscript, there was an additional third category, where gaining a -35 box upstream of the promoter’s -35 box, or gaining a -10 box upstream of the promoter’s -10 box, decreased expression. We referred to this as a “tandem motif” and it can be found in Figure S12C,D. However, in response to comment “(4) Ignoring or misrepresenting the literature” from Reviewer #3, we carried out an analysis of the binding of H-NS (see Figure 5 and Figure S12). This analysis revealed that this “tandem motif” phenomenon was actually the result of changing the affinity of H-NS to these regions. Thus, the “tandem motif” is probably spurious.

      DISCUSSION:

      Line 378-379: Since hotspots are essentially areas where promoters appear, wouldn't it be obvious that having more hotspots (i.e. areas where more promoters appear) would equate to a higher probability of new promoters? It would be helpful to clarify why this isn't obvious. This could be resolved by adding more complexity to the statement, such as showing that the level of mutual information found in a hotspot or across all hotspots in a sequence is correlated with Pnew.

      A fair criticism. In response, we have chosen to remove the analysis of this trend from the manuscript entirely. (Additionally, Pnew and mutual information calculations both relied on the fluorescence scores of daughter sequences, so the finding was circular in its logic.)

      Line 394-396: This comparison of findings to Bykov et al should include a bit more justification for the proposed mechanism and how it specifically was observed in this paper. What did they observe and how do these findings relate?

      We gladly followed this suggestion, and added the following two paragraphs to the discussion (lines 622-640).

      “A previous study randomly mutagenized the appY promoter island upstream of a GFP reporter, and isolated variants with increased and decreased GFP expression. The authors found that variants with higher GFP expression acquired mutations that 1) improve a -10 box to better match its consensus, and simultaneously 2) destroy other -10 and -35 boxes (Bykov et al., 2020). The authors concluded that additional -10 and -35 boxes repress expression driven by promoter islands. Our data challenge this conclusion in several ways. 

      First, we find that only ~13% of -10 and -35 boxes in promoter islands actually contribute to promoter activity. Extrapolating this percentage to the appY promoter island, ~87% (100% - 13%) of the motifs would not be contributing to its activity. Assuming the appY promoter island is not an outlier, this would suggest that during random mutagenesis, these inert motifs might have accumulated mutations that do not change fluorescence. Indeed, Bykov et al. (Bykov et al., 2020) also found that a similar frequency of -10 and -35 boxes were destroyed in variants selected for lower GFP expression, which supports this argument. Second, we find no evidence that creating a -10 or -35 box lowers promoter activity in any of our 50 parent sequences. Third, we also find no evidence that destruction of a -10 or -35 box increases promoter activity without plausible alternative explanations, i.e. overlap of the destroyed box with a H-NS site, destruction of the promoter, or simultaneous creation of another motif as a result of the destruction. In sum, -10 and -35 boxes are not likely to repress promoter activity.”

      METHODS:

      Line 500: Could you provide more details on PMR1 (e.g. size, copy number, RBS strength) or a reference? I could not find this easily.

      Thank you for pointing out this oversight. In response, we have added the following subsection to the methods (lines 740-748):

      “Plasmid MR1 (pMR1)

      The plasmid MR1 (pMR1) is a variant of the plasmid RV2 (pRV2) in which the kan resistance gene has been swapped with the cm resistance gene (Guazzaroni and Silva-Rocha, 2014). Plasmid pMR1 encodes the BBa_J34801 ribosomal binding site (RBS, AAAGAGGAGAAA) 6 bp upstream of the start codon for GFP(LVA). The plasmid also encodes a putative RBS (AAGGGAGG) (Cazemier et al., 1999) 5 bp upstream of the start codon for mCherry on the opposite strand.

      The plasmid additionally contains the low-to-medium copy number origin of replication p15A (Westmann et al., 2018).

      A map of the plasmid is available on the Github repository: https://github.com/tfuqua95/promoter_islands.”

      Line 581: What was the sequencing instrument &/or depth?

      We now report this information as follows (Methods, lines 918-922):

      “Illumina sequencing

      The amplicon pool was sequenced by Eurofins Genomics (Eurofins GmbH, Germany) using a NovaSeq 6000 (Illumina, USA) sequencer, with an S4 flow cell, and a PE150 (Paired-end 150 bp) run. In total, 282’843’000 reads and 84’852’900’000 bases were sequenced. Raw sequencing reads can be found here: https://www.ncbi.nlm.nih.gov/bioproject/1071572.”

      SUPPLEMENT:

      Supplementary Figure 2: Why does the GFP control produce a bimodal distribution?

      The GFP+ culture was inoculated directly from a glycerol stock. The bimodal distribution probably results from a subset of the bacteria having lost the GFP-coding insert, because the left-most peak coincides with the negative control.

      Reviewer #2 (Recommendations For The Authors):

      This paper would benefit from a clear definition of what constitutes an active promoter as this is only mentioned as justification for the use of arbitrary values for fluorescence.

      Good point. To clarify, we now include this new paragraph in the introduction (lines 112-119):

      “In this study, we define a promoter as a DNA sequence that drives the expression of a (fluorescent) protein whose expression level, measured by its fluorescence, is greater than a defined threshold. We use a threshold of 1.5 arbitrary units (a.u.) of fluorescence. This definition does not distinguish between transcription and translation. We chose it because, whenever natural selection acts on gene expression, protein expression usually matters more than RNA expression, as protein is the primary phenotype visible to natural selection (Jiang et al., 2023).”

      There needs to be a clear distinction in the use of the word "sequences", as the text often uses it interchangeably for the 25 parent sequences and for the 50 possible directions in which a promoter can act. It is confusing going from one to the other.

      We agree that this distinction is important. To make it clearer, we now introduce an additional term (lines 119-130). Our experiments start from 25 promoter island fragments (P1-P25), which we now call template sequences. Each template sequence comprises both DNA strands. The parent sequences are the top and bottom strands of each template sequence. Therefore, there are now 50 parent sequences (P1-GFP, P1-RFP, P2-GFP…, P25-RFP). By treating each strand as its own sequence, we no longer have to refer to the strand, avoiding the earlier confusion.

      The description of the hotspots is often unclear and trying to determine if 3 out of 9 hotspots come from one parent sequence or multiple is not possible. A table denoting this information would be most helpful.

      We agree, and now provide this information in Data S3.

      Finally, the description of the proposed mechanism of promoter activation via mutation of motifs should not be in the results but in the discussion, as it has insufficient evidence and would require further experimental validation.

      We remedied this problem by providing experimental validation of the proposed mechanisms. Specifically, we created the precise mutations that caused a loss or gain of a -10 or a -35 box, and measured the level of gene expression they drive with a plate reader. Because we chose to provide this experimental validation, we opted to leave the mechanisms of promoter activation in the results section.

      The (Fuqua and Wagner 2023) paper is not in the references.

      We have added Fuqua and Wagner, bioRxiv 2023 to the references.

      I enjoyed the paper and wish the authors the best for their future work.

      Thank you for taking the time to review our manuscript!

      Reviewer #3 (Recommendations For The Authors):

      The paper has major flaws. For example:

      The data need to be analysed with correct promoter sequence element sequences (TTGACA for the -35 element).

      The discrepancy lies in the frequency of A’s vs C’s at position #5 of the PWM. Our PWM was built with more A’s than C’s at this position, but also includes C’s in this position. However, we respectfully disagree that using a different -35 box PWM is going to change the outcomes of our study. First, positions 4-6 of the PWM barely have any information content (bits) compared to positions 1-3 (see Fig 1A). This assertion is not just based on our own PWM, but based on ample precedent in the literature. In PMID 14529615, TTG is present in 38% of all -35 boxes, but ACA only 8%. In PMID 29388765, with the -10 instance TATAAT, the -35 instance TTGCAA yields stronger promoters compared to the -35 instance TTGACA (See their Figure 3B). In PMID 29745856 (Figure 2), the most information content lies in positions 1-3, with the A and C at position 5 both nearly equally represented, as in our PWM. In PMID 33958766 (Figure 1) an experimentally-derived -35 box is even reduced to a “partial” -35 box which only includes positions 1 and 2, with consensus: TTnnnn. Additionally, the -35 box PWM that we used significantly and strongly correlates with an experimentally derived -35 box (see Supporting Information from Figure S4 of Belliveau et al., PNAS 2017. Pearson correlation coefficient = 0.89). We now provide DNA sequences for each of the figures to improve accessibility and reproducibility. A reader can now use any PWM or method they wish to interpret the data.
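      The argument about positions 4-6 rests on per-position information content. Assuming a uniform background and no small-sample correction, a PWM column's information content is 2 + Σ_b p_b log2 p_b bits. A fully conserved column therefore carries 2 bits, while a column split evenly between A and C carries 1 bit. The function name and counts below are toy illustrations, not the PWM used in the study.

```python
import math


def information_content(column_counts: dict[str, float], pseudocount: float = 0.0) -> float:
    """Bits of information in one PWM column against a uniform ACGT background."""
    total = sum(column_counts.get(b, 0) for b in "ACGT") + 4 * pseudocount
    ic = 2.0
    for base in "ACGT":
        p = (column_counts.get(base, 0) + pseudocount) / total
        if p > 0:
            ic += p * math.log2(p)
    return ic


# Toy -35-like columns (hypothetical counts)
toy_columns = [
    {"T": 90, "A": 10},                    # strongly conserved
    {"A": 45, "C": 45, "G": 5, "T": 5},    # A and C nearly equally represented
    {"A": 25, "C": 25, "G": 25, "T": 25},  # uninformative
]
for i, col in enumerate(toy_columns, start=1):
    print(i, round(information_content(col), 3))
```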

      The data need to be analysed taking into account the role of other promoter elements and sequences for translation.

      Point well taken. 

      Thank you for bringing this oversight to our attention. We have performed two independent analyses to explore the role of TGn in promoter emergence in evolution. First, we computationally searched for -10 boxes with the bases TGn immediately upstream of them in the parent sequences, and found 18 of these “extended -10 boxes” in the parents (lines 143-145):

      “On average, each parent sequence contains ~5.32 -10 boxes and ~7.04 -35 boxes (Fig S1). 18 of these -10 boxes also include the TGn motif upstream of the hexamer.”
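      Checking for an extended -10 box amounts to inspecting the three bases immediately upstream of each -10 hit: the two outermost must read TG, and the base adjacent to the box can be anything. A minimal sketch, assuming 0-based box start coordinates produced by some motif scan that is not shown; the function names and example sequence are hypothetical.

```python
def has_tgn(seq: str, box_start: int) -> bool:
    """True if 'TGn' sits immediately upstream of a -10 box starting at box_start."""
    if box_start < 3:
        return False  # not enough upstream sequence to hold TGn
    return seq[box_start - 3:box_start - 1] == "TG"


def extended_boxes(seq: str, box_starts: list[int]) -> list[int]:
    """Subset of -10 box starts that carry the TGn extension."""
    return [s for s in box_starts if has_tgn(seq, s)]


seq = "AAATGCTATAATAA"  # TGC followed by TATAAT at index 6
print(extended_boxes(seq, [6]))  # [6]
```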

      However, only 20% of these boxes were found in parents with promoter activity (lines 182-185):

      “We also note that 30% (15/50) of parents have the TGn motif upstream of a -10 box, but only 20% (3/15) of these parents have promoter activity (underlined with promoter activity: P4-RFP, P6-RFP, P8-RFP, P9-RFP, P10-RFP, P11-GFP, P12-GFP, P17-GFP, P18-GFP, P18-RFP, P19-RFP, P22-RFP, P24-GFP, P25-GFP, P25-RFP).”

      Second, we computationally searched through all of the daughter sequences to identify new -10 boxes with TGn immediately upstream. We found 114 -10 boxes with the bases TGn upstream. However, only 5 new -10 boxes (2 with TGn) were associated with increasing fluorescence (lines 338-345):

      “Mutations indeed created many new -10 and -35 boxes in our daughter sequences. On average, 39.5 and 39.4 new -10 and -35 boxes emerged at unique positions within the daughter sequences of each mutagenized parent (Fig 3A,B), with 1’562 and 1’576 new locations for -10 boxes and -35 boxes, respectively. ~22% (684/3’138) of these new boxes are spaced 15-20 bp away from their cognate box, and ~7.3% (114/1’562) of the new -10 boxes have the TGn motif upstream of them. However, only a mere five of the new -10 boxes and four of the new -35 boxes are significantly associated with increasing fluorescence by more than +0.5 a.u. (Fig 3C,D).”
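      One way to test whether a new box is significantly associated with increasing fluorescence by more than +0.5 a.u. is to compare daughters that carry the new box with daughters that do not, using a Mann-Whitney U test plus an effect-size cutoff. A minimal sketch, assuming the difference in medians as the effect size, alpha = 0.05, and a normal approximation without tie correction for the p-value, so small-sample p-values will differ from exact methods such as those in scipy.stats.mannwhitneyu. The function names and scores are hypothetical.

```python
import math
from statistics import median


def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """U statistic for x and a two-sided p-value (normal approximation)."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Extend j over a run of tied values and give them their average rank
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        for t in range(i, j + 1):
            ranks[t] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return u, p


def box_increases_expression(with_box, without_box, alpha=0.05, min_gain=0.5):
    """Significant by Mann-Whitney and median gain above min_gain (a.u.)."""
    _, p = mann_whitney_u(with_box, without_box)
    gain = median(with_box) - median(without_box)
    return p < alpha and gain > min_gain


with_box = [2.5 + 0.01 * i for i in range(10)]     # hypothetical scores
without_box = [1.0 + 0.01 * i for i in range(10)]
print(box_increases_expression(with_box, without_box))
```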

      In addition, we now study the role of UP elements. This analysis showed that the UP element plays a negligible role in promoter emergence within our dataset. It is discussed in a new subsection of the results (lines 591-608).

      “The UP-element does not strongly influence promoter activity in our dataset.

      The UP element is an additional AT-rich promoter motif that can lie upstream of a -35 box in a promoter sequence (Estrem et al., 1998; Ross et al., 1993). We asked whether the creation of UP-elements also creates or modulates promoter activity in our dataset. To this end, we first identified a previously characterized position-weight matrix for the UP element (NNAAAWWTWTTTTNNWAAASYM, PWM threshold score = 19.2 bits) (Estrem et al., 1998) (Fig S13A). We then computationally searched for UP-element-specific hotspots within the parent sequences, i.e., locations in which mutations that gain or lose UP-elements lead to significant fluorescence increases (Mann-Whitney U-test, Fig S7 and methods. See Data S8 for the coordinates, fluorescence changes, and significance). The analysis did not identify any UP elements whose mutation significantly changes fluorescence.

      We then repeated the analysis with a less stringent PWM threshold of 4.8 bits (1/4th of the PWM threshold score). This time, we identified 74 “UP-like” elements that are created or destroyed at unique positions within the parents. 23 of these motifs significantly change fluorescence when created or destroyed. However, even with this liberal threshold, none of these UP-like elements increase fluorescence by more than 0.5 a.u. when gained, or decrease fluorescence by more than 0.5 a.u. when lost (Fig S13B). This finding ultimately suggests that the UP element plays a negligible role in promoter emergence within our dataset.”
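      Lowering a PWM threshold to a quarter of its original value (19.2 bits / 4 = 4.8 bits) simply admits weaker matches. A minimal log-odds scanning sketch, assuming a uniform background, a pseudocount of 1, and a toy 3 bp matrix with hypothetical thresholds; it is not the Estrem et al. UP-element PWM, and the function names are illustrative.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts to log2(p / background) scores."""
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((col.get(b, 0) + pseudocount) / total / background) for b in BASES})
    return pwm


def scan(seq, pwm, threshold):
    """(start, score) for every window scoring at or above threshold."""
    k = len(pwm)
    hits = []
    for i in range(len(seq) - k + 1):
        score = sum(pwm[j][seq[i + j]] for j in range(k))
        if score >= threshold:
            hits.append((i, score))
    return hits


toy_counts = [{"A": 9, "T": 1}, {"A": 9, "T": 1}, {"T": 10}]  # hypothetical motif "AAT"
pwm = log_odds_pwm(toy_counts)
strict = 4.0
relaxed = strict / 4  # a quarter of the strict threshold
print(scan("TATGG", pwm, strict))   # []
print(scan("TATGG", pwm, relaxed))  # one weak "TAT" hit at position 0
```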

      Collectively, these additional analyses suggest that the presence of TGn plus a -10 box is insufficient to create promoter activity, and that the UP element does not play a significant role in promoter emergence or evolution.

      The full sequences used need to be provided and mutations resulting in new promoters need to be shown.

      To Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, S11, and S12, we have added the sequences that created or destroyed the promoters, and their PWM scores.

      The paper needs to be rewritten to take into account the relevant literature on i) promoter islands (i.e. sections of horizontally acquired AT-rich DNA) ii) generation and loss of promoters by mutation.

      We have rewritten the introduction. The majority of these points are now addressed in the following two new paragraphs (lines 92-112):

      “Recent work shows that mutations can help new promoters to emerge from promoter motifs or from sequences adjacent to such motifs (Bykov et al., 2020; Fuqua and Wagner, 2023; Yona et al., 2018). However, encoding -10 and -35 boxes is insufficient to drive complete transcription of a gene coding sequence. For instance, the E. coli genome contains clusters of -10 and -35 boxes that are bound by RNA polymerase and produce short oligonucleotide fragments, but rarely create complete transcripts. Such clusters are called promoter islands, and are strongly associated with horizontally-transferred DNA (Bykov et al., 2020; Panyukov and Ozoline, 2013; Purtov et al., 2014; Shavkunov et al., 2009). 

      There are two proposed explanations for why promoter islands do not create full transcripts. First, the TF H-NS may repress promoter activity in promoter islands. This is because in a Δhns background, transcript levels from the promoter islands increase (Purtov et al., 2014). However, mutagenizing a specific promoter island (appY) until it transcribes a GFP reporter reveals that in-vitro H-NS binding does not significantly change when GFP levels increase (Bykov et al., 2020). Thus, it is not clear whether H-NS actually represses the complete transcription of these sequences. The second proposed explanation is that excessive promoter motifs silence transcription. The aforementioned study found that promoter activity increases when mutations improve a -10 box to better match its consensus (TAAAAAT→TATACT), while simultaneously destroying surrounding -10 and -35 boxes (Bykov et al., 2020). However, we note that if these surrounding motifs never contributed to GFP fluorescence to begin with, then mutations could also simply have accumulated in them during random mutagenesis without affecting promoter activity.”

      In closing, we would like to thank all three reviewers again for your time to engage with this manuscript.

      Summary of specific changes that we have made to each section of the manuscript 

      • Abstract

      - We updated the abstract to include the finding that more than 1’500 new -10s and -35s are created in our dataset, but only ~0.3% of them actually create de-novo promoter activity.

      - We no longer highlight the conclusion that the majority of promoters emerge and evolve from -10 and -35 boxes.

      • Introduction

      - We have added more background information about the UP-element and the TGn motif.

      - We better describe the promoter islands and the results identified by Bykov et al., 2020.

      • Results: Promoter island sequences are enriched with motifs for -10 and -35 boxes.

      - We clarify how the -10 and -35 PWMs we use were derived.

      - We refer to the 25 promoter island fragments as “Template sequences” (P1-P25). The “parent sequences” now correspond to the top and bottom strands of each template (N=50, P1-GFP, P1-RFP, P2-GFP, …, P25-RFP).

      - We elaborate that ~7% of the -10 boxes in the template sequences have the TGn motif.

      - In the previous version of the manuscript, if there were overlapping -10 boxes or overlapping -35 boxes, we counted these as a single -10 box or a single -35 box, respectively. In the new version of the manuscript, we now treat each motif as an independent box. Because of this, the number of -10 and -35 boxes per parent has slightly increased.

      • Results: Non-promoters vary widely in their potential to become promoters.

      - We make a clear distinction between promoters and non-promoters, and define the parent sequences.

      - We note that only 20% of parents with an “extended -10 box” have promoter activity.

      • Results: Promoter emergence correlates with minute differences in background promoter levels.

      - We added an analysis where we compare Pnew to the parent fluorescence levels, even if they are below 1.5 a.u. We find that the distribution of Pnew matches a sigmoid function.

      • Results: Promoter emergence does not correlate with simple sequence features

      - We added an analysis comparing k-mer counts to Pnew.

      - We updated the way we count -10 and -35 boxes, and recalculated the correlation with Pnew. The P and R² values have changed, but Pnew still does not significantly correlate with -10 or -35 box counts.

      • Results: Promoters emerge and evolve only from specific subsets of -10 and -35 boxes

      - We have added an analysis where we computationally scramble the wild-type parent sequences while maintaining the coordinates of the mutual information hotspots. This reveals that the overlap with -10 and -35 motifs is not a coincidence of dense promoter motif encoding.

      - We found a computational error in our analysis and updated the percent overlap between -10 and -35 boxes and mutual information hotspots. The results are similar: with our new way of defining -10 and -35 boxes, 14% of -10 boxes overlap with hotspots.

      • Results: New -10 and -35 boxes readily emerge, but rarely lead to de-novo promoter activity

      - We quantify how often a new -10 or -35 box is created at a unique position within our collection of promoter fragments, how often this results in a -10 and a -35 box being appropriately spaced, and how often this actually leads to de-novo promoter activity.

      - We quantify how often a TGn sequence lies upstream of a new -10 box.

      • Results: Promoters can emerge when mutations create motifs but not by destroying them.

      - For each example, we added the DNA sequences of the wild-type region of interest and the mutant region of interest that results in the gain of promoter activity, and their respective PWM scores. 

      - We created constructs to validate each example by measuring their fluorescence on a plate reader.

      - We removed the P1-GFP example from the main figure because it was a false positive in the dataset. It is now in Fig S8.

      - We removed the Shiko Emergence metaphor because it could be confused with a binding mechanism for RNA polymerase.

      • Results: Gaining new motifs over existing motifs increases and decreases promoter activity.

      - We removed the “tandem motif” mechanism because the effect is more likely caused by H-NS binding.

      - We renamed the mechanisms to be “hetero-gain” and “homo-gain” for simplicity, and clearly define how we classified each sequence into each category.

      - We now include the DNA sequences, the PWM scores, the spacer lengths, and the fluorescence values from constructs harboring the predicted point mutations.

      • Results: Histone-like nucleoid-structuring protein (H-NS) represses P12-RFP and P22-GFP.

      - This new analysis explores the role of the transcription factor H-NS in repressing the parent sequences.

      - We identified putative H-NS motifs in P12-RFP and P22-GFP.

      - We show experimentally that in an H-NS null background, a bidirectional promoter (P20) becomes unidirectional, even though P20 does not contain an obvious H-NS motif.

      - In the original version of the manuscript, we described a phenomenon in which gaining a -35 box upstream of a promoter’s -35 box, or a -10 box upstream of a promoter’s -10 box, significantly decreases expression. We called this phenomenon a “tandem motif.” However, in the revised manuscript we find that these decreases in fluorescence are rescued in an H-NS null background. This suggests that the effect was actually due to modulation of H-NS binding rather than to the -10 and -35 boxes.

      • Results: The UP-element does not strongly influence promoter activity in our dataset.

      - We used a PWM for the UP element to test whether gaining or losing UP motifs significantly correlates with increases or decreases in expression. Even with a liberal PWM threshold, the analysis did not identify any UP elements.

      • Discussion

      - We rewrote the discussion to account for the new analyses and for the results on H-NS, the UP-element, and the extended -10 box.

      - We explain more clearly how our results conflict with those reported in the Bykov paper.

      - We now place our results in the context of David Grainger’s papers.

      • Methods

      - Added an explanation about pMR1.

      - Added methods describing how we created the point mutation constructs.

      - Added the methods for the plate reader.

      - Added the methods for Illumina sequencing.

      - Added the methods for the sigmoid curve-fitting.

      • Figure 1

      - Panel E shows how Pnew (the probability of a daughter sequence having a fluorescence score greater than 1.5 a.u.) relates to the fluorescence score of each parent sequence.

      - Panel F was originally in Figure S5. In the originally submitted version of the manuscript, overlapping -10s or overlapping -35s were counted as a single -10 or a single -35 box, respectively. In the revised manuscript, we treat each motif as an independent box. As a result, the R² and p values have changed, but the conclusions have not (Pnew still does not significantly correlate with -10 or -35 box counts).
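      The difference between the two counting schemes can be illustrated with a short sketch. Motif hits are found by consensus matching with a mismatch allowance, a hypothetical simplification of the PWM scoring used in the manuscript. The original scheme collapses overlapping hits into one box, whereas the revised scheme counts every hit.

```python
def find_boxes(seq, consensus, max_mismatches=1):
    """Start positions where seq matches consensus with at most max_mismatches."""
    w = len(consensus)
    return [i for i in range(len(seq) - w + 1)
            if sum(a != b for a, b in zip(seq[i:i + w], consensus)) <= max_mismatches]


def count_independent(starts):
    """Revised scheme: every hit is its own box, even if it overlaps another."""
    return len(starts)


def count_merged(starts, width=6):
    """Original scheme: overlapping hits collapse into a single box."""
    clusters, cluster_end = 0, -1
    for s in sorted(starts):
        if s > cluster_end:  # hit starts after the current cluster ends
            clusters += 1
        cluster_end = max(cluster_end, s + width - 1)
    return clusters
```

      For example, two TATAAT hits sharing a single base count as two boxes under the revised scheme but as one box under the original scheme.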

      • Figure 2

      - Panel C now includes a stacked barplot showing the percentage of -10 and -35 boxes that overlap with mutual information hotspots when the parent sequences are randomly scrambled computationally.

      • Figure 3

      - Panels A-C were added to explain how we define a new -10/-35 box and to show how many such new boxes each parent has. These panels also illustrate how we associate the presence or absence of a motif with significant changes in the fluorescence scores of the daughter sequences.

      - We moved the example of P1-GFP to Figure S8 because fluorescence did not change when we tested the specific mutation that creates the -10 box.

      - We now include the DNA sequences, the PWM scores, the spacer lengths, and the fluorescence values from reporter constructs harboring the point mutations predicted by our computational analyses.

      - Cartoons of RNA polymerase have been removed.

      • Figure 4

      - The tandem motif has been removed from the figure.

      - Cartoons of RNA polymerase have been removed.

      - We now include the DNA sequences, the PWM scores, the spacer lengths, and the fluorescence values from constructs harboring the point mutations predicted by our computational analyses.

      • Figure 5

      - This is a new figure analyzing the role of H-NS in promoter evolution and emergence.

      • Figure S4

      - Panel B now shows the wild-type parent scores and their standard deviations from the sort-seq experiment.

      • Figure S5

      - Panels with -10 and -35 box counts were moved to Figure 1.

      - The panel comparing Pnew to hotspot counts was removed.

      - Panels C-H now show correlations between the counts of different k-mers and Pnew.

      • Figure S8

      - We now include the DNA sequences, the PWM scores, the spacer lengths, and the fluorescence values from constructs harboring the point mutations predicted by our computational analyses.

      • Figure S9

      - We now include the DNA sequences, the PWM scores, the spacer lengths, and the fluorescence values from constructs harboring the point mutations predicted by our computational analyses.

      • Figure S10

      - We now include the DNA sequences, the PWM scores, the spacer lengths, and the fluorescence values from constructs harboring the point mutations predicted by our computational analyses.

      • Figure S11

      - Added DNA sequences and PWM scores.

      • Figure S12

      - A new figure with further analyses of H-NS.

      • Figure S13

      - A new figure regarding the UP-element analysis.

      • Figure S14

      - Added Panel D to show how we created mutant reporter constructs for validation.